This blog post summarizes my GSoC project: the implementation of popular deep reinforcement learning methods. During the project, I implemented deep (double) Q-learning, asynchronous one-step/n-step Q-learning, asynchronous one-step SARSA, and asynchronous advantage actor-critic (in progress), as well as two classical control problems, mountain car and cart pole, to test my implementations.
My work is mainly located in:
- `q_learning.hpp`: the main entry point for (double) Q-learning
- `async_learning.hpp`: the main entry point for the asynchronous methods
- `training_config.hpp`: a wrapper for hyperparameters
- `environment`: implementations of the two classical control problems, mountain car and cart pole
- `policy`: implementations of several behavior policies
- `replay`: the implementation of experience replay
- `network`: wrappers for non-standard networks (e.g. an actor-critic network without shared layers)
- `worker`: implementations of the asynchronous RL methods
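To give a feel for what the replay component does, here is a simplified, hypothetical sketch of a random replay buffer (the names and fields are illustrative, not mlpack's actual API): transitions are stored in a fixed-capacity ring buffer and sampled uniformly for minibatch updates.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// A transition observed by the agent; fields simplified for illustration.
struct Transition {
  int state;
  int action;
  double reward;
  int nextState;
  bool terminal;
};

// Fixed-capacity ring buffer; the oldest transitions are overwritten once full.
class RandomReplay {
 public:
  explicit RandomReplay(std::size_t capacity) : capacity(capacity), pos(0) {}

  void Store(const Transition& t) {
    if (buffer.size() < capacity)
      buffer.push_back(t);
    else
      buffer[pos] = t;
    pos = (pos + 1) % capacity;
  }

  // Sample a uniformly random minibatch (with replacement, for simplicity).
  std::vector<Transition> Sample(std::size_t batchSize, std::mt19937& rng) const {
    std::uniform_int_distribution<std::size_t> dist(0, buffer.size() - 1);
    std::vector<Transition> batch;
    for (std::size_t i = 0; i < batchSize; ++i)
      batch.push_back(buffer[dist(rng)]);
    return batch;
  }

  std::size_t Size() const { return buffer.size(); }

 private:
  std::size_t capacity;
  std::size_t pos;
  std::vector<Transition> buffer;
};
```

Sampling uniformly from old transitions is what breaks the correlation between consecutive updates in DQN-style training.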
Refactoring existing neural network components was another important part of my work:
- Detachment of the optimizer: this change influences all the optimizers and most test cases.
- Pass-by-value convention: many mlpack components used to rely on pass-by-reference, which is less flexible. I proposed pass-by-value in combination with `std::move`. This is a very large change, so for now only newly added components adopt the convention; Ryan is working on the old codebase.
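The idea behind the convention can be shown with a tiny standalone example (the `Layer` class here is purely illustrative): a "sink" constructor takes its argument by value and moves it into the member, so lvalue callers pay one copy and rvalue callers pay none.

```cpp
#include <string>
#include <utility>

// Sink constructor: pass by value, then move into the member. Callers who
// pass an lvalue pay exactly one copy; callers who pass an rvalue (or a
// temporary) pay only cheap moves.
class Layer {
 public:
  explicit Layer(std::string name) : name(std::move(name)) {}
  const std::string& Name() const { return name; }

 private:
  std::string name;
};
```

With `std::string n = "linear";`, `Layer a(n)` copies once then moves, `Layer a(std::move(n))` avoids the copy entirely, and `Layer a("relu")` moves a temporary in — one overload covers all three cases, which is exactly the flexibility pass-by-reference lacks.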
- Exposure of `Backward`: before this we only had `Train`, which could lead to duplicate computation in some cases. Exposing `Backward` lets us avoid that.
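The duplicate-computation issue can be illustrated with a toy one-parameter model (a hypothetical sketch, not mlpack's `FFN` interface): a monolithic `Train` would re-run the forward pass internally, while a separate `Backward` can reuse the activations cached by an earlier `Forward` call.

```cpp
// A toy model y = w * x illustrating why a public Backward() helps:
// Forward() caches its input, so Backward() can reuse that cached value
// instead of recomputing the forward pass.
class ToyModel {
 public:
  explicit ToyModel(double w) : w(w), lastInput(0.0) {}

  double Forward(double x) {
    lastInput = x;  // cache the activation for the backward pass
    return w * x;
  }

  // Gradient of the squared-error loss 0.5 * (w*x - target)^2 with respect
  // to w, computed from the cached input — no second forward pass needed.
  double Backward(double prediction, double target) const {
    return (prediction - target) * lastInput;
  }

 private:
  double w;
  double lastInput;
};
```

This matters in RL in particular, where the same forward outputs are often needed both for action selection and for the gradient step.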
- Support for shared layers: this is still in progress, but I think it is essential for A3C to work on Atari games. We proposed the `Alias` layer to address this issue. It is also a large change, as it will influence all the visitors.
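The core idea of the `Alias` layer can be sketched in a few lines (a deliberately simplified, hypothetical version — the real design must integrate with mlpack's visitor machinery): the alias holds a non-owning pointer to another layer, so both "layers" see and update the same parameters, which is what a shared actor/critic trunk requires.

```cpp
#include <vector>

// A minimal stand-in for a real layer: it just owns some parameters.
struct DenseLayer {
  std::vector<double> weights;
};

// Hypothetical alias: a non-owning view of another layer. Updates made
// through the alias are visible in the target, so two networks can share
// one set of parameters.
struct AliasLayer {
  DenseLayer* target;  // non-owning: the parameters live in the target layer
  std::vector<double>& Weights() { return target->weights; }
};
```

In A3C, the actor and critic networks would each contain an alias to the same convolutional trunk, so a gradient step through either head updates the shared parameters.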
- Miscellaneous updates of old APIs.
Detailed usage can be found in the two test suites, e.g. `q_learning_test.cpp`. You can run them with `bin/mlpack_test -t QLearningTest` and `bin/mlpack_test -t AsyncLearningTest`.
In total, I contributed the following PRs:
- Implementation of Alias layer
- Async n-step q-learning and one step sarsa
- Implement a framework of DQN and asynchronous learning methods
- Implementation of async one step q-learning
- Add aggregated policy for async rl methods
- Support batched forward and backward for FFN
- Update Optimizer API
- Add new API for some optimizers
- Basic DQN
- Add epsilon greedy policy for DQN
- Add random replay for DQN
- Fix a bug in gaussian init
- Refactor FFN
- Implement two classical control problems for testing reinforcement learning methods
- Fix bug of variadic template parameters of Optimizer
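As an example of the smaller pieces above, the epsilon-greedy policy added for DQN follows the standard scheme (this is a simplified free-function sketch, not mlpack's actual policy interface): with probability epsilon pick a uniformly random action, otherwise pick the greedy one.

```cpp
#include <random>
#include <vector>

// Epsilon-greedy action selection over a vector of Q-values.
// With probability epsilon, explore (uniform random action); otherwise,
// exploit (argmax of the Q-values).
int EpsilonGreedy(const std::vector<double>& qValues, double epsilon,
                  std::mt19937& rng) {
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  if (coin(rng) < epsilon) {
    std::uniform_int_distribution<int> uniform(0, (int) qValues.size() - 1);
    return uniform(rng);  // explore
  }
  int best = 0;
  for (int i = 1; i < (int) qValues.size(); ++i)
    if (qValues[i] > qValues[best]) best = i;
  return best;  // exploit
}
```

In practice, epsilon is annealed from a high value toward a small floor over the course of training, shifting the agent from exploration to exploitation.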
The most challenging parts are:
- Making the number of threads independent of the number of workers in the asynchronous RL methods: this is a fantastic idea. To the best of my knowledge, there is no other public implementation of it; the implementations available on the Internet simply assume the two are equal. To achieve this, we build a worker pool and use an episode as the job unit.
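The decoupling can be sketched as follows (class and method names here are illustrative, not mlpack's actual worker classes): any number of logical workers submit episodes as jobs, and a fixed pool of threads drains the queue, so e.g. 16 workers can run on 4 threads.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal pool whose job unit is one episode: N logical workers each
// enqueue episodes, and a fixed number of threads M (independent of N)
// executes them.
class EpisodePool {
 public:
  explicit EpisodePool(std::size_t numThreads) : done(false) {
    for (std::size_t i = 0; i < numThreads; ++i)
      threads.emplace_back([this]() { this->Run(); });
  }

  // Enqueue one episode (any callable) as a unit of work.
  void Submit(std::function<void()> episode) {
    std::lock_guard<std::mutex> lock(mutex);
    jobs.push(std::move(episode));
  }

  // Drain remaining jobs, then join all threads.
  void Shutdown() {
    done = true;
    for (auto& t : threads) t.join();
  }

 private:
  void Run() {
    while (true) {
      std::function<void()> job;
      {
        std::lock_guard<std::mutex> lock(mutex);
        if (jobs.empty()) {
          if (done) return;
          continue;  // simple busy-wait; a condition variable would be better
        }
        job = std::move(jobs.front());
        jobs.pop();
      }
      job();  // run one full episode outside the lock
    }
  }

  std::queue<std::function<void()>> jobs;
  std::mutex mutex;
  std::atomic<bool> done;
  std::vector<std::thread> threads;
};
```

Because each job is a complete episode, a worker's state never straddles two threads, which is what makes the thread count a free tuning knob rather than a property of the algorithm.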
- The `Alias` layer: this blocked me the most and is still blocking me. We need a deep understanding of
Apparently, RL support in mlpack is far from complete. Supporting classical control problems is an important milestone, and we are almost there. However, we are still far from the next milestone: Atari games. At a minimum, we need GPU support, basic image-processing infrastructure, and an effective way to communicate with popular simulators (e.g. OpenAI Gym, ALE).
I thank Marcus for his mentorship during the project and his detailed code reviews. I also want to thank Ryan for his thoughtful suggestions, and I appreciate the generous funding from Google.