This page contains a list of all recent posts.
- Neural Evolution Algorithms for NES games - Summary
- FrankWolfe Algorithm and Solving Sparse Solutions of Atom Norm Constraint Problems
- Neural Turing Machines - Summary
- Deep Learning Modules - Summary
- Benchmarking mlpack libraries - Summary
- Augmented Recurrent Neural Networks - Summary
- Profiling for parallelization and parallel stochastic optimization methods - Summary
- Deep Reinforcement Learning Methods - Summary
- Deep Learning Module in mlpack - Week 11 + 12
- Neural Evolution Algorithms for NES games - Week 10
- Cross-Validation and Hyper-Parameter Tuning - Summary
- Deep Reinforcement Learning Methods - Week 10
My Google Summer of Code project was to implement neural evolution algorithms for NES games, covering:
- CNE: Conventional Neural Evolution.
- CMAES: Covariance Matrix Adaptation Evolution Strategy.
- NEAT: Neural Evolution through Augmenting Topologies.
- HyperNEAT: exactly the same as NEAT, except that the genomes pass through a substrate layer to obtain the links; this still has to be implemented.
All the evolutionary algorithms, as originally planned, were supposed to be neural network optimization algorithms. But we later switched to designing them as generalized optimizers which can be used with other functions and machine learning methods, like the logistic regression class implemented in mlpack, as well. Therefore CMAES and CNE were designed as optimizers.
- My working PR (ready): This PR is all set to get merged: 938
- References: The research paper that I referred to for this algorithm is The CMA Evolution Strategy: A Tutorial by Nikolaus Hansen.
- Medium Blog Post: I have a Medium blog post describing how this algorithm works, followed by a description of the mlpack open source library and a full code tutorial showing the optimization in action: CMAES algorithm in C++ using mlpack
- Test Cases: The CMA-ES algorithm works by using a clever probability distribution (a modifiable Gaussian distribution), moving the mean towards the minimum and updating the covariance matrix in the process. Unlike derivative-based optimization methods, this method is great for discontinuous and ill-conditioned functions. Therefore one of the performance tests was done using the Rosenbrock function, iteratively for up to 50 dimensions. A change of 1e-5 in the values leads to a very different result. A lot of time was spent making the algorithm better and removing some major bugs. I can confidently present the optimizer now, which works reliably and gives accurate results each time.
- Tutorial: A full descriptive tutorial on using CMAES has been added to the documentation, covering step by step:
- The CMAES optimizer class in short.
- Making the function class.
- Setting up the constructor.
- Complete code example.
- Using as optimizer with other models.
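To give a flavour of how this family of optimizers works, here is a heavily simplified evolution-strategy sketch in plain C++ (standard library only, no armadillo, and not mlpack's actual CMAES class): it samples a Gaussian population around the current mean, ranks candidates by fitness, and moves the mean toward the best half. The full covariance matrix adaptation that gives CMA-ES its name is deliberately omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Toy evolution strategy in the spirit of CMA-ES: sample a Gaussian
// population around the current mean, rank by fitness, and move the mean
// toward the best samples.  Covariance adaptation is replaced by a crude
// step-size decay; illustrative sketch only, not the mlpack implementation.
double Sphere(const std::vector<double>& x)
{
  double s = 0.0;
  for (double v : x) s += v * v;
  return s;
}

std::vector<double> SimpleES(std::vector<double> mean, double sigma,
                             int lambda, int iterations)
{
  std::mt19937 gen(42);
  std::normal_distribution<double> normal(0.0, 1.0);
  const int mu = lambda / 2;  // number of parents kept each generation
  const size_t dim = mean.size();

  for (int it = 0; it < iterations; ++it)
  {
    // Sample lambda candidates around the current mean.
    std::vector<std::pair<double, std::vector<double>>> pop;
    for (int i = 0; i < lambda; ++i)
    {
      std::vector<double> cand(dim);
      for (size_t d = 0; d < dim; ++d)
        cand[d] = mean[d] + sigma * normal(gen);
      pop.emplace_back(Sphere(cand), cand);
    }
    std::sort(pop.begin(), pop.end());  // best (lowest) fitness first

    // New mean: average of the mu best candidates.
    std::fill(mean.begin(), mean.end(), 0.0);
    for (int i = 0; i < mu; ++i)
      for (size_t d = 0; d < dim; ++d)
        mean[d] += pop[i].second[d] / mu;

    sigma *= 0.97;  // crude decay instead of full covariance adaptation
  }
  return mean;
}
```

On the sphere function this sketch converges to the optimum; the real CMA-ES additionally adapts the full covariance matrix and step size, which is what makes it robust on ill-conditioned problems like Rosenbrock.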
This PR was very challenging for me. The time I had allotted was far less than what was actually needed. Also, during this time I was new to the armadillo library, for which my mentor came to the rescue. I made a lot of modifications to the PR and also spent a lot more time debugging. This PR was indeed a somewhat scary learning experience, but finally it got completed after optimization, improvements and bug fixes.
- PR (merged): 1088
- Divided into three sections: Optimizer, Test Cases, Tutorial
Conventional Neural Evolution, as implemented in mlpack, is an optimizer that works like biological evolution in nature: it selects the best candidates based on their fitness scores and creates a new generation by mutation and crossover of the population. This keeps repeating iteratively over several generations, after which healthy candidates are found that perform well in producing the desired output.
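The select-crossover-mutate loop described above can be sketched in a few lines of plain C++. All names here are illustrative; this is not mlpack's CNE optimizer, just a toy genetic loop minimizing a sphere objective.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Sketch of a CNE-style loop: candidates are flat parameter vectors, the
// fittest half survives each generation (elitism), and the weaker half is
// refilled by crossover of two surviving parents plus Gaussian mutation.
struct Candidate
{
  std::vector<double> genes;
  double fitness = 0.0;  // lower is better in this toy setup
};

double Evaluate(const std::vector<double>& g)  // toy objective: sphere
{
  double s = 0.0;
  for (double v : g) s += v * v;
  return s;
}

Candidate Evolve(int populationSize, int generations, size_t dim)
{
  std::mt19937 gen(7);
  std::uniform_real_distribution<double> init(-5.0, 5.0);
  std::normal_distribution<double> mutate(0.0, 0.1);

  std::vector<Candidate> pop(populationSize);
  for (auto& c : pop)
  {
    c.genes.resize(dim);
    for (auto& g : c.genes) g = init(gen);
    c.fitness = Evaluate(c.genes);
  }

  for (int t = 0; t < generations; ++t)
  {
    // Selection: sort so the fittest half become parents.
    std::sort(pop.begin(), pop.end(),
              [](const Candidate& a, const Candidate& b)
              { return a.fitness < b.fitness; });

    // Crossover + mutation: refill the weaker half from random parent pairs.
    std::uniform_int_distribution<int> parent(0, populationSize / 2 - 1);
    for (int i = populationSize / 2; i < populationSize; ++i)
    {
      const Candidate& a = pop[parent(gen)];
      const Candidate& b = pop[parent(gen)];
      for (size_t d = 0; d < dim; ++d)
      {
        pop[i].genes[d] = (d % 2 == 0 ? a.genes[d] : b.genes[d]);  // crossover
        pop[i].genes[d] += mutate(gen);                            // mutation
      }
      pop[i].fitness = Evaluate(pop[i].genes);
    }
  }

  std::sort(pop.begin(), pop.end(),
            [](const Candidate& a, const Candidate& b)
            { return a.fitness < b.fitness; });
  return pop.front();
}
```

With elitism, the best fitness is non-increasing from generation to generation, which is why such a loop reliably hill-climbs even without gradients.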
- References: The research papers and reference code that I used to learn CNE were: Training Feedforward Neural Networks Using Genetic Algorithms, Evolving Artificial Neural Networks, MultiNEAT, and PR 753.
- Test Cases: Test cases were written for three very different scenarios.
- Optimizing a given function: XOR function.
- Optimizing a vanilla network: It worked for both the iris dataset and the thyroid dataset, but to make the test case faster only the iris dataset is used.
- Testing with an existing mlpack model: The logistic regression model was given the optimizer to learn a dataset.
- Tutorial: A full descriptive tutorial on using CNE has been added to the documentation, covering step by step:
- The CNE optimizer class in short.
- The constructor parameters.
- Creating a model using
- Complete code example.
- Logistic regression using CNE as an optimizer.
This is my first merged PR for any open source project, and it now rests peacefully in the mlpack codebase. It was not as painful as CMAES, and as always my mentor provided his valuable suggestions and code reviews. I loved working on it.
- My Working PR: This PR is not yet complete: 1105
- Reference: The main reference paper for the NEAT algorithm that I went through is Evolving Neural Networks through Augmenting Topologies. A very beautiful implementation video on YouTube is also worth watching: SethBling MarIO, SethBling MarIO code implementation, Bang Lui PR 752
This class is already made by previous GSoC 2016 participant Bang Lui and so rather than reinventing the wheel my work was to make it live again and fix an important bug.
Reading his code was a really nice experience, and I think it is one of the best examples in which you can see classes for neurons and links, then a genome class built from them (a neural network made of links and neurons), then species as clusters of genomes, and finally a population as a cluster of species. It shows the true essence of classes and object-oriented programming, beautifully implemented.
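The hierarchy described above can be sketched roughly like this (illustrative structs, not the actual classes from the PR):

```cpp
#include <cassert>
#include <vector>

// Sketch of the NEAT class hierarchy: neurons and links form a genome, a
// genome is one candidate neural network, genomes cluster into species, and
// species together form the population.
struct Neuron { int id; double activation; };
struct Link   { int from, to; double weight; bool enabled; };

struct Genome              // one candidate neural network
{
  std::vector<Neuron> neurons;
  std::vector<Link> links;
  double fitness = 0.0;
};

struct Species             // a cluster of structurally similar genomes
{
  std::vector<Genome> genomes;
};

struct Population          // the whole evolving population
{
  std::vector<Species> species;
};
```

Each level wraps the one below it, which is what makes the design read so naturally.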
Some of my time went into making this one-year-old PR live and working again. I made more than 270 indentation fixes and other code optimization and style fixes that do not get caught by the cppcheck bot. A tutorial for the documentation is also in progress. I was able to find some errors, which led to some critical changes in the evolution. These changes are under testing and debugging for now.
My future work is to complete NEAT until it gets merged successfully. Looking at the work done by Bang Lui, the code is worth fixing and pushing to the codebase. For HyperNEAT, which is a small extension to NEAT, a new PR will follow; it is not a major task.
I loved the idea, raised in one of the issues, of adding a Gaussian process implementation to mlpack. I would love to contribute to this later.
I would like to thank
mlpack for giving me a chance to contribute to this beautiful library. I have a very different perspective on open source now. Big thanks to my mentor Marcus Edel for tolerating me. There were multiple times I asked for his help, and he always responded in no time and with a lot of patience. Also, the detailed code reviews and improvements that he suggested were unmatched and clearly ensure that good quality code gets into the main codebase. Thanks to Ryan for the code review and final edits. Lastly, thanks to Google for encouraging open source contribution. Looking forward to more contributions from my side and to becoming a better engineer.
This post summarizes my work for GSoC 2017.
The basic goal of my GSoC 2017 work was to implement the Frank-Wolfe algorithm (conditional gradient algorithm) in various situations. From Jaggi's paper Revisiting Frank-Wolfe, we know that many sparse optimization problems with atom domain constraints can be solved by this Frank-Wolfe type algorithm. So I implemented:
Classic Frank-Wolfe algorithm. The algorithm is implemented as a class template, so that it can be used and extended in general forms. The algorithm consists of two steps: solving the linear constrained problem and updating the solution. Users can extend the algorithm by writing their own classes defining these two steps. For the constrained problem solver, I implemented the common case where the constraint is a unit lp ball. For the update step, I implemented the classic update rule and a line search method. I tested the convergence speed and confirmed that the line search step, although it adds some extra work, gives faster convergence.
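The two steps can be illustrated on a toy problem: minimizing ½‖x − b‖² over an l1 ball, where the linear subproblem has a closed-form solution. This is a plain C++ sketch with illustrative names, not mlpack's FrankWolfe class.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Classic Frank-Wolfe iteration for min 0.5 * ||x - b||^2 over an l1 ball
// of radius tau.  Step 1 solves the linear problem over the constraint set;
// step 2 updates the solution with the classic step size 2 / (k + 2).
std::vector<double> FrankWolfeL1(const std::vector<double>& b, double tau,
                                 int maxIter)
{
  std::vector<double> x(b.size(), 0.0);  // feasible starting point
  for (int k = 0; k < maxIter; ++k)
  {
    // Gradient of 0.5 * ||x - b||^2 is (x - b).
    // Step 1: the linear minimization over the l1 ball has a closed form --
    // put all the mass tau on the coordinate with the largest |gradient|.
    size_t best = 0;
    double bestAbs = 0.0;
    for (size_t i = 0; i < x.size(); ++i)
    {
      const double g = x[i] - b[i];
      if (std::abs(g) > bestAbs) { bestAbs = std::abs(g); best = i; }
    }
    if (bestAbs == 0.0) break;  // gradient is zero: already optimal
    std::vector<double> s(x.size(), 0.0);
    s[best] = (x[best] - b[best] > 0 ? -tau : tau);

    // Step 2: classic update rule x <- (1 - gamma) x + gamma s.
    const double gamma = 2.0 / (k + 2.0);
    for (size_t i = 0; i < x.size(); ++i)
      x[i] = (1.0 - gamma) * x[i] + gamma * s[i];
  }
  return x;
}
```

Swapping the closed-form step 1 for another constraint solver, or the fixed step size for a line search, corresponds exactly to the two extension points described above.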
Orthogonal Matching Pursuit. Jaggi's paper revealed that if the constraint domain of the FW algorithm is just an atom norm bound, then the linear constrained problem becomes very easy to solve. For example, in the vector case, if the constraint is a bound on the lp norm, the solution can be given explicitly. A straightforward application is the Orthogonal Matching Pursuit (OMP) algorithm, where the update step just re-optimizes the coefficients in the span of the current solution space. I also implemented regularization functionality in the constrained problem solver, to help users get faster convergence.
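For intuition, here is a sketch of OMP in the simplest possible setting: an orthonormal dictionary (the standard basis), where re-optimizing over the span of the selected atoms collapses to plain inner products. A general dictionary would need a least-squares solve in the update step; all names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// OMP sketch with the standard basis as dictionary: greedily pick the atom
// most correlated with the residual, set its coefficient, and update the
// residual.  With orthonormal atoms the optimal coefficient is exactly the
// inner product, so no least-squares solve is needed.
std::vector<double> OmpStandardBasis(const std::vector<double>& y, int k)
{
  std::vector<double> residual = y;
  std::vector<double> coeffs(y.size(), 0.0);
  for (int step = 0; step < k; ++step)
  {
    // Atom selection: the basis vector most correlated with the residual.
    size_t best = 0;
    for (size_t i = 1; i < residual.size(); ++i)
      if (std::abs(residual[i]) > std::abs(residual[best])) best = i;

    // Re-optimization in the span reduces to copying the inner product,
    // which zeroes that coordinate of the residual.
    coeffs[best] = residual[best];
    residual[best] = 0.0;
  }
  return coeffs;
}
```

After k steps, the result keeps the k largest-magnitude entries of the signal: a best k-sparse approximation in this orthonormal setting.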
- Although we can use this Frank-Wolfe type algorithm to solve atom domain problems in a greedy fashion, it is known that using it directly gives slow convergence and a large support size. I implemented the tricks in the paper RaoShahWright(2015)ForwardBackwardGreedy, which introduced:
Enhancement step. It reoptimizes the atom coefficients within the atom norm constraint domain. It is realized using the projected gradient method.
Truncation step. It tries to delete some inefficient atoms, which cancels the drawbacks of this FW type greedy algorithm. This will help us to get faster convergence, as well as a solution with better sparsity.
- I also implemented the constraint solver for other atom domains, such as the structured group norm constraint domain and the matrix Schatten norm constraint domain. An important application in the matrix case is the matrix completion problem. As mlpack already has that solver implemented using the SDP optimizer, I changed the original implementation into a class template, which can accept both the SDP optimizer and the Frank-Wolfe optimizer to solve the matrix completion problem. I compared the different algorithms; it seems that even with the Enhancement and Truncation steps above, the convergence of the Frank-Wolfe type algorithm is still very slow.
The matrix completion code using the Frank-Wolfe type algorithm is already done. I will open a new PR after my second PR gets merged.
A lot of the code I implemented consists of tricks to improve the convergence speed of the algorithm. However, for most of them it was only checked manually, on small test problems, that these methods are beneficial. We should test on large scale problems and see how everything really works. Also, we can compare the FW algorithm with other optimizers on similar problems, and see the pros and cons of using FW.
I am really grateful for the help from Stephen, Ryan, Marcus, and the whole mlpack community. It is my first time joining the open source world, and it has been a wonderful experience. It is also really cool to know so many machine learning people and discuss ideas with them. I will keep in touch with the mlpack community, and I hope I can make more contributions to the project in the future.
Another amazing year of GSoC. This year was more exciting than my previous experiences. My major contribution this year was the Neural Turing Machine, or NTM: a neural network with additional external memory. As the entire architecture is end-to-end differentiable, the NTM is able to learn tasks which require memory storage, such as reproducing a given sequence, sorting a sequence, and so on. The code is complete and can be found in PR #1072.
I have also implemented GRU as part of my project this year. While implementing GRU, the current recurrent neural network framework was extended to support variable length sequences. This involved making changes to the current LSTM implementation as well. Given such changes, both LSTM and GRU are tested with variable length Reber grammar sequences. As an interesting side project, I added something called a recursive Reber grammar for testing. Recursive Reber grammars are basically Reber grammars that are recursively embedded in themselves. Embedded Reber grammar is a special case of recursive Reber grammar where the recursive depth is set to 1. It's good to see that both LSTM and GRU pass the recursive Reber grammar tests with various recursive depths, although I have noticed that the cell state size required to pass the tests differs with the recursive depth of the grammar. It will be a nice experiment to see what is actually going on in the cell state. Is LSTM/GRU storing the depth as a stack? That would be really cool. All this code is already merged and can be accessed in PR #1018.
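For readers unfamiliar with Reber grammars, a (non-recursive) generator can be sketched with a small transition table; the recursive variant embeds a generated string inside itself. This is illustrative code, not mlpack's actual test helper.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Standard Reber grammar generator: walk a small finite automaton from the
// start state, emitting one symbol per transition, until the end state.
std::string GenerateReber(std::mt19937& gen)
{
  // Each state has up to two outgoing (symbol, next-state) edges;
  // next-state -1 marks the end of the string.
  struct Edge { char symbol; int next; };
  static const std::vector<std::vector<Edge>> transitions = {
    { { 'T', 1 }, { 'P', 2 } },   // state 0 (reached after the initial 'B')
    { { 'S', 1 }, { 'X', 3 } },   // state 1
    { { 'T', 2 }, { 'V', 4 } },   // state 2
    { { 'X', 2 }, { 'S', 5 } },   // state 3
    { { 'P', 3 }, { 'V', 5 } },   // state 4
    { { 'E', -1 } }               // state 5: emit 'E' and stop
  };

  std::string out = "B";
  int state = 0;
  while (state != -1)
  {
    const auto& edges = transitions[state];
    std::uniform_int_distribution<size_t> pick(0, edges.size() - 1);
    const Edge& e = edges[pick(gen)];
    out += e.symbol;
    state = e.next;
  }
  return out;
}
```

Every generated string starts with 'B', ends with 'E', and follows the grammar; a recurrent network is then trained to predict, at each step, which symbols are legal next.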
As of now, I am yet to complete the implementation of batch normalization. Before batch normalization can be added to the framework, the framework needs to be refactored to support batch mode. A lot of people are helping to achieve this task. Even though the program is over, I will keep working on this issue and complete batch normalization as part of my project.
I would like to congratulate Marcus and Ryan, my mentors, for surviving this summer project with me :P I know how hard this must have been :) Thank you for all your support and guidance. It's been more than 4 years since I joined the mlpack community, and it's been a wonderful journey. I hope the coming years will bring more fun...
Goals for the summer
These were the algorithms I proposed for implementation:
ssRBM (spike and slab RBM),
GAN (Generative Adversarial Networks),
I was able to implement GAN this summer, though unfortunately none of my code has been merged yet. You can find the PRs I opened below: RBM*, ssRBM, GAN, ResizeLayer Implementation, CrossEntropyWithLogits.
The RBM PR was later merged with the ssRBM PR, and both are ready to be merged. The GAN PR is also mostly complete; I think only superficial style changes are required.
In addition to implementing these algorithms, I am happy to say that they are very well tested. You can find some of the code here. I would also like to point out that for RBM we are comparable to the sklearn library in terms of speed (1.5x faster) and accuracy (similar). We couldn't benchmark the ssRBM, as none of the libraries actually implement it. For GANs, we tested our implementation with examples from Keras and TensorFlow. I am happy to say that in terms of visual reconstruction for the Gaussian distribution generation, we were able to get results comparable with TensorFlow. We also implemented a test for the generation of MNIST images using an FFN as the discriminator and generator network. The results are not as good as those of the Keras example, but we can see the GAN training and that it converges to some stable results. You can look at the results below.
The major difficulty with implementing RBMs was deciding on an architecture that could easily be extended to other variants. Hence, we decided to implement a wrapper layer and then implement the actual working of each type of RBM as a policy. This gives us the advantage of modularity.
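The wrapper-plus-policy idea can be sketched like this (all class and method names are illustrative placeholders, not the PR's actual interface):

```cpp
#include <cassert>
#include <string>

// Policy-based design sketch: the RBM wrapper owns the generic machinery,
// while the policy type supplies the variant-specific math.  The free
// energy formulas below are placeholders, not the real RBM equations.
struct BinaryRBMPolicy
{
  std::string Name() const { return "binary"; }
  double FreeEnergy(double v) const { return -v; }       // placeholder
};

struct SpikeSlabRBMPolicy
{
  std::string Name() const { return "spike-slab"; }
  double FreeEnergy(double v) const { return -v * v; }   // placeholder
};

template<typename Policy>
class RBM
{
 public:
  explicit RBM(Policy policy) : policy(policy) { }

  // The wrapper's generic step defers variant-specific math to the policy.
  double Step(double visible) const { return policy.FreeEnergy(visible); }
  std::string Variant() const { return policy.Name(); }

 private:
  Policy policy;
};
```

The wrapper owns the shared training loop, while each policy supplies only the variant-specific pieces, so adding a new RBM type means writing one new policy class.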
Here are the results of our implementation on the digits dataset (a smaller version of MNIST).
The samples are generated from 1000 steps of Gibbs sampling.
This image is generated from the deeplearning.net example.
Our accuracy on the digits dataset was around 86%, but this can go higher with some more parameter tuning.
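The block Gibbs sampling used to draw samples from a binary RBM, as in the figures above, can be sketched as follows (sizes and parameter values are illustrative, not the actual implementation):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

double Sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One step of block Gibbs sampling in a binary RBM: sample the hidden units
// given the visible layer, then resample the visible layer given the
// hiddens.  w[j][i] is the weight between hidden unit j and visible unit i.
std::vector<int> GibbsStep(const std::vector<int>& v,
                           const std::vector<std::vector<double>>& w,
                           const std::vector<double>& hiddenBias,
                           const std::vector<double>& visibleBias,
                           std::mt19937& gen)
{
  std::uniform_real_distribution<double> u(0.0, 1.0);

  // h_j ~ Bernoulli(sigmoid(sum_i w[j][i] * v_i + c_j))
  std::vector<int> h(w.size());
  for (size_t j = 0; j < w.size(); ++j)
  {
    double act = hiddenBias[j];
    for (size_t i = 0; i < v.size(); ++i) act += w[j][i] * v[i];
    h[j] = (u(gen) < Sigmoid(act)) ? 1 : 0;
  }

  // v_i ~ Bernoulli(sigmoid(sum_j w[j][i] * h_j + b_i))
  std::vector<int> vNew(v.size());
  for (size_t i = 0; i < v.size(); ++i)
  {
    double act = visibleBias[i];
    for (size_t j = 0; j < h.size(); ++j) act += w[j][i] * h[j];
    vNew[i] = (u(gen) < Sigmoid(act)) ? 1 : 0;
  }
  return vNew;
}
```

Iterating this step many times, e.g. the 1000 steps mentioned above, produces samples from the model's distribution.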
The spike and slab RBM was the hardest to implement, mainly because there are no existing implementations of the paper. We had to figure out some formulas ourselves, like the free energy function. Some of the details in the paper are a little unclear, like the radius parameter for rejecting samples. We also had to decide how we wanted to represent parameters such as the spike variables, slab variables and lambda bias variables: the paper states them to be diagonal matrices, but we found that most of the papers just use a constant value for the diagonal entries, so we decided to represent them as scalars, giving us major speed and memory improvements.
For ssRBM on the digits dataset, we were again able to get an accuracy of around ~82%. One of the reasons the accuracy of ssRBM is less than that of the binary RBM is that ssRBM is suited to datasets that have high correlation within parts of the images, something like the CIFAR dataset, so that the correlation can be captured by the slab variables. We tried ssRBM on the CIFAR dataset (code), but due to the large volume of the dataset and the scarcity of computational resources, we got around 70% accuracy. These results should be taken with a grain of salt, as I saw that increasing the sample size led to a decrease in accuracy.
We also tried bringing down the running time of ssRBM considerably, and were able to do so. We are still looking at the issue to gain possibly more improvements. When we tried the CIFAR dataset for ssRBM testing, there were a couple of problems: the paper didn't give a detailed list of the parameters it used, either for training the ssRBM or for the creation of feature vectors from the ssRBM for testing. Also, the CIFAR dataset is quite huge, and the problem was compounded by the fact that the paper suggested creating 8x8 patches from every image, leading to an increased dataset size.
We implemented Generative Adversarial Networks in a way quite different from most of the other implementations in TensorFlow and PyTorch:
- Using a single optimizer for training.
- No need for separate noise data.
- No need for separate noise and real labels.
- Providing parameters such as the generator update step, defining when to update the generator network.
The main idea of implementing GANs this way came from Mikhail. The above formulation has the advantage of being easy to use, using fewer resources in terms of space, and making a lot more logical sense.
There are at present no standard techniques for testing GANs, so we tested our implementation on the following examples:
- 1D Gaussian Test: This test aims to generate data from a Gaussian with mean = 4 and variance = 0.5, given that the noise function is the uniform distribution.
- Generation of MNIST Digits: We tested our GAN for generation of images using a simple FNN as both the generator and discriminator network. We used only the 7 digits from the MNIST dataset to achieve the following results.
Here is the final image; the network is trained for just 60 epochs. We can get better results if we find better hyperparameters.
- DCGAN (stripped-down version): We also tested our GAN implementation for generation of digits using CNNs. This basically gives us a simpler version of the famous DCGAN implementation. To complete this test, we had to add the Resize layer (bilinear interpolation of images) and the CrossEntropy with logits layer. Right now we are testing our implementation for this test; you can find the code here.
We also added some of the training hacks from Soumith Chintala's workshop on training GANs:
- Batch Training for discriminator and generator
- Pre-Training for discriminator
- Optimizing instead of
- Training the discriminator more using the Gradient Update step.
Most of my last 15-20 days were spent on hyperparameter tuning to make the various tests work for ssRBM. This is a highly boring and tedious job. I am really looking forward to the Hyper-Parameter Tuning PR.
One of the major problems faced was that testing deep learning modules correctly requires huge amounts of data and resources. I found my computing rig in some cases to be very slow for the purpose. Hence, I am looking forward to Bandicoot, and to testing all this code on a GPU.
I also found the training of GANs, particularly the DCGAN example, very time consuming; there is no efficient way to tune the hyperparameters.
- I think ssRBM needs to be tested on a dataset such as CIFAR, but first we need to confirm that the existing implementation of ssRBM does achieve good results on some set of parameters, so we can replicate the same results. (Preferably this code should be run on a GPU, since that would lead to faster results and hence faster hyperparameter tuning.)
- Implementation of Wasserstein GAN. Lately, a lot of noise in the deep learning community has been about how great WGANs are compared to GANs, so I think it is reasonable to implement it next.
- I think we need support for BatchNormalization and other image manipulation functions.
- I think something similar to hyperopt would be very useful.
- Implementation of stacking module for
The summer has been quite eventful. I am really happy to say that I have learned a lot about persevering through an implementation to make it work. The need for constant improvement of the code base is one of the important lessons I have learned. I also added some serious debugging skills to my skillset. Finally, I want to give a huge thank you to Mikhail Lozhnikov, without whom the project would not have been possible. I would also like to thank Marcus for his valuable inputs from time to time and for guiding me through the code base. I really think this experience has made me a more patient and much better programmer than I was before. Thanks again to the whole
mlpack community :)
This summer I was quite fortunate to work on the mlpack project with my mentors Ryan Curtin and Marcus Edel as part of the Google Summer of Code program. I worked on improving their benchmarking system, and I would now like to describe what the work was, what I learnt through a plethora of successes and failures, and what work remains to be done after GSoC.
There were many mlpack methods that had been added since the previous benchmarking system was built, and in many of the current implementations only the runtime metric was being captured. Also, many of the implementations were written for old versions of the benchmarked libraries and used many deprecated methods, which needed updating. Many methods were not implemented in Weka, and many libraries like R were not yet benchmarked.
Work done before GSoC
Before officially becoming a Google Summer of Code intern, I made the following contributions to the benchmarking system of the mlpack library, starting in March 2017:
- Added options to the randomforest and Decision tree
- Added the Code for DecisionTree in
- Implemented Approximate Nearest Neighbors using the
- Added test file for Random Forest, Logistic Regression and Adaboost.
- Made certain corrections to the svm and LARS implementation of sklearn and to the config file.
Work done during GSoC
Updating Scikit Implementations: The scikit implementations were updated to facilitate specifying more options and storing metrics like accuracy, precision, recall and MSE along with the runtime. The config.yaml file did not have any block for the Logistic Regression and ICA methods, so these were also added. The following table enumerates the parameters added to the various methods:

| Method | Parameters added |
| --- | --- |
| LSHForest | min_hash_match, n_candidates, radius and radius_cutoff_ratio |
| ALLKNN | radius, tree_type, metric and n_jobs |
| Elastic Net | max_iter, tolerance and selection |
| GMM | tolerance and max_iter |
| ICA | n_components, algorithm, fun, max_iter and tolerance |
| Logistic Regression | tolerance and max iterations |
- Merged PR's: 60, 61
Benchmarking Against the Milk Machine Learning Toolkit for Python: Introduced a new library called Milk to benchmark against. Implemented Adaboost, Decision Tree, KMeans, Logistic Regression and Random Forest in the same. Wrote the unit test files for them and added the config.yaml block for the same.
- Merged PR's: 64
Unit Test file for Adaboost: There was no unit test file for Adaboost present, so I added the same for the
- Merged PR's: 65
Updating the Shogun implementations: Shogun had been updated from 5.0.0 to 6.0.0, so most of the implementations needed updating. Also, earlier only the runtime metric was being collected in the implementations, so the code was changed to collect other metrics like accuracy, precision, recall and MSE.
- Merged PR's: 79
Avoid building the model twice: In the shogun implementations, the model was being built twice: once while calculating the runtime, and again while calculating the other metrics. This was avoided by returning the predictions made during the runtime measurement to the function calculating the other metrics, and all the shogun implementations were updated to do the same. This ensured that the implementations took less time to run.
- Merged PR's: 80, 81, 82, 83, 85, 86, 88, 91
Updating MATLAB implementations: There were around 3-4 MATLAB implementations present earlier, and these captured only runtime. Many implementations like Decision Tree, K-Nearest Classifier, Support Vector Classifier, Decision Tree Classifier, Lasso, LDA, QDA, Random Forest and Support Vector Regression were added, along with Python scripts to call them and unit test files to test them.
- Merged PR's: 89, 94
Updating WEKA implementations: The WEKA folder hosted around 3 implementations, and after Weka got updated those scripts had become outdated. So the presently benchmarked methods had to be re-implemented, and many other methods were also added. After updating, the weka folder holds Decision Stump, Decision Tree, Logistic Regression, Naive Bayes, Perceptron, Random Forest, ALLKNN, KMEANS, Linear Regression and PCA implementations. The Python scripts to call them and store the results, and the unit test files, were also implemented.
- Merged PR’s: 95
Benchmarking against Dlib-ml: Introduced a new C++ machine learning library called dlib-ml to the benchmarking system. Added implementations like SVM, KMEANS, Approximate Nearest Neighbors and ALLKNN. Wrote the install script, the Python scripts to call them, and the unit test files for the same.
- Merged PR's: 96, 97, 98, 99
Make specifying K necessary: Some of the K-Nearest Neighbors implementations took the default value of k as 5 while others did not. So, to ensure uniformity, I made the option of specifying k mandatory in the implementations.
- Merged PR's: 101
Benchmarking Against R: This is something that I was personally inclined to do. There is a worldwide debate on R, and I thought that this is the best platform to settle it to some extent and see which one performs faster. Using mlr (the machine learning framework for R), I implemented methods like NBC, Adaboost, QDA, LDA, Decision Tree, K-Nearest Classifier, Random Forest, Support Vector Classifier, Lasso, Linear Regression and Support Vector Regression. I also wrote the Python scripts and unit test files for the same.
- Merged PR’s: 102, 103, 104, 105
So basically throughout the summer, I:
- Committed around 5000 lines of code.
- Got 27 of my Pull Requests merged.
Technical Skills Developed
R: I had little to no experience when it came to working in R. During the course of the project I learnt how to build R from scratch and implement all the major machine learning algorithms in it.
MATLAB: I had never used MATLAB before, and now I am well versed in calling MATLAB code from Python scripts, using the Statistics and Machine Learning Toolbox, and then saving the results and sending them back to the Python script that called the code.
Weka: I had only used the Weka GUI tool before and had never used it from JAVA code. My mentors taught me how to do that.
Dlib-ml: I had never implemented machine learning algorithms in C++ before. This project gave me an opportunity to do that.
While I learnt useful tools and languages I also gained some general advice.
- Don’t be under the assumption that anything is easy: While writing my proposal I assumed that working on new libraries and improving the old ones would be a week's task each, but when I started working on them I had to spend far more time than originally planned, which resulted in the last part of the project (the webpage) getting delayed.
- It is important to be flexible and willing to change your priorities as many obstacles might come up.
- Always ask for help, as “Help will always be given at mlpack to those who need it.” (Could not resist quoting Albus Dumbledore :-p)
Plans after GSoC
I wish to continue contributing to the benchmarking system, and the initial plan is to add the Shark library to the benchmarks, and thereafter to write a manuscript on the results along with my mentors Ryan Curtin and Marcus Edel. Looking forward to a long association with them.
In closing I would like to thank Ryan and Marcus for being such amazing mentors, for teaching me so many things and for putting up with me whenever I used to struggle. This is the best internship experience I’ve ever had and I hope I can meet them in person soon.
This post is the summary of my GSoC work.
Relevant pull requests
- Augmented RNN models - benchmarking framework - merged. This PR contains documented and tested benchmarking classes for simple "algorithmic" tasks -
- Added LSTM baseline model with copy task benchmark - paused. This PR contains the working implementation of the LSTM model on the three tasks from the previous PR.
- Adding optimization features not related to my GSoC project - merged. This PR contains the gradient clipping implementation and the implementation of the CrossEntropyError function. The name turned out to be a misnomer - the former helped to work around the gradient explosion problem, and the latter accelerated training (as mentioned earlier, this error function penalizes mistakes more sharply than
- Implementing Hierarchical Memory Unit - work in progress. Right now this PR contains:
- A documented implementation of the TreeMemory - a tree-like data structure used for memory access in the HAM unit + unit tests for it.
- A documented implementation of the HAMUnit forward pass + unit tests for it.
- A documented implementation of the
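The gradient clipping mentioned in the optimization PR above can be sketched in a few lines; this illustrates the norm-clipping idea, not mlpack's exact implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Clip a gradient by its L2 norm: if the norm exceeds maxNorm, rescale the
// whole vector so its direction is kept but its length is capped.  This is
// the standard workaround for exploding gradients in recurrent networks.
void ClipByNorm(std::vector<double>& gradient, double maxNorm)
{
  double norm = 0.0;
  for (double g : gradient) norm += g * g;
  norm = std::sqrt(norm);
  if (norm > maxNorm)
  {
    const double scale = maxNorm / norm;
    for (double& g : gradient) g *= scale;  // direction kept, length capped
  }
}
```

Gradients already inside the ball are left untouched, so well-behaved updates are unaffected.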
First of all, I should admit that my main surprise was that I had vastly underestimated the benchmarking framework part. During implementation, a lot of issues appeared with the tasks themselves and with making them work properly on LSTM, even with small input sizes. This has been resolved, but it took far more time than initially planned.
Another source of interesting moments was the optimization PR. Both gradient clipping and the cross-entropy error function were comparatively easy to implement. However, a lot of effort was invested into testing them and making them compliant with the rest of the mlpack codebase. This is undoubtedly the main insight into industry programming that I gained during this summer. To clarify, "industry" programming here is opposed to competitive programming, which was my main source of programming experience before GSoC.
I would like to thank my mentors, Marcus Edel and Ryan Curtin, for sharing their technical experience and giving well-timed and clear recommendations during the work. Also, I would like to thank Google for this brilliant program, which gave me a chance to have a cool summer as well as gain new experience in the open-source real-world project. (oh, and to get used to
git as well ^_^)
As we are fast approaching the end of the final evaluation period for GSoC this year, I think now is the right time to give a cumulative account of all the work done over the past 12 weeks of this project.
Initially, the accepted proposal was about the implementation of stochastic optimization methods, but after an agreement, we decided to include some aspects of the "Profiling for parallelisation" proposal as part of this project.
The primary goal of the "Profiling for parallelisation" project was to find parts of the library which could benefit from a parallel implementation and modify them to take advantage of multiple cores using OpenMP. On the other hand, "Parallel stochastic optimization methods" was concerned with the implementation of parallel approaches to optimization algorithms.
Here is the list of Pull Requests with work related to this project. These PRs have all the code that was intended to be committed to the library:
- #1015: Add tests to the CTest framework
- #1024: Parallel implementation of NaiveKMeans
- #1037: Implementation of parallel SGD
- #1075: Implementation of Stochastic Coordinate Descent
In addition to the above library code, I wrote a number of benchmarks and examples to try out different ideas and gather more information about the performance of existing implementations in and outside of mlpack. Some important ones are listed here.
A weekly update of the progress on the project was posted as blog posts on this blog, which can be cumulatively found here.
- Parallel testing framework: This part of the project involved adding the tests under the Boost unit testing framework to the CTest test runner included with CMake, which would allow for spawning of multiple jobs and make testing faster. For automating this process, a CMake script for parsing the required test suite names was written. Overall, there is about 2x reduction in testing times with four jobs.
- Profiling for parallelisation: I wrote benchmarks to find computational bottlenecks in mlpack code. KMeans turned out to be easily parallelizable with OpenMP, giving around a 3.5x speedup with the parallel implementation. Ideas for profiling and parallelisation were drawn from this paper.
- Implementation of parallel SGD: I worked on a parallel variant of Stochastic Gradient Descent based on the HOGWILD! scheme of unsynchronised updates between threads, described in this paper. The algorithm is implemented with OpenMP, which was able to generate the right primitives for the lock-free approach the paper requires. Convergence is much faster due to the computation of sparse gradients, and the gradient updates take much less time compared to other optimizers.
- Implementation of parallel SCD: This involved implementing another optimizer, which works with partial gradients of the loss function to update disjoint parts of the decision variable, making it trivially parallelizable. The PR for the sequential optimizer is in the final phase before the merge. The changes needed for parallelization should be trivial, as most of the required work has been done in the sequential version. As more changes needed to be added to mlpack, some documentation highlighting the various FunctionType requirements was also added as part of the work on this optimizer.
Working on this project made me appreciate the kind of power a robust API like OpenMP brings. With minimal changes to existing code, fast, cross-platform multi-core implementations can be written that scale nearly linearly with the number of threads. In particular, the implementation of HOGWILD! was much less complex with OpenMP taking over the work-sharing and synchronisation, while providing equivalent (or better) performance than a hand-rolled implementation with a thread pool and memory barriers.
Some ideas which would build upon the work done during this project over the summer:
- More algorithms in mlpack need to be profiled for potential parallelization. One particularly interesting and major change would be the introduction of parallel tree traversers (although such an implementation needs to be analyzed first to see whether it would be worth the time and effort).
- The implementation of SCD needs some trivial changes to work on multiple cores. SCD could also use a line-search algorithm instead of gradients to optimize the function under consideration, possibly leading to a much faster optimizer.
- An in-depth empirical analysis of the implemented parallel optimizers, with different problems and datasets.
This summer has been a great learning experience for me, from not only the work done during my project but also seeing the constant flow of interesting work from the other students' projects. I would like to thank Yannis, Ryan and Marcus for their valuable time and guidance over the course of the entire summer, which made this project a fundamental part of my experience as a developer.
This blog post is the summary of my GSoC project: the implementation of popular deep reinforcement learning methods. During this project, I implemented deep (double) Q-learning, asynchronous one/n-step Q-learning, asynchronous one-step SARSA and the asynchronous advantage actor-critic (in progress), as well as two classical control problems, i.e. Mountain Car and Cart Pole, to test my implementations.
My work is mainly located in:
- q_learning.hpp: the main entrance for (double) Q-learning
- async_learning.hpp: the main entrance for async methods
- training_config.hpp: wrapper for hyper-parameters
- environment: implementation of two classical control problems, i.e. Mountain Car and Cart Pole
- policy: implementation of several behavior policies
- replay: implementation of experience replay
- network: wrapper for non-standard networks (e.g. an actor-critic network without shared layers)
- worker: implementation of async RL methods
Refactoring existing neural network components was another important part of my work:
- Detachment of the optimizer: This influences all the optimizers and most test cases.
- Pass-by-value convention: Previously, many mlpack components used pass-by-reference, which is less flexible. I proposed the idea of pass-by-value in combination with std::move. This is a very large change, so for now only newly added components adopt this convention; Ryan is working on the old codebase.
- Exposure of Backward: Before this we only had Train, which could lead to duplicate computation in some cases. Exposing Backward lets us address this issue.
- Support for shared layers: This is still in progress, but I think it is very important for A3C to work with Atari. We proposed the Alias layer to address this issue. This is also a huge change, which will influence all the visitors.
- Miscellaneous updates of old APIs.
Detailed usage can be found in the two test cases, e.g. q_learning_test.cpp. You can run the test cases with bin/mlpack_test -t QLearningTest and bin/mlpack_test -t AsyncLearningTest.
In total, I contributed the following PRs:
- Implementation of Alias layer
- Async n-step q-learning and one step sarsa
- Implement a framework of DQN and asynchronous learning methods
- Implementation of async one step q-learning
- Add aggregated policy for async rl methods
- Support batched forward and backward for FFN
- Update Optimizer API
- Add new API for some optimizers
- Basic DQN
- Add epsilon greedy policy for DQN
- Add random replay for DQN
- Fix a bug in gaussian init
- Refactor FFN
- Implement two classical control problems for testing reinforcement learning method
- Fix bug of variadic template parameters of Optimizer
The most challenging parts were:
- Making the number of threads independent of the number of workers in async RL methods: This is really a fantastic idea. To the best of my knowledge, I haven't seen any public implementation of it; all the implementations available on the Internet simply assume the two to be equal. To achieve this, we need to build a worker pool and use an episode as the job unit.
- Alias layer: This blocked me the most and is still blocking me. We need a deep understanding of
Apparently, RL support in mlpack is far from complete. Supporting classical control problems is an important milestone, and we are almost there. However, we are still far from the next milestone: Atari games. At minimum we need GPU support, basic image-processing infrastructure, and an effective way to communicate with popular simulators (e.g. OpenAI Gym, ALE).
I thank Marcus for his mentorship during the project and detailed code review. I also want to thank Ryan for his thoughtful suggestions. I also appreciate the generous funding from Google.
This week was mostly spent writing tests for the GAN PR and fixing changes in the GAN PR. We wanted to implement the O'Reilly example. I faced a lot of problems implementing it, as the valid-convolution padding used in the example leads to very weird padding sizes, which didn't make sense at the time. To take care of that problem I had to implement the resize layer, which is basically an interpolation layer that uses bilinear interpolation to scale an image up or down.
This week we also refactored the code using Lozhnikov's idea: basically, we want the predictors matrix to remain constant for both the generator and the discriminator network.
I also tried to get the 1D Gaussian test and the digits dataset to work this week. Parameter tuning for the digits dataset is pretty hard, and we end up generating random noise most of the time. We also added support for convolutional networks with GANs this week.
This week started by working on my previous PRs.
After some small final touches the CNE PR is ready. My mentor provided a very detailed code review. The time he put into finding small spacing and other style issues that I missed really ensures that mlpack gets good-quality code merged. I am now even more familiar with the codebase and make far fewer mistakes.
The next task was again working with the CMAES repo:
- One issue was a -nan error that I used to see rarely in test results.
- Also, the optimization would sometimes go slightly wrong and produce a very small deviation from the exact answer. I was converging the Rosenbrock function in 45 dimensions, where a very small deviation leads to a different output.
Both errors got solved, and the algorithm now produces accurate results with no error appearing at any time. To make sure, I even ran hundreds of iterations on my local machine.
Next was the NEAT PR -
The paper I went with is -
Code from SethBling, with his famous MarI/O video on genetic algorithms, was also helpful.
I finished reading the codebase by last year's contributor Bang Liu and am setting up my local machine for running and testing the game today.
This week will be spent debugging the whole process and looking for potential errors in the code. Adding a tutorial for NEAT will be my small contribution too :)
My GSoC project can be summarized in the following PRs.
- #1016 Add accuracy and mean squared error
- #1020 Add a method form detection tool
- #1031 Add a meta information extractor
- #1044 Add simple cross-validation
- #1074 Add precision, recall and F1
- #1095 Add k-fold cross-validation
- #1101 Add a hyper-parameter tuning module
- #1108 Add a tutorial for the cross-validation and hyper-parameter tuning modules
During the summer I have implemented two cross-validation strategies. The first one is simple: it splits the data into two sets - a training set and a validation set - then trains on the training set and evaluates performance on the validation set. The second cross-validation strategy is k-fold.
The cross-validation module has the following features.
- Data preparation happens only once when a cross-validation object is constructed.
- In many cases you don't need to be explicit about the types of the data you are going to pass: the developed meta-programming tools deduce the types for storing data by inspecting the machine learning algorithms you specify. This also allows you to pass objects that can be converted to the target types (e.g. objects of types used by Armadillo to store intermediate results).
- The interface is designed so that you first pass constructor parameters that are common among machine learning algorithms implemented in mlpack, including data, the number of classes, and information about dimensions (datasetInfo). During this step your compiler checks whether the specified machine learning algorithm accepts these parameters; if some check fails, a human-readable message is printed.
Hyper-parameter tuning module
Another part of my GSoC project is a hyper-parameter tuning module. It lets you optimize hyper-parameters using one of the cross-validation strategies coupled with one of the metrics as the objective.
The implemented module has the following features.
- It has the same constructor interface as the cross-validation module, and all the features mentioned above for constructing cross-validation objects apply here too.
- The optimization strategy is specified by the user. So far, support for two strategies has been implemented. With the first strategy, GridSearch, users specify a set of values for each hyper-parameter they want to optimize. The second strategy, GradientDescent, uses numerically calculated gradients of functions based on cross-validation to optimize real-valued hyper-parameters.
- If some hyper-parameter should not be optimized, it can be marked with the Fixed function.
I have implemented different metrics that can be used for cross-validation and hyper-parameter tuning. These include accuracy, mean squared error, precision, recall, and F1.
I thank Ryan Curtin for his mentorship during the summer and for his passion for providing deep and thoughtful responses to my work. I also want to thank Marcus Edel for his reviews and for being responsive to my general questions about mlpack.
This week I mainly worked on the issue of sharing layers between two networks. We finally figured out that an Alias layer is the only possible solution, because the memory should be contiguous – shared_ptr can't do that. However, the implementation of the Alias layer is not easy. I managed to eliminate the strange void* and make the include clauses elegant by rearranging all the visitors. However, I still got stuck on a strange bad-memory-access error. I spent two days with no progress, so I decided to come back to the implementation of A3C without shared layers.