mlpack  blog
mlpack Documentation

Table of Contents

A list of all recent posts this page contains.

Neural Evolution Algorithms for NES games - Summary

Neural Evolution Algorithms for NES games - Summary

My Google Summer of Code project was to create neural evolution algorithms for NES games. under which:

  • CNE: Conventional Neural Evolution.
  • CMAES: Covariance Matrix Adaptation Evolution Strategy.
  • NEAT: Neural Evolution through Augmented Topologies.

and HyperNEAT which is exactly same as NEAT but the genomes go through a substrate layer to get the links has to be implemented. All the evolutionary algorithm as planned before was supposed to be neural network optimization algorithms. But we later switched to using it as a generalized optimizer which can be used with other functions and machine learning methods, like logistic regression class implemented in mlpack as well. Therefore CMAES and CNE were designed as an optimizer.

CMAES Implementation

  • My working PR (ready): This PR is all set to get merged 938
  • References: The research paper that I referred for this algorithm is: The CMA Evolution Strategy: A Tutorial by Nikolas Hansen.
  • Medium Blog Post: I have a medium blog describing how this algorithm works followed by describing mlpack open source library and a full code tutorial showing the optimization in action: CMAES algorithm in C++ using mlpack
  • Test Cases: CMA-ES algorithm works by using clever probability distribution (using a modifiable Gaussian distribution) and moving the mean to minima and updating the covariance matrix in the process. Unlike other derivative methods for optimization, this method is great for discontinuous and ill conditioned functions. Therefore one of the performance tests was done using Rosenbrock function and that too iteratively for up to 50 dimensions. A change in 1e-5 in values leads to a very different result. A lot of time was spent to make the algorithm better and removing some major bugs. I can confidently present the optimizer now which works perfectly giving accurate results each time.
  • Tutorial: A full descriptive tutorial on using CMAES has been added in the documentation covering step by step:
    1. The CMAES optimizer class in short.
    2. Making the function Class.
    3. Setting up the constructor.
    4. Complete code example.
    5. Using as optimizer with other models.

This PR was very challenging for me. The time I decided was way too much less than actually needed. Also during this time, I was new to armadillo library for which my mentor came to the rescue. I made a lot of modifications to the PR and also spend a lot more in debugging. This PR was indeed a bit scary learning experience. But finally, this PR got completed after optimization, improvements and bug fixes.

CNE Implementation

  • PR (merged): 1088
  • Divided into three sections: Optimizer, Test Cases, Tutorial
  • Optimizer: Conventional Neural Evolution implemented in mlpack is an optimizer that works like biological evolution in nature which selects best candidates based on their fitness scores and creates new generation by mutation and crossover of the population. This then keeps on repeating iteratively till several generations after which healthy candidates are found which perform well in getting the desired output.
  • References: The research paper and reference code that I used to learn CNE were: Training Feedforward Neural Networks Using Genetic Algorithms, Evolving Artificial Neural Networks, Multineat, PR 753
  • Test Cases: Test cases were written for three very different scenarios.
    1. Optimizing a given function: XOR function.
    2. Optimizing a vanilla network: It worked for both iris dataset and the thyroid dataset. But for making the test case faster only iris dataset is used.
    3. Testing with mlpack existing model: Logistic regression model was given the optimizer to learn a dataset.
  • Tutorial: A full descriptive tutorial on using CNE has been added in the documentation covering step by step:
    1. The CNE optimizer class in short.
    2. The constructor parameters.
    3. Creating a model using mlpack's ANN Class.
    4. Complete code example.
    5. Logistic regression using CNE as an optimizer.

This is my first merged PR for any open source project which now rests peacefully in mlpack servers.CNEwas not as painful asCMAES` and as always my mentor provided his valuable suggestions and code reviews. I loved working on it completely.

NEAT Implementation

This class is already made by previous GSoC 2016 participant Bang Lui and so rather than reinventing the wheel my work was to make it live again and fix an important bug.

Reading his code was really a nice experience and I think it is one of those best ones in which u can see classes for neurons, links and using them a new class genomes (which is a neural network having links and neurons) and then species as a cluster of genomes and finally population as a cluster of species. To see the true essence of classes and object oriented programming so beautifully implemented.

Some of my time went on to make this one-year-old PR live and working again. I made more than 270 indentation fixes and other code optimization and style fixes that do not get caught by the cppCheck bot. A tutorial for documentation is also under process. I was able to find some errors which created some critical changes in evolution. Also, changes are under testing and debugging for now.

Future work

My future work is to complete NEAT till it gets merged successfully. Looking at the work done by Bang Lui the code is worth fixing and pushing to the codebase. For HyperNEAT which is a small extension to NEAT, a new PR will follow which is not a major task.

I loved the idea of adding Gaussian process implementation to mlpack in one of the issues. I would love to contribute for this later.


I would like to thank mlpack for giving me a chance to contribute to this beautiful library. I have a very different perspective for open source now. Big thanks to my mentor Marcus Edel for tolerating me. There were multiple times I asked for his help and he always responded in no time and with a lot of patience. Also, the detailed code review and improvements that he suggested were unmatched and clearly ensure that good quality code gets in the main codebase. Thanks to Ryan for the code review and final edits. Lastly Google for encouraging open source contribution. Looking forward to more contributions from my side and becoming a better engineer.

FrankWolfe Algorithm and Solving Sparse Solutions of Atom Norm Constraint Problems

FrankWolfe Algorithm and Solving Sparse Solutions of Atom Norm Constraint Problems

This post summarize my work for GSoC 2017.

My Work

The basic work of my GSoC 2017 is to implement the Frank Wolfe Algorithm (Conditional Gradient Algorithm) in various situations. From the paper of Jaggi RevisitingFrank-Wolfe, we know that many sparse optimization problem with atom domain constraint could be solved by this Frank-Wolfe type algorithm. So I implemented:

  • Classic Frank Wolfe Algorithm. The algorithm is implemented in class template, so that it could be used and extended in general forms. The algorithm needs 2 steps: solving the linear constraint problem and updating the solution. Users could extend the usage of the algorithm by writing their own classes defining these 2 steps. For constraint problem solver, I implemented the common case that the constraint is a unit lp ball. For updating step, I implemented the classic update rule and a line search method. I tested the convergence speed, and confirmed that the line search step, although adds some extra work, will give faster convergence speed.
  • Orthogonal Matching Pursuit. Jaggi's paper revealed that if the constraint domain of the FW algorithm is just an atom norm bound, then the linear constraint problem would be very easy to solve. For example, in the vector case, if the constraint is a bound of the lp norm, the solution could be solved explicitly. A straightforward application is the Orthogonal Matching Pursuit (OMP) algorithm, where the update step is just to reoptimize the coefficients in the span of the current solution space. I also implemented the regularization functionality in the constraint problem solver, to help users to get faster convergence.
  • Although we can use this Frank-Wolfe type algorithm to solve atom domain problems in a greedy fashion, it is known that using it directly will give slow convergence and large support size. I implemented the tricks in the paper: RaoShahWright(2015)ForwardBackwardGreedy, which introduced:
    1. Enhancement step. It reoptimizes the atom coefficients within the atom norm constraint domain. It is realized using the projected gradient method.
    2. Truncation step. It tries to delete some inefficient atoms, which cancels the drawbacks of this FW type greedy algorithm. This will help us to get faster convergence, as well as a solution with better sparsity.
  • I also implemented the constraint solver for other atom domains, such as the structured group norm constraint domain, and matrix Schatten norm constraint domain. An important application in the matrices case is the matrix completion problem. As mlpack already has that solver implemented using the sdp optimizer, I changed the original implementation into a class template, which can accept both sdp optimizer and FrankWolfe optimizer to solve the matrix completion problem. I tried to compare the different algorithms, it seems that although the Enhancement step and the Truncation step above were added, the convergence of the Franke Wolfe Type algorithm is still very slow.

I currently made 2 PRs: 1041, 1087

The matrix completion code using Frank-Wolfe type algorithm is already done. I will issue a new PR after my second PR got merged.

Future Work

A lot of the codes I implemented are tricks to improve the convergence speed of the algorithm. However, most of them are only manually checked with small test problems that these methods are beneficial. We should test on large scale problems and see how everything really works. Also, we can compare the FW algorithm with other optimizers on similar problems, and see the pros and cons of using FW.


I am really grateful to the help from Stephen, Ryan, Marcus, and the whole mlpack community. It is my first time to join the open source world and it is a wonderful experience. It is also so cool to know so many machine learning people and discuss ideas with them. I will keep in touch with the mlpack community and I hope I can make more contributions to the project in the future.

Neural Turing Machines - Summary

Neural Turing Machines - Summary

Another amazing year of GSoC. This year was more exciting that my previous experiences. My major contribution in this year, was the Neural Turing Machine, or NTM. NTM is a neural network with additional external memory. As the entire architecture is end-to-end differentiable, NTM is able to learn tasks which require memory storage. Such as, reproducing the given sequence, sorting the sequence and so on. The code is complete and can be found in PR #1072.

I have also implemented GRU as part of my project this year. While implementing GRU, the current recurrent neural network framework is extended to support variable length sequences. This involved making changes to current LSTM implementation as well. Given such changes, both LSTM and GRU are tested with variable length reber grammar sequences. As an interesting side project, I added something called recursive reber grammar for testing. Recursive reber grammars are basically reber grammar but they are recursiveky embedded in themselves. Embedded reber grammar is a special case of recursive reber grammar where the recursive depth is set to 1. Its good to see that both LSTM and GRU passes recursive reber grammar tests with various recursive depths. Although I have noiced that the cell state size required to pass the tests differes with recursive depth of the grammar. It will be a nice expeiment to see what is actually going on in cell state. Is LSTM/GRU, storing the depth as a stack? That would be really cool. All this code is already merged but can be accessed by PR #1018.

As of now, I am yet to complete the implementation of batch nomalization. Before batch normalization can be added to the framework, the framework should be refactored to support batch mode. A lot of people are helping to achieve this task. Even though the program is over, I will keep working on this issue and complete batch normalization as pat of my project.

I would like to congratulate Marcus and Ryan, my mentors, for surviving this summer project wih me :P I know how hard this must have been :) Thank you for all your support and guidance. Its been more than 4 years since I joined mlpack community and its been a wonderful journey. I hope the coming years will bring more fun...

Deep Learning Modules - Summary

Deep Learning Modules - Summary

Goals for the summer

I These were the following algorithms I proposed for implementation: RBM, ssRBM(spike slab rbm), GAN(Generative Adversarial Networks), stackedGAN

Executive Summary

I was able to implement RBM, ssRBM and GAN this summer. Though unfortunately none of my code has been merged till now. You can find the PR's that opened below: RBM*, ssRBM, GAN, ResizeLayer Implementation, CrossEntropyWithLogits

RBM PR was later merged with the ssRBM PR.

The ssRBM and rbm are ready to be merged. The GAN PR is also mostly complete. I think only superficial style changes are required.

In addition to implementing these algorithms. I am happy to say that RBM, ssRBM and GAN are very well tested. You can find some of the code here. I would also like to point out that for RBM we are comparable to the sklearn library in terms of speed(1.5x faster) and accuracy(similar). We couldn't benchmark the ssRBM, as none of the libraries actually implement the ssRBM PR. For GAN's we tested out implementation with examples from keras and tensorflow. I am happy to say that in terms of visual reconstruction for the Gaussian distribution generation we were able to get comparable results with tensorflow. We also implemented the test for generation of Mnist images using FFN as the discriminator and generator network. The results are not as good as that of the keras example but we can see the training of Gan and that it converges to some stable results. You can look at the results below.


The major difficulty with implementing RBM's was deciding on architecture that could easily be extendable to other extensions. Hence, we decided to implement wrapper layer and then implement the actual working of a type of rbm's as policies. This gives us the advantage of modularity.

Here are the results of our implementation on the digits dataset(smaller version Mnist).

The samples are generated from 1000 steps gibbs sampling

This is image is generated from example

Our accuracy on the digits dataset was around 86% be this can go higher with some more parameter tuning.

Refrence for Implementation:, Training RBM


Spike and Slab RBM were hardest to implement mainly because there are no existing implementations of the paper. We had to figure out some formulas ourselves like the free energy function. Some of the details in the paper are a little unclear like the radius parameter for rejecting samples etc. We also had to decide how we wanted to represent parameters such as spike variables, slab variables and lambda bias variables the paper states them to diagonal matrices but we found that most of the papers just have constant value for the diagonal entries so, we decided to represent them as scalar, giving us major speed and memory improvements.

We tested ssRBM on digits data set again we were able to get an accuracy of around ~82% with the digits dataset. One of the reasons accuracy of ssRBM is less than binary RBM is because ssRBM is suited for a dataset that has high correlation within the parts of the images something like the cifar dataset so the correlation can be captured by the slab variables. We tried ssRBM on the cifar data set code but due to the large volume of data set and scarcity of the computation resources, we got around 70% accuracy but the results should be taken with a bit of salt as I saw that increasing of samples size lets to decrease in the accuracy.

We also tried bringing down the running time of the ssRBM considerably and were able to do so. We are still looking at the issue to gain possibly more improvements. We tried the cifar data set for ssRBM testing there were a couple of problems, the paper didn't give a detailed list of parameters that it used either for training the ssRBM nor for the creation of features vectors from the ssRBM for testing. Also, the cifar dataset set is quite huge and the problem was compounded by the fact the paper suggested that we create patches 8*8 from every image, leading to an increased size in the dataset set.

Refrences: Main Paper, Additional Paper1, Additional Paper2


We implement the Generative Adversarial Networks in a pretty unique way than most of the other implementations in TensorFlow and Pytorch in the following way.

  1. Using single optimizer for training.
  2. No need for separate noise data.
  3. No need to have separate noise and real labels.
  4. Providing parameters such as Generator Update step defining when to update the generator network.

The main idea of implementing GAN's this way came from Mikhail. The above formulation has the advantage of being easy to use and using fewer resources in terms of space requirements and making a lot more logical sense.

There are at present no techniques for testing GAN's we tested our implementation on the following examples

  1. 1D Gaussian Test: This test aims to generate data from Gaussian of mean = 4 and var = 0.5 give that the noise function is the uniform distribution.
  1. Generation of Mnist Digits: We tested our GAN for generation of images using a simple FNN as both the generator and discriminator network. We used only the 7's digits from Mnist data set for achieving the following results.

Here is te final image the network is trained for just 60 epoch's. We can get better results if we find better hyperparameters.

  1. DcGan(Strip Down Version) We also tested our GAN's implementation for generation of digits using CNN's. This basically would give us a simpler version of the famous DCGAN implementation. For completing this test we had to add to the Resize Layer(Bilinear Interpolation of images) and the CrossEntropy with logits layer. Right now we are testing our implementation for this test you can find the code here.

We also added some of the training hacks from Soumith Chintala's workshop on GAN's

  1. Batch Training for discriminator and generator
  2. Pre-Training for discriminator
  3. Optimizing $-log(D(G(z))$ instead of $1 - log(D(G(z))$
  4. Training the discriminator more using the Gradient Update step.

Refrences: Gan Paper, Gan Hacks


Most of my last 15-20 days were spent in hyper parameter tuning for the various test to work for GAN's and ssRBM. This a highly boring and tedious job. I am really looking forward to the HyperParameter Tuning PR.

One of the major problems faced also was that for testing Deep Learning modules correctly we require huge amounts of data and resources. I found my computing rig in some cases to be very slow for the purpose. Hence, I am looking forward to Bandicoot and testing all this code on a GPU.

I also found the problem of training GAN's particularly the DCGAN example very time consuming there is no efficient way for parameter hyper tuning.

Future Work

  1. I think ssRBM needs to be tested on data set such as the Cifar but first, we are required to confirm that the existing implementation of ssRBM do achieve good results on some parameters so we can replicate the same results. (Preferably this code should be run on GPU since it would lead to faster results and hence faster hyper parameter tuning).
  2. Implementation of Wasserstein GAN. Lately, a lot of noise in the deep learning community of late has been around how great WGAN are compared to GAN. Hence I think it is reasonable to implement it next.
  3. I think we need support for DeConv Layers, BatchNormalisation and other image manipulation functions.
  4. I think something similar to hyperopt would be very useful.
  5. Implementation of stacking module for RBM and GAN's.


The summer has been quite eventful. I am really happy to say that I have learned a lot about having to persevere through an implementation to make it work. The need for constant improvements of the code base is one of the important lessons I have learned. I also added some serious debugging skills to my skillset. Finally, I want to give a huge thank you to Mikhail Lozhnikov without whom the project would not have been possible. I would also like to thank Marcus for his valuable inputs from time to time and also guiding me with the code base. I really think this experience has made a more patient and much better programmer than I was before. Thanks again to the whole mlpack community :)

Benchmarking mlpack libraries - Summary

Benchmarking mlpack libraries - Summary

This summer I was quite fortunate to work on the mlpack project with my mentors Ryan Curtin and Marcus Edel as a part of the Google Summer of Code program. I worked on improving their benchmarking system and I would like to now describe what the work was and what I learnt using plethora of success and failures and what work has to be done in future post GSoC.


There were many mlpack methods that had been added since the previous benchmarking system was built and in many of the current implementations only the runtime metric was being captured. Also many of the implementations were according to the old versions of the benchmarked libraries which used many deprecated methods which needed updating. Many methods were not implemented in MATLAB, Weka and many libraries like annoy, mrpt, nearpy, milk, dlib-ml and R were not yet benchmarked.

Work done before GSoC

Before officially becoming a Google Summer of Code intern I made the following contributions to the benchmarking system of mlpack library starting March 2017:

  • Added options to the randomforest and Decision tree scikit benchmark code.
  • Added the Code for DecisionTree in mlpy.
  • Implemented Approximate Nearest Neighbors using the Annoy, Mrpt, Sklearn and Nearpy.
  • Added test file for Random Forest, Logistic Regression and Adaboost.
  • Made certain corrections to the svm and LARS implementation of sklearn and to the config file.

Work done during GSoC

  • Updating Scikit Implementations: The scikit implementations were updated to facilitate specifying more options and storing metrics like Accuracy, Precision, Recall, MSE along with runtime. The config.yaml file did not have any block for the Logistic Regression and ICA methods, so these were also added. The following table enumerates the parameters added to the various methods:
Method Parameters Added
LSHForest Min_hash_match, n_candidates, radius and radius_cutoff_ratio
ALLKNN Radius, tree_type, metric and n_jobs
Elastic Net Max_iter, tolerance and selection
GMM Tolerance and Max_iter
ICA N_components, algorithm, fun, max_iter and tolerance.
KMEANS algorithm
Logistic Regression Tolerance and Max Iterations
  • Merged PR's: 60, 61
  • Benchmarking Against Milk Machine Learning Toolkit for Python: Introduced a new library called Milk to benchmark against. Implemented Adaboost, Decision Tree, Kmeans, Logistic Regression and Random Forest in the same. Wrote the Unit test files for them and added the config.yaml block for the same.
  • Merged PR's: 64
  • Unit Test file for Adaboost: There was no unit test file for Adaboost present so added the same for the scikit and milk implementations.
  • Merged PR's: 65
  • Updating the Shogun implementations: Shogun had been updated from 5.0.0 to 6.0.0 so most of the implementations needed updation. Also earlier only the runtime metric was being collected in the implementations, so the codes were changed to collect other metrics like Accuracy, Precision, Recall and MSE.
  • Merged PR's: 79
  • Avoid building the model twice: In the scikit and shogun implementations, the model was being built twice, once while calculating the runtime and again while calculating the other metrics. This was avoided by returning the predictions made during runtime to the function calculating other metrics and all the scikit and shogun implementations were updated to do the same. This ensured that the implementations took lesser time to run.
  • Merged PR's: 80, 81, 82, 83, 85, 86, 88, 91
  • Updating MATLAB implementations: There were around 3-4 MATLAB implementations present and earlier and these were mapping only runtime. Many implementations like Decision Tree, K-Nearest Classifier, Support Vector Classifier, Decision Tree Classifier, Lasso, LDA, QDA, Random Forest and Support Vector Regression were added along with python scripts to call them and unit test files to test them.
  • Merged PR's: 89, 94
  • Updating WEKA implementations: The current WEKA folder hosted around 3 implementations and after weka got updated those scripts had become outdated . So the presently benchmarked methods had to be re-implemented and many other methods were also added. After updating the weka folder holds Decision Stump, Decision Tree, Logistic Regression, Naive Bayes, Perceptron, Random Forest, ALLKNN, KMEANS, Linear Regression and PCA implementations. The python scripts to call and store the results and the Unit test files were also implemented.
  • Merged PR’s: 95
  • Benchmarking against Dlib-ml: Introduced a new C++ Machine Learning library called dlib-ml to the benchmarking system. Added implementations like SVM, KMEANS, Approximate Nearest Neighbors and ALLKNN. Wrote the install script, the python scripts to call them and the unit test files for the same.
  • Merged PR's: 96, 97, 98, 99
  • Make specifying K Necessary: Some of the K-Nearest Neighbors implementations took the default value of k as 5 while others did not. So to ensure uniformity made the option of specifying k mandatory in the implementations.
  • Merged PR's: 101
  • Benchmarking Against R: This is something that I was personally inclined to do. There is a worldwide debate on Python vs R and I thought that this is the best platform to settle it to some extent and see which one performs faster. Using mlr - The machine learning framework for R implemented methods like NBC, Adaboost, QDA, LDA, Decision Tree, K-Nearest Classifier, Random Forest, Support Vector Classifier, Lasso, Linear Regression and Support Vector Regression. Also wrote the Python Scripts and Unit Test file for the same.
  • Merged PR’s: 102, 103, 104, 105
  • The webpage: This is a work in progress where we are thinking about using JavaScript Visualization libraries to present the results on the webpage in a better way. More on this once the work completes. This work might not get completed during GSoC period itself but I will continue working on this after GSoC.

So basically throughout the summer, I:

  • Committed around 5000 lines of code.
  • Had 27 of my Pull Requests merged.

Technical Skills Developed

  • R: I had little to no experience when it came to working on R. During the course of the project I learnt how to build R from scratch and implement all the major Machine Learning Algorithms in it.
  • MATLAB: I had never used MATLAB before and now I am well versed in calling MATLAB codes from python scripts, Using Statistical and Machine Learning toolbox and then saving the results and sending them back to the python script that called the code.
  • Weka: I had only used the Weka GUI tool before and had never used it in a JAVA code. My mentors taught me how to do that.
  • Dlib-ml: I had never implemented Machine Learning Algorithms in C++ earlier. This project gave me an opportunity to do that.

Lessons Learnt

While I learnt useful tools and languages I also gained some general advice.

  • Don’t be under the assumption that anything is easy: While writing my proposal I assumed that working on new libraries and improving the old ones would be a week's task each but when I started working on them I had to spend far more time that planned originally which resulted in the last part of the project (the webpage) getting delayed.
  • It is important to be flexible and willing to change your priorities as many obstacles might come up.
  • Always ask for help as “Help will always be given at mlpack to those who need it.” ( Could not resist quoting Albus Dumbledore :-p).

Plans after GSoC

I wish to continue contributing to the benchmarking system and the initial plans are adding MachineLearning.jl and Shark libraries to the benchmarks. Thereafter writing a manuscript along with my Mentors Ryan Curtin and Marcus Edel on the results. Looking forward to a long time association with them.

In closing I would like to thank Ryan and Marcus for being such amazing mentors, for teaching me so many things and for putting up with me whenever I used to struggle. This is the best internship experience I’ve ever had and I hope I can meet them in person soon.

Augmented Recurrent Neural Networks - Summary

Augmented Recurrent Neural Networks - Summary

This post is the summary of my GSoC work.

Relevant pull requests

  • Augmented RNN models - benchmarking framework - merged. This PR contains documented and tested benchmarking classes for simple "algorithmic" tasks - CopyTask, AddTask, SortTask.
  • Added LSTM baseline model with copy task benchmark - paused. This PR contains the working implementation for LSTM model working on the three tasks from the previous PR.
  • Adding optimization features not related with my GSoC project - merged. This PR contains gradient clipping implementation and implementation of CrossEntropyError function. The name turned out to be a misnomer - the former helped to work around the gradient explosion problem, and the latter one accelerated training (as it was mentioned earlier, this error function is penalizing the mistakes more sharply than MeanSquareError)
  • Implementing Hierarchical Memory Unit - work in progress. Right now this PR contains:
    • A documented implementation of the TreeMemory - a tree-like data structure used for memory access in HAM unit + unit tests for it.
    • A documented implementation of the HAMUnit forward pass + unit tests for it.


First of all, I should admit that my main surprise was that I vastly underestimated the benchmarking framework part. During implementation, there appeared a lot of issues with the tasks themselves and making them work properly on LSTM even with small input sizes. This has been resolved, but it took way more time than it was planned initially.

Another source of interesting moments was the optimization PR. Both gradient clipping and cross-entropy error function were comparatively easy to implement. However, a lot of effort was invested into testing it and making it compliant with the rest of the mlpack codebase. This is undoubtedly the insight into industry programming I made during this summer. To clarify, the "industry" programming here is opposed to the competitive programming, which was my main source of programming experience before GSoC.


I would like to thank my mentors, Marcus Edel and Ryan Curtin, for sharing their technical experience and giving well-timed and clear recommendations during the work. Also, I would like to thank Google for this brilliant program, which gave me a chance to have a cool summer as well as gain new experience in the open-source real-world project. (oh, and to get used to git as well ^_^)

Profiling for parallelization and parallel stochastic optimization methods - Summary

Profiling for parallelization and parallel stochastic optimization methods - Summary

As we are fast approaching the final date ending final evaluation period for GSoC for this year, I think now is the right time to make a cumulative account of all the work done over the past 12 weeks during this project.

Initially, the accepted proposal was about the implementation of stochastic optimization methods, but after an agreement, we decided to include some aspects of the "Profiling for parallelisation" proposal as part of this project.

The primary goal of the "Profiling for parallelisation" project was to find parts of the library which could benefit from a parallel implementation and modify them to take advantage of multiple cores using OpenMP. On the other hand, "Parallel stochastic optimization methods" was concerned with the implementation of parallel approaches to optimization algorithms.

Executive summary

Here is the list of Pull Requests with work related to this project. These PRs have all the code that was intended to be committed to the library :

The list of commits merged into the codebase is here. Here is a better visualisation of the contributions made during GSoC.

In addition to the above library code, I wrote a number of benchmarks and examples to try out different ideas and gather more information about the performance of existing implementations in and outside of mlpack. Some important ones are listed here.

A weekly update of the progress on the project was posted as blog posts on this blog, which can be cumulatively found here.

  • Parallel testing framework: This part of the project involved adding the tests under the Boost unit testing framework to the CTest test runner included with CMake, which would allow for spawning of multiple jobs and make testing faster. For automating this process, a CMake script for parsing the required test suite names was written. Overall, there is about 2x reduction in testing times with four jobs.
  • Profiling for parallelisation: I wrote benchmarks for finding computational bottlenecks in mlpack code. KMeans was found to be easily parallelizable with OpenMP, giving around 3.5x speedup with the parallel implementation. Ideas for profiling and parallelisation were drawn from this paper.
  • Implementation of parallel SGD: I worked on the implementation of a parallel variant of Stochastic Gradient descent, based on the "HOGWILD!" scheme of unsynchronised updates between various threads, described by this paper. The algorithm has been implemented with OpenMP, which was able to generate the right primitives for a lock-free approach, as required by the paper. Convergence is much faster due to the computation of sparse gradients, and the gradient updates take much less time as compared to other optimizers.
  • Implementation of parallel SCD: This involved implementation of another optimizer which worked with partial gradients of the loss functions to update disjoint parts of the decision variable, being trivially parallelizable. The PR for the sequential optimizer is in the final phase for the merge. The changes involved for parallelization should be trivial, as most of the required work has been done in the sequential version. As more changes needed to be added to the FunctionType policy in mlpack, some documentation highlighting the various types of FunctionType requirements was also added as a part of the work on this optimizer.


Working on this project made me appreciate the kind of power a robust API like OpenMP brings. With minimal changes to existing code, fast, cross-platform multi core implementations could be written which scale nearly linearly with the number of threads. Particularly, the implementation of HOGWILD! was much less complex with OpenMP involved to take over the work-sharing and synchronisation, while providing equivalent (or better) performance than a hand-rolled implementation of a thread pool and memory barriers.

Future work

Some ideas which would build upon work done during this project over the summer :

  • More algorithms in mlpack need to be profiled for potential parallelization. One particularly interesting and major change would be with the introduction of parallel tree traversers (although the impact of such an implementation needs to be analyzed if it will be worth the time and effort for parallelization).
  • The implementation of SCD needs some trivial changes to work on multiple cores.
  • SCD could also use a line search algorithm instead of gradients to optimize the function under consideration, possibly leading to a much faster optimizer.
  • An in-depth empirical analysis of the implemented parallel optimizers, with different problems and datasets.


This summer has been a great learning experience for me, from not only the work done during my project but also seeing the constant flow of interesting work from the projects from other students. I would like to thank Yannis, Ryan and Marcus for their valuable time and guidance over the course of the entire summer to make this project an fundamental part of my experience as a developer.

Deep Reinforcement Learning Methods - Summary

Deep Reinforcement Learning Methods - Summary

This blog is the summary of my gsoc project – implementation of popular deep reinforcement learning methods. During this project, I implemented deep (double) q learning, asynchronous one/n step q learning, asynchronous one step sarsa and asynchronous advantage actor critic (in progress), as well as two classical control problems, i.e. mountain car and cart pole, to test my implementations.


My work mainly locates in methods/reinforcement_learning folder

  • q_learning.hpp: the main entrance for (double) q learning
  • async_learning.hpp: the main entrance for async methods
  • training_config.hpp: wrapper for hyper parameters
  • environment: implementation of two classical control problems, i.e. mountain car and cart pole
  • policy: implementation of several behavior policies
  • replay: implementation of experience replay
  • network: wrapper for non-standard networks (e.g actor critic network without shared layers)
  • worker: implementation of async rl methods

Refactoring of existing neural network components is another important part of my work

  • Detachment of module and optimizer: This influences all the optimizers and most test cases.
  • PVS convention: Now many of mlpack components comply with pass-by-reference, which is less flexible. I proposed the idea of pass-by-value in combination with std::move. This is assumed to be a very huge change, now only newly added components adopts this convention. Ryan is working on old codebase.
  • Exposure of Forward and Backward: Before this we only have Predict and Train, which may lead to duplicate computation in some case. By the exposure of Forward and Backward, we can address this issue.
  • Support for shared layers: This is still in progress, however I think it's very important for A3C to work with Atari. We proposed the Alias layer to address this issue. This is also a huge change, which will influence all the visitors.
  • Misc update of old APIs.

Detailed usage can be found in the two test cases: async_learning_test.cpp and q_learning_test.cpp. You can run the test cases by bin/mlpack_test -t QLearningTest and bin/mlpack_test -t AsyncLearningTest.

In total, I contributed following PRs:


The most challenging parts are:

  • Making amount of threads independent with amount of workers in async rl methods: This is really a fantastic idea. To my best knowledge, I haven't seen any public implementation of this idea. All the available implementations in the Internet simply assume them to be equal. To achieve this, we need to build a worker pool and use step instead of episode as a job unit.
  • Alias layer: This blocked me most and is still blocking me. We need a deep understanding of armadillo memory management, boost::variant and #include directives.

Future Work

Apparently RL support of MLPack is far from complete. Supporting classical control problems is an important milestone – we are almost there. However we are still far from the next milestone – Atari games. At least we need GPU support, infrastructure of basic image processing and an effective communication method with popular simulators (e.g. OpenAI gym, ALE).


I thank Marcus for his mentorship during the project and detailed code review. I also want to thank Ryan for his thoughtful suggestions. I also appreciate the generous funding from Google.

Deep Learning Module in mlpack - Week 11 + 12

Deep Learning Module in mlpack - Week 11 + 12

This week was mostly spent in writing test for the GAN PR and Fixing changes in the GAN PR. We wanted to implement the Orilley example. I faced a lot of problems implementing this example as valid convolution padding used in the example leads to creation of very weird padding size which didn't make sense at the time. For taking care of that problem I had to implement the resize layer which basically is a interpolation layer using bilinear interpolation for scaling or de-scaling and image.

This week we also refactored the code using Lozhnikov's idea. Basically we want the predictors matrix to remain constant for the both the generator and discriminator network.

I also tried to get the 1d gaussian test and digits dataset to work this week. The parameter tuning for the digits is dataset pretty hard and we end up generating random noise most of time. We also added support for convents with gan's this week.

Neural Evolution Algorithms for NES games - Week 10

Neural Evolution Algorithms for NES games - Week 10

This week started by working on my previous PRs.

After some small final touches the CNE PR is ready. My mentor provided very detailed code review. The time that he put into for finding small spaces and other style issues that I missed in between code really ensures that MLPack gets good quality code merged. I am even now more familiar with the codebase and is doing a lot less mistakes.

The next was again working with the CMAES Repo

  • The next task was a -nan error that i used to see rarely during test results.
  • Also the optimization will some time go a bit wrong and will create very minute deviation from the exact answer. I was converging rosenbrock function for 45 dimension and a very small deviation leads to a different output.

Both the errors got solved and this algorithm now creates accurate result and no error is seen at any time. To make sure i even ran hundreds of iterations in my local machine.

Next was NEAT PR -

The paper i went for is-

also code from sethBling with his famous marIO video on genetic algorithm was helpful.

I finished reading the codebase by last year contributor Bang Lui and is setting up my local machine for running and testing the game today.

This week will be to debug the whole process and see for potential errors in the code. Also adding a tutorial for NEAT will add my small contribution too :)

Cross-Validation and Hyper-Parameter Tuning - Summary

Cross-Validation and Hyper-Parameter Tuning - Summary

My GSoC project can be summarized in the following PRs.

Cross-validation module

During the summer I have implemented two cross-validation strategies. The first one is simple: it splits data into two sets - training and validation sets - and then runs training on the training set and evaluates performance on the validation set. The second cross-validation strategy is k-fold cross-validation.

The cross-validation module has the following features.

  1. Data preparation happens only once when a cross-validation object is constructed.
  2. In many cases you don't need to be explicit about data of what types you are going to pass: the developed meta-programming tools are used to deduce types for storing data by inspecting machine learning algorithms you specify. It also allows you to pass objects that can be converted to the target types (e.g. objects of types that used by armadillo to store intermediate results).
  3. The interface is designed in the way that you first pass common (among machine learning algorithms implemented in mlpack) constructor parameters including data, number of classes, and information about dimensions (datasetInfo). During this step your compiler will check whether the specified machine learning algorithm accepts these parameters. If some check fails, a human-readable message will be printed.

Hyper-parameter tuning module

Another part of my GSoC project is a hyper-parameter tuning module. It lets you optimize hyper-parameters using one of the cross-validation strategies in couple with one of the metrics as objective.

The implemented module has the following features.

  1. It has the same interface for constructors as in the cross-validation module, and all features mentioned for the cross-validation module about construction are applied here too.
  2. What strategy to use during optimization is specified by a user. By now support for two strategies has been implemented. With the first strategy, GridSearch, users specify a set of values for each hyper-parameter they want to optimize. The second strategy, GradientDescent, uses numerically calculated gradients of functions based on cross-validation to optimize real-valued hyper-parameters.
  3. If some hyper-parameter should not be optimized, it can be marked with the Fixed function.


I have implemented different metrics that can be used for cross-validation and hyper-parameter tuning. These include accuracy, mean squared error, precision, recall, and F1.


I thank Ryan Curtin for his mentorship during the summer and for his passion to provide deep and thoughtful responses to my works. I also want to thank Marcus Edel for his reviews and for being responsive to my general questions about mlpack.

Deep Reinforcement Learning Methods - Week 10

Deep Reinforcement Learning Methods - Week 10

This week I mainly worked on the issue of sharing layers between two networks. We finally figured out that Alias layer is the only possible solution, because the memory should be contiguous – shared_ptr cann't do that. However the implementation of Alias layer is not easy. I managed to eliminate the strange void* and make the include clause elegant by rearranging all the visitors. However I sill got stuck with a stange bad memory access error. I spent two days with no progress. So I decided to come back to the implementation of A3C, without shared layers.