mlpack
blog

Table of Contents
A list of all recent posts contained on this page.
- Neural Evolution Algorithms for NES games - Summary
- Frank-Wolfe Algorithm and Solving Sparse Solutions of Atom Norm Constraint Problems
- Neural Turing Machines - Summary
- Deep Learning Modules - Summary
- Benchmarking mlpack libraries - Summary
- Augmented Recurrent Neural Networks - Summary
- Profiling for parallelization and parallel stochastic optimization methods - Summary
- Deep Reinforcement Learning Methods - Summary
- Deep Learning Module in mlpack - Week 11 + 12
- Neural Evolution Algorithms for NES games - Week 10
- Cross-Validation and Hyper-Parameter Tuning - Summary
- Deep Reinforcement Learning Methods - Week 10
Neural Evolution Algorithms for NES games - Summary
My Google Summer of Code project was to implement neural evolution algorithms for NES games, covering:

- CNE: Conventional Neural Evolution.
- CMAES: Covariance Matrix Adaptation Evolution Strategy.
- NEAT: Neural Evolution through Augmented Topologies.
- HyperNEAT: exactly the same as NEAT, except that the genomes pass through a substrate layer to obtain the links.

As originally planned, all of the evolutionary algorithms were supposed to be neural network optimization algorithms. We later switched to designing each as a generalized optimizer that can be used with other functions and machine learning methods, such as the logistic regression class implemented in mlpack. Therefore CMAES and CNE were designed as optimizers.
CMAES Implementation
- My working PR (ready): this PR is all set to be merged: 938
- References: the research paper that I referred to for this algorithm is The CMA Evolution Strategy: A Tutorial by Nikolaus Hansen.
- Medium blog post: I have a Medium post describing how this algorithm works, followed by a description of the mlpack open source library and a full code tutorial showing the optimization in action: CMAES algorithm in C++ using mlpack.
- Test cases: the CMAES algorithm works by using a clever probability distribution (a modifiable Gaussian distribution), moving the mean towards the minimum and updating the covariance matrix in the process. Unlike derivative-based optimization methods, it handles discontinuous and ill-conditioned functions well. Therefore one of the performance tests was done using the Rosenbrock function, iteratively for up to 50 dimensions. A change of 1e5 in the values leads to a very different result. A lot of time was spent making the algorithm better and removing some major bugs. I can confidently present the optimizer now, which works well and gives accurate results each time.
- Tutorial: a full descriptive tutorial on using CMAES has been added to the documentation, covering step by step:
  - The CMAES optimizer class in short.
  - Making the function class.
  - Setting up the constructor.
  - Complete code example.
  - Using it as an optimizer with other models.
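To illustrate the idea, here is a toy evolution-strategy sketch in Python: sample candidates from a Gaussian around a mean, move the mean toward the best samples, and shrink the step size. It is a heavily simplified stand-in for CMA-ES (the full covariance-matrix adaptation is omitted), demonstrated on the 2D Rosenbrock function; all names and parameters here are illustrative, not mlpack's API.

```python
import math
import random

def rosenbrock(x):
    # Classic ill-conditioned benchmark; the minimum is f(1, 1) = 0.
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def simple_es(f, x0, sigma=0.3, lam=20, iters=400, seed=42):
    """Sample lam candidates around the mean, keep the best half,
    and move the mean toward them. CMA-ES additionally adapts a full
    covariance matrix, which this sketch omits."""
    rng = random.Random(seed)
    mean, n = list(x0), len(x0)
    mu = lam // 2
    for _ in range(iters):
        pop = [[m + sigma * rng.gauss(0, 1) for m in mean] for _ in range(lam)]
        pop.sort(key=f)  # best (lowest) candidates first
        # New mean: average of the mu best candidates.
        mean = [sum(p[i] for p in pop[:mu]) / mu for i in range(n)]
        sigma *= 0.995   # crude stand-in for CMA-ES step-size control
    return mean

best = simple_es(rosenbrock, [-1.2, 1.0])
print(rosenbrock(best))  # far below the starting value of about 24.2
```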
This PR was very challenging for me. The time I had budgeted was far less than what was actually needed, and I was new to the armadillo library, for which my mentor came to the rescue. I made a lot of modifications to the PR and also spent a lot more time debugging. This PR was a somewhat scary learning experience, but it finally got completed after optimization, improvements and bug fixes.
CNE Implementation
- PR (merged): 1088
- Divided into three sections: optimizer, test cases, tutorial.
- Optimizer: Conventional Neural Evolution, implemented in mlpack, is an optimizer that works like biological evolution in nature: it selects the best candidates based on their fitness scores and creates a new generation by mutation and crossover of the population. This repeats iteratively over several generations, after which healthy candidates are found that perform well at producing the desired output.
- References: the research papers and reference code that I used to learn CNE were: Training Feedforward Neural Networks Using Genetic Algorithms, Evolving Artificial Neural Networks, Multineat, PR 753.
- Test cases: test cases were written for three very different scenarios:
  - Optimizing a given function: the XOR function.
  - Optimizing a vanilla network: it worked for both the iris dataset and the thyroid dataset, but to keep the test case fast only the iris dataset is used.
  - Testing with an existing mlpack model: the logistic regression model was given the optimizer to learn a dataset.
- Tutorial: a full descriptive tutorial on using CNE has been added to the documentation, covering step by step:
  - The CNE optimizer class in short.
  - The constructor parameters.
  - Creating a model using mlpack's ANN class.
  - Complete code example.
  - Logistic regression using CNE as an optimizer.
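The selection/crossover/mutation loop described above can be sketched in a few lines of Python. This toy minimizes a simple function rather than training a network, and all names and parameters are illustrative, not mlpack's CNE API.

```python
import random

def fitness(genome):
    # Toy task: higher is better; the optimum is the all-zero genome.
    return -sum(g * g for g in genome)

def cne_step(pop, rng, elite=0.3, mut_sigma=0.1):
    """One generation: rank by fitness, keep the best candidates,
    refill the population with crossover + Gaussian mutation."""
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:max(2, int(elite * len(pop)))]
    children = []
    while len(survivors) + len(children) < len(pop):
        a, b = rng.sample(survivors, 2)
        child = [(x if rng.random() < 0.5 else y) for x, y in zip(a, b)]  # crossover
        child = [g + rng.gauss(0, mut_sigma) for g in child]              # mutation
        children.append(child)
    return survivors + children

rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(30)]
for _ in range(60):
    population = cne_step(population, rng)
print(max(fitness(p) for p in population))  # best fitness climbs toward 0
```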
This is my first merged PR for any open source project, and it now rests peacefully on the mlpack servers. CNE was not as painful as CMAES, and as always my mentor provided his valuable suggestions and code reviews. I loved working on it.
NEAT Implementation
- My working PR: this PR is not yet complete: 1105
- References: the main reference paper for the NEAT algorithm that I went through is Evolving Neural Networks through Augmenting Topologies. A very beautiful implementation video on YouTube is also worth watching: SethBling's MarI/O. See also SethBling's MarI/O code implementation and Bang Liu's PR 752.
This class was already written by Bang Liu, a GSoC 2016 participant, so rather than reinventing the wheel my work was to bring it back to life and fix an important bug.
Reading his code was a really nice experience. It is one of the best examples I have seen of object-oriented programming: there are classes for neurons and links; from them a genome class is built (a neural network made of links and neurons); species are clusters of genomes; and finally the population is a cluster of species. It shows the true essence of classes and object-oriented design, beautifully implemented.
Some of my time went into making this one-year-old PR live and working again. I made more than 270 indentation fixes, along with other code optimization and style fixes that do not get caught by the cppCheck bot. A tutorial for the documentation is also in progress. I was able to find some errors, which led to some critical changes in the evolution; these changes are under testing and debugging for now.
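The class hierarchy described above can be sketched like this (hypothetical names for illustration, not the actual PR's code):

```python
class Neuron:
    def __init__(self, nid):
        self.id = nid

class Link:
    def __init__(self, src, dst, weight):
        self.src, self.dst, self.weight = src, dst, weight

class Genome:
    """A neural network: a set of neurons plus the links between them."""
    def __init__(self, neurons, links):
        self.neurons, self.links = neurons, links

class Species:
    """A cluster of structurally similar genomes."""
    def __init__(self, genomes):
        self.genomes = genomes

class Population:
    """A cluster of species; the unit that evolves between generations."""
    def __init__(self, species):
        self.species = species

    def size(self):
        return sum(len(s.genomes) for s in self.species)

# Build the smallest possible population: one species, one genome.
neurons = [Neuron(i) for i in range(3)]
links = [Link(0, 2, 0.5), Link(1, 2, -0.3)]
pop = Population([Species([Genome(neurons, links)])])
print(pop.size())  # 1
```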
Future work
My future work is to complete NEAT until it gets merged successfully. Looking at the work done by Bang Liu, the code is worth fixing and pushing to the codebase. For HyperNEAT, which is a small extension of NEAT, a new PR will follow; that is not a major task.
I loved the idea, mentioned in one of the issues, of adding a Gaussian process implementation to mlpack. I would love to contribute to this later.
Conclusion
I would like to thank mlpack for giving me a chance to contribute to this beautiful library. I now have a very different perspective on open source. Big thanks to my mentor Marcus Edel for tolerating me. There were multiple times I asked for his help, and he always responded in no time and with a lot of patience. Also, the detailed code reviews and improvements that he suggested were unmatched, and they clearly ensure that only good quality code gets into the main codebase. Thanks to Ryan for the code review and final edits. Lastly, thanks to Google for encouraging open source contribution. I am looking forward to more contributions from my side and to becoming a better engineer.
Frank-Wolfe Algorithm and Solving Sparse Solutions of Atom Norm Constraint Problems
This post summarizes my work for GSoC 2017.
My Work
The main goal of my GSoC 2017 project was to implement the Frank-Wolfe algorithm (the conditional gradient algorithm) in various situations. From Jaggi's paper Revisiting Frank-Wolfe, we know that many sparse optimization problems with atom domain constraints can be solved by this Frank-Wolfe type algorithm. So I implemented:

- Classic Frank-Wolfe algorithm. The algorithm is implemented as a class template, so that it can be used and extended in general forms. The algorithm needs two steps: solving the linear constrained subproblem, and updating the solution. Users can extend the algorithm by writing their own classes defining these two steps. For the constrained subproblem solver, I implemented the common case where the constraint set is a unit lp ball. For the update step, I implemented the classic update rule and a line search method. I tested the convergence speed and confirmed that the line search step, although it adds some extra work, gives faster convergence.
- Orthogonal Matching Pursuit. Jaggi's paper revealed that if the constraint domain of the FW algorithm is just an atom norm bound, then the linear constrained subproblem is very easy to solve. For example, in the vector case, if the constraint is a bound on the lp norm, the subproblem can be solved explicitly. A straightforward application is the Orthogonal Matching Pursuit (OMP) algorithm, where the update step simply reoptimizes the coefficients in the span of the current solution space. I also implemented regularization in the constrained subproblem solver, to help users get faster convergence. Although we can use this Frank-Wolfe type algorithm to solve atom domain problems in a greedy fashion, it is known that using it directly gives slow convergence and a large support size. I implemented the tricks in the paper by Rao, Shah and Wright (2015) on forward-backward greedy methods, which introduced:
  - Enhancement step. This reoptimizes the atom coefficients within the atom norm constraint domain. It is realized using the projected gradient method.
  - Truncation step. This tries to delete some inefficient atoms, which cancels the drawbacks of this FW type greedy algorithm. It helps us get faster convergence, as well as a solution with better sparsity.
- Constrained subproblem solvers for other atom domains, such as the structured group norm constraint domain and the matrix Schatten norm constraint domain. An important application in the matrix case is the matrix completion problem. As mlpack already has that solver implemented using the SDP optimizer, I changed the original implementation into a class template, which can accept both the SDP optimizer and the Frank-Wolfe optimizer to solve the matrix completion problem. I compared the different algorithms; it seems that, even with the Enhancement and Truncation steps added, the convergence of the Frank-Wolfe type algorithm is still very slow.
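As a concrete illustration of the two Frank-Wolfe steps described above (linear subproblem + update), here is a toy Python sketch for least squares over an l1 ball, where the linear minimization oracle has the closed-form solution mentioned above: a signed, scaled coordinate vector. The data and function names are illustrative, not the mlpack class template.

```python
# Minimize ||Ax - b||^2 subject to ||x||_1 <= tau with classic Frank-Wolfe.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def objective(A, b, x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return sum(v * v for v in r)

def frank_wolfe_l1(A, b, tau, iters=200):
    n = len(A[0])
    x = [0.0] * n
    for k in range(iters):
        r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
        grad = [2 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # Linear minimization oracle over the l1 ball: the coordinate
        # with the largest |gradient| component, scaled to the boundary.
        j = max(range(n), key=lambda t: abs(grad[t]))
        s = [0.0] * n
        s[j] = -tau if grad[j] > 0 else tau
        gamma = 2.0 / (k + 2.0)  # classic step-size rule
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

A = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [1.0, 1.0, 0.0]]
b = [1.0, 0.0, 1.0]
x = frank_wolfe_l1(A, b, tau=2.0)
print(objective(A, b, x))  # decreases toward the constrained optimum
```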
I have currently made 2 PRs: 1041, 1087. The matrix completion code using the Frank-Wolfe type algorithm is already done; I will open a new PR after my second PR gets merged.
Future Work
A lot of the code I implemented consists of tricks to improve the convergence speed of the algorithm. However, most of them have only been checked manually on small test problems to confirm that these methods are beneficial. We should test on large-scale problems and see how everything really works. Also, we can compare the FW algorithm with other optimizers on similar problems, and see the pros and cons of using FW.
Acknowledgement
I am really grateful for the help from Stephen, Ryan, Marcus, and the whole mlpack community. It was my first time joining the open source world, and it was a wonderful experience. It was also very cool to meet so many machine learning people and discuss ideas with them. I will keep in touch with the mlpack community, and I hope I can make more contributions to the project in the future.
Neural Turing Machines - Summary
Another amazing year of GSoC. This year was more exciting than my previous experiences. My major contribution this year was the Neural Turing Machine, or NTM. The NTM is a neural network with additional external memory. As the entire architecture is end-to-end differentiable, the NTM is able to learn tasks which require memory storage, such as reproducing a given sequence, sorting a sequence, and so on. The code is complete and can be found in PR #1072.
I have also implemented GRU as part of my project this year. While implementing GRU, the current recurrent neural network framework was extended to support variable length sequences. This involved making changes to the current LSTM implementation as well. Given such changes, both LSTM and GRU are tested with variable length Reber grammar sequences. As an interesting side project, I added something called recursive Reber grammar for testing. Recursive Reber grammars are basically Reber grammars that are recursively embedded in themselves. Embedded Reber grammar is a special case of recursive Reber grammar where the recursion depth is set to 1. It is good to see that both LSTM and GRU pass the recursive Reber grammar tests with various recursion depths, although I have noticed that the cell state size required to pass the tests differs with the recursion depth of the grammar. It would be a nice experiment to see what is actually going on in the cell state. Is the LSTM/GRU storing the depth as a stack? That would be really cool. All this code is already merged and can be accessed via PR #1018.
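For reference, the recursive Reber grammar data used for these tests can be sketched like this: a generator for plain Reber strings from the standard automaton, wrapped recursively as described above. This is an illustrative stand-in, not the mlpack test code.

```python
import random

# The standard Reber grammar automaton; strings start with B and end with E.
TRANSITIONS = {
    0: [('T', 1), ('P', 2)],
    1: [('S', 1), ('X', 3)],
    2: [('T', 2), ('V', 4)],
    3: [('X', 2), ('S', 5)],
    4: [('P', 3), ('V', 5)],
}

def reber(rng):
    out, state = ['B'], 0
    while state != 5:
        sym, state = rng.choice(TRANSITIONS[state])
        out.append(sym)
    out.append('E')
    return ''.join(out)

def recursive_reber(rng, depth):
    """depth == 0 is a plain Reber string; depth == 1 is the classic
    embedded Reber grammar; larger depths nest further, wrapping the
    inner string as B <T|P> ... <same symbol> E."""
    if depth == 0:
        return reber(rng)
    wrap = rng.choice('TP')
    return 'B' + wrap + recursive_reber(rng, depth - 1) + wrap + 'E'

rng = random.Random(7)
print(recursive_reber(rng, 2))
```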
As of now, I am yet to complete the implementation of batch normalization. Before batch normalization can be added to the framework, the framework should be refactored to support batch mode. A lot of people are helping to achieve this task. Even though the program is over, I will keep working on this issue and complete batch normalization as part of my project.
I would like to congratulate Marcus and Ryan, my mentors, for surviving this summer project with me :P I know how hard this must have been :) Thank you for all your support and guidance. It has been more than 4 years since I joined the mlpack community, and it has been a wonderful journey. I hope the coming years will bring more fun...
Deep Learning Modules - Summary
Goals for the summer
These are the algorithms I proposed for implementation: RBM, ssRBM (spike and slab RBM), GAN (Generative Adversarial Networks), and stacked GAN.
Executive Summary
I was able to implement RBM, ssRBM and GAN this summer, though unfortunately none of my code has been merged yet. You can find the PRs I opened below: RBM*, ssRBM, GAN, Resize Layer Implementation, CrossEntropyWithLogits.

Note: the RBM PR was later merged into the ssRBM PR.

The ssRBM and RBM PRs are ready to be merged. The GAN PR is also mostly complete; I think only superficial style changes are required.
In addition to implementing these algorithms, I am happy to say that RBM, ssRBM and GAN are very well tested. You can find some of the code here. I would also like to point out that for RBM we are comparable to the sklearn library in terms of speed (1.5x faster) and accuracy (similar). We couldn't benchmark the ssRBM, as none of the other libraries actually implement it. For GANs we tested our implementation against examples from Keras and TensorFlow. I am happy to say that in terms of visual reconstruction for the Gaussian distribution generation, we were able to get results comparable with TensorFlow. We also implemented a test for the generation of MNIST images using an FFN as the discriminator and generator network. The results are not as good as those of the Keras example, but we can see the GAN training and that it converges to some stable results. You can look at the results below.
RBM
The major difficulty with implementing RBMs was deciding on an architecture that could easily be extended to other variants. Hence, we decided to implement a wrapper layer and then implement the actual workings of each type of RBM as a policy. This gives us the advantage of modularity.

Here are the results of our implementation on the digits dataset (a smaller version of MNIST). The samples are generated from 1000 steps of Gibbs sampling. This image is generated from the deeplearning.net example. Our accuracy on the digits dataset was around 86%, but this can go higher with some more parameter tuning.

References for the implementation: DeepLearning.net, Training RBM
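The Gibbs-sampling training loop behind these results can be sketched as a toy binary RBM trained with one step of contrastive divergence (CD-1). This plain-Python sketch omits biases and mlpack's wrapper/policy design; all names and the tiny dataset are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_layer(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(W, v):
    return [sigmoid(sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(W))]

def visible_probs(W, h):
    return [sigmoid(sum(W[i][j] * h[i] for i in range(len(W))))
            for j in range(len(W[0]))]

def cd1_update(W, v0, rng, lr=0.1):
    """One CD-1 step: a single Gibbs pass, then positive minus
    negative phase on the weights."""
    ph0 = hidden_probs(W, v0)
    h0 = sample_layer(ph0, rng)
    v1 = visible_probs(W, h0)   # one Gibbs step back to the visibles
    ph1 = hidden_probs(W, v1)
    for i in range(len(W)):
        for j in range(len(W[0])):
            W[i][j] += lr * (ph0[i] * v0[j] - ph1[i] * v1[j])

rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # two repeating binary patterns
W = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(3)]  # 3 hidden units
for epoch in range(200):
    for v in data:
        cd1_update(W, v, rng)
recon = visible_probs(W, sample_layer(hidden_probs(W, data[0]), rng))
print([round(p, 2) for p in recon])
```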
ssRBM
The spike and slab RBM was the hardest to implement, mainly because there are no existing implementations of the paper. We had to work out some formulas ourselves, like the free energy function. Some of the details in the paper are a little unclear, like the radius parameter for rejecting samples. We also had to decide how to represent parameters such as the spike variables, slab variables and lambda bias variables: the paper states them to be diagonal matrices, but we found that most papers just use a constant value for the diagonal entries, so we decided to represent them as scalars, giving us major speed and memory improvements.
We tested ssRBM on the digits dataset and were able to get an accuracy of around ~82%. One of the reasons the accuracy of ssRBM is lower than binary RBM is that ssRBM is suited to datasets with high correlation within parts of the images, such as the CIFAR dataset, so that the correlation can be captured by the slab variables. We tried ssRBM on the CIFAR dataset, but due to the large volume of data and the scarcity of computation resources we got around 70% accuracy; these results should be taken with a grain of salt, as I saw that increasing the sample size led to a decrease in accuracy.

We also tried to bring down the running time of the ssRBM considerably and were able to do so. We are still looking at the issue to gain possibly more improvements. When testing ssRBM on the CIFAR dataset there were a couple of problems: the paper didn't give a detailed list of the parameters it used, either for training the ssRBM or for the creation of feature vectors from the ssRBM for testing. Also, the CIFAR dataset is quite huge, and the problem was compounded by the fact that the paper suggested creating 8x8 patches from every image, leading to an increased dataset size.
References: Main Paper, Additional Paper 1, Additional Paper 2
GAN
We implemented Generative Adversarial Networks in a way that is quite different from most other implementations in TensorFlow and PyTorch:

- Using a single optimizer for training.
- No need for separate noise data.
- No need for separate noise and real labels.
- Providing parameters such as the generator update step, defining when to update the generator network.

The main idea of implementing GANs this way came from Mikhail. The above formulation has the advantage of being easy to use, using fewer resources in terms of space, and making a lot more logical sense.
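The single-loop formulation above can be sketched on a 1D Gaussian target (mean 4, variance 0.5): one loop updates a logistic discriminator every iteration and a linear generator every `generator_update_step` iterations, with manual gradients. This is an illustrative toy, not mlpack's GAN class; all names and parameters are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

rng = random.Random(1)
a, b = 1.0, 0.0   # generator: g(z) = a * z + b, z ~ Uniform(0, 1)
w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
lr, generator_update_step = 0.02, 1

for step in range(4000):
    real = rng.gauss(4.0, math.sqrt(0.5))
    z = rng.random()
    fake = a * z + b
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    # Discriminator ascent on log D(real) + log(1 - D(fake)).
    w += lr * ((1 - d_real) * real - d_fake * fake)
    c += lr * ((1 - d_real) - d_fake)
    if step % generator_update_step == 0:
        # Generator ascent on log D(fake) (non-saturating loss).
        d_fake = sigmoid(w * fake + c)
        a += lr * (1 - d_fake) * w * z
        b += lr * (1 - d_fake) * w

gen_mean = sum(a * rng.random() + b for _ in range(1000)) / 1000.0
print(round(gen_mean, 2))  # drifts toward the real mean of 4
```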
There are at present no standard techniques for testing GANs; we tested our implementation on the following examples:

- 1D Gaussian test: this test aims to generate data from a Gaussian with mean = 4 and variance = 0.5, given that the noise function is the uniform distribution.
- Generation of MNIST digits: we tested our GAN on generating images using a simple FFN as both the generator and discriminator network. We used only the 7 digits from the MNIST dataset to achieve the following results. Here is the final image; the network is trained for just 60 epochs. We can get better results if we find better hyperparameters.
- DCGAN (stripped-down version): we also tested our GAN implementation on generating digits using CNNs. This basically gives us a simpler version of the famous DCGAN implementation. To complete this test we had to add the Resize layer (bilinear interpolation of images) and the CrossEntropy with logits layer. Right now we are still testing our implementation for this test; you can find the code here.
We also added some of the training hacks from Soumith Chintala's workshop on GANs:

- Batch training for the discriminator and generator.
- Pre-training for the discriminator.
- Optimizing max log D(G(z)) instead of min log(1 - D(G(z))) for the generator.
- Training the discriminator more, using the gradient update step.
References: GAN Paper, GAN Hacks
Highlights
Most of my last 15-20 days were spent on hyperparameter tuning to make the various tests work for GANs and ssRBM. This is a highly boring and tedious job, and I am really looking forward to the hyperparameter tuning PR.
One of the major problems faced was that correctly testing deep learning modules requires huge amounts of data and resources. I found my computing rig to be very slow for the purpose in some cases. Hence, I am looking forward to Bandicoot and to testing all this code on a GPU.
I also found training GANs, particularly the DCGAN example, very time consuming; there is no efficient way to do hyperparameter tuning.
Future Work
- I think ssRBM needs to be tested on a dataset such as CIFAR, but first we need to confirm that an existing implementation of ssRBM achieves good results with some set of parameters, so that we can replicate the same results. (Preferably this code should be run on a GPU, since that would lead to faster results and hence faster hyperparameter tuning.)
- Implementation of Wasserstein GAN. Lately, a lot of noise in the deep learning community has been around how great WGANs are compared to GANs, so I think it is reasonable to implement it next.
- I think we need support for DeConv layers, BatchNormalisation and other image manipulation functions.
- I think something similar to hyperopt would be very useful.
- Implementation of a stacking module for RBM and GANs.
Conclusion
The summer has been quite eventful. I am really happy to say that I have learned a lot about persevering through an implementation to make it work. The need for constant improvement of the code base is one of the important lessons I have learned. I also added some serious debugging skills to my skillset. Finally, I want to give a huge thank you to Mikhail Lozhnikov, without whom the project would not have been possible. I would also like to thank Marcus for his valuable inputs from time to time and for guiding me through the code base. I really think this experience has made me a more patient and much better programmer than I was before. Thanks again to the whole mlpack
community :)
Benchmarking mlpack libraries - Summary
This summer I was quite fortunate to work on the mlpack project with my mentors Ryan Curtin and Marcus Edel as part of the Google Summer of Code program. I worked on improving their benchmarking system, and I would now like to describe what the work was, what I learnt through a plethora of successes and failures, and what work remains to be done after GSoC.
Introduction
Many mlpack methods had been added since the previous benchmarking system was built, and in many of the current implementations only the runtime metric was being captured. Also, many of the implementations were written against old versions of the benchmarked libraries and used deprecated methods, so they needed updating. Many methods were not implemented in MATLAB and Weka, and many libraries like annoy, mrpt, nearpy, milk, dlibml and R were not yet benchmarked.
Work done before GSoC
Before officially becoming a Google Summer of Code student, I made the following contributions to the benchmarking system of the mlpack library, starting in March 2017:

- Added options to the random forest and decision tree scikit benchmark code.
- Added the code for DecisionTree in mlpy.
- Implemented approximate nearest neighbors using Annoy, Mrpt, Sklearn and Nearpy.
- Added test files for Random Forest, Logistic Regression and Adaboost.
- Made certain corrections to the SVM and LARS implementations of sklearn and to the config file.
Work done during GSoC
Updating scikit implementations:
The scikit implementations were updated to allow specifying more options and to store metrics like accuracy, precision, recall and MSE along with the runtime. The config.yaml file did not have blocks for the Logistic Regression and ICA methods, so these were also added. The following table enumerates the parameters added to the various methods:

Method | Parameters Added
--- | ---
LSHForest | min_hash_match, n_candidates, radius and radius_cutoff_ratio
ALLKNN | radius, tree_type, metric and n_jobs
Elastic Net | max_iter, tolerance and selection
GMM | tolerance and max_iter
ICA | n_components, algorithm, fun, max_iter and tolerance
KMEANS | algorithm
Logistic Regression | tolerance and max iterations
Merged PRs: 60, 61
Benchmarking against the Milk machine learning toolkit for Python:
Introduced a new library called Milk to benchmark against. Implemented Adaboost, Decision Tree, KMeans, Logistic Regression and Random Forest in it, wrote the unit test files for them, and added the corresponding config.yaml block. Merged PRs: 64
Unit test file for Adaboost:
There was no unit test file for Adaboost, so I added one for the scikit and milk implementations. Merged PRs: 65
Updating the Shogun implementations:
Shogun had been updated from 5.0.0 to 6.0.0, so most of the implementations needed updating. Also, earlier only the runtime metric was being collected, so the code was changed to collect other metrics like accuracy, precision, recall and MSE. Merged PRs: 79
Avoid building the model twice:
In the scikit and shogun implementations, the model was being built twice: once while measuring the runtime and again while calculating the other metrics. This was avoided by returning the predictions made during the runtime measurement to the function calculating the other metrics, and all the scikit and shogun implementations were updated accordingly. This ensured that the implementations take less time to run. Merged PRs: 80, 81, 82, 83, 85, 86, 88, 91
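The change can be sketched as follows: time a single fit + predict, then hand the predictions to the metric code instead of training again. The model and metric here are dummies for illustration; the real scripts wrap the scikit and shogun estimators.

```python
import time

class DummyModel:
    def __init__(self):
        self.fit_calls = 0

    def fit_predict(self, X, y):
        self.fit_calls += 1
        return [max(set(y), key=y.count)] * len(X)  # predict the majority class

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def benchmark(model, X, y):
    start = time.time()
    predictions = model.fit_predict(X, y)  # the model is built exactly once
    runtime = time.time() - start
    # Reuse the predictions for every metric instead of refitting.
    return {"runtime": runtime, "accuracy": accuracy(y, predictions)}

X = [[0], [1], [2], [3]]
y = [0, 0, 0, 1]
model = DummyModel()
metrics = benchmark(model, X, y)
print(model.fit_calls, round(metrics["accuracy"], 2))  # 1 0.75
```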
Updating MATLAB implementations:
There were around 34 MATLAB implementations present earlier, and these were measuring only runtime. Many implementations like Decision Tree, K-Nearest Classifier, Support Vector Classifier, Decision Tree Classifier, Lasso, LDA, QDA, Random Forest and Support Vector Regression were added, along with Python scripts to call them and unit test files to test them. Merged PRs: 89, 94
Updating WEKA implementations:
The WEKA folder hosted around 3 implementations, and after Weka got updated those scripts had become outdated. So the presently benchmarked methods had to be reimplemented, and many other methods were also added. After updating, the weka folder holds Decision Stump, Decision Tree, Logistic Regression, Naive Bayes, Perceptron, Random Forest, ALLKNN, KMEANS, Linear Regression and PCA implementations. The Python scripts to call them and store the results, and the unit test files, were also implemented. Merged PRs: 95
Benchmarking against Dlibml:
Introduced a new C++ machine learning library called dlibml to the benchmarking system. Added implementations like SVM, KMEANS, approximate nearest neighbors and ALLKNN. Wrote the install script, the Python scripts to call them, and the unit test files for the same. Merged PRs: 96, 97, 98, 99
Make specifying k mandatory:
Some of the k-nearest neighbors implementations took the default value of k as 5 while others did not. To ensure uniformity, the option of specifying k was made mandatory in the implementations. Merged PRs: 101
Benchmarking against R:
This is something that I was personally inclined to do. There is a worldwide debate on Python vs R, and I thought that this is the best platform to settle it to some extent and see which one performs faster. Using mlr (the machine learning framework for R), I implemented methods like NBC, Adaboost, QDA, LDA, Decision Tree, K-Nearest Classifier, Random Forest, Support Vector Classifier, Lasso, Linear Regression and Support Vector Regression. I also wrote the Python scripts and unit test files for the same. Merged PRs: 102, 103, 104, 105
The webpage:
This is a work in progress where we are thinking about using JavaScript visualization libraries to present the results on the webpage in a better way. More on this once the work completes. This work might not get completed during the GSoC period itself, but I will continue working on it after GSoC.
So basically, throughout the summer I:

- Committed around 5000 lines of code.
- Had 27 of my pull requests merged.
Technical Skills Developed
- R: I had little to no experience working with R. During the course of the project I learnt how to build R from scratch and implement all the major machine learning algorithms in it.
- MATLAB: I had never used MATLAB before, and now I am well versed in calling MATLAB code from Python scripts, using the Statistics and Machine Learning Toolbox, and saving the results and sending them back to the calling Python script.
- Weka: I had only used the Weka GUI tool before and had never used it from Java code. My mentors taught me how to do that.
- Dlibml: I had never implemented machine learning algorithms in C++ before. This project gave me the opportunity to do that.
Lessons Learnt
While I learnt useful tools and languages I also gained some general advice.
- Don't assume that anything is easy: while writing my proposal I assumed that working on new libraries and improving the old ones would be a week's task each, but when I started working on them I had to spend far more time than originally planned, which resulted in the last part of the project (the webpage) getting delayed.
- It is important to be flexible and willing to change your priorities, as many obstacles might come up.
- Always ask for help, as "Help will always be given at mlpack to those who need it." (Could not resist quoting Albus Dumbledore :p)
Plans after GSoC
I wish to continue contributing to the benchmarking system; the initial plans are to add the MachineLearning.jl and Shark libraries to the benchmarks, and thereafter to write a manuscript on the results along with my mentors Ryan Curtin and Marcus Edel. I am looking forward to a long association with them.
In closing, I would like to thank Ryan and Marcus for being such amazing mentors, for teaching me so many things, and for putting up with me whenever I struggled. This is the best internship experience I've ever had, and I hope I can meet them in person soon.
Augmented Recurrent Neural Networks - Summary
This post is the summary of my GSoC work.
Relevant pull requests
- Augmented RNN models - benchmarking framework (merged). This PR contains documented and tested benchmarking classes for simple "algorithmic" tasks: CopyTask, AddTask, SortTask.
- Added LSTM baseline model with copy task benchmark (paused). This PR contains the working implementation of the LSTM model on the three tasks from the previous PR.
- Adding optimization features not related to my GSoC project (merged). This PR contains a gradient clipping implementation and an implementation of the CrossEntropyError function. The name turned out to be a misnomer: the former helped to work around the gradient explosion problem, and the latter accelerated training (as mentioned earlier, this error function penalizes mistakes more sharply than MeanSquareError).
- Implementing Hierarchical Memory Unit (work in progress). Right now this PR contains:
  - A documented implementation of the TreeMemory, a tree-like data structure used for memory access in the HAM unit, plus unit tests for it.
  - A documented implementation of the HAMUnit forward pass, plus unit tests for it.
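Gradient norm clipping, as used in that optimization PR, can be sketched like this (an illustrative stand-in, not mlpack's implementation): if the gradient's L2 norm exceeds a threshold, rescale it onto the threshold sphere.

```python
import math

def clip_gradient(grad, max_norm):
    """Rescale grad so its L2 norm never exceeds max_norm; gradients
    already inside the threshold are returned unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return grad
    scale = max_norm / norm
    return [g * scale for g in grad]

g = clip_gradient([3.0, 4.0], max_norm=1.0)  # norm was 5.0
print([round(v, 2) for v in g])  # [0.6, 0.8]
```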
Highlights
First of all, I should admit that my main surprise was how vastly I underestimated the benchmarking framework part. During implementation, a lot of issues appeared with the tasks themselves and with making them work properly on LSTM, even with small input sizes. This has been resolved, but it took far more time than initially planned.

Another source of interesting moments was the optimization PR. Both gradient clipping and the cross-entropy error function were comparatively easy to implement. However, a lot of effort was invested in testing them and making them compliant with the rest of the mlpack codebase. This is undoubtedly the main insight into industry programming I gained this summer. To clarify, "industry" programming here is opposed to competitive programming, which was my main source of programming experience before GSoC.
Acknowledgements
I would like to thank my mentors, Marcus Edel and Ryan Curtin, for sharing their technical experience and giving well-timed and clear recommendations during the work. Also, I would like to thank Google for this brilliant program, which gave me the chance to have a cool summer as well as gain experience in a real-world open source project. (Oh, and to get used to git as well ^_^)
Profiling for parallelization and parallel stochastic optimization methods  Summary
As we fast approach the end of the final evaluation period of GSoC for this year, I think now is the right time to give a cumulative account of all the work done over the past 12 weeks of this project.
Initially, the accepted proposal was about the implementation of stochastic optimization methods, but after an agreement, we decided to include some aspects of the "Profiling for parallelisation" proposal as part of this project.
The primary goal of the "Profiling for parallelisation" project was to find parts of the library which could benefit from a parallel implementation and modify them to take advantage of multiple cores using OpenMP. On the other hand, "Parallel stochastic optimization methods" was concerned with the implementation of parallel approaches to optimization algorithms.
Executive summary
Here is the list of Pull Requests with work related to this project. These PRs have all the code that was intended to be committed to the library:
 #1015: Add tests to the CTest framework
 #1024: Parallel implementation of NaiveKMeans
 #1037: Implementation of parallel SGD
 #1075: Implementation of Stochastic Coordinate Descent
The list of commits merged into the codebase is here. Here is a better visualisation of the contributions made during GSoC.
In addition to the above library code, I wrote a number of benchmarks and examples to try out different ideas and gather more information about the performance of existing implementations in and outside of mlpack. Some important ones are listed here.
Weekly updates on the progress of the project were posted on this blog and can be found cumulatively here.
 Parallel testing framework: This part of the project involved adding the tests under the Boost unit testing framework to the CTest test runner included with CMake, which allows spawning multiple jobs and makes testing faster. To automate this process, I wrote a CMake script that parses the required test suite names. Overall, there is about a 2x reduction in testing times with four jobs.
 Profiling for parallelisation: I wrote benchmarks to find computational bottlenecks in mlpack code. KMeans was found to be easily parallelizable with OpenMP, giving around a 3.5x speedup with the parallel implementation. Ideas for profiling and parallelisation were drawn from this paper.
 Implementation of parallel SGD: I worked on the implementation of a parallel variant of stochastic gradient descent, based on the "HOGWILD!" scheme of unsynchronised updates between threads, described in this paper. The algorithm is implemented with OpenMP, which was able to generate the right primitives for a lock-free approach, as required by the paper. Convergence is much faster due to the computation of sparse gradients, and the gradient updates take much less time compared to other optimizers.
 Implementation of parallel SCD: This involved implementing another optimizer, which works with partial gradients of the loss function to update disjoint parts of the decision variable and is therefore trivially parallelizable. The PR for the sequential optimizer is in the final phase before merging. The changes required for parallelization should be trivial, as most of the required work has been done in the sequential version. As more changes needed to be added to the FunctionType policy in mlpack, documentation highlighting the various FunctionType requirements was also added as part of the work on this optimizer.
Highlights
Working on this project made me appreciate the power a robust API like OpenMP brings. With minimal changes to existing code, fast, cross-platform, multi-core implementations can be written that scale nearly linearly with the number of threads. In particular, the implementation of HOGWILD! was much less complex with OpenMP taking over the work-sharing and synchronisation, while providing equivalent (or better) performance than a hand-rolled implementation of a thread pool and memory barriers.
Future work
Some ideas that would build upon the work done during this project over the summer:
 More algorithms in mlpack need to be profiled for potential parallelization. One particularly interesting and major change would be the introduction of parallel tree traversers (although the impact of such an implementation needs to be analyzed to see whether it would be worth the time and effort).
 The implementation of SCD needs some trivial changes to work on multiple cores. SCD could also use a line search algorithm instead of gradients to optimize the function under consideration, possibly leading to a much faster optimizer.
 An in-depth empirical analysis of the implemented parallel optimizers, with different problems and datasets.
Conclusion
This summer has been a great learning experience for me, not only from the work done during my project but also from seeing the constant flow of interesting work in the other students' projects. I would like to thank Yannis, Ryan, and Marcus for their valuable time and guidance over the course of the entire summer, which made this project a fundamental part of my experience as a developer.
Deep Reinforcement Learning Methods  Summary
This blog post is a summary of my GSoC project: the implementation of popular deep reinforcement learning methods. During this project, I implemented deep (double) Q-learning, asynchronous one/n-step Q-learning, asynchronous one-step SARSA, and asynchronous advantage actor-critic (in progress), as well as two classical control problems, i.e. mountain car and cart pole, to test my implementations.
Introduction
My work is mainly located in the methods/reinforcement_learning folder:
 q_learning.hpp: the main entrance for (double) Q-learning
 async_learning.hpp: the main entrance for async methods
 training_config.hpp: wrapper for hyper-parameters
 environment: implementation of two classical control problems, i.e. mountain car and cart pole
 policy: implementation of several behavior policies
 replay: implementation of experience replay
 network: wrapper for nonstandard networks (e.g. an actor-critic network without shared layers)
 worker: implementation of async RL methods
Refactoring existing neural network components is another important part of my work:
 Detachment of module and optimizer: this influences all the optimizers and most test cases.
 PVS convention: previously many mlpack components used pass-by-reference, which is less flexible. I proposed the idea of pass-by-value in combination with std::move. This is a very large change, so for now only newly added components adopt this convention; Ryan is working on the old codebase.
 Exposure of Forward and Backward: before this we only had Predict and Train, which may lead to duplicate computation in some cases. Exposing Forward and Backward addresses this issue.
 Support for shared layers: this is still in progress, but I think it is very important for A3C to work with Atari. We proposed the Alias layer to address this issue. This is also a huge change, which will influence all the visitors.
 Misc updates of old APIs.
Detailed usage can be found in the two test cases: async_learning_test.cpp and q_learning_test.cpp. You can run them with bin/mlpack_test -t QLearningTest and bin/mlpack_test -t AsyncLearningTest.
In total, I contributed the following PRs:
 Implementation of Alias layer
 Async nstep qlearning and one step sarsa
 Implement a framework of DQN and asynchronous learning methods
 Implementation of async one step qlearning
 Add aggregated policy for async rl methods
 Support batched forward and backward for FFN
 Update Optimizer API
 Add new API for some optimizers
 Basic DQN
 Add epsilon greedy policy for DQN
 Add random replay for DQN
 Fix a bug in gaussian init
 Refactor FFN
 Implement two classical control problems for testing reinforcement learning method
 Fix bug of variadic template parameters of Optimizer
Highlights
The most challenging parts were:
 Making the number of threads independent of the number of workers in async RL methods: this is really a fantastic idea. To the best of my knowledge, I haven't seen any public implementation of it; all the available implementations on the Internet simply assume the two are equal. To achieve this, we need to build a worker pool and use a step instead of an episode as the job unit.
 The Alias layer: this blocked me the most and is still blocking me. We need a deep understanding of armadillo memory management, boost::variant, and #include directives.
Future Work
Apparently RL support in mlpack is far from complete. Supporting classical control problems is an important milestone, and we are almost there. However, we are still far from the next milestone: Atari games. At a minimum we need GPU support, infrastructure for basic image processing, and an effective way to communicate with popular simulators (e.g. OpenAI Gym, ALE).
Acknowledgment
I thank Marcus for his mentorship during the project and his detailed code reviews. I also want to thank Ryan for his thoughtful suggestions. I also appreciate the generous funding from Google.
Deep Learning Module in mlpack  Week 11 + 12
These weeks were mostly spent writing tests for the GAN PR and fixing changes in it. We wanted to implement the O'Reilly example. I faced a lot of problems implementing this example, as the valid convolution padding used in it leads to very weird padding sizes that didn't make sense at the time. To take care of that problem I had to implement the resize layer, which is basically an interpolation layer that uses bilinear interpolation to scale or descale an image.
This week we also refactored the code using Lozhnikov's idea: basically, we want the predictors matrix to remain constant for both the generator and the discriminator network.
I also tried to get the 1D Gaussian test and the digits dataset to work this week. Parameter tuning for the digits dataset is pretty hard, and we end up generating random noise most of the time. We also added support for convolutional networks with GANs this week.
Neural Evolution Algorithms for NES games  Week 10
This week started by working on my previous PRs.
After some small final touches the CNE PR is ready. My mentor provided a very detailed code review. The time he put into finding small spacing and other style issues that I missed really ensures that mlpack gets good-quality code merged. I am now even more familiar with the codebase and am making far fewer mistakes.
Next was again working with the CMAES repo:
 The first task was a NaN error that I used to see rarely in test results.
 Also, the optimization would sometimes go slightly wrong and create a very minute deviation from the exact answer. I was converging the Rosenbrock function in 45 dimensions, and a very small deviation leads to a different output.
Both errors got solved, and the algorithm now produces accurate results with no error seen at any time. To make sure, I even ran hundreds of iterations on my local machine.
Next was the NEAT PR.
The paper I went for is
also, code from SethBling, with his famous MarI/O video on genetic algorithms, was helpful.
I finished reading the codebase written by last year's contributor Bang Liu and am setting up my local machine for running and testing the game today.
This week will be spent debugging the whole process and looking for potential errors in the code. Adding a tutorial for NEAT will be my small contribution too :)
CrossValidation and HyperParameter Tuning  Summary
My GSoC project can be summarized in the following PRs.
 #1016 Add accuracy and mean squared error
 #1020 Add a method form detection tool
 #1031 Add a meta information extractor
 #1044 Add simple crossvalidation
 #1074 Add precision, recall and F1
 #1095 Add kfold crossvalidation
 #1101 Add a hyperparameter tuning module
 #1108 Add a tutorial for the crossvalidation and hyperparameter tuning modules
Cross-validation module
During the summer I have implemented two cross-validation strategies. The first one is simple: it splits the data into two sets, a training set and a validation set, then runs training on the training set and evaluates performance on the validation set. The second strategy is k-fold cross-validation.
The cross-validation module has the following features.
 Data preparation happens only once, when a cross-validation object is constructed.
 In many cases you don't need to be explicit about what types of data you are going to pass: the developed metaprogramming tools deduce the types for storing data by inspecting the machine learning algorithm you specify. This also allows you to pass objects that can be converted to the target types (e.g. objects of types that Armadillo uses to store intermediate results).
 The interface is designed so that you first pass constructor parameters common among the machine learning algorithms implemented in mlpack, including data, the number of classes, and information about dimensions (datasetInfo). During this step your compiler checks whether the specified machine learning algorithm accepts these parameters; if some check fails, a human-readable message is printed.
Hyperparameter tuning module
Another part of my GSoC project is a hyper-parameter tuning module. It lets you optimize hyper-parameters using one of the cross-validation strategies coupled with one of the metrics as the objective.
The implemented module has the following features.
 It has the same constructor interface as the cross-validation module, and everything mentioned about construction for the cross-validation module applies here too.
 The strategy to use during optimization is specified by the user. So far, support for two strategies has been implemented. With the first strategy, GridSearch, users specify a set of values for each hyper-parameter they want to optimize. The second strategy, GradientDescent, uses numerically calculated gradients of functions based on cross-validation to optimize real-valued hyper-parameters.
 If some hyper-parameter should not be optimized, it can be marked with the Fixed function.
Metrics
I have implemented different metrics that can be used for cross-validation and hyper-parameter tuning. These include accuracy, mean squared error, precision, recall, and F1.
Acknowledgment
I thank Ryan Curtin for his mentorship during the summer and for his passion for providing deep and thoughtful responses to my work. I also want to thank Marcus Edel for his reviews and for being responsive to my general questions about mlpack.
Deep Reinforcement Learning Methods  Week 10
This week I mainly worked on the issue of sharing layers between two networks. We finally figured out that an Alias layer is the only possible solution, because the memory needs to be contiguous, and shared_ptr can't provide that. However, the implementation of the Alias layer is not easy. I managed to eliminate the strange void* and make the include clauses elegant by rearranging all the visitors, but I am still stuck on a strange bad-memory-access error. I spent two days with no progress, so I decided to come back to the implementation of A3C without shared layers.