mlpack  blog
mlpack Documentation

Table of Contents

This page contains a list of all recent posts.

Proximal Policy Optimization - Summary

Proximal Policy Optimization - Summary

Time flies; the summer is coming to an end, and we have arrived at the final week of GSoC. This blog post is the summary of my GSoC project: the implementation of one of the most promising deep reinforcement learning methods, Proximal Policy Optimization (PPO). During this project, I implemented the proximal policy optimization method and one classical continuous task, i.e. Lunar Lander, to test my implementation. My pull request for prioritized experience replay was also merged into master.


My work is mainly located in the methods/reinforcement_learning, methods/ann/loss_functions, and methods/ann/dists folders:

  • ppo.hpp: the main entrance for proximal policy optimization.
  • ppo_impl.hpp: the main implementation of proximal policy optimization.
  • lunarlander.hpp: the implementation of the continuous task.
  • prioritized_replay.hpp: the implementation of prioritized experience replay.
  • sumtree.hpp: the implementation of the segment tree structure for prioritized experience replay.
  • environment: the implementation of two classical control problems, i.e. mountain car and cart pole.
  • normal_distribution.hpp: the implementation of a normal distribution which accepts a mean and variance to construct the distribution.
  • empty_loss.hpp: the implementation of an empty loss, used in the proximal policy optimization class. We calculate the loss outside the model declaration, so this loss does nothing except pass the gradient backward.
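To illustrate the segment tree idea behind sumtree.hpp, here is a minimal sketch of a sum tree for proportional prioritized sampling. The class name and methods here are simplified stand-ins, not mlpack's actual API, and the capacity is assumed to be a power of two; leaves store priorities, internal nodes store sums, so sampling proportional to priority takes O(log n):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sum-tree sketch (illustrative only; not mlpack's actual class).
// tree[1] is the root; leaves live at indices [capacity, 2 * capacity).
class SumTree
{
 public:
  explicit SumTree(size_t capacity)
      : capacity(capacity), tree(2 * capacity, 0.0) { }

  // Set the priority of leaf `index` and update all ancestor sums.
  void Set(size_t index, double priority)
  {
    size_t i = index + capacity;
    tree[i] = priority;
    for (i /= 2; i >= 1; i /= 2)
      tree[i] = tree[2 * i] + tree[2 * i + 1];
  }

  // Total priority mass stored in the tree.
  double Total() const { return tree[1]; }

  // Find the leaf whose cumulative priority range contains `mass`.
  size_t Retrieve(double mass) const
  {
    size_t i = 1;
    while (i < capacity)
    {
      if (mass <= tree[2 * i])
        i = 2 * i;                       // Descend left.
      else
      {
        mass -= tree[2 * i];             // Skip left subtree's mass.
        i = 2 * i + 1;                   // Descend right.
      }
    }
    return i - capacity;
  }

 private:
  size_t capacity;
  std::vector<double> tree;
};
```

Sampling then amounts to drawing a uniform value in [0, Total()) and calling Retrieve() on it.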

In total, I contributed the following PRs; most of the implementations are combined into the single proximal policy optimization pull request.

Proximal Policy Optimization

Prioritized Experience Replay

Change the pendulum action type to double

Fix typo in Bernoulli distribution

remove move() statement in hoeffding_tree_test.cpp

minor fix up


The most challenging parts are:

  • One of the most challenging parts of the work was calculating the surrogate loss for updating the actor network. It differs from the update of the critic network, which can be optimized by regression on mean squared error. The actor network is optimized by maximizing the PPO-clip objective, which is hard to express like the current loss functions that are computed from target and prediction parameters. So I calculate the loss outside the model, and the model is declared with the empty loss function. Every part of the backward pass except the model itself is computed by my implementation, such as the derivatives with respect to the normal distribution's mean and variance.
  • Another challenging part of the work was implementing proximal policy optimization for continuous environments, where the action differs from the discrete case. In a discrete environment, the agent outputs just one dimension of data to represent the action, while in a continuous environment action prediction is more complicated: the common approach is to predict a normal distribution and then sample an action from it.
  • There were also other challenging parts, such as tuning the neural network so that the agent actually works. This is what blocks me now; I need to tune more parameters to pass the unit test. It is also a time-consuming process.
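The PPO-clip objective described above can be sketched for a single sample as follows. This is a simplified, hypothetical helper (the real implementation operates on matrices inside the PPO class); the ratio is pi_new(a|s) / pi_old(a|s), and clipping keeps the policy from moving too far in one update:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Per-sample PPO-clip surrogate (sketch, not the mlpack implementation).
// Log-probabilities are used so the ratio is computed stably via exp().
double ClippedSurrogate(double logProbNew, double logProbOld,
                        double advantage, double epsilon = 0.2)
{
  const double ratio = std::exp(logProbNew - logProbOld);
  const double clippedRatio =
      std::min(std::max(ratio, 1.0 - epsilon), 1.0 + epsilon);
  // PPO maximizes the minimum of the unclipped and clipped terms.
  return std::min(ratio * advantage, clippedRatio * advantage);
}
```

The actor's loss is the negative mean of this quantity over a batch, which is why the gradient must be propagated into the distribution's mean and variance by hand when the loss lives outside the model.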

Future Work

The pull request for proximal policy optimization is still under development due to the parameter tuning needed for the unit test, but it will be finished soon.

PPO can be used for environments with either discrete or continuous action spaces, so another piece of future work will be to support discrete action spaces, even though that is easier than the continuous case.

In some continuous environment tasks, the action has more than one dimension; we need to handle this situation as well.


Many thanks to Marcus for his mentorship during the project and his detailed code reviews. Marcus is helpful and often told me not to hesitate to ask questions; he gave me great help whenever something blocked me. I also want to thank Ryan for responding on IRC, even at midnight. The community has been kind since the first meeting; we talked about many things across different areas. Finally, I appreciate the generous funding from Google. It was a really good project to sharpen my skills. I will continue to contribute to mlpack and make the library easier to use and more powerful.

Thanks for this unforgettable summer session.

String Processing Utilities - Summary

String Processing Utilities - Summary

This post summarizes my work for GSoC 2019.


The proposal for String Processing Utilities involved implementing basic functions that would be helpful in processing and encoding text, and later implementing machine learning algorithms on top of it.


String Cleaning PR1904

  • The implementation started with the string cleaning functions. A class-based approach was used, and the following functions were implemented:
    1. RemovePunctuation(): allows you to pass a string, known as punctuation, containing all the punctuation characters to be removed.
    2. RemoveChar(): allows you to pass a function pointer, function object, or lambda which returns a bool; if the return value is true, the character is removed.
    3. LowerCase(): converts the text to lower case.
    4. UpperCase(): converts the text to upper case.
    5. RemoveStopWords(): accepts a set of stopwords and removes all of those words from the corpus.
  • After implementing the class, I started implementing the CLI and Python bindings. Since mlpack uses Armadillo to load matrices, I had to write a function that could read data from a file using basic input/output streams. The supported file types are limited to .txt and .csv. The binding has different parameters to set, and works as required based on the parameters passed.
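The cleaning operations listed above can be sketched with simple free functions. These are illustrative stand-ins for the class-based mlpack API, not the actual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Sketch of LowerCase(): convert every character to lower case.
// (The cast to unsigned char avoids UB for negative char values.)
std::string LowerCase(std::string text)
{
  std::transform(text.begin(), text.end(), text.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return text;
}

// Sketch of RemovePunctuation(): drop every character that appears in the
// user-supplied punctuation string.
std::string RemovePunctuation(const std::string& text,
                              const std::string& punctuation)
{
  std::string result;
  for (char c : text)
    if (punctuation.find(c) == std::string::npos)
      result += c;
  return result;
}
```

RemoveChar() generalizes the second function by taking an arbitrary predicate instead of a fixed character set.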

String Encoding PR1960

  • The initial plan was to implement a different class for each encoding method, such as bag-of-words encoding, dictionary encoding, or tf-idf encoding, but we found that the classes shared a lot of redundant code, so we decided on a policy-based design, implementing a different policy for each encoding type.
  • We implemented a StringEncoding class, which has the functions for encoding the corpus (accepting a vector as input) and outputs the encoded data based on the policy and output type (vector or arma::mat). It also provides an option to pad or avoid padding, depending on the encoding policy.
  • We also designed a helper class, StringEncodingDictionary, which maintains a dictionary mapping each token to its label. The class is templated on the token type, which can be string_view or an integer type. We arrived at the decision to implement this helper class based on speed profiling done by lozhnikov; his results convinced us that the helper class was worthwhile.

Policies for String Encoding PR1969

  • We decided to implement three encoding policies, namely:
    1. Dictionary Encoding: encodes the corpus by assigning a positive integer to each unique token and treats the dataset as categorical; it supports both padded and non-padded output.
    2. Bag of Words Encoding: creates a vector over all the unique tokens and assigns 1 if a token is present in the document and 0 if it is not.
    3. Tf-Idf Encoding: assigns a tf-idf value to each unique token.
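The core of the dictionary encoding policy can be sketched as follows. This is a simplified illustration (the real policy-based classes support padding and operate on string_view tokens); each unseen token receives the next positive integer label:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of dictionary encoding: whitespace-tokenize a document and map each
// unique token to a positive integer label (illustrative, not mlpack's API).
std::vector<size_t> DictionaryEncode(
    const std::string& document,
    std::unordered_map<std::string, size_t>& dictionary)
{
  std::vector<size_t> encoded;
  std::istringstream stream(document);
  std::string token;
  while (stream >> token)
  {
    auto it = dictionary.find(token);
    if (it == dictionary.end())
      it = dictionary.emplace(token, dictionary.size() + 1).first;
    encoded.push_back(it->second);
  }
  return encoded;
}
```

Bag-of-words and tf-idf reuse the same dictionary but emit per-document count or weight vectors instead of label sequences.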

Tokenizer PR1960

  • To help with all the string processing and encoding algorithms, we often needed to tokenize the string and thus we implemented two tokenizers in mlpack. The two tokenizers are as follows:
    1. CharExtract: This tokenizer is used to split a string into characters.
    2. SplitByAnyOf: The SplitByAnyOf class tokenizes a string using a set of delimiters.
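The SplitByAnyOf behavior can be sketched like this. The real mlpack tokenizer operates on boost::string_view for speed; this simplified version uses std::string and skips empty tokens:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of a SplitByAnyOf-style tokenizer: split on any character that
// appears in the delimiter set (illustrative, not the mlpack implementation).
std::vector<std::string> SplitByAnyOf(const std::string& text,
                                      const std::string& delimiters)
{
  std::vector<std::string> tokens;
  std::string current;
  for (char c : text)
  {
    if (delimiters.find(c) != std::string::npos)
    {
      if (!current.empty())
        tokens.push_back(current);   // Close the current token.
      current.clear();
    }
    else
      current += c;                  // Extend the current token.
  }
  if (!current.empty())
    tokens.push_back(current);
  return tokens;
}
```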

After implementing all the encoding policies and tokenizers, I decided to implement CLI and Python bindings (PR1980) for string encoding. The string encoding and string cleaning functions share a lot of common code, so we decided to share a common file, string_processing_util.hpp, between the two bindings.

My proposal also included an implementation of Word2Vec, but we decided to opt out after we found that Google has patented it.

Post GSoC

A lot of the code I implemented is unpolished, since I used boost::string_view and other Boost algorithms; we need to do a speed check and find the bottlenecks, if any. My plan is also to implement a substitute for Word2Vec, such as GloVe or another word embedding algorithm. I had implemented a function for one-hot encoding, which I thought could be useful for Word2Vec, but we found it was slightly buggy, so I have to find a way around that and also implement some overloaded functionality.

Lastly, and most importantly, I have to write tutorials for all the provided functionality, so that people can understand how to drop these functions into their own codebase. I am also excited to do some machine learning on text datasets using mlpack.


A big thanks to lozhnikov, Rcurtin, Zoq, and the whole mlpack community. This was my first attempt at GSoC, and I am happy that I was successful in it. I fell in love with the open-source world and it was a wonderful experience. I gathered a lot of knowledge in these past 3 months. I will continue to be in touch with the mlpack community and seek to do more contributions to the project in the future.

Also, I think it's time to order some mlpack stickers :)

Thanks :)

Application of ANN Algorithms Implemented in mlpack - Summary

Application of ANN Algorithms Implemented in mlpack - Summary


All GSoC contributions can be summarized by the following.

Contributions to mlpack/models

Pull Requests

Contributions to mlpack/mlpack

Merged Pull Requests

Issues Created

Contributions to zoq/gym_tcp_api

Merged Pull Requests

Loading Images

The image utilities support loading and saving of images.

They support the jpg, png, tga, bmp, psd, gif, hdr, pic, and pnm file types for loading, and jpg, png, tga, bmp, and hdr for saving.

The associated datatype is unsigned char, to support RGB values in the range 0-255. To feed data into a network, a typecast to arma::mat may be required. Images are stored in the matrix as (width * height * channels, NumberOfImages); therefore, imageMatrix.col(0) is the first image if the images are loaded into imageMatrix.
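The flattened layout described above can be made concrete with a small index helper. This function is an illustrative sketch (not part of mlpack), assuming the loader stores pixels row-major with interleaved channels:

```cpp
#include <cassert>
#include <cstddef>

// Row index of pixel (x, y), channel `channel`, inside one image column of a
// (width * height * channels, NumberOfImages) matrix, assuming row-major
// pixel order with interleaved channels (sketch; verify against the loader).
size_t PixelRow(size_t x, size_t y, size_t channel,
                size_t width, size_t channels)
{
  return (y * width + x) * channels + channel;
}
```

So `imageMatrix(PixelRow(x, y, c, width, channels), i)` would address one channel value of image `i` under this assumed layout.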

Loading a test image; this also fills the ImageInfo class object:

arma::Mat<unsigned char> matrix;
data::ImageInfo info;
data::Load("test_image.png", matrix, info, false, true);

Similarly, for saving images:

const size_t width = 25;
const size_t height = 25;
const size_t channels = 3;
const size_t quality = 90;
data::ImageInfo info(width, height, channels, quality);
data::Save("test_image.bmp", matrix, info, false, true);


VGG-19 is a convolutional neural network that was trained on more than a million images from the ImageNet database. The network is 19 layers deep and can classify images into 1000 object categories. Details about the network architecture can be found in the following arXiv paper:

@article{Simonyan2014,
  author  = {Simonyan, K. and Zisserman, A.},
  title   = {Very Deep Convolutional Networks for Large-Scale Image Recognition},
  journal = {CoRR},
  year    = {2014}
}

Tiny Imagenet

Tiny ImageNet Challenge is the default course project for Stanford CS231N. It runs similar to the ImageNet challenge (ILSVRC). The goal of the challenge is for you to do as well as possible on the Image Classification problem. The model uses VGG19 to classify the images into 200 classes.

MNIST handwritten digits

The VGG19 model is used for classification. It creates a Sequential layer that encompasses the various layers of VGG19.

// Input parameters: the dataset contains images with shape 28x28x1.
const size_t inputWidth = 28, inputHeight = 28, inputChannel = 1;
const size_t numClasses = 10;
bool includeTop = true;
VGG19 vggnet(inputWidth, inputHeight, inputChannel, numClasses, includeTop, "max", "mnist");
Sequential<>* vgg19 = vggnet.CompileModel();
// Compiling the architecture.
FFN<NegativeLogLikelihood<>, XavierInitialization> model;
model.Add<IdentityLayer<> >();
model.Add(vgg19);
model.Add<LogSoftMax<> >();

Sentiment Analysis

We will build a classifier on the IMDB movie dataset using a deep learning technique called an RNN, implemented with the Long Short-Term Memory (LSTM) architecture. The encoded IMDB dataset contains a vocabulary file along with sentences encoded as sequences. A sample datapoint: [1, 14, 22, 16, 43, 530, ..., 973, 1622, 1385, 65]. This sentence contains the 1st word, the 14th word, and so on from the vocabulary.

A vectorized input has to be fed into the LSTM to exploit the RNN architecture. To vectorize the sequence, dictionary encoding is used. The sample shown would be transformed to [[1, 0, 0, ..., 0], [0, ..., 1, 0, ...], ...]: the first list has the 1st position set to 1 and the rest 0, and similarly the second list has the 14th element set to 1 and the rest 0. Each list has a size equal to the number of words in the vocabulary.
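The transformation just described can be sketched as a one-hot expansion of the token sequence. This is an illustrative helper, not the actual encoding code used in the example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: turn a sequence of 1-indexed vocabulary indices into one-hot rows
// of length vocabSize (illustrative only).
std::vector<std::vector<double>> OneHotSequence(
    const std::vector<size_t>& sequence, size_t vocabSize)
{
  std::vector<std::vector<double>> result;
  for (size_t token : sequence)
  {
    std::vector<double> row(vocabSize, 0.0);
    row[token - 1] = 1.0;  // Tokens are 1-indexed in the encoded dataset.
    result.push_back(row);
  }
  return result;
}
```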

Accuracy Plots


Time Series Analysis


We want to use the power of LSTMs for Google stock prediction using time series. We will use mlpack and a recurrent neural network (RNN).

// Number of time steps to look back in the RNN.
const size_t rho = 25, maxRho = rho;
size_t inputSize = 5, outputSize = 2;
// RNN model.
RNN<MeanSquaredError<>, HeInitialization> model(rho);
model.Add<IdentityLayer<> >();
model.Add<LSTM<> >(inputSize, 10, maxRho);
model.Add<Dropout<> >(0.5);
model.Add<LeakyReLU<> >();
model.Add<LSTM<> >(10, 10, maxRho);
model.Add<Dropout<> >(0.5);
model.Add<LeakyReLU<> >();
model.Add<LSTM<> >(10, 10, maxRho);
model.Add<LeakyReLU<> >();
model.Add<Linear<> >(10, outputSize);

MSE Plot



An example implementation of using a recurrent neural network with LSTM to make forecasts on a time series of electricity usage (in kWh).

MSE Plot


Results on other datasets - International Airline Passengers

This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

We will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t + 1), over a window of rho frames.
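The X/Y construction above can be sketched for a univariate series as follows. This is an illustrative helper (the actual example reshapes the pairs into arma::cube slices over rho time steps):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: build supervised (X, Y) pairs where Y is the series shifted by one
// step, i.e. X[t] = series[t], Y[t] = series[t + 1] (illustrative only).
std::pair<std::vector<double>, std::vector<double>>
MakeSupervised(const std::vector<double>& series)
{
  std::vector<double> X, Y;
  for (size_t t = 0; t + 1 < series.size(); ++t)
  {
    X.push_back(series[t]);
    Y.push_back(series[t + 1]);
  }
  return { X, Y };
}
```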

Mean squared error for up to 10 iterations of training:

1 - MeanSquaredError := 0.146075
2 - MeanSquaredError := 0.144882
3 - MeanSquaredError := 0.09501
4 - MeanSquaredError := 0.0875479
5 - MeanSquaredError := 0.0836975
6 - MeanSquaredError := 0.0796567
7 - MeanSquaredError := 0.0804368
8 - MeanSquaredError := 0.0803483
9 - MeanSquaredError := 0.0809061
10 - MeanSquaredError := 0.076797

MSE Plot


Future Work

The tutorials associated with the implemented models are not yet published on the mlpack webpage; the blog posts need to be linked from a common place for users. VGG19 is being trained on the tiny-imagenet dataset, and the results will be added.


I am sincerely grateful to the whole mlpack community, especially Ryan Curtin, Marcus Edel, Sumedh Ghaisas, and Shikhar Jaiswal, for the support I received. It was an awesome experience.

Application of ANN Algorithms Implemented in mlpack - Week 12

Application of ANN Algorithms Implemented in mlpack - Week 12


  • Rectified the Sentiment Analysis neural network model to introduce an Embedding layer.

Thanks for reading. Have a nice day :smile:

NEAT & Multiobjective Optimization - Summary


The aim of my project for Google Summer of Code 2019 was to implement NeuroEvolution of Augmenting Topologies (NEAT) in mlpack based on Kenneth Stanley's paper Evolving Neural Networks through Augmenting Topologies. I would also implement support for "phased searching", a searching scheme devised by Colin Green to prevent genome bloat when training NEAT on certain complex tasks.

Besides this, my project aimed to create a framework for multi-objective optimization within mlpack's optimization library ensmallen. This would involve the implementation of several test functions and indicators, as well as an optimizer, NSGA-III.



NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm that can evolve networks of unbounded complexity by starting from simple networks and "complexifying" through different genetic operators. It has been used to train agents to play Super Mario World and to generate "genetic art".

I implemented NEAT in PR #1908. The PR includes the entire implementation including phased searching, associated tests and documentation. NEAT was tested on:

  • The XOR test, where the challenge was to create a neural network that emulated a two-input XOR gate. NEAT was able to solve this within 150 generations with an error of less than 0.1.
  • Multiple reinforcement learning environments implemented in mlpack.
  • The pole balancing task in OpenAI Gym. This was done using the Gym TCP API implemented by my mentor, Marcus Edel. A short video of the trained agent can be seen here.
  • The double pole balancing test. I implemented this as an addition to the existing reinforcement learning codebase. NEAT performed well on both the Markovian and non-Markovian versions of the environment.

The pull request is rather large and is still under review.

Multi-objective optimization

Multi-objective optimization is an area of multiple-criteria decision making concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. NSGA-III (Non-dominated Sorting Genetic Algorithm III) is an extension of the popular NSGA-II algorithm, which optimizes multiple objectives by associating members of the population with a reference set of optimal points.
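The building block of the non-dominated sorting in NSGA-II/III is the Pareto-dominance test, which can be sketched as follows (minimization convention; an illustrative helper, not the ensmallen implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Pareto dominance for minimization: `a` dominates `b` if it is no
// worse in every objective and strictly better in at least one.
bool Dominates(const std::vector<double>& a, const std::vector<double>& b)
{
  bool strictlyBetter = false;
  for (size_t i = 0; i < a.size(); ++i)
  {
    if (a[i] > b[i])
      return false;          // Worse in some objective: cannot dominate.
    if (a[i] < b[i])
      strictlyBetter = true; // Strictly better in at least one objective.
  }
  return strictlyBetter;
}
```

Non-dominated sorting repeatedly extracts the set of solutions dominated by no other solution, producing the Pareto fronts that the algorithm ranks.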

I implemented support for multi-objective optimization in PR #120. This PR includes:

  • An implementation of NSGA-III. This code is still being debugged and tested.
  • Implementation of the DTLZ test functions.
  • Implementation of multiple indicators, including the epsilon indicator and the Inverse Generational Distance Plus (IGD+) indicator.
  • Associated documentation.

Other work

Besides the work explicitly stated in my project, I also made some smaller changes and additions in the reinforcement learning codebase. This includes:

  • Implementation of both a Markovian and non-Markovian (velocity information not provided) version of the double pole balancing environments (see #1901 and #1951).
  • Fixed issues with consistency in mlpack's reinforcement learning API, where certain functions and variables were missing from some environments. Besides this, the environments now have an option to terminate after a given number of steps. See #1941.
  • Added QLearning tests wherever necessary to prevent issues with consistency in the future. See #1951.

Future Work

  • Get the NEAT PR merged.
  • Finish testing and debugging the NSGA-III optimizer.
  • Write more indicators for multi-objective optimization.

Final Comments

I would like to thank my mentors, Marcus Edel and Manish Kumar for all their help and advice, as well as bearing with me through the multiple technical difficulties I faced during this summer. I'd also like to thank all the other mentors and participants for their support. I hope to continue to contribute to this community in the future, and look forward to the same.

If anyone is interested in seeing my weekly progress through the program, please see my weekly blog.

Implementing Improved Training Techniques for GANs - Week 11

Implementing Improved Training Techniques for GANs - Week 11

This week, after Toshal's PR adding support for more than 50 layers was merged, a lot of my work was ready to be merged. Specifically, the padding layer has now been completed and merged. The mini-batch discrimination layer is also complete and the build is finally passing, so we should be able to merge that soon as well!

Other than that, I have continued my work on the spectral norm layer. One difficulty with the implementation is that the layer uses a power iteration method during the forward pass to compute the spectral norm of the weight matrix. I am not completely sure how we would compute the gradient for this approximation. I have been trying to do the manual derivation, but it is very tedious and I have not been successful so far. Hopefully I can get it to work in the coming week; otherwise I hope to continue the work post-GSoC.

Implementing Essential Deep Learning Modules - Week 12

Implementing Essential Deep Learning Modules - Week 12

Well, my bias visitor PR is merged, and my exact-objective PR is merged as well. I also added a workaround to enable adding more layers to the ANN; it looks like we can have as many layers as we want, since we can always build a tree-like structure for adding more layers. The most interesting thing about the workaround PR was that its branch could be deleted just two days after its creation.

As soon as the workaround for adding more layers got merged, my weight normalization layer was ready to merge. My Inception layer can also now be merged, and I will start working on it soon.

In the upcoming week, I am planning to add a small test for LSGAN so that it can get merged; its actual image testing will need some time. Most of the tests available online use a dual optimizer, so deciding the parameters for training will be challenging. If possible, I will also try to finish my work on the Inception layer.

I am also currently testing my dual optimizer PR. It has been running on savannah for a long time (approximately 50 days). Hopefully I will see good results after it completes :)

Application of ANN Algorithms Implemented in mlpack - Week 11

Application of ANN Algorithms Implemented in mlpack - Week 11


  • Completed the tutorials on LSTM Univariate.
  • Added tutorials for VGG19 for mnist dataset.
  • Completed the PR on VGG19 for tiny - imagenet dataset.
  • Added tutorials for VGG19 trained on imagenet.

Next Week Goals

  • Linking the tutorials to the mlpack page.

Thanks for reading. Have a nice day :smile:

Implementing Essential Deep Learning Modules - Week 11

Implementing Essential Deep Learning Modules - Week 11

In the last two weeks I have completed the exact-objective PR. My weight norm layer is also essentially complete and ready to merge. Both PRs required some time, but they are now ready.

My radical_test PR also got merged. It was quite weird that the memory check was timing out even after 15 hours of building, but after merging, the build passes without any issues. Maybe there was a glitch in the system.

I have also started to implement LSGAN, and it's almost complete. I will need to add some validation to ensure that the correct loss function is used for LSGAN. Testing it on savannah will also take some time.

In the upcoming week I am aiming to complete LSGAN and the Inception layer module. If possible, I will also start working on BiGAN.

String Processing Utilities - Week 11

String Processing Utilities - Week 11

This week started with extending lozhnikov's PR1960. I added the bag-of-words encoding policy and also added Tf-Idf with different variants, namely raw count, binary, sublinear tf, and term frequency, and I added tests for both encoding policies. I think we are almost done with the encodings; maybe some minor fixups still need to be done.

Now coming to the string cleaning PR: we are done with that too. I made some minor fixups last week, added some tests for the CLI binding, and updated the documentation. Again, some minor fixups remain, but apart from that everything is done.

For the coming week, my priority is to complete the Word2Vec algorithm; maybe I can get the initial API done by the coming week and then complete the full-fledged API the week after.

Also, post-GSoC, I will write tutorials on how to drop the string encoding API into your code and how to drop in the matrix scaling API, so stay tuned for both :)

Thank you :)

Implementing Improved Training Techniques for GANs - Week 10

Implementing Improved Training Techniques for GANs - Week 10

This week has been productive. I was able to finish the regularizers PR, and it has been merged. I completed the work on CGAN<> after implementing a Concat<> visitor, and have also written a test for it. Once we can successfully produce results from the CGAN, I think it will also be ready for merge. I think the major remaining task is checking the gradient issue in GAN, as pointed out by Toshal. I have also kept my other PRs, on the Padding<> and MinibatchDiscrimination layers, updated. The work there is complete but blocked, as we are unable to support more than 50 layer types at the moment. Hopefully we can find a solution in the coming weeks.

Proximal Policy Optimization - Week 9

Proximal Policy Optimization - Week 9

This week, I fixed the memory access violation bug and some other bugs that had troubled me for a long time. I found more problems than I expected. This was my first time writing a model whose loss is calculated outside the model. I was familiar with PyTorch and TensorFlow, so I wrote code with those assumptions in mind; for example, I thought that the normal distribution would accept a mean and variance as parameters, when in fact it accepts a mean and covariance. I am wondering whether I should rewrite the distribution to make it consistent with the PyTorch framework. With my mentor's kind reminder, I realized that I am a little behind schedule. Yes, I was too optimistic about the workload, and I think I need to devote more time to speed up progress.

Thanks for reading :).

Proximal Policy Optimization - Week 8

This week, I rewrote the normal distribution on my own, so that I can predict the normal distribution's parameters, mean and variance, to construct the distribution. Then the agent can sample an action using the distribution. After careful reading, the diagonal Gaussian distribution may have the same functionality, so maybe I can switch to using it instead.

This week, I was stuck on how to propagate the gradient backward through the network. With the help of members of the mlpack community, I have a clearer picture of it now. Maybe next week I can solve this problem.

Thanks for reading :).


Proximal Policy Optimization - Week 8

This week, I finally completed the backward pass of the network. The problem was a bit of a challenge for me in the beginning. The key to solving it was having a clear understanding of how the network graph is built, so that the error can be propagated backward through the graph. If I come across the same problem with a more complicated graph, I think I can solve it on my own. I want to thank everyone who gave me so much help in practice.

Thanks for reading :).