mlpack blog
mlpack Documentation

Table of Contents

This page contains a list of all recent posts.

Implementing Essential Deep Learning Modules - Summary



This summer I got the opportunity to work with the mlpack organisation. My proposal for Implementing Essential Deep Learning Modules was selected, and under the mentorship of Shikhar Jaiswal and Marcus Edel I worked on it for the last three months.

The main aim of my project was to enhance the existing Generative Adversarial Network (GAN) framework in mlpack. The target was to add more functionality to GANs and to implement some new algorithms in mlpack so that the performance of GANs is improved. The project also focused on adding new algorithms so that testing GANs becomes feasible.


Improving Serialization of GANs

The first challenge in the existing GAN framework was to enable loading and saving the GAN model correctly. In mlpack, loading and saving of models is done with the help of Boost Serialization. The most important responsibility was to ensure complete consistency, so that all the functionalities of the model work perfectly after saving and loading it. Pull request #1770 focused on this and will be merged soon.

Providing Option to Calculate Exact Objective

In order to make the Dual Optimizer functionality efficient, it was required to remove the overhead of calculating the objective over the entire data after optimization. To do so, I opened #109 in mlpack's ensmallen repository. It is merged, and ensmallen's decomposable optimizers now provide an option for the user to choose whether they wish to calculate the exact objective after optimization or not.
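The idea can be sketched with a toy decomposable objective (the names and the optimizer loop here are illustrative, not ensmallen's actual API): when the exact objective is not requested, the optimizer returns the cheap running estimate accumulated during the last pass instead of re-evaluating over the full dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "decomposable" objective: sum of squared residuals over data points.
double ExactObjective(const std::vector<double>& data, const double w)
{
  double total = 0.0;
  for (double x : data)
    total += (x - w) * (x - w);
  return total;
}

// Sketch of the optimization-loop idea: when exactObjective is false, the
// optimizer returns the running estimate accumulated during the last epoch
// instead of re-evaluating the objective over the entire dataset.
double Optimize(const std::vector<double>& data,
                double& w,
                const bool exactObjective)
{
  const double stepSize = 0.1;
  double runningObjective = 0.0;
  for (std::size_t epoch = 0; epoch < 100; ++epoch)
  {
    runningObjective = 0.0;
    for (double x : data)
    {
      runningObjective += (x - w) * (x - w);
      w += stepSize * 2.0 * (x - w);  // Gradient step on a single point.
    }
  }
  return exactObjective ? ExactObjective(data, w) : runningObjective;
}
```

The saving is the final full pass over the data, which matters when the dataset is large and the objective is only needed approximately.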

Dual Optimizer for GANs

Various research papers related to GANs used two separate optimizers in their experiments. However, mlpack's GAN framework had only one optimizer, which made testing GANs quite tedious. The main aim of pull request #1888 was therefore to add Dual Optimizer functionality to GANs. The implementation of the Dual Optimizer is quite complete, and its testing is currently in progress.

Label Smoothing in GANs

One-sided label smoothing, mentioned in the Improved GAN paper, was seen to give better results while training GAN models. Label smoothing was also required in order to add LSGAN to mlpack. Its implementation and testing are quite complete in #1915; however, to make label smoothing work perfectly, some commits from the serialization PR were required, so it will be merged once #1770 is merged.
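The technique itself is small: only the "real" labels are softened (the 0.9 target is the value suggested in the Improved GAN paper; the function name here is illustrative, not mlpack's API):

```cpp
#include <cassert>
#include <vector>

// One-sided label smoothing: replace the "real" label 1.0 with a softer
// target (e.g. 0.9) for the discriminator, while leaving fake labels at 0.
std::vector<double> SmoothLabels(std::vector<double> labels,
                                 const double smoothing = 0.9)
{
  for (double& label : labels)
  {
    if (label == 1.0)
      label = smoothing;
  }
  return labels;
}
```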

Bias Visitor

In order to prevent normalizing the bias parameter in the Weight Norm layer, a bias visitor was required to set the bias parameters of a layer. The first step was to add a getter method Bias() to some layers. Afterwards these getter methods were used to set the weights. The visitor was merged in PR #1956.

Work Around to add more layers

The boost::variant method can handle only 50 types, so in order to add more layers to mlpack's ANN module a workaround was required. After some digging about the error online, I found one: boost::variant provides an implicit conversion which enables adding as many layers as we want with the help of a tree-like structure. PR #1978 was one of the fastest to get merged; I completed it in just two days so that the Weight Norm layer could be merged.

Weight Normalization Layer

Weight Normalization is a technique similar to Batch Normalization which normalizes the weights of a layer rather than the activations. In this layer, only non-bias weights are normalized. Due to the normalization, the gradients are projected away from the weight vector, which made testing the gradients tedious. The layer is implemented as a wrapper around another layer in #1920.
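The reparameterization can be sketched on a single weight vector (a standalone illustration, not the layer's actual code): the weights are expressed as w = g * v / ||v||, so the scale g and the direction v are learned separately, and bias terms are left untouched.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weight normalization: reparameterize a weight vector as w = g * v / ||v||,
// separating the learned scale g from the learned direction v.
std::vector<double> WeightNorm(const std::vector<double>& v, const double g)
{
  double norm = 0.0;
  for (double x : v)
    norm += x * x;
  norm = std::sqrt(norm);

  std::vector<double> w(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    w[i] = g * v[i] / norm;
  return w;
}
```

By construction the resulting vector always has Euclidean norm g, whatever direction v points in.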

Inception Layer

In order to complete my Frechet Inception Distance PR, GoogleNet was required, which in turn requires the Inception layer. There are various versions of the Inception layer. The first version is quite complete; however, #1950 will be merged only after all three of its versions are implemented. The Inception layer is basically a wrapper around a Concat layer.

Frechet Inception Distance

Frechet Inception Distance is used for testing the performance of a GAN model. It uses the concept of the Frechet distance, which compares two Gaussian distributions, together with the parameters of the Inception model. In order to get the parameters of the Inception model, #1950 needs to be merged first. The Frechet distance is currently implemented in #1939 and will be integrated with the Inception model once that is merged.
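For Gaussians with diagonal covariances, the squared Frechet distance reduces to a simple per-dimension sum; a sketch of that formula follows (an illustration of the distance itself, not the PR's implementation, which works on Inception statistics):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Frechet distance between two Gaussians with diagonal covariances:
//   d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 * S2)),
// which for diagonal S1, S2 is a sum over dimensions.
double FrechetDistanceSquared(const std::vector<double>& mu1,
                              const std::vector<double>& var1,
                              const std::vector<double>& mu2,
                              const std::vector<double>& var2)
{
  double dist = 0.0;
  for (std::size_t i = 0; i < mu1.size(); ++i)
  {
    const double meanDiff = mu1[i] - mu2[i];
    dist += meanDiff * meanDiff +
        var1[i] + var2[i] - 2.0 * std::sqrt(var1[i] * var2[i]);
  }
  return dist;
}
```

Identical distributions give a distance of zero; the full FID applies this with means and covariances computed from Inception-model activations.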

Fix failing radical test

While working on this PR I learned how important and tough testing is. The RadicalMainTest was failing about 20 times in 100000 iterations. After quite a lot of digging it was found that random values were being used for testing. With this PR I learned about eigenvectors, whitening of a matrix, and many other important concepts. PR #1924 provided a fixed matrix for the test.

Serialization of glimpse, meanPooling and maxPooling layers

While working on GAN serialization, I found that the glimpse, meanPooling and maxPooling layers were not serialized properly. I fixed their serialization in PR #1905. Finding the error was a patience-testing job, but it felt quite satisfying to fix it.

Generator Update of GANs

While testing GANs I found an error in the update mechanism of the Generator. The issue is being discussed with the mentors; however, the error seems ambiguous. Hence, #1933 will be merged after we arrive at a conclusion on the issue.


Least Squares GAN

Least Squares GAN uses the least squares error along with smoothed labels for training. Its implementation is quite complete, and #1972 will be merged once LSGAN's testing is completed.

Pull Request Summary

Merged Pull Requests

Open Pull Requests

Future Work

  • Completion of Open Pull Requests.
  • Addition of Stacked-GAN in mlpack.
  • Command Line Interface for training GAN models.
  • Command Line Interface for testing GAN models.


I learned quite a lot while working with mlpack. When I joined mlpack I was quite a beginner in machine learning, and in the past months I have learned a great deal. I also learned how Object Oriented Programming helps in developing a big project. My patience got tested while debugging the code I had written. Overall, it was a lot of learning and fun.

I will keep contributing and will ensure that all of my open PRs get merged.

I would also like to thank my mentors ShikharJ, zoq and rcurtin for their constant support and guidance throughout the summer. I learned a lot from them. I would also like to thank Google for giving me the opportunity to work with such highly experienced people.

Implementing an mlpack-Tensorflow Translator - Summary


The Idea

The general objective of this project is to allow the interconversion of trained neural network models between mlpack and all other popular frameworks. For this purpose, two converters have been created:

  • ONNX-mlpack-converter
  • mlpack-Torch-converter

With ONNX being a central junction supporting a number of popular frameworks, including but not limited to Tensorflow, Torch, Keras and Caffe, the Tensorflow-Torch-ONNX-mlpack conversion is made possible through this project.

The reason we chose Torch over ONNX for the mlpack-Torch-converter is that the C++ implementation of the ONNX library doesn't directly support model creation. It can still be done, though, as nothing is impossible to achieve and ONNX models are nothing but protobuf files. There was no strong reason for choosing Torch over Tensorflow, except that Torch's C++ API seemed more robust. That said, it really boils down to personal choice, and it rightly boiled down to my preference for exploiting the opportunity to learn a new framework instead of working with Tensorflow, with which I was already largely familiar.

The Code

The code directly associated with the converters is in the repository under the /src folder, while the tests are under the /src/Tests folder and converted models under the /Examples folder. The tests need and will receive an update. This project mainly has four source files:

  • model_parser.hpp which is a header file supporting the creation of an mlpack model from a user-defined json file
  • model_parser_impl.hpp which contains the implementation of the definitions present in model_parser.hpp
  • onnx_to_mlpack.hpp which contains the necessary functions to convert ONNX models to mlpack format
  • mlpack_to_torch.hpp which contains the necessary function to convert mlpack models to Torch format

This project, however, has additional dependencies like LibTorch and ONNX, which are clearly mentioned in the readme file. It is supposed to exist as a separate repository under mlpack.

JSON parser

This parser can be used in a number of ways, like obtaining a LayerTypes<> object corresponding to a string containing the layer type and a map containing the attributes as follows:

std::map<std::string, double> layerParams;
layerParams["insize"] = 4;
layerParams["outsize"] = 10;
LayerTypes<> linearLayer = getNetworkReference("linear", layerParams);

It can also be used to convert a json file to an mlpack model by calling the convertModel() function and, if needed, overriding the trainModel() function. An example of using the converter, which will train the model and display the training and validation accuracies, is:

std::string fileName = "network.json";
arma::mat dataMat;
data::Load("train.csv", dataMat, true);
Log::Info << "Data loaded" << "\n\n";
dataMat = dataMat.submat(0, 1, dataMat.n_rows - 1, dataMat.n_cols - 1);
arma::mat train, valid;
data::Split(dataMat, train, valid, 0.1);
arma::mat trainX = normalise(train.submat(1, 0, train.n_rows - 1,
    train.n_cols - 1));
arma::mat trainY = train.row(0) + 1;
arma::mat validX = normalise(valid.submat(1, 0, valid.n_rows - 1,
    valid.n_cols - 1));
arma::mat validY = valid.row(0) + 1;
Dataset dataset(trainX, trainY, validX, validY);
convertModel(fileName, dataset);

The trainModel() function has not been overridden here, but that may be necessary in most cases. However, it should be noted that most, but not all, layers, initialization types, loss functions and optimizers are supported by this converter. An example of a JSON file containing all the details is:

{
  "loss": {
    "type": "negativeloglikelihood"
  },
  "init": {
    "type": "glorot"
  },
  "optimizer": {
    "type": "adam",
    "stepsize": 5e-3,
    "batchsize": 64,
    "tolerance": 1e-8,
    "cycles": 20
  },
  "network": [
    {
      "type": "linear",
      "units": 200
    },
    {
      "type": "relu"
    },
    {
      "type": "linear",
      "units": 100
    },
    {
      "type": "relu"
    },
    {
      "type": "dropout",
      "ratio": 0.2
    },
    {
      "type": "linear",
      "units": 10
    },
    {
      "type": "softmax"
    }
  ]
}

ONNX-mlpack converter

Given the ONNX model path and the desired path for storing the converted mlpack model, the converter can do the rest. However, for converting models that take images as input, i.e., convolutional models, the image width and height need to be mentioned too. An example of the usage is:

convertModel("LinearModel.onnx", "ConvertedLinearModel.bin"); /* for a linear model */
convertModel("ConvModel.onnx", "ConvertedConvModel.bin", 32, 28); /* for a convolutional model with input image height 32 and width 28 */

Note that most, but not all, layers are supported by this converter so far.

mlpack-Torch converter

This converter provides an API similar to the previous one. An example would be:

convertModel("Model.bin", "ConvertedModel.pth");

In the case of convolutional models, too, the input image dimensions need not be mentioned. For directly obtaining the Torch model from a given mlpack model, the convert() function can be used as shown below:

torch::nn::Sequential convertedModel = convert(ffnModel); /* ffnModel is of type mlpack::ann::FFN<> */

This converter also has some limitations pertaining to the layers that can be converted. Moreover, this converter is not yet in a working state because of a number of yet-to-be-merged PRs in the main mlpack repository.

Additional Pull Requests

The above converters required a number of changes to the main mlpack repo, listed as follows:

  • 1985 adds accessor and modifier methods to a number of layers.
  • 1958, originally meant to add the softmax layer to mlpack's ANN module, but the PR itself is a bit messed up (it has unwanted files associated with it) and needs to be pushed again.

There are also a couple of pull requests that require some rectification before I can push them.


I owe the completion of this project to the entire mlpack community for helping me out whenever I got stuck. My mentor Atharva gave me the exact guidance and support I required during this period. My concepts about backpropagation became crystal clear after we manually wrote down the steps on paper. He used to give me hints to encourage me, and in the end I could do it entirely by myself. Understanding this, as well as mlpack's way of implementing it (the matrices g and gy in the Backward() function were the confusing ones), took around an entire week, but it was a fun experience. This is just one instance; there were many more during this 12-week period. Marcus and Ryan were also no less than pseudo-mentors for me.

Marcus was my go-to person during the summer for any doubt regarding the C++ implementation of the mlpack ANN module, or pretty much anything else. I have a habit of spending too much time on things that seem difficult to solve, sometimes a couple of days (when I should ideally have tried for a couple of hours before asking for help), and even when I failed to solve something, I would ask Marcus on IRC and we would arrive at a solution in less than an hour.

Ryan has been a constant source of support since my association with mlpack began. When I started with an issue, back in February, Ryan helped me design the layout of the program which would later become the JSON model parser. There were numerous other instances during this period (and many more to come) when my code wouldn't work and Ryan would help me solve it.

Last but not least, I have learnt a lot from the general discussions on IRC and would like to thank everyone in the mlpack community for the brilliant interaction. I would also like to thank Google for giving me this opportunity to get involved in open source, and with the mlpack community in particular.

Advanced Kernel Density Estimation Improvements - Summary



Kernel Density Estimation (KDE) is a widely used non-parametric technique to estimate a probability density function. mlpack already had an implementation of this technique, and the goal of this project was to improve the existing codebase, making it faster and more flexible.

These improvements include:

  • Improvements to the KDE space-partitioning trees algorithm.
  • Improvements for cases in which data dimensionality is high and distance computations are expensive.


We can summarize the work in 3 PRs:

Implement probabilistic KDE error bounds

Pull request #1934.

Up to this moment, it was possible to set an exact amount of absolute/relative error tolerance for each query point in the KDE module. The algorithm would then try to accelerate the computations as much as possible, making use of the error tolerance and space-partitioning trees.

Sometimes an exact error tolerance is not needed, and being able to select a fuzzy error tolerance would mean a lot for flexibility. The idea here is to select an amount of error tolerance that has a certain probability of being met (e.g. with 95% probability, each query point will differ at most 5% from the exact value). This idea comes from this paper.

This is accomplished by probabilistically pruning tree branches. This probability is handled in a smart way so that when an exact prune is made or some points are exhaustively evaluated, the amount of probability that was not used is not lost, but rather reclaimed and used in later stages of the algorithm.

Other improvements and fixes were made in this PR:

  • Statistics building order was fixed for cover and rectangular trees.
  • Scoring function evaluation was fixed for octrees and binary trees.
  • Simplification of metrics code.
  • Assignment operator was added for some trees (issue #1957).

Subspace tree

Pull request #1962.

The dominant cost in KDE is metric evaluation (i.e. the distance between two points), and usually not all dimensions are very relevant. The idea here is to use principal component analysis (PCA) as a dimensionality reduction technique to take points to a lower-dimensional subspace where distance computations are computationally cheaper (this is done in this paper). At the same time, the idea is to preserve the error tolerance, so that it is easy to know the maximum amount of error each estimation will have.

This is done by calculating a PCA basis for each leaf node and then merging those bases as we climb to higher nodes.

This PR is still a work in progress.

Reclaim unused KDE error tolerance

Pull request #1984.

This paper mentions the idea of reclaiming unused error tolerance when doing exact pruning. The idea is that, when a set of points is exhaustively evaluated, the error of those computations is zero, so there is an amount of error tolerance for pruning that has not been used and could be used in later stages of the algorithm. This provides the algorithm with the capability of adjusting as much as possible to the error tolerance and pruning more nodes.
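A toy version of this bookkeeping (names and the proportional-allowance rule are illustrative, not the KDE module's actual accounting) shows why exact evaluations help later prunes: they remove points from the budget's denominator without spending any error.

```cpp
#include <cassert>
#include <cstddef>

// Toy accounting for the "reclaim unused tolerance" idea: each node gets an
// error allowance proportional to the points it covers; exact (zero-error)
// evaluations leave the budget untouched, so the per-point allowance of the
// remaining points grows.
struct ErrorBudget
{
  double remaining;   // Total absolute error still available.
  std::size_t points; // Points not yet accounted for.

  // Allowance for a node covering nodePoints points.
  double Allowance(const std::size_t nodePoints) const
  {
    return remaining * nodePoints / points;
  }

  // An exact evaluation spends no error: its allowance is reclaimed.
  void EvaluateExactly(const std::size_t nodePoints)
  {
    points -= nodePoints;  // Budget unchanged; fewer points left to cover.
  }

  // A prune spends (at most) the node's allowance.
  void Prune(const std::size_t nodePoints)
  {
    remaining -= Allowance(nodePoints);
    points -= nodePoints;
  }
};
```

After exactly evaluating half the points, the allowance available to any later node doubles, which is exactly the extra slack that enables more pruning.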

Thanks to Ryan's derivations, we also realized that the bounds of the previous algorithm were quite loose, so a lot of error tolerance was being wasted. This has been reimplemented and will probably represent a huge source of speedup.

Future work

There are some ideas that we did not have time to finish but are quite interesting:

  • Finish subspace tree PR.
  • In the proposal there was the idea of implementing ASKIT, which is really interesting to me.


This has been an awesome summer. I had the opportunity to contribute to a meaningful project and will continue to do so in the future, since there are many ideas that came up while I was working or that I did not have time to finish. It has been a very enriching experience: I learned a lot, it was a ton of fun and, definitely, my debugging skills got sharpened.

I would like to thank the whole mlpack community as well as Google for this opportunity. A special mention has to be made for my mentor rcurtin, without his help when I was stuck and his new ideas I would not have enjoyed this as much, so thank you.

For people reading this and thinking about applying for GSoC in the future: Apply now, it will be fun and you will learn a lot from highly skilled people.

Quantum Gaussian Mixture Models - Summary


I wrote the final report in detail.

Thanks for reading :)

Proximal Policy Optimization method - Summary


Time flies; the summer is coming to an end, and we have reached the final week of GSoC. This blog post is the summary of my GSoC project: the implementation of one of the most promising deep reinforcement learning methods. During this project, I implemented the Proximal Policy Optimization method and one classical continuous task, i.e. Lunar Lander, to test my implementation. My pull request for prioritized experience replay was also merged into master.


My work is mainly located in the methods/reinforcement_learning, methods/ann/loss_functions and methods/ann/dists folders:

  • ppo.hpp: the main entrance for Proximal Policy Optimization.
  • ppo_impl.hpp: the main implementation of Proximal Policy Optimization.
  • lunarlander.hpp: the implementation of the continuous Lunar Lander task.
  • prioritized_replay.hpp: the implementation of prioritized experience replay.
  • sumtree.hpp: the implementation of the segment tree structure for prioritized experience replay.
  • environment: the implementation of two classical control problems, i.e. mountain car and cart pole.
  • normal_distribution.hpp: the implementation of a normal distribution which accepts a mean and variance to construct the distribution.
  • empty_loss.hpp: the implementation of an empty loss, used in the Proximal Policy Optimization class; we calculate the loss outside the model declaration, so the loss does nothing except pass the gradient backward.
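The segment tree behind prioritized experience replay can be sketched in a few lines (a minimal standalone version, not mlpack's sumtree.hpp; capacity is assumed to be a power of two): leaves hold priorities, internal nodes hold the sum of their children, so sampling a priority mass takes O(log n).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sum tree for proportional sampling: leaf i holds the priority of
// transition i, and internal node k holds tree[2k] + tree[2k + 1].
class SumTree
{
 public:
  explicit SumTree(const std::size_t capacity) :
      capacity(capacity), tree(2 * capacity, 0.0) { }

  void Set(std::size_t index, const double priority)
  {
    index += capacity;       // Jump to the leaf.
    tree[index] = priority;
    for (index /= 2; index >= 1; index /= 2)  // Fix sums up to the root.
      tree[index] = tree[2 * index] + tree[2 * index + 1];
  }

  double Total() const { return tree[1]; }

  // Find the leaf whose cumulative-priority interval contains mass,
  // where 0 <= mass < Total().
  std::size_t Sample(double mass) const
  {
    std::size_t index = 1;
    while (index < capacity)
    {
      if (mass < tree[2 * index])
        index = 2 * index;          // Descend left.
      else
      {
        mass -= tree[2 * index];    // Skip the left subtree's mass.
        index = 2 * index + 1;
      }
    }
    return index - capacity;
  }

 private:
  std::size_t capacity;
  std::vector<double> tree;
};
```

Transitions with higher priority occupy a larger interval of [0, Total()), so drawing a uniform mass samples them proportionally more often.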

In total, I contributed the following PRs; most of the implementations are combined into the single Proximal Policy Optimization pull request.

Proximal Policy Optimization

Prioritized Experience Replay

Change the pendulum action type to double

Fix typo in Bernoulli distribution

remove move() statement in hoeffding_tree_test.cpp

minor fix up


The most challenging parts are:

  • One of the most challenging parts of the work was calculating the surrogate loss for updating the actor network; it is different from the update for the critic network, which can be optimized by regression on the mean squared error. The actor network is optimized by maximizing the PPO-clip objective, and it is difficult to implement this like the current loss functions, which are calculated by passing target and prediction parameters. So I calculate the loss outside the model, and the declaration of the model is passed into the empty loss function. The whole backward process except the model part is calculated by my implementation, such as the derivatives with respect to the normal distribution's mean and variance.
  • Another challenging part of the work was that I implemented Proximal Policy Optimization in a continuous environment, where the action differs from the discrete case. In a discrete environment, the agent just outputs one dimension's data to represent the action, while in a continuous environment the agent's action prediction is more complicated: the common approach is to predict a normal distribution and then sample an action from it.
  • There were other challenging parts of the work as well, such as tuning the neural network to make the agent work. This is blocking me now, so I need to tune more parameters to pass the unit test. This is also a time-consuming process.
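The continuous-action step described above can be sketched as follows (function and parameter names are illustrative, not mlpack's API; a softplus keeps the predicted standard deviation positive, which is one common choice):

```cpp
#include <cassert>
#include <cmath>
#include <random>

// In a continuous task the policy network predicts a mean and a raw scale;
// the scale is mapped through softplus to stay positive, and the action is
// sampled from N(mean, stddev^2).
double SampleAction(const double mean, const double rawStdDev,
                    std::mt19937& rng)
{
  const double stdDev = std::log(1.0 + std::exp(rawStdDev));  // softplus
  std::normal_distribution<double> dist(mean, stdDev);
  return dist(rng);
}
```

In a discrete environment the network would instead output one score per action; here the two network outputs parameterize a distribution and the stochasticity comes from the sampling step.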

Future Work

The pull request for Proximal Policy Optimization is still under development due to the parameter tuning needed for the unit test, but it will be fixed soon.

PPO can be used for environments with either discrete or continuous action spaces, so another piece of future work is to support discrete action spaces, even though that is easier than the continuous case.

In some continuous environment tasks, the action has more than one dimension; we need to handle this situation as well.


A great thanks to Marcus for his mentorship during the project and his detailed code reviews. Marcus is helpful and often told me not to hesitate to ask questions. He gave me great help whenever something blocked me. I also want to thank Ryan for responding on IRC, even at midnight. The community has been kind since the first meeting; we talked about a lot of things spanning different areas. Finally, I appreciate the generous funding from Google. It was a really good project to sharpen my skills. I will continue to contribute to mlpack and make the library easier to use and more powerful.

Thanks for this unforgettable summer session.

String Processing Utilities - Summary


This post summarizes my work for GSoC 2019.


The proposal for String Processing Utilities involved implementing basic functions which would be helpful in processing and encoding text, and then later implementing machine learning algorithms on it.


String Cleaning PR1904

  • The implementation started with the string cleaning functions. A class-based approach was used, and the following functions were implemented:
    1. RemovePunctuation(): allows you to pass a string, known as punctuation, containing all the punctuation characters to be removed.
    2. RemoveChar(): allows you to pass a function pointer, function object or lambda which returns a bool; if the return value is true, the character is removed.
    3. LowerCase(): converts the text to lower case.
    4. UpperCase(): converts the text to upper case.
    5. RemoveStopWords(): accepts a set of stopwords and removes all those words from the corpus.
  • After implementing the class, I started implementing the CLI and Python bindings. Since mlpack uses Armadillo to load matrices, I had to write a function which could read data from a file using basic input/output streams. The supported file types are limited to .txt and .csv. The binding has different parameters to set and works as required based on the parameters passed.

String Encoding PR1960

  • The initial plan was to implement a different class for each encoding method, such as bag-of-words encoding, dictionary encoding or tf-idf encoding, but we found that the classes had a lot of redundant code, and hence we decided on a policy-based design with a different policy for each encoding type.
  • We implemented the StringEncoding class, which has the function for encoding the corpus (it accepts a vector as input) and outputs the encoded data based on the policy and output type, vector or arma::mat. We also provided an option for padding, or avoiding padding, depending on the encoding policy.
  • We also designed a helper class, StringEncodingDictionary, which maintains a dictionary mapping each token to its label. The class is templated on the type of the tokens, which may be string_view or an int type. We arrived at the conclusion of implementing this helper class based on speed profiling done by lozhnikov.

Policies for String Encoding PR1969

  • We decided to implement three policies for encoding, namely:
    1. Dictionary encoding: this policy encodes the corpus by assigning a positive integer to each unique token and treats the dataset as categorical; it supports both padded and non-padded output.
    2. Bag-of-words encoding: the encoder creates a vector over all unique tokens and assigns 1 if the token is present in the document, 0 if not.
    3. Tf-idf encoding: the encoder assigns a tf-idf value to each unique token.
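A toy version of the tf-idf computation illustrates the third policy (the encoder's actual weighting and smoothing variants may differ; this uses raw term frequency and idf = log(N / df)):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Toy tf-idf over a tokenized corpus: tf is the raw count of the token in
// one document, and idf = log(N / df), where df is the number of documents
// containing the token.
double TfIdf(const std::string& token,
             const std::vector<std::string>& document,
             const std::vector<std::vector<std::string>>& corpus)
{
  std::size_t tf = 0;
  for (const std::string& word : document)
    if (word == token) ++tf;

  std::size_t df = 0;
  for (const std::vector<std::string>& doc : corpus)
  {
    for (const std::string& word : doc)
    {
      if (word == token) { ++df; break; }
    }
  }

  if (df == 0) return 0.0;
  return tf * std::log((double) corpus.size() / df);
}
```

A token that appears in every document gets weight 0, while a token confined to few documents is weighted up.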

Tokenizer PR1960

  • To help with all the string processing and encoding algorithms, we often needed to tokenize strings, so we implemented two tokenizers in mlpack:
    1. CharExtract: splits a string into characters.
    2. SplitByAnyOf: tokenizes a string using a set of delimiters.

After implementing all the encoding policies and tokenizers, I implemented CLI and Python bindings (PR1980) for string encoding. The string encoding and string cleaning functions share a lot of common code, so we decided to share a common file, string_processing_util.hpp, between the two bindings.

My proposal also included an implementation of Word2Vec, but we decided to opt out since we found that Google patented it.

Post GSoC

A lot of the code I implemented is sketchy, since I have used boost::string_view and other Boost algorithms, so we need to do a speed check and find the bottlenecks, if any. Also, my plan is to implement a substitute for word2vec, such as GloVe or another word embedding algorithm. I had implemented a function for one-hot encoding, which I thought could be useful for word2vec, but we found that it was buggy to a small extent, so I have to find a way out and also implement some overloaded functionality.

Lastly, and most importantly, I have to write tutorials for all the functionality provided, to help people understand how to drop these functions into their codebases. I am also excited to do some machine learning on text datasets using mlpack.


A big thanks to lozhnikov, rcurtin, zoq, and the whole mlpack community. This was my first attempt at GSoC, and I am happy that I was successful. I fell in love with the open-source world, and it was a wonderful experience. I gathered a lot of knowledge in these past 3 months. I will stay in touch with the mlpack community and seek to contribute more to the project in the future.

Also, I think it's time to order some mlpack stickers :)

Thanks :)

Application of ANN Algorithms Implemented in mlpack - Summary



All GSoC contributions can be summarized by the following.

Contributions to mlpack/models

Pull Requests

Contributions to mlpack/mlpack

Merged Pull Requests

Issues Created

Contributions to zoq/gym_tcp_api

Merged Pull Requests

Loading Images

The image utilities support loading and saving of images.

They support the jpg, png, tga, bmp, psd, gif, hdr, pic and pnm file types for loading, and jpg, png, tga, bmp and hdr for saving.

The associated datatype is unsigned char, to support RGB values in the range 0-255. To feed data into the network, a typecast to arma::mat may be required. Images are stored in the matrix as (width * height * channels, NumberOfImages); therefore imageMatrix.col(0) is the first image if the images are loaded into imageMatrix.

Loading a test image also fills up the ImageInfo class object:

data::ImageInfo info;
arma::Mat<unsigned char> matrix;
data::Load("test_image.png", matrix, info, false, true);

Similarly, for saving images:

const size_t width = 25;
const size_t height = 25;
const size_t channels = 3;
const size_t quality = 90;
data::ImageInfo info(width, height, channels, quality);
data::Save("test_image.bmp", matrix, info, false, true);


VGG-19 is a convolutional neural network that is trained on more than a million images from the ImageNet database. The network is 19 layers deep and can classify images into 1000 object categories. Details about the network architecture can be found in the following arXiv paper:

@article{Simonyan2014,
  author  = {Simonyan, K. and Zisserman, A.},
  title   = {Very Deep Convolutional Networks for Large-Scale Image Recognition},
  journal = {CoRR},
  year    = {2014}
}

Tiny Imagenet

Tiny ImageNet Challenge is the default course project for Stanford CS231N. It runs similar to the ImageNet challenge (ILSVRC). The goal of the challenge is for you to do as well as possible on the Image Classification problem. The model uses VGG19 to classify the images into 200 classes.

MNIST handwritten digits

The VGG19 model is used for classification. It creates a sequential layer that encompasses the various layers of the VGG19.

// Input parameters: the dataset contains images with shape 28x28x1.
const size_t inputWidth = 28, inputHeight = 28, inputChannel = 1;
bool includeTop = true;
VGG19 vggnet(inputWidth, inputHeight, inputChannel, numClasses, includeTop,
    "max", "mnist");
Sequential<>* vgg19 = vggnet.CompileModel();
// Compiling the architecture.
FFN<NegativeLogLikelihood<>, XavierInitialization> model;
model.Add<IdentityLayer<> >();
model.Add(vgg19);
model.Add<LogSoftMax<> >();

Sentiment Analysis

We will build a classifier on the IMDB movie dataset using a deep learning technique called an RNN, which can be implemented using the Long Short Term Memory (LSTM) architecture. The encoded dataset for IMDB contains a vocab file along with sentences encoded as sequences. A sample datapoint is [1, 14, 22, 16, 43, 530, ..., 973, 1622, 1385, 65]. This sentence contains the 1st word, the 14th word and so on from the vocabulary.

A vectorized input has to be fed into the LSTM to exploit the RNN architecture. To vectorize the sequence, dictionary encoding is used. The sample shown would be transformed to [[1, 0, 0, ..., 0], [0, ..., 1, 0, ...], ...]; here the first list has the 1st position as 1 and the rest 0, and similarly the second list has the 14th element as 1 and the rest 0. Each list has the size of the number of words in the vocabulary.
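That transformation can be sketched as a small standalone helper (illustrative only; the indices are 1-based, as in the IMDB encoding above):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turn a sequence of 1-based vocabulary indices into one-hot vectors of
// length vocabSize: position (index - 1) is 1, all others are 0.
std::vector<std::vector<int>> OneHot(const std::vector<std::size_t>& sequence,
                                     const std::size_t vocabSize)
{
  std::vector<std::vector<int>> encoded;
  for (std::size_t index : sequence)
  {
    std::vector<int> row(vocabSize, 0);
    row[index - 1] = 1;  // 1-based index -> 0-based position.
    encoded.push_back(row);
  }
  return encoded;
}
```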

Accuracy Plots


Time Series Analysis


We want to use the power of the LSTM for Google stock prediction on time series data, using mlpack's Recurrent Neural Network (RNN).

// Number of time steps to look back in the RNN.
const int rho = 25;
// Maximum number of steps the LSTM cells can remember.
const size_t maxRho = rho;
size_t inputSize = 5, outputSize = 2;
// RNN model.
RNN<MeanSquaredError<>, HeInitialization> model(rho);
model.Add<IdentityLayer<> >();
model.Add<LSTM<> >(inputSize, 10, maxRho);
model.Add<Dropout<> >(0.5);
model.Add<LeakyReLU<> >();
model.Add<LSTM<> >(10, 10, maxRho);
model.Add<Dropout<> >(0.5);
model.Add<LeakyReLU<> >();
model.Add<LSTM<> >(10, 10, maxRho);
model.Add<LeakyReLU<> >();
model.Add<Linear<> >(10, outputSize);

MSE Plot



An example of using a Recurrent Neural Network (RNN) with LSTM to make forecasts on a time series of electric usage (in kWh).

MSE Plot


Results on other datasets - International Airline Passengers

This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

We will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t + 1) over the period of 'rho' frames.
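The windowing described above can be sketched as follows. `MakeWindows` is a hypothetical helper written for illustration, not mlpack API:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Build (X, Y) pairs from a univariate series: X is a window of `rho`
// consecutive values and Y is the value one step after the window,
// mirroring the t -> t + 1 setup described above.
std::pair<std::vector<std::vector<double>>, std::vector<double>>
MakeWindows(const std::vector<double>& series, const size_t rho)
{
  std::vector<std::vector<double>> X;
  std::vector<double> Y;
  for (size_t t = 0; t + rho < series.size(); ++t)
  {
    X.emplace_back(series.begin() + t, series.begin() + t + rho);
    Y.push_back(series[t + rho]);
  }
  return {X, Y};
}
```

For the airline data, each X row would hold `rho` consecutive monthly passenger counts and Y the count for the following month.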

Mean squared error up to 10 iterations of training:

1 - MeanSquaredError := 0.146075
2 - MeanSquaredError := 0.144882
3 - MeanSquaredError := 0.09501
4 - MeanSquaredError := 0.0875479
5 - MeanSquaredError := 0.0836975
6 - MeanSquaredError := 0.0796567
7 - MeanSquaredError := 0.0804368
8 - MeanSquaredError := 0.0803483
9 - MeanSquaredError := 0.0809061
10 - MeanSquaredError := 0.076797

MSE Plot


Future Work

The tutorials associated with the implemented models are not yet published on the mlpack webpage; the blogs need to be linked from a common place for the user. VGG19 is being trained on the Tiny ImageNet dataset, and the results will be added.


I am sincerely grateful to the whole mlpack community, especially Ryan Curtin, Marcus Edel, Sumedh Ghaisas, and Shikhar Jaiswal, for the support I received. It was an awesome experience.

Application of ANN Algorithms Implemented in mlpack - Week 12

Application of ANN Algorithms Implemented in mlpack - Week 12


  • Rectified Sentiment Analysis neural network model to introduce Embedding layer.

Thanks for reading. Have a nice day :smile:

NEAT & Multiobjective Optimization - Summary


The aim of my project for Google Summer of Code 2019 was to implement NeuroEvolution of Augmenting Topologies (NEAT) in mlpack based on Kenneth Stanley's paper Evolving Neural Networks through Augmenting Topologies. I would also implement support for "phased searching", a searching scheme devised by Colin Green to prevent genome bloat when training NEAT on certain complex tasks.

Besides this, my project aimed to create a framework for multi-objective optimization within mlpack's optimization library ensmallen. This would involve the implementation of several test functions and indicators, as well as an optimizer, NSGA-III.



NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm that can evolve networks of unbounded complexity by starting from simple networks and "complexifying" through different genetic operators. It has been used to train agents to play Super Mario World and generate "genetic art".
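As an illustration of "complexifying", here is a minimal sketch of NEAT's add-node mutation on a toy genome. The `ConnectionGene` struct and `AddNodeMutation` function are illustrative only and do not mirror mlpack's implementation:

```cpp
#include <cstddef>
#include <vector>

// A minimal connection gene for illustration.
struct ConnectionGene
{
  size_t from, to;
  double weight;
  bool enabled;
};

// NEAT's "add node" mutation: split an existing connection by disabling
// it and inserting a new node. The incoming connection gets weight 1 and
// the outgoing one keeps the old weight, so behavior is initially
// preserved while structural complexity grows gradually.
size_t AddNodeMutation(std::vector<ConnectionGene>& genome,
                       const size_t connIndex, size_t& nextNodeId)
{
  // Copy fields first: push_back below may invalidate references.
  const size_t from = genome[connIndex].from;
  const size_t to = genome[connIndex].to;
  const double weight = genome[connIndex].weight;
  genome[connIndex].enabled = false;

  const size_t newNode = nextNodeId++;
  genome.push_back({from, newNode, 1.0, true});
  genome.push_back({newNode, to, weight, true});
  return newNode;
}
```

Starting every genome small and only growing it through mutations like this is what lets NEAT search topology and weights simultaneously.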

I implemented NEAT in PR #1908. The PR contains the complete implementation, including phased searching, associated tests, and documentation. NEAT was tested on:

  • The XOR test, where its challenge was to create a neural network that emulated a two-input XOR gate. NEAT was able to solve this within 150 generations with an error less than 0.1.
  • Multiple reinforcement learning environments implemented in mlpack.
  • The pole balancing task in OpenAI Gym. This was done using the Gym TCP API implemented by my mentor, Marcus Edel. A short video of the trained agent can be seen here.
  • The double pole balancing test. I implemented this as an addition to the existing reinforcement learning codebase. NEAT performed well on both the Markovian and non-Markovian versions of the environment.

The pull request is rather large and is still under review.

Multi-objective optimization

Multi-Objective Optimization is an area of multiple criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. NSGA-III (Non-dominated Sorting Genetic Algorithm) is an extension of the popular NSGA-II algorithm, which optimizes multiple objectives by associating members of the population with a reference set of optimal points.
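The notion of dominance underlying non-dominated sorting can be sketched in a few lines. Here for minimization problems; `Dominates` and `FirstFront` are illustrative helpers, not ensmallen API:

```cpp
#include <cstddef>
#include <vector>

// Solution `a` dominates `b` if it is no worse in every objective and
// strictly better in at least one (minimization convention).
bool Dominates(const std::vector<double>& a, const std::vector<double>& b)
{
  bool strictlyBetter = false;
  for (size_t i = 0; i < a.size(); ++i)
  {
    if (a[i] > b[i])
      return false;
    if (a[i] < b[i])
      strictlyBetter = true;
  }
  return strictlyBetter;
}

// Return indices of the first (non-dominated) front of a population.
// Non-dominated sorting, the core of NSGA-II/NSGA-III, repeatedly
// extracts such fronts from the population.
std::vector<size_t> FirstFront(const std::vector<std::vector<double>>& pop)
{
  std::vector<size_t> front;
  for (size_t i = 0; i < pop.size(); ++i)
  {
    bool dominated = false;
    for (size_t j = 0; j < pop.size(); ++j)
    {
      if (j != i && Dominates(pop[j], pop[i]))
      {
        dominated = true;
        break;
      }
    }
    if (!dominated)
      front.push_back(i);
  }
  return front;
}
```

NSGA-III then associates the members of these fronts with a set of reference points to maintain diversity across many objectives.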

I implemented support for multi-objective optimization in PR #120. This PR includes:

  • An implementation of NSGA-III. This code is still being debugged and tested.
  • Implementation of the DTLZ test functions.
  • Implementation of multiple indicators, including the epsilon indicator and the Inverse Generational Distance Plus (IGD+) indicator.
  • Associated documentation.

Other work

Besides the work explicitly stated in my project, I also made some smaller changes and additions in the reinforcement learning codebase. This includes:

  • Implementation of both a Markovian and non-Markovian (velocity information not provided) version of the double pole balancing environments (see #1901 and #1951).
  • Fixed issues with consistency in mlpack's reinforcement learning API, where certain functions and variables were missing from some environments. Besides this, the environments now have an option to terminate after a given number of steps. See #1941.
  • Added QLearning tests wherever necessary to prevent issues with consistency in the future. See #1951.

Future Work

  • Get the NEAT PR merged.
  • Finish testing and debugging the NSGA-III optimizer.
  • Write more indicators for multi-objective optimization.

Final Comments

I would like to thank my mentors, Marcus Edel and Manish Kumar for all their help and advice, as well as bearing with me through the multiple technical difficulties I faced during this summer. I'd also like to thank all the other mentors and participants for their support. I hope to continue to contribute to this community in the future, and look forward to the same.

If anyone is interested in seeing my weekly progress through the program, please see my weekly blog.

Implementing Improved Training Techniques for GANs - Week 11

Implementing Improved Training Techniques for GANs - Week 11

This week, after Toshal’s PR on support for more than 50 layers was merged, a lot of my work was ready to be merged. Specifically, the padding layer has now been completed and merged. Also, the mini batch discrimination layer is complete and the build is finally passing, so we should be able to merge that soon as well!

Other than that, I have continued my work on the spectral norm layer. One difficulty with the implementation is that the layer uses a power iteration method during the forward pass to compute the spectral norm of the weight matrix. I am not completely sure how we would compute the gradient for this approximation. I have been trying to do the manual derivation, but it is very tedious and I have not been successful with it so far. Hopefully, in the coming week I can get it to work. Otherwise, I hope to continue the work post-GSoC.
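The power iteration itself is straightforward. Here is a standalone sketch that estimates the spectral norm (largest singular value) of a weight matrix; this is plain illustrative C++, not the actual layer code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Estimate the spectral norm of W by power iteration on W^T W:
// repeatedly apply v <- normalize(W^T (W v)), then sigma = ||W v||.
double SpectralNorm(const Mat& w, const size_t iterations = 100)
{
  const size_t rows = w.size(), cols = w[0].size();
  std::vector<double> v(cols, 1.0);
  double sigma = 0.0;
  for (size_t it = 0; it < iterations; ++it)
  {
    // u = W v.
    std::vector<double> u(rows, 0.0);
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        u[i] += w[i][j] * v[j];
    // vNew = W^T u, then normalize.
    std::vector<double> vNew(cols, 0.0);
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        vNew[j] += w[i][j] * u[i];
    double norm = 0.0;
    for (double x : vNew)
      norm += x * x;
    norm = std::sqrt(norm);
    for (double& x : vNew)
      x /= norm;
    v = vNew;
    // sigma = ||W v|| for the current estimate of v.
    double s = 0.0;
    for (size_t i = 0; i < rows; ++i)
    {
      double ui = 0.0;
      for (size_t j = 0; j < cols; ++j)
        ui += w[i][j] * v[j];
      s += ui * ui;
    }
    sigma = std::sqrt(s);
  }
  return sigma;
}
```

The hard part for the layer is not this estimate but backpropagating through it, since the iteration is only an approximation of the true norm.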

Implementing Essential Deep Learning Modules - Week 12

Implementing Essential Deep Learning Modules - Week 12

Well, my Bias visitor PR is merged. Also, my exact-objective PR is merged. I also added a workaround to enable adding more layers to the ANN. It looks like we can have as many layers as we want: we can always develop a tree-like structure for adding more layers. The most interesting thing about the workaround PR was that its branch could be deleted just two days after its creation.

As soon as the workaround for adding more layers got merged, my Weight-Normalization layer was ready to merge. My Inception Layer can now also be merged, and I will start working on it soon.

In the upcoming week, I am thinking of adding a small test for LSGAN so that it can get merged. Its actual image testing would need some time, as most of the available online tests use a Dual Optimizer, so deciding the parameters for training would be challenging. If possible, I will also try to finish my work on the Inception Layer.

I am also currently testing my Dual Optimizer PR. It has been running on savannah for a long time (approximately 50 days). Hopefully, I will see good results after it completes :)

Quantum Gaussian Mixture Models - Week 12

Quantum Gaussian Mixture Models - Week 12

This week, I compared QGMM with GMM using the percentage of convergence on the clusters of the observations as an indicator of training performance. Across a total of 200 experiments, QGMM showed good performance when the initial phi was 0, while an initial phi of 90 was bad. When we set the initial phi to 0, it wasn't changed from the initial value, but when we set the initial phi to 90, it increased to between 91 and 269. Therefore, the cosine of phi became negative and the two distributions were overlaid. I tried to control the value of phi, but I didn't find a workaround. Thus, I should come up with a method to control phi properly for stable performance.
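For intuition on why the sign of cos(phi) matters, assume the mixture density takes the usual quantum-interference form P(x) = a1²g1(x) + a2²g2(x) + 2·a1·a2·√(g1·g2)·cos(phi); this form is an assumption for illustration, and the project's exact formulation may differ:

```cpp
#include <cmath>

// Unnormalized 1-D Gaussian density.
double Gaussian(const double x, const double mean, const double stddev)
{
  const double z = (x - mean) / stddev;
  return std::exp(-0.5 * z * z) / (stddev * std::sqrt(2.0 * M_PI));
}

// Two-component quantum mixture density with interference angle phi
// (radians). At phi = 90 degrees the cosine is 0 and this reduces to an
// ordinary mixture; past 90 degrees the interference term turns
// negative, removing density from the region where the two components
// overlap.
double QuantumMixture(const double x, const double a1, const double a2,
                      const double m1, const double m2, const double s,
                      const double phi)
{
  const double g1 = Gaussian(x, m1, s), g2 = Gaussian(x, m2, s);
  return a1 * a1 * g1 + a2 * a2 * g2 +
      2.0 * a1 * a2 * std::sqrt(g1 * g2) * std::cos(phi);
}
```

This matches the observed behavior: once phi drifts into the 91 to 269 degree range, cos(phi) is negative and the two distributions interfere destructively in their overlap region.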

Lastly, I checked the difference between the augmented and normal Lagrangian multipliers, updating lambda every 1000 iterations. When using the augmented Lagrangian method, it's easy to set the initial lambda for cases in which we don't know the proper initial value, but when we can set a proper lambda initially, the normal Lagrangian showed better performance overall. I think I should look into this later with additional data sets, because there are some hyperparameters in the augmented method as well.

Thanks for reading :)