mlpack
blog

Table of Contents
A list of all recent posts this page contains.
 Implementing Essential Deep Learning Modules  Summary
 Implementing an mlpack-Tensorflow Translator  Summary
 Advanced Kernel Density Estimation Improvements  Summary
 Quantum Gaussian Mixture Models  Summary
 Proximal Policy Optimization method  Summary
 String Processing Utilities  Summary
 Application of ANN Algorithms Implemented in mlpack  Summary
 Application of ANN Algorithms Implemented in mlpack  Week 12
 NEAT & Multiobjective Optimization  Summary
 Implementing Improved Training Techniques for GANs  Week 11
 Implementing Essential Deep Learning Modules  Week 12
 Quantum Gaussian Mixture Models  Week 12
Implementing Essential Deep Learning Modules  Summary
Overview
This summer I got the opportunity to work with the mlpack organisation. My proposal, Implementing Essential Deep Learning Modules, was selected, and I worked on it for the last three months under the mentorship of Shikhar Jaiswal and Marcus Edel.
The main aim of my project was to enhance the existing Generative Adversarial Network (GAN) framework in mlpack. The target was to add more functionality to GANs and to implement new algorithms in mlpack so that the performance of GANs improves. The project also focused on adding new algorithms that make testing GANs feasible.
Implementation
Improving Serialization of GANs
The first challenge in the existing GAN framework was to enable loading and saving the GAN model correctly. In mlpack, loading and saving of models is done with the help of Boost Serialization. The most important requirement was complete consistency, so that all functionality of the model works correctly after saving and loading it. Pull Request #1770 focused on this and will be merged soon.
Providing Option to Calculate Exact Objective
To make the Dual Optimizer functionality efficient, the overhead of calculating the objective over the entire dataset after optimization had to be removed. To do so, I opened #109 in mlpack's ensmallen repository. It is merged, and ensmallen's decomposable optimizers now provide an option that lets the user choose whether to calculate the exact objective after optimization.
Dual Optimizer for GANs
Various research papers on GANs used two separate optimizers in their experiments. However, the mlpack GAN framework had only one optimizer, which made testing GANs quite tedious. So the main aim of Pull Request #1888 was to add Dual Optimizer functionality to GANs. The implementation of the Dual Optimizer is largely complete and its testing is currently under way.
Label Smoothing in GANs
One-sided label smoothing, described in the Improved GAN paper, was seen to give better results when training a GAN model. Label smoothing was also required in order to add LSGAN to mlpack. Its implementation and testing are largely complete in #1915; however, to make label smoothing work correctly, some commits from the serialization PR are required. So it will be merged once #1770 is merged.
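As a rough illustration of one-sided label smoothing (not the code from #1915), the sketch below softens only the "real" targets and leaves the "fake" targets untouched. The function name and the smoothed value 0.9 are assumptions chosen for the example.

```python
def smooth_labels(labels, real_value=0.9):
    """One-sided smoothing: real targets (1.0) become real_value,
    fake targets (0.0) are left unchanged.

    real_value=0.9 is an illustrative assumption, not a value taken from mlpack.
    """
    return [real_value if label == 1.0 else label for label in labels]


# Three real samples and two fake samples in one discriminator batch.
targets = [1.0, 1.0, 0.0, 1.0, 0.0]
print(smooth_labels(targets))
```

Only the positive side is smoothed, which is what makes the technique "one-sided".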
Bias Visitor
To prevent the Weight Norm layer from normalizing the bias parameter, a bias visitor was required to set the bias parameters of a layer. The first step was to add a getter method Bias() to some layers. Afterwards, these getter methods were used to set the weights. The visitor was merged in PR #1956.
Work Around to add more layers
Boost::variant is able to handle only 50 types. So, in order to add more layers to mlpack's ANN module, a workaround was required. After some digging online about the error, I found one. Boost::variant provides an implicit conversion that enables adding as many layers as needed with the help of a tree-like structure. PR #1978 was one of the fastest to get merged; I completed it in just two days so that the Weight Norm layer could be merged.
Weight Normalization Layer
Weight Normalization is a technique similar to Batch Normalization that normalizes the weights of a layer rather than the activations. In this layer only the non-bias weights are normalized. Because of the normalization, the gradients are projected away from the weight vector, which made testing the gradients tedious. The layer is implemented as a wrapper around another layer in #1920.
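The reparameterization behind weight normalization can be sketched as follows: the weight vector is written as w = g * v / ||v||, and the gradient with respect to v has its component along v removed. This is a standalone illustration with hypothetical function names, not the mlpack layer from #1920.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a direction vector v and a scalar gain g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


def weight_norm_grad_v(v, g, grad_w):
    """Gradient with respect to v, given the gradient with respect to w.

    dL/dv = (g / ||v||) * (grad_w - (grad_w . v_hat) * v_hat), where v_hat = v / ||v||.
    The result is orthogonal to v: the component along v is projected out.
    """
    norm = math.sqrt(sum(x * x for x in v))
    v_hat = [x / norm for x in v]
    along = sum(gw * vh for gw, vh in zip(grad_w, v_hat))
    return [g / norm * (gw - along * vh) for gw, vh in zip(grad_w, v_hat)]


v = [3.0, 4.0]
print(weight_norm(v, 2.0))                     # a vector of length 2 pointing along v
print(weight_norm_grad_v(v, 2.0, [1.0, 0.0]))  # orthogonal to v
```

The orthogonality is why a naive numerical gradient check can be awkward: perturbing v along its own direction does not change w at all.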
Inception Layer
GoogleNet was required to complete my Frechet Inception Distance PR, and GoogleNet in turn requires the Inception Layer. There are various versions of the Inception Layer. The first version of the layer is largely complete. However, #1950 will be merged only after all three versions are implemented. The Inception Layer is basically just a wrapper around a Concat layer.
Frechet Inception Distance
Frechet Inception Distance is used for testing the performance of a GAN model. It uses the Frechet Distance, which compares two Gaussian distributions, together with the parameters of the Inception model. To get the parameters of the Inception model, #1950 will need to be merged first. The Frechet Distance is currently implemented in #1939 and will be integrated with the Inception model once that is merged.
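For two Gaussians the squared Frechet distance is ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The sketch below covers only the simplified case of diagonal covariance matrices, where the trace term reduces to a sum over per-dimension standard deviations. The diagonal restriction and the function name are assumptions made to keep the example dependency-free; it is not the code from #1939.

```python
import math


def frechet_distance_diag(mean1, var1, mean2, var2):
    """Squared Frechet distance between two Gaussians with diagonal covariances.

    mean1, mean2: per-dimension means.
    var1, var2:   per-dimension variances (the diagonals of the covariance matrices).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    # For diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2)) equals
    # the sum over dimensions of (sqrt(var1) - sqrt(var2))^2.
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


# Identical distributions are at distance zero.
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))
```

In the full Frechet Inception Distance the means and covariances are those of Inception features computed on real and generated images.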
Fix failing radical test
While working on this PR I learned how important and tough testing is. The RadicalMainTest was failing about 20 times in 100000 iterations. After quite a lot of digging, the reason turned out to be that random values were being used for testing. With this PR I learned about eigenvectors, whitening of a matrix and many other important concepts. PR #1924 provided a fixed matrix for the test.
Serialization of glimpse, meanPooling and maxPooling layers
While working on GAN serialization, I found that the glimpse, meanPooling and maxPooling layers were not serialized properly. I fixed the serialization in PR #1905. Finding the error was a real test of patience, but fixing it felt quite satisfying.
Generator Update of GANs
While testing GANs I found an error in the update mechanism of the Generator. The issue is being discussed with the mentors; however, the error seems ambiguous. Hence, #1933 will be merged once a conclusion is reached on the issue.
LSGAN
Least Squares GAN uses the least squares error along with smoothed labels for training. Its implementation is largely complete, and #1972 will be merged once testing of LSGAN is completed.
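The least squares objectives can be sketched as below: the discriminator is pushed toward the real label on real samples and toward the fake label on generated samples, while the generator is pushed toward the real label on its own samples. The function names and default label values are assumptions for illustration; passing a smoothed real label such as 0.9 combines this with label smoothing.

```python
def mean_squared(outputs, target):
    """Mean of (output - target)^2 over a batch of discriminator outputs."""
    return sum((o - target) ** 2 for o in outputs) / len(outputs)


def lsgan_discriminator_loss(d_real, d_fake, real_label=1.0, fake_label=0.0):
    """Discriminator loss: real samples should score real_label, fakes fake_label."""
    return 0.5 * mean_squared(d_real, real_label) + 0.5 * mean_squared(d_fake, fake_label)


def lsgan_generator_loss(d_fake, real_label=1.0):
    """Generator loss: generated samples should be scored as real_label."""
    return 0.5 * mean_squared(d_fake, real_label)


# A discriminator that is already perfect, and a generator that fools nobody.
print(lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0]))
print(lsgan_generator_loss([0.0, 0.0]))
```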
Pull Request Summary
Merged Pull Requests
 Pull Request ensmallen/#109  Providing option to calculate exact objective
 Pull Request #1905  Serialization of glimpse, meanPooling and maxPooling layer
 Pull Request #1920  Weight Normalization Layer
 Pull Request #1924  Fix failing radical test
 Pull Request #1956  Bias Visitor
 Pull Request #1978  Work Around to add more layers
Open Pull Requests
 Pull Request #1770  Improving Serialization of GANs
 Pull Request #1888  Dual Optimizer for GANs
 Pull Request #1915  Label Smoothing in GANs
 Pull Request #1933  Error in Generator Update of GANs
 Pull Request #1939  Frechet Inception Distance
 Pull Request #1950  Inception Layer
 Pull Request #1972  LSGAN
Future Work
 Completion of open Pull Requests.
 Addition of StackedGAN in mlpack.
 Command Line Interface for training GAN models.
 Command Line Interface for testing GAN models.
Conclusion
I have learned a great deal while working with mlpack. When I joined mlpack I was quite a beginner in Machine Learning, and over the past months I have learned a lot. I also learned how Object Oriented Programming helps in developing a big project. My patience was tested as well while debugging the code I had written. Overall it was a good learning experience and a lot of fun.
I will keep contributing and will ensure that all of my open PRs get merged.
I would also like to thank my mentors ShikharJ, zoq and rcurtin for their constant support and guidance throughout the summer. I learned many things from them. I would also like to thank Google for giving me the opportunity to work with such highly experienced people.
Implementing an mlpack-Tensorflow Translator  Summary
The Idea
The general objective of this project is to allow the interconversion of trained neural network models among mlpack and all other popular frameworks. For this purpose, two converters have been created:
 ONNX-mlpack-converter
 mlpack-Torch-converter
ONNX being a central junction supporting a number of popular frameworks including but not limited to Tensorflow, Torch, Keras and Caffe, the Tensorflow-Torch-ONNX-mlpack conversion is made possible through this project.
The reason we chose Torch over ONNX for the mlpack-Torch-converter is that the C++ implementation of the ONNX library doesn't directly support model creation. It can still be done though, as nothing is impossible to achieve and ONNX models are nothing but protobuf files. There was no strong reason for choosing Torch over Tensorflow except that Torch's C++ API seemed to be more robust. That being said, it boils down to personal choice, and here it came down to my preference for learning a new framework instead of working with Tensorflow, with which I was largely familiar.
The Code
The code directly associated with the converters is in the repository https://github.com/sreenikSS/mlpackTensorflowTranslator under the /src folder, while the tests are under the /src/Tests folder and the converted models under the /Examples folder. The tests need and will receive an update.
This project mainly has four source files:
 model_parser.hpp, a header file supporting the creation of an mlpack model from a user-defined JSON file
 model_parser_impl.hpp, which contains the implementation of the definitions present in model_parser.hpp
 onnx_to_mlpack.hpp, which contains the necessary functions to convert ONNX models to mlpack format
 mlpack_to_torch.hpp, which contains the necessary function to convert mlpack models to Torch format
This project, however, has additional dependencies such as LibTorch and ONNX, which are or will be clearly mentioned in the readme file. It is supposed to exist as a separate repository under mlpack.
JSON parser
This parser can be used in a number of ways, like obtaining a LayerTypes<> object corresponding to a string containing the layer type and a map containing the attributes as follows:
It can also be used to convert a JSON file to an mlpack model by calling the convertModel() function and, if needed, overriding the trainModel() function. An example of using the converter, which will train the model and display the train and validation accuracies, is:
The trainModel() function has not been overridden here, but it may be necessary in most cases. However, it should be noted that most but not all layers, initialization types, loss functions and optimizers are supported by this converter. An example of a JSON file containing all the details can be specified as:
ONNX-mlpack converter
Given the ONNX model path and the desired path for storing the converted mlpack model, the converter can do the rest. However, for converting models that take images as input, i.e., convolutional models, the image width and height need to be mentioned too. An example of the usage is:
Note that most, but not all, layers are supported by this converter so far.
mlpack-Torch converter
This converter provides an API similar to the previous one. An example would be:
In the case of convolutional models too, the input image dimensions need not be mentioned. For directly obtaining the Torch model from a given mlpack model, the convert() function can be used as shown below:
This converter also has some limitations pertaining to the layers that can be converted. Moreover, it is not yet in a working state because a number of PRs in the main mlpack repository are yet to be merged.
Additional Pull Requests
The above converters required a number of changes to the original mlpack repo, which are listed as follows:
 #1985 adds accessor and modifier methods to a number of layers.
 #1958 was originally meant to add the softmax layer to mlpack's ANN module, but the PR itself is a bit messed up (it has unwanted files associated with it) and needs to be pushed again.
There are also a couple of pull requests that require some rectification before I can push them.
Acknowledgements
I owe the completion of this project to the entire mlpack community for helping me out whenever I got stuck. My mentor Atharva gave me exactly the guidance and support I required during this period. My concepts about backpropagation have been crystal clear ever since we manually wrote down the steps on paper. He used to give me hints to encourage me, and in the end I could do it entirely by myself. Understanding this, as well as mlpack's way of implementing it (the matrices g and gy in the Backward() function were the confusing ones), took around an entire week, but it was a fun experience. This is just one instance; there were many more during this 12 week period. Marcus and Ryan were also no less than pseudo-mentors for me.
Marcus was my go-to person during the summer for any doubt regarding the C++ implementation of the mlpack ANN module or pretty much anything else. I have a habit of spending too much time on things that seem difficult to solve, sometimes a couple of days (when I should ideally have tried for a couple of hours before asking for help), and when I failed to solve something, I would ask Marcus on IRC and we would arrive at a solution in less than an hour.
Ryan has been a constant source of support since my association with mlpack began. When I started with an issue, back sometime in February, Ryan helped me design the layout of the program which would later become the JSON model parser. There were numerous other instances during this period (and many more to come) when my code wouldn't work and Ryan would help me solve it.
Last but not least, I have also learnt a lot from the general discussions on IRC and would like to thank each and every one in the mlpack community for the brilliant interaction. I would also like to thank Google for giving me this opportunity to get involved in open source and with the mlpack community in particular.
Advanced Kernel Density Estimation Improvements  Summary
Abstract
Kernel Density Estimation (KDE) is a widely used non-parametric technique to estimate a probability density function. mlpack already had an implementation of this technique, and the goal of this project is to improve the existing codebase, making it faster and more flexible.
These improvements include:
 Improvements to the KDE space-partitioning trees algorithm.
 Cases in which data dimensionality is high and distance computations are expensive.
Implementation
We can summarize the work in 3 PRs:
Implement probabilistic KDE error bounds
Up to this moment, it was possible to set an exact amount of absolute/relative error tolerance for each query point in the KDE module. The algorithm would then try to accelerate the computations as much as possible by making use of the error tolerance and space-partitioning trees.
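To make the notion of error tolerance concrete, here is a minimal sketch of an exact one-dimensional Gaussian KDE together with a check of whether an approximate estimate stays within a relative tolerance of the exact value. The function names are hypothetical and the brute-force sum stands in for the tree-based algorithm; this is not mlpack's KDE API.

```python
import math


def gaussian_kde(query, references, bandwidth):
    """Exact 1-D Gaussian KDE at a single query point (no tree acceleration)."""
    scale = 1.0 / (len(references) * bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(
        math.exp(-0.5 * ((query - r) / bandwidth) ** 2) for r in references
    )


def within_relative_error(estimate, exact, rel_tol):
    """True if the estimate differs from the exact value by at most rel_tol * |exact|."""
    return abs(estimate - exact) <= rel_tol * abs(exact)


exact = gaussian_kde(0.0, [-1.0, 0.0, 0.5, 2.0], bandwidth=1.0)
approximate = exact * 1.03          # pretend an accelerated method returned this
print(within_relative_error(approximate, exact, rel_tol=0.05))
```

An accelerated algorithm is free to skip work as long as every query point would still pass a check like this one.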
Sometimes an exact error tolerance is not needed, and it would help flexibility a lot to be able to select a fuzzy error tolerance. The idea here is to select an amount of error tolerance that has a certain probability of being met (e.g. with a 95% probability, each query point will differ by at most 5% from the exact real value). This idea comes from this paper.
This is accomplished by probabilistically pruning tree branches. This probability is handled in a smart way so that when an exact prune is made or some points are exhaustively evaluated, the amount of probability that was not used is not lost, but rather reclaimed and used in later stages of the algorithm.
Other improvements and fixes were made in this PR:
 Statistics building order was fixed for cover and rectangular trees.
 Scoring function evaluation was fixed for octrees and binary trees.
 Simplification of metrics code.
 Assignment operator was added for some trees (issue #1957).
Subspace tree
The dominant cost in KDE is metric evaluation (i.e. the distance between two points), and usually not all dimensions are very relevant. The idea here is to use Principal Component Analysis (PCA) as a dimensionality reduction technique to take points to a lower-dimensional subspace where distance computations are cheaper (this is done in this paper). At the same time, the idea is to preserve the error tolerance, so that it is easy to know the maximum error each estimate will have.
This is done by calculating a PCA basis for each leaf node and then merging those bases as we climb to higher nodes.
This PR is still a work in progress.
Reclaim unused KDE error tolerance
This paper mentions the idea of reclaiming unused error tolerance when doing exact pruning. The idea is that when a set of points is exhaustively evaluated, the error of these computations is zero, so there is an amount of error tolerance for pruning that has not been used and can be used in later stages of the algorithm. This lets the algorithm stay as close as possible to the error tolerance and prune more nodes.
Thanks to Ryan's derivations, we also realized that the bounds of the previous algorithm were quite loose, so a lot of error tolerance was being wasted. This has been reimplemented and will probably be a significant source of speedup.
Future work
There are some ideas that we did not have time to finish but are quite interesting:
 Finish subspace tree PR.
 In the proposal there was the idea of implementing ASKIT and this is really interesting for me.
Conclusion
This has been an awesome summer. I had the opportunity to contribute to a meaningful project and will continue to do so in the future, since there are many ideas that came up while I was working on this or that I did not have time to finish. It has been a very enriching experience for me: I learned a lot, it was a ton of fun and my debugging skills definitely got sharpened.
I would like to thank the whole mlpack community as well as Google for this opportunity. A special mention has to be made of my mentor rcurtin; without his help when I was stuck and his new ideas I would not have enjoyed this as much, so thank you.
For people reading this and thinking about applying for GSoC in the future: Apply now, it will be fun and you will learn a lot from highly skilled people.
Quantum Gaussian Mixture Models  Summary
I wrote the final report at https://github.com/KimSangYeonDGU/GSoC2019 in detail.
Thanks for reading :)
Proximal Policy Optimization method  Summary
Time flies: the summer is coming to an end and we have reached the final week of GSoC. This blog post is the summary of my GSoC project, the implementation of one of the most promising deep reinforcement learning methods. During this project, I implemented the proximal policy optimization method and one classical continuous task, Lunar Lander, to test my implementation. My pull request for prioritized experience replay was also merged into master.
Introduction
My work is mainly located in the methods/reinforcement_learning, methods/ann/loss_functions and methods/ann/dists folders:
 ppo.hpp: the main entrance for proximal policy optimization.
 ppo_impl.hpp: the main implementation for proximal policy optimization.
 lunarlander.hpp: the implementation of the continuous task.
 prioritized_replay.hpp: the implementation of prioritized experience replay.
 sumtree.hpp: the implementation of the segment tree structure for prioritized experience replay.
 environment: the implementation of two classical control problems, i.e. mountain car and cart pole.
 normal_distribution.hpp: the implementation of a normal distribution which accepts mean and variance to construct the distribution.
 empty_loss.hpp: the implementation of the empty loss used in the proximal policy optimization class; we calculate the loss outside the model declaration, so this loss does nothing except pass the gradient backward.
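The segment tree used by prioritized experience replay can be sketched as a sum tree: each leaf stores a priority, each internal node stores the sum of its children, and sampling walks down from the root to find the leaf covering a given amount of cumulative priority. The class below is a standalone illustration with hypothetical names, and it assumes the capacity is a power of two; it is not the code in sumtree.hpp.

```python
class SumTree:
    """Array-backed sum tree. Assumes capacity is a power of two."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Index 1 is the root; leaves live at capacity .. 2 * capacity - 1.
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of one leaf and refresh the sums above it."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities."""
        return self.tree[1]

    def find(self, mass):
        """Return the leaf index whose cumulative priority range contains mass."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if mass < self.tree[left]:
                pos = left
            else:
                mass -= self.tree[left]
                pos = left + 1
        return pos - self.capacity


tree = SumTree(4)
for i, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, priority)
print(tree.total(), tree.find(5.9))
```

Drawing `mass` uniformly from [0, total) then samples each transition with probability proportional to its priority, in logarithmic time per sample and per update.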
In total, I contributed the following PRs; most of the implementation is combined into a single pull request for proximal policy optimization.
Change the pendulum action type to double
Fix typo in Bernoulli distribution
remove move() statement in hoeffding_tree_test.cpp
Highlights
The most challenging parts are:
 One of the most challenging parts of the work was how to calculate the surrogate loss for updating the actor-network. It differs from the update for the critic network, which can be optimized by regression on the mean squared error. The actor-network is optimized by maximizing the PPO-clip objective, which is difficult to implement like the current loss functions, which are calculated from target and prediction parameters. So I calculate it outside the model, and the model is declared with the empty loss function. All of the backward pass except the model part is calculated by my implementation, such as the derivatives with respect to the normal distribution's mean and variance.
 Another challenging part was that I implemented proximal policy optimization in a continuous environment, where the action differs from that of a discrete environment. In a discrete environment, the agent just outputs one-dimensional data to represent the action, while in a continuous environment the action prediction is more complicated. The common way to predict the agent's action is to predict a normal distribution and then sample an action from it.
 There are other challenging parts as well, such as tuning the neural network to make the agent work. This is what blocks me now: I need to tune more parameters to pass the unit test, which is a time-consuming process.
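The PPO-clip objective mentioned in the list above can be sketched as follows: the probability ratio between the new and old policy is clipped to [1 - epsilon, 1 + epsilon], and the smaller of the clipped and unclipped terms is averaged over the batch. The function names and the default epsilon of 0.2 are assumptions for illustration, and the Gaussian log-probability shows how a continuous action is scored; none of this is the code from the pull request.

```python
import math


def gaussian_log_prob(action, mean, variance):
    """Log-density of a 1-D normal distribution at the sampled action."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (action - mean) ** 2 / variance)


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Average clipped surrogate objective (to be maximized by the actor)."""
    total = 0.0
    for new_lp, old_lp, advantage in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        total += min(ratio * advantage, clipped * advantage)
    return total / len(advantages)


# One continuous action scored under an old and a slightly shifted new policy.
old_lp = gaussian_log_prob(0.3, mean=0.0, variance=1.0)
new_lp = gaussian_log_prob(0.3, mean=0.2, variance=1.0)
print(ppo_clip_objective([new_lp], [old_lp], [1.5]))
```

Because the objective depends on the old policy and the advantages rather than on a fixed target, it does not fit a target-versus-prediction loss interface, which is the difficulty the first bullet describes.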
Future Work
The pull request for proximal policy optimization is still under development because of the parameter tuning needed for the unit test, but it will be fixed soon.
PPO can be used for environments with either discrete or continuous action spaces, so another piece of future work is to support discrete action spaces, even though that is easier than the continuous task.
In some continuous environments the action has more than one dimension; we need to handle this situation.
Acknowledgment
Many thanks to Marcus for his mentorship during the project and his detailed code reviews. Marcus is helpful and often told me not to hesitate to ask questions. He gave me great help whenever something blocked me. I also want to thank Ryan for his responses on IRC, even at midnight. The community has been kind since the first meeting, and we talked about a lot of things from different areas. Finally, I also appreciate the generous funding from Google. It is a really good project to sharpen our skills. I will continue to contribute to mlpack and make the library easier to use and more powerful.
Thanks for this unforgettable summer session.
String Processing Utilities  Summary
This post summarizes my work for GSoC 2019.
Overview
The proposal for String Processing Utilities involved implementing basic functions that would be helpful in processing and encoding text, and then later implementing machine learning algorithms on it.
Implementation
String Cleaning PR1904
 The implementation started with the string cleaning functions. A class-based approach was used, and the following functions were implemented:
RemovePunctuation(): This function allows you to pass a string of punctuation characters, all of which are removed from the text.
RemoveChar(): This function allows you to pass a function pointer, function object or lambda function which returns a bool; if the return value is true, the character is removed.
LowerCase(): Converts the text to lower case.
UpperCase(): Converts the text to upper case.
RemoveStopWords(): This function accepts a set of stop words and removes all those words from the corpus.
 After implementing the class, I started implementing the CLI and Python bindings. Since mlpack uses Armadillo to load matrices, I had to write a function that could read data from a file using basic input/output streams. The file types are limited to .txt and .csv. The binding has different parameters to set and works as required based on the parameters passed.
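The behaviour of the cleaning functions described above can be sketched in a few lines. The snake_case function names below are hypothetical Python stand-ins, not the signatures of the C++ class from PR1904, and splitting on whitespace for stop-word removal is an assumption.

```python
def remove_punctuation(text, punctuation):
    """Drop every character that appears in the punctuation string."""
    return "".join(ch for ch in text if ch not in punctuation)


def remove_char(text, predicate):
    """Drop every character for which predicate(character) returns True."""
    return "".join(ch for ch in text if not predicate(ch))


def remove_stop_words(text, stop_words):
    """Drop whitespace-separated words that are in the stop-word set."""
    return " ".join(word for word in text.split() if word not in stop_words)


line = "Hello, World! This is mlpack 3."
cleaned = remove_punctuation(line, ",.!").lower()
cleaned = remove_char(cleaned, str.isdigit)
print(remove_stop_words(cleaned, {"this", "is"}))
```

Passing a predicate, as in `remove_char`, mirrors the function-object or lambda parameter described for RemoveChar().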
String Encoding PR1960
 The initial plan was to implement a separate class for each encoding method, such as BOW encoding, Dictionary encoding or TfIdf encoding, but we found that the classes had a lot of redundant code. Hence we decided to implement a policy-based design, with a different policy for each encoding type.
 We implemented the StringEncoding class, which has a function for encoding the corpus (it accepts a vector as input) and outputs the encoded data based on the policy and output type, vector or arma::mat. It also provides an option to pad or to avoid padding, depending on the encoding policy.
 We also designed a helper class, StringEncodingDictionary, which maintains a dictionary mapping each token to its label. The class is templated on the token type, which can be string_view or int. We decided to implement this helper class based on the results of the speed profiling done by lozhnikov.
Policies for String Encoding PR1969
 We decided to implement three policies for encoding, namely:
Dictionary Encoding: This encoding policy allows you to encode the corpus by assigning a positive integer to each unique token, and treats the dataset as categorical. It supports both padded and non-padded output.
Bag of Words Encoding: The encoder creates a vector of all the unique tokens and then assigns 1 if the token is present in the document, 0 if not.
TfIdf Encoding: The encoder assigns a tf-idf value to each unique token.
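The first two policies can be sketched as below, taking already tokenized documents as input. The function names, the use of 0 as the padding value and the first-seen ordering of labels are assumptions for illustration, not details confirmed by PR1969.

```python
def dictionary_encode(documents, pad=True):
    """Assign a positive integer to each unique token, in order of first appearance."""
    labels = {}
    encoded = []
    for tokens in documents:
        row = []
        for token in tokens:
            if token not in labels:
                labels[token] = len(labels) + 1
            row.append(labels[token])
        encoded.append(row)
    if pad:
        # Assumption: shorter documents are padded with 0 up to the longest one.
        width = max(len(row) for row in encoded)
        encoded = [row + [0] * (width - len(row)) for row in encoded]
    return encoded, labels


def bag_of_words_encode(documents):
    """One column per unique token: 1 if the document contains it, else 0."""
    vocabulary = []
    for tokens in documents:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    rows = [[1 if word in tokens else 0 for word in vocabulary] for tokens in documents]
    return rows, vocabulary


docs = [["the", "cat", "sat"], ["the", "dog"]]
print(dictionary_encode(docs)[0])
print(bag_of_words_encode(docs)[0])
```

The two encoders differ only in how tokens are turned into numbers, which is the kind of variation a policy-based design isolates.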
Tokenizer PR1960
 To help with all the string processing and encoding algorithms, we often needed to tokenize the string, and thus we implemented two tokenizers in mlpack:
CharExtract: This tokenizer is used to split a string into characters.
SplitByAnyOf: This tokenizer splits a string using a set of delimiters.
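A minimal sketch of the two tokenizers follows. The function names are hypothetical Python stand-ins for the C++ classes, and skipping empty tokens between consecutive delimiters is an assumption.

```python
def char_extract(text):
    """Split a string into its individual characters."""
    return list(text)


def split_by_any_of(text, delimiters):
    """Split a string wherever any character from delimiters occurs."""
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            # Assumption: runs of delimiters do not produce empty tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(char_extract("abc"))
print(split_by_any_of("hello, world;mlpack", " ,;"))
```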
After implementing all the encoding policies and tokenizers, I implemented the CLI and Python bindings for String Encoding in PR1980. The string encoding and string cleaning bindings share a lot of common functions, and hence we decided to share a common file, string_processing_util.hpp, between the two bindings.
My proposal also included an implementation of Word2Vec, but we decided to opt out since we found that Google has patented it.
Post GSoC
A lot of the code I implemented is sketchy, since I have used boost::string_view and other Boost algorithms, and hence we need to do a speed check and find the bottlenecks, if any. My plan is also to implement a substitute for word2vec, such as GloVe or another word embedding algorithm. I had implemented a function for one-hot encoding, which I thought could be useful for word2vec, but we found that it was slightly buggy, so I have to find a fix and also implement some overloaded functionality.
Lastly, and most importantly, I have to write tutorials for all the functionality provided so that people can understand how to drop these functions into their codebase. I am also excited to do some machine learning on text datasets using mlpack.
Acknowledgement
A big thanks to lozhnikov, Rcurtin, Zoq, and the whole mlpack community. This was my first attempt at GSoC, and I am happy that I was successful in it. I fell in love with the open-source world and it was a wonderful experience. I gathered a lot of knowledge in these past 3 months. I will continue to be in touch with the mlpack community and seek to make more contributions to the project in the future.
Also, I think it's time to order some mlpack stickers :)
Thanks :)
Application of ANN Algorithms Implemented in mlpack  Summary
Works
All GSoC contributions can be summarized by the following.
Contributions to mlpack/models
Pull Requests
 VGG19 on Imagenet dataset. #32
 Added LSTM Sentiment Analysis. #31
 Added LSTM Univariate Time series analysis #30 Merged
 Added LSTM for multivariate time series. #29 Merged
 Added VGG19 Model for MNIST Dataset. #28
Contributions to mlpack/mlpack
Merged Pull Requests
 Added support for Loading image #1903
 Rectified imports in Python documentation #1820.
 Added cellState as output params in LSTM. #1800.
 Added quoted_strings to regex in LoadCSV #1756.
 Added additional check to LoadARFF #1793.
 Make models more accessible in Python #1771.
 Rectified ann.txt (mlpack/doc) #1731.
 Added .pyc in .gitignore #1721.
Issues Created
 LoadARFF gets stuck if file is not found #1791.
 Exposing Cell and hidden state in LSTM #1782.
 Cannot load text in CSV files #1754.
Contributions to zoq/gym_tcp_api
Merged Pull Requests
Loading Images
The image utilities support loading and saving of images.
The supported file types are jpg, png, tga, bmp, psd, gif, hdr, pic and pnm for loading, and jpg, png, tga, bmp and hdr for saving.
The associated datatype is unsigned char, to support RGB values in the range 0-255. To feed data into the network, a typecast of arma::Mat may be required. Images are stored in a matrix of size (width * height * channels, NumberOfImages). Therefore imageMatrix.col(0) would be the first image if images are loaded into imageMatrix.
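The storage layout can be pictured with nested lists: each image is flattened to width * height * channels values and becomes one column of the matrix. The helper names below are hypothetical, and the order of pixels inside a column is left unspecified because the text above does not state it.

```python
def images_to_matrix(images):
    """Stack flattened images as columns: rows = width * height * channels."""
    rows = len(images[0])
    return [[image[r] for image in images] for r in range(rows)]


def column(matrix, index):
    """Return one column, i.e. one flattened image."""
    return [row[index] for row in matrix]


# Two tiny 2 x 1 RGB images, already flattened to 2 * 1 * 3 = 6 values each.
first = [0, 10, 20, 30, 40, 50]
second = [255, 245, 235, 225, 215, 205]
matrix = images_to_matrix([first, second])

print(len(matrix), len(matrix[0]))          # rows, columns
# Typecast from integer pixel values to floating point before feeding a network.
print([float(p) for p in column(matrix, 0)])
```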
Loading a test image also fills up the ImageInfo class object.
Similarly for saving images.
VGG19
VGG19 is a convolutional neural network that is trained on more than a million images from the ImageNet database. The network is 19 layers deep and can classify images into 1000 object categories. Details about the network architecture can be found in the following arXiv paper:
Tiny Imagenet
Tiny ImageNet Challenge is the default course project for Stanford CS231N. It runs similar to the ImageNet challenge (ILSVRC). The goal of the challenge is for you to do as well as possible on the Image Classification problem. The model uses VGG19 to classify the images into 200 classes.
MNIST handwritten digits
The VGG19 model is used for classification. It creates a sequential layer that encompasses the various layers of VGG19.
Sentiment Analysis
We will build a classifier on the IMDB movie dataset using a Deep Learning technique called RNN, which can be implemented using the Long Short Term Memory (LSTM) architecture. The encoded dataset for IMDB contains a vocab file along with sentences encoded as sequences. A sample datapoint is [1, 14, 22, 16, 43, 530,..., 973, 1622, 1385, 65]. This sentence contains the 1st word, the 14th word and so on from the vocabulary.
A vectorized input has to be fed into the LSTM to exploit the RNN architecture. To vectorize the sequence, dictionary encoding is used. The sample shown would be transformed to [[1, 0, 0,.., 0], [0,..,1,0,...], ....]; here the first list has a 1 in the 1st position and 0 elsewhere, and similarly the second list has a 1 in the 14th position and 0 elsewhere. Each list has a size equal to the number of words in the vocabulary.
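The transformation described above can be sketched as follows, with word indices counted from 1 as in the sample datapoint. The function name and the tiny vocabulary size are illustrative assumptions.

```python
def one_hot(sequence, vocabulary_size):
    """Turn a sequence of 1-based word indices into one vector per word."""
    vectors = []
    for word_index in sequence:
        vector = [0] * vocabulary_size
        vector[word_index - 1] = 1   # the k-th word sets the k-th position
        vectors.append(vector)
    return vectors


# A three-word sentence over a vocabulary of five words.
for vector in one_hot([1, 4, 2], vocabulary_size=5):
    print(vector)
```

Each sentence therefore becomes a matrix with one row per word and one column per vocabulary entry.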
Accuracy Plots
Time Series Analysis
Multivariate
We want to use the power of the LSTM for Google stock prediction using time series. We will use mlpack and a Recurrent Neural Network (RNN).
MSE Plot
Univariate
This is an example of using a Recurrent Neural Network (RNN) with LSTM to make forecasts on a time series of electric usage (in kWh).
MSE Plot
Results on other datasets  International Airline Passengers
This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.
We will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t + 1) over the period of 'rho' frames.
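One way to read this construction is as a sliding window: each input is a run of rho consecutive values and each target is the same run shifted one step ahead. The sketch below uses a hypothetical function name and made-up numbers rather than the airline data, and it is not the code from the mlpack models repository.

```python
def make_dataset(series, rho):
    """Build (X, Y) pairs where Y is X shifted one time step into the future."""
    inputs = []
    targets = []
    for start in range(len(series) - rho):
        inputs.append(series[start:start + rho])
        targets.append(series[start + 1:start + rho + 1])
    return inputs, targets


# Illustrative values only.
series = [10, 12, 13, 15, 18]
x, y = make_dataset(series, rho=3)
for window, shifted in zip(x, y):
    print(window, "->", shifted)
```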
Mean squared error up to 10 iterations:
Training ...
1  MeanSquaredError := 0.146075
2  MeanSquaredError := 0.144882
3  MeanSquaredError := 0.09501
4  MeanSquaredError := 0.0875479
5  MeanSquaredError := 0.0836975
6  MeanSquaredError := 0.0796567
7  MeanSquaredError := 0.0804368
8  MeanSquaredError := 0.0803483
9  MeanSquaredError := 0.0809061
10  MeanSquaredError := 0.076797
....
MSE Plot
Future Work
The tutorials associated with the implemented models are not yet published on the mlpack webpage. The blog posts need to be linked from a common place for users. VGG19 is being trained on the Tiny ImageNet dataset, and the results will be added.
Acknowledgement
I am sincerely grateful to the whole mlpack
community, especially Ryan Curtin, Marcus Edel, Sumedh Ghaisas, and Shikhar Jaiswal, for the support I received. It was an awesome experience.
Overview
The aim of my project for Google Summer of Code 2019 was to implement NeuroEvolution of Augmenting Topologies (NEAT) in mlpack based on Kenneth Stanley's paper Evolving Neural Networks through Augmenting Topologies. I would also implement support for "phased searching", a searching scheme devised by Colin Green to prevent genome bloat when training NEAT on certain complex tasks.
Besides this, my project aimed to create a framework for multiobjective optimization within mlpack's optimization library ensmallen. This would involve the implementation of several test functions and indicators, as well as an optimizer, NSGA-III.
Implementation
NEAT
NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm that can evolve networks of unbounded complexity by starting from simple networks and "complexifying" through different genetic operators. It has been used to train agents to play Super Mario World and generate "genetic art".
I implemented NEAT in PR #1908. The PR includes the entire implementation including phased searching, associated tests and documentation. NEAT was tested on:
 The XOR test, where its challenge was to create a neural network that emulated a two-input XOR gate. NEAT was able to solve this within 150 generations with an error less than 0.1.
 Multiple reinforcement learning environments implemented in mlpack.
 The pole balancing task in OpenAI Gym. This was done using the Gym TCP API implemented by my mentor, Marcus Edel. A short video of the trained agent can be seen here.
 The double pole balancing test. I implemented this as an addition to the existing reinforcement learning codebase. NEAT performed well on both the Markovian and non-Markovian versions of the environment.
The pull request is rather large and is still under review.
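To give an idea of what the XOR test measures, here is a small sketch in plain Python. It is not the test from the PR: the error measure (mean absolute error over the four input cases) is an assumption, and `hand_built_xor` is a fixed, hand-written threshold network that merely stands in for an evolved genome.

```python
# The four input/output cases of a two-input XOR gate.
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def xor_error(network):
    """Mean absolute error of a candidate network over all XOR cases."""
    total = sum(abs(network(a, b) - expected) for (a, b), expected in XOR_CASES)
    return total / len(XOR_CASES)


def hand_built_xor(a, b):
    """Stand-in for an evolved network: two hidden threshold units."""
    hidden_or = 1.0 if a + b > 0.5 else 0.0    # fires if at least one input is on
    hidden_and = 1.0 if a + b > 1.5 else 0.0   # fires only if both inputs are on
    return 1.0 if hidden_or - hidden_and > 0.5 else 0.0


error = xor_error(hand_built_xor)
solved = error < 0.1   # same threshold as quoted for the XOR test
```

XOR is a common sanity check because it cannot be solved without at least one hidden node, so the algorithm has to grow the topology to pass it.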
Multiobjective optimization
Multiobjective optimization is an area of multiple criteria decision making concerned with mathematical optimization problems in which more than one objective function is optimized simultaneously. NSGA-III (Non-dominated Sorting Genetic Algorithm III) is an extension of the popular NSGA-II algorithm, which optimizes multiple objectives by associating members of the population with a reference set of optimal points.
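The basic building block of this family of algorithms is the notion of Pareto dominance. The sketch below, in plain Python, shows only that building block under the assumption that every objective is minimized; it is not NSGA-III itself and not the ensmallen code, and the function names and sample points are made up for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy objective vectors (f1, f2), both to be minimized.
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(points)
```

Here (3.0, 3.0) is dropped because (2.0, 2.0) is better in both objectives, while the remaining three points are trade-offs that cannot be ranked against each other.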
I implemented support for multiobjective optimization in PR #120. This PR includes:
 An implementation of NSGA-III. This code is still being debugged and tested.
 Implementation of the DTLZ test functions.
 Implementation of multiple indicators, including the epsilon indicator and the Inverse Generational Distance Plus (IGD+) indicator.
 Associated documentation.
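As an illustration of what such an indicator computes, here is a minimal sketch of IGD+ in plain Python, following its usual definition for minimization problems: for every reference point, take the distance to the closest solution, counting only the objectives in which the solution is worse. This is not the implementation from the PR, and the function name and the sample fronts are invented for the example.

```python
import math


def igd_plus(reference_front, approximation):
    """Average modified distance from each reference point to the nearest solution."""
    total = 0.0
    for z in reference_front:
        total += min(
            # Only the amount by which the solution is worse than z counts.
            math.sqrt(sum(max(a_i - z_i, 0.0) ** 2 for a_i, z_i in zip(a, z)))
            for a in approximation
        )
    return total / len(reference_front)


# Toy fronts for a two-objective minimization problem.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
approximation = [(0.0, 1.0), (1.0, 0.5)]
score = igd_plus(reference, approximation)
```

Lower values are better, and a solution set that covers the whole reference front scores 0.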
Other work
Besides the work explicitly stated in my project, I also made some smaller changes and additions in the reinforcement learning codebase. This includes:
 Implementation of both Markovian and non-Markovian (velocity information not provided) versions of the double pole balancing environment (see #1901 and #1951).
 Fixed issues with consistency in mlpack's reinforcement learning API, where certain functions and variables were missing from some environments. Besides this, the environments now have an option to terminate after a given number of steps. See #1941.
 Added Q-Learning tests wherever necessary to prevent issues with consistency in the future. See #1951.
Future Work
 Get the NEAT PR merged.
 Finish testing and debugging the NSGA-III optimizer.
 Write more indicators for multiobjective optimization.
Final Comments
I would like to thank my mentors, Marcus Edel and Manish Kumar, for all their help and advice, as well as for bearing with me through the multiple technical difficulties I faced during this summer. I'd also like to thank all the other mentors and participants for their support. I hope to continue to contribute to this community in the future, and look forward to the same.
If anyone is interested in seeing my weekly progress through the program, please see my weekly blog.
Implementing Improved Training Techniques for GANs  Week 11
This week, after Toshal’s PR on support for more than 50 layers was merged, a lot of my work was ready to be merged. Specifically, the padding layer has now been completed and merged. The mini batch discrimination layer is also complete and the build is finally passing, so we should be able to merge that soon as well!
Other than that, I have continued my work on the spectral norm layer. One difficulty with the implementation is that the layer uses a power iteration method during the forward pass to compute the spectral norm of the weight matrix. I am not completely sure how we would compute the gradient for this approximation. I have been trying to derive it manually, but it is very tedious and I have not been successful so far. Hopefully, I can get it to work in the coming week. Otherwise I hope to continue the work post-GSoC.
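For readers unfamiliar with it, the power iteration used in the forward pass can be sketched as follows. This is a plain Python illustration on a tiny matrix, not the layer's code: the helper names, the starting vector and the number of iterations are all assumptions, and the sketch does not address the gradient question raised above.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def spectral_norm(weights, iterations=50):
    """Estimate the largest singular value of `weights` by power iteration."""
    weights_t = [list(column) for column in zip(*weights)]  # transpose
    u = normalize([1.0] * len(weights))    # assumed starting vector
    for _ in range(iterations):
        v = normalize(mat_vec(weights_t, u))   # right singular vector estimate
        u = normalize(mat_vec(weights, v))     # left singular vector estimate
    # sigma is approximately u^T W v once u and v have converged.
    return sum(u_i * wv_i for u_i, wv_i in zip(u, mat_vec(weights, v)))


# Toy weight matrix whose singular values are 3 and 1.
sigma = spectral_norm([[3.0, 0.0], [0.0, 1.0]])
```

The estimate approaches the true spectral norm as the number of iterations grows, provided the starting vector is not orthogonal to the leading singular vector.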
Implementing Essential Deep Learning Modules  Week 12
Well, my Bias visitor PR is merged. My exactobjective PR is merged as well. I also added a workaround to enable adding more layers to the ANN. It looks like we can have as many layers as we want: we can always develop a tree-like structure for adding more layers. The most interesting thing about the workaround PR was that the Branch of the PR could be deleted just two days after its creation.
As soon as the workaround for adding more layers got merged, my WeightNormalization layer was ready to merge. My Inception Layer can also now be merged, and I will start working on it soon.
In the upcoming week, I am planning to add a small test for LSGAN so that it can get merged. Its actual image testing would need some time. Most of the tests available online use Dual Optimizer, so deciding the parameters for training would be challenging. If possible, I will also try to finish my work on the Inception Layer.
I am also currently testing my Dual Optimizer PR. It has been running on savannah for a long time (approximately 50 days). Hopefully, I will see good results once it completes :)
Quantum Gaussian Mixture Models  Week 12
This week, I compared QGMM with GMM using the percentage of convergence on the clusters of the observations as an indicator of training performance. Over a total of 200 experiments, I found that QGMM performed well when the initial phi was 0, but badly when it was 90. When we set the initial phi to 0, it did not change from the initial value, but when we set the initial phi to 90, it increased to between 91 and 269. Therefore, the cosine of phi became negative and the two distributions were overlaid. I tried to control the value of phi, but I didn't find a workaround. Thus, I should come up with a method to control phi properly for stable performance.
Lastly, I checked the difference between the augmented and normal Lagrangian methods, updating lambda every 1000 iterations. With the augmented Lagrangian method, it is certainly easier to set the initial lambda in cases where we don't know a proper initial value, but when we can set a proper lambda initially, the normal Lagrangian showed better performance overall. I think I should look into this later with additional data sets, because the augmented method has some hyperparameters as well.
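To make the update scheme concrete, here is a small sketch of the augmented Lagrangian method in plain Python. It uses a made-up one-dimensional problem (minimize (x - 2)^2 subject to x = 1), not the QGMM objective or its constraint, and the penalty weight, step size and iteration counts are assumptions chosen for the toy case.

```python
def augmented_lagrangian(mu=10.0, outer_iterations=10, inner_steps=1000,
                         step_size=0.05):
    """Minimize (x - 2)^2 subject to x - 1 = 0 with multiplier updates."""
    x, lam = 0.0, 0.0   # lam starts at 0: no good initial guess is needed
    for _ in range(outer_iterations):
        # Inner loop: gradient descent on
        #   L(x) = (x - 2)^2 + lam * (x - 1) + (mu / 2) * (x - 1)^2
        for _ in range(inner_steps):
            gradient = 2.0 * (x - 2.0) + lam + mu * (x - 1.0)
            x -= step_size * gradient
        # Multiplier update, done once per block of inner steps.
        lam += mu * (x - 1.0)
    return x, lam


x_opt, lam_opt = augmented_lagrangian()
```

In this toy case x converges to 1 and lambda to 2 even though lambda starts at 0, which is the convenience mentioned above; the price is the extra hyperparameter mu, which here also limits how large the step size can be.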
Thanks for reading :)