Work done from week four through seven.

Several implementations in the benchmark system used to build the model twice: once while measuring runtime and again while calculating the various metrics. Those implementations were updated to build the model only once, reusing the predictions obtained in the first run to calculate the metrics afterwards. The test file timeouts were also reduced from 9000 to 240 so that the tests run faster and do not hang for long when an error occurs.

Shogun has been updated from 5.0.0 to 6.0.0, so the implementations were brought up to date with the latest version of the Shogun library. The Shogun implementations were also only calculating the runtime metric; they were updated to calculate other metrics such as accuracy, precision, recall, and MSE.
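
As a rough, generic illustration of the single-run idea (the actual benchmark scripts are per-library and not shown here), all of these classification metrics can be derived from one stored set of predictions; the sketch below uses Armadillo and a hypothetical helper name.

    #include <armadillo>
    #include <iostream>

    // Hypothetical helper: compute simple metrics from one stored set of
    // predictions, so the model never has to be rebuilt per metric.
    void ComputeMetrics(const arma::Row<size_t>& labels,
                        const arma::Row<size_t>& predictions,
                        const size_t positiveClass = 1)
    {
      const double n = labels.n_elem;
      const double accuracy = arma::accu(labels == predictions) / n;

      // Treat `positiveClass` as the positive label for precision/recall.
      const double tp = arma::accu((predictions == positiveClass) %
                                   (labels == positiveClass));
      const double fp = arma::accu((predictions == positiveClass) %
                                   (labels != positiveClass));
      const double fn = arma::accu((predictions != positiveClass) %
                                   (labels == positiveClass));
      const double precision = tp / (tp + fp);
      const double recall = tp / (tp + fn);

      std::cout << "accuracy=" << accuracy << " precision=" << precision
                << " recall=" << recall << std::endl;
    }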

For MATLAB, many methods were added, such as Support Vector Classifier, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Support Vector Regression, Random Forest, and Lasso, along with their test files. The existing implementations were updated to calculate more metrics, such as accuracy, precision, recall, and MSE.

Weka needed many upgrades, and some algorithms that were not previously benchmarked had to be added. Implementations for Decision Tree, Logistic Regression, Random Forest, and Perceptron were added along with their test files. The old implementations were upgraded to calculate more metrics.

The next step is adding dlib-ml and R to the benchmark system. Work on that has started, and they will be added soon.

more...

Augmented Recurrent Neural Networks: Week 8

Unlike previous weeks, this week was rather quiescent. It didn't feature insane bugs, weird code behavior, mind-boggling compiler messages, or anything else of the kind.

However, it featured work on "casting" gradient clipping and the cross-entropy performance function to the standard mlpack API. The pull request with the entire discussion is here: link. As mentioned in the PR discussion, this code is more or less ready for merging into mlpack/master.
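
For reference, the binary cross-entropy performance function computes -sum[t * log(p) + (1 - t) * log(1 - p)] over the outputs. The snippet below is a minimal Armadillo sketch of that forward computation, not the exact layer code from the PR.

    #include <armadillo>

    // Minimal sketch of a binary cross-entropy forward pass: `prediction`
    // holds probabilities in (0, 1) and `target` holds 0/1 labels.
    double CrossEntropyForward(const arma::mat& prediction,
                               const arma::mat& target,
                               const double eps = 1e-10)
    {
      // Clamp predictions away from 0 and 1 to avoid infinite loss.
      const arma::mat p = arma::clamp(prediction, eps, 1.0 - eps);
      return -arma::accu(target % arma::log(p) +
                         (1.0 - target) % arma::log(1.0 - p));
    }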

more...

Neural Evolution Algorithms for NES games - Week 6 and 7 Progress

These two weeks were spent writing some more test cases for the CMA-ES algorithm. I faced a lot of difficulties along the way and had to spend a lot of time on errors. I implemented logistic regression using CMA-ES, which is working well and is able to classify accurately. The Rosenbrock function, which was used as a test case, only converges for two dimensions, giving an accurate minimizer and an objective value very close to zero. The test case fails for higher-dimensional versions of the function. I tried a lot of different alterations and spent a lot of time trying to fix this, but I mostly ended up with a flat fitness landscape, which causes convergence to stall.
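
For context, the generalized Rosenbrock function used in the test is f(x) = sum_{i=1}^{d-1} [100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2], with its minimum of 0 at x = (1, ..., 1). Below is a plain C++ sketch of the objective itself, not the exact mlpack test code.

    #include <armadillo>
    #include <cmath>

    // Generalized Rosenbrock objective: minimum value 0 at x = (1, ..., 1).
    // This is just the mathematical definition, not the mlpack test class.
    double Rosenbrock(const arma::vec& x)
    {
      double f = 0.0;
      for (size_t i = 0; i + 1 < x.n_elem; ++i)
      {
        f += 100.0 * std::pow(x(i + 1) - x(i) * x(i), 2) +
             std::pow(1.0 - x(i), 2);
      }
      return f;
    }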

The next task was using the CMA-ES optimizer on neural networks. With a very high number of iterations, CMA-ES is able to optimize the vanilla network, but it fails with a small number of evaluations, which is what it is supposed to handle, since it will be used in a nearly real-time setting. I tried to fix this by tuning the parameters of the function, but nothing helped. I'm still working on this and trying to get it to converge quickly.

If that does not happen within the next day, I will try to rewind the commit history to the stage where the Armadillo library was not yet used and carefully change things to correct the error, because earlier, when the code was written in pure C, I was able to make Rosenbrock converge for higher dimensions as well. This looks promising, and if needed I will spend the time it takes to do so.

Moreover, I was also busy with my campus placement for a software engineering job last week, because of which I was not able to give the project much time. I am placed now. I am 2.5 weeks behind schedule and, if selected in the second evaluation, will have to pick up the pace to complete my GSoC project on time.

more...

Cross-Validation and Hyper-Parameter Tuning: Week 8

During the eighth week I was finishing work on the hyper-parameter tuning module. Some minor changes still need to be made, but it is basically done. Since the pull request for the simple cross-validation strategy is not merged yet, I need to wait before sending a pull request for the hyper-parameter tuning module; for now it is available in my fork of mlpack. The same goes for k-fold cross-validation.
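
To give an idea of the intended use, a k-fold cross-validation call could look roughly like the sketch below. The header paths, class names, and constructor signatures follow my reading of the pending PRs and may differ from what finally gets merged.

    #include <mlpack/core.hpp>
    #include <mlpack/core/cv/k_fold_cv.hpp>
    #include <mlpack/core/cv/metrics/accuracy.hpp>
    #include <mlpack/methods/logistic_regression/logistic_regression.hpp>

    using namespace mlpack::cv;
    using namespace mlpack::regression;

    // Rough usage sketch: 10-fold cross-validation of logistic regression.
    int main()
    {
      arma::mat data;             // each column is a data point
      arma::Row<size_t> labels;   // 0/1 labels for the points
      mlpack::data::Load("data.csv", data, true);
      mlpack::data::Load("labels.csv", labels, true);

      KFoldCV<LogisticRegression<>, Accuracy> cv(10, data, labels);
      const double lambda = 0.01;  // regularization strength to evaluate
      std::cout << "10-fold accuracy: " << cv.Evaluate(lambda) << std::endl;
    }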

The main part of my GSoC plan is almost done, so I'm going to work on some optional parts. In the near term that means additional metrics for the cross-validation and hyper-parameter tuning modules, such as precision, recall, and F1.

more...

Profiling for parallelization and parallel stochastic optimization methods - Week 7 & 8

These past two weeks were spent on two things: running HOGWILD! on larger datasets to find and fix performance issues, and working on an implementation of SCD.

Running parallel SGD on the Netflix dataset revealed a few issues with the code. Using the generalized form of the parallel SGD Optimize function proved to be very slow on a dataset of this size (numFunctions = 100198806). Also, shuffling the indices was taking more resources than it should.

The first issue was fixed by introducing an overload for ParallelSGD in the regularized SVD implementation, as is done for StandardSGD. To resolve the second issue, arma::shuffle was replaced with std::shuffle.
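
As a rough illustration of the shuffling fix (not the exact optimizer code), the visitation order can be generated and shuffled with the standard library like this:

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Sketch: build the visitation order 0..numFunctions-1 and shuffle it
    // in place with std::shuffle instead of arma::shuffle.
    std::vector<size_t> ShuffledOrder(const size_t numFunctions)
    {
      std::vector<size_t> order(numFunctions);
      std::iota(order.begin(), order.end(), 0);

      std::random_device rd;
      std::mt19937_64 gen(rd());
      std::shuffle(order.begin(), order.end(), gen);
      return order;
    }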

After the latest commit, the runs on the datasets are here.

Comparing these to the runs mentioned in the paper, we see that the time taken on the RCV1 dataset is nearly the same, with lower error. Netflix performance is worse, and could probably be improved with a better initialization strategy or by using a "delta" model (where ratings are calculated as a difference from the mean rating for the movie, instead of directly multiplying the user and movie matrices).

One interesting thing to note is that the runs in the paper were done on 10 core machines, whereas the runs for mlpack are on a dual-core, quad-thread CPU.

Regarding the implementation of SCD, I am thinking about adding a ResolvableFunctionType requirement to the function interface. SCD requires the exact computation of the gradient in a single dimension, and none of the existing optimizers seem to use this type of information.
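
A rough sketch of what such a requirement could look like is below; the name PartialGradient and the exact signature are my illustration, not a settled design. The idea is simply that the function exposes the gradient restricted to a single coordinate, so SCD can update one dimension without a full gradient pass.

    #include <armadillo>

    // Illustration only: a function type usable by SCD would expose, besides
    // Evaluate(), the exact gradient with respect to a single coordinate j.
    class ExampleResolvableFunction
    {
     public:
      // f(x) = 0.5 * ||x||^2, so the partial derivative w.r.t. x(j) is x(j).
      double Evaluate(const arma::vec& x) const
      {
        return 0.5 * arma::dot(x, x);
      }

      // Exact gradient restricted to coordinate j.
      double PartialGradient(const arma::vec& x, const size_t j) const
      {
        return x(j);
      }
    };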

more...

Build testing with Docker and VMs - Week 8

The next steps included running a container from a pre-built image and building mlpack inside a container using a Jenkins plugin; I am using the CloudBees Docker Custom Build Environment plugin.

Apart from that, a matrix configuration needs to be created to build images with different configurations, and mlpack needs to be installed, run, and tested against each image.

Also, I am looking for missing library versions.

more...

Atom domain class - Week 7

This week, I was trying to give the atom domain code a better structure. When I tried to add the code for the full correction update method for the constrained problem in the atom domain, I found that I would need to rewrite a lot of the UpdateSpan class, so I decided to make a new class for atom domain operations. This structure is more general for atom domain optimization, although it might make the OMP code less efficient. I will try some large-scale tests to see how that affects performance.

Since there have been some API changes between my old PR and my current code, I guess I will merge everything first. Also, I will add more tests for the vector case in the atom domain before I do the matrix completion comparison (which was my plan last week).

Even for the vector case, I am excited to see in the experiments how the atom norm constraint and the support-pruning step affect the sparsity of the solution, as well as the convergence speed, compared with the naive OMP solution. (People familiar with LASSO will know that an L1-norm constraint gives sparsity automatically. In the algorithm I implemented, the atom norm constrained problem uses a projected gradient solver, where the projection step is simply a soft-thresholding of the atom coefficients. So I expect it to give a better solution and faster convergence than OMP.)
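
For readers unfamiliar with soft-thresholding: it shrinks each coefficient toward zero by a threshold t and zeroes it out if its magnitude is below t, i.e. S_t(c) = sign(c) * max(|c| - t, 0). A small Armadillo sketch of that step (an illustration, not the actual class code) follows.

    #include <armadillo>

    // Soft-thresholding of a coefficient vector: shrink every entry toward
    // zero by `threshold`, setting entries with |c| <= threshold to zero.
    arma::vec SoftThreshold(const arma::vec& coeffs, const double threshold)
    {
      return arma::sign(coeffs) %
             arma::max(arma::abs(coeffs) - threshold,
                       arma::zeros(coeffs.n_elem));
    }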

more...

Deep Learning Module in mlpack (Week 7)

This week I mostly tried to complete the ssRBM and GAN PRs. The majority of the time was spent making both pieces of code work on the test dataset, and we finally managed to do so. With the ssRBM PR we were running into memory management errors, since I was allocating around ~30 GB of memory for the parameters by declaring all of them as full matrices; I managed to reduce this to just vectors. The remaining problem with ssRBM is the training part: we are getting an accuracy of around 12% on the MNIST dataset that we used for the binary RBM. We are working on fixing the test.
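
As a rough illustration of the memory reduction (not the actual ssRBM code): when a parameter is per-unit rather than a full interaction matrix, storing it as a vector instead of an n x n matrix cuts its footprint from O(n^2) to O(n).

    #include <armadillo>

    // Illustration: a per-hidden-unit parameter stored compactly.
    // For n = 50000 doubles, a full n x n matrix would need ~20 GB,
    // while a vector needs ~0.4 MB.
    void ParameterStorageExample(const size_t n)
    {
      // Wasteful alternative (kept commented out on purpose):
      // arma::mat lambdaFull(n, n, arma::fill::ones);

      // Compact: one value per unit.
      arma::vec lambda(n, arma::fill::ones);
      lambda *= 0.5;  // example initialization
    }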

This week I also managed to finish the GAN implementation. The code works on the test data but is giving near-random images, even for, say, 1000 iterations of alternating SGD (with the discriminator being trained for 3000 (3 * 1000) iterations and the generator being trained for 1000 iterations; the generator and discriminator used here are just simple FFNs). The GAN PR also needs review for me to fully understand where I am going wrong. I want to thank Konstantin here as well, since I was using the cross-entropy code that he wrote for the GANs. I am also not sure how to test a GAN; right now I am just trying to see if it can generate realistic-looking images.

Next week: I will mostly be working on fixing both the GAN and ssRBM tests. I will also write serialization for GANs next week. I am hoping both PRs will be mergeable within 10 days.

more...

Deep Reinforcement Learning Methods - Week 6 Highlights

This week I continued working on async one-step Q-learning. The major challenge this week was making the test case pass in Travis CI, which is quite tricky. I tuned the architecture of the network and the hyper-parameters; the test only takes 2 s on my Mac but still takes almost 10 minutes on the server, even when I reduce the number of workers to 4. So I had to try a pre-trained network. It's weird that if I set the number of workers to 0 and only run the test process with the pre-trained, converged network, it fails (this only happens on the server), so I have to set the number of workers to 1, although I don't know why that works.

The TrainingConfig class is quite useful for passing in hyper-parameters; however, it doesn't conform to the newest mlpack API style. But I assume RL methods won't interact with the CrossValidation helper, so I guess the newest API style won't influence my project much. The PR for async one-step Q-learning is almost ready; hopefully it can be merged within 2 days.

more...

Augmented Recurrent Neural Networks: Week 7

This week featured even more fixes for the baseline model. The most notable achievement is that we've finally managed to make an LSTM model that is able to completely learn the CopyTask (even for the maximum length and repetition parameters).

To this end, several small (but important) changes had to be implemented:

  • The CrossEntropyError layer - earlier the model was trained with an MSE objective. Cross entropy penalizes mistakes more sharply than MSE (e.g., assigning 0 probability to the true label results in +infinity loss with cross entropy but only +1 loss with MSE).
  • Gradient clipping - to rectify the gradient explosion problem, every gradient component whose absolute value exceeds some fixed bound is replaced with that bound. In math: g' = min(max(g, minValue), maxValue). (A short sketch of this clamping follows after the list.)
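
A minimal Armadillo sketch of that clamping step (not the exact layer code from the PR):

    #include <armadillo>

    // Clip every gradient component into [minValue, maxValue] element-wise,
    // i.e. g' = min(max(g, minValue), maxValue).
    void ClipGradient(arma::mat& gradient,
                      const double minValue,
                      const double maxValue)
    {
      gradient = arma::clamp(gradient, minValue, maxValue);
    }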

Also, the optimizer was switched from Adam to MiniBatchSGD - the former exposed the gradient explosion problem, which resulted in heavy overfitting.

To avoid the overfitting problem completely, the testing scheme was changed. Here's what happens now on every training epoch:

  • The model is trained on some training samples;
  • After that, the model is evaluated on the validation set (we use three datasets, not two);
  • If this epoch gives the best validation score so far, the model is run on the test set and its score is recorded.

In other words, we now pick the best model among all the models that were visited during training.
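
A rough sketch of that scheme in C++ (the Model, TrainOneEpoch, and Score names are placeholders for this illustration, not the actual task code):

    #include <limits>

    // Placeholder types and helpers for the sketch.
    struct Model { /* ... */ };
    void TrainOneEpoch(Model& /* model */) { /* train on the training set */ }
    double Score(const Model& /* model */, int /* dataset */) { return 0.0; }

    const int VALIDATION = 1, TEST = 2;

    void TrainWithValidation(Model& model, const int epochs)
    {
      double bestValidation = -std::numeric_limits<double>::infinity();
      double testAtBest = 0.0;

      for (int epoch = 0; epoch < epochs; ++epoch)
      {
        TrainOneEpoch(model);                         // 1) train
        const double val = Score(model, VALIDATION);  // 2) validate
        if (val > bestValidation)                     // 3) best so far -> test
        {
          bestValidation = val;
          testAtBest = Score(model, TEST);
        }
      }
      // testAtBest now holds the test score of the best validated model.
    }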

Also, there is now a stub (non-functional so far) for the highway layer, as proposed in the arXiv paper.

I think the task API & representation problem is now more or less solved, so we're switching to the long-awaited part - the HAM unit implementation. (As I mentioned in the IRC chat, we already have a head start: a ready API for the HAM unit plus an implemented and tested memory structure for it.) Stay tuned!

more...