Profiling for parallelization and parallel stochastic optimization methods - Week 7 + 8

Shikhar Bhardwaj, 25 July 2017

These past two weeks were spent on two things: running HOGWILD! on larger datasets to find and fix performance issues, and working on an implementation of SCD.

Running parallel SGD on the Netflix dataset revealed a few issues with the code. The generalized form of the parallel SGD Optimize function was proving to be very slow on a dataset of this size (numFunctions = 100198806), and shuffling the visitation indices was taking more resources than it should.

The first issue was fixed by introducing an overload for ParallelSGD in the regularized SVD implementation, as is already done for StandardSGD. To resolve the second issue, arma::shuffle was replaced with std::shuffle.
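
To illustrate the second fix, here is a minimal sketch (not the exact mlpack code) of generating a shuffled visitation order over function indices with std::shuffle, which permutes in place rather than producing a shuffled copy the way arma::shuffle does:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

int main()
{
  // For Netflix-scale data this would be numFunctions = 100198806.
  const size_t numFunctions = 1000;

  // Build the visitation order 0, 1, ..., numFunctions - 1.
  std::vector<size_t> visitationOrder(numFunctions);
  std::iota(visitationOrder.begin(), visitationOrder.end(), 0);

  // std::shuffle permutes the indices in place with a caller-supplied
  // generator, avoiding the temporary created by arma::shuffle, which
  // matters at this scale.
  std::mt19937 rng(std::random_device{}());
  std::shuffle(visitationOrder.begin(), visitationOrder.end(), rng);
}
```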

After the latest commit, the runs on the datasets are here.

Comparing these to the runs mentioned in the paper, we see that the time taken on the RCV1 dataset is nearly the same, with lower error. Netflix performance is worse, which probably has something to do with the paper using a better initialization strategy or a "delta" model (where ratings are computed as a difference from the mean rating for the movie, instead of directly multiplying the user and movie matrices).
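
As a rough sketch of the "delta" model idea (the names here are illustrative, not mlpack's API): the prediction adds the movie's mean rating as a baseline, so the factorization only has to learn the deviation from that mean.

```cpp
#include <armadillo>

// Predict a rating as the movie's mean rating plus the learned
// user-movie interaction term, rather than the raw inner product alone.
double PredictRating(const arma::mat& users,      // rank x numUsers
                     const arma::mat& movies,     // rank x numMovies
                     const arma::vec& movieMeans, // per-movie mean ratings
                     const size_t user,
                     const size_t movie)
{
  return movieMeans(movie) + arma::dot(users.col(user), movies.col(movie));
}
```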

One interesting thing to note is that the runs in the paper were done on 10-core machines, whereas the runs for mlpack are on a dual-core, quad-thread CPU.

Regarding the implementation of SCD, I am thinking about adding a ResolvableFunctionType requirement on the function interface. SCD requires the exact computation of the gradient in a single dimension, and none of the existing optimizers seem to utilize this type of information.
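
A sketch of what such a requirement could look like (the exact signatures are still under discussion, so take this as an assumption rather than the final interface): alongside Evaluate(), the function type would expose the exact gradient restricted to a single dimension.

```cpp
#include <armadillo>

// Hypothetical example of a function type satisfying the proposed
// ResolvableFunctionType requirement.
class ResolvableFunctionExample
{
 public:
  // Number of features (dimensions the optimizer can descend along).
  size_t NumFeatures() const;

  // Objective value at the given coordinates.
  double Evaluate(const arma::mat& coordinates) const;

  // Exact gradient restricted to feature j; the sparse output holds
  // only the entries belonging to that single dimension, which is all
  // SCD needs for one coordinate update.
  void PartialGradient(const arma::mat& coordinates,
                       const size_t j,
                       arma::sp_mat& gradient) const;
};
```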