mlpack IRC logs, 2018-06-08

Logs for the day 2018-06-08 (starts at 0:00 UTC) are shown below.

--- Log opened Fri Jun 08 00:00:54 2018
02:11 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
02:32 -!- manish7294 [8ba75a91@gateway/web/freenode/ip.] has joined #mlpack
02:35 < manish7294> rcutin: Here are some more results on covertype: k = 12, initial accuracy - 96.7575, final - 96.9964, step_size - 1e-08, total time: 6 hrs, 56 mins, 18.5 secs, optimizer - sgd
02:36 < manish7294> k = 15, initial accuracy - 96.3226, final - 96.955, total time: 13 hrs, 47 mins, 27.9 secs, optimizer - lbfgs
02:37 < manish7294> In the case of sgd, the step size was kept this low because the optimization was diverging to inf.
02:52 < manish7294> rcurtin: Ahh, sorry! I again made a mistake while typing your nick
03:04 < rcurtin> no worries
03:04 < rcurtin> the improvement seems marginal but I don't think it's a problem
03:11 < rcurtin> I think the main focus should be acceleration, I think there is a lot that can be done there
03:12 < rcurtin> also you can see if you can increase the convergence tolerance
03:12 -!- manish7294_ [8ba7a3d6@gateway/web/freenode/ip.] has joined #mlpack
03:12 -!- manish7294 [8ba75a91@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
03:13 < manish7294_> rcurtin: One thing that is worrying me is that, instead of terminating with a failure message in case of divergence, I am getting a segmentation fault
03:14 < rcurtin> that could make the process take far fewer iterations
03:14 < rcurtin> sorry for the lag again, I am playing mariokart :)
03:15 < manish7294_> And how's your research paper advancing?
03:15 < rcurtin> a segfault is not good, we should investigate that, maybe there is a bug
03:15 < rcurtin> haha
03:15 < rcurtin> right I am playing mariokart not working on it :)
03:15 < rcurtin> but I think it is ready
03:16 < manish7294_> great, nothing comes between mariokart :)
03:18 < rcurtin> ;)
03:18 < rcurtin> they extended the submission deadline to next week but I think it is ready to submit tomorrow anyway
03:20 -!- manish7294_ [8ba7a3d6@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
03:22 -!- manish7294 [~yaaic@2405:205:249f:5a92:bdc3:7225:2430:fb8a] has joined #mlpack
03:22 < rcurtin> by the way I ended up spending all day in meetings so I don't have a kNN bound yet
03:23 < rcurtin> another thought for optimization: for SGD at each iteration it is only necessary to compute new impostors for points in the batch
03:23 < rcurtin> so you could use knn.Search(querySet.cols(begin, begin + batchSize - 1))
03:23 < rcurtin> (I think the 1 is needed, double check that...)
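A note on the `- 1` above: Armadillo's `.cols(first, last)` takes an inclusive last index, so a batch of `batchSize` columns starting at `begin` ends at `begin + batchSize - 1`. The sketch below uses only the standard library to show the index arithmetic. `BatchColumnRange` is a hypothetical helper, not part of mlpack or Armadillo, and clamping the final batch is an assumption about how a short last batch would be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the inclusive [first, last] column range for the batch that
// starts at `begin`. The last batch is clamped to the final column.
std::pair<std::size_t, std::size_t> BatchColumnRange(std::size_t begin,
                                                     std::size_t batchSize,
                                                     std::size_t numPoints)
{
  assert(batchSize > 0 && begin < numPoints);
  const std::size_t last = std::min(begin + batchSize, numPoints) - 1;
  return {begin, last};
}
```

With 5000 points and a batch size of 512, the first batch would cover columns 0 through 511, and a batch starting at 4608 would stop at column 4999.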
03:29 < manish7294> rcurtin: You have suggested this earlier and it is already there in current implementation; )
03:29 < manish7294> :)
03:30 < manish7294> And I think that is the one that has created a difference of almost half in the timings of lbfgs and sgd
03:30 < rcurtin> ah, sorry, I did not realize that
03:31 < rcurtin> I need to look closely at the state of the implementation, I will do that tomorrow
03:34 < manish7294> great, but you don't have to hurry as the research paper must be top priority :)
03:34 < rcurtin> :)
03:34 < rcurtin> well I am headed to bed now
03:35 < rcurtin> talk later! :)
03:35 < manish7294> sure :)
03:51 < manish7294> leaving a comment here regarding comparison of sgd and lbfgs w.r.t above batch optimization : sgd computing neighbors time - 3 mins 58.8 secs, lbfgs - 4 hrs, 17 mins, 5.7 secs
04:00 < jenkins-mlpack> Project docker mlpack weekly build build #45: FAILURE in 3 hr 13 min:
04:00 < jenkins-mlpack> * haritha1313: support for RandSVD in CF
04:00 < jenkins-mlpack> * haritha1313: templatized apply
04:00 < jenkins-mlpack> * haritha1313: adjusting eps addition
04:00 < jenkins-mlpack> * haritha1313: refactoring
04:00 < jenkins-mlpack> * haritha1313: style edits
04:00 < jenkins-mlpack> * haritha1313: style edits
04:00 < jenkins-mlpack> * haritha1313: remove template
04:00 < jenkins-mlpack> * haritha1313: edit
04:00 < jenkins-mlpack> * haritha1313: debugging
04:00 < jenkins-mlpack> * haritha1313: style edit
04:00 < jenkins-mlpack> * haritha1313: debugged matrix error
04:00 < jenkins-mlpack> * haritha1313: debug emptyctortest
04:00 < jenkins-mlpack> * haritha1313: train debug
04:00 < jenkins-mlpack> * haritha1313: debug
04:00 < jenkins-mlpack> * haritha1313: train debug
04:00 < jenkins-mlpack> * haritha1313: regSVD debugging
04:00 < jenkins-mlpack> * haritha1313: test time reduction
04:15 -!- manish7294 [~yaaic@2405:205:249f:5a92:bdc3:7225:2430:fb8a] has quit [Ping timeout: 265 seconds]
05:32 -!- dasayan05 [cb6ef0c8@gateway/web/freenode/ip.] has joined #mlpack
05:33 -!- dasayan05 [cb6ef0c8@gateway/web/freenode/ip.] has quit [Client Quit]
06:54 < zoq> rcurtin: mariokart N64?
07:58 -!- sumedhghaisas [~yaaic@2402:3a80:674:337b:4f7e:24d7:4f30:cb1a] has joined #mlpack
08:00 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 240 seconds]
08:00 -!- sumedhghaisas [~yaaic@2402:3a80:674:337b:4f7e:24d7:4f30:cb1a] has quit [Read error: Connection reset by peer]
08:01 -!- sumedhghaisas [~yaaic@] has joined #mlpack
08:05 -!- sumedhghaisas2 [~yaaic@2402:3a80:674:337b:4f7e:24d7:4f30:cb1a] has joined #mlpack
08:05 -!- sumedhghaisas2 [~yaaic@2402:3a80:674:337b:4f7e:24d7:4f30:cb1a] has quit [Read error: Connection reset by peer]
08:05 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
08:05 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 256 seconds]
08:28 -!- sumedhghaisas [~yaaic@] has joined #mlpack
08:31 -!- sumedhghaisas3 [~yaaic@] has joined #mlpack
08:32 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 240 seconds]
08:33 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 245 seconds]
08:52 -!- sumedhghaisas3 [~yaaic@] has quit [Remote host closed the connection]
09:12 < zoq> ShikharJ: In case you missed the last message; see
09:53 < jenkins-mlpack> Project docker mlpack nightly build build #343: STILL UNSTABLE in 2 hr 39 min:
11:27 < ShikharJ> zoq: Ah, sure, let's merge. We have been very patient with this, and I guess the time is right :)
11:28 < ShikharJ> rcurtin: If you might want to take a look please feel free and let us know.
11:31 < zoq> gradient: Okay, let me set the timer, so that we conform with the merge policy.
11:33 -!- sumedhghaisas [31f8eb8e@gateway/web/freenode/ip.] has joined #mlpack
11:33 < zoq> ShikharJ: Sorry, wrong name.
11:40 < ShikharJ> zoq: Great :)
11:42 < zoq> ShikharJ: Excited to get this merged.
11:43 < ShikharJ> zoq: I'm also nearing completion on the DCGAN PR, once it is ready to be tested (by the weekend), we can merge that as well, and then focus on other tasks. Thanks for helping me out on this. We've been able to complete this because of you and lozhnikov!
11:44 < zoq> ShikharJ: You did everything :)
11:46 < ShikharJ> zoq: We must also thank Kris for his patience with the work. He had implemented a major portion of the API that has helped us to finish off this work without much delay :)
11:48 < zoq> ShikharJ: Definitely, Kris provided a great basis to work with.
11:51 < sumedhghaisas> Atharva: Hi Atharva
11:52 < sumedhghaisas> zoq: Hi Marcus. How are you? We were going to propose a change to the ANN loss function arch. Thought I will run it by you.
11:54 < zoq> sumedhghais: Hey, sure, if this makes things easier sure.
11:56 < sumedhghaisas> zoq: So currently the loss is defined as the last layer, thus can only define the loss over the previous layer
11:56 < sumedhghaisas> This restricts our architecture, we aren't able to implement losses which are dependent on intermediate layers
11:57 < sumedhghaisas> for example, in a VAE the KL loss is defined over the mean and stddev, which are the output of the Reparametrization layer
11:58 < sumedhghaisas> I mean any kind of regularization cannot be implemented with our arch
11:58 < sumedhghaisas> what I propose is
12:00 < sumedhghaisas> Keeping the loss layer which defines the major loss, and implementing a visitor which collects extra loss from the remaining layers
12:01 < sumedhghaisas> The layer which adds the extra loss will be responsible for adding the corresponding error signals in Backward; as these signals won't affect the layers above, the current Backward arch can be kept intact
12:01 < zoq> In this case we don't have to change the main interface right?
12:01 < sumedhghaisas> Exactly
12:02 < sumedhghaisas> All we do is expect a Loss() function which returns a double to be added to the loss
12:02 < ShikharJ> sumedhghaisas: That sounds good to me.
12:03 < zoq> Sounds like a good idea to me, I thought about a layer which forwards the output, but an extra visitor is cleaner.
12:04 < sumedhghaisas> ShikharJ, zoq: Just wanted to make sure I am not missing any corner case where this will collapse
12:05 < zoq> Nothing comes to mind, at least for now.
12:06 < Atharva> I will go ahead with this then
12:06 < sumedhghaisas> zoq: :) Okay in my understanding the only function where the actual loss is computed is 'Evaluate' right?
12:06 < zoq> inside the FFN and RNN class, right
12:07 < sumedhghaisas> So that will be the only point of change in my view
12:08 < zoq> yeah, if you are going to use the FFN class, that is the part we have to change
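The proposal above can be sketched roughly as follows. This is a standalone illustration, not mlpack's ANN code: `Layer`, `ExtraLossLayer`, and this `Evaluate` are hypothetical names, and a plain virtual `Loss()` stands in for the visitor that would collect the extra terms. The only idea taken from the discussion is that each layer may report a `double` that gets added to the main loss inside `Evaluate`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical layer interface: any layer may contribute an extra loss term.
struct Layer
{
  virtual ~Layer() = default;
  // By default a layer adds nothing to the objective.
  virtual double Loss() const { return 0.0; }
};

// Hypothetical layer that carries an extra term, e.g. a KL term in a VAE.
struct ExtraLossLayer : Layer
{
  explicit ExtraLossLayer(double value) : value(value) {}
  double Loss() const override { return value; }
  double value;
};

// Total objective: main loss from the output layer plus every extra term.
double Evaluate(double mainLoss,
                const std::vector<std::unique_ptr<Layer>>& network)
{
  double total = mainLoss;
  for (const auto& layer : network)
  {
    assert(layer != nullptr);
    total += layer->Loss();
  }
  return total;
}
```

Layers that do not define an extra term fall back to zero, so existing networks would evaluate to the same loss as before.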
12:08 < sumedhghaisas> Atharva: Are you clear about the implementation? This will avoid creating an extra VAE class :)
12:10 < Atharva> So, we just put a VAE together with the FFN class
12:10 < Atharva> But the class had a lot of other functions planned
12:10 < sumedhghaisas> And if time permits, RNN as well :)
12:10 < Atharva> generate functions for example
12:11 < sumedhghaisas> We can convert any architecture to VAE
12:11 < sumedhghaisas> Okay, so let's first implement an FFN extension only and get the gradient test passing
12:31 -!- sumedhghaisas [31f8eb8e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
12:52 < rcurtin> zoq: nah, I like to play mariokart 8 deluxe online
12:53 < rcurtin> it's crazy, I think there are a lot of people out there who practice way too much :)
12:55 < zoq> ahh, nintendo switch
12:57 -!- manish7294 [8ba78cf4@gateway/web/freenode/ip.] has joined #mlpack
12:58 < zoq> Perhaps we could play a community round? Not sure anyone else has a switch?
13:00 < manish7294> rcurtin: I tried debugging the segfault that comes up during divergence. I think that as the coordinates matrix's internal values increase to very large values, the KNN search for impostors fails, leading to the error.
13:46 < rcurtin> zoq: I'd be up for it :)
13:46 < rcurtin> manish7294: hmm, could you print the coordinates matrix? is the segfault coming from KNN?
13:47 < rcurtin> I thought that KNN should still work with very large values, so long as there is no nan or inf (I don't know what happens in that case)
13:53 < manish7294> rcurtin: The error comes when calculating eval in the gradient part, because impostors() outputs garbage values.
13:56 < manish7294> like values can be something similar to 16487695656652.... and then when we query for transformedDataset.col(impostors(j, i)), error pops up.
13:58 < rcurtin> ah, that is not good, the coordinates matrix should never be diverging in that way
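One way to turn the segfault described above into a clear failure is to validate values before they are used as column indices. The sketch below uses only the standard library. `AllFinite` and `CheckImpostorIndex` are hypothetical helpers, not existing mlpack functions, and where such checks would go in the LMNN code is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// True if every entry is a finite number (no NaN, no +/-inf).
bool AllFinite(const std::vector<double>& values)
{
  for (const double v : values)
    if (!std::isfinite(v))
      return false;
  return true;
}

// Throws instead of letting an out-of-range index reach a column lookup.
void CheckImpostorIndex(std::size_t index, std::size_t numPoints)
{
  assert(numPoints > 0);
  if (index >= numPoints)
    throw std::out_of_range("impostor index outside the dataset");
}
```

A garbage index such as the very large values mentioned above would then raise `std::out_of_range` rather than reading past the end of the matrix.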
13:59 < manish7294> rcurtin: This happens when the step size is comparatively large; I mentioned this earlier in the PR.
14:00 < rcurtin> it's a little hard for me to keep track, there are several different issues being debugged
14:00 < rcurtin> if the step size is too large, indeed it will bounce around to extremely large values
14:00 < rcurtin> what is the step size being used? I think this is the covertype dataset?
14:00 < rcurtin> also, your comment from earlier:
14:01 < manish7294> step size is 1 here
14:01 < rcurtin> 03:51 < manish7294> leaving a comment here regarding comparison of sgd and lbfgs w.r.t above batch optimization : sgd computing neighbors time - 3 mins 58.8 secs, lbfgs - 4 hrs, 17 mins, 5.7 secs
14:01 < rcurtin> the SGD run took 6 hours overall and LBFGS took 13 hours overall, right?
14:01 < manish7294> yes
14:01 < rcurtin> ah, step size 1 is almost always going to be way way way too large. usually 0.01 or 0.001 or even smaller is closer to the right choice
14:02 < manish7294> but with covertype a step size of 1e-06 or greater leads to same
14:03 < rcurtin> ah, hm. I wonder if we need to add some regularization or something, but let's not worry too much about that for now
14:03 < rcurtin> if SGD is only spending a total of 4 minutes computing neighbors, then definitely the main bottleneck now is somewhere else; do you know what part is slow?
14:04 < manish7294> I think recalculating the gradient due to neighbors every time can be a reason, but I can't say whether it is significant
14:05 < rcurtin> you can do some high-level profiling by adding 'Timer::Start()' and 'Timer::Stop()' calls throughout the code
14:05 < rcurtin> (or you could use a profiler like gprof or perf or something like this, but for high-level ideas probably using Timer is the easiest way)
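The kind of high-level timing suggested here can be illustrated with a small standalone helper. `NamedTimers` below is a hypothetical stand-in built on `std::chrono`, not mlpack's `Timer` class; it only mirrors the idea of bracketing a region with start and stop calls under a name and accumulating the elapsed time.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Minimal stand-in for a named-timer utility (not mlpack's Timer).
class NamedTimers
{
 public:
  using Clock = std::chrono::steady_clock;

  // Record the start time for a named region.
  void Start(const std::string& name) { starts[name] = Clock::now(); }

  // Add the time since the matching Start() to the region's total.
  void Stop(const std::string& name)
  {
    const auto it = starts.find(name);
    assert(it != starts.end());  // Stop() needs a matching Start().
    totals[name] +=
        std::chrono::duration<double>(Clock::now() - it->second).count();
    starts.erase(it);
  }

  // Accumulated seconds for a region; 0 if it was never timed.
  double Seconds(const std::string& name) const
  {
    const auto it = totals.find(name);
    return it == totals.end() ? 0.0 : it->second;
  }

 private:
  std::map<std::string, Clock::time_point> starts;
  std::map<std::string, double> totals;
};
```

Wrapping the neighbor search and the gradient computation in separate named regions would show which one dominates the run time.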
14:06 < manish7294> I think we should do that, it will definitely help
14:06 < manish7294> and can we do something for that divergence thing like throwing an error or something
14:07 < rcurtin> we could, we would have to catch the condition though
14:07 < rcurtin> I'd like to try and reproduce that, so let me check out the code and see if I can get it to happen
14:07 < manish7294> good enough
14:07 < rcurtin> I really don't think it would be a bad idea to add a penalty term like -|| L || to the objective
14:08 < rcurtin> or something like this, it should help keep the entries of the matrix from diverging
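A penalty of this kind can be sketched as follows. The term is written as `-|| L ||` above; the sketch assumes the objective is being minimized, so the penalty is added with a positive weight, and it uses the squared Frobenius norm because its gradient is simple. Both choices, along with the names `PenalizedObjective`, `AddPenaltyGradient`, and `lambda`, are assumptions for illustration and not what the LMNN code does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// objective + lambda * ||L||_F^2, with L stored as a flat list of entries.
double PenalizedObjective(double objective,
                          const std::vector<double>& entries,
                          double lambda)
{
  double sumSquares = 0.0;
  for (const double e : entries)
    sumSquares += e * e;
  return objective + lambda * sumSquares;
}

// Adds the penalty's gradient, 2 * lambda * L, to an existing gradient.
void AddPenaltyGradient(std::vector<double>& gradient,
                        const std::vector<double>& entries,
                        double lambda)
{
  assert(gradient.size() == entries.size());
  for (std::size_t i = 0; i < entries.size(); ++i)
    gradient[i] += 2.0 * lambda * entries[i];
}
```

The larger the entries of the matrix grow, the more the added gradient pulls them back toward zero, which is the damping effect described above.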
14:08 < rcurtin> in any case, let me reproduce it and see
14:08 < manish7294> sure, I shall be doing the timings then.
14:09 < rcurtin> yeah; if we have the computing neighbors down to 4 minutes with covertype and SGD, this is definitely a great start, and if we can reduce the other part of the computation similarly I think the implementation will be fast
14:09 < rcurtin> still a little work to do for L-BFGS, but I think there are still lots of ideas we can do
14:10 < rcurtin> I guess we should benchmark with other implementations at some point, but at the very least that MATLAB implementation will never work with covertype... since it builds the entire n x n distance matrix, we could only compare against a dataset of roughly 6-8k points or less
14:10 < manish7294> Right, and I think shogun's is roughly based on the same
14:12 < manish7294> Ahh! grammatical mistake :)
14:15 < rcurtin> huh, I'm not sure I noticed any grammatical mistake
14:19 < manish7294> shogun's is
14:19 < rcurtin> hmm, I guess technically the implication is that you mean "shogun's implementation"
14:20 < manish7294> right :)
14:20 < rcurtin> which would work as "shogun's implementation is". I guess I am not sure whether leaving the implementation out makes it grammatically incorrect
14:20 < rcurtin> it seems like a pretty small issue either way :)
14:21 < manish7294> leaving implementation will be like shogun is is or shogun has is :)
14:21 < rcurtin> I guess we could go to the grammar stack exchange
14:21 < rcurtin> but there sure is a lot of pedantry in that forum :)
14:22 < manish7294> let's add this to legendary list of issues to deal with for now, haha :)
14:23 < rcurtin> hah, sounds good :)
14:23 < rcurtin> if you like, maybe it might be worthwhile to add a checklist to the LMNN PR of issues to look into, but that is up to you
14:23 < rcurtin> but it might be useful to have some way to track the multiple threads of discussion
14:23 < manish7294> haha
14:24 < manish7294> maybe a comment at the end with some eye catching material will do
14:26 < rcurtin> that works also, however you want to do it. really in the end all I'm looking for is that we can do LMNN on pretty large datasets and it works reasonably well
14:26 < rcurtin> if all the implementations are built like the MATLAB one, then yours will already be able to scale much further than anything else, but I think we can still make it faster :)
14:28 < rcurtin> I think we are still roughly on track with your timeline, you had written that you planned to be fully done with LMNN by 6/15
14:28 < rcurtin> actually, I guess you have time for LMNN and benchmarking or writing the lmnn_main until July 1st
14:29 < manish7294> The deadline is closing in, I will need to hurry on the optimization part
14:29 < rcurtin> I do think that all the accelerations we do for LMNN will apply to BoostMetric also, which will be nice
14:30 < manish7294> Yes, doing the boostmetric part will be a lot easier :)
14:30 < manish7294> thanks to all LMNN efforts
16:03 -!- manish7294 [8ba78cf4@gateway/web/freenode/ip.] has quit [Quit: Page closed]
16:30 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
16:32 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
20:01 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
20:33 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
21:01 -!- travis-ci [] has joined #mlpack
21:01 < travis-ci> mlpack/mlpack#5029 (master - 4c008c4 : Ryan Curtin): The build passed.
21:01 < travis-ci> Change view :
21:01 < travis-ci> Build details :
21:01 -!- travis-ci [] has left #mlpack []
21:03 < rcurtin> manish7294: I had time to do some quick experimentation. I built mlpack_lmnn and inserted a number of timers, then ran with SGD with a learning rate of 1e-8 and a batch size of 512 on a subset of 5k points from the covertype dataset
21:03 < rcurtin> this gave reasonable results, but I found that the whole run took 21.8 seconds; of this time, 21.46 was spent in the outer lmnn_sgd_optimization timer, and 20.5 of that was spent in Constraints::Impostors()
21:04 < rcurtin> but the KNN search timer ("computing_neighbors" and "tree_building" will be the parts timed from KNN) only took 3.3 seconds and 0.13 seconds, respectively
21:04 < rcurtin> so it seems like there must be some big inefficiency in the other parts of Impostors()
21:16 < rcurtin> manish7294: I suspect the inefficiency is in the fact that arma::find() and arma::unique() are being called every time Impostors() is called
21:17 < rcurtin> I think you can accelerate things by caching those calculations at the start of the optimization
21:32 -!- travis-ci [] has joined #mlpack
21:32 < travis-ci> mlpack/mlpack#5030 (master - 6a59dd5 : Ryan Curtin): The build passed.
21:32 < travis-ci> Change view :
21:32 < travis-ci> Build details :
21:32 -!- travis-ci [] has left #mlpack []
--- Log closed Sat Jun 09 00:00:55 2018