mlpack IRC logs, 2018-06-07

Logs for the day 2018-06-07 (starts at 0:00 UTC) are shown below.

--- Log opened Thu Jun 07 00:00:53 2018
02:11 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
02:19 -!- manish7294 [8ba7e95e@gateway/web/freenode/ip.] has joined #mlpack
02:23 < manish7294> rcurtin: Finally results on covertype are out. K = 10, Initial Accuracy - 96.9761, final Accuracy - 97.3618, Total Time - 13hrs, 23mins, 1.7secs, Optimizer - L-BFGS
02:35 -!- manish7294 [8ba7e95e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
02:41 < rcurtin> manish7294: results seem decent, but that took a really long time. did you compile with -DDEBUG=OFF?
02:41 < rcurtin> also do you know how many iterations of L-BFGS were used?
02:45 < rcurtin> I was writing up my theory today for pruning but something was wrong, the results did not make sense
02:46 < rcurtin> so I have some error I need to fix, I will try tomorrow
02:47 -!- manish7294 [~yaaic@2409:4053:688:8841:5987:4bef:b0f9:6eda] has joined #mlpack
02:47 -!- manish7294_ [8ba7e95e@gateway/web/freenode/ip.] has joined #mlpack
02:50 < manish7294_> rcurtin: I can tell you the number of iterations only if I could scroll back the log. Done many things to make this happen but I am not getting why I can't scroll the log. :(
02:50 < manish7294_> I have used -DDEBUG=OFF here
02:50 < rcurtin> are you using screen?
02:51 < rcurtin> should be ctrl+a + esc then scroll
02:51 < manish7294_> with screen too, I am not getting
02:51 < manish7294_> let me try once more
02:53 < manish7294_> No, It didn't work
02:54 < rcurtin> ok
02:54 < rcurtin> that's strange
02:54 < rcurtin> the scrollback buffer should still be there
02:55 < rcurtin> sorry for the slightly slow responses... I am playing mariokart online
02:55 < manish7294_> and total computing neighbors time was 6 hrs 15 mins 58.1 secs
02:55 < rcurtin> so I respond while waiting for the next race :)
02:55 < manish7294_> great, should be having fun
02:55 < rcurtin> that sounds about right. if that was on a benchmarking system, each search should take roughly 30 sexs
02:55 < rcurtin> secs
02:55 < rcurtin> excuse me
02:55 < manish7294_> :)
02:59 < rcurtin> I had to correct the typo so I missed the start :)
02:59 < rcurtin> another thing you could try is, e.g., only doing the NN search for impostors every 100 iterations or something like this
03:01 < manish7294_> Now, I am able to scroll back 1950 lines but still can't touch last iteration log as I used --verbose
03:02 < manish7294_> How can we keep count of iterations inside the LMNNfunction to make above happen?
03:03 < rcurtin> ah, sorry, I forgot L-BFGS only prints the iteration number in debug mode
03:03 < rcurtin> so there will be no output
03:03 < rcurtin> maybe we should change the optimizer to print in verbose mode too
03:05 < manish7294_> Temporarily we can just cout
03:06 < manish7294_> Or should we be making a permanent change?
03:06 < rcurtin> I don't have a particular preference, I think it is fine as is
03:06 < rcurtin> but if you want to change it I am fine with that also
03:07 < manish7294_> okay so I will make a temporary cout for myself
03:08 < manish7294_> Please check this 'How can we keep count of iterations inside the LMNNfunction to make above happen?'
03:11 < rcurtin> you could just have a size_t that you increment each time Evaluate() is called
03:11 < rcurtin> that wouldn't be perfect since Evaluate() may be called more than once per iteration
03:11 < rcurtin> but it can still be helpful for reducing the cost of the LMNNFunction evaluations
03:12 < manish7294_> Sure, Let's try it out.
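A minimal sketch of the counter idea combined with the earlier suggestion of only redoing the impostor search every so often. Everything here is hypothetical: the class name, the `interval` parameter, the placeholder objective, and the stub that stands in for the nearest-neighbor search are illustrative and are not mlpack's LMNNFunction API. As noted above, the counter tracks Evaluate() calls, not L-BFGS iterations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical objective wrapper (not mlpack's LMNNFunction): counts Evaluate()
// calls and only refreshes the cached impostor set every `interval` calls.
class CachedImpostorFunction
{
 public:
  explicit CachedImpostorFunction(const std::size_t interval) :
      interval(interval), evaluations(0), recomputations(0) { }

  double Evaluate(const std::vector<double>& coordinates)
  {
    // Evaluate() may run several times per optimizer iteration (line search),
    // so this counts evaluations rather than iterations.
    if (evaluations % interval == 0)
      RecomputeImpostors(coordinates);
    ++evaluations;

    // Placeholder objective; a real LMNN objective would use cached impostors.
    double sum = 0.0;
    for (const double c : coordinates)
      sum += c * c;
    return sum;
  }

  std::size_t Evaluations() const { return evaluations; }
  std::size_t Recomputations() const { return recomputations; }

 private:
  // Stand-in for the expensive nearest-neighbor search for impostors.
  void RecomputeImpostors(const std::vector<double>& /* coordinates */)
  {
    ++recomputations;
  }

  std::size_t interval;
  std::size_t evaluations;
  std::size_t recomputations;
};
```

With an interval of 100, 250 calls to Evaluate() trigger the expensive search only three times (at calls 0, 100 and 200).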
03:30 < rcurtin> ok, time for bed now---talk to you tomorrow!
03:30 < manish7294_> have a good night :)
06:43 < ShikharJ> zoq: It seems that our implementation converges within 10 epochs on the 10,000 image dataset. I found no practical difference (at least not visually) between the 10 and 20 epoch results.
06:45 < ShikharJ> zoq: Might be because we're taking a big generator multiplier (about 10). I'll investigate for different multiplier steps as well.
08:11 -!- manish7294_ [8ba7e95e@gateway/web/freenode/ip.] has quit [Quit: Page closed]
08:53 < zoq> ShikharJ: worth testing out
09:08 -!- manish7294 [~yaaic@2409:4053:688:8841:5987:4bef:b0f9:6eda] has quit [Ping timeout: 265 seconds]
09:51 < jenkins-mlpack> Project docker mlpack nightly build build #342: UNSTABLE in 2 hr 37 min:
10:10 < zoq> ShikharJ: Do you think we could save the weights after a couple of iterations? It might be a neat way to visualize the optimization process afterwards?
10:14 < ShikharJ> zoq: Sure, I'll see what I can do.
10:21 -!- sumedhghaisas [~yaaic@2402:3a80:656:561b:e8b9:53fd:8e0c:42dc] has joined #mlpack
10:25 -!- dasayan05 [cb6ef0c8@gateway/web/freenode/ip.] has joined #mlpack
10:26 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
10:26 -!- dasayan05 [cb6ef0c8@gateway/web/freenode/ip.] has left #mlpack []
10:28 -!- sumedhghaisas_ [31f8eb8e@gateway/web/freenode/ip.] has joined #mlpack
10:28 < sumedhghaisas_> Atharva: Hi Atharva
10:28 -!- sumedhghaisas [~yaaic@2402:3a80:656:561b:e8b9:53fd:8e0c:42dc] has quit [Ping timeout: 276 seconds]
10:30 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 240 seconds]
11:26 < Atharva> sumedhghaias_: Hi Sumedh
11:30 < Atharva> The numerical derivative test is failing, I think we have the derivatives wrong.
11:43 < sumedhghaisas_> Hi Atharva
11:43 < sumedhghaisas_> @atharva
11:44 < Atharva> Hey
11:44 < sumedhghaisas_> *@Atharva
11:45 < sumedhghaisas_> Have you fixed the input thing to SoftplusFunction::Deriv
11:45 < Atharva> Yes
11:46 < sumedhghaisas_> ohh okay. So the only remaining problem is adding the KL loss to the total loss?
11:46 < Atharva> Yeah, but the gradient check is failing
11:46 < sumedhghaisas_> As a preliminary test, try the gradient test without adding klBackward to Backward
11:47 < sumedhghaisas_> The gradient test depends on the total loss
11:47 < Atharva> I am trying that
11:47 < sumedhghaisas_> Sure thing... let me know whether that works or not
11:47 < Atharva> Oh, but that will only be true for a VAE network, right/
11:47 < Atharva> ?
11:48 < Atharva> I am just trying the gradient test with a simple network with a linear and repar layer
11:48 < sumedhghaisas_> No... Remember we were discussing this before the project started?
11:48 < sumedhghaisas_> we somehow need to be able to add the KL loss to the overall loss
11:49 < sumedhghaisas_> And I think this is super useful also for the future, as with this functionality we will be able to add regularization kind of tricks to the layer
11:49 < sumedhghaisas_> think about it... KL is just another regularization
11:49 < Atharva> Yeah, but only if the network is VAE, right? In this case, aren't we just using a sampling layer to train a simple neural network?
11:50 < sumedhghaisas_> Use linear + repar + linear to test the gradients
11:50 < sumedhghaisas_> Not just VAE, any kind of NN
11:50 < sumedhghaisas_> we haven't made any VAE specific changes, have we?
11:51 < Atharva> Okay, will do that
11:51 < sumedhghaisas_> The Repar layer is just a layer with extra regularization
11:51 < sumedhghaisas_> that is KL
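For reference, under the usual VAE assumptions (a standard normal prior, and a diagonal Gaussian produced by the layer with mean \mu_j and standard deviation \sigma_j in dimension j of d), the KL regularizer has the textbook closed form below, and "adding KL to the overall loss" would mean a sum like the first expression. This is the standard formula, not a statement of what the Repar layer in the PR actually computes; the partial derivatives are the "error signal from KL" that Backward() would add.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{recon}} + D_{\mathrm{KL}},
\qquad
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
= -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right),
\qquad
\frac{\partial D_{\mathrm{KL}}}{\partial \mu_j} = \mu_j,
\quad
\frac{\partial D_{\mathrm{KL}}}{\partial \sigma_j} = \sigma_j - \frac{1}{\sigma_j}
```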
11:51 < Atharva> I will also add the KL loss to the backward function
11:51 < sumedhghaisas_> Add wait...
11:51 < sumedhghaisas_> test first then add...
11:52 < sumedhghaisas_> the gradient test has to pass before adding KL error
11:52 < sumedhghaisas_> Is the procedure clear to you? If you have any doubts let me know
11:53 < Atharva> Yeah. and after adding KL as well, right?
11:53 < sumedhghaisas_> After adding KL error the gradient test should indeed fail
11:53 < sumedhghaisas_> Have you looked at how gradient test works?
11:53 < Atharva> Yes
11:54 < Atharva> I am a little confused as to why it will fail
11:54 < sumedhghaisas_> its delta(Loss) / delta(parameters)
11:54 < sumedhghaisas_> we estimate this numerically
11:55 < sumedhghaisas_> there is one little flaw in our architecture... currently the loss is only 'reconstruction'
11:56 < sumedhghaisas_> so delta(Loss) would be wrong numerically when we also consider the error signal from KL
11:56 < sumedhghaisas_> if we add the error signal from KL, we need to add KL loss to the loss function
11:56 < sumedhghaisas_> which we aren't able to do right now
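A minimal sketch of why the check breaks, with hypothetical names throughout (NumericalGradient, RelativeError, and toy quadratic functions standing in for the reconstruction loss and the KL term): the numerical side only ever sees the loss function, so if Backward() adds a regularizer's gradient that the loss does not include, the analytic and numerical gradients disagree. The relative error used here is one common way to compare them; it is not claimed to be exactly what mlpack's CheckGradient() computes.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Central-difference estimate of d(loss) / d(params), one parameter at a time.
Vec NumericalGradient(const std::function<double(const Vec&)>& loss,
                      Vec params,
                      const double eps = 1e-6)
{
  Vec grad(params.size());
  for (std::size_t i = 0; i < params.size(); ++i)
  {
    const double original = params[i];
    params[i] = original + eps;
    const double plus = loss(params);
    params[i] = original - eps;
    const double minus = loss(params);
    params[i] = original;
    grad[i] = (plus - minus) / (2.0 * eps);
  }
  return grad;
}

// ||analytic - numeric|| / ||analytic + numeric||; small when they agree.
double RelativeError(const Vec& analytic, const Vec& numeric)
{
  double diff = 0.0, sum = 0.0;
  for (std::size_t i = 0; i < analytic.size(); ++i)
  {
    diff += (analytic[i] - numeric[i]) * (analytic[i] - numeric[i]);
    sum += (analytic[i] + numeric[i]) * (analytic[i] + numeric[i]);
  }
  return std::sqrt(diff) / std::sqrt(sum);
}
```

With a toy "reconstruction" loss sum((w_i - 1)^2) and a toy regularizer sum(w_i^2 / 2), the check passes for the reconstruction-only gradient, fails once the regularizer's gradient is added on the analytic side only, and passes again once the regularizer is also added to the loss.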
11:56 < Atharva> Understood
11:57 < sumedhghaisas_> So if the gradient test passes without KL error, our gradients are correct except for KL
11:57 < sumedhghaisas_> Lets see if thats the case
11:58 < sumedhghaisas_> Then lets decide how to add KL loss to the overall loss
11:58 < Atharva> Another thing, do you think it will be better to give a boolean parameter while constructing the repar layer whether the user wants to use KL or not, just for that extra functionality
11:58 < Atharva> We could make two cases in the backward function then
11:58 < sumedhghaisas_> umm I am not sure if that will be helpful
11:58 < sumedhghaisas_> repar without KL is just like auto encoders
11:59 < sumedhghaisas_> user can add a bottleneck layer to achieve the same performance
11:59 < Atharva> But there is no random sampling in autoencoders, repar will still have the random sampling
12:00 < sumedhghaisas_> yes... but there is no loss term to tell the layer to control the distribution
12:00 < sumedhghaisas_> it can overfit to every point it sees
12:00 < sumedhghaisas_> the problem observed in autoencoders
12:00 < Atharva> Ohhh yes, it will overfit like crazy
12:00 < sumedhghaisas_> indeed
12:01 < sumedhghaisas_> is the test passing?
12:02 < Atharva> Wait, i will give it a go
12:05 < Atharva> To take a break from this, I started with the VAE class yesterday
12:05 < Atharva> It's failing, very badly : critical check CheckGradient(function) <= 1e-4 failed [0.99999998701936665 > 0.0001]
12:06 < Atharva> linear
12:06 < Atharva> repar
12:06 < Atharva> linear
12:31 -!- sumedhghaisas_ [31f8eb8e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
12:58 -!- sumedhghaisas [31f8eb8e@gateway/web/freenode/ip.] has joined #mlpack
12:58 < sumedhghaisas> Atharva: hmmm
12:58 < sumedhghaisas> Lets check the gradients again then
12:58 < sumedhghaisas> The code online is the one you are trying?
12:59 < Atharva> Yeah, with the latest changes you suggested
13:00 < sumedhghaisas> Atharva: the PR code is the latest?
13:01 < sumedhghaisas> but I still see the SoftplusFunction::Deriv issue in that code
13:01 < Atharva> Yeah I made those changes
13:01 < Atharva> haven't committed them yet
13:01 < sumedhghaisas> Okay. Make sure the code you are running is the latest
13:02 < sumedhghaisas> Have you tried debugging where the error is?
13:04 < Atharva> I am trying to calculate the gradients again, do you think the error could be elsewhere?
13:05 < sumedhghaisas> I am not sure. I need to look at the new code you are running
13:06 < Atharva> Okay, I will commit it
13:12 < Atharva> pushed
13:29 < sumedhghaisas> Atharva: Yes I saw. I gave it a very quick look but I have to complete some other work.
13:31 < sumedhghaisas> Try to isolate the error by only using mean or only using stddev
13:31 < sumedhghaisas> this way you would know where the error lies
13:31 < Atharva> Sure, I will try and debug it
13:31 < sumedhghaisas> also check for size consistency in the network, make sure every layer is getting the correct size input
13:32 < Atharva> Okay
13:33 < Atharva> So, I think today's sync isn't necessary now
13:33 < Atharva> I will try and debug the code
13:35 < sumedhghaisas> Atharva: Ahh wait... I see the problem
13:36 < sumedhghaisas> We need to approximate the gradient with constant gaussian sample
13:36 < sumedhghaisas> or the stochasticity in the sample will disturb the computation
13:37 < Atharva> Ohkay
13:38 < sumedhghaisas> Atharva: Okay I suggest adding a boolean to the layer, stochastic=True
13:38 < sumedhghaisas> if user passes false, always assign a constant value to the sample
13:39 < sumedhghaisas> This will help only in testing I guess
13:39 < sumedhghaisas> But its important
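A minimal sketch of the suggested `stochastic` switch, assuming the layer computes z = mean + stddev * eps. The function name, its parameters, and the constant 0.5 used when sampling is off are all hypothetical; the point is only that with a fixed eps, repeated forward passes (as a numerical gradient check performs) see the same sample.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical reparametrization forward pass: z = mean + stddev * eps.
// With stochastic == false, eps is a fixed constant, so every call with the
// same inputs returns the same z (useful only for gradient testing).
std::vector<double> ReparForward(const std::vector<double>& mean,
                                 const std::vector<double>& stddev,
                                 const bool stochastic,
                                 std::mt19937& rng,
                                 const double constantEps = 0.5)
{
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> z(mean.size());
  for (std::size_t i = 0; i < mean.size(); ++i)
  {
    const double eps = stochastic ? normal(rng) : constantEps;
    z[i] = mean[i] + stddev[i] * eps;
  }
  return z;
}
```

In normal training the flag would stay true; a gradient test would construct the layer with it set to false.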
13:40 < Atharva> Yeah, but when we don't set a seed, it's always a constant sample
13:40 < sumedhghaisas> umm... I don't think so.
13:41 < sumedhghaisas> When the seed is the same, the random number sequence is the same, but each random number in it is not the same
13:41 < sumedhghaisas> so if I run the program again and again, the same random numbers will be generated
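The seed behaviour described above can be checked with the C++ standard library's generators, used here only as a stand-in for whatever RNG the layer uses (DrawSamples is a hypothetical helper). Two "runs" seeded identically produce identical sequences, but successive draws within one run differ, which is why a check that calls Forward() repeatedly sees a different sample each time unless the sample is fixed.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n standard normal samples from a generator seeded with `seed`.
// Each call simulates one fresh run of a program with that seed.
std::vector<double> DrawSamples(const unsigned int seed, const std::size_t n)
{
  std::mt19937 rng(seed);
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> samples(n);
  for (double& s : samples)
    s = normal(rng);
  return samples;
}
```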
13:42 < Atharva> Okay, I will do this
14:30 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:37 < sumedhghaisas> rcurtin: Hey Ryan
14:38 < sumedhghaisas> Do you think we should have a NormalDistribution to support matrix of univariate gaussain distributions?
14:38 < sumedhghaisas> *gaussian
14:39 < rcurtin> sumedhghaisas: that would just be a GaussianDistribution with a diagonal covariance matrix, right?
14:40 < sumedhghaisas> rcurtin: it can be done that way. But the problem occurs when there is a batch of distributions
14:40 < sumedhghaisas> like in VAE
14:41 < rcurtin> hmm, so I read some VAE papers but it was like 3 years ago and I think I forgot everything
14:41 < sumedhghaisas> for example
14:41 < rcurtin> so assume that I don't know much :)
14:42 < sumedhghaisas> Okay so let me describe the problme
14:42 < sumedhghaisas> *problem
14:43 < sumedhghaisas> So we pass a batch of points through encoder which converts each point to a gaussian distribution
14:44 < sumedhghaisas> so we have a batch of gaussian distributions, where each distribution has a fixed size
14:44 < sumedhghaisas> and each variable is independent of each other in the distribution
14:46 < sumedhghaisas> This will be too hard to be represented by our current setup. :(
14:46 < rcurtin> hmm, it seems to me like you could use the current GaussianDistribution class, and you would initialize the mean with the vector of means, and the covariance with diag(variances)
14:47 < rcurtin> however one problem with that is that the covariance will take d x d memory, but really since the covariance is diagonal, we should only need 'd' elements
14:47 < sumedhghaisas> yes... and the batch will make it worse
14:47 < sumedhghaisas> rather than b * d memory
14:47 < rcurtin> right, I guess d will be equal to the batch size?
14:48 < sumedhghaisas> it will take b * d * d
14:48 < rcurtin> or wait would it be b*d*d
14:48 < rcurtin> right
14:48 < rcurtin> I see
14:48 < rcurtin> hmm, so a couple of ideas spring to mind. you could write a new class that is made for multivariate gaussian distributions but is specific to diagonal covariances
14:49 < rcurtin> you could also templatize the existing GaussianDistribution class so that it takes whether or not the covariance is diagonal as a parameter, but I think maybe that is a little bit confusing
14:49 < rcurtin> or you could just work with the matrix of means and variances directly in the VAE classes
14:49 < rcurtin> I think any of those could be fine, but I agree, the existing GaussianDistribution would not work for this
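A minimal sketch of the "matrix of means and variances" option, assuming a batch of b diagonal Gaussians over d dimensions. The struct name and layout are hypothetical and are not mlpack's GaussianDistribution or any eventual NormalDistribution. Storing means and variances as d x b column-major arrays costs 2 * b * d values instead of b * d * d for full covariances, and because the dimensions are independent the log-density is a sum of univariate normal log-densities.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical batch of diagonal Gaussians: b distributions over d dimensions.
struct BatchDiagonalNormal
{
  std::size_t d, b;
  std::vector<double> mean;      // d x b column-major: mean[j + i * d].
  std::vector<double> variance;  // Same layout as mean.

  // Log-density of point x (length d) under distribution i.
  double LogProbability(const std::vector<double>& x, const std::size_t i) const
  {
    const double log2Pi = std::log(2.0 * std::acos(-1.0));
    double logp = 0.0;
    for (std::size_t j = 0; j < d; ++j)
    {
      const double m = mean[j + i * d];
      const double v = variance[j + i * d];
      const double diff = x[j] - m;
      logp += -0.5 * (log2Pi + std::log(v) + diff * diff / v);
    }
    return logp;
  }
};
```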
14:49 < sumedhghaisas> huh... templatizing it would also work I guess
14:51 < rcurtin> right, I guess it would be template<bool DiagonalCovariance> class GaussianDistribution
14:51 < rcurtin> but I don't know if that makes it too complex
14:51 < rcurtin> I guess you could use using declarations to make it simpler again...
14:51 < rcurtin> template<bool DiagonalCovariance> class BaseGaussianDistribution;
14:52 < rcurtin> using BaseGaussianDistribution<false> = GaussianDistribution;
14:52 < rcurtin> using BaseGaussianDistribution<true> = DiagonalGaussianDistribution; // or some other name, I don't know if that is a good one
14:52 < rcurtin> anyway, that is just one possible idea
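For the templatized option, note that a C++ alias declaration puts the new name first: `using NewName = Template<args>;`. A minimal sketch with a hypothetical base template (the class body is illustrative only; it just reports how many covariance entries each variant would store):

```cpp
#include <cstddef>

// Hypothetical base template; DiagonalCovariance selects the storage scheme.
template<bool DiagonalCovariance>
class BaseGaussianDistribution
{
 public:
  static constexpr bool IsDiagonal = DiagonalCovariance;

  // Covariance entries stored for a d-dimensional distribution:
  // d for a diagonal covariance, d * d for a full one.
  static std::size_t CovarianceEntries(const std::size_t d)
  {
    return DiagonalCovariance ? d : d * d;
  }
};

// Alias declarations: the alias name comes first.
using GaussianDistribution = BaseGaussianDistribution<false>;
using DiagonalGaussianDistribution = BaseGaussianDistribution<true>;
```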
14:54 < sumedhghaisas> I agree... soundsconfusing
14:54 < sumedhghaisas> *sounds confusing
14:55 < sumedhghaisas> Naming it NormalDistribution will is confusing?
14:56 < sumedhghaisas> *be
14:57 < rcurtin> I think it may be confusing, but comments in the class description should be sufficient to clarify for users
14:57 < rcurtin> I can't think of too many other names that are not way too long
14:57 < rcurtin> GaussianDistributionExceptTheCovarianceIsDiagonal :)
14:57 < sumedhghaisas> haha :P
14:58 < sumedhghaisas> okay another issue
14:59 < sumedhghaisas> For this, let us continue with NormalDistribution with extensive documentation?
14:59 < sumedhghaisas> The name is also consistent with other libraries
15:02 < sumedhghaisas> I prefer treating it as a matrix of normal distributions rather than a batch of gaussian distributions where each distribution has diagonal covariance. What do you think?
15:03 < sumedhghaisas> The other issue is regarding FFN and RNN arch
15:11 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
15:15 -!- sumedhghaisas [31f8eb8e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
15:20 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 260 seconds]
15:22 -!- travis-ci [] has joined #mlpack
15:22 < travis-ci> mlpack/mlpack#5024 (master - 1917b1a : Ryan Curtin): The build has errored.
15:22 < travis-ci> Change view :
15:22 < travis-ci> Build details :
15:22 -!- travis-ci [] has left #mlpack []
15:24 -!- sumedhghaisas2 [~yaaic@2402:3a80:651:1a49:6c10:d45f:db49:283] has joined #mlpack
15:27 < rcurtin> sorry, I had a meeting, but it is done now
15:27 < rcurtin> I think NormalDistribution is fine if that's what you'd like to do
15:27 < rcurtin> what's the FNN/RNN architecture issue?
15:28 -!- sumedhghaisas [~yaaic@2402:3a80:650:f03e:745b:a6e3:838a:be84] has joined #mlpack
15:29 -!- sumedhghaisas2 [~yaaic@2402:3a80:651:1a49:6c10:d45f:db49:283] has quit [Ping timeout: 276 seconds]
15:33 -!- sumedhghaisas [~yaaic@2402:3a80:650:f03e:745b:a6e3:838a:be84] has quit [Ping timeout: 276 seconds]
15:35 -!- sumedhghaisas [~yaaic@2402:3a80:671:7487:3765:e178:ada5:4385] has joined #mlpack
16:51 -!- sumedhghaisas2 [~yaaic@2402:3a80:67b:d269:66af:bef6:7509:7d2] has joined #mlpack
16:51 -!- sumedhghaisas [~yaaic@2402:3a80:671:7487:3765:e178:ada5:4385] has quit [Ping timeout: 276 seconds]
16:58 -!- sumedhghaisas2 [~yaaic@2402:3a80:67b:d269:66af:bef6:7509:7d2] has quit [Ping timeout: 240 seconds]
16:59 -!- sumedhghaisas [~yaaic@2402:3a80:677:8167:4e6d:6e3d:a65:52b6] has joined #mlpack
17:14 -!- sumedhghaisas2 [~yaaic@2402:3a80:64c:a7e1:f8d7:bb67:aa58:9c16] has joined #mlpack
17:15 -!- sumedhghaisas [~yaaic@2402:3a80:677:8167:4e6d:6e3d:a65:52b6] has quit [Ping timeout: 260 seconds]
17:50 -!- haritha1313 [2ff7e0d8@gateway/web/freenode/ip.] has joined #mlpack
17:51 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 245 seconds]
17:53 < haritha1313> rcurtin: Hi, i had a doubt in armadillo. Thought you might be able to help me out. Is there any function I can use to compare multiple values simultaneously. For e.g if i want to get all rows which have values 3 in column 1 and 4 in column 2, without using find() in a loop.
17:53 < haritha1313> Something like checking for a pair
18:17 -!- sumedhghaisas [~yaaic@2402:3a80:65f:83c8:a479:c3ec:c2d7:b101] has joined #mlpack
18:19 -!- sumedhghaisas2 [~yaaic@2402:3a80:64c:a7e1:f8d7:bb67:aa58:9c16] has quit [Ping timeout: 255 seconds]
18:20 < rcurtin> haritha1313: I see what you mean, I can't think of an immediate function for that
18:20 < rcurtin> but, I wonder if you could use a clever lambda in .transform() or something like this
18:20 < rcurtin> I am not sure if you could change the size of the matrix during such a call, though
18:21 < rcurtin> I don't think it would be efficient, but you could use something like sum(a == b) where a is the matrix you're interested in, and b is a matrix with 3s in column 1, 4s in column 2, and nans everywhere else
18:22 < haritha1313> Right now I am using find() for first value, and then using the returned indices to use any() for second value. It seemed to be a bit slow.
18:22 < rcurtin> given the complexity it may be better to just write a for loop over each column
18:22 < rcurtin> er, rather, loop over each row (although since Armadillo is column major it is faster to iterate over columns)
18:23 < rcurtin> (so maybe it is worth transposing the matrix)
18:24 < haritha1313> Will nested loop have lesser complexity than find(), any()?
18:25 < rcurtin> possibly; find() will turn into a loop, and any() may turn into another loop
18:25 < rcurtin> so if you can do it all as one loop, I don't know that there would be any faster way
18:26 -!- sumedhghaisas [~yaaic@2402:3a80:65f:83c8:a479:c3ec:c2d7:b101] has quit [Ping timeout: 260 seconds]
18:26 < haritha1313> Actually I'm trying out stuff on the movielens-1m dataset, so needed it to be fast enough for 1 million entries.
18:27 < haritha1313> Thanks for helping :) . I'll try it as nested loop itself, the column major point you mentioned will be helpful.
18:27 < rcurtin> the matrix format is going to be Nx3, right? where each column is user id, item id, rating
18:28 < haritha1313> yes
18:28 < rcurtin> or are you representing it as a huge sparse matrix?
18:28 < rcurtin> ah, ok
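A minimal sketch of the single-loop approach for an N x 3 (user, item, rating) table, written against a plain column-major buffer (the layout Armadillo uses) so it stands alone; the function name and signature are hypothetical. Because each column is contiguous in column-major storage, the loop streams through the first two columns side by side in one pass without building intermediate index vectors.

```cpp
#include <cstddef>
#include <vector>

// Indices of rows whose column 0 equals `a` and column 1 equals `b`, for a
// matrix with nRows rows stored column-major. One pass over the two columns.
std::vector<std::size_t> MatchingRows(const std::vector<double>& data,
                                      const std::size_t nRows,
                                      const double a,
                                      const double b)
{
  const double* col0 = data.data();          // First column (e.g. user id).
  const double* col1 = data.data() + nRows;  // Second column (e.g. item id).
  std::vector<std::size_t> rows;
  for (std::size_t r = 0; r < nRows; ++r)
    if (col0[r] == a && col1[r] == b)
      rows.push_back(r);
  return rows;
}
```

With Armadillo itself, an expression along the lines of `arma::find((M.col(0) == 3) % (M.col(1) == 4))` should express the same test in one statement, though it still builds temporary comparison vectors internally, so the explicit loop may be worth benchmarking on the 1M-entry dataset.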
18:45 -!- sumedhghaisas [~yaaic@] has joined #mlpack
18:54 -!- sumedhghaisas [~yaaic@] has quit [Read error: Connection reset by peer]
18:55 -!- haritha1313 [2ff7e0d8@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
19:00 -!- sumedhghaisas [~yaaic@] has joined #mlpack
19:08 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
19:09 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 260 seconds]
19:11 -!- sumedhghaisas [~yaaic@2402:3a80:65a:52fb:9bdf:334:80dd:e33] has joined #mlpack
19:12 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 260 seconds]
19:34 < ShikharJ> zoq: It seems as though with a lower gradient multiplier, only the time for convergence is increased and no major visual change to the output.
19:42 -!- sumedhghaisas [~yaaic@2402:3a80:65a:52fb:9bdf:334:80dd:e33] has quit [Ping timeout: 276 seconds]
19:43 -!- sumedhghaisas [~yaaic@2402:3a80:66d:1393:580e:f55a:5442:74af] has joined #mlpack
19:47 -!- sumedhghaisas [~yaaic@2402:3a80:66d:1393:580e:f55a:5442:74af] has quit [Ping timeout: 276 seconds]
19:53 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
19:58 < zoq> ShikharJ: hmm, okay, I guess we could rerun the experiments with some other parameters, but since the results are just fine for the smaller dataset I would say let's go ahead and merge the code, so that we can continue with the next part. What do you think?
20:31 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
--- Log closed Fri Jun 08 00:00:54 2018