mlpack IRC logs, 2018-06-11

Logs for the day 2018-06-11 (starts at 0:00 UTC) are shown below.

--- Log opened Mon Jun 11 00:00:58 2018
02:11 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
05:53 -!- sulan_ [] has joined #mlpack
07:47 -!- sulan_ [] has quit [Read error: Connection reset by peer]
07:48 -!- sulan_ [] has joined #mlpack
08:13 -!- __sulan__ [] has joined #mlpack
08:16 -!- sulan_ [] has quit [Ping timeout: 264 seconds]
09:35 < jenkins-mlpack> Project docker mlpack nightly build build #346: STILL UNSTABLE in 2 hr 21 min:
14:00 < zoq> manish7294: should fix the issue, also did you test AMSGrad?
14:21 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:54 -!- wenhao [731bc011@gateway/web/freenode/ip.] has joined #mlpack
15:09 -!- manish7294 [8ba79d0c@gateway/web/freenode/ip.] has joined #mlpack
15:13 < manish7294> zoq: Thanks for solving the issue and these good suggestions. The changing batch size has made the batch precalculation of lmnn redundant :)
15:13 < manish7294> Either way, it was not making much of a difference.
15:14 < manish7294> AMSGrad also works great :)
15:18 < manish7294> rcurtin: As per the findings, BigBatchSGD (both adaptive search and line search) or AMSGrad are good options to replace SGD.
15:24 -!- __sulan__ [] has quit [Quit: Leaving]
16:14 < rcurtin> manish7294: great to hear the different optimizers worked better; do you have benchmarking results for them?
16:15 < rcurtin> I saw your comments on the LMNN PR also; I haven't had a chance to dig in deeply, but did calling Impostors() only once every 100 iterations help?
16:16 < manish7294> rcurtin: I have mostly tested them on iris, vc2 and covertype 5k points dataset and by looking at the results I would say results are quite similar but they help in avoiding divergence
16:17 < manish7294> calling impostors after 100 iteration is leading to errors.
16:23 < manish7294> Let me verify the 100 iteration idea once again
16:28 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
16:43 < manish7294> With SGD on the 5k covertype data I am getting " [WARN ] SGD: converged to -nan; terminating with failure. Try a smaller step size? " within a second of starting, and with BigBatchSGD it does not seem to converge.
16:44 < manish7294> With BigBatchSGD the coordinate values seem to keep oscillating between a few values.
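
The divergence warning above and the earlier remark that AMSGrad "works great" can be illustrated with a toy comparison. This is only a sketch, not mlpack or ensmallen code: the 1-D quadratic objective, the step size, and the function names `grad`, `sgd`, and `amsgrad` are all assumptions made for illustration, and the real LMNN failure may have a different cause.

```python
import math

def grad(x, curvature=100.0):
    # Gradient of the toy objective f(x) = 0.5 * curvature * x^2 (hypothetical).
    return curvature * x

def sgd(x, step, iters):
    # Plain gradient step; unstable here whenever step * curvature > 2.
    for _ in range(iters):
        x -= step * grad(x)
    return x

def amsgrad(x, step, iters, beta1=0.9, beta2=0.999, eps=1e-8):
    # AMSGrad without bias correction: the step is scaled by the running
    # maximum of the second-moment estimate, so it stays bounded.
    m = v = v_hat = 0.0
    for _ in range(iters):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = max(v_hat, v)  # never let the scale shrink
        x -= step * m / (math.sqrt(v_hat) + eps)
    return x

x_sgd = sgd(1.0, step=0.05, iters=600)      # overflows to a non-finite value
x_ams = amsgrad(1.0, step=0.05, iters=600)  # settles near the minimum at 0
```

With the same nominal step size, the plain update multiplies the iterate by a factor larger than one in magnitude on every step, while the normalized update cannot take arbitrarily large steps.
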
16:53 -!- sumedhghaisas [68842d55@gateway/web/freenode/ip.] has joined #mlpack
16:56 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
16:57 < sumedhghaisas> Atharva: Hi Atharva
16:58 < sumedhghaisas> Hows it going?
16:58 < Atharva> I am just about to post to the blog.
16:58 < sumedhghaisas> Maybe we can speed up the mail thread with IRC :)
16:58 < Atharva> It's done.
16:58 < sumedhghaisas> Nice! I will take a look at it later
16:58 < Atharva> The tasks for this week
16:59 < sumedhghaisas> umm... Have you updated the PR?
17:00 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
17:01 < Atharva> No, I am just trying to debug the failing Jacobian test, but I am not quite sure what that test does.
17:01 < Atharva> The gradient check is passing with the KL loss added to the total loss
17:03 < sumedhghaisas> Jacobian test is failing?
17:03 < sumedhghaisas> huh... Well, push away and let's see why that test is not happy
17:04 < Atharva> Okay
17:04 < sumedhghaisas> Also about the VAE class, which aspect of VAE do you think cannot be emulated by FFN?
17:05 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
17:07 < Atharva> For example, the Encode function, GenerateRandom, GenerateSpecific, SampleOutput. Also, I had to make the Evaluate and Backward function loop over all the layers collecting extra loss, which is 0 almost all the time.
17:08 < Atharva> Even in a VAE network, for one layer, it's too much work.
17:09 < sumedhghaisas> The loop is mostly static... which shouldn't cause any delay.
17:09 < sumedhghaisas> The extra loss functionality is not just for VAE
17:10 < sumedhghaisas> it extends the FFN functionality to produce L1 and L2 regularized layers, which is a huge improvement over the current framework
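
The "extra loss" loop discussed above can be sketched as follows. This is a toy under stated assumptions, not the FFN implementation: the `Layer`, `Scale`, and `evaluate` names, the scalar layers, and the squared-error data loss are hypothetical. It only shows that a per-layer loss that defaults to zero lets a regularized layer add its penalty to the total.

```python
class Layer:
    """Base layer: contributes no extra loss by default."""

    def forward(self, x):
        return x

    def loss(self):
        return 0.0


class Scale(Layer):
    """Toy layer y = w * x with an optional L2 penalty on w."""

    def __init__(self, w, l2=0.0):
        self.w = w
        self.l2 = l2

    def forward(self, x):
        return self.w * x

    def loss(self):
        return self.l2 * self.w ** 2


def evaluate(layers, x, target):
    # Forward pass through every layer, then add each layer's extra loss
    # (zero for most layers) to the data loss.
    for layer in layers:
        x = layer.forward(x)
    data_loss = (x - target) ** 2
    extra_loss = sum(layer.loss() for layer in layers)
    return data_loss + extra_loss


plain = evaluate([Scale(2.0), Scale(3.0)], 1.0, 6.0)
regularized = evaluate([Scale(2.0), Scale(3.0, l2=0.5)], 1.0, 6.0)
```

In this sketch a KL term from a reparametrization layer would plug in the same way as the L2 penalty: the layer reports it through `loss()` and the network sums it.
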
17:11 < Atharva> Yes, I understand, but the generate and encode functionalities will have to forward pass through some layers of the network, with custom inputs
17:11 < Atharva> With multiple repar layers, it will prove tougher
17:11 < sumedhghaisas> The Encode function is nothing but a forward of parametric model, feed forward, CNN or RNN thus we do not need to make any extra efforts for it.
17:12 < Atharva> Yes but partial
17:12 < Atharva> Yeah
17:12 < sumedhghaisas> If we look at VAE as a special model, we restrict the user to improve upon it
17:12 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
17:12 < sumedhghaisas> we will restrict them to use the functionality given by us
17:13 < Atharva> That makes sense
17:13 < sumedhghaisas> if we indeed look at it as a specific case of FFN and make sure the current architecture supports it
17:14 < sumedhghaisas> we not only make sure VAE could be implemented but the user can use the extra FFN features to improve upon it
17:14 < sumedhghaisas> For example, if you implement VAE class
17:14 < Atharva> Yeah, I never thought of it that way
17:14 < sumedhghaisas> you have to make sure you support hierarchical VAE, beta VAE, regularized VAE
17:15 < sumedhghaisas> although with FFN, multiple repar layers would achieve hierarchical aspect
17:15 < sumedhghaisas> specialized repar layer with Beta will achieve Beta VAE and so on
17:16 < sumedhghaisas> minimal changes
17:16 < sumedhghaisas> Although I am still not 100 percent sure we can emulate it :)
17:17 < sumedhghaisas> So some thinking is required there
17:18 < Atharva> So, let's go ahead and start making some models with the FFN class, and if some functions prove too complex, then we can give a thought to VAE class.
17:18 < Atharva> If not, then we are good.
17:19 < sumedhghaisas> that would be risky, as a new class shift is not a simple one
17:19 < sumedhghaisas> Lets look at the aspects of VAE that we cannot satisfy right now
17:20 < sumedhghaisas> 1) Generation
17:20 < sumedhghaisas> what else?
17:20 < sumedhghaisas> hmmm
17:21 < sumedhghaisas> Okay how do we implement generation in FFN
17:22 < Atharva> The generation can be random or controlled
17:22 < Atharva> We need to think about both cases
17:22 < sumedhghaisas> indeed
17:24 < sumedhghaisas> okay give it some thought, lets try involving Ryan and Marcus as well and see if they have some thought on it
17:25 < Atharva> Yeah, can you explain how yoou said we would implement Encode?
17:25 < Atharva> you*
17:25 < Atharva> Can we do partial forward pass with FFN class?
17:25 < sumedhghaisas> Encode is not a direct feature of VAE, but generation is
17:26 < sumedhghaisas> Encode happens as a part of Forward
17:26 < sumedhghaisas> ahh partial pass
17:26 < sumedhghaisas> thats what I was thinking
17:26 < Atharva> Yes, but we should be able to have just the encodings if we want to.
17:27 < Atharva> From those encodings, we should be able to operate the Generate functions independently
17:27 < sumedhghaisas> I agree. We should, that could be achieved with partial pass
17:27 < Atharva> Yeah
17:27 < sumedhghaisas> If we do the partial pass and access the layer's output parameter
17:28 < sumedhghaisas> we will get encoding
17:28 < Atharva> and then Generate either randomly, or with a sample of our choice
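
The partial-pass idea above can be sketched with a network stored as an ordered list of layers. Everything here is hypothetical (the `forward`, `encode`, and `generate` names, the `SPLIT` index, and the scalar toy layers); it is not the FFN API, only an illustration of running a sub-range of layers.

```python
def forward(layers, x, begin=0, end=None):
    """Run only layers[begin:end]; the defaults give a full pass."""
    for layer in layers[begin:end]:
        x = layer(x)
    return x


# Hypothetical 4-layer stack: two "encoder" layers, two "decoder" layers.
layers = [
    lambda x: x + 1.0,   # encoder layer 0
    lambda x: 2.0 * x,   # encoder layer 1, its output is the encoding
    lambda z: z - 3.0,   # decoder layer 2
    lambda z: 0.5 * z,   # decoder layer 3
]
SPLIT = 2  # index where the decoder starts (an assumption of this toy)


def encode(x):
    # Partial pass that stops at the encoding.
    return forward(layers, x, end=SPLIT)


def generate(z):
    # Partial pass that starts from a latent value of our choice.
    return forward(layers, z, begin=SPLIT)
```

Usage: `encode(1.0)` returns the intermediate value, and `generate(encode(1.0))` matches the full pass `forward(layers, 1.0)`, so no separate model class is needed for either half in this toy.
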
17:29 < sumedhghaisas> If we define the final layer as distribution layer the current architecture should produce conditional samples
17:29 < sumedhghaisas> For example, the current architecture Predict outputs the last layer output
17:30 < Atharva> Yes, but a VAE outputs a distribution
17:30 < Atharva> parameters to a distribution
17:30 < sumedhghaisas> if the last layer outputs a distribution, we sample from it to generate conditional samples
17:30 < Atharva> We should be able to then sample from that
17:30 < Atharva> Exactly
17:30 < sumedhghaisas> yes but that's only conditional
17:31 < sumedhghaisas> How do we produce unconditional samples?
17:31 < Atharva> Sorry, what exactly do you mean by unconditional samples?
17:31 < sumedhghaisas> for that we need to start the forward propagation from Repar layer
17:32 < sumedhghaisas> ohh conditional samples are samples from P(Z | X) where unconditional are from P(Z)
17:33 < sumedhghaisas> basically conditional are samples from posterior over the latents and unconditional are samples from latent prior
17:33 < Atharva> Yeah
17:33 < Atharva> We need to start from repar layer for that
17:33 < sumedhghaisas> yes. Now thats the puzzler.
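
The distinction drawn above between conditional and unconditional samples can be sketched in a few lines. This is a minimal illustration, assuming a scalar latent, a standard normal prior, and made-up `encoder` and `decoder` functions; none of these names come from mlpack.

```python
import math
import random


def encoder(x):
    """Hypothetical encoder: mean and log-variance of the posterior over z."""
    return 0.5 * x, -2.0


def reparametrize(mean, log_var, rng):
    """z = mean + sigma * eps with eps ~ N(0, 1)."""
    return mean + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)


def decoder(z):
    """Hypothetical decoder: maps a latent z back to data space."""
    return 2.0 * z


def conditional_sample(x, rng):
    # Full pass: encoder, then repar layer, then decoder, so z ~ P(Z | X = x).
    mean, log_var = encoder(x)
    return decoder(reparametrize(mean, log_var, rng))


def unconditional_sample(rng):
    # Skip the encoder: start at the repar layer with the prior P(Z) = N(0, 1).
    return decoder(reparametrize(0.0, 0.0, rng))


rng = random.Random(0)
cond = [conditional_sample(4.0, rng) for _ in range(2000)]
uncond = [unconditional_sample(rng) for _ in range(2000)]
```

The only difference between the two functions is where the pass begins, which is why unconditional generation needs a way to start forward propagation from the repar layer.
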
17:35 < Atharva> I just pushed the latest changes
17:36 < sumedhghaisas> Okay. Lets keep thinking about this and complete this week's work first. Lets hope we find some solution till then.
17:37 < sumedhghaisas> I will take a look at it tonight :)
17:37 < Atharva> Sure!
18:25 < Atharva> sumedhghaisas: You there?
18:50 -!- manish7294 [8ba79d0c@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
18:52 < rcurtin> manish7294: I think we need to debug the idea a little bit more. recalculating impostors only once every 100 iterations should work just fine
18:52 < rcurtin> if you like, you could try recalculating only every other iteration
18:53 < rcurtin> just for debugging
18:53 < rcurtin> but it should be no problem, since all we are calculating in Impostors() is the indices of the impostors
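
The suggestion above, recomputing impostor indices only every N iterations and reusing them in between, can be sketched like this. It is a toy and not the LMNN PR code: the 1-D points, the scalar `scale` standing in for the learned transformation, the placeholder update, and the `impostors` and `optimize` names are all assumptions.

```python
def impostors(points, labels, scale, k=1):
    """Indices of the k nearest differently-labelled points, per point,
    measured after the toy 1-D transformation x -> scale * x."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if labels[j] != labels[i]]
        others.sort(key=lambda j: abs(scale * (points[j] - p)))
        result.append(others[:k])
    return result


def optimize(points, labels, iters, interval):
    scale, cached, recomputations = 1.0, None, 0
    for it in range(iters):
        if it % interval == 0:
            # Refresh the cached indices only on every `interval`-th iteration.
            cached = impostors(points, labels, scale)
            recomputations += 1
        # A real LMNN step would build its gradient from `cached` here.
        scale *= 0.99  # placeholder update, not an LMNN gradient step
    return cached, recomputations


points = [0.0, 1.0, 5.0, 6.0]
labels = [0, 0, 1, 1]
cached, n = optimize(points, labels, iters=250, interval=100)
```

Because only indices are cached, stale entries between refreshes affect which points are treated as impostors, not how distances to them are computed under the current transformation.
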
20:24 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
20:43 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
22:52 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
--- Log closed Tue Jun 12 00:00:00 2018