mlpack IRC logs, 2018-05-31

Logs for the day 2018-05-31 (starts at 0:00 UTC) are shown below.

--- Log opened Thu May 31 00:00:43 2018
00:41 -!- sumedhghaisas2 [~yaaic@2402:3a80:648:d418:f0e3:f6c6:1baf:833e] has joined #mlpack
00:43 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 244 seconds]
00:52 -!- sumedhghaisas2 [~yaaic@2402:3a80:648:d418:f0e3:f6c6:1baf:833e] has quit [Ping timeout: 240 seconds]
00:58 -!- sumedhghaisas [~yaaic@2402:3a80:653:fadf:fc3e:5a0f:e1f4:e673] has joined #mlpack
01:03 -!- sumedhghaisas [~yaaic@2402:3a80:653:fadf:fc3e:5a0f:e1f4:e673] has quit [Ping timeout: 276 seconds]
01:25 -!- sumedhghaisas2 [~yaaic@2402:3a80:64c:6978:61a1:a5b1:866d:d7f7] has joined #mlpack
01:27 -!- sumedhghaisas3 [~yaaic@2402:3a80:646:6c8c:383:a452:389b:fabf] has joined #mlpack
01:30 -!- sumedhghaisas2 [~yaaic@2402:3a80:64c:6978:61a1:a5b1:866d:d7f7] has quit [Ping timeout: 276 seconds]
01:34 -!- sumedhghaisas [~yaaic@2402:3a80:669:20e6:f476:1a7a:ba9c:74dc] has joined #mlpack
01:36 -!- sumedhghaisas2 [~yaaic@2402:3a80:699:d6f4:363e:3c37:ca8e:6d09] has joined #mlpack
01:37 -!- sumedhghaisas3 [~yaaic@2402:3a80:646:6c8c:383:a452:389b:fabf] has quit [Ping timeout: 276 seconds]
01:39 -!- sumedhghaisas [~yaaic@2402:3a80:669:20e6:f476:1a7a:ba9c:74dc] has quit [Ping timeout: 276 seconds]
01:40 -!- sumedhghaisas2 [~yaaic@2402:3a80:699:d6f4:363e:3c37:ca8e:6d09] has quit [Ping timeout: 240 seconds]
01:41 -!- sumedhghaisas [~yaaic@2402:3a80:669:9617:9d42:bb97:dcdd:418d] has joined #mlpack
01:45 -!- sumedhghaisas [~yaaic@2402:3a80:669:9617:9d42:bb97:dcdd:418d] has quit [Ping timeout: 240 seconds]
01:49 -!- sumedhghaisas2 [~yaaic@2402:3a80:669:e74d:308d:7b44:6530:88c1] has joined #mlpack
01:51 -!- sumedhghaisas [~yaaic@2402:3a80:663:19b8:eac2:b98:cac3:2dc2] has joined #mlpack
01:53 -!- sumedhghaisas2 [~yaaic@2402:3a80:669:e74d:308d:7b44:6530:88c1] has quit [Ping timeout: 240 seconds]
02:03 -!- sumedhghaisas [~yaaic@2402:3a80:663:19b8:eac2:b98:cac3:2dc2] has quit [Ping timeout: 240 seconds]
02:06 -!- sumedhghaisas [~yaaic@2402:3a80:66d:ab31:59fb:eae8:2683:7584] has joined #mlpack
02:08 -!- sumedhghaisas2 [~yaaic@2402:3a80:6b5:397c:36b8:2397:7be7:7d91] has joined #mlpack
02:11 -!- sumedhghaisas [~yaaic@2402:3a80:66d:ab31:59fb:eae8:2683:7584] has quit [Ping timeout: 276 seconds]
02:41 -!- govg [~govg@unaffiliated/govg] has quit [Ping timeout: 260 seconds]
03:50 -!- govg [~govg@unaffiliated/govg] has joined #mlpack
04:12 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
04:46 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
05:30 -!- sumedhghaisas2 [~yaaic@2402:3a80:6b5:397c:36b8:2397:7be7:7d91] has quit [Ping timeout: 240 seconds]
06:08 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
06:39 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 256 seconds]
06:44 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
06:44 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
06:48 -!- sumedhghaisas [~yaaic@] has joined #mlpack
06:53 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 276 seconds]
06:55 -!- sumedhghaisas [~yaaic@] has joined #mlpack
06:55 -!- wenhao [80018d42@gateway/web/freenode/ip.] has joined #mlpack
06:59 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
07:02 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 256 seconds]
07:04 -!- sumedhghaisas [~yaaic@2402:3a80:6bf:2af7:ce9:c5f9:ca5b:1543] has joined #mlpack
07:05 < wenhao> I am trying to do neighbor search with cosine distance. Does anyone know how I should do that? I guess I should use NeighborSearch with IPMetric<CosineDistance> or fastmks.
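
The question above gets no reply in this log. As a rough illustration only (plain Python, not mlpack's NeighborSearch or FastMKS API): for unit-length vectors, squared Euclidean distance equals 2 - 2·cos, so an ordinary Euclidean nearest-neighbor search on normalized data ranks points the same way as cosine similarity. All function names below are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def nearest_by_cosine(query, points):
    """Index of the point with the largest cosine similarity to query."""
    return max(range(len(points)),
               key=lambda i: cosine_similarity(query, points[i]))

def nearest_by_euclidean_on_unit_vectors(query, points):
    """Index of the Euclidean nearest neighbor after normalizing everything."""
    q = normalize(query)

    def sqdist(p):
        p = normalize(p)
        return sum((x - y) ** 2 for x, y in zip(q, p))

    return min(range(len(points)), key=lambda i: sqdist(points[i]))

points = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [2.0, 1.9]
print(nearest_by_cosine(query, points))
print(nearest_by_euclidean_on_unit_vectors(query, points))
```

Both searches pick the same point here, which suggests that normalizing the data first and then using a standard Euclidean search is one option worth trying.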
07:13 -!- sumedhghaisas [~yaaic@2402:3a80:6bf:2af7:ce9:c5f9:ca5b:1543] has quit [Ping timeout: 256 seconds]
07:15 -!- sumedhghaisas [~yaaic@2402:3a80:644:775f:dfdd:f2c5:61d:8ff6] has joined #mlpack
07:17 -!- sumedhghaisas2 [~yaaic@2402:3a80:666:b949:8279:e2e3:805e:c532] has joined #mlpack
07:19 -!- sumedhghaisas [~yaaic@2402:3a80:644:775f:dfdd:f2c5:61d:8ff6] has quit [Ping timeout: 240 seconds]
07:21 -!- sumedhghaisas2 [~yaaic@2402:3a80:666:b949:8279:e2e3:805e:c532] has quit [Ping timeout: 240 seconds]
07:26 -!- travis-ci [] has joined #mlpack
07:26 < travis-ci> manish7294/mlpack#13 (lmnn - ac54f6f : Manish): The build was fixed.
07:26 < travis-ci> Change view :
07:26 < travis-ci> Build details :
07:26 -!- travis-ci [] has left #mlpack []
07:33 < ShikharJ> rcurtin: It seems that tmux is not installed on, and I can't seem to install it using sudo, as I'm not one of the sudoers.
07:34 -!- wenhao [80018d42@gateway/web/freenode/ip.] has quit [Quit: Page closed]
07:34 -!- sumedhghaisas [~yaaic@2402:3a80:653:3a18:c9c:53d1:5b91:280a] has joined #mlpack
07:39 -!- sumedhghaisas [~yaaic@2402:3a80:653:3a18:c9c:53d1:5b91:280a] has quit [Ping timeout: 260 seconds]
07:48 -!- sumedhghaisas [~yaaic@2402:3a80:6ad:47dc:e518:6e4f:c9d8:9390] has joined #mlpack
08:28 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
09:01 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
09:02 -!- govg [~govg@unaffiliated/govg] has quit [Quit: leaving]
09:57 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Read error: Connection reset by peer]
10:03 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
10:28 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
10:29 -!- sumedhghaisas [~yaaic@2402:3a80:6ad:47dc:e518:6e4f:c9d8:9390] has quit [Ping timeout: 276 seconds]
10:29 < ShikharJ> zoq: Since the work on the standard GAN is nearing completion, I'll start working on the DCGAN implementation and testing for the rest of Phase I (~1.5 weeks)
10:30 -!- sumedhghaisas [~yaaic@] has joined #mlpack
10:31 < ShikharJ> zoq: I also opened a PR for shed_cols and shed_rows in Cubes for armadillo.
10:32 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 244 seconds]
10:34 -!- sumedhghaisas [~yaaic@] has quit [Ping timeout: 244 seconds]
10:35 -!- sumedhghaisas [~yaaic@2402:3a80:659:c1ae:de53:f91f:6fc5:a65f] has joined #mlpack
10:42 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
10:43 -!- sumedhghaisas [~yaaic@2402:3a80:659:c1ae:de53:f91f:6fc5:a65f] has quit [Ping timeout: 276 seconds]
10:56 -!- sumedhghaisas [~yaaic@2402:3a80:65c:4465:15e:3bd2:eda2:682c] has joined #mlpack
10:57 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 256 seconds]
12:32 -!- sumedhghaisas2 [~yaaic@2402:3a80:672:f308:e945:70ea:83ba:3bcb] has joined #mlpack
12:32 -!- sumedhghaisas [~yaaic@2402:3a80:65c:4465:15e:3bd2:eda2:682c] has quit [Ping timeout: 255 seconds]
12:35 -!- sumedhghaisas [~yaaic@2402:3a80:67a:7c80:b9d3:96f6:5bb9:69a4] has joined #mlpack
12:36 -!- sumedhghaisas2 [~yaaic@2402:3a80:672:f308:e945:70ea:83ba:3bcb] has quit [Ping timeout: 240 seconds]
12:40 -!- sumedhghaisas2 [~yaaic@2402:3a80:66a:28d2:d4cb:5662:c19b:8f86] has joined #mlpack
12:41 -!- sumedhghaisas [~yaaic@2402:3a80:67a:7c80:b9d3:96f6:5bb9:69a4] has quit [Ping timeout: 240 seconds]
12:45 -!- sumedhghaisas2 [~yaaic@2402:3a80:66a:28d2:d4cb:5662:c19b:8f86] has quit [Ping timeout: 276 seconds]
12:45 -!- sumedhghaisas [~yaaic@2402:3a80:640:f02a:2cea:c8aa:5eb8:1a53] has joined #mlpack
13:12 -!- sumedhghaisas3 [~yaaic@2402:3a80:65a:3045:e142:c625:e69c:73bb] has joined #mlpack
13:13 < rcurtin> ShikharJ: fixed, sorry about that
13:13 -!- sumedhghaisas [~yaaic@2402:3a80:640:f02a:2cea:c8aa:5eb8:1a53] has quit [Ping timeout: 276 seconds]
13:16 -!- sumedhghaisas [~yaaic@2402:3a80:69f:6cd6:e396:1943:9f22:9b5f] has joined #mlpack
13:19 -!- sumedhghaisas3 [~yaaic@2402:3a80:65a:3045:e142:c625:e69c:73bb] has quit [Ping timeout: 255 seconds]
13:29 -!- sumedhghaisas2 [~yaaic@2402:3a80:645:d774:8643:2b55:7ef8:d4a8] has joined #mlpack
13:30 -!- sumedhghaisas [~yaaic@2402:3a80:69f:6cd6:e396:1943:9f22:9b5f] has quit [Ping timeout: 255 seconds]
13:56 -!- sumedhghaisas [~yaaic@2402:3a80:676:d45c:9b01:e374:15f3:9a13] has joined #mlpack
13:59 -!- sumedhghaisas3 [~yaaic@] has joined #mlpack
13:59 -!- sumedhghaisas2 [~yaaic@2402:3a80:645:d774:8643:2b55:7ef8:d4a8] has quit [Ping timeout: 276 seconds]
14:01 -!- sumedhghaisas [~yaaic@2402:3a80:676:d45c:9b01:e374:15f3:9a13] has quit [Ping timeout: 276 seconds]
14:02 -!- sumedhghaisas [~yaaic@2402:3a80:69d:955d:b7cc:4b85:5b79:b87f] has joined #mlpack
14:04 -!- sumedhghaisas3 [~yaaic@] has quit [Ping timeout: 244 seconds]
14:17 -!- sumedhghaisas [~yaaic@2402:3a80:69d:955d:b7cc:4b85:5b79:b87f] has quit [Ping timeout: 276 seconds]
14:24 -!- manish7294 [8ba70a35@gateway/web/freenode/ip.] has joined #mlpack
14:25 -!- sumedhghaisas2 [~yaaic@2402:3a80:662:c005:b2b3:2ad3:2ae5:2781] has joined #mlpack
14:27 < manish7294> rcurtin: Currently I have written a custom knn accuracy calculator. Can you please check it?
14:32 < Atharva> sumedhghaisas2: You there?
14:33 -!- sumedhghaisas2 [~yaaic@2402:3a80:662:c005:b2b3:2ad3:2ae5:2781] has quit [Quit: Yaaic - Yet another Android IRC client -]
14:34 -!- sumedhghaisas [2a6b9f2a@gateway/web/freenode/ip.] has joined #mlpack
14:34 < sumedhghaisas> Atharva: Hi Atharva
14:34 < Atharva> Should we begin?
14:35 < sumedhghaisas> Sure thing.
14:35 < sumedhghaisas> How's it going so far?
14:36 < Atharva> So, I read your mail. Do you mean to say that if we embed a linear layer in the sampling layer, we don't give the option to the users to use other types of layers to generate mean and std?
14:36 < Atharva> Whatever implementation I could find on the internet used only linear layers to output the mean and std
14:37 < sumedhghaisas> Yes. I am not yet sure what other type of layers the users might want to use. Although I do have one use case in mind
14:37 < Atharva> It is going good, I am having fun but am stuck at some parts
14:37 < Atharva> which case?
14:38 < sumedhghaisas> In the current work I am doing, the model restricts the mean to be in a certain range, let's say [-x, x]
14:38 < sumedhghaisas> To emulate this I add a non-linearity over the linear layer before reparametrization
14:38 < sumedhghaisas> but I understand it is a very specific case
14:38 < Atharva> Oh, I understand.
14:39 < Atharva> Yeah it is, because then the users would have to output the right size of the matrix.
14:40 < sumedhghaisas> But in general, I am more inclined towards better control and higher responsibility designs. So in this case we could also take Ryan's opinion
14:40 < sumedhghaisas> @rcurtin
14:40 < Atharva> For people new to VAEs, we would have to throw a very detailed error about what to do
14:41 < sumedhghaisas> yes I agree, and I am not sure if we can throw such an error
14:41 < sumedhghaisas> let's keep this at the back of our heads, and ponder over it
14:41 < sumedhghaisas> for now we will proceed by embedding the linear layer
14:41 < sumedhghaisas> in case we decide to shift, it would be a minor PR
14:42 < sumedhghaisas> what do you think?
14:42 < Atharva> I think that will work
14:42 < Atharva> I can later make the changes
14:42 < sumedhghaisas> Sounds good.
14:43 < Atharva> That's almost done then, I am just stuck on the delta function.
14:43 < sumedhghaisas> Now did you get a chance to replace the weights and bias with Linear<> layer? I think that will solve all the derivative problems
14:44 < Atharva> I thought I had it yesterday, but then I tried implementing it and the differentiation turned out to be wrong
14:44 < sumedhghaisas> ahh sorry. I was talking about a different issue I guess.
14:44 < sumedhghaisas> which delta function?
14:44 -!- wenhao [731bca27@gateway/web/freenode/ip.] has joined #mlpack
14:45 < Atharva> Delta function in the implementation of the sampling layer.
14:46 < Atharva> The thing is, we have to differentiate the final output which is : mean + std % gaussian sample w.r.t. the mean and the std
14:47 -!- mikeling [uid89706@gateway/web/] has joined #mlpack
14:47 < Atharva> and then join those two delta matrices to get our final delta
14:48 < sumedhghaisas> ahh you mean the first derivative, I see, I confused it with the Kronecker delta
14:48 < Atharva> yeah, my calculations are:
14:49 < Atharva> delta (std) = (weight (std) * error) % gaussian sample
14:49 < Atharva> delta(mean) = weight(mean) * error
14:50 < Atharva> then join_col the two deltas
14:53 < sumedhghaisas> umm... Did you read my comment regarding replacing the weights and bias with Linear<> in the mail I sent?
14:54 < Atharva> Yeah, so even after that we will need to add some gradients, right?
14:54 < Atharva> because we use the outputs of that linear layer to calculate the final sample
14:55 < sumedhghaisas> indeed, but then the equation becomes simpler, sorry I got confused by the 'weight' in your equations
14:55 < sumedhghaisas> so the equation is - output = mean + sample .* stddev
14:55 < Atharva> no problem
14:55 < rcurtin> sumedhghaisas: keeping in mind that I am not familiar with GANs very much, my opinion would usually be that allowing extra flexibility is good if it's possible
14:56 < rcurtin> from a high level it doesn't seem like it would be hugely difficult to allow different layer types for a sampling layer, but again, I am not an expert in GANs, so my input may not be 100% useful :)
14:56 < sumedhghaisas> so d(output) = sample .* d(stddev)
14:57 < sumedhghaisas> this way we could backpropagate the error of stddev and mean
14:58 < Atharva> yeah and d(output) = d(mean), right? so there will be two gradients, w.r.t. std and mean respectively
14:58 < sumedhghaisas> am I making any sense? :)
14:58 < Atharva> Yeah I got it :)
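
The two gradients agreed on above can be sketched as follows. This is a minimal plain-Python illustration, not the mlpack layer: `reparam_forward` and `reparam_backward` are hypothetical names, and the loss is assumed to be the plain sum of the outputs so that the upstream error is all ones.

```python
import random

def reparam_forward(mean, stddev, sample):
    """output = mean + sample .* stddev, element-wise."""
    return [m + e * s for m, s, e in zip(mean, stddev, sample)]

def reparam_backward(error, sample):
    """Map the upstream error dL/d(output) to (dL/d(mean), dL/d(stddev))."""
    d_mean = list(error)                               # d(output)/d(mean) = 1
    d_stddev = [g * e for g, e in zip(error, sample)]  # d(output)/d(stddev) = sample
    return d_mean, d_stddev

rng = random.Random(0)
mean = [0.5, -1.0]
stddev = [1.5, 0.2]
sample = [rng.gauss(0.0, 1.0) for _ in mean]

error = [1.0, 1.0]  # assumed loss: sum(output)
d_mean, d_stddev = reparam_backward(error, sample)

# Joined delta, mirroring the join_col step mentioned in the log.
delta = d_mean + d_stddev

# Finite-difference check of dL/d(stddev[0]).
h = 1e-6
bumped = [stddev[0] + h, stddev[1]]
numeric = (sum(reparam_forward(mean, bumped, sample))
           - sum(reparam_forward(mean, stddev, sample))) / h
print(d_stddev[0], numeric)
```

The analytic and numeric values should agree closely, since the output is linear in both mean and stddev.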
14:59 < sumedhghaisas> rcurtin: I agree. I always prefer flexibility :) I think if we document enough it should be pretty clear, also person using VAEs will be familiar with such architectures
14:59 < Atharva> sumedhghaisas: So, should I not embed the linear layer?
15:00 < sumedhghaisas> Atharva: Also I think we need to bound the stddev with a Softplus non-linearity
15:00 < Atharva> yeah, I got that from the mail :)
15:01 < Atharva> But why not leaky relu?
15:01 < sumedhghaisas> Atharva: Yes I think that would be better, so the sampling layer (I would like to call it Reparametrization Layer, as it's the more common terminology in the literature) will only take a vector of 2n and break it in 2 parts
15:02 < sumedhghaisas> first part would be mean, and the second would be bounded with Softplus
15:02 < rcurtin> ack sorry you are doing VAEs not GANs. oops :(
15:02 < Atharva> rcurtin: It's okay :)
15:02 < sumedhghaisas> rcurtin: but I think flexibility is universal :)
15:03 < sumedhghaisas> Atharva: Also this way there won't be any trainable parameters in the layer, so we only do backward
15:05 < sumedhghaisas> Atharva: oops... seems like we don't have Softplus non-linearity :(
15:05 < Atharva> About that, why not leaky relu?
15:06 < Atharva> It will solve the problem of negative values but won't change the function much
15:07 < sumedhghaisas> hmmm... that does sound like a viable alternative to me, I think leaky ReLU is defined in such a way to provide some small gradient even when the unit is not active
15:08 < sumedhghaisas> that would help for sure
15:08 < sumedhghaisas> although let me think it over for a moment
15:08 < Atharva> Okay, sure. I also have some other things to discuss after this.
15:09 < sumedhghaisas> Atharva: wait, I just googled the Leaky relu function, it's max(x, ax) right?
15:09 < sumedhghaisas> that's not strictly positive... or am I missing something?
15:10 < Atharva> oh sorry
15:10 < Atharva> I was thinking about relu but with some small value added to it so that it's always a little more than zero
15:11 < rcurtin> manish7294: the KNN accuracy calculator looks just fine to me
15:12 < sumedhghaisas> I think ReLU will cause a problem here... if some initialization turns the unit off (produces a negative value) it will create a problem in training that latent, given that the initialization is usually done with clipped uniform normal
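
A small sketch of the two candidates just discussed, assuming a leak factor 0 < a < 1 and a hypothetical offset `eps`: leaky ReLU can return negative values, while ReLU plus a small constant stays positive but passes no gradient for negative inputs.

```python
def leaky_relu(x, a=0.01):
    """max(x, a*x), assuming 0 < a < 1."""
    return max(x, a * x)

def relu_plus_eps(x, eps=1e-3):
    """ReLU shifted up by a small constant so the output is always > 0."""
    return max(x, 0.0) + eps

def relu_plus_eps_grad(x):
    """Derivative of relu_plus_eps: zero whenever the unit is switched off."""
    return 1.0 if x > 0.0 else 0.0

for x in (-2.0, 0.5):
    print(x, leaky_relu(x), relu_plus_eps(x), relu_plus_eps_grad(x))
```

For x = -2.0 the leaky ReLU output is negative, so it cannot serve as a standard deviation, and the shifted ReLU has zero gradient there, which matches the training concern raised above.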
15:14 < Atharva> Okay then, we will go with softplus, but you said it isn't in the codebase
15:14 < sumedhghaisas> the other option is e^(output)
15:14 < sumedhghaisas> although this function suffers from local optima in KL divergence
15:14 < Atharva> Yeah, so we just train ln(std) instead
15:16 < sumedhghaisas> ln(std)? that won't be strictly positive, will it? if the value is under 1
15:17 < Atharva> Sorry, I meant to say that we output ln(std) and then e^(output) to get std. I was just elaborating what you said.
15:17 < sumedhghaisas> ahh and by local optima I mean, with some initialization the output of the layer becomes too high, as e is raised by it
15:18 < Atharva> I see, what do you suggest we do?
15:18 < sumedhghaisas> in this bowl of locality the KL is huge, thus the network is forced to optimize the log likelihood only
15:19 < Atharva> That defeats the purpose of adding KL divergence :|
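
To get a feel for how quickly the KL term grows under the e^(output) parametrization, here is a small sketch. It assumes a one-dimensional Gaussian and a standard normal prior (the prior is not stated in the log); `kl_to_standard_normal` is a hypothetical helper.

```python
import math

def kl_to_standard_normal(mean, raw):
    """KL( N(mean, exp(raw)^2) || N(0, 1) ) for a single dimension.

    Closed form: 0.5 * (sigma^2 + mean^2 - 1 - ln(sigma^2)), with sigma = exp(raw).
    """
    sigma = math.exp(raw)
    return 0.5 * (sigma * sigma + mean * mean - 1.0 - 2.0 * raw)

for raw in (0.0, 1.0, 3.0, 5.0):
    print(raw, kl_to_standard_normal(0.0, raw))
```

The KL is zero at raw = 0 and grows roughly like exp(2 * raw) / 2, so a layer output of only 5 already gives a KL above 10,000.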
15:20 < sumedhghaisas> hmm... it depends on our timeframe
15:20 < sumedhghaisas> for now, let's go ahead with e^(output)
15:20 < Atharva> Okay
15:21 < sumedhghaisas> Softplus is not a difficult thing to add, the function is log(exp(features) + 1)
15:21 < sumedhghaisas> It would be ideal to add that at later stage and replace the e^(feature) function
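
The Softplus function quoted above, log(exp(features) + 1), could look roughly like this in plain Python. This is only a sketch and not the eventual mlpack layer; it uses the algebraically equivalent form max(x, 0) + log1p(exp(-|x|)) so that large inputs do not overflow.

```python
import math

def softplus(x):
    """log(exp(x) + 1), rewritten to avoid overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_grad(x):
    """Derivative of softplus, which is the logistic sigmoid."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-50.0, 0.0, 2.0, 1000.0):
    print(x, softplus(x), softplus_grad(x))
```

Unlike e^(output), softplus grows only linearly for large inputs while still being strictly positive in exact arithmetic with a non-zero gradient everywhere; in floating point both can round to zero for very negative inputs.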
15:22 < sumedhghaisas> as we are running close to our schedule, I think we should focus on the timeline for now
15:22 < Atharva> Should I just use the softplus directly in the layer for now, instead of adding it to the codebase
15:23 < sumedhghaisas> umm... if we put that much effort in implementing the function and taking the derivatives, it would take 5 more minutes to add that in a separate file under 'layer' module :)
15:23 < Atharva> That's true, so for now I will just use e^()
15:24 < sumedhghaisas> it's just copy pasting the forward and backward function lines, and adding that class in layers.hpp
15:24 < sumedhghaisas> yes that sounds good :)
15:25 < Atharva> About the KL divergence now, I didn't know that a PR for that has already been merged into the codebase
15:26 < sumedhghaisas> huh... I am not sure I got the notification for that PR. Did you mention me in that PR?
15:26 < Atharva> No, it wasn't mine
15:26 < sumedhghaisas> ohh ... who did it then?
15:26 < sumedhghaisas> let me see
15:26 -!- wenhao [731bca27@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
15:27 < Atharva> #1365
15:28 < sumedhghaisas> is it this one?
15:28 < sumedhghaisas> ohh okay ..
15:29 < sumedhghaisas> Atharva: ooookey.
15:29 < Atharva> Can you take a look at the forward function in that implementation? It seems to me that the formula used there is for the functions and not for the parameters.
15:30 < Atharva> The first formula from that paper
15:30 < sumedhghaisas> I don't understand why one would create a separate API for KL divergence
15:31 < sumedhghaisas> why not use the current 'dist' module in core
15:32 < sumedhghaisas> Atharva: it is merged though
15:32 < sumedhghaisas> hmm... I will take a look
15:33 < Atharva> Sorry, I didn't understand how we use the dist module for KL divergence.
15:34 < sumedhghaisas> although what do you mean by formula used for the functions and not for the parameters?
15:35 < sumedhghaisas> ahh okay, so KL divergence is basically just a distance between one distribution and another, so implement a KL function in any distribution class, which accepts another distribution class to produce KL
15:35 < Atharva> arma::accu(input % (arma::log(input) - arma::log(target))); I may be wrong but how do we calculate KL divergence between two distributions with this?
15:37 < Atharva> In the case of VAEs, after evaluating, the expression turns out something like this :
15:40 < sumedhghaisas> I see what they have done. They have implemented the actual definition of KL divergence
15:41 < Atharva> Yes, that's what I meant to say.
15:42 < Atharva> Is it even possible to have a general expression for KL divergence which can be used for all kinds of distributions?
15:43 < sumedhghaisas> hmm... so they are evaluating the KL empirically for a batch.
15:43 < sumedhghaisas> I don't think it will be useful for us
15:44 < sumedhghaisas> we have well defined distributions
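
For reference, the Armadillo expression quoted above, `arma::accu(input % (arma::log(input) - arma::log(target)))`, is the definition of KL divergence summed over explicit probability values. A plain-Python equivalent, assuming both arguments are strictly positive probability vectors (`kl_discrete` is a hypothetical name):

```python
import math

def kl_discrete(p, q):
    """sum_i p_i * (log p_i - log q_i) for strictly positive probabilities."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_discrete(p, p))
print(kl_discrete(p, q))
```

This needs the probabilities themselves as input, whereas the reparametrization layer only has a mean and a standard deviation, which is why a closed-form Gaussian KL is the more direct fit here.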
15:44 < Atharva> Yeah
15:44 < Atharva> How should we proceed then?
15:46 < sumedhghaisas> So we proceed by our 'dist' API then, we implement the KL and its derivative in 'dist'
15:46 < sumedhghaisas> let me try to elaborate on that
15:46 < sumedhghaisas> So the function will accept an object with template parameter
15:46 < sumedhghaisas> we can specialize the template to accommodate different implementation of the KL
15:47 < sumedhghaisas> so in Gaussian we will have a template specialization which accepts other Gaussian dist objects and returns a KL but for other dists it will throw a compile time error
15:48 < sumedhghaisas> hope I am not confusing you too much
15:48 < Atharva> No, I understand, it will become clearer when I actually implement it.
15:49 < sumedhghaisas> This way someone else could extend this API to support KL between various other combinations for his own purpose
15:50 < sumedhghaisas> We will only implement the KL between 2 gaussians
15:51 < Atharva> roger that :)
15:52 < sumedhghaisas> sounds good :) Now we just need a good name for the function and its derivative function :)
15:52 < sumedhghaisas> 'kl' sounds good for forward, what should be backward?
15:53 < Atharva> klBackward?
15:54 < sumedhghaisas> 'kl_backward'? or 'kl_derivative' ?
15:54 < sumedhghaisas> you are right... 'kl_backward' sounds better
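
A rough sketch of what `kl` and `kl_backward` between two Gaussians might compute, assuming diagonal (independent per-dimension) Gaussians parametrized by mean and standard deviation. The `Gaussian` class here is hypothetical and is not mlpack's GaussianDistribution; the runtime `TypeError` only stands in for the compile-time error described in the log.

```python
import math

class Gaussian:
    """Hypothetical diagonal Gaussian, not mlpack's GaussianDistribution."""

    def __init__(self, mean, stddev):
        self.mean = list(mean)
        self.stddev = list(stddev)

    def _check(self, other):
        # Runtime stand-in for the compile-time error discussed in the log.
        if not isinstance(other, Gaussian):
            raise TypeError("kl is only implemented between two Gaussians")

    def kl(self, other):
        """KL(self || other), summed over independent dimensions."""
        self._check(other)
        total = 0.0
        for m1, s1, m2, s2 in zip(self.mean, self.stddev,
                                  other.mean, other.stddev):
            total += (math.log(s2 / s1)
                      + (s1 * s1 + (m1 - m2) ** 2) / (2.0 * s2 * s2)
                      - 0.5)
        return total

    def kl_backward(self, other):
        """Gradients of kl(self, other) w.r.t. self.mean and self.stddev."""
        self._check(other)
        d_mean = [(m1 - m2) / (s2 * s2)
                  for m1, m2, s2 in zip(self.mean, other.mean, other.stddev)]
        d_stddev = [s1 / (s2 * s2) - 1.0 / s1
                    for s1, s2 in zip(self.stddev, other.stddev)]
        return d_mean, d_stddev

posterior = Gaussian([0.3, -0.7], [0.8, 1.4])
prior = Gaussian([0.0, 0.0], [1.0, 1.0])
print(posterior.kl(prior))
print(posterior.kl_backward(prior))
```

Other distribution pairs could be supported later by adding further checks or overloads, in the spirit of the extensible API described above.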
15:55 < sumedhghaisas> Atharva: now that we are using 'dist' API... there is one little request I would like to make
15:55 < Atharva> yeah, in that, when going forward kl comes from the latent variables, but when going backward it goes to the last layer of the vae network, how do we handle that?
15:56 < Atharva> won't the derivatives change?
15:57 < sumedhghaisas> ahh yes...
15:57 < sumedhghaisas> we need to make sure we accommodate the KL loss in final loss
15:58 < sumedhghaisas> ahh wait... but the KL loss does not affect the decoder in VAE
15:59 < sumedhghaisas> does it?
15:59 < sumedhghaisas> so the original architecture should work just fine...
15:59 < Atharva> I think it won't but the derivative will have to be w.r.t. the decoder
16:00 < sumedhghaisas> Let me see if I can explain this in a better way
16:01 < sumedhghaisas> So in our current framework we have a final loss, for which we will implement log likelihood loss
16:01 < sumedhghaisas> the error of that loss will propagate through the decoder layers and update the weights
16:02 < sumedhghaisas> when it reaches the repar layer, we will update the error with the error from KL loss and backpropagate it through the encoder layers
16:03 < sumedhghaisas> if you do the math, KL loss acts as a constant for decoder layers
16:03 < sumedhghaisas> the derivative will only play a role when mean and stddev are involved
16:03 < sumedhghaisas> maybe I am wrong but let me check this again
16:04 < sumedhghaisas> is my point clear?
16:04 < Atharva> Yeah it is, but won't the derivative just go to zero for KL loss then?
16:05 < sumedhghaisas> I am not sure why they will go to zero
16:05 -!- kevgeo [0127b141@gateway/web/freenode/ip.] has joined #mlpack
16:06 < sumedhghaisas> the error for mean will basically be the sum of the error from the upper layer and the error from the KL loss term
16:06 < Atharva> Because as you said, KL loss acts as a constant for decoder layers, and the derivative of the total loss will be w.r.t. the decoder
16:06 -!- kevgeo [0127b141@gateway/web/freenode/ip.] has quit [Client Quit]
16:07 < sumedhghaisas> ahh maybe I see your confusion
16:07 < sumedhghaisas> the derivative of KL loss will be zero for decoder layers
16:07 < sumedhghaisas> but they won't be zero for mean and stddev in repar layer
16:08 < sumedhghaisas> so we will update the error in that layer
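
The argument above can be checked on a scalar toy model. This sketch makes several assumptions that are not in the log: a linear encoder with hypothetical weights `w_m` and `w_s`, std = e^(output), a linear decoder with weight `v`, squared error as the reconstruction loss, and a standard normal prior. The KL term adds nothing to the decoder gradient, and its derivative is added to the error once it reaches the reparametrization step.

```python
import math

def forward(w_m, w_s, v, x, eps):
    """Scalar toy VAE: linear encoder, reparametrization, linear decoder."""
    mean = w_m * x
    std = math.exp(w_s * x)            # e^(output) keeps std positive
    z = mean + eps * std               # reparametrization
    recon = v * z                      # decoder
    rec_loss = 0.5 * (recon - x) ** 2  # assumed reconstruction loss
    kl = 0.5 * (std * std + mean * mean - 1.0) - math.log(std)
    return mean, std, z, recon, rec_loss, kl

def total_loss(w_m, w_s, v, x, eps):
    *_, rec_loss, kl = forward(w_m, w_s, v, x, eps)
    return rec_loss + kl

def backward(w_m, w_s, v, x, eps):
    """Hand-derived gradients of the total loss w.r.t. (w_m, w_s, v)."""
    mean, std, z, recon, _, _ = forward(w_m, w_s, v, x, eps)
    d_recon = recon - x
    grad_v = d_recon * z               # decoder weight: KL contributes nothing
    err_z = d_recon * v                # error arriving at the repar step
    err_mean = err_z + mean            # upstream error + dKL/d(mean)
    err_std = err_z * eps + (std - 1.0 / std)  # upstream error + dKL/d(std)
    grad_w_m = err_mean * x
    grad_w_s = err_std * std * x       # chain rule through std = exp(w_s * x)
    return grad_w_m, grad_w_s, grad_v

def numeric_grads(w_m, w_s, v, x, eps, h=1e-6):
    """Central finite differences of the total loss, for comparison."""
    def f(a, b, c):
        return total_loss(a, b, c, x, eps)
    return (
        (f(w_m + h, w_s, v) - f(w_m - h, w_s, v)) / (2 * h),
        (f(w_m, w_s + h, v) - f(w_m, w_s - h, v)) / (2 * h),
        (f(w_m, w_s, v + h) - f(w_m, w_s, v - h)) / (2 * h),
    )

params = dict(w_m=0.4, w_s=-0.3, v=1.2, x=0.7, eps=0.5)
analytic = backward(**params)
numeric = numeric_grads(**params)
print(analytic)
print(numeric)
```

The analytic and finite-difference gradients should match, and changing `v` alone leaves the KL value untouched, consistent with the KL loss acting as a constant for the decoder.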
16:09 < Atharva> understood, sorry if I am requiring too much help. I think it will get easier when I start implementing the vae class and these prerequisites are taken care of
16:10 < sumedhghaisas> no problem dude :) Backpropagation is sometimes a little tricky... Although I would like to suggest a simple exercise for you if you want to clear these concepts
16:11 < Atharva> That would be very helpful, what is the exercise?
16:12 < sumedhghaisas> So write down a neural network on paper with 1 feed forward layer and non linearity given by function 'g' as an encoder
16:13 < sumedhghaisas> the output goes directly to the repar and then for decoder same feed forward layer with same 'g' function as non linearity
16:13 < sumedhghaisas> so in total there are 3 layers
16:14 < sumedhghaisas> now on paper write down the errors propagating backwards for each layer based on the loss function
16:14 < sumedhghaisas> Also write down the weight updates
16:15 < sumedhghaisas> there are 3 errors in total flowing backwards... so it should be very quick to calculate those
16:15 < Atharva> I will do that surely.
16:16 < sumedhghaisas> once you have those 3 errors written down... just analyze them
16:16 < sumedhghaisas> it's a fairly quick exercise I used to do in college :)
16:17 < Atharva> Okay :)
16:17 < Atharva> So, I think I will start implementing all that we discussed.
16:18 < Atharva> I will try to do it by sunday night, so that from Monday I can start with VAE class as planned
16:18 < sumedhghaisas> Atharva: Sorry for taking too much of your time and also sorry to be very pedantic about this, but could we also shift the sampling to 'dist', it's a fairly easy copy paste. :)
16:19 < sumedhghaisas> what I mean is... The repar layer receives a vector of 2n, which we split into 2 vectors of n
16:20 < sumedhghaisas> we can define a GaussianDistribution object on top of it given in class
16:20 < sumedhghaisas> now we implement a 'sample' function in that which basically does mean + random * stddev
16:20 < sumedhghaisas> and that's it
16:21 < sumedhghaisas> now this same gaussian distribution object could be used to define KL loss as well
16:21 < sumedhghaisas> which makes it a very smooth integration
16:21 < sumedhghaisas> what do you think?
16:22 < Atharva> Okay, so instead of having a Sampling layer object, we just define a sampling function in the gaussianDistribution object
16:23 < sumedhghaisas> We still have the layer, but it will basically call the functions in gaussian distribution object, yes :)
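
The division of labour described above, where the layer splits its input and delegates sampling to a distribution object, might look roughly like this. `DiagonalGaussian` and `reparametrization_forward` are hypothetical names, and the second half of the input is passed through e^(.) as agreed earlier in the log; this is not mlpack code.

```python
import math
import random

class DiagonalGaussian:
    """Hypothetical stand-in for a distribution object, not mlpack's class."""

    def __init__(self, mean, stddev):
        self.mean = list(mean)
        self.stddev = list(stddev)

    def sample(self, rng):
        """Draw mean + random * stddev, one standard normal draw per dimension."""
        return [m + rng.gauss(0.0, 1.0) * s
                for m, s in zip(self.mean, self.stddev)]

def reparametrization_forward(inputs, rng):
    """Split a 2n vector into mean and raw stddev halves, then sample."""
    if len(inputs) % 2 != 0:
        raise ValueError("expected an even-length input (2n values)")
    n = len(inputs) // 2
    mean = inputs[:n]
    stddev = [math.exp(raw) for raw in inputs[n:]]  # keeps stddev positive
    dist = DiagonalGaussian(mean, stddev)
    return dist, dist.sample(rng)

dist, z = reparametrization_forward([0.0, 1.0, -2.0, 0.5], random.Random(42))
print(dist.mean, dist.stddev, z)
```

Returning the distribution object alongside the sample means the same object can later supply the KL term, which is the integration benefit mentioned above.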
16:24 < Atharva> Okay, sounds good
16:24 < Atharva> I will try to do all this and open a PR by sunday so that you can review it
16:24 < sumedhghaisas> this would let us use the same API when we define the decoder distribution
16:25 < sumedhghaisas> sounds good :) so the PR will contain which part exactly?
16:25 < Atharva> KL divergence and sampling
16:26 < sumedhghaisas> better to send the KL PR separately
16:26 < sumedhghaisas> so other people will review it as well
16:26 < Atharva> okay, two PRs then :)
16:26 < sumedhghaisas> and we can make the sampling PR based on that PR
16:26 < sumedhghaisas> great :)
16:26 < sumedhghaisas> That will mostly conclude the first sync:)
16:26 < sumedhghaisas> hoosh
16:27 < Atharva> that was long :p
16:27 < Atharva> hopefully next Thursday, we will be discussing the VAE class :)
16:28 -!- sumedhghaisas_ [2a6aec2e@gateway/web/freenode/ip.] has joined #mlpack
16:28 < sumedhghaisas_> Atharva: yup, was fun though
16:28 < Atharva> It was
16:29 < sumedhghaisas_> seems we have everything in order now so feels good
16:29 < sumedhghaisas_> have fun coding :)
16:29 < Atharva> I will
16:30 < Atharva> I just wanted to ask you, when and where did you study about VAEs?
16:30 < Atharva> Was it some college project?
16:31 -!- sumedhghaisas [2a6b9f2a@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
16:32 -!- sumedhghaisas [2a6b8dd9@gateway/web/freenode/ip.] has joined #mlpack
16:32 < sumedhghaisas> Atharva: didn't have VAEs in bachelors :( but I studied them at the end of my masters
16:33 < sumedhghaisas> they gained popularity quickly as they replace the annoying contrastive divergence procedure used in Boltzmann machines
16:33 -!- sumedhghaisas_ [2a6aec2e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
16:34 < Atharva> I don't really know what that is, but I get your point. :)
16:34 < Atharva> So, are you done with your education now? Are you working?
16:35 -!- sumedhghaisas_ [2a6b9cc6@gateway/web/freenode/ip.] has joined #mlpack
16:36 < sumedhghaisas_> Atharva: ahh yes. I completed my masters
16:36 < ShikharJ> sumedhghaisas_: Aren't you working with the deepmind team?
16:38 -!- sumedhghaisas [2a6b8dd9@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
16:38 -!- sumedhghaisas [~yaaic@2402:3a80:660:61e5:d91c:23a8:ea7:12e] has joined #mlpack
16:39 < sumedhghaisas> Atharva: don't know if you got my messages or not
16:40 < Atharva> I got that you completed your masters, nothing after that.
16:40 < sumedhghaisas> completed my masters and working for DeepMind now... that's the gist of it :)
16:40 < Atharva> Oh that's awesome !
16:40 -!- sumedhghaisas_ [2a6b9cc6@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
16:41 < ShikharJ> sumedhghaisas: That's a professional profile that most would long for :)
16:43 < sumedhghaisas> Atharva, ShikharJ: Thanks :) not selling but MLPack actually helped me a lot in this case
16:44 < sumedhghaisas> I hope it does to you too
16:44 < ShikharJ> :)
16:57 -!- sumedhghaisas2 [~yaaic@2402:3a80:642:1574:6c4f:27ab:2f04:2ffb] has joined #mlpack
16:58 -!- sumedhghaisas [~yaaic@2402:3a80:660:61e5:d91c:23a8:ea7:12e] has quit [Ping timeout: 256 seconds]
16:59 -!- sumedhghaisas [~yaaic@2402:3a80:67c:a8:10ea:13b7:d70d:7578] has joined #mlpack
17:01 -!- sumedhghaisas2 [~yaaic@2402:3a80:642:1574:6c4f:27ab:2f04:2ffb] has quit [Ping timeout: 240 seconds]
17:22 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
17:24 < jenkins-mlpack> Project docker mlpack nightly build build #334: ABORTED in 1 day 10 hr:
17:25 < ShikharJ> zoq: I have tmux'ed the GAN build, let's see what output we obtain.
17:25 < jenkins-mlpack> Project docker mlpack weekly build build #43: ABORTED in 6 days 16 hr:
17:25 < jenkins-mlpack> * pdumouchel: #1152 Add tests for command-line and Python bindings
17:25 < jenkins-mlpack> * pdumouchel: #1152 Add tests for command-line and Python bindings- fixed style issues
17:25 < jenkins-mlpack> * pdumouchel: #1152 corrected previous test that used approx_equal. Created test for
17:25 < jenkins-mlpack> * pdumouchel: #1152 took out the NCALabelSizeTest
17:25 < jenkins-mlpack> * pdumouchel: #1152 put back the NCALabelSizeTest
17:25 < jenkins-mlpack> * pdumouchel: #1152 took out the NCALabelSizeTest
17:25 < jenkins-mlpack> * sshekhar.special: Added Pendulum continuous environment of OpenAI Gym
17:25 < jenkins-mlpack> * sshekhar.special: Fixed style errors
17:25 < jenkins-mlpack> * sshekhar.special: Fixed style and other errors
17:25 < jenkins-mlpack> * sshekhar.special: Added pendulum tests
17:25 < jenkins-mlpack> * sshekhar.special: Switched to Camel Case and fixed some style issues
17:25 < jenkins-mlpack> * sshekhar.special: Minor fix for type
17:25 < jenkins-mlpack> * sshekhar.special: Fixed comment mistake
17:25 < jenkins-mlpack> * Marcus Edel: Adjust parameter size and use alias inside conv operation.
17:25 < jenkins-mlpack> * sshekhar.special: Fixed the scope of power function and typo in tests
17:25 < jenkins-mlpack> * sshekhar.special: Added name to contributors
17:25 < jenkins-mlpack> * sshekhar.special: Fix Continuous Mountain Car environment
17:25 < jenkins-mlpack> * Ryan Curtin: Fix spelling (thanks Manish!).
17:25 < rcurtin> looks like there may be a memory leak somewhere, occasionally the tests are hanging
17:26 -!- sumedhghaisas [~yaaic@2402:3a80:67c:a8:10ea:13b7:d70d:7578] has quit [Ping timeout: 255 seconds]
17:30 -!- sumedhghaisas [~yaaic@2402:3a80:65d:5fbe:742d:a99b:14e9:1fbb] has joined #mlpack
17:31 -!- sumedhghaisas2 [~yaaic@] has quit [Ping timeout: 256 seconds]
17:32 < manish7294> rcurtin: It looks like slake doesn't have tmux either.
17:32 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
17:33 < rcurtin> manish7294: hang on, let me get it there also
17:33 < manish7294> rcurtin: great :)
17:34 < rcurtin> manish7294: ok, installed. don't hesitate to ask if there are any other missing packages :)
17:35 < manish7294> rcurtin: sure :)
17:36 -!- sumedhghaisas [~yaaic@2402:3a80:65d:5fbe:742d:a99b:14e9:1fbb] has quit [Ping timeout: 256 seconds]
17:37 -!- sumedhghaisas [~yaaic@2402:3a80:67e:9647:940e:97ec:4709:72e2] has joined #mlpack
18:13 < ShikharJ> zoq: Are you there?
18:28 < ShikharJ> zoq: Since the standard GAN and DCGAN only differ in the layers they employ (DCGAN makes use of transposed convolutions in place of Bilinear Interpolation, and a few minor changes), I was thinking to make use of the existing code for DCGAN as well.
18:31 < ShikharJ> zoq: That way, when we extend support for batch sizes, we'll automatically have the support in both of them, and if we were to try something additional, such as separating the optimizer for discriminator and generator, we'll get the benefits in both.
18:32 -!- sumedhghaisas [~yaaic@2402:3a80:67e:9647:940e:97ec:4709:72e2] has quit [Ping timeout: 256 seconds]
18:32 -!- sumedhghaisas2 [~yaaic@] has joined #mlpack
18:35 < ShikharJ> zoq: lozhnikov: Let me know what you think of that?
18:44 < ShikharJ> zoq: We'll probably need to have a separate module in the case for Wasserstein GAN, but that's a problem for later.
18:47 -!- travis-ci [] has joined #mlpack
18:47 < travis-ci> ShikharJ/mlpack#167 (master - 949ab83 : Ryan Curtin): The build has errored.
18:47 < travis-ci> Change view :
18:47 < travis-ci> Build details :
18:47 -!- travis-ci [] has left #mlpack []
18:57 -!- mikeling [uid89706@gateway/web/] has quit [Quit: Connection closed for inactivity]
19:55 -!- manish7294 [8ba70a35@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
19:57 < zoq> ShikharJ: I like the idea to use the existing class. I thought about the optimizer separation. Let me comment on the PR.
19:57 < zoq> ShikharJ: Any results so far?
19:58 < zoq> ShikharJ: The results on the PR are promising. Did you check a range of parameters or are the defaults fine?
20:00 -!- sumedhghaisas2 [~yaaic@] has quit [Remote host closed the connection]
20:02 < jenkins-mlpack> Project docker mlpack nightly build build #335: UNSTABLE in 2 hr 38 min:
20:18 < zoq> ShikharJ: in case you like to use OpenCV to create an image.
20:34 < rcurtin> I'm running valgrind on mlpack_test but it takes a really long time... might be a while before I'm able to isolate any problem that is causing tests to hang
21:06 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
21:12 < zoq> hopefully we get some information
21:19 < rcurtin> maybe in a week :)
--- Log closed Fri Jun 01 00:00:44 2018