mlpack IRC logs, 2018-07-01

Logs for the day 2018-07-01 (starts at 0:00 UTC) are shown below.

--- Log opened Sun Jul 01 00:00:27 2018
00:37 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
07:10 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
08:05 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
09:51 < ShikharJ> zoq: Are you there?
10:11 < jenkins-mlpack> Yippee, build fixed!
10:11 < jenkins-mlpack> Project docker mlpack nightly build build #366: FIXED in 2 hr 57 min:
10:27 < zoq> ShikharJ: Great results, will take a look at the PR later today.
10:44 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
10:48 < ShikharJ> zoq: I was hoping we could discuss the design for the DualOptimizer PR? If you'll be free later during the day, please ping me then.
10:49 < zoq> ShikharJ: If you like we can talk about the design now.
10:52 < Atharva> zoq: For VAE models, MNIST data is needed. Should we store it as csv in the models repo, or should we store it in the original idx-ubyte format?
10:52 < Atharva> Another option can be to download it the first time someone builds the models repo
10:53 < ShikharJ> Atharva: I have the transposed MNIST data available in csv format, let me know if you need it.
10:53 < Atharva> ShikharJ: What do you mean by transposes here?
10:54 < Atharva> transposed*
10:54 < ShikharJ> Every image is a column of size 784. The original dataset had images laid across rows.
10:54 < zoq> Atharva: csv is fine; hdf5 might be another option. We could compress the dataset and uncompress it as a build step, something like:
10:55 < ShikharJ> So the dataset has dimensions 70,000 x 784 instead of 784 x 70,000.
10:55 < Atharva> ShikharJ: got it :), I thought by transposed you meant something else in this case
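The layout difference ShikharJ describes (one image per column rather than one image per row) can be sketched as below. This is only a minimal illustration on a flat std::vector buffer, not Armadillo or mlpack code; the function name TransposeRowMajor and the row-major storage are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack or Armadillo): transposes a matrix
// stored as a flat row-major buffer of `rows` x `cols` entries.
// The result is a row-major buffer of `cols` x `rows` entries, so if every
// input row held one image, every output column holds one image.
std::vector<double> TransposeRowMajor(const std::vector<double>& in,
                                      std::size_t rows,
                                      std::size_t cols)
{
  std::vector<double> out(in.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c];  // element (r, c) moves to (c, r)
  return out;
}
```

For 70,000 MNIST images of 784 pixels each, one layout is 70,000 x 784 and the other is 784 x 70,000; the helper converts between the two.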
10:56 < Atharva> zoq: Thanks, can armadillo load hdf5 files?
10:58 < zoq> Atharva: Yes, but you have to build armadillo with hdf5 support
11:00 < Atharva> zoq: Okay, if hdf5 isn't much smaller than csv, I guess it's better to go with csv, as otherwise people will have to build armadillo differently for this one task
11:02 < zoq> Atharva: Agreed.
11:02 < Atharva> ShikharJ: Where do you have it? on some repo?
11:02 < ShikharJ> Atharva: On my laptop, and on savannah server.
11:03 < Atharva> Can you give me the link?
11:03 < ShikharJ> Atharva: If there's a server you need it on, I can scp the zip file?
11:03 < ShikharJ> Atharva: The zip is about 17 MBs, so maybe I can send it over mail as well
11:04 < zoq> ShikharJ: We can put it in the jenkins folder.
11:05 < ShikharJ> zoq: I'm not sure if I understand what you mean by jenkins folder?
11:06 < ShikharJ> Did you mean jenkins-conf repository?
11:06 < zoq> If we move the file into the jenkins workspace, someone can download it over http.
11:07 < zoq> This is just another possibility.
11:08 < ShikharJ> zoq: Ideally we should try and upload the dataset directly to mlpack/models?
11:08 < zoq> ShikharJ: Agreed, what's the size of the compressed dataset (tar.gz)?
11:09 < ShikharJ> zoq: I only have the zip file with me, and that's about 17 MBs.
11:10 < ShikharJ> I'm not sure if a tarball would have a significantly different size or not.
11:11 < zoq> ShikharJ: Right, it's just that on some systems you have to install unzip or something like that to extract the archive.
11:12 < ShikharJ> zoq: I'll upload the tarball to mlpack/models. Atharva, you can download from there then.
11:12 < zoq> ShikharJ: Great, thanks!
11:17 < Atharva> ShikharJ: Thanks!
11:18 < ShikharJ> Interesting, the tarball is about 13.9 MBs in size.
11:19 < Atharva> How much had you thought it should be?
11:20 < ShikharJ> Atharva: I wasn't expecting it to be about 5/6th the size of a zip file.
11:20 < Atharva> Ohh, okay
11:37 < ShikharJ> zoq: I have opened a pull request for the dataset. Atharva, you can download from the same once it is merged.
11:38 < ShikharJ> zoq: Are you still around, maybe we can discuss the design of the dual optimizer further?
11:39 < zoq> ShikharJ: Sure, I'm here.
11:41 < ShikharJ> zoq: As far as I can see, when we create an optimizer object, it internally calls the optimize function. But what that optimize function does is not clear to me.
11:44 < ShikharJ> zoq: More specifically, what functionality does optimize() offer?
11:45 < zoq> ShikharJ: The optimizer will call the Evaluate function to get the current loss, in our case this will run the Forward pass. Afterwards the optimizer will call the Gradient function to get the gradients for the update step, so this will call the Forward/Backward and Gradient function in case of a network.
11:45 < zoq> Perhaps: is helpful as well.
11:45 < zoq> See the example optimizer.
11:46 < zoq> In our case ObjectiveFunction is the GAN class.
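The call pattern zoq describes (the optimizer asks the objective for a loss via Evaluate and for an update direction via Gradient) can be sketched roughly as follows. This is a simplified illustration, not mlpack's implementation: mlpack's actual interface operates on Armadillo matrices, while the classes QuadraticFunction and GradientDescentSketch, their signatures, and the std::vector parameters are assumptions made to keep the example self-contained.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical objective f(x) = sum_i (x_i - 3)^2, standing in for a network
// class such as GAN. Signatures are simplified assumptions for this sketch.
class QuadraticFunction
{
 public:
  // Returns the current loss (for a network this would run the forward pass).
  double Evaluate(const std::vector<double>& x) const
  {
    double loss = 0.0;
    for (double v : x)
      loss += (v - 3.0) * (v - 3.0);
    return loss;
  }

  // Writes the gradient of the loss into `g` (for a network this would run
  // the forward and backward passes).
  void Gradient(const std::vector<double>& x, std::vector<double>& g) const
  {
    g.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
      g[i] = 2.0 * (x[i] - 3.0);
  }
};

// Hypothetical plain gradient descent optimizer (not an mlpack class).
class GradientDescentSketch
{
 public:
  GradientDescentSketch(double stepSize, std::size_t maxIterations,
                        double tolerance)
    : stepSize(stepSize), maxIterations(maxIterations), tolerance(tolerance) {}

  // Works with any FunctionType that offers Evaluate() and Gradient().
  template<typename FunctionType>
  double Optimize(FunctionType& function, std::vector<double>& x) const
  {
    std::vector<double> g;
    double lastLoss = function.Evaluate(x);
    for (std::size_t i = 0; i < maxIterations; ++i)
    {
      function.Gradient(x, g);                 // gradients for the update step
      for (std::size_t j = 0; j < x.size(); ++j)
        x[j] -= stepSize * g[j];
      const double loss = function.Evaluate(x);  // current loss
      if (std::fabs(lastLoss - loss) < tolerance)
        return loss;                           // loss stopped changing
      lastLoss = loss;
    }
    return lastLoss;
  }

 private:
  double stepSize;
  std::size_t maxIterations;
  double tolerance;
};
```

Because Optimize is templated on the function type, any class exposing these two methods can be passed in, which is the role the GAN class plays in the discussion above.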
11:49 < ShikharJ> zoq: Alright, so if we create two optimizers, we'll need to provide two evaluate, two gradient, and two forward functions (because we're calling discriminator and generator forward at the same time inside GAN::Forward) right/
11:49 < ShikharJ> ?
11:51 < zoq> That's an option, but we could use the same Evaluate function if we could distinguish between the functions somehow.
11:55 < zoq> ShikharJ: If you like I can write a dummy class with the idea I have in mind.
11:55 < ShikharJ> zoq: Hmm, I can't think of an idea that would be fast regarding this, do you have an idea how this can be achieved? I can only think of a template based solution, but that would lead to runtime loss.
11:55 < ShikharJ> zoq: Please go ahead, I can't think of a good solution for this problem.
11:56 < zoq> Sure, maybe I missed something :) I'll see if I can put some time into the idea later today.
11:57 < ShikharJ> zoq: Also regarding the models directory, what name would you prefer? Maybe a generic name like datasets/ ?
11:57 < zoq> yeah, or data
11:57 < ShikharJ> I'll make the changes and push again.
11:58 < zoq> ShikharJ: Okay, thanks again.
14:17 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has quit [Remote host closed the connection]
14:28 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has joined #mlpack
14:45 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has quit [Remote host closed the connection]
14:47 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has joined #mlpack
17:59 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
18:12 < ShikharJ> zoq: I was thinking, in the meantime, we can try completing the RBM PR?
18:17 < zoq> ShikharJ: I think that is a great idea :)
18:20 < ShikharJ> zoq: Great, I have updated the WGAN PR, and I'll update the blog with a post as well.
18:23 < zoq> ShikharJ: Awesome, I'll take a second look at the changes later today, and we should be able to merge the code in the next few days.
18:31 < Atharva> ShikharJ: The dataset you uploaded doesn't have the labels, we don't need it for generative models but I guess it will be better to have them.
18:32 < Atharva> Or do you need it for GANs, sorry I am not sure
18:32 < Atharva> ?
18:33 < ShikharJ> Atharva: Yeah, I did that to prevent having problems for the GAN code. We can provide them separately, I have the original full dataset as well.
18:33 < Atharva> Oh cool, maybe you could commit the csv with labels to the PR :)
18:34 < Atharva> No rush, do it when you get time
18:35 < ShikharJ> Atharva: For most other cases, the should suffice, I don't really see a need for adding labels, as it has them I think.
18:35 < Atharva> Ahh yes, it has. No problem then.
18:51 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 245 seconds]
19:39 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
19:57 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has quit [Quit: Bye bye.]
19:59 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has joined #mlpack
20:54 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
22:10 < Atharva> sumedhghaisas: You there?
--- Log closed Mon Jul 02 00:00:28 2018