mlpack IRC logs, 2018-06-18

Logs for the day 2018-06-18 (starts at 0:00 UTC) are shown below.

--- Log opened Mon Jun 18 00:00:08 2018
01:53 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
02:09 -!- manish7294 [~yaaic@2405:205:2480:faee:dc1:96a8:2bab:7e18] has joined #mlpack
02:14 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
02:22 < manish7294> zoq: Ah! that looks strange, as we are not passing any initial transformation and it's being generated by shogun itself at line 56 of LMNNImp.cpp
02:25 < manish7294> I got it, if you look at the letter dataset, the labels are in the first column of the csv file. That may be the reason for this unusual error.
02:26 < manish7294> so, letter dataset needs an update :)
02:30 < manish7294> And regarding 100% accuracy: it seems strange to me because if we comment out transformedData and just compute accuracy on the original iris, we still get 100%. Which is quite strange in itself, as in iris there are some points which are far away from their original class
02:31 -!- manish7294 [~yaaic@2405:205:2480:faee:dc1:96a8:2bab:7e18] has quit [Quit: Yaaic - Yet another Android IRC client -]
02:32 -!- manish7294 [8ba7a5fd@gateway/web/freenode/ip.] has joined #mlpack
02:41 -!- manish7294 [8ba7a5fd@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
02:48 -!- manish7294 [8ba7a5fd@gateway/web/freenode/ip.] has joined #mlpack
02:48 < manish7294>
02:57 -!- manish7294_ [8ba7a5fd@gateway/web/freenode/ip.] has joined #mlpack
02:57 < manish7294_> Same with the balance_scale dataset, it too has labels in the first column
02:58 < manish7294_> Things seem to be connecting now. It seems this is the reason why I was getting 20% and 40.235% on these two and 100% on all others.
03:00 -!- manish7294 [8ba7a5fd@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
03:00 < manish7294_> Since we have the query set same as the training set, it looks like for k = 1 shogun's knn is predicting each point itself as the nearest point and hence the 100% accuracy.
03:01 < manish7294_> Let me verify this by changing the value of k
03:32 < manish7294_> Right, This time things work --- got 96.6667 on iris :)
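The self-match effect described above can be reproduced with a minimal sketch. It assumes a brute-force 1-nearest-neighbour search on hypothetical one-dimensional toy data; `Sample`, `OneNnAccuracy`, and `ToyData` are illustrative names, not part of shogun or mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One labelled 1-D sample; a hypothetical stand-in for a dataset row.
struct Sample { double x; int label; };

// Fraction of points whose single nearest neighbour carries the same label.
// When excludeSelf is false, every point can match itself at distance 0.
double OneNnAccuracy(const std::vector<Sample>& data, bool excludeSelf)
{
  assert(!data.empty());
  std::size_t correct = 0;
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    double best = std::numeric_limits<double>::infinity();
    int predicted = -1;
    for (std::size_t j = 0; j < data.size(); ++j)
    {
      if (excludeSelf && i == j)
        continue;
      const double d = std::abs(data[i].x - data[j].x);
      if (d < best) { best = d; predicted = data[j].label; }
    }
    if (predicted == data[i].label)
      ++correct;
  }
  return static_cast<double>(correct) / static_cast<double>(data.size());
}

// Hypothetical toy data: the class-1 point at 1.8 sits among class-0 points.
inline std::vector<Sample> ToyData()
{
  return {{0.0, 0}, {1.0, 0}, {1.8, 1}, {3.0, 0}, {10.0, 1}, {11.0, 1}};
}
```

With `excludeSelf` set to false the query set equals the training set and every point finds itself first, so the accuracy is perfect regardless of how the classes overlap. Excluding the point itself gives a lower, more honest number on the same data.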
05:00 -!- manish7294_ [8ba7a5fd@gateway/web/freenode/ip.] has quit [Quit: Page closed]
08:48 < Atharva> sumedhghaisas: Hi Sumedh
09:11 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 260 seconds]
09:16 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
10:21 < jenkins-mlpack> Project docker mlpack nightly build build #353: STILL UNSTABLE in 3 hr 7 min:
11:40 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 265 seconds]
11:46 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
12:02 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 255 seconds]
12:05 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
12:19 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 256 seconds]
12:20 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
13:00 -!- wenhao [731bc5c9@gateway/web/freenode/ip.] has joined #mlpack
13:13 < rcurtin> manish7294_: sorry I was unavailable over the weekend, looks like you got things figured out
13:14 < rcurtin> with mlpack kNN, you can specify just a single reference set to do all-nearest-neighbors, which won't count the nearest point
13:14 < rcurtin> er, sorry, "which won't count the point as its own nearest neighbor"
13:15 < rcurtin> but it looks like shogun doesn't have that support... I wonder if to make it work right, if you need to query for each point individually with the rest of the points as the reference set
13:15 < rcurtin> you could also ask in #shogun if there's a better way to do it
13:16 -!- manish7294 [~yaaic@2405:205:2480:faee:dc1:96a8:2bab:7e18] has joined #mlpack
13:18 < manish7294> rcurtin: Hi, how did the race go?
13:18 < manish7294> things are good if we just avoid k = 1
13:18 < rcurtin> manish7294: it was two events... one was an "ironman", a 1-hour endurance race
13:18 < manish7294> so maybe we can have k = 3 for benchmarking
13:19 < rcurtin> in that one, I placed 2nd:
13:19 < manish7294> great :)
13:19 < rcurtin> the other event was a series of races, but we are randomly assigned karts, and the karts were not good, so as a result I did not do well
13:19 < rcurtin> but I still had fun :)
13:20 < manish7294> wow! you are wearing cat tshirt :)
13:20 < rcurtin> my concern about using k=3 is that it'll give different results than mlpack would for k=3
13:20 < rcurtin> yeah, I ordered that shirt for fun but it turned out to be made of a really nice under armour-like material (don't know what it's called)
13:20 < rcurtin> so it's perfect for activities where I sweat a lot :)
13:21 < manish7294> why do you think it will give different results
13:23 < rcurtin> the kNN classification is the weighted average of the nearest k neighbors
13:23 < rcurtin> in shogun, if the 1st nearest neighbor is always the query point, then we get different results than mlpack, where the 1st nearest neighbor is not the query point
13:23 < manish7294> Right, I totally missed that
13:24 < manish7294> Okay I will ask on #shogun
13:25 < rcurtin> it might be worth looking through their documentation to see if there is any other idea first
13:25 < rcurtin> I only took a quick look
13:25 < manish7294> sure
13:27 < manish7294> or what we can do is keep k for mlpack training 1 less than the k used in shogun accuracy prediction
13:27 < manish7294> like for k = 1, we can put shogun to k = 2
13:27 < rcurtin> no, that may still result in problems
13:28 < rcurtin> there are still situations where that could give different results for mlpack and shogun
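One situation where the "k + 1 with self included" workaround can disagree with "k with self excluded" is sketched here. It assumes an unweighted majority vote with ties going to the smaller label; that tie rule, the toy data, and the names `Sample` and `KnnVote` are assumptions for illustration, not the behaviour of shogun or mlpack.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One labelled 1-D sample; a hypothetical stand-in for a dataset row.
struct Sample { double x; int label; };

// Unweighted majority vote among the k nearest training points to data[query].
// Vote ties go to the smaller label (an arbitrary choice for this sketch).
int KnnVote(const std::vector<Sample>& data, std::size_t query, std::size_t k,
            bool excludeSelf)
{
  assert(query < data.size());
  std::vector<std::pair<double, int>> neighbours;  // (distance, label)
  for (std::size_t j = 0; j < data.size(); ++j)
  {
    if (excludeSelf && j == query)
      continue;
    neighbours.push_back({std::abs(data[query].x - data[j].x), data[j].label});
  }
  k = std::min(k, neighbours.size());
  std::partial_sort(neighbours.begin(),
                    neighbours.begin() + static_cast<std::ptrdiff_t>(k),
                    neighbours.end());

  std::map<int, std::size_t> votes;
  for (std::size_t i = 0; i < k; ++i)
    ++votes[neighbours[i].second];

  int best = -1;
  std::size_t bestCount = 0;
  for (const auto& v : votes)  // ascending label order
    if (v.second > bestCount) { best = v.first; bestCount = v.second; }
  return best;
}
```

For the points `{0.0, label 1}`, `{0.5, label 0}`, `{5.0, label 1}` and the query at 0.5, k = 1 without the query point returns label 1, while k = 2 with the query point included ends in a 1-1 tie that this sketch resolves to label 0. The extra vote from the point's own label is what changes the outcome.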
13:28 < manish7294> okay, then I shall try to find out more
13:30 < manish7294> would it be good to mention mlpack specifically when discussing this problem on #shogun?
13:32 < rcurtin> you can mention mlpack, but I don't think the folks in there are familiar with the particular project we are doing
13:32 < rcurtin> so, up to you :) I doubt it will make much difference
13:35 < manish7294> okay I will try my best to explain myself
13:39 < zoq> rcurtin: Nice, this t-shirt is the best :)
13:50 < manish7294> zoq: Can you please check the letters and balance dataset?
13:52 < zoq> manish7294: Hold on, let me start the benchmark.
13:53 < manish7294> zoq: not that, the datasets on the
13:53 < manish7294> they have labels in the first column
13:53 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 264 seconds]
13:53 < manish7294> And that may be the reason you got that error yesterday
13:54 < zoq> I see
13:55 < manish7294> and I think wine.csv doesn't have labels in it, not sure though.
13:55 < zoq> Great that you figured it out.
13:56 < manish7294> zoq: was possible because of your debugging :)
13:58 < zoq> manish7294: for the wine dataset the last column contains the labels, but SplitTrainData should remove that part.
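The label-column mismatch discussed above (labels in the first column for some datasets, in the last for others) can be made explicit with a small helper. This is a minimal sketch assuming numeric labels and one parsed CSV row per call; `LabeledRow` and `SplitLabel` are hypothetical names and do not reproduce the benchmark's SplitTrainData.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Features and label of one row after the label column has been removed.
struct LabeledRow
{
  std::vector<double> features;
  int label;
};

// Split one parsed CSV row into features and label. labelColumn is 0 for
// datasets with the label first and row.size() - 1 for datasets with it last.
LabeledRow SplitLabel(const std::vector<double>& row, std::size_t labelColumn)
{
  assert(labelColumn < row.size());
  LabeledRow out;
  out.label = static_cast<int>(row[labelColumn]);
  for (std::size_t i = 0; i < row.size(); ++i)
    if (i != labelColumn)
      out.features.push_back(row[i]);
  return out;
}
```

Passing the wrong column index does not fail loudly: a feature is silently treated as the label, which is consistent with the unexpectedly low accuracies reported earlier in the log.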
14:01 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
14:04 < sumedhghaisas> Atharva: Hi Atharva
14:05 < manish7294> zoq: Thanks! I was unsure about it as I only had a very quick glance over it.
14:05 < sumedhghaisas> Sorry little busy today. Is it possible to catch up tomorrow?
14:06 < zoq> ShikharJ: Great blog update, always nice to see good results.
14:06 < Atharva> sumedhghaisas: Sure Sumedh! I have rebased the repar PR, do check it when you get time. Till then I will complete the second PR.
14:07 < sumedhghaisas> Atharva: ahh I did give it a look.
14:07 < sumedhghaisas> there seems to be some static code check errors.
14:07 < sumedhghaisas> although I am not sure how to solve them?
14:08 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 245 seconds]
14:08 < Atharva> Yeah I read the details, they are due to the empty constructor in the repar layer. But all the other layers have a constructor like that one.
14:09 < sumedhghaisas> hmm... lets ask Marcus then :)
14:09 < sumedhghaisas> zoq: Hey Marcus
14:10 < sumedhghaisas> Are we following static code check for ann layers?
14:10 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
14:11 < zoq> sumedhghais: Yes, but sometimes the static analysis check returns nonsense. You are talking about #1420?
14:13 < manish7294> zoq: Currently we have two elements (timing, Accuracy) in metrics of LMNN but only timing is being shown as benchmarks execute.
14:14 < zoq> manish7294: You are right, we should print everything.
14:14 < zoq> Atharva:
14:15 < zoq> Atharva: We can ignore the first two, I guess you could use arma::ones<arma::Mat<eT> > instead of arma::ones<arma::Mat<eT>> to solve the two; not sure.
14:16 < zoq> Atharva: You can fix the third issue by setting latentSize, stochastic, includeKl to zero inside the constructor initialization list?
14:17 < manish7294> zoq: So, is this a problem in my code, or does it need to be implemented in the benchmarks?
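The initialization-list fix suggested above might look like the following sketch. Only the member names latentSize, stochastic, and includeKl come from the conversation; the class name, the member types, and the second constructor are assumptions, and this is not the actual mlpack layer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the layer under discussion.
class ReparametrizationSketch
{
 public:
  // Default constructor: every member gets a defined value in the initializer
  // list, so a static analyser has no uninitialized member to report.
  ReparametrizationSketch() :
      latentSize(0), stochastic(false), includeKl(false)
  { }

  // Assumed explicit constructor for a configured layer.
  ReparametrizationSketch(std::size_t latentSize, bool stochastic,
                          bool includeKl) :
      latentSize(latentSize), stochastic(stochastic), includeKl(includeKl)
  {
    assert(latentSize > 0);
  }

  std::size_t LatentSize() const { return latentSize; }
  bool Stochastic() const { return stochastic; }
  bool IncludeKl() const { return includeKl; }

 private:
  std::size_t latentSize;
  bool stochastic;
  bool includeKl;
};
```

An empty default constructor body with no initializer list leaves built-in members such as `size_t` and `bool` indeterminate, which is the kind of finding static analysis tools typically flag.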
14:21 < zoq> manish7294: This has to be implemented inside the main benchmark script; we could open an issue and maybe someone (including myself) will pick it up in the next days?
14:22 < manish7294> zoq: sounds good :)
14:23 < zoq> manish7294: can you open the issue?
14:23 < manish7294> zoq: I too will try if I can.
14:23 < zoq> manish7294: Thanks, if not I can open one later today.
14:23 < manish7294> zoq: sure, will open in about an hour from now.
14:24 < manish7294> or, I think I can do it now also :)
14:28 < zoq> wenhao: Really interesting results, do you think you could accumulate the results over multiple runs and include the runtime as another metric?
14:35 < sumedhghaisas> zoq: yes #1420
14:36 < sumedhghaisas> I looked at the static code analysis result. Don't know if it makes sense
14:37 < zoq> wenhao: Interesting, looks like there is no difference between the different search policies.
14:37 < zoq> sumedhghais: Right, we can ignore the first two, see my comments above.
14:38 < rcurtin> manish7294: I'll provide some updated comments later today or tomorrow... after a week off, I have a lot to catch up on it seems...
14:38 < manish7294> rcurtin: Great :)
14:41 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
15:02 < rcurtin> wenhao: I took a look at #1410, there is some really nice refactoring there. thank you for your hard work!
15:03 < rcurtin> one cool thing also is that with the NeighborSearchPolicy templatized, it would be possible to plug in LSH instead of tree-based kNN if a user wanted it
15:03 < rcurtin> I guess that could be useful if the rank of the decomposition was very large, so that the kNN search was high-dimensional, where LSH could perform better
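The idea of a templatized search policy can be sketched as below. Everything here is hypothetical: `ExactSearch`, `BucketSearch`, `FindNeighbors`, and `ClosestOf` are illustrative names, the bucket scheme is only a crude one-dimensional stand-in for LSH, and none of it mirrors the code in #1410.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sort candidate indices by distance to points[query]; keep the k closest.
inline std::vector<std::size_t> ClosestOf(const std::vector<double>& points,
                                          std::size_t query,
                                          std::vector<std::size_t> candidates,
                                          std::size_t k)
{
  std::sort(candidates.begin(), candidates.end(),
      [&](std::size_t a, std::size_t b)
      {
        return std::abs(points[a] - points[query]) <
               std::abs(points[b] - points[query]);
      });
  if (candidates.size() > k)
    candidates.resize(k);
  return candidates;
}

// Exact policy: every other point is a candidate.
struct ExactSearch
{
  static std::vector<std::size_t> Search(const std::vector<double>& points,
                                         std::size_t query, std::size_t k)
  {
    std::vector<std::size_t> all;
    for (std::size_t i = 0; i < points.size(); ++i)
      if (i != query)
        all.push_back(i);
    return ClosestOf(points, query, all, k);
  }
};

// Approximate policy: only points in the query's bucket are candidates.
struct BucketSearch
{
  static std::vector<std::size_t> Search(const std::vector<double>& points,
                                         std::size_t query, std::size_t k)
  {
    const double width = 1.0;  // assumed bucket width
    std::vector<std::size_t> same;
    for (std::size_t i = 0; i < points.size(); ++i)
      if (i != query &&
          std::floor(points[i] / width) == std::floor(points[query] / width))
        same.push_back(i);
    return ClosestOf(points, query, same, k);
  }
};

// The caller chooses the search strategy at compile time.
template<typename NeighborSearchPolicy>
std::vector<std::size_t> FindNeighbors(const std::vector<double>& points,
                                       std::size_t query, std::size_t k)
{
  assert(query < points.size());
  return NeighborSearchPolicy::Search(points, query, k);
}
```

Swapping `FindNeighbors<ExactSearch>` for `FindNeighbors<BucketSearch>` changes the strategy without touching the calling code. The approximate policy inspects fewer candidates and can therefore miss the true nearest neighbour, which is the usual trade-off with hashing-based search.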
17:26 < Atharva> I was thinking about upgrading to 18.04, has anybody encountered any problem with mlpack on it?
17:27 < rcurtin> I haven't had any problems
17:27 < ShikharJ> Atharva: Retreat comrade, before you regret!
17:27 < rcurtin> (with mlpack that is)
17:27 < Atharva> ShikharJ: Why what happened?
17:28 < rcurtin> I use debian unstable on most systems anyway :)
17:28 < Atharva> rcurtin: How is the nvidia driver support on 18.04, have they improved it?
17:28 < rcurtin> I haven't had any issues, sometimes the driver version is a little bit old, but if you install via apt there's no issues I've seen
17:28 < ShikharJ> Atharva: Lots of issues with boot and shutdown routines, no bumblebee support for switching off GPU, lots of apts don't work.
17:29 < rcurtin> oh my, maybe don't take my word for it then :)
17:29 < Atharva> Getting mixed reviews here :p
17:29 < ShikharJ> Atharva: If your current system works fine, don't make the mistake of upgrading it.
17:30 < ShikharJ> There's a reason why people still use 14.04
17:30 < Atharva> Maybe I will wait then, I don't have any problems with 16.04 other than the fact that Nvidia drivers are a bit hard to get started with
17:31 < ShikharJ> Atharva: It would only be harder with the newer versions, you see, Nvidia provides software after a particular Linux OS is released. Till then only the older ones are provided, which may not even work.
17:32 < ShikharJ> You might want to try 17.10, it's a major improvement over 16.04.
17:32 < ShikharJ> 17.10 and 18.04 are pretty similar, you wouldn't even feel the difference.
17:34 < rcurtin> my recommendation, which may be more work, would be to switch to debian unstable, which is what ubuntu is a derivative of
17:35 < Atharva> ShikharJ: Thanks! I will give this a thought over the next weekend.
17:35 < rcurtin> but it depends on personal preference. I like minimal systems so debian is a good starting place for me
17:36 < Atharva> rcurtin: Will it be suitable for someone like me who doesn't have a lot of experience with different linux distributions?
17:36 < Atharva> How different is it from ubuntu? Are any commands different?
17:36 < rcurtin> Atharva: I would say it depends on how comfortable you are with the command line. Ubuntu is generally understood to be "easier" and typically has nicer GUI tools and everything
17:37 < rcurtin> but I prefer working with the command-line wherever possible, so Debian is fine for me. both are built on apt, so the process of installing and upgrading packages is roughly the same
17:37 < rcurtin> but again, I would say, give it a shot if you like, but maybe you might not like it. only one way to find out :)
17:38 < Atharva> rcurtin: Yes, only one way to find out. If I do it I will let you know how I find it to be. :)
17:38 < rcurtin> ah, sorry, I did say "debian unstable" but I would recommend instead to use "debian testing"
17:38 < rcurtin> the releases are much less frequent than Ubuntu's
17:39 < Atharva> Okay, thanks, I will check it out.
17:39 < rcurtin> so debian stable ("stretch") is a little old, but testing ("buster") will have more up-to-date packages
17:40 < Atharva> You said you use 18.04, does it kind of run parallel to ubuntu?
17:40 < rcurtin> no, I was using 18.04 in a docker container for some unrelated work at Symantec
17:40 < rcurtin> but I have built mlpack in that same setup, no problems
17:41 < Atharva> Okayy
18:11 < ShikharJ> zoq: Where does math::MakeAlias originate from?
18:11 < rcurtin> ShikharJ: do you mean which file?
18:11 < ShikharJ> zoq: I don't think there's a support for arma::cube for that.
18:11 < ShikharJ> This needs to be extended
18:11 < rcurtin> it's in src/mlpack/core/math/, and yeah, if you want to add cube support it would be great
18:13 < ShikharJ> rcurtin: Thanks!
18:13 < rcurtin> of course, happy to help :)
18:26 < ShikharJ> rcurtin: Is there a better way of doing `math::MakeAlias(const_cast<arma::cube>(arma::cube(input.memptr(), inputWidth, inputHeight, inSize * batchSize)), false);`, since this is not a pointer or a reference, so it leads to compilation issues.
18:27 < rcurtin> the inner arma::cube(input.memptr(), inputWidth, inputHeight, inSize * batchSize) is already most of the way to an alias
18:27 < ShikharJ> But we need to cast away the constness.
18:27 < rcurtin> if you add 'false' as a fifth constructor parameter, it's completely an alias
18:28 < rcurtin> I guess, I am not fully seeing the need for the MakeAlias() function since the MakeAlias() function basically just calls the advanced constructor you've already written there
18:29 < ShikharJ> Yeah, I misrepresented the question.
18:30 < ShikharJ> The question is how to remove the constness from an object like arma::cube(input.memptr(), inputWidth, inputHeight, inSize * batchSize, false, false);
18:30 < rcurtin> but that shouldn't give you a const object, I don't think
18:30 < rcurtin> if the problem is that input is const, you can do 'arma::cube(const_cast<arma::cube&>(input).memptr(), inputWidth, inputHeight, inSize * batchSize, false);'
18:31 < ShikharJ> rcurtin: That's exactly the issue, I'll give it a try.
18:31 < rcurtin> yeah, I think that is basically the exact code that's already a part of MakeAlias() for vectors and matrices, but I am not 100% sure (I am not looking at it right this moment)
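The aliasing pattern from this exchange can be illustrated without Armadillo. In this minimal sketch, `CubeView` and `MakeCubeAlias` are hypothetical stand-ins for an `arma::cube` built over existing memory with copying disabled, and a `std::vector<double>` plays the role of the input. The point it shows is that `const_cast` needs a reference (or pointer) target type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-owning, column-major 3-D view over memory that somebody else owns.
struct CubeView
{
  double* mem;
  std::size_t rows, cols, slices;

  // Element (r, c, s) in column-major order, as Armadillo stores cubes.
  double& at(std::size_t r, std::size_t c, std::size_t s)
  {
    return mem[r + rows * (c + cols * s)];
  }
};

// Build a view over a const buffer without copying. The cast target must be a
// reference type: const_cast<std::vector<double>>(input) does not compile.
inline CubeView MakeCubeAlias(const std::vector<double>& input,
                              std::size_t rows, std::size_t cols,
                              std::size_t slices)
{
  assert(rows * cols * slices == input.size());
  double* mem = const_cast<std::vector<double>&>(input).data();
  return CubeView{mem, rows, cols, slices};
}
```

Writes through the view land in the original buffer. That is only well defined when the underlying object was not itself declared const; casting away constness merely removes the restriction imposed by the const reference.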
19:13 < rcurtin> ok, finally, our mlpack paper is accepted into the Journal of Open Source Software:
19:24 < ShikharJ> Congrats rcurtin and everyone!
19:25 < zoq> rcurtin: Awesome! Thanks for keeping up with all the comments!
19:26 < rcurtin> it was really useful to have random new people come try to use the software and post their feedback; we found a lot of documentation issues
19:26 < rcurtin> I think I will try and find random people on the street and give them a few dollars to try it out and see what they think :)
19:27 < ShikharJ> rcurtin: Just curious, does mlpack submit a paper every time a new version release happens?
19:27 < rcurtin> no, we submitted one for the original release to a NIPS workshop and then submitted a longer version to the JMLR open source software track
19:27 < rcurtin> but version 3 is so different and has so much more, it was time to submit somewhere again
19:28 < rcurtin> maybe it would be useful to submit another paper for version 4? I am not sure, let's see what happens when we get there :)
19:28 < ShikharJ> rcurtin: Ah, I see, hope to stick around till that time :)
19:28 < zoq> rcurtin: That is an interesting idea, I suppose they are somewhat familiar with toolboxes.
19:29 < rcurtin> yeah, this is a big problem that I think JOSS is thinking about... how often to submit a new paper for a new version?
19:29 < rcurtin> I don't think we could submit to JMLR MLOSS again
19:31 < zoq> rcurtin: I guess it makes sense to at least allow someone to update the paper in some forms e.g. add new names.
19:32 < ShikharJ> rcurtin: Could we look for other journals (such as maybe PeerJ)?
19:33 < zoq> rcurtin: But with all the effort they put into the review not sure they have the manpower to do a review for every version.
19:33 < rcurtin> right, I think it might be difficult for them to provide reviewers. so I suspect they would frown upon us submitting a new version every month
19:33 < rcurtin> (that would also make it really hard for people to know what to cite when they use it)
19:34 < rcurtin> ShikharJ: I don't know about PeerJ, but you're right, maybe there are other efforts out there
19:34 < zoq> agreed, good point
20:07 -!- manish7294 [~yaaic@2405:205:2480:faee:dc1:96a8:2bab:7e18] has quit [Ping timeout: 255 seconds]
20:16 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
22:34 -!- wenhao [731bc5c9@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
--- Log closed Tue Jun 19 00:00:10 2018