mlpack IRC logs, 2018-07-04

Logs for the day 2018-07-04 (starts at 0:00 UTC) are shown below.

July 2018
--- Log opened Wed Jul 04 00:00:31 2018
01:12 < rcurtin> zoq: I was trying to run the Elki benchmark job, but it seems like there was an issue with the SQL database:
01:13 < rcurtin> "_mysql_exceptions.OperationalError: (1136, "Column count doesn't match value count at row 1")"
01:13 < rcurtin> do you think I should simply remove (after backup) the database? or do you know of an easy way to fix it?
01:13 < rcurtin> I have no problem running all the benchmarks again
01:13 < rcurtin> it will just take a little while...
01:44 < rcurtin> oh, I see what it is... the sql database doesn't have the right columns for any sweeps
01:47 < rcurtin> in this case I'll go ahead and back up the DB and then remove it so that the benchmark job will just create a new one
07:16 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 245 seconds]
07:16 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
09:59 < jenkins-mlpack> Project docker mlpack nightly build build #369: UNSTABLE in 2 hr 45 min:
10:07 < ShikharJ> zoq: Using EvaluateWithGradients improves performance by almost 13%. So that saves us about 45 minutes of training time. This puts mlpack (mostly single-core) at 6.25 hours and tensorflow (multi-threaded) at 4.5 hours (single-core aggregate at 11 hours). Using OpenBLAS didn't turn out to be of much benefit; multi-threading was only active for an aggregate of 5 minutes, so I'm not sure whether, considering the overhead,
10:07 < ShikharJ> this would be beneficial or harmful.
10:11 < ShikharJ> zoq: I also timed the individual Evaluate function at ~19s per call and Gradient at ~52s per call. EvaluateWithGradient is steady at ~62s per call. So the next step should be to look into the Gradient function and see if there's a chance of improvement (though it looks pretty tight to me).
10:17 < ShikharJ> zoq: If we can somehow provide multi-thread support to our FFN architecture (with a performance bump of ~30%, though I'm not sure if this is realistic), we can beat tensorflow's time on multiple threads as well. But at least for now we have the edge on single core.
10:28 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
10:31 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
11:42 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
12:40 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has joined #mlpack
12:42 < manish7294> rcurtin: zoq: I think setting the seed using math::RandomSeed(const size_t) and then calling arma::randu() always initializes the same matrix.
12:44 < manish7294> I replaced RandomSeed() with arma_rng::set_seed_random() and this time the initialization was different, as expected.
13:34 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has quit [Ping timeout: 252 seconds]
14:07 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:40 < rcurtin> manish7294: that is strange, RandomSeed() should be calling arma_rng::set_seed_random() internally
14:41 < rcurtin> the only case where it doesn't do that is when BINDING_TYPE is BINDING_TYPE_TEST, is that true in your case?
14:41 < rcurtin> that should only be in any code that's in the src/mlpack/tests/main_tests/ directory
14:46 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has joined #mlpack
14:48 < manish7294> rcurtin: math::RandomSeed((size_t) CLI::GetParam<int>("seed")); This is being used in lmnn_main.cpp to do this, but somehow it's not working.
14:48 < rcurtin> right, so either you'll need to set --seed differently in each of your calls to the program, or you'll need to do what the other programs do, which is
14:48 < rcurtin> if (CLI::GetParam<int>("seed") == 0)
14:48 < rcurtin> math::RandomSeed(std::time(NULL));
14:48 < rcurtin> else
14:49 < rcurtin> math::RandomSeed((size_t) CLI::GetParam<int>("seed"));
14:49 < manish7294> I am doing the same
14:49 < manish7294> if (CLI::GetParam<int>("seed") != 0)
14:49 < manish7294> math::RandomSeed((size_t) CLI::GetParam<int>("seed"));
14:49 < manish7294> else
14:49 < manish7294> math::RandomSeed((size_t) std::time(NULL));
14:49 < rcurtin> and does seed have a default value of 0?
14:49 < manish7294> yes
14:50 < rcurtin> it sounds like you should trace what is going on and figure out why it isn't setting the random seed
14:50 < manish7294> the most strange part is that putting arma_rng::seed here solves this
14:51 < manish7294> *arma_rng::set_seed_random()
14:51 < rcurtin> this does not make sense, so you should debug it and find out what is going on
14:51 < manish7294> ya, sure
14:53 < manish7294> And shall we merge the LMNN code, so that PRs related to its issues can be opened?
14:54 < manish7294> I think all the comments on the PR are resolved now.
15:27 < rcurtin> manish7294: thanks, I'm glad to have it merged :)
15:28 < rcurtin> ShikharJ: I read your results from yesterday, it looks good to me so far! I agree that some improvement could still be done, but definitely I think we are in a good starting place
15:28 < rcurtin> I like to use profilers like gprof or perf to try and identify what's taking a long time
15:28 < rcurtin> and actually using mlpack's Timer::Start() and Timer::Stop() can be good for high-level benchmarking
15:28 < rcurtin> so long as what you're timing takes a non-negligible amount of time (like 0.0001 seconds between the call to Start() and Stop()) I think it is reasonably accurate
15:29 < rcurtin> of course I don't know if that is what you are planning to do next, and if not, no worries, but if so, I thought it would be helpful :)
15:32 < rcurtin> ShikharJ: I will also bring savannah back online for jenkins, so let me know if you want me to bring it offline for simulations again
15:43 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has quit [Ping timeout: 252 seconds]
16:07 < ShikharJ> rcurtin: Ah, I used std::chrono for timing the builds and the calls.
16:09 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has joined #mlpack
16:10 < manish7294> rcurtin: there?
16:10 < ShikharJ> rcurtin: Sure, you can bring Savannah online now. At least we now have a baseline score to beat. I'll keep digging in the code to find places we can improve upon, and test out the implementations on the benchmark systems for as long as they're online.
16:32 -!- travis-ci [] has joined #mlpack
16:32 < travis-ci> mlpack/mlpack#5225 (master - fd59d03 : Ryan Curtin): The build passed.
16:32 -!- travis-ci [] has left #mlpack []
16:38 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
16:40 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
16:41 < zoq> ShikharJ: Thanks for the timings, glad implementing EvaluateWithGradients worked out.
16:41 < zoq> ShikharJ: I guess now it's time to use gprof or something similar to find some bottlenecks.
16:41 < zoq> ShikharJ: We can definitely revisit the conv operations.
16:46 < ShikharJ> zoq: I was thinking of not shifting focus from the RBM PR, since I guess the first priority should be to get as many modules as possible available within mlpack. Optimizing them further should be a task for later, in my opinion. I wish to finish at least Kris' share of remaining PRs from last year. Then maybe we can focus on this?
16:47 < zoq> Sounds reasonable, I'll see if I can do some pre-profiling.
16:52 < manish7294> rcurtin: As per what I found, the error is within this line: #if (BINDING_TYPE != BINDING_TYPE_TEST). It doesn't handle the case where BINDING_TYPE is undefined; the preprocessor evaluates undefined identifiers as 0, so the condition becomes (0 != 0), which is false, and the guarded code is skipped. If that sounds reasonable then I will open a PR fixing it.
16:54 < Atharva> zoq:
16:54 < Atharva> I think there is a mistake in the `Gradient()` function of `Sequential` layer.
16:55 < Atharva> The error to the `Sequential` layer is passed to the first layer within the `Sequential` layer. It should be passed to the last layer.
16:57 < Atharva> Also, when using a single `Sequential` layer in a FFN class, the `Gradient(arma::mat &&input)` calls `network[1]` which does not exist and hence it throws a Segmentation fault.
17:00 < zoq> Atharva: Nice catch, you are right. About the single-layer issue: ideally we would check if the network size is > 1 and either modify the update process or add an identity layer. I guess adding a note to the class itself that it should only be used with more than one layer is enough; I don't really see a reason to use the seq layer with a single layer.
17:02 < Atharva> Yeah, there isn't any reason to use the class with only a single layer. I was just trying to debug something, so I did it.
17:06 < Atharva> What do you think about the first issue, in the `Gradient()` function of the layer? My network has three layers: sequential encoder, reparametrization, and sequential decoder. In the FFN class, the sequential layer is just one object, so it passes the error from the reparametrization layer to it. The sequential layer ends up passing it to its first layer when actually it should pass it to its last layer.
17:08 < zoq> right, we should use network.back()
17:09 < Atharva> Yes, if it's okay I will make the changes in one of my PRs.
17:12 < zoq> Atharva: Great, thanks :)
17:12 < zoq> Atharva: Really nice catch!
17:13 < Atharva> zoq: Thanks, I had to solve it because my network was failing because of it.
17:34 < ShikharJ> zoq: I'm not sure if the newer constructor initialization is correct in the WGAN PR. Could you take a look? I have left a comment at the necessary place.
17:43 -!- manish7294 [8ba7da62@gateway/web/freenode/ip.] has quit [Ping timeout: 252 seconds]
17:50 -!- cjlcarvalho [] has joined #mlpack
17:57 < rcurtin> manish7294: you're right, that is a big bug! if you can submit a PR that would be great
17:57 < rcurtin> that means random seeds are not working at all right now
18:15 -!- cjlcarvalho [] has quit [Ping timeout: 240 seconds]
18:17 -!- manish7294 [8ba74306@gateway/web/freenode/ip.] has joined #mlpack
18:19 < manish7294> rcurtin: I have opened #1462 dealing with RandomSeed() issue. I guess it was accidentally missed when BINDING_TYPE_TEST support was added.
18:52 -!- travis-ci [] has joined #mlpack
18:52 < travis-ci> manish7294/mlpack#50 (RandomSeed - fc0edcd : Manish): The build has errored.
18:52 -!- travis-ci [] has left #mlpack []
20:07 -!- travis-ci [] has joined #mlpack
20:07 < travis-ci> manish7294/mlpack#51 (RandomSeed - 46f3695 : Manish): The build has errored.
20:07 -!- travis-ci [] has left #mlpack []
20:19 -!- manish7294 [8ba74306@gateway/web/freenode/ip.] has quit [Ping timeout: 252 seconds]
20:38 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
21:22 -!- travis-ci [] has joined #mlpack
21:22 < travis-ci> manish7294/mlpack#52 (RandomSeed - 800f52f : Manish): The build passed.
21:22 -!- travis-ci [] has left #mlpack []
21:54 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
22:05 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
--- Log closed Thu Jul 05 00:00:33 2018