mlpack blog
String Processing Utilities - Week 11

Jeffin Sam, 13 August 2019

This week started with extending lozhnikov's PR1960. I added a bag-of-words encoding policy and a Tf-Idf encoding policy that supports several term-frequency variants, namely Raw_count, binary, sublinear_tf and term_frequency, and I also added tests for both encoding policies. I think we are almost done with the encodings; only some minor fix-ups remain.

As for the string-cleaning PR, that is done too. I made some minor fix-ups last week, added tests for the CLI binding and updated the documentation. Again, a few minor fix-ups remain, but apart from that everything is done.

For the coming week, my priority is the Word2Vec algorithm. I hope to have the initial API done by the end of the coming week and the full-fledged API completed the week after.

Also, after GSoC, I will write tutorials on how to drop both the string-encoding API and the matrix-scaling API into your code, so stay tuned for both :)

Thank you :)