ML By Redditors, For Redditors: /r/learnmachinelearning Project Proposals and Challenges

Notice how it takes me weeks on end to get new posts up? That's intentional: you'll only see a post when I have something substantive to contribute, and I only take the time to write when more pressing tasks let up. That said, if you're a Redditor who's interested in machine learning, you're in luck! This post is for you.

The /r/LearnMachineLearning subreddit has a Discord server for machine learning enthusiasts, and a discussion came up there about starting a community project to... well, learn machine learning. The plan is an open source community project to build a Reddit bot, so that we can learn collaboratively by writing, implementing, and tuning an algorithm. And the best part is that the end result will have a practical application - you'll know quickly if your work has paid off. We are taking suggestions for what kind of Reddit bot to work on, and you're welcome to contribute your own ideas or to work on an existing one. Here's the Discord invitation link if I've already got you hooked: https://discord.gg/G3rvFKF

"What are the existing proposals? Anything fun?"

Below are two ideas of my own making that you're free to give input on.

Earlier in the summer, I proposed an FAQ responder bot that would identify and respond to common questions, and that could be set up for any subreddit that wanted it. This could make life easier for mods and users alike by driving user education and reducing how often such questions get posted. In starting work on this idea, I found that the largest challenge wouldn't be creating the algorithm, but rather curating a labelled dataset for training and evaluating it. To the best of my knowledge (and limited searching), there is no existing dataset that distinguishes FAQ posts from all the other posts in a subreddit, so we'd need to make our own for every subreddit that would use the bot. That would take a sizable human effort, and it doesn't strike me as an appropriate way for someone to spend hours of their time if they're just getting started with machine learning. That, in addition to what I perceive to be a limited reward, gave me reason to put that idea on hold.

But there's an upside to this story!

This idea quickly spun off into one that takes advantage of much of the work I'd already put in over the last couple of months. Enter the Reddit Recommender! The idea is to create an application that reviews a Redditor's current subscriptions and predicts which other subreddits might interest them. In the end, this amounts to a personalized recommender system. Although it's not well suited to being made into a bot, it still serves the ultimate goal: a practical application for use by Redditors, powered by a machine learning algorithm, that doubles as a learning tool for those of us contributing to the project.
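To make the recommender idea a bit more concrete, here is a minimal sketch of one way such a prediction could work. It is not the project's design (that is still undecided): it swaps in a plain bag-of-words cosine similarity rather than any neural network, and the function names (`word_counts`, `cosine_similarity`, `recommend`) and the toy posts are all made up for illustration.

```python
import math
from collections import Counter


def word_counts(posts):
    """Build a bag-of-words Counter from a list of post strings."""
    counts = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(subscriptions, posts_by_subreddit, top_n=3):
    """Rank unsubscribed subreddits by similarity to the user's subscriptions."""
    # The user's profile is the combined vocabulary of everything they subscribe to.
    profile = Counter()
    for name in subscriptions:
        profile.update(word_counts(posts_by_subreddit.get(name, [])))

    # Score every subreddit the user is not already subscribed to.
    scores = {
        name: cosine_similarity(profile, word_counts(posts))
        for name, posts in posts_by_subreddit.items()
        if name not in subscriptions
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up toy data, for illustration only (not real Reddit posts).
toy_posts = {
    "learnmachinelearning": ["how do I train a neural network", "gradient descent question"],
    "datascience": ["train a model on tabular data", "neural network or gradient boosting"],
    "cooking": ["best way to roast a chicken", "sourdough starter question"],
}
ranked = recommend({"learnmachinelearning"}, toy_posts)
print(ranked)
```

With this toy data, the subreddit whose posts share more vocabulary with the user's subscription ranks first. A learned model would replace the similarity function, but the inputs (subscriptions plus post text per subreddit) and the output (a ranked list) would likely look similar.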

Compared to the previous proposal, this one's datasets are relatively fast to curate. Creating datasets for training and evaluation became trivially easy once I'd built the Dataset Toolkit utility. In a nutshell, you write a JSON file describing what you need it to fetch, and it grabs the posts for each subreddit you specified, ready to be fed into your algorithm.

The technical details of this proposal have yet to be worked out, but my preliminary work builds on the approach described in Denny Britz's blog post on using convolutional neural networks for text classification. I'm looking to make further use of MXNet rather than TensorFlow for a number of reasons, so I'll take this as a great opportunity to work with the recently implemented Gluon interface.
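For anyone new to CNNs on text, the core operation is easy to sketch without any framework: slide a filter over windows of consecutive word vectors, then keep each filter's strongest response (max-over-time pooling). The snippet below is a pure-Python illustration of just that step, not MXNet or Gluon code and not the project's implementation; the embeddings, the filter weights, and the function names are hypothetical and hand-picked.

```python
def conv1d_over_words(embedded, filt, bias=0.0):
    """Slide one filter over consecutive word windows and apply ReLU.

    embedded: list of word vectors, one per token
    filt: list of weight vectors, one per position in the window
    """
    width = len(filt)
    features = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        total = bias
        # Dot product between the window and the filter, position by position.
        for word_vec, weight_vec in zip(window, filt):
            total += sum(x * w for x, w in zip(word_vec, weight_vec))
        features.append(max(0.0, total))  # ReLU
    return features


def max_over_time(features):
    """Keep the strongest response of a filter anywhere in the sentence."""
    return max(features) if features else 0.0


# Hypothetical 2-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "train": [1.0, 0.0],
    "a": [0.0, 0.0],
    "model": [1.0, 1.0],
}
tokens = ["train", "a", "model"]
embedded = [embeddings[t] for t in tokens]

# One filter spanning two words; a real model learns many filters of several widths.
bigram_filter = [[1.0, 0.0], [1.0, 1.0]]
feature_map = conv1d_over_words(embedded, bigram_filter)
pooled = max_over_time(feature_map)
print(feature_map, pooled)
```

In a full classifier, the pooled values from many filters are concatenated and passed to a final layer that scores each class. A framework's built-in layers would handle all of this, along with learning the embeddings and filter weights.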

"I've got something to say about that!"

Like this proposal, or think it's terrible? Have an idea to contribute? Have questions? Let us know on Discord! I'm @jgreenemi over there. In the near future we'll post more details about this effort on the subreddit, to reach those who don't use Discord but wish to participate - keep an eye out for that.