Dissecting Trump's most rabid online following

We’ve adapted a technique used in machine learning research, called latent semantic analysis, to characterize 50,323 active subreddits based on 1.4 billion comments posted from Jan. 1, 2015, to Dec. 31, 2016, in a way that allows us to quantify how similar in essence one subreddit is to another. At its heart, the analysis is based on commenter overlap: Two subreddits are deemed more similar if many commenters have posted often to both. This also makes it possible to do what we call “subreddit algebra”: adding one subreddit to another and seeing if the result resembles some third subreddit, or subtracting out a component of one subreddit’s character and seeing what’s left. (There’s a detailed explanation of how this analysis works at the bottom of the article.)
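
To make the idea of commenter overlap concrete, here is a minimal sketch in Python. It is not the pipeline used for this article: the subreddit names (`sub_a`, `sub_b`, `sub_c`), the usernames and the comment counts are all invented. It also uses plain cosine similarity on raw comment counts as a simplified stand-in for the full latent semantic analysis.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse {commenter: comment_count} dicts."""
    dot = sum(count * v.get(user, 0) for user, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy data: how often each commenter posted in each subreddit.
counts = {
    "sub_a": {"user1": 10, "user2": 4, "user3": 1},
    "sub_b": {"user1": 8, "user2": 5},
    "sub_c": {"user4": 7, "user5": 2, "user3": 1},
}

# sub_a and sub_b share two heavy commenters; sub_a and sub_c share one light one.
sim_ab = cosine(counts["sub_a"], counts["sub_b"])
sim_ac = cosine(counts["sub_a"], counts["sub_c"])
print(f"sub_a vs sub_b: {sim_ab:.3f}")
print(f"sub_a vs sub_c: {sim_ac:.3f}")
```

In this toy setup, the pair that shares frequent commenters scores far higher than the pair that shares only an occasional one, which is the intuition behind "more similar if many commenters have posted often to both."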

Here’s a simple example: Using our technique, you can add the primary subreddit for talking about the NBA (r/nba) to the main subreddit for the state of Minnesota (r/minnesota) and the closest result is r/timberwolves, the subreddit dedicated to Minnesota’s pro basketball team. Similarly, you can take r/nba and subtract r/sports, and the result is r/Sneakers, a subreddit dedicated to the sneaker culture that is a prominent non-sport component of NBA fandom.
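
The arithmetic behind these two examples can be sketched in a few lines of Python. This is an illustration under loud assumptions, not our actual data: each subreddit is given an invented four-number vector whose dimensions (basketball, Minnesota, general sports, sneakers) were picked by hand so the example is easy to follow. The helper names `add`, `subtract` and `nearest` are hypothetical. In the real analysis, the vectors come from commenter overlap across tens of thousands of subreddits.

```python
from math import sqrt

# Invented toy vectors: (basketball, Minnesota, general sports, sneakers).
vectors = {
    "r/nba":          [0.8, 0.0, 0.5, 0.3],
    "r/minnesota":    [0.0, 1.0, 0.1, 0.0],
    "r/timberwolves": [0.7, 0.7, 0.3, 0.1],
    "r/sports":       [0.3, 0.0, 0.9, 0.0],
    "r/Sneakers":     [0.3, 0.0, 0.0, 0.9],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(target, exclude):
    """Name of the subreddit most similar to `target`, skipping the inputs."""
    candidates = {name: vec for name, vec in vectors.items() if name not in exclude}
    return max(candidates, key=lambda name: cosine(target, candidates[name]))

# r/nba + r/minnesota
sum_result = nearest(add(vectors["r/nba"], vectors["r/minnesota"]),
                     exclude={"r/nba", "r/minnesota"})

# r/nba - r/sports
diff_result = nearest(subtract(vectors["r/nba"], vectors["r/sports"]),
                      exclude={"r/nba", "r/sports"})

print(sum_result, diff_result)
```

The input subreddits are excluded from the search because a sum or difference tends to stay close to its own ingredients. The interesting answer is the nearest subreddit that was not part of the equation.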

This may all seem pretty abstract, but that same algebra can be applied to r/The_Donald. What happens when you break r/The_Donald up into subgroups using subreddit subtraction? What happens when you add unrelated subreddits to r/The_Donald? Before we get into those questions, let’s take a look at the subreddits that are most similar to r/The_Donald, according to our analysis…