Reddit users and news users more likely to be male and young

While just 4% of U.S. adults report using Reddit, roughly seven-in-ten of these users (70%) get news on the site. Overall, 2% of U.S. adults get news on Reddit.

Both Reddit users in general and those who get news on the site are more likely than the overall public to be young and male and to identify as liberal.

About seven-in-ten (71%) of Reddit news users are men, 59% are between the ages of 18 and 29, and 47% identify as liberal, while only 13% are conservative (39% say they are moderate). In comparison, among all U.S. adults, about half (49%) are men, just 22% are 18- to 29-year-olds and about a quarter (24%) say they are liberal.

As might be expected, Reddit news users are also heavy internet users: 47% report going online almost constantly, compared with 21% of U.S. adults overall.

Reddit users more likely to learn about presidential election from digital sources than the general public

To better understand the dynamics of news conversation on Reddit, the analysis in this report focuses on a current and often passionate topic: the 2016 presidential candidates. Among Reddit users overall (not just Reddit news users), 45% report learning something about the presidential campaign or candidates on the site in a given week. This is especially pronounced among liberal Reddit users, fully 59% of whom said they learned something.

These users are not solely getting news on Reddit. They are also more likely than the general public to be learning about the election from other sources, including news websites or apps, late-night comedy shows, and the apps, emails or websites of issue-based groups. They are less likely, though, to be learning from nightly network news, cable or local TV news, or the print edition of a local daily newspaper.

When asked to name the one type of source they find most helpful for election information, Reddit users ranked social networking sites first, with 44% choosing them. About one-in-five (18%) Reddit users specifically named Reddit as their most helpful source.

Many of these differences are in line with Reddit’s younger, more liberal user base.

Understanding Reddit

In its structure and function, Reddit combines elements of a discussion board, a social networking site and a messaging service. Users submit content known as submissions (referred to as “posts” in this report). These can be original content, links to outside content or a combination of the two. In fact, 62% of the posts studied here linked to another website (a quarter of these linked to common sites hosting images or videos).3 Other users can then comment on a post. In addition to commenting, users can rate both posts and comments by “upvoting” them (indicating that they are worth being seen by others) or “downvoting” them (indicating that they should not be seen). This voting shapes which posts and comments are displayed most prominently on the site.
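The report says only that voting drives what is displayed; it does not describe the ranking formula. As a minimal sketch, assuming two illustrative scoring rules, the code below compares a plain net score (upvotes minus downvotes) with the lower bound of the Wilson score confidence interval on a comment's share of upvotes, which keeps a comment with a single upvote from outranking one with many votes. Reddit has publicly described its "best" comment sort as using a confidence bound of this kind, but its production ranking may differ, and other sorts such as "hot" also weigh how recent a post is. The `Comment` class and the sample thread are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    upvotes: int
    downvotes: int


def net_score(c: Comment) -> int:
    """Simplest ranking signal: upvotes minus downvotes."""
    return c.upvotes - c.downvotes


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote share.

    Items with few votes get a wide interval and therefore a low bound,
    so 1 up / 0 down does not outrank 90 up / 10 down.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def rank(comments: list[Comment]) -> list[Comment]:
    """Order comments best-first by the Wilson lower bound."""
    return sorted(
        comments,
        key=lambda c: wilson_lower_bound(c.upvotes, c.downvotes),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical thread, not real Reddit data.
    thread = [
        Comment("new, one vote", 1, 0),
        Comment("popular", 90, 10),
        Comment("divisive", 50, 45),
    ]
    for c in rank(thread):
        score = wilson_lower_bound(c.upvotes, c.downvotes)
        print(f"{c.text:15} net={net_score(c):3} wilson={score:.3f}")
```

With these numbers the popular comment ranks first, the divisive one second and the single-vote comment last, even though the single-vote comment has a perfect upvote share.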

Parts of a Reddit screen

News is one of many types of discussion forums on Reddit

As a whole, Reddit is organized into subreddits, roughly equivalent to forums or topics on other online message boards. The names of these subreddits generally describe the topic being discussed (such as /r/politics) or the process used to discuss a variety of subjects (such as /r/AskAnAmerican or /r/explainlikeimfive).

Though many Reddit users get news on the site, news is only one of many topics discussed there. Of the 10 subreddits that attracted the most comments overall in the three months studied here (counting all comments, not just those that named a presidential candidate), two involved video games, two involved sports and four were general-interest forums such as /r/funny and /r/videos. The main general news subreddit, /r/news, ranked ninth in total comments during these three months, and /r/worldnews ranked tenth.
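A ranking like the top-10 list above comes from tallying comments by subreddit. A minimal sketch, assuming each comment is a record with a `subreddit` field (an assumed shape modeled on Reddit's comment data); the sample records are hypothetical toy data, not the study's figures.

```python
from collections import Counter


def top_subreddits(comments, n=10):
    """Return the n subreddits with the most comments, most-commented first.

    `comments` is any iterable of dicts with a "subreddit" key.
    """
    counts = Counter(c["subreddit"] for c in comments)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical toy sample, not real Reddit data.
    sample = (
        [{"subreddit": "funny"}] * 5
        + [{"subreddit": "news"}] * 3
        + [{"subreddit": "worldnews"}] * 2
        + [{"subreddit": "videos"}] * 4
    )
    for place, (name, count) in enumerate(top_subreddits(sample, n=3), start=1):
        print(place, f"/r/{name}", count)
```

`Counter.most_common` returns (name, count) pairs sorted by count, so the same function works unchanged on a full month of comments.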

Tracking discussion of the presidential candidates on Reddit

To shed light on how Reddit users discuss news and current events on the site, Center researchers drew on the publicly released work of Jason Baumgartner, a researcher who collected a massive dataset of site comments. The dataset was first made publicly available in July 2015, with new monthly data added thereafter. Center researchers initially downloaded the two most recent months available, May and June 2015, and later added September once it became available. The topic selected for analysis was discussion of the 2016 presidential campaign, operationalized as mentions of the names of the leading presidential contenders in Reddit comments.
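The report describes this first step only as a free-text search for each candidate's name. A minimal sketch, assuming case-insensitive whole-word matching on surnames; the `CANDIDATES` list is an illustrative five-name subset (the study tracked 21 candidates), and the matching rule is an assumption rather than the Center's published method. The second usage line shows why a name search alone is not enough: it also flags the card-game sense of "trump".

```python
import re

# Illustrative subset only; the study searched for 21 candidates.
CANDIDATES = ["Clinton", "Sanders", "Trump", "Rubio", "Cruz"]

# One case-insensitive, whole-word pattern per surname, so "trumpet"
# does not count as a mention of Trump.
PATTERNS = {
    name: re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    for name in CANDIDATES
}


def candidates_mentioned(text: str) -> set[str]:
    """Return the candidates whose surname appears in a comment body."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


if __name__ == "__main__":
    print(candidates_mentioned("Trump and Sanders both spoke today"))
    print(candidates_mentioned("Nothing can trump a good hand"))  # false positive
    print(candidates_mentioned("She plays the trumpet"))           # no match
```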

It is important to note that this analysis examines the comments that appear under posts, not the posts themselves. This decision was made in part because comments are far more numerous than posts and because they are the closest available proxy for conversation among participants on the site. In addition, the dataset initially made available to researchers was at the comment level. While this is in some ways a constraint, the clear focus on comments does offer a window into how conversation happens on Reddit.
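Because the data are at the comment level, post-, author- and subreddit-level counts have to be derived by grouping comments. A minimal sketch, assuming one JSON object per line carrying Reddit's comment fields `author`, `subreddit` and `link_id` (the ID of the post a comment sits under); the record layout is an assumption and the example records are hypothetical.

```python
import json
from collections import Counter


def summarize(lines):
    """Summarize comment records given as newline-delimited JSON strings.

    Each record is assumed to have "author", "subreddit" and "link_id"
    keys; blank lines are skipped.
    """
    per_post = Counter()
    authors, subreddits = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        per_post[comment["link_id"]] += 1  # comments grouped by parent post
        authors.add(comment["author"])
        subreddits.add(comment["subreddit"])
    return {
        "comments": sum(per_post.values()),
        "posts": len(per_post),
        "authors": len(authors),
        "subreddits": len(subreddits),
    }


if __name__ == "__main__":
    # Hypothetical records, not real Reddit data.
    records = [
        {"author": "alice", "subreddit": "politics", "link_id": "t3_aaa"},
        {"author": "bob", "subreddit": "politics", "link_id": "t3_aaa"},
        {"author": "alice", "subreddit": "news", "link_id": "t3_bbb"},
    ]
    print(summarize(json.dumps(r) for r in records))
```

For a compressed monthly file, the standard library's `bz2.open(path, "rt")` yields text lines that can be passed straight to `summarize`; the file path would be whatever the download provides.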

The analysis represents a first for the Center in that it combines in-house machine learning with human coding. Machine learning is a technique that allows “trained” statistical models, based on word frequencies, to stand in for humans in coding extremely large amounts of text or images. Working with a dataset of all comments posted on Reddit, researchers began by performing free-text searches for each candidate’s name. Because of polysemy, the fact that words can have multiple meanings (“trump” could refer to the Republican candidate or to an action in a game of bridge), researchers took an extra step. In each of the three months, for each of the 21 candidates, if the name search returned fewer than 500 results, researchers hand-coded all of them. If it returned more than 500, researchers coded at least 500 and up to 3,000, as needed for the classifier to achieve reliability. Overall, researchers hand-coded more than 50,000 comments.

These hand-coded comments were used to train a machine learning model that classified all remaining comments. The model was designed to err on the side of excluding a comment from a classification category rather than including it. The result was a final dataset of more than 350,000 comments from about 100,000 different authors on almost 90,000 posts in about 5,000 different subreddits over the three months. The findings presented here are intended as a systematic analysis of a specific population: people who talk about presidential candidates on Reddit.
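The report does not name the model it used. As a sketch under stated assumptions, the code below encodes the hand-coding sample-size rule described above and swaps in a two-class multinomial Naive Bayes classifier, one common word-frequency model, to show how a high decision threshold makes a classifier err toward excluding comments. The `needed` parameter, the 0.8 threshold and the six training comments are all hypothetical.

```python
import math
import re
from collections import Counter


def coding_sample_size(n_matches: int, needed: int = 500) -> int:
    """Number of name-search hits to hand-code for one candidate in one month.

    Code every hit when there are fewer than 500; otherwise code at least
    500 and at most 3,000. `needed` is a hypothetical stand-in for however
    many the classifier required to reach reliability.
    """
    if n_matches < 500:
        return n_matches
    return min(n_matches, max(500, min(needed, 3000)))


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Two-class multinomial Naive Bayes over word counts (True = relevant)."""

    def fit(self, texts: list[str], labels: list[bool]) -> "NaiveBayes":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)  # assumes both classes are present
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.totals = {k: sum(v.values()) for k, v in self.word_counts.items()}
        return self

    def prob_relevant(self, text: str) -> float:
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            denom = self.totals[label] + len(self.vocab)  # add-one smoothing
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Turn the two log scores into a probability without overflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])

    def predict(self, text: str, threshold: float = 0.8) -> bool:
        """Conservative: include a comment only if P(relevant) >= threshold."""
        return self.prob_relevant(text) >= threshold


if __name__ == "__main__":
    # Hypothetical hand-coded examples (True = about the candidate).
    texts = [
        "trump campaign rally", "trump leads gop polls",
        "vote for trump in the primary", "play a trump card in bridge",
        "hearts are trump this hand", "trump suit beats any card",
    ]
    labels = [True, True, True, False, False, False]
    model = NaiveBayes().fit(texts, labels)
    for comment in ["trump campaign rally polls", "trump card bridge", "trump"]:
        print(comment, round(model.prob_relevant(comment), 2), model.predict(comment))
```

On this toy data the bare comment "trump" scores just above 0.5, so a default 50% cutoff would include it while the stricter 0.8 threshold leaves it out. Raising the threshold trades recall for precision, which is the trade-off the report describes when it says the model was built to err toward exclusion.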