May 7, 2015

Methodology: On UK elections, the talk on Twitter is largely negative

This analysis of the Twitter discussions surrounding the 2015 United Kingdom (UK) elections employed media research methods that combined Pew Research’s content analysis rules with computer coding software developed by Crimson Hexagon (CH). This report is based on examination of about 13.5 million Twitter statements that were identified as being about the parties competing in the elections during the time period March 30 – May 3, 2015. (This total is smaller than the sum of the individual party totals because some statements reference multiple parties.) The primary searches were conducted in English. The data were gathered and analyzed by Michael Barthel and Kristine Lu.

Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. Pew Research draws its analysis sample from all public Twitter posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training. While automated sentiment analysis is not perfect, the Center has conducted numerous tests and determined that Crimson Hexagon’s method of analysis is among the most accurate tools available. Multiple tests suggest that human coders and Crimson Hexagon’s results are in agreement between 75% and 83% of the time. (For a more in-depth explanation of how Crimson Hexagon’s technology works, click here.)

This study contains an analysis of the sentiment or tone of the conversation on Twitter.

All tweets analyzed in this report were collected between 12 am EDT, March 30, 2015, and 11:59 pm EDT, May 3, 2015.

Each Boolean search used keywords in English only.

The Boolean searches used for each monitor included a variety of terms relevant to the subject being examined. Terms that are commonly used but clearly unrelated to the study were excluded. For example, the search used to identify tweets about the Conservative Party was: (@Conservatives OR “Conservative Party” OR @David_Cameron OR Cameron OR #VoteConservative OR #SecureTheRecovery OR #Conservatives OR tory OR tories OR #tories OR #tory OR conservatives) AND NOT (#tcot OR #ccot OR #teaparty OR “tea party” OR “rand paul” OR “ted cruz” OR @tedcruz OR #pjnet OR gop OR @lolgop OR democrats OR #uniteblue OR obama OR republican OR america OR “james cameron” OR “cameron diaz” OR “cameron dallas” OR @camerondallas)
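The include/exclude logic of such a Boolean search can be illustrated with a short sketch. This is not Crimson Hexagon’s implementation — just a plain-Python illustration of how a tweet matches the monitor if it contains at least one include term and none of the exclude terms; the term lists here are abbreviated from the full search quoted above.

```python
# Illustrative sketch only (not Crimson Hexagon's software).
# Term lists abbreviated from the Conservative Party search above.
INCLUDE = ["@conservatives", "conservative party", "cameron",
           "#voteconservative", "tory", "tories"]
EXCLUDE = ["tea party", "gop", "obama", "republican",
           "james cameron", "cameron diaz", "cameron dallas"]

def matches_monitor(tweet: str) -> bool:
    """True if the tweet contains an include term and no exclude term."""
    text = tweet.lower()
    has_include = any(term in text for term in INCLUDE)
    has_exclude = any(term in text for term in EXCLUDE)
    return has_include and not has_exclude

matches_monitor("Cameron pledges growth #VoteConservative")  # True
matches_monitor("James Cameron announces a new film")        # False
```

A real system would also handle word boundaries and quoted phrases; simple substring matching is enough to show the AND NOT structure of the query.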

Researchers classified more than 250 tweets in order to “train” these specific Crimson Hexagon monitors. All tweets were put into one of four categories: positive, neutral, negative or off topic. Depending on the search, a tweet was considered positive if it clearly praised a candidate or the party, and considered negative if it was clearly critical.
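The train-then-classify workflow described above can be sketched with a toy supervised classifier. Crimson Hexagon’s actual algorithm is proprietary and works differently; this naive Bayes sketch only illustrates the general idea that researchers hand-label example tweets into the four categories and the software then labels the remaining corpus from the learned word patterns. The tiny training set below is invented for illustration.

```python
import math
from collections import Counter

# Toy sketch of "train on hand-labeled tweets, classify the rest".
# Not Crimson Hexagon's algorithm; a minimal naive Bayes illustration.
CATEGORIES = ["positive", "neutral", "negative", "off-topic"]

def train(labeled_tweets):
    """labeled_tweets: list of (text, category) pairs."""
    word_counts = {c: Counter() for c in CATEGORIES}
    cat_counts = Counter()
    for text, cat in labeled_tweets:
        cat_counts[cat] += 1
        word_counts[cat].update(text.lower().split())
    return word_counts, cat_counts

def classify(text, word_counts, cat_counts):
    """Return the category with the highest smoothed log-probability."""
    words = text.lower().split()
    total = sum(cat_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, float("-inf")
    for cat in CATEGORIES:
        # log prior plus per-word log likelihood with add-one smoothing
        score = math.log((cat_counts[cat] + 1) / (total + len(CATEGORIES)))
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best, best_score = cat, score
    return best

# Invented hand-labeled examples (the study used more than 250 per monitor).
training = [
    ("great speech by cameron today", "positive"),
    ("tories will ruin the economy", "negative"),
    ("polling stations open until ten", "neutral"),
    ("us conservatives rally in texas", "off-topic"),
]
wc, cc = train(training)
classify("cameron gave a great speech", wc, cc)  # "positive"
```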

References to conservatives or the Conservative Party were included in the study only if the tweet was clearly focused on the UK election. The algorithm was trained to treat references to other conservatives, such as those in the U.S., as off-topic, and those tweets were excluded from the study.

CH monitors examine the entire Twitter discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than dividing the text by story, paragraph, sentence or word, CH treats the “assertion” as the unit of measurement; posts are divided into assertions by the computer algorithm. Consequently, the results are not expressed in percent of newshole or percent of stories. Instead, the results are the percent of assertions out of the entire body of stories identified by the original Boolean search terms. We refer to the entire collection of assertions as the “conversation.”
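Reporting results as a share of the conversation amounts to a simple proportion over labeled assertions. The labels below are invented illustrative data; in the study each assertion’s label would come from the trained monitor.

```python
from collections import Counter

# Invented example labels; in practice these come from the trained monitor.
assertion_labels = ["negative", "negative", "positive", "neutral",
                    "negative", "positive", "neutral", "negative"]

counts = Counter(assertion_labels)
total = len(assertion_labels)  # size of the "conversation"
shares = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
# shares -> {"negative": 50.0, "positive": 25.0, "neutral": 25.0}
```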