2012 Conventions Methodology

About This Study

A number of people at the Pew Research Center's Project for Excellence in Journalism worked on PEJ's "Pivotal Moment: How Mainstream and Social Media Treated the Campaign Convention Season." Director Tom Rosenstiel and Associate Director Mark Jurkowitz wrote the report. Senior Researcher Paul Hitlin supervised the content analysis component. Research Analyst Katerina Matsa collected the YouTube data. Researchers Steve Adams, Monica Anderson, Heather Brown, Sovini Tan and Nancy Vogt coded and analyzed the content data. Dana Page handles the communications for the project.

Methodology

This special report by the Pew Research Center's Project for Excellence in Journalism on media coverage of the 2012 presidential campaign uses data derived from three different methodologies. Data regarding the tone of coverage in the mainstream press were derived from the Project for Excellence in Journalism's in-house coding operation. (Click here for details on how that project, also known as PEJ's News Coverage Index, is conducted.)

Data regarding the tone of conversation on social media (Twitter, Facebook and blogs) were derived by combining PEJ's traditional media research methods, which are based on long-standing rules of content analysis, with computer coding software developed by Crimson Hexagon. That software can analyze the textual content of millions of posts on social media platforms. Crimson Hexagon (CH) classifies online content by identifying statistical patterns in words.

Finally, data on the views on YouTube were collected from that site's publicly accessible pages.

Human Coding of Mainstream Media

Sample Selection

The content was based on media coverage originally captured as part of PEJ's weekly News Coverage Index (NCI) from August 27 to September 16, 2012. 

Each week, the NCI examines the coverage from 52 outlets in five media sectors: newspapers, online news, network TV, cable TV and radio. Following a system of rotation, between 25 and 28 outlets are studied each weekday, as well as three newspapers each Sunday.

Click here for the full methodology regarding the News Coverage Index and the justification for the choices of outlets studied.

All relevant stories found in the 52 outlets were included in this study. A story was considered relevant if Barack Obama, Mitt Romney, Joe Biden or Paul Ryan was featured in at least 25% of the time or space of that story.

The unit of analysis for mainstream media coverage was the story or article. Over this four-week time period, 1,084 campaign stories were coded.

Human coders determined whether each story would be included in the sample and, for each of the four candidates, the tone of that story.

Tone Variable for Mainstream Coverage

The method used in this study for measuring tone in mainstream news is the same one that PEJ has used in previous studies for more than 10 years.

The tone variable measures whether a story is constructed, through its use of quotes, assertions or innuendo, in a way that results in positive, neutral or negative coverage for the political figure in the story. While reading or listening to a story, coders tallied all the comments that gave the reporting either a negative or a positive tone. Direct and indirect quotes were counted along with assertions made by journalists themselves.

For a story to be coded as either "positive" or "negative," it had to contain at least 1.5 times as many positive comments as negative comments, or at least 1.5 times as many negative comments as positive comments (with an exception for a 2-to-3 split, which was coded as "neutral"). If the headline or lead had a positive or negative tone, it was counted twice in the total. The first three paragraphs or first four sentences, whichever came first, were also counted twice for tone.

Any story in which neither positive nor negative comments outnumbered the other by at least 1.5 to 1 was considered a "neutral" story.

Stories were assigned a tone value for an individual only if that candidate was present in 25% or more of that story. If a candidate was in less than 25% of the story, that story was assigned a value of "not applicable" for that specific person. Each story was given a separate tone value for each of the four candidates, meaning that a particular story could be coded as positive for Romney, while also being coded as negative for Obama.
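The tone rules above can be summarized in a short Python sketch. This is only an illustration of the rules as described here, not PEJ's actual tooling (the coding was done by human coders). The function names, the `(tone, prominent)` input format, and the choices to apply the 2-to-3 exception to the weighted tallies and to treat a story with comments of only one tone as having that tone are all assumptions made for the example.

```python
def weighted_tally(comments):
    """Tally comments; each item is (tone, prominent).

    `prominent` marks a comment from the headline, lead, or opening
    of the story, which is counted twice (hypothetical input format).
    """
    positive = negative = 0
    for tone, prominent in comments:
        weight = 2 if prominent else 1
        if tone == "positive":
            positive += weight
        elif tone == "negative":
            negative += weight
    return positive, negative


def classify_tone(positive, negative):
    """Apply the 1.5-to-1 rule to a pair of tallies."""
    # Stated exception: a 2-to-3 split is coded as neutral.
    # Assumption: the exception applies to the final (weighted) tallies.
    if {positive, negative} == {2, 3}:
        return "neutral"
    # Integer form of positive / negative >= 1.5 (avoids dividing by zero).
    if positive > 0 and 2 * positive >= 3 * negative:
        return "positive"
    if negative > 0 and 2 * negative >= 3 * positive:
        return "negative"
    return "neutral"


def code_story(candidate_share, comments, threshold=0.25):
    """Return the tone of one story for one candidate."""
    # A candidate present in less than 25% of the story gets no tone value.
    if candidate_share < threshold:
        return "not applicable"
    return classify_tone(*weighted_tally(comments))
```

Because `code_story` is called once per candidate, the same story can receive a different value for each of the four candidates.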


Coding Team & Process for Specific Campaign-related Tone Coding


A team of six experienced coders worked with a senior researcher to complete the tone coding for the mainstream campaign stories. Half of those coders had used the same methodology before.

Each of the six coders was trained (or re-trained) on the tone coding methodology and was given the same set of 40 stories to code for tone for each of the four candidates. The rate of intercoder agreement was 81%.
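The report does not specify how the agreement rate was calculated. As a minimal sketch, assuming simple percent agreement averaged over every pair of coders and every coded item, the calculation could look like the following. The function name and the input layout (one list of codes per coder, in the same item order) are hypothetical.

```python
from itertools import combinations


def percent_agreement(codings):
    """Share of matching codes across all coder pairs and items.

    `codings` maps a coder name to that coder's list of codes,
    with every list covering the same items in the same order.
    """
    agreements = 0
    comparisons = 0
    for coder_a, coder_b in combinations(codings, 2):
        for code_a, code_b in zip(codings[coder_a], codings[coder_b]):
            comparisons += 1
            if code_a == code_b:
                agreements += 1
    if comparisons == 0:
        raise ValueError("need at least two coders and one coded item")
    return agreements / comparisons
```

Simple percent agreement does not correct for agreement that would occur by chance; chance-corrected statistics exist, but nothing in this methodology indicates which measure was used.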

Coding of Social Media Using a Computer Algorithm

For determining the tone of the conversation on social media, the study employed media research methods that combine PEJ's content analysis rules developed over more than a decade with computer coding software developed by Crimson Hexagon. This report is based on separate examinations of more than 18 million tweets, 690,000 blog posts and 323,000 Facebook posts.

Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. PEJ draws its analysis samples from several million blogs, all public Twitter posts and a random sample of publicly available Facebook posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training.  

According to Crimson Hexagon: "Our technology analyzes the entire social internet (blog posts, forum messages, Tweets, etc.) by identifying statistical patterns in the words used to express opinions on different topics." Information on the tool itself can be found at http://www.crimsonhexagon.com/ and the in-depth methodologies at http://www.crimsonhexagon.com/products/whitepapers/.

Crimson Hexagon measures text in the aggregate, and the unit of measure is the "statement" or assessment, not the post or Tweet. One post or Tweet can contain more than one statement if multiple ideas are expressed. The results are determined as a percentage of the overall conversation.

The time frame for the analysis is August 27-September 23, 2012.

PEJ used Boolean searches to narrow the universe to relevant posts. The only search terms used were the candidates' last names ("Obama" and "Romney").
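To illustrate what a Boolean search on these two terms amounts to, here is a minimal Python sketch of an "Obama" OR "Romney" filter. It is not Crimson Hexagon's interface; the function name and the use of case-insensitive substring matching are assumptions made for the example.

```python
SEARCH_TERMS = ("obama", "romney")  # the two last names used as search terms


def matches_search(post_text):
    """Return True if the post mentions either candidate's last name.

    Assumption: matching is a case-insensitive substring test, so
    possessives and hashtags containing the name also match.
    """
    lowered = post_text.lower()
    return any(term in lowered for term in SEARCH_TERMS)


posts = [
    "Obama speaks at the convention tonight",
    "Watching the game instead",
    "#romney2012",
]
relevant = [post for post in posts if matches_search(post)]
```

A filter like this only narrows the universe of posts; it cannot tell a post about a candidate from one about a family member who shares the last name.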

During the CH training process, the program learns to identify relevant posts and to exclude unrelated messages from the results. For example, messages that were solely about Michelle Obama were excluded, as were messages focusing entirely on Ann Romney.

Data on YouTube Views

The Project for Excellence in Journalism first tracked and captured videos that were uploaded to the candidates' and parties' YouTube channels from August 28-31, 2012, and from September 4-6, 2012, when the political conventions took place. The videos were captured each day at 9 AM EST, one day after each video was uploaded. We examined the videos posted on the two candidates' YouTube channels (http://www.youtube.com/user/mittromney/ and http://www.youtube.com/barackobama) and the videos posted on the two parties' channels dedicated specifically to the conventions (http://www.youtube.com/gopconvention2012 and http://www.youtube.com/demconvention).

A second audit was conducted between September 21 and 24, 2012, to look for any changes in the number of views each video had received. During this second audit, researchers also examined the traffic that convention speeches received on other sources, such as ABCNews, C-SPAN or other individual accounts. For each video, researchers identified the five sources or channels with the most views by searching YouTube with different keyword combinations.