A number of people at the Pew Research Center's Project for Excellence in Journalism work on PEJ's "Campaign 2012 in the Media." Director Tom Rosenstiel and Associate Director Mark Jurkowitz write the narrative reports. Tricia Sartor, manager of the weekly news index, and senior researcher Paul Hitlin supervise the creation of the monitors using Crimson Hexagon technology. Researchers Kevin Caldwell and Nancy Vogt run the monitors using the computer technology, and Steve Adams, Laura Santhanam, Monica Anderson, Heather Brown, Jeff Beattie, and Sovini Tan code and analyze the content data. Dana Page handles the web and communications for the project.
The following Pew Research Center staff provide design and programming assistance for the interactive tool: Russell Heimlich, web developer; Michael Piccorossi, director of digital strategy; and Michael Keegan, graphics director.
To arrive at the results regarding the tone of coverage, PEJ employed a combination of traditional media research methods, based on long-standing rules regarding content analysis, along with computer coding software developed by Crimson Hexagon. That software is able to analyze the textual content of billions of posts on Twitter and millions of web-based articles from news sites. Crimson Hexagon (CH) classifies online content by identifying statistical patterns in words.
In addition to original work produced by PEJ, the analysis includes links to tools provided by other organizations, such as Google Trends, that provide related data about the campaign.
Quantity of Coverage for Candidates in the Mainstream Press
During PEJ's weekly coding for the News Coverage Index, coders determine which stories are focused primarily on the 2012 campaign. A story is considered a campaign story if 50% of the time or space allotted to that story is about the campaign or any of the Republican candidates. (Stories about President Obama are treated differently. See the note below.)
During the same process, coders identify stories where each of the candidates is a "significant newsmaker" in the story. To be considered a "significant newsmaker," a person must be in 25% of the story or more. A story can have multiple significant newsmakers.
To determine the quantity of campaign coverage for each candidate in the mainstream press as a percentage, PEJ divides the number of stories where a candidate is a significant newsmaker by the total number of campaign stories.
Because President Obama receives a considerable amount of news coverage for his acts in office that are not directly tied to the campaign, many stories about him are not coded as campaign stories. Therefore, the percentage of stories where Obama is present is out of all news stories, and not just campaign-focused stories.
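As a sketch, the newsmaker arithmetic described above looks like the following. The stories and candidates here are invented for illustration; in practice the determinations are made by human coders.

```python
# Hypothetical story data: each story records whether it is
# campaign-focused and which candidates are "significant
# newsmakers" (featured in at least 25% of the story).
stories = [
    {"campaign": True,  "newsmakers": {"Romney"}},
    {"campaign": True,  "newsmakers": {"Romney", "Gingrich"}},
    {"campaign": True,  "newsmakers": set()},
    {"campaign": False, "newsmakers": set()},   # not a campaign story
]

campaign_stories = [s for s in stories if s["campaign"]]

def coverage_share(candidate):
    """Percent of campaign stories in which the candidate is a
    significant newsmaker."""
    hits = sum(1 for s in campaign_stories if candidate in s["newsmakers"])
    return 100.0 * hits / len(campaign_stories)

romney_share = coverage_share("Romney")   # 2 of 3 campaign stories
```

For Obama, per the note above, the same count would instead be divided by the total number of stories rather than by campaign stories alone.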
Use of Crimson Hexagon's Technology to Determine the Tone of Coverage
To arrive at the results regarding the tone of coverage, PEJ employed computer coding software developed by Crimson Hexagon along with PEJ's traditional media research methods.
The technology for Crimson Hexagon is rooted in an algorithm created by Gary King, a professor at Harvard University's Institute for Quantitative Social Science. (Click here to view the study explaining the algorithm.)
According to Crimson Hexagon, the purpose of computer coding is to "take as data a potentially large set of text documents, of which a small subset is hand coded into an investigator-chosen set of mutually exclusive and exhaustive categories. As output, the methods give approximately unbiased and statistically consistent estimates of the proportion of all documents in each category."
Crimson Hexagon software examines online content provided by RSS feeds of millions of sites from the U.S. and around the world. This provides researchers with analysis of a much wider pool of content than conventional human coding can provide. CH maintains a database of all texts available so content can be investigated retroactively.
For determining the tone of coverage for each candidate, PEJ used Crimson Hexagon's database of news outlets, which includes more than 11,500 news sites. Not all of these outlets contain campaign stories on a regular basis, but any time they do, those stories are included in the sample. For instance, a local newspaper site may not offer much coverage of the presidential campaign. However, the sample will include any relevant reports that do appear.
While the software collects and analyzes online content, the database includes many news sites produced by television and radio outlets. Most stations do not offer exact transcripts of their broadcast content on their sites and RSS feeds; however, those sites often include text stories that are very similar to reports that were aired. For example, even though the television programs from Fox News are not in the sample directly, content from Fox News is present through the stories published on FoxNews.com.
The universe includes content from all the major television networks along with thousands of local television and radio stations. Two notable television sources, CBS and PBS' NewsHour, do offer transcripts of their television news programs, and those texts are included in the sample.
PEJ utilizes a wide range of news outlets as the basis for its examination of tone for this project. Previous testing, however, has shown that a smaller, more select sample of popular media sites considered drivers of news coverage (nearly identical to the 52 sites used by PEJ in its News Coverage Index) would yield similar results. PEJ has conducted a number of comparisons between a broad sample and a smaller, "elite" sample and concluded that the results were alike. To view specific evaluations, see the studies located here and here.
Crimson Hexagon draws its universe of tweets from the Twitter "Firehose" data feed, a feed of all public tweets on the Twitter platform. According to Twitter's own blog, about 140 million public tweets are posted to the Firehose feed each day. (The Firehose does not include private tweets. Since private tweets are sent to individuals, much like emails, they are not part of public conversations.)
The volume of conversation on Twitter is measured in "assertions." The number of assertions refers to the quantity of statements or opinions focused on each person.
Because CH examines text in the aggregate, it is not enough to simply count the number of tweets where a person's name shows up to gauge how often a candidate is being discussed. Some tweets may include multiple statements or opinions, while others may use the same word as a candidate's name without referring to the candidate. For example, a tweet that refers to the Huntsman Center at the University of Utah is likely unrelated to the presidential campaign of Jon Huntsman, and is therefore discarded from the sample studied in this report.
Therefore, the number of assertions is a more accurate measure because it includes the relevant statements about the subjects in the race without extraneous information.
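A toy illustration of the disambiguation problem described above, using invented tweets and an invented keyword filter (the real system relies on statistical word patterns learned during training, not hand-built keyword lists):

```python
# Invented cue words that signal a "Huntsman" mention is about
# something other than the candidate (illustrative only).
OFF_TOPIC_CUES = {"center", "utah", "basketball"}

tweets = [
    "Huntsman surges in the New Hampshire primary polls",
    "Great game tonight at the Huntsman Center in Utah",
]

def is_relevant(tweet):
    """Keep a tweet only if it names Huntsman and lacks off-topic cues."""
    words = set(tweet.lower().split())
    return "huntsman" in words and not (words & OFF_TOPIC_CUES)

relevant = [t for t in tweets if is_relevant(t)]
```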
Monitor Creation and Training
Each individual study or query related to a set of variables is referred to as a "monitor."
The process of creating a new monitor consists of four steps. (See below for an example of these steps in action.)
First, PEJ researchers decide what timeframe and universe of content to examine: general news stories or messages on Twitter. PEJ only includes English-language content.
Second, the researchers enter key terms using Boolean search logic so the software can identify the universe of posts to analyze.
Next, researchers define categories appropriate to the parameters of the study. If a monitor is measuring the tone of coverage for a specific politician, for example, there would be four categories: positive, neutral, negative, and irrelevant for posts that are off-topic.
If a monitor is measuring media framing or storyline, the categories would be more extensive. For example, a monitor studying the framing of coverage about the death of Osama bin Laden might include nine categories: details of the raid, global reaction, political impact, impact on terrorism, role of Pakistan, straight account of events, impact on U.S. policy, the life of bin Laden, and a category of off-topic posts.
Fourth, researchers "train" the CH platform to analyze content according to specific parameters they want to study. The PEJ researchers in this role have gone through in-depth training at two different levels. They are professional content analysts fully versed in PEJ's existing content analysis operation and methodology. They then undergo specific training on the CH platform including multiple rounds of reliability testing.
The monitor training itself is done with a random selection of posts collected by the technology. One at a time, the software displays posts and a human coder determines which category each example best fits into. In categorizing the content, PEJ staff follows coding rules created over the many years that PEJ has been content analyzing the news media. If an example does not fit easily into a category, that specific post is skipped. The goal of this training is to feed the software with clear examples for every category.
For each new monitor, human coders categorize at least 250 distinct posts. Typically, each individual category includes 20 or more posts before the training is complete. To validate the training, PEJ has conducted numerous intercoder reliability tests (see below) and the training of every monitor is examined by a second coder in order to discover errors.
The training process consists of researchers showing the algorithm stories in their entirety that are unambiguous in tone. Once the training is complete, the algorithm analyzes content at the assertion level, so that the meaning of each unit is similarly unambiguous. This makes it possible to analyze, and apportion among categories, content that contains assertions of differing tone. The classification is done by applying statistical word patterns derived from posts categorized by human coders during the training process.
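As a rough illustration of how hand-coded examples become statistical word patterns, consider the toy classifier below. It is a simplified stand-in, not CH's actual algorithm (King's method estimates category proportions in the aggregate rather than labeling individual documents), and the training examples are invented:

```python
from collections import Counter, defaultdict

# Hand-coded training examples (text, category), as a human coder
# would supply during monitor training. Invented for illustration.
training = [
    ("romney surges ahead in new poll", "positive"),
    ("strong fundraising quarter for romney", "positive"),
    ("conservatives oppose romney record", "negative"),
    ("romney attacked over massachusetts record", "negative"),
    ("romney gave a speech on the economy", "neutral"),
]

# Count how often each word appears in each category's examples.
word_counts = defaultdict(Counter)
for text, cat in training:
    word_counts[cat].update(text.split())

def score(text, cat):
    """Crude match score: how often the text's words appeared in the
    category's training examples."""
    return sum(word_counts[cat][w] for w in text.split())

def classify(text):
    """Assign the new text to the best-matching category."""
    return max(word_counts, key=lambda c: score(text, c))
```

In the real system, word patterns like these drive proportion estimates over the whole conversation rather than one-at-a-time labels.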
Improved Training Sample Changes (added February 27, 2012)
The Pew Research Center’s Project for Excellence in Journalism, with the help of Crimson Hexagon, has reevaluated the training methodology for its campaign tone monitors in a constant effort to ensure the data we release is as accurate as possible. After close examination and thorough testing, PEJ has tweaked the training methodology for the tone monitors.
The opportunity for improvement emerged when PEJ's researchers discovered that the methodology Crimson Hexagon recommended was not reflecting week-to-week changes as fully as PEJ's human content analysis of news content indicated it should. PEJ determined that media coverage of the campaign shifted more rapidly than the other kinds of news Crimson Hexagon had worked with, and the standard method for building monitors was therefore unable to capture the quickly evolving campaign news cycle.
As a result, PEJ instituted the following improvements:
With the earlier training methodology, researchers kept all old training documents in the monitor when updating the current week. Coders would then add 25 new news stories from the present week in order to keep the algorithm up-to-date.
PEJ is making two major changes to the original method.
First, researchers will now remove any documents that are more than three weeks old. For example, for the monitor for the week of February 13-19, 2012, there will be no documents from before January 30. This ensures that older storylines no longer playing in the news cycle are removed and that the algorithm works with only the newest material.
Second, each week trainers will add more stories to the training sample to ensure that the changes in the storyline are more accurately reflected in the algorithm. PEJ researchers will now add, at a minimum, 10 new training documents to each category. This results in many categories receiving much more than the 10 new documents. On average, researchers will add roughly 60 new training documents each week.
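The two update rules above can be sketched as follows, with invented dates and training documents (the three-week window spans the current week plus the two weeks before it, matching the February 13-19 example):

```python
import datetime as dt

# Week being analyzed, and the oldest date still inside the
# three-week training window (current week + two prior weeks).
week_start = dt.date(2012, 2, 13)
cutoff = week_start - dt.timedelta(weeks=2)   # Jan 30, 2012

# Invented training documents with their publication dates.
training = [
    {"date": dt.date(2012, 1, 25), "cat": "positive"},  # too old, dropped
    {"date": dt.date(2012, 2, 1),  "cat": "negative"},
    {"date": dt.date(2012, 2, 14), "cat": "neutral"},
]

# Rule 1: prune documents that fall outside the window.
kept = [d for d in training if d["date"] >= cutoff]

# Rule 2: each week, at least 10 new documents are added per category
# (roughly 60 in total on average), which is done by human coders.
MIN_NEW_PER_CATEGORY = 10
```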
The changes in the data set from the old and new methodologies are available here.
How the Algorithm Works
To understand how the software recognizes and uses patterns of words to interpret texts, consider a simplified example. Imagine the study examining coverage regarding the death of Osama bin Laden that utilizes the nine categories listed above. As a result of the example of stories categorized by a human coder during the training, the CH monitor might recognize that portions of a story with the words "Obama," "poll" and "increase" near each other are likely about the political ramifications. However, a section that includes the words "Obama," "compound" and "Navy" is likely to be about the details of the raid itself.
Unlike most human coding, CH monitors do not measure each story as a unit; they examine the entire discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than using the story, paragraph, sentence or word, CH treats the "assertion" as the unit of measurement, and posts are divided up accordingly by the computer algorithm. If 40% of a story fits into one category and 60% fits into another, the software will divide the text accordingly. Consequently, the results are expressed not as a percent of newshole or percent of stories, but as the percent of assertions out of the entire body of stories identified by the original Boolean search terms. We refer to the entire collection of assertions as the "conversation."
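The reporting step can be sketched as follows: assertion counts per category (invented here) are converted to percentages of the on-topic conversation after off-topic assertions are dropped:

```python
# Invented assertion counts for one monitor run.
counts = {"positive": 340, "neutral": 330, "negative": 330, "off-topic": 120}

# Off-topic assertions are excluded before percentages are computed.
on_topic = {c: n for c, n in counts.items() if c != "off-topic"}
total = sum(on_topic.values())
shares = {c: round(100.0 * n / total) for c, n in on_topic.items()}
```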
Testing and Validity
Extensive testing by Crimson Hexagon has demonstrated that the tool is 97% reliable; that is, in 97% of cases analyzed, the technology's coding has been shown to match human coding. PEJ spent more than 12 months testing CH, and its own tests comparing coding by humans with coding by the software came up with similar results.
In addition to validity tests of the platform itself, PEJ conducted separate examinations of human intercoder reliability to show that the training process for complex concepts is replicable. The first test had five researchers each code the same 30 stories which resulted in an agreement of 85%.
A second test had each of the five researchers build their own separate monitors to see how the results compared. This test involved not only testing coder agreement, but also how the algorithm handles various examinations of the same content when different human trainers are working on the same subject. The five separate monitors came up with results that were within 85% of each other.
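Percent agreement of the kind used in the first intercoder test can be computed as follows (the codes below are invented; the actual test involved five researchers and 30 stories):

```python
# Tone codes assigned to the same five stories by two coders.
coder_a = ["pos", "neg", "neu", "pos", "neg"]
coder_b = ["pos", "neg", "neu", "neg", "neg"]

# Percent agreement: share of stories coded identically.
agreement = 100.0 * sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
```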
Unlike polling data, the results from the CH tool do not have a sampling margin of error since there is no sampling involved. For the algorithmic tool, reliability tested at 97% meets the highest standards of academic rigor.
In the analysis of campaign coverage, PEJ uses CH to study a given period of time, and then expands the monitor for additional time going forward. In order to accomplish this, researchers first create a monitor for the original timeframe according to the method described above.
Because the tenor and content of online conversation can change over time, additional training is necessary when the timeframe gets extended. Since the specific conversation about candidates evolves all the time, the CH monitor must be trained to understand how newer posts fit into the larger categories.
In those instances, researchers conduct additional training for the monitor with a focus on posts that occurred during the new time period. For every new week that is examined, at least 25 more posts are added to the monitor's training. At that point, the monitor is run to come up with new results for the expanded time period which are added to results that were already derived in the original timeframe.
Since the use of computer-aided coding is a relatively new phenomenon, it will be helpful to demonstrate how the above procedure works by following a specific example.
In September 2011, PEJ created a monitor to measure the tone of media coverage on news sites for Republican candidate Mitt Romney. First, we created a monitor with the following guidelines:
We then created the four categories that are used for measuring tone:
Next, we trained the monitor by classifying documents. CH randomly selected entire posts from the time period specified and displayed them one by one. A PEJ researcher decided whether each post was a clear example of one of the four categories and, if so, assigned the post to the appropriate category. If a post was not clear in its meaning, or could fit into more than one category, such as a story with a mix of positive and negative assertions, the coder skipped it. Since the goal is to find the clearest cases possible, coders often skip many posts until they find good examples.
A story that is entirely about a poll showing Mitt Romney ahead of the Republican field-and that his lead is growing, would be a good example to put in the "positive" category. A different story that is entirely about Romney's record in Massachusetts and how many conservative voters are opposed to him would be put in the "negative" category. A post that is devoid of good or bad implication for the candidate, such as a story about a speech Romney gave on the economy that does not include evaluative assessments, would be put in the "neutral" category. And a post that includes the word "Romney" but is not about the candidate at all, such as a story about a different person with the same last name, would go in the "off-topic" category.
The coder trained 260 documents in all, ten more than the necessary minimum of 250. Each of the four categories had more than 20 posts in it.
At that point, the initial training was finished. For the sake of validity, PEJ had another coder check over all of the training and look for stories that he or she would have categorized differently. Those stories were removed from the training sample because the disagreement between coders shows that they are not clear, precise examples. In the case of the Romney monitor, four documents were removed for this reason.
Finally, we "ran" the monitor. This means that the algorithm examined the word patterns derived from the monitor training, and applied those patterns to every post that was captured using the initial guidelines. Since the software studies the conversation in an aggregate as opposed to individual posts or stories, the algorithm divided up the overall conversation into percentages that fit into the four categories.
For the initial monitor, the algorithm examined more than 94,000 assertions from thousands of news stories and determined that 34% of the conversation was positive, 33% neutral, and 33% negative. Assertions or statements that were off-topic were excluded from the results.
In order to extend the Romney monitor beyond September 11, 2011, coders added at least 25 new pieces of content to the training for each new week examined. This ensures that any linguistic changes in the overall coverage or conversation regarding Romney in the new week are accounted for. We then ran the monitor again, which now included the original training of approximately 260 posts plus 25 new ones for each additional week, while leaving the earlier results in place.