April 5, 2015

Methodology: From Twitter to Instagram, a different #Ferguson conversation

This analysis of the social media discussions surrounding the events in Ferguson, Mo., was done using two different research methods. The analysis of Twitter combined Pew Research’s content analysis rules with computer coding software developed by Crimson Hexagon (CH). The analysis of Instagram used human coding. The time period examined for this study was March 3-25, 2015. The data were gathered and analyzed by Michael Barthel, Paul Hitlin, Kristine Lu and Nancy Vogt.

Analysis of Twitter

Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. Pew Research draws its analysis sample from all public Twitter posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training. While automated sentiment analysis is not perfect, the Center has conducted numerous tests and determined that Crimson Hexagon’s method of analysis is among the most accurate tools available. Multiple tests suggest that results from human coders and Crimson Hexagon agree between 75% and 83% of the time. (For a more in-depth explanation of how Crimson Hexagon’s technology works, click here.)

Researchers collected all public, English-language tweets that included the hashtag #Ferguson during the time period examined. There were 651,581 such tweets.

Researchers classified more than 250 tweets in order to “train” these specific Crimson Hexagon monitors. All tweets were put into one of four categories: Ferguson-related event, Ferguson-related theme, indirectly related to Ferguson event and indirectly related to Ferguson theme.

CH monitors examine the entire Twitter discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than dividing the text by story, paragraph, sentence or word, CH treats the “assertion” as the unit of measurement, and the computer algorithm divides posts accordingly. Consequently, the results are not expressed as a percentage of posts. Instead, they are the percentage of assertions out of the entire body of posts identified by the original Boolean search terms. We refer to the entire collection of assertions as the “conversation.”

Analysis of Instagram

Researchers pulled all Instagram posts that included the hashtag #Ferguson directly from the site’s API (application programming interface) during the same time period, March 3-25. That effort returned roughly 8,300 posts. To understand the content of those posts, researchers randomly selected about 5% of them (410) and performed human coding on that representative sample.
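
The random selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual procedure: the function name draw_sample, the placeholder post IDs, the seed and the exact pool size of 8,300 (the study reports only "roughly 8,300") are all hypothetical.

```python
import random


def draw_sample(post_ids, sample_size, seed=None):
    """Return a simple random sample of post IDs, drawn without replacement."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return rng.sample(list(post_ids), sample_size)


# Hypothetical stand-in for the collected posts; 8,300 is an assumed round figure.
post_ids = [f"post_{i}" for i in range(8300)]

sample = draw_sample(post_ids, 410, seed=0)

# Share of the pool that ends up in the sample (about 5%).
sample_share = len(sample) / len(post_ids)
```

Sampling without replacement ensures that no post can be selected twice, so every one of the 410 sampled posts is distinct.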

Of those 410 posts, 59 were excluded from the study for a variety of reasons. Two were no longer online by the time the study began, 12 were in a language other than English, six were about events in Ferguson that had nothing to do with the Michael Brown story or the subsequent events, and 39 used the hashtag to refer to the tractor company Massey Ferguson or to the soccer coach Alex Ferguson. As a result, 351 unique Instagram posts were coded.
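
The exclusion counts can be checked with a short tally. The sketch below uses only the figures reported in this section; the dictionary keys are shorthand labels chosen for illustration, not categories from the study's codebook.

```python
# Figures as reported in the text; the labels are illustrative shorthand.
sample_size = 410
exclusions = {
    "no longer online": 2,
    "not in English": 12,
    "unrelated Ferguson events": 6,
    "Massey Ferguson or Alex Ferguson": 39,
}

total_excluded = sum(exclusions.values())
coded_posts = sample_size - total_excluded
```

The four exclusion categories sum to the 59 excluded posts, leaving the 351 posts that were coded.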

Two researchers performed the coding following rules that Pew Research Center has developed over the past ten years.

The unit of analysis for Instagram in this study was the post.

The two coders performed a test of intercoder reliability to ensure consistency. Both independently coded the same set of 10 posts, and their rate of agreement was 90%.
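
A rate of agreement like this one is typically computed as the share of items to which both coders assigned the same code. The sketch below assumes simple percent agreement, which the text does not spell out; the function name percent_agreement and the codes "A", "B" and "C" are hypothetical placeholders, not the study's actual categories or data.

```python
def percent_agreement(codes_a, codes_b):
    """Share of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same number of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for 10 posts; the coders differ only on the last one.
coder_1 = ["A", "B", "A", "C", "B", "A", "C", "B", "A", "A"]
coder_2 = ["A", "B", "A", "C", "B", "A", "C", "B", "A", "B"]

agreement = percent_agreement(coder_1, coder_2)
```

With 10 posts, nine matching codes and one disagreement correspond to a 90% rate of agreement.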