June 24, 2020

As COVID-19 Emerged in U.S., Facebook Posts About It Appeared in a Wide Range of Public Pages, Groups

Methodology

This study takes a close look at the public spaces (public pages and groups) on Facebook in which coronavirus-related posts with links in March 2020 received a high level of interactions. During this time, the United States went from having two known deaths from the virus on March 1 to a stay-at-home order in most states and over 4,000 deaths by the end of the month. Examining the public spaces and sources that people turn to when posting about important topics such as the coronavirus can shed light on how people share and consume information online.

Terminology

Public spaces: These are defined in this study as the Facebook pages and groups where posts are public (Facebook distinguishes between pages and groups, but they were combined here). The study analyzed English-language posts about COVID-19 in the 3,000 public spaces (1,500 pages and 1,500 groups) whose coronavirus-related posts with links received the most engagement, on average. Public spaces were coded for their subject orientation and their geographic focus.

Public spaces subject orientation: The different types of public spaces are determined by the main topic the public space is oriented around. Researchers manually categorized public Facebook spaces into 10 categories based on their title and “about” section: 1) personal interest and lifestyle; 2) entertainment and sports; 3) government and politics; 4) religion; 5) general news; 6) business and public figures; 7) foreign; 8) nonprofit and research; 9) humor; and 10) health care and science. For example, a public space called “Informed Parents of California” was classified as personal interest and lifestyle.

Public spaces geographic focus: This indicates whether a public space explicitly referenced a local city, town, neighborhood or state in the U.S.; a foreign country or area; or did not have a geographic focus, including those about the U.S. generally. For instance, the “Informed Parents of California” space had a local focus.

Public posts: These are the English-language coronavirus-related posts with links in the 3,000 public spaces analyzed. Coronavirus-related posts are those that match a set of coronavirus-related keywords (“coronavirus,” “covid-19,” “covid” or “corona virus”). This research identified about 6.5 million Facebook posts published during March 2020 (including those that did not share a link), but the main analysis focused on the 93,091 posts with links about COVID-19 from the 3,000 public spaces, based on the average number of interactions their coronavirus-related posts with a link received.

Interactions: These are the total number of comments, shares, likes and other reactions to these posts. The average coronavirus-related post with a link in this study had 2,713 total interactions.

Sources: These are the sites (e.g. pewresearch.org) that are linked to in these posts (referred to both as sources and sites). The 93,091 Facebook posts analyzed in this study linked collectively to 4,860 distinct sites across the 3,000 public spaces. Sources are coded by source type.

Source type: This describes the different types of websites found at the destination of each link. Researchers manually grouped these into six broad mutually exclusive categories: 1) news organizations, including TV, print and digital; 2) social sharing sites, like blogs and social media; 3) nonprofit and research organizations, including academic institutions and think tanks; 4) health care and science sites, including doctors, hospitals and public health agencies; 5) government and political sites; and 6) all other sites. The study also took a closer look at whether a news organization had a geographic focus – that is, whether it focuses on news in a U.S. city or state, a foreign country, or has no geographic orientation. For example, a local TV news station has a local geographic orientation, while a 24-hour cable news network does not have one.

Data collection

Posts about the coronavirus were collected from CrowdTangle, a public insights tool owned and operated by Facebook. CrowdTangle gives academics and researchers access to public posts in its database that match the keywords a researcher supplies in a query.

In this study, researchers searched CrowdTangle using two approaches described below in order to collect the most accurate number of coronavirus-related posts.

Approach 1: Search interface

The first method used the search interface to get estimates of the number of posts matching selected keywords each day in both public pages and public groups. The search interface returns results in which the selected keywords appeared in the post message or in the text of the link, if a link is included in the post. The keywords used in this analysis were “coronavirus,” “covid-19,” “covid” and “corona virus.”

These keywords were selected to ensure only posts about the outbreak were included. Researchers also tested additional keywords, including “outbreak,” “pandemic” and “lockdown,” which provided similar results.
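
The keyword filter described above can be illustrated with a minimal sketch. It assumes case-insensitive substring matching against the post message and the link text; CrowdTangle's actual matching rules are not described here, and the function name `mentions_coronavirus` is hypothetical.

```python
import re

# Keywords listed in the methodology. Case-insensitive substring
# matching is an assumption, not CrowdTangle's documented behavior.
KEYWORDS = ["coronavirus", "covid-19", "covid", "corona virus"]

# One pattern that matches any of the keywords.
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def mentions_coronavirus(message, link_text=""):
    """Return True if a keyword appears in the post message or the link text."""
    return bool(PATTERN.search(message or "") or PATTERN.search(link_text or ""))
```

For instance, a post whose message is empty but whose link text reads "Corona Virus update" would be counted under this sketch, while a post about an unrelated topic would not.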

This search extended to both public pages and public groups, though there were small differences in how groups and pages were filtered (in this study, public pages and groups are analyzed together as public spaces). Separate inclusion criteria were applied to posts in public pages and to posts in public groups:

  • Posts in public pages: English-language posts and those posted in pages whose owners were based in the U.S. were included if they were in the CrowdTangle database.
  • Posts in public groups: English-language posts were included. There is no functionality in CrowdTangle to select U.S. ownership for groups, but the CrowdTangle data only includes U.S.-based groups with at least 2,000 members.

Using the CrowdTangle search interface, researchers recorded CrowdTangle’s daily estimates of the number of posts that matched these criteria. In this way, the study identified 6.5 million English-language Facebook posts from public spaces that matched these keywords from March 1 to March 31, 2020. Just under 4 million (3.7 million) of these posts had a link. Most of this study’s analysis focuses on posts about COVID-19 that contained a link (see the method for this analysis below). The 6.5 million Facebook posts were used for the timeline analysis in the overview section of the report.

Approach 2: Historical data interface

Most of this report’s analysis used data collected from CrowdTangle’s Historical Data interface.

Researchers saved a search in CrowdTangle’s interface using the same search parameters (keywords and specific criteria for pages and groups) described above and used this interface to download all posts matching those criteria that included a link (CrowdTangle only provides data on the first link in the post) from March 1 to March 31, 2020. These data were downloaded April 14-16, 2020.

After removing duplicate posts, there were 3.6 million posts considered for this analysis.
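
The report does not say how duplicate posts were identified. As a minimal sketch, assuming each downloaded record carries a unique identifier (the `post_id` field name is hypothetical), duplicates could be dropped while keeping the first occurrence of each post:

```python
def drop_duplicate_posts(posts, key="post_id"):
    """Keep the first record seen for each identifier.

    `posts` is a list of dicts; `key` names a hypothetical
    unique-identifier field in the downloaded data.
    """
    seen = set()
    unique = []
    for post in posts:
        identifier = post[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(post)
    return unique
```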

Selecting public spaces

Once all coronavirus-related posts with links were downloaded (N=3.6 million), the average interaction rate across all posts in each public space was then calculated. This allowed researchers to get a sense of where these coronavirus-related posts received the most attention.

Researchers then examined the 3,000 public spaces (1,500 pages and 1,500 groups) with the highest average interaction rate for posts that mentioned coronavirus and contained a link; this amounted to a total of 93,091 posts.

The interaction rate measures the level of engagement a post received. It was calculated as the sum of all interactions, defined as the comments, shares and reactions a post received (including like, wow, love and other reactions).

This study specifically looked at the interaction rate across all coronavirus-related posts with links in the public spaces studied. This was calculated using the mean interactions (or average interactions) of these posts. For example, across the 3,000 public spaces, all coronavirus-related posts that linked to news organizations had an average of 2,713 interactions.

Another approach is to measure the interaction rate using the median interaction rate instead of the mean. The average gives a general indication of how much engagement posts are receiving, which could be influenced by a small number of posts that go viral. In contrast, the median shows the typical post and would not be influenced by virality. Because this study looks at the broad environment, researchers compared the mean and median and found similar patterns in the findings.
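
The difference between the two measures can be seen in a small worked example. The interaction counts below are made up for illustration and are not drawn from the study's data.

```python
from statistics import mean, median


def summarize(interactions):
    """Return (mean, median) for a list of per-post interaction counts."""
    return mean(interactions), median(interactions)


# Four ordinary posts and one viral post (illustrative values only).
example = [10, 12, 15, 20, 5000]
print(summarize(example))  # (1011.4, 15)
```

Here the single viral post pulls the mean above 1,000, while the median stays at 15, the value for the typical post.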

For these reasons, this study uses the average interaction rate to select public spaces to study. The 1,500 public pages and 1,500 public groups with the highest average interaction rate were included in this study.
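
The selection step can be sketched as follows. This is an illustration under stated assumptions rather than the researchers' actual script: the field names `space_id`, `space_type` and `interactions` are hypothetical, and ties are left in whatever order sorting produces.

```python
from collections import defaultdict
from statistics import mean


def top_spaces(posts, n):
    """Return the n spaces with the highest mean interactions, per space type.

    `posts` is an iterable of dicts with hypothetical keys:
    "space_id", "space_type" ("page" or "group") and "interactions".
    """
    # Collect interaction counts for every post in each public space.
    by_space = defaultdict(list)
    for post in posts:
        by_space[(post["space_type"], post["space_id"])].append(post["interactions"])

    # Compute each space's mean and group the results by space type.
    averages = defaultdict(list)
    for (space_type, space_id), values in by_space.items():
        averages[space_type].append((mean(values), space_id))

    # Rank each type separately and keep the top n space IDs.
    return {
        space_type: [
            space_id
            for _, space_id in sorted(ranked, key=lambda pair: pair[0], reverse=True)[:n]
        ]
        for space_type, ranked in averages.items()
    }
```

Calling `top_spaces(posts, 1500)` on the full set of posts with links would yield the 1,500 pages and 1,500 groups under these assumptions.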

Collecting websites

The next step was the analysis of links published in posts about COVID-19. The study extracted the links that were shared in the 93,091 coronavirus-related posts (the text and any links in comments were not included). This analysis aimed to identify the different websites used when posting about COVID-19.

Link cleaning

Many users post links with link shorteners such as bit.ly or share.gs rather than the full URL. CrowdTangle expands many of these shortened URLs into the full URL in the data it provides, but it does not do so for all links. CrowdTangle does not consistently provide expanded links for those organizations or individuals using custom shorteners (for example, Pew Research Center uses links that begin with pewrsr.ch instead of pewresearch.org). To determine the site that was linked to in the full URL, researchers followed all URLs and recorded the final destination using a custom Python script.
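
The researchers' script is not published with this methodology. A minimal sketch of following a link to its final destination, using only Python's standard library, might look like the following. The `opener` parameter is a hypothetical addition that allows the network call to be swapped out, and returning the original URL on failure is an assumption.

```python
import urllib.request


def resolve_final_url(url, timeout=10, opener=urllib.request.urlopen):
    """Follow redirects and return the final URL.

    Falls back to the original URL if the request fails
    (an assumption; the report does not describe error handling).
    """
    try:
        # urlopen follows HTTP redirects; geturl() reports where it ended up.
        with opener(url, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed or unsupported URLs.
        return url
```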

Each link was then analyzed using a custom Python script to determine the site that published it (e.g. pewresearch.org/facebook_study.html was published on pewresearch.org). Researchers then manually analyzed each site to ensure there were no duplicates or subdomains (e.g., newsletters.pewresearch.org would be included in pewresearch.org). This process identified 4,860 unique sites in the 93,091 coronavirus-related posts.
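
As a rough illustration of this step, the sketch below extracts the host from a URL and folds known subdomains into their parent site. The alias table stands in for the manual review described above; its single entry comes from the example in the text, and the names `SITE_ALIASES` and `site_of` are hypothetical.

```python
from urllib.parse import urlparse

# Hand-maintained mapping from subdomains to parent sites, standing in
# for the manual review step. The one entry mirrors the example above.
SITE_ALIASES = {"newsletters.pewresearch.org": "pewresearch.org"}


def site_of(url):
    """Return the site a URL was published on, e.g. 'pewresearch.org'."""
    # urlparse only recognizes the host when a scheme or '//' is present.
    if "://" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    # Treat 'www.example.org' and 'example.org' as the same site.
    if host.startswith("www."):
        host = host[4:]
    return SITE_ALIASES.get(host, host)
```

Under this sketch, `site_of("pewresearch.org/facebook_study.html")` returns `"pewresearch.org"`, and a link on `newsletters.pewresearch.org` resolves to the same site.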

Content analysis

This study conducted a content analysis of each of the 3,000 public spaces and the 4,860 sites. Five coders were trained and performed this content analysis. All codes were then reviewed by a team of at least two researchers, including at least one who was not part of the original coding.

The 3,000 public spaces were coded for two variables:

  • Subject orientation refers to the broad topic that the public space is oriented around. For each public space, researchers examined the name as returned in the CrowdTangle data. They also performed Google and Facebook searches if the name itself was unclear. Individual posts on a given public space were not considered to determine the subject orientation for each public space. This resulted in a total of 10 broad categories used throughout the report – public spaces oriented around:
    • Personal interest & lifestyle
    • Entertainment & sports
    • Government & politics
    • Religion
    • General news
    • Business & public figures
    • Foreign
    • Nonprofit & research
    • Humor
    • Health care & science

  • Geographic focus indicates whether a public space explicitly referenced a local city, town, neighborhood or state in the U.S., a foreign country or area, or did not have a geographic focus.

The 4,860 websites were analyzed for two variables:

  • Source type describes the different types of websites that coronavirus-related posts linked to. There were a total of 20 mutually exclusive source types, grouped into six broad categories:
    • News organizations – Television stations, digital native news organizations, print publications, news aggregators, radio & podcasts, wire services
    • Social sharing sites – Blogs, social media sites, other discussion sites
    • Nonprofit & research organizations – Nonprofit & advocacy organizations, academic & research organizations, fact-checking sites
    • Health care & science – Health care entities (e.g., hospitals, doctors), public health agencies (e.g., CDC)
    • Government and political sites – no subcategories
    • Other sites – Non-U.S. sites, business sites, satirical & humor sites, religion sites, other sites

  • Geographic focus was analyzed only for those sites that were news organizations, as described above. This variable identified whether a news organization was local (the outlet focused on news in a U.S. city, state or another specific local area), non-U.S. (the outlet focused on a foreign country or area) or had no geographic focus.

Coders were given multiple sets of public spaces and sites to evaluate during the training process. Coding began once internal agreement on how to code the variables was established. During this process, coders trained on 300 public spaces and 205 sites. The Cohen’s kappa estimates (for public spaces, which had two coders) and the Krippendorff’s alpha estimate (for sites, which had more than two coders) for each variable are below.

  • Public spaces subject orientation: 0.83
  • Public spaces geographic focus: 0.88
  • Sites source type: 0.91
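
For readers unfamiliar with the statistic, Cohen’s kappa compares the agreement two coders actually reached with the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The sketch below is a generic implementation of that formula, not the researchers' code, and the sample labels are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)

    # Share of items on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement, from each coder's own category proportions.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Illustrative labels only: the coders agree on three of four items.
a = ["news", "news", "humor", "humor"]
b = ["news", "news", "humor", "news"]
print(cohens_kappa(a, b))  # 0.5
```

Values near 1 indicate agreement well beyond chance, which is how the estimates above should be read.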

Analysis of the geographic focus for news organizations was done collaboratively with two coders and was not tested for intercoder reliability, as they talked through each case.

Throughout the coding process, coders discussed questions as they came up and arrived at decisions under the supervision of the content analysis team leader.