As the Internet continues to change the news industry and how news is produced, circulated and consumed, it is ever more critical to understand the emerging trends and news outlets available online. Citizens must make daily choices about which sites to visit for various kinds of news, but it is largely up to them to figure out which site best fits their needs at the moment. And in many instances they may be making those choices without fully understanding why.
The content analysis element of the 2007 Annual Report on the State of the News Media was designed to sort through the many different kinds of sites that offer news and information. What do some sites emphasize over others? Are there common tendencies? Creating the study and analyzing the findings was a multi-step process.
To assess the range of news Web sites available, we selected 38 different Web sites that provide such information. The sites were initially drawn from the seven media sectors that PEJ analyzes in each annual report:
In addition, we included one foreign broadcast site (BBC News) and the site of one wire service (Reuters). (Because of the language barrier, ethnic, non-English-language Web sites were not included in the study.)
The result was the following list of sites:
ABC News http://abcnews.go.com
BBC News http://news.bbc.co.uk
Benicia News http://www.benicianews.com
Boston Phoenix http://www.thephoenix.com
CBS11 TV http://cbs11tv.com
CBS News http://www.cbsnews.com
Chicago Sun-Times http://www.suntimes.com
CNN http://www.cnn.com
Crooks and Liars http://www.crooksandliars.com
Daily Kos http://www.dailykos.com
Des Moines Register http://www.desmoinesregister.com
Digg http://digg.com
Economist http://www.economist.com
Fox News http://www.foxnews.com
Global Voices http://www.globalvoicesonline.org
King5 TV http://www.king5.com
Los Angeles Times http://www.latimes.com
Little Green Footballs http://www.littlegreenfootballs.com
Michelle Malkin http://www.michellemalkin.com
MSNBC http://www.msnbc.msn.com
AOL News http://news.aol.com
Google News http://news.google.com
Yahoo News http://news.yahoo.com
New York Post http://www.nypost.com
New York Times http://www.nytimes.com
NPR http://www.npr.org
OhmyNews.com http://english.ohmynews.com
PBS NewsHour http://www.pbs.org/newshour
Reuters http://www.reuters.com
Salon http://salon.com
San Francisco Bay Guardian http://www.sfbg.com
Slate http://slate.com
Time Magazine http://www.time.com
Topix http://www.topix.net
USA Today http://www.usatoday.com
Washington Post http://www.washingtonpost.com
The Week Magazine http://www.theweekmagazine.com
WTOP Radio http://www.wtop.com
Web sites were captured by a team of professional content coders. At each download, coders made an electronic copy and a printed hard copy of each site's homepage as well as of its top five news stories. Prominence was determined as follows:
1. The biggest headline at the top of the screen is the most prominent story. It may or may not have an image associated with it.
2. The second-most-prominent story is the one attached to an image at the top of the screen, if that is a different story from the most prominent one. If there is no image at the top of the screen, or two significant stories are attached to the same image, refer instead to the next-largest headline.
3. To rank the remaining stories, refer first to the size of the headlines, and then to the place (height) on the screen. If two stories have the same font size and sit at the same height, give the story on the left more prominence.
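To make the tie-breaking order explicit, here is a minimal Python sketch of those rules. The `Story` fields (headline size, screen position, image attachment) are hypothetical stand-ins for judgments the coders made by eye; nothing like this code was part of the study itself.

```python
from dataclasses import dataclass

@dataclass
class Story:
    # All fields are hypothetical; coders worked from screen captures,
    # not structured data.
    headline: str
    font_size: int              # headline size (larger = more prominent)
    y: int                      # vertical position (smaller = higher on screen)
    x: int                      # horizontal position (smaller = further left)
    has_top_image: bool = False # attached to the image at the top of the screen

def rank_stories(stories):
    """Order homepage stories by the coders' prominence rules."""
    # Sort by headline size, then height on the screen, then left-to-right.
    ordered = sorted(stories, key=lambda s: (-s.font_size, s.y, s.x))
    lead, rest = ordered[0], ordered[1:]
    ranked = [lead]
    # Rule 2: the story attached to the top-of-screen image ranks second,
    # if it is a different story from the lead.
    image_story = next((s for s in rest if s.has_top_image), None)
    if image_story is not None:
        rest.remove(image_story)
        ranked.append(image_story)
    # Remaining stories keep the size / height / left-to-right order.
    ranked.extend(rest)
    return ranked
```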
Stories were defined as:
Web sites were initially studied from September 18 through October 6, 2006. For that initial review, each site was captured and coded four different times. For two captures, the research team coded for the entire set of variables, both the homepage analysis and the variables related to the content of news stories. The other two rounds of capture were coded only for the variables relating to the content of the lead stories.
Each site was then studied again during the week of February 12-16, 2007, and coded separately. Results for the two time periods were compared. In cases where features had changed, we closely examined the site again to confirm the change or correct inconsistencies. Final analyses were based on the confirmed February site scores.
To create the coding scheme, we first worked to identify the different kinds of features available online — everything from contacting the author to quickly finding just what you want to receiving your news free — and how they could be measured. After several weeks of exploratory research, we identified 63 different quantitative measures and developed those into a working codebook (see list of primary variables below).
Coding was performed at PEJ by a team of seven professional in-house coders, overseen by a senior researcher and a methodologist. Coders were trained on a standardized codebook that contained a dictionary of coding variables, operational definitions, measurement scales, and detailed instructions and examples. The codebook was divided into two sections. The first was based on an inventory of the Web site's homepage; that inventory was performed three separate times, twice in September 2006 and once in February 2007. The second component involved coding the content of the news stories themselves. For the variables related to story content, we coded the top five stories and took the average score for each variable.
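As a concrete illustration of that averaging step (the values below are invented, not taken from the study):

```python
# One coded value per top-five story for a single story-content variable.
story_scores = [2, 3, 3, 1, 2]

# The site's score for that variable is the mean across the five stories.
site_score = sum(story_scores) / len(story_scores)
print(site_score)  # 2.2
```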
Before coding began, coders were trained on the codebook. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions, and where necessary additional captures took place to verify findings.
Coders followed a series of standardized rules for coding and quantifying Web site traits. Three variables deserve specific mention (a combined sketch of the three rules follows the list):
1. Multimedia components on the homepage: Coders counted all content items, defined as links to all material other than landing pages or indexes of some sort. Included were narrative text, still photos, interactive graphics, video, audio, live streams, live Q&As, polls, user-based blogs, podcast content and slide shows. The coders then tallied the total number of content items on the page, as well as the totals for each media form, and entered the percentages for each into the database.
2. Advertisements: In counting advertisements on the homepage, coders included all ads, from obvious banners and flash advertisements to the smaller single-link sponsors of a site. Self-promotional ads were also included in the total. The idea of this variable was to estimate the economic agenda of a given site based on the amount of advertising on the homepage. Advertisements on internal pages were not included in the tally. Because of day-to-day variance in the total number of homepage ads, the final figure was either the average across all visits to a site or, in cases where a site redesign had clearly occurred, the count from the most recent visit.
3. Bylines on blog posts: The byline variable required special rules for blogs. In counting bylines, researchers scored a blog entry a “1,” the most original, if the entry was posted by the blog host (John Amato on Crooks and Liars, for example). If the blog entry was posted by a regular contributor or staff member, the “story” scored a “2.” And if the blog entry was posted by an outside contributor, was not bylined, or consisted primarily of outside material (an entry, for instance, that simply said, “Read this,” followed by an excerpt from another source), the post received a “3,” the lowest on the scale of original stories.
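To make the three rules concrete, here is a minimal Python sketch. The function names, labels and data shapes are our own illustrative choices; the actual coding was done by hand on Excel sheets.

```python
from collections import Counter

def media_percentages(items):
    """Rule 1: percentage of homepage content items in each media form.
    `items` holds one label per content item, e.g. "text", "photo", "video"."""
    counts = Counter(items)
    total = sum(counts.values())
    return {form: 100 * n / total for form, n in counts.items()}

def ad_score(visit_counts, redesigned=False):
    """Rule 2: final homepage-ad figure, the mean across all visits to the
    site, or the most recent tally if the site had clearly been redesigned.
    `visit_counts` is a chronological list of per-visit ad counts."""
    if redesigned:
        return visit_counts[-1]
    return sum(visit_counts) / len(visit_counts)

def byline_score(author_type):
    """Rule 3: originality score for a blog post's byline.
    1 = blog host, 2 = regular contributor or staff,
    3 = outside contributor, no byline, or mostly outside material."""
    return {"host": 1, "staff": 2, "outside": 3, "none": 3}[author_type]

# Example use with invented values:
print(media_percentages(["text", "text", "photo", "video"]))  # text 50%, etc.
print(ad_score([12, 15, 9]))          # averaged across visits
print(ad_score([12, 15, 9], True))    # redesigned: latest count only
print(byline_score("staff"))          # 2
```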
In analyzing the data, we were able to group variables into six different areas of Web emphasis: User Customization, User Participation, Multimedia Use, Editorial Branding and Originality, Depth of Content and Revenue Streams.
Customization includes
Participation includes
Multimedia includes
Percent of homepage content devoted to:
Editorial Branding includes
Story Depth includes
Revenue Streams includes
Codes within each variable were translated into a numerical rating, from low to high, for that particular feature. PEJ research analysts then produced an Excel template to tally the scores (summing the variables) for each site within the six categories, so that each site had a final score in each of the six categories. The range of scores was then divided into four quartiles, and sites were marked according to the quartile they fell into.
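A short Python sketch of that tallying step, under the assumption that "quartiles" here means four equal bands of the score range (maximum minus minimum) rather than quartiles of the score distribution. The site names and ratings are invented for illustration.

```python
# Invented category ratings for three hypothetical sites: each list holds
# a site's low-to-high numerical ratings for the variables in one category.
scores = {
    "Site A": [3, 2, 4, 1],
    "Site B": [1, 1, 2, 2],
    "Site C": [4, 4, 3, 4],
}

# Sum the variables to get each site's final score for the category.
totals = {site: sum(ratings) for site, ratings in scores.items()}

# Split the range of final scores into four equal bands and mark each
# site by the band it falls into (1 = lowest quartile, 4 = highest).
lo, hi = min(totals.values()), max(totals.values())
band_width = (hi - lo) / 4 or 1   # guard against a zero-width range
quartiles = {site: min(int((t - lo) / band_width) + 1, 4)
             for site, t in totals.items()}

print(totals)     # {'Site A': 10, 'Site B': 6, 'Site C': 15}
print(quartiles)  # {'Site A': 2, 'Site B': 1, 'Site C': 4}
```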