Market Selection
Cities were selected by the research team based on Nielsen Media Research market rankings. Markets were grouped into four quartiles according to the number of television households in each. Some further adjustment ensured that large markets were not underrepresented in the sample. Five markets within each quartile were then selected at random, after being stratified to ensure geographic diversity.
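The selection step can be sketched roughly as follows; the market table, the region field, and the round-robin stratification below are illustrative assumptions, and the study's adjustment for large markets is not reproduced.

    import random

    def select_markets(markets, per_quartile=5, seed=0):
        """Sketch of the sampling design: rank markets by TV households,
        split them into four quartiles, then draw a geographically
        stratified random sample of `per_quartile` markets from each.
        `markets` is a list of dicts: {"name", "tv_households", "region"}."""
        rng = random.Random(seed)
        ranked = sorted(markets, key=lambda m: m["tv_households"], reverse=True)
        size = len(ranked) // 4
        quartiles = [ranked[i * size:(i + 1) * size] for i in range(3)]
        quartiles.append(ranked[3 * size:])   # remainder goes to the last quartile
        sample = []
        for quartile in quartiles:
            # Stratify by region: shuffle within each region, then take
            # markets round-robin across regions until the quota is met.
            by_region = {}
            for m in quartile:
                by_region.setdefault(m["region"], []).append(m)
            for group in by_region.values():
                rng.shuffle(group)
            picks = []
            while len(picks) < per_quartile and any(by_region.values()):
                for group in by_region.values():
                    if group and len(picks) < per_quartile:
                        picks.append(group.pop())
            sample.extend(picks)
        return sample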
Broadcast Selection
The research studied the highest-rated competing half-hour news programs in each market, picking one sweeps week and one non-sweeps week. Using the highest-rated time slot as the common denominator, while excluding hour-long newscasts and distant stations, provided the most consistent yardstick across markets. In future years, the study will examine other time periods in some of these markets to explore differences between time periods within the same market.
Taping and Screening
Research associates in each of the 20 designated markets taped newscasts from March 9 through 13, 1998 (non-sweeps) and April 27 through May 1, 1998 (sweeps). As a backup, they also taped March 16 through 20 (non-sweeps, secondary) and May 4 through 8 (sweeps, secondary).
The primary taping period was studied unless taping error or program preemption made that impossible, in which case coders substituted days from the secondary weeks, making every effort to match the appropriate day of the week. (In one case, for WBBM/Chicago, taping error required the substitution of a July non-sweeps week.)
Each half-hour broadcast was initially screened and pre-coded in its entirety by a single coder to confirm the date and time slot of each broadcast and to identify and time individual stories. Following the design team's guidance that weather and sports content is now virtually identical across stations and varies almost entirely in style, recurring or regular sports and weather segments were classified and timed but were not part of any additional coding or analysis. However, any weather or sports coverage that was moved up into the news segment of the newscast, because of a major storm or game, was fully coded as part of the analysis.
Story Coding and Scoring
Broadcasts were coded in their entirety by a single coder, using multiple viewings of each story and a standardized codebook. The process began with inventory variables: broadcast date, market, station, network affiliation, etc. Second, recordable variables were coded, including story length, actors, and topics. The final section of the coding scheme contained the rateable variables, the measurements identified by the design team as quality indicators. Each rateable variable was assigned both a code and a point score; the range in maximum possible points reflects the relative importance of each variable, as determined by quantitative analysis of the design team's input.
The variables and their maximum possible points per story were: Focus, 10; Enterprise, 8; Source expertise, 9; Balance via number of sources, 5; Balance via viewpoints, 5; Sensationalism, 3; Presentation, 2; Community relevance, 8. The score-per-story represents the points earned across the rateable variables.
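For illustration, the point scheme can be represented as a small table and a per-story total. The keys and the score_per_story helper below are illustrative, not part of the study's codebook; a story earning the maximum on every variable would score 50 points.

    # Maximum possible points per story for each rateable variable,
    # as listed above (names shortened for use in code).
    MAX_POINTS = {
        "focus": 10,
        "enterprise": 8,
        "source_expertise": 9,
        "balance_number_of_sources": 5,
        "balance_viewpoints": 5,
        "sensationalism": 3,
        "presentation": 2,
        "community_relevance": 8,
    }

    def score_per_story(coded_story):
        """Total the points a coder awarded on the rateable variables.
        `coded_story` maps a variable name to the points earned (0..max)."""
        for name, points in coded_story.items():
            if not 0 <= points <= MAX_POINTS[name]:
                raise ValueError(f"{name}: {points} is outside 0..{MAX_POINTS[name]}")
        return sum(coded_story.values())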
Topic Diversity
Per the design team's directives, no story points were earned for topics; that is, no one topic was considered more important than another. Instead, stations were rewarded for covering a diversity of topics, taking into account both the number of stories presented and the additional minutes often added in post-prime time slots. For each newscast, the number of stories aired was divided by the number of topics they covered, producing a multiplier.
Next, the broadcast's scores-per-story were totaled and divided by the number of stories to reach an average score-per-story. The multiplier was then applied to the average score-per-story to reach the daily broadcast score.
Finally, each station's 10 daily broadcast scores were totaled to reach the aggregate station score.
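A minimal sketch of this scoring arithmetic, following the text as written (the multiplier is stories divided by topics, and the function names are illustrative):

    def daily_broadcast_score(story_scores, topics_covered):
        """Combine a newscast's score-per-story values into one daily score.
        `story_scores` lists each story's score; `topics_covered` is the
        number of distinct topics those stories addressed."""
        # Topic-diversity multiplier, as described above: stories aired
        # divided by topics covered.
        multiplier = len(story_scores) / topics_covered
        # Average score-per-story for the broadcast.
        average = sum(story_scores) / len(story_scores)
        return multiplier * average

    def aggregate_station_score(daily_scores):
        """Total a station's 10 daily broadcast scores."""
        return sum(daily_scores)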
Rating Analysis
Since various factors, from lead-in to anchor chemistry to length of time in the market, can all influence ratings, the design team felt a better index of the impact of content on ratings was to analyze ratings over time. It suggested a three-year trend line.
The study then measured ratings in two different ways. The primary method, and the one reflected in the ratings scores and charts, was to take 12 ratings books (three years) and plot a trend line. The data are based on Nielsen Media Research estimates of the weekday average household rating for each of the 12 sweeps periods from July 1995 to May 1998, as interpreted by a project researcher. Ordinary least-squares regression was used to determine the slope for each newscast. The slope distribution was then converted into a five-point coding scheme.
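A minimal sketch of the primary measure, assuming a newscast's 12 weekday-average household ratings are supplied in chronological order. The cut points for converting slopes into the five-point scheme came from the observed slope distribution and are not given in the text, so they appear here only as a parameter.

    def ratings_slope(ratings):
        """Ordinary least-squares slope of a newscast's household ratings
        across the 12 sweeps books (July 1995 through May 1998)."""
        n = len(ratings)
        xs = range(n)                        # 0, 1, ..., 11: book index
        x_mean = sum(xs) / n
        y_mean = sum(ratings) / n
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ratings))
        var = sum((x - x_mean) ** 2 for x in xs)
        return cov / var

    def slope_to_code(slope, cut_points):
        """Map a slope onto the five-point coding scheme. `cut_points`
        is four ascending thresholds that split the slope distribution
        into five bins (codes 1 through 5)."""
        return 1 + sum(slope > c for c in cut_points)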
The second method was designed to double-check the value of the first. It built a trend from the same ratings books but weighted them, giving the most recent books (those measured in the study) more weight than the earliest. This method confirmed the finding that there was no evidence to support the notion that audiences prefer lower-quality local television news to higher-quality news.
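A sketch of that check under one plausible reading, using a weighted least-squares slope with weights that decline linearly from the newest book to the oldest; the study's actual weighting scheme is not specified.

    def weighted_ratings_slope(ratings, weights=None):
        """Weighted least-squares slope over the same ratings books,
        with recent books weighted more heavily than earlier ones."""
        n = len(ratings)
        if weights is None:
            weights = [i + 1 for i in range(n)]   # oldest = 1, ..., newest = n
        w_total = sum(weights)
        xs = range(n)
        x_bar = sum(w * x for w, x in zip(weights, xs)) / w_total
        y_bar = sum(w * y for w, y in zip(weights, ratings)) / w_total
        cov = sum(w * (x - x_bar) * (y - y_bar)
                  for w, x, y in zip(weights, xs, ratings))
        var = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
        return cov / var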
The aggregate score was then matched with ratings information to arrive at the final letter grade for each station.
Intercoder Reliability
Intercoder reliability measures the extent to which two coders, operating individually, reach the same coding decisions. One coder was designated as the control coder and worked off-site for the duration of the project. At the completion of the general coding process, the three on-site coders, working alone and without access to the control coder's work, recoded one-sixth of the broadcasts completed by the control coder. Based on a comparison of the daily broadcast scores, daily scores were found to be reliable within +/- 0.79 points per day.
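One plausible reading of that comparison is a mean absolute difference between matched daily broadcast scores; the sketch below assumes that interpretation, and the +/- 0.79 figure comes from the study's own data, not from this code.

    def mean_absolute_difference(control_scores, recode_scores):
        """Average absolute gap between the control coder's daily broadcast
        scores and an on-site coder's recoded scores for the same broadcasts."""
        pairs = list(zip(control_scores, recode_scores))
        return sum(abs(c - r) for c, r in pairs) / len(pairs)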