September 15, 2008

McCain vs. Obama on the Web

Methodology

Sites Studied

The Pew Research Center’s Project for Excellence in Journalism analyzed the components of the campaign Web sites of presidential candidates John McCain (R) and Barack Obama (D).

John McCain (R) http://johnmccain.com/

Barack Obama (D) http://www.barackobama.com/

In addition to the Web sites as a whole, the Project conducted a text analysis of each candidate’s biography, issues and speeches sections.

At each download, coders made an electronic copy of the home page, as some Web sites were not printer-friendly. Biographies, issues texts and speeches were copied and saved as Word documents.

Capture Timing

Web sites were initially coded on August 6, 2008. To check the results for accuracy, each Web site was coded again on August 8, 2008, and then tracked over the course of a month to note any significant changes. A final audit of both Web sites was conducted on September 9, 2008, after the national political conventions concluded.

The biography sections were downloaded and coded on August 6, 2008. They were also re-read during our second round of coding on August 8, 2008.

Coding Scheme and Procedure

Web site Analysis: To create the coding scheme, we first worked to identify the different kinds of features available on a campaign Web site. These ranged from tools to organize fundraisers to candidate positions on the issues. We identified 22 different quantitative measures and developed those into a working codebook.

Coding was performed at the Project by two professional research analysts. The codebook contained a dictionary of coding variables, operational definitions, detailed instructions and examples. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions, and monitor consistency of coding. Where necessary, additional Web site captures took place to verify findings. Coders followed a series of standardized rules for coding and quantifying Web site features.

Certain variables merit an explanation of their working definition as applied in the coding scheme:

Site customization—this variable looked at whether a visitor could tailor the home page/Web site based on their personal preferences. This feature always required users to register, and included the ability to create and access personal profiles, personal messages, personal blogs and more.

Demographic group pages—this variable measured whether the campaign Web site had a section of the site devoted to various demographic groups.

User comments on campaign blogs—this variable identified whether campaign blogs permitted space for users to add their comments to the official campaign blog posts. We coded for presence of comments on the blog.

Citizen-initiated blogs—in addition to the official campaign blog, several candidates provided a tool for users to establish their own blog to show their support for the candidate. These were coded as citizen-initiated blogs, and we coded for their presence.

Information delivery options—this variable was coded for presence of tools to deliver information directly to users. The dimensions were: RSS feeds, Podcasts, e-mail updates, mobile updates, and search capability.

Grassroots activity—we coded for presence of options for grassroots activity. This variable had three dimensions: fundraising, organizing community events and voter registration information. We coded for presence of all three dimensions.

Social Networking—we coded for the presence of “social networks” and also the number of social networks that a candidate displayed on his/her Web site (on the home page or elsewhere). They were embedded links that led the user to the candidates’ profiles on respective external social networking Web sites such as MySpace, Facebook, Flickr or YouTube. Researchers also searched groups on various social networking Web sites including LinkedIn, MiGente, Asian Ave, Eons and BlackPlanet for the presence of official campaign groups.

Newsroom—the section on the site that lists articles not authored by the campaign. These are predominantly articles about the candidate that appear in the mainstream media (including editorials) and appear as either links to an external site or the article as a whole with the source. If these sections included press releases by the campaign, they were counted as a separate variable. We coded for presence of a newsroom section, and also coded the total number of items in the archived section for the previous week.

Spanish translation—we also coded for whether or not campaign Web sites offered an option to translate content into a second language. Spanish was the only language offered for such translations at the time of publication. It should be noted that this did not necessarily include a translation of all the content on the Web site. A professional research analyst fluent in Spanish compared the English and Spanish texts available.

Text Key Word Search: The program CatPac® was used to analyze the candidate biographies, issue positions and speeches. CatPac is a “self-organizing artificial neural network” that has been optimized for reading text (Doerfel and Barnett, 1999). By assigning a neuron to each major word in the text, the program identifies the most important words by measuring their frequency and co-occurrence.

CatPac also contains a default “exclude” file, which contains prepositions, articles, conjunctions and transitive verbs (such as ‘and,’ ‘when,’ ‘he,’ etc.) that do not bear any meaning and produce clutter within the text. Thus, when the analysis is carried out, these words are excluded by the program, so that they don’t complicate the results.

Our sample consisted of the biographies, issues pages and speeches of both presidential candidates. Where text was presented on more than one page, it was combined into one text file, so that there was just one document for each type of text, for each candidate. These were then fed into the CatPac program.
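The preparation steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea (combine pages, drop excluded words, count frequency and co-occurrence); it is not CatPac's neural-network procedure. The exclude list, the window size of 7 and all function names are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for CatPac's default "exclude" file (assumed contents).
EXCLUDE = {"and", "when", "he", "the", "a", "of", "to", "in"}


def combine_pages(pages):
    """Join text that was spread over several pages into one document."""
    return "\n".join(pages)


def tokenize(text):
    """Lowercase the text, split it into words and drop excluded words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in EXCLUDE]


def word_frequencies(tokens):
    """Count how often each remaining word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=7):
    """Count, for each unordered word pair, the number of sliding
    windows in which both words appear (window size is an assumption)."""
    pairs = Counter()
    for start in range(max(1, len(tokens) - window + 1)):
        seen = sorted(set(tokens[start:start + window]))
        pairs.update(combinations(seen, 2))
    return pairs


if __name__ == "__main__":
    # Made-up sample text, not taken from either candidate's site.
    pages = ["He served in the Navy.", "Service and the Navy shaped him."]
    tokens = tokenize(combine_pages(pages))
    print(word_frequencies(tokens).most_common(3))
    print(cooccurrences(tokens, window=2).most_common(3))
```

Combining the pages before tokenizing mirrors the rule that each type of text yields exactly one document per candidate.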

Analysis

Rankings: In analyzing the data, there were three different areas in which we tallied results and ranked the various Web sites:

  • Conversation and Customization: To gauge how the campaign Web sites were engaging visitors, the Project assessed four variables: the degree of customization available, whether the campaign blogs allowed user comments, whether users could create their own blogs, and whether Web sites devoted special pages to demographic interest groups.
  • Information Delivery: We looked at five variables to help gauge how candidates disseminated their content: e-mail updates, RSS (Really Simple Syndication) feeds, podcasts, mobile device delivery and a search function.
  • Grassroots activity: We examined the presence of tools to facilitate grassroots activities. We coded for presence of tools that allowed users to donate money, fundraise from others, host gatherings, register to vote, canvass door-to-door, telephone undecided voters, compete for “activity points” and use a resource library for campaign volunteers.

Research analysts tallied the scores (summing the variables) for each Web site within the two categories. Thus for each of the two categories, each Web site had a final score on a scale ranging between one and six.
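The tallying step can be illustrated with a short Python sketch. It assumes each variable was recorded as present or absent on the coding sheet and that a category score is the count of variables present. The variable names, the sheet layout and the sample values below are invented for the example and are not results from the study.

```python
# Hypothetical variable names for one category, based on the list above.
INFORMATION_DELIVERY = [
    "email_updates",
    "rss_feeds",
    "podcasts",
    "mobile_updates",
    "search",
]


def category_score(coding_sheet, variables):
    """Sum the presence codes (present = 1, absent = 0) for one category.

    Variables missing from the sheet are treated as absent.
    """
    return sum(1 for name in variables if coding_sheet.get(name, False))


# Made-up coding sheet for an unnamed example site (not study data).
example_sheet = {
    "email_updates": True,
    "rss_feeds": True,
    "podcasts": False,
    "mobile_updates": True,
    "search": False,
}

if __name__ == "__main__":
    print(category_score(example_sheet, INFORMATION_DELIVERY))
```

Keeping the variable list separate from the coding sheet lets the same function score any category by passing a different list.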

Key Word Usage: Our text analysis had two inter-related components. The first was identifying and analyzing the most frequently used or top words in each biography, issue section and speeches section. These were spontaneously generated by the program. The number of most frequent words analyzed for the study was set at 15. According to the program, in most studies the first 1% of the total number of words is sufficient for text analysis. In this case, 15 words were more than 1% of the total text analyzed.

The search was run for both candidates. In our final analysis, we looked only at the top five words for individual candidates and the top five for the broader groups.
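The selection of top words can be sketched as follows. The word counts are invented for illustration, the function names are hypothetical, and the 1% check assumes that "total number of words" refers to the number of distinct words counted; the source does not say which reading CatPac uses.

```python
from collections import Counter


def top_words(frequencies, n):
    """Return the n most frequent words, breaking ties alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def exceeds_one_percent(frequencies, n=15):
    """True when n is more than 1% of the distinct words counted
    (an assumed reading of the 1% guideline)."""
    return n > 0.01 * len(frequencies)


if __name__ == "__main__":
    # Made-up counts, not taken from either candidate's texts.
    counts = Counter(
        {"america": 9, "service": 7, "family": 7, "change": 4, "senate": 3, "war": 2}
    )
    top_fifteen = top_words(counts, 15)  # list generated for the study
    top_five = top_fifteen[:5]           # subset used in the final analysis
    print(top_five, exceeds_one_percent(counts))
```

Sorting on count and then on the word itself keeps the ranking stable when two words occur equally often.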