Visualization of CNN’s 2014 Midterm Election Coverage

Adding to the basic text analytics I wrote about last week, I ran a bag-of-words sentiment analysis on CNN’s midterm election coverage, using the transcripts posted on CNN’s site. Fortunately, every transcript has a time stamp denoting which hour of programming it covers, so I was able to attach a time of day to each transcript and produce a visualization of CNN’s election coverage over the course of the day.

Category Term Table

To capture what CNN was talking about, I wrote a Python script that selected key words based on a frequency distribution built from the entire transcript corpus. The idea was that if CNN didn’t talk about a topic, it wasn’t worth investigating. To consolidate related terms into concepts or topics, I created categories that group similar words together. Sometimes the grouped words were synonyms, and sometimes they were simply singular and plural forms of the same word.
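
The script itself isn’t shown here, so the following is only a minimal sketch of the two steps, under stated assumptions: a corpus-wide frequency distribution to pick candidate key words, and a category map that folds synonyms and plurals into one topic. The `CATEGORIES` dictionary, the regex tokenizer, and the function names are hypothetical illustrations, not the original script.

```python
import re
from collections import Counter

# Hypothetical category map: each topic groups synonyms and plural forms.
CATEGORIES = {
    "republicans": {"republican", "republicans", "gop"},
    "democrats": {"democrat", "democrats", "democratic"},
    "senate": {"senate", "senator", "senators"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_terms(transcripts, n=50, stopwords=frozenset()):
    """Frequency distribution over the whole corpus; return the n most common words."""
    counts = Counter()
    for text in transcripts:
        counts.update(w for w in tokenize(text) if w not in stopwords)
    return counts.most_common(n)


def category_counts(transcripts, categories=CATEGORIES):
    """Total mentions per category: the summed counts of its member terms."""
    counts = Counter()
    for text in transcripts:
        counts.update(tokenize(text))
    return {cat: sum(counts[t] for t in terms) for cat, terms in categories.items()}


# Toy transcripts (hypothetical), not CNN data.
docs = ["Republicans won the Senate.", "The GOP and a Democrat senator spoke."]
print(top_terms(docs, n=1))    # [('the', 2)]
print(category_counts(docs))   # {'republicans': 2, 'democrats': 1, 'senate': 2}
```

In practice a stopword list would be passed to `top_terms` so that words like "the" don’t crowd out the topical terms.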

CNN Term Count Summary

Republicans and the President were the biggest talking points of the night, mentioned more often than either the Senate or Democrats. However, the number of times a topic is mentioned gives no clue about the context or tone in which CNN presented it, so I used a bag-of-words approach to score the sentiment of the words surrounding these terms in the transcripts. This process won’t interpret every instance correctly, but it can get close, and with enough term occurrences the overall sentiment should rise above the noise from individual errors.
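
The lexicon and the size of the surrounding-word window aren’t specified, so here is a minimal sketch of windowed bag-of-words scoring under those assumptions. The `POSITIVE`/`NEGATIVE` word sets and the default `window=2` are hypothetical placeholders; a real analysis would use a full sentiment lexicon and a window chosen by inspecting the transcripts.

```python
import re

# Tiny illustrative lexicon (hypothetical). A real run would use a full
# positive/negative word list.
POSITIVE = {"win", "wins", "won", "big", "strong", "victory", "good"}
NEGATIVE = {"lose", "loses", "lost", "weak", "defeat", "bad"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mention_scores(text, terms, window=2):
    """Score every mention of a category term by the words around it.

    Each positive word within `window` tokens on either side adds 1 and each
    negative word subtracts 1; the mention itself is excluded.
    """
    tokens = tokenize(text)
    scores = []
    for i, tok in enumerate(tokens):
        if tok in terms:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            pos = sum(w in POSITIVE for w in context)
            neg = sum(w in NEGATIVE for w in context)
            scores.append(pos - neg)
    return scores


def tally(scores):
    """Count positive and negative mentions, as in an hourly summary."""
    return sum(s > 0 for s in scores), sum(s < 0 for s in scores)


# Toy sentence (hypothetical), not a CNN transcript.
text = "Republicans won big. Voters said Democrats lost."
print(mention_scores(text, {"republicans", "gop"}))  # [2]
print(mention_scores(text, {"democrats"}))           # [-1]
```

Because only nearby words are counted, a positive word aimed at a neighboring topic can leak into a mention’s score; that per-instance error is what aggregating over many mentions is meant to wash out.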

CNN Sentiment Score

The first thing to notice is that there was no bad news about the Republicans: the sentiment analysis never found an hour of CNN’s broadcast with more negative mentions of Republicans than positive ones. In contrast, Democrats fared far worse on the newscast from the morning until about early evening. Mentions of President Obama were rather volatile, with strong negative and positive swings throughout the day. Mitch McConnell got a sizable bump right when CNN projected his Senate race in his favor [at about 7PM EST]. Washington, referring predominantly to the federal government, was the only topic with a negative overall score for the entire day.

CNN Sentiment Direct Comparison

The graph above offers a direct comparison of the sentiment scores for the political categories for every hour of broadcast during the actual election returns, after 6PM EST. [It also aggregates mentions across the different programs CNN runs on its different channels, so its numbers may disagree slightly with the other charts.]
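
Merging programs into one hourly series amounts to summing per-mention scores by hour and category. A minimal sketch, assuming each scored mention has already been tagged with the hour from its transcript’s time stamp; the record layout, the `program_a`/`program_b` names, and the toy scores are hypothetical, not CNN data.

```python
from collections import defaultdict


def hourly_comparison(records, start_hour=18):
    """Sum sentiment scores per (hour, category) across all programs and channels.

    records: iterable of (hour, program, category, score) tuples, where hour is
    0-23 in EST and score is a mention's net sentiment. Hours before
    start_hour (6PM EST by default) are dropped.
    """
    totals = defaultdict(int)
    for hour, _program, category, score in records:
        if hour >= start_hour:
            totals[(hour, category)] += score
    return dict(totals)


def daily_totals(records):
    """Overall score per category for the whole day, across all hours."""
    totals = defaultdict(int)
    for _hour, _program, category, score in records:
        totals[category] += score
    return dict(totals)


# Toy records (hypothetical program names and scores).
records = [
    (18, "program_a", "republicans", 2),
    (18, "program_b", "republicans", 1),  # same hour, different program
    (17, "program_a", "washington", -1),
    (19, "program_a", "washington", -2),
]
print(hourly_comparison(records))  # {(18, 'republicans'): 3, (19, 'washington'): -2}
print(daily_totals(records))       # {'republicans': 3, 'washington': -3}
```

Summing the same records over every hour, as `daily_totals` does, gives the kind of whole-day score by which Washington came out negative.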

Overall, the sentiment analysis produces an interesting visual picture of how CNN handled the election. If other news networks made transcripts of entire shows readily available, I could compare the outlets and look for evidence of bias or slant. Applied over a longer time frame, the same approach could give an interesting look at how a news story evolves, shedding an objective light on how the news cycle works.

Basic Text Analytics for News Bias

Bias is a problem every news outlet has in some form, beyond the well-debated political slants that Fox News and MSNBC are renowned for. I’ve been attempting to quantify these biases using text analytics. By looking at the frequency and topics of articles, word choices, and associated words, I believe you can find analytical evidence to better understand how different news outlets communicate their news.

My first attempt takes a simple approach: measure and compare the frequency of specific key terms. I used two current topics, Ebola and the midterm election, both of which show some polarization. For context, the data was collected toward the tail end of the quarantine news cycle, after political debates over how to handle health-care workers returning to the United States. Oversimplifying, conservatives favor hardline precautions like quarantine, while liberals generally favor the present policy of self-monitoring. The election articles come from the weekend before a midterm election in which Republicans are favored in the polls to take control of the Senate.

All the articles were gathered with a Python script that scraped Google search results for ‘ebola+[news outlet]’ or ‘election+[news outlet]’, so the data reflect news articles that were recent as of November 1, 2014. The text was analyzed by counting specific terms in each article along with the article’s total word count. For Python-oriented readers, I used the TextBlob package for the n-gram/count methods.
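
The counting above used TextBlob’s n-gram and count methods; to keep the example dependency-free, the sketch below substitutes a plain regex tokenizer and a small n-gram helper from the Python standard library. It is a stand-in for, not a reproduction of, TextBlob’s tokenization, so word counts can differ slightly (for example, on contractions). The function names and the toy sentence are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n):
    """All runs of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def article_counts(text, terms):
    """Return ({term: count}, total word count) for one article.

    Stand-in for TextBlob counts. Multi-word terms are matched against
    n-grams of the same length, so 'self-monitoring' matches 'self monitoring'.
    """
    words = tokenize(text)
    counts = {}
    for term in terms:
        n = len(term.split())
        counts[term] = ngrams(words, n).count(term.lower())
    return counts, len(words)


# Toy article text (hypothetical).
text = "Quarantine rules changed. Hickox refused the quarantine and chose self-monitoring."
print(article_counts(text, ["quarantine", "Hickox", "self monitoring"]))
# ({'quarantine': 2, 'Hickox': 1, 'self monitoring': 1}, 11)
```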

To give an idea of what the collection of news articles looks like: there are about 100 articles per news outlet and topic, which is what Google returns on the first page of results. All duplicate articles and links to domains other than the outlet’s own were removed [both filters were based on URLs], so some counts are below 100. Because I scraped Google’s news search site meant for normal web use, some results also carry related-article links, which can push the total above 100.
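
Both URL-based filters can be sketched with the standard library’s `urllib.parse.urlsplit`. This is a minimal sketch, not the original scraper: it assumes that the same host and path mean the same article (so query strings and trailing slashes are ignored) and that subdomains of the outlet’s domain count as the outlet. The function name and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def clean_results(urls, outlet_domain):
    """Drop duplicate URLs and any link not hosted on the outlet's domain."""
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""  # hostname is lowercased; None if absent
        # Keep the outlet's domain and its subdomains (www., edition., ...).
        if host != outlet_domain and not host.endswith("." + outlet_domain):
            continue
        # Assumption: same host and path means the same article,
        # so query strings and trailing slashes are ignored.
        key = (host, parts.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


# Toy search results (hypothetical URLs).
results = [
    "http://www.foxnews.com/health/a.html",
    "http://www.foxnews.com/health/a.html?intcmp=x",  # duplicate article
    "https://www.youtube.com/watch?v=1",             # non-outlet domain
    "http://www.foxnews.com/politics/b.html",
]
print(clean_results(results, "foxnews.com"))
# ['http://www.foxnews.com/health/a.html', 'http://www.foxnews.com/politics/b.html']
```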

Article Count By Topic

Word Count Per Article

Generally, longer articles can provide more detailed information or more complex arguments, so article length is also taken into account when calculating the term metrics for each news outlet. The New York Times has by far the longest articles, while NBC News has the shortest.

Ebola Term Count

I counted certain terms associated with Ebola and averaged the counts across all the articles. Not surprisingly, of the terms I chose, ‘quarantine’ appeared the most, with Fox News mentioning it most frequently. A related term, ‘Hickox’, the name of the nurse who was quarantined in NJ and ME, was also used often, but mostly by NBC News. Even though Fox News mentioned quarantine more often, it mentioned the nurse’s name far less often. Conversely, NBC News mentioned ‘Hickox’ more often than it mentioned quarantine. Since this is just basic text analysis, I’m hesitant to draw too many conclusions about what this coverage bias means for each news outlet’s slant.

Election Term Count

Following the Ebola term count, I gathered the same kind of information for articles about the midterm elections. There wasn’t as much disparity between outlets in term frequency as there was for Ebola. The most notable pattern was that NPR had strikingly few explicit mentions of political parties or philosophies, possibly indicating a strategy of avoiding politicized articles. Fox News and NBC News differed the most in their use of the word ‘liberal’, which is slightly pejorative in conservative circles. This could serve as confirming evidence of the outlets’ well-known slants, but I would insist on further investigation and better evidence first.

For those curious about the calculation of the term metrics: each article’s value is TERM COUNT / ARTICLE WORD COUNT, and these values are averaged over all the articles for an outlet and subject, so the measurements on the graphs are essentially average term proportions per article.
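
In code, the metric is a per-article ratio followed by a plain mean. A minimal sketch; the input layout, the function name, and the toy counts below are hypothetical, not data from these articles.

```python
def average_term_proportion(articles, term):
    """Mean of TERM COUNT / ARTICLE WORD COUNT over one outlet's articles.

    articles: list of (term_counts, word_count) pairs, one per article.
    Articles with zero words are skipped to avoid dividing by zero.
    """
    proportions = [counts.get(term, 0) / words for counts, words in articles if words > 0]
    return sum(proportions) / len(proportions) if proportions else 0.0


# Toy counts (hypothetical): proportions 0.02, 0.0 and 0.01.
articles = [({"quarantine": 2}, 100), ({"quarantine": 0}, 50), ({"quarantine": 3}, 300)]
print(average_term_proportion(articles, "quarantine"))  # mean of 0.02, 0.0 and 0.01
```

Because each article’s proportion gets equal weight, a short article counts as much as a long one; pooling instead (total term count divided by total word count) would weight long articles, such as The New York Times’s, more heavily.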

This is just a basic analytical look at coverage bias, which concerns what a news outlet decides to cover or include in its articles. More articles, TV transcripts, and social media headlines and comments could provide a richer data set for analysis, and I hope to identify emotionally charged words and evaluate opinions. All work for the future.