Today, the Berkman Center is relaunching Media Cloud, a platform designed to let scholars, journalists and anyone interested in the world of media ask and answer quantitative questions about media attention. For more than a year, we’ve been collecting roughly 50,000 English-language stories a day from 17,000 media sources, including major mainstream media outlets, left- and right-leaning American political blogs, and 1,000 popular general-interest blogs. (For much more about what Media Cloud does and how it does it, please see this post on the system from our lead architect, Hal Roberts.)
We’ve used what we’ve discovered from this data to analyze the differences in coverage of international crises in professional and citizen media and to study the rapid shifts in media attention that have accompanied the flood of breaking news that’s characterized early 2011. In the coming weeks, we’ll be publishing some new research that uses Media Cloud to help us understand the structure of professional and citizen media in Russia and in Egypt.
With our relaunch of the site, many of our most powerful tools are now available for your use. We’re hoping Media Cloud proves useful to anyone interested in asking questions about what bloggers and journalists are paying attention to, ignoring, celebrating or condemning.
We hope the tools we’re providing are a complement to amazing efforts like the Project for Excellence in Journalism’s News Coverage and New Media indices – we consider their tools the gold standard for understanding what topics are discussed in American media. PEJ works its magic using talented teams of coders, who sample different corners of the media ecosystem to find out what’s being discussed. We use huge data sets, algorithms and automation to give a different picture, one focused on language instead of topic.
At its most basic, Media Cloud gives a picture of what journalists and bloggers are writing about by counting the words used in recent stories. Above is a cloud of language used in our set of political blogs during the week starting on Monday, May 2nd. We can see language about the US raid on Osama bin Laden’s compound, including obvious words like Abbottabad, Bin Laden and raid, as well as words that suggest particular interests within those stories: helicopter, SEALs, intelligence, interrogation, Pakistan. Even with a major story dominating discussion, we see glimpses of other issues, like the US Congress Caregiver’s Act and speculation that Indiana governor Mitch Daniels will enter the Presidential race. You can click each word in the cloud and see which sentences in different blogs contained the term in question, how often it was used, and how that source compared to others.
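For readers who want to tinker, here is a minimal Python sketch of the word-counting idea behind a cloud like this one. It is not Media Cloud's actual code: the `word_counts` function, the tiny `STOPWORDS` set and the two sample sentences are all made up for illustration, and a real system would need far more careful tokenizing and a much longer stopword list.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to", "is", "was", "by", "from"}

def word_counts(stories):
    """Count how often each non-stopword appears across a list of story texts."""
    counts = Counter()
    for story in stories:
        # Lowercase the text and pull out runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", story.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Invented sample text, not real Media Cloud data.
stories = [
    "The raid on the compound in Abbottabad was carried out by SEALs.",
    "Intelligence from Pakistan led to the raid.",
]

# The most frequent word would be drawn largest in the cloud.
top = word_counts(stories).most_common(1)
print(top)
```

The counts are all a word cloud really needs: the more often a word appears, the bigger it is drawn.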
Comparison is where our tool is most powerful. The cloud above shows the differences between words used in left and right wing blogs during the same time period. We start to see differences in what aspects of the Bin Laden story bloggers focused on. Bloggers on the left used the words “torture” and “waterboarding” while bloggers on the right used “interrogation” and “terrorist”. Other comparisons are less obvious – we see more discussion of the debate over releasing raid photos on the right than on the left, and a discussion about expanding the Hyde Amendment (which affects congressional funding for abortion) on the left.
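One simple way to surface this kind of difference is to compare each word's share of all words in one set of blogs against its share in the other. The sketch below assumes that approach; we are not claiming it is how the comparison cloud is computed. The `distinctive_words` function is hypothetical and the counts are invented for the example.

```python
from collections import Counter

def distinctive_words(counts_a, counts_b, top_n=3):
    """Return the words most over-represented in set A, then in set B."""
    total_a = sum(counts_a.values()) or 1  # guard against empty input
    total_b = sum(counts_b.values()) or 1
    vocab = set(counts_a) | set(counts_b)
    # Difference in relative frequency: positive means "more typical of A".
    diff = {
        w: counts_a.get(w, 0) / total_a - counts_b.get(w, 0) / total_b
        for w in vocab
    }
    a_side = sorted((w for w in vocab if diff[w] > 0), key=lambda w: (-diff[w], w))
    b_side = sorted((w for w in vocab if diff[w] < 0), key=lambda w: (diff[w], w))
    return a_side[:top_n], b_side[:top_n]

# Invented counts for illustration only.
left = Counter({"raid": 10, "torture": 6, "waterboarding": 4})
right = Counter({"raid": 10, "interrogation": 6, "terrorist": 4})

left_words, right_words = distinctive_words(left, right)
print(left_words, right_words)
```

Words both sides use equally, like "raid" here, drop out, leaving only the vocabulary that sets each side apart.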
We’re also able to make general statements about the similarity or difference in word usage in these comparisons. While the left and right may both be focused on the raid in Pakistan, the similarity score (near the bottom of the word cloud, towards the right) suggests a larger disparity in agendas than we saw looking at these two sets of media a year ago, when both sides were talking primarily about Arizona’s tightened immigration laws. I’ve been taking an in-depth look at similarity scores to understand how media attention can shift at moments of international crisis, and how the recent, internationally-focused media cycle may differ from the news we often get in the US.
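To give a feel for what a similarity score can measure, here is a sketch using cosine similarity between two sets of word counts. This is an assumption chosen for illustration, since this post does not spell out the formula behind the score on the site. The `cosine_similarity` function and the counts are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity of two word-count vectors: 1.0 means identical
    proportions of words, 0.0 means no words in common."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty set of stories is similar to nothing
    return dot / (norm_a * norm_b)

# Invented counts for illustration only.
left = Counter({"raid": 3, "torture": 4})
right = Counter({"raid": 3, "interrogation": 4})

score = cosine_similarity(left, right)
print(round(score, 2))
```

Both sides mention the raid, so the score is above zero, but the words they do not share pull it well below 1.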
What our tools let you do with Media Cloud is really just the tip of the iceberg. The code behind our system is published under an open source license, so other researchers can build systems to monitor media in other countries and other languages. (We’ve got a system monitoring Russian media and blogs that you’ll hear more about soon.) We are publishing huge sets of data that include information on word frequencies in different stories for researchers who want to analyze American media without collecting their own data. And we’re hoping to collaborate with researchers around the world who’d like to use our tools and data to ask and answer pressing questions about what’s covered and how.
This new release is thanks to the hard work of Hal Roberts, architect of the project, David Larochelle, developer extraordinaire, and Zoe Fraade-Blanar, whose skill at interface design has made our work vastly more usable as well as more attractive. Thanks to them and everyone else involved with the Media Cloud project. Hope you’ll check our work out and let us know what you discover.