How blogs selectively amplify the New York Times

As regular readers of my blog know, I’m interested in trying to paint a statistical picture of media coverage of the developing world. Last year, I found some pretty good evidence that the majority of media sources I was examining were far more likely to report on stories in wealthy nations than in poor ones. As I started talking with friends who worked in mainstream media, I got a fairly consistent explanation: “We report on what our audience wants. And, generally speaking, our audience doesn’t care about the developing world.”

This is the kind of explanation that begs for empirical testing. (Well, it does if you’re me.) Do newspapers and TV stations actually know that their readers aren’t interested in these stories? Or are they guessing?

So I’ve been looking for good proxies that I can use to measure audience interest in the developing world. One I’m interested in is book purchasing – I’ve been looking at Amazon sales rank statistics to make educated guesses about the number and value of books Amazon users purchase on various topics. And I’ve been looking closely at which countries people mention in their blogs, using data from Blogpulse and Daypop to map which parts of the world people are talking about.

I decided to try a slightly different tactic earlier this week, and banged out a little code that scrapes headlines from the New York Times’s website, and checks to see which articles are subsequently mentioned in the blogosphere. My guess was that the stories bloggers would “amplify” would predominantly be ones regarding US electoral politics and technology, especially consumer and Internet technology.

What follows below are the first results. Of the 718 headlines my scrapers found on the New York Times website over the last three days (these include stories published much earlier but still appearing on the site), only 58 had found their way into Blogpulse’s database. (I’m working on a Technorati version at the moment so we can compare results and check Blogpulse’s comprehensiveness.) My script looks for the URL rather than the headline, so an observation like “The Times ran a story today on Chechnya” won’t be counted unless it links to a specific story’s URL. (I’d love thoughts from folks on whether, by searching for URLs like “http://www.nytimes.com/date/section/article.html”, I’m missing a group of otherwise-blogged URLs…)
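For anyone who wants to play along, here is a minimal Python sketch of the URL-matching idea. It is not the script used for these results, just an illustration under some assumptions: that the "date" part of the URL is laid out as YYYY/MM/DD, that the text of blog posts is already in hand as plain strings, and that links differing only in "www." or a trailing query string should count as the same story. The names `NYT_URL`, `normalize` and `count_mentions` are made up for this example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed URL shape: http://www.nytimes.com/YYYY/MM/DD/section/.../article.html
# The YYYY/MM/DD layout is a guess; the post only says "date/section/article.html".
# An optional query string is accepted so tracking parameters do not hide a link.
NYT_URL = re.compile(
    r"https?://(?:www\.)?nytimes\.com/\d{4}/\d{2}/\d{2}/[\w\-/]+\.html"
    r"(?:\?[^\s\"'<>]*)?",
    re.IGNORECASE,
)

def normalize(url):
    """Reduce a link to host + path so trivially different links compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Query string and fragment are dropped on purpose.
    return host + parts.path

def count_mentions(headline_urls, blog_posts):
    """Count how many posts link to each scraped headline URL."""
    wanted = {normalize(u) for u in headline_urls}
    counts = Counter()
    for post in blog_posts:
        # A set, so a post that links the same story twice is counted once.
        found = {normalize(m) for m in NYT_URL.findall(post)}
        counts.update(found & wanted)
    return counts
```

Calling `count_mentions(headline_urls, blog_posts)` returns a `Counter` keyed by normalized URL. A post that names a story without linking to it contributes nothing, which is exactly the blind spot described above.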

(Update – thanks to Kerim Friedman for pointing out that many people link to http://nytimes.blogspace.com/genlink instead of the NY Times site – I’ll modify my scripts later today to look at this…)

Of the twenty stories blogged more than once, only two were substantially about events outside the US. One involved US/Israeli espionage; the other, terror in Chechnya. US electoral politics dominated, with 11 stories directly related to the US presidential race. Tech had a weaker showing than expected, with three stories. Surprisingly popular were stories that touched on education in the US: four in all, including the most blogged story, which was about charter schools.
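To put those counts in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted in this post. The topic labels are shorthand, and nothing here assumes the topics are mutually exclusive; each is simply expressed as a share of the twenty stories blogged more than once.

```python
# Counts quoted in the post.
headlines_scraped = 718
headlines_blogged = 58
blogged_more_than_once = 20

# Topic tallies among the stories blogged more than once (labels are shorthand).
by_topic = {
    "US presidential race": 11,
    "education in the US": 4,
    "technology": 3,
    "events outside the US": 2,
}

def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

blogged_pct = share(headlines_blogged, headlines_scraped)
topic_pct = {t: share(n, blogged_more_than_once) for t, n in by_topic.items()}

print(f"blogged at least once: {blogged_pct:.1f}% of scraped headlines")
for topic, pct in sorted(topic_pct.items(), key=lambda kv: -kv[1]):
    print(f"{topic}: {pct:.0f}% of stories blogged more than once")
```

On these numbers, roughly 8% of the scraped headlines were blogged at all, and stories about events outside the US made up 10% of those blogged more than once.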

A listing of the 58 stories blogged at least once follows. I’ll be running this experiment for the next month or so, and hope to expand it to follow other media sources as well – if I get the bugs worked out, I’ll start publishing regular results on my research page. Any and all insights you might have are welcome…