Weblogs and “selective uptake”

As most of my regular readers (all three of you) know, I’m obsessed with what the US media chooses to cover (and not cover). Whenever I’ve accused friends in the media of overfocusing on events that concern wealthy nations at the expense of African, Central Asian and other developing nations, I’ve gotten the same response: “That’s what people want to read about.”

Compulsive researcher that I am, I’m interested in figuring out whether that’s true. It seems to me that blogs offer one way to find out what people are actually interested in: if someone chooses to write about a topic, that reflects a certain degree of interest. Because so many people use blogs to feature news stories they’ve found interesting, blogs act, to some extent, as a selective filter on the news, showing which stories readers actually cared about.

(There are at least two valid objections to the previous statement. First, I suspect most bloggers write primarily about personal matters, not about the news, so the filtering effect is probably weakened by the small percentage of people who use blogs this way. Second, a small number of people use blogs to do original reporting: rather than filtering what other journalists are creating, they’re creating their own journalism.)

So how does this selective filter work? Are bloggers more interested in some topics than others? I’m starting to think about experiments to answer that question. The good folks at Intelliseek have given me access to their Blogpulse engine, which lets me see how many mentions a given set of keywords has received on the blogs Blogpulse tracks within a set timeframe. The database also reports how many blog posts were made in that timeframe, so I can make reasonable estimates of what percentage of blog posts mentioned a phrase in a given period.
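The percentage estimate is just mentions divided by total posts in the same window. Here’s a minimal Python sketch, assuming both counts come from the same Blogpulse search period; the helper name `blogpulse_share` is my own invention, not anything Blogpulse provides.

```python
def blogpulse_share(mentions: int, total_posts: int) -> float:
    """Percentage of posts in a timeframe that mention a phrase.

    Both counts are assumed to come from the same Blogpulse search window.
    This helper is hypothetical; it is not part of Blogpulse itself.
    """
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return 100.0 * mentions / total_posts


# Firefox: 2676 mentions out of 1377764 posts tracked in the period.
print(f"{blogpulse_share(2676, 1377764):.2f}%")  # prints 0.19%
```

The Firefox figures are the ones that appear in the table below, so this reproduces its “Blogpulse %” entry for that row.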

I can’t get nearly as rich data from Google News. (Google’s API still covers only the search engine proper. Grr.) But past experience suggests that the vast majority of Google News search results come from the past 14 days, which lets me align timeframes with a 14-day Blogpulse search. While I can’t tell what percentage of all news coverage these stories represent, I can do a side-by-side comparison of hit counts. Here are the results of a couple of searches I’ve run recently…

Term             Subject              Google hits  Blogpulse hits  Blogpulse %  G vs. B
Wassef Hassoun   Current events              8030             542        0.04%    14.82
Sudan            Current events             11300            1453        0.11%     7.78
Darfur           Current events              6250             583        0.04%    10.72
George Bush      Politics                   47100           37323        2.71%     1.26
Dick Cheney      Politics                   14800            7120        0.52%     2.08
John Kerry       Politics                   45200           16251        1.18%     2.78
John Edwards     Politics                   12400            4621        0.34%     2.68
Michael Phelps   Sports                      1430             177        0.01%     8.08
MPLS             Tech/Sci                     723              51        0.00%    14.18
Cassini          Tech/Sci                    4020             926        0.07%     4.34
Firefox          Tech/Sci                     384            2676        0.19%     0.14
Michael Jackson  Entertainment/Media         9320            4649        0.34%     2.00
Michael Moore    Entertainment/Media        12400           17298        1.26%     0.72
John Negroponte  Current events              3940             165        0.01%    23.88
Lance Armstrong  Sports                      7590            1133        0.08%     6.70
Euro 2004        Sports                     35200            3941        0.29%     8.93
Sharapova        Sports                      7020             898        0.07%     7.82
Total Blogpulse posts in period: 1377764
(Blogpulse % is a term’s Blogpulse hits as a share of all posts tracked in the period; G vs. B is Google hits divided by Blogpulse hits.)
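For anyone who wants to rerun these numbers, here’s a minimal Python sketch that recomputes the two derived columns from the raw counts for a few rows. It assumes the percentage is taken against the 1,377,764 posts Blogpulse tracked in the period and that the ratio is simply Google hits divided by Blogpulse hits; the names `COUNTS`, `TOTAL_BLOG_POSTS` and `google_vs_blogpulse` are mine, not from either service.

```python
# Raw counts copied from the table: term -> (Google News hits, Blogpulse hits).
COUNTS = {
    "Wassef Hassoun": (8030, 542),
    "George Bush": (47100, 37323),
    "Firefox": (384, 2676),
    "Michael Moore": (12400, 17298),
    "John Negroponte": (3940, 165),
}

# Total posts Blogpulse tracked in the same period (from the table).
TOTAL_BLOG_POSTS = 1377764


def google_vs_blogpulse(google_hits: int, blog_hits: int) -> float:
    """The "G vs. B" column: Google News hits divided by Blogpulse hits."""
    return google_hits / blog_hits


for term, (g, b) in COUNTS.items():
    share = 100.0 * b / TOTAL_BLOG_POSTS  # Blogpulse hits as % of all posts
    ratio = google_vs_blogpulse(g, b)
    print(f"{term:<16} {share:5.2f}%  {ratio:6.2f}")
```

Rounded to two decimals, each printed line matches the corresponding row of the table above.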

The final column, Google versus Blogpulse, is the interesting one, I think. When an item gets a lot of attention in the mainstream media but very little in the blogosphere, the number is large: very few bloggers seem interested in John Negroponte, the US’s new ambassador to Iraq, while lots of newspapers, especially in the Middle East, are asking interesting questions about his past. When the number is low, bloggers are paying more attention to the issue than the news media is: while there are only a handful of news stories about the new Firefox browser, 0.19% of blog posts in the last two weeks mention the software.
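Since the ratio is only meaningful relative to other terms, one simple way to read the column is to sort by it. A minimal Python sketch, using a handful of rows from the table; `rank_by_blog_uptake` is a hypothetical helper of mine, not part of any tool mentioned here.

```python
# Raw counts copied from the table: term -> (Google News hits, Blogpulse hits).
COUNTS = {
    "John Negroponte": (3940, 165),
    "Sudan": (11300, 1453),
    "George Bush": (47100, 37323),
    "Michael Moore": (12400, 17298),
    "Firefox": (384, 2676),
}


def rank_by_blog_uptake(counts: dict[str, tuple[int, int]]) -> list[str]:
    """Order terms from most to least blog attention relative to news coverage.

    A low Google/Blogpulse ratio means bloggers mention the term often compared
    with news coverage; a high ratio means the news media cover it far more.
    """
    return sorted(counts, key=lambda term: counts[term][0] / counts[term][1])


print(rank_by_blog_uptake(COUNTS))
# prints ['Firefox', 'Michael Moore', 'George Bush', 'Sudan', 'John Negroponte']
```

This only orders terms against each other; it says nothing about where “equal interest” falls, which would need the missing Google News total.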

(It would be interesting to know what the ‘equilibrium point’ is between Google and Blogpulse, i.e., the ratio at which a story is equally popular in the aggregate news media and in the blogosphere, but to calculate that, I’d need to know the number of entries Google News is tracking, or have a keyword guaranteed to have the same percentage representation across the two sites…)
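The missing piece can be stated with a little algebra. As a sketch: write g and b for a term’s Google News and Blogpulse hits, N_B for the total posts Blogpulse tracked in the window (1,377,764 here), and N_G for the number of items Google News indexed over the same window, which is the unknown. If “equally popular” means taking the same share of each corpus, then:

```latex
\frac{g}{N_G} = \frac{b}{N_B}
\quad\Longrightarrow\quad
\frac{g}{b} = \frac{N_G}{N_B}
```

So the equilibrium value of the G vs. B column would be N_G / N_B. A keyword with the same share p in both corpora gives g = p N_G and b = p N_B, so its own ratio g/b equals N_G / N_B, which is why such a calibration keyword could stand in for the unknown Google News total.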

I’d love to have a list of “top 100 news stories” I could run through this process every day, tracking uptake by bloggers from the mainstream media – anyone have good thoughts on generating this list? It’s kinda the mainstream media version of the Daypop 40… I’d also be grateful for suggestions of interesting, timely (i.e., breaking in the last week) stories to check out and see how Google and Blogpulse cover them.

(This is part of my new “open research” philosophy, where when I don’t know what to do next, I post it on my blog and beg for help. My next post explains why I have at least a modicum of belief that this method actually works…)

