Amazon, the developing world, diet books and fun with data

I had a great conversation recently with Martin Wattenberg, quite possibly the most creative data visualization guy working today. (If you think that’s hyperbole, check out his brilliant Shape of Song and Map of the Market.) We were bonding over our fondness for large data sets, especially those freely available on the web. (Had Sir Mix-A-Lot been a data geek, his hit likely would have been titled “Baby Got Data” – “I like big sets and I cannot lie…”) Martin mentioned that Amazon’s Sales Rank makes it possible to guess at how many books are sold daily on a particular topic, which, in turn, makes it an interesting proxy for consumer purchasing behavior.

It just so happens that I’d been looking for a proxy for purchasing behavior to test out a theory about foreign news coverage. My research on media attention suggests that mainstream American newspapers don’t give a whole lot of coverage to developing nations. Instead, they disproportionately cover wealthy nations, and poor nations where wealthy nations are militarily involved. (More on this, in brain-numbing detail, available at

If you share my naive belief that journalists are supposed to report the most “important” events regardless of who they affect, you might conclude that journalists, as a whole, are falling down on the job when they fail to cover the war in the Democratic Republic of Congo, the rise of Jihadist Islam in Central Asia or human rights abuses in Southeast Asia. Ask a newspaper editor, though, and you’ll often get a radically different response: they’re the good guys, trying to make their readership care about important stories, even though covering developing nations costs them readers, because most readers don’t care about developing world stories.

It would be interesting to test this journalistic retort by examining actual consumer behavior – when you ask people to reach into their wallets and pay for information on different topics, what do they spend money on?

(The answer turns out to be “diets”, but more about that in a moment…)

Turns out that a number of smart economists have already considered this question. Austan Goolsbee and Judith Chevalier published a nifty paper last year called “Measuring Prices and Price Competition Online: Amazon.com and BarnesandNoble.com” that, amongst other things, concludes that Sales Rank and actual book sales on Amazon are related through a Pareto distribution. Conveniently, my tools to relate news stories and national wealth assume a Pareto distribution… which means I was able to quickly scrape together a set of tools to look at the relationship between national population, wealth and books sold on Amazon.
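For the curious, here is roughly what “related through a Pareto distribution” buys you: estimated sales fall off as a power of Sales Rank. This is a minimal sketch only, assuming the form sales = scale * rank^(-exponent). The function name and both default parameter values are made-up placeholders for illustration, not the fitted values from the paper and not the code behind my tools.

```python
def estimated_daily_sales(sales_rank: int,
                          scale: float = 1000.0,
                          exponent: float = 0.8) -> float:
    """Estimate copies sold per day from a Sales Rank via a power law.

    `scale` and `exponent` are illustrative placeholders (assumptions),
    not values taken from any published fit.
    """
    if sales_rank < 1:
        raise ValueError("sales_rank must be 1 or greater")
    # Power-law (Pareto-style) decay: a better rank means more sales.
    return scale * sales_rank ** (-exponent)
```

With any positive exponent, the estimate shrinks as the rank number grows, which is the only property the rest of the analysis leans on.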

(Props to Amazon, by the way – they really get web services. It took a total of five minutes to get a developer’s key from Amazon, and five more minutes to write a simple application to pull results from their database. And, unlike Google, which restricts you to writing applications that hit their database 1,000 times a day, Amazon asks you to hit them only once per second, or up to 86,400 times a day.)
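Staying under a once-per-second limit is easy to do politely. The sketch below is one way to space out requests; the `Throttle` class and its parameters are hypothetical names of my own, not part of any Amazon library, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `wait()` before each request. At one request per second, 24 * 60 * 60 = 86,400 requests fit in a day, which is where the figure above comes from.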

So now I’ve got a cute little set of tools that asks Amazon for all the results for a particular keyword, estimates the number of books sold daily from the Sales Ranks and calculates the value of these books. It takes about 10 hours to download information on the 820,286 books (very preliminary results available here). I’m still crunching the results, but it seems that:

– The number of books written on a topic is strongly correlated (R^2=0.75) with a nation’s GDP, and weakly correlated with its population (R^2=0.42). The correlation between books written and GDP is one of the strongest I’ve ever seen, rivaling the most correlated media sources, like CNN.

– When you look at actual purchasing behavior, both these correlations weaken. (There’s very little difference between correlations to books sold and to book value, suggesting that price pressure isn’t a huge factor in purchasing on these particular titles.) GDP correlation drops to R^2=0.60, and population correlation to R^2=0.35, likely below the threshold of statistical significance.

– While this general pattern (strong correlation to GDP, weak correlation to population) looks a lot like the relationships we see in analyzing news media, the details diverge. Specifically, the top results for book purchases look quite different from the top results for stories on Google News (a pretty good proxy for English-language news reporting over the past 14 days):

Google News Hits          Amazon Books Sold Value
Iraq                      China
France                    United Kingdom
Russian Federation        France
United Kingdom            Italy
Germany                   Japan
Israel                    Germany
Afghanistan               India
Japan                     Canada
Mexico                    Philippines
China                     Mexico

In other words, folks aren’t reading/watching the latest news on Iraq/Afghanistan/Israel and logging on to buy books on these topics – something else seems to be driving book purchasing. (My guess: tourism. Working on testing this statistically.)

– While newspapers and TV news channels do not appear to be giving viewers exactly what they want (Amazon sales value correlates to Google News results at R^2=0.65, a strong correlation but less strong than the correlation to GDP…), they’re also not spitefully failing to provide news to a population desperate for information about developing nations. There’s little evidence to suggest that purchasers are any more interested in the developing world than newspaper editors.

– While Amazon customers are sufficiently interested in other countries to purchase 10,383 titles a day, worth $245,342.76, this interest pales in comparison with their interest in dieting. Amazon customers purchase 10,992 titles a day on dieting, with a sales value of $150,967.19. Lest you find that figure too heartening, let me point out that the 10,992 diet books are the result of a single search for the keyword “diet”, while the 10,383 books on foreign countries are the result of 183 separate searches summed together. Clearly, the best way to get people to pay attention to my research would be to write a diet book. Keep your eyes peeled for the “Johan Galtung Foreign News Flows Diet Guide”, appearing on bookshelves soon.
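A note on the R^2 figures in the list above, for anyone who wants to run similar comparisons. The sketch below treats R^2 as the square of the Pearson correlation coefficient between two equal-length lists, one value per country. That reading is an assumption, as is working on raw rather than log-transformed numbers; the function name is just for illustration.

```python
def r_squared(xs, ys):
    """Return the squared Pearson correlation between two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (cov * cov) / (var_x * var_y)
```

Feed it, say, a list of GDP figures and a matching list of estimated book sales, and a result near 1 means the two rise and fall together, while a result near 0 means they barely track each other.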

Seeing no reason I should be the only one to have fun with data, I took one of the tools I built and threw it up on a webpage. Feel free to try it out – it invites you to enter a search term or phrase, and select how exhaustive you’d like the search to be. Then it pings Amazon, calculates books sold and their value and feeds this information back to you. It’s fun for the whole family! What sells more books: the Civil War or the Vietnam War? The Green Bay Packers or the Seattle Seahawks? Britney or Beyoncé?
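The arithmetic behind the totals is simple enough to sketch. Assuming each search result boils down to a (Sales Rank, price) pair, and assuming you have some function that turns a rank into estimated daily copies sold, the totals are just sums. The names here are hypothetical and this is an illustration of the idea, not the tool’s actual code.

```python
from typing import Callable, Iterable, Tuple


def summarize_results(results: Iterable[Tuple[int, float]],
                      sales_from_rank: Callable[[int], float]) -> Tuple[float, float]:
    """Return (estimated copies sold per day, estimated daily sales value).

    `results` holds (sales_rank, price) pairs; `sales_from_rank` is any
    caller-supplied estimator mapping a rank to copies sold per day.
    """
    total_copies = 0.0
    total_value = 0.0
    for sales_rank, price in results:
        copies = sales_from_rank(sales_rank)
        total_copies += copies
        total_value += copies * price
    return total_copies, total_value
```

For example, with a toy estimator of `lambda rank: 100.0 / rank`, the results `[(1, 10.0), (2, 20.0)]` come out to 150 copies worth 2,000 dollars a day.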

An informal challenge to my blog readers (all three of you): using the “quick” search (which searches the top 50 results), my personal high value is $106,926, using the keyword “diet”. Find a keyword or phrase that gets a better total, and I’ll send you an Amazon gift certificate. No use of literary genres, please – “fiction”, predictably, will knock the ball out of the park.

(Standard disclaimers: Amazon doesn’t endorse my work in any way, shape or form, or vouch for its accuracy. I make no guarantees of the accuracy of information returned by this tool – indeed, I guarantee that, due to the limitations of extrapolation, my numbers are inaccurate, though in an interesting and predictable way. Finally, I reserve the right to end the above contest at any time, especially before it threatens to bankrupt me.)
