My Heart's in Accra

Ethan Zuckerman's musings on Africa, international development
and hacking the media.

10/08/2004 (6:46 pm)

A message from St. Dave?

Filed under: Media ::

Confession: I am, at best, a mediocre computer programmer. I spent most of the computer science courses I took in college sitting in the back of the room, working on breaking the encryption scheme that protected the college’s copy of Pagemaker so I could have a copy to use for my graphic design business. At Tripod, my utility was not as a coder, but as someone who could run interference between the businesspeople and the techies.

But my current set of research interests has forced me to buy a pile of O’Reilly books, fire up BBEdit and bang out piles of inelegant but functional Perl. The experience is a humbling one. I know a couple of master programmers and have read enough of their code to know how inexpert my code is. I often have the experience of knowing there’s a better way to do something, but lacking the programming chops to execute correctly.

I was feeling roughly as stupid as I usually do as I attempted to write a program this morning that scraped headlines from CNN’s website. My intent was to write a CNN version of the program I wrote a few weeks ago which scrapes headlines from the New York Times and checks Technorati to see whether or not they’ve been blogged. Unlike the NYT, which has a text-only version with predictable layout, CNN’s HTML is ugly, unpredictable and nigh-unscrapeable. As the regular expression I was using to match URLs and headlines grew to fill an entire computer screen, I found myself thinking: “Geez, I wish sites would just put their content up in predictable XML formats so that I could just search for a tag that said ‘Headline’ and get the current headlines.”

And then the voice of Dave Winer spoke to me, and said, “Uh, you mean like an RSS feed?” (I’m serious. It sounded like he was in the next room. Yes, I’m sober, and not listening to any of Dave’s podcasts.)

Uh, yeah, Dave. Like that.

So I’m now writing feedreaders for BBC, the Washington Post and the Guardian, which will give me data on four of the five most blogged mainstream news sources. I’m somehow unsurprised that CNN doesn’t have its own feed, though people braver than I are running scrapers that turn CNN’s pages into RSS. (I may write a tool that uses one of these feeds, but at the moment I’m so annoyed with CNN, I’m not going to bother.)
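For the curious, here is roughly what "search for a tag and get the headlines" looks like once a site publishes RSS. This is a minimal Python sketch rather than the Perl I'm actually writing; the sample feed, its URLs and the function name `extract_headlines` are made up for illustration, and a real reader would fetch the feed over HTTP and cope with malformed XML.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a real feed
# (hypothetical content, not taken from any actual news site).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def extract_headlines(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    headlines = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        headlines.append((title, link))
    return headlines


for title, link in extract_headlines(SAMPLE_FEED):
    print(title, link)
```

No screen-filling regular expression required, which is rather the point.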

If all goes well, I’ll have a page up in a couple of days with daily results from these sources. That assumes the voice of Dave doesn’t come back. My current top priority: a new tinfoil hat.

10/07/2004 (5:43 pm)

de-TRO-it

Filed under: Media ::

I was approached a few weeks back by a team of Norwegian and Austrian architects and designers who were putting together an entry for the “Shrinking Cities” competition. The competition, formally titled “Reinventing Urbanism”, is looking for “strategies and new modes of action for shrinking cities”. It’s sponsored by a bunch of German cultural institutions and is leading towards an exhibition in Leipzig next year.

The team that approached me was interested in the difference between Detroit’s physical and economic footprint, and its media and cultural footprint. In other words, while Detroit may have declined from the ranks of the largest US cities, it has a global cultural import beyond its size as the cool kids in Europe and Japan search the record bins for old Motown singles and Detroit techno. My European friends were interested in using some of my GAP tools to see how Detroit ranked in media attention in comparison to similarly sized cities, so we did a quick lightweight collaboration and turned out some interesting results.

[The thumbnail sketch of results: there’s lots more variance in media attention to cities than there is to nations. With national media attention, we see a reasonably strong correlation between population and attention and very strong correlation between economic indicators and attention. We only have pop. data for cities… and there’s basically no correlation to population. American cities get more attention than the capitals of the world, which get more attention than secondary cities around the world… and every city gets more attention than the dozens of large industrial Chinese cities that never get mentioned in the US media. I’ll publish this one of these days.]

Anyway, my pan-European friends incorporated our joint research into their proposal… and are one of the 12 winners of the first round of the competition. The cover page of their beautiful and fascinating proposal is here – I’ll link to the whole proposal once it’s available.

Like most cyberutopians, I’ve sung the praises of global cooperation and collaboration more than once. But it’s pretty rare that I actually find myself in such collaborations. Publishing research to my blog seems to increase the chance of this sort of serendipity… so expect to see lots more of it in the future.

Congrats, de-TRO-it – looking forward to seeing the show in Leipzig next year.

10/06/2004 (8:29 pm)

USAID versus the UN – Just How Bad a Tragedy is Darfur?

Filed under: Africa (older) ::

Peter Beaumont, writing for The Observer, accuses US officials, especially USAID’s Andrew Natsios, of “‘hyping’ Darfur genocide fears”. Beaumont’s assertion is based on discussions with the UN World Food Program’s survey team. The quotes in the article are quite a bit more nuanced than the headline:

‘It’s not disastrous,’ said one of those involved in the WFP survey, ‘although it certainly was a disaster earlier this year, and if humanitarian assistance declines, this will have very serious negative consequences.’

One might choose to read that statement as an indication that calls to action earlier this year from Natsios and others had the desired effect – a focus on Sudan that’s helped generate international aid. This points to a balancing act that relief organizations are forced to play when confronting disasters. You’ve got to make noise and get attention to get money… but if you do, there’s a danger that your disaster will prevent people from paying attention to other disasters, as well as a danger that you’ll be seen as “crying wolf” and contributing to the (alleged) phenomenon of compassion fatigue.

Aid workers raise two other concerns in conversations with Beaumont. While Darfur’s gotten a great deal of media attention and response, why haven’t similarly dire situations in Uganda and the Democratic Republic of Congo gotten similar attention? And what are the implications of the US declaring the situation in Darfur a genocide, then not moving to intervene?

The first question is fairly easy to answer. There’s a large community in the US – many of whom identify as evangelical Christians – who have been following Sudan closely for over a decade, documenting abuses against Christians in the country at the hands of the Khartoum government. This community worked hard to ensure that US legislators and press paid attention to reports of atrocities coming out of Darfur. They were joined by an unusual coalition of activists from the left and right who saw the genocide as a situation happening in real-time that could be prevented.

I think the rapid reaction to the situation in Darfur by this coalition has a great deal to do with US guilt over our failure to intervene in Rwanda. While militias were burning villages and government planes were dropping barrels of metal shrapnel, it was easy to fear that the black population of Darfur might be wiped out in the space of a few months while the world watched and our government failed to act. While the conflict between the LRA and the Ugandan government, or the unrest threatening to become civil war again in the DRC, have been extremely bloody, they haven’t been as sudden as the conflict in Darfur, and haven’t received the same Rwanda comparisons.

(I realize I’m not addressing one of the points of the Beaumont piece – the paranoid speculation that the US is focused on Darfur because the Bush regime wants to overthrow the Khartoum government. It takes little more than a quick glance at the left-wing newspapers, columnists and thinkers who’ve spoken about Darfur in the US to realize that if this is a Bush administration ploy, it’s been an unprecedented success.)

The second question is harder to answer. As an aid worker quoted in the story says:

I have no idea what Colin Powell’s game is, but to call it genocide and then effectively say, “Oh, shucks, but we are not going to do anything about that genocide” undermines the very word “genocide”.

The Clinton administration was very careful not to call the situation in Rwanda “genocide”, because doing so would compel the US – a signatory to the UN Genocide Convention – to act. Clinton spokespeople went to ludicrous lengths, using terms like “the G-word” to avoid characterising Hutu killing of Tutsis in Rwanda as “genocide”.

Perhaps it shouldn’t be a surprise that an administration which has been so dismissive of the UN and international cooperation is willing to use the term, but duck the associated obligations. As Marisa Katz observes in an excellent piece in the New Republic, “Rather than avoid action by avoiding the term, we can now avoid action by invoking it.” Katz points out that a close reading of the convention suggests an interpretation that compels a country to act only if the genocide is occurring within its own borders; under this reading, Powell is free to declare Darfur a genocide, pressure the UN to act and ensure that US troops will never set foot on Sudanese soil, remaining legally in the right while morally deep in the wrong.

In the meantime, USAID continues to assert that the worst is yet to come, projecting 200-300,000 deaths this year. The UN offers a figure of 50,000 dead so far this year, a figure USAID officials dismiss as “guesswork”.

Update: Sudan’s ambassador to the UN has challenged the US to send troops to Sudan if they really believe genocide is taking place.

10/04/2004 (4:34 am)

Gang Violence in the Favelas of Rio

Filed under: Media ::

Dear friend Kurt Shaw just sent a fascinating and disturbing email about impending gang violence in Rocinha, Rio’s largest favela. Kurt runs a nonprofit called Shine a Light which works with street kids throughout Latin America. He travels throughout favelas in the region collecting best practices from NGOs working with street kids and sharing them with other organizations. He’s in Rio now, waiting for all hell to break loose:

Yesterday, the headline in the Jornal do Brasil, Rio de Janeiro’s local paper, was “Vidigal prepares to invade Rocinha.” Apparently, an alliance of gangs had decided to send a press release to the papers as well as to the police before moving their troops through the city to invade another favela — just another surreal touch in one of the strangest urban wars in history.

Since this gang war began in the 1980s, the conflict has killed more people than any other civil war during the same time period (not including the Rwandan genocide) – more than Colombia, Liberia, Sudan, Afghanistan – and it is happening a half-hour walk from the beaches of Ipanema and Copacabana, where beach soccer and sunbathing continue as they always have. The war is mostly about control of drug distribution points, with 3 rival, semi-organized gang networks and the police all unable to gain control.

Just as I wrote that last phrase, a rocket exploded over me, suggesting that the invasion is about to begin – at the entrance and the top of each favela, a lookout sends up fireworks at the first sign of an invasion, and Rocinha is just on the other side of Corcovado from me. Monday night, the explosions were deafening, first fireworks, then guns and grenades, and 22 people died in Vidigal. During that same battle, an 8 year old boy was hit by a stray bullet not 200 yards from my apartment.

The amazing thing, I think, is the way that life goes on even as the bullets fly. I’m working with community organizers in some of the most violent favelas, and for them, all of this has become almost normal, something they want to stop, but in front of which they feel powerless. Instead, they do what they can, organizing day care for kids, trying to make schools a little safer, lobbying to get pedestrian bridges built over highways so they can get to work. And in the center of the city, the culture of Rio continues: the film festival this week, the Ballet of la Scala is playing at the Teatro Municipal, people swarm the beaches, and nothing can stop a soccer game in Maracanã.

It is admirable, really, I think. Two men with a rifle paralyzed Washington a couple of summers ago; after Columbine, thousands of schools installed metal detectors; terror has changed the way the US sees itself. Here, where violence is much more real, people continue with their lives, drums samba on until dawn, and people play soccer on the beach as bullets fly over their heads into the ocean.

The situation in Rocinha gained media attention in April when clashes between reigning druglord Lulu and former gang boss, Dudu, got so violent they required intervention by military police. Brazilian President Lula was asked to consider sending in troops to secure the area; others have proposed a 3 meter high wall surrounding the favela, isolating it from more “desirable” parts of the city.

The violence ended with Lulu’s death at the hands of the police. Gang lords ordered all shops in the favela closed the following day, in mourning for Lulu.

Blogger EastWestSouthNorth has photos and news stories from April’s violence in Rocinha, which s/he describes as “Brazil’s Faluja”.

Google News currently shows 0 stories for search term “rocinha” – if you’re interested in following this story, you might do better to follow Viva Favela, a website dedicated to bringing news and perspective from Brazil’s hundreds of favelas, as well as Jornal do Brasil.

Update: Kurt reports that Dudu was likely killed in one of the more recent skirmishes – there’s a search on for his body. Still not a single story for “Rocinha” in Google News.

10/01/2004 (5:03 pm)

Fun with Google AdWords – or “Why Genocide’s Worth at Least a Buck a Click”

Filed under: Media ::

My latest foray into internet sociology has involved beating my head against the Google Adwords program. I’m interested in seeing what data researchers can extrapolate regarding search engine traffic and market interest in search terms based on the information Adwords gives to potential advertisers.

Google’s Adwords program works on an auction model similar to a second-price auction. (In a second-price auction, the winning bidder pays the second-highest price bid. It’s a model designed to minimize “the winner’s curse”, the tendency of the winner in an auction to overpay.)
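To make the second-price rule concrete, here is a small Python sketch of a plain sealed-bid second-price auction. The function name and the bidder labels are mine, and this is the textbook rule only, not a description of Google's internals (the AdWords variant, as I understand it, charges a cent more than the runner-up's bid).

```python
def second_price_auction(bids):
    """Given a dict of bidder -> bid, return (winner, price_paid).

    The highest bidder wins but pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Sort bidders from highest bid to lowest
    ranked = sorted(bids.items(), key=lambda pair: pair[1], reverse=True)
    winner = ranked[0][0]
    price_paid = ranked[1][1]
    return winner, price_paid


print(second_price_auction({"A": 0.25, "B": 0.10}))
```

With bids of $0.25 and $0.10, bidder A wins and pays $0.10, not the $0.25 she was prepared to spend.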

From the user’s perspective, it’s pretty simple. A bidder identifies the terms she’s interested in advertising on and the maximum price she’s willing to pay for a click. The system estimates how many clicks will be available for purchase at prices beneath her price threshold, the average she will pay per click and calculates where her ads will likely be ranked on the page. She can then change the maximum price she’s willing to pay per click (which will likely change the number of clicks available for purchase and her projected rank) and then set a price ceiling on the maximum she’s willing to pay in total per day.

Behind the scenes, some really fearsome math takes place. Google has a good sense – based on past performance – of how many searches for a given term will occur per day. They also can make certain assumptions about the clickthrough an ad will receive because they reserve the right to pull ads below a certain level, generally around 1%. (I’ve run a few test campaigns: Ad/search term combinations that get 2% or better are marked as “strong”; 1-2% gets you a “moderate”; under 1% and the system warns you, and then slows delivery of your ads.) And Google knows what everyone else has bid for the search terms in question.

Based on this information, the system starts allocating page impressions to each bidder. The high bidder gets her fill of top-ranked ads at a price one cent per click above the next-highest bid. The next highest bidder gets the remainder of available inventory in the top slot, and then runs her fill of ads in the second slot. For example:

Assume Google has 10,000 searches per day for “Africa” – this implies 100 clicks for sale at a 1% clickthrough on ads. Buyer A is willing to pay a maximum of $0.25 per click, and will spend up to $5 per day to run her ads. Buyer B is willing to pay up to $0.10 per click and has a $10 budget per day. The Adwords system solves some equations, and runs Buyer A’s ad on roughly 4,500 pages in the first position, charging her $0.11 per click, one cent higher than the next highest bid. Buyer B’s ad runs on the 5,500 pages that didn’t include Buyer A’s ad in first position, and in second position on the other 4,500 pages. Because no one else has bid, B’s ads should run at the system minimum cost – $0.05 per click. So while B’s ads don’t always appear in first position – their “average position” is 1.45 – they’re lots cheaper than Buyer A’s ads.
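The arithmetic in that example can be checked with a few lines of Python. This is a sketch of my simplified two-bidder story, not of whatever equations Adwords actually solves; the function and parameter names are invented, and it assumes the same 1% clickthrough in both ad positions and that Buyer B's budget is never the binding constraint.

```python
def two_bidder_example(searches=10_000, ctr=0.01,
                       a_max=0.25, a_budget=5.00,
                       b_max=0.10, b_budget=10.00,
                       min_cpc=0.05, increment=0.01):
    """Reproduce the Buyer A / Buyer B example under simplifying assumptions."""
    # A outbids B, so A pays one increment over B's maximum (never more than A's own maximum).
    a_price = min(a_max, b_max + increment)
    # A's daily budget caps the clicks she can buy, and hence the pages her ad tops.
    a_clicks = min(a_budget / a_price, searches * ctr)
    a_pages = a_clicks / ctr
    # B takes first position wherever A is absent and second position elsewhere.
    b_top_pages = searches - a_pages
    b_avg_position = (b_top_pages * 1 + a_pages * 2) / searches
    # With no third bidder, B pays the system minimum on every click.
    b_spend = searches * ctr * min_cpc
    return {
        "a_price": a_price,
        "a_pages": a_pages,
        "b_top_pages": b_top_pages,
        "b_avg_position": b_avg_position,
        "b_spend": b_spend,
        "b_within_budget": b_spend <= b_budget,
    }


for key, value in two_bidder_example().items():
    print(key, value)
```

Buyer A's $5 at $0.11 per click buys about 45 clicks, which is where the "roughly 4,500 pages" comes from, and weighting B's positions by page count gives the average position of about 1.45.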

Unfortunately for anyone trying to build a mathematical model of this process, that’s not the whole story. The popularity of an ad matters as well. When Adwords calculates its ranking of ads, it multiplies the clickthrough rate by the maximum cost per click. Google’s rationale for this is that it benefits their users – more relevant ads move towards the top, like search results; it also benefits Google economically, as they have a disincentive to show poorly crafted ads, since they’re paid per click, not per impression. (There’s an excellent paper by Juan Feng, Hemant Bhargava and David Pennock that demonstrates quite elegantly why this is a far better way for Google to allocate ad placements than based on willingness to pay alone…)
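A tiny Python sketch shows why this ranking rule matters. The ads, bids and clickthrough rates below are invented for illustration, and `rank_ads` is my own name for the idea; the only thing taken from Adwords is the score itself, clickthrough rate times maximum cost per click.

```python
def rank_ads(ads):
    """Order ads by clickthrough rate multiplied by maximum cost per click.

    ads is a list of (name, max_cpc, ctr) tuples; the highest score comes first.
    """
    return sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)


ads = [
    ("high bid, weak ad", 1.00, 0.01),    # score 0.010
    ("low bid, strong ad", 0.40, 0.03),   # score 0.012
]
for name, max_cpc, ctr in rank_ads(ads):
    print(name, max_cpc * ctr)
```

The advertiser bidding $0.40 outranks the one bidding $1.00, because her ad is clicked three times as often.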

So what can an internet detective glean from the numbers the AdWords system reveals to a potential customer? Create a dummy ad, and you can get a good guess at the number of search results available for a set of keywords per day. AdWords tells me that the optimum pricing for my keywords “Africa News” is $0.57, that I’ll pay $0.21 per click on average, receive 9.3 clicks and have an average position of 1.3. Trying a few other values for my maximum price per click, I get the following data set:

at $0.06 per ad, 7 clicks, $0.06 per click, position 2.7

at $0.12 per ad, 7.9 clicks, $0.07 per click, position 2.2

at $0.25 per ad, 8.4 clicks, $0.11 per click, position 1.8

at $0.57 per ad, 9.3 clicks, $0.21 per click, position 1.3

at $1.00 per ad, 9.5 clicks, $0.27 per click, position 1.2

at $5.00 per ad, 9.7 clicks, $0.36 per click, position 1.0

There’s a clear logarithmic relationship between maximum price and clicks available. (log(price)=n*log(clicks) fits a larger data set – prices for my usual set of keywords representing 180 nations – at R²=0.97). This suggests that there’s an asymptotic ceiling to the number of clicks Google will predict – once your average position has reached 1.0, Google is anticipating a situation where your ad is served on top of every page available, and increasing the amount of money you’re willing to pay per click is unlikely to increase the number of clicks available because Google simply can’t give you any more page impressions.
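For anyone who wants to try the fit themselves, here is a Python sketch that runs an ordinary least-squares line through the six "Africa News" points above on log-log axes. It regresses log(clicks) on log(price), which is one reasonable reading of the relationship; the helper name `loglog_fit` is mine, and the R² it reports is for these six points only, not the 180-nation set.

```python
import math

# Maximum price per click and projected clicks, from the "Africa News" figures above
PRICES = [0.06, 0.12, 0.25, 0.57, 1.00, 5.00]
CLICKS = [7.0, 7.9, 8.4, 9.3, 9.5, 9.7]


def loglog_fit(xs, ys):
    """Least-squares fit of log(y) = slope * log(x) + intercept.

    Returns (slope, intercept, r_squared).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single-predictor linear fit, R^2 is the squared correlation
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = loglog_fit(PRICES, CLICKS)
print(slope, intercept, r_squared)
```

A small positive slope is what the ceiling argument predicts: large increases in the bid buy only a few more clicks.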

Turning this figure into total searches per day is an inexact process. I’ve run ads targeted to “Africa News” for the past week, paying a maximum of $0.05 per click – the ad has appeared on 2035 pages and received 41 clicks, with an average placement of 2.2. At 291 impressions per day, this ad would need to receive a 2.4% clickthrough to experience the 7 clicks AdWords projects for $0.06 – the ad has actually received 2% clickthrough. It’s possible that Google is using the actual clickthrough on ads targeted to “africa news” to calculate clickthrough and that other ads for “africa news” are doing better than my ad – it’s also possible that they’re using a fixed clickthrough of 2-3% as an estimator. Assuming that range, and taking the 9.7 clicks AdWords projects at the highest bid as the ceiling on clicks available, I can project that Google is experiencing 323 to 485 searches for “africa news” on a given day. (That figure seems depressingly low. If I’m somehow getting this very, very wrong, please let me know.)
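Here is that back-of-the-envelope calculation as a Python sketch, so the steps can be checked. The helper name is invented; the one assumption worth flagging is that the 323 to 485 range comes from dividing the 9.7-click ceiling by a 3% and a 2% clickthrough, which is what the numbers work out to.

```python
def searches_from_clicks(clicks_per_day, clickthrough_rate):
    """Estimate daily searches as projected clicks divided by clickthrough rate."""
    return clicks_per_day / clickthrough_rate


# One week of the "Africa News" test campaign
impressions = 2035
clicks = 41
days = 7

impressions_per_day = impressions / days      # about 291
observed_ctr = clicks / impressions           # about 2%
needed_ctr = 7 / impressions_per_day          # about 2.4%, to reach 7 clicks a day

# Assumed fixed clickthrough of 2-3%, applied to the 9.7-click ceiling
low_estimate = searches_from_clicks(9.7, 0.03)
high_estimate = searches_from_clicks(9.7, 0.02)

print(round(impressions_per_day), observed_ctr, needed_ctr)
print(int(low_estimate), round(high_estimate))
```

If Google's real clickthrough estimate sits outside 2-3%, the range moves accordingly, which is why I'd treat these as order-of-magnitude figures.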

It’s also possible that Google has vastly more searches for that term and only places my ad on some of the searches, but I don’t think so. I’ve told Google I’m willing to spend $5 a day – with only 5.8 clicks a day at $0.05, I’m paying $0.29 a day, or roughly 6% of the money Google could extract from me if they delivered more ads. The only rational reason for Google not to serve my ads is lack of inventory… and they can create more inventory by adding more ads to the sidebar, lowering my rank, but selling me impressions. (Google’s not shy about this – “St. Lucia”, the most expensive search term I’ve found in my “nations” set, gets 8 ads per page of results. “Africa News” gets two – mine, and a website selling South African television programming.)

It’s also possible to glean something about the market value of a term from this data. Bid very little for ads and you can get a sense for just how competitive each search term is, by looking at what the projected rank is for your ad. At the minimum bid, $0.05 per click, you’ll be ranked near the top (1.3 – 1.4) for searches for “Solomon Islands”, Mauritania, “Burkina Faso”, “Vanuatu”, Swaziland, “Sao Tome” or Lesotho. The same bid puts you far down the page (3 – 3.4) bidding for Maldives, “Costa Rica”, St. Lucia, Croatia, “Dominican Republic”, Fiji, Italy, Belize, Bulgaria, Cyprus, Spain and Bahamas. The most popular terms feature a plethora of ads from rival travel agencies; the least popular are places you’re probably not traveling to any time soon. Market scarcity may also play a role – Maldives, St. Lucia, Croatia, Belize and Bulgaria all get fewer projected clicks per day than the median (66) for my set of nations. (Then again, ALL the unpopular terms get fewer than the median.)

When you first start a campaign on AdWords, Google suggests the maximum price per click you should pay – it appears to set this price at whatever will get you a projected average rank of 1.3. For St. Lucia, for instance, this is $4.32 per click. Before concluding that the system is a) broken or b) preying on the very dumb, there are a couple of reasons to set your keyword price that high. One is that you rarely pay full-freight – even with a maximum of $4.32, Google projects you’ll pay $1.38 on average for your St. Lucia ads – you’ll get most at under $1 per click, but your willingness to bid higher will ensure you end up top ranked even when someone else bids $3 per ad… The second is elegantly explained by this roofing contractor who is willing to pay $25 per click on Google: he closes 30-70% of the deals that come to him through Google, generally for hundreds or thousands of dollars. At that point, a $25 customer acquisition cost is a bargain… (Feng and her colleagues speculate that interest in ads decays exponentially depending on ad placement – the second ad gets only a fraction of the attention the first does, and the third a fraction of the second. They end up recommending that programs like AdWords reward ads that manage to get decent clickthrough in lower positions…)
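The roofer's logic is easy to put in numbers. This Python sketch assumes, generously, that every paid click turns into a lead he gets to bid on, so his cost per closed deal is just the click price divided by his close rate; the function name is mine, and a real campaign would lose some clicks before they ever become leads.

```python
def cost_per_sale(cost_per_click, close_rate):
    """Advertising cost per closed deal, assuming every click becomes a lead."""
    if not 0 < close_rate <= 1:
        raise ValueError("close_rate must be between 0 (exclusive) and 1")
    return cost_per_click / close_rate


# $25 per click at the low and high ends of the 30-70% close rate
print(round(cost_per_sale(25.0, 0.30), 2))
print(round(cost_per_sale(25.0, 0.70), 2))
```

Even at the pessimistic end, that is well under a hundred dollars to land a job worth hundreds or thousands.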

Let’s posit a projected rank of 1.3 as the threshold of sanity – i.e., not even Google, who is taking your money, thinks you should be willing to spend $5 per click on a St. Lucia ad. At $2.50 per click, the following nations are still below the sanity threshold (i.e., you’re going to be ranked 1.4 – 1.8 if you’re willing to pay “only” $2.50 per click): Lebanon, St. Lucia, Maldives, Cyprus, Bulgaria, Costa Rica, Panama, Barbados, Spain, Jamaica, Angola, Sudan, Dominican Republic, Mauritius, Italy, Malta, UK, Iceland, Portugal, Mexico, Turkey, Macedonia and Peru.

So why does the free market think these nations are worth so much per click? Some are obvious: St. Lucia, Costa Rica, Jamaica, Italy and others are expensive vacation destinations – a user clicking on the ad might be prepared to pay thousands for tickets or a hotel. Others – Mexico, Panama, Dominican Republic, Bulgaria, Lebanon – have large expatriate populations who search for flights home, discount phone cards or financial remittance services.

Sudan’s the really weird one. (Angola baffled me for a moment, before I followed a few links and discovered that advertisers were encouraging me to travel to Angola, Indiana.) Search for Sudan on Google. You’ll get a results page with eight ads, the maximum Google puts on a page. Every ad is from a nonprofit organization. Save the Children, Care USA, Doctors Without Borders, Amnesty International and Mercy Corps are running straightforward “We work in Sudan – support our work” ads; American Progress Action Fund and National Public Radio are running ads for their Sudan information sites. The top bidder is “Global Nomads Group”, an NGO which aims to connect children around the world through videoconferencing – they’re also the leading bidder for “Rwanda”.

The rank/price relationship for “Sudan” implies that one or more advertisers either are receiving an excellent clickthrough rate, or are paying well over a dollar per click for their ads, likely both. This reveals an uncomfortable truth about the relief business – on those rare occasions a humanitarian crisis gets global attention, aid agencies have to take advantage of the situation to raise money.

Doctors Without Borders’ website lists projects in 85 countries that they’ve worked on in the past few years. It’s pretty rare that the ongoing strife in Burundi gets international attention – the money that comes in from donors concerned about Darfur supports a program for rape survivors in Bujumbura, HIV prevention efforts in Malawi and anti-malarial efforts in Nigeria. The situation is analogous to the controversy over the Red Cross’s “Liberty Fund”, where the organization announced an intention to use some of the money donated to support the victims of 9/11 to support Red Cross projects around the country – Red Cross CEO Bernadine Healy ended up resigning over the public outcry. Jim Moore has raised concerns about Bono’s DATA (Debt AIDS Trade Africa) project buying Sudan impressions, advertising a site that had little to do with Sudan. (DATA no longer appears to be buying the “Sudan” keyword.)

While it’s interesting (and soul-crushingly depressing) to discover bidding wars over keywords associated with human suffering, I’m focused on the idea that I can pull data about web users’ interest in different subjects out of this data. My data collection holy grail would be an algorithm that allowed me to estimate how much money is spent on each keyword based on click availability and predicted rank at different maximum click levels. Unfortunately, the math is way beyond my capabilities – any game theory/auction economists out there want to give me some pointers?
