My Heart's in Accra

Ethan Zuckerman's musings on Africa, international development and hacking the media.

09/30/2010 (7:03 pm)

links for 2010-09-30

Filed under: del.icio.us links ::

09/29/2010 (7:03 pm)

links for 2010-09-29

Filed under: del.icio.us links ::

09/29/2010 (1:47 pm)

Cynthia Dwork defines Differential Privacy

Filed under: Berkman ::

Cynthia Dwork is a theoretical computer scientist who is known for applying the theory of her field to problems that seem like they might not be best solved by computer science. One of those problems is privacy, the topic of her lunch talk at Berkman titled, “I’m in the Database, But Nobody Knows”.

When we think about loss of privacy, we often think about threats like data theft, phishing and viruses, as well as the changing privacy policies of sites like Facebook. There’s another risk – the danger of data leakage through data analysis.

We often want to analyze large data sets without knowing the identity of individuals represented in those sets. Dwork distinguishes between first-tier and second-tier examples. First-tier examples deal with potentially personally identifying data (census data, medical outcomes data, epidemiological data). One apparently innocuous data set is vehicle braking data. It’s used in aggregate to design brake systems – examined record by record, it could let an insurance company identify a specific dangerous driver. Second-tier data involves analyzing preferences in aggregate, training an advertising classifier (to determine which people respond to which ads), or recommendation systems like Amazon or Netflix. Data leakage is less clear in these systems, but possible.

Dwork describes the issues with making data accessible to analysis without allowing release of information about a specific individual as a “very pure privacy problem.” In other words, it’s difficult to solve, even if a curator sits between an analyst and the database, even if the curator’s behavior is unimpeachable and the data is secured in a vault.

Here’s why:
We have a database that includes sensitive medical information, including whether people are susceptible to certain genetically-linked diseases. We want to allow queries on this database, but we don’t want anyone to be able to discover the attributes of a specific individual.

Dwork proposes that we query for the susceptibilities associated with female distinguished scientists who work at Microsoft and have very curly hair… i.e., a set of one. If we don’t prohibit these queries, we discover she’s susceptible to Sickle Cell. So we put in place a policy that won’t return very small sets of information, i.e., information on individuals.

The attacker might respond by asking how many Microsoft employees have the sickle cell trait, then asking how many Microsoft employees who are not female distinguished scientists with very curly hair have the trait. Take the difference between those two answers and you’ve got the answer to whether Dwork has the trait – limiting queries to large sets doesn’t, on its own, protect privacy.
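The differencing trick is easy to see in a few lines of code. This is a minimal sketch, not anything Dwork showed: the records, the `MIN_SET_SIZE` threshold and the `count_with_trait` function are all invented for illustration.

```python
# Toy database. Every record here is made up; "target" marks the one person
# who matches the very specific description in the attacker's query.
records = [
    {"name": "A", "target": False, "trait": True},
    {"name": "B", "target": False, "trait": False},
    {"name": "C", "target": False, "trait": True},
    {"name": "D", "target": False, "trait": False},
    {"name": "E", "target": False, "trait": False},
    {"name": "T", "target": True, "trait": True},
]

MIN_SET_SIZE = 3  # assumed policy: refuse queries that match fewer people


def count_with_trait(db, predicate):
    """Count trait carriers among matching records, refusing small sets."""
    matching = [r for r in db if predicate(r)]
    if len(matching) < MIN_SET_SIZE:
        raise ValueError("query refused: matching set is too small")
    return sum(1 for r in matching if r["trait"])


# Both queries cover large sets, so the policy lets them through.
everyone = count_with_trait(records, lambda r: True)
everyone_but_target = count_with_trait(records, lambda r: not r["target"])

# The difference between the two answers is the target's own record.
target_has_trait = (everyone - everyone_but_target) == 1
```

Asking about the target directly raises `ValueError`, yet the two permitted queries leak the same fact.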

Another possible solution involves adding random noise to the answer the database gives, at a level low enough that it shouldn’t distort results, but high enough that it should make it impossible to pull individuals out of the set. The problem here is that an attacker can average the responses to repeated queries and obtain a result that converges to the true answer. She argues that it’s computationally impossible to detect repetition – the problem is undecidable – and therefore we can’t simply limit repeated queries to defeat this method. (Salil Vadhan walked me through this after the talk – the difficulty is that, with a sufficiently rich query language, it’s possible to encode the same query many different ways, and it’s not possible to reduce all those queries to an identical query and prevent it from being asked iteratively.)
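Here is a rough sketch of why averaging defeats naive noise. The true count, the noise scale and the use of Gaussian noise are all my assumptions, chosen only to make the effect visible.

```python
import random


def noisy_count(true_count, rng, noise_scale=5.0):
    """Return the true count plus fresh zero-mean random noise."""
    return true_count + rng.gauss(0.0, noise_scale)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 42         # hypothetical sensitive answer

# A single answer can be off by several units...
single_answer = noisy_count(true_count, rng)

# ...but the attacker simply asks the same question many times and averages.
answers = [noisy_count(true_count, rng) for _ in range(10_000)]
estimate = sum(answers) / len(answers)
```

Because the noise has zero mean and is drawn fresh each time, the error of the average shrinks roughly with the square root of the number of repetitions.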

Finally, there are cases where having the curator of a database decide not to answer a question discloses information.

Because it’s so difficult to build a database that’s analyzable and doesn’t release an individual’s data, there’s been a litany of cases where data has been revealed:

- In 2006, a journalist identified an individual in Georgia based on analyzing “anonymized” AOL search queries
- In 1998, a graduate student discovered Massachusetts governor William Weld’s medical record by linking voter registration data and medical records – the overlap of zipcode, birthdate and sex allowed private information to be discovered via this linkage attack
- In 2006, Netflix stopped their second recommendation challenge because a researcher was able to identify an anonymous participant by linking recommendations to behavior on IMDB.
- In 2008, Nils Homer and colleagues demonstrated that you can use certain statistics about single nucleotide polymorphisms (SNPs) to tell whether someone has participated in a genome-wide association study. Participation in those studies usually indicates diagnosis with a disease, and so could be very sensitive information.
- In 2007, Lars Backstrom (working with Dwork) published a theoretical paper on social networks, suggesting that in an anonymized social network, 12 participants in the network could work together to determine whether any two people in the network were friends.
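The linkage attack in the Weld example boils down to a join on a few shared fields. A minimal sketch, with entirely fictional records and a hypothetical `link` helper:

```python
# Public data with names attached (fictional).
voter_roll = [
    {"name": "Resident X", "zip": "00001", "birthdate": "1950-01-01", "sex": "M"},
    {"name": "Resident Y", "zip": "00001", "birthdate": "1962-05-17", "sex": "F"},
    {"name": "Resident Z", "zip": "00002", "birthdate": "1950-01-01", "sex": "M"},
]

# "Anonymized" medical data: names removed, quasi-identifiers kept (fictional).
medical_records = [
    {"zip": "00001", "birthdate": "1950-01-01", "sex": "M", "diagnosis": "condition-1"},
    {"zip": "00003", "birthdate": "1971-09-30", "sex": "F", "diagnosis": "condition-2"},
]


def link(voters, medical):
    """Re-identify medical records whose (zip, birthdate, sex) is unique among voters."""
    names_by_key = {}
    for v in voters:
        key = (v["zip"], v["birthdate"], v["sex"])
        names_by_key.setdefault(key, []).append(v["name"])

    matches = []
    for m in medical:
        names = names_by_key.get((m["zip"], m["birthdate"], m["sex"]), [])
        if len(names) == 1:  # exactly one voter fits, so the record is re-identified
            matches.append((names[0], m["diagnosis"]))
    return matches


reidentified = link(voter_roll, medical_records)
```

Neither data set is sensitive and identifying on its own; the harm comes from the overlap.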

Are these failures of privacy? Yes, but they’re also failures of definition. Definitions of privacy tend to fail to cope with auxiliary information – database privacy policies weren’t intended to deal with information in other, future databases. And the definitions, in general, were “syntactic and ad-hoc”, and “don’t capture a robust, global notion of privacy.”

In 1977, statistician Tor Dalenius suggested an “ad omnia” guarantee of database privacy: “Anything that can be learned about a respondent from the statistical database can be learned without access to the database.” In other words, it’s possible to get personal data from a database, but only if the person is an extrovert and puts their information on the web. This is a great definition… unfortunately, it’s unachievable.

To explain this unachievability, Dwork offers a parable, which she discloses is an “Admittedly Unreasonable Impossibility Proof.” Assume Pamela Jones of Groklaw doesn’t want her height disclosed. However, she’s a participant in a database that reveals the average heights of population subgroups. Someone discloses the information that Jones is two inches shorter than the average Swedish woman. (This is the unreasonable part of this proof, as it’s unclear where this information comes from.) If you have access to the database, you can trivially learn her height. This, Dwork says, shows a hole in Dalenius’s definition – we can learn sensitive information not by querying about her in the database, but by issuing another type of query. Dwork argues that Jones loses privacy whether or not she is in the database – the leakage comes from the stated and desired utility of the database.

Dwork offers a different definition for database privacy – differential privacy. The outcome of any analysis is, essentially, independent of whether an individual joins or refrains from joining the dataset. We can define differential privacy by comparing the probability of any given output being released when an individual is in the dataset with the probability when that individual is not. Epsilon bounds how far apart those two probabilities can be, so a small epsilon is desirable.
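For readers who want the formal statement, this is the usual definition from Dwork's published work, as best I understand it; it isn't quoted from the talk. The symbols $M$ (the randomized analysis), $D$ and $D'$ (two datasets that differ in one person's record) and $S$ (any set of possible outputs) are my notation.

```latex
% M is \varepsilon-differentially private if, for every pair of datasets
% D and D' differing in a single individual's record, and every set S of outputs:
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\]
```

When epsilon is close to zero, $e^{\varepsilon}$ is close to 1, so the two probabilities are nearly equal and an observer learns almost nothing about whether any one person is in the data.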

The useful property of this definition is that it neutralizes linkage attacks. We’re no longer concerned with what the adversary can bring in from the outside. Differential privacy gives us resilience to all auxiliary information.

This, she tells us, is achievable. There are low-error, high privacy differential privacy techniques that exist for many problems in datamining. We can reveal data that supports analysis through association rules, decision trees, clustering, contingency tables, histograms, synthetic data sets, machine learning and recommendation systems in ways that are differential privacy compliant. Indeed, there are even programming platforms designed to allow us to analyze our data sets and protect ourselves from disclosure using a differential privacy model.
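The talk didn't walk through a specific technique, but the best-known one is the Laplace mechanism: add noise drawn from a Laplace distribution, scaled to how much one person can change the answer. The sketch below is my own illustration for a simple counting query; the function names and the example numbers are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise calibrated to epsilon.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity of the query is 1 and the noise scale is 1 / epsilon.
    """
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(1)
released = private_count(true_count=100, epsilon=0.5, rng=rng)
```

A smaller epsilon means a larger noise scale: more privacy, less accuracy. For counts over large populations, the noise is small relative to the answer.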

Why does this matter? Dwork tells us that we should worry about mission creep. Microsoft recently hosted an H1N1 swine flu self-assessment site. If you visit the site and provide some diagnostic information – Do you have a fever? How long have you been sick? Where have you been recently? – it’s possible to learn a great deal from this data, as John Snow discovered in 1854. Microsoft’s privacy policy leaves something to be desired – Dwork describes it as “Your data is confidential, unless we’re required to disclose by law, or we think we need to, or…” There’s a tendency, once you have this data, to use it for other purposes. “People always come up with new uses and say ‘Think of the children!’ What can we give a curator to resist that sort of pressure?” Dwork asks.

“Never store the data in the first place,” she argues. That way you can’t respond to subpoenas. And there’s a set of algorithms she’s working on – pan private streaming algorithms – that store information on data streams in a way that inherently keeps the patterns of appearance of any individual undetectable, protecting against mission creep, subpoena, and intrusion.

There are downsides to differential privacy – the techniques tend to hide outliers within a data set, so they make it difficult to study outliers. Privacy erodes over multiple analyses – cumulatively – and also over the inclusion of an individual into multiple databases.
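One way to picture the cumulative erosion: under the basic composition rule, the epsilons of successive analyses add up, so a curator can treat epsilon as a budget. The `PrivacyBudget` class below is a hypothetical sketch of that bookkeeping, not a real library.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic (additive) composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one analysis; refuse it if it would exceed the total budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent  # remaining budget


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first analysis
budget.charge(0.4)  # second analysis; a third at 0.4 would be refused
```

The same arithmetic applies when one person appears in several databases: each release spends some of that person's overall privacy.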

The question Dwork and colleagues are currently worried about is the size of epsilon… which is to say, how much more dangerous can it be to be in the database rather than not in the database? To answer this question, she and colleagues are modeling a “lifetime of exposure against worst-cases” to offer a formal definition of what a “reasonable” value might be.

To ensure that a differential privacy solution is useful, Dwork needs to work closely with social scientists to understand what questions we might be asking about social networks. We know that removing individuals from social network graphs can sharply change the properties we’re trying to understand in these networks – which implies that we really need solutions that allow us to study all members in a network without revealing private information. If we know more about what social scientists want to know about networks, perhaps that will shape the algorithms we build.

There’s also a way of analyzing differential privacy that considers the harms that can come from the exposure of data. A database that shows us that smoking causes cancer can cause harm to an individual located in the database – smoker S in the database might be harmed with higher insurance premiums. But, Dwork argues, since learning that smoking causes cancer is the point, the database might lead smoker S to enroll in a smoking cessation program. Who gets to decide whether that tradeoff of harms and benefits is appropriate in first and second-tier cases of data exposure?

The benefits of defining differential privacy, Dwork tells us, are that we’ve got an ad omnia definition of privacy that’s achievable, and frequently achievable in a way that gives us highly accurate data usage. You can program using these techniques using a privacy-preserving interface. This might present a path forward for addressing real-world problems involving privacy in behavioral targeting, as well as in retrieving data from a database.

Dwork offers examples of possible attacks on privacy via behavioral targeting. She asks us to imagine an ad targeted to a strange fetish – she offers women with their legs in casts as an example. If you click on the ad, you’re communicating your interest in women’s legs in casts. But “what if the ad is targeted not just to cast enthusiasts but to wealthy, whale loving long leg cast enthusiasts?” The user knows the ad was targeted to characteristic 1, but not the second characteristic – it may turn out that the set of wealthy, whale-loving cast fetishists is small enough that it’s a single person, and that clicking on the ad reveals your identity.

This might be worth worrying about even if you’re not a whale-hugging cast fetishist. It raises the question of whether ads should be targetable in terms of race, for example. Should poor children see different ads than rich children, given that we believe ads have a role in forming aspirations? The Wall Street Journal reported in August that Capital One Financial used tracking information from company [x+1] to steer minority customers to loans that carried higher rates. In principle, Dwork tells us, the law seems to allow credit companies to target products based on a customer’s browsing history… but what if that history encodes race? Does this fall into the realm of unethical or illegal targeting of products based on race?


This was one of the more challenging talks to blog of the presentations I’ve heard at Berkman. It’s very clear that Dwork is doing important work in the field and that the questions she’s tackling are ones that are fascinating to the lawyers, librarians, sociologists and whatever-I-am that fill our conference room. But there’s a huge gap between the mathematical and algorithmic work she’s doing and the issues we generally talk about within a Berkman setting. It’s a good challenge, and an excuse for me to lean on my colleagues from CRCS to understand the issues at play. Apologies for stupid errors I’ve made in attempting to blog this talk – Dr. Dwork’s work is worlds away from what I work on and I may be misunderstanding some of her points.

09/27/2010 (8:22 pm)

Could Canada help release Hossein Derakhshan?

My friend, Canadian/Iranian blogger Hossein Derakhshan has been imprisoned in Iran since November 2008. A few days ago, we got the disturbing news that Tehran’s prosecutor would seek the death penalty for Hossein’s alleged offenses: “collaborating with enemy states, creating propaganda against the Islamic regime, insulting religious sanctity, and creating propaganda for anti-revolutionary groups.”

For many people who know Hossein, his decision to return to Iran in 2008 was a surprise. His previous and widely documented travel to Israel made it likely that he would be prosecuted on returning home. Maziar Bahari, a Canadian/Iranian journalist who had previously been detained in Iran (in part because of an interview he did with The Daily Show’s Jason Jones), helped explain Hossein’s miscalculation. In an interview with On the Media, Bahari explains that Hossein may have been promised by the Ministry of Intelligence that he’d be able to return to Iran safely. (Hoder began his blogging career supporting reformist politicians. Later in his career, he became concerned that the Bush administration would invade Iran, and he became an outspoken supporter of Ahmadinejad.) Once he returned to Tehran, members of the Revolutionary Guard chose to arrest Hossein, perhaps to send a message to anyone who would use digital media to organize politically. Bahari describes Hoder’s situation as “a clear case of the internal battle between the Revolutionary Guards and the Ministry of Intelligence.”

What’s most helpful about Bahari’s interview is that he offers insight on what might help with Hossein’s release: “In my case, and in case of many other dual citizens, including Roxana Saberi, if the second country of that person – in my case, Canada, in Roxana’s case, the United States – if they’re vocal about their citizens, then the Iranian government listens and reacts to the actions of the foreign government.”

Canada hasn’t been especially vocal in pushing for Hossein’s release. My friend Cyrus Farivar has been regularly calling Canada’s Department of Foreign Affairs, and getting unsatisfying answers. His most recent exchange is here.

If Bahari is right, he’s suggesting a possible strategy for the groups trying to advocate for Hossein’s release: pressure Canada. Specifically, the Honorable Lawrence Cannon, the minister of Foreign Affairs. Contact information for his office is here. My guess is that inquiries from Hossein’s fellow Canadians, to the Minister as well as to individual members of parliament would carry more weight than inquiries from citizens of other nations. For my Canadian readers, I hope you might consider contacting your government in the hopes that they’ll more actively seek release for your fellow citizen.


Update: The Iranian government has sentenced Hossein to 19.5 years in prison. While everyone is happy that he’s not facing execution, this is an absurd and unfair sentence for a man who did nothing more than share his thoughts online. I hope readers who are inclined will continue to pressure the Canadian government to act on his behalf.

09/27/2010 (7:02 pm)

links for 2010-09-27

Filed under: del.icio.us links ::

09/27/2010 (5:21 pm)

Learning the Lessons of Bell, CA

Filed under: Blogs and bloggers,Media ::

I was trying to figure out what “civic media” is the other day. My friend Henry Jenkins helped coin the term – as well as co-founding MIT’s Center for Future Civic Media – and he told me that civic media is the media citizens need to make decisions about their communities. More provocatively, Henry suggested that civic media might refer to “all the community information we get after we lose newspapers.”

Henry wasn’t being triumphalist about the death of the newspaper – just articulating his fascination with the various forms of media that are emerging to allow communities to discuss, debate and decide in a digital age. A caution for everyone who’s fascinated by new civic media, myself included, is that newspapers continue to do a great deal of the critical work in reporting the news and information citizens need.

Recently the city of Bell, California has been in the news, not just in the LA area but across the country. Bell is a small city in the southeastern suburbs of Los Angeles, whose population of roughly 37,000 is majority Latino. In late July 2010, Bell became infamous for the extremely high salaries city officials were being paid. Robert Rizzo, the city’s Chief Administrative officer, collected a salary of almost $800,000 a year, with a benefits package that totalled $1.5 million annually. (The Los Angeles County Chief Executive, by contrast, earns just under $340,000 annually.) Many other city officials were earning inflated salaries, and all but one city council member was making nearly $100,000 a year. To pay the exorbitant salaries, Bell residents were paying the second-highest property tax rate in the LA area, a surprisingly high rate for a city with few services. As stories of corruption in Bell came to light, Rizzo, the council members, the mayor and other city officers were arrested for misappropriation of government funds.

The corruption in Bell was vigorously reported by the LA Times, specifically by Ruben Vives and Jeff Gottlieb. David Folkenflik, reporting for NPR, explains that Vives discovered the story when reporting on neighboring Maywood, which had proposed contracting certain of its city services to Bell, including policing. He talked to Gottlieb, who remembered a pending investigation on pay to Bell’s city council. The two began demanding documents from the city and discovered an amazing tale of corruption and government malfeasance.

Writing in the LA Examiner, Nicholas Pell (sorry about that, Nicholas!) makes the case that the Bell story points to the ongoing importance of newspapers capable of doing high quality investigative reporting in an age where such newspapers are endangered. He’s right – the sort of reporting Vives and Gottlieb did requires a great deal of persistent inquiry, something that’s more realistic to expect of paid reporters than of engaged citizens. The Times’s coverage package on Bell is incredibly thorough, and the paper is now investigating other cities in Southeastern LA, like Vernon, where similar corruption seems to have taken place. This was a critically important story to break, and it’s worth pointing out that it might never have forced the elected officials out of office had the story not been legitimated by the LA Times and brought to national attention.

On the other hand, it’s worth mentioning, as Folkenflik does in his piece, that there’s a citizen media component to the story as well. “Pedro Parramo” is the nom de guerre of the blogger behind WatchOurCity.com, a blog that focuses on Bell and environs. Folkenflik reports that the blogger in question is frustrated that his role in helping expose the story has been little discussed, but admits that his blog has dozens of readers, not the hundreds of thousands the LA Times has, and that the corruption only stopped when the bigger paper got involved.

The Bell story is important not just because it will lead to fairer taxes for the citizens of that beleaguered city, but because it points to a large set of cities where corruption might be taking place. Bell was a perfect target for Rizzo and cronies – many of the residents are illegal immigrants, unlikely to get involved with politics. Many are recent legal immigrants and don’t speak English. The city is far from the action of downtown LA and managed to avoid scrutiny until a lucky coincidence and journalistic persistence broke the story. While California’s penchant for charter cities makes it particularly likely to fall victim to such shenanigans, this would be a good time for newspapers around the country to be checking up on local municipal governments… and as critically, for citizens to check in on whether their newspapers are capable of doing that sort of sustained investigative work.

The difficult truth: corruption in Bell went on longer than it should have, and it took an excellent and prestigious newspaper to bring it to light. Not every city is lucky enough to be in the shadow of that calibre of newspaper. And the work that “Pedro Parramo” did to expose the story wasn’t sufficient to stop the corruption until a big paper got involved. As we untangle what’s involved with citizen media, we need to think about more than reporting stories from disadvantaged communities – we need to think about how they get heard and acted upon.

09/27/2010 (3:51 pm)

Mapping the recession

Filed under: ideas,Media ::

You’ve probably seen this incredible visualization of the spread of unemployment in the US produced by journalist LaToya Egwuekwe. I was at a foundation board meeting last week, showing it to anyone I could get to look at it, and everyone reacted with some variation of a mouth half open in shock and despair. In understanding the anger and frustration of the current moment in politics, this map tells a large part of the story. It’s hard to imagine the Obama administration moving policy forward on any number of issues – from “don’t ask, don’t tell” to climate policy – without offering some believable way of reversing this job picture.

My second reaction to the visualization is to be impressed by what a compelling story it tells with very simple inputs. Every piece of information can be represented by three numbers – the county in question, the date and the unemployment rate. Put together into a map and animated into a time series, the story it tells is emotionally affecting in a way that it’s hard for statistics or text to be. (In an earlier blog post, I referred to the problem of extreme examples in advocacy. This visualization is powerful in that it shows the ordinariness of unemployment, its pervasiveness, which helps put individual examples of unemployment and the pain associated with it in context.)
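To make the "three numbers" point concrete, here is a rough sketch of how such data could be turned into animation frames. It's a guess at the general shape, not Egwuekwe's actual method: the `Observation` type, the shading thresholds and the sample values are all invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    county: str   # county identifier
    month: str    # "YYYY-MM", which sorts chronologically as text
    rate: float   # unemployment rate, in percent


THRESHOLDS = [2, 4, 6, 8, 10]  # assumed cut-offs between color shades


def shade(rate):
    """Map a rate to a shade index: 0 (lightest) to 5 (darkest)."""
    return sum(1 for t in THRESHOLDS if rate >= t)


def frames(observations):
    """Group observations into one {county: shade} map per month, in date order."""
    by_month = defaultdict(dict)
    for obs in observations:
        by_month[obs.month][obs.county] = shade(obs.rate)
    return [(month, by_month[month]) for month in sorted(by_month)]


sample = [
    Observation("County A", "2009-01", 3.1),
    Observation("County B", "2009-01", 9.5),
    Observation("County A", "2009-02", 6.0),
]
animation = frames(sample)
```

Each frame is just a lookup from county to shade; a mapping engine only has to color the regions and step through the frames.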

So, how do we create more visualizations like this one?

I’ve not been able to find the tools Egwuekwe used listed on her site, but I’m guessing she’s using ArcGIS or a similar commercial package. Those tools are wonderfully powerful, but you only need a tiny fraction of their power to produce maps like this and update them regularly. What you need is a good source of clean data, a mapping engine that can color the appropriate regions and a user-friendly interface that brings the two together.

Mapping engines aren’t the main problem. Google Charts has a simple API that allows you to create complex charts by passing parameters to a script or just creating a complex URL – the current Charts system supports state and region mapping, though not US counties, I think. Other pretty tools – Tableau, for instance – offer similar capabilities with useful additional features, like the ability to focus in on data points and overlay data on top of the visualization. It’s not hard to imagine either system expanding to support county-level mapping and time-series animation.

Data’s a tougher problem. Egwuekwe’s map uses Bureau of Labor Statistics data – BLS has an excellent collection of data sets, many of which are highly granular and regularly updated. While there’s a mandate across the federal government to release key data sets, that hasn’t led to much data being posted on Data.gov yet. Data.gov’s Raw Data catalog features 2,851 sets today, which sounds impressive, until you discover that 19 of the 25 sets listed on the front page are a variant of “1987 Toxics Release Inventory data for the state of Colorado.” Interesting stuff, sure, but one might expect a data set to include multiple states and multiple years. Comparing the US’s data.gov and the UK’s data.gov.uk, FlowingData notes that data.gov doesn’t include some very basic sets, like basic demographic data from the census. (The data exists, it’s just not in Data.gov.)

The real win comes from having data sets tightly integrated with appropriate tools for understanding and visualizing them. For me, that’s the special genius of Hans Rosling’s Gapminder. Rosling’s core thesis is that economic development needs to be thought of in terms of multi-year or multi-decade timelines. In other words, it’s a mistake to compare the current development of sub-Saharan Africa with Scandinavia, without thinking about the ways in which colonialism means that African states have only been developing under their own power for a couple of decades – when we look at the rates of positive change for some African nations, we can easily imagine parity with wealthier nations in a few decades. Gapminder makes this visible by using rich, complete data (mostly from the World Bank and UNDP) and making it easy to compare multiple countries on various development indicators, animated over time.

What might we ask of a Gapminder US? Rosling and friends have a Gapminder US in their labs section – while it crashes my browser, it seems to apply Rosling’s methods to US data, inviting us to look at income and infant mortality from 1929 to the present. Egwuekwe’s work suggests that we might want to consider shorter time series as well, using data from within the past decade, and looking at a very fine grained level, comparing economic and social indicators on a county-by-county basis. DataMasher, a site from ForumOne Communications, uses Data.gov and other US government data to allow the comparison and mapping of variables – for instance, cancer incidence and health spending per capita. It’s a cool idea, but I find the maps badly in need of explanatory text, and not the fake jewelry spam most seem to feature.

Maybe it’s a mistake to think that there’s a single methodology that works for different data sets – perhaps we can’t offer a Manyeyes for mapping US government data and hope for meaningful results. But Egwuekwe’s map suggests to me that we could understand a great deal more if it were a little easier to visualize fine-grained data over time. Would love your thoughts on other cool projects out there that are visualizing government data over time – what am I missing?


While we’re looking at US maps, Eric Fischer’s work mapping racial divides in American cities is pretty striking.


Eric Fischer’s racial map of New York City.

Fischer’s Flickr page explains that he was intrigued by Bill Rankin’s racial and ethnic map of Chicago on Radical Cartography and decided to replicate the project across a large set of American cities using data from the 2000 census and OpenStreetMap. The results are beautiful and often uncomfortable. The UK’s Daily Mail offers an overview of his project and some of the highlights.

09/27/2010 (1:57 pm)

Your chance to Gerrymander a congressional district

Filed under: ideas,Media ::

One of the prizes of the upcoming US elections is control of statehouses and governor’s mansions. The goal is not just the sheer number of Democrat- or Republican-held seats, but the ability to control the redistricting process. Every ten years, in response to the shifts in population documented in the census, states redraw Congressional districts, adding or subtracting seats to ensure proportional representation, and, in the process, shaping seats to make them easier for the party in power to win and hold. Mara Liasson did an excellent story for NPR outlining the importance of redistricting and the resources being put towards races to win statehouse seats – basically, one of the prizes at stake in the 2010 elections is the ability to gerrymander.


NY-28, featured in Michael Cooper’s NYTimes piece

Sunday’s New York Times Week in Review includes a great piece by Michael Cooper on gerrymandering Congressional districts, including five absurd districts created to stack elections, like the district above that spans more than 100 miles in upstate New York, packing as many Democrats as possible into one area to create Republican seats. The Cooper piece makes clear that redistricting is a dark art, involving detailed demographic data and algorithms capable of calculating partisan gains through “packing” and “splitting”.
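The arithmetic behind "packing" is simple enough to sketch. In this toy example (made-up precincts and generic parties A and B, not any real district), the same voters produce different seat counts depending only on where the lines are drawn.

```python
from collections import Counter

# Votes per precinct as (party A, party B). Overall, A leads 34 to 26.
precincts = [(9, 1), (9, 1), (4, 6), (4, 6), (4, 6), (4, 6)]


def seats(plan):
    """Count districts won by each party; a plan is a list of precinct-index groups."""
    tally = Counter()
    for district in plan:
        a_votes = sum(precincts[i][0] for i in district)
        b_votes = sum(precincts[i][1] for i in district)
        # Ties can't occur with this toy data, so no tie-breaking rule is needed.
        tally["A" if a_votes > b_votes else "B"] += 1
    return dict(tally)


# Pack both of A's strongholds into one district: A wins it 18-2, loses the rest.
packed = [[0, 1], [2, 3], [4, 5]]

# Spread A's strongholds across two districts: A wins both 13-7.
spread = [[0, 2], [1, 3], [4, 5]]

packed_result = seats(packed)   # A: 1 seat, B: 2 seats
spread_result = seats(spread)   # A: 2 seats, B: 1 seat
```

Under the packed plan, the party with fewer votes overall takes two of the three seats.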


Screenshot from The Redistricting Game

Ever wanted to practice those dark arts? An excellent simulation from USC’s Annenberg Center and USC’s Game Innovation Lab gives you the opportunity. The Redistricting Game requires you to draw five sets of voting districts, optimized for different outcomes. (The game uses pop-up windows, and you may need to make an exception for your pop-up blocker to play it.) The first scenario simply requires you to create proportional districts, with roughly equivalent numbers of voters in each. The second and third get more cynical – which is to say, more realistic. In one, you turn a state that has two Democratic and two Republican districts into one that’s three and one. In the other, you don’t switch control of any seats around – you simply redistrict to ensure safe seats for all elected officials.

The fourth scenario brings in the Voting Rights Act and requires a new 65% African-American district to be added to the map. It’s extremely difficult to draw this district without damaging your two Democratic incumbents, who’ll vote against your plan. In the fifth scenario, Representative John Tanner’s plan for redistricting has been adopted, and you’ve got to create districts without considering partisan issues. Play this scenario on the basic level, and all you need to do is create compact, contiguous districts with the right number of people in them. This seemed like a cop-out to me – all we need to do is pass this legislation and good things will result, says the game as advocate.

Fortunately, the game’s designers are far more cynical than that. On the advanced level – closer to actual political reality – that compact, contiguous plan will be voted down by all your “bipartisan” panel members on partisan lines. You then need to figure out how to ensure partisan victory for one side, while having no information on the partisanship of the voters you’re dividing. Good fun. While the game was created in 2007, it couldn’t be much more timely, and I hope journalists will turn to it as a resource to help explain what actually goes on in redistricting in the coming year.

I’m often suspicious of games for change, socially responsible games, “serious games” – I think a lot of foundations support the creation of games without much thought for whether they ultimately reach their intended audience, and sometimes make funding decisions based on the idea that funding games will be perceived as forward-looking, creative and edgy. I know of a number of games that had unimpeachable motives behind their creation, but failed to find an audience, either because the gameplay was mediocre or the subject in question wasn’t well explained via a game.

Redistricting seems to fit the game model better than many other problems – in a very real way, the people who draw constituencies are playing a computer game, so the process is easier to render on the screen than, say, the process of door-to-door campaigning. Playing for half an hour this morning, I felt like I learned something about the intersection of forces that was hard to get from Cooper’s piece.

It’s possible that Cooper’s article illustrates a problem I’m starting to see in a lot of advocacy – advocacy from extreme examples. A trend in recent years in both conservative and progressive advocacy has been realizing the power of narrative. Telling a compelling human story of someone affected by a particular policy or problem can be far more affecting than marshalling statistics and analysis. In a climate where policy decisions seem to be less “reality based” than in (some idealized?) past times, offering a compelling narrative looks to be an essential aspect of advocacy.

To offer a compelling narrative, it helps to show an extreme miscarriage of justice – for example, the arrest of 15% of the African American population of Tulia, Texas on fraudulent drug charges by a corrupt police sting operation. (Indeed, the story is so compelling that it’s been the subject of two documentaries, and a forthcoming film starring Halle Berry…) The Tulia story sticks in the mind of anyone who’s heard it… but so do the extreme circumstances: the shocking credulity of police in accepting the accounts of a paid informant, the scale of the arrests, the aggressiveness of the prosecution, the large number of people affected. Progressive advocates want to use the story of Tulia to explain that there are systemic weaknesses in the criminal justice system that lead African Americans to be disproportionately arrested, especially for drug crimes.

We tell the Tulia story and back it with statistics that make the case that racial disparity in policing isn’t as unusual as we might think – policies like stop and frisk mean that many more African American and Latino men have contact with police than white men, and more opportunity to be arrested for possession of small quantities of drugs. But the extreme nature of the Tulia situation can distract us from more ordinary injustice – the fact that a black man in certain precincts of New York City had a 30-36% chance of being stopped and questioned by police during 2006. I worry that there’s a danger we get caught up with the extraordinary narrative and end up forgetting the more ordinary, commonplace facts.

Similarly, I’m not sure Cooper’s five gerrymandered districts help me understand how ordinary and everyday the shaping of election districts to create partisan outcomes actually is. (I wrote this, and was about to write, “After all, my Congressional district makes pretty good sense, and wasn’t created to be a safe seat as there’s nothing safer than being a western MA democrat.” And then I looked at my district, which looks roughly as confusing and non-compact as any of the districts I was trying to create in USC’s game.) If these districts are extremes, do they encourage us to look at our districts and think about whether they’re similarly strangely structured, or do they let us off the hook because we’re not living anywhere as absurd as New York’s 28th district? USC’s game left me with the impression that none of these districts are objectively “fair” – they’re all the product of an interplay of political forces with extremely high consequences for the affected politicians and the citizens they represent. Whether that’s a fair impression or the result of a well-constructed advocacy game… perhaps that’s the subject for another blog post.

09/21/2010 (1:53 pm)

Jonathan Zittrain and technology to transform the law school

Filed under: Berkman ::

Berkman Center cofounder Jonathan Zittrain is the ringleader for our lunch session at Berkman today, a presentation of H2O, the oft-evolving tool he and crew have been developing to bring syllabi and the casebook into the digital age. Z begins his talk remembering a question asked and answered by Charlie Nesson in the early days of the Berkman Center: “What is teaching, and what is the university’s mission?” The answer (or, at least, Charlie’s answer): the mission is to make information available to anyone who wants it.

Zittrain and Nesson began teaching two courses, one on privacy and one on privacy in cyberspace, to a global audience. Because this was 1997-8, streaming video wasn’t a particularly good idea. Instead, the system was optimized around “low bandwidth, text intensive ways to build a community around an idea.” It’s unclear how many people “took” these classes – and Harvard didn’t let them call them classes for fear that it would damage the university brand. However, more than 1000 people signed up to participate in each of the “online lecture and discussion series”.

MIT’s Open Courseware pushed forward the notion of online teaching by giving MIT professors a fairly irresistible offer: give us your course materials, we’ll ship the uncopyrighted stuff off to India, digitize it and “put your course online.” This has been extremely successful, both in terms of uptake and in terms of mindshare. And for some courses – i.e., ones where the value comes primarily from the lectures – the tools can provide a very good experience.

Zittrain wondered whether another model for online courses could encourage collaborative efforts between teachers. This led to an early project called Syllabus Maker, which was designed to allow professors to post syllabi and share them with other professors. The idea of having community discussions around syllabi has a parallel in a project now emerging from Harvard’s Law Library Lab, called Shelf Life, which allows the portal page for any book to become a community for discussion, and invites a reshelving of library shelves based on what books are being talked about by faculty or students.

Working with Larry Lessig, Zittrain identified law school casebooks as an area for innovation. The casebooks used to teach first year law courses, Zittrain tells us, are as conservative as it gets. Cases often hail from the 19th century. Fortunately, in the US, “we the people own the law”, which means we could use more recent cases as part of these casebooks, and our source material is in the public domain. Rather than paying $200 for a casebook in the bookstore, students could have much better online casebooks – ones where you could click and move from the case edited by the professor to the full text, ones you can annotate and share annotations from. For those who wanted a printed casebook, online services exist to print and bind the text on demand.

In conjunction with the next generation of a syllabus system – now called “playlist” – professors could share syllabi and casebooks, and we could watch courses evolve, influenced by different professors. “We can preserve genealogy to track the influences on a course over time.” Eventually, he hopes, we might “change the nature of casebooks themselves” and move away from the old chestnuts, allowing “new chestnuts to arise.”

Laura Miyakawa, project manager of the H2O system (the platform Zittrain is describing), offers a tour of the system… which you can take at h2odev.law.harvard.edu. H2O is a suite of tools, including:

- The question tool, designed to facilitate discussions during classes by allowing people to ask and answer questions, and vote on what questions they’d like the professor to address

- A casebook creation tool, which relies on a database of tagged cases, allowing professors to “fish in the ocean of cases”.

- A tool called Collage lets professors edit and annotate cases, and lets students annotate and share their annotations

- A playlist maker, which allows professors to create and share syllabi. This includes tools that make it easy for a professor to calculate how much reading she’s assigning, and to trim texts to a “required” passage.

- A “rotisserie” discussion tool, which enables a structured discussion. Users respond to a question, then are assigned discussion partners, who critique their responses.
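The rotisserie idea is simple enough to sketch in a few lines of Python. This is my guess at the shape of it, not H2O's actual code: the function name `assign_critics`, the shuffle, and the rotate-by-one pairing are all assumptions, and it presumes each participant has a unique name.

```python
import random


def assign_critics(responders, seed=None):
    """Return {author: critic} so that everyone critiques exactly one
    other person's response and nobody critiques their own.

    Hypothetical sketch: shuffle the participants, then hand each
    response to the next person around the circle.
    """
    if len(responders) < 2:
        raise ValueError("need at least two responders")
    order = list(responders)
    random.Random(seed).shuffle(order)
    # Rotate by one: each author is critiqued by the next person in the shuffled order.
    return {author: order[(i + 1) % len(order)] for i, author in enumerate(order)}


pairs = assign_critics(["ann", "bo", "cy", "di"], seed=1)
for author, critic in pairs.items():
    print(f"{critic} critiques {author}'s response")
```

Passing each response one seat around a shuffled circle guarantees that every response gets exactly one critique and every participant writes exactly one.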

The tool promises to have benefits for students as well as professors. Students can outline and mark up cases they’re assigned to study and share them with a study group. And Zittrain believes a system like this could change how courses are structured: “I like contracts, I like torts, I’m not going to teach contorts because there’s no book for it. But if I can easily do my own bespoke syllabus drawing on the work of others, I could.”

The system is currently being used in Zittrain’s torts class, and the code will be released under an open source license in the near future.

Asked about making the tool easier for professors to use, Zittrain suggests that some might bootstrap using Mechanical Turk to add cases to the system… then notes that he’s recently been giving talks warning about the potentially exploitative nature of MT. Asked about privacy issues – is it possible for professors to track what students have read? – Zittrain admits that, yes, it would be, though the software doesn’t support that functionality. And he points out that it would be a dreadful teaching technique to monitor students and call on them based on their failure to do the readings.

09/17/2010 (2:52 pm)

No Firewall – free speech resources for Vietnamese speakers

Filed under: Human Rights ::

What national government censors the internet most aggressively?

Iran and China are probably the most popular answers, and there’s a good case to be made for each. My friend Sami ben Gharbia makes the case – in a very important and must-read essay – that much of the Arab world is extremely hostile to online dialog and filters extensively, though it gets less attention because Arab leaders are often aligned with US foreign policy objectives. If we’re getting technical about things, North Korea, which doesn’t permit internet access for ordinary citizens, probably wins the prize. But when I think about aggressive internet filtering, I think about Vietnam.

Lots of countries – as many as forty – filter the internet their citizens can see. Vietnam does lots, lots more. They surveil, harass and arrest bloggers. There’s evidence to suggest they use DDoS attacks to silence dissident websites and hacking attacks to disrupt discussion forums and intimidate participants. And in an amazingly brazen attack, they distributed a malicious trojan, disguised as a popular keyboard driver necessary to type in Vietnamese on a Windows computer.

It’s worth noting, as a commenter did on one of my past blog posts about internet censorship in Vietnam, that pro-democracy groups aren’t the only ones targeted by the Vietnamese government. Bauxiteinfovietnam, a site targeted by DDoS attacks, is the centerpiece of a campaign against a bauxite mine in an environmentally sensitive area, not an explicitly political protest. While the Vietnamese government is quite aggressive in silencing speech that advocates for democratic change, other forms of dissent are targeted as well.

Given the challenging environment for speech in Vietnam, I was thrilled to hear from my friend Duy Hoang that pro-democracy organization Viet Tan has launched a new website focused on helping Vietnamese evade the national firewall. Nofirewall.net offers an extensive collection of internet security resources translated into Vietnamese. (The site, incidentally, is hosted on Blogspot.com. This isn’t because Viet Tan are cheap – many activists are choosing to host on Blogspot or WordPress when they believe their sites are likely to be affected by denial of service attacks – it’s much harder to cripple Google’s vast server farms with a DDoS than it is to take down a third-tier hosting provider.)

Some of No Firewall’s manuals were originally published by the fine folks at FLOSS Manuals – indeed, much of the content is from “How to Bypass Internet Censorship”, a manual written during a “book sprint” in Upstate New York involving contributions from Sesawe, the FLOSS Manuals folks, and other writers who’ve published on internet security.

Viet Tan was able to produce a Vietnamese version of the FLOSS Manual in question because FLOSS Manuals are generally licensed under the GNU General Public License version 2. As such, Viet Tan is free to translate and redistribute the manual so long as they use a compatible open source license on their own text. FLOSS Manuals’ use of open licenses is one of the most important parts of their project because it means the work done by the manual’s authors can spread widely to a variety of audiences.

Just a few days back, I was writing a letter of endorsement to try to help the FLOSS Manuals project obtain funding from a donor Global Voices has worked with. (The text of that letter follows below.) I argued that the translatability of FLOSS Manuals was a big part of their importance. It was exciting to see an example of this potential put into practice, and I’m glad that these resources are available for Vietnamese speakers around the world.


(from an endorsement letter written in support of FLOSS Manuals)

The rise of open source software has been widely acknowledged as one of the most exciting developments of the 21st century. The ability of geographically distributed individuals to produce mission-critical software and systems offers not only a challenge to the software industry as we commonly understand it, but intriguing hints about the future of economics in a connected age.

Often missed in the enthusiasm about open source software is an understanding of the model’s limitations. Developers have an increasingly impressive track record in bringing innovative new software to light and in rapidly addressing the bugs discovered in this software. Their track record of documenting this software and in producing usable manuals, on the other hand, is pretty dreadful. Writing a new email package is viewed as sexy and exciting – writing the manual to allow someone to use that package is usually viewed as a necessary evil at best, as a task for an unspecified someone else, at worst.

Enter FLOSS Manuals. Adam Hyde and friends are applying some of the best thinking of the open source movement to solve one of that movement’s important and nagging problems: documentation. Without documentation, FLOSS (free, libre and open source) software is less useful, less usable, and less able to displace expensive and often inferior (though better documented) closed source software. FLOSS Manuals close the gap between the developers and users of software, exploring the power and potential of these tools from the perspective of experienced users, rather than from the inside perspective of the developer. As such, they’re some of the most compelling and useful manuals on tools like Firefox, CiviCRM, OpenOffice, WordPress and GNU/Linux.

These are critical tools for users around the world, but they’re especially important for users in the developing world, as they represent low/no-cost alternatives to very expensive proprietary software. It’s no surprise that the One Laptop Per Child project is working closely with FLOSS Manuals to document their software, used by children across the globe.

FLOSS Manuals’ process is as fascinating and compelling as the work they produce. Books are jointly authored by anywhere from a handful to dozens of authors, collaborating using wikis and other shared workspaces. Many books are produced using a “book sprint”, a unique form of book writing that involves inviting a small group of experts on a tool or process to live and work together for a few days and produce the essential skeleton – and often the entire text – of a manual. Sprints are frequently organized around conferences and meetings of tool developers, leveraging the systems used to organize FLOSS software development to allow for documentation of code. The model has been surprisingly successful in turning out high quality text on very short deadlines. I witnessed a sprint conducted in conjunction with a summit on Open Translation Tools – within a few days, half a dozen lead authors created the best text available on the subject. (My modest contribution, an essay I’d written previously, was sliced and diced into an introduction for the volume, which detracts only slightly from the quality of the project as a whole.)

Translatability is a key feature of books produced by the FLOSS process. They’re licensed in such a way that potential translators face a minimum of hoops to jump through in translating the texts, which makes them easier to localize for developing world environments, or to make available in accessible editions for the blind and disabled.

As the open source ecosystem matures, we are beginning to understand what aspects of this new model work well and which are spaces for further learning and innovation. Two years ago, I would have decried the poor quality of documentation available for users of open source software and pointed to it as a key reason why FLOSS software faces barriers to adoption. Now I can point to FLOSS Manuals as a model for how this work should be carried out, and as an organization capable of carrying out this work on a wider basis, with appropriate support. I celebrate the work Adam and friends have done so far and endorse, in the strongest terms possible, their models and working method. I hope they’ll see increasing recognition and support, as there’s a mountain of projects out there that deserve and demand the sort of high quality documentation FLOSS Manuals has begun to provide.

-Ethan Zuckerman
senior researcher, Berkman Center for Internet and Society, Harvard University
co-founder, Global Voices Online
