More than a billion people a month visit YouTube to watch videos.
Sometimes, those billion people watch the same video. More often, they don’t.
YouTube shares information about what videos are popular in different cities and different countries, and for the US, offers a tool to see what videos are popular with different age groups and genders.
We were interested in seeing what videos were popular in different countries, and especially, what videos were popular in more than one country. For the past six months, we’ve gathered data from YouTube to understand What We Watch. The videos we feature are videos that appear on YouTube’s Trends dashboard. These are the videos trending in any of 61 countries – they are not necessarily the most popular of all time, or even most popular that month, but they are receiving a lot of attention in a short period of time. (Gilad Lotan’s explanation of trending topics on Twitter is useful for understanding that distinction.)
What We Watch is a browser for popular YouTube videos, built by Ed Platt, Rahul Bhargava and Ethan Zuckerman at MIT’s Center for Civic Media. (Rahul did data acquisition, Ed did visualization and Ethan waved his hands and requested features inappropriately late in the design process.)
Click on a country, and you’ll get a list of videos that have trended in that country, and a map that shows other countries that watch the same videos. Click a tab, and you can see videos popular just in that country, and not in other countries. Click on a second country, and you’ll see what top videos the countries have in common. Click a video itself, and you’ll get the video itself and a map of the countries where it was popular.
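For readers who think in code, the comparisons described above come down to set operations. This is only a rough sketch, not the code behind What We Watch: the country labels, video IDs and function names are all made up for illustration, and it assumes each country's trending videos have already been collected into a set.

```python
from typing import Dict, List, Set

# Made-up sample data for illustration only:
# country label -> IDs of videos that trended there.
TRENDING: Dict[str, Set[str]] = {
    "A": {"v1", "v2", "v3"},
    "B": {"v2", "v3", "v4"},
    "C": {"v3", "v5"},
}


def shared_videos(trending: Dict[str, Set[str]], first: str, second: str) -> Set[str]:
    """Videos that trended in both countries (the two-country view)."""
    return trending[first] & trending[second]


def unique_videos(trending: Dict[str, Set[str]], country: str) -> Set[str]:
    """Videos that trended in this country and in no other tracked country."""
    elsewhere: Set[str] = set().union(
        *(videos for name, videos in trending.items() if name != country)
    )
    return trending[country] - elsewhere


def countries_for_video(trending: Dict[str, Set[str]], video: str) -> List[str]:
    """Countries where a given video trended (the per-video map)."""
    return sorted(name for name, videos in trending.items() if video in videos)
```

Counting the size of `shared_videos` for every pair of countries would give one plausible measure of how much two countries have in common.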
The results are often surprising. The US has more trending videos in common with Germany and the Netherlands than with near neighbors Canada and Mexico. One of the US’s top videos is a Punjabi music video that’s also got an audience in India and Germany. And a 90 second ad for Google Hangouts is surprisingly popular around the world… though it hasn’t trended in the US, its apparent target market.
While What We Watch is a fun way to navigate the wealth of content available on YouTube, there are serious research questions behind the project as well. In Rewire, I argue that a network that connects computers throughout the globe doesn’t guarantee that content – like videos – will spread across borders of language, culture and nation. Some of what we’re finding on What We Watch supports that contention, and some challenges it.
The music video for “Roar” by Katy Perry offers evidence that some videos find truly global audiences – the video has trended from Peru to the Philippines, and is one of the top videos in Turkey and Saudi Arabia. Other videos find regional, but not global audiences – take P-Square’s “Personally”, which was in the top 10 in Nigeria for 17% of dates we tracked, and is popular in Ghana, Uganda, Kenya, and Senegal… but nowhere outside of sub-Saharan Africa. And some videos never leave home: Brazil’s top trending video, a humorous ad for a phone company that requires no translation, doesn’t show up on the top charts for any other country.
I’ve been deeply influenced by Pippa Norris’s work on the spread of culture and values across national borders, specifically her book “Cosmopolitan Communications” with Ronald Inglehart. They argue that people tend to overestimate the Katy Perry effect in which US culture sweeps the globe, leveling everything in its path. In some cases, people encounter another culture and reject it violently (the Taliban model), shape it and incorporate it into a new hybrid (the curry model) or simply decide it’s not for them (the firewall theory.) We see evidence for three of the four in our data – it’s hard to see the Taliban model because violent rejection would likely mean banning YouTube, which gives us no data to measure.
We also get some hints on what countries have videos in common. Language matters: countries in Latin America tend to have videos in common with other Spanish-speaking countries. But Brazil and Portugal don’t share much content (and Brazil’s viewing habits have little overlap with anyone, offering another theory: if you have a big enough domestic internet, you may develop your own, insular internet culture, as in Japan as well.)
We got very interested in countries that share content with lots of other countries. To identify these countries, we used a metric called “betweenness centrality”. Imagine the countries as nodes on a graph, connected by links that represent videos in common. If you calculate the shortest paths from each node of the graph to every other node, the nodes that many of those paths pass through have high betweenness centrality – they are bridges through the network.
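To make the metric concrete, here is a minimal sketch of betweenness centrality in plain Python, using a breadth-first, Brandes-style calculation. It is not the code from the project. It assumes an undirected, unweighted graph, so it ignores how many videos two countries share, and the node names and the `build_graph` helper are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (node, node) pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness_centrality(graph: Graph) -> Dict[str, float]:
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths to each node.
        visit_order: List[str] = []
        predecessors: Dict[str, List[str]] = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visit_order.append(node)
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, crediting each node with the
        # share of shortest paths from source that pass through it.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(visit_order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Each unordered pair of endpoints was counted once from each end.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical example: one "hub" linked to three otherwise unconnected nodes.
scores = betweenness_centrality(build_graph([("hub", "a"), ("hub", "b"), ("hub", "c")]))
```

In this toy graph every shortest path between `a`, `b` and `c` runs through `hub`, so `hub` gets all the credit and the other nodes get none. A weighted version would need a different shortest-path search, such as Dijkstra's algorithm.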
The countries with highest betweenness centrality are United Arab Emirates and Singapore. Both have lots of weak ties to other countries, which means they may act as cultural bridges between unconnected countries – we can imagine a video popular in India making its way to Yemen through the United Arab Emirates. It’s interesting to note that Singapore and UAE both have massive populations of expatriates and “guest workers” (over 90% of the population in UAE and over 40% in Singapore). Culture travels with people, and it’s no surprise that Indians in the UAE would want to watch videos from home, or that Poles living in the UK mean there are Polish-language videos in the UK’s top ten.
What we don’t know yet is whether videos spread through the networks: i.e., does a video made in India spread to Yemen through UAE, for example? To test that, we’ll need to watch how a popular video spreads over time, and, ideally, we’d want to know where a video originates. That’s harder than you might think. We’ve looked at the possibility of hand-coding the videos as to their nation of origin, so we can see whether a UK video might appear on the charts first in Australia or Poland. But we’re flummoxed by the fact that many of the popular videos aren’t easily pinned down to one nation or another – take this ad, popular in both Russia and Ukraine. It’s a Nike ad about street soccer, which suggests we should attribute it to the US, where the company is based… but the ad’s in Russian, clearly aimed at urban audiences in Eastern Europe and not for a US market. Do we code it as US, Russian or global?
And then, of course, there’s this ad for Google Hangouts. It’s a sweet and sappy 90 second story about a girl who moves to the big city and stays in touch with her dad via Hangouts. The accents are American and it appears to be an ad designed for the US market, but it has trended around the world, including in many countries with high rates of emigration for work or education. Google may have wanted to encourage American twenty-somethings to connect with their parents, but the message seems to resonate for people around the world.
Please experiment with What We Watch and let us know what you think – you can post comments here about anything interesting you discover, or research questions you think we should ask. The code and data behind the system are available on GitHub should you wish to build your own, or to see what we did. One caution for researchers – we are not showing videos that have been taken down by Google, for copyright or other reasons. In some cases, this means we’re removing many videos from top lists. We hope, in the long run, to show the metadata of those videos, but for now, they’re just not in the set, which means the data is not entirely representative of what we’ve collected.
With Rewire out in the world, I’ve had some time this August to think about some of the big questions behind our work at Center for Civic Media, specifically the questions I started to bring up at this year’s Digital Media and Learning Conference: How do we teach civics to a generation that is “born digital”? Are we experiencing a “new civics”, a crisis in civics, or just an opportunistic rebranding of old problems in new digital bottles? My reading this summer hasn’t given me answers, but has sharpened some of the questions.
Earlier this summer, I was invited by the Mobilizing Ideas blog to react to Biella Coleman’s excellent book, “Coding Freedom”. In my response, I noted that Coleman’s ethnography of hacker culture makes clear her hacker friends aren’t the stereotypical geeks, surgically attached to their computers, sequestered in their parents’ basement – they go to conventions, write poetry, and engage in political protest, as well as writing code.
The sort of hackers Biella documents engage in politics, and when they do, they’ve got multiple tools they can use. They organize political campaigns and lobby congresspeople, as Yochai Benkler and colleagues so aptly documented in this recent paper on resistance to SOPA/PIPA. They can write code that makes new behaviors possible, like Miro, written by the Participatory Culture Foundation, which makes peer to peer filesharing and search easier and more user-friendly. They protest artistically, as with Seth Schoen’s DeCSS haiku (which prominently features in Biella’s writing).
Hackers engage in instrumental activism, seeking change by challenging unjust laws. They engage in voice-based activism, articulating their frustration and dissent from systems they either cannot or are not willing to exit. But hackers aren’t merely competent activists in Biella’s account – they are able to engage in civics more broadly than most citizens. In addition to traditional channels for civic engagement, they can engage by creating code, giving them a more varied repertoire of civic techniques than non-coders have. (We might make the same argument for artists, who may be more effective in spreading their voices than those of us with less artistic talent.)
I’ve been thinking about Biella’s hackers in the context of some ideas from Michael Schudson. Schudson is a brilliant thinker about the relationship between media and civic engagement, the question that currently shapes my work at the Center for Civic Media. In his book “The Good Citizen”, and this 1999 lecture, Schudson challenges the idea that a good American citizen is one who carefully informs herself about politicians, their positions and the issues of an election. Schudson argues that this is an unrealistic expectation for citizens, pointing to the absurdity of 200 page Voter’s Guides to Elections that, he argues, nobody reads. (I know for a fact that danah boyd not only reads them, but holds parties to get people to read them with her.) But he also argues that this model of the “informed citizen” is only one model of American citizenship the republic has experienced since its foundation.
In “The Good Citizen”, Schudson explores four models of citizenship the US has passed through in the last two centuries and change. When the nation was founded, citizenship was restricted to a small group of property-owning white men, and elections didn’t focus on issues, but elected men of high status and character, who went on to deliberate in Congress with similar social elites. In the age of party politics, Schudson argues, politics was a carnival, with votes based on personal loyalties and social alliances, not on consideration of the issues.
Not until the Progressive reformers attacked corruption in the party system (an attack which included support for prohibition of alcohol, as party bosses were often tavern owners and the ability to supply voters with drink was a key political technique) did the notion of the informed voter come into play. Progressives, through adoption of the secret ballot, the introduction of referenda and the rise of muckraking investigative journalism, shifted responsibility for politics from a small group of elites and party bosses, to the general public. Schudson observes that the general public hasn’t been especially excited by this shift – participation in elections fell sharply during the progressive era and has been below 50% of eligible voters since.
Now, Schudson argues, we are living in an era where change through elections is less important than change through the courts, an age that began with the Civil Rights movement of the 1950s and 60s. Informed citizens are important, but their power to make change comes from suing as much as it comes from voting, and activists and lawyers who understand how to challenge constitutionality through the court system are far more powerful than the average citizen.
While he’s critical of the informed citizen model as unrealistic, Schudson is not arguing for the superiority of the rights-based model, or for a return to party bosses. He’s pointing out that America has experienced different visions of what constitutes “the good citizen” and that these visions can change over time.
That’s helpful context for understanding Biella’s hackers. We may be experiencing a shift in citizenship where the idea of the informed citizen no longer applies well to the contemporary political climate. The entrenched gridlock of Congress, the power of incumbency and the geographic polarization of the US make it difficult to argue that making an informed decision about voting for one’s representative in Congress is the most effective way to have a voice in political dialogs.
Instead, we’re seeing activists, particularly young activists, taking on issues through viral video campaigns, consumer activism, civic crowdfunding, and other forms of civic engagement that operate outside traditional political channels. Lance Bennett suggests that we might see these new activists as self-actualizing citizens, focused on methods of civic participation that allow them to see impacts quickly and clearly, rather than following older prescriptions of participation through the informed citizen model.
Biella’s hackers are exemplars of self-actualizing citizens, using code as one of their paths towards self-actualization, alongside traditional political organizing and lobbying. Larry Lessig’s Code and Other Laws of Cyberspace, a book deeply popular with the hackers Biella studies, offers the possibility that these are only two of four paths towards civic engagement and change.
Lessig’s book is written as a warning about possible constraints to the open internet. While many contemporary scholars warned that the lawless internet would come under control of national and local governments, Lessig warned that it would also be regulated through code, which would make some behaviors difficult or impossible to accomplish online. Lessig outlines four ways complex systems tend to be regulated:
- By laws, created and enforced by governments, which prohibit certain behaviors
- By norms, which are created by or emerge from societies, which favor certain behaviors over others
- By markets, regulated and unregulated by laws, which make certain behaviors cheap and others expensive
- By code and other architectures, which make some behaviors difficult and others easy to accomplish
These four methods of regulation are also ways in which activists and other engaged citizens can participate in civics. Citizens frustrated and angered by NSA surveillance of domestic communications, for example, could lobby Congress to hold hearings on whether the NSA has overstepped its bounds, or whether FISA courts are providing sufficient oversight of government surveillance requests. Civic coders could build tools that make use of PGP encryption easier to protect the privacy of emails. Citizens could punish companies that have complied with surveillance requests and reward those who are moving servers outside of the US to make them more surveillance resistant. And people could begin using Tor and PGP routinely, to influence norms of behavior around encryption and make the NSA’s techniques significantly less effective.
These methods are often applied to non-technical issues as well. Social entrepreneurship uses market mechanisms to seek change, paying farmers a fair wage for their coffee, for instance, by buying from collectives rather than from exploitative wholesalers. Social media campaigns focus on harnessing attention and changing norms, bringing underreported issues to wider audiences. Using code to make government more transparent or more effective is a popular, if possibly overhyped, approach to social change. These models may represent a complement to the informed citizen and rights-based citizenship models Schudson examines, representing new civic capabilities in addition to the capability of influencing laws and governments.
Mastering these four capabilities is a tall order for any civic participant, but some activists are trying. Julian Assange has technical skills, as well as a deep understanding of media, which has allowed him to cooperate and compete for attention in working to change norms around secrecy and whistleblowing. His long run from prosecution has sharpened his understanding of legal systems, and, until the financial “blockade” against Wikileaks, he seemed to be doing reasonably well raising money for his project. (My friend Sasa Vucinic, involved with anti-Milosevic radio station B92 and founder of the Media Development Loan Fund, argues that the key to running a successful anti-government newspaper is to get the funding model right and build a sustainable media outlet.) Edward Snowden has proved extremely technically savvy, legally astute and has had an excellent relationship with the global press, essential to gain a wide audience for his revelations.
Schudson’s portrait of citizenship through the ages focuses on the behavior of large groups of citizens. Assange and Snowden are too idiosyncratic to serve as exemplars of a new class of digitally engaged citizens, promoting a new vision of citizenship. But they demonstrate what a highly competent, multifaceted civic participant might look like and I suspect that we will see more citizens leveraging the full suite of tools that Lessig’s structures of regulation point to.
A challenge for those of us who see the shape of civics changing is how we prepare people to participate in civics where the skills required are so diverse. If it’s difficult to expect citizens to be informed voters, as Schudson argues, it’s very difficult to expect them to be coders, entrepreneurs, lawyers and media influencers. We might hope, as Dewey does, that diverse interests will lead to an interlocking public – I care about surveillance and work to change norms, while you write code, and our friend tackles another challenge through social entrepreneurship. Or it may push us back to a democracy enhanced by expertise, as Walter Lippmann suggests, with citizens throwing fiscal and moral support to organizations that lobby for laws, write code, build just markets and influence public debate, leveraging the expertise and skill of those who dedicate their talents to one or more of these facets of citizenship.
I shared a draft of this post with Erhardt Graeff, who pointed out an inherent tension between ideas of the competent and effective citizen and the “good” citizen. The “good” citizens, in Schudson’s exploration, are those who participated in the system of the times, whether or not we see those systems as laudable in retrospect. A particularly cynical version of this idea would posit that today’s “good citizen” is a predictably partisan consumer, deviating as little as possible from the demographic predictions and models built by pollsters and data analysts to ensure that our candidates are correctly marketed to us. Highly participatory and effective citizens would challenge this sort of model, and it’s certainly possible that a democracy composed purely of Assanges and Snowdens would have a hard time functioning.
Erhardt points out that Lessig has been an activist throughout his career, and that his vision of regulation in Code is one consonant with the effective citizen. But can democracy work if all citizens are effective at promoting and campaigning for their own issues? Have we seen evidence of a society with high, effective engagement and with the other characteristics we expect of a democracy? Should a group like Center for Civic Media be working on thinking through models of effective citizenship or considering the larger question of what a large group of effective, engaged citizens could mean for contemporary visions of democracy?
Charlie DeTar defended his doctoral dissertation this afternoon at the MIT Media Lab. Charlie is a student in Chris Schmandt’s Mobility and Speech group, but has also been an active member of my group, Center for Civic Media, where he’s done very important work including Between the Bars, a platform that allows inmates in some US prisons to blog via the postal service. Charlie is an incredibly thoughtful guy, who takes the time to read deeply and develop nuanced understanding of issues before he builds new technologies.
His work on his doctoral thesis reflects this thoughtfulness – in building “Intertwinkles”, a platform to assist in consensus decisionmaking, Charlie conducted a deep dive into the nature of democracy, decisionmaking, group behavior and technology to assist group decisionmaking. His talk today outlined that work as context for his intervention.
Willow Brugh attended the talk and her visualization of Charlie’s remarks is below. My notes follow below her illustration.
Charlie’s remarks start with the question: “How much democracy do you have left?”
He shows a photo series of people holding papers with X marks on them – the marks represent the number of presidential elections the person expects to have left. The message – we don’t have very much democracy, if democracy means voting every four years. “Most of us wouldn’t volunteer to be governed by kings or dictators,” Charlie offers, but we face lots of non-democratic rule in real life: bosses, landlords, banks, other powerful institutions we have little influence over.
High profile, democratically-governed activist organizations tend to have short lifespans. Even long-lasting movements like Occupy tend to be relatively short lived. But collectives and cooperatives use highly participatory methods and many have been in existence for decades. Twinkles – the practice of waving your fingers to show approval, non-verbally, for a statement – is a practice that originated in the 1970s and thrives today within collectives and cooperatives. But the in-person nature of collective and cooperative governance can be slow, expensive and draining. Charlie’s core research question is whether we can design online tools for democratic consultation which result in more just and effective organizations.
To answer this question, Charlie has built a set of tools to support consensus decision-making processes, documented the participatory design process used to develop the tools, and evaluated these tools in their use by real-world groups. He’s also done deep investigative work exploring the history of non-hierarchicalism, consensus, and decisionmaking with computers.
Non-hierarchicalism looks like a simple concept at first glance – it represents forms of governance that are decentralized, flat, leaderless, or horizontal. But questions immediately arise: are facilitators imposing a covert hierarchy?
Charlie suggests we consider decentralization, using a definition from Yochai Benkler: in decentralized systems, many agents work coherently despite the fact that they do not rely on reducing the number of people participating in decisionmaking. While the number of people does not decrease, most decentralized systems require some centralization, as Charlie discusses by examining multiple models. The blogging platform WordPress is decentralized because you can download, customize and run the code, effectively becoming a chapter or franchise for WordPress. With Wikipedia, different sets of people work on different problems, editing different articles, in what can be thought of as a subsidiary model. In BitTorrent, rather than decentralizing resources, the founders have declared a protocol that determines how we interact, enabling decentralization through federation.
Each decentralization has a corresponding centralization:
- BitTorrent decentralizes servers via a centralized protocol
- WordPress decentralizes hosting via a centralized codebase
- Wikipedia decentralizes editors through a centralized database and policies
- Consensus decentralizes authority through centralizing procedures
Consensus decisionmaking is a field of governance, Charlie tells us, that works to avoid three tyrannies:
- The tyranny of the majority, when the mob beats you up
- The tyranny of the minority, where small groups prevent functioning or dominate decisionmaking
- The tyranny of structurelessness, where elimination of overt structure leads to covert structure via dominant personalities, racism, sexism and other forms of dominance.
Consensus decisionmaking is the process of consulting stakeholders in a way that seeks to avoid these tyrannies. Charlie outlines seven forms of consensus, including corporate, scientific, standards, consociationalism (power-sharing), mob, assembly, focusing specifically on affinity consensus, groups of people who’ve chosen to work together on problems of common interest. He offers a matrix for how each form of consensus handles open membership, egalitarianism, formal process, and the binding nature of decisions. For instance, a corporate department that practices consensus decisionmaking still has a boss, and may not always make binding decisions. Not all groups are open – if I want to participate in the decisionmaking of Charlie’s housing cooperative, I’m going to be refused admission.
In the process of building Intertwinkles, Charlie has developed a long list of protocols that people use to enable consensus decisionmaking, including various facilitation tools, meeting phases, hand signals, roles and formats. Intertwinkles implements several of these protocols in an online environment.
To understand the history of digital tools to assist with decisionmaking, Charlie takes us back to J.C.R. Licklider, who talked about decisionmaking with computers as early as 1962. Douglas Engelbart, whose “mother of all demos” introduced many of the ideas that have dominated the next 50 years of computing, began developing methods of computer-aided decisionmaking in the late 1960s. The field was formalized as “group decision support systems”, generating a huge amount of scholarship around three systems, generally dedicated computing systems installed in “decision-support rooms” at corporations and universities. While these systems were very engineering-heavy, they often used very similar techniques to those used in consensus-oriented groups. However, it is difficult to extrapolate from the scholarship, because the vast majority of studies used artificial, composed groups, not groups with existing histories and patterns. Most were face to face and most were one-shot experiments. These methodological limitations make it hard to judge the utility of these tools for affinity groups, which have important existing relationships, group histories and policies.
Charlie notes that these early group decisionmaking support tools tended to provide all services – including email – to their users, because they were huge, expensive systems that often represented an organization’s first exposure to digital communication. Now systems are smaller and decentralized, including tools like Doodle (used for meeting scheduling) and Loomio, a new system designed to support discussion of proposals in forums and voting on those proposals.
While these systems are promising, Charlie hopes we can do more. He notes that Joseph McGrath put forward a helpful typology of group tasks in his 1984 book, Groups, Interaction and Performance. Ideally, we’d want a system that helps groups engage in each of these tasks – generating ideas, generating plans, executing tasks, etc.
Intertwinkles began as a participatory design project with Boston cooperative housing groups. Charlie recruited six houses from 29 collective and cooperative housing groups and hired three research assistants who were “native participants”, residents in the houses. 45 people participated, overall.
The groups he worked with were involved throughout a field trial process, from pre-interviews to help understand how groups made decisions, through an extensive training session on the tools, to 8-10 weeks of usage, as Charlie and his team iterated to improve the tools with feedback from users. The process involved both the creation of new tools and a pair of games designed to inspire conversation and reflection on group dynamics, Flame War (which models decisionmaking over email) and Moontalk (a realtime game that models limited communication channels). More information on both games is available on the Intertwinkles site.
Charlie offers brief overviews of three tools. Dotstorm is based around sticky note brainstorming, and supports visual thinkers through stickies with drawings and with photos taken through laptops or other devices. The system supports real-time collaboration and sharing of ideas and runs on any contemporary web browser. Resolve supports a rolling proposal process, which allows one member of a group to propose an idea and others to expand, refine or block it, eventually voting on accepting it. The system maintains a rich history of a proposal and uses a notification system to keep participants involved in the process, but lets participants use email as their channel for free-form discussion. Points of Unity is a tool designed to help come up with a short list of values or statements that a group agrees with, which many groups find useful as a mutually agreed-upon common ground.
Many of the features of Intertwinkles are platform features shared across tools. There’s a group-centric sharing model that gives people access to documents and resources once they join the group. Membership is reciprocal (like membership in Facebook) and overlapping (you are friends with everyone in the group), a model that Charlie hasn’t seen in Facebook, Twitter or other systems. Everything is shared publicly for discrete periods of time, which lowers the barrier to entry to the system, but then reverts documents to private to avoid spam, etc. Users can take actions on behalf of other members of the group, recognizing that not everyone is active online constantly. There is rich, semantic event reporting, which allows for a “quantified group” analysis, understanding and describing a group’s behavior in quantifiable terms about participation. Intertwinkles is built on a plug-in architecture. Core services handle search, authentication, twinkles, events, notices, groups – other features plug into those core services, which makes it possible to develop radically new tools without building up the other essential components.
For the system to work, Charlie believes that participants need extensive training. What’s key is getting to the point where everyone is confident that everyone else is comfortable with the tools. To remind collectives of the tool, Charlie distributed a colorful pillow, a Twinkle Plush Star, as “an ambient reminder of the system and its uses.”
Five of the six groups used the tool, completing 66 processes and making 2155 unique edits and visits. One group didn’t use Intertwinkles beyond training, and one reported neutral to negative experiences, while the other four groups had generally positive reactions. Charlie measured the participation of each cooperative member with the system because he worried there might be uneven participation. His analysis suggests quite even participation, similar to what you might get face to face.
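My notes don't record how Charlie quantified "even" participation. One common way to put a number on it, offered purely as an assumption and not as his method, is a Gini coefficient over per-member activity counts: 0 means everyone contributed equally, and values near 1 mean one person did almost everything. The function name and the sample counts in the usage note are made up.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of non-negative activity counts (0.0 = perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # No members or no activity: treat as trivially even.
        return 0.0
    # Standard formula on ascending values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Calling `gini([5, 5, 5, 5])` on four equally active members gives 0, while `gini([0, 0, 0, 10])`, where one member does all the work, gives 0.75, the maximum for a group of four.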
In examining how collectives used the system, Charlie reminds us of the idea of “technology in action”, proposed by proponents of structuration theory. This theory suggests that designers build tools for certain tasks, but the tools get used for whatever tasks a group wants to carry out, which leads to unexpected outcomes, sometimes contrary to designers’ intentions. Charlie makes his intentions clear: he wanted to make non-participation apparent, to increase awareness of conflict, to make group processes explicit, and to handle facilitation “out of band”.
He sees a correlation between satisfaction with the tool and group structure. Groups that had more confrontive and more formal approaches to decisionmaking had better results with the tools. The group that was least satisfied tends to be avoidant of conflict and privileges action over speaking. A group that found the tools most useful makes participation in house meetings mandatory, has explicit channels for communication on conflict, and extensive house norms. This highly structured group was able to take advantage of the system in ways less structured groups did not.
Charlie sees room to improve the tools: more work on in-band facilitation, in-band training, instrumenting the platform for online learning, and building an ecosystem of developers. He plans to continue working on the tool and already sees possible alliances to build the platform in conjunction with others building tools for group decisionmaking. But he also sees value in the theoretical approach, suggesting that design research is powerful as a form of sociology and a potential quantitative and qualitative method for studying group behavior.
The NSA documents Edward Snowden leaked have sparked a debate within the US about surveillance. While Americans understood that the US government was likely intercepting telephone and social media data from terrorism suspects, it’s been an uncomfortable discovery that the US collected massive sets of email and telephone data from Americans and non-Americans who aren’t suspected of any crimes. These revelations add context to other discoveries of surveillance in post 9-11 America, including the Mail Isolation Control and Tracking program, which scans the outside of all paper mail sent in the US and stores it for later analysis. (The Smoking Gun reported on the program early last month – I hadn’t heard of it until the Times report today.)
The Obama administration and supporters have responded to criticism of these programs by assuring Americans that the information collected is “metadata”, information on who is talking to whom, not the substance of conversations. As Senator Dianne Feinstein put it, “This is just metadata. There is no content involved.” By analyzing the metadata, officials claim, they can identify potential suspects then seek judicial permission to access the content directly. Nothing to worry about. You’re not being spied on by your government – they’re just monitoring the metadata.
Of course, that’s a naïve and oversimplified view of metadata, which turns out to be a surprisingly rich source of information on who people are, who they know and what they do. Congress has historically recognized that metadata is important and deserves protection – while the Supreme Court ruled in Smith vs. Maryland that phone numbers dialed should not be expected to be private information, as they are exposed to the phone company, Congress put restrictions on the use of “pen registers”, devices that can track what calls are made and received by a phone, requiring law enforcement to go to court to institute such tracking. The same logic in Smith vs. Maryland applies to the Mail Isolation Control and Tracking program – since information on envelopes is visible to the public, or at least to mail carriers, it’s monitorable and storable, even without “mail covers”, US Postal Service administrative orders used to trace mail coming to criminal suspects. And, perhaps, the policymakers who approved NSA’s surveillance projects would argue that the logic applies to email headers as well.
Put aside for the moment the question of whether monitoring metadata is reading public information or is more analogous to a pen register. There’s a scale issue that comes into play here. One major constraint on pen registers and mail covers historically has been the sheer amount of data they generate. Potential overreach by law enforcement is held in check by two factors – the need to get court or administrative approval to trace metadata, and the ability to process said metadata. As a result, USPS insiders report that it processes about 15,000 – 20,000 mail covers a year related to crime, and as security researcher Chris Soghoian discovered, internet and telecommunications companies charge law enforcement agencies for pen registers, putting some practical limits on their use.
But the NSA surveillance of email and phone networks and the Mail Isolation Control and Tracking program have no such limits. While it’s likely quite expensive to scan all US mail, once you’ve committed to doing so, it’s comparatively cheap to store that information and analyze it at later dates, as investigators evidently did to arrest Shannon Richardson for sending ricin to President Obama and New York City mayor Bloomberg. And, since the costs of NSA surveillance are evidently borne primarily by internet and telephony companies, it’s downright cheap to keep metadata on email and phone calls. All the postal mail, email and phone calls.
It’s also much, much cheaper to analyze this data than in years past. The current frenzy for “big data” and “data science” has called attention to techniques that allow analysts to pull subtle patterns out of data – a New York Times story that suggests that retailer Target was able to identify pregnant customers based on their purchasing behavior (unscented lotion!) and target ad flyers to them gives a sense for the commercial applications of these techniques.
Sociologist Kieran Healy shows another set of applications of these techniques, using a much smaller, historical data set. He looks at a small number of 18th century colonists and the societies in Boston they were members of to identify Paul Revere as a key bridge tie between different organizations. In Healy’s brilliant piece, he writes in the voice of a junior analyst reporting his findings to superiors in the British government, and suggests that his superiors consider investigating Revere as a traitor. He closes with this winning line: “…if a mere scribe such as I — one who knows nearly nothing — can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.”
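To make the idea of a “bridge tie” concrete, here is a minimal sketch of that kind of analysis in Python. It is not Healy’s code or his data: the names, the clubs and the helper functions are all hypothetical, and the measure used (betweenness centrality, computed with Brandes’ method) is just one common way to score how often someone sits on the shortest path between other people. The only input is a list of who belongs to what.

```python
from collections import deque

# Hypothetical membership lists (person -> groups). Toy data, not Healy's.
memberships = {
    "alice": {"club_x"},
    "bob": {"club_x"},
    "carol": {"club_x", "club_y"},
    "dave": {"club_y"},
    "erin": {"club_y"},
}

def co_membership_graph(memberships):
    """Link two people whenever they share at least one group."""
    graph = {person: set() for person in memberships}
    for a in memberships:
        for b in memberships:
            if a != b and memberships[a] & memberships[b]:
                graph[a].add(b)
    return graph

def betweenness(graph):
    """Unweighted, undirected betweenness centrality (Brandes' method)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                           # nodes in the order BFS reaches them
        preds = {v: [] for v in graph}       # predecessors on shortest paths
        paths = {v: 0 for v in graph}        # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # first time we see w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):            # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each end.
    return {v: total / 2 for v, total in score.items()}

graph = co_membership_graph(memberships)
scores = betweenness(graph)
bridge = max(scores, key=scores.get)
print(bridge, scores[bridge])
```

In this toy network the one person who belongs to both clubs scores highest, because every shortest path between the two clubs runs through them. Nothing about what anyone said or wrote is needed to find that out.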
If you are a member of a secret organization planning the overthrow of the government, you’ve probably already thought hard about what your metadata might reveal. But if you’re an average citizen with “nothing to hide”, it may be less obvious why your metadata may not be something you are comfortable sharing. After all, Frank Rich recently proclaimed that “privacy jumped the shark in America long ago” and that we are all members of “the America that prefers to be out there, prizing networking, exhibitionism, and fame more than privacy, introspection, and solitude.” Lured by reality television and social networks, we all want to be watched and have therefore given up our distaste for surveillance.
I think it’s possible to be both a heavy user of social media, and concerned about the security of your metadata. It simply requires understanding that, for many of us, social media is a performance. When I share links on Twitter, I’m aware that I’m constructing an image for my followers as someone who’s interested in certain topics and uninterested in others. I don’t share every article that I read, both because I suspect not all are interesting to my followers and also because I don’t really want my professional community to know just how much mental energy I spend worrying about who the Green Bay Packers will field at running back in the coming season.
This may not be how you use social media, but it probably should be. As danah boyd and others have pointed out, youth have had to figure out how to navigate a world in which their interpersonal and social interactions are archived, searchable and persist long enough to present a problem in adulthood – as a result, they’re continually engaged in “identity performance”, as well as in developing codes and other ways to speak on social networks to defy monitoring.
By contrast, most of us aren’t maintaining a persistent, public performance when we’re using telephones or email. (For an example of what this might feel like, consider this story from This American Life, where lawyers who work with Guantanamo detainees talk about how having the US government monitor their personal phone calls changes their behavior.) Our metadata can reveal things we may not want to share with others, or may not know ourselves.
As it happens, I have a pretty good sense for what my email metadata might tell an investigator. This fall, I co-taught a class with Cesar Hidalgo, Catherine Havasi and Sep Kamvar at the Media Lab titled “Big Data”. Two of the students who took the class, Daniel Smilkov and Deepak Jagdish, worked on a project called Immersion which uses Gmail metadata to map someone’s social network. I’m one of about 500 alpha testers of the software, developed by Cesar, Daniel and Deepak, and have been one of the poster boys for the project as it’s been on display at the Media Lab, as I’ve got the largest network of Gmail contacts of anyone who’s used the system. (This isn’t because I’m especially popular, I suspect. Most of my MIT colleagues use mit.edu addresses. As someone new to MIT, who maintains a number of different affiliations, I have been a heavy Gmail user.)
The largest node in the graph, the person I exchange the most email with, is my wife, Rachel. I find this reassuring, but Daniel and Deepak have told me that people’s romantic partners are rarely their largest node. Because I travel a lot, Rachel and I have a heavily email-dependent relationship, but many people’s romantic relationships are conducted mostly face to face and don’t show up clearly in metadata. But the prominence of Rachel in the graph is, for me, a reminder that one of the reasons we might be concerned about metadata is that it shows strong relationships, whether those relationships are widely known or are secret.
The other large nodes on the graph are associated with specific clusters. Rebecca is my co-founder at Global Voices and Ivan and Georgia run the organization day-to-day – they dominate the green cluster, which includes key people in that organization. Hal is my chief collaborator at the Berkman Center, and Colin is my boss – they dominate the orange cluster, which includes fellow Berkman folks as well as a number of prominent internet law and policy folks who work closely with the Center. Lorrie is assistant director at Center for Civic Media and is the person I work with most closely at MIT – the red cluster represents the people I work with at the Media Lab.
Anyone who knows me reasonably well could have guessed at the existence of these ties. But there’s other information in the graph that’s more complicated and potentially more sensitive. My primary Media Lab collaborators are my students and staff – Cesar is the only Media Lab node who’s not affiliated with Civic who shows up on my network, which suggests that I’m collaborating less with my Media Lab colleagues than I might hope to be. One might read into my relationships with the students I advise based on the email volume I exchange with them – I’d suggest that the patterns have something to do with our preferred channels of communication, but it certainly shows who’s demanding and receiving attention via email. In other words, absence from a social network map is at least as revealing as presence on it.
Another sensitive piece of information comes from how Immersion draws and codes clusters. Immersion’s algorithm is sensitive to who you include on the same email. Global Voices emails include Ivan, Georgia, Rebecca and others – people who I email when I email those three get placed in the same cluster. People who exist as bridges between clusters are particularly interesting, as they are people who appear in multiple roles in your social network. Joi Ito appears on my graph twice (as “Joi” and “Joichi”) because he uses multiple email addresses, but in either role, he’s a bridge between my MIT existence, my Global Voices existence and my Berkman life, which reflects my long and multi-faceted relationship with him. But he’s colored red, as a Media Lab person, whereas other bridge figures like danah boyd show up as blue, as they have close relationships with Rachel as well. In other words, I have important, long-standing, multifaceted relationships with both danah and Joi, but danah is part of my family life as well, while Joi is not.
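I don’t know the details of Immersion’s clustering algorithm, so what follows is only a rough sketch of the general idea, written in Python: count how often two people appear on the same email, then group people who co-appear often enough. The recipient lists, the `min_shared` threshold and the function names are all assumptions for illustration, and connected components are a far cruder grouping than whatever Immersion actually does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical recipient lists, one per email. Only headers, no content.
messages = [
    ["ana", "ben", "cy"],
    ["ana", "ben"],
    ["ben", "cy"],
    ["ana", "cy"],
    ["dee", "eli"],
    ["dee", "eli"],
    ["cy", "dee"],   # a one-off email linking the two groups
]

def co_recipient_weights(messages):
    """Count how many emails each pair of people appears on together."""
    weights = Counter()
    for recipients in messages:
        for a, b in combinations(sorted(set(recipients)), 2):
            weights[(a, b)] += 1
    return weights

def clusters(weights, min_shared=2):
    """Group people connected by pairs that share at least min_shared emails."""
    neighbors = {}
    for (a, b), count in weights.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if count >= min_shared:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        group, todo = set(), [start]
        while todo:                      # walk everyone reachable from start
            person = todo.pop()
            if person in group:
                continue
            group.add(person)
            todo.extend(neighbors[person] - group)
        seen |= group
        groups.append(group)
    return groups

weights = co_recipient_weights(messages)
groups = clusters(weights)
print(groups)
```

With these toy headers, the single email between “cy” and “dee” is too weak to merge the two groups at the default threshold. Lower `min_shared` to 1 and they collapse into one cluster, which is a reminder that the picture such a tool paints depends heavily on choices the person being mapped never sees.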
My point here isn’t to elucidate all the peculiarities of my social network (indeed, analyzing these diagrams is a bit like analyzing your dreams – fascinating to you, but off-putting to everyone else). It’s to make the case that this metadata paints a very revealing portrait of oneself. And while there’s currently a waiting list to use Immersion, this is data that’s accessible to NSA analysts and to the marketing teams at Google. That makes me uncomfortable, and it makes me want to have a public conversation about what’s okay and what’s not okay to track.
While popular outcry over revelations about the NSA has been somewhat muted so far, it’s possible that widespread protests planned for July 4th will spark more dialog about what represents unconstitutional surveillance. Here’s hoping that conversation will take a close look at metadata and ask hard questions about whether or not this is information we are willing to share with governments and corporations, or whether we need to regulate and limit this power to monitor as we’ve historically done in the United States. Restore the Fourth.
For another example of what metadata may reveal, see Malte Spitz’s phone records. As I discuss in “Rewire”, Spitz sued his mobile phone provider to obtain his records, then worked with Zeit Online to build a visualization of his movements based purely on that set of data.
Swiss author and entrepreneur Rolf Dobelli recently published a provocative essay titled “News is Bad for You” in The Guardian. The essay describes news – particularly fast-breaking, rapidly updated news – as an addictive drug, inhibiting our thinking, damaging our bodies and wasting our time. Dobelli is so concerned with the negative effects of news that he’s cut himself off from consuming news for the past four years and urges that you do the same.
His arguments attracted angry responses within The Guardian’s newsroom. Madeleine Bunting writes, “As Dobelli described his four-year news purdah to a group of Guardian journalists last week, there was a sharp intake of collective breath, nervous laughter and complete astonishment. How could someone suggest such a thing to a journalist?”
I had a different reaction to Dobelli’s provocation. I found it pretty persuasive. I shared the article with students in a class I teach called “News and Participatory Media”, and asked the students for their reactions. Many found Dobelli’s case compelling, especially those students who were mid-career journalists. Much of what frustrated them about their profession was bluntly identified in Dobelli’s piece: too often, news is a set of disconnected snippets that promises to inform and empower, but merely entertains, distracts and ultimately misleads.
While Dobelli offers a persuasive set of problems, his proposed solution – stop reading news – strikes me as unhelpful and selfish. You personally may benefit from the time you reclaim in kicking the news habit, but there is likely a societal cost in encouraging people to opt out of consuming the news. A democratic form of government presumes an informed populace that can select appropriate representatives and identify issues that merit public debate. As Bunting notes in her response, a happy, docile and ill-informed citizenry is the precursor to a Huxleian vision of totalitarianism.
Dobelli might accept the accusations of selfishness. His essay is adapted from his new book, “The Art of Thinking Clearly”, which is an odd example of a self-help book. Deeply inspired by Nassim Taleb’s work linking cognitive science and economics, Dobelli outlines 99 cognitive shortcomings, errors and fallacies in an attempt either to steer us towards smarter decisionmaking or, more likely, to bludgeon us into a realization that human beings are pretty lousy at making rational decisions. By the end of the book, Dobelli admits that he rarely considers all these errors and fallacies in making decisions and simply goes with his gut – however, he wants us to have these tools handy for the really important decisions. Those decisions, his examples suggest, generally have to do with making investments as wisely as Warren Buffett or getting good deals on expensive cars. His is not a book about civics – it’s a book about maximizing your personal gains.
If we take Dobelli’s criticism seriously but reject his proposed solution, one next step is to look for ways to address the shortcomings of contemporary journalism. If we don’t like the sort of repetitive, click-seeking, shallow journalism that Pablo Boczkowski identifies in his book “News at Work”, we need to find ways to support “slow news” that focuses on investigation and contextualization of breaking news. If we are dismayed by how both new and old media got many details of the Boston Marathon bombing and the manhunt for the bombers wrong, we need either to slow newsrooms down, or to build better tools to help both newsrooms and readers cross-check and verify breaking news reports.
I can (and frequently do) point to people and projects focused on solving the problems Dobelli poses, but I’m left with two of his challenges that I can’t ignore or solve. They are related points: “News is irrelevant” and “News makes us passive.” These intertwined problems strike me as uncomfortably hard to address.