John Wilbanks on Science Commons, and generativity in science

John Wilbanks, the founder of Science Commons, is in the midst of a big move. His division of Creative Commons, focused on opening scientific research and innovation, is now five years old and is being “airlifted” to California to try to bring some of its ideas into the Creative Commons movement as a whole.

One way to think of the mission of Science Commons, Wilbanks tells us, is to spark generative effects in the scientific world much as we’ve seen them in the online world. He quotes Jonathan Zittrain’s definition of generativity, from “The Future of the Internet… and How to Stop It”: “Generativity is a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences”. This raises some provocative questions when applied to the world of science: “What does spam look like in a patent system? What does griefing look like in the world of biological data?”

The truth is that the scientific world is far less generative than the digital space. He proposes three major obstacles to generativity: accessibility, ease of mastery, and transferability. He points out that, as science has gotten more high tech, it has become far harder to master. The result is hyperspecialization: neuroanatomists don’t talk to neuroinformaticists… “and god help you if you cross species lines.” And so universities are making huge investments to try to encourage collaboration: MIT has just built a $400 million building – the Cook Center – to force collaboration between cancer researchers… and predictably, researchers are fighting the mandate to move in and work together.

People approach Science Commons based on encountering Wikipedia or free software, and say, “We want ‘that’ for science.” Unfortunately, there’s not enough analysis of what makes those projects work. We’d think that science is the perfect space for this sort of peer cooperation, based on Robert Merton’s observation: “I propose the seeming paradox that in science, private property is established by having its substance freely given to others…” Scientists face a complex game-theoretic problem with new research: they’ve got to disclose to get credit, but as they disclose, they enable competitors in the field. The assumption that it is easier to build a commons in science than in the cultural space might not be true.

Because John comes from a legal background (and from the Berkman Center and Creative Commons), he tends to think about legal constraints and protections in the field. He asks us to consider three aspects of the world of science: data, tools and texts.

Texts are protected by copyrights, and they’re actually pretty simple. The system is near universal, and as Creative Commons has demonstrated, it’s invertible through legal tools like Creative Commons licenses.

Tools are as broad as ice cores from the Arctic Circle, bones from an archeological dig or stem cells. Contracts between institutions – materials transfer agreements – govern their movement, and patents govern them, especially in the life sciences and the energy field.

Data is protected by secrecy and sui generis protection laws. Fortunately, copyright doesn’t attach to raw data in databases, but there are legal tools we need to unlock other aspects of databases, including structure and compatibility.

While these constraints and protections are relevant, Wilbanks tells us that we also need to deal with “the three I’s”: incentives, infrastructures and institutions. Collectively, they work together to slow down adoption of open policies far more than laws do.

For years, the NIH has had a voluntary public access policy, asking researchers to make their text accessible for free on the web within 12 months. 4% of researchers did. Recently, NIH mandated compliance, and now compliance is over 70% and rising. While this only affects NIH-funded research (hugely important to US biological research, but less relevant in other spaces), institutions like Harvard and MIT are adopting open access publishing policies that mandate this behavior.

“The tools we use to open literature from copyrights don’t work in databases – we need different institutions.” Complicated data that isn’t correctly annotated isn’t helpful. And we need to focus on bottom-up resistance: the fact that there’s no incentive or mandate to label or format your data. In the publication and text space, we have university and funder participation. We don’t have any of this in the data space.

The tools space involves “physical objects with physical existence – they are not subject to long tail, everything is free realities” of the digital world. And here the incentives work directly against us: “You don’t get funded by giving away your stem cells – the opposite, you get more funding for writing papers only you can write” because you’re the only guy with access to the tools. And so “the resistance is fractal – it shows up in the same form at high and low levels.” And it represents a huge barrier – he shows us the patent workflow for telomerase. “Any new field with VC’s involved has a patent explosion underway. It’s not just US – China is outpacing us 8 to 1 in filing patents around clean energy.”

We need to consider the problems that we’ll face with an explosion of data. He quotes Bruce Sterling: “We used to produce data faster than humans could structure it. Now we produce data faster than machines can structure it.” Students are now assigned to develop a microbial portrait of a street corner – they swab lampposts and garbage bins and are able to get the organisms sequenced in a weekend. “We need a domain name system for data if web effects come into being.” Until we get to a strong enough web infrastructure, we won’t be able to get these positive effects.

Science Commons works on developing new types of contracts, but Wilbanks tells us that the heart of the work is lobbying people to use them and tracking the extent to which they’re used. “Measurements lead to incentives in science – everyone wants to maximize on that metric”. The other major change that’s going to push the field forward is the emergence of new technologies, which may put forward new norms.

“Science is heading (back) towards the garage”. You can buy a gene sequencer on eBay for less than $1000 – which allows you to go from the physical to the digital. And you can get your novel genes synthesized on sites like Mr. Gene for $0.39 per base pair, bringing the digital into the physical. Citizen bioengineers are figuring out how to reprogram E. coli into novel organisms that can detect arsenic in water. You can download tools from a database at MIT and synthesize your new creations at the site of your choice.

“We’re trying to let the explosion of creativity occur,” much as it did in the online space. “The evil people are going to use these tools anyway. If only the bad guys and the government have access, my money’s on the bad guys.”

“We need to decide whether we’re going to have a PC or Tivo for science,” Wilbanks says in closing. The PC model is uglier, but it’s far more generative and creative and it’s what we want to embrace in the long term.

A partial account of the questions and answers:

Salil Vadhan wonders how these ideas apply to his field, Computer Science. Wilbanks points out that there’s a spectrum of fields with respect to their openness, roughly spanning from math, physics and computer science (where the fields tend to be extremely open) to chemistry, which tends to be extremely closed. In fields that are highly specialized, the consequence is that it’s much harder to make useful abstractions. In open fields like math, Wilbanks tells us that Hippocratic principles prevent him from getting involved.

Eric von Hippel wonders about the constraints that the patent system puts on generativity. Wilbanks tells us that patents tend to be used like trading cards. Massive, mechanized disclosure is going to change this up. The pharma industry did systematic disclosure as a way to prevent patenting of genomes. “Data as prior art, channeled correctly, makes it harder to get an idiotic patent.”

I asked where Wilbanks would put pressure if he were starting the battle for scientific generativity today, independent of Creative Commons or his legal background. He explains that law is actually a pretty useful place to work from – “we don’t take research money, create text, tools or data,” which means we’re perceived as being fairly neutral. And law is an essential component. But the lever is possibly better placed at the point of funder relationships. Funders can demand that you don’t just release your data – you annotate it. You make sure your tools are accessible. And Creative Commons is a great convener to help funders figure out how to do this, with the legal bits disappearing into the framework.

This turns into a discussion of norms and law – Wilbanks quotes a colleague, who’s got the brilliant observation: “Norms scale far better than the law.”

David Weinberger asks his preferred trick question about the organization of the data we’re talking about releasing. Wilbanks correctly identifies it as “a religious question” and offers “the view of my sect”. That view is that we need standard formats like OWL and RDF, but that we need to let people form their own ontologies and let them fight it out in the marketplace of ideas. He’s suspicious of assertions by people (sometimes people involved with the semantic web) who push forward one particular ontology. This resolution is going to be messy… we need to lower transaction costs to allow people to wire big data sets together.

There are excellent conversations as well about privacy and incentives – check David Weinberger’s blog for what turn out to be very thorough notes, especially on the Q&A.
