The complexity of sharing scientific databases

Creative Commons is a clever use of the copyright system, intended to make it easier for people to share their work with others if they want to. Jonathan Coulton has used Creative Commons to enable an army of remixers and videomakers to produce promotional materials for his songs and albums. Authors like Dan Gillmor and Cory Doctorow have used Creative Commons to let people download, translate and make audio versions of their books. And Global Voices uses Creative Commons so that blogs and news sites can use our content without asking us for permission.

What about scientists?

That’s the research interest of my colleague Melanie Dulong de Rosnay. She’s using her time as a Berkman fellow to study alternative copyright systems and their usage and relevance within academic and library communities. Yesterday, Melanie presented research on the licensing of scientific databases and the obstacles such licensing presents to collaboration between scientists around the world.

Under US law, pretty much anything you write down is copyrighted. Scrawl an original note on a napkin and it’s protected until 70 years after your death. Facts, however, are another matter – they can’t be copyrighted. So while trivial but creative scribblings are copyrighted unless you choose to release them into the public domain, the information painstakingly discovered about the human genome – DNA sequences, for instance – isn’t. But the containers that information is stored in – the databases – can be copyrighted.

If I sound confused about this stuff, that’s because I am. And so were the folks at Science Commons, the project that spun off from Creative Commons to focus on open publishing of scientific information. For a couple of years, they offered a wonderfully complex FAQ on applying Creative Commons licenses to databases – the first question read “Can a Creative Commons license be applied to a database?” After a six-paragraph answer to that question, the third question read, “So, a Creative Commons license can be applied to a database?”

Science Commons is taking a different approach now: they recommend a protocol that specifies how data can be made Open Access. The FAQ on that protocol explains that the complexities of asking scientists to release their data under Creative Commons licenses were so severe that Science Commons has ended up advocating that data be released into the public domain, under the auspices of their protocol, instead.

This question of complexity is what Melanie’s research has focused on. She looked at the terms of use for roughly 200 databases necessary for work in the life sciences. Evaluating the terms on all those databases, she discovered that only 7 met her stringent definitions of Open Access to data – these databases could be accessed without registration; they could be downloaded for local use; they could be incorporated into other works; they had clear, understandable terms of use. This last factor proved to be the most challenging. She spent hours reading these terms with other experts in the field and discovered that, a great deal of the time, the experts disagreed on what was permitted under a specific agreement.
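For the programmers in the audience, those four criteria amount to a checklist where every box has to be ticked. Here’s a rough sketch of that idea in Python. To be clear, this is only an illustration: the class, the field names and the two sample entries are all hypothetical, invented for this example, and have nothing to do with how Melanie actually coded her data.

```python
from dataclasses import dataclass


@dataclass
class DatabaseTerms:
    """Hypothetical record of one database's terms of use."""
    name: str
    no_registration_required: bool   # can be accessed without registering
    downloadable: bool               # can be downloaded for local use
    reusable_in_other_works: bool    # can be incorporated into other works
    clear_terms: bool                # terms are clear and understandable


def meets_open_access(db: DatabaseTerms) -> bool:
    """A database qualifies only if all four criteria hold."""
    return (
        db.no_registration_required
        and db.downloadable
        and db.reusable_in_other_works
        and db.clear_terms
    )


# Made-up entries, purely for illustration; not real databases.
examples = [
    DatabaseTerms("example-db-a", True, True, True, True),
    DatabaseTerms("example-db-b", True, True, True, False),
]

open_ones = [db.name for db in examples if meets_open_access(db)]
print(open_ones)
```

The fourth field is the catch. The first three can usually be answered by reading the terms, but whether the terms are "clear" is exactly the question on which the experts kept disagreeing, so a tidy true/false value there hides most of the real difficulty.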

The reason this is important, Melanie explains, is that scientific research proceeds more quickly when researchers can share resources. But with databases encumbered by different, confusing legal protections, it can become a legal nightmare for researchers to do complex work, such as building new tools that combine information from two databases in a novel way. And databases that are protected by access restrictions can be out of reach to scientists in developing nations who might not have the financial or technical resources to access them.

I was particularly intrigued by a comment from John Wilbanks, who runs the Science Commons project. He points out that a project like the database work Science Commons and Melanie are undertaking is basically one that seeks to make a cultural change, encouraging scientists to share data while retaining citation credit. In some scientific communities – particle physics, for instance – this is standard practice. In others – microbiology – it’s quite uncommon. Wilbanks suggests that this has something to do with the economics of the fields. There are only a few supercolliders, and physicists have to share them, while there are lots of bacteria out there.

I’m glad that researchers like Melanie are digging into these issues. I have a great deal of respect for anyone willing to take on the task of understanding these labyrinthine, illogical and extremely important systems… and a great deal of gratitude that I don’t do research in these areas myself… :-)
