The goal of this weekend’s Open Government Principles workshop at O’Reilly and Associates was to draft a set of principles to define what constitutes open government data. The people drafting these principles were, for the most part, activists who believe that widespread sharing and creative presentation of government data can create a better-informed citizenry. In other words, they’re data junkies – the perfect folks to create a demanding list of what geeks, journalists and the citizens they serve need to access government data as easily as possible.
I got a sense of the importance of the task talking with Dan O’Neil, who is the “people person” for Everyblock.com, a remarkable project headed by Adrian Holovaty designed to be a “one-stop shop” for information about urban neighborhoods, including building permits, crime reports, planned improvements, school information, etc. Dan’s job is to negotiate with government officials in the twenty cities Everyblock seeks to map and gain access to vast geocoded data sets. If he were armed with a set of principles and best practices that government geeks could show to their bosses, his job would be a lot easier than it is right now.
My colleagues offered a tight definition of what constitutes open government data:
Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data is made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data is available to the widest range of users for the widest range of purposes.
5. Machine processable
Data is reasonably structured to allow automated processing.
6. Non-discriminatory
Data is available to anyone, with no requirement of registration.
7. Non-proprietary
Data is available in a format over which no entity has exclusive control.
8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
All those qualifications were the subject of substantial discussion, some of which is ongoing on a wiki, which you’re welcome to contribute to. It was a much faster process to draft an introduction – a mini-manifesto of sorts – which reads in part:
The Internet is the public space of the modern world, and through it governments now have the opportunity to better understand the needs of their citizens and citizens may participate more fully in their government. Information becomes more valuable as it is shared, less valuable as it is hoarded. Open data promotes increased civil discourse, improved public welfare, and a more efficient use of public resources.
The definition will surely evolve, especially as we get input from people who make government policy decisions on matters of data access and security. And there are a couple of questions that couldn’t be addressed in the course of a weekend meeting.
One concerns how broad the definition of “government data” should be. If it includes all data paid for by public funds, then a call for open data has substantial overlap with the Open Access Movement, which seeks to unlock scholarly materials published in licensed journals and make those materials available under less arduous licenses, in part to share scholarly research with people in developing nations. (Much of the scholarship Open Access seeks to unlock is produced with government funding – OA advocates argue that research paid for by public funds needs to be broadly available to the public.) While it would be exciting to see solidarity between these movements, that definition is probably broader than what most of the people in the room were considering when they thought about government data.
A second concern regards non-digital data. The principles above apply to data that’s available in a digital form – they don’t apply to the vast stacks of paper records most governments have accumulated, or obsolete media like inaccessible computer tapes or disks. Ideally, governments will begin to make this material available, but there are unanswered questions of costs incurred during digitization and the priority of bringing old records online. There’s a danger that keeping records in analog format will become a way to avoid digital scrutiny. Before dismissing this as absurd, keep in mind that the current US administration evidently does not use the White House email system for fear of subpoena, and uses laptops issued by the RNC to keep their proceedings from public scrutiny. At some point, a statement of open data principles will need to address the desirability of ensuring that government data becomes digital as soon as reasonably feasible.
My concerns aside, these principles are a very useful first step in encouraging governments around the world to make their data accessible and usable by activists and citizens. I hope they generate a good deal of discussion and are adopted widely.