Towards the principles of open government data

The goal of this weekend’s Open Government Principles workshop at O’Reilly and Associates was to draft a set of principles to define what constitutes open government data. The people drafting these principles were, for the most part, activists who believe that widespread sharing and creative presentation of government data can create a better-informed citizenry. In other words, they’re data junkies – the perfect folks to create a demanding list of what geeks, journalists and the citizens they serve need to access government data as easily as possible.

I got a sense for the importance of the task talking with Dan O’Neil, who is the “people person” for Everyblock, a remarkable project headed by Adrian Holovaty and designed to be a “one-stop shop” for information about urban neighborhoods, including building permits, crime reports, planned improvements and school information. Dan’s job is to negotiate with government officials in the twenty cities Everyblock seeks to map, and gain access to vast geocoded data sets. Armed with a set of principles and best practices that government geeks can show to their bosses, he would find that job a lot easier than it is right now.

My colleagues offered a tight definition of what constitutes open government data:

Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely
Data is made available as quickly as necessary to preserve the value of the data.

4. Accessible
Data is available to the widest range of users for the widest range of purposes.

5. Machine processable
Data is reasonably structured to allow automated processing.

6. Non-discriminatory
Data is available to anyone, with no requirement of registration.

7. Non-proprietary
Data is available in a format over which no entity has exclusive control.

8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
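
To make principle 5 a little more concrete, here is a minimal sketch of what "automated processing" might look like when a city publishes building permits as plain CSV. The sample data, column names and the `permits_by_neighborhood` function are all invented for illustration; they are not drawn from any real city's feed or from Everyblock.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a building-permit feed. The columns and values
# are assumptions made up for this example.
SAMPLE = """permit_id,issued,neighborhood,type
1001,2007-12-03,Logan Square,renovation
1002,2007-12-03,Pilsen,new construction
1003,2007-12-04,Logan Square,demolition
"""

def permits_by_neighborhood(text):
    """Count permits per neighborhood from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["neighborhood"] for row in reader)

counts = permits_by_neighborhood(SAMPLE)
print(counts.most_common())
```

Because the data is structured, a few lines are enough to summarize it; the same records published as a scanned PDF would need manual re-keying before anyone could do this.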

All those qualifications were the subject of substantial discussion, some of which is ongoing on a wiki, which you’re welcome to contribute to. It was a much faster process to draft an introduction – a mini-manifesto of sorts – which reads in part:

The Internet is the public space of the modern world, and through it governments now have the opportunity to better understand the needs of their citizens and citizens may participate more fully in their government. Information becomes more valuable as it is shared, less valuable as it is hoarded. Open data promotes increased civil discourse, improved public welfare, and a more efficient use of public resources.

The definition will surely evolve, especially as we get input from people who make government policy decisions on matters of data access and security. And there are a couple of questions that couldn’t be addressed in the course of a weekend meeting.

One concerns how broad the definition of “government data” should be. If it includes all data paid for by public funds, then a call for open data has substantial overlap with the Open Access Movement, which seeks to unlock scholarly materials published in licensed journals and make those materials available under less arduous licenses, in part to share scholarly research with people in developing nations. (Much of the scholarship Open Access seeks to unlock is produced with government funding – OA advocates argue that research paid for by public funds needs to be broadly available to the public.) While it would be exciting to see solidarity between these movements, that definition is probably broader than what most of the people in the room were considering when they thought about government data.

A second concern regards non-digital data. The principles above apply to data that’s available in a digital form – they don’t apply to the vast stacks of paper records most governments have accumulated, or obsolete media like inaccessible computer tapes or disks. Ideally, governments will begin to make this material available, but there are unanswered questions of costs incurred during digitization and the priority of bringing old records online. There’s a danger that keeping records in analog format will become a way to avoid digital scrutiny. Before dismissing this as absurd, keep in mind that the current US administration evidently does not use the White House email system for fear of subpoena, and uses laptops issued by the RNC to keep their proceedings from public scrutiny. At some point, a statement of open data principles will need to address the desirability of ensuring that government data becomes digital as soon as reasonably feasible.

My concerns aside, these principles are a very useful first step in encouraging governments around the world to make their data accessible and usable by activists and citizens. I hope they generate a good deal of discussion and are adopted widely.


9 Responses to Towards the principles of open government data

  4. quixote says:

    M-yes. But most people are not data geeks. Maximum granularity is good, but without (very brief!) summaries, graphs, and maps most people won’t be able to do a thing with the data. Unless the government also undertakes to provide that layer, as unbiased as possible and with links to criticisms or other interpretations, all it really accomplishes is provide a data source either for commercial outfits or for interested parties.

    Then the poor non-datageek will need a course of advanced study just to figure out how to find any usable information in the welter of web sites … kind of like the situation we have now.

  5. Once this gets going, and I believe it will, another principle will be required –

    * discoverable

    It’s easy to take for granted the capacity of search engines to find content, but when the content is vast tables of data finding the stuff that’s important can be hard.

    As for the non-datageeks, I don’t believe they need worry, the datageeks will build and share the tools you need. If your need is worthy enough.

  9. Pingback: DXO Intranet » Blog Archive » Ethan Zuckerman: Towards the principles of open government data
