[Summary: What do Internet Governance and Open Data have to do with each other?]
A proposal I worked on for a workshop at this year's Internet Governance Forum, on the Internet Governance issues of Open Government Data, has been accepted, so I've been starting to think through the different issues that the background paper for that session will need to cover. This week I took advantage of a chance to guest blog over on the Commonwealth IGF website to start setting them out.
It started with high profile Open Government Data portals like Data.gov in the US, and Data.gov.uk in the UK, giving citizens access to hundreds of government datasets. Now, open data has become a key area of focus for many countries across the world, forming a core element of the Open Government Partnership agenda, and sparking a plethora of international conferences, events and online communities. Proponents of open data argue it has the potential to stimulate economic growth, promote transparency and accountability of governments, and to support improved delivery of public services. This year’s Internet Governance Forum in Baku will see a number of open data focussed workshops, following on from open data and PSI panels in previous years. But when it comes to Open Data and Internet Governance, what are the issues we might need to explore? This post is a first attempt to sketch out some of the possible areas of debate.
In 2009 David Eaves put forward ‘three laws of open government data’ that describe what it takes for a dataset to be considered effectively open. They boil down to requirements that data should be accessible online, machine readable, and under licenses that permit re-use. Exploring these three facets of open data offers one route into the potential internet governance issues that need to be critically discussed if the potential benefits of open data are to be secured in equitable ways.
1) Open Data as data accessible online
Online accessibility does not equate to effective access, and we should be attentive to new data divides. We also need to address bandwidth for open data, the design of open data platforms, cross-border cloud hosting of open data, and to connect open data and internet freedom issues. Furthermore, the online accessibility of public data may create or compound privacy and security issues that need addressing.
Underlying the democratic arguments for open data is the idea that citizens should have access to any data that affects their lives, to be able to use and analyse it for themselves, to critique official interpretations, and to offer policy alternatives. Economic growth arguments for open data often note the importance of a reliable, timely supply of data on which innovative products and services can be built. But being able to use data for democratic engagement, or to support economic activity, is not just a matter of having the data – it also requires the skills to use it. Michael Gurstein has highlighted the risk that open data might ‘empower the empowered’, creating a new ‘data divide’. Addressing grassroots skills to use data, ensuring countries have the capacity to exploit their own national open data, and identifying the sorts of intermediary institutions and capacity building needed to ensure citizens can make effective use of open data are key challenges.
There are also technical dimensions to the data divide. Many open data infrastructures have developed in environments of virtually unlimited bandwidth, and are based on the assumption that transferring large data files is not problematic: an assumption that cannot be made everywhere in the world. Digital interfaces for working with data often rely on full size computers, and large datasets can be difficult to work with on mobile platforms. As past IGF cloud computing discussions have highlighted, where data is hosted may also matter. Placing public data into cloud hosting (albeit openly licensed, so sidestepping some of the legal issues) could have impacts on the accessibility of, and the costs of access to, that data. How far this becomes an issue may depend on the scale of open data programmes, which as yet can only constitute a very small proportion of Internet traffic in any country. However, when data that matters to citizens is hosted in a range of different jurisdictions, Internet Freedom and filtering issues may have a bearing on who really has access to open data. As Walid Al-Saqaf’s powerful presentation at the Open Government Partnership highlighted, openness in public debate can be dramatically restricted when governments have arbitrary Internet filtering powers.
Last, but not least, among the data accessibility issues: whilst most advocates of open data explicitly state that they are concerned only with public data, and exclude personal data from the discussion, the boundaries between these two categories are often blurred (for example, court records are about individuals, but might also be a matter of public record). With many independently published open datasets based on aggregated or anonymised personal data, and with large-scale datasets harvested from social media and held by companies, ‘jigsaw identification’, in which machines can infer potentially sensitive and personal facts about individuals by piecing datasets together, becomes a concern. As Cole outlines, in the past we have dealt with some of these concerns through ad-hoc limitations and negotiated access to data. Unrestricted access to open data online removes these strategies, and highlights the importance of finding other solutions that protect key dimensions of individual privacy.
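The ‘jigsaw’ risk can be illustrated with a small, entirely invented example: two datasets that look innocuous on their own re-identify an individual once joined on shared quasi-identifiers (all names, fields and records below are fictional):

```python
# Minimal sketch of 'jigsaw identification': two datasets that are
# harmless in isolation can re-identify individuals when joined on
# shared quasi-identifiers. All data here is invented for illustration.

# An 'anonymised' open dataset: names removed, quasi-identifiers kept.
health_records = [
    {"postcode": "OX1 2JD", "birth_year": 1974, "sex": "F", "condition": "diabetes"},
    {"postcode": "CB2 1TN", "birth_year": 1982, "sex": "M", "condition": "asthma"},
]

# A separately published public dataset (e.g. an electoral register).
public_register = [
    {"name": "A. Example", "postcode": "OX1 2JD", "birth_year": 1974, "sex": "F"},
    {"name": "B. Sample", "postcode": "SW1A 1AA", "birth_year": 1990, "sex": "M"},
]

def link(records, register, keys=("postcode", "birth_year", "sex")):
    """Join two datasets on their shared quasi-identifier columns."""
    index = {tuple(r[k] for k in keys): r["name"] for r in register}
    return [
        {**rec, "name": index[tuple(rec[k] for k in keys)]}
        for rec in records
        if tuple(rec[k] for k in keys) in index
    ]

for match in link(health_records, public_register):
    print(match["name"], "->", match["condition"])  # A. Example -> diabetes
```

Neither dataset names a condition and a person together, yet the join does – which is exactly why ad-hoc access restrictions have historically served as a privacy safeguard.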
2) Open data as machine readable
Publishing datasets involves selecting formats and standards, which shape what the data can express and how it can be used. Standard setting can have profound political consequences, yet it is often treated as a purely technical issue.
Standards are developing for everything from public transport timetables (GTFS), to data on aid projects (IATI). These standards specify the format data should be shared in, and what the data can express. If open data publishers want to take advantage of particular tools and services, they may be encouraged to choose particular data standards. In some areas, no standards exist, and competing open and non-open standards are developing. Sometimes, because of legacy systems, datasets are tied into non-open standards, creating a pressure to develop new open alternatives.
Some data formats offer more flexibility than others, but usually with a connected increase in complexity. The common CSV format for flat data, accessible in spreadsheet software, does not make it easy to annotate or extend standardised data to cope with local contexts. eXtensible Markup Language (XML) makes extending data easier, and Linked Data offers the possibility of annotating data, but these formats often present barriers for users without specialist skills or training. As a whole web of new standards, code lists and identifiers is developed to represent growing quantities of open data, we need to ask who is involved in setting standards, and how we can make sure that global standards for open data promote, rather than restrict, the freedom of local groups to explore and address the diverse issues that concern them.
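The flexibility trade-off can be made concrete with a toy example: the same (invented) transport record held first as a flat CSV row, then restructured so there is room for local annotation alongside the standardised core. All field names here are made up for illustration, not taken from any real standard:

```python
# Illustrative only: the same record in flat CSV versus a richer,
# extensible structure. Field names are invented for the example.
import csv
import io
import json

# Flat CSV: easy to open in spreadsheet software, but there is no
# obvious place to add local context without breaking the agreed
# column layout that every other consumer expects.
csv_text = "stop_id,stop_name,lat,lon\nS001,Market Square,52.2053,0.1218\n"
row = next(csv.DictReader(io.StringIO(csv_text)))

# A structured format can carry the standardised core plus local
# annotations side by side, at the cost of added complexity for users.
extended = {
    **row,
    "local_annotations": {
        "accessibility_note": "step-free access",  # locally relevant detail
        "source": "community survey",
    },
}
print(json.dumps(extended, indent=2))
```

The richer structure keeps the shared fields intact while letting a local group record what matters to them – but it also assumes tooling and skills beyond a spreadsheet, which is precisely the barrier noted above.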
3) Open data as licensed for re-use
Many use cases for open data rely on the ability to combine datasets, and this makes compatible licenses a vital issue. In developing license frameworks, we should engage with debates over who benefits from open data, and how norms and licenses can support community claims to benefit from their data.
Open Source and Creative Commons licenses often include terms such as a requirement to ‘Share Alike’, or a Non-Commercial clause prohibiting profit-making use of the content. These place restrictions on re-users of the content: for example, if you use Share Alike licensed content in your work, you must share your work under the same license. However, open data advocates argue that terms like this quickly create challenges for combining different datasets, as differently licensed data may be incompatible, and many of the benefits of having access to the data will be lost when it can’t be mashed up and remixed using both commercial and non-commercial tools. The widely cited OpenDefinition.org states that, at most, licenses can require attribution of the source, but cannot place any other restrictions on data re-use. Developing a common framework for licensing has been a significant concern in many past governance discussions of open data.
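The compatibility problem can be sketched in code. This is a deliberately simplified toy: the license names are real Creative Commons ones, but the rules below are a rough caricature for illustration, not a statement of actual license terms or legal advice:

```python
# Toy model (simplified, invented rules) of why differing license terms
# complicate mashing up open datasets.
LICENSES = {
    "CC-BY": {"share_alike": False, "non_commercial": False},
    "CC-BY-SA": {"share_alike": True, "non_commercial": False},
    "CC-BY-NC": {"share_alike": False, "non_commercial": True},
}

def can_combine(a, b):
    """Crude check of whether two datasets can be combined into one work."""
    if a == b:
        return True
    share_alike = [lic for lic in (a, b) if LICENSES[lic]["share_alike"]]
    if share_alike:
        # A Share Alike dataset forces the combined work under its own
        # license, so the other dataset must permit relicensing under it:
        # attribution-only terms do, NC or a different SA license do not.
        other = b if share_alike[0] == a else a
        return not (LICENSES[other]["share_alike"]
                    or LICENSES[other]["non_commercial"])
    # Attribution-only and NC terms can stack without direct conflict.
    return True

print(can_combine("CC-BY", "CC-BY-SA"))     # True
print(can_combine("CC-BY-SA", "CC-BY-NC"))  # False
```

Even in this caricature, each extra condition shrinks the set of datasets a given dataset can legally be remixed with – which is why attribution-only licensing is often advocated for open data.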
These discussions of common licenses connect to past Access to Knowledge (A2K) debates over the rights of communities to govern access to traditional knowledge, or to gain a return from its use. An open licensing framework creates the possibility that, without a level playing field of access to resources to use data (i.e. with data divides), some powerful actors might exploit open data to their advantage, and to the loss of those who have stewarded that data in the past. Identifying community norms, and other responses to address these issues, is an area for discussion.
I’ve tried to set out some of the areas where debates on open data might connect with existing or emerging internet governance debates. In the workshop I’m planning for this year's IGF I am hoping we will be able to dig into these issues in more depth, to identify how far they are issues for the IGF, or for other fora, and to develop ideas on different constructive approaches to support equitable outcomes from open data. I’m sure the issues above don’t cover all those we might address, so do drop in a comment below to share your suggestions for other areas we need to discuss…
(Other suggested references welcome too…)
Addition: over on the CIGF post Andrew has already suggested an extra reference to Tom Slee’s thought-provoking blog post on ‘Seeing like a geek’, which emphasises the importance of putting licensing issues very much on the table in governance debates.