Category Archives: Open Data

Can the G8 Open Data Charter deliver real transparency?

[Summary: cross-post of an article reflecting on the G8 Open Data Charter]

I was asked by The Conversation, a new journalism platform based around linking academic writers with professional journalists and editors, to put together a short article on the recent G8 Open Data Charter, looking at the potential for it to deliver on transparency. The result is now live over on The Conversation site, and pasted in below (under a Creative Commons license). 

Last week G8 leaders signed up to an Open Data Charter, calling for government datasets to be “open data by default”. Open data has risen up the government agenda in the UK over the last three years, with the UK positioning itself as a world leader. But what does the charter mean for G8 nations, and more broadly, will it deliver on the promise of economic impacts and improved governance through the open release of government data relating to matters such as crime figures, energy consumption and election results?

Open government data (OGD) has rapidly developed from being the niche interest of a small community of geeks to a high-profile policy idea. The basic premise of OGD is that when governments publish datasets online, in digital formats that can be easily imported into other software tools, and under legal terms that permit anyone to re-use them (including commercially), those outside government can use that data to develop new ideas, apps and businesses. It also allows citizens to better scrutinise government and hold authorities to account. But for that to happen, the kind of data released, and its quality, matter.

As the Open Knowledge Foundation outlined ahead of the G8 Summit in a release from its Open Data Census, “G8 countries still have a long way to go in releasing essential information as open data”. Less than 50% of the core datasets the census lists for G8 members are fully available as open data. And because open data is one of the most common commitments made by governments when they join the wider Open Government Partnership (OGP), campaigners want a clear set of standards for what makes a good open data initiative. The G8 Open Data Charter provides an opportunity to elaborate this. In a clear nod towards the OGP, the G8 charter states: “In the spirit of openness we offer this Open Data Charter for consideration by other countries, multinational organisations and initiatives.”

But can the charter really deliver? Russia, the worst scoring G8 member on the Open Data Census, and next chair of the G8, recently withdrew from the OGP, yet signed up to the Charter. Even the UK’s commitment to “open data by default” is undermined by David Cameron’s admission that the register of company beneficial ownership announced as part of G8 pledges on tax transparency will only be accessible to government officials, rather than being the open dataset campaigners had asked for.

The ability of Russia to sign up to the Open Data Charter is down to what Robinson and Yu have called the “Ambiguity of Open Government” — the dual role of open data as a tool for transparency and accountability, and as a tool for economic growth. As Christian Langehenke explains, Russia is interested in the latter, but was uncomfortable with the focus placed on the former in the OGP. The G8 Charter covers both benefits of open data but is relatively vague when it comes to the release of data for improved governance.

However, if delivered, the specific commitments made in the technical annexe to opening national election and budget datasets, and to improving their quality by December 2013, would signal progress for a number of states, Russia included. Elsewhere in the G8 communiqué, states also committed to publishing open data on aid to the International Aid Transparency Initiative standard, representing new commitments from France, Italy and Japan.

The impacts of the charter may also be felt in Germany and in Canada, where open data campaigners have long been pushing for greater progress on releasing datasets. Canadian campaigner David Eaves highlights in particular how the charter commitment to open specific “high value” datasets goes beyond anything in existing Canadian policy. Although the pressure of next year’s G8 progress report might not provide a significant stick to spur on action, the charter does give campaigners in Canada, Germany and other G8 nations a new lever in pushing for greater publication of data from their governments.

Delivering improved governance and economic growth will not come from the release of data alone. The charter offers some recognition of this, committing states to “work to increase open data literacy” and “encourage innovative uses of our data through the organisation of challenges, prizes or mentoring”. However, it stops short of considering other mechanisms needed to unlock the democratic and governance reform potential of open data. At best it frames data on public services as enabling citizens to “make better informed choices about the services they receive”, encapsulating a notion of citizen as consumer (a framing Jo Bates refers to as the co-option of open data agendas), rather than committing to build mechanisms for citizens to engage with the policy process, and thus achieve accountability, on the basis of the data that is made available.

The charter marks the continued rise of open data to becoming a key component of modern governance. Yet, the publication of open data alone stops short of the wider institutional reforms needed to deliver modernised and accountable governance. Whether the charter can secure solid open data foundations on which these wider reforms can be built is something only time will tell.

Geneva E-Participation Day: Open Data and International Organisations

[Summary: notes for a talk on open data and International Organisations]

In just over a week’s time I’ll be heading for Geneva to take part in Diplo Foundation’s ‘E-Participation Day: towards a more open UN?’ event. In the past I’ve worked with Diplo on remote participation, using the web to support live online participation in face-to-face meetings such as the Internet Governance Forum. This time I’ll be talking about open data – exploring the ways in which changing regimes around data stand to impact International Organisations. This blog post was written for the Diplo blog as an introduction to some of the themes I might explore.

The event will, of course, have remote participation – so you can register to join in-person or online for free here.

E-participation and remote hubs have the potential to open up dialogue and decision making. But after the conferences have been closed, and the declarations made, it is data that increasingly shapes the outcome of international processes. Whether it’s the numbers counted up to check on progress towards the millennium development goals, GDP percentage pledges on aid spending, or climate change targets, the outcomes of international co-operation frequently depend on the development and maintenance of datasets.

The adage that ‘you can’t manage what you can’t measure’ has relevance both for International Organisations and for citizens. The better the flows of data International Organisations can secure access to, the greater their theoretical capacity for co-ordination of complex systems. And the greater the flows of information from the internal workings of International Organisations that citizens, states and pressure groups can access, the greater their theoretical capacity to both scrutinise decisions and to get involved in decision making and implementation. I say theoretical capacity, because the picture is rarely that straightforward in practice. Yet, that complexity aside for a moment: over the last few years an idea has been gaining ground which, in some states, has led not only to a greater flow of data, but has driven a veritable flood – with hundreds of thousands of government datasets placed online for anyone to access and re-use. That idea is open data.

Open Data is a simple concept. Organisations holding datasets should place them online, in machine-readable formats, and under licenses that let anyone re-use them. Advocates explain that this brings a myriad of benefits. For example, rather than finance data being locked up in internal finance systems, only available to auditors, open data on budgets and spending can be published on the web for anyone to download and explore in their spreadsheet software, or to let third parties generate visualisations that show citizens where their money is being spent, and to help independent analysts look across datasets for possible inefficiency, fraud or corruption. Or instead of the location of schools or health centres being kept on internal systems, the data can be published to allow innovators to present it to citizens in new and more accessible ways. And in crisis situations, instead of co-ordinators spending days collecting data from agencies in the field and re-keying the data into central databases, if all the organisations involved were to publish open data in common formats, there is the possibility of it being aggregated together, building up a clearer picture of what is going on. One of the highest profile existing open data initiatives in the development field is the International Aid Transparency Initiative (IATI), which now has standardised open data from hundreds of donors, providing the foundation for a timely view of who is doing what in aid.

Open data ideas have been spreading rapidly across the world, with many states establishing national Open Government Data (OGD) initiatives, and International Organisations from The World Bank, to UN DESA, the OECD and the Open Government Partnership all developing conversations and projects around open data. When the G8 meet next week in Northern Ireland they are expected to launch an ‘Open Data Charter’ setting out principles for high quality open data, and committing states to publish certain datasets. Right now it remains to be seen whether open data will feature anywhere else in the G8 action plans, although there is clearly space for open data ideas and practices to be deployed in securing greater tax transparency, or supporting the ongoing monitoring of other commitments. In the case of the post-2015 process, a number of organisations have been advocating for an access to information focus, seeking to ensure citizens have access to open data that they can use to monitor government actions and hold governments to account on delivering on commitments.

However – as Robinson and Yu have highlighted – there can be an ambiguity of open government data: more open data does not necessarily mean more open organisations. The call for ‘raw data now’ has led to much open data emerging simply as an outbound communication, without routes for engagement or feedback, and no change in existing organisational practices. Rather than being treated as a reform that can enable greater organisational collaboration and co-ordination, many open datasets have just been ‘dumped’ on the web. In the same way that remote participation is often a bolt-on to meetings, without the deeper changes in process needed to make for equal participation for remote delegates, at best much open data only offers actors outside of institutions a partial window onto their operations, and at worst, the data itself remains opaque: stripped of context and meaning. Getting open data right for both transparency, and for transforming international collaboration needs more than just technology. 

As I explored with Jovan Kurbalija of Diplo in a recent webinar, there are big challenges ahead if open data is to work as an asset for development: from balancing tensions between standardisation and local flexibility, to developing true multi-stakeholder governance of important data flows, and getting the incentives for collaboration right. However, now is the time to be engaging with these challenges – within a window of energy and optimism, and before network effects lock in paradoxically ‘closed’ systems of open data. I hope the dialogue at the Geneva E-Participation day will offer a small chance to broaden open data understanding and conversations in a way that can contribute to such engagement.

Open data in extractives: meeting the challenges


There’s lots of interest building right now around how open data might be a powerful tool for transparency and accountability in the extractive industries sector. Decisions over where extraction should take place have a massive impact on communities and the environment, yet often decision making is opaque, with wealthy private interests driving exploitation of resources in ways that run counter to the public interest. Whilst revenues from oil, gas and mineral resources have the potential to be a powerful tool for development, with a proportion channelled into public funds, massive quantities of revenue frequently ‘go missing’, lost in corruption, and fuelling elements of a resource curse.

For the last ten years the Extractive Industries Transparency Initiative has been working to get companies to commit to ‘publish what they pay‘ to government, and for government to disclose receipts of finance, working to identify missing money through a document-based audit process. Campaigning coalitions, watchdogs and global initiatives have focussed on increasing the transparency of the sector. Now, with a recognition that we need to link together information on different resource flows for development at all levels, potentially through the use of structured open data, and with a “data tsunami” of new information on extractives financials anticipated from the Dodd-Frank act in the US, and similar regulation in Europe, groups working on extractives transparency have been looking at what open data might mean for future work in this area.

Right now, DFID are taking that exploration forward through a series of hack days with Rewired State under the ‘follow the data’ banner, with the first in London last weekend, and one coming up next week in Lagos, Nigeria. The idea of the events is to develop rapid prototypes of tools that might support extractives transparency, putting developers and datasets together over 24 hours to see what emerges. I was one of the judging panel at this weekend’s event, where the three developer teams that formed looked respectively at: making datasets on energy production and prices more accessible for re-use through an API; visualising the relationship between extractives revenues and various development indicators; and designing an interface for ‘nuggets’ of insight discovered through hack-days to be published and shared with useful (but minimal) meta-data.

In their way, these three projects highlight a range of the challenges ahead for the extractives sector in building capacity to track resource flows through open data:

  • Making data accessible – The APIfy project sought to take a number of available datasets and aggregate them together in a database, before exposing a number of API endpoints that made machine-readable standardised data available on countries, companies and commodities. By translating the data access challenge from one of rooting around in disparate datasets to one of calling a standard API for key kinds of ‘objects’, the project demonstrated the need developers often have for clear platforms to build upon. However, as I’ve discovered in developing tools for the International Aid Transparency Initiative, building platforms to aggregate together data often turns out to be a non-trivial project: technically (it doesn’t take long to get to millions of data items when you are dealing with financial transactions), economically (as databases serving millions of records to even a small number of users need to be maintained and funded), socially (developers want to be able to trust the APIs they build against to be stable, and outreach and documentation are needed to support developers to engage with an API), and in terms of information architecture (as design choices over a dataset or API can have a powerful effect on downstream re-users).
  • Connecting datasets – none of the applications from the London hack-day were actually able to follow resource flows through the available data. Although visions of a coherent datasphere, in which the challenge is just making the connection between a transaction in one dataset, and a transaction in another, to see where money is flowing, are appealing – traceability in practice turns out to be a lot harder. To use the IATI example again, across the 100,000+ aid activities published so far less than 1% include traceability efforts to show how one transaction relates to another, and even here the relationships exist in the data because of conscious efforts by publishers to link transaction and activity identifiers. In following the money there will be many cases where people have an incentive not to make these linkages explicit. One of the issues raised by developers over the hack-day was the scattered nature of data, and the gaps across it. Yet – when it comes to financial transaction tracking, we’re likely to often be dealing with partial data, full of gaps, and it won’t be easy to tell at first glance when a mis-match between incoming and outgoing finances is a case of missing data or corruption. Right now, a lot of developers attack open data problems with tools optimised for complete and accurate data, yet we need to be developing tools, methods and visualisation approaches that deal with partial and uncertain data. This is developed in the next point.
  • Correlation, causation and investigation – The Compare the Map project developed on the hack day uses “scraped data from GapMinder and EITI to create graphical tools” that allow a user to eye-ball possible correlations between extractives data and development statistics. But of course, correlation is not causation – and the kinds of analysis that dig deeper into possible relationships are difficult to work through on a hack day. Indeed, many of the relationships mash-ups of this form can show have been written about in papers that control for many more variables, dealing carefully with statistically challenging issues of missing data and imperfectly matched datasets. Rather than simple comparison visualisations that show two datasets side by side, it may be more interesting to look for all the possible statistically significant correlations in datasets with common reference points, and then to look at how human users could be supported in exploring, and giving feedback on, which of those might be meaningful, and which may or may not already be researched. Where research does show a correlation to exist, then using open data to present a visual narrative to users about this can have a place, though here the theory of change is very different – not about identifying connections – but about communicating them in interactive and engaging ways to those who may be able to act upon them.
  • Sharing and collaborating – The third project at the London hack-day was ‘Fact Cache‘ – a simple concept for sharing nuggets of information discovered in hack-day explorations. Often as developers work through datasets they may come across discoveries of interest, yet these are often left aside in the rush to create a prototype app or platform. Fact Cache focussed on making these shareable. However, when it was presented discussions also explored how it could make these nuggets of information into social objects, open to discussion and sharing. This idea of making open data findings more usable as social objects was also an aspect of the UN Global Pulse hunchworks project. That project is currently on hold (it would be interesting to know why…), but the idea of supporting collaboration around open data through online tools, rather than seeing apps that present data, or initial analysis as the end point, is certainly one to explore more in building capacity for open data to be used in holding actors to account.
  • Developing theories of change – as the judges met to talk about the projects, one of the key themes we looked at was whether each project had a clear theory of change. In some sense taken together they represent the complex chain of steps involved in an open data theory of change, from making data more accessible to developers, creating tools and platforms that let end users explore data, and then allowing findings from data to be communicated and to shape discourses and action. Few datasets or tools are likely to be change-making on their own – but rather can play a key role in shifting the balance of power in existing networks of organisations, activists, companies and governments. Understanding the different theories of change for open data is one of the key themes in the ongoing Open Data in Developing Countries research, where we take existing governance arrangements as a starting point in understanding how open data will bring about impacts.
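The aggregation step behind the first of these challenges can be sketched very roughly in code. The following is a hypothetical illustration (all field names and source labels are invented, not taken from the actual APIfy project) of the kind of normalisation an aggregator has to perform before it can expose standard ‘company’ objects through an API:

```python
# Hypothetical sketch: normalising records from disparate extractives
# datasets into a common 'company' object, as an APIfy-style aggregator
# might do before serving them via API endpoints. Field names invented.

def normalise_company(record, source):
    """Map a source-specific record onto a minimal common schema."""
    mappings = {
        "eiti": {"name": "company_name", "country": "country_code"},
        "usgs": {"name": "CompanyTitle", "country": "Nation"},
    }
    fields = mappings[source]
    return {
        "name": record[fields["name"]].strip(),
        "country": record[fields["country"]].upper(),
        "source": source,
    }

companies = [
    normalise_company({"company_name": "Acme Mining ", "country_code": "gh"}, "eiti"),
    normalise_company({"CompanyTitle": "Beta Oil", "Nation": "ng"}, "usgs"),
]
```

Even in this toy form, the design choices (which fields make the cut, how countries are coded, whose naming wins) show how an aggregator’s schema shapes what downstream re-users can do.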

In a complex world, access to data, and the capacity to use it effectively, are likely to be essential parts of building more accountable governance across a wide range of areas, including in the extractives industry. Although there are many challenges ahead if we are to secure the maximum benefits from open data for transparent and accountable governance, it’s exciting and encouraging to see so many passionate people putting their minds early to tackling them, and building a community ready to innovate and bring about change.

Note: The usage of ‘follow the data’ in this DFID project is distinct from the usage in the work I’m currently doing to explore ‘follow the data’ research methods. In the former, the focus is really on following financial and resource flows through connecting up datasets; in the latter the focus is on tracing the way in which data artefacts have been generated, deployed, transferred and used in order to understand patterns of open data use and impact.

 

Intelligent Impact: Evaluating an open data capacity building workshop with voluntary sector organisations

[Summary: sharing the evaluation report (9 pages, PDF) of an open data skills workshop for voluntary sector organisations]


Late last year, through the CSO network on the Open Government Partnership, I got talking with Deirdre McGrath of the Your Voice, Your City project about ways of building voluntary sector capacity to engage with open data. We talked about the possibility of a hack-day, but realised the focus at this stage needed to be on building skills, rather than building tools. It also needed to be on discovering what was possible with open data in the voluntary sector, rather than teaching people a limited set of skills. And as the Your Voice, Your City project was hosted within the London Voluntary Services Council (LVSC), an infrastructure organisation with a policy and research team, we had the possibility of thinking about the different roles needed to make the most of open data, and how a capacity building pilot could work both with frontline Voluntary and Community Sector (VCS) organisations, and an infrastructure organisation. A chance meeting with Nick Booth of podnosh gave form to a theme in our conversations about the need to focus on both ‘stats’ and ‘stories’, ensuring that capacity building worked with both quantitative and qualitative data and information. The result: plans for a short project, centred on a one-day workshop on ‘Intelligent Impact’, exploring the use of social media and open data for VCS organisations.

The day involved staff from VCS organisations coming along with questions or issues they wanted to explore, and then splitting into groups with a team of open data and social media mentors (Nick Booth, Caroline Beavon, Steven Flower, Paul Bradshaw and Stuart Harrison) to look at how existing online resources, or self-created data and media, could help respond to those questions and issues. Alex Farrow captured the story of the day for us using Storify and I’ve just completed a short evaluation report telling the story in more depth, capturing key learning from the event, and setting out possible next steps (PDF).

Following on from the event, the LVSC team have been exploring how a combination of free online tools for curating open data, collating questions, and sharing findings can be assembled into a low-cost and effective ‘intelligence hub‘, where data, analysis and presentation layers are all made accessible to VCS organisations in London.

Developing data standards for Open Contracting

logo-open-contractingContracts have a key role to play in effective transparency and accountability: from the contracts government sign with extractives industries for mineral rights, to the contracts for delivery of aid, contracts for provision of key public services, and contracts for supplies. The Open Contracting initiative aims to improve the disclosure and monitoring of public contracts through the creation of global principles, standards for contract disclosure, and building civil society and government capacity. One strand of work that the Open Contracting team have been exploring to support this work is the creation of a set of open data standards for capturing contract information. This blog post reports on some initial ground work designed to inform this strand of work.

Although I was involved in some of the set-up of this short project, and presented the outcomes at last week’s workshop, the bulk of the work was undertaken by Aptivate’s Sarah Bird.

Update: see also the report of the process here.

Update 2 (12th Sept 2013): Owen Scott has built on the pilot with data from Nepal.

The process

Developing standards is a complex process. Each choice made has implications: for how acceptable the standard will be to different parties; for how easy certain uses of the data will be; and for how extensible the standard will be, or which other standards it will easily align with. However, standards cannot easily be built up choice-by-choice from a blank slate, adopting the ideal at each step: they are generally created against a background of pre-existing datasets and standards. The Open Contracting data standards team had already gathered together a range of contract information datasets currently published by governments across the world, and so, with just a few weeks between starting this project and the data standards workshop on 28th March, we planned a five-day development sprint, aiming to generate a very draft first iteration of a standard. Applying an agile methodology, where short iterations are each designed to yield a viable product by the end, but with the anticipation that further early iterations may revise and radically alter it, meant we had to set a reasonable scope for this first sprint.

The focus then was on the supply side, taking a set of existing contract datasets from different parties, and identifying their commonalities and differences. The contract datasets selected were from the UK, USA, Colombia, Philippines and the World Bank. From looking at the fields these existing datasets had in common, an outline structure was developed, working on a principle of taking good ideas from across the existing data, rather than playing to a lowest common denominator. Then, using the International Aid Transparency Initiative activity standard as a basis, Sarah drafted a basic data structure, which can act as a version 0.01 standard for discussion. To test this, the next step was to convert samples from some of the existing datasets into this new structure, and then to analyse how much of the available data was covered by the structure, and how comprehensive the available data was when placed against the draft structure. (The technical approach taken, which can be found in the sprint’s GitHub repository, was to convert the different incoming data to JSON, and post it into a MongoDB instance for analysis).
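The two-way coverage analysis described above (how much of each source dataset fits the draft structure, and how much of the draft structure each source can fill) can be sketched as a simple set comparison. This is an illustrative reconstruction using invented field names, not the actual sprint code:

```python
# Illustrative sketch of the two coverage measures: for a converted
# dataset (the set of fields it populates) and the draft standard (the
# set of fields it defines), compute what share of the source data the
# standard captures, and what share of the standard the source fills.

def coverage(source_fields, standard_fields):
    matched = source_fields & standard_fields
    return {
        "source_covered": len(matched) / len(source_fields),    # data fitting the standard
        "standard_filled": len(matched) / len(standard_fields), # standard fields populated
    }

standard = {"title", "buyer", "supplier", "award_date", "amount", "currency"}
uk_sample = {"title", "buyer", "supplier", "amount", "local_flag_x"}

result = coverage(uk_sample, standard)
```

Fields like the hypothetical `local_flag_x` here stand in for the ‘left over’ specialised fields discussed later, which lower the source-coverage score without fitting the shared structure.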

We discuss the limitations of this process in a later section.

Initial results

The initial pass of data suggested a structure based on:

  • Organisation data – descriptions of organisations, held separately from individual contract information, and linked by a globally unique ID (based on the IATI Organisational ID standard)
  • Contract meta data – general information about the contract in question, such as title, classification, default currency and primary location of supply. Including an area for ‘line items’ of elements the contract covers.
  • Contract stages – a series of separate blocks of data for different stages of the contract, all contained within the overarching contract element.
    • Bid – key dates and classifications about the procurement stage of a contract process.
    • Award – details of the parties awarded the contract and the details of the award.
    • Performance – details of transactions (payments to suppliers) and work activities carried out during the performance of the contract.
    • Termination – details of the ending of the contract.
  • Documents – fields for linking to related documents.
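As a rough illustration (not the actual draft schema, and with invented field names and values), a single contract record under the structure outlined above might look something like this, with organisation data held separately and linked by a globally unique ID:

```python
# Illustrative-only sketch of a contract record following the outlined
# structure: organisation data separate, contract metadata, per-stage
# blocks, and linked documents. All names and values are invented.

organisations = {
    "GB-COH-1234567": {"name": "Example Supplies Ltd"},
}

contract = {
    "meta": {
        "title": "Office supplies framework",
        "currency": "GBP",
        "line_items": [{"description": "Paper", "quantity": 500}],
    },
    "stages": {
        "bid": {"published": "2012-10-01", "deadline": "2012-11-01"},
        "award": {"supplier": "GB-COH-1234567", "amount": 25000},
        "performance": {"transactions": [{"date": "2013-01-15", "amount": 5000}]},
        "termination": None,
    },
    "documents": [{"url": "http://example.org/contract.pdf"}],
}

# The organisation ID links the award stage back to the organisation record.
supplier_name = organisations[contract["stages"]["award"]["supplier"]]["name"]
```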

A draft annotated schema for capturing this data can be found in XML and JSON format here, and a high-level overview is also represented in the diagram below. In the diagrams that follow, each block represents one data point in the draft standard.

1-Phases

We then performed an initial analysis to explore how much of the data currently available from the sources explored would fit into the standard, and how comprehensively the standard could be filled from existing data. As the diagram below indicates, no single source covered all the available data fields, and some held no information on particular stages of the contracting process at all. This may be down to different objectives of the available data sources, or deeper differences in how organisations handle information on contracts and contracting workflows.

2-Coverage

Combining the visualisations above into a single view gives a sense of which data points in the draft standard have greatest use, illustrated in the schematic heat-map below.

3-Heatmap

At this point the analysis is very rough-and-ready, hence the presentation of a rough impression, rather than detailed field-by-field analysis. The last thing to check was how much data was ‘left over’ and not captured in the standard. This was predominantly the case for the UK and USA datasets, where many highly specialised fields and flags were present in the dataset, indicating information that might be relevant to capture in local contract datasets, but which might be harder to find standard representations for across contracts.

4-Extra

The next step was to check whether data that could go into the same fields could be easily harmonised, since the existence of organisation details, dates, or classifications of contracts across different datasets does not necessarily mean these are interoperable. Fields like dates and financial amounts appeared to be relatively easy to harmonise, but some elements present greater challenges, such as organisational identifiers, contact people, and the various codelists in use. However, some code-lists may be possible to harmonise: for example, when the ‘Category’ classifications from across datasets were translated, grouped and aggregated, up to 92% of the original data in a sample was retained.
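The code-list harmonisation step can be sketched as a mapping from source-specific category labels onto shared groups, with ‘retention’ measured as the share of records whose label could be mapped. This is a toy reconstruction with invented labels, not the actual categories or mapping used in the sprint:

```python
# Hedged sketch of code-list harmonisation: translate source-specific
# category labels into shared groups, then measure what fraction of
# the original records survive the mapping. Labels are invented.

category_map = {
    "roadworks": "infrastructure",
    "highways": "infrastructure",
    "medical supplies": "health",
    "pharma": "health",
}

records = ["roadworks", "pharma", "highways", "catering", "medical supplies"]

# Records with unmapped labels (here, "catering") are dropped.
grouped = [category_map[c] for c in records if c in category_map]
retention = len(grouped) / len(records)  # fraction of data retained
```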

5-Sum and Group

Implications, gaps, next steps

This first iteration provides a basis for future discussions. There are, however, some important gaps. Most significant of all is that this initial development has been supply-side driven, based around the data that organisations are already publishing, rather than developed on the basis of the data that civil society organisations, or scrutiny bodies, are demanding in order to make sense of complex contract situations. It also omits certain kinds of contracts, such as complex extractives contracts (on which, see the fantastic work Revenue Watch have been doing with getting structured data from PDF contracts with Document Cloud), and Public Private Partnership (PPP) contracts. And it has not delved deeply into the data structures needed for properly capturing information that can aid in monitoring contract performance. These gaps will all need to be addressed in future work.

At the moment, this stands as a discrete project, and no next steps have been agreed as far as I’m aware. However, some of the ideas explored in the meeting on the 28th included:

  • A next iteration – focussed on the demand side – engaging potential users of contracts data to work out how data needs to be shaped, and what needs to be in a standard to meet different data re-use needs. This could build towards version 0.02.
  • Testing against a wider range of datasets – either following, or in parallel with, a demand-driven iteration, to discover how the work done so far evolves when confronted with a larger set of existing contract datasets to synthesise.
  • Connecting with other standards. This first sprint took the IATI Standard as a reference point. There may be other standards to refer to in development. Discussions on the 28th with those involved in other standards highlighted an interest in more collaborative working to identify shared building blocks or common elements that might be re-used across standards, and to explore the practical and governance implications of this.
  • Working on complementary building blocks of a data standard – such as common approaches to identifying organisations and parties to a contract; or developing tools and platforms that will aggregate data and make data linkable. The experience of IATI, Open Spending and many other projects appears to be that validators, aggregation platforms and data-wrangling tools are important complements to standards for supporting effective re-use of open data.

Keep an eye on the Open Contracting website for more updates.

Open Data for Poverty Alleviation: Striking Poverty Discussion


[Summary: join an open discussion on the potential impacts of open data on poverty reduction]

Over the next two weeks, along with Tariq Kochar, Nitya V. Raman and Nathan Eagle, I’m taking part in an online panel hosted by the World Bank’s Striking Poverty platform to discuss the potential impacts of open data on poverty alleviation.

So far we’ve been asked to provide some starting statements on how we see open data and poverty relating, and now there’s an open discussion where visitors to the site are invited to share their questions and reflections on the topic.

Here’s what I have down as my opening remarks:

Development is complex. No individual or group can process all the information needed to make sense of aid flows, trade patterns, government budgets, community resources and environmental factors (amongst other things) that affect development in a locality. That’s where data comes in: open datasets can be connected, combined and analysed to support debate, decision making and governance.

Projects like the International Aid Transparency Initiative (IATI) have sought to create the technical standards and political commitments for effective data sharing. IATI is putting together one corner of the poverty reduction jigsaw, with detailed and timely forward-looking information on aid. IATI open data can be used by governments to forecast spending, and by citizens to hold donors to account. This is the promise of open data: publish once, use many times and for many purposes.

But data does not use itself. Nor does it transcend political and practical realities. As the papers in a recent Journal of Community Informatics special issue show, open data brings both promise and perils. Mobilising open data for social change requires focus and effort.

We’re only at the start of understanding open data impacts. In the upcoming Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project, the Web Foundation and partners will be looking at how open data affects governance in different countries and contexts across the world. Rather than look at open data in the abstract, the project will explore cases such as open data for budget monitoring in Brazil, or open data for poverty reduction in Uganda. This way it will build up a picture of the strategies that can be used to make a difference with data; it will analyse the role that technologies and intermediaries play in mobilising data; and it will also explore unintended consequences of open data.

I hope in this discussion we can similarly focus on particular places where open data has potential, and on the considerations needed to ensure the supply and use of open data has the best chance possible of improving lives worldwide.

What do you think? You can join the discussion for the next two weeks over on the Striking Poverty site…

Linked-Development: notes from Research to Impact at the iHub

[Summary: notes from a hackathon in Nairobi built around linked open data]

I’ve just got back from an energising week exploring open data and impact in Kenya, working with R4D and IDS at Nairobi’s iHub to run a three-day hackathon titled ‘Research to Impact’. You can read Pete Cranston’s blog posts on the event here (update: and iHub’s here). In this post, after a quick preamble, I reflect particularly on working with linked data as part of the event.

The idea behind the event was fairly simple: lots of researchers are producing reports and publications related to international development, and these are logged in catalogues like R4D and ELDIS, but often it stops there, and research doesn’t make it into the hands of those who can use it to bring about economic and social change. By opening up the data held on these resources, and then working with subject experts and developers, we were interested to see whether new ideas would emerge for taking research to where it is needed.

The Research to Impact hack focused in on ‘agriculture and nutrition’ research so that we could spend the first day working with a set of subject experts to identify the challenges research could help meet, and to map out the different actors who might be served by new digital tools. We were hosted for the whole event at the inspiring iHub and mLab venue by iHub Research. iHub provides a space for the growing Kenya tech community, acting as a meeting space, incubator and workspace for developers and designers. With over 10,000 members of its network, iHub also helped us to recruit around 20 developers who worked over the second two days of the hackathon to build prototype applications responding to the challenges identified on day one, and to the data available from R4D and IDS.

A big focus of the hackathon development turned out to be on mobile applications, as in Kenya mobile phones are the primary digital tool for accessing information. On day four, our developers met again with the subject experts, and pitched their creations to a judging panel, who awarded first, second and third prizes. Many of the apps created had zeroed in on a number of key issues: working through intermediaries (in this case, the agricultural extension worker), rather than trying to use tech to entirely disintermediate information flows; embedding research information into useful tools, rather than providing it through standalone portals (for example, a number of teams built apps which allowed extension workers to keep track of the farmers they were interacting with, and that could then use this information to suggest relevant research); and, most challengingly, the need for research abstracts and descriptions to be translated into easy-to-understand language that can fit into SMS-size packages. Over the coming weeks IDS and R4D are going to be exploring ways to work with some of the hackathon teams to take their ideas further.

Linked-development: exploring the potential of linked data

The event also provided us with an opportunity to take forward explorations of how linked data might be a useful technology in supporting research knowledge sharing. I recently wrote a paper with Duncan Edwards of IDS exploring the potential of linked data for development communication, and I’ve been exploring linked data in development for a while. However, this time we were running a hackathon directly from a linked data source, which was a new experience.

Ahead of the event I set up linked-development.org as a way to integrate R4D data (already available in RDF) and ELDIS data (which I wrote a quick scraper for), both modelled using the FAO’s AGRIS model. In order to avoid having to teach SPARQL for access to the data, I also (after quite a steep learning curve) put together a very basic Puelia Linked Data API implementation over the top of the data. To allow for a common set of subject terms between the R4D and ELDIS data, I made use of the Maui NLP indexer to tag ELDIS agriculture and nutrition documents against the FAO’s Agrovoc (R4D already had editor-assigned terms against this vocabulary), giving us a means of accessing the documents from the two datasets alongside each other.

The potential value of this approach became clear on the first day of the event, when one of the subject experts showed us their own repository of Kenyan-focussed agricultural research publications and resources, which was already modelled and theoretically accessible as RDF using the AGRIS model. Although our attempts to integrate this into our available dataset failed, due to the Drupal site serving the data hitting memory limits (linked data still tends to need a lot of server power thrown at it, which can have significant impacts where the relative cost of hosting and tech capacity is high), the potential to bring more local content into linked-development.org alongside data from R4D and ELDIS was noted by many of the developers taking part as something likely to make their applications a lot more successful and useful: ensuring that the available information is built around users’ needs, not around organisational or project boundaries.

At the start of the developer days, we offered a range of ways for developers to access the research meta-data on offer. We highlighted the linked data API, the ELDIS API (although it only provided access to one of the datasets, I found it would be possible for us to create a compatible API speaking to the linked data in future), and SPARQL as means to work with the data. Feedback forms from the event suggest that formats like JSON were new to many of our participants, and linked data was a new concept to all. However, in the end, most teams chose to use some of the prepared SPARQL queries to access the data, returning results as JSON into PHP or Python. In practice, over the two days this did not end up realising the full value of linked data, as teams generally appeared to use code samples to pull SPARQL ‘SELECT’ result sets into relational databases, and then to build their applications from there (a common issue I’ve noted at hack days, where the first step of developers is to take data into the platform they use most). However, a number of teams were starting to think about how they could use more advanced queries or direct access to the linked data through code libraries in future, and most strikingly, were talking about how they might be able to write data back to the linked-development.org data store.
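The pattern most teams used can be sketched as follows: send a prepared SPARQL SELECT query to the endpoint, and flatten the standard SPARQL 1.1 JSON results format into plain Python dicts. The query, document URI and canned response below are illustrative, not taken from the actual linked-development.org store; at the event the JSON would come back over HTTP from the SPARQL endpoint.

```python
import json

# A prepared query of the kind handed out to teams (illustrative).
QUERY = """
SELECT ?doc ?title WHERE {
  ?doc <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

# Canned SPARQL 1.1 JSON results payload standing in for the HTTP response,
# so this sketch runs offline.
RESPONSE = json.loads("""{
  "head": {"vars": ["doc", "title"]},
  "results": {"bindings": [
    {"doc":   {"type": "uri", "value": "http://example.org/doc/1"},
     "title": {"type": "literal", "value": "Improving maize yields"}}
  ]}
}""")

def rows(response):
    """Flatten SPARQL JSON bindings into simple {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in response["results"]["bindings"]]

for row in rows(RESPONSE):
    print(row["doc"], "-", row["title"])
```

Flattening the bindings like this is exactly the step that lets developers drop the results straight into a relational database or application model, which is convenient, but also why the graph structure of the underlying linked data tends to get discarded along the way.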

This struck me as particularly interesting. Many of the problems teams faced in creating their applications stemmed from the fact that the research meta-data available was not customised to agricultural extension workers or farmers. Abstracts would need to be re-written and translated. Good quality information needed to be tagged. New classifications of the resources were needed, such as tagging research that is useful in the planting season. Social features on mobile apps could help discover who likes what and could be used to rate research. However, without a means to write back to the shared data store, all this added value will only ever exist in the local and fragmented ecosystems around particular applications. Getting feedback to researchers about whether their research was useful was also high on the priority list of our developers: yet without somewhere to put this feedback, and a commitment from upstream intermediaries like R4D and ELDIS to play a role feeding back to authors, this would be very difficult to do effectively.

This links to one of the points that came out in our early IKM Emergent work on linked data: the relatively high costs and complexity of the technology, and the way in which servers and services are constructed, may lead to an information environment dominated by those with the capacity to publish. But the technology also has the potential, with the right platforms, configurations and outreach, to bring about a more pluralistic space, where the annotations from local users of information can be linked with, and equally accessible as, the research meta-data coming from government funded projects. I wish we had thought about this more in advance of the hackathon, and provided each team with a way to write data back to the linked-development.org triple store (e.g. giving them named graphs to write to, and providing some simple code samples or APIs), as I suspect this would have opened up a whole new range of spaces for innovation.
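The write-back idea could be sketched like this: each team gets its own named graph in the shared store, and their annotations (translated abstracts, ratings, season tags) are written in with SPARQL 1.1 Update requests. All graph and property URIs below are hypothetical, and the function only builds the update string; in a real deployment it would be POSTed to the store’s update endpoint with appropriate per-graph access control.

```python
def insert_annotation(graph_uri, doc_uri, predicate, value):
    """Build a SPARQL 1.1 Update that writes one annotation triple
    into a team's named graph (hypothetical URIs throughout)."""
    return (
        f"INSERT DATA {{ GRAPH <{graph_uri}> {{\n"
        f'  <{doc_uri}> <{predicate}> "{value}" .\n'
        f"}} }}"
    )

update = insert_annotation(
    "http://linked-development.org/graph/team-a",    # hypothetical graph
    "http://example.org/doc/1",                      # illustrative document
    "http://example.org/ns/plantingSeasonRelevant",  # hypothetical property
    "true",
)
print(update)
```

Because each team writes to its own named graph, the shared store can keep upstream meta-data from R4D and ELDIS cleanly separated from locally contributed annotations, while still allowing queries that join across both.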

Overall though, the linked-development.org prototype appears to have done some useful work, not least providing a layer to connect two DFID funded projects working on mobilising research. I hope it is something we can build upon in future.

Final papers in JCI Special Issue on Open Data

Earlier this year I blogged about the first release of papers on Open Data in a Special Issue of the Journal of Community Informatics that I had been co-editing with Zainab Bawa. A few days ago we added the last few papers to the issue, finalising it as a collection of critical thinking about the development of Open Government Data.

You can find the full table of contents below (new papers noted with (New)).

Table of Contents

Editorial

The Promises and Perils of Open Government Data (OGD), Tim G. Davies, Zainab Ashraf Bawa

Two Worlds of Open Government Data: Getting the Lowdown on Public Toilets in Chennai and Other Matters, Michael Gurstein

Articles

The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment, Bhuvaneswari Raman

“This is what modern deregulation looks like” : co-optation and contestation in the shaping of the UK’s Open Government Data Initiative, Jo Bates

Data Template For District Economic Planning, Sharadini Rath

Guidelines for Designing Deliberative Digital Habitats: Learning from e-Participation for Open Data Initiatives, Fiorella De Cindio

(New) Unintended Behavioural Consequences of Publishing Performance Data: Is More Always Better?, Simon McGinnes, Kasturi Muthu Elandy

(New) Open Government Data and the Right to Information: Opportunities and Obstacles, Katleen Janssen

Notes from the field

Mapping the Tso Kar basin in Ladakh, Shashank Srinivasan

Collecting data in Chennai City and the limits of openness, Nithya V Raman

Apps For Amsterdam, Tom Demeyer

Open Data – what the citizens really want, Wolfgang Both

(New) Trustworthy Records and Open Data, Anne Catherine Thurston

(New) Exploring the politics of Free/Libre/Open Source Software (FLOSS) in the context of contemporary South Africa; how are open policies implemented in practice?, Asne Kvale Handlykken

Points of View

Some Observations on the Practice of “Open Data” As Opposed to Its Promise, Roland J. Cole

How might open data contribute to good governance?

[Summary: sharing an introductory article on open data and governance]

Thanks to an invite via the great folk at CYEC, earlier this year I was asked to write a contribution for the Commonwealth Governance Handbook around emerging technology trends, so I put down a few thoughts on how open data might contribute to good governance in a Commonwealth context. The book isn’t quite out yet, but as I’m preparing for the next few days I’ll be spending at an IDRC Information and Networks workshop with lots of open access advocates, talking about open data and governance, I thought I should at least get a pre-print uploaded. So here is the PDF for download.

The article starts:

Access to information is increasingly recognised as a fundamental component of good governance. Citizens need access to information on the decision-making processes of government, and on the performance of the state to be able to hold governments to account.

And ends by saying:

Whether open data initiatives will fully live up to the high expectations many have for them remains to be seen. However, it is likely that open data will come to play a part in the governance landscape across many Commonwealth countries in coming years, and indeed, could provide a much needed tool to increase the transparency of Commonwealth institutions. Good governance, pro-social and civic outcomes of open data are not inevitable, but with critical attention they can be realised.

The bit in-between tries to provide a short introduction to open data for beginners, and to consider some of the ways open data and governance meet, drawing particularly on examples from the Commonwealth.

Comments and feedback welcome.

Download paper: PDF (128Kb)

Opening the National Pupil Database?

[Summary: some preparatory notes for a response to the National Pupil Database consultation]

The Department for Education are currently consulting on changing the regulations that govern who can gain access to the National Pupil Database (NPD). The NPD holds detailed data on every student in England, going back over ten years, and covering topics from test and exam results, to information on gender, ethnicity, first language, eligibility for free school meals, special educational needs, and detailed information on absences or school exclusion. At present, only a specified list of government bodies are able to access the data, with the exception that it can be shared with suitably approved “persons conducting research into the educational achievements of pupils”. The DFE consultation proposes opening up access to a far wider range of users, in order to maximise the value of this rich dataset.

The idea that government should maximise the value of the data it holds has been well articulated in the open data policies and white paper, which suggest open data can be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement”. However, the open data movement has always been pretty unequivocal in the claim that ‘personal data’ is not ‘open data’ – yet the DFE proposals seek to apply an open data logic to what is fundamentally a personal, private and sensitive dataset.

The DFE is not, in practice, proposing that the NPD is turned into an open dataset, but it is consulting on the idea that it should be available not only for a wider range of research purposes, but also to “stimulate the market for a broader range of services underpinned by the data, not necessarily related to educational achievement”. Users of the data would still go through an application process, with requests for the most sensitive data subject to additional review, and users agreeing to hold the data securely. But the data, including easily de-anonymised individual-level records, would still be given out to a far wider range of actors, with increased potential for data leakage and abuse.

Consultation and consent

I left school in 2001 and further education in 2003, so as far as I can tell, little of my data is captured by the NPD – but, if it were, it would have been captured based not on my consent to it being handled, but simply on the basis that it was collected as an essential part of running the school system. The consultation documents state that “The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices”. Yet it would be hard to argue this constitutes informed consent for the data to now be shared with commercial parties for uses far beyond the delivery of education services.

In the case of the NPD, it would appear particularly important to consult with children and young people on their views of the changes – as it is, after all, their personal data held in the NPD. However, the DFE website shows no evidence of particular efforts being taken to make the consultation accessible to under-18s. I suspect a carefully conducted consultation with diverse groups of children and young people would be very instructive in guiding decision making in the DFE.

The strongest argument for reforming the current regulations in the consultation document is that, in the past, the DFE has had to turn down requests to use the data for research which appears to be in the interests of children and young people’s wellbeing. For example, “research looking at the lifestyle/health of children; sexual exploitation of children; the impact of school travel on the environment; and mortality rates for children with SEN”. It might well be that, consulted on whether they would be happy for their data to be used in such research, many children, young people and parents would be happy to permit a wider wording of the research permissions for the NPD, but I would be surprised if most would happily consent to just about anyone being able to request access to their sensitive data. We should also note that, whilst some of the research requests DFE has turned down sound compelling, this does not necessarily mean the research could not happen in any other way: nor that it could not be conducted by securing explicit opt-in consent. Data protection principles that require data to only be used for the purpose it was collected cannot just be thrown away because they are inconvenient, and even if consultation does highlight that people may be willing to allow some wider sharing of their personal data for good, it is not clear this can be applied retroactively to data already collected.

Personal data, state data, open data

The NPD consultation raises an important issue about the data that the state has a right to share, and the data it holds in trust. Aggregate, non-disclosive information about the performance of public services is data the state has a clear right to share and is within the scope of open data. Detailed data on individuals that it may need to collect for the purpose of administration, and generating that aggregate data, is data held in trust – not data to be openly shared.

However, there are many ways to aggregate or process a dataset – and many different non-personally identifying products that could be built from one. Many of these, government will never have the need to create – yet they could bring social and economic value. So perhaps there are spaces to balance the potential value in personally sensitive datasets with the necessary primacy of data protection principles.

Practice accommodations: creating open data products

In his article for the Open Data Special Issue of the Journal of Community Informatics I edited earlier this year, Rollie Cole talks about ‘practice accommodations’ between open and closed data. Getting these accommodations right for datasets like the NPD will require careful thought and could benefit from innovation in data governance structures. In early announcements of the Public Data Corporation (now the Public Data Group and Open Data User Group), there was a description of how the PDC could “facilitate or create a vehicle that can attract private investment as needed to support its operations and to create value for the taxpayer”. At the time I read this as exploring the possibility that a PDC could help private actors with an interest in public data products that were beyond the public task of the state, but were best gathered or created through state structures, to pool resources to create or release this data. I’m not sure that’s how the authors of the point intended it, but the idea potentially has some value around the NPD. For example, if there is a demand for better “demographic models [that can be] used by the public and commercial sectors to inform planning and investment decisions” derived from the NPD, are there ways in which new structures, perhaps state-linked co-operatives, or trusted bodies like the Open Data Institute, can pool investment to create these products, and to release them as open data? This would ensure access to sensitive personal data remained tightly controlled, but would enable more of the potential value in a dataset like NPD to be made available through more diverse open aggregated non-personal data products.

Such structures would still need good governance, including open peer-review of any anonymisation taking place, to ensure it was robust.

The counter argument to such an accommodation might be that it would still stifle innovation, by leaving some barriers to data access in place. However, the alternative – of DFE staff assessing each application for access to the NPD, and having to decide whether a commercial re-use of the data is justified and whether the requestor has adequate safeguards in place to manage the data effectively – also involves barriers to access, and involves more risk, so the counter argument may not take us that far.

I’m not suggesting this model would necessarily work – but I introduce it to highlight that there are ways to increase the value gained from data without just handing it out in ways that inevitably increase the chance it will be leaked or mis-used.

A test case?

The NPD consultation presents a critical test case for advocates of opening government data. It requires us to articulate more clearly the different kinds of data the state holds, to be much more nuanced about the different regimes of access that are appropriate for different kinds of data, and to consider the relative importance of values like privacy over ideas of exploiting value in datasets.

I can only hope DFE listen to the consultation responses they get, and give their proposals a serious rethink.


Further reading and action: Privacy International and Open Rights Group are both preparing group consultation inputs, and welcome input from anyone with views or expert insights to offer.