[Summary: Thinking aloud about open data and data standards as governance tools]
There are interesting shifts in the narratives of open data taking place right now.
Earlier this year, the Open Data Charter launched their new strategy, “Publishing with purpose”, situating it as a move on from the ‘raw data now’ days, when governments took an open data initiative to mean just publishing easy-to-open datasets online and linking to them from data catalogues.
The Open Contracting Partnership, which has encouraged governments to purposefully prioritise publication of procurement data for a number of years now, has increasingly been exploring how to design interventions so that they most effectively move from publication to use. The idea here is that we should spend more time with governments focussing on their use cases for data disclosure.
These shifts are welcome, and move us closer to understanding open data as strategy. However, there are also risks at play, and we need to take a critical look at how these approaches could or should play out.
In this post I introduce a few initial thoughts, recognising that these are as yet underdeveloped. The post is heavily influenced by a recent conversation convened by Alan Hudson of Global Integrity at the OpenGovHub, where we looked at the interaction of ‘(governance) measurement, data, standards, use and impact’.
(1) Whose purpose?
The call for ‘raw data now’ was not without purpose: but it was the purpose of particular groups of actors, not least semantic web researchers looking for a large corpus of data on which to test their methods. This call configured open data towards the needs and preferences of a particular set of (technical) actors, based on the theory that they would then act as intermediaries, creating a range of products and platforms to serve the purposes of other groups. That theory hasn’t delivered in practice: many datasets languish unused, and governments are left puzzled as to why the promised flowering of re-use has not occurred.
Purpose itself then needs unpacking. Just as early research into the open data agenda questioned how different actors’ interests may have been co-opted or subverted, we need to keep the question of ‘whose purpose’ central to the publish-with-purpose debate.
(2) Designing around users
The Sunlight Foundation recently published a write-up of their engagement with Glendale, Arizona on open data for public procurement. They describe a process that started with a purpose (“get better bids on contract opportunities”) and then engaged with vendors to discuss and test out the datasets that were useful to them. The resulting recommendations emphasise particular data elements that could be prioritised by the city administration.
Would Glendale have the same list of required fields if they had started by asking citizens about better contract delivery? Or if they had worked with government officials to explore the problems they face when judging how well a vendor will deliver? For example, the Glendale report doesn’t mention including supplier information and identifiers, which are central to many contract analysis and anti-corruption use cases.
If we see ‘data as infrastructure’, then we need to consider the appropriate design methods for user engagement. My general sense is that we are currently applying user-centred design methods that were developed to deliver consumer products to questions of public infrastructure, and that this carries some risks. Infrastructures differ from applications in their iterability, durability, embeddedness and reach. Premature optimisation for particular data users’ needs may make it much harder to meet the needs of other users in future.
I also have the concern (though, I should note, not one based in any way on the Glendale case) that user-centred design done badly can be worse than user-centred design not done at all. User engagement and research is a profession with its own deep skill set, just as work on technical architecture is, even if it looks at first glance easier to pick up and replicate. Learning from the successes, and failures, of integrating user-centred design approaches into bureaucratic contexts and government incentive structures needs to be taken seriously. A lot of this is about mapping the moments and mechanisms for user engagement (and remembering that, whilst it might help the design process to talk ‘user’ rather than ‘citizen’, sometimes decisions of purpose should be made at the level of the citizenry, not their user stand-ins).
(3) International standards, local adoption
(Open) data standards are a tool for data infrastructure building. They can represent a wide range of user needs to a data publisher, embedding requirements distilled from broad research, and they can support interoperability of data between publishers – unlocking cross-cutting use cases and creating the economic conditions for a marketplace of solutions built on the data. (They can, of course, also do none of these things, acting instead as interventions that configure data to the needs of a particular small user group.)
But in seeking to be generally usable, standards are generally not tailored to particular combinations of local capacity and need. (This pairing is important: if resource and capacity were no object, and each requirement of a standard were relevant to at least one user need, then there would be a case for simply implementing the complete standard. But that resource-unconstrained world is not one we often find ourselves in.)
How then do we secure the benefits of standards whilst sequencing the publication of data according to the resources available in a given context? This isn’t a solved problem: but in the mix are issues of measurement, indicators and incentive structures, as well as designing some degree of implementation levels and flexibility into standards themselves. Validation tools, guidance and templated processes also help to make sure data delivers the direct outcomes that might motivate an implementer, without cutting off indirect or alternative outcomes that have wider social value.
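To make the ‘implementation levels’ and validation point slightly more concrete, here is a minimal sketch (in Python, using the jsonschema library) of how a validator might check a record against a ‘basic’ tier of a standard. The schema, field names and tiering here are invented for illustration, not drawn from any real standard:

```python
# Minimal validation sketch. The schema, field names and 'basic' tier
# below are hypothetical, not taken from any real standard.
from jsonschema import Draft7Validator

BASIC_TIER_SCHEMA = {
    "type": "object",
    "required": ["id", "title", "value", "supplier"],  # hypothetical 'basic' implementation level
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "value": {"type": "number"},
        "supplier": {
            "type": "object",
            "required": ["name", "identifier"],
            "properties": {
                "name": {"type": "string"},
                "identifier": {"type": "string"},
            },
        },
    },
}

# A record a publisher might produce: the supplier block is missing,
# so the validator can point them at exactly what to add next.
record = {
    "id": "contract-001",
    "title": "Street lighting maintenance",
    "value": 125000,
}

for error in Draft7Validator(BASIC_TIER_SCHEMA).iter_errors(record):
    print(f"{list(error.absolute_path) or 'record'}: {error.message}")
```

Real standards ship far richer tooling than this, but the underlying pattern – a machine-checkable set of requirements that a publisher can build up through in stages – is part of what makes sequenced adoption practical.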
(I’m aware that I write this from a position of influence over a number of different data standards. So I have to also introspect on whether I’m just optimising for my own interests in placing the focus on standard design. I’m certainly concerned with the need to develop a clearer articulation of the interaction of policy and technical artefacts in this element of standard setting and implementation, in order to invite both more critique, and more creative problem solving, from a wider community. This somewhat densely written blog post clearly does not get there yet.)
Some preliminary conclusions
In thinking about open data as strategy, we can’t set fixed rules for the relative influence that ‘global’ or ‘local’ factors should have in any decision making. However, the following propositions might act as a starting point for decisions at different stages of an open data intervention:
- Purpose should govern the choice of dataset to focus on
- Standards should be the primary guide to the design of the datasets
- User engagement should inform the activities ‘on top of’ published data that secure prioritised outcomes
- New user needs should feed into standard extension and development
- User engagement should shape the initiatives built on top of data
Some open questions
- Are there existing theoretical frameworks that could help make more sense of this space?
- Which metaphors and stories could make this more tangible?
- Does it matter?
Hi Tim,
Thanks for this post, which resonates strongly with me given my previous work on IATI and now on the 360Giving Standard. When writing 360Giving’s first three-year strategy, I wanted to ensure that we built in use of, and feedback on, the dataset from the start – partly so our work would be both practical and relevant, and partly based on what I had learned about assessing the quality of IATI data.
We spent much of Year 1 focusing on getting more grants data published and ensuring the schema was being used correctly (and developing tools that help publishers do that automatically – it’s all about taking the pain out of publishing and removing manual errors). In Year 2 we continued to focus on getting more and better data released, but we also started using the data. This was important for testing our assumptions, learning what questions people ask, and seeing what does and doesn’t work when you try to visualise and analyse the data. It’s been an interesting process and we’ve learned a lot about where we need to focus our efforts in Year 3. This includes ensuring the schema is used consistently and correctly; looking at what other data is needed so the dataset remains relevant for our key audiences (particularly organisation identifiers); and building tools that help people to use it. We don’t have a theoretical framework for this, but we do have some metaphors and stories that might be of interest.
A key metaphor we use is “Publish; Use; Learn; Improve”. This is core to our strategy with the aim of creating a virtuous cycle around the release and use of more standardised grants data.
Here are some case studies that may be relevant:
1) Challenges with using 360Giving and Open Contracting data for the UK Ministry of Justice: http://www.threesixtygiving.org/2018/05/08/what-did-we-do-in-our-first-datadive/ (and linked to this, how do we identify a grant vs a contract: http://www.threesixtygiving.org/2017/07/14/when-is-a-grant-not-a-grant-and-why-should-we-care/).
2) Developing a decision tree for human rights funders to help them identify what data they can share responsibly: http://www.threesixtygiving.org/2018/02/28/sharing-data-responsibly-how-and-why-do-human-rights-funders-share-data/.
3) A practical example of using 360Giving data and what was learned: http://www.threesixtygiving.org/2018/03/05/letting-data-take-the-wheel/.
In the coming months we’ll be looking at organisation identifiers, with the aim of making it easier to identify who is funding the same organisations and to highlight gaps and overlaps. This is a primary user need for our audience but is proving a surprisingly thorny issue. We’re interested in speaking with anyone who’d like to contribute ideas to this discussion.
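As a rough illustration of why shared identifiers matter (in Python, with hypothetical field names rather than the actual 360Giving schema), a consistent organisation identifier is what lets grants from different funders be grouped to reveal overlaps:

```python
# Illustrative sketch only: field names are hypothetical, not the
# 360Giving schema. Group grants by a shared recipient organisation
# identifier to find organisations funded by more than one funder.
from collections import defaultdict

grants = [
    {"funder": "Funder A", "recipient_org_id": "GB-CHC-123456", "amount": 20000},
    {"funder": "Funder B", "recipient_org_id": "GB-CHC-123456", "amount": 5000},
    {"funder": "Funder A", "recipient_org_id": "GB-CHC-654321", "amount": 10000},
]

funders_by_recipient = defaultdict(set)
for grant in grants:
    funders_by_recipient[grant["recipient_org_id"]].add(grant["funder"])

for org_id, funders in sorted(funders_by_recipient.items()):
    if len(funders) > 1:
        print(f"{org_id} is funded by: {', '.join(sorted(funders))}")
```

Without a shared identifier scheme the same organisation tends to appear under slightly different names in each funder’s data, and this kind of join breaks down – which is part of what makes the issue so thorny in practice.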