[Summary: Brief notes exploring a strategic and service-based approach to improve IATI data quality]
Filed under: rough ideas
At the International Aid Transparency Initiative (IATI) Technical Advisory Group meeting (#tag2015) in Ottawa last week I took part in two sessions exploring the need for Application Programming Interfaces (APIs) onto IATI data. It quickly became clear that there were two challenges to address:
(1) Many of the questions people around the table were asking were complex queries, not the simple data retrieval kinds of questions that an API is well suited to;
(2) ‘Out of the box’ IATI data is often not able to answer the kinds of questions being asked, either because
- (a) the quality and consistency of data from distributed sources means that there are a range of special cases to handle when performing cross-donor analysis;
- (b) the questions asked invite additional data preparation, such as currency conversion, or identifying a block of codes that relate to a particular sector (.e.g. identifying all the Water and Sanitation related codes)
These challenges also underlie the wider issue explored at TAG2015: that even though five years of effort have gone into data supply, few people are actually using IATI data day-today.
If the goal of the International Aid Transparency Initiative as a whole, distinct from the specific goal of securing data, is more informed decision making in the sector, then this got me thinking about the extent to which what we need right now is a primary focus on services rather than data and tools. And from that, thinking about whether intelligent funding of such services could lead to the right kinds of pressures for improving data quality.
Improving data through enquiries
Using any dataset to answer complex questions takes both domain knowledge, and knowledge of the data. Development agencies might have lots of one-off and ongoing questions, from “Which donors are spending on Agriculture and Nutrition in East Africa?”, to “What pipeline projects are planned in the next six months affecting women and children in Least Developed Countries?”. Against a suitably cleaned up IATI dataset, reasonable answers to questions like these could be generated with carefully written queries. Authoriative answers might require further cleaning and analysis of the data retrieved.
For someone working with a dataset every day, such queries might take anything from a few minutes to a few hours to develop and execute. Cleaning data to provide authoritative answers might take a bit longer.
For a programme officer, who has the question, but not the knowledge of the data structures, working out how to answer these questions might take days. In fact, the learning curve will mean often these questions are simply not asked. Yet, having the answers could save months, and $millions.
So – what if key donors sponsored an enquiries service that could answer these kinds of queries on demand? With the right funding structure, it could have incentives not only to provide better data on request, but also to put resources into improving data quality and tooling. For example: if there is a set price paid per enquiry successfully answered, and the cost of answering that enquiry is increased by poor data quality from publishers, then there can be an incentive on the service to invest some of it’s time in improving incoming data quality. How to prioritise such investments would be directly connected to user demand: if all the questions are made trickier to answer because of a particular donor’s data, then focussing on improving that data first makes most sense. This helps escape the current situation in which the goal is to seek perfection for all data. Beyond a certain point, the political pressures to publish may ceases to work to increase data quality, whereas requests to improve data that are directly connected to user demand and questions may have greater traction.
Of course, the incentive structures here are subtle: the quickest solution for an enquiry service might be to clean up data as it comes into its own data store rather than trying to improve data at source – and there remains a desire in open data projects to avoid creating single centralised databases, and to increase the resiliency of the ecosystem by improving original open data, which would oppose this strategy. This would need to be worked through in any full proposal.
I’m not sure what appetite there would be for a service like this – but I’m certain that in, what are ultimately niche open data ecosystems like IATI, strategic interventions will be needed to build the markets, services and feedback loops that lead to their survival.
Comments and reflection welcome