Quick links: a personal Open Government Partnership round-up

[Summary: action plans, reports and guides]

I’m back in the US after a week in London, primarily for Rachel’s graduation as a Music Therapist, but which rather fortunately coincided with the Open Government Partnership Summit and a chance to catch up with many colleagues and friends. I’ve yet to digest all the sessions and notes I made well enough to complete a more analytical blog post on the OGP Summit, but as it is the many OGP-related projects that have kept me from blogging here over the last month, I thought I should at least link to a few of the outputs launched last week that have contributed to my blogger’s block:

  • Development Initiatives launched the Joined Up Data report, a great scoping study by Neil Ashton of how different transparency initiatives might work together on common building blocks of data standards. This is something I worked on a bit previously when working with Development Initiatives, and that also has a lot of relevance to the Joined Up Philanthropy project.

Over the last few weeks I’ve definitely discovered the meaning of the term ‘action forcing moment’ – as many projects have worked up to the OGP summit as a deadline. Of course, now attention switches to the follow up – but hopefully at a pace that allows a little more time for sharing work-in-progress and reflective blogging.

Joined Up Philanthropy – a data standards exploration

Earlier this year, Indigo Trust convened a meeting with an ambitious agenda: to see 50% of UK Foundation grants detailed as open data, covering 80% of foundation grant-making by value, within five years. Of course, many of the grant-giving foundations in the UK already share details of the work they fund, through annual reports or pages on their websites – but every funder shares the information differently, which makes bringing together a picture of the funding in a particular area or sector, understanding patterns of funding over time, or identifying the foundations who might be interested in a project idea you have, a laborious manual task. Data standards for the publication of foundations’ giving could change that.

Supported by The Nominet Trust and Indigo Trust, at Practical Participation I’m working with non-profit sector expert Peter Bass on a series of ‘research sprints’ to explore what a data standard could look like. This builds on an experiment back in March to help scope an Open Contracting Data Standard. We’ll be using an iterative methodology to look at:

  • (1) the existing supply of data;

  • (2) demand for data and use-cases;

  • (3) existing related standards.

Each research sprint focusses primarily on one of these, consisting of around 10 days of data collection and analysis, designed to generate useful evidence that can move the conversation forward without pre-empting future decisions or trying to provide the final word on what a data standard should look like.

Supply: What data is already collected?

The first stage, which we’re working on right now, involves finding out about the data that foundations already collect. We’re talking to a number of different foundations large and small to find out about how they manage information on the work they fund right now.

By collating a list of the different database fields that different foundations hold (whether the column headings in the spreadsheets they use to keep track of grants, or the database fields in a comprehensive relational database) and then mapping these onto a common core, we’re aiming to build up a picture of which data might be readily available right now and easy to standardise, and where there are differences and diversities that will need careful handling in the development of a standard. Past standards projects like the International Aid Transparency Initiative were able to benefit from a large ‘installed base’ of aid donors already using set conventions and data structures drawn from the OECD Development Assistance Committee, which strongly influenced the first version of IATI. We’ll be on the look-out for existing elements of standardisation that might exist to build upon in the foundations sector, as well as seeking to appreciate the diversity of foundations and the information they hold.
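To make the field-mapping exercise concrete, here is a minimal sketch of the kind of mapping involved. All the column headings and common-core field names below are hypothetical illustrations, not the actual fields we are collecting:

```python
# Hypothetical mappings from each foundation's own column headings
# to a shared common-core vocabulary.
FOUNDATION_FIELDS = {
    "foundation_a": {"Grant Ref": "grant_id", "Awarded To": "recipient",
                     "Amount (GBP)": "amount", "Date Awarded": "award_date"},
    "foundation_b": {"ID": "grant_id", "Grantee": "recipient",
                     "Value": "amount", "Year": "award_date"},
}

def to_common_core(foundation, row):
    """Rename one raw row's fields to the common-core vocabulary,
    dropping anything the mapping doesn't cover."""
    mapping = FOUNDATION_FIELDS[foundation]
    return {mapping[k]: v for k, v in row.items() if k in mapping}

core = to_common_core("foundation_b",
                      {"ID": "B-101", "Grantee": "Local Arts Trust",
                       "Value": 5000, "Year": 2013})
print(core)
```

The interesting cases for the research are exactly the ones this sketch glosses over: fields one foundation holds and another doesn’t, and fields that look alike but carry different meanings.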

We’re aiming to have a first analysis of this exercise out in mid-October, and whilst we’re only focussing on UK foundations, will share all the methods and resources that would allow the exercise to be extended in other contexts.

Demand: what data do people want?

Of course, the data that it is easy to get hold of might not be the data that it is important to have access to, or that potential users want. That motivates the second phase of our research – looking to understand the different use cases for data from the philanthropic sector. These may range from projects seeking to work out who to send their funding applications to, to philanthropists seeking to identify partners they could work with, to sector analysts looking to understand gaps in the current giving environment and catalyse greater investment in specific sectors.

Each use case will have different data needs. For example, a local project seeking funding would care particularly about geodata that can tell them who might make grants in their local area; whereas a researcher may be interested in knowing in which financial year grants were awarded, or disbursements made to projects. By articulating the data needs of each use-case, and matching these against the data that might be available, we can start to work out where supply and demand are well matched, or where a campaign for open philanthropy data might need to encourage philanthropists to collect or generate new information on their activities.
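The supply-and-demand matching described above can be sketched as a simple set comparison. The field names and use cases here are invented for illustration only:

```python
# Hypothetical fields foundations already hold (from the Phase 1 supply work).
available = {"grant_id", "recipient", "amount", "award_date"}

# Hypothetical data needs for two of the use cases discussed above.
use_cases = {
    "local fundraiser": {"recipient", "amount", "location"},
    "sector researcher": {"amount", "award_date", "financial_year"},
}

# For each use case, which needed fields are not currently supplied?
gaps = {name: sorted(needs - available) for name, needs in use_cases.items()}
print(gaps)
```

Fields that appear in `gaps` are where a campaign for open philanthropy data might need to encourage funders to collect or generate new information.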

Standards: putting the pieces together

Once we know about the data that exists, the data that people want, and how they want to use it – we can start thinking in-depth about standards. There are already a range of standards in the philanthropy space, from the eGrant and hGrant standards developed by the Foundation Centre, to the International Aid Transparency Initiative (IATI) standard, as well as a range of efforts ongoing to develop standards for financial reporting, spending data, and geocoded project information.

Developing a draft standard involves a number of choices:

  • Fields and formats – a standard is made up both of the fields that are deemed important (e.g. value of grant; date of grant etc.) and the technical format through which the data will be represented. Data formats vary in how ‘expressive’ they are, and how extensible a standard is once determined. However, more expressive standards also tend to be more complex.

  • Start from scratch, or extend existing standards – it may be possible to simply adapt an existing standard. Deciding to do this involves both technical and governance issues: for example, if we build on IATI, how would a domestic philanthropy standard adapt to version upgrades in the IATI standard? What collaboration would need to be established? How would existing tools handle the adapted standard?

  • Publisher capacity and needs – standards should reduce rather than increase the burdens on data suppliers. If we are asking publishers to map their data to a complex additional standard, we’re less likely to get a sustainable supply of data. Understanding the technical capacity of people we’ll be asking for data is important.

  • Mapping between standards – sometimes it is possible to entirely automate the conversion between two related standards. For example, if the fields in our proposed standard are a subset of those in IATI, it might be possible to demonstrate how domestic and international funding flows data can be combined. Thinking about how standards map together involves considering the direction in which conversions can take place, and how this relates to the ways different actors might want to make use of the data.
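The last choice above – mapping between standards – can be illustrated with a sketch. If every field in the draft standard has a known counterpart in a larger standard, conversion in that direction can be fully automated. The field names below are made up for illustration and are not real IATI elements:

```python
# Hypothetical one-way mapping: draft-standard field -> larger-standard field.
CORE_TO_WIDER = {
    "grant_id": "activity-identifier",
    "recipient": "receiver-org",
    "amount": "transaction-value",
}

def upconvert(record):
    """Automated conversion works here because the draft standard's fields
    are a strict subset of the wider standard's; unmapped fields are dropped."""
    return {CORE_TO_WIDER[k]: v for k, v in record.items() if k in CORE_TO_WIDER}

converted = upconvert({"grant_id": "GB-1", "recipient": "Local Arts Trust",
                       "amount": 5000, "internal_notes": "not for publication"})
print(converted)
```

The reverse direction is the hard case: a record in the wider standard may carry structures the draft standard cannot represent, which is why the direction of conversion matters to different data users.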

We’ll be rolling our sleeves up as we develop a draft standard proposal, seeking to work with real data from Phase 1 to test out how it works, and checking the standardised data against the use cases identified in Phase 2.

The outcome of this phase won’t be a final standard – but instead a basis for discussion of what standardised data in the philanthropy sector should look like.

Get involved

We’ll be sharing updates regularly through this blog and inviting comments and feedback on each stage of the research.

If you are from a UK-based foundation that would like to be involved in the first phase of research, just drop me a line and we’ll see what we can do. We’re particularly on the lookout for small foundations who don’t do much with data right now – so if you’re currently keeping track of your grant-making records on spreadsheets or post-it notes, do get in touch.

Adventures new

[Summary: From September 2013 - June 2014 I'll be based in Cambridge, MA as a Fellow at the Berkman Centre]


When I signed on to undertake PhD work on the continued development of open data two years ago, I thought I was opting for three years of mostly study time, whilst Rachel worked on her Music Therapy Masters at Nordoff Robbins. Of course, it didn’t work out that way: open data as a field has grown larger and faster than most people (and certainly I) imagined, and, of course, there is my now confirmed inability to leave aside lots of other interesting (related) projects for a narrow PhD focus. Fortunately, the Web Science DTC at Southampton has provided a very supportive environment for students working on applied projects alongside study, positively encouraging engagement beyond the library or lecture theatre.

So, over the last year I’ve had the fun of working with the Web Foundation to develop a global research programme on the emerging impacts of open data (we’ve belatedly published the conceptual framework for the project today), and working on a number of hands-on open data projects, with AidInfo, Open Contracting and London voluntary sector organisations amongst others. I’ve also ended up getting very involved in the UK civil society network on the Open Government Partnership, inputting into the development of the National Action Plan, which is now out for consultation and input till September. All of this has drawn upon, and fed into, my ongoing PhD work, part-time for the next year or two, but still moving forward to try and develop a framework for thinking about how technical and social choices around open data affect the realisation of progressive and inclusive democratic benefits.

And now, as Rachel approaches graduation as a Music Therapist, we’re preparing for a next adventure, heading out to live in the USA for nine months from September where I’ll be joining the Berkman Centre community as a 2013-2014 Berkman Fellow. I’m excited to be joining such a diverse and engaged community of scholars and activists. From there I’ll be continuing my focus on democratic impacts of open data, both with the Web Foundation Open Data in Developing Countries project, and PhD studies, and hopefully I’ll have the chance to engage in other projects at the centre too.

Of course, adventures new mean also leaving a few projects, so from the end of July I’ll be handing over my remaining roles at AidInfo where I’ve worked on the International Aid Transparency Initiative, and will be stepping back from a few other UK based commitments I won’t be able to undertake remotely. It’s been a particular pleasure to work with the AidInfo and IATI teams over the last few years on ambitious and exciting work to get open data on aid flowing – and I’ve learnt more than I can catalogue about open data in practice from the experience.

Although this blog has become a bit of a project-reporting space over the last year, it does remain my personal blogging space, so perhaps there might be a bit more personal blogging over the next year as I reflect on living abroad and engaging with lots of new ideas in the Berkman community…

 

Can the G8 Open Data Charter deliver real transparency?

[Summary: cross-post of an article reflecting on the G8 Open Data Charter]

I was asked by The Conversation, a new journalism platform based around linking academic writers with professional journalists and editors, to put together a short article on the recent G8 Open Data Charter, looking at the potential for it to deliver on transparency. The result is now live over on The Conversation site, and pasted in below (under a Creative Commons license). 

Last week G8 leaders signed up to an Open Data Charter, calling for government datasets to be “open data by default”. Open data has risen up the government agenda in the UK over the last three years, with the UK positioning itself as a world leader. But what does the charter mean for G8 nations, and more broadly, will it deliver on the promise of economic impacts and improved governance through the open release of government data relating to matters such as crime figures, energy consumption and election results?

Open government data (OGD) has rapidly developed from being the niche interest of a small community of geeks to a high-profile policy idea. The basic premise of OGD is that when governments publish datasets online, in digital formats that can be easily imported into other software tools, and under legal terms that permit anyone to re-use them (including commercially), those outside government can use that data to develop new ideas, apps and businesses. It also allows citizens to better scrutinise government and hold authorities to account. But for that to happen, the kind of data released, and its quality, matter.

As the Open Knowledge Foundation outlined ahead of the G8 Summit in a release from its Open Data Census “G8 countries still have a long way to go in releasing essential information as open data”. Less than 50% of the core datasets the census lists for G8 members are fully available as open data. And because open data is one of the most common commitments made by governments when they join the wider Open Government Partnership (OGP), campaigners want a clear set of standards for what makes a good open data initiative. The G8 Open Data Charter provides an opportunity to elaborate this. In a clear nod towards the OGP, the G8 charter states: “In the spirit of openness we offer this Open Data Charter for consideration by other countries, multinational organisations and initiatives.”

But can the charter really deliver? Russia, the worst scoring G8 member on the Open Data Census, and next chair of the G8, recently withdrew from the OGP, yet signed up to the Charter. Even the UK’s commitment to “open data by default” is undermined by David Cameron’s admission that the register of company beneficial ownership announced as part of G8 pledges on tax transparency will only be accessible to government officials, rather than being the open dataset campaigners had asked for.

The ability of Russia to sign up to the Open Data Charter is down to what Robinson and Yu have called the “Ambiguity of Open Government” — the dual role of open data as a tool for transparency and accountability and for economic growth. As Christian Langehenke explains, Russia is interested in the latter, but was uncomfortable with the focus placed on the former in the OGP. The G8 Charter covers both benefits of open data but is relatively vague when it comes to the release of data for improved governance.

However, if delivered, the specific commitments made in the technical annexe to opening national election and budget datasets, and to improving their quality by December 2013, would signal progress for a number of states, Russia included. Elsewhere in the G8 communiqué, states also committed to publishing open data on aid to the International Aid Transparency Initiative standard, representing new commitments from France, Italy and Japan.

The impacts of the charter may also be felt in Germany and in Canada, where open data campaigners have long been pushing for greater progress to release datasets. Canadian campaigner David Eaves highlights in particular how the charter commitment to open specific “high value” datasets goes beyond anything in existing Canadian policy. Although the pressure of next year’s G8 progress report might not provide a significant stick to spur on action, the charter does give campaigners in Canada, Germany and other G8 nations a new lever in pushing for greater publication of data from their governments.

Delivering improved governance and economic growth will not come from the release of data alone. The charter offers some recognition of this, committing states to “work to increase open data literacy” and “encourage innovative uses of our data through the organisation of challenges, prizes or mentoring”. However, it stops short of considering other mechanisms needed to unlock the democratic and governance reform potential of open data. At best it frames data on public services as enabling citizens to “make better informed choices about the services they receive”, encapsulating a notion of citizen as consumer (a framing Jo Bates refers to as the co-option of open data agendas), rather than committing to build mechanisms for citizens to engage with the policy process, and thus achieve accountability, on the basis of the data that is made available.

The charter marks the continued rise of open data to becoming a key component of modern governance. Yet, the publication of open data alone stops short of the wider institutional reforms needed to deliver modernised and accountable governance. Whether the charter can secure solid open data foundations on which these wider reforms can be built is something only time will tell.

Geneva E-Participation Day: Open Data and International Organisations

[Summary: notes for a talk on open data and International Organisations]

In just over a week’s time I’ll be heading for Geneva to take part in Diplo Foundation’s ‘E-Participation Day: towards a more open UN?’ event. In the past I’ve worked with Diplo on remote participation, using the web to support live online participation in face-to-face meetings such as the Internet Governance Forum. This time I’ll be talking open data – exploring the ways in which changing regimes around data stand to impact International Organisations. This blog post was written for the Diplo blog as an introduction to some of the themes I might explore.

The event will, of course, have remote participation – so you can register to join in-person or online for free here.

E-participation and remote hubs have the potential to open up dialogue and decision making. But after the conferences have been closed, and the declarations made, it is data that increasingly shapes the outcome of international processes. Whether it’s the numbers counted up to check on progress towards the millennium development goals, GDP percentage pledges on aid spending, or climate change targets, the outcomes of international co-operation frequently depend on the development and maintenance of datasets.

The adage that ‘you can’t manage what you can’t measure’ has relevance both for International Organisations and for citizens. The better the flows of data International Organisations can secure access to, the greater their theoretical capacity for co-ordination of complex systems. And the greater the flows of information from the internal workings of International Organisations that citizens, states and pressure groups can access, the greater their theoretical capacity both to scrutinise decisions and to get involved in decision making and implementation. I say theoretical capacity, because the picture is rarely that straightforward in practice. Yet, that complexity aside for a moment: over the last few years an idea has been gaining ground that, in some states, has led not only to a greater flow of data, but has driven a veritable flood – with hundreds of thousands of government datasets placed online for anyone to access and re-use. That idea is open data.

Open Data is a simple concept. Organisations holding datasets should place them online, in machine-readable formats, and under licenses that let anyone re-use them. Advocates explain that this brings a myriad of benefits. For example, rather than finance data being locked up in internal finance systems, only available to auditors, open data on budgets and spending can be published on the web for anyone to download and explore in their spreadsheet software, or to let third parties generate visualisations that show citizens where their money is being spent, and to help independent analysts look across datasets for possible inefficiency, fraud or corruption. Or instead of the location of schools or health centres being kept on internal systems, the data can be published to allow innovators to present it to citizens in new and more accessible ways. And in crisis situations, instead of co-ordinators spending days collecting data from agencies in the field and re-keying the data into central databases, if all the organisations involved were to publish open data in common formats, there is the possibility of it being aggregated together, building up a clearer picture of what is going on. One of the highest profile existing open data initiatives in the development field is the International Aid Transparency Initiative (IATI), which now has standardised open data from hundreds of donors, providing the foundation for a timely view of who is doing what in aid.
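‘Machine-readable’ is doing a lot of work in that definition, so a tiny sketch may help. The budget figures below are invented, but the point is that data published as structured CSV, rather than as a table in a PDF, can be parsed and totalled by any script or spreadsheet without re-keying:

```python
import csv
import io

# Illustrative budget data as it might appear in a published CSV file.
raw = ("department,year,spend_gbp\n"
       "Health,2012,1200000\n"
       "Education,2012,950000\n")

rows = list(csv.DictReader(io.StringIO(raw)))
total = sum(int(r["spend_gbp"]) for r in rows)
print(total)
```

The same figures locked in a scanned annual report would require manual transcription before any third party could visualise or audit them.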

Open data ideas have been spreading rapidly across the world, with many states establishing national Open Government Data (OGD) initiatives, and International Organisations from The World Bank, to UN DESA, the OECD and the Open Government Partnership all developing conversations and projects around open data. When the G8 meet next week in Northern Ireland they are expected to launch an ‘Open Data Charter’ setting out principles for high quality open data, and committing states to publish certain datasets. Right now it remains to be seen whether open data will feature anywhere else in the G8 action plans, although there is clearly space for open data ideas and practices to be deployed in securing greater tax transparency, or supporting the ongoing monitoring of other commitments. In the case of the post-2015 process, a number of organisations have been advocating for an access to information focus, seeking to ensure citizens have access to open data that they can use to monitor government actions and hold governments to account on delivering on commitments.

However – as Robinson and Yu have highlighted – there can be an ambiguity of open government data: more open data does not necessarily mean more open organisations. The call for ‘raw data now’ has led to much open data emerging simply as an outbound communication, without routes for engagement or feedback, and no change in existing organisational practices. Rather than being treated as a reform that can enable greater organisational collaboration and co-ordination, many open datasets have just been ‘dumped’ on the web. In the same way that remote participation is often a bolt-on to meetings, without the deeper changes in process needed to make for equal participation for remote delegates, at best much open data only offers actors outside of institutions a partial window onto their operations, and at worst, the data itself remains opaque: stripped of context and meaning. Getting open data right for both transparency, and for transforming international collaboration needs more than just technology. 

As I explored with Jovan Kurbalija of Diplo in a recent webinar, there are big challenges ahead if open data is to work as an asset for development: from balancing tensions between standardisation and local flexibility, to developing true multi-stakeholder governance of important data flows, to getting the incentives for collaboration right. However, now is the time to be engaging with these challenges – within a window of energy and optimism, and before network effects lock in paradoxically ‘closed’ systems of open data. I hope the dialogue at the Geneva E-Participation day will offer a small chance to broaden open data understanding and conversations in a way that can contribute to such engagement.

Open data in extractives: meeting the challenges


There’s lots of interest building right now around how open data might be a powerful tool for transparency and accountability in the extractive industries sector. Decisions over where extraction should take place have a massive impact on communities and the environment, yet often decision making is opaque, with wealthy private interests driving exploitation of resources in ways that run counter to the public interest. Whilst revenues from oil, gas and mineral resources have the potential to be a powerful tool for development, with a proportion channeled into public funds, massive quantities of revenue frequently ‘go missing’, lost in corruption, and fuelling elements of a resource curse.

For the last ten years the Extractive Industries Transparency Initiative has been working to get companies to commit to ‘publish what they pay’ to government, and for governments to disclose receipts of finance, working to identify missing money through a document-based audit process. Campaigning coalitions, watchdogs and global initiatives have focussed on increasing the transparency of the sector. Now, with a recognition that we need to link together information on different resource flows for development at all levels, potentially through the use of structured open data, and with an anticipated “data tsunami” of new information on extractives financials from the Dodd-Frank act in the US, and similar regulation in Europe, groups working on extractives transparency have been looking at what open data might mean for future work in this area.

Right now, DFID are taking that exploration forward through a series of hack days with Rewired State under the ‘follow the data’ banner, with the first in London last weekend, and one coming up next week in Lagos, Nigeria. The idea of the events is to develop rapid prototypes of tools that might support extractives transparency, putting developers and datasets together over 24 hours to see what emerges. I was one of the judging panel at this weekend’s event, where the three developer teams that formed looked respectively at: making datasets on energy production and prices more accessible for re-use through an API; visualising the relationship between extractives revenues and various development indicators; and designing an interface for ‘nuggets’ of insight discovered through hack-days to be published and shared with useful (but minimal) meta-data.

In their way, these three projects highlight a range of the challenges ahead for the extractives sector in building capacity to track resource flows through open data:

  • Making data accessible – The APIfy project sought to take a number of available datasets and aggregate them together in a database, before exposing a number of API endpoints that made machine-readable standardised data available on countries, companies and commodities. By translating the data access challenge from one of rooting around in disparate datasets to one of calling a standard API for key kinds of ‘objects’, the project demonstrated the need developers often have for clear platforms to build upon. However, as I’ve discovered in developing tools for the International Aid Transparency Initiative, building platforms to aggregate together data often turns out to be a non-trivial project: technically (it doesn’t take long to get to millions of data items when you are dealing with financial transactions), economically (as databases serving millions of records to even a small number of users need to be maintained and funded), socially (developers want to be able to trust the APIs they build against to be stable, and outreach and documentation are needed to support developers to engage with an API), and in terms of information architecture (as design choices over a dataset or API can have a powerful effect on downstream re-users).
  • Connecting datasets – none of the applications from the London hack-day were actually able to follow resource flows through the available data. Although visions of a coherent datasphere, in which the challenge is just making the connection between a transaction in one dataset, and a transaction in another, to see where money is flowing, are appealing – traceability in practice turns out to be a lot harder. To use the IATI example again, across the 100,000+ aid activities published so far less than 1% include traceability efforts to show how one transaction relates to another, and even here the relationships exist in the data because of conscious efforts by publishers to link transaction and activity identifiers. In following the money there will be many cases where people have an incentive not to make these linkages explicit. One of the issues raised by developers over the hack-day was the scattered nature of data, and the gaps across it. Yet – when it comes to financial transaction tracking, we’re likely to often be dealing with partial data, full of gaps, and it won’t be easy to tell at first glance when a mis-match between incoming and outgoing finances is a case of missing data or corruption. Right now, a lot of developers attack open data problems with tools optimised for complete and accurate data, yet we need to be developing tools, methods and visualisation approaches that deal with partial and uncertain data. This is developed in the next point.
  • Correlation, causation and investigation – The Compare the Map project developed on the hack day uses “scraped data from GapMinder and EITI to create graphical tools” that allow a user to eye-ball possible correlations between extractives data and development statistics. But of course, correlation is not causation – and the kinds of analysis that dig deeper into possible relationships are difficult to work through on a hack day. Indeed, many of the relationships mash-ups of this form can show have been written about in papers that control for many more variables, dealing carefully with statistically challenging issues of missing data and imperfectly matched datasets. Rather than simple comparison visualisations that show two datasets side by side, it may be more interesting to look for all the possible statistically significant correlations in datasets with common reference points, and then to look at how human users could be supported in exploring, and giving feedback on, which of those might be meaningful, and which may or may not already be researched. Where research does show a correlation to exist, then using open data to present a visual narrative to users about this can have a place, though here the theory of change is very different – not about identifying connections – but about communicating them in interactive and engaging ways to those who may be able to act upon them.
  • Sharing and collaborating – The third project at the London hack-day was ‘Fact Cache‘ – a simple concept for sharing nuggets of information discovered in hack-day explorations. Often as developers work through datasets they may come across discoveries of interest, yet these are often left aside in the rush to create a prototype app or platform. Fact Cache focussed on making these shareable. However, when it was presented discussions also explored how it could make these nuggets of information into social objects, open to discussion and sharing. This idea of making open data findings more usable as social objects was also an aspect of the UN Global Pulse hunchworks project. That project is currently on hold (it would be interesting to know why…), but the idea of supporting collaboration around open data through online tools, rather than seeing apps that present data, or initial analysis as the end point, is certainly one to explore more in building capacity for open data to be used in holding actors to account.
  • Developing theories of change – As the judges met to talk about the projects, one of the key themes we looked at was whether each project had a clear theory of change. In some sense, taken together they represent the complex chain of steps involved in an open data theory of change: from making data more accessible to developers, to creating tools and platforms that let end users explore data, and then allowing findings from data to be communicated and to shape discourses and action. Few datasets or tools are likely to be change-making on their own – rather, they can play a key role in shifting the balance of power in existing networks of organisations, activists, companies and governments. Understanding the different theories of change for open data is one of the key themes in the ongoing Open Data in Developing Countries research, where we take existing governance arrangements as a starting point in understanding how open data will bring about impacts.

In a complex world, access to data, and the capacity to use it effectively, are likely to be essential parts of building more accountable governance across a wide range of areas, including in the extractives industry. Although there are many challenges ahead if we are to secure the maximum benefits from open data for transparent and accountable governance, it’s exciting and encouraging to see so many passionate people putting their minds early to tackling them, and building a community ready to innovate and bring about change.

Note: The usage of ‘follow the data’ in this DFID project is distinct from the usage in the work I’m currently doing to explore ‘follow the data’ research methods. In the former, the focus is really on following financial and resource flows through connecting up datasets; in the latter the focus is on tracing the way in which data artefacts have been generated, deployed, transferred and used in order to understand patterns of open data use and impact.

 

Intelligent Impact: Evaluating an open data capacity-building pilot with voluntary sector organisations

[Summary: sharing the evaluation report (9 pages, PDF) of an open data skills workshop for voluntary sector organisations]


Late last year, through the CSO network on the Open Government Partnership, I got talking with Deirdre McGrath of the Your Voice, Your City project about ways of building voluntary sector capacity to engage with open data. We talked about the possibility of a hack-day, but realised the focus at this stage needed to be on building skills, rather than building tools. It also needed to be on discovering what was possible with open data in the voluntary sector, rather than teaching people a limited set of skills. And as the Your Voice, Your City project was hosted within the London Voluntary Services Council (LVSC), an infrastructure organisation with a policy and research team, we had the possibility of thinking about the different roles needed to make the most of open data, and how a capacity building pilot could work both with frontline Voluntary and Community Sector (VCS) organisations and an infrastructure organisation. A chance meeting with Nick Booth of podnosh gave form to a theme in our conversations about the need to focus on both ‘stats’ and ‘stories’, ensuring that capacity building worked with both quantitative and qualitative data and information. The result: plans for a short project, centred on a one-day workshop on ‘Intelligent Impact’, exploring the use of social media and open data for VCS organisations.

The day involved staff from VCS organisations coming along with questions or issues they wanted to explore, and then splitting into groups with a team of open data and social media mentors (Nick Booth, Caroline Beavon, Steven Flower, Paul Bradshaw and Stuart Harrison) to look at how existing online resources, or self-created data and media, could help respond to those questions and issues. Alex Farrow captured the story of the day for us using Storify and I’ve just completed a short evaluation report telling the story in more depth, capturing key learning from the event, and setting out possible next steps (PDF).

Following on from the event, the LVSC team have been exploring how a combination of free online tools for curating open data, collating questions, and sharing findings can be assembled into a low-cost and effective ‘intelligence hub‘, where data, analysis and presentation layers are all made accessible to VCS organisations in London.

Developing data standards for Open Contracting

Contracts have a key role to play in effective transparency and accountability: from the contracts governments sign with extractives industries for mineral rights, to the contracts for delivery of aid, contracts for provision of key public services, and contracts for supplies. The Open Contracting initiative aims to improve the disclosure and monitoring of public contracts through the creation of global principles, standards for contract disclosure, and building civil society and government capacity. One strand of work that the Open Contracting team have been exploring to support this is the creation of a set of open data standards for capturing contract information. This blog post reports on some initial ground work designed to inform that strand of work.

Although I was involved in some of the set-up of this short project, and presented the outcomes at last week's workshop, the bulk of the work was undertaken by Aptivate's Sarah Bird.

Update: see also the report of the process here.

Update 2 (12th Sept 2013): Owen Scott has built on the pilot with data from Nepal.

The process

Developing standards is a complex process. Each choice made has implications: for how acceptable the standard will be to different parties; for how easy certain uses of the data will be; and for how extensible the standard will be, or which other standards it will easily align with. However, standards cannot easily be built up choice-by-choice from a blank slate by adopting the ideal option at each step: they are generally created against a background of pre-existing datasets and standards. The Open Contracting data standards team had already gathered together a range of contract information datasets currently published by governments across the world, and so, with just a few weeks between starting this project and the data standards workshop on 28th March, we planned a 5-day development sprint, aiming to generate a very draft first iteration of a standard. Applying an agile methodology, where short iterations are each designed to yield a viable product by the end, but with the anticipation that further iterations may revise and radically alter it, meant we had to set a reasonable scope for this first sprint.

The focus then was on the supply side, taking a set of existing contract datasets from different parties, and identifying their commonalities and differences. The contract datasets selected were from the UK, USA, Colombia, Philippines and the World Bank. From looking at the fields these existing datasets had in common, an outline structure was developed, working on a principle of taking good ideas from across the existing data, rather than playing to a lowest common denominator. Then, using the International Aid Transparency Initiative activity standard as a basis, Sarah drafted a basic data structure, which can act as a version 0.01 standard for discussion. To test this, the next step was to convert samples from some of the existing datasets into this new structure, and then to analyse how much of the available data was covered by the structure, and how comprehensive the available data was when placed against the draft structure. (The technical approach taken, which can be found in the sprint’s GitHub repository, was to convert the different incoming data to JSON, and post it into a MongoDB instance for analysis).
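The conversion step described above can be sketched roughly as follows. This is a minimal illustration only, not the sprint's actual code (which is in the GitHub repository); the field names, mapping and MongoDB collection name here are invented:

```python
import json

def to_draft_standard(row):
    """Map one flat source record (hypothetical field names) into
    the nested draft contract structure."""
    return {
        "contract": {
            "title": row.get("contract_title"),
            "currency": row.get("currency"),
            "award": {
                "date": row.get("award_date"),
                "amount": row.get("award_amount"),
            },
        }
    }

sample = {"contract_title": "Road maintenance", "currency": "GBP",
          "award_date": "2012-06-01", "award_amount": "150000"}
record = to_draft_standard(sample)
print(json.dumps(record, indent=2))

# Posting the converted records into MongoDB for analysis would then be
# straightforward with pymongo (assuming a local instance), e.g.:
#   from pymongo import MongoClient
#   MongoClient().contracts_db.draft.insert_one(record)
```

Once all sources are in one collection in a common shape, coverage queries across datasets become simple aggregations.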

We discuss the limitations of this process in a later section.

Initial results

The initial pass of data suggested a structure based on:

  • Organisation data – descriptions of organisations, held separately from individual contract information, and linked by a globally unique ID (based on the IATI Organisational ID standard)
  • Contract meta data – general information about the contract in question, such as title, classification, default currency and primary location of supply, including an area for ‘line items’ covering the elements the contract includes.
  • Contract stages – a series of separate blocks of data for different stages of the contract, all contained within the overarching contract element.
    • Bid – key dates and classifications about the procurement stage of a contract process.
    • Award – details of the parties awarded the contract and the details of the award.
    • Performance – details of transactions (payments to suppliers) and work activities carried out during the performance of the contract.
    • Termination – details of the ending of the contract.
  • Documents – fields for linking to related documents.
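Put together, a single record following the outline above might look something like the sketch below. All field names here are illustrative placeholders, not the draft standard's actual identifiers (those are in the linked schema):

```python
import json

# Illustrative contract record: organisation data would sit separately,
# linked by the globally unique organisation ID used in the award block.
contract = {
    "id": "gb-contract-0001",
    "meta": {
        "title": "School catering services",
        "classification": "Services",
        "currency": "GBP",
        "location": "London",
        "line_items": [{"description": "Daily meals", "quantity": 190}],
    },
    "stages": {
        "bid": {"open_date": "2012-01-15", "close_date": "2012-02-28"},
        "award": {"supplier_org_id": "GB-COH-01234567", "amount": 250000},
        "performance": {"transactions": [], "activities": []},
        "termination": {"end_date": "2013-02-28"},
    },
    "documents": [{"url": "http://example.org/contract.pdf"}],
}
print(json.dumps(contract, indent=2))
```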

A draft annotated schema for capturing this data can be found in XML and JSON format here, and a high-level overview is also represented in the diagram below. In the diagrams that follow, each block represents one data point in the draft standard.

1-Phases

We then performed an initial analysis to explore how much of the data currently available from the sources explored would fit into the standard, and how comprehensively the standard could be filled from existing data. As the diagram below indicates, no single source covered all the available data fields, and some held no information on particular stages of the contracting process at all. This may be down to different objectives of the available data sources, or deeper differences in how organisations handle information on contracts and contracting workflows.

2-Coverage
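The kind of coverage check behind this comparison can be sketched simply: for each source, count what fraction of the draft standard's fields its data actually populates. The field lists below are invented for illustration, not the real analysis:

```python
# Hypothetical flattened field paths from the draft standard.
STANDARD_FIELDS = {"title", "currency", "bid.close_date",
                   "award.amount", "performance.transactions",
                   "termination.end_date"}

# Hypothetical fields populated by each source dataset.
sources = {
    "Source A": {"title", "currency", "award.amount"},
    "Source B": {"title", "award.amount", "performance.transactions"},
}

for name, fields in sources.items():
    covered = fields & STANDARD_FIELDS
    print(f"{name}: {len(covered)}/{len(STANDARD_FIELDS)} fields "
          f"({100 * len(covered) / len(STANDARD_FIELDS):.0f}%)")
```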

Combining the visualisations above into a single view gives a sense of which data points in the draft standard have the greatest use, illustrated in the schematic heat-map below.

3-Heatmap

At this point the analysis is very rough-and-ready, hence the presentation of a rough impression, rather than detailed field-by-field analysis. The last thing to check was how much data was ‘left over’ and not captured in the standard. This was predominantly the case for the UK and USA datasets, where many highly specialised fields and flags were present in the dataset, indicating information that might be relevant to capture in local contract datasets, but which might be harder to find standard representations for across contracts.

4-Extra

The next step was to check whether data that could go into the same fields could easily be harmonised: the existence of organisation details, dates and classifications across different datasets does not necessarily mean these are interoperable. Fields like dates and financial amounts appeared relatively easy to harmonise, but some elements present greater challenges, such as organisational identifiers, contact people, and the various codelists in use. Some code-lists may nonetheless be possible to harmonise: for example, when the ‘Category’ classifications from across datasets were translated, grouped and aggregated, up to 92% of the original data in a sample was retained.

5-Sum and Group
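A codelist harmonisation of this kind amounts to mapping each source's local codes onto a shared list, then measuring how much data survives the mapping. The codes and mappings below are invented for illustration (the real sample retained up to 92%):

```python
# Hypothetical mapping from local 'Category' codes to a shared codelist.
CATEGORY_MAP = {
    "goods": "Goods", "supplies": "Goods",
    "services": "Services", "consultancy": "Services",
    "works": "Works", "construction": "Works",
}

sample_codes = ["goods", "consultancy", "works", "misc-flag",
                "supplies", "services"]

# Keep only records whose code maps onto the shared list.
harmonised = [CATEGORY_MAP[c] for c in sample_codes if c in CATEGORY_MAP]
retention = len(harmonised) / len(sample_codes)
print(f"Retained {retention:.0%} of records after harmonisation")
```

Codes that fall outside the mapping (like the hypothetical "misc-flag" above) are the interesting residue: they show where local practice diverges from anything a shared codelist can yet express.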

Implications, gaps, next steps

This first iteration provides a basis for future discussions. There are, however, some important gaps. Most significant of all is that this initial development has been supply-side driven, based around the data that organisations are already publishing, rather than developed on the basis of the data that civil society organisations, or scrutiny bodies, are demanding in order to make sense of complex contract situations. It also omits certain kinds of contracts, such as complex extractives contracts (on which, see the fantastic work Revenue Watch have been doing with getting structured data from PDF contracts with Document Cloud), and Public Private Partnership (PPP) contracts. And it has not delved deeply into the data structures needed for properly capturing information that can aid in monitoring contract performance. These gaps will all need to be addressed in future work.

At the moment, this stands as a discrete project, and no set next steps have been agreed as far as I’m aware. However, some of the ideas explored in the meeting on the 28th included:

  • A next iteration – focussed on the demand side – working with potential users of contracts data to work out how data needs to be shaped, and what needs to be in a standard to meet different data re-use needs. This could build towards version 0.02.
  • Testing against a wider range of datasets – either following, or in parallel with, a demand-driven iteration, to discover how the work done so far evolves when confronted with a larger set of existing contract datasets to synthesise.
  • Connecting with other standards. This first sprint took the IATI Standard as a reference point. There may be other standards to refer to in development. Discussions on the 28th with those involved in other standards highlighted an interest in more collaborative working to identify shared building blocks or common elements that might be re-used across standards, and to explore the practical and governance implications of this.
  • Working on complementary building blocks of a data standard – such as common approaches to identifying organisations and parties to a contract; or developing tools and platforms that will aggregate data and make data linkable. The experience of IATI, Open Spending and many other projects appears to be that validators, aggregation platforms and data-wrangling tools are important complements to standards for supporting effective re-use of open data.

Keep an eye on the Open Contracting website for more updates.

New paper: Connecting people, sharing knowledge, increasing transparency


[Summary: Linking to a short conference paper exploring the impact of the web on land governance]

After a few contributions I made to their online dialogue, the lovely folk at The Land Portal invited me to help them write up the dialogue and put together a paper for the upcoming World Bank Conference on Land and Poverty. They’ve just published the result over here (also accessible via this direct PDF link).

The paper itself was a rather rapid creation between the end of the online dialogue and the end-of-February deadline, but aims to weave together a number of strands important to thinking about how digital technology is changing the landscape for advocacy and work on land governance. It has a particular focus on women’s land rights, which, in the Land Portal’s online dialogue, stimulated lots of interesting discussions about the potentially gendered nature of digital technologies. In it we survey how the Web has evolved from Web 1 (documents), to Web 2 (communities), and onwards towards a possible Web 3 of linked open data.

The Land Portal team, since getting involved in last year’s Open Knowledge Festival, have been really exploring what open data and open development might mean for them – and the Portal is definitely a space to watch to see how ideas of open development might be put into practice in the very grounded and wide-reaching field of land governance.

Joining the Web Foundation, and projects old and new

[Summary: an update on some of the hats I might be wearing]


After working for the last eight or so months putting the project together, I’ve now formally joined the World Wide Web Foundation as research coordinator on the ‘Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC)’ programme.

It’s a two-year multi-country research project, exploring how open data is working in different settings across the world, funded by Canada’s International Development Research Centre. You can read more about the project over on the Web Foundation website and follow the project as it develops over at www.opendataresearch.org.

With this, I’ve switched my PhD work to part-time, but will continue to work with AidInfo on the International Aid Transparency Initiative, and through Practical Participation on assorted innovation and advocacy projects.