Category Archives: Open Data

Data, information, knowledge and power – exploring Open Knowledge’s new core purpose

[Summary: a contribution to debate about the development of open knowledge movements]

New ‘Open Knowledge’ name and ‘data earth’ branding.

The Open Knowledge Foundation (renamed ‘Open Knowledge’) is soft-launching a new brand over the coming months.

Alongside the new logo, and details of how the new brand was developed, posted on the OK Wiki, appears a set of statements about the motivations, core purpose and tag-line of the organisation. In this post I want to offer an initial critical reading of this process and, more importantly, of the text itself.

Preliminary notes

Before going further, I want to offer a number of background points that frame the spirit in which the critique is offered.

  1. I have nothing but respect for the work of the leaders, staff team, volunteers and wider community of the Open Knowledge Foundation – and have been greatly inspired by the dedication I’ve seen to changing defaults and practices around how we handle data, information and knowledge. There are so many great projects, and so much political progress on openness, which OKFN as a whole can rightly take credit for.
  2. I recognise that there are massive challenges involved in founding, running and scaling up organisations. These challenges are magnified many times in community based and open organisations.
  3. Organisations with a commitment to openness or democracy – whether the co-operative movement, open source communities like Mozilla, communities such as Creative Commons and, indeed, the Open Knowledge Foundation – are generally held to much higher standards, and face much more complex pressures from engaging their communities in what they do, than closed and conventional organisations. And, as the other examples show, the path is not always an easy one. There are inevitably growing pains and challenges.
  4. It is generally better to raise concerns and critiques and talk about them, than leave things unsaid. A critique is about getting into the details. Details matter.
  5. See (1).

(Disclosure: I have previously worked as a voluntary coordinator for the open-development working group of OKF (with support from AidInfo), and have participated in many community activities. I have never carried out paid work for OKF, and have no current formal affiliation.)

The text

Here are the three statements in the OK Branding notes that caught my attention and sparked some reflections:

About our brand and what motivates us:
A revolution in technology is happening and it’s changing everything we do. Never before has so much data been collected and analysed. Never before have so many people had the ability to freely, easily and quickly share information across the globe. Governments and corporations are using this data to create knowledge about our world, and make decisions about our future. But who should control this data and the ability to find insights and make decisions? The many, or the few? This is a choice that we get to make. The future is up for grabs. Do we want to live in a world where access to knowledge is “closed”, and the power and understanding it brings is controlled by the few? Or, do we choose a world where knowledge is “open” and we are all empowered to make informed choices about our future? We believe that knowledge should be open, and that everyone – from citizens to scientists, from enterprises to entrepreneurs, – should have access to the information they need to understand and shape the world around them.

Our core purpose:

  • A world where knowledge creates power for the many, not the few.
  • A world where data frees us – to make informed choices about how we live, what we buy and who gets our vote.
  • A world where information and insights are accessible – and apparent – to everyone.
  • This is the world we choose.

Our tagline:
See how data can change the world

The critique

My concerns are not about the new logo or name. I understand (all too well) the way that having ‘Foundation’ in a non-profit’s name can mean different things in different contexts (not least people expecting you to have an endowment and funds to distribute), and so the move to Open Knowledge as a name has a good rationale. Rather, I want to raise four concerns:

(1) Process and representativeness

Tag Cloud from Open Knowledge Foundation Survey. See http://blog.okfn.org/2014/02/12/who-are-you-community-survey-results-part-1/ for details.


The message introducing the new brand to OKF-Discuss notes that “The network has been involved in the brand development process especially in the early stages as we explored what open knowledge meant to us all”, referring primarily to the Community Survey run at the end of 2013 and written up here and here. However, the later stages of developing the brand appear to have been outsourced to a commercial brand consultancy consulting a limited set of staff and stakeholders, and what is now presented appears to be offered as a given, rather than for consultation. The result has been a narrow focus on the ‘data’ aspects of OKF.

Looking back over the feedback from the 2013 survey, that data-centricity fails to represent the breadth of interests in the OKF community (particularly when looking beyond the quantitative survey questions which had an in-built bias towards data in the original survey design). Qualitative responses to the Survey talk of addressing specific global challenges, holding governments accountable, seeking diversity, and going beyond open data to develop broader critiques around intellectual property regimes. Yet none of this surfaces in the motivation statement, or visibly in the core purpose.

OKF has not yet grappled in full with the idea of internal democracy and governance – yet for a network made up of many working groups, local chapters and more, for a ‘core purpose’ statement to emerge without wider consultation seems problematic. There is a big missed opportunity here for deeper discussion about ideas and ideals, and for the conceptualisation of a much richer vision of open knowledge. The result is, I think, a core purpose statement that fails to represent the diversity of the community OKF has been able to bring together, and that may threaten its ability to bring those communities into shared space in future.

Process points aside however (see growing pains point above), there are three more substantive issues to be raised.

(2) Data and tech-centricity

A selection of OKF Working Groups

The Open Knowledge movement I’ve met at OKFestival and other events, and that is evident through the pages of the working groups, is one committed to many forms of openness – education, hardware, sustainability, economics, political processes and development amongst others. It is a community that has been discussing diversity and building a global movement. Data may be an element of varying importance across the working groups and interest areas of OKF. And technology may be an enabler of action for each. But many are not fundamentally about data, or even technology, as their core focus. As we found when we explored how different members of the Open Development working group understood the concept of open development in 2012, many members focussed more upon open processes than on data and tech. Yet, for all this diversity of focus, the new OK tagline emphasises data alone.

I work on issues of open data everyday. I think it’s an important area. But it’s not the only element of open knowledge that should matter in the broad movement.

Whilst the Open Knowledge Foundation has rarely articulated the kind of broad political critique of intellectual property regimes that might be found in prior Access to Knowledge movements, developing a concrete motivation and purpose statement gave OKF a chance to deepen its vision rather than narrow it. The risk Jo Bates has written about, of the ‘open’ movement being co-opted into dominant narratives of neoliberalism, appears to be a very real one. In the motivation statement above, government and big corporates are cast as the problem, and technology and data in the hands of ‘citizens’, ‘scientists’, ‘entrepreneurs’ and (perhaps contradictorily) ‘enterprises’ as the solution. Alternative approaches to improving processes of government and governance through opening more spaces for participation are off the table here, as are any specific normative goals for opening knowledge. Data-centricity displaces all of these.

Now – it might be argued that although the motivation statement takes data as a starting point, it is really at its core about the balance of power: asking who should control data, information and knowledge. Yet the analysis appears to entirely conflate the terms ‘data’, ‘information’ and ‘knowledge’ – which clouds this substantially.

(3) Data, Information and Knowledge

Data, Information, Knowledge, Wisdom

The DIKW pyramid offers a useful way of thinking about the relationship between Data, Information, Knowledge (and Wisdom). This has sometimes been described as a hierarchy from ‘know nothing’ (data is symbols and signs encoding things about the world, but useless without interpretation), ‘know what’, ‘know how’ and ‘know why’.

Data is not the same as information, nor the same as knowledge. Converting data into information requires the addition of context. Converting information into knowledge requires skill and experience, obtained through practice and dialogue.

Data and information can be treated as artefacts/things. I can e-mail you some data or some information. But knowledge involves a process – sharing it involves more than just sending a file.
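To make the distinction concrete, here is a minimal sketch (my own toy illustration, not drawn from any OKF material) of how the same symbol only climbs the pyramid as context, and then interpretation, are added:

```python
# Toy illustration of DIKW: the names and thresholds below are invented
# for this example, not taken from any standard or from the text above.

raw_datum = "38.5"  # data: a bare symbol, meaningless without interpretation

# Information: data plus context.
information = {
    "value": 38.5,
    "unit": "degrees Celsius",
    "measurement": "human body temperature",
}

def interpret(info):
    """Knowledge: applying experience to information. Real knowledge lives
    in people and practice; this hard-coded clinical rule of thumb is only
    a stand-in for that judgement."""
    if info["measurement"] == "human body temperature":
        return "fever" if info["value"] >= 38.0 else "normal"
    return "no interpretation available"

print(interpret(information))  # -> fever
```

Note that the file I could e-mail you stops at `information`; the judgement encoded in `interpret` is the part that normally travels through training, practice and dialogue.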

OKF has historically worked very much on the transition from data to information, and information to knowledge, through providing training, tools and capacity building, yet this is not captured at all in the core purpose. Knowledge, not data, has the potential to free us, bringing greater autonomy. And it is arguably proprietary control of data and information that is the basis of the power of the few, not any superior access to knowledge they possess. And if we recognise that turning data into information and into knowledge involves contextualisation and subjectivity, then ‘information and insights’ cannot be simultaneously ‘apparent’ to everyone, if this is taken to represent some consensus on ‘truths’, rather than recognising that insights are generated, and contested, through processes of dialogue.

It feels like there is a strong implicit positivism within the current core purpose, which stands to raise particular problems for broadening the diversity of Open Knowledge beyond a few countries and communities.

(4) Power, individualism and collective action

I’ve already touched upon issues of power. Addressing “global challenges like justice, climate changes, cultural matters” (from survey responses) will not come from empowering individuals alone – but will have to involve new forms of co-ordination and collective action. Yet power in the ‘core purpose’ statement appears to be primarily conceptualised in terms of individual “informed choices about how we live, what we buy and who gets our vote”, suggesting change is purely the result of aggregating ‘choice’, yet failing to explore how knowledge needs to be used to also challenge the frameworks in which choices are presented to us.

The ideas that ‘everyone’ can be empowered, and that when “knowledge is ‘open’ [...] we are all empowered to make informed choices about our future”, fail to take account of the wider constraints to action and choice that many around the world face, and that some of the global struggles that motivate many to pursue greater openness are not always win-win situations. Those other constraints and wider contexts might not be directly within the power of an open knowledge movement to address, or the core preserve of open knowledge, but they need to be recognised and taken into account in the theories of change developed.

In summary

I’ve tried to deal with the Motivation, Core Purpose and Tag-line statements as carefully as limited free time allows – but inevitably there is much more to dig into, and there will be other ways of reading these statements. More optimistic readings are possible – and I certainly hope might turn out to be more realistic – but in the interest of dialogue I hope that a critical reading is a more useful contribution to the debate, and I would re-iterate my preliminary notes 1 – 5 above.

To recap the critique:

  • Developing a brand and statement of core purpose is an opportunity for dialogue and discussion, yet right now this opportunity appears to have been mostly missed;
  • The motivation, core purpose and tagline are more tech-centric and data-centric than the OKF community, risking sidelining other aspects of the open knowledge community;
  • There needs to be a recognition of the distinction between data, information and knowledge, in order to develop a coherent theory of change and purpose;
  • There appears to be an implicit libertarian individualism in current theories of change, and it is not clear that this is compatible with working to address the shared global challenges that have brought many people into the open knowledge community.

Updates:

There is some discussion of these issues taking place on the OKFN-Discuss list, and the Wiki page has been updated from the version I was initially writing about, to re-frame what was termed ‘core purpose’ as ‘brand core purpose’.

Five critical questions for constructing data standards

I’ve been spending a lot of time thinking about processes of standardisation recently (building on the recent IATI Technical Advisory Group meeting, working on two new standards projects, and conversations at today’s MIT Center for Civic Media & Berkman Center meet-up). One of the key strands in that thinking is around how pragmatics and ethics of standards collide. Building a good standard involves practical choices based on the data that is available, the technologies that might use that data and what they expect, and the feasibility of encouraging parties who might communicate using that standard to adapt their practices (more or less minimally) in order to adopt it. But a standard also has ethical and political consequences, whether it is a standard deep in the Internet stack (as John Morris and Alan Davidson discuss in this paper from 2003[1]), or a standard at the content level, supporting exchange of information in some specific domain.

The five questions below seek to (in a very provisional sense) capture some of the considerations that might go into an exploration of the ethical dimensions of standard construction[2].

(Thanks to Rodrigo Davies, Catherine D’Ignazio and Willow Brugh for the conversations leading to this post)

For any standard, ask:

Who can use it?

Practically, I mean: who, if data in this standard format were placed in front of them, would be able to do something meaningful with it? Who might want to use it? Are people who could benefit from this data excluded from using it by its complexity?

Many data standards assume that ‘end users’ will access the data through intermediaries (i.e. a non-technical user can only do anything with the data after it has been processed by some intermediary individual or tool) – but not everyone has access to intermediaries, or intermediaries may have their own agendas or understandings of the world that don’t fit with those of the data user.

I’ve recently been exploring whether it’s possible to turn this assumption around, and make simple versions of a data standard the default, with more expressive data models available to those with the skills to transform data into more structured forms. For example, the Three Sixty Giving standard (warning: very draft/provisional technical docs) is based around a rich data model, but with a flat-as-possible serialisation: most of the common forms of analysis someone might want to do with the data can be done in a spreadsheet, and in 90%+ of cases data can be exchanged in flat(ish) forms, with richer structures only used where needed.
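As a rough sketch of that flat-first idea (the field names below are invented for illustration, not taken from the draft Three Sixty Giving docs), dotted column headings let a spreadsheet row carry a flattened view of the richer model, which can be mechanically expanded by those who need the nested structure:

```python
# Hypothetical flat serialisation of a grant record: column names use dots
# to flatten the nested data model into something spreadsheet-friendly.
flat_row = {
    "id": "360G-example-001",
    "amount": 15000,
    "currency": "GBP",
    "recipient.name": "Example Community Trust",
    "recipient.location": "Sheffield",
}

def expand(row, sep="."):
    """Expand dotted column names back into a richer nested structure."""
    nested = {}
    for key, value in row.items():
        parts = key.split(sep)
        target = nested
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested

print(expand(flat_row))
# {'id': '360G-example-001', 'amount': 15000, 'currency': 'GBP',
#  'recipient': {'name': 'Example Community Trust', 'location': 'Sheffield'}}
```

The design point is that the flat form is the default most users touch, while the nested form remains available to those with the tools to want it.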

What can be expressed?

Standards make choices about what can be expressed, usually at two levels:

  • Field choice
  • Taxonomies / codelists

Both involve making choices about how the world is sliced up, and what sorts of things can be represented and expressed.
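A minimal sketch of how those two levels bite in practice (the field and codelist below are invented, not from any real standard): field choice decides what must be said, and the codelist decides what can be said.

```python
# Invented example: one required field plus one closed codelist.
ORGANISATION_TYPES = {"government", "company", "charity"}  # the codelist

def validate(record):
    """Return a list of ways the record falls outside the standard."""
    errors = []
    if "name" not in record:  # field choice: a 'name' must be provided
        errors.append("missing field: name")
    org_type = record.get("organisation_type")
    if org_type not in ORGANISATION_TYPES:  # codelist: only listed values exist
        errors.append(f"'{org_type}' is not in the organisation_type codelist")
    return errors

# A community co-operative has no box to tick: the codelist forces a lossy
# choice, which is exactly the kind of world-slicing discussed above.
print(validate({"name": "Riverside Co-op", "organisation_type": "co-operative"}))
# -> ["'co-operative' is not in the organisation_type codelist"]
```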

A thought experiment: If I asked people in different social situations an open question inviting them to tell me about the things a standard is intended to be about (e.g. “Tell me about this contract?”) how much of what they report can be captured in the standard? Is it better at capturing the information seen as important to people in certain social positions? Are there ways it could capture information from those in other positions?

What social processes might it replace or disrupt?

Over the short-term, many data standards end up being fed by existing information systems – with data exported and transformed into the standard. However, over time, standards can lead to systems being re-engineered around them. And in shifting the flow of information inside and outside of organisations, standards processes can disrupt and shift patterns of autonomy and power.

Sometimes the ‘inefficient’ processes of information exchange, which open data standards seek to rationalise, can be full of all sorts of tacit information exchange, relationship building and so on, which the introduction of a standard could affect. Thinking about how the technical choices in a standard affect its adoption, and how far they allow for distributed patterns of data generation and management, may be important. (For example, which identifiers in a standard have to be maintained centrally – creating pressure for centralised information systems to maintain the integrity of data – and which can be managed locally, making it easier to create more distributed architectures. It’s not simply a case of what kinds of architectures a standard does or doesn’t allow, but which it makes easier or trickier: in budget-constrained environments, implementations will often go down the path of least resistance, even if it’s theoretically possible to build standard-using tools in ways that better respect the existing structures of an organisation.)
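To illustrate the identifier point with a sketch (the scheme below is invented, not from any particular standard): identifiers maintained centrally require a round trip to a single registry, while prefix-based identifiers can be minted locally with no central call, and it is choices like this that tilt implementations towards centralised or distributed architectures.

```python
import itertools

class CentralRegistry:
    """Every identifier must be requested from one shared authority."""
    def __init__(self):
        self._counter = itertools.count(1)

    def issue(self):
        return f"CENTRAL-{next(self._counter)}"

class LocalIssuer:
    """Each organisation is assigned a unique prefix once, then mints
    its own identifiers offline without touching the centre."""
    def __init__(self, org_prefix):
        self.org_prefix = org_prefix
        self._counter = itertools.count(1)

    def issue(self):
        return f"{self.org_prefix}-{next(self._counter)}"

registry = CentralRegistry()
print(registry.issue())          # CENTRAL-1: needs the central system online

ngo = LocalIssuer("NGO-KE-042")  # prefix assigned once, centrally
print(ngo.issue())               # NGO-KE-042-1: minted with no central call
```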

Which fields are descriptive? Which fields are normative?

There has recently been discussion of Facebook’s introduction of a wide range of options for describing gender, with Jane Fae arguing in the Guardian that, rather than provide a restricted list of options, the field should simply be dropped altogether. Fae’s argument is that gender categories are mainly used to target ads, and that the field has little value otherwise.

Is it possible to look at a data standard and consider which proposed fields import strong normative worldviews with them? And then to consider omitting these fields?

It may be that for some fields silence is a better option than forcing people, organisations or events (or whatever it is the standard describes) into boxes that don’t make sense for all the individuals/cases covered…

Does it permit dissent?

Catherine D’Ignazio suggested this question. How far does a standard allow itself to be disputed? What consequences are there to breaking the rules of a standard or remixing it to express ideas not envisaged by the original architects? What forms of tussle can the standard accommodate?

This is perhaps even more a question of the ecosystem of tools, validators and other resources around a standard than of the standard specification itself, but these are interrelated.

Footnotes

[1]: I’ve been looking for more recent work on ‘public interest’ and politics of standard creation. Academically I spend a lot of time going back to Bowker and Star’s work on ‘infrastructure’, but I’m on the look out for other works I should be drawing upon in thinking about this.

[2]: I’m talking particularly about open data standards, and standards at the content level, like IATI, Open 311, GTFS etc.

ODDC Update at Developers for Development, Montreal

[Summary: Cross posted from the Open Data Research Network website. Notes from a talk at OD4DC Montreal] 

I’m in Montreal this week for the Developers for Development hackathon and conference. Aside from having fun building a few things as part of our first explorations for the Open Contracting Data Standard, I was also on a panel with the fantastic Linda Raftree, Laurent Elder and Anahi Ayala Iacucci, focussing on the topic of open data impacts in developing countries: a topic I spend a lot of time working on. We’re still in the research phase of the Emerging Impacts of Open Data in Developing Countries research network, but I tried to pull together a talk that would capture some of the themes that have been coming up in our network meetings so far. So – herewith the slides and raw notes from that talk.

Introduction

In this short presentation I want to focus on three things. Firstly, I want to present a global snapshot of open data readiness, implementation and impacts around the world.

Secondly, I want to offer some remarks on the importance of how research into open data is framed, and what social research can bring to our understanding of the open data landscape in developing countries.

Lastly, I want to share a number of critical reflections emerging from the work of the ODDC network.

Part 1: A global snapshot

I’ve often started presentations and papers about open data by commenting on how ‘it’s just a few short years since the idea of open data gained traction’, yet, in 2014 that line is starting to get a little old. Data.gov launched in 2009, Kenya’s data portal in 2011. IATI has been with us for a while. Open data is no longer a brand new idea, just waiting to be embraced – it is becoming part of the mainstream discourse of development and government policy. The issue now is less about convincing governments to engage with the open data agenda, than it is about discovering whether open data discourses are translating into effective implementation, and ultimately open data impacts.

Back in June last year, at the Web Foundation we launched a global expert survey to help address that question. All-in-all we collected data covering 77 countries, representing every region, type of government and level of development, and asking about government, civil society and business readiness to secure benefits from open data, the actual availability of key datasets, and observed impacts from open data. The results were striking: over 55% of these diverse countries surveyed had some form of open data policy in place, many with high-level ministerial support.

The policy picture looks good. Yet, when it came to key datasets actually being made available as open data, the picture was very different. Less than 7% of the datasets surveyed in the Barometer were published both in bulk machine-readable forms, and under open licenses: that is, in ways that would meet the open definition. And much of this percentage is made up of the datasets published by a few leading developed states. When it comes to essential infrastructural datasets like national maps, company registers or land registries, data availability, even of non-open data, is very poor, and particularly bad in developing countries. In many countries, the kinds of cadastral records that are cited as key to the economic potential of open data are simply not yet collected with full country coverage. Many countries have long-standing capacity building programmes to help them create land registries or detailed national maps – but many such programmes are years or even decades behind on delivering the required datasets.

The one exception where data was generally available and well curated, albeit not provided in open and accessible forms, was census data. National statistics offices have been the beneficiaries of years of capacity building support: yet the same programmes that have enabled them to manage data well have also helped them to become quasi-independent of governments, complicating whether or not they will easily be covered by government open data policies.

If the implementation story is disappointing, the impact story is even more so. In the Barometer survey we asked expert researchers to cite examples of where open data was reported in the media, or in academic sources, to have had impacts across a range of political, social and economic domains, and to score questions on a 10-point scale for the breadth and depth of impacts identified. The scores were universally low. Of course, whilst the idea of open data can no longer be claimed to be brand new, many country open data initiatives are – and so it is fair to say that outcomes and impacts take time, and are unlikely to be seen in any substantial way over the very short term. Yet, even in countries where open data has been present for a number of years, evidence of impact was light. The impacts cited were often hackathon applications, which, important as they are, generally only prototype and point to potential impacts. Without getting to scale, few demo applications alone can deliver substantial change.

Of course, some of this impact evidence gap may also be down to weaknesses in existing research. Some of the outcomes from open data publication are not easily picked up in visible applications or high profile news stories. That’s where the need for a qualitative research agenda really comes in.

Part 2: The Open Data Barometer

The Open Data Barometer is just one part of a wider open data programme at the World Wide Web Foundation, including the Open Data in Developing Countries research project supported by Canada’s International Development Research Centre. The main focus of that project over the last 12 months has been on establishing a network of case study research partners based in developing countries, each responding to both local concerns and a shared research agenda, to understand how open data can be put to use in particular decision making and governance situations.

Our case study partners are drawn from Universities, NGOs and independent consultancies, and were selected from responses to an open call for proposals issued in mid 2012. Interestingly, many of these partners were not open data experts, or already involved in open data – but were focussed on particular social and policy issues, and were interested in looking at what open data meant for these. Focus areas for the cases range from budget and aid transparency, to higher education performance, to the location of sanitation facilities in a city. Together, these foundations give the research network a number of important characteristics:

Firstly, whilst we have a shared research framework that highlights particular elements that each case study seeks to incorporate – from looking at the political, social and economic context of open data, through to the technical features of datasets and the actions of intermediaries – cases are also able to look at the different constraints exogenous to datasets themselves which affect whether or not data has a chance of making a difference.

Secondly, the research network works to build critical research capacity around open data – bringing new voices into the open data debate. For example, in Kenya, the Jesuit Hakimani Trust have an established record working on citizens’ access to information, but until 2013 had not looked at the issue of open data in Kenya. By incorporating questions about open data in their large-scale surveys of citizen attitudes, they are starting to generate evidence that treats open data alongside other forms of access to information for poor and marginalised citizens, generating new insights.

Thirdly, the research is open to unintended consequences of open data publication, good and bad, and can look for impacts outside the classic logic model of ‘data + apps = impact’. Indeed, as researchers in both Sao Paulo and Chennai have found, they have, as respected research intermediaries exploring open data use, been invited to get involved with shaping future government data collection practices. Gisele Craviero from the University of Sao Paulo uses the metaphor of an iceberg to highlight the importance of looking below the surface. The idea that opening data ultimately changes what data gets collected, and how it is handled inside the state, should not be an alien idea for those involved in IATI – which has led many aid agencies to start geocoding their data. But it is a route to effects often underplayed in explorations of the changes open data may be part of bringing about.

Part 3: Emerging findings

As mentioned, we’ve spent much of 2013 building up the Open Data in Developing Countries research network – and our case study partners are right now in the midst of their data collection and analysis. We’re looking forward to presenting full findings from this first phase of research towards the summer, but there are some emerging themes that I’ve been hearing from the network in my role as coordinator that I want to draw out. I should note that these points of analysis are preliminary, and are the product of conversations within the network, rather than being final statements, or points that I claim specific authorship over.

We need to unpack the definition of open data.

Open data is generally presented as a package with a formal definition. Open data is data that is proactively published, in machine-readable formats, and under open licenses. Without all of these: there isn’t open data. Yet, ODDC participants have been highlighting how the relative importance of these criteria varies from country to country. In Sierra Leone, for example, machine-readable formats might be argued to be less important right now than proactive publication, as for many datasets the authoritative copy may well be the copy on paper. In India, Nigeria or Brazil, the question of licensing may be moot: either it is assumed that government data is free to re-use, regardless of explicit statements, or local data re-users may be unconcerned with violating licenses, based on a rational expectation that no-one will come after them.

Now – this is not to say that the Open Definition should be abandoned, but we should be critically aware of its primary strength: it helps to create a global open data commons, and to deliver on a vision of ‘frictionless data’. Open data of this form is easier to access ‘top down’, and can more easily be incorporated into panopticon-like development dashboards, but the actual impact on ‘bottom up’ re-use may be minimal. Unless actors in a developing country are equipped with the skills and capacities to draw on this global commons, and to overcome other local ‘frictions’ to re-using data effectively, the direct ROI on the extra effort to meet a pure open definition might not accrue to those putting the effort in: and a dogmatic focus on strict definitions might even in some cases slow down the process of making data relatively more accessible. Understanding the trade-offs here requires more research and analysis – but the point at least is made that there can be differences of emphasis in opening data, and these prioritise different potential users.

Supply is weak, but so is demand.

Talking at the Philippines Good Governance Summit a few weeks ago, Michael Canares presented findings from his research into how the local government Full Disclosure Policy (FDP) is affecting both ‘duty bearers’ responsible for supplying information on local budgets, projects, spend and so on, and ‘claim holders’ – citizens and their associations who seek to secure good services from government. A major finding has been that, with publishers in ‘compliance mode’, putting up the required information but not in accessible formats, citizen groups articulated very little demand for online access to Full Disclosure Policy information. Awareness that the information was available was low, interest in the particular data published was low (that is, information made available did not match with any specific demand), and where citizen groups were accessing the data they often found they did not have the knowledge to make sense of or use it. The most viewed and downloaded documents garnered no more than 43 visits in the period surveyed.

In open data, as we remove the formal or technical barriers to data re-use that come from licenses and non-standard formats, we encounter the informal hurdles, roadblocks and thickets that lie behind them. And even as those new barriers are removed through capacity building and intermediation, we may find that they were not necessarily holding back a tide of latent demand – but were rather theoretical barriers in the way of a progressive vision of an engaged citizenry and innovative public service provision. Beyond simply calling for the removal of barriers, this vision needs to be elaborated – whether through the designs of civic leaders, or through the distributed actions of a broad range of social activists and entrepreneurs. And the tricky challenge of culture change – changing expectations of who is, and can be, empowered – needs to be brought to the fore.

Innovative intermediation is about more than visualisation.

Early open data portals listed datasets. Then they started listing third party apps. Now, many profile interactive visualisations built with data, or provide visualisation tools. Apps and infographics have become the main thing people think of when it comes to ‘intermediaries’ making open data accessible. Yet, if you look at how information flows on the ground in developing countries, mobile messaging, community radio, notice boards, churches and chiefs’ centres are much more likely to come up as key sites of engagement with public information.

What might open data capacity building look like if we started with these intermediaries, and only brought technology in to improve the flow of data where that was needed? What does data need to be shaped like to enable these intermediaries to act with it? And how do the interests of these intermediaries, and the constituencies they serve, affect what will happen with open data? All these are questions we need to dig into further.

Summary

I said in the opening that this would be a presentation of critical reflections. It is important to emphasise that none of this constitutes an argument against open data. The idea that government data should be accessible to citizens retains its strong intrinsic appeal. Rather, in offering some critical remarks, I hope this can help us to consider different directions open data for development can take as it matures, and that ultimately we can move more firmly towards securing impacts from the important open data efforts so many parties are undertaking.

ICTs and Anti-Corruption: theory and examples

[Summary: draft section from U4 paper on exploring the incentives for adopting ICT innovation in the fight against corruption]

As mentioned a few days ago, I’ve currently got a paper online for comment which I’m working on with Silvana Fumega for the U4 anti-corruption centre. I’ll be blogging each of the sections here, and if you’ve comments on any element of it, please do drop in comments to the Google Doc draft. 

ICTS AND ANTI-CORRUPTION

Corruption involves the abuse of entrusted power for personal gain (Transparency International, 2009). Grönlund has identified a wide range of actions that can be taken with ICTs to try and combat corruption, from service automation and the creation of online and mobile phone based corruption-reporting channels to the online publication of government transparency information (Grönlund, 2010). In the diagram below we offer eight broad categories of ICTs interventions with a potential role in fighting corruption.

U4-Diagram

These different ICT interventions can be divided between transactional reforms and transparency reforms. Transactional reforms seek to reduce the space for corrupt activity by controlling and automating processes inside government, or seek to increase the detection of corruption by increasing the flow of information into existing government oversight and accountability mechanisms. Often these developments are framed as part of e-government. Transparency reforms, by contrast, focus on increasing external rather than internal control over government actors by making the actions of the state and its agents more visible to citizens, civil society and the private sector. In the diagram, categories of ICT intervention and related examples are positioned along a horizontal axis to indicate, in general, whether these initiatives have emerged as ‘citizen led’ or ‘government led’ projects, and along the vertical axis to indicate whether the focus of these activities is primarily on transactional reforms, or transparency. In practice, where any actual ICT intervention falls depends as much on the details of implementation as on the technology, although we find these archetypes useful to highlight the different emphases and origins of different ICT-based approaches.

Many ICT innovations for transparency and accountability[1] have emerged from within civil society and the private sector, and have only later been adopted by governments. In this paper our focus is specifically upon government adoption of innovations: when the government takes the lead role in implementing some technology with an anti-corruption potential, albeit a technology that may have originally been developed elsewhere, and where similar instances of such technologies may still be deployed by groups outside government. For example, civil society groups in a number of jurisdictions have deployed the Alaveteli open source software[2], which brokers the filing of Right to Information act requests online, logging and making public requests to, and replies from, government. Some government agencies have responded by building their own direct portals for filing requests, which co-exist with the civil society run Alaveteli implementations. The question of concern for this paper is why government has chosen to adopt the innovation and provide its own RTI portals.

Although there are different theories of change underlying ICT enabled transactional and transparency reforms, the actual technologies involved can be highly inter-related. For example, digitising information about a public service as part of an e-government management process means that there is data about its performance that can be released through a data portal and subjected to public pressure and scrutiny. Without the back-office systems, no digital records are available to open (Thurston, 2012).

The connection between transactional e-government and anti-corruption has only relatively recently been explored. As Bhatnagar notes, most e-government reforms did not begin as anti-corruption measures. Instead, they were adopted for their promise to modernise government and make it more efficient (Bhatnagar, 2003). Bhatnagar explains that “…reduction of corruption opportunities has often been an incidental benefit, rather than an explicit objective of e-government”. A focus on the connection between e-government and transparency is more recent still. Kim et al. (2009) note that “E-government’s potential to increase transparency and combat corruption in government administration is gaining popularity in communities of e-government practitioners and researchers…”, arguably as a result of increased Internet diffusion meaning that, for the first time, data and information from within government can, in theory, be made directly accessible to citizens through computers and mobile phones, without passing through intermediaries.

In any use of ICTs for anti-corruption, the technology itself is only one part of the picture. Legal frameworks, organisational processes, leadership and campaign strategies may all be necessary complements of digital tools in order to secure effective change. ICTs for accountability and anti-corruption have developed in a range of different sectors and in response to many different global trends. In the following paragraphs we survey in more depth the emergence and evolution of three kinds of ICTs with anti-corruption potential, looking at both the technologies and the contexts they are embedded within. 

2.1 TRANSPARENCY PORTALS

A transparency portal is a website where government agencies routinely publish defined sets of information. They are often concerned with financial information, and might include details of laws and regulations alongside more dynamic information such as government debt, departmental budget allocations and government spending (Solana, 2004). They tend to have a specific focus, and are often backed by a legal mandate, or regulatory requirement, that information is published to them on an ongoing basis. National transparency portals have existed across Latin America since the early 2000s, developed by finance ministries following over 15 years’ investment in financial management capacity building in the region. Procurement portals have also become common, linked to efforts to make public procurement more efficient, and to comply with regulations and good practice on public tenders.

More recently, a number of governments have mandated the creation of local government transparency portals, or the creation of dedicated transparency pages on local government websites. For example, in the United Kingdom, the Prime Minister requested that local governments publish all public spending over £500 on their websites, whilst in the Philippines the Department of Interior and Local Government (DILG) has pushed the implementation of a Full Disclosure Policy requiring Local Government Units to post a summary of revenues collected, funds received, appropriations and disbursement of funds, and procurement-related documents on their websites. The Government of the Philippines has also created an online portal to support local government units in publishing the documents demanded by the policy[3].

In focus: Peru Financial Transparency Portal

Country: Peru

Responsible: Government of Peru- Ministry of Economic and Financial Affairs

Brief description: The Peruvian Government implemented a comprehensive transparency strategy in early 2000. That strategy comprised several initiatives (a law on access to financial information, promotion of citizen involvement in transparency processes, among others). The Financial Transparency Portal was launched as one of the elements of that strategy. In that regard, Solana (2004) suggests that the success of the portal is related to the existence of a comprehensive transparency strategy, in which the portal serves as a central element. The Portal (http://www.mef.gob.pe/) started to operate in 2001 and, at that time, it was praised as the most advanced in the region. Several substantial upgrades to the portal have taken place since the launch.

Current situation:

The portal has changed considerably since its early days. In the beginning, it provided access to documents on economic and financial information. After more than a decade, it currently publishes datasets on several economic and financial topics, provided by each of the agencies in charge of producing or collecting the information. Those datasets are divided into four main modules: budget performance monitoring, implementation of investment projects, transfers to national, local and regional governments, and domestic and external debt. The portal also includes links to request information under the Peruvian FOI law, and to track the status of requests.

Sources:

http://www.politikaperu.org/directorio/ficha.asp?id=355

http://www.egov4dev.org/transparency/case/laportals.shtml

http://www.worldbank.org/socialaccountability_sourcebook/Regional%20database/Case%20studies/Latin%20America%20&%20Caribbean/TOL-V.pdf#page=71

In general, financial transparency portals have focussed on making government records available: often hosting image-file versions of printed, signed and scanned documents, which means that anyone wanting to analyse the information from across multiple reports must re-type it into spreadsheets or other software. Although a number of aid and budget transparency portals are linked directly to financial management systems, it is only recently that a small number of portals have started to add features giving direct access to datasets on budget and spending.

Some of the most data-centric transparency portals can be found in the International Aid field, where Aid Transparency Portals have been built on top of Aid Management Platforms used by aid-recipient governments to track their donor-funded projects and budgets. Built with funding and support from International donors, aid transparency portals such as those in Timor Leste and Nepal offer search features across a database of projects. In Nepal, donors have funded the geocoding of project information, allowing a visual map of where funding flows are going to be displayed.

Central to the hypothesis underlying the role of transparency portals in anti-corruption is the idea that citizens and civil society will demand and access information from the portals, and will use it to hold authorities to account (Solana, 2004). In many contexts, whilst transparency portals have become well-established, direct demand from citizens and civil society for the information they contain remains, as Alves and Heller put it in relation to Brazil’s fiscal transparency, “frustratingly low” (in Khagram, Fung, & Renzio, 2013). However, transparency portals may also be used by the media and other intermediaries, providing an alternative, more indirect theory of change in which coverage of episodes of corruption creates electoral pressure (in functioning democracies at least) against corruption. Power and Taylor’s work on democracy and corruption in Brazil suggests, though, that whilst such mechanisms can have impacts, they are often confounded in practice by non-corruption factors that influence voters’ preferences, and by a wide range of contingencies, from electoral cycles to political party structures and electoral math (Power & Taylor, 2011).

2.2 OPEN DATA PORTALS

Where transparency portals focus on the publication of specific kinds of information (financial; aid; government projects etc.), open data portals act as a hub for bringing together diverse datasets published by different government departments.

Open data involves the publication of structured machine-readable data files online, with explicit permission granted for anyone to re-use the data in any way. This can be contrasted with transparency portals that publish scanned documents which cannot be loaded into data analysis software, or that publish under copyright restrictions denying citizens or businesses the right to re-use the data. Open data has risen to prominence over the last five years, spurred on by the 2009 Memorandum on Transparency and Open Government from US President Obama (Obama, 2010), which led to the creation of the data.gov portal, bringing together US government datasets. This built on principles of Open Government Data elaborated in 2007 by a group of activists meeting in Sebastopol, California, calling for government to provide data online that was complete, primary (i.e. not edited or interpreted by government before publication), timely, machine-readable, standardised and openly licensed (Malmud & O’Reilly, 2007).

In focus: Kenya Open Data Initiative (KODI), opendata.go.ke

Country: Kenya

Responsible: Government of Kenya

Brief description:

Around 2008, projects from Ushahidi to M-PESA put Kenya on the map of ICT innovation. The Kenyan government – in particular, then-PS Ndemo of the Ministry of Information and Communications – eager to promote and encourage that market, started to analyze the idea of publishing government datasets for this community of ICT experts to use. In that quest, he received support from actors outside of government, such as the World Bank, Google and Ushahidi. Adding to that context, in 2010 a new constitution recognizing citizens’ right of access to information was enacted in Kenya (however, a FOI law is still a pending task for the Kenyan government). On July 8 2011, President Mwai Kibaki launched the Kenya Open Data Initiative, making government datasets available to the public through a web portal: opendata.go.ke

Current situation:

Several activists and analysts are starting to write about the lack of updates and updated information on the Kenya Open Data Initiative. The portal has not been updated in several months, and its traffic has slowed down significantly.

Sources:

http://www.scribd.com/doc/75642393/Open-Data-Kenya-Long-Version

http://blog.openingparliament.org/post/63629369190/why-kenyas-open-data-portal-is-failing-and-why-it

http://www.code4kenya.org/?p=469

http://www.ict.go.ke/index.php/hot-topic/416-kenya-open-data

http://www.theguardian.com/global-development/poverty-matters/2011/jul/13/kenya-open-data-initiative

Open data portals have caught on as a policy intervention, with hundreds now online across the world, including an increasing number in developing countries. Brazil, India and Kenya all have national open government data portals, and Edo State in Nigeria recently launched one of the first sub-national open data portals on the continent, expressing a hope that it would “become a platform for improving transparency, catalyzing innovation, and enabling social and economic development”[4]. However, a number of open data portals have already turned out to be short-lived, with the Thai government’s open data portal, launched[5] in 2011, already defunct and offline at the time of writing.

The data hosted on open data portals varies widely: ranging from information on the locations of public services, and government service performance statistics, to public transport timetables, government budgets, and environmental monitoring data gathered by government research institutions. Not all of this data is useful for anti-corruption work, although the availability of information as structured data makes it far easier for third parties to analyse a wide range of government datasets not traditionally associated with anti-corruption work, looking for patterns and issues that might point to causes for concern. In general, theories of change around open data for anti-corruption assume that skilled intermediaries will access, interpret and work with the datasets published, as portals are generally designed with a technical audience in mind.

Data portals can act as both a catalyst of data publication, providing a focal point that encourages departments to publish data that was not otherwise available, and as an entry-point helping actors outside government to locate datasets that are available. At their best they provide a space for engagement between government and citizens, although few currently incorporate strong community features (De Cindio, 2012).

Recently, transparency and open data efforts have also started to focus on the importance of cross-cutting data standards that can be used to link up data published in different data portals, and to solicit the publication of sectoral data. Again the aid sector has provided a lead here, with the development of the International Aid Transparency Initiative (IATI) data standard, and a data portal collating all the information on aid projects published by donors to this standard[6]. New efforts are seeking to build on experiences from IATI with data standards for contracts information in the Open Contracting initiative, which not only targets information from governments, but potentially also disclosure of contract information in the private sector[7].

2.3 CITIZEN REPORTING CHANNELS

Transparency and open data portals primarily focus on the flow of information from government to citizen. Many efforts to challenge corruption require a flow of information the other way: citizens reporting instances of corruption or providing the information agents of government need to identify and address corrupt behaviour. When reports are filed on paper, or to local officials, it can be hard for central governments to ensure reports are adequately addressed. By contrast, with platforms like the E-Grievance Portal in the Indian State of Orissa[8], when reports are submitted they can be tracked, meaning that where there is will to challenge corruption, citizen reports can be better handled.

Many online channels for citizen reporting have in fact grown up outside of government. Platforms like FixMyStreet in the UK, and the many similar platforms across the world, have been launched by civil society groups frustrated at having to deal with government through seemingly antiquated paper processes. FixMyStreet allows citizens to point out on a map where civic infrastructure requires fixing, and forwards the reports to the relevant level of government. Government agents are invited to report back to the site when the issue is fixed, giving a trackable and transparent record of government responsiveness. In some areas, governments have responded to these platforms by building their own alternative citizen reporting channels, though often without the transparency of the civil society platforms (reports simply go to the public authority; no open tracking is provided), or, in other cases, by working to integrate the civil society solution with their own systems.

In focus: I Paid a Bribe, an Indian website aimed at collating stories and prices of bribes from citizens across the country, and using them to present a snapshot of trends in bribery.

Country: India

Responsible: Janaagraha (www.janaagraha.org), a Bangalore-based not-for-profit organization

Brief description:

The initiative was first launched on August 15, 2010 (India’s Independence Day), and the website became fully functional a month later. I Paid a Bribe aims to understand the role of bribery in public service delivery by transforming the data collected from reports into knowledge, informing the government about gaps in public transactions and strengthening citizen engagement to improve the quality of service delivery. For example, in Bangalore, Bhaskar Rao, the Transport Commissioner for the state of Karnataka, used the data collected on I Paid a Bribe to push through reforms in the motor vehicle department. As a result, and in order to avoid bribes, licenses are now applied for online (Strom, 2012).

Current situation: To reach a greater audience, ipaidabribe.com launched, in mid 2013, “Maine Rishwat Di”, the Hindi-language version of the website: http://hindi.ipaidabribe.com/. At the same time, they launched mobile apps and SMS services to make bribe reporting easier and more accessible to citizens across India. “I Paid a Bribe” has also been replicated with partners in a number of other countries, such as Pakistan, Kenya, Morocco and Greece.

Sources: https://www.ipaidabribe.com/about-us

http://southasia.oneworld.net/Files/ict_facilitated_access_to_information_innovations.pdf/at_download/file

http://www.firstpost.com/india/after-reporting-bribes-now-report-rishwats-hindi-version-of-i-paid-a-bribe-launched-1022627.html

http://www.ipaidabribe.com/comment-pieces/“maine-rishwat-di”-hindi-language-version-ipaidabribecom-launched-shankar-mahadevan

Strom, Stephanie (2012) Web Sites Shine Light on Petty Bribery Worldwide. The New York Times. March 6th. Available:  http://www.nytimes.com/2012/03/07/business/web-sites-shine-light-on-petty-bribery-worldwide.html

References

Bhatnagar, S. (2003). Transparency and Corruption: Does E-Government Help?, 1–9.

De Cindio, F. (2012, April 4). Guidelines for Designing Deliberative Digital Habitats: Learning from e-Participation for Open Data Initiatives. The Journal of Community Informatics.

Fox, J. (2007). The uncertain relationship between transparency and accountability. Development in Practice, 17(4-5), 663–671. doi:10.1080/09614520701469955

Grönlund, Å. (2010). Using ICT to combat corruption – tools, methods and results. In C. Strand (Ed.), Increasing transparency and fighting corruption through ICT: empowering people and communities (pp. 7–26). SPIDER.

Khagram, S., Fung, A., & Renzio, P. de. (2013). Open Budgets: The Political Economy of Transparency, Participation, and Accountability (p. 264). Brookings Institution Press.

Kim, S., Kim, H. J., & Lee, H. (2009). An institutional analysis of an e-government system for anti-corruption: The case of OPEN. Government Information Quarterly, 26(1), 42–50. doi:10.1016/j.giq.2008.09.002

Malamud, C., & O’Reilly, T. (2007, December). 8 Principles of Open Government Data. Retrieved June 1, 2010, from http://resource.org/8_principles.html

Obama, B. (2010). Memo from President Obama on Transparency and Open Government. In D. Lathrop & L. Ruma (Eds.), Open Government: Collaboration, Transparency and Participation in Practice. O’Reilly Media.

Power, T. J., & Taylor, M. M. (2011). Corruption and Democracy in Brazil: The struggle for accountability. University of Notre Dame.

Solana, M. (2004). Transparency Portals: Delivering public financial information to Citizens in Latin America. In K. Bain, I. Franka Braun, N. John-Abraham, & M. Peñuela (Eds.), Thinking Out Loud V: Innovative Case Studies on Participatory Instruments (pp. 71–80). World Bank.

Thurston, A. C. (2012). Trustworthy Records and Open Data. The Journal of Community Informatics, 8(2).

Transparency International. (2009). The Anti-Corruption Plain Language Guide.


[1] It is important to clarify that transparency does not necessarily lead to accountability. Transparency, understood as the disclosure of information that sheds light on institutional behavior, can also be defined as answerability. However, accountability (or “hard accountability” according to Fox, 2007) implies not only answerability but also the possibility of sanctions (Fox, 2007).

[2] http://www.alaveteli.org/about/where-has-alaveteli-been-installed/

[4] http://data.edostate.gov.ng/ Accessed 10th October 2013

[8] http://cmgcorissa.gov.in

Joined Up Philanthropy – a data standards exploration

Earlier this year, Indigo Trust convened a meeting with an ambitious agenda: to see 50% of UK foundation grants detailed as open data, covering 80% of foundation grant-making by value, within five years. Of course, many of the grant-giving foundations in the UK already share details of the work they fund, through annual reports or pages on their websites – but every funder shares the information differently, which makes bringing together a picture of the funding in a particular area or sector, understanding patterns of funding over time, or identifying the foundations who might be interested in a project idea you have, a laborious manual task. Data standards for the publication of foundations’ giving could change that.

Supported by The Nominet Trust and Indigo Trust, at Practical Participation I’m working with non-profit sector expert Peter Bass on a series of ‘research sprints’ to explore what a data standard could look like. This builds on an experiment back in March to help scope an Open Contracting Data Standard. We’ll be using an iterative methodology to look at:

  • (1) the existing supply of data;
  • (2) demand for data and use-cases; and
  • (3) existing related standards.

Each research sprint focusses primarily on one of these, and consists of around 10 days of data collection and analysis, designed to generate useful evidence that can move the conversation forward, without pre-empting future decisions or trying to provide the final word on the question of what a data standard should look like.

Supply: What data is already collected?

The first stage, which we’re working on right now, involves finding out about the data that foundations already collect. We’re talking to a number of different foundations large and small to find out about how they manage information on the work they fund right now.

By collating a list of the different database fields that different foundations hold (whether the column headings in the spreadsheets they use to keep track of grants, or the database fields in a comprehensive relational database) and then mapping these onto a common core, we’re aiming to build up a picture of which data might be readily available right now and easy to standardise, and where there are differences and diversities that will need careful handling in the development of a standard. Past standards projects like the International Aid Transparency Initiative were able to benefit from a large ‘installed base’ of aid donors already using set conventions and data structures drawn from the OECD Development Assistance Committee, which strongly influenced the first version of IATI. We’ll be on the look-out for existing elements of standardisation that might exist to build upon in the foundations sector, as well as seeking to appreciate the diversity of foundations and the information they hold.
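
To make that mapping exercise concrete, here is a minimal sketch of the kind of collation involved, in Python. The column headings and the ‘common core’ field names are invented for illustration; they are not the outcome of the research:

# Invented mappings from each foundation's spreadsheet headings
# onto a hypothetical common core of grant fields.
FIELD_MAPPINGS = {
    "foundation_a": {
        "Grantee": "recipient_name",
        "Amount Awarded": "amount",
        "Date of Award": "award_date",
    },
    "foundation_b": {
        "Organisation": "recipient_name",
        "Grant Value (GBP)": "amount",
        "Awarded": "award_date",
        "Programme": "programme",
    },
}

def to_common_core(source, row):
    # Rename one raw spreadsheet row's columns to common core fields;
    # columns with no mapping are simply dropped.
    mapping = FIELD_MAPPINGS[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}

print(to_common_core("foundation_b", {
    "Organisation": "Example CIC",
    "Grant Value (GBP)": 5000,
    "Awarded": "2013-04-01",
    "Internal Ref": "X1",  # unmapped, so dropped
}))

Fields that map cleanly across many foundations are candidates for early standardisation; fields that appear in only one mapping signal the diversity that will need careful handling.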

We’re aiming to have a first analysis of this exercise out in mid-October, and whilst we’re only focussing on UK foundations, will share all the methods and resources that would allow the exercise to be extended in other contexts.

Demand: what data do people want?

Of course, the data that it is easy to get hold of might not be the data that it is important to have access to, or that potential users want. That motivates the second phase of our research – looking to understand the different use cases for data from the philanthropic sector. These may range from projects seeking to work out who to send their funding applications to; philanthropists seeking to identify partners they could work with; or sector analysts looking to understand gaps in the current giving environment and catalyse greater investment in specific sectors.

Each use case will have different data needs. For example, a local project seeking funding would care particularly about geodata that can tell them who might make grants in their local area; whereas a researcher may be interested in knowing in which financial year grants were awarded, or disbursements made to projects. By articulating the data needs of each use-case, and matching these against the data that might be available, we can start to work out where supply and demand are well matched, or where a campaign for open philanthropy data might need to encourage philanthropists to collect or generate new information on their activities.
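
As a rough sketch of that supply-and-demand matching (with invented use cases and field names, purely to illustrate the method):

# Hypothetical data needs for two use cases, checked against the
# fields Phase 1 might find foundations already hold.
USE_CASE_NEEDS = {
    "local project seeking funding": {"funder_name", "beneficiary_location"},
    "sector researcher": {"amount", "award_date", "financial_year"},
}
AVAILABLE_FIELDS = {"funder_name", "amount", "award_date"}

for use_case, needed in USE_CASE_NEEDS.items():
    missing = sorted(needed - AVAILABLE_FIELDS)
    print(use_case, "->", "fully served" if not missing else "missing: " + ", ".join(missing))

Where a use case’s needs fall outside the available fields, that is where a campaign might need to push for new data collection, rather than just publication.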

Standards: putting the pieces together

Once we know about the data that exists, the data that people want, and how they want to use it – we can start thinking in-depth about standards. There are already a range of standards in the philanthropy space, from the eGrant and hGrant standards developed by the Foundation Centre, to the International Aid Transparency Initiative (IATI) standard, as well as a range of efforts ongoing to develop standards for financial reporting, spending data, and geocoded project information.

Developing a draft standard involves a number of choices:

  • Fields and formats – a standard is made up both of the fields that are deemed important (e.g. value of grant; date of grant etc.) and the technical format through which the data will be represented. Data formats vary in how ‘expressive’ they are, and how extensible a standard is once determined. However, more expressive standards also tend to be more complex.

  • Start from scratch, or extend existing standards – it may be possible to simply adapt an existing standard. Deciding to do this involves both technical and governance issues: for example, if we build on IATI, how would a domestic philanthropy standard adapt to version upgrades in the IATI standard? What collaboration would need to be established? How would existing tools handle the adapted standard?

  • Publisher capacity and needs – standards should reduce rather than increase the burdens on data suppliers. If we are asking publishers to map their data to a complex additional standard, we’re less likely to get a sustainable supply of data. Understanding the technical capacity of people we’ll be asking for data is important.

  • Mapping between standards – sometimes it is possible to entirely automate the conversion between two related standards. For example, if the fields in our proposed standard are a subset of those in IATI, it might be possible to demonstrate how domestic and international funding flows data can be combined. Thinking about how standards map together involves considering the direction in which conversions can take place, and how this relates to the ways different actors might want to make use of the data (a toy sketch of a one-way conversion follows this list).
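
To illustrate that last point, here is a toy sketch of a one-way, automatable conversion. The field names on both sides are invented for the example, and are not drawn from IATI’s actual schema:

# A hypothetical mapping from a domestic grant record to an
# IATI-like subset of fields.
GRANT_TO_IATI_LIKE = {
    "recipient_name": "receiver-org",
    "amount": "transaction-value",
    "award_date": "transaction-date",
}

def convert(grant):
    # Only mapped fields survive, which is why direction matters:
    # converting back again would lose any domestic-only fields.
    return {GRANT_TO_IATI_LIKE[k]: v for k, v in grant.items() if k in GRANT_TO_IATI_LIKE}

print(convert({
    "recipient_name": "Example CIC",
    "amount": 5000,
    "award_date": "2013-04-01",
    "internal_ref": "ABC-123",  # no IATI-like equivalent; dropped
}))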

We’ll be rolling our sleeves up as we develop a draft standard proposal, seeking to work with real data from Phase 1 to test out how it works, and checking the standardised data against the use cases identified in Phase 2.

The outcome of this phase won’t be a final standard – but instead a basis for discussion of what standardised data in the philanthropy sector should look like.

Get involved

We’ll be sharing updates regularly through this blog and inviting comments and feedback on each stage of the research.

If you are from a UK based Foundation who would like to be involved in the first phase of research, just drop me a line and we’ll see what we can do. We’re particularly on the look out for small foundations who don’t do much with data right now – so if you’re currently keeping track of your grant-making records on spreadsheets or post-it notes, do get in touch.

Can the G8 Open Data Charter deliver real transparency?

[Summary: cross-post of an article reflecting on the G8 Open Data Charter]

I was asked by The Conversation, a new journalism platform based around linking academic writers with professional journalists and editors, to put together a short article on the recent G8 Open Data Charter, looking at the potential for it to deliver on transparency. The result is now live over on The Conversation site, and pasted in below (under a Creative Commons license). 

Last week G8 leaders signed up to an Open Data Charter, calling for government datasets to be “open data by default”. Open data has risen up the government agenda in the UK over the last three years, with the UK positioning itself as a world leader. But what does the charter mean for G8 nations, and more broadly, will it deliver on the promise of economic impacts and improved governance through the open release of government data relating to matters such as crime figures, energy consumption and election results?

Open government data (OGD) has rapidly developed from being the niche interest of a small community of geeks to a high-profile policy idea. The basic premise of OGD is that when governments publish datasets online, in digital formats that can be easily imported into other software tools, and under legal terms that permit anyone to re-use them (including commercially), those outside government can use that data to develop new ideas, apps and businesses. It also allows citizens to better scrutinise government and hold authorities to account. But for that to happen, the kind of data released, and its quality, matter.

As the Open Knowledge Foundation outlined ahead of the G8 Summit in a release from its Open Data Census, “G8 countries still have a long way to go in releasing essential information as open data”. Less than 50% of the core datasets the census lists for G8 members are fully available as open data. And because open data is one of the most common commitments made by governments when they join the wider Open Government Partnership (OGP), campaigners want a clear set of standards for what makes a good open data initiative. The G8 Open Data Charter provides an opportunity to elaborate this. In a clear nod towards the OGP, the G8 charter states: “In the spirit of openness we offer this Open Data Charter for consideration by other countries, multinational organisations and initiatives.”

But can the charter really deliver? Russia, the worst scoring G8 member on the Open Data Census, and next chair of the G8, recently withdrew from the OGP, yet signed up to the Charter. Even the UK’s commitment to “open data by default” is undermined by David Cameron’s admission that the register of company beneficial ownership announced as part of G8 pledges on tax transparency will only be accessible to government officials, rather than being the open dataset campaigners had asked for.

The ability of Russia to sign up to the Open Data Charter is down to what Robinson and Yu have called the “Ambiguity of Open Government” — the dual role of open data as a tool for transparency and accountability and for economic growth. As Christian Langehenke explains, Russia is interested in the latter, but was uncomfortable with the focus placed on the former in the OGP. The G8 Charter covers both benefits of open data but is relatively vague when it comes to the release of data for improved governance.

However, if delivered, the specific commitments made in the technical annexe to opening national election and budget datasets, and to improving their quality by December 2013, would signal progress for a number of states, Russia included. Elsewhere in the G8 communiqué, states also committed to publishing open data on aid to the International Aid Transparency Initiative standard, representing new commitments from France, Italy and Japan.

The impacts of the charter may also be felt in Germany and in Canada, where open data campaigners have long been pushing for greater progress to release datasets. Canadian campaigner David Eaves highlights in particular how the charter commitment to open specific “high value” datasets goes beyond anything in existing Canadian policy. Although the pressure of next year’s G8 progress report might not provide a significant stick to spur on action, the charter does give campaigners in Canada, Germany and other G8 nations a new lever in pushing for greater publication of data from their governments.

Delivering improved governance and economic growth will not come from the release of data alone. The charter offers some recognition of this, committing states to “work to increase open data literacy” and “encourage innovative uses of our data through the organisation of challenges, prizes or mentoring”. However, it stops short of considering other mechanisms needed to unlock the democratic and governance reform potential of open data. At best it frames data on public services as enabling citizens to “make better informed choices about the services they receive”, encapsulating a notion of citizen as consumer (a framing Jo Bates refers to as the co-option of open data agendas), rather than committing to build mechanisms for citizens to engage with the policy process, and thus achieve accountability, on the basis of the data that is made available.

The charter marks the continued rise of open data to becoming a key component of modern governance. Yet, the publication of open data alone stops short of the wider institutional reforms needed to deliver modernised and accountable governance. Whether the charter can secure solid open data foundations on which these wider reforms can be built is something only time will tell.

Geneva E-Participation Day: Open Data and International Organisations

[Summary: notes for a talk on open data and International Organisations]

In just over a week’s time I’ll be heading for Geneva to take part in Diplo Foundation’s ‘E-Participation Day: towards a more open UN?’ event. In the past I’ve worked with Diplo on remote participation, using the web to support live online participation in face-to-face meetings such as the Internet Governance Forum. This time I’ll be talking open data – exploring the ways in which changing regimes around data stand to impact International Organisations. This blog post was written for the Diplo blog as an introduction to some of the themes I might explore.

The event will, of course, have remote participation – so you can register to join in-person or online for free here.

E-participation and remote hubs have the potential to open up dialogue and decision making. But after the conferences have been closed, and the declarations made, it is data that increasingly shapes the outcome of international processes. Whether it’s the numbers counted up to check on progress towards the millennium development goals, GDP percentage pledges on aid spending, or climate change targets, the outcomes of international co-operation frequently depend on the development and maintenance of datasets.

The adage that ‘you can’t manage what you can’t measure’ has relevance both for International Organisations and for citizens. The better the flows of data International Organisations can secure access to, the greater their theoretical capacity for co-ordination of complex systems. And the greater the flows of information from the internal workings of International Organisations that citizens, states and pressure groups can access, the greater their theoretical capacity to both scrutinise decisions and to get involved in decision making and implementation. I say theoretical capacity, because the picture is rarely that straightforward in practice. Yet, that complexity aside for a moment, over the last few years an idea has been gaining ground that, in some states, has led not only to a greater flow of data, but has driven a veritable flood – with hundreds and thousands of government datasets placed online for anyone to access and re-use. That idea is open data.

Open Data is a simple concept. Organisations holding datasets should place them online, in machine-readable formats, and under licenses that let anyone re-use them. Advocates explain that this brings a myriad of benefits. For example, rather than finance data being locked up in internal finance systems, only available to auditors, open data on budgets and spending can be published on the web for anyone to download and explore in their spreadsheet software, or to let third parties generate visualisations that show citizens where their money is being spent, and to help independent analysts look across datasets for possible inefficiency, fraud or corruption. Or instead of the location of schools or health centres being kept on internal systems, the data can be published to allow innovators to present it to citizens in new and more accessible ways. And in crisis situations, instead of co-ordinators spending days collecting data from agencies in the field and re-keying the data into central databases, if all the organisations involved were to publish open data in common formats, there is the possibility of it being aggregated together, building up a clearer picture of what is going on. One of the highest profile existing open data initiatives in the development field is the International Aid Transparency Initiative (IATI), which now has standardised open data from hundreds of donors, providing the foundation for a timely view of who is doing what in aid.

Open data ideas have been spreading rapidly across the world, with many states establishing national Open Government Data (OGD) initiatives, and International Organisations from The World Bank, to UN DESA, the OECD and the Open Government Partnership all developing conversations and projects around open data. When the G8 meet next week in Northern Ireland they are expected to launch an ‘Open Data Charter’ setting out principles for high quality open data, and committing states to publish certain datasets. Right now it remains to be seen whether open data will feature anywhere else in the G8 action plans, although there is clearly space for open data ideas and practices to be deployed in securing greater tax transparency, or supporting the ongoing monitoring of other commitments. In the case of the post-2015 process, a number of organisations have been advocating for an access to information focus, seeking to ensure citizens have access to open data that they can use to monitor government actions and hold governments to account on delivering on commitments.

However – as Robinson and Yu have highlighted – there can be an ambiguity of open government data: more open data does not necessarily mean more open organisations. The call for ‘raw data now’ has led to much open data emerging simply as an outbound communication, without routes for engagement or feedback, and no change in existing organisational practices. Rather than being treated as a reform that can enable greater organisational collaboration and co-ordination, many open datasets have just been ‘dumped’ on the web. In the same way that remote participation is often a bolt-on to meetings, without the deeper changes in process needed to make for equal participation for remote delegates, at best much open data only offers actors outside of institutions a partial window onto their operations, and at worst, the data itself remains opaque: stripped of context and meaning. Getting open data right for both transparency, and for transforming international collaboration needs more than just technology. 

As I explored with Jovan Kurbalija of Diplo in a recent webinar, there are big challenges ahead if open data is to work as an asset for development: from balancing tensions between standardisation and local flexibility, to developing true multi-stakeholder governance of important data flows, and getting the incentives for collaboration right. However, now is the time to be engaging with these challenges – within a window of energy and optimism, and before network effects lock in paradoxically ‘closed’ systems of open data. I hope the dialogue at the Geneva E-Participation day will offer a small chance to broaden open data understanding and conversations in a way that can contribute to such engagement.

Open data in extractives: meeting the challenges


There’s lots of interest building right now around how open data might be a powerful tool for transparency and accountability in the extractive industries sector. Decisions over where extraction should take place have a massive impact on communities and the environment, yet decision making is often opaque, with wealthy private interests driving exploitation of resources in ways that run counter to the public interest. Whilst revenues from oil, gas and mineral resources have the potential to be a powerful tool for development, with a proportion channelled into public funds, massive quantities of revenue frequently ‘go missing’, lost in corruption, and fuelling elements of a resource curse.

For the last ten years the Extractive Industries Transparency Initiative has been working to get companies to commit to ‘publish what they pay‘ to government, and for governments to disclose receipts of finance, working to identify missing money through a document-based audit process. Campaigning coalitions, watchdogs and global initiatives have focussed on increasing the transparency of the sector. Now, with a recognition that we need to link together information on different resource flows for development at all levels, potentially through the use of structured open data, and with a “data tsunami” of new information on extractives financials anticipated from the Dodd-Frank act in the US and similar regulation in Europe, groups working on extractives transparency have been looking at what open data might mean for future work in this area.

Right now, DFID are taking that exploration forward through a series of hack days with Rewired State under the ‘follow the data’ banner, with the first in London last weekend, and one coming up next week in Lagos, Nigeria. The idea of the events is to develop rapid prototypes of tools that might support extractives transparency, putting developers and datasets together over 24 hours to see what emerges. I was one of the judging panel at this weekend’s event, where the three developer teams that formed looked respectively at: making datasets on energy production and prices more accessible for re-use through an API; visualising the relationship between extractives revenues and various development indicators; and designing an interface for ‘nuggets’ of insight discovered through hack-days to be published and shared with useful (but minimal) meta-data.

In their way, these three projects highlight a range of the challenges ahead for the extractives sector in building capacity to track resource flows through open data:

  • Making data accessible – The APIfy project sought to take a number of available datasets and aggregate them together in a database, before exposing a number of API endpoints that made machine-readable standardised data available on countries, companies and commodities (a minimal sketch of such an API follows this list). By translating the data access challenge from one of rooting around in disparate datasets, to one of calling a standard API for key kinds of ‘objects’, the project demonstrated the need developers often have for clear platforms to build upon. However, as I’ve discovered in developing tools for the International Aid Transparency Initiative, building platforms to aggregate together data often turns out to be a non-trivial project: technically (it doesn’t take long to get to millions of data items when you are dealing with financial transactions), economically (as databases serving millions of records to even a small number of users need to be maintained and funded), socially (developers want to be able to trust the APIs they build against to be stable, and outreach and documentation are needed to support developers to engage with an API), and in terms of information architecture (as design choices over a dataset or API can have a powerful effect on downstream re-users).
  • Connecting datasets – none of the applications from the London hack-day were actually able to follow resource flows through the available data. Although visions of a coherent datasphere, in which the challenge is just making the connection between a transaction in one dataset, and a transaction in another, to see where money is flowing, are appealing – traceability in practice turns out to be a lot harder. To use the IATI example again, across the 100,000+ aid activities published so far less than 1% include traceability efforts to show how one transaction relates to another, and even here the relationships exist in the data because of conscious efforts by publishers to link transaction and activity identifiers. In following the money there will be many cases where people have an incentive not to make these linkages explicit. One of the issues raised by developers over the hack-day was the scattered nature of data, and the gaps across it. Yet – when it comes to financial transaction tracking, we’re likely to often be dealing with partial data, full of gaps, and it won’t be easy to tell at first glance when a mis-match between incoming and outgoing finances is a case of missing data or corruption. Right now, a lot of developers attack open data problems with tools optimised for complete and accurate data, yet we need to be developing tools, methods and visualisation approaches that deal with partial and uncertain data. This is developed in the next point.
  • Correlation, causation and investigation – The Compare the Map project developed on the hack day uses “scraped data from GapMinder and EITI to create graphical tools” that allow a user to eye-ball possible correlations between extractives data and development statistics. But of course, correlation is not causation – and the kinds of analysis that dig deeper into possible relationships are difficult to work through on a hack day. Indeed, many of the relationships mash-ups of this form can show have been written about in papers that control for many more variables, dealing carefully with statistically challenging issues of missing data and imperfectly matched datasets. Rather than simple comparison visualisations that show two datasets side by side, it may be more interesting to look for all the possible statistically significant correlations in a datasets with common reference points, and then to look at how human users could be supported in exploring, and giving feedback on, which of those might be meaningful, and which may or may not already be researched. Where research does show a correlation to exist, then using open data to present a visual narrative to users about this can have a place, though here the theory of change is very different – not about identifying connections – but about communicating them in interactive and engaging ways to those who may be able to act upon them.
  • Sharing and collaborating – The third project at the London hack-day was ‘Fact Cache‘ – a simple concept for sharing nuggets of information discovered in hack-day explorations. Often as developers work through datasets they may come across discoveries of interest, yet these are often left aside in the rush to create a prototype app or platform. Fact Cache focussed on making these shareable. However, when it was presented discussions also explored how it could make these nuggets of information into social objects, open to discussion and sharing. This idea of making open data findings more usable as social objects was also an aspect of the UN Global Pulse hunchworks project. That project is currently on hold (it would be interesting to know why…), but the idea of supporting collaboration around open data through online tools, rather than seeing apps that present data, or initial analysis as the end point, is certainly one to explore more in building capacity for open data to be used in holding actors to account.
  • Developing theories of change – as the judges met to talk about the projects, one of the key themes we looked at was whether each project had a clear theory of change. In some sense, taken together they represent the complex chain of steps involved in an open data theory of change: from making data more accessible to developers, to creating tools and platforms that let end users explore data, and then allowing findings from data to be communicated and to shape discourses and action. Few datasets or tools are likely to be change-making on their own – but rather can play a key role in shifting the balance of power in existing networks of organisations, activists, companies and governments. Understanding the different theories of change for open data is one of the key themes in the ongoing Open Data in Developing Countries research, where we take existing governance arrangements as a starting point in understanding how open data will bring about impacts.
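
For a flavour of what the APIfy idea involves in code terms, here is a minimal sketch of such an API in Python with Flask. The route and the data are invented; the actual hack-day project may have been structured quite differently:

# Sketch: exposing standardised, machine-readable records on key
# 'objects' (here, countries) through a simple API endpoint.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# A stand-in for a database aggregated from disparate source datasets.
COUNTRIES = {
    "NG": {"name": "Nigeria", "commodities": ["oil", "gas"]},
}

@app.route("/api/countries/<code>")
def country(code):
    record = COUNTRIES.get(code.upper())
    if record is None:
        abort(404)
    return jsonify(record)

if __name__ == "__main__":
    app.run()

Even this toy version hints at the maintenance questions raised above: the aggregated ‘database’ has to be kept populated, funded and stable before anyone will build against it.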

In a complex world, access to data, and the capacity to use it effectively, are likely to be essential parts of building more accountable governance across a wide range of areas, including in the extractives industry. Although there are many challenges ahead if we are to secure the maximum benefits from open data for transparent and accountable governance, it’s exciting and encouraging to see so many passionate people putting their minds early to tackling them, and building a community ready to innovate and bring about change.

Note: The usage of ‘follow the data’ in this DFID project is distinct from the usage in the work I’m currently doing to explore ‘follow the data’ research methods. In the former, the focus is really on following financial and resource flows through connecting up datasets; in the latter the focus is on tracing the way in which data artefacts have been generated, deployed, transferred and used in order to understand patterns of open data use and impact.

 

Intelligent Impact: Evaluating open data capacity building with voluntary sector organisations

[Summary: sharing the evaluation report (9 pages, PDF) of an open data skills workshop for voluntary sector organisations]


Late last year, through the CSO network on the Open Government Partnership, I got talking with Deirdre McGrath of the Your Voice, Your City project about ways of building voluntary sector capacity to engage with open data. We talked about the possibility of a hack-day, but realised the focus at this stage needed to be on building skills, rather than building tools. It also needed to be on discovering what was possible with open data in the voluntary sector, rather than teaching people a limited set of skills. And as the Your Voice, Your City project was hosted within the London Voluntary Services Council (LVSC), an infrastructure organisation with a policy and research team, we had the possibility of thinking about the different roles needed to make the most of open data, and how a capacity building pilot could work both with frontline Voluntary and Community Sector (VCS) organisations, and an infrastructure organisation. A chance meeting with Nick Booth of podnosh gave form to a theme in our conversations about the need to focus on both ‘stats’ and ‘stories’ ensuring that capacity building worked with both quantitative and qualitative data and information. The result: plans for a short project, centred on a one-day workshop on ‘Intelligent Impact’, exploring the use of social media and open data for VCS organisations.

The day involved staff from VCS organisations coming along with questions or issues they wanted to explore, and then splitting into groups with a team of open data and social media mentors (Nick Booth, Caroline Beavon, Steven Flower, Paul Bradshaw and Stuart Harrison) to look at how existing online resources, or self-created data and media, could help respond to those questions and issues. Alex Farrow captured the story of the day for us using Storify and I’ve just completed a short evaluation report telling the story in more depth, capturing key learning from the event, and setting out possible next steps (PDF).

Following on from the event, the LVSC team have been exploring how a combination of free online tools for curating open data, collating questions, and sharing findings can be assembled into a low-cost and effective ‘intelligence hub‘, where data, analysis and presentation layers are all made accessible to VCS organisations in London.

Developing data standards for Open Contracting

Contracts have a key role to play in effective transparency and accountability: from the contracts governments sign with extractives industries for mineral rights, to contracts for the delivery of aid, contracts for provision of key public services, and contracts for supplies. The Open Contracting initiative aims to improve the disclosure and monitoring of public contracts through the creation of global principles, standards for contract disclosure, and building civil society and government capacity. One strand of work the Open Contracting team have been exploring in support of this is the creation of a set of open data standards for capturing contract information. This blog post reports on some initial ground work designed to inform that strand.

Although I was involved in some of the set-up of this short project, and presented the outcomes at last week’s workshop, the bulk of the work was undertaken by Aptivate‘s Sarah Bird.

Update: see also the report of the process here.

Update 2 (12th Sept 2013): Owen Scott has built on the pilot with data from Nepal.

The process

Developing standards is a complex process. Each choice made has implications: for how acceptable the standard will be to different parties; for how easy certain uses of the data will be; and for how extensible the standard will be, or which other standards it will easily align with. However, standards cannot easily be built up choice-by-choice from a blank slate, adopting the ideal choice each time: they are generally created against a background of pre-existing datasets and standards. The Open Contracting data standards team had already gathered together a range of contract information datasets currently published by governments across the world, and so, with just a few weeks between starting this project and the data standards workshop on 28th March, we planned a 5-day development sprint, aiming to generate a very draft first iteration of a standard. Applying an agile methodology, where each short iteration is designed to yield a viable product, but with the expectation that further early iterations may revise and radically alter it, meant we had to set a reasonable scope for this first sprint.

The focus then was on the supply side, taking a set of existing contract datasets from different parties, and identifying their commonalities and differences. The contract datasets selected were from the UK, USA, Colombia, Philippines and the World Bank. From looking at the fields these existing datasets had in common, an outline structure was developed, working on a principle of taking good ideas from across the existing data, rather than playing to a lowest common denominator. Then, using the International Aid Transparency Initiative activity standard as a basis, Sarah drafted a basic data structure, which can act as a version 0.01 standard for discussion. To test this, the next step was to convert samples from some of the existing datasets into this new structure, and then to analyse how much of the available data was covered by the structure, and how comprehensive the available data was when placed against the draft structure. (The technical approach taken, which can be found in the sprint’s GitHub repository, was to convert the different incoming data to JSON, and post it into a MongoDB instance for analysis).
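
In code terms, that pipeline was roughly of the following shape. This is a sketch of the approach rather than the actual sprint scripts (which are in the GitHub repository); the file layout and field names are invented:

# Sketch: load JSON-converted source datasets into MongoDB, then
# query across sources to analyse coverage of the draft structure.
import json
from pymongo import MongoClient

client = MongoClient()  # assumes a MongoDB instance on localhost
records = client["contracts"]["records"]

def load(source_name, path):
    # Tag each record with its source so coverage can be compared.
    with open(path) as f:
        for row in json.load(f):
            row["_source"] = source_name
            records.insert_one(row)

# e.g. how many records from each source include an award block?
for source in records.distinct("_source"):
    n = records.count_documents({"_source": source, "award": {"$exists": True}})
    print(source, n)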

We discuss the limitations of this process in a later section.

Initial results

The initial pass of the data suggested a structure based on:

  • Organisation data – descriptions of organisations, held separately from individual contract information, and linked by a globally unique ID (based on the IATI Organisational ID standard)
  • Contract metadata – general information about the contract in question, such as title, classification, default currency and primary location of supply, including an area for ‘line items’ detailing the elements the contract covers.
  • Contract stages – a series of separate blocks of data for different stages of the contract, all contained within the overarching contract element.
    • Bid – key dates and classifications about the procurement stage of a contract process.
    • Award – details of the parties awarded the contract and the details of the award.
    • Performance – details of transactions (payments to suppliers) and work activities carried out during the performance of the contract.
    • Termination – details of the ending of the contract.
  • Documents – fields for linking to related documents.
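
By way of illustration, a single contract in this structure might look something like the following. This is a hand-written JSON sketch of the shape described above, with invented field names and values, not an excerpt from the draft schema itself:

{
  "organisations": [
    {"id": "GB-EXAMPLE-1", "name": "Example Department"}
  ],
  "contract": {
    "title": "Road maintenance services",
    "classification": "works",
    "currency": "GBP",
    "line-items": [
      {"description": "Pothole repairs", "quantity": 200}
    ],
    "bid": {"open-date": "2013-01-15", "close-date": "2013-02-15"},
    "award": {"supplier": "GB-EXAMPLE-2", "value": 150000, "date": "2013-03-01"},
    "performance": {
      "transactions": [{"value": 50000, "date": "2013-06-01"}]
    },
    "termination": {"date": "2014-03-01", "status": "completed"},
    "documents": [
      {"url": "http://example.org/contract.pdf", "type": "signed-contract"}
    ]
  }
}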

A draft annotated schema for capturing this data can be found in XML and JSON format here, and a high-level overview is also represented in the diagram below. In the diagrams that follow, each block represents one data point in the draft standard.

[Diagram: high-level overview of the phases in the draft contract data structure]

We then performed an initial analysis to explore how much of the data currently available from the sources explored would fit into the standard, and how comprehensively the standard could be filled from existing data. As the diagram below indicates, no single source covered all the available data fields, and some held no information on particular stages of the contracting process at all. This may be down to different objectives of the available data sources, or deeper differences in how organisations handle information on contracts and contracting workflows.

[Diagram: coverage of the draft standard’s data points by each source dataset]

Combining the visualisations above into a single view gives a sense of which data points in the draft standard have greatest use, illustrated in the schematic heat-map below.

[Diagram: schematic heat-map of data point usage across the source datasets]

At this point the analysis is very rough-and-ready, hence the presentation of a rough impression rather than a detailed field-by-field analysis. The last thing to check was how much data was ‘left over’ and not captured in the standard. This was predominantly the case for the UK and USA datasets, where many highly specialised fields and flags were present in the datasets, indicating information that might be relevant to capture in local contract datasets, but which might be harder to find standard representations for across contracts.

[Diagram: data fields left over and not captured by the draft standard]

The next step was to check whether data that could go into the same fields could be easily harmonised, as the existence of organisation details, dates, and classifications of contracts across different datasets does not necessarily mean these are interoperable. Fields like dates and financial amounts appeared to be relatively easy to harmonise, but some elements present greater challenges, such as organisational identifiers, contact people, and the various codelists in use. However, some code-lists may be possible to harmonise: for example, when the ‘Category’ classifications from across the datasets were translated, grouped and aggregated, up to 92% of the original data in a sample was retained.

[Diagram: summing and grouping category classifications across the datasets]
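
The kind of translation and grouping involved can be sketched as follows (the source categories and the groupings here are invented for illustration, not taken from the sprint data):

# Sketch: map heterogeneous 'Category' codes from different sources
# onto a shared codelist, then measure how much data is retained.
CATEGORY_MAP = {
    "OBRAS": "works",        # translated from Spanish, then grouped
    "CONSTRUCTION": "works",
    "GOODS": "goods",
    "SUPPLIES": "goods",
    "CONSULTORIA": "services",
}

sample = ["OBRAS", "GOODS", "SUPPLIES", "MISC", "CONSULTORIA"]
mapped = [CATEGORY_MAP[c] for c in sample if c in CATEGORY_MAP]
print("retained: {:.0%}".format(len(mapped) / len(sample)))  # retained: 80%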

Implications, gaps, next steps

This first iteration provides a basis for future discussions. There are, however, some important gaps. Most significant of all is that this initial development has been supply-side driven, based around the data that organisations are already publishing, rather than developed on the basis of the data that civil society organisations, or scrutiny bodies, are demanding in order to make sense of complex contract situations. It also omits certain kinds of contracts, such as complex extractives contracts (on which, see the fantastic work Revenue Watch have been doing with getting structured data from PDF contracts with Document Cloud), and Public Private Partnership (PPP) contracts. And it has not delved deeply into the data structures needed for properly capturing information that can aid in monitoring contract performance. These gaps will all need to be addressed in future work.

At the moment, this stands as a discrete project, and no set next steps are agreed as far as I’m aware. However, some of the ideas explored in the meeting on the 28th included:

  • A next iteration – focussed on the demand side – working with potential users of contracts data to work out how data needs to be shaped, and what needs to be in a standard to meet different data re-use needs. This could build towards version 0.02.
  • Testing against a wider range of datasets – either following, or in parallel with, a demand-driven iteration, to discover how the work done so far evolves when confronted with a larger set of existing contract datasets to synthesise.
  • Connecting with other standards. This first sprint took the IATI Standard as a reference point. There may be other standards to refer to in development. Discussions on the 28th with those involved in other standards highlighted an interest in more collaborative working to identify shared building blocks or common elements that might be re-used across standards, and to explore the practical and governance implications of this.
  • Working on complementary building blocks of a data standard – such as common approaches to identifying organisations and parties to a contract; or developing tools and platforms that will aggregate data and make data linkable. The experience of IATI, Open Spending and many other projects appears to be that validators, aggregation platforms and data-wrangling tools are important complements to standards for supporting effective re-use of open data.

Keep an eye on the Open Contracting website for more updates.