Category Archives: Open Data

OCDS – Notes on a standard

Today sees the launch of the first release of the Open Contracting Data Standard (OCDS). The standard, as I’ve written before, brings together concrete guidance on the kinds of documents and data that are needed for increased transparency in processes of public contracting, with a technical specification describing how to represent contract data and meta-data in common ways.

The video below provides a brief overview of how it works (or you can read the briefing note), and you can find full documentation at http://standard.open-contracting.org.

When I first jotted down a few notes on how to go forward from the rapid prototype I worked on with Sarah Bird in 2012, I didn’t realise we would actually end up with the opportunity to put some of those ideas into practice. However: we did – and so in this post I wanted to reflect on some aspects of the standard we’ve arrived at, some of the learning from the process, and a few of the ideas that have guided at least my inputs into the development process.

As, hopefully, others pick up and draw upon the initial work we’ve done (in addition to the great inputs we’ve had already), I’m certain there will be much more learning to capture.

(1) Foundations for ‘open by default’

Early open data advocacy called for ‘raw data now’, asking governments essentially to export existing datasets and dump them online, with issues of structure and regular publishing processes to be sorted out later. Yet, as open data matures, the discussion is shifting to the idea of ‘open by default’. Taken seriously, this means more than making openly licensed data dumps the default position: it should mean that data is released from government systems as a matter of course, as part of their day-to-day operation.

The full OCDS model is designed to support this kind of ‘open by default’, allowing publishers to provide small releases of data every time some event occurs in the lifetime of a contracting process. A new tender is a release. An amendment to that tender is a release. The award of the contract, and then its signature, are each releases. These data releases are tied together by a common identifier, and can be combined into a summary record, providing both a snapshot view of the state of a contracting process and a history of how it has developed over time.
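The releases-and-records idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OCDS merge procedure: the function names are hypothetical, dates are assumed to be ISO 8601 strings in a single format, and the merge simply lets later releases overwrite earlier values (nested objects merge key by key, arrays are replaced wholesale). The `ocid` and `date` keys follow field names used in OCDS releases; the sample values are invented.

```python
import copy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict: base with update layered on top.

    Nested dicts are merged key by key; any other value (including a
    list) in update simply replaces the earlier value.
    """
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def compile_record(releases: list) -> dict:
    """Combine releases sharing one ocid into a history plus a snapshot."""
    if not releases:
        raise ValueError("no releases to compile")
    ocids = {release["ocid"] for release in releases}
    if len(ocids) != 1:
        raise ValueError(f"releases span several processes: {sorted(ocids)}")
    # Assumes ISO 8601 date strings in one format, so text order == time order.
    history = sorted(releases, key=lambda release: release["date"])
    snapshot: dict = {}
    for release in history:
        snapshot = deep_merge(snapshot, release)
    return {"ocid": ocids.pop(), "releases": history, "snapshot": snapshot}


releases = [
    {"ocid": "ocds-example-001", "date": "2014-01-10",
     "tender": {"status": "active", "value": 1000}},
    {"ocid": "ocds-example-001", "date": "2014-02-01",
     "tender": {"value": 1200}},  # an amendment to the tender
    {"ocid": "ocds-example-001", "date": "2014-03-15",
     "awards": [{"id": "a1", "status": "active"}]},
]
record = compile_record(releases)
print(record["snapshot"]["tender"])  # {'status': 'active', 'value': 1200}
```

Keeping the ordered `releases` list alongside the merged `snapshot` gives both of the views described above: the history of the process and its current state.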

This releases-and-records model seeks to serve a range of user needs: from the firm seeking information about tender opportunities, to the civil society organisation wishing to analyse across a wide range of contracting processes. And by allowing core stages in the business process of contracting to be published as they happen, and then joined up later, it is oriented towards the development of contracting systems that default to timely openness.

As I’ll be exploring in my talk at the Berkman Center next week, the challenge ahead for open data is not just to find standards that make existing datasets line up when they get dumped online, but to envisage and co-design new infrastructures for everyday transparent, effective and accountable processes of government and governance.

(2) Not your minimum viable product

Different models of standard

Many open data standard projects either adopt a ‘Minimum Viable Product’ approach, looking to capture only the few fields most common across publishers, or are developed by focussing on the concerns of a single publisher or user. Whilst MVP models may make sense for small building blocks designed to fit into other standardisation efforts, in the case of OCDS there was a clear user demand to link up data along the contracting process. This required an overarching framework into which simple components could be placed, or from which they could be extracted, rather than the creation of ad-hoc components with an attempt to join them up made later on.

Whilst we didn’t quite achieve the full abstract model + idiomatic serialisations proposed in the initial technical architecture sketch, we have ended up with a core schema, and then suggested ways to represent this data in both structured and flat formats. This is already proving useful, for example in exploring how data published as part of the UK Local Government Transparency Code might be mapped to OCDS from existing CSV schemas.
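To give a feel for how one structured object can be mapped to a flat, spreadsheet-style row (and so compared with existing CSV schemas), here is a small Python sketch. It is not the OCDS flat serialisation: the "/" path separator, numeric list indexes and alphabetical column order are assumptions made for illustration, and the field names in the example are only indicative.

```python
import csv
import io


def flatten(obj, prefix: str = "") -> dict:
    """Turn nested dicts/lists into one level of path-named columns.

    Dict keys and list positions are joined with "/", so
    {"tender": {"value": {"amount": 5}}} becomes {"tender/value/amount": 5}.
    Empty dicts and lists produce no columns.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def to_csv(objects: list) -> str:
    """Write each nested object as one CSV row, one column per path."""
    rows = [flatten(obj) for obj in objects]
    columns = sorted({column for row in rows for column in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)  # missing cells -> ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


release = {
    "ocid": "ocds-example-001",
    "tender": {"value": {"amount": 100, "currency": "GBP"}},
    "awards": [{"id": "a1"}],
}
print(to_csv([release]))
```

Going the other way (from an existing CSV column name back to a path in the structured form) is what a mapping exercise of the kind described above would need to spell out column by column.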

(3) The interop balancing act & keeping flex in the framework

OCDS is, ultimately, not a small standard. It seeks to describe the whole of a contracting process, from planning, through tender, to contract award, signed contract, and project implementation. And at each stage it provides space for capturing detailed information, linking to documents, and tracking milestones, values and line-items.

This shape of the specification is a direct consequence of the method adopted to develop it: looking at a diverse set of existing data, and spending time exploring the data that different users wanted, as well as looking at other existing standards and data specifications.

However, OCDS by no means covers all the things that publishers might want to state about contracting, nor all the things users may want to know. Instead, it focusses on achieving interoperability of data in a number of key areas, and then providing a framework into which extensions can be linked as the needs of different sub-communities of open data users arise.

We’re only in the early stages of thinking about how extensions to the standard will work, but I suspect they will turn out to be an important aspect: allowing different groups to come together to agree (or contest) the extra elements that are important to share in a particular country, sector or context. Over time, some may move into the core of the standard, and potentially elements that appear core right now might move into the realm of extensions, each able to have their own governance processes if appropriate.

As Urs Gasser and John Palfrey note in their work on Interop, the key in building towards interoperability is not to make everything standardised and interoperable, but to work out the ways in which things should be made compatible, and the ways in which they should not. Forcing everything into a common mould removes the diversity of the real world, yet leaving everything underspecified removes any possibility of connecting data up. This is a question both of the standards themselves, and of the pressures that shape how they are adopted.

(4) Avoiding identity crisis

Data describes things. To be described, those things need to be identified. When describing data on the web, it helps if those things can be unambiguously identified and distinguished from other things which might have the same names or identification numbers. This generally requires the use of globally unique identifiers (GUIDs): some value which, in a universe of all available contracting data, for example, picks out a unique contracting process; or, in the universe of all organizations, uniquely identifies a specific organization. However, providing these identifiers can turn out to be a politically and technically challenging process.

The Open Data Institute have recently published a report underlining how important identifiers are to processes of opening data. Yet consistent identifiers often have key properties of public goods: everyone benefits from having them, but providing and maintaining them has costs attached, which no individual identifier user has an incentive to cover. In some cases, such as goods and services identifiers, projects have emerged which take a proprietary approach to funding the maintenance of those identifiers, selling access to the lookup lists which match the codes for describing goods and services to their descriptions. This clearly raises challenges for an open standard: when proprietary identifiers are incorporated into data, users may face extra costs to interpret and make sense of it.

In OCDS we’ve sought to take as distributed an approach to identifiers as possible, only requiring globally unique identifiers where absolutely necessary (identifying contracts, organizations and goods and services), and deferring to existing registration agencies and identity providers, with OCDS maintaining, at most, code lists for referring to each identity ‘scheme’.

In some cases, we’ve split the ‘scheme’ out into a separate field: for example, an organization identifier consists of a scheme field with a value like ‘GB-COH’ to stand for UK Companies House, and then the identifier given in that scheme, like ‘5381958’. This approach allows people to store those identifiers in their existing systems without change (existing databases might hold national company numbers, with the field assumed to come from a particular register), whilst making explicit in the OCDS the scheme they come from. In other cases, however, we look to create new composite string identifiers, combining a prefix with some identifier drawn from an organization’s internal system. This is particularly the case for the Open Contracting ID (ocid). By doing this, the identifier can travel between systems more easily as a GUID, and could even be incorporated in unstructured data as a key for locating documents and resources related to a given contracting process.
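The two identifier patterns described above can be sketched as follows. The class and function names are hypothetical, the `GB-COH` / `5381958` pair is the example from the text, and `ocds-example` is a placeholder rather than a real registered prefix; joining parts with a hyphen is likewise an assumption of this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgIdentifier:
    """Hypothetical organization identifier with its scheme held separately."""
    scheme: str  # code-list value naming the register, e.g. "GB-COH"
    id: str      # the identifier exactly as that register issues it

    def global_key(self) -> str:
        # Joining scheme and id gives a key that is distinct across registers
        return f"{self.scheme}-{self.id}"


def make_ocid(prefix: str, internal_id: str) -> str:
    """Build a composite process ID: publisher prefix + internal system ID."""
    return f"{prefix}-{internal_id}"


def find_ocids(text: str, prefix: str) -> list:
    """Find IDs carrying the given prefix inside free text (e.g. a document)."""
    # Trailing punctuation such as a full stop is not captured
    pattern = re.escape(prefix) + r"-[A-Za-z0-9]+(?:[-_/][A-Za-z0-9]+)*"
    return re.findall(pattern, text)


supplier = OrgIdentifier(scheme="GB-COH", id="5381958")
ocid = make_ocid("ocds-example", "2014-001")  # placeholder prefix
note = f"Tender documents for {ocid}."
print(supplier.global_key(), find_ocids(note, "ocds-example"))
```

Because `scheme` and `id` stay separate, an existing database column of company numbers can be used unchanged; the joined key only needs to be built at the point where data is exposed.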

However, recent learning from the project shows that many organisations are hesitant about the introduction of new IDs, and that adoption of an identifier scheme may require as much advocacy as adoption of a standard. At a policy level, bringing some external convention for identifying things into a dataset appears to be seen as affecting the, for want of a better word, sovereignty of a specific dataset: even though, in practice, the prefix approach of the ocid means it only needs to be hard-coded in the systems that expose data to the world, not necessarily stored inside organizations’ databases. This is an area I suspect we will need to explore more, and keep tracking, as OCDS adoption moves forward.

(5) Bridging communities of practice

If you look closely you might notice that the specification just launched in Costa Rica is actually labelled as a ‘release candidate’. This points to another key element of learning in the project, concerning the different processes and timelines of policy and technical standardisation. In the world of funded projects and policy processes, deadlines are often fixed, and the project plan has to work backwards from there. In a technical standardisation process, there is no ‘standard’ until a specification is in use and has been robustly tested. The processes for adopting a policy standard and for setting a technical one differ. Perhaps we should have spoken from the start of the project of an overall standard with a technical specification embedded within it, but by the time this became clear we were too far down the path towards the policy launch. As a result, the Release Candidate designation is intended to suggest that the specification is ready to draw upon, but that there is still a process to go through (and future governance arrangements to be defined) before it can be adopted as a standard per se.

(6) The schema is just the start of it

This leads to the most important point: that launching the schemas and specification is just one part of delivering the standard.

In a recent e-mail conversation with Greg Bloom about elements of standardisation, linked to the development of the Open Referral standard, Greg put forward a list of components that may be involved in delivering a sustainable standards project, including:

  • The specification (with its various components and subcomponents);
  • Tools that assess compliance according to the spec (e.g. validation tools, and more advanced assessment tools);
  • Some means of visualizing a given set of data’s level of compliance;
  • Incentives of some kind (whether positive or negative) for attaining various levels of compliance;
  • Processes for governing all of the above;
  • and of course the community through which all of this emerges and is sustained.

To this we might also add elements like documentation and tutorials, support for publishers, catalysing work with tool builders, guidance for users, and so on.
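Of the components in Greg’s list, compliance-assessment tooling is the most directly technical. A minimal Python sketch of the idea checks a release for required fields and turns the result into a simple score that could feed a visualisation. The required paths below are hypothetical and far shorter than any real schema; a production validator would more likely check data against the published JSON Schema using a dedicated library.

```python
# Hypothetical required fields, for illustration only; not the OCDS schema.
REQUIRED_PATHS = ("ocid", "id", "date", "tender/id", "tender/status")


def get_path(obj: dict, path: str):
    """Follow a "/"-separated path through nested dicts; None if absent."""
    current = obj
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def assess(release: dict, required: tuple = REQUIRED_PATHS) -> dict:
    """Report missing required fields and a simple 0-1 coverage score."""
    # A field that is present but None or an empty string counts as missing
    missing = [p for p in required if get_path(release, p) in (None, "")]
    score = 1 - len(missing) / len(required)
    return {"missing": missing, "score": round(score, 2)}


release = {"ocid": "ocds-example-001", "id": "1", "date": "2014-01-10",
           "tender": {"id": "t1"}}
print(assess(release))
```

A flat score like this is only one possible design; graded levels (for example, separate scores for each stage of the contracting process) would map more naturally onto the idea of incentives for attaining different levels of compliance.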

Open government standards are not something to be published once, and then left, but require labour to develop and sustain, and involve many social processes as much as technical ones.

In many ways, although we’ve spent a year of small development iterations working towards this OCDS release, the work now is only just getting started, and there are many technical, community and capacity-building challenges ahead for the Open Contracting Partnership and others in the open contracting movement.

Creating the capacity building game…

[Summary: crowdsourcing contributions to a workshop at Open Development Camp]

There is a lot of talk of ‘capacity building’ in the open data world. As the first phase of the ODDC project found, there are many gaps between the potential of open data and its realisation, and many of these gaps can be described as capacity gaps, whether on the side of data suppliers or of potential data users.

But how does sustainable capacity for working with open data develop? At the Open Development Camp in a few weeks’ time I’ll be facilitating a workshop to explore this question, and to support participants to share learning about how different capacity building approaches fit in different settings.

The basic idea is that we’ll use a simple ‘cards and scenarios’ game (modelled, as ever, on the Social Media Game): we identify a set of scenarios with capacity building needs, and then work in teams to design responses by combining a selection of different approaches, each of which will be listed on one of the game cards.

But, rather than just work from the cards, I’m hoping that for many of these approaches there will be ‘champions’ on hand, able to make the case for that particular approach, and to provide expert insights to the team. So:

  • (1) I’ve put together a list of 24+ different capacity building approaches I’ve seen in the open data world – but I need your help to fill in the details of their strengths, weaknesses and examples of them in action.
  • (2) I’m looking for ‘champions’ for these approaches, either who will be at the Open Development Camp, or who could prepare a short video input in advance to make the case for their preferred capacity building approach;

If you could help with either, get in touch, or dive in direct on this Google Doc.

If all goes well, I’ll prepare a toolkit after the Open Development Camp for anyone to run their own version of the Capacity Building Game.


Exploring Wikidata

[Summary: thinking aloud – brief notes on learning about the Wikidata project, and how it might help in addressing the organisational identifiers problem]

I’ve spent a fascinating day today at the Wikimania Conference at the Barbican in London, mostly following the programme’s ‘data’ track in order to understand the Wikidata project in more depth. This post shares some thinking aloud to capture learning, reflections and exploration from the day.

As the Wikidata project manager, Lydia Pintscher, framed it, right now access to knowledge on Wikipedia is highly skewed by language. The topics of articles you have access to, the depth of meta-data about them (such as the locations they describe), the detail of those articles, and their likelihood of being up to date, are all greatly affected by the language you speak. Italian or Greek Wikipedia may have great coverage of places in Italy or Greece, but go wider and their coverage drops off. In terms of seeking more equal access to knowledge, this is a problem. However, whilst the encyclopedic narrative of a French, Spanish or Catalan page about the Barbican Centre in London will need to be written by someone in command of that language, many of the basic facts that go into an article are language-neutral, or translatable as small units of content rather than as sentences and paragraphs. The date the building was built, the name of the architect, the current capacity of the building: all the kinds of things which might appear in infoboxes could be made available to bootstrap new articles, or, when changed, could have their changes cascaded across all the different language pages that draw upon them.

That is one of the motivating cases for Wikidata: separating out ‘items’ and their ‘properties’ that might belong in Wikipedia from the pages, making this data re-usable, and using it to build a better encyclopedia.

However, wikidata is also generating much wider interest – not least because it is taking on a number of problems that many people want to see addressed. These include:

  • Somewhere ‘institutional’ and well governed on the web to put data – and where each data item also gains the advantage of a discussion page.
  • The long-term preservation, and versioning, of data;
  • Providing common identifiers on the web for arbitrary things – and providing URIs for these things that can be looked up (building on the idea of DBpedia as a crystallisation point for the web of linked data);
  • Providing a data model that can cope with change over time, and with data from heterogeneous sources – all of the properties in Wikidata can have qualifiers, such as the dates a statement is true from or until, source information, and other provenance data.
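The qualifier idea in the last point can be illustrated with a small in-memory model in Python. This is not Wikidata’s actual data model or API: property names are free-text labels rather than Wikidata property IDs, the item and its values are invented, and an end date is treated as exclusive. It only shows how start and end qualifiers let one item carry statements that were true at different times.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Statement:
    """One claim about an item, with qualifiers in the spirit of Wikidata."""
    prop: str                      # free-text property label (hypothetical)
    value: str
    start: Optional[date] = None   # qualifier: true from this date
    end: Optional[date] = None     # qualifier: true until (excluding) this date
    sources: list = field(default_factory=list)  # provenance references


@dataclass
class Item:
    label: str
    statements: list = field(default_factory=list)

    def values_at(self, prop: str, when: date) -> list:
        """Values of prop whose start/end qualifiers cover the given date."""
        return [
            s.value for s in self.statements
            if s.prop == prop
            and (s.start is None or s.start <= when)
            and (s.end is None or when < s.end)
        ]


# Invented example data, for illustration only
agency = Item("Example Agency")
agency.statements.extend([
    Statement("parent organisation", "Ministry A", end=date(2010, 5, 1),
              sources=["hypothetical source"]),
    Statement("parent organisation", "Ministry B", start=date(2010, 5, 1)),
])
print(agency.values_at("parent organisation", date(2009, 1, 1)))
print(agency.values_at("parent organisation", date(2012, 1, 1)))
```

Rather than overwriting the old value, the model keeps both statements and lets the qualifiers decide which applies, which is what makes change over time and mixed sources manageable.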

Wikidata could help address these issues on two levels:

  • By allowing anyone to add items and properties to the central wikidata instance, and making these available for re-use;
  • By providing an open source software platform for anyone to use in managing their own corpus of wikified, versioned data*;

A particular use case I’m interested in is whether it might help in addressing the perennial Organisational Identifiers problem faced by data standards such as IATI and Open Contracting, where it turns out that having shared identifiers for government agencies, and for lots of existing but non-registered entities like charities and associations that give and receive funds, is really difficult. Others at Wikimania spoke of potential use cases around maintaining national statistics, and archiving the datasets underlying scientific publications.

However, in thinking about the use cases Wikidata might have, it’s important to keep in mind its current scope:

  • It is a store of ‘items’ and then ‘statements’ about them (essentially a graph store). This is different from being a place to store datasets (as you might want to do with the archival of the dataset used in a scientific paper), and it means that, once created, items are the first-class entities of Wikidata, able to exist in multiple collections.
  • It currently inherits Wikipedia’s notability criteria for items. That is, the basic building blocks of wikidata – the items that can be identified and described, such as the Barbican, Cheese or Government of Grenada – can only be included in the main wikidata instance if they have a corresponding wikipedia page in some language wikipedia (or similar: this requirement is a little more complex).
  • It can be edited by anyone, at any time. That is, systems that rely on the data need to consider what levels of consistency they need. Of course, as Wikipedia has shown, editability is often a great strength – and as Rufus Pollock noted in the ‘data roundtable’ session, updating and versioning of open data are currently big missing parts of our data infrastructures.

Unlike the entirely distributed open world assumption on the web of data, where the AAA assumption holds (Anyone can say Anything about Anything), wikidata brings both a layer of regulation to the statements that can be made, and the potential of community driven editorial control. It sits somewhere between the controlled description sets of Schema.org, and an entirely open proliferation of items and ontologies to describe them.

Can it help the organisational identifiers problem?

I’ve started to carry out some quick tests to see how far wikidata might be a resource to help with the aforementioned organisational identifiers problem.

Using Kasper Brandt’s fantastically useful linked data rendering of IATI, I queried for the names of a selection of government and non-government organisations occurring in the International Aid Transparency Initiative data. I then used Open Refine to look up a selection of these on the DBpedia endpoint (which it seems now incorporates Wikidata info as well). This was very rough-and-ready (just searching for full name matches), but by cross-checking negative results (where there were no matches) by searching Wikipedia manually, it’s possible to get a sense of how many organisations might be identifiable within Wikipedia.
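The rough-and-ready matching step can be expressed as a short Python sketch. It does not reproduce Open Refine’s reconciliation or a DBpedia query: it assumes a hypothetical lookup table of labels to item IDs has already been fetched, applies light normalisation (case and whitespace) before an exact full-name comparison, and returns the misses separately so they can be checked by hand.

```python
import re


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't block a match."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def match_names(names: list, lookup: dict) -> tuple:
    """Exact (normalised) full-name matching of names against a label -> ID lookup.

    Returns (matched, unmatched). Unmatched names still need a manual check,
    as a miss may only mean the organisation is labelled differently.
    """
    index = {normalise(label): item_id for label, item_id in lookup.items()}
    matched, unmatched = {}, []
    for name in names:
        item_id = index.get(normalise(name))
        if item_id is None:
            unmatched.append(name)
        else:
            matched[name] = item_id
    return matched, unmatched


# Hypothetical lookup table and organisation names, for illustration only
lookup = {"Ministry of Finance": "item-001", "Example Charity": "item-002"}
names = ["ministry of  finance", "Example Charity", "Unlisted Association"]
matched, unmatched = match_names(names, lookup)
print(f"{len(matched)} of {len(names)} matched; check by hand: {unmatched}")
```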

So far I’ve only tested the method, and haven’t run a large-scale test, but I found that around half of the organisations I checked had a Wikipedia entry of some form, and thus would currently be eligible to be Wikidata items right away. For others, Wikipedia pages would need to be created, and whether all the small voluntary organisations that might occur in an IATI or Open Contracting dataset would be notable enough for inclusion is something that would need to be explored further.

Exploring the Wikidata pages for some of the organisations I did find threw up some interesting additional possibilities to help with organisation identifiers. A number of pages were linked to identifiers from Library Authority Files, including VIAF identifiers such as this set of examples returned for a search on Malawi Ministry of Finance. Library Authority Files would tend to only include entries when a government agency has a publication of some form in that library, but at a quick glance coverage seems pretty good.

Now, as Chris Taggart would be quick to point out, neither Wikipedia pages nor library authority file identifiers act as a registry of legal entities. They pick out everyday concepts of an organisation, rather than the legally accountable body which enters into contracts. Yet, as they become increasingly backed by data, these identifiers do provide access to lots of contextual information that might help in understanding issues like organisational change over time. For example, the Wikipedia page for the UK’s Department for Education includes details on the departments that preceded it. In Wikidata form, a statement like this could even be qualified to say whether that relationship of being a preceding department is one that passes legal obligations from one to the other.

I’ve still got to think about this a lot more, but it seems that:

  • There are many things it might be useful to know about organisations, but which are not going to be captured in official registries anytime soon. Some of these things will need to be the subject of discussion, and open to agreement through dialogue. Wikidata, as a trusted shared space with good community governance practices, might be a good place to keep these things, albeit recognising that in its current phase it has no goal of being a comprehensive repository of records about all organisations in the world (and other spaces such as Open Corporates are already solving the comprehensive coverage problem for particular classes of organisation).

  • There are some organisations for which, in many countries, no official registry exists (particularly Government Departments and Agencies). Many of these things are notable (Government Departments for example), and so even if no Wikipedia entry yet exists, one could and should. A project to manage and maintain government agency records and identifiers in Wikidata may be worth exploring.

Whether the open government community would welcome a shift in approach to some aspects of the organisational identifiers problem, away from finding some authority to provide master lists and towards a distributed, best-efforts community approach, is yet to be explored.

Notes

*I here acknowledge SJ Klein’s counsel that this (encouraging multiple domain-specific instances of a Wikidata platform) is potentially a very bad idea, as the ‘forking’ of wiki-projects has rarely been a successful journey, particularly with respect to the sustainability of forked content. As SJ outlined, even though there may be technical and social challenges to a mega graph store, these could be compared to the apparent challenges of making the first encyclopedias (the idea of a 50,000-page book must have seemed crazy at first), or the social challenges envisioned for Wikipedia at its genesis (‘how could non-experts possibly edit an encyclopedia?’). On this view, it is only by setting the ambition of a comprehensive shared store of the world’s propositional data (with the qualifiers that Wikidata supports to make this possible without a closed world assumption) that such limits might be overcome. Perhaps with data there is a greater possibility to support forking, and re-merging, of Wikidata instances, permitting short-term pragmatic creation of datasets outside the core Wikidata project, which can later be brought back in if they are considered, as a set, notable (although this still carries the risk that forked projects diverge so far in their values, governance and structure that reconnecting them later becomes prohibitively difficult).

Fifteen open data insights

[Summary: blogging the three-page version of Open Data in Developing Countries – Emerging Insights from Phase I paper, with some preamble]

I’m back living in Oxford after my almost-year in the USA at the Berkman Center. Before we returned, Rachel and I took a month to travel around the US by Amtrak. The delightfully ponderous pace of US trains gave me plenty of time for reading, which was just as well, given June was the month when most of the partners in the Open Data in Developing Countries project I coordinate were producing their final reports. So, in between staring at the stunning scenery as we climbed through the Rockies, or watching amazing lightning storms from the viewing car, I was digging through in-depth reports on open data in the global south, and trying to pick out common themes and issues. A combination of post-it notes and Scrivener index cards later, and finally back at my desk in Oxford, the result was a report, released alongside the ODDC Research Sharing Event in Berlin last week, that seeks to snapshot 15 insights or provocations for policy-makers and practitioners drawn out from the ODDC case study reports.

These are just the first stage of the synthesis work to be carried out in the ODDC project. In the network meeting also hosted in Berlin last week, we worked on mapping these and other findings from projects onto the original conceptual framework of the project, and looked at identifying further cross-cutting write-ups required. But, for now, below are the 15 points from the three-page briefing version, and you can find a full write-up of these points for download. You can also find reports from all the individual project partners, including a collection of quick-read research posters over on the Open Data Research Network website.

15 insights into open data supply, use and impacts

(1) There are many gaps to overcome before open data availability can lead to widespread effective use and impact. Open data can lead to change through a ‘domino effect’, or by creating ripples of change that gradually spread out. However, often many of the key ‘domino pieces’ are missing, and local political contexts limit the reach of ripples. Poor data quality, low connectivity, scarce technical skills, weak legal frameworks and political barriers may all prevent open data triggering sustainable change. Attentiveness to all the components of open data impact is needed when designing interventions.

(2) There is a frequent mismatch between open data supply and demand in developing countries. Counting datasets is a poor way of assessing the quality of an open data initiative. The datasets published on portals are often the datasets that are easiest to publish, not the datasets most in demand. Politically sensitive datasets are particularly unlikely to be published without civil society pressure. Sometimes the gap is on the demand side – as potential open data users often do not articulate demands for key datasets.

(3) Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness. The conversation around transparency and accountability that ideas of open data can support is as important as the datasets in some developing countries.

(4) Working on open data projects can change how government creates, prepares and uses its own data. The motivations behind an open data initiative shape how government uses the data itself. Civil society and entrepreneurs interacting with government through open data projects can help shape government data practices. This makes it important to consider which intermediaries gain insider roles shaping data supply.

(5) Intermediaries are vital to both the supply and the use of open data. Not all data needed for governance in developing countries comes from government. Intermediaries can create data, articulate demands for data, and help translate open data visions from political leaders into effective implementations. Traditional local intermediaries are an important source of information, in particular because they are trusted parties.

(6) Digital divides create data divides in both the supply and use of data. In some developing countries key data is not digitised, or a lack of technical staff has left data management patchy and inconsistent. Where Internet access is scarce, few citizens can have direct access to data or services built with it. Full access is needed for full empowerment, but offline intermediaries, including journalists and community radio stations, also play a vital role in bridging the gaps between data and citizens.

(7) Where information is already available and used, the shift to open data involves data evolution rather than data revolution. Many NGOs and intermediaries already access the information which is now becoming available as data. Capacity building should start from existing information and data practices in organisations, and should look for the step-by-step gains to be made from a data-driven approach.

(8) Officials’ fears about the integrity of data are a barrier to more machine-readable data being made available. The publication of data as PDF or in scanned copies is often down to a misunderstanding of how open data works. Only copies can be changed, and originals can be kept authoritative. Helping officials understand this may help increase the supply of data.

(9) Very few datasets are clearly openly licensed, and there is low understanding of what open licenses entail. There are mixed opinions on the importance of a focus on licensing in different contexts. Clear licenses are important to building a global commons of interoperable data, but may be less relevant to particular uses of data on the ground. In many countries wider conversations about licensing are yet to take place.

(10) Privacy issues are not on the radar of most developing country open data projects, although commercial confidentiality does arise as a reason preventing greater data transparency. Much state-held data is collected either from citizens or from companies. Many countries in the ODDC study have weak or absent privacy laws and frameworks, yet participants in the studies raised few personal privacy considerations. By contrast, a lack of clarity, and officials’ concerns, about potential breaches of commercial confidentiality when sharing data gathered from firms was a barrier to opening data.

(11) There is more to open data than policies and portals. Whilst central open data portals act as a visible symbol of open data initiatives, a focus on portal building can distract attention from wider reforms. Open data elements can also be built on existing data sharing practices, with data made available through the locations where citizens, NGOs and businesses already go to access information.

(12) Open data advocacy should be aware of, and build upon, existing policy foundations in specific countries and sectors. Sectoral transparency policies for local government, budget and energy industry regulation, amongst others, could all have open data requirements and standards attached, drawing on existing mechanisms to secure sustainable supplies of relevant open data in developing countries. In addition, open data conversations could help make existing data collection and disclosure requirements fit better with the information and data demands of citizens.

(13) Open data is not just a central government issue: local government data, city data, and data from the judicial and legislative branches are all important. Many open data projects focus on the national level, and only on the executive branch. However, local government is closer to citizens, urban areas bring together many of the key ingredients for successful open data initiatives, and transparency in other branches of government is important to secure citizens’ democratic rights.

(14) Flexibility is needed in the application of definitions of open data to allow locally relevant and effective open data debates and advocacy to emerge. Open data is made up of various elements, including proactive publication, machine-readability and permissions to re-use. Countries at different stages of open data development may choose to focus on one or more of these, recognising that trying to adopt all elements at once could hinder progress. It is important both to define open data clearly and to avoid a reductive debate that does not recognise progressive steps towards greater openness.

(15) There are many different models for an open data initiative, including top-down, bottom-up and sector-specific. Initiatives may also be state-led, civil society-led or entrepreneur-led in their goals and how they are implemented, with consequences for the resources and models required to make them sustainable. There is no one-size-fits-all approach to open data. More experimentation, evaluation and shared learning on the components, partners and processes for putting open data ideas into practice must be a priority for all who want to see a world where open-by-default data drives real social, political and economic change.

You can read more about each of these points in the full report.

New Paper – Mixed incentives: Adopting ICT innovations for transparency, accountability, and anti-corruption

[Summary: critical questions to ask when planning, funding or working on ICTs for transparency and accountability]

Last year I posted some drafts of a paper I’ve been writing with Silvana Fumega at the invitation of the U4 Anti-Corruption Center, looking at the incentives for, and dynamics of, adoption of ICTs as anti-corruption tools. Last week the final paper was published in the U4 Issue series, and you can find it for download here.

In the final iteration of the paper we have sought to capture the core of the analysis in the form of a series of critical questions that funders, planners and implementers of anti-corruption ICTs can ask. These are included in the executive summary below, and elaborated more in the full paper.

Adopting ICT innovations for transparency, accountability, and anti-corruption – Executive Summary

Initiatives facilitated by information and communication technology (ICT) are playing an increasingly central role in discourses of transparency, accountability, and anti-corruption. Both advocacy and funding are being mobilised to encourage governments to adopt new technologies aimed at combating corruption. Advocates and funders need to ask critical questions about how innovations from one setting might be transferred to another, assessing how ICTs affect the flow of information, how incentives for their adoption shape implementation, and how citizen engagement and the local context affect the potential impacts of their use.

ICTs can be applied to anti-corruption efforts in many different ways. These technologies change the flow of information between governments and citizens, as well as between different actors within governments and within civil society. E-government ICTs often seek to address corruption by automating processes and restricting the discretion of officials. However, many contemporary uses of ICTs place more emphasis on the concept of transparency as a key mechanism to address corruption. Here, a distinction can be made between technologies that support “upward transparency,” where the state gains greater ability to observe and hear from its citizens, or higher-up actors in the state gain greater ability to observe their subordinates, and “downward transparency,” in which “the ‘ruled’ can observe the conduct, behaviour, and/or ‘results’ of their ‘rulers’” (Heald 2006). Streamlined systems that citizens can use to report issues to government fall into the former category, while transparency portals and open data portals are examples of the latter. Transparency alone can only be a starting point for addressing corruption, however: change requires individuals, groups, and institutions who can access and respond to the information.

In any particular application of technology with anti-corruption potential, it is important to ask:

  • What is the direction of the information flow: from whom and to whom?
  • Who controls the flow of information, and at what stages?
  • Who needs to act on the information in order to address corruption?

Different incentives can drive government adoption of ICTs. The current wave of interest in ICT for anti-corruption is relatively new, and limited evidence exists to quantify the benefits that particular technologies can bring in a given context. However, this is not limiting enthusiasm for the idea that governments, particularly developing country governments, can adopt new technologies as part of open government and anti-corruption efforts. Many technologies are “sold” on the basis of multiple promised benefits, and governments respond to a range of different incentives. For example, governments may use ICTs to:

  • Improve information flow and government efficiency, creating more responsive public institutions and supporting coordination.
  • Provide open access to data to enable innovation and economic growth, responding to claims about the economic value of open data and its role as a resource for private enterprise.
  • Address principal-agent problems, allowing progressive and reformist actors within the state to better manage and regulate other parts of the state by detecting and addressing corruption through upward and downward transparency.
  • Respond to international pressure, following the trends in global conversations and pressure from donors and businesses, as well as the availability of funding for pilots and projects.
  • Respond to bottom-up pressure, both from established civil society and from an emerging global network of technology-focussed civil society actors. Governments may do this either as genuine engagement or to “domesticate” what might otherwise be seen as disruptive innovations.

In supporting ICTs for anti-corruption, advocates and donors should consider several key questions related to incentives:

  • What are the stated motivations of government for engaging with this ICT?
  • What other incentives and motivations may be underlying interest in this ICT?
  • Which incentives are strongest? Are any of the incentives in conflict?
  • Which incentives are important to securing anti-corruption outcomes from this ICT?
  • Who may be motivated to oppose or inhibit the anti-corruption applications of this ICT?

The impact of ICTs for anti-corruption is shaped by citizen engagement in a local context. Whether aimed at upward or downward transparency, the successful anti-corruption application of an ICT relies upon citizen engagement. Many factors affect which citizens can engage through technology to share reports with government or act upon information provided by government. ICTs that worked in one context might not achieve the same results in a different setting (McGee and Gaventa 2010). The following questions draw attention to key aspects of context:

  • Who has access to the relevant technologies? What barriers of connectivity, literacy, language, or culture might prevent a certain part of the population from engaging with an ICT innovation?
  • What alternative channels (SMS, offline outreach) might be required to increase the reach of this innovation?
  • How will the initiative close the feedback loop? Will citizens see visible outcomes over the short or long term that build rather than undermine trust?
  • Who are the potential intermediary groups and centralised users for ICTs that provide upward or downward transparency? Are both technical and social intermediaries present? Are they able to work together?

Towards sustainable and effective anti-corruption use of ICTs. As Strand (2010) argues, “While ICT is not a magic bullet when it comes to ensuring greater transparency and less corruption . . . it has a significant role to play as a tool in a number of important areas.” Although taking advantage of the multiple potential benefits of open data, transparency portals, or digitised communication with government can make it easier to start a project, funders and advocates should consider the incentives for ICT adoption and their likely impact on how the technology will be applied in practice. Each of the questions above is important to understanding the role a particular technology might play and the factors that affect how it is implemented and utilised in a particular country.

You can read the full paper here.

Data, information, knowledge and power – exploring Open Knowledge’s new core purpose

[Summary: a contribution to debate about the development of open knowledge movements]

New ‘Open Knowledge Foundation’ name and ‘data earth’ branding.

The Open Knowledge Foundation (re-named as ‘Open Knowledge’) are soft-launching a new brand over the coming months.

Alongside the new logo, and details of how the new brand was developed, posted on the OK Wiki, appear a set of statements about the motivations, core purpose and tag-line of the organisation. In this post I want to offer an initial critical reading of this process and, more importantly, of the text.

Preliminary notes

Before going further, I want to offer a number of background points that frame the spirit in which the critique is offered.

  1. I have nothing but respect for the work of the leaders, staff team, volunteers and wider community of the Open Knowledge Foundation – and have been greatly inspired by the dedication I’ve seen to changing defaults and practices around how we handle data, information and knowledge. There are so many great projects, and so much political progress on openness, which OKFN as a whole can rightly take credit for.
  2. I recognise that there are massive challenges involved in founding, running and scaling up organisations. These challenges are magnified many times in community based and open organisations.
  3. Organisations with a commitment to openness or democracy, whether the co-operative movement, open source communities like Mozilla, communities such as Creative Commons or, indeed, the Open Knowledge Foundation, are generally held to much higher standards than closed and conventional organisations, and face much more complex pressures from engaging their communities in what they do. And, as the other examples show, the path is not always an easy one. There are inevitably growing pains and challenges.
  4. It is generally better to raise concerns and critiques and talk about them, than leave things unsaid. A critique is about getting into the details. Details matter.
  5. See (1).

(Disclosure: I have previously worked as a voluntary coordinator for the open-development working group of OKF (with support from AidInfo), and have participated in many community activities. I have never carried out paid work for OKF, and have no current formal affiliation.)

The text

Here are the three statements in the OK Branding notes that caught my attention and sparked some reflections:

About our brand and what motivates us:
A revolution in technology is happening and it’s changing everything we do. Never before has so much data been collected and analysed. Never before have so many people had the ability to freely, easily and quickly share information across the globe. Governments and corporations are using this data to create knowledge about our world, and make decisions about our future. But who should control this data and the ability to find insights and make decisions? The many, or the few? This is a choice that we get to make. The future is up for grabs. Do we want to live in a world where access to knowledge is “closed”, and the power and understanding it brings is controlled by the few? Or, do we choose a world where knowledge is “open” and we are all empowered to make informed choices about our future? We believe that knowledge should be open, and that everyone – from citizens to scientists, from enterprises to entrepreneurs, – should have access to the information they need to understand and shape the world around them.

Our core purpose:

  • A world where knowledge creates power for the many, not the few.
  • A world where data frees us – to make informed choices about how we live, what we buy and who gets our vote.
  • A world where information and insights are accessible – and apparent – to everyone.
  • This is the world we choose.

Our tagline:
See how data can change the world

The critique

My concerns are not about the new logo or name. I understand (all too well) the way that having ‘Foundation’ in a non-profit’s name can mean different things in different contexts (not least people expecting you to have an endowment and funds to distribute), and so the move to Open Knowledge as a name has a good rationale. Rather, I wanted to raise four concerns:

(1) Process and representativeness

Tag Cloud from Open Knowledge Foundation Survey. See http://blog.okfn.org/2014/02/12/who-are-you-community-survey-results-part-1/ for details.

Tag Cloud from Open Knowledge Foundation Survey. See blog post for details.

The message introducing the new brand to OKF-Discuss notes that “The network has been involved in the brand development process especially in the early stages as we explored what open knowledge meant to us all”, referring primarily to the Community Survey run at the end of 2013 and written up here and here. However, the later parts of developing the brand appear to have been outsourced to a commercial brand consultancy working with a limited set of staff and stakeholders, and what is now presented appears to be offered as a given, rather than for consultation. The result has been a narrow focus on the ‘data’ aspects of OKF.

Looking back over the feedback from the 2013 survey, that data-centricity fails to represent the breadth of interests in the OKF community (particularly when looking beyond the quantitative survey questions which had an in-built bias towards data in the original survey design). Qualitative responses to the Survey talk of addressing specific global challenges, holding governments accountable, seeking diversity, and going beyond open data to develop broader critiques around intellectual property regimes. Yet none of this surfaces in the motivation statement, or visibly in the core purpose.

OKF has not yet grappled in full with the idea of internal democracy and governance, yet for a network made up of many working groups, local chapters and more, a ‘core purpose’ statement emerging without wider consultation seems problematic. There is a big missed opportunity here for deeper discussion about ideas and ideals, and for the conceptualisation of a much richer vision of open knowledge. The result is, I think, a core purpose statement that fails to represent the diversity of the community OKF has been able to bring together, and that may threaten its ability to bring those communities together in a shared space in future.

Process points aside however (see growing pains point above), there are three more substantive issues to be raised.

(2) Data and tech-centricity

A selection of OKF Working Groups

The Open Knowledge movement I’ve met at OKFestival and other events, and that is evident through the pages of the working groups, is one committed to many forms of openness: education, hardware, sustainability, economics, political processes and development, amongst others. It is a community that has been discussing diversity and building a global movement. Data may be an element of varying importance across the working groups and interest areas of OKF. And technology may be an enabler of action for each. But many are not fundamentally about data, or even technology, as their core focus. As we found when we explored how different members of the Open Development working group understood the concept of open development in 2012, many members focussed more upon open processes than on data and tech. Yet, for all this diversity of focus, the new OK tagline emphasises data alone.

I work on issues of open data every day. I think it’s an important area. But it’s not the only element of open knowledge that should matter in the broad movement.

Whilst the Open Knowledge Foundation has rarely articulated the kinds of broad political critique of intellectual property regimes that might be found in prior Access to Knowledge movements, developing a concrete motivation and purpose statement gave OKF the chance to deepen its vision rather than narrow it. The risk Jo Bates has written about, of the ‘open’ movement being co-opted into dominant narratives of neoliberalism, appears to be a very real one. In the motivation statement above, government and big corporates are cast as the problem, and technology and data in the hands of ‘citizens’, ‘scientists’, ‘entrepreneurs’ and (perhaps contradictorily) ‘enterprises’, as the solution. Alternative approaches to improving processes of government and governance through opening more spaces for participation are off the table here, as are any specific normative goals for opening knowledge. Data-centricity displaces all of these.

Now, it might be argued that although the motivation statement takes data as a starting point, it is really, at its core, about the balance of power: asking who should control data, information and knowledge. Yet the analysis appears to entirely conflate the terms ‘data’, ‘information’ and ‘knowledge’, which clouds this substantially.

(3) Data, Information and Knowledge

Data, Information, Knowledge, Wisdom

The DIKW pyramid offers a useful way of thinking about the relationship between Data, Information, Knowledge (and Wisdom). This has sometimes been described as a hierarchy running from ‘know nothing’ (data as symbols and signs encoding things about the world, useless without interpretation), through ‘know what’ and ‘know how’, to ‘know why’.

Data is not the same as information, nor the same as knowledge. Converting data into information requires the addition of context. Converting information into knowledge requires skill and experience, obtained through practice and dialogue.

Data and information can be treated as artefacts/things. I can e-mail you some data or some information. But knowledge involves a process: sharing it involves more than just sending a file.

OKF has historically worked very much on the transition from data to information, and information to knowledge, through providing training, tools and capacity building, yet this is not captured at all in the core purpose. Knowledge, not data, has the potential to free, bringing greater autonomy. And it is arguably proprietary control of data and information that is at the basis of the power of the few, not any superior access to knowledge that they possess. And if we recognise that turning data into information and into knowledge involves contextualisation and subjectivity, then ‘information and insights’ cannot simultaneously be ‘apparent’ to everyone, if this is taken to mean some consensus on ‘truths’, rather than recognising that insights are generated, and contested, through processes of dialogue.

It feels like there is a strong implicit positivism within the current core purpose: which stands to raise particular problems for broadening the diversity of Open Knowledge beyond a few countries and communities.

(4) Power, individualism and collective action

I’ve already touched upon issues of power. Addressing “global challenges like justice, climate changes, cultural matters” (from survey responses) will not come from empowering individuals alone – but will have to involve new forms of co-ordination and collective action. Yet power in the ‘core purpose’ statement appears to be primarily conceptualised in terms of individual “informed choices about how we live, what we buy and who gets our vote”, suggesting change is purely the result of aggregating ‘choice’, yet failing to explore how knowledge needs to be used to also challenge the frameworks in which choices are presented to us.

The ideas that ‘everyone’ can be empowered, and that when “knowledge is ‘open’ […] we are all empowered to make informed choices about our future”, fail to take account of the wider constraints on action and choice that many around the world face, and of the fact that some of the global struggles that motivate many to pursue greater openness are not always win-win situations. Those other constraints and wider contexts might not be directly within the power of an open knowledge movement to address, or the core preserve of open knowledge, but they need to be recognised and taken into account in the theories of change developed.

In summary

I’ve tried to deal with the Motivation, Core Purpose and Tag-line statements as carefully as limited free time allows, but inevitably there is much more to dig into, and there will be other ways of reading these statements. More optimistic readings are possible, and I certainly hope they turn out to be more realistic, but in the interest of dialogue I hope that a critical reading is a more useful contribution to the debate, and I would reiterate my preliminary notes 1 to 5 above.

To recap the critique:

  • Developing a brand and statement of core purpose is an opportunity for dialogue and discussion, yet right now this opportunity appears to have been mostly missed;
  • The motivation, core purpose and tagline are more tech-centric and data-centric than the OKF community, risking sidelining other aspects of the open knowledge community;
  • There needs to be recognition of the distinction between data, information and knowledge, to develop a coherent theory of change and purpose;
  • There appears to be an implicit libertarian individualism in current theories of change, and it is not clear that this is compatible with working to address the shared global challenges that have brought many people into the open knowledge community.

Updates:

There is some discussion of these issues taking place on the OKFN-Discuss list, and the Wiki page has been updated from the version I was initially writing about, to re-frame what was termed ‘core purpose’ as ‘brand core purpose’.

Five critical questions for constructing data standards

I’ve been spending a lot of time thinking about processes of standardisation recently (building on the recent IATI Technical Advisory Group meeting, working on two new standards projects, and conversations at today’s MIT Center for Civic Media & Berkman Center meet-up). One of the key strands in that thinking is around how pragmatics and ethics of standards collide. Building a good standard involves practical choices based on the data that is available, the technologies that might use that data and what they expect, and the feasibility of encouraging parties who might communicate using that standard to adapt their practices (more or less minimally) in order to adopt it. But a standard also has ethical and political consequences, whether it is a standard deep in the Internet stack (as John Morris and Alan Davidson discuss in this paper from 2003[1]), or a standard at the content level, supporting exchange of information in some specific domain.

The five questions below seek to (in a very provisional sense) capture some of the considerations that might go into an exploration of the ethical dimensions of standard construction[2].

(Thanks to Rodrigo Davies, Catherine D’Ignazio and Willow Brugh for the conversations leading to this post)

For any standard, ask:

Who can use it?

Practically, I mean. Who, if data in this standard format was placed in front of them, would be able to do something meaningful with it? Who might want to use it? Are people who could benefit from this data excluded from using it by its complexity?

Many data standards assume that ‘end users’ will access the data through intermediaries (i.e. a non-technical user can only do anything with the data after it has been processed by some intermediary individual or tool) – but not everyone has access to intermediaries, or intermediaries may have their own agendas or understandings of the world that don’t fit with those of the data user.

I’ve recently been exploring whether it’s possible to turn this assumption around, and make simple versions of a data standard the default, with more expressive data models available to those with the skills to transform data into these more structured forms. For example, the Three Sixty Giving standard (warning: very draft/provisional technical docs) is based around the idea of a rich data model, but a simple flat-as-possible serialisation that means most of the common forms of analysis someone might want to do with the data can be done in a spreadsheet, and for 90%+ of cases, data can be exchanged in flat(ish) forms, with richer structures only used where needed.
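To make the ‘rich model, flat serialisation’ idea concrete, here is a minimal Python sketch, assuming a hypothetical nested grant record (the field names are illustrative, not taken from the Three Sixty Giving schema) and an assumed convention of ‘/’-joined column names with list positions as path segments. It flattens each record into a single spreadsheet row, so the common one-recipient case stays one row per grant.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts/lists into one level with '/'-joined keys.

    List items are indexed by position, e.g. 'recipientOrganization/0/name'.
    The '/' path convention is an assumption made for this sketch.
    """
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "/"))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, f"{path}/{i}/"))
                else:
                    flat[f"{path}/{i}"] = item
        else:
            flat[path] = value
    return flat


def to_csv(records):
    """Serialise records as CSV, using the union of all columns seen."""
    rows = [flatten(r) for r in records]
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:  # keep first-seen column order
                columns.append(col)
    buf = io.StringIO()
    # restval="" leaves cells blank when a record lacks a column
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example record
grant = {
    "id": "grant-001",
    "title": "Community radio training",
    "amountAwarded": 5000,
    "recipientOrganization": [{"id": "org-42", "name": "Example Trust"}],
}
print(to_csv([grant]))
```

The richer nested form is still available to tools that need it, but anyone with a spreadsheet can sort, filter and total the flat version.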

What can be expressed?

Standards usually make choices about what can be expressed at two levels:

  • Field choice
  • Taxonomies / codelists

Both involve making choices about how the world is sliced up, and what sorts of things can be represented and expressed.
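As a small illustration of how a codelist slices up the world, the Python sketch below checks a value against a hypothetical codelist (the codes are invented for the example). It contrasts a ‘closed’ list, where anything outside the predefined boxes is rejected, with an ‘open’ one, where unanticipated values are accepted but flagged. The closed/open distinction is an assumption of this sketch rather than a feature of any particular standard.

```python
# Hypothetical codelist: the codes are illustrative only.
HYPOTHETICAL_METHODS = {"open", "selective", "limited"}


def check_code(value, codelist, closed=True):
    """Return (ok, message) for a value checked against a codelist.

    closed=True: values outside the list are rejected, so the standard's
    categories are the only ones that can be expressed.
    closed=False: unknown values are accepted with a note, leaving room
    for cases the authors did not anticipate.
    """
    if value in codelist:
        return True, "ok"
    if closed:
        return False, f"'{value}' is not in the codelist"
    return True, f"'{value}' is not in the codelist; accepted as an extension"


print(check_code("community-led", HYPOTHETICAL_METHODS))
print(check_code("community-led", HYPOTHETICAL_METHODS, closed=False))
```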

A thought experiment: If I asked people in different social situations an open question inviting them to tell me about the things a standard is intended to be about (e.g. “Tell me about this contract”), how much of what they report can be captured in the standard? Is it better at capturing the information seen as important to people in certain social positions? Are there ways it could capture information from those in other positions?

What social processes might it replace or disrupt?

Over the short-term, many data standards end up being fed by existing information systems – with data exported and transformed into the standard. However, over time, standards can lead to systems being re-engineered around them. And in shifting the flow of information inside and outside of organisations, standards processes can disrupt and shift patterns of autonomy and power.

Sometimes the ‘inefficient’ processes of information exchange, which open data standards seek to rationalise, can be full of all sorts of tacit information exchange, relationship building and so on, which the introduction of a standard could affect. Thinking about how the technical choices in a standard affect its adoption, and how far they allow for distributed patterns of data generation and management, may be important. (For example: which identifiers in a standard have to be maintained centrally, creating pressure for centralised information systems to maintain the integrity of data, and which can be managed locally, making it easier to create more distributed architectures? It’s not simply a case of what kinds of architectures a standard does or doesn’t allow, but which it makes easier or trickier: in budget-constrained environments, implementations will often go down the path of least resistance, even if it’s theoretically possible to build standard-using tools in ways that better respect the existing structures of an organisation.)
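One way to see the central-versus-local identifier choice is the Python sketch below. It assumes a hypothetical scheme in which only a short publisher prefix is registered centrally, while each publisher mints the rest of the identifier in its own systems. The prefix format, registry and counter are all illustrative assumptions, not any standard’s actual rules.

```python
import itertools

# Hypothetical central registry: only publisher prefixes are coordinated.
REGISTERED_PREFIXES = {"ab12cd": "Example City Council"}


class LocalIdAssigner:
    """Mint identifiers locally under a centrally registered prefix.

    Only the prefix needs central coordination; the local part is
    assigned by the publisher without checking back with anyone.
    """

    def __init__(self, prefix):
        if prefix not in REGISTERED_PREFIXES:
            raise ValueError(f"unregistered prefix: {prefix}")
        self.prefix = prefix
        self._counter = itertools.count(1)

    def new_id(self):
        # Zero-padded local sequence number, e.g. 'ab12cd-000001'
        return f"{self.prefix}-{next(self._counter):06d}"


council = LocalIdAssigner("ab12cd")
print(council.new_id())  # ab12cd-000001
print(council.new_id())  # ab12cd-000002
```

Under this design, global uniqueness comes from the prefix alone, so the heavy lifting stays with each organisation’s existing systems. The alternative, a central service that issues every identifier, is what pushes implementations towards centralised architectures.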

Which fields are descriptive? Which fields are normative?

There has recently been discussion of Facebook’s introduction of a wide range of options for describing gender, with Jane Fae arguing in the Guardian that, rather than providing a restricted list of options, the field should simply be dropped altogether. Fae’s argument rests on the way gender categories are used to target ads, and on the field having little value as a category otherwise.

Is it possible to look at a data standard and consider which proposed fields import strong normative worldviews with them? And then to consider omitting these fields?

It may be that for some fields silence is a better option than forcing people, organisations or events (or whatever it is that the standard describes) into boxes that don’t make sense for all the individuals/cases covered…

Does it permit dissent?

Catherine D’Ignazio suggested this question. How far does a standard allow itself to be disputed? What consequences are there to breaking the rules of a standard or remixing it to express ideas not envisaged by the original architects? What forms of tussle can the standard accommodate?

This is perhaps even more a question of the ecosystem of tools, validators and other resources around the standard than of the standard specification itself, but the two are interrelated.
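To show how a validator’s design can make room for dissent, here is a minimal Python sketch built on a hypothetical schema (the field names and required set are invented for illustration). Missing required fields are always errors. Fields the standard does not define are errors in a strict mode, but only warnings by default, so a publisher can add information the original architects did not envisage without their data being rejected outright.

```python
# Hypothetical schema: fields the standard defines, and which are required.
KNOWN_FIELDS = {"id", "title", "amount", "currency"}
REQUIRED_FIELDS = {"id", "title"}


def validate(record, strict=False):
    """Return (errors, warnings) for a flat record (a dict).

    Missing required fields are always errors. Fields outside the
    standard are errors when strict=True, otherwise only warnings.
    """
    errors, warnings = [], []
    present = set(record)
    for field in sorted(REQUIRED_FIELDS - present):
        errors.append(f"missing required field: {field}")
    for field in sorted(present - KNOWN_FIELDS):
        message = f"field not in standard: {field}"
        (errors if strict else warnings).append(message)
    return errors, warnings


record = {"id": "c-1", "title": "Road repairs", "localNote": "disputed by residents"}
print(validate(record))
print(validate(record, strict=True))
```

The same data passes or fails depending only on a switch in the tooling, which is one concrete sense in which the ecosystem around a standard, not just the specification, decides what forms of tussle are possible.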

Footnotes

[1]: I’ve been looking for more recent work on ‘public interest’ and politics of standard creation. Academically I spend a lot of time going back to Bowker and Star’s work on ‘infrastructure’, but I’m on the lookout for other works I should be drawing upon in thinking about this.

[2]: I’m talking particularly about open data standards, and standards at the content level, like IATI, Open 311, GTFS etc.

ODDC Update at Developers for Development, Montreal

[Summary: Cross posted from the Open Data Research Network website. Notes from a talk at OD4DC Montreal] 

I’m in Montreal this week for the Developers for Development hackathon and conference. Aside from having fun building a few things as part of our first explorations for the Open Contracting Data Standard, I was also on a panel with the fantastic Linda Raftree, Laurent Elder and Anahi Ayala Iacucci focussing on the topic of open data impacts in developing countries: a topic I spend a lot of time working on. We’re still in the research phase of the Emerging Impacts of Open Data in Developing Countries research network, but I tried to pull together a talk that would capture some of the themes that have been coming up in our network meetings so far. So, herewith the slides and raw notes from that talk.

Introduction

In this short presentation I want to focus on three things. Firstly, I want to present a global snapshot of open data readiness, implementation and impacts around the world.

Secondly, I want to offer some remarks on the importance of how research into open data is framed, and what social research can bring to our understanding of the open data landscape in developing countries.

Lastly, I want to share a number of critical reflections emerging from the work of the ODDC network.

Part 1: A global snapshot

I’ve often started presentations and papers about open data by commenting on how ‘it’s just a few short years since the idea of open data gained traction’, yet, in 2014 that line is starting to get a little old. Data.gov launched in 2009, Kenya’s data portal in 2011. IATI has been with us for a while. Open data is no longer a brand new idea, just waiting to be embraced – it is becoming part of the mainstream discourse of development and government policy. The issue now is less about convincing governments to engage with the open data agenda, than it is about discovering whether open data discourses are translating into effective implementation, and ultimately open data impacts.

Back in June last year, at the Web Foundation we launched a global expert survey to help address that question. All in all, we collected data covering 77 countries, representing every region, type of government and level of development, asking about government, civil society and business readiness to secure benefits from open data, the actual availability of key datasets, and observed impacts from open data. The results were striking: over 55% of the countries surveyed had some form of open data policy in place, many with high-level ministerial support.

The policy picture looks good. Yet, when it came to key datasets actually being made available as open data, the picture was very different. Fewer than 7% of the datasets surveyed in the Barometer were published both in bulk machine-readable forms and under open licenses: that is, in ways that would meet the Open Definition. And much of this percentage is made up of datasets published by a few leading developed states. When it comes to essential infrastructural datasets like national maps, company registers or land registries, data availability, even of non-open data, is very poor, and particularly bad in developing countries. In many countries, the kinds of cadastral records that are cited as key to the economic potential of open data are simply not yet collected with full country coverage. Many countries have long-standing capacity building programmes to help them create land registries or detailed national maps, but many such programmes are years or even decades behind on delivering the required datasets.

The one exception where data was generally available and well curated, albeit not provided in open and accessible forms, was census data. National statistics offices have been the beneficiaries of years of capacity building support: yet the same programmes that have enabled them to manage data well have also helped them to become quasi-independent of governments, making it less certain that they will be covered by government open data policies.

If the implementation story is disappointing, the impact story is even more so. In the Barometer survey we asked expert researchers to cite examples of where open data was reported in the media, or in academic sources, to have had impacts across a range of political, social and economic domains, and to score questions on a 10-point scale for the breadth and depth of impacts identified. The scores were universally low. Of course, whilst the idea of open data can no longer be claimed to be brand new, many country open data initiatives are, and so it is fair to say that outcomes and impacts take time, and are unlikely to be seen in any substantial way over the very short term. Yet, even in countries where open data has been present for a number of years, evidence of impact was light. The impacts cited were often hackathon applications, which, important as they are, generally only prototype and point to potential impacts. Without getting to scale, few demo applications alone can deliver substantial change.

Of course, some of this impact evidence gap may also be down to weaknesses in existing research. Some of the outcomes from open data publication are not easily picked up in visible applications or high profile news stories. That’s where the need for a qualitative research agenda really comes in.

Part 2: The Open Data Barometer

The Open Data Barometer is just one part of a wider open data programme at the World Wide Web Foundation, including the Open Data in Developing Countries research project supported by Canada’s International Development Research Centre. The main focus of that project over the last 12 months has been on establishing a network of case study research partners based in developing countries, each responding to both local concerns and a shared research agenda, to understand how open data can be put to use in particular decision making and governance situations.

Our case study partners are drawn from universities, NGOs and independent consultancies, and were selected from responses to an open call for proposals issued in mid-2012. Interestingly, many of these partners were not open data experts, or already involved in open data, but were focussed on particular social and policy issues, and were interested in looking at what open data meant for these. Focus areas for the cases range from budget and aid transparency, to higher education performance, to the location of sanitation facilities in a city. Together, these foundations give the research network a number of important characteristics:

Firstly, whilst we have a shared research framework that highlights particular elements that each case study seeks to incorporate – from looking at the political, social and economic context of open data, through to the technical features of datasets and the actions of intermediaries – cases are also able to look at the different constraints exogenous to datasets themselves which affect whether or not data has a chance of making a difference.

Secondly, the research network works to build critical research capacity around open data, bringing new voices into the open data debate. For example, in Kenya, the Jesuit Hakimani Trust has an established record working on citizens’ access to information, but until 2013 had not looked at the issue of open data in Kenya. By incorporating questions about open data into their large-scale surveys of citizen attitudes, they have started generating evidence that treats open data alongside other forms of access to information for poor and marginalised citizens, yielding new insights.

Thirdly, the research is open to unintended consequences of open data publication, good and bad, and can look for impacts outside the classic logic model of ‘data + apps = impact’. Indeed, researchers in both Sao Paulo and Chennai have found that, as respected research intermediaries exploring open data use, they have been invited to help shape future government data collection practices. Gisele Craveiro from the University of Sao Paulo uses the metaphor of an iceberg to highlight the importance of looking below the surface. The idea that opening data ultimately changes what data gets collected, and how it is handled inside the state, should not be an alien one for those involved in IATI, which has led many aid agencies to start geocoding their data. But it is a route to effects often underplayed in explorations of the changes open data may be part of bringing about.

Part 3: Emerging findings

As mentioned, we’ve spent much of 2013 building up the Open Data in Developing Countries research network, and our case study partners are right now in the midst of their data collection and analysis. We’re looking forward to presenting full findings from this first phase of research towards the summer, but there are some emerging themes that I’ve been hearing from the network in my role as coordinator that I want to draw out. I should note that these points of analysis are preliminary, and are the product of conversations within the network, rather than being final statements, or points that I claim specific authorship over.

We need to unpack the definition of open data.

Open data is generally presented as a package with a formal definition. Open data is data that is proactively published, in machine-readable formats, and under open licenses. Without all of these, there isn’t open data. Yet ODDC participants have been highlighting how the relative importance of these criteria varies from country to country. In Sierra Leone, for example, machine-readable formats might be argued to be less important right now than proactive publication, as for many datasets the authoritative copy may well be the copy on paper. In India, Nigeria or Brazil, the question of licensing may be moot: either it is assumed that government data is free to re-use, regardless of explicit statements, or local data re-users may be unconcerned about violating licenses, based on a rational expectation that no one will come after them.

Now, this is not to say that the Open Definition should be abandoned, but we should be critically aware of its primary strength: it helps to create a global open data commons, and to deliver on a vision of ‘frictionless data’. Open data of this form is easier to access ‘top down’, and can more easily be incorporated into panopticon-like development dashboards, but the actual impact on ‘bottom up’ re-use may be minimal. Unless actors in a developing country are equipped with the skills and capacities to draw on this global commons, and to overcome other local ‘frictions’ to re-using data effectively, the direct ROI on the extra effort to meet a pure open definition might not accrue to those putting the effort in, and a dogmatic focus on strict definitions might in some cases even slow down the process of making data relatively more accessible. Understanding the trade-offs here requires more research and analysis, but the point at least is made that there can be differences of emphasis in opening data, and that these prioritise different potential users.

Supply is weak, but so is demand.

Talking at the Philippines Good Governance Summit a few weeks ago, Michael Canares presented findings from his research into how the local government Full Disclosure Policy (FDP) is affecting both ‘duty bearers’ responsible for supplying information on local budgets, projects, spending and so on, and ‘claim holders’: citizens and their associations who seek to secure good services from government. A major finding has been that, with publishers in ‘compliance mode’, posting the required information but not in accessible formats, citizen groups articulated very little demand for online access to Full Disclosure Policy information. Awareness that the information was available was low, interest in the particular data published was low (that is, the information made available did not match any specific demand), and where citizen groups were accessing the data they often found they did not have the knowledge to make sense of or use it. The most viewed and downloaded documents garnered no more than 43 visits in the period surveyed.

In open data, as we remove the formal or technical barriers to data re-use that come from licenses and non-standard formats, we encounter the informal hurdles, roadblocks and thickets that lie behind them. And even as those new barriers are removed through capacity building and intermediation, we may find that they were not necessarily holding back a tide of latent demand, but were rather theoretical barriers in the way of a progressive vision of an engaged citizenry and innovative public service provision. Beyond simply calling for the removal of barriers, this vision needs to be elaborated, whether through the designs of civic leaders or through the distributed actions of a broad range of social activists and entrepreneurs. And the tricky challenge of culture change (changing expectations of who is, and can be, empowered) needs to be brought to the fore.

Innovative intermediation is about more than visualisation.

Early open data portals listed datasets. Then they started listing third-party apps. Now, many profile interactive visualisations built with data, or provide visualisation tools. Apps and infographics have become the main thing people think of when it comes to ‘intermediaries’ making open data accessible. Yet, if you look at how information flows on the ground in developing countries, mobile messaging, community radio, notice boards, churches and chiefs’ centres are much more likely to come up as key sites of engagement with public information.

What might open data capacity building look like if we started with these intermediaries, and only brought technology in to improve the flow of data where that was needed? What does data need to be shaped like to enable these intermediaries to act with it? And how do the interests of these intermediaries, and the constituencies they serve, affect what will happen with open data? All these are questions we need to dig into further.

Summary

I said in the opening that this would be a presentation of critical reflections. It is important to emphasise that none of this constitutes an argument against open data. The idea that government data should be accessible to citizens retains its strong intrinsic appeal. Rather, in offering some critical remarks, I hope this can help us to consider different directions open data for development can take as it matures, and that ultimately we can move more firmly towards securing impacts from the important open data efforts so many parties are undertaking.

ICTs and Anti-Corruption: theory and examples

[Summary: draft section from U4 paper on exploring the incentives for adopting ICT innovation in the fight against corruption]

As mentioned a few days ago, I’ve currently got a paper online for comment which I’m working on with Silvana Fumega for the U4 anti-corruption centre. I’ll be blogging each of the sections here, and if you’ve comments on any element of it, please do drop in comments to the Google Doc draft. 

ICTS AND ANTI-CORRUPTION

Corruption involves the abuse of entrusted power for personal gain (Transparency International, 2009). Grönlund has identified a wide range of actions that can be taken with ICTs to try and combat corruption, from service automation and the creation of online and mobile phone based corruption-reporting channels to the online publication of government transparency information (Grönlund, 2010). In the diagram below we offer eight broad categories of ICT interventions with a potential role in fighting corruption.

[Diagram: eight broad categories of ICT interventions with a potential role in fighting corruption]

These different ICT interventions can be divided between transactional reforms and transparency reforms. Transactional reforms seek to reduce the space for corrupt activity by controlling and automating processes inside government, or seek to increase the detection of corruption by increasing the flow of information into existing government oversight and accountability mechanisms. Often these developments are framed as part of e-government. Transparency reforms, by contrast, focus on increasing external rather than internal control over government actors by making the actions of the state and its agents more visible to citizens, civil society and the private sector. In the diagram, categories of ICT intervention and related examples are positioned along a horizontal axis to indicate, in general, whether these initiatives have emerged as ‘citizen led’ or ‘government led’ projects, and along the vertical axis to indicate whether the focus of these activities is primarily on transactional reforms or on transparency. In practice, where any actual ICT intervention falls is as much a matter of the details of implementation as of the technology, although we find these archetypes useful to highlight the different emphases and origins of different ICT-based approaches.

Many ICT innovations for transparency and accountability[1] have emerged from within civil society and the private sector, only later adopted by governments. In this paper our focus is specifically upon government adoption of innovations: when the government is taking the lead role in implementing some technology with an anti-corruption potential, albeit a technology that may have originally been developed elsewhere, and where similar instances of such technologies may still be deployed by groups outside government. For example, civil society groups in a number of jurisdictions have deployed the Alaveteli open source software[2] which brokers the filing of Right to Information act requests online, logging and making public requests to, and replies from, government. Some government agencies have responded by building their own direct portals for filing requests, which co-exist with the civil society run Alaveteli implementations. The question of concern for this paper is why government has chosen to adopt the innovation and provide its own RTI portals.

Although there are different theories of change underlying ICT enabled transactional and transparency reforms, the actual technologies involved can be highly inter-related. For example, digitising information about a public service as part of an e-government management process means that there is data about its performance that can be released through a data portal and subjected to public pressure and scrutiny. Without the back-office systems, no digital records are available to open (Thurston, 2012).

The connection between transactional e-government and anti-corruption has been explored only relatively recently. As Bhatnagar notes, most e-government reforms did not begin as anti-corruption measures. Instead, they were adopted for their promise to modernise government and make it more efficient (Bhatnagar, 2003). Bhatnagar explains that “…reduction of corruption opportunities has often been an incidental benefit, rather than an explicit objective of e-government”. A focus on the connection between e-government and transparency is more recent still. Kim et al. (2009) note that “E-government’s potential to increase transparency and combat corruption in government administration is gaining popularity in communities of e-government practitioners and researchers…”, arguably as a result of increased Internet diffusion meaning that for the first time data and information from within government can, in theory, be made directly accessible to citizens through computers and mobile phones, without passing through intermediaries.

In any use of ICTs for anti-corruption, the technology itself is only one part of the picture. Legal frameworks, organisational processes, leadership and campaign strategies may all be necessary complements of digital tools in order to secure effective change. ICTs for accountability and anti-corruption have developed in a range of different sectors and in response to many different global trends. In the following paragraphs we survey in more depth the emergence and evolution of three kinds of ICTs with anti-corruption potential, looking at both the technologies and the contexts they are embedded within. 

2.1 TRANSPARENCY PORTALS

A transparency portal is a website where government agencies routinely publish defined sets of information. They are often concerned with financial information and might include details of laws and regulations alongside more dynamic information such as government debt, departmental budget allocations and government spending (Solana, 2004). They tend to have a specific focus, and are often backed by a legal mandate, or regulatory requirement, that information is published to them on an ongoing basis. National transparency portals have existed across Latin America since the early 2000s, developed by finance ministries following more than 15 years of investment in financial management capacity building in the region. Procurement portals have also become common, linked to efforts to make public procurement more efficient, and to comply with regulations and good practice on public tenders.

More recently, a number of governments have mandated the creation of local government transparency portals, or the creation of dedicated transparency pages on local government websites. For example, in the United Kingdom, the Prime Minister requested that local governments publish all public spending over £500 on their websites, whilst in the Philippines the Department of Interior and Local Government (DILG) has pushed the implementation of a Full Disclosure Policy requiring Local Government Units to post a summary of revenues collected, funds received, appropriations and disbursement of funds, and procurement-related documents on their websites. The Government of the Philippines has also created an online portal to support local government units in publishing the documents demanded by the policy[3].

In focus: Peru Financial Transparency Portal

A transparency portal is a website where government agencies routinely publish defined sets of information. They are often concerned with financial information and might include details of laws and regulations alongside more dynamic information such as government debt, departmental budget allocations and government spending.

Country: Peru

Responsible: Government of Peru, Ministry of Economic and Financial Affairs

Brief description: The Peruvian Government implemented a comprehensive transparency strategy in early 2000. That strategy comprised several initiatives (law on access to financial information, promotion of citizen involvement in transparency processes, among others). The Financial Transparency Portal was launched as one of the elements of that strategy. In that regard, Solanas (2003) suggests that the success of the portal is related to the existence of a comprehensive transparency strategy, in which the portal serves as a central element. The Portal (http://www.mef.gob.pe/) started to operate in 2001 and, at that time, it was praised as the most advanced in the region. Several substantial upgrades to the portal have taken place since the launch.

Current situation:

The portal has changed considerably since its early days. In the beginning, it provided access to documents containing economic and financial information. More than a decade on, it now publishes datasets on several economic and financial topics, which are provided by each of the agencies in charge of producing or collecting the information. Those datasets are divided into four main modules: budget performance monitoring, implementation of investment projects, inquiry on transfers to national, local and regional governments, and domestic and external debt. The portal also includes links to request information under the Peruvian FOI law, and to track the status of requests.

Sources:

http://www.politikaperu.org/directorio/ficha.asp?id=355

http://www.egov4dev.org/transparency/case/laportals.shtml

http://www.worldbank.org/socialaccountability_sourcebook/Regional%20database/Case%20studies/Latin%20America%20&%20Caribbean/TOL-V.pdf#page=71

In general, financial transparency portals have focussed on making government records available, often hosting image-file versions of printed, signed and scanned documents, which means that anyone wanting to analyse information from across multiple reports must re-type it into spreadsheets or other software. Although a number of aid and budget transparency portals are linked directly to financial management systems, it is only recently that a small number of portals have started to add features giving direct access to datasets on budget and spending.

Some of the most data-centric transparency portals can be found in the international aid field, where aid transparency portals have been built on top of the Aid Management Platforms used by aid-recipient governments to track their donor-funded projects and budgets. Built with funding and support from international donors, aid transparency portals such as those in Timor-Leste and Nepal offer search features across a database of projects. In Nepal, donors have funded the geocoding of project information, allowing funding flows to be displayed on a map.

Central to the hypothesis underlying the role of transparency portals in anti-corruption is the idea that citizens and civil society will demand and access information from the portals, and will use it to hold authorities to account (Solana, 2004). In many contexts, whilst transparency portals have become well established, direct demand from citizens and civil society for the information they contain remains, as Alves and Heller put it in relation to Brazil’s fiscal transparency, “frustratingly low” (in Khagram, Fung, & de Renzio, 2013). However, transparency portals may also be used by the media and other intermediaries, providing an alternative, more indirect theory of change in which coverage of episodes of corruption creates electoral pressure against corruption (in functioning democracies at least). Yet Power and Taylor’s work on democracy and corruption in Brazil suggests that whilst such mechanisms can have impacts, they are often confounded in practice by other, non-corruption-related factors that influence voters’ preferences, and by a wide range of contingencies, from electoral cycles to political party structures and electoral math (Power & Taylor, 2011).

2.2 OPEN DATA PORTALS

Where transparency portals focus on the publication of specific kinds of information (financial; aid; government projects etc.), open data portals act as a hub for bringing together diverse datasets published by different government departments.

Open data involves the publication of structured machine-readable data files online with explicit permission granted for anyone to re-use the data in any way. This can be contrasted with cases where transparency portals publish scanned documents that cannot be loaded into data analysis software, or publish under copyright restrictions that deny citizens or businesses the right to re-use the data. Open data has risen to prominence over the last five years, spurred on by the 2009 Memorandum on Transparency and Open Government from US President Obama (Obama, 2010), which led to the creation of the data.gov portal, bringing together US government datasets. This built on principles of Open Government Data elaborated in 2007 by a group of activists meeting in Sebastopol, California, calling for government to provide data online that was complete, primary (i.e. not edited or interpreted by government before publication), timely, machine-readable, standardised and openly licensed (Malamud & O’Reilly, 2007).

In focus: Kenya Open Data Initiative (KODI)

Open data involves the publication of structured machine-readable data files online with explicit permission granted for anyone to re-use the data in any way. Open data portals act as a hub for bringing together diverse datasets published by different government departments. One of those platforms is the Kenya Open Data Initiative (opendata.go.ke).

Country: Kenya

Responsible: Government of Kenya

Brief description:

Around 2008, projects from Ushahidi to M-PESA put Kenya on the map of ICT innovation. Within the Kenyan government, then-PS Ndemo of the Ministry of Information and Communications, eager to promote and encourage that market, started to explore the idea of publishing government datasets for this community of ICT experts to use. In that quest, he received support from actors outside government such as the World Bank, Google and Ushahidi. Adding to that context, in 2010 a new constitution recognizing citizens’ right of access to information was enacted in Kenya (although an FOI law is still a pending task for the Kenyan government). On July 8 2011, President Mwai Kibaki launched the Kenya Open Data Initiative, making government datasets available to the public through a web portal: opendata.go.ke

Current situation:

Several activists and analysts have started to write about the lack of updates to the Kenya Open Data Initiative. The portal has not been updated in several months, and its traffic has slowed down significantly.

Sources:

http://www.scribd.com/doc/75642393/Open-Data-Kenya-Long-Version

http://blog.openingparliament.org/post/63629369190/why-kenyas-open-data-portal-is-failing-and-why-it

http://www.code4kenya.org/?p=469

http://www.ict.go.ke/index.php/hot-topic/416-kenya-open-data

http://www.theguardian.com/global-development/poverty-matters/2011/jul/13/kenya-open-data-initiative

Open data portals have caught on as a policy intervention, with hundreds now online across the world, including an increasing number in developing countries. Brazil, India and Kenya all have national open government data portals, and Edo State in Nigeria recently launched one of the first sub-national open data portals on the continent, expressing a hope that it would “become a platform for improving transparency, catalyzing innovation, and enabling social and economic development”[4]. However, a number of open data portals have already turned out to be short-lived, with the Thai government’s open data portal, launched[5] in 2011, already defunct and offline at the time of writing.

The data hosted on open data portals varies widely: ranging from information on the locations of public services, and government service performance statistics, to public transport timetables, government budgets, and environmental monitoring data gathered by government research institutions. Not all of this data is useful for anti-corruption work, although the availability of information as structured data makes it far easier for third parties to analyse a wide range of government datasets not traditionally associated with anti-corruption work, looking for patterns and issues that might point to causes for concern. In general, theories of change around open data for anti-corruption assume that skilled intermediaries will access, interpret and work with the datasets published, as portals are generally designed with a technical audience in mind.

Data portals can act as both a catalyst of data publication, providing a focal point that encourages departments to publish data that was not otherwise available, and as an entry-point helping actors outside government to locate datasets that are available. At their best they provide a space for engagement between government and citizens, although few currently incorporate strong community features (De Cindio, 2012).

Recently, transparency and open data efforts have also started to focus on the importance of cross-cutting data standards that can be used to link up data published in different data portals, and to solicit the publication of sectoral data. Again the aid sector has provided a lead here, with the development of the International Aid Transparency Initiative (IATI) data standard, and a data portal collating all the information on aid projects published by donors to this standard[6]. New efforts are seeking to build on experiences from IATI with data standards for contracts information in the Open Contracting initiative, which not only targets information from governments, but also potentially disclosure of contract information in the private sector[7].

2.3 CITIZEN REPORTING CHANNELS

Transparency and open data portals primarily focus on the flow of information from government to citizen. Many efforts to challenge corruption require a flow of information the other way: citizens reporting instances of corruption or providing the information agents of government need to identify and address corrupt behaviour. When reports are filed on paper, or to local officials, it can be hard for central governments to ensure reports are adequately addressed. By contrast, with platforms like the E-Grievance Portal in the Indian State of Orissa[8], when reports are submitted they can be tracked, meaning that where there is will to challenge corruption, citizen reports can be better handled.

Many online channels for citizen reporting have in fact grown up outside of government. Platforms like FixMyStreet in the UK, and the many similar platforms across the world, have been launched by civil society groups frustrated at having to deal with government through seemingly antiquated paper processes. FixMyStreet allows citizens to point out on a map where civil infrastructure requires fixing and forward the citizen reports to the relevant level of government. Government agents are invited to report back to the site when the issue is fixed, giving a trackable and transparent record of government responsiveness. In some areas, governments have responded to these platforms by building their own alternative citizen reporting channels, though often without the transparency of the civil society platforms (reports simply go to the public authority; no open tracking is provided), or, in other cases, by working to integrate the civil society provided solution with their own systems.

In focus: I Paid a Bribe

Many online channels for citizen reporting have been developed outside of government. One of these platforms is “I Paid a Bribe”, an Indian website that collates stories of bribery, and the amounts paid, from citizens across the country and uses them to present a snapshot of trends in bribery.

Country: India

Responsible: Janaagraha (www.janaagraha.org), a Bangalore-based not-for-profit organization

Brief description:

The initiative was first launched on August 15, 2010 (India’s Independence Day), and the website became fully functional a month later. I Paid a Bribe aims to understand the role of bribery in public service delivery, transforming the data collected from the reports into knowledge that can inform the government about gaps in public transactions and strengthen citizen engagement to improve the quality of service delivery. For example, in Bangalore, Bhaskar Rao, the Transport Commissioner for the state of Karnataka, used the data collected on I Paid a Bribe to push through reforms in the motor vehicle department. As a result, and in order to avoid bribes, licenses are now applied for online (Strom, 2012).

Current situation: To reach a wider audience, ipaidabribe.com launched “Maine Rishwat Di”, the Hindi-language version of the website (http://hindi.ipaidabribe.com/), in mid-2013. At the same time, it launched mobile apps and SMS services to make bribe reporting easier and more accessible to citizens across India. “I Paid a Bribe” has also been replicated with partners in a number of other countries, including Pakistan, Kenya, Morocco and Greece.

Sources: https://www.ipaidabribe.com/about-us

http://southasia.oneworld.net/Files/ict_facilitated_access_to_information_innovations.pdf/at_download/file

http://www.firstpost.com/india/after-reporting-bribes-now-report-rishwats-hindi-version-of-i-paid-a-bribe-launched-1022627.html

http://www.ipaidabribe.com/comment-pieces/“maine-rishwat-di”-hindi-language-version-ipaidabribecom-launched-shankar-mahadevan

Strom, Stephanie (2012) Web Sites Shine Light on Petty Bribery Worldwide. The New York Times. March 6th. Available: http://www.nytimes.com/2012/03/07/business/web-sites-shine-light-on-petty-bribery-worldwide.html

References

Bhatnagar, S. (2003). Transparency and Corruption: Does E-Government Help?, 1–9.

De Cindio, F. (2012, April 4). Guidelines for Designing Deliberative Digital Habitats: Learning from e-Participation for Open Data Initiatives. The Journal of Community Informatics.

Fox, J. (2007). The uncertain relationship between transparency and accountability. Development in Practice, 17(4-5), 663–671. doi:10.1080/09614520701469955

Grönlund, Å. (2010). Using ICT to combat corruption – tools, methods and results. In C. Strand (Ed.), Increasing transparency and fighting corruption through ICT: empowering people and communities (pp. 7–26). SPIDER.

Khagram, S., Fung, A., & Renzio, P. de. (2013). Open Budgets: The Political Economy of Transparency, Participation, and Accountability (p. 264). Brookings Institution Press.

Kim, S., Kim, H. J., & Lee, H. (2009). An institutional analysis of an e-government system for anti-corruption: The case of OPEN. Government Information Quarterly, 26(1), 42–50. doi:10.1016/j.giq.2008.09.002

Malamud, C., & O’Reilly, T. (2007, December). 8 Principles of Open Government Data. Retrieved June 01, 2010, from http://resource.org/8_principles.html

Obama, B. (2010). Memo from President Obama on Transparency and Open Government. In D. Lathrop & L. Ruma (Eds.), Open Government: Collaboration, Transparency, and Participation in Practice.

Power, T. J., & Taylor, M. M. (2011). Corruption and Democracy in Brazil: The struggle for accountability. University of Notre Dame.

Solana, M. (2004). Transparency Portals: Delivering public financial information to Citizens in Latin America. In K. Bain, I. Franka Braun, N. John-Abraham, & M. Peñuela (Eds.), Thinking Out Loud V: Innovative Case Studies on Participatory Instruments (pp. 71–80). World Bank.

Thurston, A. C. (2012). Trustworthy Records and Open Data. The Journal of Community Informatics, 8(2).

Transparency International. (2009). The Anti-Corruption Plain Language Guide.


[1] It is important to clarify that transparency does not necessarily lead to accountability. Transparency, understood as the disclosure of information that sheds light on institutional behavior, can also be defined as answerability. However, accountability (or “hard accountability” according to Fox, 2007) implies not only answerability but also the possibility of sanctions (Fox, 2007).

[2] http://www.alaveteli.org/about/where-has-alaveteli-been-installed/

[4] http://data.edostate.gov.ng/ Accessed 10th October 2013

[8] http://cmgcorissa.gov.in

Joined Up Philanthropy – a data standards exploration

Earlier this year, Indigo Trust convened a meeting with an ambitious agenda: to see 50% of UK foundation grants detailed as open data, covering 80% of foundation grant-making by value, within five years. Of course, many of the grant-giving foundations in the UK already share details of the work they fund, through annual reports or pages on their websites. But every funder shares the information differently. That turns bringing together a picture of the funding in a particular area or sector, understanding patterns of funding over time, or identifying the foundations who might be interested in a project idea you have into a laborious manual task. Data standards for the publication of foundations’ giving could change that.

Supported by The Nominet Trust and Indigo Trust, at Practical Participation I’m working with non-profit sector expert Peter Bass on a series of ‘research sprints’ to explore what a data standard could look like. This builds on an experiment back in March to help scope an Open Contracting Data Standard. We’ll be using an iterative methodology to look at:

  • (1) the existing supply of data;

  • (2) demand for data and use-cases;

  • and (3) existing related standards.

Each research sprint focusses primarily on one of these and consists of around 10 days of data collection and analysis. Each is designed to generate useful evidence that can move the conversation forward, without pre-empting future decisions or trying to provide the final word on what a data standard should look like.

Supply: What data is already collected?

The first stage, which we’re working on right now, involves finding out about the data that foundations already collect. We’re talking to a number of different foundations large and small to find out about how they manage information on the work they fund right now.

By collating a list of the database fields that different foundations hold (whether the column headings in the spreadsheets they use to keep track of grants, or the fields in a comprehensive relational database) and then mapping these onto a common core, we’re aiming to build up a picture of which data might be readily available right now and easy to standardise. The same exercise should show where there are differences and diversities that will need careful handling in the development of a standard. Past standards projects like the International Aid Transparency Initiative were able to benefit from a large ‘installed base’ of aid donors already using set conventions and data structures drawn from the OECD Development Assistance Committee, which strongly influenced the first version of IATI. We’ll be on the lookout for existing elements of standardisation to build upon in the foundations sector, as well as seeking to appreciate the diversity of foundations and the information they hold.
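To make the mapping exercise a little more concrete, here is a minimal sketch. It assumes each foundation’s column headings have been collected as plain strings, and that a hand-built crosswalk maps each heading onto a proposed core field. The foundation names, headings and core field names below are all hypothetical placeholders, not findings from the research. The sketch counts how many foundations hold each core field and lists any headings that do not yet map onto the core.

```python
from collections import Counter

# Hypothetical column headings from three foundations' grant records.
# These are illustrative placeholders, not data gathered in the research.
FOUNDATION_FIELDS = {
    "Foundation A": ["Grant Ref", "Recipient", "Amount (GBP)", "Date Awarded"],
    "Foundation B": ["ID", "Organisation Name", "Award Value", "Award Date", "Postcode"],
    "Foundation C": ["Charity", "Grant", "Year", "Notes"],
}

# Hypothetical crosswalk from local headings onto a proposed common core.
CROSSWALK = {
    "Grant Ref": "identifier",
    "ID": "identifier",
    "Recipient": "recipient_name",
    "Organisation Name": "recipient_name",
    "Charity": "recipient_name",
    "Amount (GBP)": "amount",
    "Award Value": "amount",
    "Grant": "amount",
    "Date Awarded": "award_date",
    "Award Date": "award_date",
    "Year": "award_year",  # only a year is held, so it is kept apart from award_date
    "Postcode": "recipient_postcode",
}


def core_coverage(foundation_fields, crosswalk):
    """Count how many foundations hold each core field; list unmapped headings."""
    coverage = Counter()
    unmapped = {}
    for foundation, headings in foundation_fields.items():
        # A set, so a foundation is counted at most once per core field.
        core = {crosswalk[h] for h in headings if h in crosswalk}
        coverage.update(core)
        missing = [h for h in headings if h not in crosswalk]
        if missing:
            unmapped[foundation] = missing
    return coverage, unmapped


coverage, unmapped = core_coverage(FOUNDATION_FIELDS, CROSSWALK)
for field, count in coverage.most_common():
    print(f"{field}: held by {count} of {len(FOUNDATION_FIELDS)} foundations")
print("unmapped:", unmapped)
```

Fields held by most foundations are candidates for an easy-to-standardise core. Cases like a bare “Year” column, which cannot be turned into a full date without guessing, are examples of the diversity that would need careful handling.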

We’re aiming to have a first analysis of this exercise out in mid-October. Whilst we’re only focussing on UK foundations, we will share all the methods and resources needed to extend the exercise to other contexts.

Demand: what data do people want?

Of course, the data that it is easy to get hold of might not be the data that it is important to have access to, or that potential users want. That motivates the second phase of our research – looking to understand the different use cases for data from the philanthropic sector. These may range from projects seeking to work out who to send their funding applications to; philanthropists seeking to identify partners they could work with; or sector analysts looking to understand gaps in the current giving environment and catalyse greater investment in specific sectors.

Each use case will have different data needs. For example, a local project seeking funding would care particularly about geodata that can tell them who might make grants in their local area; whereas a researcher may be interested in knowing in which financial year grants were awarded, or disbursements made to projects. By articulating the data needs of each use-case, and matching these against the data that might be available, we can start to work out where supply and demand are well matched, or where a campaign for open philanthropy data might need to encourage philanthropists to collect or generate new information on their activities.
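The matching step can be pictured as simple set arithmetic. This is a minimal sketch under stated assumptions. It assumes each use case’s data needs and the set of commonly available fields have both been expressed as names from a shared core. The use cases echo the examples above, but the field names and the “available” set are hypothetical placeholders. For each use case, the sketch splits the needed fields into those already supplied and the gaps a campaign might need to address.

```python
# Hypothetical data needs per use case, expressed as core field names.
USE_CASE_NEEDS = {
    "local project seeking funding": {"funder_name", "recipient_postcode", "amount"},
    "sector researcher": {"funder_name", "amount", "financial_year", "disbursement_date"},
}

# Hypothetical set of core fields assumed to be widely available from the supply survey.
AVAILABLE = {"funder_name", "recipient_name", "amount", "award_date", "recipient_postcode"}


def match_supply_and_demand(needs, available):
    """For each use case, split its needed fields into those met and the gaps."""
    report = {}
    for use_case, fields in needs.items():
        report[use_case] = {
            "met": sorted(fields & available),
            "gaps": sorted(fields - available),
        }
    return report


for use_case, result in match_supply_and_demand(USE_CASE_NEEDS, AVAILABLE).items():
    print(use_case, "-> gaps:", result["gaps"])
```

In this made-up example the local project’s needs are fully met, while the researcher’s needs for financial-year and disbursement information show up as gaps. Those are the kinds of fields philanthropists might be encouraged to start recording.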

Standards: putting the pieces together

Once we know about the data that exists, the data that people want, and how they want to use it, we can start thinking in depth about standards. There are already a range of standards in the philanthropy space, from the eGrant and hGrant standards developed by the Foundation Center, to the International Aid Transparency Initiative (IATI) standard, as well as a range of ongoing efforts to develop standards for financial reporting, spending data, and geocoded project information.

Developing a draft standard involves a number of choices:

  • Fields and formats – a standard is made up both of the fields that are deemed important (e.g. value of grant; date of grant etc.) and of the technical format through which the data will be represented. Data formats vary in how ‘expressive’ they are, and in how extensible the resulting standard is once it has been set. More expressive formats, however, also tend to be more complex.

  • Start from scratch, or extend existing standards – it may be possible to simply adapt an existing standard. Deciding to do this involves both technical and governance issues. For example, if we build on IATI, how would a domestic philanthropy standard adapt to version upgrades in the IATI standard? What collaboration would need to be established? How would existing tools handle the adapted standard?

  • Publisher capacity and needs – standards should reduce rather than increase the burdens on data suppliers. If we are asking publishers to map their data to a complex additional standard, we’re less likely to get a sustainable supply of data. Understanding the technical capacity of people we’ll be asking for data is important.

  • Mapping between standards – sometimes it is possible to entirely automate the conversion between two related standards. For example, if the fields in our proposed standard are a subset of those in IATI, it might be possible to demonstrate how domestic and international funding flows data can be combined. Thinking about how standards map together involves considering the direction in which conversions can take place, and how this relates to the ways different actors might want to make use of the data.
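The point about the direction of conversion can be illustrated with a small field-renaming sketch. It assumes a proposed domestic grants standard whose fields all have an equivalent in a richer target schema. The target field names here are hypothetical placeholders chosen for illustration; they are not real IATI element names, and a real mapping would also need to handle value formats, code lists and nesting. Conversion is treated as automatic only when every source field has a mapping.

```python
# Hypothetical map from a proposed domestic grants standard to a richer target
# schema. Target names are placeholders, NOT actual IATI elements.
DOMESTIC_TO_TARGET = {
    "identifier": "activity_id",
    "funder_name": "reporting_org",
    "recipient_name": "participating_org",
    "amount": "transaction_value",
    "award_date": "transaction_date",
}


def can_convert_automatically(source_fields, field_map):
    """Automatic conversion is possible only if every source field has a target."""
    return set(source_fields).issubset(field_map)


def convert(record, field_map):
    """Rename a record's fields via field_map; refuse records with unmapped fields."""
    unmapped = set(record).difference(field_map)
    if unmapped:
        raise ValueError(f"no mapping for fields: {sorted(unmapped)}")
    return {field_map[key]: value for key, value in record.items()}


grant = {
    "identifier": "GB-EXAMPLE-001",  # hypothetical identifier
    "funder_name": "Example Trust",
    "amount": 5000,
    "award_date": "2013-09-01",
}
print(convert(grant, DOMESTIC_TO_TARGET))

# The reverse map only covers target fields that have a domestic equivalent,
# so convert() with it would reject any record carrying target-only fields.
TARGET_TO_DOMESTIC = {v: k for k, v in DOMESTIC_TO_TARGET.items()}
```

If the domestic fields are a subset of the target’s, domestic-to-target conversion can be fully automated. Going the other way, any target-only fields have nowhere to go. This is why it matters which direction different users will want to convert data in.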

We’ll be rolling our sleeves up as we develop a draft standard proposal, seeking to work with real data from Phase 1 to test out how it works, and checking the standardised data against the use cases identified in Phase 2.

The outcome of this phase won’t be a final standard – but instead a basis for discussion of what standardised data in the philanthropy sector should look like.

Get involved

We’ll be sharing updates regularly through this blog and inviting comments and feedback on each stage of the research.

If you are from a UK-based foundation that would like to be involved in the first phase of research, just drop me a line and we’ll see what we can do. We’re particularly on the lookout for small foundations who don’t do much with data right now, so if you’re currently keeping track of your grant-making records on spreadsheets or post-it notes, do get in touch.