Geneva E-Participation Day: Open Data and International Organisations

[Summary: notes for a talk on open data and International Organisations]

In just over a week’s time I’ll be heading for Geneva to take part in Diplo Foundation’s ‘E-Participation Day: towards a more open UN?’ event. In the past I’ve worked with Diplo on remote participation, using the web to support live online participation in face-to-face meetings such as the Internet Governance Forum. This time I’ll be talking open data – exploring the ways in which changing regimes around data stand to impact International Organisations. This blog post was written for the Diplo blog as an introduction to some of the themes I might explore.

The event will, of course, have remote participation – so you can register to join in-person or online for free here.

E-participation and remote hubs have the potential to open up dialogue and decision making. But after the conferences have been closed, and the declarations made, it is data that increasingly shapes the outcome of international processes. Whether it’s the numbers counted up to check on progress towards the millennium development goals, GDP percentage pledges on aid spending, or climate change targets, the outcomes of international co-operation frequently depend on the development and maintenance of datasets.

The adage that ‘you can’t manage what you can’t measure’ has relevance both for International Organisations and for citizens. The better the flows of data International Organisations can secure access to, the greater their theoretical capacity for co-ordination of complex systems. And the greater the flows of information from the internal workings of International Organisations that citizens, states and pressure groups can access, the greater their theoretical capacity both to scrutinise decisions and to get involved in decision making and implementation. I say theoretical capacity, because the picture is rarely that straightforward in practice. Yet, that complexity aside for a moment, over the last few years an idea has been gaining ground that, in some states, has led not just to a greater flow of data, but to a veritable flood – with hundreds and thousands of government datasets placed online for anyone to access and re-use. That idea is open data.

Open Data is a simple concept. Organisations holding datasets should place them online, in machine-readable formats, and under licenses that let anyone re-use them. Advocates explain that this brings a myriad of benefits. For example, rather than finance data being locked up in internal finance systems, only available to auditors, open data on budgets and spending can be published on the web for anyone to download and explore in their spreadsheet software, or to let third parties generate visualisations that show citizens where their money is being spent, and to help independent analysts look across datasets for possible inefficiency, fraud or corruption. Or instead of the location of schools or health centres being kept on internal systems, the data can be published to allow innovators to present it to citizens in new and more accessible ways. And in crisis situations, instead of co-ordinators spending days collecting data from agencies in the field and re-keying the data into central databases, if all the organisations involved were to publish open data in common formats, there is the possibility of it being aggregated together, building up a clearer picture of what is going on. One of the highest profile existing open data initiatives in the development field is the International Aid Transparency Initiative (IATI), which now has standardised open data from hundreds of donors, providing the foundation for a timely view of who is doing what in aid.

Open data ideas have been spreading rapidly across the world, with many states establishing national Open Government Data (OGD) initiatives, and International Organisations from The World Bank, to UN DESA, the OECD and the Open Government Partnership all developing conversations and projects around open data. When the G8 meet next week in Northern Ireland they are expected to launch an ‘Open Data Charter’ setting out principles for high quality open data, and committing states to publish certain datasets. Right now it remains to be seen whether open data will feature anywhere else in the G8 action plans, although there is clearly space for open data ideas and practices to be deployed in securing greater tax transparency, or supporting the ongoing monitoring of other commitments. In the case of the post-2015 process, a number of organisations have been advocating for an access to information focus, seeking to ensure citizens have access to open data that they can use to monitor government actions and hold governments to account on delivering on commitments.

However – as Robinson and Yu have highlighted – there can be an ambiguity of open government data: more open data does not necessarily mean more open organisations. The call for ‘raw data now’ has led to much open data emerging simply as an outbound communication, without routes for engagement or feedback, and no change in existing organisational practices. Rather than being treated as a reform that can enable greater organisational collaboration and co-ordination, many open datasets have just been ‘dumped’ on the web. In the same way that remote participation is often a bolt-on to meetings, without the deeper changes in process needed to make for equal participation for remote delegates, at best much open data only offers actors outside of institutions a partial window onto their operations, and at worst, the data itself remains opaque: stripped of context and meaning. Getting open data right for both transparency, and for transforming international collaboration needs more than just technology. 

As I explored with Jovan Kurbalija of Diplo in a recent webinar, there are big challenges ahead if open data is to work as an asset for development: from balancing tensions between standardisation and local flexibility, developing true multi-stakeholder governance of important data flows, and getting the incentives for collaboration right. However, now is the time to be engaging with these challenges – within a window of energy and optimism, and before network effects lock in paradoxically ‘closed’ systems of open data. I hope the dialogue at the Geneva E-Participation day will offer a small chance to broaden open data understanding and conversations in a way that can contribute to such engagement.

Open data in extractives: meeting the challenges


There’s lots of interest building right now around how open data might be a powerful tool for transparency and accountability in the extractive industries sector. Decisions over where extraction should take place have a massive impact on communities and the environment, yet often decision making is opaque, with wealthy private interests driving exploitation of resources in ways that run counter to the public interest. Whilst revenues from oil, gas and mineral resources have the potential to be a powerful tool for development, with a proportion channeled into public funds, massive quantities of revenue frequently ‘go missing’, lost in corruption, and fuelling elements of a resource curse.

For the last ten years the Extractive Industries Transparency Initiative has been working to get companies to commit to ‘publish what they pay’ to government, and for government to disclose receipts of finance, working to identify missing money through a document-based audit process. Campaigning coalitions, watchdogs and global initiatives have focussed on increasing the transparency of the sector. Now, with a recognition that we need to link together information on different resource flows for development at all levels, potentially through the use of structured open data, and with a “data tsunami” of new information on extractives financials anticipated from the Dodd-Frank act in the US, and similar regulation in Europe, groups working on extractives transparency have been looking at what open data might mean for future work in this area.

Right now, DFID are taking that exploration forward through a series of hack days with Rewired State under the ‘follow the data’ banner, with the first in London last weekend, and one coming up next week in Lagos, Nigeria. The idea of the events is to develop rapid prototypes of tools that might support extractives transparency, putting developers and datasets together over 24 hours to see what emerges. I was one of the judging panel at this weekend’s event, where the three developer teams that formed looked respectively at: making datasets on energy production and prices more accessible for re-use through an API; visualising the relationship between extractives revenues and various development indicators; and designing an interface for ‘nuggets’ of insight discovered through hack-days to be published and shared with useful (but minimal) meta-data.

In their way, these three projects highlight a range of the challenges ahead for the extractives sector in building capacity to track resource flows through open data:

  • Making data accessible – The APIfy project sought to take a number of available datasets and aggregate them together in a database, before exposing a number of API endpoints that made machine-readable standardised data available on countries, companies and commodities. By translating the data access challenge from one of rooting around in disparate datasets, to one of calling a standard API for key kinds of ‘objects’, the project demonstrated the need developers often have for clear platforms to build upon. However, as I’ve discovered in developing tools for the International Aid Transparency Initiative, building platforms to aggregate together data often turns out to be a non-trivial project: technically (it doesn’t take long to get to millions of data items when you are dealing with financial transactions), economically (as databases serving millions of records to even a small number of users need to be maintained and funded), socially (developers want to be able to trust the APIs they build against to be stable, and outreach and documentation are needed to support developers to engage with an API), and in terms of information architecture (as design choices over a dataset or API can have a powerful effect on downstream re-users).
  • Connecting datasets – none of the applications from the London hack-day were actually able to follow resource flows through the available data. Although visions of a coherent datasphere, in which the challenge is just making the connection between a transaction in one dataset, and a transaction in another, to see where money is flowing, are appealing – traceability in practice turns out to be a lot harder. To use the IATI example again, across the 100,000+ aid activities published so far less than 1% include traceability efforts to show how one transaction relates to another, and even here the relationships exist in the data because of conscious efforts by publishers to link transaction and activity identifiers. In following the money there will be many cases where people have an incentive not to make these linkages explicit. One of the issues raised by developers over the hack-day was the scattered nature of data, and the gaps across it. Yet – when it comes to financial transaction tracking, we’re likely to often be dealing with partial data, full of gaps, and it won’t be easy to tell at first glance when a mis-match between incoming and outgoing finances is a case of missing data or corruption. Right now, a lot of developers attack open data problems with tools optimised for complete and accurate data, yet we need to be developing tools, methods and visualisation approaches that deal with partial and uncertain data. This is developed in the next point.
  • Correlation, causation and investigation – The Compare the Map project developed on the hack day uses “scraped data from GapMinder and EITI to create graphical tools” that allow a user to eye-ball possible correlations between extractives data and development statistics. But of course, correlation is not causation – and the kinds of analysis that dig deeper into possible relationships are difficult to work through on a hack day. Indeed, many of the relationships that mash-ups of this form can show have already been written about in papers that control for many more variables, dealing carefully with statistically challenging issues of missing data and imperfectly matched datasets. Rather than simple comparison visualisations that show two datasets side by side, it may be more interesting to look for all the possible statistically significant correlations in a set of datasets with common reference points, and then to look at how human users could be supported in exploring, and giving feedback on, which of those might be meaningful, and which may or may not already be researched. Where research does show a correlation to exist, then using open data to present a visual narrative to users about this can have a place, though here the theory of change is very different – not about identifying connections – but about communicating them in interactive and engaging ways to those who may be able to act upon them.
  • Sharing and collaborating – The third project at the London hack-day was ‘Fact Cache‘ – a simple concept for sharing nuggets of information discovered in hack-day explorations. Often as developers work through datasets they may come across discoveries of interest, yet these are often left aside in the rush to create a prototype app or platform. Fact Cache focussed on making these shareable. However, when it was presented discussions also explored how it could make these nuggets of information into social objects, open to discussion and sharing. This idea of making open data findings more usable as social objects was also an aspect of the UN Global Pulse hunchworks project. That project is currently on hold (it would be interesting to know why…), but the idea of supporting collaboration around open data through online tools, rather than seeing apps that present data, or initial analysis as the end point, is certainly one to explore more in building capacity for open data to be used in holding actors to account.
  • Developing theories of change – as the judges met to talk about the projects, one of the key themes we looked at was whether each project had a clear theory of change. In some sense taken together they represent the complex chain of steps involved in an open data theory of change, from making data more accessible to developers, creating tools and platforms that let end users explore data, and then allowing findings from data to be communicated and to shape discourses and action. Few datasets or tools are likely to be change-making on their own – but rather can play a key role in shifting the balance of power in existing networks of organisations, activists, companies and governments. Understanding the different theories of change for open data is one of the key themes in the ongoing Open Data in Developing Countries research, where we take existing governance arrangements as a starting point in understanding how open data will bring about impacts.
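
The point made under ‘Connecting datasets’ about partial and uncertain data can be made concrete with a small sketch. The Python below is illustrative only: the record layout, the `reconcile` function, the tolerance and all the figures are assumptions invented for this example, not data or code from the hack day. It compares what companies say they paid with what a government says it received, and keeps ‘we don’t know’ separate from ‘the numbers disagree’.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (country, year, company). All figures below are invented.
Key = Tuple[str, int, str]


def reconcile(
    paid: Dict[Key, Optional[float]],
    received: Dict[Key, Optional[float]],
    tolerance: float = 0.01,
) -> Dict[Key, str]:
    """Label each flow as 'matched', 'mismatch' or 'unknown'."""
    labels: Dict[Key, str] = {}
    for key in sorted(set(paid) | set(received)):
        p = paid.get(key)
        r = received.get(key)
        if p is None or r is None:
            # One side never reported: a gap in the data, not evidence of loss.
            labels[key] = "unknown"
        elif abs(p - r) <= tolerance * max(abs(p), abs(r), 1.0):
            labels[key] = "matched"
        else:
            labels[key] = "mismatch"
    return labels


company_reports = {
    ("Exampleland", 2011, "Acme Oil"): 120.0,
    ("Exampleland", 2011, "Borealis Mining"): 45.0,
    ("Exampleland", 2012, "Acme Oil"): 130.0,
}
government_reports = {
    ("Exampleland", 2011, "Acme Oil"): 120.0,
    ("Exampleland", 2011, "Borealis Mining"): 30.0,
    # No government figure published for Acme Oil in 2012.
}

results = reconcile(company_reports, government_reports)
for key, label in results.items():
    print(key, label)
```

Keeping a distinct ‘unknown’ label means a tool built on this would not present a missing record as missing money, which is the risk when methods assume complete data.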

In a complex world, access to data, and the capacity to use it effectively, are likely to be essential parts of building more accountable governance across a wide range of areas, including in the extractives industry. Although there are many challenges ahead if we are to secure the maximum benefits from open data for transparent and accountable governance, it’s exciting and encouraging to see so many passionate people putting their minds early to tackling them, and building a community ready to innovate and bring about change.

Note: The usage of ‘follow the data’ in this DFID project is distinct from the usage in the work I’m currently doing to explore ‘follow the data’ research methods. In the former, the focus is really on following financial and resource flows through connecting up datasets; in the latter the focus is on tracing the way in which data artefacts have been generated, deployed, transferred and used in order to understand patterns of open data use and impact.

 

Intelligent Impact: Evaluating an open data capacity building with voluntary sector organisations

[Summary: sharing the evaluation report (9 pages, PDF) of an open data skills workshop for voluntary sector organisations]


Late last year, through the CSO network on the Open Government Partnership, I got talking with Deirdre McGrath of the Your Voice, Your City project about ways of building voluntary sector capacity to engage with open data. We talked about the possibility of a hack-day, but realised the focus at this stage needed to be on building skills, rather than building tools. It also needed to be on discovering what was possible with open data in the voluntary sector, rather than teaching people a limited set of skills. And as the Your Voice, Your City project was hosted within the London Voluntary Services Council (LVSC), an infrastructure organisation with a policy and research team, we had the possibility of thinking about the different roles needed to make the most of open data, and how a capacity building pilot could work both with frontline Voluntary and Community Sector (VCS) organisations, and an infrastructure organisation. A chance meeting with Nick Booth of podnosh gave form to a theme in our conversations about the need to focus on both ‘stats’ and ‘stories’ ensuring that capacity building worked with both quantitative and qualitative data and information. The result: plans for a short project, centred on a one-day workshop on ‘Intelligent Impact’, exploring the use of social media and open data for VCS organisations.

The day involved staff from VCS organisations coming along with questions or issues they wanted to explore, and then splitting into groups with a team of open data and social media mentors (Nick Booth, Caroline Beavon, Steven Flower, Paul Bradshaw and Stuart Harrison) to look at how existing online resources, or self-created data and media, could help respond to those questions and issues. Alex Farrow captured the story of the day for us using Storify and I’ve just completed a short evaluation report telling the story in more depth, capturing key learning from the event, and setting out possible next steps (PDF).

Following on from the event, the LVSC team have been exploring how a combination of free online tools for curating open data, collating questions, and sharing findings can be assembled into a low-cost and effective ‘intelligence hub‘, where data, analysis and presentation layers are all made accessible to VCS organisations in London.

Developing data standards for Open Contracting

Contracts have a key role to play in effective transparency and accountability: from the contracts governments sign with extractives industries for mineral rights, to the contracts for delivery of aid, contracts for provision of key public services, and contracts for supplies. The Open Contracting initiative aims to improve the disclosure and monitoring of public contracts through the creation of global principles, standards for contract disclosure, and building civil society and government capacity. One strand of work that the Open Contracting team have been exploring to support this work is the creation of a set of open data standards for capturing contract information. This blog post reports on some initial ground work designed to inform this strand of work.

Although I was involved in some of the set-up of this short project, and presented the outcomes at last week’s workshop, the bulk of the work was undertaken by Aptivate’s Sarah Bird.

Update: see also the report of the process here.

Update 2 (12th Sept 2013): Owen Scott has built on the pilot with data from Nepal.

The process

Developing standards is a complex process. Each choice made has implications: for how acceptable the standard will be to different parties; for how easy certain uses of the data will be; and for how extensible the standard will be, or which other standards it will easily align with. However, standards cannot easily be built up choice-by-choice from a blank slate, adopting the ideal choice each time: they are generally created against a background of pre-existing datasets and standards. The Open Contracting data standards team had already gathered together a range of contract information datasets currently published by governments across the world, and so, with just a few weeks between starting this project and the data standards workshop on 28th March, we planned a 5-day development sprint, aiming to generate a very draft first iteration of a standard. Applying an agile methodology, where short iterations are each designed to yield a viable product by the end, but in the expectation that further early iterations may revise and radically alter this, meant we had to set a reasonable scope for this first sprint.

The focus then was on the supply side, taking a set of existing contract datasets from different parties, and identifying their commonalities and differences. The contract datasets selected were from the UK, USA, Colombia, Philippines and the World Bank. From looking at the fields these existing datasets had in common, an outline structure was developed, working on a principle of taking good ideas from across the existing data, rather than playing to a lowest common denominator. Then, using the International Aid Transparency Initiative activity standard as a basis, Sarah drafted a basic data structure, which can act as a version 0.01 standard for discussion. To test this, the next step was to convert samples from some of the existing datasets into this new structure, and then to analyse how much of the available data was covered by the structure, and how comprehensive the available data was when placed against the draft structure. (The technical approach taken, which can be found in the sprint’s GitHub repository, was to convert the different incoming data to JSON, and post it into a MongoDB instance for analysis).
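
As a rough illustration of that conversion step, the sketch below maps records from two differently shaped sources onto one common structure and serialises them as JSON. The source names, field names and mapping table are all hypothetical, invented here for illustration; the actual mappings are those in the sprint’s repository. The MongoDB load is left out so that the example runs with the Python standard library alone.

```python
import json
from typing import Any, Dict, List

# Hypothetical mappings from each source's column names to common field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"ContractTitle": "title", "Supplier": "supplier_name", "Value": "amount"},
    "source_b": {"descripcion": "title", "proveedor": "supplier_name", "valor": "amount"},
}


def to_common(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one raw record; unmapped fields are kept under 'extra'."""
    mapping = FIELD_MAPS[source]
    common: Dict[str, Any] = {"source": source, "extra": {}}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            common["extra"][field] = value
    return common


# Invented sample records, one per hypothetical source.
raw_records = [
    ("source_a", {"ContractTitle": "Road repairs", "Supplier": "Example Ltd",
                  "Value": 1000, "LocalFlag": "Y"}),
    ("source_b", {"descripcion": "School meals", "proveedor": "Ejemplo SA",
                  "valor": 2500}),
]

converted: List[Dict[str, Any]] = [to_common(src, rec) for src, rec in raw_records]
print(json.dumps(converted, indent=2))
```

Keeping unmapped fields under `extra`, rather than discarding them, is one way to make it possible to count afterwards how much source data the draft structure does not cover.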

We discuss the limitations of this process in a later section.

Initial results

The initial pass of data suggested a structure based on:

  • Organisation data – descriptions of organisations, held separately from individual contract information, and linked by a globally unique ID (based on the IATI Organisational ID standard)
  • Contract metadata – general information about the contract in question, such as title, classification, default currency and primary location of supply, including an area for ‘line items’ of elements the contract covers.
  • Contract stages – a series of separate blocks of data for different stages of the contract, all contained within the overarching contract element.
    • Bid – key dates and classifications about the procurement stage of a contract process.
    • Award – details of the parties awarded the contract and the details of the award.
    • Performance – details of transactions (payments to suppliers) and work activities carried out during the performance of the contract.
    • Termination – details of the ending of the contract.
  • Documents – fields for linking to related documents.
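
To make that outline easier to picture, here is one way a single record might look as nested data. This is a sketch under assumptions: every field name, identifier and value is invented for illustration and is not taken from the draft schema, which remains the reference.

```python
import json

# Hypothetical organisation register, held separately from contracts and keyed by ID.
organisations = {
    "XX-EXAMPLE-001": {"name": "Example Ministry of Works"},
    "XX-EXAMPLE-002": {"name": "Example Construction Ltd"},
}

# Hypothetical contract record following the outline above.
contract = {
    "meta": {
        "title": "Road repairs",
        "classification": "construction",
        "default_currency": "USD",
        "primary_location": "Exampleville",
        "line_items": [{"description": "Resurfacing", "quantity": 12}],
    },
    "stages": {
        "bid": {"opened": "2013-01-10", "closed": "2013-02-10"},
        "award": {"date": "2013-03-01", "buyer": "XX-EXAMPLE-001",
                  "supplier": "XX-EXAMPLE-002"},
        "performance": {"transactions": [{"date": "2013-04-01", "amount": 5000}]},
        "termination": {},
    },
    "documents": [{"title": "Signed contract", "url": "http://example.org/contract.pdf"}],
}


def referenced_organisations(record):
    """Return the organisation IDs that the award stage points at."""
    award = record["stages"]["award"]
    return [award[role] for role in ("buyer", "supplier") if role in award]


# Organisation details are looked up by ID rather than repeated in each contract.
for org_id in referenced_organisations(contract):
    print(org_id, organisations[org_id]["name"])

print(json.dumps(contract["stages"], indent=2))
```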

A draft annotated schema for capturing this data can be found in XML and JSON format here, and a high-level overview is also represented in the diagram below. In the diagrams that follow, each block represents one data point in the draft standard.

[Diagram 1: Phases]

We then performed an initial analysis to explore how much of the data currently available from the sources explored would fit into the standard, and how comprehensively the standard could be filled from existing data. As the diagram below indicates, no single source covered all the available data fields, and some held no information on particular stages of the contracting process at all. This may be down to different objectives of the available data sources, or deeper differences in how organisations handle information on contracts and contracting workflows.
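
One simple way to express this kind of coverage check is as the share of records from each source in which a given field of the draft structure is filled. The sketch below assumes records have already been converted to flat dictionaries; the field list, source names and records are invented for illustration and do not reproduce the sprint’s analysis.

```python
from typing import Any, Dict, List

# Hypothetical subset of fields in the draft structure.
STANDARD_FIELDS = ["title", "bid_open_date", "award_date", "supplier_name",
                   "termination_date"]


def coverage(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of records with a non-empty value for each standard field."""
    if not records:
        return {field: 0.0 for field in STANDARD_FIELDS}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / len(records)
        for field in STANDARD_FIELDS
    }


# Invented sample data for two hypothetical sources.
sources = {
    "source_a": [
        {"title": "Road repairs", "award_date": "2013-03-01", "supplier_name": "Example Ltd"},
        {"title": "School meals", "award_date": "", "supplier_name": "Sample plc"},
    ],
    "source_b": [
        {"title": "Water pumps", "bid_open_date": "2012-11-05"},
    ],
}

report = {name: coverage(records) for name, records in sources.items()}
for name, fields in report.items():
    for field, share in fields.items():
        print(f"{name:10s} {field:18s} {share:.0%}")
```

A table like this, one row per source and field, is the kind of input a coverage diagram or heat-map could be drawn from.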

[Diagram 2: Coverage]

Combining the visualisations above into a single view gives a sense of which data points in the draft standard have greatest use, illustrated in the schematic heat-map below.

[Diagram 3: Heat-map]

At this point the analysis is very rough-and-ready, hence the presentation of a rough impression, rather than detailed field-by-field analysis. The last thing to check was how much data was ‘left over’ and not captured in the standard. This was predominantly the case for the UK and USA datasets, where many highly specialised fields and flags were present in the dataset, indicating information that might be relevant to capture in local contract datasets, but which might be harder to find standard representations for across contracts.

[Diagram 4: Extra]

The next step was to check whether data that could go into the same fields could be easily harmonised, as the existence of organisation details, dates, and classifications of contracts across different datasets does not necessarily mean these are interoperable. Fields like dates and financial amounts appeared to be relatively easy to harmonise, but some elements present greater challenges, such as organisational identifiers, contact people, and the various codelists in use. However, some codelists may be possible to harmonise. For example, when the ‘Category’ classifications from across datasets were translated, grouped and aggregated, up to 92% of the original data in a sample was retained.
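
The translate, group and aggregate step can be sketched as a lookup from each source’s own category labels to a shared list, followed by a count of how many records survive the mapping. The labels, the mapping and the resulting percentage below are invented for illustration; they are not the sprint’s codelists, and the output is not meant to reproduce the 92% figure.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from (source, local label) to a shared category.
CATEGORY_MAP: Dict[Tuple[str, str], str] = {
    ("source_a", "Construction works"): "construction",
    ("source_b", "Obras"): "construction",
    ("source_a", "IT services"): "ict",
    ("source_b", "Servicios de TI"): "ict",
}


def harmonise(source: str, label: str) -> Optional[str]:
    """Return the shared category, or None when no mapping exists."""
    return CATEGORY_MAP.get((source, label.strip()))


# Invented sample of (source, local category label) pairs.
records: List[Tuple[str, str]] = [
    ("source_a", "Construction works"),
    ("source_b", "Obras"),
    ("source_a", "IT services"),
    ("source_b", "Servicios de TI"),
    ("source_b", "Otros"),  # no mapping: this record is lost in aggregation
]

mapped = [harmonise(source, label) for source, label in records]
totals = Counter(category for category in mapped if category is not None)
retained = sum(totals.values()) / len(records)

print(dict(totals))
print(f"retained: {retained:.0%}")
```

Reporting the retained share alongside the grouped totals makes visible how much detail is given up for comparability across sources.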

[Diagram 5: Sum and Group]

Implications, gaps, next steps

This first iteration provides a basis for future discussions. There are, however, some important gaps. Most significant of all is that this initial development has been supply-side driven, based around the data that organisations are already publishing, rather than developed on the basis of the data that civil society organisations, or scrutiny bodies, are demanding in order to make sense of complex contract situations. It also omits certain kinds of contracts, such as complex extractives contracts (on which, see the fantastic work Revenue Watch have been doing with getting structured data from PDF contracts with Document Cloud), and Public Private Partnership (PPP) contracts. And it has not delved deeply into the data structures needed for properly capturing information that can aid in monitoring contract performance. These gaps will all need to be addressed in future work.

At the moment, this stands as a discrete project, and no set next-steps are agreed as far as I’m aware. However, some of the ideas explored in the meeting on the 28th included:

  • A next iteration – focussed on the demand side – working with potential users of contracts data to work out how data needs to be shaped, and what needs to be in a standard to meet different data re-use needs. This could build towards version 0.02.
  • Testing against a wider range of datasets – either following, or in parallel with, a demand-driven iteration, to discover how the work done so far evolves when confronted with a larger set of existing contract datasets to synthesise.
  • Connecting with other standards. This first sprint took the IATI Standard as a reference point. There may be other standards to refer to in development. Discussions on the 28th with those involved in other standards highlighted an interest in more collaborative working to identify shared building blocks or common elements that might be re-used across standards, and to explore the practical and governance implications of this.
  • Working on complementary building blocks of a data standard – such as common approaches to identifying organisations and parties to a contract; or developing tools and platforms that will aggregate data and make data linkable. The experience of IATI, Open Spending and many other projects appears to be that validators, aggregation platforms and data-wrangling tools are important complements to standards for supporting effective re-use of open data.

Keep an eye on the Open Contracting website for more updates.

New paper: Connecting people, sharing knowledge, increasing transparency


[Summary: Linking to a short conference paper exploring the impact of the web on land governance]

After a few contributions I made to their online dialogue, the lovely folk at The Land Portal invited me to help them write up the dialogue and put together a paper for the upcoming World Bank Conference on Land and Poverty. They’ve just published the result over here (also accessible via this direct PDF link).

The paper itself was a rather rapid creation between the end of the online dialogue and the end of February deadline, but aims to weave together a number of strands important to thinking about how digital technology is changing the landscape for advocacy and work on land governance. It has a particular focus on women’s land rights, which, in the Land Portal’s online dialogue, stimulated lots of interesting discussions about the potentially gendered nature of digital technologies. In it we survey how the Web has evolved from Web 1 (documents), to Web 2 (communities) and onwards towards a possible Web 3 of open and linked open data.

The Land Portal team, since getting involved in last year’s Open Knowledge Festival, have been really exploring what open data and open development might mean for them – and the Portal is definitely a space to watch to see how ideas of open development might be put into practice in the very grounded and wide-reaching field of land governance.

Joining the Web Foundation, and projects old and new

[Summary: an update on some of the hats I might be wearing]


After working for the last eight or so months putting the project together, I’ve now formally joined the World Wide Web Foundation as research coordinator on the ‘Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC)’ programme.

It’s a two-year multi-country research project, exploring how open data is working in different settings across the world, funded by Canada’s International Development Research Centre. You can read more about the project over on the Web Foundation website and follow the project as it develops over at www.opendataresearch.org.

With this, I’ve switched my PhD work to part-time, but will continue to work with AidInfo on the International Aid Transparency Initiative, and through Practical Participation on assorted innovation and advocacy projects.

 

Generation Y? Bridging the participation gap in an online world


Back in July 2011 I spoke at a conference on ‘Generation-Y’ and public services hosted by Institut de la Gestion Publique (Institute for Public Management) in Paris. I was asked to write up the talk as an article for a print publication. So, I wrote up an extended version of this blog post, and fired it off, with a creative commons license on. A few months later I found myself having to print and sign paper contracts to convince the publishers that yes, they really could print the article. To make them happier I agreed I wouldn’t publish a copy of the article till it was out in their book. And then I pretty much forgot about it.

So I was surprised to get back from the OKF Winter Summit yesterday to find a parcel from France containing a copy of the book, French translation of the article included. 18 months after the conference, a print document void of links or graphics, with no mention of the creative commons license on the article. It looks like Institut de la Gestion Publique still have a very long way to go before they are really taking seriously the expectations gaps that my article talked about.

Ah well. Here’s a copy of the full article in English anyway (PDF). Unfortunately I’ve not been given a digital copy of the version in French, but happy to scan it in if anyone would like it.


 

Open Data for Poverty Alleviation: Striking Poverty Discussion


[Summary: join an open discussion on the potential impacts of open data on poverty reduction]

Over the next two weeks, along with Tariq Khokhar, Nithya V. Raman and Nathan Eagle, I’m taking part in an online panel hosted by the World Bank’s Striking Poverty platform to discuss the potential impacts of open data on poverty alleviation.

So far we’ve been asked to provide some starting statements on how we see the relationship between open data and poverty, and now there’s an open discussion where visitors to the site are invited to share their questions and reflections on the topic.

Here’s what I have down as my opening remarks:

Development is complex. No individual or group can process all the information needed to make sense of aid flows, trade patterns, government budgets, community resources and environmental factors (amongst other things) that affect development in a locality. That’s where data comes in: open datasets can be connected, combined and analysed to support debate, decision making and governance.

Projects like the International Aid Transparency Initiative (IATI) have sought to create the technical standards and political commitments for effective data sharing. IATI is putting together one corner of the poverty reduction jigsaw, with detailed and timely forward-looking information on aid. IATI open data can be used by governments to forecast spending, and by citizens to hold donors to account. This is the promise of open data: publish once, use many times and for many purposes.

But data does not use itself. Nor does it transcend political and practical realities. As the papers in a recent Journal of Community Informatics special issue show, open data brings both promise and perils. Mobilising open data for social change requires focus and effort.

We’re only at the start of understanding open data impacts. In the upcoming Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project, the Web Foundation and partners will be looking at how open data affects governance in different countries and contexts across the world. Rather than look at open data in the abstract, the project will explore cases such as open data for budget monitoring in Brazil, or open data for poverty reduction in Uganda. This way it will build up a picture of the strategies that can be used to make a difference with data; it will analyse the role that technologies and intermediaries play in mobilising data; and it will also explore unintended consequences of open data.

I hope in this discussion we can similarly focus on particular places where open data has potential, and on the considerations needed to ensure the supply and use of open data has the best chance possible of improving lives worldwide.

What do you think? You can join the discussion for the next two weeks over on the Striking Poverty site…

Linked-Development: notes from Research to Impact at the iHub

[Summary: notes from a hackathon in Nairobi built around linked open data]

I’ve just got back from an energising week exploring open data and impact in Kenya, working with R4D and IDS at Nairobi’s iHub to run a three-day hackathon titled ‘Research to Impact’. You can read Pete Cranston’s blog posts on the event here (update: and iHub’s here). In this post, after a quick preamble, I reflect particularly on working with linked data as part of the event.

The idea behind the event was fairly simple: lots of researchers are producing reports and publications related to international development, and these are logged in catalogues like R4D and ELDIS, but often it stops there, and research doesn’t make it into the hands of those who can use it to bring about economic and social change. By opening up the data held on these resources, and then working with subject experts and developers, we were interested to see whether new ideas would emerge for taking research to where it is needed.

The Research to Impact hack focused in on ‘agriculture and nutrition’ research so that we could spend the first day working with a set of subject experts to identify the challenges research could help meet, and to map out the different actors who might be served by new digital tools. We were hosted for the whole event at the inspiring iHub and mLab venue by iHub Research. iHub provides a space for the growing Kenya tech community, acting as a meeting space, incubator and workspace for developers and designers. With over 10,000 members in its network, iHub also helped us to recruit around 20 developers who worked over the second two days of the hackathon to build prototype applications responding to the challenges identified on day one, and to the data available from R4D and IDS.

A big focus of the hackathon development turned out to be mobile applications, as in Kenya mobile phones are the primary digital tool for accessing information. On day four, our developers met again with the subject experts, and pitched their creations to a judging panel, who awarded first, second and third prizes. Many of the apps created had zeroed in on a number of key issues: working through intermediaries (in this case, the agricultural extension worker), rather than trying to use tech to entirely disintermediate information flows; embedding research information into useful tools, rather than providing it through standalone portals (for example, a number of teams built apps which allowed extension workers to keep track of the farmers they were interacting with, and that could then use this information to suggest relevant research); and, most challengingly, the need for research abstracts and descriptions to be translated into easy-to-understand language that can fit into SMS-size packages. Over the coming weeks IDS and R4D are going to be exploring ways to work with some of the hackathon teams to take their ideas further.

Linked-development: exploring the potential of linked data

The event also provided us with an opportunity to take forward explorations of how linked data might be a useful technology in supporting research knowledge sharing. I recently wrote a paper with Duncan Edwards of IDS exploring the potential of linked data for development communication, and I’ve been exploring linked data in development for a while. However, this time we were running a hackathon directly from a linked data source, which was a new experience.

Ahead of the event I set up linked-development.org as a way to integrate R4D data (already available in RDF), and ELDIS data (which I wrote a quick scraper for), both modelled using the FAO’s AGRIS model. In order to avoid having to teach SPARQL for access to the data, I also (after quite a steep learning curve) put together a very basic Puelia Linked Data API implementation over the top of the data. To allow for a common set of subject terms between the R4D and ELDIS data, I made use of the Maui NLP indexer to tag ELDIS agriculture and nutrition documents against the FAO’s Agrovoc (R4D already had editor assigned terms against this vocabulary), giving us a means of accessing the documents from the two datasets alongside each other.

The potential value of this approach became clear on the first day of the event, when one of the subject experts showed us their own repository of Kenya-focussed agricultural research publications and resources, which was already modelled and theoretically accessible as RDF using the AGRIS model. Although our attempts to integrate this into our available dataset failed due to the Drupal site serving the data hitting memory limits (linked data still remains something that tends to need a lot of server power thrown at it, and that can have significant impacts where the relative cost of hosting and tech capacity is high), the potential to bring more local content into linked-development.org alongside data from R4D and ELDIS was noted by many of the developers taking part as something which would be likely to make their applications a lot more successful and useful: ensuring that the available information is built around users’ needs, not around organisational or project boundaries.

At the start of the developer days, we offered a range of ways for developers to access the research meta-data on offer. We highlighted the linked data API, the ELDIS API (although it only provided access to one of the datasets, I found it would be possible for us to create a compatible API speaking to the linked data in future), and SPARQL as means to work with the data. Feedback forms from the event suggest that formats like JSON were new to many of our participants, and linked data was a new concept to all. However, in the end, most teams chose to use some of the prepared SPARQL queries to access the data, returning results as JSON into PHP or Python. In practice, over the two days this did not end up realising the full value of linked data, as teams generally appeared to use code samples to pull SPARQL ‘SELECT’ result sets into relational databases, and then to build their applications from there (a common issue I’ve noted at hack days, where the first step of developers is to take data into the platform they use most). However, a number of teams were starting to think about both how they could use more advanced queries or direct access to the linked data through code libraries in future, and most strikingly, were talking about how they might be able to write data back to the linked-development.org data store.
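As a rough illustration of the pattern most teams followed, the sketch below builds a SPARQL ‘SELECT’ query and flattens a result set in the standard SPARQL JSON results format into plain Python dictionaries, ready to load into whatever store an application uses. The function names, the use of Dublin Core `dcterms:title` and `dcterms:subject`, and the example.org URIs are assumptions made for illustration; they are not the queries that were prepared for the event.

```python
"""Sketch: turn a SPARQL SELECT result (JSON format) into simple rows.

Assumption (not from the event materials): documents carry a
dcterms:title and a dcterms:subject pointing at a subject-term URI.
"""

DCTERMS = "http://purl.org/dc/terms/"


def build_select_query(subject_uri, limit=10):
    """Return a SELECT query for documents tagged with one subject term."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Reject characters that may not appear inside <...> in SPARQL.
    if any(ch in subject_uri for ch in '<>" {}|\\^`'):
        raise ValueError("subject_uri contains characters not allowed in an IRI")
    return (
        f"PREFIX dcterms: <{DCTERMS}>\n"
        "SELECT ?doc ?title WHERE {\n"
        f"  ?doc dcterms:subject <{subject_uri}> ;\n"
        "       dcterms:title ?title .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def rows_from_results(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables are absent from a binding, so fall back to None.
        rows.append({
            var: binding.get(var, {}).get("value")
            for var in variables
        })
    return rows


if __name__ == "__main__":
    # Hand-written sample in the SPARQL JSON results shape (not real data).
    sample = {
        "head": {"vars": ["doc", "title"]},
        "results": {"bindings": [
            {"doc": {"type": "uri", "value": "http://example.org/doc/1"},
             "title": {"type": "literal", "value": "Example title"}},
        ]},
    }
    print(build_select_query("http://example.org/term/maize", limit=5))
    print(rows_from_results(sample))
```

Rows in this shape are easy to insert into a relational table, which is exactly why the approach is tempting, and also why the links between resources tend to get left behind.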

This struck me as particularly interesting. Many of the problems teams faced in creating their applications arose because the research meta-data available was not customised to agricultural extension workers or farmers. Abstracts would need to be re-written and translated. Good quality information needed to be tagged. New classifications of the resources were needed, such as tagging research that is useful in the planting season. Social features on mobile apps could help discover who likes what and could be used to rate research. However, without a means to write back to the shared data store, all this added value will only ever exist in the local and fragmented ecosystems around particular applications. Getting feedback to researchers about whether their research was useful was also high on the priority list of our developers: yet without somewhere to put this feedback, and a commitment from upstream intermediaries like R4D and ELDIS to play a role feeding back to authors, this would be very difficult to do effectively.

This links to one of the points that came out in our early IKM Emergent work on linked data, noting that the relatively high costs and complexity of the technology, and the way in which servers and services are constructed, may lead to an information environment dominated by those with the capacity to publish; but that it has the potential, with the right platforms, configurations and outreach, to bring about a more pluralistic space, where the annotations from local users of information can be linked with, and equally accessible as, the research meta-data coming from government funded projects. I wish we had thought about this more in advance of the hackathon, and provided each team with a way to write data back to the linked-development.org triple store (e.g. giving them named graphs to write to; and providing some simple code samples or APIs), as I suspect this would have opened up a whole new range of spaces for innovation.
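To make the write-back idea a little more concrete, here is a minimal sketch of how a team’s annotation could be expressed as a SPARQL 1.1 Update that inserts into a per-team named graph. Everything specific here is hypothetical: the graph and document URIs, the choice of `rdfs:comment` to hold a plain-language summary, and the function names were not part of the linked-development.org prototype.

```python
"""Sketch: build a SPARQL 1.1 Update writing an annotation to a named graph.

All URIs and the choice of rdfs:comment are hypothetical examples.
"""
import re

RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
LANG_TAG = re.compile(r"^[A-Za-z]+(-[A-Za-z0-9]+)*$")


def escape_literal(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )


def check_iri(iri):
    """Reject characters that may not appear inside <...> in SPARQL."""
    if any(ch in iri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not usable as an IRI: {iri!r}")
    return iri


def build_annotation_update(graph_uri, document_uri, summary, lang="en"):
    """Return an INSERT DATA statement scoped to one team's named graph."""
    if not LANG_TAG.match(lang):
        raise ValueError(f"invalid language tag: {lang!r}")
    return (
        "INSERT DATA {\n"
        f"  GRAPH <{check_iri(graph_uri)}> {{\n"
        f"    <{check_iri(document_uri)}> <{RDFS_COMMENT}> "
        f'"{escape_literal(summary)}"@{lang} .\n'
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(build_annotation_update(
        "http://example.org/graph/team-a",   # hypothetical per-team graph
        "http://example.org/doc/1",          # hypothetical document URI
        "Short plain-language summary for extension workers.",
    ))
```

Keeping each team’s statements in its own named graph would let the store record who said what, so that local annotations can sit alongside the original research meta-data without overwriting it.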

Overall though, the linked-development.org prototype appears to have done some useful work, not least providing a layer to connect two DFID funded projects working on mobilising research. I hope it is something we can build upon in future.

Final papers in JCI Special Issue on Open Data

Earlier this year I blogged about the first release of papers on Open Data in a Special Issue of the Journal of Community Informatics that I had been co-editing with Zainab Bawa. A few days ago we added the last few papers to the issue, finalising it as a collection of critical thinking about the development of Open Government Data.

You can find the full table of contents below (new papers noted with (New)).

Table of Contents

Editorial

The Promises and Perils of Open Government Data (OGD), Tim G. Davies, Zainab Ashraf Bawa

Two Worlds of Open Government Data: Getting the Lowdown on Public Toilets in Chennai and Other Matters, Michael Gurstein

Articles

The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment, Bhuvaneswari Raman

“This is what modern deregulation looks like” : co-optation and contestation in the shaping of the UK’s Open Government Data Initiative, Jo Bates

Data Template For District Economic Planning, Sharadini Rath

Guidelines for Designing Deliberative Digital Habitats: Learning from e-Participation for Open Data Initiatives, Fiorella De Cindio

(New) Unintended Behavioural Consequences of Publishing Performance Data: Is More Always Better?, Simon McGinnes, Kasturi Muthu Elandy

(New) Open Government Data and the Right to Information: Opportunities and Obstacles, Katleen Janssen

Notes from the field

Mapping the Tso Kar basin in Ladakh, Shashank Srinivasan

Collecting data in Chennai City and the limits of openness, Nithya V Raman

Apps For Amsterdam, Tom Demeyer

Open Data – what the citizens really want, Wolfgang Both

(New) Trustworthy Records and Open Data, Anne Catherine Thurston

(New) Exploring the politics of Free/Libre/Open Source Software (FLOSS) in the context of contemporary South Africa; how are open policies implemented in practice?, Asne Kvale Handlykken

Points of View

Some Observations on the Practice of “Open Data” As Opposed to Its Promise, Roland J. Cole