Getting the incentives right: an IATI enquiry service?

[Summary: Brief notes exploring a strategic and service-based approach to improve IATI data quality]

Filed under: rough ideas

At the International Aid Transparency Initiative (IATI) Technical Advisory Group meeting (#tag2015) in Ottawa last week I took part in two sessions exploring the need for Application Programming Interfaces (APIs) onto IATI data. It quickly became clear that there were two challenges to address:

(1) Many of the questions people around the table were asking were complex queries, not the simple data retrieval kinds of questions that an API is well suited to;

(2) ‘Out of the box’ IATI data is often not able to answer the kinds of questions being asked, either because

  • (a) the quality and consistency of data from distributed sources means there are a range of special cases to handle when performing cross-donor analysis;
  • (b) the questions asked require additional data preparation, such as currency conversion, or identifying a block of codes that relate to a particular sector (e.g. identifying all the Water and Sanitation related codes – see the sketch below)
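
As a rough illustration of the second kind of preparation: IATI activities are classified using OECD DAC purpose codes, where codes beginning ‘140’ cover water supply and sanitation. A minimal Python sketch, using a small invented sample of the codelist rather than the full list:

```python
# Sketch: identifying the block of DAC 5-digit purpose codes that
# relate to Water and Sanitation (the "140xx" series). The dac_codes
# sample below is illustrative, not the full codelist.
dac_codes = {
    "14010": "Water sector policy and administrative management",
    "14030": "Basic drinking water supply and basic sanitation",
    "12240": "Basic nutrition",
    "31110": "Agricultural policy and administrative management",
}

watsan_codes = {code for code in dac_codes if code.startswith("140")}

def is_watsan(sector_code: str) -> bool:
    """True if an activity's sector code falls in the WatSan block."""
    return sector_code in watsan_codes

print(watsan_codes)   # {'14010', '14030'}
```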

These challenges also underlie the wider issue explored at TAG2015: that even though five years of effort have gone into data supply, few people are actually using IATI data day-to-day.

If the goal of the International Aid Transparency Initiative as a whole, distinct from the specific goal of securing data, is more informed decision making in the sector, then this got me thinking about whether what we need right now is a primary focus on services rather than data and tools – and, from that, whether intelligent funding of such services could create the right kinds of pressures for improving data quality.

Improving data through enquiries

Using any dataset to answer complex questions takes both domain knowledge, and knowledge of the data. Development agencies might have lots of one-off and ongoing questions, from “Which donors are spending on Agriculture and Nutrition in East Africa?”, to “What pipeline projects are planned in the next six months affecting women and children in Least Developed Countries?”. Against a suitably cleaned-up IATI dataset, reasonable answers to questions like these could be generated with carefully written queries. Authoritative answers might require further cleaning and analysis of the data retrieved.
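
To make that concrete, here is a hedged sketch of the first question as a query, written in Python with pandas against a hypothetical flattened table of IATI activities. The column names, country list and sector codes are illustrative assumptions, not a real IATI export:

```python
# Sketch: "Which donors are spending on Agriculture and Nutrition
# in East Africa?" against an illustrative flattened activity table.
import pandas as pd

activities = pd.DataFrame([
    {"reporting_org": "Donor A", "recipient_country": "KE",
     "sector_code": "31110", "commitment_usd": 2_500_000},
    {"reporting_org": "Donor B", "recipient_country": "TZ",
     "sector_code": "12240", "commitment_usd": 800_000},
    {"reporting_org": "Donor A", "recipient_country": "FR",
     "sector_code": "12240", "commitment_usd": 300_000},
])

east_africa = {"KE", "TZ", "UG", "ET", "RW"}   # illustrative country list
ag_nutrition = {"31110", "12240"}              # illustrative sector codes

mask = (activities["recipient_country"].isin(east_africa)
        & activities["sector_code"].isin(ag_nutrition))
answer = activities[mask].groupby("reporting_org")["commitment_usd"].sum()
print(answer)   # Donor A and Donor B totals; Donor A's French activity excluded
```

The query itself is short; the hard part, as the next paragraphs suggest, is the cleaning and domain knowledge needed to trust the answer.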

For someone working with a dataset every day, such queries might take anything from a few minutes to a few hours to develop and execute. Cleaning data to provide authoritative answers might take a bit longer.

For a programme officer who has the question, but not the knowledge of the data structures, working out how to answer these questions might take days. In fact, the learning curve will often mean these questions are simply not asked. Yet having the answers could save months, and millions of dollars.

So – what if key donors sponsored an enquiries service that could answer these kinds of queries on demand? With the right funding structure, it could have incentives not only to provide better answers on request, but also to put resources into improving data quality and tooling. For example: if there is a set price paid per enquiry successfully answered, and the cost of answering that enquiry is increased by poor data quality from publishers, then there is an incentive on the service to invest some of its time in improving incoming data quality. How to prioritise such investments would be directly connected to user demand: if all the questions are made trickier to answer because of a particular donor’s data, then focussing on improving that data first makes most sense. This helps escape the current situation, in which the goal is to seek perfection for all data. Beyond a certain point, political pressure to publish may cease to increase data quality, whereas requests to improve data that are directly connected to user demand and questions may have greater traction.

Of course, the incentive structures here are subtle: the quickest solution for an enquiry service might be to clean up data as it comes into its own data store, rather than trying to improve data at source. Yet there remains a desire in open data projects to avoid creating single centralised databases, and to increase the resilience of the ecosystem by improving the original open data – which would oppose this strategy. This would need to be worked through in any full proposal.

I’m not sure what appetite there would be for a service like this – but I’m certain that, in what are ultimately niche open data ecosystems like IATI, strategic interventions will be needed to build the markets, services and feedback loops that lead to their survival.

Comments and reflection welcome

#CODS15: Trends and attitudes in open data

[Summary: sharing slides from talk at Canadian Open Data Summit]

The lovely folks at Open North were kind enough to invite me to give some opening remarks at the Canadian Open Data Summit in Ottawa today. The subject I was set was ‘trends and attitudes in the global open data community’ – and so I tried to pick up on five themes I’ve been observing and reflecting on recently. The slides from my talk are below (or here), and I’ve jotted down a few fragmentary notes that go along with them (and represent some of what I said, and some of what I meant to say [check against delivery etc.]). There’s also a great take on some of the themes I explored, and that developed in the subsequent panel, in the Open Government Podcast recap here.

(These notes are numbered for each of the key frames in the slide deck. You can move horizontally through the deck with the right arrow, or through each section with the down arrow. Hit escape when viewing the deck to get an overview. Or just hit space bar to go through as I did when presenting…)

(1) I’m Tim. I’ve been following the open data field as both a practitioner and a social researcher over the last five years – much of this work as part of my PhD studies, and through my time as a fellow and affiliate at the Berkman Centre.

(2) First, let’s get out of the way the ‘trends’ that often get talked about somewhat breathlessly: the rapid growth of open data from niche idea to part of the policy mainstream. I want to look at five more critical trends, emerging now, and to look at their future.

(3) First trend: the move from engaging with open data to solve problems, to a focus on infrastructure building – and the need to complete a cyclical move back again. Most people I know got interested in open data because of a practical issue, often a political one, where they wanted data. The data wasn’t there, so they took action to make it available. This can cycle into ongoing work on building the infrastructure of data needed to solve a problem – but there is a risk that the original problems get lost, and energy goes into infrastructure alone. There is a growing discourse about reconnecting to action. The key is to recognise data-driven problem solving and data infrastructure building as two distinct forms of open data action: complementary, but also in creative tension.

(4) Second trend: there are many forms of open data initiative, and growing data divides. For more on this, see the Open Data Barometer 2015 report, and this comparison of policies across six countries. Canada was up one place in the rankings from the first to the second edition of the ODB. But that mainly looks at a standard model of doing open data. Too often we’re exporting an idea of open data based on ‘Data Portal + License + Developers & Apps = Open Data Initiative’ – but we need to recognise that there are many different ways to grow an open data initiative and its activity, and to open up space for a new wave of innovation, rather than embedding the results of our first years’ experimentation as best practice.

(5) Third trend: the Open Data Barometer hints that impact is strongest where there are local initiatives. Urban initiatives? How do we ensure that we’re not designing initiatives that can only achieve impact where there is a critical mass of developers, community activists and supporting infrastructures?

(6) Fourth trend: there is a growing focus on data standards. We’ve moved beyond ‘Raw Data Now’ to see data publishers thinking about standards on everything from public budgets, to public transit, public contracts and public toilets. But when we recognise that our data is being sliced, diced and cooked, are we thinking about who it is being prepared for? Who is included, and who is excluded? (Remember, Raw Data is an Oxymoron.) Even some of the basics of how to do diverse open data are not well resolved right now. How do we do multilingual data, for example? Or how do we find measurement standards to assess open data in federal systems? Canada, as a well-resourced multilingual country, has a role in finding good solutions here.

(7) Fifth trend: There are bigger agendas on the policy scene right now than open data. But open data is still a big idea. Open data has been overtaken in many settings by talk of big data, smart cities, data revolutions and the possibility of data-driven governance. In the recent African Data Consensus process, 15 different ‘data communities’ were identified, from land data, and geo-data communities, to health data and conflict data communities. Open data was framed as another ‘data community’. Should we be seeing it this way? Or as an ethic and approach to be brought into all these different thematic areas: a different way of doing data – not another data domain. We need to look to the ideas of commons, and the power to create and collaborate that treating our data as a common resource can unlock. We need to reclaim the politics of open data as an idea that challenges secrecy, and that promotes a foundation for transparency, collaboration and participation. Only with this can we critique these bigger trends with the open data idea – and struggle for a context in which we are not database objects in the systems of the state, but are collaborating, self-determining, sovereign citizens.

(8) Recap & take-aways:

  • Embed open data in wider change
  • Innovate and experiment with different open data practices
  • Build community to unlock the impact of open data
  • Include users in shaping open data standards
  • Combine problem solving and infrastructure building

Slow down with the standards talk: it’s interoperability & information quality we should focus on

[Summary: cross-posting a contribution to the discussions on the International Open Data Conference blog]

There is a lot of focus on standards in the run-up to the International Open Data Conference in Ottawa next week. Two of the Action Area workshops on Friday are framed in terms of standards – at the level of data publication best practices, and of collaboration between the standards projects working on thematic content standards at the global level.

It’s also a conversation of great relevance to local initiatives, with CTIC writing on the increasing tendency of national open data regulations to focus on specific datasets that should be published, and to prescribe the data standards to be used. This trend is mirrored in the UK Local Government Transparency Code, accompanied by schema guidance from the Local Government Association; and even where governments are not mandating standards, community efforts have emerged in the US and Australia to develop common schemas for the publication of local data – covering topics from budgets to public toilet locations.

But – is all this work on standards heading in the right direction? In his inimitable style, Friedrich Lindenberg has offered a powerful provocation, challenging those working on standards to consider whether the lofty goal of creating common ways of describing the world so that all our tools just seamlessly work together is really a coherent or sensible one to be aiming for.

As Friedrich notes, there are many different meanings of the word ‘standard’, and often multiple versions of the word are in play in our discussions and our actions. Data standards like the General Transit Feed Specification, the International Aid Transparency Initiative schema, or the Open Contracting Data Standard are not just technical descriptions of how to publish data: they are also rhetorical and disciplinary interventions, setting out priorities about what should be published, and how it should be represented. The long history of (failed) attempts to find general logical languages to describe the world across different contexts should tell us that data standards are always going to encode all sorts of social and cultural assumptions – and that the complexity of our real-world relationships, and all that we want to know about the different overlapping institutional domains that affect our lives, will never be easily rendered into a single set of schemas.

This is not to say we should not pursue standardisation: standards are an important tool. But I want to suggest that we should embed our talk of standards within a wider discussion about interoperability, and information quality.

An interop approach

I had the chance to take a few minutes out of IODC conference preparations last week to catch up with Urs Gasser, co-author of Interop: The Promise and Perils of Highly Interconnected Systems, and one of the leaders of the ongoing interop research effort. As Urs explained, an interoperability lens provides another way of thinking about the problem standards are working to address.

Where a focus on standards leads us to concentrate on getting all data represented in a common format, and on using technical specifications to pursue policy goals, an interoperability focus allows us to incorporate a wider range of strategies: from allowing translation and brokering layers between different datasets, to focussing on policy problems directly to secure the collection and disclosure of important information.

And even more importantly, an interop approach allows us to discuss what the right level of interoperability to aim for is in any situation: recognising, for example, that as standards become embedded, and sunk into our information infrastructures, they can shift from being a platform for innovation to a source of inertia and constraint on progress. Getting the interoperability level right in global standards is also important from a power perspective: too much interoperability can constrain the ability of countries and localities to adapt how they express data to meet their own needs.

For example, looked at through a standards lens, the existence of different data schemas for describing the location of public toilets in Sydney, Chennai and London is a problem. From the standards perspective we want everyone to converge on the same schema and to use the same file formats. For that we’re going to need a committee to manage a global standard, and an in-depth process of enrolling people in it. And the result will almost undoubtedly be just one more standard out there, rather than one standard to rule them all, as the obligatory XKCD cartoon contends.

But through an interoperability lens, the first question is: what level of interoperability do we really need? And what are the consequences of the level we are striving for? It invites us to think about the different users of data, and how interoperability affects them. For example, a common data schema used by all cities might allow a firm providing a loo-location app in Ottawa to use the same technical framework in Chennai – but is this really the ideal outcome? The consequence could be to crowd out local developers who could build something much more culturally contextualised. And there is generally nothing to stop the Ottawa firm from building a translation layer between the schema used in its app and the data disclosed in other cities – as long as the disclosures of data in each context include certain key elements, and are internally consistent.
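
As a minimal sketch of what such a translation layer might look like, in Python, with all field names invented for illustration:

```python
# Sketch of a translation layer: mapping a hypothetical Chennai
# toilet dataset into the schema an Ottawa-built app expects.
def translate_chennai_record(record: dict) -> dict:
    """Map one Chennai-schema record to the app's internal schema."""
    return {
        "name": record["facility_name"],
        "lat": float(record["latitude"]),
        "lon": float(record["longitude"]),
        # Fields with no local equivalent get an explicit default,
        # rather than a silently missing key.
        "wheelchair_accessible": record.get("accessible", "unknown"),
    }

chennai_data = [{"facility_name": "T. Nagar public toilet",
                 "latitude": "13.0418", "longitude": "80.2341"}]
app_ready = [translate_chennai_record(r) for r in chennai_data]
print(app_ready)
```

The point is that the two cities never had to agree a common schema: the key elements just had to be present, and internally consistent, on each side.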

Secondly, an interoperability lens encourages us to consider a whole range of strategies: from regulations that call for consistent disclosure of certain information without going as far as prescribing schemas, to programmes to develop common identification infrastructures, the development and co-funding of tools that bridge between data captured in different countries and contexts, and the fostering of collaborations between organisations to work together on aggregating heterogeneous data.

As conversations develop around how to enable collaboration between groups working on open aid data, public contracts, budgets, extractives and so on, it is important to keep the full range of tools on the table for how we might enable users to find connections between data, and for how the interoperability of different data sources might be secured: from building tools and platforms, to working together on identifiers and small building blocks of common infrastructure, to advocating for specific disclosure policies and, of course, discussing standards.

Information quality

When it comes down to it – for many initiatives, standards and interoperability are only a means to another end. The International Aid Transparency Initiative cares about giving aid receiving governments a clear picture of the resources available to them. The Open Contracting Partnership want citizens to have the data they need to be more engaged in contracting, and for corruption in procurement to be identified and stopped. And the architects of public loo data standards don’t want you to get caught short.

Yet often our information quality goals can get lost as we focus on assessing and measuring the compliance of data with schema specifications. Interoperability and quality are distinct concepts, although they are closely linked: having standardised, or at least interoperable, data makes it easier to build tools which go some of the way towards assessing information quality, for example.
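
A rough sketch of the distinction, using the Python jsonschema library and an invented record: a dataset can pass a schema check while still failing the quality tests that matter to a user.

```python
# Sketch: schema compliance is not the same as information quality.
# Uses the `jsonschema` library; the schema and record are invented.
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "required": ["title", "amount", "currency"],
    "properties": {
        "title": {"type": "string"},
        "amount": {"type": "number"},
        "currency": {"type": "string"},
    },
}

record = {"title": "TBC", "amount": 0, "currency": "USD"}

try:
    validate(record, schema)       # passes: the record is structurally valid...
    print("schema-valid")
except ValidationError as e:
    print("schema error:", e.message)

# ...but fails a use-focused quality test: a placeholder title and a
# zero amount tell a budget-tracking user nothing.
quality_flags = []
if record["title"].strip().upper() in {"TBC", "N/A", ""}:
    quality_flags.append("placeholder title")
if record["amount"] <= 0:
    quality_flags.append("non-positive amount")
print("quality flags:", quality_flags)
```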

[Diagram: interoperability and information quality]

But assessing information quality goes beyond this. Assessments need to take place from the perspective of real use-cases. Whilst standardisation often aims at abstraction, our work on promoting the quality, relevance and utility of data sharing – at both the local and global levels – has to be rooted in very grounded problems and projects. Some of the work Johanna Walker and Mark Frank have started on user-centred methods for open data assessment, and Global Integrity’s bottom-up Follow The Money work, starts us down this path, but we’ve much more work to do to make sure our discussions of data quality are substantive as well as technical.

Thinking about information quality as distinct from interoperability can also help us to critically analyse the interoperability ecosystems that are being developed. We can look at whether an interoperability approach is delivering information quality for a suitably diverse range of stakeholders, or whether the costs of getting information to the required quality for use are falling disproportionately on one group rather than another, or are leading to certain use-cases for data being left unrealised.

Re-framing the debate

I’m not calling for us to abandon a focus on standards. Indeed, much of the work I’m committed to in the coming year is very much involved in rolling out data standards. But I do want to invite us to think about framing our work on standards within a broader debate on interoperability and information quality (and ideally to embed this conversation within the even broader context of thinking on Information Justice, and an awareness of critical information infrastructure studies, and work on humanistic approaches to data).

Exactly what shape that debate takes: I don’t know yet… but I’m keen to see where it could take us…

2015 Open Data Research Symposium – Ottawa

There are a few days left to submit abstracts for the 2015 Open Data Research Symposium, due to take place alongside the 3rd International Open Government Data Conference in Ottawa on May 27th 2015.

Registration is also now open for participants as well as presenters.

Call for Abstracts: (Deadline 28th Feb 2015; submission portal)

As open data becomes firmly cemented in the policy mainstream, there is a pressing need to dig deeper into the dynamics of how open data operates in practice, and into the theoretical roots of open data activities. Researchers across the world have been looking at these issues, and this workshop offers an opportunity to bring together, and hold a shared dialogue around, completed studies and work-in-progress.

Submissions are invited on themes including:

  • Theoretical framing of open data as a concept and a movement;
  • Use and impacts of open data in specific countries or specific sectors, including, but not limited to: government agencies, cities, rural areas, legislatures, judiciaries, and the domains of health, education, transport, finance, environment, and energy;
  • The making, implementation and institutionalisation of open data policy;
  • Capacity building for wider availability and use of open data;
  • Conceptualising open data ecosystems and intermediaries;
  • Entrepreneurial usage and open data economies in developing countries;
  • Linkages between transparency, freedom of information and open data communities;
  • Measurement of open data policy and practices;
  • Critical challenges for open data: privacy, exclusion and abuse;
  • Situating open data in global governance and developmental context;
  • Development and adoption of technical standards for open data.

Submissions are invited from all disciplines, though with an emphasis on empirical social research. PhD students, independent and early career researchers are particularly encouraged to submit abstracts. Panels will provide an opportunity to share completed or in-progress research and receive constructive feedback.

Submission details

Extended abstracts, in French, English, Spanish or Portuguese, of up to two pages, detailing the question addressed by the research, methods employed and findings should be submitted by February 28th 2015. Notifications will be provided by March 31st. Full papers will be due by May 1st. 

Registration for the symposium will open shortly after registration for the main International Open Government Data Conference.

Abstracts should be submitted via EasyChair.

Paper format

Authors of accepted abstracts will be invited to submit full papers. These should be a maximum of 20 pages single spaced, exclusive of bibliography and appendixes. As an interdisciplinary and international workshop we welcome papers in a variety of formats and languages: French, English, Spanish and Portuguese. However, abstracts and paper presentations will need to be given in English. 

Full papers should be provided in .odt, .doc, or .rtf or as .html. Where relevant, we encourage authors to also share in a repository, and link to, data collected as part of their research. 

We are working to identify a journal special issue or other opportunity for publication of selected papers.

Contact

Contact savita.bailur@webfoundation.org or tim.davies@soton.ac.uk for more details.

Programme committee

About the Open Data Research Network

The Open Data Research Network was established in 2012 as part of the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project. It maintains an active newsletter, website and LinkedIn group, providing a space for researchers, policy makers and practitioners to interact. 

This workshop will also include an opportunity to find out how to get involved in the Network as it transitions to a future model, open to new members and partners, and with a new governance structure. 

Exploring the Open Data Barometer

[Summary: ODI Lunchtime lecture about the Open Data Barometer]

Just over a month ago, the World Wide Web Foundation launched the second edition of the Open Data Barometer to coincide with BBC Democracy Day. This was one of the projects I worked on at the Web Foundation before finishing there at the end of last year. So, on Friday I had the opportunity to join my successor at the Web Foundation, Savita Bailur, to give an ODI Friday lunchtime talk about the methods and findings of the study.

A recording of the talk and slides are embedded below:

Friday lunchtime lecture: Exploring the Open Data Barometer: the challenges ahead for an open data revoluti…

And, as the talk mentions – all the data from the Open Data Barometer is available in the interactive report at http://opendatabarometer.org/

Unpacking open data: power, politics and the influence of infrastructures

[Summary: recording of Berkman Centre Lunch Talk on open data]

Much belatedly, below you will find the video from the Berkman Centre talk I gave late last year on ‘Unpacking open data: power, politics and the influence of infrastructures’.

You can find a live-blog of the talk from Matt Stempeck and Erhardt Graff over on the MIT Media Lab blog, and Willow Brugh drew the fantastic visual record of themes in the talk shown below:

[Visual record of the talk by Willow Brugh]

The slides are also up on Slideshare here.

I’m now in the midst of trying to make more sense of the themes in this talk whilst in the writing up stage for my PhD… and much of the feedback I had from the talk has been incredibly valuable in that – so comments are always welcome.

20 ways to connect open data and local democracy

[Summary: notes for a workshop on local democracy and open data]

At the Local Democracy for Everyone (#notInWestminister) workshop in Huddersfield today I led a session titled ‘20 ways to connect open data and local democracy’. Below is the list of ideas we started the workshop with.

In the workshop we explored how these, and other approaches, could be used to respond to priority local issues, from investing funds in environmental projects, to shaping local planning processes, and dealing with nuisance pigeons.

Graphic recording from the break-out session by [@Jargonautical](http://www.twitter.com/jargonautical)

There is more to do to re-imagine how local open data should work, but the conversations today offered an interesting start.

1. Practice open data engagement

Data portals can be very impersonal things. But behind every dataset is a council officer or a team working to collect, manage and use the data. Putting a human face on datasets, linking them to the policy areas they affect, and referencing datasets from reports that draw upon them can all help put data in context and make it more engaging.

The Five Stars of Open Data Engagement provides a model for stepping up engagement activities, from providing better and more social meta-data, through to hosting regular office-hours and drop-in sessions to help the local community understand and use data better.

2. Showing the council contribution

A lot of the datasets required by the Local Government Transparency Code are about the cost of services. What information and data is needed to complete the picture and to show the impact of services and spending?

The Caring for my Neighbourhood project in Sao Paulo worked to geocode government budget and spending data, to understand where funds were flowing, and opened up a conversation with government about how to collect data in ways that make connecting budget data to its impacts easier in future.

Local government in the UK has access to a rich set of service taxonomies which could be used to link together data on staff salaries, contracts and spending with stats and stories on the services provided and their performance. Finding ways to make this full picture accessible and easy to digest can provide the foundation for more informed local dialogue.

3. Open Data Discourses

In Massachusetts, the Open Data Discourse project has been developing the idea of data challenges: based not just on app-building, but also on using data to create policy ideas that can address an identified local challenge.

For Cambridge, Mass., the focus for the first challenge in fall 2014 was on pedestrian, bicycle, and car accidents in the city. Data on accidents was provided, and accessed over 2,000 times in a six-week challenge period. The challenge resulted in eight submissions “that addressed policy-relevant issues such as how to format traffic accident data to enable trend analysis across the river into Boston, or how to reduce accidents and encourage cycling by having a parked car buffer.”

The challenge process culminated in a Friday evening meeting that brought together community members who had worked on challenge ideas with councillors and representatives of the local authority, to showcase the solutions and present an award for the winning idea.

4. Focus on small data

There’s a lot of talk out there about ‘big data’ and how big data analytics can revolutionise government. But many of the datasets that matter are small data: spreadsheets created by an officer, or records held by community groups in various structures and formats.

Rahul Bhargava defines small data as:

“the thing that community groups have always used to do their work better in a few ways:

  • Evaluate: Groups use Small Data to evaluate programs so they can improve them
  • Communicate: Groups use Small Data to communicate about their programs and topics with the public and the communities they serve
  • Advocate: Groups use Small Data to make evidence-based arguments to those in power”

Simple steps to share and work with small data can make a big difference: and keep citizens, rather than algorithms, in control.

5. Tactile data and data murals

The Data Therapy project has been exploring a range of ways to make data more tactile: from laser-cutting food security information into vegetables to running ‘low tech data’ workshops that use pipe-cleaners, lego and crayons to explore representations of data about a local community.

Turning complex comparisons and numbers into physical artefacts, and finding the stories inside the statistics, can offer communities a way into data-informed dialogue, without introducing lots of alienating graphs and numbers.

The Data Therapy project’s data murals connect discussions of data with traditional community arts practice: painting large scale artworks that represent a community interpretation of local data and information.

6. Data-driven art

The Open Data Institute’s Data as Culture project has run a series of data art commissions: leading to a number of data-driven art works that bring real-time data flows into the physical environment. In 2011 Bristol City Council commissioned a set of art works, ‘Invisible Airs‘ that included a device stabbing books in response to library cuts, and a spud gun triggered by spending records.

Alongside these political art works that add an explicit emotional dimension to public data, low-cost network connected devices can also be used to make art that passively informs – introducing indicators that show the state of local data into public space.

7. Citizen science

Not all the data that matters to local decision making comes from government. Citizens can create their own data, via crowdsourcing and via citizen-science approaches to data collection.

The Public Lab describes itself as a ‘DIY Environmental Science Community’ and provides how-to information on how citizens’ groups can build their own sensors or tools for everything from aerial mapping to water quality monitoring. Rather than ‘smart cities’ that centralise data from sensor networks, citizen science offers space for collaboration between government and communities – creating smart citizens who can collect and make sense of data alongside local officials.

In China, citizens started their own home water quality testing to call for government to recognise and address clean water problems.

8. Data dives & hackathons

DataKind works to bring together expert analysts with social-sector organisations that have data, in order to look for trends and insights. Modelled on a hackathon, where activity takes place over an intense day or weekend of work, DataDives can generate new findings, new ideas about how to use data, and new networks for the local authority to draw upon.

Unlike a hackathon, where the focus is often on developing a technical app or innovation and where programming skill is often a pre-requisite, a DataDive might be based around answering a particular question, or around discovering what data means to multi-disciplinary teams.

It is possible to design inclusive hackathons which connect up the lived experience of communities with digital skills from inside and outside the community. The Hackathon FAQ explores some of the common pitfalls of holding a civic hackathon: encouraging critical thought about whether prizes and other common features are likely to incentivise contributions, or to distort the kinds of team building and collaboration wanted in a civic setting.

9. Contextualised consultation

Too often local consultations ask questions without providing citizens with the information they might need to explore and form their opinions. For example, an online consultation on green spaces, simply by asking for the ward or postcode of a respondent, could provide tailored information (and questions) about the current green spaces nearby.

Live open data feedback on the demographics and diversity of consultation respondents could also play a role in incentivising people to take part to ensure their views are represented.

It’s important, though, not to make too many assumptions when providing contextualised data: a respondent might care about the context near where their parents or children live as much as their own, for example – and so interfaces should offer the ability to look at data around areas other than your home.

10. Adopt a dataset

When it snows in America, fire hydrants on the street can get frozen under the ice, so it’s important to dig them out after snowfall. However, city authorities don’t always have the resources to reach every hydrant in time. Code for America found an ingenious solution: taking an open dataset of fire hydrants and creating a campaign for people to ‘Adopt a Hydrant’, committing to dig it out when the blizzards come. They combined data with a social layer.

The same approach could work for many other community assets, but it could also work for datasets themselves. Which datasets could be co-created with the community? Could walkers adopt footpath data and help keep it updated? Could the local bus user group adopt data on the accessibility of public transport routes, helping keep it up to date?

The relationships created around a data quality feedback loop might also become important relationships for improving the services that the data describes.

11. Data-rich press releases

Local authorities are used to putting out press releases, often with selected statistics in them. But how can those releases also contain links to key datasets, and even interactive assets that journalists and the public can draw upon to dig deeper into the data?

Data visualisation expert David McCandless has argued that interactivity plays an important role in allowing people to explore structured data and information, and to turn it into knowledge. The Guardian Data Blog has shown how engaging information can be created from datasets. Whilst the Data Journalism Handbook offers some pointers for journalists (and local bloggers) to get started with data, many local newspapers don’t have the dedicated data-desks of big media houses – so the more the authority can do to provide data in ready-to-reuse forms, the more it can be turned into a resource to support local debate.

12. URLs for everything – with a call to action

Which is more likely to turn up on Twitter and get clicked on:

“What do you think of new cycle track policy? Look on page 23, paragraph 2 or report at bottom of this page: http://localcouncil.gov/reports/1234”, or

“What do you think of new cycle track policy? http://localcouncil.gov/policy/ab12”

Far too often the important information citizens might want is online, but is buried away in documents or provided in ways that are impossible to link to.

When any proposal, policy, decision or transaction gets a permanent URL (web address) it can become a social object: something people can talk about on Twitter, Facebook and other spaces.

For Linked Data advocates, giving everything in a dataset its own URL plays an important role in machine-to-machine communication, but it also plays a really important role in human communication. Think about how visitors to a data item might also be offered a ‘call to action’: whether it’s to report concerns about a spending transaction, or to volunteer to get involved in events at a park represented by a data item.
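
As a sketch of the pattern (a hypothetical mini-application, not a recommendation of any particular framework), using Python and Flask, with routes, identifiers and fields all invented:

```python
# Sketch: every policy gets a permanent URL, and each page carries a
# call to action. The data and route scheme here are illustrative.
from flask import Flask

app = Flask(__name__)

POLICIES = {
    "ab12": {"title": "New cycle track policy", "consultation_open": True},
}

@app.route("/policy/<policy_id>")
def policy_page(policy_id):
    policy = POLICIES.get(policy_id)
    if policy is None:
        return "Not found", 404
    call_to_action = ""
    if policy["consultation_open"]:
        # The permanent URL doubles as the anchor for participation.
        call_to_action = ("<p><a href='/policy/%s/respond'>"
                          "Tell us what you think</a></p>" % policy_id)
    return "<h1>%s</h1>%s" % (policy["title"], call_to_action)

if __name__ == "__main__":
    app.run()
```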

13. Participatory budgeting – with real data

What can £5000 buy you? How much does it cost to run a local carnival? Or a swimming pool? Or to provide improved social care? Or cycle lanes? Answers to these questions might exist inside spending data – but often when participatory budgeting activities take place the information needed to work out what kinds of options may be affordable only comes into the picture late in the process.

Open Spending, the World Bank, NESTA and the Finnish Institute have all explored how open data could change the participatory budgeting process – although as yet there have been few experiments to really explore the possibilities.

14. Who owns it?

Kirklees Council have put together the ‘Who Owns My Neighbourhood?’ site to let residents explore land holdings and to “help take responsibility for land, buildings and activities in your neighbourhood”. Similar sites, with the goal of improving how land is used and addressing the problem of vacant lots, are cropping up across American cities.

These tools can enable citizens to identify land and government assets that could be better used by the community: but unchecked they may also risk giving more power to wealthy property speculators as a widely cited case study from Bangalore has warned.

15. Social audits

In many parts of the developing world, particularly across India, the Social Audit is an important process, focussed on “reviewing official records and determining whether state reported expenditures reflect the actual monies spent on the ground” (Aiyar & Samji, 2009).

Social Audits involve citizens groups trained up to look at records and ‘ground truth’ whether or not resources have been used in the way authorities say. Crucially, Social Audits culminate in public hearings: meetings where the findings are presented and discussed.

Models of citizen-led investigation, followed by formal public meetings, are also a feature of the London Citizens community organising approach, where citizens assemblies put community views to people in power. How could key local datasets form part of an evidence gathering audit process, whether facilitated by local government or led by independent community organisations?

16. Geofenced bylaws, licenses and regulations: building the data layer of the local authority

After seeing some of the projects to open up the legal codes of US cities, I started wondering where I would find out about the byelaws in my home town of Oxford. As the page on the City Council website that hosts them explains: “Byelaws generally require something to be done – or not done – in a particular location.” Unfortunately, in Oxford, what is required to be done, and where, is locked up inside scanned PDFs of typewritten minutes.

There are all sorts of local rules and regulations, licenses and other information that authorities issue which are tied to a particular geographic location: yet this is rarely a layer in the Geographic Information Systems that authorities use. How might geocoding this data, or even making it available through geofencing apps, help citizens to navigate, explore and debate the rules that shape their local places?
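
A rough sketch of what a machine-readable byelaw layer could look like, using Python with the shapely library for the point-in-area test a geofencing app would need; the byelaw text, identifier and polygon are all invented:

```python
# Sketch: a byelaw as a geocoded data record rather than a scanned
# PDF, with a simple "which rules apply here?" lookup.
from shapely.geometry import Point, Polygon

byelaw = {
    "id": "oxford-byelaw-0001",            # hypothetical identifier
    "rule": "No amplified music without a permit",
    "area": Polygon([(-1.260, 51.750), (-1.250, 51.750),
                     (-1.250, 51.756), (-1.260, 51.756)]),
}

def rules_at(lon: float, lat: float, byelaws: list) -> list:
    """Return the rules that apply at a given location."""
    here = Point(lon, lat)
    return [b["rule"] for b in byelaws if b["area"].contains(here)]

print(rules_at(-1.255, 51.752, [byelaw]))
```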

17. Conversations around the contracts pipeline?

The Open Contracting project is calling for transparency and participation in public contracting. As part of the UK Local Government Transparency Code authorities have to publish the contracts they have entered into – but publishing the contract pipeline and planned procurement offers an important opportunity to work out if there are fresh ideas or important insights that could shape how funds are spent.

The Open Contracting Data Standard provides a way of sharing a flow of data about the early stages of a contracting process. Combine that information with a call to action, and a space for conversation, and there are ways to get citizens shaping tenders and the selection of suppliers.
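
As a hedged illustration of the kind of planning-stage notice this enables, here is a simplified release sketched as a Python dict. The shape follows my reading of the OCDS release model, but the identifiers and values are invented and some required fields are omitted for brevity:

```python
# Sketch: a simplified planning-stage contracting release, loosely
# modelled on the Open Contracting Data Standard. Illustrative only.
import json

release = {
    "ocid": "ocds-example-000-00001",        # hypothetical identifier
    "date": "2015-04-01T00:00:00Z",
    "tag": ["planning"],
    "buyer": {"name": "Example City Council"},
    "tender": {
        "title": "Cycle lane resurfacing, 2015/16",
        "status": "planned",
        "value": {"amount": 250000, "currency": "GBP"},
        "tenderPeriod": {"startDate": "2015-09-01T00:00:00Z"},
    },
}
print(json.dumps(release, indent=2))
```

Published early enough, a notice like this gives citizens and potential suppliers something concrete to respond to before the tender is fixed.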

18. Participatory planning: visualising the impacts of decisions

What data should a local authority ask developers submitting planning applications to provide?

For many developments there might be detailed CAD models available which could be shared and explored in mapping software to support a more informed conversation about proposed building projects.

19. Stats that matter

Local authorities often conduct one-off surveys and data collection exercises. These are a vital opportunity to build up an understanding of the local area. What opportunities are there to work in partnership with local community groups to identify the important questions that they want to ask? How can local government and community groups collaborate to collect actionable stats that matter: pooling needs, and even resources, to get the best sample and the best depth of insight?

20. Spreadsheet scorecards and dashboards

Dig deep enough in most local organisations and you will find one or more ‘super spreadsheets’ that capture and analyse key statistics and performance indicators. Many more people can easily pick up the skills to create a spreadsheet scorecard than can become overnight app developers.

Google Docs spreadsheets can pick up data live from the web. What dashboards might a local councillor want? Or a local residents association? What information would make them better able to do their job?
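
In Google Sheets, functions such as IMPORTDATA can pull a published CSV straight into a scorecard. For those happier in code, the same pattern is a few lines of Python; the URL and column names below are placeholders for whatever a council actually publishes:

```python
# Sketch: a live scorecard built from a council's published CSV.
# The URL and the "service_area"/"amount" columns are assumptions.
import pandas as pd

SPENDING_CSV = "https://example.gov.uk/opendata/spending.csv"  # placeholder

spending = pd.read_csv(SPENDING_CSV)
scorecard = (spending.groupby("service_area")["amount"]
             .agg(["count", "sum"])
             .sort_values("sum", ascending=False))
print(scorecard.head(10))   # top ten service areas by total spend
```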

Five reflections for an open data hackathon

I was asked to provide a short talk at the start of the Future Food Hackathon that kicked off in Wageningen, NL today, linked to the Global Open Data on Agriculture and Nutrition workshop taking place over the next few days.

Below are the speaker notes I jotted down for the talk.

On open data and impact

I want to start with an admission. I’m a sceptic about open data.

In the last five years we’ve seen literally millions of datasets placed online as part of a broad open data movement – with grand promises made about the way this will revolutionise politics, governance and economies.

But, when you look for impact, with the exception of a few specific domains such as transport, the broad society-wide impact of that open data is hard to find. Hundreds of hack-days have showcased what could be possible with data, but few have delivered truly transformative innovations that have made it to scale.

And many of the innovations that result often seem to focus on #FirstWorldProblems – if not purely ‘empowering the already empowered’, then at least not really engaging with social issues in ways that are set to tip the balance in favour of those with least advantage.

I’m sceptical, but I’m not pessimistic. In fact, understood as part of a critique of the closed way we’ve been doing aid, policy making, production and development – open data is an incredibly exciting idea.

However, far too much open data thinking has stopped at the critique, without moving on to propose something new and substantive. It offers a negation (data which is not proprietary; not in PDF; not kept from public view), without saying enough about how new open datasets should be constructed. Because opening data is not just about taking a dataset from inside a government or company and putting it online: in practice it involves the creation of new datasets, selecting and standardising fields and deciding how to model data. This ultimately involves the construction of new systems of data.

And this links to a second blind spot of current open data thinking: the emphasis on the dataset, to the exclusion of the social relationships around it.

Datasets do not stand alone. They are produced by someone, or some group, for some purpose. They get meaning from their relationship to other data, and from the uses to which they are put. As Lisa Gitelman and colleagues have put it in ‘Raw Data is an Oxymoron’, datasets have histories, and we need to understand these to reshape their futures.

Matthew Smith and colleagues at the IDRC have spent a number of years exploring the idea of openness in development. They distinguish between openness defined in ‘universal legal and technical terms’, and openness as a practice – and argue that we need to put open practices at the centre of our theory of openness. These practices are, to some extent, enabled by the formalities of Creative Commons licenses, or open data formats, but they are something more, drawing upon the cultures of peer-to-peer production and open source, not just the legal and technical devices.

Ultimately, then, I’m optimistic about the potential of open data if we can think about the work of projects like GODAN not just as a case of gaining permission to work with a few datasets, but as being about building new open and collaborative infrastructures, through which we can use data to communicate, collaborate and reshape our world.

I’m also hopeful about the potential of colliding cultures from open source and open data, with current cultures in the agriculture and nutrition communities. Can we bring these into a dialogue that builds shared understanding of how to solve problems, and lets us rethink both openness, and agriculture, to be more effective, inclusive and just?

Five observations on hacking with open data

Ok: so let me pause. I recognise that the last few minutes might have been a bit abstract and theoretical for 9am on a Monday morning. Let me try, then, to offer five somewhat more practical thoughts about approaching an open data hackathon:

1. Hacking is learning.

A common experience of the hackathon is frustration at the data not being ready to use. Yet the process of struggling with data is a process of learning about the world it represents – and sometimes one of the most important outcomes of a hack is the induction of a new community of people, from different backgrounds, into a shared understanding of some data and domain.

One of the most fascinating things about the open government data processes I’ve been tracking in the UK has been the way in which they have supported civic learning amongst technology communities – coming to understand more about how the state works by coming to understand its data.

So – at an interdisciplinary hack like this, there is the opportunity to see peculiarities of the data as opportunities to understand the process and politics of the agriculture and nutrition field, and to be better equipped to propose new approaches that don’t try to make perfect data out of problematic situations – but that try to engage with the real challenges and problems of the field.

2. Hacking is political.

I’ve had the pleasure over the last few years of working a number of times with the team at the iHub in Nairobi, and of following the development of Kenya’s open data initiative. In their study of an ‘incubator’ project to encourage developers to use Kenyan open government data, Leo Mutuku and her team made an interesting discovery.

Some developers did not understand their apps as products to be taken to scale – but instead saw them as rhetorical acts: a demonstration to government of how ICTs could be used, and a call on government to rethink its own ICTs, rather than an attempt by outside developers to replace those ICTs for government.

The Norfolk-based developer Rupert Reddington once referred to this as ‘digital pamphleteering’, in which the application is a provocation in a debate – rather than primarily, or at all, a tool for everyday use.

Think about how you present an openness-oriented provocation to the status quo when you pitch your ideas and creations.

3. You are building infrastructure.

Apps created with open data are just one part of the change process. Even a transport app that lets people know when the next bus is due only has an impact if it becomes part of people’s everyday practice, and they rely on it in ways that change their behaviour.

Infrastructure is something which fades into the background: when it becomes established and works well, we don’t see it. It is only when it is disrupted that it becomes notable (as I learned trying to cross the Channel yesterday – when the Channel Tunnel became a very visible piece of infrastructure exactly because it was blocked and not working).

One of the questions I’m increasingly asking in my research work, is how we can build ‘inclusive infrastructures’, and what steps we need to take to ensure that the data infrastructures we have are tipped in favour of the least advantaged rather than the most powerful. Sometimes the best innovations are ones that complement and extend an existing infrastructure, bringing hitherto unheard voices into the debate, or surfacing hitherto unseen assumptions.

Sustainability is also important to infrastructure. What you create today may just be a prototype – but if you are proposing it as part of a new infrastructure of action – consider if you can how it might be made sustainable. Would building for sustainability change the concept or idea?

4. Look at the whole value chain.

There is a tendency in hackathons to focus on the ‘end user’ – building consumer-oriented apps and platforms. Often that approach makes sense: disintermediation can make many systems work better. But it’s not always the way to make the most difference.

When I worked with CABI and the Institute of Development Studies in 2013 to host a ‘Research to Impact’ hackathon at the iHub in Nairobi, we brought together people involved in improving the quality of agriculture and the lives of smallholder farmers. After a lot of discussion, it became clear that between ‘research’ and the ‘farm’ were all sorts of important intermediaries, from seed-sellers to agricultural extension workers. Instead of building direct-to-farmer information systems, teams explored the kinds of tools that could help an agricultural extension worker deliver better support, or that could help a seed-seller to improve their product range.

Apps with 10s or 100s of back-office users may be much more powerful than apps with 1000s of ‘end users’.

When the two Open Data in Developing Countries research partners in Kenya launched their research in the middle of last year, an interesting argument broke out between advocates of ‘disintermediation’ and of ‘empowering intermediaries’. On the one hand, intermediaries contextualise information, and may be trusted: helping communities adopt information as actionable insight where they might not understand or trust the information direct from source. On the other hand, intermediaries are often seen as a problem: middle-men using their position for self-interest, and limiting the freedoms of those they mediate for.

Open approaches can offer an important ‘pressure valve’ in these contexts: focussing on creating platforms for intermediaries, but not restricting information to intermediaries only.

5. Evolution can be as powerful as revolution.

The UN Secretary General has led the call for a ‘data revolution for development’, with the Independent Expert Advisory Group he appointed proposing a major update to practices of data collection and use.

This revolution narrative often implies that organisations need to shift direction: completely transforming data practices, and throwing out existing report-writing and paper-based approaches in favour of new ‘digital by default’ technology-driven processes. But what happens if we think differently and start from the existing strengths of organisations:

  • What is going well when it comes to data in the international potato trade?
  • Who are the organisations with promising practice in localising climate-change relevant information for farmers?
  • What have been the stories of progress in tracking food-borne disease?

How can we extend these successes? What innovations have made their first iteration, but are just waiting for the next?

One of the big challenges of a ‘data revolution’ is the organisational change curve it demands, and the complex relationship between data supply and demand. Often the data available right now is not great. For example, if you are currently running a crop monitoring project with documents and meetings, and a new open dataset becomes available that is relevant to your work, starting a ‘data revolution’ tomorrow will involve lots of time working with bad data and finding new ways to work around the peculiarities of the new system: the investment this year, to do the same work you were doing with ‘inefficient’ analogue approaches last year, might be double as you scale the learning curve.

Of course, in year 3 or 4, the more efficient way of working may start to pay off: but often projects never get there. And because use of the new open dataset dropped away in year 2, when early adopters realised they could not afford to transform their practices to work with it, government publishers get discouraged – and by years 3 and 4 the data might not be there at all.

An evolution approach works out how to change practices year-by-year: iterating and negotiating the place of data in the future of food.

(See Open Data in Developing Countries – Insights from Phase I for more on this point)

In conclusion

Ok. Still a bit abstract for 9.15am on a Monday morning: but I hope the general point is clear.

Ultimately, the most important thing about the creations at a hackathon is their ‘theory of change’: how does the time spent hacking show the way towards real change? I’m certainly very optimistic that when it comes to the pitch back tomorrow, the ideas and energy in this room will offer some key pointers for us all.

Internet Monitor 2014 chapter on Data Revolutions: Bottom-Up Participation or Top-Down Control?

[Summary: cross-posting an article from the 2014 Internet Monitor]

The 2014 Internet Monitor Report has just been launched. It’s packed with over 35 quick reads on the landscape of contemporary Internet & society issues, from platforms and policy to public discourse. This year’s edition also includes a whole section on ‘Data and privacy’. My article in the collection, written earlier this year, is archived below. I encourage you to explore the whole collection – including some great inputs from Sara Watson and Malavika Jayaram exploring how development agencies are engaging with data, and making the case for building better maps of the data landscape to inform regulation and action.

Data Revolutions: Bottom-Up Participation or Top-Down Control?

In September 2015, through the United Nations, governments will agree upon a set of new Sustainable Development Goals (SDGs) replacing the expired Millennium Development Goals and setting new globally agreed targets on issues such as ending poverty, promoting healthy lives, and securing gender equality [1]. Within debates over what the goals should be, discussions of online information and data have played an increasingly important role.

Firstly, there have been calls for a “Data Revolution” to establish better monitoring of progress towards the goals: both strengthening national statistical systems and exploring how “big data” digital traces from across the Internet could enable real-time monitoring [2]. Secondly, the massive United Nations-run MyWorld survey, which has used online, mobile, and offline data collection to canvass over 4 million people across the globe on their priorities for future development goals, consistently found “An honest and accountable government” amongst people’s top five priorities for the SDGs [3]. This has fueled advocacy calls for explicit open government goals requiring online disclosure of key public information, such as budgets and spending, in order to support greater public oversight and participation.

These two aspects of “data revolution” point to a tension in the evolving landscape of governments and data. In the last five years, open data movements have made rapid progress in spreading the idea that government data (from data on school and hospital locations to budget datasets and environmental statistics) should be “open by default”: published online in machine-readable formats for scrutiny and re-use. However, in parallel, cash-strapped governments are exploring greater use of private sector data as policy process inputs, experimenting with data from mobile networks, social media sites, and credit reference agencies amongst others (sometimes shared by those providers under the banner of “data philanthropy”). As both highly personal and commercially sensitive data, these datasets are unlikely ever to be shared en masse in the public domain, although this proprietary data may increasingly drive important policy making and implementation.

In practice, the evidence so far suggests that the “open by default” idea is struggling to translate into widespread and sustainable access to the kinds of open data citizens and civil society need to hold powerful institutions to account. The multi-country Open Data Barometer study found that key accountability datasets such as company registers, budgets, spending, and land registries are often unavailable, even where countries have adopted open data policies.4 And qualitative work in Brazil has found substantial variation in how the legally mandated publication of spending data operates across different states, frustrating efforts to build up a clear picture of where public money flows.5 Furthermore, studies regularly emphasize that it is not enough to put data online: data literacy and civil society capacity are needed to absorb and work with the data that is made available, alongside intermediary ecosystems that provide a bridge between “raw” data and its civic use.

Over the last year, open data efforts have also had to increasingly grapple with privacy questions.6 Concerns have been raised that even “non-personal” datasets released online for re-use could be combined with other public and private data and used to undermine privacy.7 In Europe, questions over what constitutes adequate anonymization for opening public data derived from personally identifying information have been hotly debated.8

The web has clearly evolved from a platform centered on documents to become a data-rich platform. Yet it is public policy that will shape whether it is ultimately a platform that shares data openly about powerful institutions, enabling bottom-up participation and accountability, or whether data traces left online become increasingly important, yet opaque, tools of governance and control. Both open data campaigners and privacy advocates have a key role in securing data revolutions that will ultimately bring about a better balance of power in our world.

Notes

  • 1: UN High-Level Panel of Eminent Persons on the Post-2015 Development Agenda, “A New Global Partnership: Eradicate poverty and transform economies through sustainable development,” 2013, http://www.un.org/sg/management/pdf/HLP_P2015_Report.pdf.
  • 2: Independent Expert Advisory Group on the Data Revolution, http://www.undatarevolution.org.
  • 3: MyWorld Survey, http://data.myworld2015.org/.
  • 4: World Wide Web Foundation, “Open Data Barometer,” 2013, http://www.opendatabarometer.org.
  • 5: N. Beghin and C. Zigoni, “Measuring open data’s impact on Brazilian national and sub-national budget transparency websites and its impacts on people’s rights,” 2014, http://opendataresearch.org/content/2014/651/measuring-opendatas-impact-brazilian-national-and-sub-national-budget.
  • 6: Open Data Research Network, “Privacy Discussion Notes,” 2013, http://www.opendataresearch.org/content/2013/501/open-data-privacy-discussion-notes.
  • 7: Steve Song, “The Open Data Cart and Twin Horses of Accountability and Innovation,” June 19, 2013, https://manypossibilities.net/2013/06/the-open-data-cart-and-twin-horses-of-accountability-and-innovation/.
  • 8: See the work of the UK Anonymisation Network, http://ukanon.net/.

(Article under Creative Commons Attribution 3.0 Unported)

Do we need eligibility criteria for private sector involvement in OGP?

I’ve been in Costa Rica for the Open Government Partnership (OGP) Latin America Regional Meeting (where we were launching the Open Contracting Data Standard), and on Tuesday attended a session around private sector involvement in the OGP.

The OGP was always envisaged as a ‘multi-stakeholder forum’ – not only for civil society and governments, but also for the private sector. But, as Martin Tisne noted in opening the session, private sector involvement has so far been limited – although an OGP Private Sector Council is currently in development.

In his remarks (building on notes from 2013), Martin outlined six roles for the private sector in open government:

  1. Firms as mediators of open government data – making governance-related public data more accessible;
  2. Firms as beneficiaries and users of open data – building businesses on top of data releases, and fostering demand for, and sustainable supply of, open data;
  3. Firms as anti-corruption advocates – particularly rating agencies, whose judgements about the investment risk created by poor governance environments can strongly incentivise governments to institute reforms;
  4. Firms practising corporate accountability – including by being transparent about their own activities;
  5. Technology firms providing platforms for citizen-state interaction – from large platforms like Facebook, which have played a role in democracy movements, to specifically civic private-sector platforms like change.org or SeeClickFix;
  6. Companies providing technical assistance and advice to governments on their OGP action plans.

The discussion panel then went on to look at a number of examples of private sector involvement in open government, ranging from Chambers of Commerce acting as advocates for anti-corruption and governance reforms, to large firms like IBM contributing software and staff time to data-driven efforts to meet the challenge of Ebola. A clear theme in the discussion was the need to recognise that, like government and civil society, the private sector is not monolithic. Indeed, I have to remember that I’ve only been able to participate in the UK OGP process by subsidising my time via Practical Participation Ltd.

Reflecting on public and private interests

For all the positive contributions and points made by the panelists in the session, I find myself approaching the general concept of private sector engagement with OGP with a constructive scepticism – one that I hope supports wider reflection about the role and accountability of all stakeholders in the process. Many of these reflections are driven by a concern about the relative power of different stakeholders: in a world where the state is often in retreat, civil society is spread increasingly thin, and wealth is accumulated in vastly uneven ways, ensuring a fair process of multi-stakeholder dialogue requires careful institutional design. In light of the uneven flow of resources in our world, these reflections also draw on an important distinction between public and private interest.

Whilst there are institutional mechanisms in place (albeit flawed in many cases) that mean both governments and non-profits should operate in the public interest, the essential logic of the private sector is to act in the private interest. Of course, the extent of this logic varies by type of firm, but large multi-nationals have legal obligations to their shareholders which can, at least when shareholders are focussed on short-term returns, create direct tensions with responsible corporate behaviour. This is relevant for OGP in at least two ways:

Firstly, when private firms are active contributors to open government activities, whether mediating public data, providing humanitarian interventions, offering platforms for citizen interaction, or providing technical assistance, mechanisms are needed in a public interest forum such as the OGP to ensure that such private sector interventions provide a net gain to the public good.

Take, for example, a private firm that offers hardware or software to a government for free to support it in implementing an open government project. If the project has a reasonable chance of success, this can be a positive contribution to the public good. However, if the motivation for the project comes from a private rather than a public interest, and the project leads to the government being locked into future use of a proprietary software platform, or into an ongoing relationship with a company that has gained special access as a result of its ‘CSR’ support for the open government project, then it is possible for the net result to be against the public interest.

It should be possible to establish governance mechanisms that address these concerns: allowing the genuinely public-interest, win-win contributions of the private sector to open government and development to be facilitated, whilst establishing checks against abuse of the power imbalances – whether of relative wealth, scale, or technical know-how – that can exist between firms and states.

Secondly, corporate contributions to aspects of the OGP agenda should not distract from a focus on the kinds of large-scale corporate behaviour that undermine the capacity and effectiveness of governments: the use of complex tax avoidance schemes, for example, or the exploitation of workforces and suppression of wages, which leave citizens with little time or energy for civic engagement after meeting the essentials of daily living.

A proposal

In Tuesday’s session these reflections led me to think about whether the Open Government Partnership should have some form of eligibility criteria for corporate participants, as a partial parallel to those that exist for states. To keep this practical and relevant, the criteria could relate to key disclosures by a firm, covering all the settings in which it operates: such as the amount of tax paid, the firm’s beneficial owners, and the amount of funding the firm is putting towards engagement in the OGP process.

Such requirements need not operate in an entirely gatekeeping fashion (i.e. participants would not be barred from engaging until such disclosures were made), but could be instituted initially as a recommended transparency practice, creating space for social pressure to encourage compliance, and giving extra information to those considering the legitimacy of, and weight to give to, the contributions of corporate participants within the OGP process.
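Purely as an illustrative sketch – this is not any existing OGP specification, and the schema and field names below are entirely hypothetical – the kind of disclosure record described above could be captured as structured data along the following lines:

```typescript
// Hypothetical sketch of a structured disclosure record for a corporate
// OGP participant. Schema and field names are illustrative only:
// nothing like this currently exists in the OGP process.

interface TaxDisclosure {
  jurisdiction: string; // ISO country code, e.g. "GB"
  year: number;
  amountUSD: number;
}

interface CorporateDisclosure {
  firmName: string;
  jurisdictionsOfOperation: string[]; // all the settings the firm operates in
  taxPaid: TaxDisclosure[]; // tax paid, per jurisdiction and year
  beneficialOwners: string[]; // natural persons who ultimately own or control the firm
  ogpEngagementFundingUSD: number; // resources the firm puts towards OGP participation
}

// A minimal example record.
const example: CorporateDisclosure = {
  firmName: "Example Holdings Ltd",
  jurisdictionsOfOperation: ["GB", "BR"],
  taxPaid: [{ jurisdiction: "GB", year: 2013, amountUSD: 1200000 }],
  beneficialOwners: ["Jane Doe"],
  ogpEngagementFundingUSD: 50000,
};

// Flag missing disclosures, so participants' records can be compared.
function missingFields(d: CorporateDisclosure): string[] {
  const gaps: string[] = [];
  if (d.taxPaid.length === 0) gaps.push("taxPaid");
  if (d.beneficialOwners.length === 0) gaps.push("beneficialOwners");
  return gaps;
}

console.log(missingFields(example)); // [] – this example is complete
```

Even a minimal structure like this would make it straightforward to see, at a glance, which participants had made which disclosures, and to direct the kind of social pressure described above at the gaps.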

As noted earlier, these critical reflections might also be extended to civil society participants: there can be legitimate concerns about the interests represented through the work of CSOs too. The Who Funds You campaign is a useful point of reference here: CSO participants could be encouraged to disclose information on who funds their work and, again, how much resource they are dedicating to OGP engagement.

Conclusions

This post provides some initial reflections as a discussion starter. The purpose is not to argue against private sector involvement in OGP, but rather, in engaging proactively with a multi-stakeholder model, to raise the need for critical thinking in the open government debate not only about the transparency and accountability of governments, but also about the transparency and accountability of the other parties engaged.