Category Archives: Open Data

Open Data for Poverty Alleviation: Striking Poverty Discussion

[Summary: join an open discussion on the potential impacts of open data on poverty reduction]

Over the next two weeks, along with Tariq Kochar, Nithya V. Raman and Nathan Eagle, I’m taking part in an online panel hosted by the World Bank’s Striking Poverty platform to discuss the potential impacts of open data on poverty alleviation.

So far we’ve been asked to provide some starting statements on how we think open data and poverty might relate, and now there’s an open discussion where visitors to the site are invited to share their questions and reflections on the topic.

Here’s what I have down as my opening remarks:

Development is complex. No individual or group can process all the information needed to make sense of aid flows, trade patterns, government budgets, community resources and environmental factors (amongst other things) that affect development in a locality. That’s where data comes in: open datasets can be connected, combined and analysed to support debate, decision making and governance.

Projects like the International Aid Transparency Initiative (IATI) have sought to create the technical standards and political commitments for effective data sharing. IATI is putting together one corner of the poverty reduction jigsaw, with detailed and timely forward-looking information on aid. IATI open data can be used by governments to forecast spending, and by citizens to hold donors to account. This is the promise of open data: publish once, use many times and for many purposes.

But data does not use itself. Nor does it transcend political and practical realities. As the papers in a recent Journal of Community Informatics special issue show, open data brings both promise and perils. Mobilising open data for social change requires focus and effort.

We’re only at the start of understanding open data impacts. In the upcoming Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project, the Web Foundation and partners will be looking at how open data affects governance in different countries and contexts across the world. Rather than look at open data in the abstract, the project will explore cases such as open data for budget monitoring in Brazil, or open data for poverty reduction in Uganda. This way it will build up a picture of the strategies that can be used to make a difference with data; it will analyse the role that technologies and intermediaries play in mobilising data; and it will also explore unintended consequences of open data.

I hope in this discussion we can similarly focus on particular places where open data has potential, and on the considerations needed to ensure the supply and use of open data has the best chance possible of improving lives worldwide.

What do you think? You can join the discussion for the next two weeks over on the Striking Poverty site…

Linked-Development: notes from Research to Impact at the iHub

[Summary: notes from a hackathon in Nairobi built around linked open data]

I’ve just got back from an energising week exploring open data and impact in Kenya, working with R4D and IDS at Nairobi’s iHub to run a three-day hackathon titled ‘Research to Impact’. You can read Pete Cranston’s blog posts on the event here (update: and iHub’s here). In this post, after a quick preamble, I reflect particularly on working with linked data as part of the event.

The idea behind the event was fairly simple: lots of researchers are producing reports and publications related to international development, and these are logged in catalogues like R4D and ELDIS, but often it stops there, and research doesn’t make it into the hands of those who can use it to bring about economic and social change. By opening up the data held on these resources, and then working with subject experts and developers, we were interested to see whether new ideas would emerge for taking research to where it is needed.

The Research to Impact hack focused in on ‘agriculture and nutrition’ research so that we could spend the first day working with a set of subject experts to identify the challenges research could help meet, and to map out the different actors who might be served by new digital tools. We were hosted for the whole event at the inspiring iHub and mLab venue by iHub Research. iHub provides a space for the growing Kenya tech community, acting as a meeting space, incubator and workspace for developers and designers. With over 10,000 members in its network, iHub also helped us to recruit around 20 developers who worked over the second two days of the hackathon to build prototype applications responding to the challenges identified on day one, and to the data available from R4D and IDS.

A big focus of the hackathon development turned out to be on mobile applications, as in Kenya mobile phones are the primary digital tool for accessing information. On day four, our developers met again with the subject experts, and pitched their creations to a judging panel, who awarded first, second and third prizes. Many of the apps created had zeroed in on a number of key issues: working through intermediaries (in this case, the agricultural extension worker), rather than trying to use tech to entirely disintermediate information flows; embedding research information into useful tools, rather than providing it through standalone portals (for example, a number of teams built apps which allowed extension workers to keep track of the farmers they were interacting with, and that could then use this information to suggest relevant research); and, most challengingly, the need for research abstracts and descriptions to be translated into easy-to-understand language that can fit into SMS-size packages. Over the coming weeks IDS and R4D are going to be exploring ways to work with some of the hackathon teams to take their ideas further.

Linked-development: exploring the potential of linked data

The event also provided us with an opportunity to take forward explorations of how linked data might be a useful technology in supporting research knowledge sharing. I recently wrote a paper with Duncan Edwards of IDS exploring the potential of linked data for development communication, and I’ve been exploring linked data in development for a while. However, this time we were running a hackathon directly from a linked data source, which was a new experience.

Ahead of the event I set up linked-development.org as a way to integrate R4D data (already available in RDF), and ELDIS data (which I wrote a quick scraper for), both modelled using the FAO’s AGRIS model. In order to avoid having to teach SPARQL for access to the data, I also (after quite a steep learning curve) put together a very basic Puelia Linked Data API implementation over the top of the data. To allow for a common set of subject terms between the R4D and ELDIS data, I made use of the Maui NLP indexer to tag ELDIS agriculture and nutrition documents against the FAO’s Agrovoc (R4D already had editor-assigned terms against this vocabulary), giving us a means of accessing the documents from the two datasets alongside each other.
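As a rough illustration of why a shared vocabulary matters, the sketch below groups records from two sources by subject term so that documents tagged with the same concept sit alongside each other. It is not the linked-development.org code: the records, titles and concept URIs are placeholders invented for the example, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical records standing in for catalogue entries; the titles and the
# Agrovoc-style concept URIs below are placeholders, not real data.
R4D_DOCS = [
    {"title": "Maize storage trial", "subjects": ["http://example.org/agrovoc/c_0001"]},
    {"title": "School feeding review", "subjects": ["http://example.org/agrovoc/c_0002"]},
]
ELDIS_DOCS = [
    {"title": "Post-harvest losses in maize", "subjects": ["http://example.org/agrovoc/c_0001"]},
]


def index_by_subject(sources):
    """Map each subject URI to the (source name, title) pairs tagged with it."""
    index = defaultdict(list)
    for source_name, docs in sources.items():
        for doc in docs:
            for subject in doc["subjects"]:
                index[subject].append((source_name, doc["title"]))
    return dict(index)


subject_index = index_by_subject({"r4d": R4D_DOCS, "eldis": ELDIS_DOCS})

# Keep only the subjects that have documents from more than one source.
shared = {
    subject: docs
    for subject, docs in subject_index.items()
    if len({name for name, _ in docs}) > 1
}
```

Here `shared` holds only the concepts covered by both sources, which is the kind of cross-dataset view that a common set of subject terms makes possible.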

The potential value of this approach became clear on the first day of the event, when one of the subject experts showed us their own repository of Kenyan-focussed agricultural research publications and resources, which was already modelled and theoretically accessible as RDF using the AGRIS model. Although our attempts to integrate this into our available dataset failed due to the Drupal site serving the data hitting memory limits (linked data still remains something that tends to need a lot of server power thrown at it, and that can have significant impacts where the relative cost of hosting and tech capacity is high), the potential to bring more local content into linked-development.org alongside data from R4D and ELDIS was noted by many of the developers taking part as something which would be likely to make their applications a lot more successful and useful: ensuring that the available information is built around users’ needs, not around organisational or project boundaries.

At the start of the developer days, we offered a range of ways for developers to access the research meta-data on offer. We highlighted the linked data API, the ELDIS API (although it only provided access to one of the datasets, I found it would be possible for us to create a compatible API speaking to the linked data in future), and SPARQL as means to work with the data. Feedback forms from the event suggest that formats like JSON were new to many of our participants, and linked data was a new concept to all. However, in the end, most teams chose to use some of the prepared SPARQL queries to access the data, returning results as JSON into PHP or Python. In practice, over the two days this did not end up realising the full value of linked data, as teams generally appeared to use code samples to pull SPARQL ‘SELECT’ result sets into relational databases, and then to build their applications from there (a common issue I’ve noted at hack days, where the first step of developers is to take data into the platform they use most). However, a number of teams were starting to think about both how they could use more advanced queries or direct access to the linked data through code libraries in future, and most strikingly, were talking about how they might be able to write data back to the linked-development.org data store.
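To make the pattern described above concrete, here is a minimal sketch of turning a SPARQL ‘SELECT’ result set, returned in the standard SPARQL JSON results format, into flat rows ready for a relational table. It is not one of the code samples used at the event: the response text, variable names and URIs are made up for illustration.

```python
import json

# A made-up response in the W3C SPARQL 1.1 Query Results JSON Format; the
# variable names ("doc", "title") and values are illustrative only.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["doc", "title"]},
  "results": {"bindings": [
    {"doc": {"type": "uri", "value": "http://example.org/doc/1"},
     "title": {"type": "literal", "value": "Maize storage trial"}},
    {"doc": {"type": "uri", "value": "http://example.org/doc/2"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten SELECT results into a list of plain dicts, one per solution.

    Variables left unbound in a solution (e.g. by OPTIONAL) are set to None.
    """
    payload = json.loads(text)
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


rows = rows_from_sparql_json(SAMPLE_RESPONSE)
```

Flattening like this keeps the values but drops the type information attached to each binding, which is convenient for loading into a database but leaves behind some of what the linked data carried.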

This struck me as particularly interesting. Many of the problems teams faced in creating their applications came down to the research meta-data available not being customised to agricultural extension workers or farmers. Abstracts would need to be re-written and translated. Good quality information needed to be tagged. New classifications of the resources were needed, such as tagging research that is useful in the planting season. Social features on mobile apps could help discover who likes what and could be used to rate research. However, without a means to write back to the shared data store, all this added value will only ever exist in the local and fragmented ecosystems around particular applications. Getting feedback to researchers about whether their research was useful was also high on the priority list of our developers: yet without somewhere to put this feedback, and a commitment from upstream intermediaries like R4D and ELDIS to play a role feeding back to authors, this would be very difficult to do effectively.

This links to one of the points that came out in our early IKM Emergent work on linked data, noting that the relatively high costs and complexity of the technology, and the way in which servers and services are constructed, may lead to an information environment dominated by those with the capacity to publish; but that it has the potential, with the right platforms, configurations and outreach, to bring about a more pluralistic space, where the annotations from local users of information can be linked with, and equally accessible as, the research meta-data coming from government funded projects. I wish we had thought about this more in advance of the hackathon, and provided each team with a way to write data back to the linked-development.org triple store (e.g. giving them named graphs to write to; and providing some simple code samples or APIs), as I suspect this would have opened up a whole new range of spaces for innovation.
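One way the ‘named graphs to write to’ idea could look in practice is sketched below: a small helper that builds a SPARQL 1.1 Update request adding a single annotation to a team’s own graph. This is an assumption about how it might be done, not something that existed at the hackathon. The graph and document URIs are hypothetical, the helper names are invented, and the URIs are assumed to come from a trusted source (a real service would validate them before building the request).

```python
def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for a SPARQL string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def annotation_update(graph_uri, doc_uri, predicate_uri, note):
    """Build a SPARQL 1.1 Update request adding one annotation to a named graph."""
    return (
        "INSERT DATA {\n"
        f"  GRAPH <{graph_uri}> {{\n"
        f'    <{doc_uri}> <{predicate_uri}> "{escape_literal(note)}" .\n'
        "  }\n"
        "}"
    )


update = annotation_update(
    graph_uri="http://example.org/graphs/team-a",  # hypothetical per-team graph
    doc_uri="http://example.org/doc/1",            # hypothetical document URI
    predicate_uri="http://www.w3.org/2000/01/rdf-schema#comment",
    note='Useful in the "planting season"',
)
```

Keeping each team’s annotations in its own graph would let them be queried alongside the catalogue meta-data while staying distinguishable from it.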

Overall though, the linked-development.org prototype appears to have done some useful work, not least providing a layer to connect two DFID funded projects working on mobilising research. I hope it is something we can build upon in future.

Final papers in JCI Special Issue on Open Data

Earlier this year I blogged about the first release of papers on Open Data in a Special Issue of the Journal of Community Informatics that I had been co-editing with Zainab Bawa. A few days ago we added the last few papers to the issue, finalising it as a collection of critical thinking about the development of Open Government Data.

You can find the full table of contents below (new papers noted with (New)).

Table of Contents

Editorial

The Promises and Perils of Open Government Data (OGD), Tim G. Davies, Zainab Ashraf Bawa

Two Worlds of Open Government Data: Getting the Lowdown on Public Toilets in Chennai and Other Matters, Michael Gurstein

Articles

The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment, Bhuvaneswari Raman

“This is what modern deregulation looks like” : co-optation and contestation in the shaping of the UK’s Open Government Data Initiative, Jo Bates

Data Template For District Economic Planning, Sharadini Rath

Guidelines for Designing Deliberative Digital Habitats: Learning from e-Participation for Open Data Initiatives, Fiorella De Cindio

(New) Unintended Behavioural Consequences of Publishing Performance Data: Is More Always Better?, Simon McGinnes, Kasturi Muthu Elandy

(New) Open Government Data and the Right to Information: Opportunities and Obstacles, Katleen Janssen

Notes from the field

Mapping the Tso Kar basin in Ladakh, Shashank Srinivasan

Collecting data in Chennai City and the limits of openness, Nithya V Raman

Apps For Amsterdam, Tom Demeyer

Open Data – what the citizens really want, Wolfgang Both

(New) Trustworthy Records and Open Data, Anne Catherine Thurston

(New) Exploring the politics of Free/Libre/Open Source Software (FLOSS) in the context of contemporary South Africa; how are open policies implemented in practice?, Asne Kvale Handlykken

Points of View

Some Observations on the Practice of “Open Data” As Opposed to Its Promise, Roland J. Cole

How might open data contribute to good governance?

[Summary: sharing an introductory article on open data and governance]

Thanks to an invite via the great folk at CYEC, earlier this year I was asked to write a contribution for the Commonwealth Governance Handbook around emerging technology trends, so I put down a few thoughts on how open data might contribute to good governance in a Commonwealth context. The book isn’t quite out yet, but as I’m preparing for the next few days I’ll be spending at an IDRC Information and Networks workshop with lots of open access advocates, talking about open data and governance, I thought I should at least get a pre-print uploaded. So here is the PDF for download.

The article starts:

Access to information is increasingly recognised as a fundamental component of good governance. Citizens need access to information on the decision-making processes of government, and on the performance of the state to be able to hold governments to account.

And ends by saying:

Whether open data initiatives will fully live up to high expectations many have for them remains to be seen. However, it is likely that open data will come to play a part in the governance landscape across many Commonwealth countries in coming years, and indeed, could provide a much needed tool to increase the transparency of Commonwealth institutions. Good governance, pro-social and civic outcomes of open data are not inevitable, but with critical attention they can be realised.

The bit in-between tries to provide a short introduction to open data for beginners, and to consider some of the ways open data and governance meet, drawing particularly on examples from the Commonwealth.

Comments and feedback welcome.

Download paper: PDF (128Kb)

Opening the National Pupil Database?

[Summary: some preparatory notes for a response to the National Pupil Database consultation]

The Department for Education are currently consulting on changing the regulations that govern who can gain access to the National Pupil Database (NPD). The NPD holds detailed data on every student in England, going back over ten years, and covering topics from test and exam results, to information on gender, ethnicity, first language, eligibility for free school meals, special educational needs, and detailed information on absences or school exclusion. At present, only a specified list of government bodies are able to access the data, with the exception that it can be shared with suitably approved “persons conducting research into the educational achievements of pupils”. The DFE consultation proposes opening up access to a far wider range of users, in order to maximise the value of this rich dataset.

The idea that government should maximise the value of the data it holds has been well articulated in the open data policies and white paper that suggests open data can be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement.” However, the open data movement has always been pretty unequivocal on the claim that ‘personal data’ is not ‘open data’ – yet the DFE proposals seek to apply an open data logic to what is fundamentally a personal, private and sensitive dataset.

The DFE is not, in practice, proposing that the NPD is turned into an open dataset, but it is consulting on the idea that it should be available not only for a wider range of research purposes, but also to “stimulate the market for a broader range of services underpinned by the data, not necessarily related to educational achievement”. Users of the data would still go through an application process, with requests for the most sensitive data subject to additional review, and users agreeing to hold the data securely: but, the data, including easily de-anonymised individual level records, would still be given out to a far wider range of actors, with increased potential for data leakage and abuse.

Consultation and consent

I left school in 2001 and further education in 2003, so as far as I can tell, little of my data is captured by the NPD – but, if it was, it would have been captured based not on my consent to it being handled, but simply on the basis that it was collected as an essential part of running the school system. The consultation documents state that “The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices”, yet it would be hard to argue this would constitute informed consent for the data to now be shared with commercial parties for uses far beyond the delivery of education services.

In the case of the NPD, it would appear particularly important to consult with children and young people on their views of the changes – as it is, after all, their personal data held in the NPD. However the DFE website shows no evidence of particular efforts being taken to make the consultation accessible to under 18s. I suspect a carefully conducted consultation with diverse groups of children and young people would be very instructive to guide decision making in the DFE.

The strongest argument for reforming the current regulations in the consultation document is that, in the past, the DFE has had to turn down requests to use the data for research which appears to be in the interests of children and young people’s wellbeing. For example, “research looking at the lifestyle/health of children; sexual exploitation of children; the impact of school travel on the environment; and mortality rates for children with SEN”. It might well be that, consulted on whether they would be happy for their data to be used in such research, many children, young people and parents would be happy to permit a wider wording of the research permissions for the NPD, but I would be surprised if most would happily consent to just about anyone being able to request access to their sensitive data. We should also note that, whilst some of the research DFE has turned down sounds compelling, this does not necessarily mean this research could not happen in any other way: nor that it could not be conducted by securing explicit opt-in consent. Data protection principles that require data to only be used for the purpose it was collected cannot just be thrown away because they are inconvenient, and even if consultation does highlight people may be willing for some wider sharing of their personal data for good, it is not clear this can be applied retroactively to data already collected.

Personal data, state data, open data

The NPD consultation raises an important issue about the data that the state has a right to share, and the data it holds in trust. Aggregate, non-disclosive information about the performance of public services is data the state has a clear right to share and is within the scope of open data. Detailed data on individuals that it may need to collect for the purpose of administration, and generating that aggregate data, is data held in trust – not data to be openly shared.

However, there are many ways to aggregate or process a dataset – and many different non-personally identifying products that could be built from a dataset. Many of these government will never have the need to create – yet they could bring social and economic value. So perhaps there are spaces to balance the potential value in personally sensitive datasets with the necessary primacy of data protection principles.
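To give a sense of what a simple aggregate product might involve, the sketch below counts individual-level records by area and withholds any count that falls below a minimum cell size. Everything in it is an assumption made for illustration: the records and field names are invented, and the threshold of 3 is arbitrary rather than any official rule. Simple suppression like this would not by itself make a release robustly anonymous.

```python
from collections import Counter

# Hypothetical pupil-level records; the field names and values are invented.
RECORDS = [
    {"area": "North", "free_meals": True},
    {"area": "North", "free_meals": True},
    {"area": "North", "free_meals": False},
    {"area": "South", "free_meals": True},
]

MIN_CELL_SIZE = 3  # assumed disclosure threshold, chosen only for this example


def aggregate_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Count records per value of `key`, suppressing cells below the threshold."""
    counts = Counter(record[key] for record in records)
    return {
        value: (count if count >= min_cell_size else None)
        for value, count in counts.items()
    }


table = aggregate_counts(RECORDS, "area")
```

In `table`, an area with too few records appears with `None` in place of a count, so the published figures never single out a very small group.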

Practice accommodations: creating open data products

In his article for the Open Data Special Issue of the Journal of Community Informatics I edited earlier this year, Rollie Cole talks about ‘practice accommodations’ between open and closed data. Getting these accommodations right for datasets like the NPD will require careful thought and could benefit from innovation in data governance structures. In early announcements of the Public Data Corporation (now the Public Data Group and Open Data User Group), there was a description of how the PDC could “facilitate or create a vehicle that can attract private investment as needed to support its operations and to create value for the taxpayer”. At the time I read this as exploring the possibility that a PDC could help private actors with an interest in public data products that were beyond the public task of the state, but were best gathered or created through state structures, to pool resources to create or release this data. I’m not sure that’s how the authors of the point intended it, but the idea potentially has some value around the NPD. For example, if there is a demand for better “demographic models [that can be] used by the public and commercial sectors to inform planning and investment decisions” derived from the NPD, are there ways in which new structures, perhaps state-linked co-operatives, or trusted bodies like the Open Data Institute, can pool investment to create these products, and to release them as open data? This would ensure access to sensitive personal data remained tightly controlled, but would enable more of the potential value in a dataset like NPD to be made available through more diverse open aggregated non-personal data products.

Such structures would still need good governance, including open peer-review of any anonymisation taking place, to ensure it was robust.

The counter argument to such an accommodation might be that it would still stifle innovation, by leaving some barriers to data access in place. However, the alternative, of DFE staff assessing each application for access to the NPD, and having to make a decision on whether a commercial re-use of the data is justified, and the requestor has adequate safeguards in place to manage the data effectively, also involves barriers to access – and involves more risk – so the counter argument may not take us that far.

I’m not suggesting this model would necessarily work – but introduce it to highlight that there are ways to increase the value gained from data without just handing it out in ways that inevitably increase the chance it will be leaked or mis-used.

A test case?

The NPD consultation presents a critical test case for advocates of opening government data. It requires us to articulate more clearly the different kinds of data the state holds, to be much more nuanced about the different regimes of access that are appropriate for different kinds of data, and to consider the relative importance of values like privacy over ideas of exploiting value in datasets.

I can only hope DFE listen to the consultation responses they get, and give their proposals a serious rethink.

Further reading and action: Privacy International and Open Rights Group are both preparing group consultation inputs, and welcome input from anyone with views or expert insights to offer.

Complexity and complementarity – why more raw material alone won’t necessarily bring open data driven growth

[Summary: reflections on an open data hack day, complexity, and complements to open data for economic and social impact. Cross posted from Open Data Impacts blog.]

“Data is the raw material of the 21st Century”.

It’s a claim that has been made in various forms by former US CIO Vivek Kundra (PDF), by large consultancies and tech commentators, and that is regularly repeated in speeches by UK Cabinet Office Minister Francis Maude, mostly in relation to the drive to open up government data. This raw material, it is hoped, will bring about new forms of economic activity and growth. There is certainly evidence to suggest that for some forms of government data, particularly ‘infrastructural’ data, moving to free and open access can stimulate economic activity. But, for many open data advocates, the evidence is not showing the sorts of returns on investment, or even the ‘gold rush’ of developers picking over data catalogues to exploit newly available data that they had expected.

At a hack-event held at the soon-to-be-launched Open Data Institute in London this week, a number of speakers highlighted the challenge of getting open data used: the portals are built, but the users do not necessarily come. Data quality, poor meta-data, inaccessible language, and the difficulty of finding wheat amongst the chaff of data were all diagnosed as part of the problem, with some interesting interfaces and tools developed to try and improve data description and discovery. Yet these diagnoses and solutions are still based on linear thinking: when a dataset is truly accessible, then it will be used, and economic benefits will flow.

Owen Barder identifies the same sort of linear thinking in much macro-economic international development policy of the 70s and 80s in his recent Development Drums podcast lecture on complexity and development. The lecture explores the question of how countries with similar levels of ‘raw materials’ in terms of human and physical capital, could have had such different growth rates over the last half century. The answer, it suggests, lies in the complexity of economic development – where we need not just raw materials, but diverse sets of skills and supply chains, frameworks, cultures and practices. Making the raw materials available is rarely enough for economic growth. And this is something that open data advocates focussed on economic returns on data need to grapple with.

Thinking about open data use as part of a complex system involves paying attention to many different dimensions of the environment around data. Jose Alonso highlights “the political, legal, organisation, social, technical and economic” as all being important areas to focus on. One way of grounding notions of complexity in thinking about open data use, that I was introduced to in working on a paper with George Kuk last year, is through the concept of ‘complementarity’. Essentially, A complements B if A and B together are more than the sum of their parts. For example, a mobile phone application and an app store are complements, as the software in one needs the business model and delivery mechanisms in the other in order to be used.

The challenge then is to identify all the things that may complement open data for a particular use; or, more importantly, to identify all those processes already out there in the economy to which certain open data sets are a complement. Whilst the example above of complements appears at first glance technological (apps and app stores), behind it are economic, social and legal complementarities, amongst others. Investors, payment processing services, app store business models, remittance to developers, and often-times, stable jobs for developers in an existing buoyant IT industry that allow them to either work on apps for fun in spare time, or to leave work with enough capital to take a risk on building their own applications are all part of the economic background. Developer meet-ups, online fora, clear licensing of data, no fear of state censorship of applications built and so on contribute to the social and legal background. These parts of the complex landscape generally cannot be centrally planned or controlled, but equally they cannot be ignored when we are asking why the provision of a raw material has not brought about anticipated use.

As I start work on the ‘Exploring the Emerging Impacts of Open Data in the South’ project with the Web Foundation and IDRC, understanding the possible complements of open data for economic, political and social use may provide one route to explore which countries and contexts are likely to see strong returns from open data policy, and to see what sorts of strategies states, donors and communities can adopt to increase their opportunity to gain potential benefits and avoid possible pitfalls of greater access to open data. Perhaps for further Open Data Institute hack days, it can also encourage more action to address the complex landscape in which open data sits, rather than just linear extensions of data platforms that exist in the hope that the users will eventually come*.

Where co-operatives and open data meet…

[Summary: thoughts on ways in which co-operatives could engage with open data]

With the paper I worked on with Web Science Trust for Nominet Trust on ‘Open Data and Charities’ just released (find the PDF for download here), and this post on ‘Open Data and Co-operatives’, it might feel like I’m just churning through a formula for working on ‘organisation structure’ + ‘open data’ for writing articles and blog posts. It is, however, just a fortuitous co-incidence of timing, thanks to a great event organised today by Open Data Manchester and Co-operatives UK.

The event was a workshop on ‘Co-operative business models for open data’ and involved an exploration of some of the different ways in which co-operatives might have a role to play in creating, sharing and managing open data resources. Below are my notes from some of the presentations and discussions, and some added reflections jotted down during this write-up.

What are co-operatives?

Many people in the UK are familiar with the high-street retail co-operative; but there are thousands more co-operatives in the UK active in all sectors of the economy; and the co-operative is a business form established right across the world.

The co-operative is a model of business ownership and governance. Unlike limited or public companies which are owned and essentially run in the interests of their shareholders, co-operatives are owned by their members, and are run in the interest of those members. Co-ops legal expert Ged explained this still leaves a vast range of possible governance models for co-ops, depending on who the members are, and how they are structured. For example, the retail coop is a ‘Consumers’ co-operative, where shoppers who use its services can become members and have a say in the governance of the institution. By contrast, the John Lewis Partnership is an employee owned, or ‘producer’ co-operative, which is run for the collective benefit of its staff. Some co-operatives are jointly owned by producers and consumers, and others, like Co-ops UK are owned by their member organisations – existing to provide a service to other co-ops.

There’s been a lot of focus on co-ops in recent years. This year is UN Year of the Co-operative, and the current UK Government has talked a lot about mutualisation of public services.

What do co-operatives have to do with open data?

There are many different perspectives on what open data is, but at its most basic, open data involves making datasets accessible online, in standard formats, and under licenses that allow them to be re-used. In discussions we explored a range of ways in which co-operative structures might meet open data.

Share: Co-operatives sharing data

As businesses, co-operatives have a wide range of data they might consider making available as open data. Discussions in today’s workshop highlighted the wide variety of possible data: from locations of retail coop outlets, to energy usage data gathered by an energy co-operative, or turnstile data from a co-operative football club.

Co-operatives might also hold datasets that contain personal or commercially sensitive data, such as the records held by the co-operative bank, or the shopping data held by the retail co-operative, but that could be used to generate derived datasets that could be made openly available to support innovation, or to inform action on key social challenges.
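As a rough illustration of what a ‘derived dataset’ could look like, the sketch below aggregates some made-up member-level records into counts per area and category, and drops any cell with too few records before publication. The records, the field meanings and the `MIN_COUNT` threshold are all assumptions for the sake of the example; real disclosure control needs far more care than this.

```python
from collections import Counter

# Hypothetical member-level records: (postcode district, product category).
# Records like these are sensitive and would never be published directly.
purchases = [
    ("M1", "fairtrade"), ("M1", "fairtrade"), ("M1", "local"),
    ("M1", "fairtrade"), ("M2", "fairtrade"), ("M2", "local"),
    ("M2", "local"), ("M2", "local"), ("M3", "fairtrade"),
]

MIN_COUNT = 3  # assumed suppression threshold, chosen only for illustration


def derive_open_counts(records, min_count=MIN_COUNT):
    """Aggregate records to counts per (area, category), dropping small cells."""
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n >= min_count}


open_counts = derive_open_counts(purchases)
for (area, category), n in sorted(open_counts.items()):
    print(area, category, n)
```

Only the aggregated counts would be released; the individual records stay with the co-operative.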

There are a number of motivations for co-ops to release data as open data:

  • Firstly, releasing data may allow others to re-use it in a way that benefits the coop economy. For example, Co-operatives UK recently released a mobile app for locating a wide range of co-ops and retail outlets. If the data for this was also available, third parties could build information on coop services into their own apps, tools and services, potentially increasing awareness of co-operatives.
  • Secondly, sharing data might support the wider social aims of a co-operative. For example, an energy co-operative might have gathered lots of data on the sorts of renewable energy sources that work in different settings, and sharing this data openly would support other people working on sustainable energy to make better choices; or retail co-operatives might share information on the grants they give to community groups in a structured form in a way that would support them to better target resources on areas with the most impact.
  • Thirdly, transparency, accountability and trust might be important drivers for co-ops to release data – with open data supporting new models of co-operative governance. For example, co-ops might release detailed financial information as open data to allow their members to understand their performance, or to analyse staff remuneration. Or a coop might provide aggregate data on its supply chain to show how it is improving the percentage of supplies from other co-operatives or from Fairtrade suppliers. For public service co-operatives, like the Youth Mutual forming in Lambeth, it may be important to publish structured data on how public money is being spent, ensuring that the contracting out of services through co-operatives does not undermine the local authority spending transparency that has been established over recent years.

 

Collaborate: Co-operatives as data sharing clubs

Discussions also looked at how we can put data into co-operatives, rather than get data out. A lot of the open data agenda so far has focussed on open data from government (OGD), but often the data needed to answer key questions comes from a variety of stakeholders, including governments, community groups and individuals.

Co-operatives could provide a model to manage the ownership of shared data resources. Most open data licenses are still based upon data being owned somewhere (apart from CC-zero, and Public Domain Dedications which effectively waive ownership rights over a dataset). Co-operatives can provide a model for ownership of open data resources, giving different stakeholders a say in how shared data is managed. For example, if government releases a dataset of public transport provision, and invites citizens and organisations to take part in crowdsourced improvement of the data, people may be reluctant to contribute if the data is just going back into state ownership. However, if contributors to the improved dataset also gain a shared stake in ownership of that enhanced data, they may be more interested in giving their input. This was an issue that came up at the PMOD conference in Brussels last month.

We also discussed how co-operative structures could provide a vehicle for combining open and private data, or for the limited pooling of private data. For example, under the MiData programme, government is working to give citizens better access to their personal data from corporations, such as phone and energy companies. Pooling their personal data (in secure, non-open ways) could allow consumers to get better deals on products or to engage in collective purchasing. Undoubtedly private companies will emerge offering services based on pooled personal data, but where this sort of activity takes place through co-operative structures, consumers sharing their data can have a guarantee that the benefits of the pooled data are being shared amongst the contributors to it, not appropriated by some private party.

Create and curate: Co-operative governance of datasets and portals

Linked to the idea of co-operatives as data sharing clubs, Julian Tait highlighted the potential for co-operative governance of data portals – taking a mutual approach to managing the meta-data and services that they provide.

As I’ve argued elsewhere, open data portals need to go beyond just listing datasets, to also be a hub of engagement – building the capacity of diverse groups to make use of data.

Ideas of joint producer and consumer co-operatives might also provide a means to involve users of data in deciding how data is created and collected. Choices made about data schemas, frequency of update etc. can have a big impact on what can be done with data – yet users of data are rarely involved in these choices.

Mobilise: Collaborating to add value to data

The claim is often implicitly or explicitly made that publishing open data will lead to all sorts of benefits, from greater transparency, accountability and trust, to innovation and economic growth.

However, looked at in detail, we find that there are many elements to the value chain between raw open data and social or economic value. Data may need cleaning, linking, contextualising, analysing and interpreting before it can be effectively used. In talking about the Swirl business model for open data, Ric Roberts explained that if you charge too early on in the value chain for data, it will be underused. However, efforts to add value to data in the open can suffer a public good problem – everyone benefits, but no-one wants to cover the full cost alone. If everyone duplicates the tasks involved in adding value to data, less will be done – so establishing co-operative structures around data in particular areas or sectors might provide a means to pool efforts on improving data, adding value, and generating shared tools and services with data that can benefit all the members of a coop.

This might be something we explore in thinking about a ‘commissioning fund’ around the International Aid Transparency Initiative to help different stakeholders in IATI to pool resources to develop useful tools and services based on the data.

Where next?

We ended today’s workshop by setting up a Google Document to develop a short paper on co-operatives and open data. You can find the draft here, and join in to help fill out a map of all the different ways co-ops could engage with open data, and to develop plans for some pilots and shared activities to explore the co-operative-opendata connection more.

Keep an eye on the Co-operative News ‘Open’ pages for more on the co-operative open data journey.

How data.gov.uk is laying foundations for open data engagement

Originally posted as a Guest Post on data.gov.uk

When the first data.gov.uk platform was launched, it was a great example of the ‘rewired state’ spirit: pioneering the rapid development of a new digital part of government using open source code, and developed through fluid collaboration between government staff, academics, open source developers, and open data activists from outside government. But essentially, the first data.gov.uk was bolted onto the existing machinery of government: a data outpost scraping together details of datasets from across departments, and acting as the broker providing the world with information on where to access that data. And it is fair to say data.gov.uk was designed by data-geeks, for data-geeks.

Tom Steinberg has argued that data portals need not appeal to the masses, and that most people will access government data through apps, but there are thousands of citizens who want direct access to data, and it is vital that data portals don’t exclude those unfamiliar with the design metaphors of source and software repositories. That’s why it is great to see a redesign of data.gov.uk that takes steps to simplify the user experience for anyone seeking out data, whether as a techie, or not.

The most interesting changes to data.gov.uk though are more subtle than the cleaner navigation and unexpected (but refreshing) green colour scheme. Behind the scenes Antonio Acuna and his team have been overhauling the admin system where data records are managed, with some important implications. Firstly, the site includes a clear hierarchy of publishing organisations (over 700 of them) and somewhere in each hierarchy there is a named contact to be found. That means that when you’re looking at any dataset it’s now easier to find out who you can contact to ask questions about it, or, if the data doesn’t tell you what you want, the new data.gov.uk lets you exercise your Right to Information (and hopefully soon Right to Data) and points you to how you can submit a Freedom of Information request.

Whilst at first most of these enquiries will go off to the lead person in each publishing organisation who updates their records on data.gov.uk, the site allows contact details to be set at the dataset level, moving towards the idea of data catalogues not as a firewall sitting between government and citizens, but as the starting point of a conversation between data owners/data stewards and citizens with an interest in the data. Using data to generate conversation, and more citizen-state collaboration, is one of the key ideas in the 5 stars for open data engagement, drafted at this year’s UKGovCamp.

The addition of a Library section with space for detailed documentation on datasets, including space to share the PDF handbooks that often accompany complex datasets and that share lots of the context that can’t be reduced down into neat meta-data, is a valuable addition too. I hope we’ll see a lot more of the ‘social life’ of the datasets that government holds becoming apparent on the new site over time – highlighting that not only can data be used to tell stories, but that there is a story behind each dataset too.

Open data portals have a hard balance to strike – between providing ‘raw’ datasets and disintermediating data, separating data from the analysis and presentation layers government often fixes on top – and becoming new intermediaries, giving citizens and developers the tools they need to effectively access data. Data portals take a range of approaches, and most are still a long way from striking the perfect balance. But the re-launched data.gov.uk lays some important foundations for a continued focus on user needs, and making sure citizens get the data they need, and, in the future, access to all the tools and resources that can help them make sense of it, whether those tools come from government or not.

What does Internet Governance have to do with open data?

[Summary: What do Internet Governance and Open Data have to do with each other?]

As a proposal I worked on for a workshop at this year’s Internet Governance Forum on the Internet Governance issues of Open Government Data has been accepted, I’ve been starting to think through the different issues that the background paper for that session will need to cover. This week I took advantage of a chance to guest blog over on the Commonwealth IGF website to start setting them out.

It started with high profile Open Government Data portals like Data.gov in the US, and Data.gov.uk in the UK giving citizens access to hundreds of government datasets. Now, open data has become a key area of focus for many countries across the world, forming a core element of the Open Government Partnership agenda, and sparking a plethora of international conferences, events and online communities. Proponents of open data argue it has the potential to stimulate economic growth, promote transparency and accountability of governments, and to support improved delivery of public services. This year’s Internet Governance Forum in Baku will see a number of open data focussed workshops, following on from open data and PSI panels in previous years. But when it comes to Open Data and Internet Governance, what are the issues we might need to explore? This post is a first attempt to sketch out some of the possible areas of debate.

In 2009 David Eaves put forward ‘three laws of open government data’ that describe what it takes for a dataset to be considered effectively open. They boil down to requirements that data should be accessible online, machine readable, and under licenses that permit re-use. Exploring these three facets of open data offers one route into potential internet governance issues that need to be critically discussed if the potential benefits of open data are to be secured in equitable ways.

1) Open Data as data accessible online

Online accessibility does not equate to effective access, and we should be attentive to new data divides. We also need to address bandwidth for open data, the design of open data platforms, cross-border cloud hosting of open data, and to connect open data and internet freedom issues. Furthermore, the online accessibility of public data may create or compound privacy and security issues that need addressing.

Underlying the democratic arguments for open data is the idea that citizens should have access to any data that affects their lives, to be able to use and analyse it for themselves, to critique official interpretations, and to offer policy alternatives. Economic growth arguments for open data often note the importance of a reliable, timely supply of data on which innovative products and services can be built. But being able to use data for democratic engagement, or to support economic activity, is not just a matter of having the data – it also requires the skills to use it. Michael Gurstein has highlighted the risk that open data might ‘empower the empowered’, creating a new ‘data divide’. Addressing grassroots skills to use data, ensuring countries have capacity to exploit their own national open data, and identifying the sorts of intermediary institutions and capacity building needed to ensure citizens can make effective use of open data are key challenges.

There are also technical dimensions of the data divide. Many open data infrastructures have developed in environments of virtually unlimited bandwidth, and are based on the assumption that transferring large data files is not problematic: an assumption that cannot be made everywhere in the world. Digital interfaces for working with data often rely on full size computers, and large datasets can be difficult to work with on mobile platforms. As past IGF cloud computing discussions have highlighted, where data is hosted may also matter. Placing public data into cloud hosting, even when it is openly licensed and so sidesteps some of the legal issues, could affect the accessibility of that data and the costs of access to it. How far this becomes an issue may depend on the scale of open data programmes, which as yet can only constitute a very small proportion of Internet traffic in any country. However, when data that matters to citizens is hosted in a range of different jurisdictions, Internet Freedom and filtering issues may have a bearing on who really has access to open data. As Walid Al-Saqaf’s powerful presentation at the Open Government Partnership highlighted, openness in public debate can be dramatically restricted when governments have arbitrary Internet filtering powers.

Last, but not least, among the data accessibility issues: most advocates of open data explicitly state that they are concerned only with public data, and exclude personal data from the discussion. However, the boundaries between these two categories are often blurred (for example, court records are about individuals, but might also be a matter of public record). With many independently published open datasets based on aggregated or anonymised personal data, and with large-scale datasets harvested from social media and held by companies, ‘jigsaw identification’, in which machines can infer lots of potentially sensitive and personal facts about individuals, becomes a concern. As Cole outlines, in the past we have dealt with some of these concerns by ad-hoc limitations and negotiated access to data. Unrestricted access to open data online removes these strategies, and highlights the importance of finding other solutions that protect key dimensions of individual privacy.

2) Open data as machine readable

Publishing datasets involves selecting formats and standards which impact on what the data can express and how it can be used. Often standard setting can have profound political consequences, yet it can be treated as a purely technical issue.

Standards are developing for everything from public transport timetables (GTFS), to data on aid projects (IATI). These standards specify the format data should be shared in, and what the data can express. If open data publishers want to take advantage of particular tools and services, they may be encouraged to choose particular data standards. In some areas, no standards exist, and competing open and non-open standards are developing. Sometimes, because of legacy systems, datasets are tied into non-open standards, creating a pressure to develop new open alternatives.

Some data formats offer more flexibility than others, but usually with a connected increase in complexity. The common CSV format of flat data, accessible in spreadsheet software, does not make it easy to annotate or extend standardised data to cope with local contexts. eXtensible Markup Language makes extending data easier, and Linked Data offers the possibility of annotating data, but these formats often present barriers for users without specialist skills or training. As a whole web of new standards, code lists and identifiers are developed to represent growing quantities of open data, we need to ask who is involved in setting standards and how can we make sure that global standards for open data promote, rather than restrict, the freedom of local groups to explore and address the diverse issues that concern them.
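To make the CSV versus XML contrast a little more concrete, here is a minimal sketch using only Python’s standard library. The two-field ‘standard’, the `local` namespace URI and the `ward` element are invented for illustration and are not taken from GTFS, IATI or any real schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up standard with two fixed fields (illustrative names only).
STANDARD_FIELDS = ["project_id", "budget"]

# CSV: the standard is a fixed set of columns in a flat table, so there is
# no built-in place to attach a local annotation to a record.
csv_text = "project_id,budget\nP-001,5000\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: a local publisher can add an element in their own (hypothetical)
# namespace next to the standard elements.
LOCAL_NS = "http://example.org/local"
xml_text = f"""
<project xmlns:local="{LOCAL_NS}">
  <project_id>P-001</project_id>
  <budget>5000</budget>
  <local:ward>Ancoats</local:ward>
</project>
""".strip()
root = ET.fromstring(xml_text)

# A consumer that only knows the standard reads the fields it expects and
# never looks at the extension.
standard_view = {field: root.findtext(field) for field in STANDARD_FIELDS}

# A local consumer that knows the extension can read the extra element.
local_ward = root.findtext(f"{{{LOCAL_NS}}}ward")

print(rows[0])
print(standard_view, local_ward)
```

The extra flexibility comes at a price: reading the XML already requires knowing about namespaces, which is exactly the kind of barrier for non-specialists described above.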

3) Open data as licensed for re-use

Many use cases for open data rely on the ability to combine datasets, and this makes compatible licenses a vital issue. In developing license frameworks, we should engage with debates over who benefits from open data and how norms and licenses can support community claims to benefit from their data.

Open Source and Creative Commons licenses often include terms such as a requirement to ‘Share Alike’, or a Non-Commercial clause prohibiting profit making use of the content. These place restrictions on re-users of the content: for example, if you use Share Alike licensed content in your work, you must share your work under the same license. However, open data advocates argue that terms like this quickly create challenges for combining different datasets, as differently licensed data may be incompatible, and many of the benefits of having access to the data will be lost when it can’t be mashed up and remixed using both commercial and non-commercial tools. The widely cited OpenDefinition.org states that at most, licenses can require attribution of the source, but cannot place any other restrictions on data re-use. Developing a common framework for licensing has been a significant concern in many past governance discussions of open data.
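The incompatibility problem can be sketched with a toy model in which each license is just a set of conditions. The license names, the condition codes (`BY`, `SA`, `NC`) and the compatibility rule below are simplifying assumptions for illustration, not a statement of how any real license text works.

```python
# Toy model: a license is a set of conditions (hypothetical names and codes).
LICENSES = {
    "public-domain": set(),
    "attribution": {"BY"},
    "attribution-sharealike": {"BY", "SA"},
    "attribution-noncommercial": {"BY", "NC"},
    "attribution-noncommercial-sharealike": {"BY", "NC", "SA"},
}


def can_combine(a, b):
    """Return True if, in this toy model, data under a and b can be merged.

    Assumed rule: a Share Alike license requires the combined dataset to be
    released under exactly its own terms, so every condition of the other
    license must already be among its conditions.
    """
    conditions_a, conditions_b = LICENSES[a], LICENSES[b]
    if "SA" in conditions_a and not conditions_b <= conditions_a:
        return False
    if "SA" in conditions_b and not conditions_a <= conditions_b:
        return False
    return True


print(can_combine("attribution", "attribution-sharealike"))
print(can_combine("attribution-sharealike", "attribution-noncommercial"))
```

Under these assumptions, attribution-only data can be folded into a Share Alike dataset, but Share Alike data cannot be merged with Non-Commercial data, because no single set of terms satisfies both.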

These discussions of common licenses have connections to past Access to Knowledge (A2K) debates over the rights of communities to govern access to traditional knowledges, or to gain a return from use of traditional knowledge. An open licensing framework creates the possibility that, without a level playing field of access to resources to use data (i.e. data divides), some powerful actors might exploit open data to their advantage, and to the loss of those who have stewarded that data in the past. Identifying community norms, and other responses to address these issues, is an area for discussion.

Further issues?

I’ve tried to set out some of the areas where debates on open data might connect with existing or emerging internet governance debates. In the workshop I’m planning for this year’s IGF I am hoping we will be able to dig into these issues in more depth to identify how far they are issues for the IGF, or for other fora, and to develop ideas on different constructive approaches to support equitable outcomes from open data. I’m sure the issues above don’t cover all those we might address, so do drop in a comment below to share your suggestions for other areas we need to discuss…

Further reading:

(Other suggested references welcome too…)

Addition: over on the CIGF post Andrew has already suggested an extra reference to Tom Slee’s thought provoking blog post on ‘Seeing like a geek’ that emphasises the importance of putting licensing issues very much on the table in governance debates.

 

Open data: embracing the tough questions – new publications

[Summary: launching open data special issue of Journal of Community Informatics, and a new IKM Emergent paper] (Cross posted from Open Data Impacts blog)

Two open data related publications I’ve been working on have made it to the web in the last few days. Having spent a lot of the last few years working to support organisations to explore the possibilities of open data, these feel like they represent a more critical strand of exploring OGD, trying to embrace and engage with, rather than to avoid, the tough questions. I’m hoping, however, they both offer something to the ongoing and unfolding debate about how to use open data in the interests of positive social change.

Special Issue of JoCI on Open Government Data
The first is a Special Issue of the Journal of Community Informatics on Open Government Data (OGD) bringing together four new papers, five field notes, and two editorials that critically explore how Open Government Data policies and practices are playing out across the world. All the papers and notes draw upon empirical study and grassroots experiences in order to explore key challenges of, and challenges to, OGD.

Nitya Raman’s note on “Collecting data in Chennai City and the limits of Openness” and Tom Demeyer’s account of putting together an application competition in Amsterdam explore some of the challenges of accessing and opening up government datasets in very different contexts, highlighting the complex realities involved in securing ongoing access to reliable government data. Papers from Sharadini Rath (on using government data to influence local planning in India), and Fiorella De Cindo (on designing deliberative digital spaces), explore the challenges of taking open data into civic discussions and policy making – recognising the role that platforms, politics and social dynamics play in enabling, and putting the brakes on, open data as a tool to drive change. A field note from Wolfgang Both and a point of view note from Rolie Cole on “The practice of open data as opposed to its promise” highlight that any OGD initiative involves choices about the data to prioritise, and the compromises to make between competing agendas when it comes to opening data. Shashank Srinivasan’s note on Mapping the Tso Kar basin in Ladakh, using GIS systems to represent the Changpa tribal people’s interaction with the land, also draws attention to the key role that technical systems and architectures play in making certain information visible, and the need to look for the data that is missing from official records.

Unlike many reports and white papers on OGD out there, which focus solely on potential positive benefits, a number of the papers in the issue also take the important step of looking at the potential for OGD to cause harm, or for OGD agendas to be co-opted against the interests of citizens and communities. Bhuvaneswari Raman’s paper “The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment” puts power front and centre of an analysis of how the impacts of open data may play out, and Jo Bates’ “‘This is what modern deregulation looks like’: co-optation and contestation in the shaping of the UK’s Open Government Data Initiative” questions whether UK open data policy has become a fig-leaf for marketisation of public services and neoliberal reforms in the state.

These challenges to open government data, questioning whether OGD does (or even can?) deliver on promises to promote democratic engagement and citizen empowerment are, well, challenging. Advocates of OGD may initially want to ignore these critical cases, or to jump straight to sketching ‘patches’ and pragmatic fixes that route around these challenges. However, I suspect the positive potential of OGD will be closer when we more deeply engage with these critiques, and when in the advocacy and architecture of OGD we find ways to embrace tough questions of power and local context.

(Zainab and I have tried to provide a longer summary weaving together some of these issues in our editorial essay here, although we see this very much as the start, rather than end-point, of an exploration…)

More to come: I’ve been working on the journal issue for just over a year with my co-editor Zainab Bawa, and at the invitation of Michael Gurstein, who has also been fantastically supportive of us publishing this as a ‘rolling issue’. That means we’re going to be adding to the issue over the coming months, and this is just the first batch of papers available to start feeding into discussions and debates now, particularly ahead of the Open Government Partnership meeting in Brasilia next week where IDRC, Berkman Centre and the World Wide Web Foundation are hosting a discussion to develop future research agendas on the impacts of Open Government Data.

ICT for or against development? Exploring linked and open data in development

The second publication is a report I worked on last year with Mike Powel and Keisha Taylor for the IKM Emergent programme, under the title: “ICT for or against development? An introduction to the ongoing case of Web 3” (PDF). The paper asks whether the International Development sector has historically adopted ICT innovations in ways that empower the subjects of development and deliver sustainable improvements for those whose lives “are blighted by poverty, ill-health, insecurity and lack of opportunity”, and looks at where the opportunities and challenges might lie in the adoption of open and linked data technologies in the development sector. It’s online as a PDF here, and summaries are available in English, Spanish and French.