Open data: embracing the tough questions – new publications

[Summary: launching open data special issue of Journal of Community Informatics, and a new IKM Emergent paper] (Cross posted from Open Data Impacts blog)

Two open data related publications I’ve been working on have made it to the web in the last few days. Having spent a lot of the last few years working to support organisations to explore the possibilities of open data, these publications feel like they represent a more critical strand of exploring OGD: trying to embrace and engage with the tough questions, rather than avoid them. I’m hoping they both offer something to the ongoing and unfolding debate about how to use open data in the interests of positive social change.

Special Issue of JoCI on Open Government Data
The first is a Special Issue of the Journal of Community Informatics on Open Government Data (OGD) bringing together four new papers, five field notes, and two editorials that critically explore how Open Government Data policies and practices are playing out across the world. All the papers and notes draw upon empirical study and grassroots experiences in order to explore key challenges of, and challenges to, OGD.

Nitya Raman’s note on “Collecting data in Chennai City and the limits of Openness” and Tom Demeyer’s account of putting together an application competition in Amsterdam explore some of the challenges of accessing and opening up government datasets in very different contexts, highlighting the complex realities involved in securing ongoing access to reliable government data. Papers from Sharadini Rath (on using government data to influence local planning in India) and Fiorella De Cindio (on designing deliberative digital spaces) explore the challenges of taking open data into civic discussions and policy making, recognising the role that platforms, politics and social dynamics play in enabling, and putting the brakes on, open data as a tool to drive change. A field note from Wolfgang Both and a point of view note from Rolie Cole on “The practice of open data as opposed to its promise” highlight that any OGD initiative involves choices about which data to prioritise, and about the compromises to make between competing agendas when it comes to opening data. Shashank Srinivasan’s note on mapping the Tso Kar basin in Ladakh, using GIS to represent the Changpa tribal people’s interaction with the land, also draws attention to the key role that technical systems and architectures play in making certain information visible, and to the need to look for the data that is missing from official records.

Unlike many reports and white papers on OGD out there, which focus solely on potential positive benefits, a number of the papers in the issue also take the important step of looking at the potential for OGD to cause harm, or for OGD agendas to be co-opted against the interests of citizens and communities. Bhuvaneswari Raman’s paper “The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment” puts power front and centre of an analysis of how the impacts of open data may play out, and Jo Bates’ paper “‘This is what modern deregulation looks like’: co-optation and contestation in the shaping of the UK’s Open Government Data Initiative” questions whether UK open data policy has become a fig-leaf for the marketisation of public services and neoliberal reforms in the state.

These challenges to open government data, questioning whether OGD does (or even can?) deliver on promises to promote democratic engagement and citizen empowerment, are, well, challenging. Advocates of OGD may initially want to ignore these critical cases, or to jump straight to sketching ‘patches’ and pragmatic fixes that route around these challenges. However, I suspect the positive potential of OGD will be closer to realisation when we engage more deeply with these critiques, and when we find ways to embrace tough questions of power and local context in the advocacy and architecture of OGD.

(Zainab and I have tried to provide a longer summary weaving together some of these issues in our editorial essay here, although we see this very much as the start, rather than end-point, of an exploration…)

More to come: I’ve been working on the journal issue for just over a year with my co-editor Zainab Bawa, and at the invitation of Michael Gurstein, who has also been fantastically supportive of us publishing this as a ‘rolling issue’. That means we’re going to be adding to the issue over the coming months, and this is just the first batch of papers, available to start feeding into discussions and debates now, particularly ahead of the Open Government Partnership meeting in Brasilia next week where IDRC, Berkman Centre and the World Wide Web Foundation are hosting a discussion to develop future research agendas on the impacts of Open Government Data.

ICT for or against development? Exploring linked and open data in development

The second publication is a report I worked on last year with Mike Powell and Keisha Taylor for the IKM Emergent programme, under the title “ICT for or against development? An introduction to the ongoing case of Web 3” (PDF). The paper asks whether the International Development sector has historically adopted ICT innovations in ways that empower the subjects of development and deliver sustainable improvements for those whose lives “are blighted by poverty, ill-health, insecurity and lack of opportunity”, and looks at where the opportunities and challenges might lie in the adoption of open and linked data technologies in the development sector. It’s online as a PDF here, and summaries are available in English, Spanish and French.


Exploring how digital technology can support young people to engage socially and economically in their communities

[Summary: launching an open research project to find key messages for youth-focussed digital innovation]

Over the coming months I’ll be sharing a series of blog posts linked to a project I’m working on with David Wilcox and Alex Farrow for Nominet Trust, developing a number of key messages on how digital technologies can be used to support young people to engage socially and economically in their communities. It’s a project we would love to get your input into…

Here’s where we are starting from:

“The race is on to re-engage young people in building an inclusive, healthier, more equal and economically viable society.

But changing times need fresh thinking and new solutions.  It is essential that we find new, more effective approaches to addressing these persistent social and economic challenges.   

Digital technology offers all of us the opportunity to engage young people in new, more meaningful and relevant ways and enable their participation in building a more resilient society.

We recognise that there is no single solution; many different strategies are needed to support young people. What is going to work?  ”

Between now and mid-May we’re going to be working up a series of key messages for innovators exploring the digital dimension of work with young people (you can input into the draft messages in this document before 12th April), and then taking a ‘social reporting’ approach to curate key social media and online content that helps unpack what those messages might mean in practice.

Digital dimensions of innovation

So many digital innovation projects essentially work by either taking a social challenge, and bolting a digital tool onto it; or taking a digital tool, and bolting on a social issue it might deal with. But digital innovation can be about more than tools and platforms: it can be about seeing how digital communication impacts upon the methods of organizing and the sorts of activities that make sense in contemporary communities. We’re looking for the messages that work from a recognition of the shared space between digital innovation and social change.

For example, back in the Youth Work and Social Networking report (PDF) we explored how, now that digital technologies mean young people are in almost constant contact with peer-groups through SMS, social networking and instant messaging, ideas of informal education based solely on an isolated two or three hours a week of face-to-face contact seem outdated. But the solution isn’t just for youth workers to pick up and use social network sites as a venue for existing forms of practice (as a number of ‘virtual youth centre’ projects quickly discovered). Instead, by going back to youth work values, practitioners can identify the new forms of practice and interaction that are possible in the digital world.

And digital innovations to support youth engagement in employment, enterprise and community action might not just involve changing the way services are delivered to young people. A post from Jonathan Ward this morning on the Guardian’s Service Delivery Hub highlights how many of the institutions of localism, such as local strategic partnerships, neighborhood planning groups, and localism forums, are inaccessible to young people who “are often too busy with family and work commitments to take part in the business of localism”. We could take an approach of bolting-on digital technologies for young people to input into local fora: setting up Facebook groups or online spaces to discuss planning, with someone feeding this into regular face-to-face meetings. But on its own this isn’t terribly empowering. Instead, we might explore what tools would make the processes of neighborhood planning in general more open to youth input, and look at how digital technology can not only allow consultation with young people, but can shift the structures of decision making so that online input is as valued and important as the input of those with the time to turn up to a face-to-face meeting.

Get involved

Between now and April 12th we’re inviting input into the key messages that we should develop further. You can drop ideas into the comments below, or direct into the open document where we’re drafting ideas here. After April 12th, we’ll start working up a selection of the messages and searching out the social media and other online content that can illuminate what these messages might mean in practice.

As we work through our exploration, we’ll be blogging and tweeting reflections, and all the replies and responses we get will be fed into the process.

At the start of June the results of the process will hopefully be published as a paper and online resource to support Nominet Trust’s latest call for proposals.

Untangling the data debate

[Cross posted from my PhD blog where I’m trying to write a bit more about issues coming up in my current research…]

This post is also available as a two-page PDF here.

Untangling the data debate: definitions and implications

Data is a hot topic right now: from big data, to open data and linked data, entrepreneurs and policy makers are making big claims about ‘data revolutions’. But, not all ‘data’ are the same, and good decision making about data involves knowing the differences.

Big data

Definition: Data that requires ‘massive’ computing power to process (Crawford & Boyd, 2011).

Massive computing power, originally only available on supercomputers, is increasingly available on desktop computers or via low cost cloud computing.

Implications: Companies and researchers can ‘data mine’ vast data resources, to identify trends and patterns. Big data is often generated by combining different datasets.

Digital traces from individuals and companies are increasingly captured and stored for their potential value as ‘big data’.

Raw data

Definition: Primary data, as collected or measured directly from the source; or data in a form that allows it to be easily manipulated, sorted, filtered and remixed.

Implications: Access to raw data allows journalists, researchers and citizens to ‘fact check’ official analysis. Programmers are interested in building innovative services with raw data.

Real-time data

Definition: Data measured and made accessible with minimal delay. Often accessed over the web as a stream of data through APIs (Application Programming Interfaces).

Implications: Real-time data supports rapid identification of trends. Data can support the development of ‘early warning systems’ (e.g. Google Flu Trends; Ushahidi). ‘Smart systems’ and ‘smart cities’ can be configured to respond to real-time data and adapt to changing circumstances.
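
In practice, a ‘real-time’ consumer is often just a short script polling an API and reacting to records it hasn’t seen before. A minimal sketch in Python (the endpoint and field names here are hypothetical):

```python
import time

import requests  # widely-used third-party HTTP library

API_URL = "https://api.example.org/v1/reports/latest"  # hypothetical endpoint

def poll_feed(interval_seconds=60):
    """Poll a real-time JSON feed, reacting to records we have not seen before."""
    seen_ids = set()
    while True:
        for record in requests.get(API_URL, timeout=10).json():
            if record["id"] not in seen_ids:  # assumes each record carries an 'id'
                seen_ids.add(record["id"])
                print("New report:", record)  # e.g. feed an early-warning dashboard
        time.sleep(interval_seconds)
```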

Open data

Definition: Datasets that are made accessible in non-proprietary formats under licenses that permit unrestricted re-use (OKF – Open Knowledge Foundation, 2006). Open government data involves governments providing many of their datasets online in this way.

Implications: Third-parties can innovate with open data, generating social and economic benefits. Citizens and advocacy groups can use open government data to hold state institutions to account. Data can be shared between institutions with less friction.

Personal/ private data

Definition: Data about an individual that they have a right to control access to. Such data might be gathered by companies, governments or other third-parties in order to provide a service to someone, or as part of regulatory and law-enforcement activities.

Implications: Many big and raw datasets are based on aggregating personal data, and combining them with other data. Effective anonymisation of personal data is difficult, particularly when open data provides the pieces for ‘jigsaw identification’ of private facts about people (Ohm, 2009).

Linked data

Definition: Datasets published in the RDF format, using URIs (web addresses) to identify the elements they contain, with links made between datasets (Berners-Lee, 2006; Shadbolt, Hall, & Berners-Lee, 2006).

Implications: A ‘web of linked data’ emerges, supporting ‘smart applications’ (Allemang & Hendler, 2008) that can follow the links between datasets. This provides the foundations for the Semantic Web.
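
To make this concrete, here is a minimal sketch using the Python rdflib library; the namespace, organisation and ‘basedIn’ predicate are invented for illustration, but the pattern of identifying things with URIs and linking out to other datasets is the point:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/id/")  # hypothetical namespace

g = Graph()
org = EX["some-organisation"]
g.add((org, RDF.type, FOAF.Organization))
g.add((org, FOAF.name, Literal("Some Organisation")))
# The URI below is the join key: a 'smart application' can follow the link
# to find more data about Essex held in another dataset.
g.add((org, EX.basedIn, URIRef("http://dbpedia.org/resource/Essex")))

print(g.serialize(format="turtle"))
```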

More dimensions of data:

These are just a few of the types of data commonly discussed in policy debates. There are many other data-distinctions we could also draw: for example, we can look at whether data was crowd-sourced, statistically sampled, or collected through a census. The content of a dataset also has an important influence on the implications of working with that data: an operational dataset of performance statistics is very different from a geographical dataset describing the road network, for example.

Crossovers and conflicts:

Almost all of the above types of data can be found in combination: you can have big linked raw data; real-time open data; raw personal data; and so-on.

There are some combinations that must be addressed with care. For example, ‘open data’ and ‘personal data’ are two categories that are generally kept apart for good reason: open data involves giving up control over access to a dataset, whilst personal data is the data an individual has the right to control access over.

These can be found in combination on platforms like Twitter, when individuals choose to give wider access to personal information by sharing it in a public space, but this is different from the controller of a dataset of personal data making that whole dataset openly available.

A nuanced debate:

It’s not uncommon to see claims and anecdotes about the impacts of ‘big data’ use in companies like Amazon, Google or Twitter being used to justify publishing ‘open’ and ‘raw’ data from governments, drawn from aggregated ‘personal data’. This sort of treatment glosses over the differences between types of data, the contents of the datasets, and the contexts they are used in. Looking to the potential of data use from different contexts, and looking to transfer learning between sectors, can support economic and social innovation, but it also needs critical questions to be asked, such as:

  • What kind of data is this case describing?
  • Does the data I’m dealing with have similar properties?
  • Can the impacts of this data apply to the data I’m dealing with?
  • What other considerations apply to the data I’m dealing with?

Bibliography/further reading:

See http://www.opendataimpacts.net for ongoing work.

Allemang, D., & Hendler, J. A. (2008). Semantic web for the working ontologist: modeling in RDF, RDFS and OWL. Morgan Kaufmann.

Berners-Lee, T. (2006, July). Linked Data – Design Issues. Retrieved from http://www.w3.org/DesignIssues/LinkedData.html

Crawford, K., & Boyd, D. (2011). Six Provocations for Big Data.

Davies, T. (2010). Open data, democracy and public sector reform: A look at open government data use from data.gov.uk. Practical Participation. Retrieved from http://www.practicalparticipation.co.uk/odi/report

OKF – Open Knowledge Foundation. (2006). Open Knowledge Definition. Retrieved March 4, 2010, from http://www.opendefinition.org/

Ohm, P. (2009). Broken promises of privacy: Responding to the surprising failure of anonymization. Retrieved from http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=1450006

Shadbolt, N., Hall, W., & Berners-Lee, T. (2006). The Semantic Web Revisited. IEEE Intelligent Systems, 21(3), 96–101.

Open Rights Group 2012 Conference

[Summary: A quick plug for the upcoming Open Rights Group conference on March 24th 2012.]

The Open Rights Group is a campaigning organisation focussed on protecting citizens’ rights in the digital age. From advocating for a proportional copyright system that doesn’t lead to rights-holders dictating the terms of Internet access, to scrutinising government policies on Internet filtering and blocking, protecting online freedoms, and digging into the detail of open data to balance benefits for society and individuals’ privacy, the Open Rights Group (ORG) is active on issues that are increasingly important to all of us. I recently joined the ORG Advisory Council to support work on open data, and have been really impressed to find an organisation committed to improving policy so that key digital (and thus, in a digital age, general) freedoms are not undermined by narrow or special-interest driven policy making.

In a few Saturdays’ time ORG are holding their annual conference in London, and tickets are still on sale here.

There’s a great line-up of speakers and workshops, including a rare UK appearance by Lawrence Lessig, and keynotes from Cory Doctorow and Wendy Seltzer.

Plus, in one of the workshops I’m going to be putting some key questions to open data advocates Rufus Pollock and Chris Taggart and another guest panelist asking: Raw, big, linked and open: is all this data doing us, our economy and our democracy any good?

It would be great to see you there if you can make it…

Focussing on open data where it matters: accountability and action

A lot of talk of open data proceeds as if all data is equal, and a government dataset is a government dataset. Some open data advocates fall into the trap of seeing databases as collections of ‘neutral facts’, without recognising the many political and practical judgements that go into the collection and modelling of data. But, increasingly, an awareness is growing that datasets are not apolitical, and that not all datasets are equal when it comes to their role in constituting a more open government.

Back in November 2010 I started exploring whether the government’s ‘Public Sector Information Unlocking Service’ actually worked, by asking for open data access to the dataset underlying the Strategic Export Controls: Reports and Statistics Website. Data on where the UK has issued arms export licenses is clearly important for accountability, and yet the data is kept obfuscated in an inaccessible website. 14 months on, my various requests for the data have seen absolutely zero response. Not even an acknowledgement.

However, today Campaign Against the Arms Trade have managed to unlock the Export License dataset, after painstakingly extracting inaccessible statistics from the official government site, and turning this into an open dataset and providing an online application to explore the data. They explain:

Until now the data, compiled by the Export Control Organisation (ECO) in the Department for Business, Innovation and Skills (BIS), was difficult to access, use and understand. The new CAAT app, available via CAAT’s website, transforms the accessibility of the data.

The salient features are:

    • Open access – anyone can view data without registering and can make and refine searches in real time.
    • Data has been disaggregated, providing itemised licences with ratings and values.
    • Comprehensive searchability (including of commonly-required groupings, for example by region of the world or type of weaponry).
    • Graphs of values of items licensed are provided alongside listings of licences.
    • Revoked licences are identified with the initial licence approvals.
    • Individual pages/searches (unique urls) can be linked to directly.
    • The full raw data is available as csv files for download.
And as Ian Prichard, CAAT Research Co-ordinator, puts it:

It is hard to think of an area of government activity that demands transparency more than arms export licensing. 

The lack of access to detailed, easy-to-access information has been a barrier to the public, media and parliamentarians being able to question government policies and practices. These practices include routine arming of authoritarian regimes such as Saudi Arabia, Bahrain and Egypt.

As well as providing more information in and of itself, we hope the web app will prompt the government to apply its own open data policies to arms exports, and substantially increase the level and accessibility of information available.

Perhaps projects like CAAT’s can help bring back the ‘hard political edge’ Robinson and Yu describe in the heritage of ‘open government’. They certainly emphasise the need for a ‘right to data’, rather than access to data existing merely as a general policy subject to the decisions of those in power.

A commonwealth of skills and capabilities

Cross-posted from a guest blog post on the Commonwealth Internet Governance Forum Website.

[Summary: Creating cultures of online collaboration, and skills for online safety, is tougher than building platforms or creating technical controls, but without a participation-centred approach we will lose out on the benefits of the net]

“We want to encourage more knowledge sharing, learning and collaboration within our network. Let’s create an online platform.”

“These online spaces contain dangerous content. We need to restrict access to them.”

These sorts of thoughts are incredibly common when it comes to engagement with the Net, whether as a space of opportunity, or as a space of risk and danger. I’m sure you will have encountered them: for example, from a committee focussing on the provision of new online tools and services, forums and websites to improve communication within a group; or perhaps from institutions and governments arguing for more powers or tools to control Internet access, whether filtering Internet access in schools, or domain seizure requests to take websites offline at the DNS level in the interests of protecting students or citizens. However, these lines of reasoning are deeply problematic if we believe in the Internet as a democratic tool, and a space of active citizenship. In this post I’ll try to explain why, and argue that our energy should go primarily into sharing skills and capabilities rather than solely into building platforms or creating controls.

The protection paradox

In the UK there has recently been a vigorous debate over whether the police should be able to ask the national domain name registrar, Nominet, to block certain .uk DNS entries (domain names) if a website is found to contain malware or to be selling counterfeit goods. Much of the debate has been over whether the police should have a court order before making their requests, or whether the DNS can be altered on law-enforcement request without judicial authorisation. Creating new powers to allow authorities to act against cybercrime by adding blocks within the network can certainly seem like an appealing option when confronted with a multitude of websites with malicious intent, but these approaches to protection can create a number of unintended results.

Blocks within the network can create a false sense of security: users feel that someone else is taking care of security for them, and so have even less motivation to act on security for themselves, creating increased risks when malicious sites inevitably slip through the cracks. Strategies of control and filtering in schools and educational institutions also remove the incentives for educators to support young people to develop the digital skills they need to navigate online risks safely. But the potential for control-based protection policies to limit individuals’ ability to protect themselves is just one of the paradoxes. Protection measures placed in the network itself can centralise power over Internet content, creating threats to the open nature of the Internet, and putting in place systems and powers which could be used to limit democratic freedoms.

If we put restriction and control strategies of protection aside, there are still options open to us, and options that better respect democratic traditions. On the one hand, we can ensure that laws and effective judicial processes are in place to address abuses of the openness of the Internet; and, on the other, we can focus on individuals’ skills and capabilities to manage their own online safety. Often these skills are very practical. As young people explained in a session challenging myths about young people and the Internet at last year’s Internet Governance Forum (LINK), young people do care about privacy: they don’t need to be given scare stories about privacy dangers, but they do want help to use complicated social network privacy settings, and opportunities to talk with friends and colleagues about norms of sharing personal information online.

Of course, the work involved in spreading practical digital skills can look an order of magnitude greater than the work involved in implementing network-level controls. But that doesn’t mean it’s not the right approach. It might be argued that, for some countries, spreading the digital literacy needed for people to participate in their own protection from cybercrime is simply too complicated right now, and can wait until later, whilst rolling out Internet access can’t wait. In ‘Development as Freedom’, Amartya Sen counters a similar argument about democratic freedoms and economic development, where some theorists suggest democratic rights are a luxury that can only be afforded once economic development is well progressed. Sen counters that democratic rights and freedoms are a constitutive part of development, not some add-on. In the same vein, we might argue that being able to be an autonomous user of an Internet endpoint, with as much control as possible over any controls that might be placed on your Internet access, is constitutive both of having effective Internet access, and of being able to use the Internet as a tool to promote freedom and development. The potential challenges of prioritising a skills-based and user-empowerment approach to cyber-security should not be something we shy away from.

The problem with a platform focus

When we look at a successful example of online collaboration the most obvious visible element of it is often the platform being used: whether it’s a Facebook group, or a custom-built intranet. Projects to support online learning, knowledge sharing or dialogue can quickly get bogged down in developing feature-lists for the platform they think they need – articulating grand architectural visions of a platform which will bring disparate conversations together, and which will resolve information-sharing bottlenecks in an organisation or network. But when you look closer at any successful online collaboration, you will see that it’s not the platform, but the people, that make it work.

People need opportunities, capabilities and supportive institutional cultures to make the most of the Internet for collaboration. The capabilities needed range from technical skills (and, on corporate networks, the permission) to install and use programs like Skype, to Internet literacies for creating hyper-links and sharing documents, and the social and media literacy to participate in horizontal conversations across different media. But even skills and capabilities of the participants are not enough to make online collaboration work: there also needs to be a culture of sharing, recognising that the Internet changes the very logic of organisational structures, and means individuals need to be trusted and empowered to collaborate and communicate across organisational and national boundaries in pursuit of common goals.

Online collaboration also needs facilitation: from animateurs who can build community and keep conversations flowing, to technology stewards who can help individuals and groups to find the right ad-hoc tools for the sorts of sharing they are engaged in at that particular time. Online facilitators also need to work to ensure dialogues are inclusive, and to build bridges between online and offline dialogue. In my experience facilitating an online community of youth workers in the UK, and supporting social reporting at the Internet Governance Forum, the biggest barriers to online collaboration have been people’s lack of confidence in expressing themselves online, or easily-addressed technical skill shortages for uploading and embedding video, or following a conversation on Twitter.

Building the capacity of people and institutions, and changing cultures, so that online collaboration can work is far trickier than building a platform. But, it’s the only way to support truly inclusive dialogue and knowledge-sharing. Plus, when we focus on skills and capabilities, we don’t limit the sorts of purposes they can be put to. A platform has a specific focus and a limited scope: sharing skills lays the foundation for people to participate in a far wider range of online opportunities in the future.

Culture, capability and capacity building in the Commonwealth

So what has this all got to do with the Commonwealth? And with Internet governance? Hopefully the connections are clear. Sharing knowledge across boundaries is at the heart of a Commonwealth vision, and cybercrime is one area where the Commonwealth has agreed to focus collaboration (LINK). Projects like the Commonwealth Youth Exchange Council’s Digital Guyana project, and numerous other technical skills exchanges provide strong examples of how the Commonwealth can build digital skills and capabilities – but as yet, we’ve only scratched the surface of social media, online collaboration and digital skill-sharing in the Commonwealth. It would be great to think that we can switch from the statements this post opened with, to finding that statements like those below are more familiar:

“We want to encourage more knowledge sharing, learning and collaboration within our network. Let’s invest in sharing the skills to engage, building the culture of openness, and involving the technology stewardship and facilitation we need to do it.”

“These online spaces contain dangerous content. Let’s use shared knowledge across the Commonwealth to build the capacity of communities and individuals to actively participate in their own protection, and in having a safer experience of the Internet.”

Going beyond simply sharing legal codes and practices, or building platforms, to sharing skills and co-creating programmes to build individual and community capability is key for us to meet the collaboration and IG challenges of the future.


—–

Footnote: 3Ps, with participation as the foundation

In thinking about how to respond to a range of Internet Governance issues, I’m increasingly turning to a model drawn from the UN Convention on the Rights of the Child (UNCRC), which turns out to have far wider applicability than just to youth issues. There is a customary division of the UNCRC rights into three categories: protection rights, provision rights, and participation rights. Rather than being in tension, these can be seen as mutually re-enforcing, and represented as a triangle of rights. Remove any side of the triangle, and the whole structure collapses.

How can this be used? Think of this triangle as a guide to what any effective policy and practice response to use of the Internet needs to involve. When your concern is protection (e.g. in addressing cybercrime), the solutions don’t only involve ‘protective’ measures, but need components involving the provision of education, support or remedial action in cases of harm, and components that promote the participation of individuals, both to develop skills to navigate online risks, and to be active stakeholders in their own protection. When your concern is promoting online participation and collaboration, then as well as developing participative cultures and skills, you need to look to the provision of spaces and tools for dialogue, and to making sure those spaces do not create unnecessary risks for participants. A balanced response to the Net can identify how it addresses each of protection, provision and participation.

However, we can go one step further by positing participation as the foundation of this triangle (in the UNCRC, participation rights are arguably a key foundation for the others). Any policy or intervention which undermines people’s capacity to freely participate online undermines its own validity as a whole.

You can find more on the application of this model to young people’s online lives in this paper, or share your reflections on the model on this blog post.

NT Open Data Days: Exploring data flow in a VCO

[Summary: A practical post of notes from a charity open data day. Part reflective learning; part brain-dump; part notes for ECDP]

Chelmsford was my destination this morning for a Nominet Trust funded ‘Open Data Day’ with the Essex Coalition of Disabled People (ECDP). The Open Data Days are part of an action research exploration of how charities might engage with the growing world of open data, both as data users and publishers. You can find a bit more of the context in this post on my last Open Data Day with the Nominet Trust team.

This (rather long and detailed) post provides a run-down of what we explored on the day, as a record for the ECDP team and as a resource for wider shared learning.

Seeking structures and managing data

For most small organisations, data management often means Excel spreadsheets, and ECDP is no exception. In fact, ECDP has a lot of spreadsheets on the go. Different teams across the organisation maintain lists of volunteers, records about service users, performance data, employment statistics, and a whole lot more, in individual Excel workbooks. Bringing that data together to publish the ‘Performance Dashboards’ that ECDP built for internal management, but that have also been shared in the open data area of the ECDP website, is a largely manual task. Across these spreadsheets it’s not uncommon to see the information on a particular topic (e.g. volunteers) spread across different tabs, or duplicated into different spreadsheets where staff have manually copied filtered extracts for particular reports. The challenge with this is that it leads the organisation’s information to fragment, and makes pulling together both internal and open data and analysis tricky. Many of the spreadsheets we found during the open day mix the ‘data layer’ with ‘presentation’ and ‘analysis’ layers, rather than separating these out.

What can be done?

Before getting started with open data, we realised that we needed to look at the flow of data inside the organisation. So, we looked at what makes a good data layer in a spreadsheet, such as:

  • Keeping all the data of one type in a single worksheet. For example, if you have data on volunteers, all the data should be in a single sheet. Don’t start new sheets for ‘Former volunteers’, or ‘Volunteers interested in sports’, as this fragments the data. If you need to know about a volunteer’s interests, or whether they are active or not, add a column to your main sheet, and use filters (see below).
  • Having one header row of columns. You can use merged cells, sub-headings and other formatting when you present data – but when you use these in the master spreadsheet where you collect and store your data you make life trickier for the computer to understand what your data is, and to support different analysis of the data in future.
  • Including validation… Excel allows you to define a list of possible values for a cell, and provides users entering data with a drop-down box to select from instead of them typing values in by hand. This really helps increase the consistency of data. You can also validate to be sure the entry in a cell is a number, or a date, and so on (a scripted sketch of this idea follows this list). In working on some ECDP prototypes we ran up against a problem where our list of possible valid entries for a cell was too long, and we didn’t want to keep the master-list of valid values in the Excel sheet our data was on, but Wizard of Excel has documented a workaround for that.
  • …but keeping some flexibility. Really strict validation has its own problems, as it can force people to twist what they wanted to record to fit a structure that doesn’t make sense, or that distorts the data. For example, in some spreadsheets we found the ‘Staff member responsible’ column often had more than one name in it. We had to explore why that was, and whether the data structure needed to accommodate more than one staff member linked to a particular row in the spreadsheet. Keeping a spreadsheet structure flexible can be a matter of providing free text areas where users are not constrained in the detail they provide, and of having a flexible process to revise and update structures according to demand.
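
As a rough sketch of the validation idea in script form, using the openpyxl library (the sheet, column names and allowed values here are invented for illustration):

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws.title = "Data"
ws.append(["Name", "Status", "Interests", "Notes"])  # one header row

# A drop-down of allowed values keeps data entry consistent
status = DataValidation(type="list", formula1='"Active,Former"', allow_blank=True)
ws.add_data_validation(status)
status.add("B2:B1000")  # apply to the Status column

wb.save("volunteers.xlsx")
```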

Once you have a well structured spreadsheet (see the open data cookbook section on preparing your data if you still need to get a sense of what well structured data might look like), then you can do a lot more with it. For example:

  • Creating a pivot chart. Pivot charts are a great way to analyse data, and are well worth spending time to explore. Many of the reporting requirements an organisation has can be met using a pivot chart. For ECDP we created an example well-structured dataset of ‘Lived Experience Feedback’: views and insights provided by service users and recorded with detailed descriptions, dates when the feedback was given, and categories highlighting the topical focus of the views expressed. We made all this data into an Excel list, which allowed us to add a formula that would apply to every row and that used the =MONTH() formula to extract the month from the dates given in each row. Creating a pivot chart from this list, we could then drill down to find figures such as the number of Lived Experience reports provided to the Insight team and relating to ‘Employment’ in any given month (a scripted version of this pivot is sketched after this list).
  • Creating filtered lists and dashboards. It can seem counterintuitive to an organisation which mostly wants to see data in separate lists by area, or organisational team, to put all the data for those areas and teams into one spreadsheet, with just a column to flag up which team or area a row relates to. That’s why spreadsheets often end up with different tabs for different teams, with the same sort of data spread across them. Using formulae to create thematic lists and dashboards can be a good way to keep teams happy, whilst getting them to contribute to a single master list of data. (We spent quite a lot of time on the open data day thinking about the importance of motivating staff to provide good quality data, and the need to make the consequences of providing good data visible.) Whilst the ‘Autofilter’ feature in Excel can be used to quickly sub-set a dataset to get just the information you are interested in, when we’re building a spreadsheet to be stored on a shared drive, and used by multiple teams, we want to avoid confusion when the main data sheet ends up with filters applied. So instead we used simple cross-sheet formulae (e.g. if your main data sheet is called ‘Data’, then put =’Data’!A1 in the top-left cell of a new sheet, and then drag it out) to create copies of the master sheet, and then applied filters to these. We included a big note on each of these extra sheets to remind people that any edits should be made to the master data, not these lists.
  • Linking across spreadsheets. Excel formulae can be used to point to values not just in other sheets, but also in other files. This makes it possible to build a dashboard that automatically updates by running queries against other sheets on a shared drive. Things get even more powerful when you are able to publish datasets to the web as open data, when tools like Google Docs have the ability to pull in values and data across the web; but even with non-open data in an organisation, there should be no need to copy and paste values that could be transferred dynamically and automatically.
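
For anyone scripting rather than working in Excel, a rough pandas equivalent of the month-by-topic pivot might look like this (the file, sheet and column names are assumptions, not ECDP’s actual fields):

```python
import pandas as pd

# One row per piece of feedback in the (hypothetical) data layer
df = pd.read_excel("lived_experience.xlsx", sheet_name="Data")
df["Month"] = pd.to_datetime(df["Date"]).dt.month  # the =MONTH() step

# The pivot chart equivalent: count of reports by topic and month
report = df.pivot_table(index="Topic", columns="Month", values="Description",
                        aggfunc="count", fill_value=0)
print(report)
```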

Of course, when you’ve got lots of legacy spreadsheets around, making the shift to more structured data, separating the data, analysis and presentation layers, can be tricky. Fortunately, some of the common tools in the open data wrangler’s toolbox come in handy here.

To move from a spreadsheet with similar data spread across lots of different tabs (one for each team that produced that sort of data), to one with consistent and standardised data, we copied all the data into a single sheet with one header row, and a new column indicating the ‘team’ that row was from (we did this by saving each of the sheets as .csv files, and using the ‘cat’ command on Mac OSX to combine these together, but the same effect can be got with copy and paste).
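
The same combining step can also be scripted. A minimal pandas sketch (the folder and column names are invented) that adds the ‘team’ column as each file is read:

```python
import pandas as pd
from pathlib import Path

# One CSV per team, exported from the original tabs (hypothetical folder)
frames = []
for csv_path in sorted(Path("exports").glob("*.csv")):
    frame = pd.read_csv(csv_path)
    frame["Team"] = csv_path.stem  # record which tab each row came from
    frames.append(frame)

pd.concat(frames, ignore_index=True).to_csv("combined.csv", index=False)
```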

We then turned to the open data wrangler’s power tool Google Refine (available as a free download) to clean up the data. We used ‘text facets’ to see where people had entered slightly different names for the same area or theme, and made bulk edits to these, and used some replacement patterns to tidy up date values.
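
Google Refine is the friendlier tool for this, but for the record, a crude scripted equivalent of text facets and bulk edits (the column names and variant spellings below are invented):

```python
import pandas as pd

combined = pd.read_csv("combined.csv")

# Text-facet equivalent: list the distinct spellings to spot near-duplicates
print(combined["Area"].value_counts())

# Tidy stray whitespace, then bulk-edit the variants
fixes = {"N. West": "North West", "north west": "North West"}
combined["Area"] = combined["Area"].str.strip().replace(fixes)

# Normalise date values to a single format
combined["Date"] = pd.to_datetime(combined["Date"], dayfirst=True, errors="coerce")
combined.to_csv("combined-clean.csv", index=False)
```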

We then took this data back into Excel to build a master spreadsheet, with one single ‘Data’ sheet, and separate sheets for pivot chart reports and filtered lists.

The whole process once started took an hour or so, but once complete, we had a dataset that could be analysed in many more ways than before, and we had the foundations for building both better internal data flows, and for extracting open data to share.

Heading towards a CRM

Although, with the appropriate planning, discipline and stewardship, Excel can be used to manage a lot of the data an organisation needs, we also explored the potential to use a fully-featured ‘Contact Relationship Management’ (CRM) system to record information right across the organisation.

Even when teams and projects in an organisation are using well structured spreadsheets, there are likely to be overlaps and links between their datasets that are hard to make unless they are all brought into one place. For example, two teams might be talking to the same person, but if one knows the person as Mr Rich Watts, and the other records R. Watts, bringing together this information is tricky. A CRM is a central database (often now accessed over the web) which keeps all this information in one place.

Modern CRM systems can be set up to track all sorts of interactions with volunteers, customers or service users, both to support day to day operations, and to generate management information. We looked at the range of CRM tools available, from the Open Source ‘CiviCRM’ which has case tracking modules that may be useful to an organisation like ECDP, through to tools like Salesforce, which offer discounts to non-profits. Most CRM solutions have free online trials. LASA’s ICT Knowledge Base is a great place to look for more support on exploring options for CRM systems.

In our open data day we discussed the importance of thinking about the ‘user journey’ that any database needs to support, and ensuring the databases enable, rather than constrain, staff. Any process of implementing a new database is likely to involve some changes in staff working practices too, so it’s important to look at the training and culture change components as well as the technical elements. This is something true of both internal data, and open data, projects.

When choosing CRM tools it’s important to think about how a system might make it possible to publish selected information as open data directly in future, and how they might be able to pull in open data.

Privacy Matters

Open data should not involve the release of people’s personal data. To make open data work, a clear line needs to be drawn between data that identifies and is about individuals, and the sorts of non-personal data that an organisation can release as open data.

Taking privacy seriously matters:

  • Data anonymisation cannot be relied upon. Studies conclusively show that we should not put our faith in anonymisation to protect individuals’ identities in published datasets. It’s not enough to simply remove names or dates of birth from a dataset before publishing it.
  • Any release of data drawn from personal data needs to follow from a clear risk assessment. It’s important to consider what harm could result from the release of any dataset. For example, if publishing a dataset that contains information on reported hate crime by postcode area: if a report was traced back to an individual, could this lead to negative consequences for them?
  • It’s important to be aware of jigsaw re-identification risks. Jigsaw re-identification is the risk that putting together two open datasets will allow someone to unlock previously anonymised personal data. For example, if you publish one open dataset that maps where users of your service are, and includes data on types of disability, and you publish another dataset that lists reports of hate-crime by local area, could these be combined to discover the disability of the person who reported hate crime in a particular area, and then, perhaps combined with some information from a social network like Facebook or Twitter, to identify that person?

Privacy concerns don’t mean that it’s impossible to produce open data from internal datasets of personal information, but care has to be taken. There can be tension between the utility of open data, and the privacy of personal data in a dataset. Organisations need to be careful to ensure privacy concerns and the rights of service users always come first.

With the ECDP data on ‘Lived Experience’ we looked at how Google Refine could be used to extract from the data a list of ‘PCT Areas’ and ‘Issue Topics’ reported by service users, to map where the hot-spots were for particular issues at the PCT level. Whilst drawn from a dataset with personal information, this dataset would not include any Personally Identifying Information, and may be possible to publish as open data.
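
A sketch of that extraction in pandas (the column names and the small-cell threshold are assumptions, not ECDP’s actual fields): keep only the non-identifying columns, aggregate to counts, and suppress small counts that could enable jigsaw re-identification:

```python
import pandas as pd

df = pd.read_csv("lived_experience.csv")  # contains personal data: handle with care

# Keep only the non-identifying columns, then reduce to counts
hotspots = (df[["PCT Area", "Issue Topic"]]
            .groupby(["PCT Area", "Issue Topic"])
            .size()
            .reset_index(name="Reports"))

# Suppress small counts that might allow individuals to be re-identified
hotspots = hotspots[hotspots["Reports"] >= 5]
hotspots.to_csv("issue_hotspots.csv", index=False)
```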

Open data fusion

Whilst a lot of our ‘open data day’ was spent on the foundations for open data work, rather than open data itself, we did work on one small project which had an immediate open data element.

Rich Watts brought to the session a spreadsheet of 250 Disabled People’s User Led Organisations (DPULOs), and wanted to find out (a) how many of these organisations were charities; and (b) what their turnover was. Fortunately, Open Charities has gathered exactly the data needed to answer that question as open data, and so we ran through how Google Fusion Tables could be used to merge Rich’s spreadsheet with existing charity data (see this How To for an almost identical project with Esmee Fairbairn grants data), generating the dataset needed to answer these questions in just under 10 minutes.
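
We used Google Fusion Tables on the day, but the same merge can be sketched in pandas (the filenames, column names and join key are all assumptions):

```python
import pandas as pd

dpulos = pd.read_csv("dpulos.csv")             # the list of 250 organisations
charities = pd.read_csv("open_charities.csv")  # extract of Open Charities data

# Join on organisation name; a registered-number join would be more robust
merged = dpulos.merge(charities[["name", "income"]],
                      how="left", left_on="Organisation", right_on="name")

print("Matched as charities:", merged["name"].notna().sum())
print("Combined turnover:", merged["income"].sum())
```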

We discussed how Rich might want to publish his spreadsheet of DPULOs as open data in future, or to contribute information on the fact that certain charities are Disabled People’s User Led Organisations back to an open data source like Open Charities.

Research Resources

The other element of our day was an exploration of online data sources useful in researching a local area, led fantastically by Matthew of PolicyWorks.

Many of the data sources Matthew was able to point to for finding labour market information, health statistics, demographic information and other stats provide online access to datasets, but don’t offer this as ‘open data’ that would meet the OKF’s open definition requirements, raising some interesting questions about the balance between a purist approach to open data and an approach that looks for data which is ‘open enough’ for rough-and-ready research.

Where next?

Next week Nominet, NCVO and Big Lottery Fund are hosting a conference to bring together learning from all the different Open Data Days that have been taking place. The day will also see the release of a report on the potential of open data in the charity sector.

For me, today’s open data day has shown that we need to recognise some of the core data skills that organisations will need to benefit from open data. Not just skills to use new online tools, but skills to manage the flow of data internally, and to facilitate good data management. Investment in these foundations might turn out to be pivotal for realising open data’s third-sector potential…

5-Stars of Open Data Engagement?

[Summary: Notes from a workshop at UKGovCamp that led to sketching a framework for the engagement elements an open data initiative might contain]

Update: The 5 Stars of Open Data Engagement now have their own website at http://www.opendataimpacts.net/engagement/.

In short

* Be demand driven

* * Provide context

* * * Support conversation

* * * * Build capacity & skills

* * * * * Collaborate with the community

The Context

I’ve spent the last two days at UKGovCamp, an annual open-space gathering of people from inside and around local and national government passionate about using digital technologies for better engagement, policy making and practice. This year’s event was split over two days: Friday for conversations and short open-space slots; Saturday for more hands-on discussions and action. Suffice to say, there were plenty of sessions on open data on both days, and this afternoon we tried to take forward some of the ideas from Day 1 about open data engagement in a practical form.

There is a general recognition of the gap between putting a dataset online, and seeing data driving real social change. In a session on Day 1 led by @exmosis, we started to dig into different ways to support everyday engagement with data, leading to Antonio from Data.gov.uk suggesting that open data initiatives really needed some sort of ‘Charter of engagement’ to outline ways they can get beyond simply publishing datasets, and get to supporting people to use data to create social, economic and administrative change. So, we took that as a challenge for Day 2, and in a session on ‘designing an engaging open data portal’ a small group of us (including Liz Stevenson, Anthony Zacharzewski, Jon Foster and Jag Goraya) started to sketch what a charter might look like.

You can see the (still developing) charter draft in this Google Doc. However, it was Jag Goraya‘s suggestion that the elements of a charter we were exploring might also be distilled into a ‘5 Stars’ that seemed to really make some sense of the challenge of articulating what it means to go beyond publishing datasets to do open data engagement. Of course, 5-star rating scales have their limitations, but I thought it worth sharing the draft that was emerging.

What is Open Data Engagement?

We were thinking about open data engagement as the sorts of things an open data initiative should be doing beyond just publishing datasets. The engagement stars don’t relate to the technical openness or quality of the datasets (there are other scales for that), and are designed to be flexible enough to apply to a particular dataset, a thematic set of datasets, or an open data initiative as a whole.

We were also thinking about open government data in our workshop; though hopefully the draft has wider applicability. The ‘overarching principles’ drafted for the Charter might also help put the stars in context:

Key principles of open government data: “Government information and data are common resources, managed in trust by government. They provide a platform for public service provision, democratic engagement and accountability, and economic development and innovation. A commitment to open data involves making information and data resources accessible to all without discrimination; and actively engaging to ensure that information and data can be used in a wide range of ways.”

Draft sketch of five stars of Open Data Engagement

The names and explanatory text of these still need a lot of work; you can suggest edits as comments in the Google Doc where they were drafted.

* Be demand driven

Are your choices about the data you release, how it is structured, and the tools and support provided around it based on community needs and demands? Have you got ways of listening to people’s requests for data, and responding with open data?

** Provide good meta-data; and put data in context

Does your data catalogue provide clear meta-data on datasets, including structured information about frequency of updates, data formats and data quality? Do you include qualitative information alongside datasets, such as details of how the data was created, or manuals for working with the data? Do you link from data catalogue pages to analysis your organisation, or third-parties, have already carried out with the data, or to third-party tools for working with the data?

Often organisations already have detailed documentation of datasets (e.g. analysis manuals and How To’s) which could be shared openly with minimal edits. It needs to be easy to find these when you find a dataset. It’s also common that governments have published analysis of the datasets (they collected it for a reason), or used it in some product or service, and so linking to these from the dataset (and vice-versa) can help people to engage with it.

*** Support conversation around the data

Can people comment on datasets, or create a structured conversation around data to network with other data users? Do you join the conversations? Are there easy ways to contact the individual ‘data owner’ in your organisation to ask them questions about the data, or to get them to join the conversation? Are there offline opportunities to have conversations that involve your data?

**** Build capacity, skills and networks

Do you provide or link to tools for people to work with your datasets? Do you provide or link to How To guidance on using open data analysis tools, so people can build their capacity and skills to interpret and use data in the ways they want to? Are these links contextual (e.g. pointing people to GeoData tools for a geo dataset, and to statistical tools for a performance monitoring dataset)? Do you go out into the community to run skill-building sessions on using data in particular ways, or using particular datasets? Do you sponsor or engage with community capacity building?

When you give people tools, you help them do one thing. When you give people skills, you open the possibility of them doing many things in future. Skills and networks are more empowering than tools.

***** Collaborate on data as a common resource

Do you have feedback loops so people can help you improve your datasets? Do you collaborate with the community to create new data resources (e.g. derived datasets)? Do you broker or provide support to people to build and sustain useful tools and services that work with your data?


It’s important for all the stars that they can be read not just with developers and techies in mind, but also community groups, local councillors, individual non-techie citizens and so on. Providing support for collaboration can range from setting up source-code sharing space on GitHub, to hanging out in a community centre with print-outs and post-it notes. Different datasets and different initiatives will have different audiences, and so different approaches to the stars, but hopefully there is a rough structure showing how these build to deeper levels of engagement.

Where next?

Hopefully Open Data Sheffield will spend some time looking at this framework at a future meeting, and all comments are welcome on the Google doc. Clearly there’s lots to be done to make these more snappy, focussed and neat, but if we do find there’s a fairly settled sense of a five stars of engagement framework (if not yet good language to express it) then it would be interesting to think about whether we have the platforms and processes in place anywhere to support all of this: finding the good practice to share. Of course, there might already be a good engagement framework out there we missed when sketching this all out, so comments to that effect are welcome too…

 

Updates:

Amended 22nd January to properly credit Antonio of Data.gov.uk as originator of the Charter idea

Exploring Open Charity Data with Nominet Trust

[Summary: notes from a pilot one-day workshop on open data opportunities in third-sector organisations]

On Friday I spent the day with Nominet Trust for the second of a series of charity ‘Open Data Days’ exploring how charities can engage with the rapidly growing and evolving world of open data. The goal of these workshops is to spend just one working day looking at what open data might have to offer a particular organisation and, via some hands-on prototyping and skill-sharing, to develop an idea of the opportunities and challenges that the charity needs to explore to engage more with open data.

The results of ten open data days will be presented at a Nominet Trust, NCVO and Big Lottery Fund conference later in the year, but for now, here’s a quick run-down / brain-dump of some of the things explored with the Nominet Trust team.

What is Open Data anyway?

Open data means many different things to different people – so it made sense to start the day looking at different ways of understanding open data, and identifying the ideas of open data that chimed most with Ed and Kieron from the Nominet Trust team.

The presentation below runs through five different perspectives on open data, from understanding open data as a set of policies and practices, to looking at how open data can be seen as a political movement or a movement to build foundations of collaboration on the web.


Reflecting on the slides with Ed and Kieron highlighted that the best route into exploring open data for Nominet Trust was the idea that ‘open data is what open data does’, which helped us to set the focus for the day on exploring practical ways to use open data in a few different contexts. However, a lot of the uses of open data we went on to explore also chimed with the idea of a technical and cultural change that allows people to perform their own analysis, rather than just taking presentations of statistics and data at face value.

Mapping opportunities for open data

Even in a small charity there are many different places open data could have an impact. With Nominet Trust we looked at a number of areas where data is in use already:

  • Informing calls for proposals – Nominet Trust invite grant applications for ideas that use technology for disruptive innovation in a number of thematic areas, with two main thematic areas of focus live at any one time. New thematic areas of focus are informed by ‘State of the Art’ review reports. Looking at one of these, it quickly became clear that they are data-packed resources, but that the data, analysis and presentation are all smushed together.
  • Throughout the grant process – Nominet Trust are working not only to fund innovative projects, but also to broker connections between projects and to help knowledge and learning flow between funded projects. Grant applications are made online, and right now, details of successful applicants are published on the Trust’s websites. A database of grant investment is used to keep track of ongoing projects.
  • Evaluation – the Trust are currently looking at new approaches to evaluating projects, and identifying ways to make sure evaluation contributes not only to an organisation’s own reflections on a project, but also to wider learning about effective responses to key social issues.

With these three areas of data focus, we turned to identifying three data wishes to guide the rest of the open data day. These were:

  • Being able to find the data we need when we need it
  • Creating actionable tools that can be embedded in different parts of the grant process – and doing this with open platforms that allow the Nominet Trust team to tweak and adapt these tools.
  • Improving evaluation – with better data in, and better data out

Pilots, prototypes and playing with data

The next part of our Open Data Day was to roll up our sleeves and try some rapid experiments with a wide range of different open data tools and platforms. Here are some of the experiments we tried:

Searching for data

We imagined a grant application looking at ways to provide support to young people not in education, employment or training in the Royal Borough of Kensington and Chelsea, and set the challenge of finding data that could support the application, or that could support evaluation of it. Using the Open Data Cook Book guide to sourcing data, Ed and Kieron set off to track down relevant datasets, eventually arriving at a series of spreadsheets on education stats in London on the London Skills and Employment Observatory website via the London Datastore portal. Digging into the spreadsheets allowed the team to put in context the claims that could be made about levels of education and employment exclusion in RBKC, looking at the different interpretations that might be drawn from claims about trends and percentages, and claims about absolute numbers of young people affected.
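As a rough sketch of the sort of checking this involves, here is the same comparison in Python with pandas; the file name and column names are invented stand-ins for the London Datastore spreadsheets.

    # Sketch: compare a percentage-based claim with an absolute-numbers claim
    # about young people not in education, employment or training (NEET).
    # File and column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("neet_by_borough.csv")
    rbkc = df[df["borough"] == "Kensington and Chelsea"]

    rate = rbkc["neet_count"].sum() / rbkc["age_16_18_population"].sum()
    print("NEET rate: {:.1%}".format(rate))                    # the percentage claim
    print("Young people affected:", rbkc["neet_count"].sum())  # the absolute claim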

Learning: The data is out there; and having access to the raw data makes it possible to fact-check claims that might be made in grant applications. But, the data still needs a lot of interpretation, and much of the ‘open data’ is hidden away in spreadsheets.

Publishing open data

Most websites are essentially databases of content, with a template to present them to human readers. However, it’s often possible to make the ‘raw data’ underlying the website available as more structured, standardised open data. The Nominet Trust website runs on Drupal, and includes a content type for projects awarded funding, covering details of the project, its website address, and the funding awarded.

Using a demonstration Drupal website, we explored how, with the Drupal Views and Views Bonus Pack open source modules, it was easy to create a CSV open data download of information in the website.
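Drupal Views does this through point-and-click configuration rather than code, but the underlying pattern is straightforward. Here is a minimal sketch of the same idea in Python, with the database table and column names invented for illustration.

    # Sketch: expose rows from a website's content database as a CSV download.
    # Table and column names are hypothetical; Drupal Views achieves the same
    # result through configuration alone.
    import csv
    import sqlite3

    conn = sqlite3.connect("site.db")
    rows = conn.execute(
        "SELECT project_name, website, amount_awarded FROM funded_projects"
    )

    with open("funded_projects.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["project_name", "website", "amount_awarded"])
        writer.writerows(rows)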

The sorts of ‘projects funded’ open data this would make available from Nominet Trust might be of interest to sites like OpenlyLocal.com, which aggregate details of funding to many different organisations.

Learning: You can become an open data publisher very easily, and by hooking into existing places where ‘datasets’ are kept, keeping your open data up-to-date is simple.

Mashing-up datasets

Because open datasets are often provided in standardised forms, and the licenses under which data is published allow flexible re-use of the data, it becomes easy to mash-up different datasets, generating new insights by combining different sources.

We explored a number of mash-up tools. Firstly, we looked at using Google Spreadsheets and Yahoo Pipes to filter a dataset ready to combine it with other data. The Open Data Cook Book has a recipe that involves scraping data with Google Spreadsheets, and a Yahoo Pipes recipe on combining datasets.
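Those recipes rely on built-in spreadsheet formulas such as =ImportHtml; for comparison, a rough Python equivalent of the scrape-and-filter step might look like the sketch below, with the URL and column name invented for illustration.

    # Sketch: scrape an HTML table from a web page and filter it, ready for
    # mashing up with other data. pandas.read_html pulls tables from a page,
    # much as =ImportHtml does in Google Spreadsheets. The URL is hypothetical.
    import pandas as pd

    tables = pd.read_html("http://example.gov.uk/statistics/education.html")
    data = tables[0]                      # first table on the page
    london = data[data["Region"] == "London"]
    london.to_csv("filtered_for_mashup.csv", index=False)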

Then we turned to the open data powertool that is Google Refine. Whilst Refine runs in a web browser, it is software you install on your own computer, and it keeps the data on your machine until you publish it – making it a good tool for a charity to use to experiment with its own data, before deciding whether or not that data should be published as open data.

We started by using Google Refine to explore data from OpenCharities.org – taking a list of all the charities with the word ‘Internet’ in their description that had been exported from the site, and using the ‘Facets’ feature (and a Word Facet) in Google Refine to look at the other terms used in their descriptions. Then we turned to a simple dataset of organisations funded by Nominet Trust, and explored how, using API access to OpenlyLocal.com’s spending dataset, we could get Google Refine to fetch details of which Nominet Trust funded organisations had also received money from particular local authorities or big funders like the Big Lottery Fund and the Arts Council. This got a bit technical, so a step-by-step How To will have to wait – but the result was an interesting indication of some of the organisations that might turn out to be common co-funders of projects with Nominet Trust – a discovery enabled by those funders making their funding information available as open data.
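In the meantime, the shape of the process can be sketched in Python; note that the endpoint URL and JSON fields below are hypothetical placeholders rather than OpenlyLocal’s actual API, which has its own documentation.

    # Sketch: for each funded organisation, look up open spending data from an
    # API and list other funders. The URL and JSON fields are hypothetical
    # stand-ins for OpenlyLocal's real API.
    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    organisations = ["Example Charity", "Another Project CIC"]

    for org in organisations:
        url = "http://openlylocal.com/spending/search.json?q=" + quote(org)
        with urlopen(url) as response:
            payload = json.load(response)
        funders = {item["funder"] for item in payload.get("results", [])}
        print(org, "also funded by:", ", ".join(sorted(funders)) or "no match found")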

Learning: Mash-ups can generate new insights – although many mash-ups still involve a bit of technical heavy-lifting and it can take some time to really explore all the possibilities.

Open data for evaluation

Open data can be both an input and an output of evaluation. We looked at a simple approach using Google Spreadsheets to help a funder create online evaluation tools for funded projects.

With a Google Docs account, we looked at creating a new ‘Form’. Google Forms are easy to create, and let you design a set of simple survey elements that a project can fill in online, with the results going directly into an online Google Spreadsheet. In the resulting spreadsheet, we added an extra tab for ‘Baseline Data’, and explored how the =ImportData() formula in Google Spreadsheets can be used to pull in CSV files of open data from a third party, keeping a sheet of baseline data up-to-date. Finally, we looked at the ‘Publish as a Web Page’ feature of Google Spreadsheets, which makes it possible to provide a simple CSV file output from a particular sheet.

In this way, we saw that a funder could create an evaluation form template for projects in a Google Form/Spreadsheet and, with shared access to this spreadsheet, could help funded projects structure their evaluations in ways that support cross-project comparison. By using formulae to move a particular subset of the data to a new sheet, and then using the ‘Publish as a Web Page’ feature, non-private information could be published directly as open data from there.
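Outside of Google Spreadsheets, the same data flow – pull in third-party baseline data, combine it with survey responses, publish a non-private subset – can be sketched in a few lines of Python; the URL and all column names here are invented for illustration.

    # Sketch of the evaluation data flow: import third-party baseline open data,
    # combine it with survey responses, and publish a non-private subset as CSV.
    # The URL and column names are hypothetical.
    import pandas as pd

    baseline = pd.read_csv("http://example.gov.uk/open-data/baseline.csv")
    responses = pd.read_csv("survey_responses.csv")

    combined = responses.merge(baseline, on="local_authority")

    # Drop anything that could identify individuals before publishing
    public = combined.drop(columns=["respondent_name", "respondent_email"])
    public.to_csv("evaluation_open_data.csv", index=False)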

Learning: Open data can be both an input to, and an output from, evaluation.

Embeddable tools and widgets

Working with open data allows you to present one interpretation or analysis of some data, while also letting users of your website or resources dig more deeply into the data and find their own angles, interpretations, or specific facts.

When you add a ‘Gadget’ chart to a Google Spreadsheet of data you can often turn it into a widget to embed in a third party website. Using some of the interactive gadgets allows you to make data available in more engaging ways.

Platforms like IBM’s Many Eyes also let you create interactive graphs that users can explore.

Sometimes, interactive widgets might already be available, as in the case of the interactive population pyramids from the ONS. The Nominet Trust state of the art review on ageing and use of the internet includes a static image of a population pyramid, but many readers could find the interactive version more useful.

Learning: If you have data in a report, or on a web page, you can make it interactive by publishing it as open data, and then using embeddable widgets.

Looking ahead

The Open Data Day ended with a look at some of the different ways to take forward learning from our pilots and prototypes. The possibilities included:

Sooner

  • Quick wins: Making funded project data available as structured open data. As this information is already published online, there are no privacy issues with making it available in a more structured format.
  • Developing small prototypes: taking the very rough proof-of-concept ideas from the Open Data Day on a stage, and using them to inform plans for future developments. Some of the prototypes might be interactive widgets.
  • A ‘fact check’ experiment: taking a couple of past grant applications, and using open data resources to fact-check the claims made in those applications. Reflecting on whether this process offers useful insights and how it might form part of future processes.
  • Commissioning open data along with research: when Nominet Trust commissions future State of the Art reviews it could include a request for the researcher to prepare a list of relevant open datasets as well, or to publish data for the report as open data.

Later

  • Explore open data standards such as the International Aid Transparency Initiative Standard for publishing project data in a more detailed form.
  • Building our own widgets and tools: for example, tools to help applicants find relevant open data to support their application, or tools to give trustees detailed information on applicant organisations to help their decision making.
  • Building generalisable tools and contributing to the growth of a common resource of software and tools for working with open data, as well as just building things for direct organisational use.

Where next?

This was just the second of a series of Open Data Days supported by Nominet Trust. I’m facilitating one more next month, and a team of other consultants is working with a varied range of other charities over the coming weeks. So far I’ve been getting a sense of the wide range of possible areas open data can fit into charity work (it feels quite like exploring how social media could work for charities did back in 2007/8…), but there’s also much work to be done identifying some of the challenges that charities might face, and sustainable ways to overcome them. Lots more to learn…

Evaluating the Autumn Statement Open Data Measures

[Summary: Is government meeting the challenge of building an open data infrastructure for the UK? A critical look at the Autumn Statement Open Data Measures.]

For open data advocates, the Chancellor’s Autumn Statement, published on Tuesday, underlined how far open data has moved from a niche concern of geeks to an increasingly common element in government policy. The statement itself included a section announcing new data, and renewing the argument that Public Sector Information (PSI) can play a role in both economic growth and public service standards.

1.125 Making more public sector information available will help catalyse new markets and innovative products and services as well as improving standards and transparency in public services. The Government will open up access to core public datasets on transport, weather and health, including giving individuals access to their online GP records by the end of this Parliament. The Government will provide up to £10 million over five years to establish an Open Data Institute to help industry exploit the opportunities created through release of this data

And accompanying this, the Cabinet Office published a paper of Further Detail on Open Data Measures in the Autumn Statement, including an update on the fate of the proposed Public Data Corporation consulted on earlier in the year. Although this paper includes a number of positive announcements when it comes to the release of new datasets, such as detailed transport and train timetable data, the overall document shows that government continues to fudge key reforms to bring the UK’s open data infrastructure into the 21st century, and displays some worrying (though perhaps unsurprising) signs of open data rhetoric being hijacked to advance non-open personal data sharing projects, and highly political uses of selective open data release.

In order to put forward a constructive critique, let us take the government’s intent at face value (the intent to use PSI and open data to promote economic growth, and to improve standards in public services), and then suggest where the Open Data Measures either fall short of this, or should otherwise give cause for concern.

A strategic approach to data?

Firstly, let’s consider the particular datasets being made available: there are commitments to provide train and bus timetable information, highways and traffic data, Land Registry ‘price paid’ data, Met Office weather data and Companies House datasets, all under some form of open license. However, the commitments on other datasets, such as key Ordnance Survey mapping data, train ticket price data, and the National Address Gazetteer, are much more limited, with only a ‘developers preview’ of the gazetteer being suggested. There appears to be little coherence to what is being made available as open data, nor a clear assessment of how the particular datasets in question will support economic development and public accountability. If we take seriously the idea that open government data provides key elements of infrastructure for both enterprise and civic engagement in a digital economy, then we need a clear strategic approach to build and invest in that infrastructure: focussing attention on the datasets that matter most, rather than seeing piecemeal release of data [1].

Clear institutional arrangements and governance?

Secondly, although the much-disliked ‘Public Data Corporation’ proposal, to integrate the main trading funds and establish a common (and non-open) regime for their data, has disappeared from the Measures, the alternative institutional arrangements currently appear inadequate to meet the key goals of releasing infrastructure data to support economic development, and of removing the inefficiencies in the current system, which has government buying data off itself, reducing usage and limiting innovation.

The Open Data Measures propose the creation of a ‘Public Data Group’ (PDG) to include the trading funds, which retain their trading role, selling core data and value-added services, although with a new responsibility to collaborate better and drive efficiency. The responsibility to promote the availability of open data is split off to a ‘Data Strategy Board’ (DSB) which, in the current proposal, will receive a subsidy in its first year to ‘buy’ data from the PDG for the public, and will in future years rely for its funding on a proportion of the dividends paid by the PDG. It is notable that the DSB is only responsible for ‘commissioning and purchasing of data for free release’ and not for ‘open’ release (the difference is in the terms of re-use of the data), which may mean in effect that the DSB is only able to ‘rent’ data from the PDG, or that any data it is able to release will be a snapshot-in-time extract of core reference data, not a sustainable move of core reference data into the public domain.

So, in effect, whilst the PDC has disappeared, and there is a split between a body with an interest in maximising return on data (the PDG) and a body charged with increasing the supply of public data (the DSB), the body seeking public data will be reliant upon the profitability of the PDG for the funding it needs to secure the release of data that, if properly released in free forms, would likely undermine the PDG’s current trading revenue model. That doesn’t look like the foundation for very independent and effective governance or regulation to open up core reference data!

Furthermore, whilst the proposed terms for the DSB state that “Data users from outside the public sector, including representatives of commercial re-users and the Open Data community, will represent at least 30% of the members of DSB”, there are challenges ahead to ensure that data users from civil society are represented on the board, including established civil society organisations from beyond the technology-centric element of the open data community. The local authority and government members of the board will not be ‘open data’ people, but simply data people, who want better access to the resources they may already be using; we should be identifying similar actors from civil society to participate, understanding the role of the DSB as one of data governance through the framework of an open data strategy.

Open data as a cloak for personal data projects and political agendas?

Thirdly, turning to some of the other alarm bells that ring in the Open Data Measures: the first measures in the Cabinet Office’s paper are explicitly not about open data as public data, but about the restricted sharing of personal medical records with life-science research firms, with the intent of developing this sector of the economy. With a small nod to “identifying specified datasets for open publication and linkage”, the proposals are more centrally concerned with supporting the development of a Clinical Practice Research Datalink (CPRD) containing interlinked ‘unidentifiable, individual level’ health records; by this I understand records that link a particular individual to a set of data points recorded on them in primary and secondary care, without the identity of the person being revealed.

The place of this in open data measures raises a number of questions, such as whether the right constituencies have been consulted on these measures, and why such a significant shift in how the NHS may be handling citizens’ personal data is included in proposals unlikely to be heavily scrutinised by patient groups. In the past, open data policies have been very clear that ‘personal data’ is out of scope – and the confusion here creates risks to public confidence in the open data agenda. Leaving this issue aside for the moment, we also need to critically explore the evidence that the release of detailed health data will “reinforce the UK’s position as a global centre for research and analytics and boost UK life sciences”. In theory, if life science data is released digitally and online, then the firms that can exploit it are not only UK firms – the return on the release of UK citizens’ personal data could be gained anywhere in the world where the research skills to work with it exist.

When we look at the other administrative datasets proposed for release in the Measures, the politicisation of open data release is evident: Fit Note data, Universal Credit data and welfare data (again discussed in terms of ‘linking’, implying we’re not just talking about aggregate statistics) are all proposed for increased release, with specific proposals to “increase their value to industry”. By contrast, there is no mention of releasing more detail on the tax share paid by corporations, on where the UK issues arms export licenses, or on which organisations are responsible for the most employment law violations. Although the stated aims of the Measures include increasing “transparency and accountability”, it would not be unreasonable to read the detail of the measures as very one-sided on this point, emphasising industry exploitation of data far more than good governance and citizen rights with respect to data.

The blurring of the line between ‘personal data’ and ‘open data’, and the state’s assumption of the right to share personal data for industrial gain, should give cause for concern, and highlights the need to build a stronger constituency scrutinising government action on open data.

Building capacity to use data?

Fourthly, and perhaps most significantly if we take seriously the goal of open data leading not only to economic development but also to better public services, the measures contain little funding or support for the sorts of skills development and organisational change that will be needed for effective use of open data in the UK.

The Measures announce the creation of an Open Data Institute, with the possibility of £10m match funding over 5 years, to “help business exploit the opportunities created by release of public data”. This has the potential to fund much-needed research into the gap in understanding and practice on how to build sustainable enterprise with open data. However, beyond this, there is little in the measures to foster the development of data skills more widely in government, in the economy and in civil society.

We know that open data alone is not enough to drive innovation: it’s a raw material to be combined with others in an information economy and information society. There are significant skills development needs to equip the UK to make the most of open data – and the Measures fall short on meeting that challenge.

A constructive critique?

Many of the detailed measures from the Autumn Statement are still draft – subject to further consultation. As a package, it’s not one to be accepted or rejected out of hand. Rather, there is a need for continued engagement by a broad constituency, including members of the broad-based ‘open data community’, to address the measures one by one as government works to fill in the details over the coming months.

Footnotes

[1] An open data infrastructure: The idea of open data as digital infrastructure for the nation has a number of useful consequences. It can help us to develop our thinking about the state’s responsibility with respect to datasets. Just as, in the development of our physical infrastructure, the state invested directly in the provision of roads and railways, adopted previously privately created infrastructure (the turnpikes, for example), and encouraged private investment within frameworks of government regulation, a strategic approach to public data infrastructure would not just be about slapping an open license on pre-existing datasets, but would involve looking at a range of strategies to provide the open data foundations for economic and civic activity. Government may need to act as guarantor of specific datasets, if not as core provider. When we think about infrastructure projects, we can think critically about who benefits from particular projects, and can have an open debate about where limited state resources to support a sustainable open data infrastructure should go. The infrastructure metaphor also helps us start to distinguish different sorts of government data, recognising that performance data and personal data may need to be handled within different arrangements and frameworks from core reference data like mapping and transport systems information. In the latter case, there is a strong argument to secure a guarantee of the continued funding of these resources as public goods: free at the point of use, kept in public trust, and maintained to high standards of consistency. Other arrangements are likely to lead to over-charging and under-use of core reference datasets, with deadweight loss of benefit, particularly excluding civic uses and benefits. In the case of other datasets generated by government in the day-to-day conduct of business (performance data, aggregate medical records, etc.), it may be more appropriate to recognise that while there is benefit to be gained from their open release, both (a) for civic use and (b) for commercial use, this will vary significantly on a case-by-case basis, and the release of the data should not create an ongoing obligation on government to continue to collect and produce the data once it is no longer useful for government’s primary purpose.