NT Open Data Days: Exploring data flow in a VCO

[Summary: A practical post of notes from a charity open data day. Part reflective learning; part brain-dump; part notes for ECDP]

Chelmsford was my destination this morning for a Nominet Trust funded ‘Open Data Day’ with the Essex Coalition of Disabled People (ECDP). The Open Data Days are part of an action research exploration of how charities might engage with the growing world of open data, both as data users and publishers. You can find a bit more of the context in this post on my last Open Data Day with the Nominet Trust team.

This (rather long and detailed) post provides a run-down of what we explored on the day, as a record for the ECDP team, and as a resource for wider shared learning.

Seeking structures and managing data

For most small organisations, data management often means Excel spreadsheets, and ECDP is no exception. In fact, ECDP has a lot of spreadsheets on the go. Different teams across the organisation maintain lists of volunteers, records about service users, performance data, employment statistics, and a whole lot more, in individual Excel workbooks. Bringing that data together to publish the ‘Performance Dashboards’ that ECDP built for internal management, but that have also been shared in the open data area of the ECDP website, is a largely manual task. Across these spreadsheets it’s not uncommon to see the information on a particular topic (e.g. volunteers) spread across different tabs, or duplicated into different spreadsheets where staff have manually copied filtered extracts for particular reports. The challenge with this is that it leads the organisation’s information to fragment, and makes pulling together both internal and open data and analysis tricky. Many of the spreadsheets we found during the open day mix the ‘data layer’ with the ‘presentation’ and ‘analysis’ layers, rather than separating these out.

What can be done?

Before getting started with open data, we realised that we needed to look at the flow of data inside the organisation. So, we looked at what makes a good data layer in a spreadsheet, such as:

  • Keeping all the data of one type in a single worksheet. For example, if you have data on volunteers, all of it should be in a single sheet. Don’t start new sheets for ‘Former volunteers’ or ‘Volunteers interested in sports’, as this fragments the data. If you need to know about a volunteer’s interests, or whether they are active or not, add a column to your main sheet, and use filters (see below).
  • Having one header row of columns. You can use merged cells, sub-headings and other formatting when you present data – but when you use these in the master spreadsheet where you collect and store your data you make life trickier for the computer to understand what your data is, and to support different analysis of the data in future.
  • Including validation… Excel allows you to define a list of possible values for a cell, and provides users entering data with a drop-down box to select from instead of them typing values in by hand. This really helps increase the consistency of data. You can also validate to be sure the entry in a cell is a number, or a date, and so on. In working on some ECDP prototypes we ran up against a problem where our list of possible valid entries for a cell was too long, and we didn’t want to keep the master list of valid values in the Excel sheet our data was on, but Wizard of Excel has documented a workaround for that.
  • …but keeping some flexibility. Really strict validation has its own problems, as it can force people to twist what they wanted to record to fit a structure that doesn’t make sense, or that distorts the data. For example, in some spreadsheets we found the ‘Staff member responsible’ column often contained more than one name. We had to explore why that was, and whether the data structure needed to accommodate more than one staff member linked to a particular row in the spreadsheet. Keeping a spreadsheet structure flexible can be a matter of providing free text areas where users are not constrained in the detail they provide, and of having a flexible process to revise and update structures according to demand.
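
Excel’s validation drop-downs do this checking for you, but the underlying idea is simple enough to sketch outside Excel too. The snippet below is a minimal, hypothetical illustration (invented column names and records, not ECDP’s actual data) of checking rows against a master list of allowed values, while leaving free-text fields unconstrained:

```python
# A sketch of drop-down style validation: check each record against a
# master list of allowed values before it enters the dataset.
# Field names and example rows are invented for illustration.
ALLOWED_STATUS = {"Active", "Former", "On hold"}

def validate_row(row):
    """Return a list of problems found in one volunteer record."""
    problems = []
    if row.get("status") not in ALLOWED_STATUS:
        problems.append(f"Unknown status: {row.get('status')!r}")
    # Free-text fields (e.g. notes) are deliberately left unchecked,
    # keeping the flexibility discussed above.
    return problems

rows = [
    {"name": "A. Volunteer", "status": "Active", "notes": "Likes sports"},
    {"name": "B. Volunteer", "status": "Actve", "notes": ""},  # a typo a drop-down would have prevented
]
for r in rows:
    for p in validate_row(r):
        print(f"{r['name']}: {p}")
```

The same trade-off applies here as in the spreadsheet: validate the fields where consistency matters for analysis, and leave room elsewhere for detail that doesn’t fit the structure.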

Once you have a well structured spreadsheet (see the open data cookbook section on preparing your data if you still need to get a sense of what well structured data might look like), then you can do a lot more with it. For example:

  • Creating a pivot chart. Pivot charts are a great way to analyse data, and are well worth spending time to explore. Many of the reporting requirements an organisation has can be met using a pivot chart. For ECDP we created an example well-structured dataset of ‘Lived Experience Feedback’ – views and insights provided by service users and recorded with detailed descriptions, dates when the feedback was given, and categories highlighting the topical focus of the views expressed. We made all this data into an Excel list, which allowed us to add a formula that would apply to every row and that used the =MONTH() function to extract the month from the dates given in each row. Creating a pivot chart from this list, we could then drill down to find figures such as the number of Lived Experience reports provided to the Insight team and relating to ‘Employment’ in any given month.
  • Creating filtered lists and dashboards. It can seem counterintuitive to an organisation which mostly wants to see data in separate lists by area, or organisational team, to put all the data for those areas and teams into one spreadsheet, with just a column to flag up which team or area a row relates to. That’s why spreadsheets often end up with different tabs for different teams – where the same sort of data is spread across them. Using formulae to create thematic lists and dashboards can be a good way to keep teams happy, whilst getting them to contribute to a single master list of data. (We spent quite a lot of time on the open data day thinking about the importance of motivating staff to provide good quality data, and the need to make the consequences of providing good data visible.) Whilst the ‘Autofilter’ feature in Excel can be used to quickly sub-set a dataset to get just the information you are interested in, when we’re building a spreadsheet to be stored on a shared drive, and used by multiple teams, we want to avoid confusion when the main data sheet ends up with filters applied. So instead we used simple cross-sheet formulae (e.g. if your main data sheet is called ‘Data’, then put =’Data’!A1 in the top-left cell of a new sheet, and then drag it out) to create copies of the master sheet, and then applied filters to these. We included a big note on each of these extra sheets to remind people that any edits should be made to the master data, not these lists.
  • Linking across spreadsheets. Excel formulae can be used to point to values not just in other sheets, but also to values in other files. This makes it possible to build a dashboard that automatically updates by running queries against other sheets on a shared drive. Things get even more powerful when you are able to publish datasets to the web as open data, when tools like Google Docs have the ability to pull in values and data across the web, but even with non-open data in an organisation, there should be no need to copy and paste values that could be transferred dynamically and automatically.
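
The =MONTH()-column-plus-pivot-chart pattern described above boils down to grouping rows by values derived from their fields. As a rough stdlib sketch (with invented records, not the real Lived Experience data), the same count-per-month-and-topic figures look like this:

```python
from collections import Counter
from datetime import date

# A rough sketch of what the =MONTH() helper column plus a pivot table
# gave us in Excel: counting 'Lived Experience' rows per (month, topic).
# The records here are invented for illustration.
records = [
    {"date": date(2012, 1, 9),  "topic": "Employment"},
    {"date": date(2012, 1, 23), "topic": "Transport"},
    {"date": date(2012, 2, 2),  "topic": "Employment"},
]

# Group by (month extracted from the date, topic category).
pivot = Counter((r["date"].month, r["topic"]) for r in records)

# e.g. how many Employment reports were given in January?
print(pivot[(1, "Employment")])
```

The pivot chart does exactly this kind of derive-then-group work interactively, which is why a single well-structured sheet with a date column is enough to answer month-by-month questions without separate monthly tabs.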

Of course, when you’ve got lots of legacy spreadsheets around, making the shift to more structured data, separating the data, analysis and presentation layers, can be tricky. Fortunately, some of the common tools in the open data wrangler’s toolbox come in handy here.

To move from a spreadsheet with similar data spread across lots of different tabs (one for each team that produced that sort of data), to one with consistent and standardised data, we copied all the data into a single sheet with one header row, and a new column indicating the ‘team’ that row was from (we did this by saving each of the sheets as .csv files, and using the ‘cat’ command on Mac OS X to combine these together, but the same effect can be achieved with copy and paste).
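
For anyone wanting a more repeatable version of that step than ‘cat’ or copy-and-paste, here is a small sketch of the same merge in Python. The “files” are inline strings so the example is self-contained; in practice you would read each exported .csv from disk, and the column names are illustrative:

```python
import csv
import io

# Combine per-team CSV exports into one master table, adding a 'team'
# column recording where each row came from - the same merge we did
# with 'cat', but keeping the provenance of every row.
team_files = {
    "Insight":    "name,date\nAlice,2012-01-05\n",
    "Employment": "name,date\nBob,2012-01-12\n",
}

combined = []
for team, content in team_files.items():
    for row in csv.DictReader(io.StringIO(content)):
        row["team"] = team  # flag the source team on every row
        combined.append(row)

print(len(combined))  # one master list instead of one tab per team
```

The ‘team’ column is the key move: it is what lets filtered per-team views be rebuilt later from the single master list.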

We then turned to the open data wrangler’s power tool Google Refine (available as a free download) to clean up the data. We used ‘text facets’ to see where people had entered slightly different names for the same area or theme, and made bulk edits to these, and used some replacement patterns to tidy up date values.
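
To give a feel for what that faceting-and-clustering step does, here is a toy Python version. Refine’s clustering is considerably more sophisticated; this sketch (with invented place names) just maps each value onto the closest canonical name above a similarity cutoff:

```python
import difflib

# A toy version of what Google Refine's text facets + bulk edits helped
# us do: map slightly different spellings of the same area name onto a
# single canonical value, so facet counts don't fragment.
canonical = ["Chelmsford", "Colchester", "Basildon"]

def tidy(value):
    """Return the closest canonical name, or the value unchanged."""
    match = difflib.get_close_matches(value, canonical, n=1, cutoff=0.8)
    return match[0] if match else value

print(tidy("Chemlsford"))  # a transposition typo -> Chelmsford
```

Values that don’t resemble anything on the canonical list are left alone, which mirrors Refine’s behaviour of only merging values you confirm.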

We then took this data back into Excel to build a master spreadsheet, with one single ‘Data’ sheet, and separate sheets for pivot chart reports and filtered lists.

The whole process once started took an hour or so, but once complete, we had a dataset that could be analysed in many more ways than before, and we had the foundations for building both better internal data flows, and for extracting open data to share.

Heading towards a CRM

As much as, with the appropriate planning, discipline and stewardship, Excel can be used to manage a lot of the data an organisation needs, we also explored the potential to use a fully-featured ‘Contact Relationship Management’ (CRM) system to record information right across the organisation.

Even when teams and projects in an organisation are using well structured spreadsheets, there are likely to be overlaps and links between their datasets that are hard to make unless they are all brought into one place. For example, two teams might be talking to the same person, but if one knows the person as Mr Rich Watts, and the other records R.Watts, bringing this information together is tricky. A CRM is a central database (often now accessed over the web) which keeps all this information in one place.
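
The Rich Watts example shows why this matching is hard to do by eye across spreadsheets. A crude sketch of the kind of normalisation involved (a real CRM de-duplicates far more carefully, using addresses, emails and fuzzy matching) reduces each name to a (first initial, surname) key:

```python
import re

# A sketch of contact matching across spreadsheets: strip titles,
# then reduce a name to (first initial, surname). Crude, but it
# catches the 'Mr Rich Watts' vs 'R.Watts' case from the text.
def name_key(raw):
    cleaned = re.sub(r"\b(mr|mrs|ms|dr)\.?\s*", "", raw.strip(), flags=re.I)
    parts = [p for p in re.split(r"[.\s]+", cleaned) if p]
    return (parts[0][0].upper(), parts[-1].title())

print(name_key("Mr Rich Watts") == name_key("R.Watts"))  # both -> ('R', 'Watts')
```

Even this toy version shows the appeal of a CRM: do the matching once, centrally, rather than leaving every team to reconcile names by hand.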

Modern CRM systems can be set up to track all sorts of interactions with volunteers, customers or service users, both to support day to day operations, and to generate management information. We looked at the range of CRM tools available, from the Open Source ‘CiviCRM’ which has case tracking modules that may be useful to an organisation like ECDP, through to tools like Salesforce, which offer discounts to non-profits. Most CRM solutions have free online trials. LASA’s ICT Knowledge Base is a great place to look for more support on exploring options for CRM systems.

In our open data day we discussed the importance of thinking about the ‘user journey’ that any database needs to support, and ensuring the databases enable, rather than constrain, staff. Any process of implementing a new database is likely to involve some changes in staff working practices too, so it’s important to look at the training and culture change components as well as the technical elements. This is something true of both internal data, and open data, projects.

When choosing CRM tools it’s important to think about how a system might make it possible to publish selected information as open data directly in future, and how they might be able to pull in open data.

Privacy Matters

Open data should not involve the release of people’s personal data. To make open data work, a clear line needs to be drawn between data that identifies and is about individuals, and the sorts of non-personal data that an organisation can release as open data.

Taking privacy seriously matters:

  • Data anonymisation cannot be relied upon. Studies conclusively show that we should not put our faith in anonymisation to protect individuals’ identities in published datasets. It’s not enough to simply remove names or dates of birth from a dataset before publishing it.
  • Any release of data drawn from personal data needs to follow from a clear risk assessment. It’s important to consider what harm could result from the release of any dataset. For example, if publishing a dataset that contains information on reported hate crime by post-code area, could a report be traced back to an individual, and could this lead to negative consequences for them?
  • It’s important to be aware of jigsaw re-identification risks. Jigsaw re-identification is the risk that putting together two open datasets will allow someone to unlock previously anonymised personal data. For example, if you publish one open dataset that maps where users of your service are, and includes data on types of disability, and you publish another dataset that lists reports of hate-crime by local area, could these be combined to discover the disability of the person who reported hate crime in a particular area, and then, perhaps combined with some information from a social network like Facebook or Twitter, to identify that person?
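
The jigsaw risk is easier to see with a concrete (entirely invented) example. Neither of the two toy datasets below names anyone, but joining them on postcode area can leave only one candidate:

```python
# A toy illustration of jigsaw re-identification. All data is invented.
# Dataset 1: service users by postcode area (no names).
service_users = [
    {"postcode_area": "CM1", "disability": "Visual impairment"},
    {"postcode_area": "CM1", "disability": "Mobility"},
    {"postcode_area": "CM2", "disability": "Hearing impairment"},
]
# Dataset 2: hate-crime reports by postcode area (no names either).
hate_crime_reports = [{"postcode_area": "CM2"}]

for report in hate_crime_reports:
    candidates = [u for u in service_users
                  if u["postcode_area"] == report["postcode_area"]]
    if len(candidates) == 1:
        # Only one service user in that area: the join has narrowed
        # an 'anonymous' report down to a single person's record.
        print("Re-identified:", candidates[0])
```

This is why risk assessment has to consider combinations of datasets, not each release in isolation: small cell counts in one dataset become identifying once a second dataset supplies the join key.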

Privacy concerns don’t mean that it’s impossible to produce open data from internal datasets of personal information, but care has to be taken. There can be tension between the utility of open data, and the privacy of personal data in a dataset. Organisations need to be careful to ensure privacy concerns and the rights of service users always come first.

With the ECDP data on ‘Lived Experience’ we looked at how Google Refine could be used to extract from the data a list of ‘PCT Areas’ and ‘Issue Topics’ reported by service users, to map where the hot-spots were for particular issues at the PCT level. Whilst drawn from a dataset with personal information, this dataset would not include any Personally Identifying Information, and might be possible to publish as open data.
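
The aggregation step we did in Refine can be sketched in a few lines of Python (field names and records invented): keep only counts per area and topic, and drop every personal field before anything leaves the organisation:

```python
from collections import Counter

# A sketch of the aggregation step: from records containing personal
# details, derive only counts per (PCT area, issue topic) for
# publication. Names and records are invented for illustration.
raw = [
    {"name": "A", "pct_area": "Mid Essex",  "issue": "Transport"},
    {"name": "B", "pct_area": "Mid Essex",  "issue": "Transport"},
    {"name": "C", "pct_area": "West Essex", "issue": "Housing"},
]

# Count occurrences; personal fields never make it into the output.
hotspots = Counter((r["pct_area"], r["issue"]) for r in raw)
for (area, issue), count in sorted(hotspots.items()):
    print(area, issue, count)
```

Even with aggregates like these, the jigsaw warnings above still apply: very small counts for a rare issue in a small area can themselves be identifying, so a risk assessment is still needed before publishing.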

Open data fusion

Whilst a lot of our ‘open data day’ was spent on the foundations for open data work, rather than open data itself, we did work on one small project which had an immediate open data element.

Rich Watts brought to the session a spreadsheet of 250 Disabled People’s User Led Organisations (DPULOs), and wanted to find out (a) how many of these organisations were charities; and (b) what their turnover was. Fortunately, Open Charities has gathered exactly the data needed to answer that question as open data, and so we ran through how Google Fusion Tables could be used to merge Rich’s spreadsheet with existing charity data (see this How To for an almost identical project with Esmee Fairbairn grants data), generating the dataset needed to answer these questions in just under 10 minutes.
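
At its core, what Fusion Tables did for us is a join: match each DPULO against the charity data on a shared key. A minimal sketch (invented charity numbers and fields, standing in for the Open Charities data) looks like this:

```python
# A sketch of the Fusion Tables merge: join a list of DPULOs against
# charity records on charity number. All numbers and fields here are
# invented; the real exercise used Open Charities open data.
dpulos = [
    {"name": "Example DPULO A", "charity_no": "1100001"},
    {"name": "Example DPULO B", "charity_no": None},  # not a registered charity
]
charity_data = {"1100001": {"income": 250_000}}

for org in dpulos:
    match = charity_data.get(org["charity_no"])
    org["is_charity"] = match is not None
    org["income"] = match["income"] if match else None

print(sum(o["is_charity"] for o in dpulos))  # how many are charities
```

The point of the open data here is the key: because Open Charities publishes records against standard charity numbers, anyone’s spreadsheet can be enriched by the same join without negotiating access to the data.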

We discussed how Rich might want to publish his spreadsheet of DPULOs as open data in future, or to contribute information on the fact that certain charities are Disabled People’s User Led Organisations back to an open data source like Open Charities.

Research Resources

The other element of our day was an exploration of online data sources useful in researching a local area, led fantastically by Matthew of PolicyWorks.

Many of the data sources Matthew was able to point to for finding labour market information, health statistics, demographic information and other stats provide online access to datasets, but don’t offer this as ‘open data’ that would meet the OKF’s open definition requirements. This raises some interesting questions about the balance between a purist approach to open data, and an approach that looks for data which is ‘open enough’ for rough-and-ready research.

Where next?

Next week Nominet, NCVO and Big Lottery Fund are hosting a conference to bring together learning from all the different Open Data Days that have been taking place. The day will also see the release of a report on the potential of open data in the charity sector.

For me, today’s open data day has shown that we need to recognise some of the core data skills that organisations will need to benefit from open data. Not just skills to use new online tools, but skills to manage the flow of data internally, and to facilitate good data management. Investment in these foundations might turn out to be pivotal for realising open data’s third-sector potential…

5-Stars of Open Data Engagement?

[Summary: Notes from a workshop at UKGovCamp that led to sketching a framework for the sorts of engagement activity an open data initiative might contain]

Update: The 5 Stars of Open Data Engagement now have their own website at http://www.opendataimpacts.net/engagement/.

In short

* Be demand driven

* * Provide context

* * * Support conversation

* * * * Build capacity & skills

* * * * * Collaborate with the community

The Context

I’ve spent the last two days at UKGovCamp, an annual open-space gathering of people from inside and around local and national government passionate about using digital technologies for better engagement, policy making and practice. This year’s event was split over two days: Friday for conversations and short open-space slots; Saturday for more hands-on discussions and action. Suffice to say, there were plenty of sessions on open data on both days – and this afternoon we tried to take forward some of the ideas from Day 1 about open data engagement in a practical form.

There is a general recognition of the gap between putting a dataset online, and seeing data driving real social change. In a session on Day 1 led by @exmosis, we started to dig into different ways to support everyday engagement with data, leading to Antonio from Data.gov.uk suggesting that open data initiatives really needed to have some sort of ‘Charter of engagement’ to outline ways they can get beyond simply publishing datasets, and get to supporting people to use data to create social, economic and administrative change. So, we took that as a challenge for Day 2, and in a session on ‘designing an engaging open data portal’ a small group of us (including Liz Stevenson, Anthony Zacharzewski, Jon Foster and Jag Goraya) started to sketch what a charter might look like.

You can see the (still developing) charter draft in this Google Doc. However, it was Jag Goraya‘s suggestion that the elements of a charter we were exploring might also be distilled into a ‘5 Stars’ that seemed to really make some sense of the challenge of articulating what it means to go beyond publishing datasets to do open data engagement. Of course, 5-star rating scales have their limitations, but I thought it worth sharing the draft that was emerging.

What is Open Data Engagement?

We were thinking about open data engagement as the sorts of things an open data initiative should be doing beyond just publishing datasets. The engagement stars don’t relate to the technical openness or quality of the datasets (there are other scales for that), and are designed to be flexible to be able to apply to a particular dataset, a thematic set of datasets, or an open data initiative as a whole.

We were also thinking about open government data in our workshop; though hopefully the draft has wider applicability. The ‘overarching principles’ drafted for the Charter might also help put the stars in context:

Key principles of open government data: “Government information and data are common resources, managed in trust by government. They provide a platform for public service provision, democratic engagement and accountability, and economic development and innovation. A commitment to open data involves making information and data resources accessible to all without discrimination; and actively engaging to ensure that information and data can be used in a wide range of ways.”

Draft sketch of five stars of Open Data Engagement

The names and explanatory text of these still need a lot of work; you can suggest edits as comments in the Google Doc where they were drafted.

* Be demand driven

Are your choices about the data you release, how it is structured, and the tools and support provided around it based on community needs and demands? Have you got ways of listening to people’s requests for data, and responding with open data?

** Provide good meta-data; and put data in context

Does your data catalogue provide clear meta-data on datasets, including structured information about frequency of updates, data formats and data quality? Do you include qualitative information alongside datasets, such as details of how the data was created, or manuals for working with the data? Do you link from data catalogue pages to analysis your organisation, or third parties, have already carried out with the data, or to third-party tools for working with the data?

Often organisations already have detailed documentation of datasets (e.g. analysis manuals and How To’s) which could be shared openly with minimal edits. It needs to be easy to find these when you find a dataset. It’s also common that governments have published analysis of the datasets (they collected it for a reason), or used it in some product or service, and so linking to these from the dataset (and vice-versa) can help people to engage with it.

*** Support conversation around the data

Can people comment on datasets, or create a structured conversation around data to network with other data users? Do you join the conversations? Are there easy ways to contact the individual ‘data owner’ in your organisation to ask them questions about the data, or to get them to join the conversation? Are there offline opportunities to have conversations that involve your data?

**** Build capacity, skills and networks

Do you provide or link to tools for people to work with your datasets? Do you provide or link to How To guidance on using open data analysis tools, so people can build their capacity and skills to interpret and use data in the ways they want to? Are these links contextual (e.g. pointing people to GeoData tools for a geo dataset, and to statistical tools for a performance monitoring dataset)? Do you go out into the community to run skill-building sessions on using data in particular ways, or using particular datasets? Do you sponsor or engage with community capacity building?

When you give people tools – you help them do one thing. When you give people skills, you open the possibility of them doing many things in future. Skills and networks are more empowering than tools. 

***** Collaborate on data as a common resource

Do you have feedback loops so people can help you improve your datasets? Do you collaborate with the community to create new data resources (e.g. derived datasets)? Do you broker or provide support to people to build and sustain useful tools and services that work with your data?

It’s important for all the stars that they can be read not just with engaging developers and techies in mind, but also community groups, local councillors, individual non-techie citizens etc. Providing support for collaboration can range from setting up source-code sharing space on GitHub, to hanging out in a community centre with print-outs and post-it notes. Different datasets, and different initiatives will have different audiences and so approaches to the stars – but hopefully there is a rough structure showing how these build to deeper levels of engagement.

Where next?

Hopefully Open Data Sheffield will spend some time looking at this framework at a future meeting – and all comments are welcome on the Google Doc. Clearly there’s lots to be done to make these more snappy, focussed and neat – but if we do find there’s a fairly settled sense of a five stars of engagement framework (if not yet good language to express it) then it would be interesting to think about whether we have the platforms and processes in place anywhere to support all of this: finding the good practice to share. Of course, there might already be a good engagement framework out there we missed when sketching this all out – so comments to that effect welcome too…



Amended 22nd January to properly credit Antonio of Data.gov.uk as originator of the Charter idea

Exploring Open Charity Data with Nominet Trust

[Summary: notes from a pilot one-day working on open data opportunities in third-sector organisations]

On Friday I spent the day with Nominet Trust for the second of a series of charity ‘Open Data Days’ exploring how charities can engage with the rapidly growing and evolving world of open data. The goal of these hands-on workshops is to spend just one working day looking at what open data might have to offer to a particular organisation and, via some hands-on prototyping and skill-sharing, to develop an idea of the opportunities and challenges that the charity needs to explore to engage more with open data.

The results of ten open data days will be presented at a Nominet Trust, NCVO and Big Lottery Fund conference later in the year, but for now, here’s a quick run-down / brain-dump of some of the things explored with the Nominet Trust team.

What is Open Data anyway?

Open data means many different things to different people – so it made sense to start the day looking at different ways of understanding open data, and identifying the ideas of open data that chimed most with Ed and Kieron from the Nominet Trust Team.

The presentation below runs through five different perspectives on open data, from understanding open data as a set of policies and practices, to looking at how open data can be seen as a political movement or a movement to build foundations of collaboration on the web.

Reflecting on the slides with Ed and Kieron highlighted that the best route into exploring open data for Nominet Trust was looking at the idea that ‘open data is what open data does’ which helped us to set the focus for the day on exploring practical ways to use open data in a few different contexts. However, a lot of the uses of open data we went on to explore also chime in with the idea of a technical and cultural change that allows people to perform their own analysis, rather than just taking presentations of statistics and data at face value.

Mapping opportunities for open data

Even in a small charity there are many different places open data could have an impact. With Nominet Trust we looked at a number of areas where data is in use already:

  • Informing calls for proposals – Nominet Trust invite grant applications for ideas that use technology for disruptive innovation in a number of thematic areas, with two main thematic areas of focus live at any one time. New thematic areas of focus are informed by ‘State of the Art’ review reports. Looking at one of these it quickly becomes clear these are data-packed resources, but that the data, analysis and presentation are all smushed together.
  • Throughout the grant process – Nominet Trust are working not only to fund innovative projects, but also to broker connections between projects and to help knowledge and learning flow between funded projects. Grant applications are made online, and right now, details of successful applicants are published on the Trust’s websites. A database of grant investment is used to keep track of ongoing projects.
  • Evaluation – the Trust are currently looking at new approaches to evaluating projects, and identifying ways to make sure evaluation contributes not only to an organisations own reflections on a project, but also to wider learning about effective responses to key social issues.

With these three areas of data focus, we turned to identify three data wishes to guide the rest of the open data day. These were:

  • Being able to find the data we need when we need it
  • Creating actionable tools that can be embedded in different parts of the grant process – and doing this with open platforms that allow the Nominet Trust team to tweak and adapt these tools.
  • Improving evaluation – with better data in, and better data out

Pilots, prototypes and playing with data

The next part of our Open Data Day was to roll up our sleeves and try some rapid experiments with a wide range of different open data tools and platforms. Here are some of the experiments we tried:

Searching for data

We imagined a grant application looking at ways to provide support to young people not in education, employment or training in the Royal Borough of Kensington and Chelsea, and set the challenge of finding data that could support the application, or that could support evaluation of it. Using the Open Data Cook Book guide to sourcing data, Ed and Kieron set off to track down relevant datasets, eventually arriving at a series of spreadsheets on education stats in London on the London Skills and Employment Observatory website via the London Datastore portal. Digging into the spreadsheets allowed the team to put claims that could be made about levels of education and employment exclusion in RBKC in context, looking at the different interpretations that might be drawn from claims made about trends and percentages, and claims about absolute numbers of young people affected.

Learning: The data is out there; and having access to the raw data makes it possible to fact-check claims that might be made in grant applications. But, the data still needs a lot of interpretation, and much of the ‘open data’ is hidden away in spreadsheets.

Publishing open data

Most websites are essentially databases of content with a template to present them to human readers. However, it’s often possible to make the ‘raw data’ underlying the website available as more structured, standardised open data. The Nominet Trust website runs on Drupal and includes a content type for projects awarded funding, which includes details of the project, its website address, and the funding awarded.

Using a demonstration Drupal website, we explored how, with the Drupal Views and Views Bonus Pack open source modules, it was easy to create a ‘CSV’ open data download of information in the website.

The sorts of ‘projects funded’ open data this would make available from Nominet Trust might be of interest to sites like OpenlyLocal.com which are aggregating details of funding to many different organisations.

Learning: You can become an open data publisher very easily, and by hooking into existing places where ‘datasets’ are kept, keeping your open data up-to-date is simple.

Mashing-up datasets

Because open datasets are often provided in standardised forms, and the licenses under which data is published allow flexible re-use of the data, it becomes easy to mash-up different datasets, generating new insights by combining different sources.

We explored a number of mash-up tools. Firstly, we looked at using Google Spreadsheets and Yahoo Pipes to filter a dataset ready to combine it with other data. The Open Data Cook Book has a recipe that involves scraping data with Google Spreadsheets, and a Yahoo Pipes recipe on combining datasets.

Then we turned to the open data power tool that is Google Refine. Whilst Refine runs in a web browser, it is software you install on your own computer, and it keeps the data on your machine until you publish it – making it a good tool for a charity to use to experiment with their own data, before deciding whether it will be published as open data or not.

We started by using Google Refine to explore data from OpenCharities.org – taking a list of all the charities with the word ‘Internet’ in their description that had been exported from the site, and using the ‘Facets’ feature (and a Word Facet) in Google Refine to look at the other terms they used in their descriptions. Then we turned to a simple dataset of organisations funded by Nominet Trust, and explored how, by using API access to OpenlyLocal.com’s spending dataset, we could get Google Refine to fetch details of which Nominet Trust funded organisations had also received money from particular local authorities or big funders like Big Lottery Fund and the Arts Council. This got a bit technical, so a step-by-step How To will have to wait – but the result was an interesting indication of some of the organisations that might turn out to be common co-funders of projects with Nominet Trust – a discovery enabled by those funders making their funding information available as open data.

Learning: Mash-ups can generate new insights – although many mash-ups still involve a bit of technical heavy-lifting and it can take some time to really explore all the possibilities.

Open data for evaluation

Open data can be both an input and an output of evaluation. We looked at a simple approach using Google Spreadsheets to help a funder create online evaluation tools for funded projects.

With a Google Docs account, we looked at creating a new ‘Form’. Google Forms are easy to create, and let you design a set of simple survey elements that a project can fill in online, with the results going directly into an online Google Spreadsheet. In the resulting spreadsheet, we added an extra tab for ‘Baseline Data’, and explored how the =ImportData() formula in Google Spreadsheets can be used to pull in CSV files of open data from a third party, keeping a sheet of baseline data up-to-date. Finally, we looked at the ‘Publish as a Web Page’ feature of Google Spreadsheets, which makes it possible to provide a simple CSV file output from a particular sheet.
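
In spirit, =ImportData() simply fetches a CSV file from a URL and splits it into rows and columns. A rough Python equivalent, parsing an inline CSV string here rather than fetching a live URL (the column names and figures are made up for illustration):

```python
import csv
import io

# Sketch of what Google Spreadsheets' =ImportData() formula does:
# fetch a published CSV and turn it into rows. In practice you
# would pass the 'Publish as a Web Page' CSV link and fetch it
# over the network; here we parse an inline string instead.

published_csv = """area,claimant_count
Chelmsford,1200
Colchester,1550
"""

def import_data(csv_text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

baseline = import_data(published_csv)
print(baseline[0]["area"])  # Chelmsford
```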

In this way, we saw that a funder could create an evaluation form template for projects in a Google Form/Spreadsheet, and with shared access to this spreadsheet, could help funded projects to structure their evaluations in ways that helped cross-project comparison. By using formulae to move a particular sub-set of the data to a new sheet in the Spreadsheet, and then using the ‘Publish as a Web Page’ feature, non-private information could be directly published as open data from here.
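
That ‘move a sub-set of the data to a new sheet’ step is, in essence, a filter that strips out anything private before publication. A toy sketch in Python, with invented field names:

```python
# Sketch: keeping only the non-private fields of evaluation
# responses before publishing them as open data. Field names
# and values are invented for illustration.

responses = [
    {"project": "A", "outcome_score": 4, "contact_email": "a@example.org"},
    {"project": "B", "outcome_score": 5, "contact_email": "b@example.org"},
]

PUBLIC_FIELDS = ["project", "outcome_score"]  # drop personal data

def public_subset(rows):
    """Keep only the fields safe to publish as open data."""
    return [{k: row[k] for k in PUBLIC_FIELDS} for row in rows]

print(public_subset(responses))
```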

Learning: Open data can be both an input to, and an output from, evaluation.

Embeddable tools and widgets

Working with open data allows you to present one interpretation or analysis of some data, while also allowing users of your website or resources to dig more deeply into the data and find their own angles, interpretations, or specific facts.

When you add a ‘Gadget’ chart to a Google Spreadsheet of data you can often turn it into a widget to embed in a third party website. Using some of the interactive gadgets allows you to make data available in more engaging ways.

Platforms like IBM’s Many Eyes also let you create interactive graphs that users can explore.

Sometimes, interactive widgets might already be available, as in the case of Interactive Population pyramids from ONS. The Nominet Trust state of the art review on Aging and use of the Internet includes a static image of a population pyramid, but many readers could find the interactive version more useful.

Learning: If you have data in a report, or on a web page, you can make it interactive by publishing it as open data, and then using embeddable widgets.

Looking ahead

The Open Data Day ended with a look at some of the different ways to take forward learning from our pilots and prototypes. The possibilities included:


  • Quick wins: Making funded project data available as structured open data. As this information is already published online, there are no privacy issues with making it available in a more structured format.
  • Developing small prototypes: taking the very rough proof-of-concept ideas from the Open Data Day a stage further, and using this to inform plans for future developments. Some of the prototypes might be interactive widgets.
  • A ‘fact check’ experiment: taking a couple of past grant applications, and using open data resources to fact-check the claims made in those applications. Reflecting on whether this process offers useful insights and how it might form part of future processes.
  • Commissioning open data along with research: when Nominet Trust commissions future State of the Art reviews it could include a request for the researcher to prepare a list of relevant open datasets as well, or to publish data for the report as open data.
  • Explore open data standards such as the International Aid Transparency Initiative Standard for publishing project data in a more detailed form.
  • Building our own widgets and tools: for example, tools to help applicants find relevant open data to support their application, or tools to give trustees detailed information on applicant organisations to help their decision making.
  • Building generalisable tools and contributing to the growth of a common resource of software and tools for working with open data, as well as just building things for direct organisational use.

Where next?

This was just the second of a series of Open Data Days supported by Nominet Trust. I’m facilitating one more next month, and there is a team of other consultants working with a variety of other charities over the coming weeks. So far I’ve been getting a sense of the wide range of possible areas open data can fit into charity work (it feels quite like exploring the ways social media could work for charities did back in 2007/8…), but there’s also much work to be done identifying some of the challenges that charities might face, and sustainable ways to overcome them. Lots more to learn….

Evaluating the Autumn Statement Open Data Measures

[Summary: Is government meeting the challenge of building an open data infrastructure for the UK? A critical look at the Autumn Statement Open Data Measures.]

For open data advocates, the Chancellor’s Autumn Statement, published on Tuesday, underlined how far open data has moved from a niche issue for geeks to an increasingly common element in government policy. The statement itself included a section announcing new data, and renewing the argument that Public Sector Information (PSI) can play a role in both economic growth and public service standards.

1.125 Making more public sector information available will help catalyse new markets and innovative products and services as well as improving standards and transparency in public services. The Government will open up access to core public datasets on transport, weather and health, including giving individuals access to their online GP records by the end of this Parliament. The Government will provide up to £10 million over five years to establish an Open Data Institute to help industry exploit the opportunities created through release of this data

And accompanying this, the Cabinet Office published a paper of Further Detail on Open Data Measures in the Autumn Statement, including an update on the fate of the proposed Public Data Corporation consulted on earlier in the year. Although this paper includes a number of positive announcements when it comes to the release of new datasets, such as detailed transport and train timetable data, the overall document shows that government continues to fudge key reforms to bring the UK’s open data infrastructure into the 21st century, and displays some worrying (though perhaps unsurprising) signs of open data rhetoric being hijacked to advance non-open personal data sharing projects, and of highly political uses of selective open data release.

In order to put forward a constructive critique, let us take the government’s intent at face value (the intent to use PSI and open data to promote economic growth, and to improve standards in public services), and then suggest where the Open Data Measures either fall short of this, or should otherwise give cause for concern.

A strategic approach to data?

Firstly, let’s consider the particular datasets being made available: there are commitments to provide train and bus timetable information, highways and traffic data, Land Registry ‘price paid’ data, Met Office weather data and Companies House datasets, all under some form of open license. However, the commitments to other datasets, such as key Ordnance Survey mapping data, train ticket price data, and the national address gazetteer are much more limited, with only a limited ‘developers preview’ of the gazetteer being suggested. There appears to be little coherence to what is being made available as open data, nor a clear assessment of how the particular datasets in question will support economic development and public accountability. If we take seriously the idea that open government data provides key elements of infrastructure for both enterprise and civic engagement in a digital economy, then we need a clear strategic approach to build and invest in that infrastructure: focussing attention on the datasets that matter most rather than seeing piecemeal release of data [1].

Clear institutional arrangements and governance?

Secondly, although the much disliked ‘Public Data Corporation’ proposal – to integrate the main trading funds and establish a common (and non-open) regime for their data – has disappeared from the Measures, the alternative institutional arrangements right now appear inadequate to meet the key goals of releasing infrastructure data to support economic development, and of removing the inefficiencies in the current system, which has government buying data off itself, reducing usage and limiting innovation.

The Open Data Measures propose the creation of a ‘Public Data Group’ (PDG) to include the trading funds, which retain their trading role, selling core data and value-added services, although with a new responsibility to collaborate better and drive efficiency. The responsibility to promote availability of open data is split off to a ‘Data Strategy Board’ (DSB), which, in the current proposal, will receive a subsidy in its first year to ‘buy’ data from the PDG for the public, and will in future years rely for its funding on a proportion of the dividends paid by the PDG. It is notable that the DSB is only responsible for ‘commissioning and purchasing of data for free release’ and not for ‘open’ release (the difference is in the terms of re-use of the data), which may mean in effect that the DSB is only able to ‘rent’ data from the PDG, or that any data it is able to release will be a snapshot-in-time extract of core reference data, not a sustainable move of core reference data into the public domain.

So – in effect, whilst the PDC has disappeared, and there is a split between a body with an interest in maximising return on data (the PDG) and a body increasing the supply of public data (the DSB) – the body seeking public data will be reliant upon the profitability of the PDG for the funding it needs to secure the release of data that, if properly released in free forms, would likely undermine the PDG’s current trading revenue model. That doesn’t look like the foundation for very independent and effective governance or regulation to open up core reference data!

Furthermore, whilst the proposed terms for the DSB state that “Data users from outside the public sector, including representatives of commercial re-users and the Open Data community, will represent at least 30% of the members of DSB”, there are challenges ahead to ensure that data users from civil society are represented on the board, including established civil society organisations from beyond the technology-centric element of the open data community. The local authority or government members of the board will not be ‘open data’ people, but simply data people, who want better access to the resources they may already be using; we should be identifying similar actors from civil society to participate – understanding the role of the DSB as one of data governance through the framework of an open data strategy.

Open data as a cloak for personal data projects and political agendas?

Thirdly, and turning to some of the other alarm bells that ring in the Open Data Measures, the first measures in the Cabinet Office’s paper are explicitly not about open data as public data, but are about the restricted sharing of personal medical records with life-science research firms – with the intent of developing this sector of the economy. With a small nod to “identifying specified datasets for open publication and linkage”, the proposals are more centrally concerned with supporting the development of a Clinical Practice Research Datalink (CPRD), which will contain interlinked ‘unidentifiable, individual level’ health records – by which I take it to mean the ability to link a particular individual to a set of data points recorded on them in primary and secondary care data, without the identity of the person being revealed.

The place of this in open data measures raises a number of questions, such as whether the right constituencies have been consulted on these measures, and why such a significant shift in how the NHS may be handling citizens’ personal data is included in proposals unlikely to be heavily scrutinised by patient groups. In the past, open data policies have been very clear that ‘personal data’ is out of scope – and the confusion here raises risks to public confidence in the open data agenda. Leaving this issue aside for the moment, we also need to critically explore the evidence that the release of detailed health data will “reinforce the UK’s position as a global centre for research and analytics and boost UK life sciences”. In theory, if life science data is released digitally and online, then the firms that can exploit it are not only UK firms – the return on the release of UK citizens’ personal data could be gained anywhere in the world where the research skills to work with it exist.

When we look at the other administrative datasets proposed for release in the Measures, the politicisation of open data release is evident: Fit Note Data, Universal Credit Data and Welfare Data (again discussed for ‘linking’, implying we’re not just talking about aggregate statistics) are all proposed for increased release, with specific proposals to “increase their value to industry”. By contrast, there is no mention of releasing more details on the tax share paid by corporations, where the UK issues arms export licenses, or which organisations are responsible for the most employment law violations. Although the stated aims of the Measures include increasing “transparency and accountability”, it would not be unreasonable to read the detail of the measures as very one-sided on this point: emphasising industry exploitation of data far more than good governance and citizen rights with respect to data.

The blurring of the line between ‘personal data’ and ‘open data’, and the state’s assumption of the right to share personal data for industrial gain, should give cause for concern, and highlights the need to build a stronger constituency scrutinising government open data action.

Building capacity to use data?

Fourthly, and perhaps most significantly if we take seriously the goal of seeing open data lead not only to economic development but also to better public services, the Measures contain little funding or support for the sorts of skills development and organisational change that will be needed for effective use of open data in the UK.

The Measures announce the creation of an Open Data Institute, with the possibility of £10m match funding over 5 years, to “help business exploit the opportunities created by release of public data”. This has the potential to address the much-needed research into the gap in understanding and practice on how to build sustainable enterprise with open data. However, beyond this, there is little in the Measures to foster the development of data skills more widely in government, in the economy and in civil society.

We know that open data alone is not enough to drive innovation: it’s a raw material to be combined with others in an information economy and information society. There are significant skills development needs to equip the UK to make the most of open data – and the Measures fall short on meeting that challenge.

A constructive critique?

Many of the detailed measures from the Autumn Statement are still draft – subject to further consultation. As a package, it’s not one to be accepted or rejected out of hand. Rather – there is a need for continued engagement by a broad constituency, including members of the broad based ‘open data community’ to address the measures one-by-one as government works to fill in the details over coming months.


[1] An open data infrastructure: The idea of open data as digital infrastructure for the nation has a number of useful consequences. It can help us to develop our thinking about the state’s responsibility with respect to datasets. Just as in the development of our physical infrastructure the state both invested directly in the provision of roads and railways, adopted previously privately created infrastructure (the turnpikes, for example), and encouraged private investment within frameworks of government regulation, a strategic approach to public data infrastructure would not just be about pre-existing datasets having an open license slapped on them – it would involve looking at a range of strategies to provide the open data foundations for economic and civic activity. Government may need to act as guarantor of specific datasets, if not core provider. When we think of infrastructure projects, we can think critically about who benefits from particular projects, and can have an open debate about where limited state resources to support a sustainable open data infrastructure should go. The infrastructure metaphor also helps us start to distinguish different sorts of government data, recognising that performance data and personal data may need to be handled within different arrangements and frameworks from core reference data like mapping and transport systems information. In the latter case, there is a strong argument to secure a guarantee of the continued funding of these resources as public goods, free at the point of use, kept in public trust, and maintained to high standards of consistency. Other arrangements are likely to lead to over-charging and under-use of core reference datasets, with deadweight loss of benefit – and particularly excluding civic uses and benefits.
In the case of other datasets generated by government in the day to day conduct of business (performance data; aggregate medical records, etc.), it may be more appropriate to recognise that while there is benefit to be gained from the open release of these (a) for civic use, and (b) for commercial use, this will vary significantly on a case-by-case basis, and the release of the data should not create an ongoing obligation on government to continue to collect and produce the data once it is no longer useful for government’s primary purpose.

Open Personae: a step towards user-centred data developments?

[Summary: reflections on data-shaped design, and adding user persona as a new raw material in working with open data]

A lot has been written recently about the fact that open data alone is not enough to make a difference. Data needs to be put into the hands of those who can use it to make a difference, and if the only way to do that is as a programmer, or someone with the resources to hire one, we end up with a bigger, rather than narrower, data divide.

Infomediaries, with the technical skills to take data and create accessible interfaces onto it; to integrate it into existing systems; and to communicate it to those who need it, are a key part of the solution. However, unlike common software and resource development challenges, which often start from a clearly articulated problem and user needs, and then work backwards to source data and information, open data projects often have a different structure. A need is recognised; data is identified; data is opened; and then from the data applications and resources are built. The advantage of open data is that, rather than data being accessed just to solve one particular problem, it is now available to be used in a wide range of problem solving. But there is a risk that the structure of the open data process introduces a disconnect: specific problems drive demands for open data, but open data offers general solutions – and those with the skills to work with data may not be aware of, or connected with, the specific problems that motivated the desire to open the data in the first place, nor with other specific problems which the data, now it is open, can be part of solving.

When open data is the primary raw material for a project, that data can exert a powerful influence in shaping the design of the project and its outputs. The limitations of the data quickly become accepted as limitations of the application; the structure of the data is often presented to the user, regardless of whether this is the structure of information they need to be able to use the application effectively. Data-shaped design is not necessarily good design. But finding ways to put users back at the heart of projects, and adopt user-centered design approaches to working with data can be a challenge.

The frictionless nature of accessing data contrasts heavily with the friction involved in identifying and working with potential users of a data-driven application. For technical developers interested in experimenting with data in hack-day contexts, or working in small, time and resource-limited projects, the overheads of user engagement are a big ask. It’s an even bigger challenge in projects like the International Aid Transparency Initiative (IATI), where with aidinfo labs I’m currently trying to support development of infomediary apps and resources for users spread across the globe: users who might be in low-bandwidth/limited Internet access environments, or in senior governmental positions, where engagement in a user-workshop is not easy to secure.

So – without ignoring the need to have real user engagement in a project – one of the things we’re just starting to experiment with in the aidinfo labs project, is adding another raw material alongside our open data. We’re creating a set of ‘open personae’ – imaginary profiles of potential users of applications and resources built with IATI data, designed to help techies and developers gain insights into the people who might benefit from the data, and to help provide a clearer idea of some of the challenges applications need to meet.

So far we’ve created four personae (borrowing one from another project), simply working in open Google Docs so that we can collaboratively draft them, and leave them open to comment to help them develop. And we’re planning to create lots more over the coming months (with fantastic support from Tara Burke who is researching and writing a lot of the profiles), created as an open resource so others can use them too.

I’m keen to explore how these personae can provide a first step to greater user-centered design in data use – and how we can use them as an intuitive tool for us to explore who is being best served by the eco-system of applications and infomediaries around IATI data. I’m also curious about the potential for a wider library of open personae to be used to help other open data projects include users as a key raw material for app building.

If ‘Data + Data-use skills + Involvement of Users’ is a part of ‘effective use’ of open data, then ‘Data + Skills + Understanding of users’ must be a step in the right direction…

Sprinkled stats and the search for data…

[Summary: Data-driven vs. data guided change-making. Reposted from the new Making a Difference With Data website]

I woke up to a tweet this morning from @YoungAdvisors pointing me to their new ‘Big Book of Stats’ and ‘What’s the Real Cost of Cutting’ resources – bringing together statistics from across the youth sector in a quick-to-skim PDF.

I got in touch with Gary Buxton, Young Advisors Chief Exec to ask a few questions about the stats:

Q: What inspired you to collect the figures you have gathered?
When times are tough it’s even more important to share and collaborate. Our social goals are about creating good opportunities for young people. Having charities, social enterprises and young people all replicating work is distracting and reduces everyone’s ability to deliver. If we all shared a little bit more, we’d all be greater than the sum of our parts.

Q: How easy was it to find the data and numbers you needed?
Both pieces were pretty difficult to pull together. It became a bit of an evening hobby! Stats came from old NYA policy briefings, NCVYS, Twitter, Facebook, private consultancy companies, the New Economics Foundation, the Prince’s Trust, government sites, etc. I still really want to know how much it costs when a young person is excluded from school!

Q: How are you now planning to use these figures?
We use the stats for writing bids and helping the young people we work with write bids and presentations that are well informed and referenced.  Knowing your data helps young people make reasoned and compelling solutions to community problems.  We wanted to open the data to others who might find it helpful so everyone can work smart and not hard, keep delivering great work, but most of all, make a good case to decision makers, councillors and MPs about how important investing in young people is and the risk of pulling funding from services that young people regard as important.

As the ‘Sprinkled Statistics’ recipe over in the Open Data Cook Book suggests, sometimes using open data is as simple as backing up an argument with the numbers – with no need for fancy visualisation or mash-ups. Resources like Young Advisors Big Book of Stats can make that easier for other groups.

But, as Gary notes, even just collecting the statistics you need from government reports, let alone getting access to raw data to slice and explore it in different ways, can be tricky. And as Paul Clarke questions in a blog post today, is getting the data always the most important part of campaigning for a change? Whilst we might imagine there are clear ‘facts’ about the cost of school exclusions, or patient-to-nurse ratios, these statistics do not come solely from direct measurement, but are based on calculations from different datasets, and, importantly, rest upon definitions (what is an exclusion? what counts as a direct or indirect cost of exclusion? do you count all the time a nurse is on the ward, or only the time they are available for patient care rather than paperwork?). As Paul puts it:

…does the cause need the data? Does the search for data delay the obvious? Could the open data revolution sometimes obfuscate more than enlighten? While we’re arguing over reporting standards, boundary definitions and data feeds, real people are hurting and starving.

So where does this leave us? Having access to statistics, data and figures at a local level can certainly help strengthen those advocating for change. And knowing the numbers can inform bids, proposals and smarter working. But perhaps key here is to see campaigning for change as ‘data guided’ and ‘data backed’ rather than ‘data-driven’.

Making a difference with data means knowing how to use it as a tool, but one amongst many in the change-makers toolbox.

Open data quick links: cook books; aid data; campaign camps; MADwData

[Summary: A couple of quick open data links]

The Open Data Cook Book now has a new look and a few more recipes – providing step by step instructions for working with open data. It’s also now Wikified – so anyone can sign-up to edit and add recipes. So, if you’ve got ideas for how people can use open data in creative ways – head over and add some recipes.

On the topic of Making a Difference With Data, the new MADwData website is packed full of links and analysis on open data to support change at a local level, particularly organised around different sectors: health, local authorities, housing, transport, crime & education. I’m editing the education section, and have been exploring how open the EduBase dataset really is. Take a look, though, at the fantastic content from the other editors – all giving some great overviews of the state of data for change in different contexts.

In the MADwData forum Vicky Sargent has been asking about the use of data in library closure campaigns. I’ve been in touch with a lot of campaigning organisations recently who sense that there is real potential for using open data as part of campaigns – but are unsure exactly how it should work and how to start engaging with data (and open data advocates are asking the same questions from the other direction). Hopefully we’ll be digging into exactly these questions, and providing some practical learning opportunities and take-away ideas, at the upcoming Open Data Campaigning Camp in Oxford on 24th March. It’s tacked onto the end of the E-Campaigning Forum, and I’m co-organising it with Rolf Kleef and Javier Ruiz. Free places are still left for organisations interested in spending a day of hands-on learning exploring how data could help in campaigning against cuts; on environmental issues; and in international development campaigns and funding.

And talking of development funding… (not only a post of outward links; seamless links internally as well!) – last week the first version of the International Aid Transparency Initiative (IATI) Standard was fully agreed. I had the pleasure of working with Development Initiatives on a demonstrator of how IATI data could be visualised, the results of which are available on AidInfoLabs as the IATI Data Explorer, allowing you to pick any country and dig into details of where DFID UK Government Aid spending has gone there – and, where the data is available, dig into the individual transactions.

Expectations and Evidence: youth participation and open data

[Summary: Exploring ways to use data as part of a youth participation process.]

Over the last year and a bit I’ve been doing less work on youth engagement and civic engagement processes than I would ideally like. I’m fascinated by processes of participation, and how to design activities and frameworks within which people can actively influence change on issues that affect them – getting beyond simply asking different groups ‘what do you want?’ and then struggling to reconcile conflicting answers (or, oftentimes, simply ignoring this input), to create spaces in which the different factors and views affecting a decision are materialised, and in which those affected by decisions get to engage with the real decision making process. I’ve had varying levels of success doing that – but the more time I’ve been spending with public data, the more I’ve been struggling to work out how to bring it into participative discussions in ways that are accessible and empowering to participants.

Generally data is about aggregates: about trends and patterns rather than the specific details of individual cases. Yet in participation, the goal is often to allow people to bring their own specific experience into discussions and to engage with issues and decisions based upon their unique perspectives. How can open datasets complement that process?

The approach I started to explore in a workshop this evening was linking ‘expectations and evidence’ – asking a group to draw upon their experience to write down a list of expectations, based on the questions that had been asked in a survey they had carried out amongst their peers – and then helping them to use IBM Many Eyes to visualise and explore the survey evidence that might support or challenge their expectations (I’ve written up the process of using the free Many Eyes tool over in the Open Data Cook Book). It was a short session, and not all of the group were familiar with the survey questions, so I would be hard pushed to call it a great success, but it did generate some useful learning about introducing data into participation processes.

1) Stats are scary (and/or boring; and/or confusing)
Even using a fairly interactive data visualisation tool like IBM Many Eyes, statistics and data are, for many people, pretty alien things. The idea of multi-variate analysis (looking at more than one variable at once and at the relationships between variables) is not something most people spend much time on in school or college – and trying to introduce three-variable analysis in a short youth participation workshop is tricky without causing quite a bit of confusion.

One participant in this evening’s workshop suggested: “It would be useful to have a reminder of how to read all these charts. What does all this mean?”. Next time I run a similar session (as I’m keen to develop the idea further) I’ll look into finding or preparing a cheat-sheet for reading any data visualisations that get created…

2) ‘Expectations and Evidence’ can provide a good framework to start engaging with data
In this evening’s workshop, after looking at data, we turned to talk about interview questions the group might ask delegates at an upcoming conference. A number of the question ideas threw up new ‘expectations’ the group had (for example, that youth services were being cut in different ways in different places across the country), which there might be ‘evidence’ available to support or challenge. Whilst we didn’t have time to go and seek out the relevant data, there was potential here to search data catalogues and use a range of visualisation and exploration approaches to test those bigger expectations (our first expectations work focussed on some fairly localised survey data).

3) The questions and processes matter
When I started to think about how data and participation might fit together I sketched out different sorts of questions that participation processes might work with. Different questions link to different processes of decision making…

  • (a) What was your experience of…? (share your story…we’ll analyse)
  • (b) What do you think of…? (give your opinion … we’ll decide what to do with it)
  • (c) What should we do about…? (give us your proposals…)
  • (d) Share this decision with us… (we need to work from shared understanding…)

To introduce data into (a) and (b) is tricky. If the ‘trend’ contradicts an individual’s own view or experience, it can be very demanding to ask them to reconcile that contradiction. Of course, creating opportunities for people with experience of a situation to reconcile tensions between stats and stories is better than leaving it up to distant decision makers to choose whether to trust what the data says or what people are saying when the two don’t seem to concur – but finding empowering participative processes for this seems tough.

It seems that data can feature in participation more easily when we shift from opinion gathering to decision sharing; but building shared understanding around narratives and around data is not something that can happen quickly in short sessions.

I’m not sure this post gets me towards any great answers on how to link data into participative processes. But, in the interests of thinking aloud (and in an effort to reclaim my blogging as reflective practice, getting away from the rather news- and reporting-driven turn it has taken of late) I’ll let it make it onto the blog, with all reflections/comments very much welcomed…

CfP: Journal Special Issue on Open Data

[Summary: Abstracts wanted for special issue of Journal of Community Informatics focussing on supply and use of open government data in different contexts across the world]

Michael Gurstein’s blog post last year on Open Data: Empowering the Empowered, or Effective Use for Everyone sparked some interesting discussions about how open data policies and practices impact different groups on the ground. The question of what impacts open data will have in different contexts has been picked up in Daniel Kaplan’s recent post on the OKF blog, and the need for different approaches to open data in different countries is a key theme in the draft Open Government Data in India report. With the discussion on open data impacts growing, I’m really pleased to be able to share the Call for Proposals below for a special issue of the Journal of Community Informatics that I’ll be guest editing along with Zainab Bawa of the CIS in India. So, if you’ve been meaning to write an article on the impacts of open data, or you know of grassroots projects in different places across the world working with the supply or use of open data, take a look at the call below…

Journal of Community Informatics: Call for Papers for Special issue on Open Data

Guest editors:  Tim Davies, Practical Participation and Zainab Bawa, CIS-RAW fellow

Call for Proposals
The Journal of Community Informatics is a focal point for the communication of research that is of interest to a global network of academics, Community Informatics practitioners and national and multi-lateral policy makers.

We invite submission of original, unpublished articles for a forthcoming special edition of the Journal that will focus on Open Data. We welcome research articles, case studies and notes from the field. All research articles will be double blind peer-reviewed. Insights and analytical perspectives from practitioners and policy makers in the form of notes from the field or case studies are also encouraged. These will not be peer-reviewed.

Why a special issue on Open Data
In many countries across the world, discussions, policies and developments are actively emerging around open access to government data. It is believed that opening up government data to citizens is critical for enforcing transparency and accountability within government. Open data is also seen as holding the potential to bring about greater citizen participation, empowering citizens to ask questions of their governments, not only via the data that is made openly available but also through the interpretations that different stakeholders make of that data. Besides advocacy for open data on democratic grounds, it is also argued that opening government data can have significant economic potential, generating new industries and innovations.

Whilst some open government data initiatives are being led by governments, other open data projects are taking a grassroots approach, collecting and curating government data in reusable digital formats: both data for use by specific communities at the grassroots, and macro datasets that can be used, received and applied in different ways in different local and grassroots contexts. INGOs, NGOs and various civil society and community based organizations are also getting involved with open data activities, from sharing data they hold regarding aid flows, health, education, crime, land records, demographics and more, to actively sourcing public data through freedom of information and right to information acts. Publishing open data on the Internet can make it part of a global ecosystem of data, and efforts are underway in technology, advocacy and policy-making communities to develop standards, approaches and tools for linking and analysing these new open data resources. At the same time, there are questions surrounding the very notion of ‘openness’, primarily whether openness and open data have negative repercussions for particular groups of citizens in certain social, geographic, political, demographic, cultural and other grassroots contexts.

In sum then, what we find in society today is not only various practices relating to open data, but also an active shift in paradigms about access and use of information and data, and notions of “openness” and “information/data”. These emerging/renewed paradigms are also configuring/reconfiguring understandings and practices of “community” and “citizenship”. We therefore find it imperative to engage with crucial questions that are emerging from these paradigm shifts as well as the related policy initiatives, programmatic action and field experiences.

Some of the questions that we hope this special issue will explore are:

  1. How are citizens’ groups, grassroots organizations, NGOs, diverse civil society associations and other public and private entities negotiating with different arms of the state to provide access to government data both in the presence and absence of official open data policies, freedom/right of information legislations and similar commitments on the part of governments?
  2. What are the various models of open data that are operational in practice in different parts of the world? What are the different ways in which open data are being used by and for the grassroots, and what are the impacts (positive, negative, paradoxical) of such open data for communities and groups at the grassroots?
  3. Who/which actors are involved in opening up what kinds of data? What are their stakes in opening up such data and making it available for the public?
  4. What are the different technologies that are being used for publishing, storing and archiving open data? What are the challenges/issues that various grassroots users and the stakeholders, experience with respect to these technologies i.e., design, scale, costs, dissemination of the open data to different publics and realizing the potential of open data?
  5. What notions of openness and publicness are at work in both policies as well as initiatives concerning open data and what impacts do these notions have on grassroots’ practitioners and users?
  6. Following from the above, what are the implications of opening up different kinds of data for privacy, security and local level practices and information systems?

Thematic focus
The following suggested areas of thematic focus (policy, technology, uses, impacts) give a non-exhaustive list of potential topic areas for articles or case studies. The core interest of the special issue is addressing each of these themes from, or taking into account, grassroots, local citizen and community perspectives.

  1. Different policy and practice approaches to open data and open government data
  2. Diverse uses of open data and their impacts
  3. Technologies that are deployed for implementing open data and their implications
  4. Critical assessments of stakeholders and stakes in opening up different kinds of data

Abstracts are invited in the first instance, to be submitted by e-mail to jociopendata@gmail.com.

Deadline for abstracts: 31st March 2011
Deadline for complete paper submissions: 15th September 2011
Publication date is forthcoming

Please send abstracts, in the first instance, to jociopendata@gmail.com.

For information about JCI submission requirements, including author guidelines, please visit: http://www.ci-journal.net/index.php/ciej/about/submissions#onlineSubmissions

Guest Editors

Zainab Bawa
Centre for Internet and Society (CIS) RAW fellow | bawazainab79@gmail.com

Tim Davies
Director, Practical Participation (http://www.practicalparticipation.co.uk)
tim@practicalparticipation.co.uk | @timdavies | +447834856303

Sourcing raw data… (drafting the open data cook book)

I’m at the Local by Social South West ‘Apps for Communities’ event in Bristol today, doing some prototyping work on the Open Data Cook Book. Listening to people working through how to find data – and trying to search for data myself – I thought I would try to map out all the different places I’ve been looking to track down different open datasets. So, with a sprinkling of recipe book metaphors, here’s a draft for comment of key places to track down open data (focussed on UK government data)…

Sourcing raw data

Finding the right ingredients for your data creation is often the hardest part. You will often have to mix and match from the approaches below to get all the data and information you need.

1) Search the supermarkets – the data catalogues & data stores

There are a growing number of data catalogues that bring together listings of published open data (and there are also now data marketplaces that can help you find commercially licensed data as well – so be sure to check the details of the data you find).

Data catalogues often have a particular focus – and no one catalogue can tell you about all the data out there.

CKAN.net is a catalogue of data from many different sources. It’s a good place to check if you are not quite sure where the dataset you want might be found, to see if someone has already created a ‘packaged’ version of it.

Data.gov.uk is the UK Government’s data catalogue, which aims to include listings of all open datasets in the public sector. It’s early days yet, but it already boasts over 4,600 dataset listings, many of which link direct to spreadsheets and data downloads.

Guardian World Data Store makes it easy to search across a range of different government open data catalogues – browsing data by country and format.

Your local authority might have a data store, or at least a data page on their website. London has http://data.london.gov.uk and you can find a list of other local open data web pages through the ‘All Councils’ listing at OpenlyLocal.com.

Publicdata.eu is a new catalogue bringing together data from right across Europe.

2) Specialist independents – data stores

Where the supermarkets are stacking the datasets high and sharing them free, there might be a specialist in your area of interest, working hard to source and bring together the finest data they can. Fortunately, most of them provide the data for free too.

OpenlyLocal.com is focussed on making local council information accessible. You can find details of local council spending for many authorities, alongside details of council meetings and councillors that have been scrumped and scraped from the respective websites for you. Most of the raw data is available through an API, so you might need to explore a few new skills to get at it.

Timetric.com are specialists when it comes to time series data. If you can plot it on a graph over time, chances are they’ve taken the dataset, tidied it up, and provided ways to search and browse for it – with CSV spreadsheet downloads of the raw data.

Do you have a specialist independent you go to for data? Tell us about them in the comments.

3) Foraging – searching for the data

If the data you want isn’t available pre-packaged and catalogued, you might need to head out foraging across the Internet. There is a lot of open data in the wild – you just need to know how to spot it.

GetTheData.org makes a great first port of call to see if other data-foragers have already found a good spot to get the data you are after. It’s a community website full of requests for data, and conversations about good places to find it. Plus, if your own foraging doesn’t turn up anything, you can come back and pose your question to the community here later.

Search – try searching the web for the topic you are interested in. Perhaps add ‘data’ as an extra keyword. When you read news articles or web pages that appear to be based on data, take note of the names of the data sources they mention and plug those back into a search. Oftentimes that will lead you to some data you might be able to use.

Think-tank websites, academic researcher web pages and even newspaper sites can all host lots of datasets. Just make sure you find out all you can about the provenance of the information before you use it!

Deep searching – you can use a standard Google search to look for data published in common office formats hosted on a particular web domain: your local council or university, for example. All you need are two handy operators:

  • The ‘site:’ operator on Google restricts searches to only show results from a particular domain;
  • The ‘filetype:’ operator only returns files of a particular type.

Using those together you can construct searches like ‘filetype:xls site:oxford.gov.uk’ to find all the Excel spreadsheets that Google has indexed on the Oxford City Council website.
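
If you want to generate these searches programmatically – say, to check the same topic across several council domains in turn – the query string is easy to build. Here is a minimal sketch in Python using only the standard library; the function name, domain and topic are just illustrative:

```python
from urllib.parse import urlencode

def data_search_url(domain, filetype="xls", topic=""):
    """Build a Google search URL restricted to one domain and file type."""
    # Combine the optional topic with the two operators, skipping empty parts
    query = " ".join(filter(None, [topic, f"filetype:{filetype}", f"site:{domain}"]))
    return "https://www.google.com/search?" + urlencode({"q": query})

# e.g. look for Excel spreadsheets about libraries on the Oxford City Council site
url = data_search_url("oxford.gov.uk", "xls", "libraries")
print(url)
```

The same function works for other formats too – pass `"csv"` or `"pdf"` as the file type, or loop over a list of domains.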

4) Scrumping – screen-scrape the data

It’s not uncommon to find the data you need… only it’s just out of reach. Perhaps it’s in a table on a web page when you want it in the sort of table you can load into a spreadsheet to sort and chart. Or it might be spread across lots of different web pages and files. That’s where screen-scraping comes in – creating small computer scripts that turn structured information on a website into raw data.

There are recipes explaining the details of screen-scraping coming in the cook book, and you can go scrumping with a variety of different tools.

Google Spreadsheets – using a special formula you can grab tables and lists from other websites direct into your spreadsheet (recipe).

Scraper Wiki – helps you get started creating advanced scrapers, which they will run every day to grab information from websites and turn it into accessible raw data (recipe).
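
For those comfortable with a little code, the core idea behind all these tools is the same: parse the page’s HTML and pull out the structured parts. A minimal sketch in Python, using only the standard library; the inline HTML snippet and the library names in it are made up, standing in for a page you would really fetch from the web:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped into rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []        # start a fresh row
        elif tag == "td":
            self._in_cell = True  # start collecting cell text

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# In practice you would fetch the page with urllib.request;
# here a small inline snippet stands in for it.
html = ("<table><tr><td>Central Library</td><td>BS1</td></tr>"
        "<tr><td>Bedminster</td><td>BS3</td></tr></table>")
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Central Library', 'BS1'], ['Bedminster', 'BS3']]
```

Real pages are messier than this (nested tables, header cells, inconsistent markup), which is exactly the drudgery that tools like Scraper Wiki take off your hands.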

5) Special order – FOI

Perhaps you have found that no-one stocks the data you need – not even in places you can forage or scrump for it. If the data comes from a public body, then it might be time to explore putting in a special request for it using the Freedom of Information Act.

WhatDoTheyKnow.com is a service that makes it easy to submit a Freedom of Information Act request to a local authority, government department or other public body. You have a right to ask authorities for a copy of the information and data they hold, and you can ask for it to be returned as raw data. Search WhatDoTheyKnow to see if anyone has requested the data you want already and, if not, put in your request. (Often if data is available on WhatDoTheyKnow it will be locked up in PDFs. You might need to crowd-source the process of turning it into structured raw data, although there are a few tools and approaches that might help turn PDFs into data programmatically.)

The Public Sector Information Unlocking Service, available at http://unlockingservice.data.gov.uk/, provides a route for requesting that data be opened up by the Data.gov.uk team. It’s not backed by the legal framework of FOI, but may play a role in data requests under the currently debated ‘Right to Data’ legislation.

IsItOpenData.org provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing.

6) Home grown – research and crowdsourcing

Some data simply doesn’t exist yet – but you can create a raw dataset through research, and through crowd-sourcing, inviting others to help you research.

Simple spreadsheets – if you are systematically working through a research task, keep your results in a spreadsheet. See the section on raw data for ideas about how to structure it well.
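
As a sketch of what ‘structured well’ means in practice: one header row, then one row per record and one column per field, so the file loads straight into any spreadsheet or analysis tool. A minimal Python example using the standard library’s csv module – the field names and library records here are purely illustrative:

```python
import csv
import io

# One header row, then one row per record: the shape that sorting,
# filtering and charting in a spreadsheet all expect.
fieldnames = ["library", "postcode", "open_hours_per_week"]
records = [
    {"library": "Central Library", "postcode": "BS1 5TL", "open_hours_per_week": 52},
    {"library": "Bedminster", "postcode": "BS3 4AQ", "open_hours_per_week": 35},
]

# Write to an in-memory buffer; swap in open("libraries.csv", "w", newline="")
# to save a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```

Keeping the data layer this plain – no merged cells, no totals rows mixed in with records – is what makes it easy to analyse later, or to publish as open data.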

Google Forms – available through http://docs.google.com – allows you to create an online form that anyone can fill in, with all the responses going direct into a spreadsheet for you to use. You might be able to get supporters to research for you and collaboratively build up a useful dataset.

Always check the label

Is the data you have found licensed for re-use? Whilst you might get away with cooking up some foraged raw data for your own consumption without checking the details, when you re-publish data and share it with others you need to be sure you have permission to do so.

Remember as well to keep a list of the ingredients you use, and where you got them from, so you can publish a full list of sources along with your creation.

Worked example: A simple search, with many steps

Sadly we’re not yet at the stage where you can easily get all the data you need delivered to your door – so most projects will involve some searching around.

For example: I was recently looking for data on library locations in Bristol. I started at the data supermarkets, searching data.gov.uk for ‘libraries’. I found a few datasets listed, but the links were broken, so I ended up at a dead end. Next I turned to the Guardian datastore, but that wasn’t very helpful either – so I looked at GetTheData.org to see if anyone else had been looking for library data. Fortunately they had, and their conversations pointed me towards a few possible data sources. Again, though, I ended up at almost a dead end – I could find a list of planned library closures, but not a dataset of all the libraries. However, I did find a link to the Bristol Council website, and on browsing the site I came across a listing of libraries in a web page – so I turned to a little scrumping, using Google Spreadsheets to import the web-page table into a spreadsheet table that I could manipulate and work with. Working through the list of data sources above, I was searching for about 15 minutes, following my nose to finally get to the raw ingredients I needed for some data creations.