Open data quick links: cook books; aid data; campaign camps; MADwData

[Summary: A couple of quick open data links]

The Open Data Cook Book now has a new look and a few more recipes – providing step by step instructions for working with open data. It’s also now Wikified – so anyone can sign-up to edit and add recipes. So, if you’ve got ideas for how people can use open data in creative ways – head over and add some recipes.

On the topic of Making a Difference With Data the new MADwData website is packed full of links and analysis on open data to support change at a local level, particularly organised around different sectors: health, local authorities, housing, transport, crime & education.  I’m editing the education section, and have been exploring how open the EduBase dataset really is. Take a look though at the fantastic content from the other editors – all giving some great overviews of the state of data for change in different contexts.

In the MADwData forum Vicky Sargent has been asking about the use of data in library closure campaigns. I’ve been in touch with a lot of campaigning organisations recently who sense that there is real potential for using open data as part of campaigns – but unsure exactly how it should work and how to start engaging with data (and open data advocates asking the same questions from the other direction). Hopefully we’ll be digging into exactly these questions, and providing some practical learning opportunities and take-away ideas at the upcoming Open Data Campaigning Camp in Oxford on 24th March. It’s tacked onto the end of the E-Campaigning Forum, and I’m co-organising with Rolf Kleef and Javier Ruiz. Free places are still left for organisations interested in spending day of hands-on learning exploring how data could help in campaigning against cuts; on environmental issues; and in international development campaigns and funding.

And talking of development funding… (not only a post of outward links; seemless links internally as well!) – last week the International Aid Transparency Initiative (IATI) Standard‘s first version was full agreed. I had the pleasure of working with Development Initiatives on a demonstrator of how IATI data could be visualised, the results of which are available on AidInfoLabs as the IATI Data Explorer allowing you to pick any country and dig into details of where DFID UK Government Aid spending has gone there – and, where the data is available, digging into the individual transactions.

Sourcing raw data… (drafting the open data cook book)

Open Data Cook Book LogoI’m at the Local by Social South West ‘Apps for Communities’ event in Bristol today, doing some prototyping work on the Open Data Cook Book. Listening to people working through how to find data – and trying to search for data myself, I thought I would try and map out all the different places I’ve been looking to track down different open datasets. So – with a sprinkling of recipe book metaphors – here’s a draft for comment of key places to track down open data (focussed on UK government data)…

Sourcing raw data

Finding the right ingredients for your data creation is often the hardest part. You will often have to mix-and-match from the approaches below to get all the data and information you need.

1) Search the supermarkets – the data catalogues & data stores

There are a growing number of data catalogues that bring together listings of published open data (and there are also now data marketplaces that can help you find commercially licensed data as well – so be sure to check the details of the data you find).

Data catalogues often have a particular focus – and no one catalogue can tell you about all the data out there.

CKAN.net is a catalogue of data from many different sources. Good to check if you are not quite sure where the dataset you want might be found to see if someone has already created a ‘packaged‘ version of it.

Data.gov.uk is the UK Governments data catalogue, which aims to include listings of all open datasets in the public sector. It’s early days yet, but it boasts over 4,600 dataset listings, many of which link direct to spreadsheets and data downloads.

Guardian World Data Store makes it easy to search across a range of different government open data catalogues – browsing data by country and format.

Your local authority might have a data store, or at least a data page on their website. London has http://data.london.gov.uk and you can find a list of other local open data web pages through the ‘All Councils’ listing at OpenlyLocal.com.

Publicdata.eu is a new catalogue bringing together data from right across Europe.

2) Specialist independents – data stores

Where the supermarkets are stacking the datasets high, and sharing them free – there might be a specialist in your area of interest – working hard to source and bring together the finest data they can. Fortunately, most of them provide the data for free too.

OpenlyLocal.com is focussed on making local council information accessible. You can find details of local council spending for many authorities alongside details of council meetings and councillors that has been scrumped and scraped from the respective websites for you. Most of the raw data is available through an API – so you might need to explore a few new skills to get at it though.

Timetric.com are specialists when it comes to time series data. If you can plot it on a graph over time, chances are they’ve taken the dataset, tidied it up, and providing ways to search and browse for it – with csv spreadsheet downloads of the raw data.

Do you have a specialist independent you go to for data? Tell us about them in the comments.

3) Foraging – searching for the data

If the data you want isn’t available pre-packaged and catalogued, you might need to head out foraging across the Internet. There is a lot of open data in the wild – you just need to know how to spot it.

GetTheData.org makes a great first port of call to see if other data-foragers have already found a good spot to get the data you are after. It’s a community website full of requests for data, and conversations about good places to find it. Plus, if your own foraging doesn’t turn up anything, you can come back and pose your question to the community here later.

SearchTry searching the web for the topic you are interested in. Perhaps add ‘data’ as an extra key word. When you read news articles or web pages that appear to be based on data, take note of the names of the data sources they mention and plug that back into a search. Oftentimes that will lead you to some data you might be able to use.

Think-tank websites, academic researcher web pages and even newspaper sites can all host lots of datasets. Just make sure you find out all you can about the provenance of the information before you use it!

Deep searchingYou can use a standard Google Search to look for data published in common office formats hosted on a particular web domain: your local council or university for example. All you need are two handy operators:

  • The ‘site:’ operator on Google restricts searches to only show results from a particular domain;
  • The ‘filetype:’ operator only returns files of a particular type.

Using those together you can construct searches like ‘filetype:xls site:oxford.gov.uk’ to find all the Excel spreadsheets that Google has indexed on the Oxford City Council website.

4) Scrumping – screen-scrape the data

It’s not uncommon to find the data you need… only it’s just out of reach. Perhaps it’s in a table on a web page when you want it in the sort of table you can load into a spreadsheet to sort and chart. Or it might be spread across lots of different web pages and files. That’s where screen-scraping comes in – creating small computer scripts that turn structured information on a website into raw data.

There are recipes that explain the details of screen-scraping coming in the cook book, and you can go screen-scrape scrumping with a variety of different tools.

Google Spreadsheetsusing a special formula you can grab tables and lists from other websites direct into your spreadsheet (recipe).

Scraper Wiki – helps you get started created advanced scrapers which they will run every day to grab information from websites and turn it into accessible raw data (recipe).

5) Special order – FOI

Perhaps you have found that no-one stocks the data you need – not even in places you can forage or scrump for it. If the data comes from a public body, then it might be time to explore putting in a special request for it using the Freedom of Information Act.

WhatDoTheyKnow.com is a service that makes it easy to submit a Freedom of Information Act request to a local authority, government department or other public body. You have a right to ask authorities for a copy of the information and data they hold, and you can ask for it to me returned as raw data. Search WhatDoTheyKnow to see if anyone has requested the data you want already, and if not, put in your request. (Often if data is available on WhatDoTheyKnow it will be locked up in PDFs. You might need to crowd-source the process of turning it into structured raw data, although there are a few tools and approaches that might help turn PDFs into data programatically)

The Public Sector Information Unlocking Service available at http://unlockingservice.data.gov.uk/ provides a root for requesting data is opened up by the Data.gov.uk team. It’s not backed by the legal framework of FOI, but may play a role in data requests under the currently debated ‘Right to Data’ legislation.

IsItOpenData.org provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing.

6) Home grown – research and crowdsourcing

Some data simply doesn’t exist yet – but you can create a raw dataset through research, and through crowd-sourcing, inviting others to help you research.

Simple spreadsheets – if you are systematically working through a research task, keep your results in a spreadsheet. See the section on raw data for ideas about how to structure it well.

Google Forms – available through http://docs.google.com allows you to create an online form that anyone can fill in, with all the responses going direct into a spreadsheet for you to use. You might be able to get supporters to research for you and collaborative build up a useful dataset.


Always check the label

Is the data you have found licensed for re-use? Whilst you might get away with cooking up some foraged raw data for your own consumption without checking out the details – when you re-publish data and share it with others you need to be sure you have permission to do so.

Remember as well to keep a list of the ingredient you use, and where you got them from, so you can publish a full list of sources along with your creation.)

Worked example: A simple search, with many steps

Sadly we’re not yet at the stage where you can easily get all the data you need delivered to your door – so most projects will involve some searching around.

For example: I was recently looking for data on library locations in Bristol. I started at the data supermarkets, searching data.gov.uk for ‘libraries’. I found a few datasets listed, but the links were broken, so I ended up at a dead end. Next I turned to the Guardian datastore, but that wasn’t very helpful either – so I looked at GetTheData.org to see if anyone else had been looking for library data. Fortunately they had, and their conversations pointed me towards a few possible data sources. Again though, I ended up almost a a dead end – I could find a list of planned library closures, but not a dataset of all the libraries. However, I did find a link to the Bristol Council website, and on browsing the site I came across a listing of libraries in a web-page – so I turned to a little scrumping – using Google Spreadsheets to import the web-page table into a spreadsheet table that I could manipulate and work with. Working through the list of data sources above I was searching for about 15 minutes – following my nose to finally get to the raw ingredients I needed for some data creations.