Sourcing raw data… (drafting the open data cook book)

I’m at the Local by Social South West ‘Apps for Communities’ event in Bristol today, doing some prototyping work on the Open Data Cook Book. Listening to people working out how to find data, and trying to search for data myself, I thought I would try to map out all the different places I’ve been looking to track down open datasets. So – with a sprinkling of recipe book metaphors – here’s a draft for comment of key places to track down open data (focussed on UK government data)…

Sourcing raw data

Finding the right ingredients for your data creation is often the hardest part. You will usually have to mix and match the approaches below to get all the data and information you need.

1) Search the supermarkets – the data catalogues & data stores

There are a growing number of data catalogues that bring together listings of published open data (and there are also now data marketplaces that can help you find commercially licensed data as well – so be sure to check the details of the data you find).

Data catalogues often have a particular focus – and no one catalogue can tell you about all the data out there.

CKAN.net is a catalogue of data from many different sources. It is a good place to check if you are not quite sure where the dataset you want might be found, as someone may already have created a ‘packaged’ version of it.

Data.gov.uk is the UK Government’s data catalogue, which aims to include listings of all open datasets in the public sector. It’s early days yet, but it boasts over 4,600 dataset listings, many of which link directly to spreadsheets and data downloads.

Guardian World Data Store makes it easy to search across a range of different government open data catalogues, and to browse data by country and format.

Your local authority might have a data store, or at least a data page on their website. London has http://data.london.gov.uk and you can find a list of other local open data web pages through the ‘All Councils’ listing at OpenlyLocal.com.

Publicdata.eu is a new catalogue bringing together data from right across Europe.

2) Specialist independents – data stores

While the supermarkets stack the datasets high and share them for free, there might be a specialist in your area of interest working hard to source and bring together the finest data they can. Fortunately, most of them provide the data for free too.

OpenlyLocal.com is focussed on making local council information accessible. You can find details of local council spending for many authorities, alongside details of council meetings and councillors that have been scrumped and scraped from the respective websites for you. Most of the raw data is available through an API, so you might need to pick up a few new skills to get at it.

Timetric.com are specialists when it comes to time series data. If you can plot it on a graph over time, chances are they’ve taken the dataset, tidied it up, and provided ways to search and browse for it, with CSV spreadsheet downloads of the raw data.

Do you have a specialist independent you go to for data? Tell us about them in the comments.

3) Foraging – searching for the data

If the data you want isn’t available pre-packaged and catalogued, you might need to head out foraging across the Internet. There is a lot of open data in the wild – you just need to know how to spot it.

GetTheData.org is a great first port of call to see if other data-foragers have already found a good spot to get the data you are after. It’s a community website full of requests for data, and conversations about good places to find it. Plus, if your own foraging doesn’t turn up anything, you can come back and pose your question to the community later.

Search – try searching the web for the topic you are interested in, perhaps adding ‘data’ as an extra keyword. When you read news articles or web pages that appear to be based on data, note the names of the data sources they mention and plug those back into a search. Often that will lead you to some data you might be able to use.

Think-tank websites, academic researcher web pages and even newspaper sites can all host lots of datasets. Just make sure you find out all you can about the provenance of the information before you use it!

Deep searching – you can use a standard Google search to look for data published in common office formats and hosted on a particular web domain: your local council or university, for example. All you need are two handy operators:

  • The ‘site:’ operator on Google restricts searches to only show results from a particular domain;
  • The ‘filetype:’ operator only returns files of a particular type.

Using those together you can construct searches like ‘filetype:xls site:oxford.gov.uk’ to find all the Excel spreadsheets that Google has indexed on the Oxford City Council website.
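If you run these searches often, or want to try a whole list of formats at once, a few lines of Python can assemble the query strings for you. This is a minimal sketch: the helper names are hypothetical, and it assumes Google’s usual `https://www.google.com/search?q=` link form. It only builds links to paste into a browser; it does not fetch or scrape the results.

```python
from urllib.parse import urlencode


def build_data_search(domain, filetype, keywords=""):
    """Combine optional keywords with the 'filetype:' and 'site:' operators."""
    parts = [f"filetype:{filetype}", f"site:{domain}"]
    if keywords:
        parts.insert(0, keywords)
    return " ".join(parts)


def google_search_url(query):
    """Encode a query as a Google search link (assumes the /search?q= URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Look for several common office formats on one council website.
for ext in ["xls", "csv", "doc"]:
    print(google_search_url(build_data_search("oxford.gov.uk", ext)))
# first line printed: https://www.google.com/search?q=filetype%3Axls+site%3Aoxford.gov.uk
```

Keeping the operators in a function makes it easy to swap in your own council’s domain, or add a keyword such as ‘spending’ to narrow the results.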

4) Scrumping – screen-scrape the data

It’s not uncommon to find the data you need… only it’s just out of reach. Perhaps it’s in a table on a web page when you want it in the sort of table you can load into a spreadsheet to sort and chart. Or it might be spread across lots of different web pages and files. That’s where screen-scraping comes in – creating small computer scripts that turn structured information on a website into raw data.
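To give a flavour of what such a script looks like, here is a minimal sketch in Python using only the standard library’s `html.parser`. It walks the rows of an HTML table and writes them out as CSV that a spreadsheet can open. The sample table is made up for illustration, the function names are hypothetical, and it assumes a simple page without nested tables; real pages usually need more care.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and line breaks into single spaces.
            self._row.append(" ".join(" ".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None


def scrape_table_rows(html):
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows


def rows_to_csv(rows):
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


# A made-up table standing in for a council web page.
SAMPLE = """
<table>
  <tr><th>Library</th><th>Postcode</th></tr>
  <tr><td>Central</td><td>AB1 2CD</td></tr>
  <tr><td>Northside</td><td>AB3 4EF</td></tr>
</table>
"""

print(rows_to_csv(scrape_table_rows(SAMPLE)))
```

In practice you would fetch the page first (for example with `urllib.request.urlopen`) and check the site’s terms before scraping it.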

Recipes explaining the details of screen-scraping are coming in the cook book, and you can go scrumping with a variety of different tools.

Google Spreadsheets – using a special formula, you can grab tables and lists from other websites directly into your spreadsheet (recipe).

Scraper Wiki – helps you get started creating advanced scrapers, which it will run every day to grab information from websites and turn it into accessible raw data (recipe).
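Scrapers that run every day need somewhere to keep their results, and re-running them shouldn’t pile up duplicate rows. The sketch below shows one way to handle that with Python’s built-in `sqlite3` module: each run inserts or replaces rows keyed on a unique column and records when each row was last seen. The `libraries` table and its columns are hypothetical, the runs are simulated, and this is not Scraper Wiki’s own API.

```python
import sqlite3
from datetime import date


def save_rows(conn, rows, seen_on=None):
    """Store scraped (name, postcode) rows, replacing any earlier copy of the same name.

    Keying on a unique column means a scraper that runs every day updates
    existing records instead of adding duplicates.
    """
    seen_on = seen_on or date.today().isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS libraries ("
        " name TEXT PRIMARY KEY,"
        " postcode TEXT,"
        " last_seen TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO libraries (name, postcode, last_seen) VALUES (?, ?, ?)",
        [(name, postcode, seen_on) for name, postcode in rows],
    )
    conn.commit()


# Two simulated daily runs against an in-memory database.
conn = sqlite3.connect(":memory:")
save_rows(conn, [("Central", "AB1 2CD")], seen_on="2011-01-10")
save_rows(conn, [("Central", "AB1 2CD"), ("Northside", "AB3 4EF")], seen_on="2011-01-11")
for row in conn.execute("SELECT name, postcode, last_seen FROM libraries ORDER BY name"):
    print(row)
# ('Central', 'AB1 2CD', '2011-01-11')
# ('Northside', 'AB3 4EF', '2011-01-11')
```

A row whose `last_seen` date stops advancing has probably been removed from the source page, which can itself be useful information.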

5) Special order – FOI

Perhaps you have found that no-one stocks the data you need – not even in places you can forage or scrump for it. If the data comes from a public body, then it might be time to explore putting in a special request for it using the Freedom of Information Act.

WhatDoTheyKnow.com is a service that makes it easy to submit a Freedom of Information Act request to a local authority, government department or other public body. You have a right to ask authorities for a copy of the information and data they hold, and you can ask for it to be returned as raw data. Search WhatDoTheyKnow to see if anyone has requested the data you want already, and if not, put in your request. (Data released through WhatDoTheyKnow is often locked up in PDFs. You might need to crowd-source the process of turning it into structured raw data, although there are a few tools and approaches that can help turn PDFs into data programmatically.)

The Public Sector Information Unlocking Service, available at http://unlockingservice.data.gov.uk/, provides a route for requesting, via the Data.gov.uk team, that data be opened up. It’s not backed by the legal framework of FOI, but may play a role in data requests under the currently debated ‘Right to Data’ legislation.

IsItOpenData.org provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing.

6) Home grown – research and crowdsourcing

Some data simply doesn’t exist yet – but you can create a raw dataset through research, and through crowd-sourcing, inviting others to help you research.

Simple spreadsheets – if you are systematically working through a research task, keep your results in a spreadsheet. See the section on raw data for ideas about how to structure it well.

Google Forms – available through http://docs.google.com, these allow you to create an online form that anyone can fill in, with all the responses going directly into a spreadsheet for you to use. You might be able to get supporters to research for you and collaboratively build up a useful dataset.


Always check the label

Is the data you have found licensed for re-use? Whilst you might get away with cooking up some foraged raw data for your own consumption without checking out the details – when you re-publish data and share it with others you need to be sure you have permission to do so.

Remember, too, to keep a list of the ingredients you use and where you got them from, so you can publish a full list of sources along with your creation.
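One lightweight way to keep that ingredient list is to record each dataset’s source, licence and retrieval date as you go, and flag anything whose licence you haven’t confirmed. This Python sketch assumes a hypothetical shortlist of licences (`REUSABLE_LICENCES`) that you have already checked permit your intended re-use; it cannot make that judgement for you, and the sample entries are invented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Hypothetical shortlist: licences you have already checked permit your re-use.
REUSABLE_LICENCES = {"OGL", "CC0", "CC-BY", "CC-BY-SA", "ODbL"}


@dataclass
class Ingredient:
    name: str
    source_url: str
    licence: str     # as stated by the publisher, or "unknown"
    retrieved: str   # ISO date you downloaded or scraped it


def check_labels(ingredients):
    """Return the ingredients whose licence is not on the re-use shortlist."""
    return [item for item in ingredients if item.licence not in REUSABLE_LICENCES]


def sources_csv(ingredients):
    """Write the ingredient list as CSV, ready to publish alongside your creation."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Ingredient)])
    writer.writeheader()
    writer.writerows(asdict(item) for item in ingredients)
    return out.getvalue()


# Invented example entries.
pantry = [
    Ingredient("Library locations", "https://example.org/libraries", "unknown", "2011-01-10"),
    Ingredient("Ward boundaries", "https://example.org/wards", "OGL", "2011-01-10"),
]

for item in check_labels(pantry):
    print("Check the label before re-publishing:", item.name)
print(sources_csv(pantry))
```

Recording the licence at the moment you find the data is much easier than trying to reconstruct it later, when the original page may have moved.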

Worked example: A simple search, with many steps

Sadly we’re not yet at the stage where you can easily get all the data you need delivered to your door – so most projects will involve some searching around.

For example: I was recently looking for data on library locations in Bristol. I started at the data supermarkets, searching data.gov.uk for ‘libraries’. I found a few datasets listed, but the links were broken, so I ended up at a dead end. Next I turned to the Guardian datastore, but that wasn’t very helpful either, so I looked at GetTheData.org to see if anyone else had been looking for library data. Fortunately they had, and their conversations pointed me towards a few possible data sources. Again, though, I ended up almost at a dead end: I could find a list of planned library closures, but not a dataset of all the libraries. However, I did find a link to the Bristol Council website, and on browsing the site I came across a listing of libraries on a web page – so I turned to a little scrumping, using Google Spreadsheets to import the web-page table into a spreadsheet table that I could manipulate and work with. Working through the list of data sources above, I spent about 15 minutes searching, following my nose until I finally got to the raw ingredients I needed for some data creations.

Digital Futures – Trends in Technology, Youth and Policy

[Summary: What technologies will affect services for young people in 2011? Presentation, worksheet and reflections on a workshop]

I’ve read a lot of blog posts and watched a lot of presentations about technology trends, and future technologies that everyone needs to be aware of – but they can often feel pretty distant from the reality of frontline public services trying to make sense of how new technologies affect their work. So when I was offered the opportunity to run a workshop on ‘digital futures’ at the children’s services conference of a national children’s charity, right at the start of 2011, I thought it would provide an interesting opportunity to explore different ways of talking about and making sense of technology trends.


Young people, activism & the web: Speaking Out in a Connected World

[Summary: Sharing slides and notes from a children’s sector conference presentation]

I was speaking earlier today at the Children England & NCVYS ‘Speaking Out’ conference on the topic of ‘young people, activism and the web’. The conference was predominantly attended by staff from third-sector organisations providing frontline services for children, young people and families, so I tried (not entirely successfully in a short slot…) to cover a mix of examples of youth-led use of the web in campaigning at the national level, and some practical steps that organisations that may not be campaigning organisations can take to make the most of the web to engage with young people and get their voices heard.

A slightly adapted version of the slides can be seen via SlideShare below, and I’ve tried to write up some notes with links to relevant resources as well.

Notes and Links

I started planning the presentation by posing the question “How can young people use the web in activism?” Pretty quickly, though, it became clear that the question was really “How can they not?”. As I planned, I was watching a Twitter stream full of tweets from University College London students occupying their university and making extensive use of different digital media channels to get their message out, while members of the UK Youth Climate Coalition were celebrating their success in keeping Chris Huhne at the climate negotiations in Cancun by mobilising hundreds of people by e-mail, Facebook and Twitter to flood the Number 10 switchboard with calls. The web is right at the heart of much modern youth action – and yet so many organisations still struggle to engage with online spaces.

As I put together the next slides, however, I was quickly reminded that the web alone does not create change. Earlier this year I came across a Facebook group set up by young people campaigning against the use of Mosquito sonic weapons against young people in Barnsley, and today I fired up Facebook to grab a screenshot of it for this presentation, hoping I would see stacks of campaign updates. Yet the Facebook group, which had quickly grown to over 700 members after its launch, was standing stagnant: the top updates were spam, and apparently no real action had been taken to further engage and mobilise the young members of the group. So whilst young people may turn to social media tools when they have causes to campaign on, and they may have the know-how to set up Facebook groups and YouTube channels, the skills, support and connections needed to campaign effectively remain as vital as ever. As the Young Foundation put it, many young people are plugged in, but with their digital skills untapped.

Resources like Act by Right (and the great Act by Right on Climate Change remix by Alex Farrow), the Battlefront campaign toolkit, and a wealth of web pages about campaigning with the web can provide some of those skills through the web itself – but there is also a need for youth organisations to work directly with young people to support the development of critical campaigning skills. Just before I spoke today, John Not, General Secretary of the Woodcraft Folk, gave a last-minute presentation and shared the inspiring work they are doing to support young people who are passionately campaigning right now on the issue of university fees, demonstrating some great leadership in how organisations can provide responsible backing to youth-led action.

Helping young people to make connections, with decision makers through sites like TheyWorkForYou.com and WriteToThem.com, with the press through the leverage that organisations might have, and with other campaigners through spaces like TakingItGlobal and Battlefront, is also a key role that adults can play in supporting young people to use the web for positive activism. Organisations also need to think about how they support young people to make safe and effective use of the web in campaigning.

Many organisations might not see their role as supporting general youth-led activism, but there are still many ways digital tools can support the delivery of participative practice. Online spaces can help organisations to engage young people, to communicate and co-ordinate, and to amplify their practice, and can help ensure that young people’s views and insights on key aspects of a service, or key local issues, are heard and valued in decision making.

In thinking about how to engage with young people online, it’s important to understand the different ways young people use the web, and to think about whether a project is trying to engage young people who are already into an issue, or whether it’s trying to attract the attention of those who are predominantly ‘hanging out’ online – spending time with friends and paying little attention to organisations and issues in the digital space. Good engagement also starts by listening (I mentioned Google Alerts as one handy digital listening tool, but there are many more), and starts from where young people are, whilst seeking to support young people to move beyond their starting point (a theme I initially developed in talking about youth work values and social media in the Youth Work & Social Networking report (PDF)).

Using online spaces to communicate involves finding the right tools for each job, and finding out the right ways to use them. For example, Facebook profiles, groups and pages look very similar, but offer subtly different ways of communicating with young people and creating online community. Quite a few of the practicalities of using different social media tools for youth engagement, including issues around organisational policy and safety concerns, are covered in the ‘Social Media Youth Participation in Local Democracy’ report and in posts on Youth Work Online.

I ended today’s presentation by taking a look at three big policy agendas which have a digital edge to them, and trying to relate each to a critical question for organisations working with young people – but the full articulation of each of those I think will have to wait for a future blog post…

Further links
For those who were at the conference and have made it this far without being overwhelmed by lots of links (and for anyone else interested), a few more bits that might be of interest: