Open Arms? Unlocking raw data

[Summary: Exploring the process of requesting access to a raw dataset]

Update 22nd December: Almost a month on, and whilst my post on the OPSI Data Unlocking Service has had 30 votes in favour (more than any other request I can see by far) I’ve not heard from either OPSI or the data owner/data.gov.uk in response to my comments/requests for raw data. So far, it looks like requesting new raw data through the advertised routes doesn’t meet with much action. I’ll wait till the Open Up competition closes in the New Year to see what results that might bring – and then it’s time to start looking at what other ways there might be to request this data…

A lot of the open government data that has been released in recent years is only available locked up in PDFs and website interfaces. As this definition seeks to explain, this radically limits the potential uses of that data.

Following a recent event organised by Campaign Against the Arms Trade I was curious about who the UK issues Export Control Licenses to, so I took a look on data.gov.uk. Sure enough, the Strategic Export Controls: Reports and Statistics Website is listed on the Data.gov.uk catalogue. But on closer investigation it turns out that the Strategic Export Controls: Reports and Statistics Website (a) requires registration before you can access it; (b) predominantly provides data as PDFs; (c) has a very complex search interface that generates reports in the background ready for download later – but reports which don’t include key information such as the month a license was issued. All the data is clearly in the system – as you can search by date – but in its current form, extracting meaningful information about where UK companies have gained arms export licenses (or been refused) would be a long and slow job.

I’ve heard about the OPSI Data Unlocking Service, and I’ve been in a number of presentations hearing senior government officials and Ministers talking about the commitment of government to releasing raw data, so I thought this would provide a good opportunity to test the process of requesting raw data.

So – as of this morning, I’ve tried three routes to ask for access to this data:

  1. Adding a comment to the package on Data.gov.uk requesting access to the data. I’ve also sent a copy of the comment via the ‘Feedback Form’ listed under ‘Contact Details’ for each dataset. From past experience, I think the comment form gets forwarded to the Data.gov.uk team who forward it on to the department – but I’m not certain where that message has gone, or who reads the comments on datasets.
  2. Submitting a request to the OPSI Data Unlocking Service. This appeared to submit an e-mail form to the OPSI webmaster, who is, I understand, supposed to check the request and then add it to the OPSI website for others to vote on – as well as, I presume, passing it to someone inside OPSI to review and act upon – although the process by which a request could lead to data is fairly unclear. My request is not yet on the website.
  3. Adding an idea submission to the TSO Open Up Competition which you can see here. As I understand, the TSO are working closely with government on open data projects, although they don’t have authority to open access to data themselves. However, there does appear to be an interest from the competition in what datasets people want to see – so I figured a request via here can’t harm.

I suspect a fourth route might be to submit a Freedom of Information Request, but I’m keen to explore in the first place how these open data requesting channels work in practice. Have I missed any? How else should I be requesting access to raw data? Do you have experience of requesting data? What worked and what didn’t?

I’ll report back on any updates on the process of getting access to this data…

Defining raw data

[Summary: explaining what raw data is and why it matters]

On the Friday of last week’s Open Government Data Camp, in a discussion on how to empower non-technical citizens, civil servants and community activists to make use of open government data, we hit upon the idea of an ‘Open Data Cook Book’ of simple recipes for working with data. The recipe analogy also emerged (via @exmosis) in a twitter discussion on Monday about ‘machine-readable data’ – and a bit of cook-book drafting later, here’s my attempt at describing good open data, whilst avoiding as much as possible any technical terms or getting caught up in the ambiguity of machine-readability.

Sourcing your ingredients for a raw data project:

For all of the recipes in the forthcoming open data cook book you will need to have access to some raw data to work with. You might already have the data you want to work with to hand, or you might have ideas for a great project, but no idea of where to get the data you need. In the cook book we will outline a range of places you can source your data, and how to prepare it ready to be part of your data-creations.

Identifying raw data

You can find data all over the place when you start looking, but all-too-often the data you want has been pre-prepared, locked down in written reports, or only available through complicated website interfaces that only let you glimpse a small bit of the data at any one time.

Raw data is easier to manipulate with a computer. When you have raw data you can sort it, edit it and remix it in new ways with the tools you want to use.

Locked, raw, linked

We can think of data on a continuum.

At one end, is locked-up data. This is the sort of data you find in reports, charts and maps. Someone has interpreted what the data means and has pinned it down in a particular context. To use this data in new ways you will probably have to spend time converting it into a raw format through scraping, crowd-sourcing, or lots of manual work.

In the middle is raw data. This is when the data is available in a structured way that you can load into the software or online tools of your choice and can explore, manipulate and remix it. Raw data is ready for use in open data recipes.

However, to make use of any raw dataset you will need to know what it contains. Often raw data can contain cryptic headings, titles and codes for columns, rows or other elements of the dataset, so you will need to make sure you have access to meta-data which tells you what all the things in your raw dataset are, and how the data was generated (sort of like the ingredients list, and list of additives and preservatives on the back of any food packet).
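As a rough sketch of why that meta-data matters, the Python snippet below swaps cryptic column codes for readable labels using a small data dictionary. The column codes, labels and values are all invented for illustration; a real dataset would (hopefully) come with its own documentation.

```python
import csv
import io

# Hypothetical raw data with cryptic column codes (invented for illustration).
RAW = """LA_CD,EXP_T,YR
00AA,1200,2010
00AB,950,2010
"""

# Hypothetical meta-data: a data dictionary explaining each column code.
DATA_DICTIONARY = {
    "LA_CD": "Local authority code",
    "EXP_T": "Total spend (GBP)",
    "YR": "Year",
}


def label_rows(raw_text, dictionary):
    """Return rows as dicts keyed by readable labels instead of codes."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        # Fall back to the original code if the dictionary has no entry.
        {dictionary.get(code, code): value for code, value in row.items()}
        for row in reader
    ]


labelled = label_rows(RAW, DATA_DICTIONARY)
for row in labelled:
    print(row)
```

Without the data dictionary, the same rows are still perfectly loadable, but much harder to interpret.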

Linked data and RDF provide a way for the meta-data to be transferred along with the raw data, and for connections to be made between different datasets that make it possible to discover even more context about something in your data. Linked data can make it easier to integrate different datasets when they use the same ways of representing different parts of the data. The tools for working with linked data aren’t quite as widespread yet as the tools for working with standard raw data formats, so often linked data is transformed into a common raw data format like CSV (spreadsheets/tabular data), or JSON and XML (flexible structures for different sorts of data).
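To make the ‘common raw data formats’ mentioned above a little more concrete, here is a minimal sketch in Python that reads a tiny CSV table and writes the same records out as JSON. The country names, column headings and numbers are made up; only the standard `csv` and `json` modules are used.

```python
import csv
import io
import json

# A tiny, invented CSV table: one header row, then one row per record.
CSV_TEXT = """country,year,licences_issued
Exampleland,2009,12
Samplestan,2009,7
"""


def csv_to_records(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


records = csv_to_records(CSV_TEXT)

# The same records expressed as JSON (a list of objects).
as_json = json.dumps(records, indent=2)
print(as_json)
```

Note that CSV carries everything as text, so the numbers arrive as strings; deciding what each column really means is, again, a job for the meta-data.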


I’ve still some more work to do tidying up these definitions – and I hope in the cook book we can make use of a few more visual metaphors to show the difference between locked-up, raw and linked information. The process of thinking through the relationship between raw and linked data as defined above, in conjunction with the DIKW model, also seems to hint at a useful point I’ve not found a good way of articulating yet: that in most mash-up creation/data-use, human understanding of both data and context (meta-data) as separate elements is important – so whilst linked data helps context travel with data, when it comes to working with data, most users need to decompose it back into raw data with separate data and context to work with it.

A fear of open data heresy? Time to move beyond zealotry?

[Summary: A quick post, mainly for folk at today’s Open Government Data Camp, on the need to raise critical perspectives about open government.]

There are strong normative arguments for opening up government data – and there is great potential to be realised from that.

However, whilst the broad brush idea can command widespread support, the details of how we do open government data matter, and attentiveness to the social impacts is vital.

I’ve heard many people at events, including the Open Government Data Camp, express nuanced views on openness. And yet, far too often such views have been followed by comments such as “but I’m not sure I should be saying that sort of thing here”, or a retreat from the critical argument in order to add voices to the call for ‘more data now’.

So – I’m for a bit more heresy. A bit more challenge to the zealotry. A slightly louder voice for the critical friends of the open data movement.

It’s possible to argue for greater openness of data, and to think critically about the impacts that open data will have. It’s important to ask the question ‘Open data + what?’ What do we need to be doing as well as releasing data to drive positive social change?

Young Lives Linked Data Demonstrator

[Summary: showcasing linked data for development project]

Over the past month or so I’ve been working for IKM Emergent on a demonstrator project to explore the potential implications of linked data for information management in the development sector – seeking to put a small sub-section of the survey micro-data from the Young Lives longitudinal study online in order to explore the process and potential of generating linked data in development-focussed settings.

The results of that project are now live and online for the time being, and accessible here. The most visually interesting part of the demonstrator (thanks to the work of Rupert Redington at NeonTribe) is the Comparator tool which does some pretty clever things to identify ‘Data Cubes’ in the Young Lives linked data dataset we’ve published, and to offer (in the case of the smoking prevalence data) comparisons between the Young Lives dataset, and another comparable dataset we’ve also loaded into our Young Lives datastore.

However, through the demonstrator we’ve also made the Health dataset from the Young Lives data available to browse via an OntoWiki interface, and to query via SPARQL – exploring how linked data structures give us the opportunity to annotate the questions from the Young Lives data – potentially helping future researchers to find questions and data of interest to them.

The presentation below steps through some of the basics of Linked Data, before, from slide 13 onwards, introducing the Young Lives Data Demonstrator.

I’ll be sharing some more learning notes from the Young Lives Linked Data Demonstrator over on the open data impacts blog soon.

Open Data Hack Day in Oxford – 4th December

Open Data Day Oxford on the 4th December 2010 is on the look out for designers, coders, copy-writers, policy people, journalists, statisticians, campaigners, data-geeks and anyone interested in exploring what can be done when you take some public data and spend a day creating things with it in order to contribute to some positive social change goals.

Thanks to Cowley based Web & Software Developers White October we’ve got a fantastic venue for an Oxford Open Data Hack Day* as part of the global Open Data Day events taking place right across the world.

Here’s how an open data hack day in Oxford should work:

  1. Anyone interested in taking part signs up using the registration form here, and, optionally, adds some notes to the planning Wiki page (Just click ‘Edit’ at the top-right of the wiki page, scroll to find where to add your notes, drop them in, ignoring any extra characters/symbols on the page you’re not sure about, and save the changed page. )

    You can sign-up with an idea for the project you want to work on on the day – or just to offer your skills. You don’t need to have taken part in a hack-day before, or to be an uber-geek to take part!

  2. The planning group will make sure we’ve got a good mix of people and possible project teams emerging – and might get in touch to link you up with potential collaborators for the day so you can have conversations in advance.
  3. On the day, we’ll start around 10am in the fantastic split-level and spacious White October offices, which are a walk or bus ride from the centre of Oxford (or a short bus ride from the station), with coffee, refreshments and a chance to meet other participants and hear about different ideas for projects on the day.
  4. We’ll form into teams to work on particular projects. Teams will find a space, get laptops and computers out – and start building things. You can either spend your whole day working with a particular team, or you can take your skills between teams to help them out when they need.

    Teams usually develop fairly organically to have 3 – 5 people in (although some people choose to work in smaller or larger groups) and will have a mix of skills.

  5. In your teams you will identify the data you are working with and what you want to do – and start creating something. It could be anything. At past events we’ve built everything from mash-up maps, through to paper-based card-games and Facebook apps.

    Recent ideas I’ve heard for hack-day outputs include data-driven stencils for creating artworks; web applications for checking the best place to park a bike; mobile phone-based tools for finding transport routes – and lots more.

    I’m expecting to be spending a lot of my time helping source data – and helping people get hold of the data they want – and the team from White October will, I’m sure, be on hand offering their skills in all manner of digital webby stuff.

  6. By about 1pm we’ll get some lunch in – and depending on how work is going, we might break for people to feedback on progress so far and share any offers of, or requests for extra skills they have. We might even be able to link up by Skype with one of the other open data day events taking place around the world (tbc.)
  7. After an afternoon of making stuff, around 5pm, we’ll have a show and tell. If any kind sponsors get in touch we might even have some prizes to award to the best or most innovative creations.

    We’re thinking of inviting people from the City & County Council or other groups who might have an interest in releasing data along to see what has been created. Anyone with contacts who we could invite along to the show and tell, do let me know.

  8. We’ll tidy up and head to the pub – an optional ending to the day.

Are you up for it? If so – head over to the Wiki page to get registered. Offers of help organising, sourcing sponsorship, inviting show and tell participants etc. all welcome. Any questions? Drop them in as blog comments or on the Wiki.

(*Whilst some of the data we focus on might be Oxford/Oxfordshire based, participation is open to all, not just those based locally)

(Data week on Tim’s blog)

Just a quick note for all of you who follow this blog for the youth participation & digital youth work related bits… the next week of posting is going to be quite a lot on the other topic I spend lots of time thinking about/working on: open data (and I realise that past months have had a lot of open data stuff too).

But please don’t re-adjust your blog-reader/subscription just yet… some exciting digital youth work, children’s rights and youth participation posts to come soon – right after this week of data-related postings.


Brief practical notes on open data and activism

[Image: Flip chart from CAAT Conference]

[Summary: Context, links, resources and ideas for working with open data in campaigning organisations and/or third-sector contexts.] (See other open data posts here.)

The rough notes below come from a short open session discussion held at the Campaign Against the Arms Trade (CAAT) annual gathering last Saturday exploring how open data could be useful to a campaigning organisation. A PDF copy is here: Open Data and Campaigning.

Background & Context

The last 18-months have seen an impressive array of policy initiatives and practical actions leading to the release of datasets from governments in the UK, the US and across the world in open and re-usable formats online. Datasets ranging from the location of educational institutions, to details of taxation and government spending, have been brought together in data portals such as data.gov and data.gov.uk.

The open government data ‘movement’ has three broad constituent parts:

  • An open Public Sector Information (PSI) movement – drawing upon economic arguments to call for government data to be released and made freely re-useable. It often draws upon comparisons between the EU context, where government-collected data is copyright and restricted, and the US, where government datasets are more open and large industries have developed on the back of them (e.g. Weather data; Geodata etc.).

  • A transparency movement – linked to Access to Information and Freedom of Information movements – calling for the release of data in the interests of democratic empowerment, or data to be used in particular contexts and settings.

  • Digital government & semantic web computerization movements – focussed on the potential for innovation and more efficient working when data is made available for computer processing: and working to build open networks of knowledge across the Internet through linked-data approaches.

Many different groups can be found within the open government data ‘movement’ – from groups calling for aid transparency, to SME companies seeking to address what are seen as unfair data monopolies.

Policy context:

(See Open Government Data & Democracy report for a full timeline)

  • The http://data.gov initiative in the US proceeded from Obama’s first executive order on taking power as President.
  • http://data.gov.uk in the UK was initiated by Gordon Brown in 2009.
  • Since coming to power in 2010 the Coalition Government in the UK have continued to push open data initiatives – though with a slightly different ‘transparency’ and ‘accountability’ framing.
    • A requirement has been placed on local authorities to publish all spending over £500 by January, listing supplier and spend.
    • Government departments are under a similar requirement for all spend over £25,000, and have been asked to publish senior staff pay details and internal organizational diagrams.
    • Francis Maude has spoken of the need for a ‘Freedom of Data’ act, and has called for all responses to Freedom of Information requests that contain data to provide that data in machine-readable forms (i.e. Excel spreadsheet rather than print-out of PDF files…)
    • Aid Transparency has been high up the government’s development agenda.
  • The World Bank have released significant amounts of their data as open data.
  • Australia, New Zealand and many European countries have ongoing open data initiatives and campaigns.

Beyond government data

It’s not only government supplied data that is of interest to campaigners:

  • Projects like TheyWorkForYou.com and PublicWhip.org generate structured data about politicians’ voting records by ‘scraping’ parliamentary records;
  • Data Journalists (led by innovators at The Guardian amongst other places) publish their research as open accessible spreadsheets of data that others can re-use.
  • Some NGOs and community organisations are publishing open datasets.

Why data?

One of the key properties of data is that it can be easily manipulated by computer – allowing datasets to be combined, visualized, explored and used in many more ways than a written report or printed document can.

Where to find data

For official government data – The Guardian’s World Government Data Search looks across a range of data catalogues like http://data.gov.uk. Find it at http://www.guardian.co.uk/world-government-data and search for keywords or topics of interest to you.

You can also search http://data.gov.uk direct to browse data by department or topic.

http://ckan.net/ provides a catalogue of open data from many different sources – including government data, NGOs and research projects. It is a good place to ‘register’ any open data you create. It is also wiki-like, meaning any user can edit the records – allowing the creation of ‘collections’ of data on a particular topic: e.g. ‘arms trade’.

ScraperWiki.com provides a collection of ‘scrapers’ which collect structured data from unstructured data-sources (i.e. make open data where the original publisher didn’t provide it). For example, a dataset of hospitality received by UK Government Ministers, originally only available as a large collection of different Word documents, is now here: http://scraperwiki.com/scrapers/government-meetings-with-external-organisations-ne/ and available for download (Update: it’s also now available from http://transparency.number10.gov.uk/)

If you are looking for a particular dataset – it can be worth asking in the data.gov.uk forums, or using the #opendata hash-tag on Twitter.

Data on MPs and voting records is available from www.theyworkforyou.com in the UK, and the PublicWhip.org project collects more detailed voting records and makes them available.

When data isn’t available

Try using the Public Data Unlocking Service to request that data is proactively published: http://www.opsi.gov.uk/unlocking-service/opsipage.aspx?page=unlockindex

If using the Freedom of Information Act to request data, remind the recipient of Francis Maude’s policy statements on the need to provide machine-readable data in return.

If the information is available on websites, but not as structured data – consider putting a request on http://www.scraperwiki.com for someone to build a tool to screen-scrape the data.
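For anyone curious what a screen-scraper actually does, the sketch below uses Python’s built-in `html.parser` module to pull the rows out of an HTML table and turn them into structured records. The page fragment, column names and values are invented for illustration, and a real scraper (on ScraperWiki or elsewhere) would first need to download the page and cope with far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would first download the page.
PAGE = """
<table>
  <tr><th>Destination</th><th>Licences</th></tr>
  <tr><td>Exampleland</td><td>12</td></tr>
  <tr><td>Samplestan</td><td>7</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each table cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows, each a list of cell strings
        self._row = None      # the row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


scraper = TableScraper()
scraper.feed(PAGE)

# Treat the first row as headings and the rest as records.
header, *body = scraper.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The result is a list of records that can be saved as a spreadsheet or CSV file, which is the step that turns locked-up information into raw data.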

Consider using any of the ‘data competitions’ (e.g. http://openup.tso.co.uk) as a higher-profile way to ask for a dataset: emphasizing the government’s focus on accountability through transparency in other sectors such as local authority spending and aid.

Use the facts you can find from datasets like COINS (http://data.gov.uk/dataset/coins) to better structure Freedom of Information requests or crowdsourcing activities.

Explore ways to ‘crowd-source’ the data by calling on campaigners and supporters to find out particular facts – and to enter them into shared online spreadsheets (e.g. using Google Spreadsheets and Google Forms you can create an easy way for people to collaboratively input into a shared document – which can be instantly published online). Crowdsourcing tools like Ushahidi can also be used to develop projects such as http://WhereAreTheCuts.org – crowdsourcing reports of public spending cuts.

Working with data

Working with data scares many people – but it can start off very simply, and there are many approaches – including:

  1. Using data-driven websites such as http://TheyWorkForYou.com (MPs speeches and voting) or http://WhereDoesMyMoneyGo.com (government spending) which have taken government data and made it available in more accessible forms.
  2. Downloading and exploring a single dataset – many datasets can be opened in spreadsheet software like Excel. Sort and filter the columns to look for interesting information.
  3. Visualise the data – using a tool like IBM Many Eyes where you can upload simple datasets and explore a range of different ways of presenting the data.
  4. Building a mash-up – using tools like Google Spreadsheets and Google Fusion Tables, or Google Refine (available for free download) to explore and combine datasets. Google Fusion Tables will allow you to upload any spreadsheet, and, if it contains place names, quickly ‘geocode’ the data for displaying on a map. You can also combine two datasets – matching on any shared keys (e.g. MP name; Town name; Constituency) to build larger datasets.
  5. Holding a hack day – hack days like those organized by Rewired State bring together developers (coders/geeks) and people with problems to solve and spend one or two days of concerted effort creating ‘hacks’ (rapid prototypes) which address those issues, often using open data. For example, a hack-day could look to generate visualizations concerning arms licenses (CAAT Specific), or to create tools that support campaigners to get information to use when writing to MPs. (Update: We could have a campaigning strand at the Oxford Open Data Hack Day on 4th December if there was interest)
  6. Commissioning open data-based tools – developing hack-day created prototypes, or other ideas, into full working tools.

  7. Training activists in using data – through workshops and hands-on activities. (I’m mid way through developing a training workshop at the mo… suggestions of groups to pilot with welcome…)
  8. Releasing datasets – from in-house research or crowd-sourced data – and inviting supporters to use the data in creative ways. For example, putting researched data into Google Spreadsheets and, much as the Guardian Datablog does, sharing links to that data whenever posting news stories or website pages based upon it.
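For those comfortable with a little code, the ideas in points 2 and 4 above (sorting, filtering, and combining two datasets on a shared key) can be sketched in a few lines of Python. This is only an illustration: the constituency names, MPs and spending figures are invented, and spreadsheet tools or Google Fusion Tables will do the same job without any programming.

```python
import csv
import io

# Two hypothetical datasets sharing a 'constituency' column (values invented).
MPS = """constituency,mp
Northtown,A. Member
Southville,B. Member
Eastford,C. Member
"""

SPENDING = """constituency,spend
Southville,300
Northtown,500
Westbury,150
"""


def read_rows(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Combine rows from two datasets where the shared key matches."""
    lookup = {row[key]: row for row in right}
    return [
        {**row, **lookup[row[key]]}
        for row in left
        if row[key] in lookup  # rows without a match are dropped
    ]


combined = join_on(read_rows(MPS), read_rows(SPENDING), "constituency")

# Filter: keep rows with spend of at least 200, then sort highest first.
filtered = [row for row in combined if int(row["spend"]) >= 200]
ranked = sorted(filtered, key=lambda row: int(row["spend"]), reverse=True)
for row in ranked:
    print(row["constituency"], row["mp"], row["spend"])
```

Matching only works when both datasets spell the shared key identically, which is why cleaning up names (with a tool like Google Refine) is often the first step in building a mash-up.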

Going further

Search for the #opendata community on Twitter; or the ‘Open Government Data’ mailing lists run by Open Knowledge Foundation. Most of the links above will also provide access to further practical and background information on open government data.

Tim Davies, Practical Participation (tim@practicalparticipation.co.uk) can offer consultancy, training, workshops and support for organisations exploring the use of open data in campaigning. Please do get in touch to explore more…

Youth Participation in the Big Society…

[Summary: explore & add your thoughts to this paper from North West Regional Youth Work Unit on youth participation in the Big Society]

Last week a new paper from the North West Regional Youth Work Unit (NWRYWU) crossed my radar – exploring how a wide range of approaches to youth participation may fare under current government policies and priorities – particularly those framed by ideas of the big society.

It looks at approaches including:

  • Youth led grant giving
  • Youth inspectors
  • Shadow boards and youth panels
  • Youth councils and fora
  • Campaigning work
  • Regional youth fora & youth parliaments
  • Peer education
  • Short-term projects
  • Rights and advocacy work
  • Health service participation
    and
  • Youth-led organisations

Partly to aid my own note-taking on the document – but also to open it up to wider discussion (and with kind permission from NWRYWU) I’ve put the document up as a commentable doc over here where it can be read paragraph-by-paragraph and you can leave your comments on any section.

Whilst I’m not convinced that those supporting young people should immediately bend their language and focus to the priorities and language of a ‘big society’ agenda (and Kevin Harris’s critiques on the naivety of much big society thinking are worth reading), exploring and understanding what big society ideas might mean for youth engagement and getting more dialogue on the future of youth engagement can only be a good thing.

Open government data is not just a one-way flow…

[Summary: How can citizens, community institutions and social enterprise be part of producing ‘government’ data as well as consuming it? Some quick reflections…] (Cross-posted to Open Data Impacts blog)

Alison Powell poses the question in this blog post of whether we are moving into an era of ‘policy-based evidence’: where ideologically-driven policy making may lead to an end of evidence collection on key indicators (justified, no doubt, in the interests of ‘efficiency’), but impoverishing our understanding of the impacts of key policy choices. Alison certainly has a point: collecting evidence on an issue has been a key political strategy for shifting the political debate: and when evidence on the impact of a policy is gone – showing the positive or negative impact it had becomes far trickier.

However, just because government stops collecting data, or requiring that data is collected, doesn’t necessarily have to mean the loss of important social-policy datasets. The same transformational technological forces that mean government no longer needs to, or can justify, monopolising the analysis of state data, mean that the monopoly power of government is no longer needed to collect and collate many social-policy relevant datasets.

For many datasets the state has acted as co-ordinator of data collection: using its authority to require data to be shared in a standardised form (more often than not, spreadsheets or forms filled in and mailed or e-mailed in to some official in central government, who then rekeys data into another spreadsheet…). But: with collaborative online tools, will from the grassroots, and the right co-ordination/leadership many important datasets may be possible to generate without government being involved at all.

Of course, if the strict definition of open government data is only applied to “produced or commissioned by government or government controlled entities” (though the definitions are a live debate…) then what I’m really talking about is community-created “open governance data” – or ‘data essential for informed democratic policy making’.

I don’t pretend that all the datasets Alison fears will be lost will survive: but it is worth thinking about how, if government no longer wants the data, those who care about the stories it will be telling in a few years’ time can keep collecting, and take open, collaborative approaches to making governance data a two-way street…

(Some of the thoughts here are based on the lit review/analysis in §2.1 – 2.3 of my dissertation)