Sprinkled stats and the search for data…

[Summary: Data-driven vs. data-guided change-making. Reposted from the new Making a Difference With Data website]

I woke up to a tweet this morning from @YoungAdvisors pointing me to their new ‘Big Book of Stats’ and ‘What’s the Real Cost of Cutting’ resources – bringing together statistics from across the youth sector in a quick-to-skim PDF.

I got in touch with Gary Buxton, Young Advisors’ Chief Exec, to ask a few questions about the stats:

Q: What inspired you to collect the figures you have gathered?
When times are tough it’s even more important to share and collaborate. Our social goals are about creating good opportunities for young people. Having charities, social enterprises and young people all replicating work is distracting and reduces everyone’s ability to deliver. If we all shared a little bit more, we’d all be greater than the sum of our parts.

Q: How easy was it to find the data and numbers you needed?
Both pieces were pretty difficult to pull together. It became a bit of an evening hobby! Stats came from old NYA policy briefings, NCVYS, Twitter, Facebook, private consultancy companies, New Economics Foundation, Prince’s Trust, government sites etc. I still really want to know how much it costs when a young person is excluded from school!

Q: How are you now planning to use these figures?
We use the stats for writing bids and helping the young people we work with write bids and presentations that are well informed and referenced.  Knowing your data helps young people make reasoned and compelling solutions to community problems.  We wanted to open the data to others who might find it helpful so everyone can work smart and not hard, keep delivering great work, but most of all, make a good case to decision makers, councillors and MPs about how important investing in young people is and the risk of pulling funding from services that young people regard as important.

As the ‘Sprinkled Statistics’ recipe over in the Open Data Cook Book suggests, sometimes using open data is as simple as backing up an argument with the numbers – with no need for fancy visualisation or mash-ups. Resources like Young Advisors’ Big Book of Stats can make that easier for other groups.

But, as Gary notes, even just collecting the statistics you need from government reports, let alone getting access to raw data to slice and explore in different ways, can be tricky. And as Paul Clarke asks in a blog post today, is getting the data always the most important part of campaigning for change? Whilst we might imagine there are clear ‘facts’ about the cost of school exclusions, or patient-to-nurse ratios, these statistics do not come solely from direct measurement. They are based on calculations from different datasets and, importantly, rest upon definitions: what counts as an exclusion; what counts as a direct or indirect cost of exclusion; whether you count all the time a nurse is on the ward, or only the time they are available for patient care (not paperwork). As Paul puts it:

…does the cause need the data? Does the search for data delay the obvious? Could the open data revolution sometimes obfuscate more than enlighten? While we’re arguing over reporting standards, boundary definitions and data feeds, real people are hurting and starving.
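To make the point about definitions concrete, here is a small worked sketch with entirely made-up numbers for a single hypothetical ward. The function name and the "share of time on patient care" figure are illustrative assumptions, not drawn from any real dataset: the point is only that one ward can honestly produce two different headline ratios.

```python
# Hypothetical illustration: the same ward yields different "patients per
# nurse" figures depending on how nursing time is defined.

def patients_per_nurse(patients: int, nurses_on_shift: int,
                       share_of_time_on_patient_care: float = 1.0) -> float:
    """Patients divided by the nurse-equivalents available for patient care.

    share_of_time_on_patient_care=1.0 counts all time on the ward; a lower
    value counts only time not spent on paperwork and other tasks.
    """
    if not 0 < share_of_time_on_patient_care <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    effective_nurses = nurses_on_shift * share_of_time_on_patient_care
    return patients / effective_nurses


# Made-up numbers for one ward on one shift
all_time = patients_per_nurse(24, 8)          # counts all time on the ward
care_time = patients_per_nurse(24, 8, 0.75)   # counts patient-care time only
print(f"Counting all ward time: {all_time:.1f} patients per nurse")
print(f"Counting patient-care time only: {care_time:.1f} patients per nurse")
```

Same ward, same shift, same staff: 3.0 or 4.0 patients per nurse, depending on a definition that rarely makes it into the headline.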

So where does this leave us? Having access to statistics, data and figures at a local level can certainly help strengthen those advocating for change. And knowing the numbers can inform bids, proposals and smarter working. But perhaps key here is to see campaigning for change as ‘data guided’ and ‘data backed’ rather than ‘data-driven’.

Making a difference with data means knowing how to use it as a tool, but one amongst many in the change-maker’s toolbox.

Open data quick links: cook books; aid data; campaign camps; MADwData

[Summary: A couple of quick open data links]

The Open Data Cook Book now has a new look and a few more recipes – providing step by step instructions for working with open data. It’s also now Wikified – so anyone can sign-up to edit and add recipes. So, if you’ve got ideas for how people can use open data in creative ways – head over and add some recipes.

On the topic of Making a Difference With Data, the new MADwData website is packed full of links and analysis on open data to support change at a local level, organised around different sectors: health, local authorities, housing, transport, crime & education. I’m editing the education section, and have been exploring how open the EduBase dataset really is. Do take a look, though, at the fantastic content from the other editors, all giving some great overviews of the state of data for change in different contexts.

In the MADwData forum Vicky Sargent has been asking about the use of data in library closure campaigns. I’ve been in touch with a lot of campaigning organisations recently who sense that there is real potential for using open data as part of campaigns, but are unsure exactly how it should work and how to start engaging with data (and open data advocates are asking the same questions from the other direction). Hopefully we’ll be digging into exactly these questions, and providing some practical learning opportunities and take-away ideas, at the upcoming Open Data Campaigning Camp in Oxford on 24th March. It’s tacked onto the end of the E-Campaigning Forum, and I’m co-organising with Rolf Kleef and Javier Ruiz. Free places are still left for organisations interested in spending a day of hands-on learning exploring how data could help in campaigning against cuts, on environmental issues, and in international development campaigns and funding.

And talking of development funding… (not only a post of outward links; seamless links internally as well!) – last week the first version of the International Aid Transparency Initiative (IATI) Standard was fully agreed. I had the pleasure of working with Development Initiatives on a demonstrator of how IATI data could be visualised. The results are available on AidInfoLabs as the IATI Data Explorer, which lets you pick any country, dig into the details of where UK Government (DFID) aid spending there has gone and, where the data is available, drill down into the individual transactions.
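For anyone curious what sits underneath an explorer like that, here is a rough sketch of totalling aid transactions by recipient country from IATI-style XML. It is not the IATI Data Explorer’s code. The element names (`iati-activity`, `recipient-country`, `transaction`, `transaction-type`, `value`) follow the IATI activity format as I understand it, but the sample file is a simplified, made-up fragment, and the letter codes ("C" commitment, "D" disbursement, "E" expenditure) are an assumption based on the early codelists; later versions of the standard use numeric codes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, made-up IATI-style fragment (real files carry far more detail)
SAMPLE = """
<iati-activities>
  <iati-activity>
    <iati-identifier>EXAMPLE-1</iati-identifier>
    <recipient-country code="KE"/>
    <transaction>
      <transaction-type code="C"/>
      <value currency="GBP">5000</value>
    </transaction>
    <transaction>
      <transaction-type code="D"/>
      <value currency="GBP">1000</value>
    </transaction>
    <transaction>
      <transaction-type code="E"/>
      <value currency="GBP">250</value>
    </transaction>
  </iati-activity>
  <iati-activity>
    <iati-identifier>EXAMPLE-2</iati-identifier>
    <recipient-country code="TZ"/>
    <transaction>
      <transaction-type code="D"/>
      <value currency="GBP">400</value>
    </transaction>
  </iati-activity>
</iati-activities>
"""


def spend_by_country(xml_text, transaction_types=("D", "E")):
    """Sum transaction values per recipient country.

    Assumptions: each activity's first recipient-country is used (real data
    can list several, with percentages), and currencies are not converted.
    """
    root = ET.fromstring(xml_text.strip())
    totals = defaultdict(float)
    for activity in root.iter("iati-activity"):
        country = activity.find("recipient-country")
        code = country.get("code", "UNKNOWN") if country is not None else "UNKNOWN"
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            value = tx.find("value")
            if tx_type is None or value is None or value.text is None:
                continue
            # Only count the transaction types we asked for (e.g. spending, not commitments)
            if tx_type.get("code") in transaction_types:
                totals[code] += float(value.text)
    return dict(totals)


print(spend_by_country(SAMPLE))  # → {'KE': 1250.0, 'TZ': 400.0}
```

Leaving commitments out of the default is a choice, and a good example of the definitional questions above: "aid to Kenya" means something different if you count money promised rather than money spent.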

Expectations and Evidence: youth participation and open data

[Summary: Exploring ways to use data as part of a youth participation process.]

Over the last year and a bit I’ve been doing less work on youth engagement and civic engagement processes than I would ideally like. I’m fascinated by processes of participation, and how to design activities and frameworks within which people can actively influence change on issues that affect them. That means getting beyond simply asking different groups ‘what do you want?’ and then struggling to reconcile conflicting answers (or, oftentimes, simply ignoring this input), and instead creating spaces in which the different factors and views affecting a decision are materialised, and in which those affected by decisions get to engage with the real decision-making process. I’ve had varying levels of success doing that, but the more time I’ve spent with public data, the more I’ve struggled to work out how to bring it into participative discussions in ways that are accessible and empowering to participants.

Generally data is about aggregates: about trends and patterns rather than the specific details of individual cases. Yet in participation, the goal is often to allow people to bring their own specific experience into discussions and to engage with issues and decisions based upon their unique perspectives. How can open datasets complement that process?

The approach I started to explore in a workshop this evening was linking ‘expectations and evidence’: asking a group to draw upon their experience to write down a list of expectations, based on the questions that had been asked in a survey they had carried out amongst their peers, and then helping them to use IBM Many Eyes to visualise and explore the survey evidence that might support or challenge their expectations (I’ve written up the process of using the free Many Eyes tool over in the Open Data Cook Book). It was a short session, and not all of the group were familiar with the survey questions, so I would be hard pushed to call it a great success, but it did generate some useful learning about introducing data into participation processes.
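The ‘expectations and evidence’ step can be sketched in a few lines, for anyone wanting to prepare material for a similar session. This is a hypothetical illustration rather than what we did on the night (we used Many Eyes): the `Expectation` structure, the survey question name and the 10% tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    statement: str         # what the group expects, in their own words
    question: str          # survey question (column) holding the evidence
    answer: str            # the answer the expectation is about
    expected_share: float  # share of respondents the group expects to give it


def weigh_evidence(exp, responses, tolerance=0.1):
    """Compare one expectation with survey responses.

    Returns (observed share, verdict). `tolerance` is an arbitrary margin
    for treating the evidence as roughly matching the expectation.
    """
    answers = [r[exp.question] for r in responses if r.get(exp.question)]
    if not answers:
        return None, "no evidence in this survey"
    observed = sum(a == exp.answer for a in answers) / len(answers)
    gap = observed - exp.expected_share
    if abs(gap) <= tolerance:
        verdict = "evidence supports the expectation"
    elif gap > 0:
        verdict = "more respondents than expected"
    else:
        verdict = "fewer respondents than expected"
    return observed, verdict


# Made-up survey responses
responses = [{"uses_youth_club": "yes"}] * 6 + [{"uses_youth_club": "no"}] * 2
exp = Expectation("About half of us use the youth club", "uses_youth_club", "yes", 0.5)
print(exp.statement, "->", weigh_evidence(exp, responses))
```

Writing the expected share down before looking at the data is the important design choice: it turns a chart into something the group can be surprised by.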

1) Stats are scary (and/or boring; and/or confusing)
Even using a fairly interactive data visualisation tool like IBM Many Eyes, statistics and data are, for many people, pretty alien things. The idea of multi-variate analysis (looking at more than one variable at once and the relationship between variables) is not something most people spend much time on in school or college, and trying to introduce three-variable analysis in a short youth participation workshop is tricky without leading to quite a bit of confusion.
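One way to explain the step from one variable to three is with plain counts rather than charts. The sketch below cross-tabulates some made-up survey rows; the column names are hypothetical. The one-variable count looks evenly split, and only the two- and three-variable counts show that the answers differ by area.

```python
from collections import Counter


def crosstab(rows, *columns):
    """Count how often each combination of values in `columns` occurs."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Made-up survey rows
survey = [
    {"area": "North", "age": "13-15", "feels_safe": "yes"},
    {"area": "North", "age": "16-18", "feels_safe": "no"},
    {"area": "North", "age": "16-18", "feels_safe": "no"},
    {"area": "South", "age": "13-15", "feels_safe": "yes"},
    {"area": "South", "age": "16-18", "feels_safe": "yes"},
    {"area": "South", "age": "13-15", "feels_safe": "no"},
]

one = crosstab(survey, "feels_safe")                   # one variable
two = crosstab(survey, "area", "feels_safe")           # two variables
three = crosstab(survey, "area", "age", "feels_safe")  # three variables

for combo, count in sorted(three.items()):
    print(*combo, count)
```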

One participant in this evening’s workshop suggested that “It would be useful to have a reminder of how to read all these charts. What does all this mean?”. Next time I run a similar session (as I’m keen to develop the idea further) I’ll look into finding/preparing a cheat-sheet for reading any data visualisations that get created…

2) ‘Expectations and Evidence’ can provide a good framework to start engaging with data
In this evening’s workshop, after looking at data, we turned to talk about interview questions the group might ask delegates at an upcoming conference. A number of the question ideas threw up new ‘expectations’ the group had (for example, that youth services were being cut in different ways in different places across the country), for which there might be ‘evidence’ available to support or challenge them. Whilst we didn’t have time to go and seek out the relevant data, there was potential here to search data catalogues and use a range of visualisation and exploration approaches to test those bigger expectations further (our first expectations work focussed on some fairly localised survey data).

3) The questions and processes matter
When I started to think about how data and participation might fit together I sketched out different sorts of questions that participation processes might work with. Different questions link to different processes of decision making…

  • (a) What was your experience of…? (share your story…we’ll analyse)
  • (b) What do you think of…? (give your opinion … we’ll decide what to do with it)
  • (c) What should we do about…? (give us your proposals…)
  • (d) Share this decision with us… (we need to work from shared understanding…)

To introduce data into (a) and (b) is tricky. If the ‘trend’ contradicts an individual’s own view or experience, it can be very demanding to ask them to reconcile that contradiction. Of course, creating opportunities for people with experience of a situation to reconcile tensions between stats and stories is better than leaving it up to distant decision makers to choose whether to trust what the data says or what people are saying when the two don’t seem to concur, but finding empowering participative processes for this seems tough.

It seems that data can feature in participation more easily when we shift from opinion gathering to decision sharing; but building shared understanding around narratives and around data is not something that can happen quickly in short sessions.

I’m not sure this post gets me towards any great answers on how to link data into participative processes. But, in the interests of thinking aloud (and in an effort to reclaim my blogging as reflective practice, getting away from the way it’s been rather news- and reporting-driven of late) I’ll let it make it onto the blog, with all reflections/comments very much welcomed…

CfP: Journal Special Issue on Open Data

[Summary: Abstracts wanted for special issue of Journal of Community Informatics focussing on supply and use of open government data in different contexts across the world]

Michael Gurstein’s blog post last year on Open Data: Empowering the Empowered, or Effective Use for Everyone sparked some interesting discussions about how open data policies and practices impact different groups on the ground. The question of what impacts open data will have in different contexts has been picked up in Daniel Kaplan’s recent post on the OKF blog, and the need for different approaches to open data in different countries is a key theme in the draft Open Government Data in India report. With the discussion on open data impacts growing, I’m really pleased to be able to share the Call for Proposals below for a special issue of the Journal of Community Informatics that I’ll be guest editing along with Zainab Bawa of the CIS in India. So, if you’ve been meaning to write an article on the impacts of open data, or you know of grassroots projects in different places across the world working with the supply or use of open data, take a look at the call below…

Journal of Community Informatics: Call for Papers for Special issue on Open Data

Guest editors:  Tim Davies, Practical Participation and Zainab Bawa, CIS-RAW fellow

Call for Proposals
The Journal of Community Informatics is a focal point for the communication of research that is of interest to a global network of academics, Community Informatics practitioners and national and multi-lateral policy makers.

We invite submission of original, unpublished articles for a forthcoming special edition of the Journal that will focus on Open Data. We welcome research articles, case studies and notes from the field. All research articles will be double blind peer-reviewed. Insights and analytical perspectives from practitioners and policy makers in the form of notes from the field or case studies are also encouraged. These will not be peer-reviewed.

Why a special issue on Open Data
In many countries across the world, discussions, policies and developments are actively emerging around open access to government data. It is believed that opening up government data to citizens is critical for enforcing transparency and accountability within the government. Open data is also seen as holding the potential to bring about greater citizens’ participation, empowering citizens to ask questions of their governments via not only the data that is made openly available but also through the interpretations that different stakeholders make of the open data. Besides advocacy for open data on grounds of democracy, it is also argued that opening government data can have significant economic potential, generating new industries and innovations.

Whilst some open government data initiatives are being led by governments, other open data projects are taking a grassroots approach, collecting and curating government data in reusable digital formats which can be used by specific communities at the grassroots and/or macro datasets that can be used/received/applied in different ways in different local/grassroots contexts. INGOs, NGOs and various civil society and community based organizations are also getting involved with open data activities, from sharing data they hold regarding aid flows, health, education, crime, land records, demographics, etc, to actively sourcing public data through freedom of information and right to information acts. The publishing of open data on the Internet can make it part of a global eco-system of data, and efforts are underway in technology, advocacy and policy-making communities to develop standards, approaches and tools for linking and analysing these new open data resources. At the same time, there are questions surrounding the very notion of ‘openness’, primarily whether openness and open data have negative repercussions for particular groups of citizens in certain social, geographic, political, demographic, cultural and other grassroots contexts.

In sum then, what we find in society today is not only various practices relating to open data, but also an active shift in paradigms about access and use of information and data, and notions of “openness” and “information/data”. These emerging/renewed paradigms are also configuring/reconfiguring understandings and practices of “community” and “citizenship”. We therefore find it imperative to engage with crucial questions that are emerging from these paradigm shifts as well as the related policy initiatives, programmatic action and field experiences.

Some of the questions that we hope this special issue will explore are:

  1. How are citizens’ groups, grassroots organizations, NGOs, diverse civil society associations and other public and private entities negotiating with different arms of the state to provide access to government data both in the presence and absence of official open data policies, freedom/right to information legislation and similar commitments on the part of governments?
  2. What are the various models of open data that are operational in practice in different parts of the world? What are the different ways in which open data are being used by and for the grassroots and what are the impacts (positive, negative, paradoxical) of such open data  for communities and groups at the grassroots?
  3. Who/which actors are involved in opening up what kinds of data? What are their stakes in opening up such data and making it available for the public?
  4. What are the different technologies that are being used for publishing, storing and archiving open data? What are the challenges/issues that various grassroots users and stakeholders experience with respect to these technologies, i.e., design, scale, costs, dissemination of the open data to different publics and realizing the potential of open data?
  5. What notions of openness and publicness are at work in both policies as well as initiatives concerning open data and what impacts do these notions have on grassroots’ practitioners and users?
  6. Following from the above, what are the implications of opening up different kinds of data for privacy, security and local level practices and information systems?

Thematic focus
The following suggested areas of thematic focus (policy, technology, uses, impacts) give a non-exhaustive list of potential topic areas for articles or case studies. The core interest of the special issue is addressing each of these themes from, or taking into account, grassroots, local citizen and community perspectives.

  1. Different policy and practice approaches to open data and open government data
  2. Diverse uses of open data and their impacts
  3. Technologies that are deployed for implementing open data and their implications
  4. Critical assessments of stakeholders and stakes in opening up different kinds of data.
Submission
Abstracts are invited in the first instance, to be submitted by e-mail to jociopendata@gmail.com.

Deadline for abstracts: 31st March 2011
Deadline for complete paper submissions: 15th September 2011
Publication date is forthcoming

For information about JCI submission requirements, including author guidelines, please visit: http://www.ci-journal.net/index.php/ciej/about/submissions#onlineSubmissions

Guest Editors

Zainab Bawa
Centre for Internet and Society (CIS) RAW fellow bawazainab79@gmail.com

Tim Davies
Director, Practical Participation (http://www.practicalparticipation.co.uk)
tim@practicalparticipation.co.uk | @timdavies | +447834856303

Sourcing raw data… (drafting the open data cook book)

I’m at the Local by Social South West ‘Apps for Communities’ event in Bristol today, doing some prototyping work on the Open Data Cook Book. Listening to people working through how to find data, and trying to search for data myself, I thought I would try and map out all the different places I’ve been looking to track down different open datasets. So, with a sprinkling of recipe book metaphors, here’s a draft for comment of key places to track down open data (focussed on UK government data)…

Sourcing raw data

Finding the right ingredients for your data creation is often the hardest part. You will often have to mix-and-match from the approaches below to get all the data and information you need.

1) Search the supermarkets – the data catalogues & data stores

There are a growing number of data catalogues that bring together listings of published open data (and there are also now data marketplaces that can help you find commercially licensed data as well – so be sure to check the details of the data you find).

Data catalogues often have a particular focus – and no one catalogue can tell you about all the data out there.

CKAN.net is a catalogue of data from many different sources. If you are not quite sure where the dataset you want might be found, it’s a good place to check whether someone has already created a ‘packaged’ version of it.

Data.gov.uk is the UK Government’s data catalogue, which aims to include listings of all open datasets in the public sector. It’s early days yet, but it boasts over 4,600 dataset listings, many of which link direct to spreadsheets and data downloads.

Guardian World Data Store makes it easy to search across a range of different government open data catalogues – browsing data by country and format.

Your local authority might have a data store, or at least a data page on their website. London has http://data.london.gov.uk and you can find a list of other local open data web pages through the ‘All Councils’ listing at OpenlyLocal.com.

Publicdata.eu is a new catalogue bringing together data from right across Europe.
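If you are comfortable with a little code, several of these supermarkets can also be searched programmatically. CKAN.net and publicdata.eu were built on the open-source CKAN software, and current CKAN versions expose an ‘action API’ with a `package_search` call. The sketch below assumes a catalogue that runs CKAN with that API at `/api/3/action/`; the base address `https://ckan.example.org` is a placeholder, not a real catalogue, and older installations may use a different API path.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a CKAN action-API search URL (assumes a standard CKAN install)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def list_resources(response):
    """Yield (dataset title, format, download URL) from a package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN API reported a failure")
    for package in response["result"]["results"]:
        title = package.get("title") or package.get("name")
        for resource in package.get("resources", []):
            yield title, resource.get("format", ""), resource.get("url", "")


def search(base_url, query, rows=10):
    """Fetch search results over the network (needs a real CKAN base URL)."""
    with urlopen(package_search_url(base_url, query, rows)) as resp:
        return json.load(resp)


# Offline example using a made-up response shaped like CKAN's JSON
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {"name": "example-libraries",
             "title": "Example library locations",
             "resources": [{"format": "CSV",
                            "url": "https://example.org/libraries.csv"}]},
        ],
    },
}

print(package_search_url("https://ckan.example.org", "libraries", 5))
for title, fmt, url in list_resources(SAMPLE_RESPONSE):
    print(title, fmt, url)
```

Listing each resource’s format alongside its URL makes it quick to spot the datasets that are only ‘listed’ and the ones that link straight to a download.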

2) Specialist independents – data stores

Where the supermarkets are stacking the datasets high, and sharing them free – there might be a specialist in your area of interest – working hard to source and bring together the finest data they can. Fortunately, most of them provide the data for free too.

OpenlyLocal.com is focussed on making local council information accessible. You can find details of local council spending for many authorities alongside details of council meetings and councillors that has been scrumped and scraped from the respective websites for you. Most of the raw data is available through an API – so you might need to explore a few new skills to get at it though.

Timetric.com are specialists when it comes to time series data. If you can plot it on a graph over time, chances are they’ve taken the dataset, tidied it up, and provided ways to search and browse for it, with CSV spreadsheet downloads of the raw data.

Do you have a specialist independent you go to for data? Tell us about them in the comments.

3) Foraging – searching for the data

If the data you want isn’t available pre-packaged and catalogued, you might need to head out foraging across the Internet. There is a lot of open data in the wild – you just need to know how to spot it.

GetTheData.org makes a great first port of call to see if other data-foragers have already found a good spot to get the data you are after. It’s a community website full of requests for data, and conversations about good places to find it. Plus, if your own foraging doesn’t turn up anything, you can come back and pose your question to the community here later.

Search – try searching the web for the topic you are interested in. Perhaps add ‘data’ as an extra key word. When you read news articles or web pages that appear to be based on data, take note of the names of the data sources they mention and plug that back into a search. Oftentimes that will lead you to some data you might be able to use.

Think-tank websites, academic researcher web pages and even newspaper sites can all host lots of datasets. Just make sure you find out all you can about the provenance of the information before you use it!

Deep searching – you can use a standard Google Search to look for data published in common office formats hosted on a particular web domain: your local council or university for example. All you need are two handy operators:

  • The ‘site:’ operator on Google restricts searches to only show results from a particular domain;
  • The ‘filetype:’ operator only returns files of a particular type.

Using those together you can construct searches like ‘filetype:xls site:oxford.gov.uk’ to find all the Excel spreadsheets that Google has indexed on the Oxford City Council website.
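If you want to run the same deep search across several file types or several councils, it can help to generate the queries rather than type them. A minimal sketch: the `site:` and `filetype:` operators are the ones described above, while the `https://www.google.com/search?q=` address is simply Google’s ordinary search page, not a documented API, so treat the generated links as a convenience for pasting into a browser.

```python
from urllib.parse import quote_plus


def deep_search_query(domain, filetype, keywords=""):
    """Combine optional keywords with the filetype: and site: operators."""
    terms = [keywords] if keywords else []
    terms += [f"filetype:{filetype}", f"site:{domain}"]
    return " ".join(terms)


def search_url(query):
    """Turn a query into a link to Google's ordinary search results page."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Look for spreadsheets and CSV files on one council's website
for ft in ("xls", "csv"):
    q = deep_search_query("oxford.gov.uk", ft)
    print(q, "->", search_url(q))
```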

4) Scrumping – screen-scrape the data

It’s not uncommon to find the data you need… only it’s just out of reach. Perhaps it’s in a table on a web page when you want it in the sort of table you can load into a spreadsheet to sort and chart. Or it might be spread across lots of different web pages and files. That’s where screen-scraping comes in – creating small computer scripts that turn structured information on a website into raw data.

Recipes explaining the details of screen-scraping are coming to the cook book, and you can go screen-scrape scrumping with a variety of different tools.

Google Spreadsheets – using a special formula you can grab tables and lists from other websites direct into your spreadsheet (recipe).

Scraper Wiki – helps you get started creating advanced scrapers, which they will run every day to grab information from websites and turn it into accessible raw data (recipe).
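In Google Spreadsheets the formula in question is `IMPORTHTML`, which takes a page URL, `"table"` or `"list"`, and the position of the table on the page. For a taste of what a scraper does under the hood, here is a minimal Python sketch using only the standard library that pulls a table out of some HTML and turns it into CSV. It is illustrative rather than a recipe from the cook book: it assumes tidy markup with explicit closing `</td>`, `</th>` and `</tr>` tags, ignores nested tables, and the library listing it scrapes is made up.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every cell, row by row, for each <table> on a page.

    Deliberately simple: assumes explicit closing </td>, </th> and </tr>
    tags and does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of cells
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(html, index=0):
    """Return table number `index` (zero-based) from `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.tables[index])
    return out.getvalue()


# Made-up page fragment standing in for a council's library listing
PAGE = """
<h2>Our libraries</h2>
<table>
  <tr><th>Library</th><th>Opening days</th></tr>
  <tr><td>Example Central Library</td><td>Mon - Sat</td></tr>
  <tr><td>Example Branch Library</td><td>Tue, Thu</td></tr>
</table>
"""

print(table_to_csv(PAGE))
```

Note that the table index here counts from zero, whereas `IMPORTHTML` counts from one.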

5) Special order – FOI

Perhaps you have found that no-one stocks the data you need – not even in places you can forage or scrump for it. If the data comes from a public body, then it might be time to explore putting in a special request for it using the Freedom of Information Act.

WhatDoTheyKnow.com is a service that makes it easy to submit a Freedom of Information Act request to a local authority, government department or other public body. You have a right to ask authorities for a copy of the information and data they hold, and you can ask for it to be returned as raw data. Search WhatDoTheyKnow to see if anyone has requested the data you want already, and if not, put in your request. (Often if data is available on WhatDoTheyKnow it will be locked up in PDFs. You might need to crowd-source the process of turning it into structured raw data, although there are a few tools and approaches that might help turn PDFs into data programmatically.)

The Public Sector Information Unlocking Service, available at http://unlockingservice.data.gov.uk/, provides a route for requesting that data be opened up by the Data.gov.uk team. It’s not backed by the legal framework of FOI, but may play a role in data requests under the currently debated ‘Right to Data’ legislation.

IsItOpenData.org provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing.

6) Home grown – research and crowdsourcing

Some data simply doesn’t exist yet – but you can create a raw dataset through research, and through crowd-sourcing, inviting others to help you research.

Simple spreadsheets – if you are systematically working through a research task, keep your results in a spreadsheet. See the section on raw data for ideas about how to structure it well.

Google Forms – available through http://docs.google.com, allows you to create an online form that anyone can fill in, with all the responses going direct into a spreadsheet for you to use. You might be able to get supporters to research for you and collaboratively build up a useful dataset.


Always check the label

Is the data you have found licensed for re-use? Whilst you might get away with cooking up some foraged raw data for your own consumption without checking out the details – when you re-publish data and share it with others you need to be sure you have permission to do so.

Remember as well to keep a list of the ingredients you use, and where you got them from, so you can publish a full list of sources along with your creation.
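One lightweight way to keep that ingredients list is a small table of sources that travels with your data, with a column for the licence so you can see at a glance which labels you still need to check. A sketch, assuming you track four fields per source (the field names and example entries are made up):

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    url: str
    licence: str    # leave empty ("") if you haven't checked the label yet
    retrieved: str  # date you fetched it, e.g. "2011-03-01"


def unchecked_labels(sources):
    """Names of sources with no recorded licence: check these before republishing."""
    return [s.name for s in sources if not s.licence.strip()]


def sources_table(sources):
    """Render the ingredients list as CSV, ready to publish alongside your data."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "url", "licence", "retrieved"])
    for s in sources:
        writer.writerow([s.name, s.url, s.licence, s.retrieved])
    return out.getvalue()


# Hypothetical ingredients list for a mash-up
sources = [
    Source("Example catalogue dataset", "https://example.org/data.csv",
           "Example open licence", "2011-03-01"),
    Source("Scraped council web page", "https://example.org/libraries", "", "2011-03-01"),
]

print(sources_table(sources))
print("Still to check:", unchecked_labels(sources))
```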

Worked example: A simple search, with many steps

Sadly we’re not yet at the stage where you can easily get all the data you need delivered to your door – so most projects will involve some searching around.

For example: I was recently looking for data on library locations in Bristol. I started at the data supermarkets, searching data.gov.uk for ‘libraries’. I found a few datasets listed, but the links were broken, so I ended up at a dead end. Next I turned to the Guardian datastore, but that wasn’t very helpful either, so I looked at GetTheData.org to see if anyone else had been looking for library data. Fortunately they had, and their conversations pointed me towards a few possible data sources. Again though, I ended up almost at a dead end: I could find a list of planned library closures, but not a dataset of all the libraries. However, I did find a link to the Bristol Council website, and on browsing the site I came across a listing of libraries in a web page, so I turned to a little scrumping, using Google Spreadsheets to import the web-page table into a spreadsheet table that I could manipulate and work with. Working through the list of data sources above, I was searching for about 15 minutes, following my nose to finally get to the raw ingredients I needed for some data creations.

Linked Open Data & Development at ICTD2010

[Summary: Short paper and presentations exploring linked open data in International Development]

Yesterday, Tim Berners-Lee gave the keynote speech at the 2010 ICT & International Development conference in London, including talk of the potential role of open data in development (I was following via Twitter). The details of how open and linked data might impact development were the key theme in the recent IKM workshop I blogged about a few weeks ago, and as a follow up to that workshop, a short discussion paper was available at ICTD, alongside a range of fantastic touchscreen kiosks produced  by Ralph Borland.

Last week, I rather rapidly put together the interface for one of those kiosks, focussed on offering users an introduction to open data, linked data, data visualisation, and the IKM questions being asked about how the development of standards, norms and practices in the creation, sharing and linking of datasets might impact upon development at local levels.

You can find the IKM discussion paper on linked open information for Development for download here and if you want to explore the TouchScreen interface, albeit with some bits that might not work 100% in browsers other than Firefox and which might not make sense on a standard machine rather than touchscreen, you can launch it below.

I’ve also noticed that the draft of Keish Taylor and Ginette Law’s fantastic (and very comprehensive) write-up of the IKM Linked Information Workshop is also available for download on the IKM site.


Reflections on Oxford Open Data Day

[Summary: creations and learning from Oxford Open Data Day]

Yesterday around 30 people got together in Oxford to take part in the first international Open Data Day, an initiative sparked off by David Eaves to get groups around the world exploring what they could create with public data. For many of the assembled Oxford crowd it was their first experience of both exploring public data, and taking part in a hack-day event, so, having started at 10am, it was fantastic that by 4.30pm we:

Thanks to everyone who took part in the day, and particularly to Ed, Kevin, Ed & Dave at White October for hosting the event, and to Incuna for sponsoring the lunch. Many thanks also to Sywia for blogging the event: you can find photos and video clips sharing the story here.

Quick Learning Notes

Skill building: I also took advantage of the Open Data Day to start exploring some of the ideas that might go into an Open Data Cook Book of ‘recipes’ for creating and working with open data. There are big challenges when it comes to building the capacity of both technical developers and non-developers alike to discover and then work with open data.

I’ve been reflecting on the discovery and design processes we could use at the start of any open-data-focussed workshop – whether with developers, civil servants, community groups or campaigners – to provide the right level of context on what open data is, the potential and limitations of different datasets, and a general awareness of where data can be discovered. At Open Data Day in Oxford we perhaps struggled to generate ideas for projects in the first half of the day – but understandably so, given that it takes a while to get familiar with the datasets available.

I wonder if for hack-day style events with people new to open data, some sort of training & team-building exercises for the first hour might be useful?

Data-led or problem-led: Most of the groups were broadly data-led. They found some data of interest, and then explored what could be done with it. One group (the visualisations of impacts of tax changes for the Robin Hood Tax campaign) was more ‘problem led’ – starting with an issue to explore and then seeking data to work with. Both approaches have their challenges: with the first, projects can struggle to find a focus; with the latter, it’s easy to get stuck because the data you imagine might be available turns out not to be. Discovering that the data you need isn’t available can provide a good spark for more open data campaigning (why, for example, are the details of prices in the Retail Price Index basket of goods not being published, and FOI requests for them being turned down on the basis of ‘personal information’ exemptions?), but when that campaigning can’t produce results during the course of a single day, it can be pretty frustrating as well.

On the day or in advance?:
We held a pre-meeting for the Oxford Open Data Day – and it was useful in getting people to know each other and to discover some ideas and sources of data – but we perhaps didn’t carry the ideas from that meeting through into the hack-day very strongly. Encouraging a few more people to act as project leaders in advance might have helped those who came wanting to help on existing projects, rather than create their own, to get involved.

Data not just for developers:
My mantra. Yet still hard to plan for and make work. Perhaps trying to include a greater training element into a hack day would help here, or encouraging some technically-inclined folk to take on a role of data-facilitators – helping non-developers get the data into a shape they need for working with it in non-technical ways. Hopefully some of the open data cook book recipes might be useful here.

Sharing learning rather than simply products:
David Eaves set out three shared goals for the Open Data Day events:

1. Have fun

2. Help foster local supportive and diverse communities of people who advocate for open data

3. Help raise awareness of open data, why it matters, by building sites and applications

emphasising the importance of producing tangible things to demonstrate the potential of open data. This is definitely important – but I think we probably missed a trick by focussing on the products of the hack-day in presentations at the end of the day, rather than the learning and new skills people had picked up and could tell others about.

Open Arms? Unlocking raw data

[Summary: Exploring the process of requesting access to a raw dataset]

Update 22nd December: Almost a month on, and whilst my post on the OPSI Data Unlocking Service has had 30 votes in favour (more than any other request I can see by far) I’ve not heard from either OPSI or the data owner/data.gov.uk in response to my comments/requests for raw data. So far, it looks like requesting new raw data through the advertised routes doesn’t meet with much action. I’ll wait till the Open Up competition closes in the New Year to see what results that might bring – and then it’s time to start looking at what other ways there might be to request this data…

A lot of the open government data that has been released in recent years is only available locked up in PDFs and website interfaces. As this definition seeks to explain, this radically limits the potential uses of that data.

Following a recent event organised by Campaign Against the Arms Trade, I was curious about who the UK issues Export Control Licenses to, so I took a look on data.gov.uk. Sure enough, the Strategic Export Controls: Reports and Statistics Website is listed in the Data.gov.uk catalogue. But on closer investigation it turns out that the Strategic Export Controls: Reports and Statistics Website (a) requires registration before you can access it; (b) predominantly provides data as PDFs; (c) has a very complex search interface that generates reports in the background ready for download later – but reports which don’t include key information such as the month a license was issued. All the data is clearly in the system – as you can search by date – but in its current form, extracting meaningful information about where UK companies have gained (or been refused) arms export licenses would be a long and slow job.

I’ve heard about the OPSI Data Unlocking Service, and I’ve sat through a number of presentations in which senior government officials and Ministers talked about the government’s commitment to releasing raw data, so I thought this would provide a good opportunity to test the process of requesting raw data.

So – as of this morning, I’ve tried three routes to ask for access to this data:

  1. Adding a comment to the package on Data.gov.uk requesting access to the data. I’ve also sent a copy of the comment via the ‘Feedback Form’ listed under ‘Contact Details’ for each dataset. From past experience, I think the comment form gets forwarded to the Data.gov.uk team who forward it on to the department – but I’m not certain where that message has gone, or who reads the comments on datasets.
  2. Submitting a request to the OPSI Data Unlocking Service. This appeared to send an e-mail form to the OPSI webmaster, who is, I understand, supposed to check the request and then add it to the OPSI website for others to vote on, as well as (I presume) passing it to someone inside OPSI to review and act upon, although the process by which a request could lead to data is fairly unclear. My request is not yet on the website.
  3. Adding an idea submission to the TSO Open Up Competition, which you can see here. As I understand it, TSO are working closely with government on open data projects, although they don’t have the authority to open access to data themselves. However, the competition does appear to be interested in what datasets people want to see – so I figured a request via this route can’t harm.

I suspect a fourth route might be to submit a Freedom of Information request, but I’m keen to explore first how these open data request channels work in practice. Have I missed any? How else should we be requesting access to raw data? Do you have experience of requesting data? What worked and what didn’t?

I’ll report back on any updates on the process of getting access to this data…

Defining raw data

[Summary: explaining what raw data is and why it matters]

On the Friday of last week’s Open Government Data Camp, in a discussion on how to empower non-technical citizens, civil servants and community activists to make use of open government data, we hit upon the idea of an ‘Open Data Cook Book’ of simple recipes for working with data. The recipe analogy also emerged (via @exmosis) in a Twitter discussion on Monday about ‘machine-readable data’ – and, after a bit of cook-book drafting, here’s my attempt at describing good open data whilst avoiding, as far as possible, technical terms and the ambiguity of ‘machine-readability’.

Sourcing your ingredients for a raw data project:

For all of the recipes in the forthcoming open data cook book you will need access to some raw data to work with. You might already have the data you want to work with to hand, or you might have ideas for a great project but no idea where to get the data you need. In the cook book we will outline a range of places you can source your data, and how to prepare it ready to be part of your data-creations.

Identifying raw data

You can find data all over the place when you start looking, but all-too-often the data you want has been pre-prepared, locked down in written reports, or only available through complicated website interfaces that only let you glimpse a small bit of the data at any one time.

Raw data is easier to manipulate with a computer. When you have raw data you can sort it, edit it and remix it in new ways with the tools you want to use.
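To make that concrete, here is a minimal sketch in Python of what sorting and remixing raw data can look like. The tiny CSV dataset, its column names and the helper functions are all hypothetical, invented purely for illustration; the point is only that once data is raw, a few lines of code (or a spreadsheet) can derive a new column and re-sort the results.

```python
import csv
import io

# Hypothetical sample: a tiny raw CSV dataset invented for illustration only.
RAW_CSV = """ward,population,libraries
Ward A,12000,2
Ward B,8000,1
Ward C,15000,1
"""


def load_rows(text):
    """Parse raw CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def people_per_library(rows):
    """Remix the data: derive a new column, then sort by it, highest first."""
    remixed = []
    for row in rows:
        ratio = int(row["population"]) / int(row["libraries"])
        remixed.append({"ward": row["ward"], "people_per_library": ratio})
    return sorted(remixed, key=lambda r: r["people_per_library"], reverse=True)


if __name__ == "__main__":
    for row in people_per_library(load_rows(RAW_CSV)):
        print(row["ward"], round(row["people_per_library"]))
```

The same numbers locked inside a PDF report would need to be copied out by hand before any of this was possible.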

Locked, raw, linked

We can think of data on a continuum.

At one end is locked-up data. This is the sort of data you find in reports, charts and maps. Someone has interpreted what the data means and has pinned it down in a particular context. To use this data in new ways you will probably have to spend time converting it into a raw format through scraping, crowd-sourcing, or lots of manual work.
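As a rough illustration of what scraping involves, this Python sketch pulls the cells out of a simple HTML table and writes them back out as raw CSV, using only the standard library. The page fragment, its values and the `TableScraper` and `scrape_to_csv` names are hypothetical; real pages (and especially PDFs) are usually much messier, so treat this as a sketch of the idea rather than a ready-made scraper.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical "locked-up" page fragment: a table meant for reading, not reuse.
PAGE = """
<table>
  <tr><th>Region</th><th>Value</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(self._cell.strip())
            self._cell = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell += data


def scrape_to_csv(html):
    """Turn the rows of an HTML table into raw CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(PAGE))
```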

In the middle is raw data. This is when the data is available in a structured way that you can load into the software or online tools of your choice, and then explore, manipulate and remix. Raw data is ready for use in open data recipes.

However, to make use of any raw dataset you will need to know what it contains. Often raw data can contain cryptic headings, titles and codes for columns, rows or other elements of the dataset, so you will need to make sure you have access to meta-data which tells you what all the things in your raw dataset are, and how the data was generated (sort of like the ingredients list, and list of additives and preservatives on the back of any food packet).
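One way to picture meta-data in practice is as a lookup table that translates cryptic codes into plain language. In this minimal Python sketch the column codes (`LA_CD`, `YR`, `VAL_01`), their meanings, the data values and the `label_rows` helper are all hypothetical; real datasets document their meta-data in many different ways, from separate spreadsheets to notes on a web page.

```python
import csv
import io

# Hypothetical raw data with cryptic column codes (invented for illustration).
RAW_CSV = """LA_CD,YR,VAL_01
LA1,2010,42
LA2,2010,37
"""

# Hypothetical meta-data: what each code means and where the data came from.
METADATA = {
    "columns": {
        "LA_CD": "Local authority code",
        "YR": "Year of collection",
        "VAL_01": "Number of youth centres open",
    },
    "source": "Illustrative example only; not a real dataset",
}


def label_rows(text, metadata):
    """Rename cryptic column codes to human-readable labels using the meta-data."""
    labels = metadata["columns"]
    rows = csv.DictReader(io.StringIO(text))
    # Fall back to the original code if the meta-data doesn't describe a column.
    return [{labels.get(code, code): value for code, value in row.items()} for row in rows]


if __name__ == "__main__":
    print("Source:", METADATA["source"])
    for row in label_rows(RAW_CSV, METADATA):
        print(row)
```

Without the `METADATA` lookup, `VAL_01` could mean almost anything, which is exactly the ingredients-list problem described above.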

Linked data and RDF provide a way for the meta-data to be transferred along with the raw data, and for connections to be made between different datasets that make it possible to discover even more context about something in your data. Linked data can make it easier to integrate different datasets when they use the same ways of representing different parts of the data. The tools for working with linked data aren’t quite as widespread yet as the tools for working with standard raw data formats, so often linked data is transformed into a common raw data format like CSV (spreadsheets/tabular data), or JSON and XML (flexible structures for different sorts of data).
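To illustrate that last step, here is a minimal sketch, in Python, of flattening linked data into a CSV spreadsheet. It assumes the RDF has already been parsed into (subject, predicate, object) triples, for example with a library such as rdflib; the example.org URIs, the school data and the helper names are placeholders rather than a real vocabulary or dataset.

```python
import csv
import io

# Hypothetical linked-data triples (subject, predicate, object), as they might look
# after parsing RDF. The example.org URIs are placeholders, not a real vocabulary.
TRIPLES = [
    ("http://example.org/school/1", "http://example.org/def/name", "Hill School"),
    ("http://example.org/school/1", "http://example.org/def/pupils", "320"),
    ("http://example.org/school/2", "http://example.org/def/name", "Vale School"),
    ("http://example.org/school/2", "http://example.org/def/pupils", "210"),
]


def local_name(uri):
    """Use the last path segment of a URI as a short column heading."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def triples_to_csv(triples):
    """Pivot triples into one row per subject and one column per predicate."""
    rows = {}
    columns = []
    for subject, predicate, obj in triples:
        column = local_name(predicate)
        if column not in columns:
            columns.append(column)
        rows.setdefault(subject, {"uri": subject})[column] = obj
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["uri"] + columns)
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


if __name__ == "__main__":
    print(triples_to_csv(TRIPLES))
```

Keeping the `uri` column is a deliberate choice: it preserves a route back to the linked context, even though the flattened table no longer carries the meta-data itself.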


I’ve still some more work to do tidying up these definitions – and I hope in the cook book we can make use of a few more visual metaphors to show the difference between locked-up, raw and linked information. Thinking through the relationship between raw and linked data as defined above, in conjunction with the DIKW model, also seems to hint at a useful point I’ve not yet found a good way of articulating: in most mash-up creation and data use, human understanding of data and context (meta-data) as separate elements is important. So whilst linked data helps context travel with data, when it comes to actually working with data, most users need to decompose it back into raw data, with data and context kept separate.

A fear of open data heresy? Time to move beyond zealotry?

[Summary: A quick post, mainly for folk at today’s Open Government Data Camp, on the need to raise critical perspectives about open government.]

There are strong normative arguments for opening up government data – and there is great potential to be realised from that.

However, whilst the broad brush idea can command widespread support, the details of how we do open government data matter, and attentiveness to the social impacts is vital.

I’ve heard many people at events, including the Open Government Data Camp, express nuanced views on openness. And yet, far too often such views have been followed by comments such as “but I’m not sure I should be saying that sort of thing here”, or a retreat from the critical argument in order to add voices to the call for ‘more data now’.

So – I’m for a bit more heresy. A bit more challenge to the zealotry. A slightly louder voice for the critical friends of the open data movement.

It’s possible to argue for greater openness of data, and to think critically about the impacts that open data will have. It’s important to ask the question ‘Open data + what?’ What do we need to be doing, as well as releasing data, to drive positive social change?