Young Lives Linked Data Demonstrator

[Summary: showcasing linked data for development project]

Over the past month or so I’ve been working for IKM Emergent on a demonstrator project to explore the potential implications of linked data for information management in the development sector – seeking to put a small sub-section of the survey micro-data from the Young Lives longitudinal study online in order to explore the process and potential of generating linked data in development-focussed settings.

The results of that project are now live and online for the time being, and accessible here. The most visually interesting part of the demonstrator (thanks to the work of Rupert Redington at NeonTribe) is the Comparator tool, which does some pretty clever things to identify ‘Data Cubes’ in the Young Lives linked data dataset we’ve published, and to offer (in the case of the smoking prevalence data) comparisons between the Young Lives dataset and another comparable dataset we’ve also loaded into our Young Lives datastore.

However, through the demonstrator we’ve also made the Health dataset from the Young Lives data available to browse via an OntoWiki interface, and to query via SPARQL – exploring how linked data structures give us the opportunity to annotate the questions from the Young Lives data – potentially helping future researchers to find questions and data of interest to them.
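As a rough illustration of what querying via SPARQL can look like, here is a minimal sketch that builds a request URL asking for a handful of Data Cube observations. The endpoint address is a placeholder (the demonstrator's real address is not given here), and the query shape is an assumption about how the data might be modelled; only the `qb:` vocabulary terms and the SPARQL protocol's `query` parameter are standard.

```python
from urllib.parse import urlencode

# Placeholder endpoint: NOT the demonstrator's real SPARQL address.
ENDPOINT = "http://example.org/younglives/sparql"

# qb:Observation and qb:dataSet come from the RDF Data Cube vocabulary.
# The variable names and LIMIT are illustrative choices.
QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?observation ?dataset
WHERE {
  ?observation a qb:Observation ;
               qb:dataSet ?dataset .
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # Printing only: fetching would need the real endpoint address.
    print(build_request_url(ENDPOINT, QUERY))
```

Pasting the printed URL into a browser (with a real endpoint substituted) would normally return a table of results, which is often the quickest way to check that a query does what you expect.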

The presentation below steps through some of the basics of Linked Data, before, from slide 13 onwards, introducing the Young Lives Data Demonstrator.

I’ll be sharing some more learning notes from the Young Lives Linked Data Demonstrator over on the open data impacts blog soon.

Open Data Hack Day in Oxford – 4th December

Open Data Day Oxford on the 4th December 2010 is on the lookout for designers, coders, copy-writers, policy people, journalists, statisticians, campaigners, data-geeks and anyone interested in exploring what can be done when you take some public data and spend a day creating things with it in order to contribute to some positive social change goals.

Thanks to Cowley-based Web & Software Developers White October we’ve got a fantastic venue for an Oxford Open Data Hack Day* as part of the global Open Data Day events taking place right across the world.

Here’s how an open data hack day in Oxford should work:

  1. Anyone interested in taking part signs up using the registration form here, and, optionally, adds some notes to the planning Wiki page. (Just click ‘Edit’ at the top-right of the wiki page, scroll to find where to add your notes, drop them in, ignoring any extra characters/symbols on the page you’re not sure about, and save the changed page.)

    You can sign up with an idea for the project you want to work on on the day – or just to offer your skills. You don’t need to have taken part in a hack-day before, or to be an uber-geek to take part!

  2. The planning group will make sure we’ve got a good mix of people and possible project teams emerging – and might get in touch to link you up with potential collaborators for the day so you can have conversations in advance.
  3. On the day, we’ll start around 10am in the fantastic split-level and spacious White October offices, which are a walk or bus ride from the centre of Oxford (or a short bus ride from the station), with coffee, refreshments and a chance to meet other participants and hear about different ideas for projects on the day.
  4. We’ll form into teams to work on particular projects. Teams will find a space, get laptops and computers out – and start building things. You can either spend your whole day working with a particular team, or you can take your skills between teams to help them out when they need.

    Teams usually develop fairly organically to have 3 – 5 people in (although some people choose to work in smaller or larger groups) and will have a mix of skills.

  5. In your teams you will identify the data you are working with and what you want to do – and start creating something. It could be anything. At past events we’ve built everything from mash-up maps, through to paper-based card-games and Facebook apps.

    Recent ideas I’ve heard for hack-day outputs include data-driven stencils for creating artworks; web applications for checking the best place to park a bike; mobile phone-based tools for finding transport routes – and lots more.

    I’m expecting to be spending a lot of my time helping source data – and helping people get hold of the data they want – and the team from White October will, I’m sure, be on hand offering their skills in all manner of digital webby stuff.

  6. By about 1pm we’ll get some lunch in – and depending on how work is going, we might break for people to feed back on progress so far and share any offers of, or requests for, extra skills they have. We might even be able to link up by Skype with one of the other open data day events taking place around the world (tbc.)
  7. After an afternoon of making stuff, around 5pm, we’ll have a show and tell. If any kind sponsors get in touch we might even have some prizes to award to the best or most innovative creations.

    We’re thinking of inviting people from the City & County Council or other groups who might have an interest in releasing data along to see what has been created. Anyone with contacts who we could invite along to the show and tell, do let me know.

  8. We’ll tidy up and head to the pub – an optional ending to the day.

Are you up for it? If so – head over to the Wiki page to get registered. Offers of help organising, sourcing sponsorship, inviting show and tell participants etc. all welcome. Any questions? Drop them in as blog comments or on the Wiki.

(*Whilst some of the data we focus on might be Oxford/Oxfordshire based, participation is open to all, not just those based locally)

Brief practical notes on open data and activism

Flip Chart from CAAT Conference

[Summary: Context, links, resources and ideas for working with open data in campaigning organisations and/or third-sector contexts.] (See other open data posts here.)

The rough notes below come from a short open session discussion held at the Campaign Against the Arms Trade (CAAT) annual gathering last Saturday exploring how open data could be useful to a campaigning organisation. A PDF copy is here: Open Data and Campaigning.

Background & Context

The last 18 months have seen an impressive array of policy initiatives and practical actions leading to the release of datasets from governments in the UK, the US and across the world in open and re-usable formats online. Datasets ranging from the location of educational institutions, to details of taxation and government spending, have been brought together in data portals such as data.gov and data.gov.uk.

The open government data ‘movement’ has three broad constituent parts:

  • An open Public Sector Information (PSI) movement – drawing upon economic arguments to call for government data to be released and made freely re-useable. Often drawing upon comparisons between the EU context, where government-collected data is copyright and restricted, and the US, where government datasets are more open and large industries have developed on the back of them (e.g. Weather data; Geodata etc.).

  • A transparency movement – linked to Access to Information and Freedom of Information movements – calling for the release of data in the interests of democratic empowerment, or data to be used in particular contexts and settings.

  • Digital government & semantic web computerization movements – focussed on the potential for innovation and more efficient working when data is made available for computer processing: and working to build open networks of knowledge across the Internet through linked-data approaches.

Many different groups can be found within the open government data ‘movement’ – from groups calling for aid transparency, to SME companies seeking to address what are seen as unfair data monopolies.

Policy context:

(See Open Government Data & Democracy report for a full timeline)

  • The http://data.gov initiative in the US proceeded from Obama’s first executive order on taking power as President.
  • http://data.gov.uk in the UK was initiated by Gordon Brown in 2009.
  • Since coming to power in 2010 the Coalition Government in the UK have continued to push open data initiatives – though with a slightly different ‘transparency’ and ‘accountability’ framing.
    • A requirement has been placed on local authorities to publish all spending over £500 by January, listing supplier and spend.
    • Government departments are under a similar requirement for all spend over £25,000, and have been asked to publish senior staff pay details and internal organizational diagrams.
    • Francis Maude has spoken of the need for a ‘Freedom of Data’ act, and has called for all responses to Freedom of Information requests that contain data to provide that data in machine-readable forms (e.g. an Excel spreadsheet rather than a print-out or PDF files…)
    • Aid Transparency has been high up the government’s development agenda.
  • The World Bank have released significant amounts of their data as open data.
  • Australia, New Zealand and many European countries have ongoing open data initiatives and campaigns.

Beyond government data

It’s not only government supplied data that is of interest to campaigners:

  • Projects like TheyWorkForYou.com and PublicWhip.org generate structured data about politicians’ voting records by ‘scraping’ parliamentary records;
  • Data Journalists (led by innovators at The Guardian amongst other places) publish their research as open accessible spreadsheets of data that others can re-use.
  • Some NGOs and community organisations are publishing open datasets.

Why data?

One of the key properties of data is that it can be easily manipulated by computer – allowing datasets to be combined, visualized, explored and used in many more ways than a written report or printed document can.

Where to find data

For official government data – The Guardian’s World Government Data Search looks across a range of data catalogues like http://data.gov.uk. Find it at http://www.guardian.co.uk/world-government-data and search for keywords or topics of interest to you.

You can also search http://data.gov.uk direct to browse data by department or topic.

http://ckan.net/ provides a catalogue of open data from many different sources – including government data, NGOs and research projects. It is a good place to ‘register’ any open data you create. It is also wiki-like, meaning any user can edit the records – allowing the creation of ‘collections’ of data on a particular topic: e.g. ‘arms trade’.

ScraperWiki.com provides a collection of ‘scrapers’ which collect structured data from unstructured data-sources (i.e. make open data where the original publisher didn’t provide it). For example, a dataset of hospitality received by UK Government Ministers, originally only available as a large collection of different Word documents, is now here: http://scraperwiki.com/scrapers/government-meetings-with-external-organisations-ne/ and available for download. (Update: it’s also now available from http://transparency.number10.gov.uk/)

If you are looking for a particular dataset – it can be worth asking in the data.gov.uk forums, or using the #opendata hash-tag on Twitter.

Data on MPs and voting records is available from www.theyworkforyou.com in the UK, and the PublicWhip.org project collects more detailed voting records and makes them available.

When data isn’t available

Try using the Public Data Unlocking Service to request that data is proactively published: http://www.opsi.gov.uk/unlocking-service/opsipage.aspx?page=unlockindex

If using the Freedom of Information Act to request data, remind the recipient of Francis Maude’s policy statements on the need to provide machine-readable data in return.

If the information is available on websites, but not as structured data – consider putting a request on http://www.scraperwiki.com for someone to build a tool to screen-scrape the data.
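To give a feel for what a screen-scraper does, here is a minimal sketch that turns an HTML table into a list of records using only Python's standard library. The sample page and its column names are invented for illustration, and this is not how any particular ScraperWiki scraper is written; real pages are usually messier and need more handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    if not scraper.rows:
        return []
    header, *body = scraper.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample page: names and values are placeholders, not real records.
SAMPLE_HTML = """
<table>
  <tr><th>Minister</th><th>Date</th><th>Organisation</th></tr>
  <tr><td>Minister A</td><td>2010-06-01</td><td>Example Ltd</td></tr>
  <tr><td>Minister B</td><td>2010-06-03</td><td>Sample Trust</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
```

Once the data is in this shape it can be written out as a spreadsheet or CSV file and shared like any other open dataset.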

Consider using any of the ‘data competitions’ (e.g. http://openup.tso.co.uk) as a higher-profile way to ask for a dataset: emphasizing the government’s focus on accountability through transparency in other sectors such as local authority spending and aid.

Use the facts you can find from datasets like COINS (http://data.gov.uk/dataset/coins) to better structure Freedom of Information requests or crowdsourcing activities.

Explore ways to ‘crowd-source’ the data by calling on campaigners and supporters to find out particular facts – and to enter them into shared online spreadsheets (e.g. using Google Spreadsheets and Google Forms you can create an easy way for people to collaboratively input into a shared document – which can be instantly published online). Crowdsourcing tools like Ushahidi can also be used to develop projects such as http://WhereAreTheCuts.org – crowdsourcing reports of public spending cuts.

Working with data

Working with data scares many people – but it can start off very simply, and there are many approaches – including:

  1. Using data-driven websites such as http://TheyWorkForYou.com (MPs’ speeches and voting) or http://WhereDoesMyMoneyGo.com (government spending) which have taken government data and made it available in more accessible forms.
  2. Downloading and exploring a single dataset – many datasets can be opened in spreadsheet software like Excel. Sort and filter the columns to look for interesting information.
  3. Visualise the data – using a tool like IBM Many Eyes where you can upload simple datasets and explore a range of different ways of presenting the data.
  4. Building a mash-up – using tools like Google Spreadsheets and Google Fusion Tables, or Google Refine (available for free download) to explore and combine datasets. Google Fusion Tables will allow you to upload any spreadsheet, and, if it contains place names, quickly ‘geocode’ the data for displaying on a map. You can also combine two datasets – matching on any shared keys (e.g. MP name; Town name; Constituency) to build larger datasets.
  5. Holding a hack day – hack days like those organized by Rewired State bring together developers (coders/geeks) and people with problems to solve and spend one or two days of concerted effort creating ‘hacks’ (rapid prototypes) which address those issues, often using open data. For example, a hack-day could look to generate visualizations concerning arms licenses (CAAT Specific), or to create tools that support campaigners to get information to use when writing to MPs. (Update: We could have a campaigning strand at the Oxford Open Data Hack Day on 4th December if there was interest)
  6. Commissioning open data-based tools – developing hack-day created prototypes, or other ideas, into full working tools.

  7. Training activists in using data – through workshops and hands-on activities. (I’m mid way through developing a training workshop at the mo… suggestions of groups to pilot with welcome…)
  8. Releasing datasets – from in-house research or crowd-sourced data – and inviting supporters to use the data in creative ways. For example, putting researched data into Google Spreadsheets and, much as the Guardian Datablog does, sharing links to that data whenever posting news stories or website pages based upon it.
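The ‘matching on shared keys’ idea from the mash-up step can be sketched in a few lines of Python. This is only an illustration with invented rows (the MP names and spend figures are placeholders), and it assumes both datasets carry a column with the same name; tools like Fusion Tables do the equivalent through their own interfaces.

```python
import csv
import io

# Invented sample data: MP names and spend figures are placeholders.
MPS_CSV = """constituency,mp_name
Oxford East,MP One
Oxford West and Abingdon,MP Two
"""

SPEND_CSV = """constituency,spend_gbp
oxford  east,1200
Banbury,800
"""


def normalise(key):
    """Lower-case and collapse whitespace so near-identical keys match."""
    return " ".join(key.lower().split())


def join_on_key(left_rows, right_rows, key):
    """Merge rows sharing the same key; also report left keys with no match."""
    right_index = {normalise(row[key]): row for row in right_rows}
    joined, unmatched = [], []
    for row in left_rows:
        match = right_index.get(normalise(row[key]))
        if match is None:
            unmatched.append(row[key])
        else:
            # Values from the left row win where both rows share a column.
            joined.append({**match, **row})
    return joined, unmatched


mps = list(csv.DictReader(io.StringIO(MPS_CSV)))
spend = list(csv.DictReader(io.StringIO(SPEND_CSV)))
joined, unmatched = join_on_key(mps, spend, "constituency")
```

Keeping the list of unmatched keys is worth the extra line: in practice most of the effort in combining datasets goes into spotting names that are spelled differently in each source.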

Going further

Search for the #opendata community on Twitter; or the ‘Open Government Data’ mailing lists run by Open Knowledge Foundation. Most of the links above will also provide access to further practical and background information on open government data.

Tim Davies, Practical Participation (tim@practicalparticipation.co.uk) can offer consultancy, training, workshops and support for organisations exploring the use of open data in campaigning. Please do get in touch to explore more…

Open government data is not just a one-way flow…

[Summary: How can citizens, community institutions and social enterprise be part of producing ‘government’ data as well as consuming it? Some quick reflections…] (Cross-posted to Open Data Impacts blog)

Alison Powell poses the question in this blog post of whether we are moving into an era of ‘policy-based evidence’: where ideologically-driven policy making may lead to an end of evidence collection on key indicators (justified, no doubt, in the interests of ‘efficiency’), impoverishing our understanding of the impacts of key policy choices. Alison certainly has a point: collecting evidence on an issue has been a key political strategy for shifting the political debate, and when evidence on the impact of a policy is gone, showing the positive or negative impact it had becomes far trickier.

However, just because government stops collecting data, or requiring that data is collected, doesn’t necessarily have to mean the loss of important social-policy datasets. The same transformational technological forces that mean government no longer needs to, or can justify, monopolising the analysis of state data, mean that the monopoly power of government is no longer needed to collect and collate many social-policy relevant datasets.

For many datasets the state has acted as co-ordinator of data collection: using its authority to require data to be shared in a standardised form (more often than not, spreadsheets or forms filled in and mailed or e-mailed in to some official in central government, who then rekeys data into another spreadsheet…). But: with collaborative online tools, will from the grassroots, and the right co-ordination/leadership, many important datasets may be possible to generate without government being involved at all.

Of course, if the strict definition of open government data applies only to data “produced or commissioned by government or government controlled entities” (though the definitions are a live debate…) then what I’m really talking about is community-created “open governance data” – or ‘data essential for informed democratic policy making’.

I don’t pretend that all the datasets Alison fears will be lost will survive: but it is worth thinking about how, if government no longer wants the data, those who care about the stories it will be telling in a few years’ time can keep collecting, and take open, collaborative approaches to making governance data a two-way street…

(Some of the thoughts here are based on the lit review/analysis in §2.1 – 2.3 of my dissertation)

Online version: Open Data, Democracy and Public Sector Reform

Reposted from my Open Data Impacts research blog, which is where I’ll try and keep most open data related posts in future, with this blog maintaining its wider focus…

A public report based on the research for my MSc Dissertation is now available here*.

Thank you to everyone who contributed to the research whether in discussions, interviews or responding to the survey. As promised, I’ve started to share data from the survey, and will add to this as time allows in coming weeks.

Published for digressions

Over the weeks since I handed in my MSc Dissertation I’ve been trying to work out how best to share the final version. Each time I’ve started to edit it for release I’ve found more areas where I want to develop the argument further, or where I recognise that points I thought were conclusions are in fact the start of new questions. After trying out a few options, I settled on the fantastic Digress.it platform to put a copy of the report online – giving each paragraph its own URL and space for comments and trackbacks.

Hopefully this can help turn a static dissertation into something more dynamic as a tool for helping take forward thinking about the impacts of open government data. All comments, feedback, reflections and thinking aloud on the document welcome.


*Note: This is not the copy of the dissertation I submitted. That is still with the University being marked. When I submit a hard and digital library copy later this year I’ll post a link to those as the ‘official’ literature.

Oxfordshire: open & interactive

[Summary: A local post about open data, interactive working and social media in Oxford & Oxfordshire – and some rough ideas for making stuff happen…]

There’s not all that much open data published by local authorities in Oxfordshire right now, and whilst there are some great pockets of social media use, and digital technology projects across the different local authorities in the County, online interactivity from councillors, digital engagement from local councils, and hyperlocal community websites seem pretty sparse round here. We’ve got some great geek gatherings and social media meets, but not much that I can find in the way of social media surgery type activities.

But, having met with quite a few people from different local authorities across the County in the last month, it seems clear that there is real potential for more online engagement and open working in Oxfordshire, just some gaps in the knowledge, networks and catalysts to make things happen.

Which got me wondering about how the knowledge, networks and catalysts could be brought together. What would help…

  • …local authorities in Oxfordshire to understand, explore and release more open data;
  • …local authorities and community groups to get the most out of social media and interactive technology;
  • …turn Oxfordshire from a bit of a laggard in the worlds of open data and online interactivity, into a leading light…

And I realised: I’m not sure. But, here are two modest proposals:

  • An informal gathering some time in September of people interested in catalysing more online engagement and open data action in the county to explore possibilities. Could we set up a regular social media surgery? What about some hack-days with local open data? Or should we head out and build a better directory of the hyperlocal websites across Oxford? Interested? Let me know in the comments below – and suggest when might be a good time on this Doodle and I’ll try and find a suitable venue… (offers of venues welcome…)
  • Running a half-day event for Oxfordshire local authorities sometime in the Autumn to provide an introduction to open data, social media and ideas for more interactive ways of working. Something to be discussed at an informal gathering perhaps – but I’d also be interested to hear direct from anyone in Oxfordshire Councils about whether this would be useful / what would be most useful… Drop me an e-mail if you work with an Oxfordshire LA and you would be interested; or if you work with open data / social media locally and might be interested in helping organise something.

What do you think? If there is interest then I’d be up for spending a bit of time helping make something happen…

Of course – this may all already be happening? Or it might have been tried before? So comments / ideas on stuff already going / criticism / alternative ideas etc. welcome too…

Open data requires responsible reporting…

[Summary: Some initial reflections on the release and reporting of COINS government spending data]

The last week has seen big moves in the opening up of Government data, with the release today of the COINS database of government spending.

Since it was released at 9.30 this morning there has been a buzz of activity trying to clean the raw data up into usable forms (see the Open Knowledge Foundation and Guardian interfaces for exploring the COINS data) and I think it’s certainly fair to say that the race to create ways to explore the data has generated some impressive results – leading to tools beyond what may have been created by an internal government process to present the same data in user-friendly forms. We’re learning a lot right now about the potential of crowd-sourced collaborations between government and other groups. And thanks to the development of good ways to explore the data, it is already providing the basis for news stories on government spending… and this is where we’ve still got a lot to learn.

Responsible Reporting

Neither the Guardian (disappointingly), nor the Daily Mail (unsurprisingly), in reporting that government spent £1.8bn on consultants last year, give an account of how this figure was derived. Transparency can’t be for government alone.

It does not seem to be too much to ask that the reports give an account of how this data was derived, given they can very easily link to the raw data itself. The £1.8bn Consultancy Spend story is interesting. But without knowing what categories of codes from COINS were used to generate that figure – I’ve no way of using the transparency of the government data to explore that finding more for myself.
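To make the point concrete, here is a minimal sketch of what an ‘account of how the figure was derived’ could look like: a total published alongside the exact category codes and the number of rows that went into it. The rows and codes below are invented for illustration; they are not real COINS records or account codes, and this is not how either newspaper produced its figure.

```python
from decimal import Decimal

# Invented rows and codes: NOT real COINS records or account codes.
ROWS = [
    {"account_code": "X100", "description": "Consultancy", "amount": "1500.00"},
    {"account_code": "X200", "description": "Stationery", "amount": "250.50"},
    {"account_code": "X100", "description": "Consultancy", "amount": "499.50"},
    {"account_code": "X300", "description": "Interim staff", "amount": "1000.00"},
]


def derive_total(rows, included_codes):
    """Sum the rows whose code is included, and record how the sum was made."""
    used = [row for row in rows if row["account_code"] in included_codes]
    total = sum((Decimal(row["amount"]) for row in used), Decimal("0"))
    return {
        "total": total,
        "codes": sorted(included_codes),  # publish these with the story
        "rows_used": len(used),
    }


result = derive_total(ROWS, {"X100", "X300"})
```

Publishing the `codes` list next to the headline number is the key step: a reader who disagrees about whether, say, interim staff count as consultants can re-run the sum with their own selection.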

Interestingly, this may also fall foul of the terms under which the data is available: ‘Crown Copyright with Data.gov.uk Rights’. This requires attribution of the data ‘in the form the data provider specifies, or otherwise “Contains [insert name of Data Provider] data © Crown copyright and database right”’, and requires that users “do not misrepresent the Data or its source”.

As government develops new conventions for transparency – it would be good to see new conventions from mediators between data and the public too. Perhaps data.gov.uk should be clearer about attribution – and suggest that attribution should involve a clear link back to the dataset. If that was combined with some of the points Paul Clarke noted (and my comment on that post picks up on) around improving the user-friendly nature of data-stores, then simple steps might move us closer to ensuring transparency builds effective public debate – weaving data into the information.

Transparency in government means more than just change for government. And that’s important for advocates of open data and an open society not to lose sight of…

Have you explored open government data?

If you’ve looked at any sites such as Data.gov.uk or the London Datastore website, where you can browse and access datasets recently released by government, then I need your help.

As part of my MSc dissertation research I’m carrying out a survey into the use of open government data.

If you can spare 10 or 15 minutes to respond, then please do take a look here.

(Oh, and there is a draw for one of four £25 Amazon vouchers as a way of thanking contributors to the survey…)

And if you’re interested in the wider research, I’m blogging that over on the Open Data Impacts project blog.

Where is DFID spending money on youth, and other interesting project data mash-ups

I was down in London again on Saturday for the AID Information Challenge – another data-focussed event, but this time looking at International Development Data.

One of the main datasets we had to work with was the DFID Projects Database – a list of all the different development projects the Department for International Development has been funding over recent years, and has funding committed to in the future. Given I’ve recently finished getting the DFID funded ‘Youth Participation in Development‘ guide online, I initially thought I would explore how to link project data to the case studies in that guide. However, I soon found myself joining in with a team of others who were trying to visualise the projects dataset in more general ways.

The result: a faceted browsing mash-up using the fantastic Exhibit framework – turning this into this.

The faceted browser means that you can select different countries (only by their country code at the moment), years, funding types or funding programmes and explore the different project funding DFID has been giving out to these.

Click through to the Map view, and where funding went to a specific country you’ll be able to see a map of where the funds were distributed. (A lot of funding goes to regions or is non-specific geographically – at the moment this just displays under ‘could not be plotted’ above the map).

Even though I didn’t work directly with the Youth Participation in Development Guide, down at the bottom of the list of facets you will find one to help explore youth-related funding: you can pull out all the projects which include ‘Youth’ or ‘Young People’ in their project titles or descriptions.
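The idea behind that youth facet can be sketched as a simple keyword match over titles and descriptions. This is an illustration in Python with invented project records, not the code behind the Exhibit mash-up, and the field names are assumptions.

```python
# Keywords taken from the facet described above, matched case-insensitively.
KEYWORDS = ("youth", "young people")


def is_youth_related(project, keywords=KEYWORDS):
    """True if any keyword appears in the project's title or description."""
    text = (project.get("title", "") + " " + project.get("description", "")).lower()
    return any(keyword in text for keyword in keywords)


# Invented records: titles and descriptions are placeholders, not DFID data.
PROJECTS = [
    {"title": "Rural roads programme", "description": "Road maintenance."},
    {"title": "Youth employment fund", "description": "Skills training."},
    {"title": "Health outreach", "description": "Clinics serving young people."},
]

youth_projects = [p["title"] for p in PROJECTS if is_youth_related(p)]
```

A plain substring match like this is crude: it will also catch words such as ‘youthful’, and it will miss projects that work with young people without saying so in those terms.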

Thanks to the Publish What You Fund and Open Knowledge Foundation teams for organising the day 🙂