[Summary: Some initial reflections on the release and reporting of COINS government spending data]
The last week has seen big moves in the opening up of Government data, with the release today of the COINS database of government spending.
Since it was released at 9.30 this morning there has been a buzz of activity trying to clean the raw data up into usable forms (see the Open Knowledge Foundation and Guardian interfaces for exploring the COINS data). I think it’s certainly fair to say that the race to create ways to explore the data has generated some impressive results, leading to tools beyond what an internal government process might have created to present the same data in user-friendly forms. We’re learning a lot right now about the potential of crowd-sourced collaborations between government and other groups. And thanks to the development of good ways to explore the data, it is already providing the basis for news stories on government spending… and this is where we’ve still got a lot to learn.
Responsible Reporting
Neither the Guardian (disappointingly), nor the Daily Mail (unsurprisingly), in reporting that government spent £1.8bn on consultants last year, gives an account of how this figure was derived. Transparency can’t be for government alone.
It does not seem to be too much to ask that the reports give an account of how this figure was derived, given they can very easily link to the raw data itself. The £1.8bn Consultancy Spend story is interesting. But without knowing which categories of codes from COINS were used to generate that figure, I’ve no way of using the transparency of the government data to explore that finding further for myself.
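To make the point concrete, here is a minimal sketch of what "giving an account" of a headline figure could look like: the set of codes counted is stated explicitly, and the total is published alongside it. Everything in the sample is an assumption for illustration. The column names, the codes `C100` and `C110`, the amounts and the source URL are all hypothetical and are not taken from the real COINS release or from either newspaper's analysis.

```python
import csv
import io

# Hypothetical extract: column names, codes and values are illustrative only,
# not real COINS fields or figures.
SAMPLE_CSV = """department,account_code,account_description,value_gbp
Dept A,C100,Consultancy fees,1200000
Dept A,S200,Staff salaries,5000000
Dept B,C100,Consultancy fees,800000
Dept B,C110,Management consultancy,300000
"""

# The editorial choice that readers need to see: which codes count as "consultancy".
CONSULTANCY_CODES = {"C100", "C110"}

# Placeholder link; a real report would point at the published dataset.
SOURCE_URL = "https://example.org/coins-extract.csv"


def total_for_codes(csv_text, codes):
    """Sum value_gbp over rows whose account_code is in `codes`.

    Returns (total, number_of_matching_rows).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    matched = [row for row in reader if row["account_code"] in codes]
    total = sum(int(row["value_gbp"]) for row in matched)
    return total, len(matched)


def describe_method(total, n_rows, codes, source_url):
    """Build a one-line method note that could sit under a news story."""
    code_list = ", ".join(sorted(codes))
    return (
        f"Total £{total:,} from {n_rows} rows "
        f"with account codes {code_list}; source: {source_url}"
    )


if __name__ == "__main__":
    total, n_rows = total_for_codes(SAMPLE_CSV, CONSULTANCY_CODES)
    print(describe_method(total, n_rows, CONSULTANCY_CODES, SOURCE_URL))
```

With a note like this published next to the story, a reader who disagrees about which codes belong in the set can change it and recompute the figure for themselves.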
Interestingly, this may also fall foul of the terms under which the data is available: ‘Crown Copyright with Data.gov.uk Rights’. This requires attribution of the data ‘in the form the data provider specifies, or otherwise “Contains [insert name of Data Provider] data © Crown copyright and database right”’ and requires that users “do not misrepresent the Data or its source”.
As government develops new conventions for transparency, it would be good to see new conventions from mediators between data and the public too. Perhaps data.gov.uk should be clearer about attribution, and suggest that attribution should involve a clear link back to the dataset. If that were combined with some of the points Paul Clarke noted (and my comment on that post picks up on) around improving the user-friendly nature of data-stores, then simple steps might move us closer to ensuring transparency builds effective public debate – weaving data into the information.
Transparency in government means more than just change for government. And it’s important for advocates of open data and an open society not to lose sight of that…
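As a rough illustration of attribution with a link back, the fallback wording quoted from the licence could be filled in and paired with the dataset address. This is only a sketch: the function name `attribution_line` is made up, the URL is a placeholder, and appending a source link is the suggestion made here rather than something the licence terms quoted above require.

```python
def attribution_line(provider, dataset_url):
    """Fill in the licence's fallback attribution wording and append a source link.

    The link is an addition suggested in this post, not part of the quoted wording.
    """
    return (
        f"Contains {provider} data © Crown copyright and database right. "
        f"Source: {dataset_url}"
    )


if __name__ == "__main__":
    # Example provider and placeholder URL, for illustration only.
    print(attribution_line("HM Treasury", "https://example.org/coins-dataset"))
```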
It’s an interesting point, Tim.
After 12 years as a journalist and five as a press officer the coverage doesn’t surprise me in the slightest. It’s what will happen. It’s what is going to happen. It’s not going to change. The journalist will always go for the ‘news line’ in a story. It’s what they do.
I’m afraid that in a world where media law is tested and sailed close to on a daily basis, where embargoes are broken routinely and where the pressure on journalists has never been greater, no editor would pay a moment’s fleeting attention to attribution. They neither understand it nor care, I’m afraid.
What would be interesting, and I’ve seen the odd piece of evidence of this happening, is for newsrooms to make space for web developers who can build apps that would interpret the data for their entire readership.
Now that’s where things would get interesting.
I think the most positive view of this is that we’re at the early point of an upward curve in sophistication of dealing with data and developing a cadre of citizen-analysts.
First step: put significant volumes of data online. Next, enable individuals and groups, as well as professionals and the media, to analyse it. Over time, the combination of so many datasets and potential stories of bureaucratic abuse, as well as the number of eyeballs able to check and challenge, or corroborate those stories, will start to ensure that data is handled more sensibly and analysed more fairly.
As I say, it’s the optimistic, but hopefully just long-term, view!
@danslee My sense is that when government is changing what it is that it does (just 12 months ago we could have said “Keeping data tightly controlled. That’s just what governments do.”) we have to recognise that a democratic system is composed of inter-related governmental, media, corporate and civil society systems, and some of those need to be adapting what it is they do too.
Now, of course the theoretical reason governments change behaviour is voter pressure, and for newspapers, it’s highly unlikely we’ll get change in behaviour without consumer pressure. But to create the sorts of cadres of citizen journalists that @lesteph mentions, responsible mediation of data by the media is going to be important.
Attribution-wise, I would certainly expect it to be ignored in the majority of cases. But a license is a legal document, and it sets a norm, even if that norm is not always applied. Certainly in the academic world it’s been found that if you make it easy for people to cite data (e.g. provide a very clear suggested citation) then they are more likely to do so. Finding the appropriate forms of suggested citation that fit into newspaper styles could be an interesting ‘nudge’ project. The legal nature of the license does mean there is an avenue open (highly unlikely to be used, but open) for government to come back at wilful misrepresentation of data.
@steph I’m absolutely in the long-term optimist camp. But the good outcomes are not inevitable – they do depend on how the cadre of citizen-analysts develops; how media plays a role in the transparency movement; how the release of data supports the sorts of open conversations and social interaction between state and citizen that we’ve been working on for years before the focus shifted to data.
The graph at http://mps-expenses.guardian.co.uk/ does make me worry that we don’t take the limitations of crowd-sourcing sufficiently into account. It’s an unquestionably powerful way to get many eyes on tough problems – but it has limitations also…
So, summary: optimistic about the future of an open society; but we need to be thinking about, and designing, the details to make sure it goes in good directions…
It’s an interesting point about the licence for open data leaving an avenue open for potential legal challenge on reporting.
However, the experience of MPs expenses and the fruitless legal steps taken to try and block this means, for my money, that it would be a very, very, very brave Government that decided to take legal action. They’ve been there. They’ve done that.
Can’t remember who said it, but they did make the point that they hoped that the vacuum for interpreting data would not be filled solely by the T*x P*ayers Alli*nce who are, it is reported, only largely representative of a small number of taxpayers with non-dom status.
Perhaps Julian Assange’s vision of a ‘Scientific Journalism’: http://techpresident.com/blog-entry/julian-assanges-vision-%E2%80%98scientific-journalism%E2%80%99 may be part of the solution?
@Lesley The idea of ‘scientific journalism’ is potentially part of the solution, although I’m not sure the model the article suggests Julian Assange is pushing is quite it. The concept of “be transparent with the public about the raw materials you’re working with, and you’re given license to spin the news how you see fit” wouldn’t entirely make sense in the journalism community, as the social pressures which ensure scientists lose out if they repeatedly twist or misreport data are not present in the journalist community.
Whilst the scientific ethic is focussed on (in theory) searching after truth, or at least truth as defined by a community of scientists, elements of the journalistic world are often based on fairly ‘tribal’ communities of interest with political rather than ‘fact finding’ goals.
It’s the social structure of science that helps ensure relatively good behaviour in the interpretation of data – rather than solely the norm of publishing data (a norm which is only implemented in some disciplines…). Sawyer addresses some of the basis of scientific data use and sharing, and its impact on fact production, in this ironically closed access paper: http://bit.ly/dvlqm9.
So: publishing the data which journalistic articles are based on = good principle; when that data is open data, then linking to the source and accounting for the analysis should be a positive community norm to encourage. But it’s unlikely that this practice would have the same ‘rationalist’ impacts that scientific data publishing practice does.
@DanSlee An interesting point about the TPA.
It’s notable that, according to the minutes of the meeting in Windsor and Maidenhead that led to the council there publishing all ‘Payments to Suppliers’ over £500, the decision was motivated by a TPA campaign (see p. 20 of the minutes at http://bit.ly/9vIZBc (PDF)).
Transparency is a fascinating issue because it doesn’t seem to be operating along ‘old politics’ lines, but there is certainly a lot of ‘old politics’, and potentially very anti-public-sector politics still around it…