Linking data and AI literacy at each stage of the data pipeline

[Summary: extended notes from an unConference session]

At the recent data-literacy-focussed Open Government Partnership unConference day (ably facilitated by my fellow Stroudie Dirk Slater), I acted as host for a break-out discussion on ‘Artificial Intelligence and Data Literacy’, building on the ‘Algorithms and AI’ chapter I contributed to The State of Open Data book.

In that chapter, I offer the recommendation that machine learning should be addressed within wider open data literacy building.  However, it was only through the unConference discussions that we found a promising approach to take that recommendation forward: encouraging a critical look at how AI might be applied at each stage of the School of Data ‘Data Pipeline’.

The Data Pipeline, which features in the Data Literacy chapter of The State of Open Data, describes seven stages for working with data: from defining the problem to be addressed, through finding and getting hold of relevant data, verifying and cleaning it, to analysing the data and presenting findings.

Figure 2: The School of Data’s data pipeline. Source: https://schoolofdata.org/methodology/

Often, AI is described as a tool for data analysis (and this was the mental framework many unConference session participants started with). Yet, in practice, AI tools might play a role at each stage of the data pipeline, and exploring these different applications of AI could support a more critical understanding of the affordances, and limitations, of AI.

The following rough worked example looks at how this could be applied in practice, using an imagined case study to illustrate the opportunities to build AI literacy along the data pipeline.

(Note: although I’ll use machine-learning and AI broadly interchangeably in this blog post, as I outline in the State of Open Data chapter, AI is a broader concept than machine-learning.)

Worked example

Imagine a human rights organisation using a media-monitoring service to identify emerging trends that they should investigate. The monitoring service flags a spike in gender-based violence, encouraging them to seek out more detailed data. Their research locates a mix of social media posts, crowdsourced data from a harassment-mapping platform, and official statistics collected in different regions across the country. They bring this data together, and seek to check its accuracy, before producing an analysis and a visually impactful report.

As we unpack this (fictional) example, we can consider how algorithms and machine-learning are, or could be, applied at each stage – and we can use that to consider the strengths and weaknesses of machine-learning approaches, building data and AI literacy.

  • Define – The patterns that first give rise to a hunch or topic to investigate may have been identified by an algorithmic model. How does this fit with, or challenge, the perceptions of staff or community members? If there is a mis-match – is this because the model is able to spot a pattern that humans were not able to see (+1 for the AI)? Or could it be because the model is relying on input data that reflects certain biases (e.g. the media may under-report some stories, while others may be over-reported because of cognitive biases amongst reporters)?

  • Find – Search engine algorithms may be applying machine-learning approaches to identify and rank results. Machine-translation tools, which could be used to search for data described in other languages, are also an example of well-established AI. Consider the accuracy of search engines and machine translation: they are remarkable tools, but we also recognise that they are nowhere near 100% reliable. We still generally rely on a human to sift through the results they give.

  • Get – One of the most common, and powerful, applications of machine-learning is in turning information into data: taking unstructured content and adding structure through classification or data extraction. For example, image classification algorithms can be trained to convert complex imagery into a dataset of terms or descriptions; entity extraction and sentiment analysis tools can pick out place names, event descriptions, and a judgement on whether the event described is good or bad from free-text tweets; and data extraction algorithms can (in some cases) offer a much faster and cheaper way to transcribe thousands of documents than having humans do the work by hand. AI can, ultimately, change what counts as structured data. However, that doesn’t mean that you can get all the data you need using AI tools. Sometimes, particularly where well-defined categorical data is needed, getting data may require the creation of new reporting tools, definitions and data standards.
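
To make this step concrete, here is a minimal, illustrative sketch of turning free-text reports into structured records. Everything here (the place gazetteer, the sentiment word lists, the function name) is invented for illustration; a real pipeline would use trained entity extraction and sentiment models rather than hand-written lists.

```python
import re

# Toy gazetteer and sentiment lexicon -- in practice these would come from
# a trained NER model and a sentiment classifier, not hand-written lists.
PLACES = {"Nairobi", "Mombasa", "Kisumu"}
NEGATIVE = {"attack", "harassment", "violence"}
POSITIVE = {"support", "safe", "rescued"}

def structure_tweet(text):
    """Turn a free-text report into a small structured record."""
    words = re.findall(r"[A-Za-z]+", text)
    places = [w for w in words if w in PLACES]
    lowered = {w.lower() for w in words}
    if lowered & NEGATIVE:
        sentiment = "negative"
    elif lowered & POSITIVE:
        sentiment = "positive"
    else:
        sentiment = "neutral"
    return {"text": text, "places": places, "sentiment": sentiment}

record = structure_tweet("Reports of harassment near the market in Nairobi")
# record["places"] == ["Nairobi"]; record["sentiment"] == "negative"
```

Even this toy version surfaces the literacy point: the categories the data ends up in are wholly determined by choices baked into the extraction step.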

  • Verify – School of Data describe the verification step like this: “We got our hands in the data, but that doesn’t mean it’s the data we need. We have to check out if details are valid, such as the meta-data, the methodology of collection, if we know who organised the dataset and it’s a credible source.” In the context of AI-extracted data, this offers an opportunity to talk about training data and test data, and to think about the impact that tuning tolerances to false-positives or false-negatives might have on the analysis that will be carried out. It also offers an opportunity to think about the impact that different biases in the data might have on any models built to analyse it.
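
One way to ground the discussion of tuning tolerances is a toy example of how moving a classification threshold trades false positives against false negatives. The scores and labels below are invented purely for illustration:

```python
# Invented classifier scores (probability an item is a genuine incident),
# paired with ground-truth labels from a hand-verified sample.
scored = [(0.95, True), (0.80, True), (0.60, False),
          (0.55, True), (0.40, False), (0.20, False)]

def confusion(threshold):
    """Count false positives and false negatives at a given threshold."""
    false_pos = sum(1 for score, label in scored if score >= threshold and not label)
    false_neg = sum(1 for score, label in scored if score < threshold and label)
    return false_pos, false_neg

# A permissive threshold lets through a false report; a strict one misses
# a real incident -- either choice shapes the analysis that follows.
print(confusion(0.5))  # (1, 0)
print(confusion(0.7))  # (0, 1)
```

The verification question is then not just "is the data accurate?" but "which kind of error did the tool-builders decide to tolerate, and does that suit our analysis?".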

  • Clean – When bringing together data from multiple sources, there may be all sorts of errors and outliers to address. Machine-learning tools may prove particularly useful for de-duplication of data, or spotting possible outliers. Data cleaning to prepare data for a machine-learning based analysis may also involve simplifying a complex dataset into a smaller number of variables and categories. Working through this process can help build an understanding of the ways in which, before a model is applied, certain important decisions have already been made.
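
As a rough sketch of the sorts of cleaning steps described above (the thresholds and figures are made up, and real de-duplication would likely use a proper record-linkage library rather than a simple string-similarity check):

```python
from difflib import SequenceMatcher
from statistics import mean, stdev

def dedupe(records, threshold=0.9):
    """Drop records whose text is near-identical to one already kept."""
    kept = []
    for record in records:
        if not any(SequenceMatcher(None, record, k).ratio() >= threshold
                   for k in kept):
            kept.append(record)
    return kept

def flag_outliers(counts, z=2.0):
    """Flag counts more than z sample standard deviations from the mean."""
    m, s = mean(counts), stdev(counts)
    return [c for c in counts if abs(c - m) > z * s]

reports = [
    "Incident reported at central station",
    "Incident reported at the central station",  # near-duplicate
    "New harassment map launched in region",
]
print(dedupe(reports))  # two distinct records remain
print(flag_outliers([12, 14, 11, 13, 12, 15, 13, 11, 14, 95]))  # [95]
```

Both functions embody judgement calls (how similar is "the same"? how unusual is an "outlier"?) made before any model ever sees the data.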

  • Analyse – Often, data analysis takes the form of simple descriptive charts, graphs and maps. But, when AI tools are added to the mix, analysis might involve building predictive models – able, for example, to suggest areas of a country that might see future hot-spots of violence – or creating interactive tools that can be used for ongoing monitoring of social media reports. However, in adding AI to the analysis toolbox, it’s important not to skip entirely over other statistical methods, and instead to think about the relative strengths and weaknesses of a machine-learning model as against some other form of statistical model. One of the key issues to consider in algorithmic analysis is the ’n’ required: that is, the sample size needed to train a model, or to get accurate results. It’s striking that many machine-learning techniques require a far larger dataset than can easily be supplied outside big corporate contexts. A second issue to consider when looking at analysis is how ‘explainable’ a model is: does the machine-learning method applied allow an exploration of the connections between input and output? Or is it only a black box?
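
On explainability, a hand-rolled linear score (with feature names and weights invented purely for illustration) shows what it means for the link between input and output to be inspectable, in a way a black-box model often is not:

```python
# A transparent model: a weighted sum whose per-feature contributions can
# be read off directly. (Feature names and weights are invented examples.)
WEIGHTS = {"prior_incidents": 0.6, "population_density": 0.3, "reporting_rate": 0.1}

def risk_score(features):
    """Return an overall score plus each feature's contribution to it."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions

score, parts = risk_score(
    {"prior_incidents": 5, "population_density": 2, "reporting_rate": 8}
)
# 'parts' shows exactly how much each input contributed -- the kind of
# input-output exploration a black-box model does not readily allow.
```

Asking "why did the model produce this score?" of a deep neural network, by contrast, typically requires additional interpretability tooling, which is exactly the trade-off worth surfacing at this stage of the pipeline.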

  • Present – Where the output of conventional data analysis might be a graph or a chart describing a trend, the output of a machine-learning model may be a prediction. Where a summary of data might be static, a model could be used to create interactive content that responds to user input in some way. Thinking carefully about the presentation of the products of machine-learning based analysis could support a deeper understanding of the ways in which such outputs could or should be used to inform action.

The bullets above give just some (quickly drafted and incomplete) examples of how the data pipeline can be used to explore AI-literacy alongside data literacy. Hopefully, however, this acts as enough of a proof-of-concept to suggest this might warrant further development work.

The benefit of teaching AI literacy through open data

I also argue in The State of Open Data that:

AI approaches often rely on centralising big datasets and seeking to personalise services through the application of black-box algorithms. Open data approaches can offer an important counter-narrative to this, focusing on both big and small data and enabling collective responses to social and developmental challenges.

Operating well in a datafied world requires citizens to have a critical appreciation of the wide variety of ways in which data is created, analysed and used – and the ability to judge which tool is appropriate to which context. By introducing AI approaches as one part of the wider data toolbox, it’s possible to build this kind of literacy in ways that are not possible in training or capacity building efforts focussed on AI alone.

The politics of misdirection? Open government ≠ technology.

[Summary: An extended write-up of a tweet-length critique]

The Open Government Partnership (OGP) Summit is, on many levels, an inspiring event. Civil society and government in dialogue together on substantive initiatives to improve governance, address civic engagement, and push forward transparency and accountability reforms. I’ve had the privilege, through various projects, to be a civil society participant in each of the 6 summits in Brasilia, London, Mexico, Paris, Tbilisi and now Ottawa. I have a lot of respect for the OGP Support Unit team, and the many government and civil society participants who work to make OGP a meaningful forum and mechanism for change. And I recognise that the substance of a summit is often found in the smaller sessions, rather than the set-piece plenaries. But the summit’s opening plenary offered a powerful example of the way in which a continued embrace of a tech-goggles approach at OGP, and weaknesses in the design of the partnership and its events, misdirect attention and leave some of the biggest open government challenges unresolved.

Trudeau’s Tech Goggles?

We need to call out the techno-elitism, and political misdirection, that mean the Prime Minister of Canada can spend the opening plenary in an interview that focussed more on regulation of Facebook than on regulation of the money flowing into politics, and more time answering questions about his Netflix watching than discussing the fact that millions of people still lack the connectivity, social capital or civic space to engage in any meaningful form of democratic decision making. Whilst (new-)media inevitably plays a role in shaping patterns of populism, a narrow focus on the regulation of online platforms directs attention away from the ways in which economic forces, transportation policy, and a relentless functionalist focus on ‘efficient’ public services (without recognising their vital role in producing social solidarity) have contributed to the social dislocation in which populism (and fascism) finds root.

Of course, the regulation of large technology firms matters, but it’s ultimately an implementation detail that should come as part of wider reforms to our democratic systems. The OGP should not be seeking to become the Internet Governance Forum (and if it does want to talk tech regulation, then it should start by learning lessons from the IGF’s successes and failures), but should instead be looking deeper at the root causes of closing civic space, and of the upswing of populist, non-participatory, and non-inclusive politics.

Beyond the ballot box?

The first edition of the OGP’s Global Report is sub-titled ‘Democracy Beyond the Ballot Box’, and opens with the claim that:

…authoritarianism is on the rise again. The current wave is different–it is more gradual and less direct than in past eras. Today, challenges to democracy come less frequently from vote theft or military coups; they come from persistent threats to activists and journalists, the media, and the rule of law.

The threats to democracy are coming from outside of the electoral process and our response must be found there too. Both the problem and the solution lie “beyond the ballot box.”

There appears to be a non-sequitur here. That votes are not being stolen through physical coercion does not mean that we should immediately move our focus beyond electoral processes. Much like the Internet adage that ‘censorship is damage, route around it’, there can be a tendency in Open Government circles to treat the messy politics of governing as a fundamentally broken part of government, and to try and create alternative systems of participation or engagement that seek to be ‘beyond politics’. Yet, if new systems of participation come to have meaningful influence, what reason do we have to think they won’t become subject to the legitimate and illegitimate pressures that lead to deadlock or ‘inefficiency’ in our existing institutions? And as I know from local experience, citizen scrutiny of procurement or public spending from outside government can only get us so far without political representatives willing to use and defend their constitutional powers of scrutiny.

I’m more and more convinced that to fight back against closing civic space and authoritarian government, we cannot work around the edges: we need to think more deeply about how we work to get capable and ethical politicians elected, held in check by functioning party systems, and engaging in fair electoral competition overseen by robust electoral institutions. We need to go back to the ballot box, rather than beyond it. Otherwise we are simply ceding ground to the forces who have progressively learnt to manipulate elections without needing to directly buy votes.

Globally leaders, locally laggards?

The opening plenary also featured UK Government Minister John Penrose MP. But, rather than making even passing mention of the UK’s OGP National Action Plan, launched just one day before, Mr Penrose talked about UK support for global beneficial ownership transparency. Now: it is absolutely great that the idea of beneficial ownership transparency is gaining pace through the OGP process.

But, there is a design flaw in a multi-stakeholder partnership where a national politician of a member country is able to take the stage without any response from civil society, and where there is no space for questions on the fact that the UK government has delayed the extension of public beneficial ownership registries to UK Overseas Territories until at least 2023. The misdirection and #OpenWashing at work here need to be addressed head on: by demanding honest reflections from a government minister on the legislative and constitutional challenges of extending beneficial ownership transparency to tax havens and secrecy jurisdictions.

As long as politicians and presenters are not challenged when framing reforms as simple (and cheap) technological fixes, we will fail to learn about and discuss the deeper legal reforms needed, and the work needed on implementation. As our State of Open Data session on Friday explored: data and standards must be the means, not the ends, and more public scepticism about techno-determinist presentations would be well warranted.

Back, however, to event design. When hosted in London, the OGP Summit offered UK civil society, at least, an action-forcing moment to push forward substantive National Action Plan commitments. But the continued disappearance of performative spaces, in which governments account for their NAPs, or in which different stakeholders from a country’s multi-stakeholder group share the stage, means that (wealthy, and northern) governments are put in control of the spin.

Grounds for hope?

It’s clear that very many of us understand that open government ≠ technology, at least if (irony noted) likes and RTs on the below give a clue. 

But we need to hone our critical instincts to apply that understanding to more of the discussions in fora like OGP. And if, as the Canadian Co-Chair argued in closing, “OGP is developing new forms of multilateralism”, civil society needs to be much more assertive in taking control of the institutional and event design of OGP Summits, to avoid this being simply a useful annual networking shindig. The closing plenary also included calls to take seriously threats to civic space: but how can we make sure we’re not just saying this from the stage in the closing, but that the institutional design ensures there are mechanisms for civil society to push forward action on this issue?

In looking to the future of OGP, we should consider how civil society might spend some time taking technology off the table. Let it emerge as an implementation detail; but perhaps let’s see where we get when we don’t let techno-discussions lead?

The lamentable State of Open Government in the UK

Yesterday the UK Government published, a year late, its most recent Open Government Partnership National Action Plan. It would be fair to say that civil society expectations for the plan were low, but when you look beyond the fine words to the detail of the targets set, the plan appears to limbo under even the lowest of expectations.

For example, although the Ministerial foreword acknowledges that “The National Action Plan is set against the backdrop of innovative technology being harnessed to erode public trust in state institutions, subverting and undermining democracy, and enabling the irresponsible use of personal information”, the furthest the plan goes in relation to these issues is a weak commitment to “maintain an open dialogue with data users and civil society to support the development of the Government’s National Data Strategy.” This commitment has supposedly been ‘ongoing’ since September 2018, yet try as I might to find any public documentation of how the government is engaging around the data strategy – I’m drawing a blank. Not to mention that there is absolutely zilch here about actually tackling the ways in which we see democracy being subverted: not only through the use of technology, but also through government’s own failures to respond to concerns about the management of elections, or to bring forward serious measures to tackle the illegal flow of money into party and referendum campaigning. For work on open government to be meaningful we have to take off the tech-goggles, and address the very real governance and compliance challenges harming democracy in the UK. This plan singularly fails at that challenge.

In short, this is a plan with nothing new; with very few measurable targets that can be used to hold government to account; and with a renewed conflation of open data and open government.

Commitment 3 on Open Policy Making, to “Deliver at least 4 Open Policy Making demonstrator projects”, has suspicious echoes of the 2013 commitment 16 to run “at least five ‘test and demonstrate projects’ across different policy areas”. If central government has truly “led by example” on “increasing citizen participation”, as the introduction to this plan claims, then it seems all we are ever going to get are ad-hoc examples. Evidence of any systemic action to promote engagement is entirely absent. The recent backsliding on public engagement in the UK is vividly underscored by the fact that commitment 8 includes responding by November 2019 to a 2016 consultation. Agile, iterative and open government this is not.

Commitment 6 on an ‘Innovation in Democracy Programme’ involves token funding to allow a few local authority areas to pilot ‘Area Democracy Forums’, based on a citizens’ assembly model – at the same time that the government refuses to support any sort of participatory citizen dialogue to deal with the pressing issues of both Brexit and Climate Change. The contract to deliver this work has already been tendered in any case, and the only targets in the plan relate to ‘pilots delivered’ and ‘evaluation’. Meaningful targets that might track how far progress has been made in actually giving citizens power over decision making are notably absent.

The most substantive targets can be found under commitments 4 and 5 on Open Contracting and Natural Resource Transparency (full disclosure: most of the Open Contracting targets come from draft content I wrote when a member of the UK Open Contracting Steering Group). If Government actually follows through on the commitment to “Report regularly on publication of contract documents, and extent of redactions.”, and this reporting leads to better compliance with the policy requirements to disclose contracts, there may even be something approaching transformative here. But, the plan suggests such a commitment to quarterly reporting should have been in place since the start of the year, and I’ve not yet tracked down any such report. 

Overall these commitments are about house-keeping: moving forward a little on the compliance with policy requirements that should have been met long ago. By contrast, the one draft commitment that could have substantively moved forward Open Contracting in the UK, by shifting emphasis to the local level where there is greatest scope to connect contracting and citizen engagement, is the one commitment conspicuously dropped from the final National Action Plan.  Similarly, whilst the plan does provide space for some marginal improvements in grants data (Commitment 1), this is simply a continuation of existing commitments.

I recognise that civil servants have had to work long and hard to get even this limited NAP through government, given the continued breakdown of normal Westminster operations. However, as I look back to the critique we wrote of the first UK OGP NAP back in 2012, it seems to me that we’re back where we started, or even worse: with a government narrative that equates open government and open data, and a National Action Plan that repackages existing work without any substantive progress or ambition. And we have to consider whether something so weak is actually worse than nothing at all.

I resigned my place on the UK Open Government Network Steering Group last summer: partly due to my own capacity, but also because of frustration at stalled progress, and the co-option of civil society into a process where, instead of speaking boldly about the major issues facing our public sphere, the focus has been put on marginal pilots or small changes to how data is published. It’s not that those things are unimportant in and of themselves: but if we let them define what open government is about – well, then we have lost what open government should have been about.

And even if we do allow the OGP to have a substantial emphasis on open data, where the UK government continues to claim leadership, the real picture is not so rosy. I’ll quote from Rufus Pollock and Danny Lämmerhirt’s analysis of the UK in their chapter for The State of Open Data:

“Open data lost most of its momentum in late 2015 as government attention turned to the Brexit referendum and later to Brexit negotiations. Many open data advisory bodies ceased to exist or merged with others. For example, the Public Sector Transparency Board became part of the Data Steering Group in November 2015, and the Open Data User Group discontinued its activities entirely in 2015. There have also been political attempts to limit the Freedom of Information Act (FOIA) based on the argument that opening up government data would be an adequate substitute. There are still issues around publishing land ownership information across all regions, and some valuable datasets have been transferred out of government ownership avoiding publication, such as the Postal Address File that was sold off during the privatisation of the Royal Mail.”

The UK dropped in the Open Data Barometer rankings in 2017 (the latest data we have), and one of the key commitments from the last National Action Plan – to “develop a common data standard for reporting election results in the UK” and improve crucial data on election results – had ‘limited’ progress according to the IRM, demonstrating a poor recent track record from the UK on opening up new datasets where it matters.

So where from here?

I generally prefer my blogging (and engagement) to be constructive. But I’m hoping that sometimes, the most constructive thing to do, is to call out the problems, even when I can’t see a way to solutions. Right now, it feels to me as though the starting point must be to recognise:

  • The UK Government is failing to live up to the Open Government Declaration.
  • UK Civil Society has failed to use the latest OGP NAP process to secure any meaningful progress on the major open government issues of the day.
  • The Global OGP process is doing very little to spur on UK action.

It’s time for us to face up to these challenges, and work out where we head from here. 

Over the horizons: reflections from a week discussing the State of Open Data

[Summary: thinking aloud with five reflections on future directions for open data related work, following discussions around the US east coast]

Over the last week I’ve had the opportunity to share findings from The State of Open Data: Histories and Horizons in a number of different settings: from academic roundtables, to conference presentations, and discussion panels.

Each has been an opportunity not only to promote the rich open access collection of essays just published, but also a chance to explore the many and varied chapters of the book as the starting point for new conversation about how to take forward an open approach to data in different settings and societies.

In this post I’m going to try and reflect on a couple of themes that have struck me during the week. (Note: These are, at this stage, just my initial and personal reflections, rather than a fully edited take on discussions arising from the book.)

Panel discussion at the GovLab with Tariq Khokhar, Adrienne Schmoeker and Beth Noveck.

Renewing open advocacy in a changed landscape

The timeliness of our look at the Histories and Horizons of open data was underlined on Monday, when a tweet from Data.gov announced this week as their 10th anniversary, and the Open Knowledge Foundation celebrated their 15th birthday with a return to their old name, a re-focussed mission to address all forms of open knowledge, and an emphasis on creating “a future that is fair, free and open.” As they put it:

  “…in 2019, our world has changed dramatically. Large unaccountable technology companies have monopolised the digital age, and an unsustainable concentration of wealth and power has led to stunted growth and lost opportunities.”

going on to say

“we recognise it is time for new rules for this new digital world.”

Not only is this a welcome and timely example of the kind of “thinking politically” we call for in the State of Open Data conclusion, but it chimes with many of the discussions this week, which have focussed as much on the ways in which private sector data should be regulated as they have on opening up government data.

While, in tools like the Open Data Charter’s Open Up Guides, we have been able to articulate a general case for opening up data in a particular sector, and then to enumerate ‘high value’ datasets that efforts should attend to, future work may need to go even deeper into analysing the political economy around individual datasets, and to show how a mix of voluntary data sharing, and hard and soft regulation, can be used to more directly address questions about how power is created, structured and distributed through control of data.

As one attendee at our panel at the Gov Lab put it, right now, open data is still often seen as a “perk not a right”.  And although ‘right to data’ advocacy has an important role, it is by linking access to data to other rights (to clean air, to health, to justice etc.) that a more sophisticated conversation can develop around improving openness of systems as well as datasets (a point I believe Adrienne Schmoeker put in summing up a vision for the future).

Policy enables, problems drive

So does a turn towards problem-focussed open data initiatives mean we can put aside work on developing open data policies or readiness assessments? In short, no.

In a lunchtime panel at the World Bank, Anat Lewin offered an insightful reflection on The State of Open Data from a multilateral’s perspective, highlighting the continued importance of developing a ‘whole of government’ approach to open data. This was echoed in Adrienne Schmoeker’s description at The Gov Lab of the steps needed to create a city-wide open data capacity in New York. In short, without readiness assessment and open data policies put in place, initiatives that use open data as a strategic tool are likely to rub up against all sorts of practical implementation challenges.

Where in the past, government open data programmes have often involved going out to find data to release, the increasing presence of data science and data analytics teams in government means the emphasis is shifting onto finding problems to solve. Provided data analytics teams recognise the idea of ‘data as a team sport’, requiring not just technical skills, but also social science, civic engagement and policy development skill sets – and providing professional values of openness are embedded in such teams – then we may be moving towards a model in which ‘vertical’ work on open data policy, works alongside ‘horizontal’ problem-driven initiatives that may make less use of the language of open data, but which still benefit from a framework of openness.

Chapter discussions at the OpenGovHub, Washington DC

Political economy really matters

It’s been really good to see the insights that can be generated by bringing different chapters of the book into conversation. For example, at the Berkman-Klein Centre, comparing and contrasting attitudes in North America vs. North Africa towards the idea that governments might require transport app providers like Uber to share their data with the state, revealed the different layers of concern, from differences in the market structure in each country, to different levels of trust in the state. Or as danah boyd put it in our discussions at Data and Society, “what do you do when the government is part of your threat model?”.  This presents interesting challenges for the development of transnational (open) data initiatives and standards – calling for a recognition that the approach that works in one country (or even one city), may not work so well in others. Research still does too little to take into account the particular political and market dynamics that surround successful open data and data analytic projects.

A comparison across sectors, emerging from our ‘world cafe’ with State of Open Data authors at the OpenGovHub, also shows the trade-offs to be made when designing transparency, open data and data sharing initiatives. For example, where the extractives transparency community has the benefit of hard law to mandate certain disclosures, such law is comparatively brittle, and does not always result in the kind of structured data needed to drive analysis. By contrast, open contracting, relying on a more voluntary and peer-pressure model, may be able to refine its technical standards more iteratively, but perhaps at the cost of weaker mechanisms to enforce comprehensive disclosure. As Noel Hidalgo put it, there is a design challenge in making a standard that is a baseline, on top of which more can be shared, rather than one that becomes a ceiling, where governments focus on minimal compliance.

It is also important to recognise that when data has power, many different actors may seek to control, influence and ultimately mess with it. As data systems become more complex, the vectors for attack increase. In discussions at Data & Society, we briefly touched on one case where a government institution has had to take considerable steps to correct for external manipulation of its network of sensors. When data is used to trigger a direct policy response (e.g. weather data triggering insurance payouts, or crime data triggering policing action), then the security and scrutiny of that data becomes even more important.

Open data as a strategic tool for data justice

I heard the question “Is open data dead?” a few times over this week. As the introductory presentation I gave for a few talks noted, we are certainly beyond peak open data hype. But, the jury is, it seems, still very much out on the role that discourses around open data should play in the decade ahead. At our Berkman-Klein Centre roundtable, Laura Bacon shared work by Omidyar/Luminate/Dalberg that offered a set of future scenarios for work on open data, including the continued existence of a distinct open data field, and an alternative future in which open data becomes subsumed within some other agenda such as ‘data rights’. However, as we got into discussions at Data & Society of data on police violence, questions of missing data, and debates about the balancing act to be struck in future between publishing administrative data and protecting privacy, the language of ‘data justice’ (rather than data rights) appeared to offer us the richest framework for thinking about the future.

Data justice is broader than open data, yet open data practices may often be a strategic tool in bringing it about. I’ve been left this week with a sense that we have not done enough to date to document and understand ways of drawing on open data production, consumption and standardisation as a form of strategic intervention. If we had a better language here, better documented patterns, and a stronger evidence base on what works, it might be easier to both choose when to prioritise open data interventions, and to identify when other kinds of interventions in a data ecosystem are more appropriate tools of social progress and justice.

Ultimately, a lot of the discussions the book has sparked have been less about open data per se, and much more about the shape of data infrastructures, and questions of data interoperability. In discussions of Open Data and Artificial Intelligence at the OpenGovHub, we explored the failure of many efforts to develop interoperability within organisations and across organisational boundaries. I believe it was Jed Miller who put the challenge succinctly: to build interoperable systems, you need to “think like an organiser” – recognising data projects also as projects of organisational change and mass collaboration. Although I think we have mostly moved past the era in which civic technologists were walking around with an open data hammer, and seeing every problem as a nail, we have some way to go before we have a full understanding of the open data tools that need to be in everyone’s toolbox, and those that may still need a specialist.

Reconfiguring measurement to focus on openness of infrastructure

One way to support advocacy for openness, whilst avoiding reifying open data, and integrating learning from the last decade on the need to embed open data practices sector-by-sector, could be found in an updated approach to measurement. David Eaves made the point in our Berkman-Klein Centre roundtable that the number of widely adopted standards, as opposed to the number of data portals or datasets, is a much better indicator of progress.

As resources for monitoring, measuring or benchmarking open data per se become scarcer, there is an opportunity to look at new measurement frames that look at the data infrastructure and ecosystem around a particular problem, and ask about the extent of openness, not only of data, but also of governance. A number of conversations this week have illustrated the value of shifting the discussion onto data infrastructure and interoperability: yet (a) the language of data infrastructure has not yet taken hold, and can be hard to pin down; and (b) there is a risk of openness being downplayed in favour of a focus on centralised data infrastructures. Updating open data measurement tools to look at infrastructures and systems rather than datasets may be one way to intervene in this unfolding space.

Thought experiment: a data extraction transparency initiative

[Summary: rapid reflections on applying extractives metaphors to data in a international development context]

In yesterday’s Data as Development Workshop at the Belfer Center for Science and International Affairs we were exploring the impact of digital transformation on developing countries and the role of public policy in harnessing it. The role of large tech firms (whether from Silicon Valley, or indeed from China, India and other countries around the world) was never far from the debate. 

Although in general I’m not a fan of descriptions of ‘data as the new oil’ (I find the equation tends to be made as part of rather breathless techno-deterministic accounts of the future), an extractives metaphor may turn out to be quite useful in asking about the kinds of regulatory regimes that could be appropriate to promote both development, and manage risks, from the rise of data-intensive activity in developing countries.

Over recent decades, principles of extractives governance have developed that recognise the mineral and hydrocarbon resources of a country as at least in part common wealth, such that control of extraction should be regulated, firms involved in extraction should take responsibility for externalities from their work, revenues should be taxed, and taxes invested into development. When we think about firms ‘extracting’ data from a country, perhaps through providing social media platforms and gathering digital trace data, or capturing and processing data from sensor networks, or even collecting genomic information from a biodiverse area to feed into research and product development, what regimes could or should exist to make sure benefits are shared, externalities managed, and the ‘common wealth’ that comes from the collected data does not entirely flow out of the country, or into the pockets of a small elite?

Although real world extractives governance has often not resolved all these questions successfully, one tool in the governance toolbox has been the Extractives Industry Transparency Initiative (EITI). Under EITI, member countries and companies are required to disclose information on all stages of the extractives process: from the granting of permissions to operate, through to the taxation or revenue sharing secured, and the social and economic spending that results. The model recognises that governance failures might come from the actions of both companies and governments – rather than assuming one or the other is the problem, or benign. Although transparency alone does not solve governance problems, it can support better debate about both policy design and implementation, and can help address distorting information and power asymmetries that otherwise work against development.

So, what could an analogous initiative look like if applied to international firms involved in ‘data extraction’?

(Note: this is a rough-and-ready thought experiment testing out an extended version of an originally tweet-length thought. It is not a fully developed argument in favour of the ideas explored here).

Data as a national resource

Before conceptualising a ‘data extraction transparency initiative’ we need to first think about what counts as ‘data extraction’. This involves considering the collected informational (and attention) resources of a population as a whole. Although data itself can be replicated (marking a key difference from finite fossil fuels and mineral resources), the generation and use of data is often rival (i.e. if I spend my time on Facebook, I’m not spending it on some other platform, and/or some other tasks and activities), involves first-mover advantages (e.g. the first firm to street-view map country X may corner the market), and can be made finite through law (e.g. someone collecting genomic material from a country may gain intellectual property rights protection for their data), or simply through restricting access (e.g. as Jeni considers here, where data is gathered from a community and used to shape policy, without the data being shared back to that community).

We could think then of data extraction as any data collection process which ‘uses up’ a common resource such as attention and time, which reduces the competitiveness of a market (thus shifting consumer to producer surplus), or which reduces the potential extent of the knowledge commons through intellectual property regimes or other restrictions on access and use.  Of course, the use of an extracted data resource may have economic and social benefits that feed back to the subjects of the extraction. The point is not that all extraction is bad, but is rather to be aware that data collection and use as an embedded process is definitely not the non-rival, infinitely replicable and zero-cost activity that some economic theories would have us believe.

(Note that underlying this lens is the idea that we should approach data extraction at the level of populations and environments, rather than trying to conceptualise individual ownership of data, and to define extraction in terms of a set of distinct transactions between firms and individuals.)

Past precedent: states and companies

Our model then for data extraction involves a relationship between firms and communities, which we will assume for the moment can be adequately represented by their states. A ‘data extractive transparency initiative’ would then be asking for disclosure from these firms at a country-by-country level, and disclosure from the states themselves. Is this reasonable to expect? 

We can find some precedents for disclosure by looking at the most recent Ranking Digital Rights Report, released last week. This describes how many firms are now providing data about government requests for content or account restriction. A number of companies produce detailed transparency reports that describe content removal requests from government, or show political advertising spend. This at least establishes the idea that voluntarily, or through regulation, it is feasible to expect firms to disclose certain aspects of their operations.

The idea that states should disclose information about their relationship with firms is also reasonably well established (if not wholly widespread). Open Contracting, and the kind of project-level disclosure of payments to government that can be seen at ResourceProjects.org, illustrate ways in which transparency can be brought to the government-private sector nexus.

In short, encouraging or mandating the kinds of disclosures we might consider below is not new. Targeted transparency has long been in the regulatory toolbox.

Components of transparency

So – to continue the thought experiment: if we take some of the categories of EITI disclosure, what could this look like in a data context?

Legal framework

Countries would publish in a clear, accessible (and machine-readable?) form, details of the legal frameworks relating to privacy and data protection, intellectual property rights, and taxation of digital industries.

This should help firms to understand their legal obligations in each country, and may also make it easier for smaller firms to provide responsible services across borders without current high costs of finding the basic information needed to make sure they are complying with laws country-by-country.

Firms could also be mandated to make their policies and procedures for data handling clear, accessible (and machine-readable?).

Contracts, licenses and ownership

Whenever governments sign contracts that allow the private sector to collect or control data about citizens, public spaces, or the environment, these contracts should be public.

(In the Data as Development workshop, Sriganesh related the case of a city that had signed a 20 year deal for broadband provision, signing over all sorts of data to the private firm involved.)

Similarly, licenses to operate, and permissions granted to firms should be clearly and publicly documented.

Recently, EITI has also focussed on beneficial ownership information: seeking to make clear who is really behind companies. For digital industries, mandating clear disclosure of corporate structure, and potentially also of the data-sharing relationships between firms (as GDPR starts to establish) could allow greater scrutiny of who is ultimately benefiting from data extraction.

Production

In the oil, gas and mining context, firms are asked to reveal production volumes (i.e. the amount extracted). The rise of country-by-country reporting, and project-level disclosure has sought to push for information on activity to be revealed not at the aggregated firm level, but in a more granular way.

For data firms, this requirement might translate into disclosure of the quantity of data (in terms of number of users, number of sensors etc.) collected from a country, or disclosure of country by country earnings.

Revenue collection

One important aspect of EITI has been an audit and reconciliation process that checks that the amounts firms claim to be paying in taxes or royalties to government match up with the amounts government claims to have received. This requires disclosure from both private firms and government.

A better understanding of whose digital activities are being taxed, and how, may support design of better policy that allows a share of revenues from data extraction to flow to the populations whose data-related resources are being exploited.

In yesterday’s workshop, Sriganesh pointed to the way in which some developing country governments now treat telecoms firms as an easy tax collection mechanism: if everyone wants a mobile phone connection, and mobile providers are already collecting payments, levying a charge on each connection, or a monthly tax, can be easy to administer. But, in the wrong places, and at the wrong levels, such taxes may capture consumer rather than producer surplus, and suppress rather than support the digital economy.

Perhaps one of the big challenges for ‘data as development’ when companies in more developed economies may extract data from developing countries, but process it back ‘at home’, is that current economic models may suggest that the biggest ‘added value’ is generated from the application of algorithms and processing. This (combined with creative accounting by big firms) can lead to little tax revenue in the countries from which data was originally extracted. Combining ‘production’ and ‘revenue’ data can at least bring this problem into view more clearly – and a strong country-by-country reporting regime may even allow governments to more accurately apply taxes.

Revenue allocation, social and economic spending

Important to the EITI model is the idea that when governments do tax, or collect royalties, they do so on behalf of the whole polity, and they should be accountable for how they are then using the resulting resources.

By analogy, a ‘data extraction transparency initiative’ may include requirements for greater transparency about how telecoms and data taxes are being used. This could further support multi-stakeholder dialogue on the kinds of public sector investments needed to support national development through use of data resources.

Environmental and social reporting

EITI encourages countries to ‘go beyond the standard’ and disclose other information too, including environmental information and information on gender.

Similar disclosures could also form part of a ‘data extraction transparency initiative’: encouraging or requiring firms to provide information on gender pay gaps and their environmental impact.

Is implementation possible?

So far this thought experiment has established ways of thinking about ‘data extraction’ by analogy to natural resource extraction, and has identified some potential disclosures that could be made by both governments and private actors. It has done so in the context of thinking about sustainable development, and how to protect developing countries from data-exploitation, whilst also supporting them to appropriately and responsibly harness data as a developmental tool. There are some rough edges in all this: but also, I would argue, some quite feasible proposals too (disclosure of data-related contracts, for example).

Large scale implementation would, of course, need careful design. The market structure, capital requirements and scale of digital and data firms is quite different to that of the natural resource industry. Compliance costs of any disclosure regime would need to be low enough to ensure that it is not only the biggest firms that can engage. Developing country governments also often have limited capacity when it comes to information management. Yet, most of the disclosures envisaged above relate to transactions that, if ‘born digital’, should be fairly easy to publish data on. And where additional machine-readable data (e.g. on laws and policies) is requested, if standards are designed well, there could be a win-win for firms and governments – for example, by allowing firms to more easily identify and select cloud providers that allow them to comply with the regulatory requirements of a particular country.

The political dimensions of implementation are, of course, another story – and one I’ll leave out of this thought experiment for now.

But why? What could the impact be?

Now we come to the real question. Even if we could create a ‘data extraction transparency initiative’, could it have any meaningful developmental impacts?

Here’s where some of the impacts could lie:

  • If firms had to report more clearly on the amount of ‘data’ they are taking out of a country, and the revenue that gives rise to, governments could tailor licensing and taxation regimes to promote more developmental uses of data. Firms would also be encouraged to think about how they are investing in value-generation in countries where they operate.
  • If contracts that involve data extraction are made public, terms that promote development can be encouraged, and those that diminish the opportunity for national development can be challenged.
  • If a country government chooses to engage in forms of ‘digital protectionism’, or to impose ‘local content requirements’ on the development of data technologies that could bring long-term benefits, but risk creating a short-term hit on the quality of digital services available in a country, greater transparency could support better policy debate. (Noting, however, that recent years have shown us that politics often trumps rational policy making in the real world).

There will inevitably be readers who see the thrust of this thought experiment as fundamentally anti-market, and who are fearful of, or ideologically opposed to, any of the kinds of government intervention that increasing transparency around data extraction might bring. It can be hard to imagine a digital future not dominated by the ever-increasing rise of a small number of digital monopolies. But, from a sustainable development point of view, allowing another path to be sought: one which supports the creation of resilient domestic technology industries, which prices in positive and negative externalities from data extraction, and which therefore allows active choices to be made about how national data resources are used as a common asset, may be no bad thing.

The State of Open Data: Histories and Horizons – panels and conversations

The online and open access versions of ‘The State of Open Data: Histories and Horizons’ went live yesterday. Do check it out!

We’ve got an official book launch on 27th May in Ottawa, but ahead of that, I’m spending the next 8 days on the US East Coast contributing to a few events to share learning from the project.

Over the last 18 months we’ve worked with 66 fantastic authors, and many other contributors, reviewers and editorial board members, to pull together a review of the last decade of activity on open data. The resulting collection provides short essays that look at open data in different sectors, from accountability and anti-corruption, to the environment, land ownership and international aid, as well as touching on cross-cutting issues, different stakeholder perspectives, and regional experiences. We’ve tried to distill key insights in overall and section introductions, and to draw out some emerging messages in an overall conclusion.

This has been my first experience pulling together a whole book, and I’m incredibly grateful to my co-editors, Steve Walker, Mor Rubinstein, and Fernando Perini, who have worked tirelessly over the project to bring together all these contributions, make sure the project is community driven, and to present a professional final book to the world, particularly in what has been a tricky year personally. The team at our co-publishers, African Minds and IDRC (Simon, Leith, Francois and Nola) are also owed a great debt of thanks for their attention to detail and design.

I’ll try to write up some reflections and learning points on the book process in the near future, and will be blogging more about specific elements of the research in the coming weeks, but for now, let me share the schedule of upcoming events in case any blog readers happen to be able to join. I’ll aim to update this list later with links to any outputs from the sessions.

Book events

Thursday 16th May – 09:00 – 11:00 – Future directions for open data research and action

Roundtable at the Harvard Berkman Klein Center, with chapter authors David Eaves, Mariel Garcia Montes, Nagla Rizk, and response from Luminate’s Laura Bacon.

Thursday 16th May – Developing the Caribbean

I’ll be connecting via hangouts to explore the connections between data literacy, artificial intelligence, and private sector engagement with open data.

Monday 20th May – 12:00 – 13:00 – Let’s Talk Data – Does open data have an identity crisis?, World Bank I Building, Washington DC

A panel discussion as part of the World Bank Let’s Talk Data series, exploring the development of open data over the last decade. This session will also be webcast – see details on EventBrite.

Monday 20th May – 17:30 – 19:30 – World Cafe & Happy Hour @ OpenGovHub, Washington DC

We’ll be bringing together authors from lots of different chapters, including Shaida Baidee (National Statistics), Catherine Weaver (Development Assistance & Humanitarian Action), Jorge Florez (Anti-corruption), Alexander Howard (Journalists and the Media), Joel Gurin (Private Sector), Christopher Wilson (Civil Society) and Anders Pedersen (Extractives) to talk about their key findings in an informal world cafe style.

Tuesday 21st May – The State of Open Data: Open Data, Data Collaboratives and the Future of Data Stewardship, GovLab, New York

I’m joining Tariq Khokhar, Managing Director & Chief Data Scientist, Innovation, The Rockefeller Foundation, Adrienne Schmoeker, Deputy Chief Analytics Officer, City of New York and Beth Simone Noveck, Professor and Director, The GovLab, NYU Tandon (and also foreword writer for the book), to discuss changing approaches to data sharing, and how open data remains relevant.

Wednesday 22nd May – 18:00 – 20:00 – Small Group Session at Data & Society, New York

Join us for discussions of themes from the book, and how open data communities could or should interact with work on AI, big data, and data justice.

Monday 27th May – 17:00 – 19:30 – Book Launch in Ottawa

Join me and the other co-editors to celebrate the formal launch of the book!

Exploring Arts Engagement with (Open) Data

[Summary: Over the next few months I’m working with Create Gloucestershire with a brief to catalyse a range of organisational data projects. Amongst these will be a hackathon of sorts, exploring how artists and analysts might collaborate to look at the cultural education sector locally. The body of this post shares some exploratory groundwork. This is a variation cross-posted from the Create Gloucestershire website.]

Pre-amble…

Create Gloucestershire have been exploring data for a while now, looking to understand what the ever-increasing volume of online forms, data systems and spreadsheets arts organisations encounter every day might mean for the local cultural sector. For my part, I’ve long worked with data-rich projects, focussing on topics from workers co-operatives and youth participation, to international aid and corruption in government contracting, but the cultural sector is a space I’ve not widely explored.

Often, the process of exploring data can feel like a journey into the technical: where data stands in opposition to all things creative. So, as I join CG for the next three months as a ‘digital catalyst’, working on the use of data within the organisation, I wanted to start by stepping back, and exploring the different places at which data, art and creativity meet with an exploratory blog post.

…and a local note on getting involved…

In a few weeks (late February 2019) we’ll be exploring these issues through a short early-evening workshop in Stroud: with a view to hosting a day-long data-&-art hackathon in late Spring. If you would like to find out more, drop me a line.

Post: Art meets data | Data meets art

For some, data and art are diametrically opposed. Data is about facts. Art about feelings.

Take a look at writings from the data visualisation community [1], and you will see some suggest that data art is just bad visualisation. Data visualisation, the argument runs, uses graphical presentation to communicate information concisely and clearly. Data art, by contrast, places beauty before functionality. Aesthetics before information.

Found on Flickr: “I’m not even sure what this chart says … but I think its gorgeous!” (Image CC-BY Carla Gates / Original image source: ZSL)

I prefer to see data, visualisation and art all as components of communication. Communication as the process of sharing information, knowledge and wisdom.

The DIKW pyramid proposes a relationship between Data, Information, Knowledge and Wisdom, in which information involves the representation of data into ‘knowing that’, whilst knowledge requires experience to ‘know how’, and wisdom requires perspective and trained judgement in order to ‘know why’. (Image CC BY-SA. Wikimedia Commons)

Turning data into information requires a process of organisation and contextualisation. For example, a collection of isolated facts may be made more informative when arranged into a table. That table may be made more easily intelligible when summarised through counts and averages. And it may communicate more clearly when visualisation is included.

An Information -> Data -> Information journey. GCSE Results in Arts Subjects. (Screenshots & own analysis)
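To make that organise-and-summarise journey concrete, here is a minimal Python sketch. The subject names echo GCSE arts subjects, but every number below is invented for illustration: the code simply groups isolated per-school facts and reduces them to the counts and averages that a chart could then visualise.

```python
from collections import defaultdict
from statistics import mean

# Isolated 'facts': one record per school and subject
# (all numbers invented for illustration).
records = [
    {"subject": "Art & Design", "entries": 120, "score": 42.0},
    {"subject": "Art & Design", "entries": 95,  "score": 40.0},
    {"subject": "Music",        "entries": 30,  "score": 44.0},
    {"subject": "Music",        "entries": 25,  "score": 41.0},
    {"subject": "Drama",        "entries": 60,  "score": 40.2},
]

# Organise: arrange the scattered facts into groups, one per subject.
by_subject = defaultdict(list)
for r in records:
    by_subject[r["subject"]].append(r)

# Summarise: counts and averages turn the grouped data into information.
summary = {
    subject: {
        "total_entries": sum(r["entries"] for r in rows),
        "mean_score": round(mean(r["score"] for r in rows), 1),
    }
    for subject, rows in by_subject.items()
}

for subject, stats in summary.items():
    print(subject, stats)
```

The final step of the pipeline, visualisation, would then plot `summary` rather than the raw records: the same data, re-contextualised twice on its way to the reader.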

But when seeking to communicate a message from the data, there is another contextualisation that matters: contextualising to the recipient: to what they already know, or what you may want them to come to know. Here, the right tools may not only be those of analysis and visualisation, but also those of art: communicating a message shaped by the data, though not entirely composed of it.

Artistic expression could focus on a finding, or a claim from the data, or may seek to support a particular audience to explore, interrogate and draw interpretations from a dataset. (Image CC BY-SA Toby Oxborrow)

In our upcoming workshop, we’ll be taking a number of datasets about the state of cultural education in Gloucestershire, and asking what they tell us. We’ll be thinking about the different ways to make sense of the data, and the ways to communicate messages from it. My hope is that we will find different ways to express the same data, looking at the same topic from a range of different angles, and bringing in other data sources of our own. In that way, we’ll be able to learn together both about practical skills for working with data, and to explore the subjects the data represents.

In preparing for this workshop I’ve been looking at ways different practitioners have connected data and art, through a range of media, over recent years.

The Open Data Institute: Data as Culture

Since its inception, The Open Data Institute in London has run a programme called ‘Data as Culture’, commissioning artists to respond to the increasing datification of society.

Some works take a relatively direct approach to representation, selecting particular streams of data from the web and using different media to represent them. Text Trends, for example, selects and counterposes different Google search trends on a simple graph over time. And the ODI’s infamous vending machine provides free crisps in response to news media mentions of recession.

Text Trends. From ODI Website and Data Soliloquies book.

In representative works, the artist has chosen the signal to focus on, and the context in which it is presented. However, the underlying data remains more or less legible, and depending on the contextual media and the literacies of the ‘reader’, certain factual information can also be extracted from the artwork. Whilst it might be more time-consuming to read, the effort demanded by both the act of creation, and the act of reading, may invite a deeper engagement with the phenomena described by the data. London EC2 explores this idea of changing the message through changing the media: by woodblock printing Twitter messages, thus slowing down the pace of social media, encouraging the viewer to rethink otherwise ephemeral information.

In other works that are directly driven by datasets, data is used more to convey an impression than to convey specific information. In the knitted Punchcard Economy banners, a representation of working hours is combined with a pre-defined message, resulting in data that can be read as texture, more than it can be read as pattern. In choosing how far to ‘arrange’ the data, the work finds its place on a spectrum between visualisation and aesthetic organisation.

Punchcard Economy, Sam Meech, 2013. ODI: 3.5 x 0.5m knitted banner, FutureEverything: 5 x 3m knitted banner & knitting machines.

Other works in the Data as Culture collection start not from datasets, but from artists’ responses to wider trends of datification. Works such as metography, flipped clock and horizon respond to forms of data and its presentation in the modern world, raising questions about data and representation – but not necessarily about the specific data which happens to form part of the work.

Flipped Clock, Thomson & Craighead, 2008. ODI Data as Culture.

Other works still look for the data within art, such as pixelquipu, which takes its structure from pre-Columbian quipu (necklace-shaped, knotted threads from the Inca empire, thought to contain information relating to calendars and accounting in the empire). In these cases, turning information into data, and then representing it back in another way, is used to explore patterns that might not have otherwise been visible.

YoHa: Invisible Airs

Although it has also featured in the ODI’s Data as Culture collection, I want to draw out and look specifically at YoHa’s ‘Invisible Airs’ project. Not least because it was the first real work of ‘open data art’ I encountered, stumbling across it at an event in Bristol.

As newly released public spending records appear on screen, a pneumatically powered knife stabs a library book, sending a message about budget cuts, and inviting scrutiny of the data on screen.

It is a hard project to describe, but fortunately YoHa have a detailed project description and video on their website, showing the contraptions (participatory kinetic sculptures?) they created in 2014, driven by pneumatic tubes and actuated by information from Bristol City Council’s database of public spending.


In the video, Graham Harwood describes how their different creations (from a bike seat that rises up in response to spending transactions, to a pneumatic knife stabbing a book to highlight library service cuts) seek to ‘de-normalise’ data, not in the database designer’s sense of finding a suitable level of data abstraction, but in the sense of engaging the participant to understand otherwise dry data in new ways. The learning from the project is also instructive: in terms of exploring how far the works kept the attention of those engaging with them, or how far they were able to communicate only a conceptual point, before viewers’ attention fell away, and messages from the underlying data were lost.

Ultimately though, Invisible Airs (and other YoHa works engaging with the theme of data) are not so much communicating data, as communicating about the role, and power, of data in our society. Their work seeks to bring databases, rather than the individual data items they contain, into view. As project commissioner Prof Jon Dovey puts it, “If you are interested in the way that power works, if you are interested in the way that local government works, if you are interested in the way that corporations work, if you are interested in the way that the state works, then data is at the heart of it…. The way your council tax gets calculated… the way your education budget gets calculated, all these things function through databases.”

Everyday data arts

Data as art need not involve costly commissions. For example, the media recently picked up the story of a German commuter who had knitted a ‘train-delay scarf’, with the choice of wool and colour representing the length of delays. The act of creating was both a means to record and to communicate, and in the process it communicated much more effectively than the same data might have done if simply recorded in a spreadsheet, or even placed onto a chart as a data visualisation.

‘Train Delay Scarf’ – a Twitter sensation in January 2019.

Data sculpture and data-driven music

In a 2011 TED Talk, Nathalie Miebach explored both how weather data can be turned into a work of art through sculpture and music, and how the setting in which the resulting work is shown affects how it is perceived.

She describes the creation of a vocabulary for turning the data into a creative work, but also the choice of a medium that is not entirely controlled by the data, such that the resulting work is shaped not only by the data, but also by its interaction with other environmental factors.

Dance your PhD, and dancing data

When reflecting on data and art, I was reminded of the annual Dance your PhD competition. Although the focus is more on expressing algorithms and research findings than underlying datasets, it offers a useful prompt to reflect on ways of explaining data, not only expressing what it contains.

In a similar vein, AlgoRythmics explain sorting algorithms using folk dance – a playful way of explaining what’s going on inside the machine when processing data.

There is an interesting distinction though between these two. Whilst Dance your PhD entries generally ‘annotate’ the dance with text to explain the phenomena that the dance engages the audience with, in AlgoRythmics the dance itself is the entirety of the explanation.

Visualisation

The fields of InfoViz and DataViz have exploded over the last decade. Blogs such as InformationIsBeautiful, Flowing Data and Visualising Data provide a regular dose of new maps, charts and novel presentations of data. However, InfoViz and DataViz are not simply synonyms: they represent work that starts from different points of a Data/Information/Knowledge model, and with often different goals in mind.

Take, for example, David McCandless’ work in the ‘Information is Beautiful’ book (also presented in this TED Talk). The images, although often based on data, are not a direct visualisation of the data, but an editorialised story. The data has already been analysed to identify a message before it is presented through charts, maps and diagrams.


By contrast, in Edward Tufte’s work on data visualisation, or even statistical graphics, the role of visualisation is to present data in order to support the analytical process and the discovery of information. Tufte talks of ‘the thinking eye’, highlighting the way in which patterns that may be invisible when data is presented numerically, can become visible and intelligible when the right visual representation is chosen. However, for Tufte, the idea of the correct approach to visualisation is important: presenting data effectively is both an art and a technical skill, informed by insights and research from art and design, but fundamentally something that can be done right, or done wrong.

Graphical Practices: page 14 of Edward Tufte’s ‘The Visual Display of Quantitative Information’.

Other data visualisation falls somewhere between the extremes I’ve painted here. Exploratory data visualisations can seek both to support analysis and to tell a particular story through their choice of visualisation approach. A look at the winners of the recent 360 Giving Data Visualisation Challenge illustrates this well. Each of these visualisations draws on the same open dataset about grant making, but where ‘A drop in the bucket’ uses a playful animation to highlight the size of grants from different funders, Funding Themes extracts topics from the data and presents an interactive visualisation, inviting users to ‘drill down’ into the data and explore it in more depth. Others, like Trend Engine, use more of a dashboard approach to present data, allowing the user to skim through and find, if not complete answers, at least refined questions that they may want to ask of the raw dataset.

Funding Trends for a ‘cluster’ of arts-related grants, drawing on 360 Giving data. Creator: Xavi Gimenez

Arts meet data | Data meet arts | Brokering introductions

Writing this post has given me a starting point to explore some data-art-dichotomies and to survey and link to a range of shared examples that might be useful for conversations in the coming weeks.

It’s also sparked some ideas for workshop methods we might be able to use to keep analytical, interpretative and communicative modes in mind when planning for a hackathon later this year. But that will have to wait for a future post…

 

Footnotes

[1]: I am overstating the argument in the blog post on art and data visualisation slightly for effect. The post and its comments in fact offer a nuanced dialogue, worth exploring, on the relationship between data visualisation and art, although one that still seeks to draw a clear distinction between the two.

Notes on a Tribunal (well, almost). Or, “how to increase your contract costs by 30% by negotiating in secret.”

[Summary: I promise at some point this blog will carry content other than about incinerators and contracts. But, for the moment one more exciting instalment in the ongoing saga, in which we learn the contracting documents GCC have been fighting to hide show a 30% increase in Javelin Park costs.]

What’s just happened

Gloucestershire County Council (GCC) decided earlier this week to drop their appeal against an ICO ruling that they should release in full a 2015 ‘Value for Money’ analysis carried out just before they signed a revised contract with Urbaser Balfour Beatty (UBB) for building the Javelin Park Incinerator (which we’ve been referring to locally as the Ernst and Young report).

Throughout the process GCC have claimed that ‘commercial’ risk to both the Council and UBB prevents them from disclosing the documents. By dropping the appeal just a month before it was due to go to a Tribunal hearing, they avoid having to prove any of these claims in front of a panel and judge.

In addition, GCC appear to have delayed providing this information in order to commission Ernst and Young to produce another report, this time calculating an assumption-laden average gate fee, in order to continue to make the case for the project. This is at odds with the Environmental Information Regulations’ requirement for prompt disclosure – presumably GCC knew the commercial interests were no longer active before commissioning this new report, but chose instead to delay disclosure and spend taxpayers’ money on an ‘explanatory note’.

Where are we now

There’s a lot of history to this story, so to recap quickly:

Gloucestershire County Council have been seeking to build an Energy from Waste Incinerator for over a decade. In 2013 they signed a Public Private Partnership contract with UBB for the project. The contract was signed before planning permission was in place for the construction site. Planning was refused, leading to a two-year delay. This triggered a renegotiation of the contract in 2015, signed in January 2016. The plant is now under construction and close to being operational in 2019. Throughout the process GCC have claimed the project provides savings of up to £150m (later quietly reduced to £100m without explanation) over its 25-year life span.

Campaigners have long sought to see the contract, and in early 2016 the Information Rights Tribunal ruled that the majority of details, including gate fees (i.e. the price paid to burn waste), should be disclosed. I then requested a 2015 analysis relating to the re-negotiation, and was only given a highly redacted copy, not showing gate fees. I requested a review, and eventually appealed to the Information Commissioner’s Office (ICO) against the authority’s refusal to release the information. The ICO ruled that the documents should be disclosed un-redacted. GCC appealed this decision in the summer, and since then have been preparing for a tribunal case claiming that disclosure would be against the contractor’s and the authority’s commercial interests. They have now released the documents, although notably only claiming the contractor no longer has a commercial interest in them being confidential, leaving lingering questions about whether the authority had any legitimate commercial interest in non-disclosure all along.

What do we learn from the new documents

First below is the redacted document from GCC (click for full size). Then there is the equivalent table from the un-redacted and new report by Ernst and Young (which usefully does include the final rather than forecast figures for the renegotiated deal: i.e. the actual new contract numbers assuming there has been no further renegotiation since).

(Note that in Table 2, the first ‘Variance’ column is between the originally signed contract and the forecast revisions in 2015, and the second ‘Variance’ column (in yellow) is between the forecast revisions and the finally signed updated deal. So to get the total variation from 2013 to 2016, you need to add these two columns together.)

So: what can we learn from the new data and documents:

  • (1) Firstly, the headline price per tonne has increased by £42.97/tonne – a staggering 29.3% rise for a three-year project delay (cumulative CPI inflation over the same period was 5.13%). The total per-tonne cost for the first 108,000 tonnes is now £189.33/tonne: far above anything any other authority in the country appears to be paying.
  • (2) This drives an increase in the nominal tonnage payments over the contract life from £446m to £601.5m. Whilst some of that might be offset by energy income/benefits, given these were also part of the case in 2013, this looks like a massive increase in costs – again just for a three-year delay. (Given predicted waste volume rises in all the forecasts the contract is based on, a three-year delay also involves starting the project when waste volumes are higher than in 2013 – so some change would be anticipated in this figure even if there were no gate fee increases. But the gate fee increases look like the major component turning Javelin Park from a £450m to a £600m project.)
  • (3) Other tonnage payments have also increased – although the most notable change between the forecast, and signed renegotiation is that ‘Third Party Gate Fees’ have been kept down – suggesting that the financial modelling for the plant relies on attracting as much additional waste as possible at a low cost, with all the fixed costs of the project subsidised by the taxpayer.
  • (4) The Cabinet were told in November 2015 (E&Y VfM report; §3.1) that the capital costs of the project in UBB’s original Revised Project Plan “included a significantly inflated price of £177m”, but that “The council has had some success in negotiating this EPC price down and it now stands at £167m” and “the Council expects to see further improvements in this price”. Yet by the signature of the new deal, the capital expenditure costs were also up 30%, to £178.9m – £2m higher than UBB’s opening gambit!
  • (5) The Value for Money calculations (E&Y VfM report, and restated in Table 3, new E&Y report) only carry out comparisons to ‘Termination (Landfill alternative)’ of which at least £60m is the cancellation cost signed up to in 2013. For this reason, the newly released Annex 1 of the report to Cabinet in 2015 (§6) explicitly acknowledges that the most that should be claimed in savings when this is taken out is £93m – and this is only when Council reserves are put into the project. It’s not clear why Cabinet Members continued to use a figure higher than this in public after this report.
  • (6) The first important thing to note from that last point is that at no point in 2015 did the Cabinet carry out a Value for Money assessment comparing the costs of continuing with the 30% more expensive contract vs. cancelling and re-tendering. Instead, they pressed ahead with a closed-door renegotiation without competitive pressures – which goes a long way to explaining why the contractor could secure such a big boost in costs.
  • (7) The second thing to note is that the claim of anything close to £100m savings is only secured by the cash injection into the project from reserves. Those are reserves that are then locked up and not available for other use. Without cash injections from reserves, the savings are much lower.
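
The headline percentages above are easy to sanity-check. Here is a quick sketch in Python, using only the figures quoted in the points above (the small gap between the computed value and the 29.3% in point (1) presumably comes from rounding in the published per-tonne figures):

```python
# Sanity check of the headline figures, using only numbers quoted above.
increase_per_tonne = 42.97              # £/tonne rise in the headline gate fee
new_fee = 189.33                        # £/tonne for the first 108,000 tonnes
old_fee = new_fee - increase_per_tonne  # implied 2013 fee: ~£146.36/tonne

pct_rise = increase_per_tonne / old_fee * 100
print(f"Gate fee rise: {pct_rise:.1f}%")               # ~29.4% from rounded inputs

# Nominal tonnage payments over the contract life: £446m -> £601.5m
lifetime_rise = (601.5 - 446) / 446 * 100
print(f"Lifetime payments rise: {lifetime_rise:.1f}%")  # ~34.9%
```

Either way you cut it, the per-tonne increase is several times the ~5.13% cumulative CPI inflation over the same period.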

The mystery of the ‘Real Average Gate Fee’

In the new Ernst and Young report that accompanies the response to my EIR/FOI request, a lot is made of a figure called the ‘Real Average Gate Fee’ (RAGF), which is calculated at £112.47/tonne.

Now – a few things about this number:

(1) I googled “Real Average Gate Fee” to see if this was based on an established industry wide methodology. As it turns out – the only place this phrase occurs on the whole of the Internet is in Ernst and Young’s report.

It turns out no-one apart from GCC knows what a Real Average Gate Fee is either

(2) As I understand, this number is based on making best case assumptions about the income to the authority from electricity sales, and third-party income – and assuming the maximum contract tonnages set out in authority forecasts. If those assumptions are not met (e.g. we hit 60% recycling by 2020 and 70% by 2029/30; or waste volumes do not continue to rise as fast as forecast), then, because of the structure of the contract (all front-loaded costs on the first 108,000 tonnes; savings on waste volumes above this), the RAGF would very quickly rise. In other words, the RAGF is only valid if you accept high waste assumptions. In any other scenario it gets much higher.
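
Point (2) – that a blended average gate fee built on high waste assumptions rises quickly as tonnage falls – can be illustrated with a toy model. The first-band figures below are the actual numbers quoted earlier in this post; the marginal fee for tonnes above the band is a hypothetical placeholder (the real marginal rate isn’t reproduced here), so treat this as a shape-of-the-curve sketch, not a recalculation of E&Y’s figure:

```python
# Hedged sketch: how a blended ('real average') gate fee moves with tonnage.
# First-band figures come from the released contract data; the marginal rate
# above the band is HYPOTHETICAL, purely to show the front-loading effect.
FIRST_BAND_TONNES = 108_000
FIRST_BAND_FEE = 189.33   # £/tonne on the first band (from the documents)
MARGINAL_FEE = 40.0       # £/tonne above the band (illustrative guess only)

def blended_fee(total_tonnes: float) -> float:
    """Average £/tonne when the fixed costs load onto the first band."""
    band = min(total_tonnes, FIRST_BAND_TONNES)
    extra = max(total_tonnes - FIRST_BAND_TONNES, 0)
    return (band * FIRST_BAND_FEE + extra * MARGINAL_FEE) / total_tonnes

# As assumed tonnage falls, the blended average climbs towards £189.33/tonne.
for tonnes in (190_000, 150_000, 120_000, 108_000):
    print(f"{tonnes:>7,} t -> £{blended_fee(tonnes):6.2f}/tonne")
```

Whatever the true marginal rate, the structure is the same: the ‘average’ fee is only low if you believe the high-tonnage forecasts.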

(3) The report compares this to the range of real gate fees that WRAP found in their 2016 survey. Note that WRAP have a pretty robust methodology in their survey, and they state:

“Not all waste management services are costed or charged on a simple gate fee basis (£/tonne). In some cases a tonnage-related payment is just one element of a wider unitary charge paid by an authority” and that “every effort is made to eliminate such responses from the sample”

so the comparison of a constructed ‘Real Average Gate Fee’ from a unitary charge/PPP structure to a real gate fee is questionable at best.

(4) However, even more questionable is picking the 2016 data to compare the RAGF to. We now have 2017 data available from WRAP, and the E&Y report notes that this figure is based on a ‘Net Present Value base date of June 2015’, so both 2015 and 2017 would seem more reasonable years to compare to.

Pro-tip: When looking to manipulate figures, always pick your comparison year to make your numbers look the best you can…

A quick look at WRAP’s EfW Overview dashboard for post-2000 plants gives us a clue as to why this year was chosen. 2016 is an outlier when it comes to maximum values in WRAP’s survey. In 2015, the highest anyone responding to the survey was paying was £131/tonne, and in 2017 it was £116/tonne.

Even if we allow the not-really-comparable RAGF of £112/tonne – that puts it right at the top of the range. Even a small reduction in income from energy, or a reduction in waste, would push this into being the most expensive deal in the country.

If you compare the actual gate fee of £190/tonne for the first 108,000 tonnes, it is clear this is massively above what anyone else surveyed is paying.

What we still need to explore

The Ernst and Young VfM report notes that the overall lifetime project costs could have been substantially reduced with ‘Prudential Borrowing’ (i.e. relying on the low interest loans the authority can achieve, rather than private banks). Annex 1 to the 2015 Cabinet Report reveals this option was ignored because it would have required discussion as part of the Council’s budget process in February 2016, and it states “the banks have advised that they need to achieve financial close by the end of the year [2015]”.

However, financial close was not achieved until January 2016 (it seems the banks didn’t mind so much after all?). It’s not yet clear to me what information Councillors outside cabinet had at the time on this decision – and what it tells us about the pursuit of a PFI option, when it appears other, much cheaper public funded options for the project were available.

There is also the question of the missing OJEU (Official Journal of the European Union) notice. It seems that, whilst in most cases any contract variation of over 10% in value after the Public Contract Regulations 2015 came into force should normally have involved re-tendering, an exception may have been permissible for this project because of ‘unforeseen circumstances’ (although whether the planning refusal is something a diligent authority could not have foreseen is open to major question). Notwithstanding that, LGA guidance states that in a case of major contract modification “a special type of notice must be published in OJEU”: a ‘Notice of modification of a contract during its term’.

I’ve not been able to find any evidence that such a notice was issued, and Cllr Rachel Smith has also asked for copies of all OJEU notices related to the project a number of times, and a contract modification notice has never been amongst them.

Where next?

Hopefully a Christmas break! Whilst it was kind of GCC to drop these documents just before Christmas – I’m hoping to have at least a bit of time off.

However, far from proving the value for money or transparency of the project as Councillors claim: these documents show there are still major questions to be answered about how a secret renegotiation led to 30% increase in costs, and why no assessments took place to look at non-landfill alternatives and create at least some sort of competitive pressure at the time of renegotiation.

There are also major questions to be asked about the handing of the Information Tribunal appeal. But those can wait for a day or two at least.

Beyond October

In a few weeks’ time (October 12th) I’m going to be leaving Open Data Services Co-op and starting a short career break of sorts: returning to my research roots, spending some time exploring possible areas of future focus, and generally taking a bit of time out.

I’ll be leaving projects in capable hands, with colleagues at Open Data Services continuing to work on Open Contracting, Beneficial Ownership, 360 Giving, Org-id.guide, IATI and Social Economy data standards projects. One of the great advantages of the worker co-operative model we’ve been developing over the last three and a half years is that, instead of now needing to seek new leaders for the technical work on these projects, we’ve been developing shared leadership of these projects from day one.

I first got involved in the development of open data standards out of research interest: curious about how these elements of data infrastructure were created and maintained, and about the politics embedded within, or expressed through, them. Over the last five years my work has increasingly focussed on supporting open data standard adoption, generating tons of learning – but with little time to process it or write it up. So – at least for a while – I’ll be stepping back from day-to-day work on specific standards and data infrastructure, and hopefully next year will find ways to distill the last few years’ learning in some useful form.

Between now and the end of 2018, I’ll be working on editing the State of Open Data collection of essays for the OD4D network. Then in early 2019, I’m planning for a bit of time off completely, before starting to explore new projects from April onwards.

I’m immensely proud of what we’ve done with Open Data Services Co-op over the last 3.5 years, and grateful to colleagues for co-creating something that both supports world-changing data projects and supports team members in their own journeys. If you ever need support with an open data project, do not hesitate to drop them a line.

Javelin Park Episode 5: Return of the ICO

[Summary: The Information Commissioner’s Office has upheld an appeal against continued redaction of key financial information about the Javelin Park Incinerator Public Private Partnership (PPP) project in Gloucestershire]

The Story So Far

I’ve written before about controversy over the contract for Javelin Park, a waste incinerator project worth at least £0.5bn, being constructed just outside Stroud as part of a 25-year Public Private Partnership deal. There’s a short history at the bottom of this article, which breaks off in 2015 when the Information Commissioner’s Office last ruled against Gloucestershire County Council (GCC) and told them to release an unredacted copy of the PPP contract. GCC appealed that decision, but were finally told by the Information Tribunal in 2017 to publish the contract: which they did. Sort of. Because in the papers released, we found out about a 2015 renegotiation that had taken place, meaning that we still don’t know how much local taxpayers are on the hook for, nor how the charging model affects potential recycling rates, or incentives to burn plastics.

In June last year, through FOI, I got a heavily redacted copy of a report considering the value for money of this renegotiated contract, but blacking out all the key figures. This week the Information Commissioner upheld my appeal against the redactions, ruling that GCC have 35 days to provide un-redacted information. They may still make their own appeal against this, but the ICO decision makes very clear that the reasoning from the 2017 Information Tribunal ruling holds firm when it comes to the public interest in knowing salient details of original and renegotiated contracts.

The Story Right Now

For the last two weeks, Gloucestershire resident Sid Saunders has been on hunger strike outside the county’s Shire Hall to call for the release of the full revised contract between Gloucestershire County Council and Urbaser Balfour Beatty. This is, to my knowledge, unprecedented. It demonstrates the strength of feeling over the project, and the crucial importance of transparency around contracts in securing public accountability.

GCC are already weeks overdue responding to the most recent FOI/EIR request for the latest contract text, and continue to stonewall requests for even basic details, repeating discredited soundbites about potential savings that rely on outdated assumptions about comparisons and high waste flows.

On Wednesday, Sid and other local activists staged a dignified silent protest at the meeting of GCC Cabinet, where public and councillor questions on an air quality agenda item had unconstitutionally been excluded.

Tomorrow we’ll be heading to Gloucester in support of Sid’s continued campaign for information, and for action to bring accountability to this mega-project.

It’s against this backdrop that I wanted to draw out some of the key elements of the ICO’s decision notice, and observations on GCC responses to FOI and EIR requests.

Unpacking the decision notice

The decision notice has not yet been published on the ICO website, but I’ve posted a copy here and will update the link once the ICO version is online.

The delays can’t stay

It is notable that every request for information relating to Javelin Park has been met with very delayed replies, exceeding the statutory limits set down in the Freedom of Information Act (FOIA), and the stricter Environmental Information Regulations (EIR).

The decision notice states that the “council failed to comply with the requirements of Regulation 5(2) and Regulation 14(2)” which set strict time limits on the provision of information, and the grounds for which an authority can take extra time to respond.

Yet we’re seeing in the latest requests that GCC suggest they will need until the end of June (which falls, curiously, just days after the next full meeting of the County Council) to work out what they can release. I suspect consistent breaches of the regulations on timeliness are not likely to be looked on favourably by the ICO in any future appeals.

The information tribunal principles stand

The Commissioner’s decision notice draws heavily on the earlier Information Tribunal ruling, which noted that, whilst there are commercial interests of the Authority and UBB at play, there are significant public interests in transparency, and:

“In the end it is the electorate which must hold the Council as a whole to account and the electorate are more able to do that properly if relevant information is available to all”

The decision note makes clear that the reasoning applies to revisions to the contract:

Even with the disclosures ordered by the Tribunal from the contract the Commissioner considers that it is impossible for the public to be fully aware of the overall value for money of the project in the long term if it is unable to analyse the full figures regarding costs and price estimates which the council was working from at the time of the revised project plan.

going on to say:

The report therefore provides more current, relevant figures which the council used to evaluate and inform its decisions regarding the contract and it will presumably be used as a basis for its future negotiations over pricing and costs. Currently these figures are not publicly available, and therefore the public as a whole cannot create an overall picture as to whether the EfW development provides value for money under the revised agreement.

As the World Bank PPP Disclosure Framework makes clear, amendment and revisions to a contract are as important as the contract itself, and should be proactively published. Not laboriously dragged out of an authority through repeated trips to information tribunals.

Prices come from markets, not from secrets

A consistent theme in GCC’s case for keeping heavy redactions in the contract is that disclosure of information might affect the price they get for selling electricity generated at the plant. However, the decision notice puts the point succinctly:

Whilst she [the Commissioner] also accepts that if these figures are published third parties might take account of them during negotiations, the main issue will be the market value of electricity at the time that negotiations are taking place.

As I recall from first year economics lectures (or perhaps even GCSE business studies…): markets function better with more perfect information. The energy market is competitive, and there is no reason to think that selective secrecy will distort the market or secure the authority a better deal.

(It is worth noting that the same reasoning, hiding information to ‘get a better deal’, seems to be driving the non-disclosure of details of the £53m of land the authority plans to dispose of – again raising major questions about exactly whose interests are being served by a culture of secrecy.)

Not everything is open

The ICO decision notice is nuanced. It does find some areas where, with the commercial interest of the private party invoked, public interest is not strong enough to lead to disclosure. The Commissioner states:

These include issues such as interest and debt rates and operating costs of UBB which do not directly affect the overall value for money to the public, but which are commercially sensitive to UBB.

This makes some sense. As this decision notice relates to a consultants report on Value for Money, rather than the contract with the public authority, it is possible for there to be figures that do not warrant wider disclosure. However, following the precedent set by the Information Tribunal, the same reasoning would only apply to parts of a contract if they had been agreed in advance to be commercially confidential. As Judge Shanks found, only a limited part of the agreement between UBB and GCC was covered by such terms. Any redactions GCC now want to apply to a revised agreement should start only from consulting contract Schedule 23 on agreed commercial confidential information.

Where next?

GCC now have either 28 days to appeal the decision notice, or 35 days to provide the requested information. The document in question is only a 29-page report, with a small number of redactions to remove, so it certainly should not take that long.

Last time GCC appealed to a Tribunal in the case of the 2013 Javelin Park Contract they spent upwards of £400,000 of taxpayers money on lawyers*, only to be told to release the majority of the text. Given the ICO Decision Notice makes clear it is relying on the reasoning of the Tribunal, a new appeal to the tribunal would seem unlikely to succeed.

However, we do now have to wait and see what GCC do, and whether we’ll get to know what the renegotiated contract prices were in 2015. Of course, this doesn’t tell us whether or not there has been further renegotiation, and for that we have to continue to push for proactive transparency and a clear open contracting policy at GCC that will make transparency the norm, rather than something committed local citizens have to fight for through self-sacrificing direct action.

*Based on public spending data payments from Residential Waste Project to Eversheds.