Algorithmic systems, Wittgenstein and Ways of Life

I’m spending much of this October as a resident fellow at the Bellagio Centre in Italy, taking part in a thematic month on Artificial Intelligence (AI). Besides working on some writings about the relationship between open standards for data and the evolving AI field, I’m trying to read around the subject more widely, and learn as much as I can from my fellow residents. 

As the first of a likely series of ‘thinking aloud’ blog posts to try and capture reflections from reading and conversations, I’ve been exploring what Wittgenstein’s later language philosophy might add to conversations around AI.

Wittgenstein and technology

Wittgenstein’s philosophy of language, whilst hard to summarise in brief, might be conveyed through reference to a few of his key aphorisms. §43 of the Philosophical Investigations makes the key claim that: “For a large class of cases–though not for all–in which we employ the word ‘meaning’ it can be defined thus: the meaning of a word is its use in the language.” But this does not lead to the idea that words can mean anything: rather, correct use of a word depends on its use being effective, and that in turn depends on a setting, or, as Wittgenstein terms it, a ‘language game’. In a language game participants have come to understand the rules, even if the rules are not clearly stated or entirely legible: we engage successfully in language games by learning the techniques of participation, acquired through a mix of instruction and practice. Our participation in these language games is linked to the idea of ‘forms of life’, or, as it is put in §241 of the Philosophical Investigations, “It is what human beings say that is true and false; and they agree in the language they use. That is not agreement in opinions but in form of life.”

As I understand it, one of the key ideas here can be expressed by stating that meaning is essentially social, and it is our behaviours and ways of acting, constrained by wider social and physical limits, that determine the ways in which meaning is made and remade.

Where does AI fit into this? Well, in Wittgenstein as a Philosopher of Technology: Tool Use, Forms of Life, Technique, and a Transcendental Argument, Coeckelbergh & Funk (2018) draw on Wittgenstein’s tool metaphors (and his professional history as an engineer as well as philosopher) to show that we can apply a Wittgensteinian analysis to technologies, explaining that “we can only understand technologies in and from their use, that is, in technological practice which is also culture-in-practice.” (p 178). At the same time, they point to the role of technologies in constructing the physical and material constraints upon plausible forms of life:

Understanding technology, then, means understanding a form of life, and this includes technique and the use of all kinds of tools—linguistic, material, and others. Then the main question for a Wittgensteinian philosophy of technology applied to technology development and innovation is: what will the future forms of life, including new technological developments, look like, and how might this form of life be related to historical and contemporary forms of live?  [sic] (p 179)

It is important, though, to be attentive to the different properties of different kinds of tools in use (linguistic, material, technological) within any form of life. Mass digital technologies, in particular, appear to spread in less negotiable ways: that is, a newly introduced technology, whilst open to being embedded in forms of life in some subtly different ways, often has core features presented only on a take-it-or-leave-it basis, and, once introduced, can be relatively brittle and resistant to shaping by its users.

So – as new technologies are introduced, we may find that they reconfigure the social and material bounds of our current forms of life, whilst also introducing new language games, or new rules for existing games, into our social settings. And with contemporary AI technologies in particular, a number of specific concerns may arise.

AI Concerns and Critical Responses

Before we consider how AI might affect our forms of life, a few further observations (and statements of value):

  • The plural of ‘forms’ is intentional. There are variations in the forms of life lived across our planet. Social agreements in behaviour and action vary between cultural settings, regions or social strata. Many humans live between multiple forms of life, translating in word and behaviour between the different meanings each requires. Multiple forms are not strictly dichotomous: different forms of life may have many resemblances, but their distinctions matter and should be valued (this is an explicit political statement of value on my part).
  • There have been a number of social projects to establish certain universal forms of life over past centuries. The development of consensus on human rights frameworks, seeking equitable treatment of all, is one of these (I also personally subscribe to the view that a high level of respect for universal human rights should feature as a constraint on all forms of life).
  • Within this trend, there are also a number of significant projects seeking to establish greater acceptance of different ways of living, including action to reverse the Victorian imposition of certain normative family structures, work to afford individuals greater autonomy in defining their own identities, and activity to embed much more ecological models of thinking about human society.

These trends (or ongoing social struggles if you like) seeking to make our ways of living more tolerant, open, inclusive and sustainable are important to note when we consider the rise of AI systems. Such systems are frequently reliant on categorised data, and on a reductive modelling of the human experience based on past, rather than prospective, data.

This noted, we might point to two distinct forms of concern about AI:

(A) The use of algorithmic systems, built on reductive data, risks ossifying past ways of life (with their many injustices), rather than supporting struggles for social justice that involve ongoing efforts to renegotiate the meaning of certain categories and behaviours.

(B) Algorithmic systems may embody particular ways of life that, because of the power that can be exercised through their pervasive operation, cause those forms of life to be imposed over others. This creates pressure for humans to adapt their ways of life to fit the machine (and its creators/owners), rather than allowing the adaptation of the machine to fit into different human ways of life.

Brief examples

Gender detection software is AI trained to judge the gender of a person from an image (or from analysing names, text or some other input). In general, such systems define gender using a male-female binary. Such systems are being widely used in research and industry. Yet, at the same time as the task of judging gender is being passed from human to machine, there are increasingly present ways of life that reject the equation of gender and sex identity, and the idea of a fixed gender binary. The introduction of AI here risks the ossification of past social forms.

Predictive text tools are increasingly being embedded in e-mail and chat clients to suggest one-click automatic responses, instead of requiring the human to craft a written response. Such AI-driven features are at once a tool of great convenience and an imposed shift in our patterns of social interaction.

Such forms of ‘social robot’ are addressed by Coeckelbergh & Funk when they write: “These social robots become active systems for verbal communication and therefore influence human linguistic habits more than non-talking tools.” (p 185). But note the material limitations of these robots: they can’t construct a full sentence representative of their user. Instead, they push conversation towards the quick short response, creating a pressure to change patterns of human interaction.

Auto-replies suggested by Google Mail based on a proprietary algorithm.

The examples above, suggested by Gmail for me to use in reply to a recent e-mail, might follow terms I’d often use, but they push towards a form of e-mail communication that, at least in my experience, represents a particularly capitalist and functional form of life, in which speed of communication is of the essence, rather than social communication and the exploration of ideas.

Reflections and responses

Wittgenstein was not a social commentator, but it is possible to draw upon his ideas to move beyond conversations about AI bias, to look at how the widespread introduction of algorithmic and machine-learning driven systems may interact with different contemporary forms of living.

I’m always interested, though, in the critical leading to the practical, and so below I’ve started to sketch out possible responses the analysis above leads me to consider. I also strongly suspect that these responses, and the justification for them, can be elaborated much more directly and accessibly without getting here via Wittgenstein. Writing that may be a task for later, but as I came here via the Wittgensteinian route, I’ll stick with it.

(1) Find better categories

If we want future algorithmic systems to represent the forms of life we want to live, not just those lived in the past, or imposed upon populations, we need to focus on the categories and data structures used to describe the world and train machine-learning systems.

The question of when we can develop global categories with meaning that is ‘good enough’ in terms of alignment in use across different settings, and when it is important to have systems that can accommodate more localised categorisations, is one that requires detailed work, and that is inherently political.

(2) Build a better machine

Some objections to particular instances of AI may arise because the technology is, ultimately, too blunt in its current form. Would my objection to predictive text tools be the same if they could express more complete sentences, more in line with the way I want to communicate? For many critiques of algorithmic systems, there may be a plausible response that a better designed or trained system could address the problem raised.

I’m sceptical, however, of whether most current instantiations of machine-learning can be adaptable enough to different forms of life: not least on the grounds that for some ways of living the sample size may be too small to gather enough data points to construct a good model, or the collection of the data required may be too expensive or intrusive for the theoretical possibilities of highly adaptive machine-learning systems to be practically feasible or desirable.

(3) Strategic rejection

Recognising the economic and political power embedded in certain AI implementations, and the particular forms of life they embody, may help us to see technologies we want to reject outright. If a certain tool makes moves in a language game that are at odds with the game we want to be playing, and only gains agreement of action through its imposition, then perhaps we should not admit it at all.

To put that more bluntly (and bringing in my own political stance), certain AI tools embody a late-capitalist form of life, rooted in the cultures and practices of a small stratum of Silicon Valley. Such tools should have no place in shaping other ways of life, and should be rejected not because they are biased, or because they have not adequately considered issues of privacy, but simply because the form of life they replicate undermines both equality and ecology.

Where next

Over my time here at Bellagio, I’ll be particularly focussed on the first of these responses – seeking better categories, and understanding how processes of standardisation interact with AI. My goal is to do that with more narrative, and less abstraction, but we shall see…

Creative Lab Report: Data | Culture | Learning

[Summary: report from a one day workshop with Create Gloucestershire bringing together artists and technologists to create artworks responding to data. Part 2 in a series with Exploring Arts Engagement with (Open) Data]

What happens when you bring together a group of artists, scientists, teachers and creative producers, with a collection of datasets, and a sprinkling of technologists and data analysts for a day? What will they create? What can we learn about data through the process?  

There has been a long trend of data-driven artworks, and of individual artists incorporating responses to structured data in their work. But how does this work in the compressed context of a one-day collaborative workshop? These are all questions I had the opportunity to explore last Saturday in a workshop co-facilitated with Jay Haigh of Create Gloucestershire and hosted at Atelier in Stroud: an event we ran under the title “Data | Culture | Learning: Creative Lab”.

The steady decline in education spending and an increased focus on STEM subjects have impacted significantly on arts teaching and teachers. The knock-on effect is observed in the take-up of arts subjects at secondary, further and higher education level, ultimately impacting negatively on the arts and cultural sector in the UK. As such, Create Gloucestershire has been piloting new work in Gloucestershire schools to embed new creative curriculum approaches, supporting its mission to ‘make arts everyday for everyone’. The cultural education agenda therefore provided a useful ‘hook’ for this data exploration.

Data: preparation

We started thinking about the idea of an ‘art and data hackathon’ at the start of this year, as part of Create Gloucestershire’s data maturity journey, and decided to focus on questions around cultural education in Gloucestershire. However, we quickly realised the event could not be entirely modelled on a classic coding hackathon, so in April we brought together a group of potential participants for a short design meeting.

Photo of preparation workshop

For this, we sought out a range of datasets about schools, arts education, arts teaching and funding for arts activities – and I worked to prepare Gloucestershire extracts of these datasets (slimming them down from hundreds of columns and rows). Inspired by the Dataset Nutrition Project, and using AirTable blocks to rapidly create a set of cards, we took along profiles of some of these datasets to help give participants at the planning meeting a sense of what might be found inside each of the datasets we looked at.

Dataset labels: inspired by dataset nutrition project
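
(For readers curious about the mechanics of that preparation step, the sketch below, using pandas, shows the kind of slimming-down involved. The file and column names here are invented for illustration – the real datasets and fields differed – but the pattern of filtering to Gloucestershire rows and keeping a handful of columns is the essence of it.)

```python
import pandas as pd

# Hypothetical national dataset: one row per school, hundreds of columns.
schools = pd.read_csv("england_school_data.csv", low_memory=False)

# Keep only the Gloucestershire rows, and a handful of columns relevant to the day.
# (Column names are invented for this sketch.)
columns_of_interest = ["School name", "Town", "Phase", "Arts GCSE entries", "% FSM"]
extract = schools.loc[schools["LA Name"] == "Gloucestershire", columns_of_interest]

# Save a small, workshop-friendly extract.
extract.to_csv("gloucestershire_schools_extract.csv", index=False)
```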
Through this planning meeting we were able to set our expectations about the kind of analysis and insights we might get from these datasets, and to think about placing the emphasis of the day on collaboration and learning, rather than being overly directive about the questions to be answered with data. We also decided that, in order to help collaborative groups form in the workshop, and to make sure we had materials prepared for particular art forms, we would invite a number of artists to act as anchor facilitators on the day.

Culture: the hackathon day 

Group photo of hackathon day

After an overview of Create Gloucestershire’s mission to bring about ‘arts everyday for everyone’, we began with introductions, going round the group and completing three sentences:

  • For me, data is…
  • For me, arts everyday is…
  • In Gloucestershire, is arts everyday….? 

For me, data is... (post-it notes)

Through this, we began to surface different experiences of engagement with data (everywhere; semi-transparent; impersonal; information; a goldmine; less well defined than art; complex; connective…), and with questions of access to arts (Arts everyday is: fun; making sense of the world; what you make of it; necessary; a privilege for some; an improbable dream; essential). 

We then turned briefly to look at some of the data available to explore these questions, before inviting our artists to explain the tools and approaches they had brought along to share:

  • Barney Heywood of Stand + Stare demonstrated use of touch-sensitive tape to create physical installations that respond to an audience with sound or visuals, as well as the Mayfly app that links stickers and sounds;
  • Illustrator and filmmaker, Joe Magee described the power of the pen, and how to sketch out responses to data;
  • Digital communications consultant and artist, Sarah Dixon described the use of textiles and paper to create work that mixes 2D and 3D; and
  • Architect Tomas Millar introduced a range of Virtual Reality technologies, and how tools from architecture and gaming could be adapted to create data-related artworks. 

To get our creative ideas flowing, we then ran through some rapid idea generation, with everyone rotating around our four artists’ groups, and responding to four different items of data (below) with as many different ideas as possible. From the 30+ ideas generated came some of the seeds of the works we then developed during the afternoon.

Slides showing: 38% drop in arts GCSE entries 2010 to 2019; Table of the number and percentage of students at local secondary schools eligible for free school meals; Quantitative and qualitative data from a study on arts education in schools.

Following a short break, everyone had the chance to form groups and dig deeper into designing an artwork, guided by a number of questions:

  • What response to data do group members want to focus on? Collecting data? Data representation? Interpretation and response? Or exploring ‘missing data’?
  • Is there a story, or a question you want to explore?
  • Who is the audience for your creation?
  • What data do you need? Individual numbers; graphs; tables; geo data; qualitative data; network data or some other form? 
Example of sketches
Sketching early ideas

Groups then had around three hours to start making and creating prototype artworks based on their ideas, before we reconvened for a showcase of the creations.

The process was chaotic and collaborative. Some groups were straight into making: testing out the physical properties of materials, and then retrofitting data into their works later. Others sought to explore available datasets and find the stories amongst a wall of statistics. In some cases, we found ourselves gathering new data (e.g. lists of extracurricular activities taken from school websites), and in others, we needed to use exploratory data visualisation tools to see trends and extrapolate stories that could be explored through our artforms. People moved between groups to help create: recording audio, providing drawings, or sharing skills to stimulate new ways of increasing access to the stories within the data. Below is a brief summary of some of the works created, followed by some reflections on learning from the day. 

The artworks

Interactive audio: school subjects in harmony

Artwork: Barney Heywood and team | Photo credit: Kazz Hollick

Responding to questions about the balance of the school curriculum, and the low share of teaching hours occupied by the arts, the group recorded a four-part harmony audio clip, and set the volume of each part relative to the share of teaching time for arts, English, sciences and humanities. Through a collection of objects representing each subject, audiences could trigger individual parts, all four parts together, or a distorted version of the harmony. Through inviting interaction, and using volume and distortion, the piece invited reflection on the ‘right’ balance of school subjects, and the effect of losing arts from the curriculum on the overall harmony of education.

Fabric chromatography: creative combinations

Artwork: Sarah Dixon and team. Photo credit: Jay Haigh

Picking up on a similar theme, this fabric-based project sought to explore the mix of extracurricular activities available at a school, and how access to a range of activities can interact to support creative education. Using strips of fabric, woven in a grid onto a backcloth, the work immersed a dangling end of each strip in coloured ink, the mix of inks depending on the range of arts activities available at a particular school. As the ink soaked up vertical strands of the fabric, it also started to seep into horizontal strands, where it could mix with other colours. The colours chosen reflected a chart representation of the dataset used to inform the work, establishing a clear link between data, information, and artwork.

This work offered a powerful connection between art, data and science: allowing an exploration of how the properties of different inks, and different fabrics, could be used to represent data on ‘absorption’ of cultural education, and the benefits that may emerge from combining different cultural activities. The group envisaged works like this being developed with students, and then shown in the reception area of a school to showcase its cultural offer.

The shrinking design teacher (VR installation)

Artwork: Tomas Millar & Pip Heywood. Photo credit: Jay Haigh

Using a series of photographs taken on a mobile phone, a 3D model of Pip, a design teacher, was created in a virtual landscape. An audio recording of Pip describing the critical skill sets engendered through design teaching was linked to the model, which was set to shrink in size over the course of the recording, reflecting seven years of data on the reduction in design teaching hours in schools.

Observed through VR goggles, the piece offered an emotive way to engage with a narrative on the power of art to encourage critical questioning of structures, and to support creative engagement with the world, all whilst – imperceptibly at first, and more clearly as the VR observer finds themselves looking down at the shrinking teacher – highlighting current trends in teaching hours. 

Arcade mechanicals

Artwork: Joe Magee and team. Photo credit: Jay Haigh

From the virtual to the physical, this sketch questioned the ‘rigged’ nature of grammar school and private education, imagining an arcade machine where the weight, size and shape of tokens were set according to various data points, and where the mechanism would lead to certain tokens having a better chance of winning. 

By exploring a data-informed arcade mechanism, this piece captures the idea that statistical models can tell us something about potential future outcomes, but that outcomes are not entirely determined, and there are still elements of chance, or unpredictable interactions, in any individual story.

Exclusion tags

Artwork: Joe Magee, Sarah Dixon and team. Photo: Jay Haigh

Building on data about different reasons for school exclusion, eight workshop participants were handed paper tags, marking them out for exclusion from the ‘classroom’. They were told to leave the room, where the images on their tags were scanned (using the Mayfly app) to play them a cold explanation of why they had been excluded and for how long.

The group were then invited to create a fabric-based sculpture to represent the percentage of children excluded from school in Gloucestershire for the reasons indicated on their tag.

The work sought to explore the subjective experience of being excluded, and to look behind the numbers to the individual stories – whilst also prototyping a possible creative yarn-bombing workshop that could be used with excluded young people to re-engage them with education.  

The team envisaged a further set of tags linked to personal narratives collected from young people excluded from school, bringing their voices into the piece to humanise the data story.

Library lights: stories from library users

This early prototype explored the potential of VR to let an audience explore a space, shedding light on areas that are otherwise in darkness. Drawing on statistics showing that 33% of people use libraries, and on audio recordings – drawn from direct participant quotes collected by Create Gloucestershire during their 3-year Art of Libraries test programme, describing how people benefitted from engagement with arts interventions in libraries across Gloucestershire – a virtual space was populated with 100 orbs, with the percentage lit corresponding to those who use libraries. As the audience in VR approached a lit orb, an audio recording of an individual experience with a library would play.

The creative team envisaged the potential to create a galaxy of voices: offsetting negative comments about libraries from those that don’t use them (they were able to find a significant number of data sets showing negative perceptions about libraries, but few positive ones) with the good experiences of those that do.

Artwork: Tomas Millar and team (image to come)

Seeing our networks


Not so much an artwork as a data visualisation, this piece took data gathered over the last five years by Create Gloucestershire to record attendance at its events. Adding in data on attendance at the Creative Lab, lists of people, events and event participation (captured and cleaned up using the vTiger CRM) were fed into Kumu, and used to build an interactive network diagram. The visual shows how, over time, CG events have both engaged new people (out on the edge of the network) and started to build ongoing connections.
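
(As a rough illustration of the data preparation involved – not the actual workflow – the sketch below shows how a hypothetical attendance export of person/event rows could be turned into the kind of element and connection lists that network-mapping tools such as Kumu typically import. File and column names are invented for this sketch.)

```python
import pandas as pd

# Hypothetical CRM export: one row per person per event attended.
attendance = pd.read_csv("cg_event_attendance.csv")  # columns: Person, Event, Year

# Elements: every person and every event becomes a node, tagged with a type.
people = pd.DataFrame({"Label": attendance["Person"].unique(), "Type": "Person"})
events = pd.DataFrame({"Label": attendance["Event"].unique(), "Type": "Event"})
elements = pd.concat([people, events], ignore_index=True)

# Connections: an edge from each person to each event they attended.
connections = attendance.rename(columns={"Person": "From", "Event": "To"})[["From", "To", "Year"]]

elements.to_csv("elements.csv", index=False)
connections.to_csv("connections.csv", index=False)
```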

A note on naming

One thing we forgot to do (!) in our process was to ask each group to title their works, so the titles and descriptions above are given by the authors of this post. We will happily amend with input from each group.

Learning

We closed our workshop reflecting on learning from the day. I was particularly struck by the way in which responding to datasets through the lens of artistic creation (and not just data visualisation) provided opportunities to ask new questions of datasets, and to critically question their veracity and politics: digging into the stories behind each data point, and powerfully combining qualitative and quantitative data to look not just at presenting data, but at finding what it might mean for particular audiences.

However, as Joe Magee framed it, it wasn’t always easy to find a route up the “gigantic data coalface”. Faced with hundreds of rows and columns of data, it was important to have access to tools and skills to carry out quick visualisations: yet knowing the right tools to use, or how to shape data so that it can be easily visualised, is not always straightforward. Unlike a classic data hackathon, where there are often demands for the ‘raw data’, a data and art creative lab benefits from more work to prepare data extracts, and to provide access to layers of data (individual data points, the small set they belong in, the larger set they come from).

Our journey, however, took us beyond the datasets we had pre-prepared. One particular resource we came across was the UK Taking Part Survey, which offers a range of analysis tools to drill down into statistics on participation in art forms by age, region and socio-economic status. With this dataset, and a number of others, our expectations were often confounded when, for example, relationships we had expected to find between poverty and arts participation, or age and involvement, were not borne out in the data.

This points to a useful symmetry: turning to data allowed us to challenge the assumptions that might otherwise be baked into an agenda-driven artwork, but engaging with data through an arts lens also allowed us to challenge the assumptions behind data points, and behind the ways data is used in policy-making.

We’ve also learnt more about how to frame an event like this. We struggled to describe it in advance and to advertise it. Too much text was the feedback from some! Now, with images from this event, we can think about ways to provide a better visual story of what might be involved for future workshops.

Given Create Gloucestershire’s commitment to arts everyday for everyone as a wholly inclusive statement of intent, it was exciting to see collaborators on the day truly engaging with data in a way they may not have done previously, and then expanding access to it by representing data in accessible and engaging forms which, additionally, could be explored by subjects of the data themselves. What might have seemed “boring” or “troublesome” at the start of the day became a fount of inspiration and creativity, opening up new conversations that may never have previously taken place and setting up the potential for new collaborations, conversations, advocacy and engagement.

Thanks

Thank you to the team at Create Gloucestershire for hosting the day, and particularly to Caroline, Pippa and Jay for all the organisation. Thanks to Kat at Atelier for hosting us, and to our facilitating artists: Barney, Sarah, Tomas and Joe. And thanks to everyone who gave up a Saturday to take part!

Photo credit where not stated: Jay Haigh

High value datasets: an exploration

[Summary: an argument for the importance of involving civil society, and thinking broad when exploring the concept of high value data (with lots of links to past research and the like smuggled in)]

On 26th June this year the European Parliament and Council published an update to the Public Sector Information (PSI) directive, now recast as Directive 2019/1024 “on open data and the re-use of public sector information”. The new text makes a number of important changes, including bringing data held by publicly controlled companies in the utility and transport sectors into the scope of the directive, extending coverage of research data, seeking to limit the granting of exclusive private sector rights to data created during public tasks, and increasing transparency when such rights are granted.

However, one of the most significant changes of all is the inclusion of Article 14 on High Value Datasets which gives the Commission power to adopt an implementing act “laying down a list of specific high-value datasets” that member states will be obliged to publish under open licenses, and, in some cases, using certain APIs and standards. The implementing acts will have the power to set out those standards. This presents a major opportunity to shape the open data ecosystem of Europe for decades to come.

The EU Commission have already issued a tender for a consultant to support them in defining a ‘List of High-value Datasets to be made Available by the Member States under the PSI-Directive’, and work looks set to advance at pace, particularly as the window granted by the directive to the Commission to set out a list of high value datasets is time-limited.

A few weeks back, a number of open data researchers and campaigners had a quick call to discuss ways to make sure past research, and civil society voices, inform the work that goes forward. As part of that, I agreed to draft a short(ish) post exploring the concept of high value data, and looking at some of the issues that might need to be addressed in the coming months. I’d hoped to co-draft this with colleagues, but with summer holidays and travel having intervened, am instead posting a sole authored post, with an invite to others to add/dispute/critique etc. 

Notably, whilst it appears few (if any) open-data related civil society organisations are in a position to lead a response to the current EC tender, the civil society open data networks built over the last decade in Europe have a lot to offer in identifying, exploring and quantifying the potential social value of specific open datasets.

What counts as high value?

The Commission’s tender points towards a desire for a single list of datasets that can be said to exist in some form in each member state. The directive restricts the scope of this list to six domains: geospatial, earth observation and environment, meteorological, statistical, company and company ownership, and mobility-related datasets. It also appears to anticipate that data standards will only be prescribed for some kinds of data: highlighting a distinction between data that may be high value simply by virtue of publication, and data which is high-value by virtue of its interoperability between states.

In the new directive, the definition of ‘high value datasets’ is put as:

“documents the re-use of which is associated with important benefits for society, the environment and the economy, in particular because of their suitability for the creation of value-added services, applications and new, high-quality and decent jobs, and of the number of potential beneficiaries of the value-added services and applications based on those datasets;” (§2.10)

Although the ordering of society, environment and economy is welcome, there are subtle but important differences from the definition advanced in a 2014 paper from W3C and PwC for the European Commission which described a number of factors for determining whether there was high value to making a dataset open (and standardising it in some ways). It focussed attention on whether publication of a dataset:

  • Contributes to transparency
  • Helps governments meet legal obligations
  • Relates to a public task
  • Realises cost reductions; and
  • Has some value to a large audience, or substantial value to a smaller audience.

Although the recent tender talks of identifying “socio-economic” benefits of datasets, overall it adopts a strongly economic frame, seeking quantification of these and asking in particular for evaluation of “potential for AI applications of the identified datasets;”. (This particular framing of open data as a raw material input for AI is something I explored in the recent State of Open Data book, where the privacy chapter also briefly explored how AI applications may create new privacy risks from the release of certain datasets.) But to keep wider political and social uses of open data in view, and to recognise that quantification of benefits is not a simple process of adding up the revenue of firms that use that data, any comprehensive method to explore high value datasets will need to consider a range of issues, including that:

  • Value is produced in a range of different ways
  • Not all future value can be identified from looking at existing data use cases
  • Value may result from network effects
  • Realising value takes more than data
  • Value is a two-sided calculation; and
  • The distribution of value matters as well as the total amount

I dig into each of these below.

Value is produced in different ways

A ‘raw material’ theory of change still pervades many discussions of open data, in spite of the growing evidence base about the many different ways that opening up access to data generates value. In ‘raw material’ theory, open data is an input, taken in by firms, processed, and output as part of new products and services. The value of the data can then be measured in the ‘value add’ captured from sales of the resulting product or service. Yet, this only captures a small part of the value that mandating certain datasets be made open can generate. Other mechanisms at play can include:

  • Risk reduction. Take, for example, beneficial ownership data. Quite aside from the revenue generated by ‘Know Your Customer’ (KYC) brokers who might build services off the back of public registers of beneficial ownership, consider the savings to government and firms from not being exposed to dodgy shell-companies, and the consumer surplus generated by clamping down on illicit financial flows into the housing market through more effective cross-border anti-money laundering investigations. OpenOwnership are planning research later this year to dig more into how firms are using, or could use, beneficial ownership transparency data, including to manage their exposure to risk. Any quantification needs to take into account not only value gained, but also value ‘not lost’ because a dataset is made open.
  • Internal efficiency and innovation. When data is made open, and particularly when standards are adopted, it often triggers a reconfiguration of data practices inside the state (cf. Goëta & Davies), with the potential for this to support more efficient working, and to enable innovation through collaboration between government, civil society and enterprise. For example, the open publication of contracting data, particularly with the adoption of common data standards, has enabled a number of governments to introduce new analytical tools, finding ways to get a better deal on the products and services they buy. Again, this value for money for the taxpayer may be missed by a simple ‘raw material’ theory.
  • Political and rights impacts. The 2014 W3C/PwC paper I cited earlier talks about identifying datasets with “some value to a large audience, or substantial value to a smaller audience.” There may also be datasets that have a low likelihood of causing impact, but high impact (at least for those affected) when they do. Take, for example, statistics on school admissions. When I first looked at use of open data back in 2009, I was struck by the case of an individual gaining confidence from the fact that statistics on school admission appeals were available (E7) when constructing an appeal case against a school’s refusal to admit their own child. The open availability of this data (not necessarily standardised or aggregated) had substantial value in empowering a citizen to secure their rights. Similarly, there are datasets that are important for communities to secure their rights (e.g. air quality data), or to take political action to either enforce existing policy (e.g. air quality limits), or to change policy (e.g. secure new air quality action zones). Not only is such value difficult to quantify, but whether or not certain data generates value will vary between countries in accordance with local policies and political issues. The definition of EU-wide ‘high value datasets’ should not crowd out the possibility or process of defining data that is high-value in a particular country. That said, there may at least be scope to look at datasets in the study categories that have substantial potential value in relation to EU social and environmental policy priorities.

Beyond the mechanisms above, there may also be datasets where we find a high intrinsic value in the transparency their publication brings, even without a clear evidence base that quantifies their impact. In these cases, we might also talk of the normative value of openness, and consider which datasets deserve a place on the high-value list because we take the openness of this data to be foundational to the kind of societies we want to live in, just as we may take certain freedoms of speech and movement as foundational to the kind of Europe we want to see created.

Not all value can be found from prior examples

The tender cites projects like the Open Data Barometer (which I was involved in developing the methodology for) as potential inspirations for the design of approaches to assess “datasets that should belong to the list of high value datasets”. The primary place to look for that inspiration is not in the published stats, but in the underlying qualitative data, which includes raw reports of cases of political, social and economic impact from open data. This data (available for a number of past editions of the Barometer) remains an under-explored source of potential impact cases that could be used to identify how data has been used in particular countries and settings. Equally, projects like the State of Open Data can be used to find inspiration on where data has been used to generate social value: the chapter on Transport is a case in point, looking at how comprehensive data on transport can support applications improving the mobility of people with specific needs.

However, many potential uses and impacts of open data are still to be realised, because the data they might work with has not heretofore been accessible. Looking only at existing cases of use and impact is likely to miss such cases. This is where dialogue with civil society becomes vitally important. Campaigners, analysts and advocates may have ideas for the projects that could exist if only particular data were available. In some cases, there will be a hint at what is possible from academic projects that have gained access to particular government datasets, or from pilot projects where limited data was temporarily shared – but in other cases, understanding potential value will require a more imaginative, forward-looking and consultative process. Given that the upcoming study may set the list of high value datasets for decades to come, it is important that the agenda is not solely determined by prior publication precedent.

For some datasets, certain value comes from network effects

If one country provides an open register of corporate ownership, the value this has for anti-corruption purposes only goes so far. Corruption is a networked game, and without being able to follow corporate chains across borders, the value of a single register may be limited. The value of corporate disclosures in one jurisdiction increases the more other jurisdictions provide such data. The general principle here, that certain data gains value through network effects, raises some important issues for the quantification of value, and will help point towards those datasets where standardisation is particularly important. Being able to show, for example, that the majority of the value of public transit data comes from domestic use (and so interoperability is less important), but the majority of the value of, say, carbon emission or climate change mitigation financing data comes from cross-border use, will be important to support prioritisation of datasets.

Value generation takes more than data

Another challenge of the ‘raw material’ theory of change is that it often fails to consider (a) the underlying quality (not only format standardisation) of source data, and (b) the complementary policies and resources that enable use. For example, air quality data from low-quality or uncalibrated particulate sensors may be less valuable than data from calibrated and high quality sensors, particularly when national policy may set out criteria for the kinds of data that can be used in advancing claims for additional environmental protections in high-pollution areas. Understanding this interaction of ‘local data’ and the governance contexts where it is used is important in understanding how far, and under what conditions, one may extrapolate from value identified in one context to potential value to be realised in another. This calls for methods that can go beyond naming datasets, to being able to describe the features (not just formats) that are important for them to have.

Within the Web Foundation-hosted Open Data Research Network a few years back, we spent considerable time refining a framework for thinking about all the aspects that go into securing impact (and value) from open data, and work by GovLab has also identified factors that have been important to the success of initiatives using open data. Beyond this, numerous dataset-specific frameworks for understanding what quality looks like may exist. Whilst recommending dataset-by-dataset measures to enhance the value realised from particular open datasets may be beyond the scope of the European Commission’s current study, when researching and extrapolating from past value generation in different contexts it is important to look at the other complementary factors that may have contributed to realising that value, alongside the simple availability of data.

Value is a two-sided calculation

It can be tempting to quantify the value of a dataset simply by taking all the ‘positive’ value it might generate, and adding it up. But a true calculation also needs to consider potential negative impacts. In some cases, this could be positive economic value set against some social or ecological dis-benefit. For example, consider the release of some data that might increase use of carbon-intensive air and road transport. While this could generate quantifiable revenue for haulage and airline firms, it might undermine efforts to tackle climate change, destroying long-term value. Or in other cases, there may be data that provides social benefit (e.g. through the release of consumer protection related data) but that disrupts an existing industry in ways that reduce private sector revenues.

Recognising the power of data involves recognising that power can be used in both positive and negative ways. A complete balance sheet needs to consider the plus and the minus. This is another key point where dialogue with civil society will be vital – and not only with open data advocates, but with those who can help consider the potential harms of certain data being more open.

Distribution of value matters

Last but not least, when considering public investment in ‘high value’ datasets, it is important to consider who captures that value. I’ve already hinted at the fact that value might be captured as government surplus, consumer surplus or producer (private sector) surplus – but there are also relevant questions to ask about which countries or industries may be best placed to capture value from cross-border interoperable datasets.

When we see data as infrastructure, it can help us consider the potential both to provide infrastructure that is open to all and generative of innovation, and to design policies that ensure those capturing value from the infrastructure are contributing to its maintenance.

In summary

Work on methodologies to identify high value datasets in Europe should not start from scratch, and stands to benefit substantially from engaging with open data communities across the region. There is a risk that a narrow conceptualisation and quantification of ‘high value’ will fail to capture the true value of openness, and fail to consider the contexts of data production and use. However, there is a wealth of research from the last decade (including some linked in this post, and cited in the State of Open Data) to build upon, and I’m hopeful that whichever consultant or consortium takes on the EC’s commissioned study, they will take as broad a view as possible within the practical constraints of their project.

Linking data and AI literacy at each stage of the data pipeline

[Summary: extended notes from an unConference session]

At the recent data-literacy-focussed Open Government Partnership unConference day (ably facilitated by my fellow Stroudie Dirk Slater) I acted as host for a break-out discussion on ‘Artificial Intelligence and Data Literacy’, building on the ‘Algorithms and AI’ chapter I contributed to The State of Open Data book.

In that chapter, I offer the recommendation that machine learning should be addressed within wider open data literacy building.  However, it was only through the unConference discussions that we found a promising approach to take that recommendation forward: encouraging a critical look at how AI might be applied at each stage of the School of Data ‘Data Pipeline’.

The Data Pipeline, which features in the Data Literacy chapter of The State of Open Data, describes seven stages for working with data, from defining the problem to be addressed, through to finding and getting hold of relevant data, verifying and cleaning it, and analysing data and presenting findings.

Figure: The School of Data’s data pipeline. Source: https://schoolofdata.org/methodology/

Often, AI is described as a tool for data analysis (and this was the mental framework many unConference session participants started with). Yet, in practice, AI tools might play a role at each stage of the data pipeline, and exploring these different applications of AI could support a more critical understanding of the affordances, and limitations, of AI.

The following rough worked example looks at how this could be applied in practice, using an imagined case study to illustrate the opportunities to build AI literacy along the data pipeline.

(Note: although I’ll use machine-learning and AI broadly interchangeably in this blog post, as I outline in the State of Open Data Chapter, AI is a  broader concept than machine-learning.)

Worked example

Imagine a human rights organisation, using a media-monitoring service to identify emerging trends that they should investigate. The monitoring service flags a spike in gender-based violence, encouraging them to seek out more detailed data. Their research locates a mix of social media posts, crowdsourced data from a harassment mapping platform, and official statistics collected in different regions across the country. They bring this data together, and seek to check its accuracy, before producing an analysis and a visually impactful report.

As we unpack this (fictional) example, we can consider how algorithms and machine-learning are, or could be, applied at each stage – and we can use that to consider the strengths and weaknesses of machine-learning approaches, building data and AI literacy.

  • Define – The patterns that first give rise to a hunch or topic to investigate may have been identified by an algorithmic model. How does this fit with, or challenge, the perception of staff or community members? If there is a mis-match, is this because the model is able to spot a pattern that humans were not able to see (+1 for the AI)? Or could it be because the model is relying on input data that reflects certain biases (e.g. media may under-report certain stories, or certain stories may be over-reported because of certain cognitive biases amongst reporters)?

  • Find – Search engine algorithms may be applying machine-learning approaches to identify and rank results. Machine-translation tools, that could be used to search for data described in other languages, are also an example of really well established AI. Consider the accuracy of search engines and machine-translation: they are remarkable tools, but we also recognise that they are nowhere near 100% reliable. We still generally rely on a human to sift through the results they give.

  • Get – One of the most common, and powerful, applications of machine-learning is in turning information into data: taking unstructured content, and adding structure through classification or data extraction. For example, image classification algorithms can be trained to convert complex imagery into a dataset of terms or descriptions; entity extraction and sentiment analysis tools can be used to pick out place names, event descriptions, and a judgement on whether the event described is good or bad from free-text tweets; and data extraction algorithms can (in some cases) offer a much faster and cheaper way to transcribe thousands of documents than having humans do the work by hand. AI can, ultimately, change what counts as structured data or not. However, that doesn’t mean that you can get all the data you need using AI tools. Sometimes, particularly where well-defined categorical data is needed, getting data may require creation of new reporting tools, definitions and data standards.

  • Verify – School of Data describe the verification step like this: “We got our hands in the data, but that doesn’t mean it’s the data we need. We have to check out if details are valid, such as the meta-data, the methodology of collection, if we know who organised the dataset and it’s a credible source.” In the context of AI-extracted data, this offers an opportunity to talk about training data and test data, and to think about the impact that tuning tolerances to false-positives or false-negatives might have on the analysis that will be carried out. It also offers an opportunity to think about the impact that different biases in the data might have on any models built to analyse it.

  • Clean – When bringing together data from multiple sources, there may be all sorts of errors and outliers to address. Machine-learning tools may prove particularly useful for de-duplication of data, or spotting possible outliers. Data cleaning to prepare data for a machine-learning based analysis may also involve simplifying a complex dataset into a smaller number of variables and categories. Working through this process can help build an understanding of the ways in which, before a model is applied, certain important decisions have already been made.

  • Analyse – Often, data analysis takes the form of simple descriptive charts, graphs and maps. But, when AI tools are added to the mix, analysis might involve building predictive models, able, for example, to suggest areas of a county that might see future hot-spots of violence, or to create interactive tools that can be used to perform ongoing monitoring of social media reports. However, it’s important, in adding AI to the analysis toolbox, not to skip entirely over other statistical methods: and instead to think about the relative strengths and weaknesses of a machine-learning model as against some other form of statistical model. One of the key issues to consider in algorithmic analysis is the ’n’ required: that is, the sample size needed to train a model, or to get accurate results. It’s striking that many machine-learning techniques require a far larger dataset than can be easily supplied outside big corporate contexts. A second issue that can be considered in looking at analysis is how ‘explainable’ a model is: does the machine-learning method applied allow an exploration of the connections between input and output? Or is it only a black box? (The rough code sketch after this list illustrates some of these points, alongside the ‘Get’ and ‘Verify’ steps.)

  • Present – Where the output of conventional data analysis might be a graph or a chart describing a trend, the output of a machine-learning model may be a prediction. Where a summary of data might be static, a model could be used to create interactive content that responds to user input in some way. Thinking carefully about the presentation of the products of machine-learning based analysis could support a deeper understanding of the ways in which such outputs could or should be used to inform action.
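
To make the ‘Get’, ‘Verify’ and ‘Analyse’ points a little more concrete, here is a minimal, hypothetical sketch (my own, using scikit-learn and a handful of invented example texts, not data from any real monitoring project) of training a small text classifier to turn free-text reports into a structured label, checking it against held-back test data, and peeking inside the model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# GET: hypothetical free-text reports, hand-labelled as relevant or not.
reports = [
    "Protest outside the town hall dispersed peacefully",
    "Woman reports harassment on the number 14 bus",
    "New bakery opens on the high street",
    "Several incidents of violence reported near the market",
    # ...in practice, many hundreds or thousands of labelled examples are needed
]
labels = [0, 1, 0, 1]  # 1 = relevant to the investigation, 0 = not

# VERIFY: hold back test data so the model is judged on examples it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, test_size=0.25, random_state=0
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# The confusion matrix separates false positives from false negatives,
# which matter differently depending on how the results will be used.
print(confusion_matrix(y_test, model.predict(X_test)))

# ANALYSE: a linear model is at least partly explainable – we can inspect
# which words push a report towards being classified as relevant.
vectoriser = model.named_steps["tfidfvectorizer"]
classifier = model.named_steps["logisticregression"]
weights = sorted(zip(classifier.coef_[0], vectoriser.get_feature_names_out()), reverse=True)
print(weights[:5])
```

Even a toy sketch like this surfaces the issues raised above: the need for a substantial body of labelled training data, the trade-off between false positives and false negatives, and the question of how far the model’s workings can be inspected.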

The bullets above give just some (quickly drafted and incomplete) examples of how the data pipeline can be used to explore AI-literacy alongside data literacy. Hopefully, however, this acts as enough of a proof-of-concept to suggest this might warrant further development work.

The benefit of teaching AI literacy through open data

I also argue in The State of Open Data that:

AI approaches often rely on centralising big datasets and seeking to personalise services through the application of black-box algorithms. Open data approaches can offer an important counter-narrative to this, focusing on both big and small data and enabling collective responses to social and developmental challenges.

Operating well in a datified world requires citizens to have a critical appreciation of a wide variety of ways in which data is created, analysed and used – and the ability to judge which tool is appropriate to which context.  By introducing AI approaches as one part of the wider data toolbox, it’s possible to build this kind of literacy in ways that are not possible in training or capacity building efforts focussed on AI alone.

The politics of misdirection? Open government ≠ technology.

[Summary: An extended write-up of a tweet-length critique]

The Open Government Partnership (OGP) Summit is, on many levels, an inspiring event. Civil society and government in dialogue together on substantive initiatives to improve governance, address civic engagement, and push forward transparency and accountability reforms. I’ve had the privilege, through various projects, to be a civil society participant in each of the 6 summits in Brasilia, London, Mexico, Paris, Tbilisi and now Ottawa. I have a lot of respect for the OGP Support Unit team, and the many government and civil society participants who work to make OGP a meaningful forum and mechanism for change. And I recognise that the substance of a summit is often found in the smaller sessions, rather than the set-piece plenaries. But the summit’s opening plenary offered a powerful example of the way in which a continued embrace of a tech-goggles approach at OGP, and weaknesses in the design of the partnership and its events, misdirect attention, and leave some of the biggest open government challenges unresolved.

Trudeau’s Tech Goggles?

We need to call out the techno-elitism, and political misdirection, that meant the Prime Minister of Canada could spend the opening plenary in an interview that focussed more on regulation of Facebook than on regulation of the money flowing into politics; and more time answering questions about his Netflix watching than discussing the fact that millions of people still lack the connectivity, social capital or civic space to engage in any meaningful form of democratic decision making. Whilst (new-)media inevitably plays a role in shaping patterns of populism, a narrow focus on the regulation of online platforms directs attention away from the ways in which economic forces, transportation policy, and a relentless functionalist focus on ‘efficient’ public services (without recognising their vital role in producing social solidarity) have contributed to the social dislocation in which populism (and fascism) finds root.

Of course, the regulation of large technology firms matters, but it's ultimately an implementation detail that should come as part of wider reforms to our democratic systems. The OGP should not be seeking to become the Internet Governance Forum (and if it does want to talk tech regulation, it should start by learning lessons from the IGF's successes and failures), but should instead be looking deeper at the root causes of closing civic space, and of the upswing of populist, non-participatory, and non-inclusive politics.

Beyond the ballot box?

The first edition of the OGP's Global Report is sub-titled 'Democracy Beyond the Ballot Box' and opens with the claim that:

…authoritarianism is on the rise again. The current wave is different–it is more gradual and less direct than in past eras. Today, challenges to democracy come less frequently from vote theft or military coups; they come from persistent threats to activists and journalists, the media, and the rule of law.

The threats to democracy are coming from outside of the electoral process and our response must be found there too. Both the problem and the solution lie “beyond the ballot box.”

There appears to be a non-sequitur here. That votes are not being stolen through physical coercion does not mean that we should immediately move our focus beyond electoral processes. Much like the Internet adage that 'censorship is damage, route around it', there can be a tendency in Open Government circles to treat the messy politics of governing as a fundamentally broken part of government, and to try and create alternative systems of participation or engagement that seek to be 'beyond politics'. Yet, if new systems of participation come to have meaningful influence, what reason do we have to think they won't become subject to the legitimate and illegitimate pressures that lead to deadlock or 'inefficiency' in our existing institutions? And as I know from local experience, citizen scrutiny of procurement or public spending from outside government can only get us so far without political representatives willing to use and defend their constitutional powers of scrutiny.

I'm more and more convinced that to fight back against closing civic space and authoritarian government, we cannot work around the edges: we need to think more deeply about how we get capable and ethical politicians elected, held in check by functioning party systems, and engaging in fair electoral competition overseen by robust electoral institutions. We need to go back to the ballot box, rather than beyond it. Otherwise we are simply ceding ground to the forces who have progressively learnt to manipulate elections without needing to directly buy votes.

Global leaders, local laggards?

The opening plenary also featured UK Government Minister John Penrose MP. But, rather than making even passing mention of the UK's OGP National Action Plan, launched just one day before, Mr Penrose talked about UK support for global beneficial ownership transparency. Now: it is absolutely great that ideas of beneficial ownership transparency are gaining pace through the OGP process.

But there is a design flaw in a multi-stakeholder partnership where a national politician of a member country is able to take the stage without any response from civil society, and where there is no space for questions on the fact that the UK government has delayed the extension of public beneficial ownership registries to UK Overseas Territories until at least 2023. The misdirection and #OpenWashing at work here need to be addressed head on, by demanding honest reflections from a government minister on the legislative and constitutional challenges of extending beneficial ownership transparency to tax havens and secrecy jurisdictions.

As long as politicians and presenters are not challenged when framing reforms as simple (and cheap) technological fixes, we will fail to learn about and discuss the deeper legal reforms needed, and the work needed on implementation. As our State of Open Data session on Friday explored: data and standards must be the means, not the ends, and more public scepticism about techno-determinist presentations would be well warranted.

Back, however, to event design. Although, when hosted in London, the OGP Summit offered UK civil society at least an action-forcing moment to push forward substantive National Action Plan commitments, the continued disappearance of performative spaces in which governments account for their NAPs, or in which different stakeholders from a country's multi-stakeholder group share the stage, means that (wealthy, and northern) governments are put in control of the spin.

Grounds for hope?

It's clear that very many of us understand that open government ≠ technology, at least if (irony noted) likes and RTs on the tweet below give a clue.

But we need to hone our critical instincts to apply that understanding to more of the discussions in fora like OGP. And if, as the Canadian Co-Chair argued in closing, "OGP is developing a new form of multilateralism", civil society needs to be much more assertive in taking control of the institutional and event design of OGP Summits, to avoid them becoming simply a useful annual networking shindig. The closing plenary also included calls to take seriously the threats to civic space: but how can we make sure we're not just saying this from the stage in the closing, and that the institutional design ensures there are mechanisms for civil society to push forward action on this issue?

In looking to the future of OGP, we should consider whether civil society might spend some time taking technology off the table. Let it emerge as an implementation detail, but perhaps let's see where we get to when we don't let tech discussions lead?

The lamentable State of Open Government in the UK

Yesterday the UK Government published, a year late, its most recent Open Government Partnership National Action Plan. It would be fair to say that civil society expectations for the plan were low, but when you look beyond the fine words to the detail of the targets set, the plan appears to limbo under even the lowest of expectations.

For example, although the Ministerial foreword acknowledges that "The National Action Plan is set against the backdrop of innovative technology being harnessed to erode public trust in state institutions, subverting and undermining democracy, and enabling the irresponsible use of personal information.", the furthest the plan goes in relation to these issues is a weak commitment to "maintain an open dialogue with data users and civil society to support the development of the Government's National Data Strategy." This commitment has supposedly been 'ongoing' since September 2018, yet, try as I might, I can find no public documentation of how the government is engaging around the data strategy. Not to mention that there is absolutely zilch here about actually tackling the ways in which we see democracy being subverted, not only through the use of technology, but also through government's own failures to respond to concerns about the management of elections, or to bring forward serious measures to tackle the illegal flow of money into party and referendum campaigning. For work on open government to be meaningful we have to take off the tech-goggles, and address the very real governance and compliance challenges harming democracy in the UK. This plan singularly fails at that challenge.

In short, this is a plan with nothing new; with very few measurable targets that can be used to hold government to account; and with a renewed conflation of open data and open government.

Commitment 3 on Open Policy Making, to "Deliver at least 4 Open Policy Making demonstrator projects", has suspicious echoes of the 2013 commitment 16 to run "at least five 'test and demonstrate projects' across different policy areas". If central government has truly "led by example" on increasing citizen participation, as the introduction to this plan claims, then it seems all we are ever going to get are ad-hoc examples. Evidence of any systemic action to promote engagement is entirely absent. The recent backsliding on public engagement in the UK is vividly underscored by the fact that commitment 8 includes responding by November 2019 to a 2016 consultation. Agile, iterative and open government this is not.

Commitment 6, on an 'Innovation in Democracy Programme', involves token funding to allow a few local authority areas to pilot 'Area Democracy Forums' based on a citizens' assembly model – at the same time as the government refuses to support any sort of participatory citizen dialogue to deal with the pressing issues of both Brexit and climate change. The contract to deliver this work has already been tendered in any case, and the only targets in the plan relate to 'pilots delivered' and 'evaluation'. Meaningful targets that might track how far progress has been made in actually giving citizens power over decision making are notably absent.

The most substantive targets can be found under commitments 4 and 5 on Open Contracting and Natural Resource Transparency (full disclosure: most of the Open Contracting targets come from draft content I wrote when a member of the UK Open Contracting Steering Group). If Government actually follows through on the commitment to “Report regularly on publication of contract documents, and extent of redactions.”, and this reporting leads to better compliance with the policy requirements to disclose contracts, there may even be something approaching transformative here. But, the plan suggests such a commitment to quarterly reporting should have been in place since the start of the year, and I’ve not yet tracked down any such report. 

Overall these commitments are about housekeeping: moving forward a little on compliance with policy requirements that should have been met long ago. By contrast, the one draft commitment that could have substantively moved forward Open Contracting in the UK, by shifting emphasis to the local level where there is greatest scope to connect contracting and citizen engagement, is the one commitment conspicuously dropped from the final National Action Plan. Similarly, whilst the plan does provide space for some marginal improvements in grants data (Commitment 1), this is simply a continuation of existing commitments.

I recognise that civil servants have had to work long and hard to get even this limited NAP through government, given the continued breakdown of normal Westminster operations. However, as I look back to the critique we wrote of the first UK OGP NAP back in 2012, it seems to me that we're back where we started, or even worse: with a government narrative that equates open government and open data, and a National Action Plan that repackages existing work without any substantive progress or ambition. And we have to consider when something so weak is actually worse than nothing at all.

I resigned my place on the UK Open Government Network Steering Group last summer: partly due to my own capacity, but also because of frustration at stalled progress, and the co-option of civil society into a process where, instead of speaking boldly about the major issues facing our public sphere, the focus has been put on marginal pilots or small changes to how data is published. It’s not that those things are unimportant in and of themselves: but if we let them define what open government is about – well, then we have lost what open government should have been about.

And even if we do allow the OGP to have a substantial emphasis on open data, where the UK government continues to claim leadership, the real picture is not so rosy. I'll quote from Rufus Pollock and Danny Lämmerhirt's analysis of the UK in their chapter for The State of Open Data:

“Open data lost most of its momentum in late 2015 as government attention turned to the Brexit referendum and later to Brexit negotiations. Many open data advisory bodies ceased to exist or merged with others. For example, the Public Sector Transparency Board became part of the Data Steering Group in November 2015, and the Open Data User Group discontinued its activities entirely in 2015. There have also been political attempts to limit the Freedom of Information Act (FOIA) based on the argument that opening up government data would be an adequate substitute. There are still issues around publishing land ownership information across all regions, and some valuable datasets have been transferred out of government ownership avoiding publication, such as the Postal Address File that was sold off during the privatisation of the Royal Mail.”

The UK dropped in the Open Data Barometer rankings in 2017 (the latest data we have), and one of the key commitments from the last National Action Plan – to "develop a common data standard for reporting election results in the UK" and improve crucial data on election results – made only 'limited' progress according to the IRM, demonstrating a poor recent track record from the UK on opening up new datasets where it matters.

So where from here?

I generally prefer my blogging (and engagement) to be constructive. But I'm hoping that sometimes the most constructive thing to do is to call out the problems, even when I can't see a way to solutions. Right now, it feels to me as though the starting point must be to recognise:

  • The UK Government is failing to live up to the Open Government Declaration.
  • UK Civil Society has failed to use the latest OGP NAP process to secure any meaningful progress on the major open government issues of the day.
  • The Global OGP process is doing very little to spur on UK action.

It’s time for us to face up to these challenges, and work out where we head from here. 

Over the horizons: reflections from a week discussing the State of Open Data

[Summary: thinking aloud with five reflections on future directions for open data-related work, following discussions around the US east coast]

Over the last week I’ve had the opportunity to share findings from The State of Open Data: Histories and Horizons in a number of different settings: from academic roundtables, to conference presentations, and discussion panels.

Each has been an opportunity not only to promote the rich open access collection of essays just published, but also a chance to explore the many and varied chapters of the book as the starting point for new conversations about how to take forward an open approach to data in different settings and societies.

In this post I'm going to try and reflect on a few of the themes that have struck me during the week. (Note: these are, at this stage, just my initial and personal reflections, rather than a fully edited take on discussions arising from the book.)

Panel discussion at the GovLab with Tariq Khokhar, Adrienne Schmoeker and Beth Noveck.

Renewing open advocacy in a changed landscape

The timeliness of our look at the Histories and Horizons of open data was underlined on Monday, when a tweet from Data.gov announced this week as its 10th anniversary, and the Open Knowledge Foundation celebrated its 15th birthday with a return to its old name, a re-focussed mission to address all forms of open knowledge, and an emphasis on creating "a future that is fair, free and open." As they put it:

  "…in 2019, our world has changed dramatically. Large unaccountable technology companies have monopolised the digital age, and an unsustainable concentration of wealth and power has led to stunted growth and lost opportunities."

going on to say

“we recognise it is time for new rules for this new digital world.”

Not only is this a welcome and timely example of the kind of "thinking politically" we call for in the State of Open Data conclusion, but it chimes with many of the discussions this week, which have focussed as much on the ways in which private sector data should be regulated as they have on opening up government data.

While, in tools like the Open Data Charter’s Open Up Guides, we have been able to articulate a general case for opening up data in a particular sector, and then to enumerate ‘high value’ datasets that efforts should attend to, future work may need to go even deeper into analysing the political economy around individual datasets, and to show how a mix of voluntary data sharing, and hard and soft regulation, can be used to more directly address questions about how power is created, structured and distributed through control of data.

As one attendee at our panel at the Gov Lab put it, right now, open data is still often seen as a “perk not a right”.  And although ‘right to data’ advocacy has an important role, it is by linking access to data to other rights (to clean air, to health, to justice etc.) that a more sophisticated conversation can develop around improving openness of systems as well as datasets (a point I believe Adrienne Schmoeker put in summing up a vision for the future).

Policy enables, problems drive

So does a turn towards problem-focussed open data initiatives mean we can put aside work on developing open data policies or readiness assessments? In short, no.

In a lunchtime panel at the World Bank, Anat Lewin offered an insightful reflection on The State of Open Data from a multilateral's perspective, highlighting the continued importance of developing a 'whole of government' approach to open data. This was echoed in Adrienne Schmoeker's description at The GovLab of the steps needed to create a city-wide open data capacity in New York. In short, without readiness assessments and open data policies in place, initiatives that use open data as a strategic tool are likely to rub up against all sorts of practical implementation challenges.

Where in the past government open data programmes have often involved going out to find data to release, the increasing presence of data science and data analytics teams in government means the emphasis is shifting onto finding problems to solve. Provided data analytics teams recognise the idea of 'data as a team sport', requiring not just technical skills but also social science, civic engagement and policy development skill sets – and provided professional values of openness are embedded in such teams – then we may be moving towards a model in which 'vertical' work on open data policy works alongside 'horizontal' problem-driven initiatives that may make less use of the language of open data, but which still benefit from a framework of openness.

Chapter discussions at the OpenGovHub, Washington DC

Political economy really matters

It's been really good to see the insights that can be generated by bringing different chapters of the book into conversation. For example, at the Berkman-Klein Centre, comparing and contrasting attitudes in North America vs. North Africa towards the idea that governments might require transport app providers like Uber to share their data with the state revealed different layers of concern, from differences in the market structure in each country, to different levels of trust in the state. Or, as danah boyd put it in our discussions at Data & Society, "what do you do when the government is part of your threat model?". This presents interesting challenges for the development of transnational (open) data initiatives and standards – calling for a recognition that the approach that works in one country (or even one city) may not work so well in others. Research still does too little to take into account the particular political and market dynamics that surround successful open data and data analytics projects.

A comparison across sectors, emerging from our 'world cafe' with State of Open Data authors at the OpenGovHub, also showed the trade-offs to be made when designing transparency, open data and data sharing initiatives. For example, where the extractives transparency community has the benefit of hard law to mandate certain disclosures, such law is comparatively brittle, and does not always result in the kind of structured data needed to drive analysis. By contrast, open contracting, relying on a more voluntary and peer-pressure model, may be able to refine its technical standards more iteratively, but perhaps at the cost of weaker mechanisms to enforce comprehensive disclosure. As Noel Hidalgo put it, there is a design challenge in making a standard that is a baseline, on top of which more can be shared, rather than one that becomes a ceiling, where governments focus on minimal compliance.

It is also important to recognise that when data has power, many different actors may seek to control, influence and ultimately mess with it. As data systems become more complex, the vectors for attack can increase. In discussions at Data & Society, we briefly touched on one case where a government institution has had to take considerable steps to correct for external manipulation of its network of sensors. When data is used to trigger direct policy responses (e.g. weather data triggering insurance payouts, or crime data triggering policing action), then the security and scrutiny of that data becomes even more important.

Open data as a strategic tool for data justice

I heard the question “Is open data dead?” a few times over this week. As the introductory presentation I gave for a few talks noted, we are certainly beyond peak open data hype. But, the jury is, it seems, still very much out on the role that discourses around open data should play in the decade ahead. At our Berkman-Klein Centre roundtable, Laura Bacon shared work by Omidyar/Luminate/Dalberg that offered a set of future scenarios for work on open data, including the continued existence of a distinct open data field, and an alternative future in which open data becomes subsumed within some other agenda such as ‘data rights’. However, as we got into discussions at Data & Society of data on police violence, questions of missing data, and debates about the balancing act to be struck in future between publishing administrative data and protecting privacy, the language of ‘data justice’ (rather than data rights) appeared to offer us the richest framework for thinking about the future.

Data justice is broader than open data, yet open data practices may often be a strategic tool in bringing it about. I’ve been left this week with a sense that we have not done enough to date to document and understand ways of drawing on open data production, consumption and standardisation as a form of strategic intervention. If we had a better language here, better documented patterns, and a stronger evidence base on what works, it might be easier to both choose when to prioritise open data interventions, and to identify when other kinds of interventions in a data ecosystem are more appropriate tools of social progress and justice.

Ultimately, a lot of the discussions the book has sparked have been less about open data per se, and much more about the shape of data infrastructures and questions of data interoperability. In discussions of Open Data and Artificial Intelligence at the OpenGovHub, we explored the failure of many efforts to develop interoperability within organisations and across organisational boundaries. I believe it was Jed Miller who put the challenge succinctly: to build interoperable systems, you need to "think like an organiser" – recognising data projects also as projects of organisational change and mass collaboration. Although I think we have mostly moved past the era in which civic technologists were walking around with an open data hammer, seeing every problem as a nail, we have some way to go before we have a full understanding of the open data tools that need to be in everyone's toolbox, and those that may still need a specialist.

Reconfiguring measurement to focus on openness of infrastructure

One way to support advocacy for openness, whilst avoiding reifying open data, and integrating learning from the last decade on the need to embed open data practices sector-by-sector, could be found in an updated approach to measurement. David Eaves made the point in our Berkman-Klein Centre roundtable that the number of widely adopted standards, as opposed to the number of data portals or datasets, is a much better indicator of progress.

As resources for monitoring, measuring or benchmarking open data per se become more scarce, there is an opportunity to look at new measurement frames that examine the data infrastructure and ecosystem around a particular problem, and ask about the extent of openness, not only of data, but also of governance. A number of conversations this week have illustrated the value of shifting the discussion onto data infrastructure and interoperability: yet (a) the language of data infrastructure has not yet taken hold, and can be hard to pin down; and (b) there is a risk of openness being downplayed in favour of a focus on centralised data infrastructures. Updating open data measurement tools to look at infrastructures and systems, rather than datasets, may be one way to intervene in this unfolding space.

Thought experiment: a data extraction transparency initiative

[Summary: rapid reflections on applying extractives metaphors to data in an international development context]

In yesterday’s Data as Development Workshop at the Belfer Center for Science and International Affairs we were exploring the impact of digital transformation on developing countries and the role of public policy in harnessing it. The role of large tech firms (whether from Silicon Valley, or indeed from China, India and other countries around the world) was never far from the debate. 

Although in general I’m not a fan of descriptions of ‘data as the new oil’ (I find the equation tends to be made as part of rather breathless techno-deterministic accounts of the future), an extractives metaphor may turn out to be quite useful in asking about the kinds of regulatory regimes that could be appropriate to promote both development, and manage risks, from the rise of data-intensive activity in developing countries.

Over recent decades, principles of extractives governance have developed that recognise the mineral and hydrocarbon resources of a country as at least in part common wealth, such that control of extraction should be regulated, firms involved in extraction should take responsibility for externalities from their work, revenues should be taxed, and taxes invested into development. When we think about firms 'extracting' data from a country – perhaps through providing social media platforms and gathering digital trace data, or capturing and processing data from sensor networks, or even collecting genomic information from a biodiverse area to feed into research and product development – what regimes could or should exist to make sure benefits are shared, externalities are managed, and the 'common wealth' that comes from the collected data does not entirely flow out of the country, or into the pockets of a small elite?

Although real-world extractives governance has often not resolved all these questions successfully, one tool in the governance toolbox has been the Extractive Industries Transparency Initiative (EITI). Under EITI, member countries and companies are required to disclose information on all stages of the extractives process: from the granting of permissions to operate, through to the taxation or revenue sharing secured, and the social and economic spending that results. The model recognises that governance failures might come from the actions of both companies and governments – rather than assuming one or the other is the problem, or benign. Although transparency alone does not solve governance problems, it can support better debate about both policy design and implementation, and can help address the distorting information and power asymmetries that otherwise work against development.

So, what could an analogous initiative look like if applied to international firms involved in ‘data extraction’?

(Note: this is a rough-and-ready thought experiment testing out an extended version of an originally tweet-length thought. It is not a fully developed argument in favour of the ideas explored here).

Data as a national resource

Before conceptualising a 'data extraction transparency initiative' we first need to think about what counts as 'data extraction'. This involves considering the collected informational (and attention) resources of a population as a whole. Although data itself can be replicated (marking a key difference from finite fossil fuels and mineral resources), the generation and use of data is often rival (i.e. if I spend my time on Facebook, I'm not spending it on some other platform, or on other tasks and activities), involves first-mover advantages (e.g. the first firm to street-view-map country X may corner the market), and can be made finite through law (e.g. someone collecting genomic material from a country may gain intellectual property rights protection for their data), or simply through restricting access (e.g. as Jeni considers here, where data is gathered from a community and used to shape policy, without the data being shared back to that community).

We could think, then, of data extraction as any data collection process which 'uses up' a common resource such as attention and time, which reduces the competitiveness of a market (thus shifting consumer surplus to producer surplus), or which reduces the potential extent of the knowledge commons through intellectual property regimes or other restrictions on access and use. Of course, the use of an extracted data resource may have economic and social benefits that feed back to the subjects of the extraction. The point is not that all extraction is bad, but rather to be aware that data collection and use, as an embedded process, is definitely not the non-rival, infinitely replicable and zero-cost activity that some economic theories would have us believe it is.

(Note that underlying this lens is the idea that we should approach data extraction at the level of populations and environments, rather than trying to conceptualise individual ownership of data, and to define extraction in terms of a set of distinct transactions between firms and individuals.)

Past precedent: states and companies

Our model for data extraction, then, involves a relationship between firms and communities, which we will assume for the moment can be adequately represented by their states. A 'data extraction transparency initiative' would then be asking for disclosure from these firms at a country-by-country level, and disclosure from the states themselves. Is this reasonable to expect?

We can find some precedents for disclosure by looking at the most recent Ranking Digital Rights Report, released last week. This describes how many firms are now providing data about government requests for content or account restriction. A number of companies produce detailed transparency reports that describe content removal requests from government, or show political advertising spend. This at least establishes the idea that voluntarily, or through regulation, it is feasible to expect firms to disclose certain aspects of their operations.

The idea that states should disclose information about their relationship with firms is also reasonably well established (if not wholly widespread). Open Contracting, and the kind of project-level disclosure of payments to government that can be seen at ResourceProjects.org, illustrate ways in which transparency can be brought to the government–private sector nexus.

In short, encouraging or mandating the kinds of disclosures we might consider below is nothing new. Targeted transparency has long been in the regulatory toolbox.

Components of transparency

So – to continue the thought experiment: if we take some of the categories of EITI disclosure, what could this look like in a data context?

Legal framework

Countries would publish in a clear, accessible (and machine-readable?) form, details of the legal frameworks relating to privacy and data protection, intellectual property rights, and taxation of digital industries.

This should help firms to understand their legal obligations in each country, and may also make it easier for smaller firms to provide responsible services across borders without current high costs of finding the basic information needed to make sure they are complying with laws country-by-country.

Firms could also be mandated to make their policies and procedures for data handling clear, accessible (and machine-readable?).

Contracts, licenses and ownership

Whenever governments sign contracts that allow the private sector to collect or control data about citizens, public spaces, or the environment, these contracts should be public.

(In the Data as Development workshop, Sriganesh related the case of a city that had signed a 20-year deal for broadband provision, signing over all sorts of data to the private firm involved.)

Similarly, licenses to operate, and permissions granted to firms should be clearly and publicly documented.

Recently, EITI has also focussed on beneficial ownership information: seeking to make clear who is really behind companies. For digital industries, mandating clear disclosure of corporate structure, and potentially also of the data-sharing relationships between firms (as GDPR starts to establish) could allow greater scrutiny of who is ultimately benefiting from data extraction.

Production

In the oil, gas and mining context, firms are asked to reveal production volumes (i.e. the amount extracted). The rise of country-by-country reporting, and project-level disclosure has sought to push for information on activity to be revealed not at the aggregated firm level, but in a more granular way.

For data firms, this requirement might translate into disclosure of the quantity of data (in terms of number of users, number of sensors etc.) collected from a country, or disclosure of country-by-country earnings.
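To make this a little more tangible, here is a purely illustrative sketch (in Python) of what a single country-by-country 'production' disclosure record might look like. Every field name and figure is invented for the purposes of this thought experiment; nothing here reflects an existing reporting standard.

```python
# Hypothetical disclosure record for the thought experiment - all names and
# numbers are invented, and no real reporting standard is being described.
import json

disclosure = {
    "firm": "ExampleCo",                 # hypothetical data firm
    "reporting_period": "2018",
    "countries": [
        {
            "country_code": "XX",        # placeholder country
            "active_users": 1200000,     # a proxy for the quantity of data collected
            "connected_sensors": 3400,
            "in_country_revenue_usd": 5600000,
            "taxes_paid_usd": 150000,    # to be reconciled against government figures
        }
    ],
}

# Publishing such records as structured, machine-readable data would allow the
# kind of reconciliation and analysis that EITI performs for extractives.
print(json.dumps(disclosure, indent=2))
```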

Revenue collection

One important aspect of EITI has been an audit and reconciliation process that checks that the amounts firms claim to be paying in taxes or royalties to government match up with the amounts government claims to have received. This requires disclosure from both private firms and government.

A better understanding of whose digital activities are being taxed, and how, may support design of better policy that allows a share of revenues from data extraction to flow to the populations whose data-related resources are being exploited.

In yesterday's workshop, Sriganesh pointed to the way in which some developing country governments now treat telecoms firms as an easy tax collection mechanism: if everyone wants a mobile phone connection, and mobile providers are already collecting payments, then levying a charge on each connection, or a monthly tax, can be easy to administer. But, in the wrong places and at the wrong levels, such taxes may capture consumer rather than producer surplus, and suppress rather than support the digital economy.

Perhaps one of the big challenges for 'data as development', when companies in more developed economies extract data from developing countries but process it back 'at home', is that current economic models may suggest the biggest 'added value' is generated from the application of algorithms and processing. This (combined with creative accounting by big firms) can lead to little tax revenue in the countries from which data was originally extracted. Combining 'production' and 'revenue' data can at least bring this problem into view more clearly – and a strong country-by-country reporting regime may even allow governments to apply taxes more accurately.

Revenue allocation, social and economic spending

Important to the EITI model is the idea that when governments do tax, or collect royalties, they do so on behalf of the whole polity, and they should be accountable for how they then use the resulting resources.

By analogy, a 'data extraction transparency initiative' may include requirements for greater transparency about how telecoms and data taxes are being used. This could further support multi-stakeholder dialogue on the kinds of public sector investments needed to support national development through the use of data resources.

Environmental and social reporting

EITI encourages countries to 'go beyond the standard' and disclose other information too, including environmental information and information on gender.

Similar disclosures could also form part of a ‘data extraction transparency initiative’: encouraging or requiring firms to provide information on gender pay gaps and their environmental impact.

Is implementation possible?

So far this thought experiment has established ways of thinking about 'data extraction' by analogy to natural resource extraction, and has identified some potential disclosures that could be made by both governments and private actors. It has done so in the context of thinking about sustainable development, and how to protect developing countries from data exploitation, whilst also supporting them to appropriately and responsibly harness data as a developmental tool. There are some rough edges in all this: but also, I would argue, some quite feasible proposals too (disclosure of data-related contracts, for example).

Large-scale implementation would, of course, need careful design. The market structure, capital requirements and scale of digital and data firms are quite different from those of the natural resource industry. Compliance costs of any disclosure regime would need to be low enough to ensure that it is not only the biggest firms that can engage. Developing country governments also often have limited capacity when it comes to information management. Yet most of the disclosures envisaged above relate to transactions that, if 'born digital', should be fairly easy to publish data on. And where additional machine-readable data (e.g. on laws and policies) is requested, if standards are designed well, there could be a win-win for firms and governments – for example, by allowing firms to more easily identify and select cloud providers that allow them to comply with the regulatory requirements of a particular country.

The political dimensions of implementation are, of course, another story – and one I’ll leave out of this thought experiment for now.

But why? What could the impact be?

Now we come to the real question. Even if we could create a ‘data extraction transparency initiative’, could it have any meaningful developmental impacts?

Here’s where some of the impacts could lie:

  • If firms had to report more clearly on the amount of 'data' they are taking out of a country, and the revenue it gives rise to, governments could tailor licensing and taxation regimes to promote more developmental uses of data. Firms would also be encouraged to think about how they are investing in value-generation in the countries where they operate.
  • If contracts that involve data extraction are made public, terms that promote development can be encouraged, and those that diminish the opportunity for national development can be challenged.
  • If a country's government chooses to engage in forms of 'digital protectionism', or to impose 'local content requirements' on the development of data technologies – policies that could bring long-term benefits, but risk a short-term hit to the quality of digital services available in the country – greater transparency could support better policy debate. (Noting, however, that recent years have shown us that politics often trumps rational policy making in the real world.)

There will inevitably be readers who see the thrust of this thought experiment as fundamentally anti-market, and who are fearful of, or ideologically opposed to, any of the kinds of government intervention that increasing transparency around data extraction might bring. It can be hard to imagine a digital future not dominated by the ever-increasing rise of a small number of digital monopolies. But, from a sustainable development point of view, allowing another path to be sought – one which supports the creation of resilient domestic technology industries, prices in the positive and negative externalities of data extraction, and therefore allows active choices to be made about how national data resources are used as a common asset – may be no bad thing.

The State of Open Data: Histories and Horizons – panels and conversations

The online and open access versions of 'The State of Open Data: Histories and Horizons' went live yesterday. Do check it out!

We've got an official book launch on 27th May in Ottawa, but ahead of that, I'm spending the next 8 days on the US East Coast contributing to a few events to share learning from the project.

Over the last 18 months we've worked with 66 fantastic authors, and many other contributors, reviewers and editorial board members, to pull together a review of the last decade of activity on open data. The resulting collection provides short essays that look at open data in different sectors, from accountability and anti-corruption, to the environment, land ownership and international aid, as well as touching on cross-cutting issues, different stakeholder perspectives, and regional experiences. We've tried to distill key insights in overall and section introductions, and to draw out some emerging messages in an overall conclusion.

This has been my first experience pulling together a whole book, and I'm incredibly grateful to my co-editors, Steve Walker, Mor Rubinstein, and Fernando Perini, who have worked tirelessly throughout the project to bring together all these contributions, make sure the project is community driven, and present a professional final book to the world, particularly in what has been a tricky year personally. The team at our co-publishers, African Minds and IDRC (Simon, Leith, Francois and Nola), also deserve a great deal of thanks for their attention to detail and design.

I'll try and write up some reflections and learning points on the book process in the near future, and will be blogging more about specific elements of the research in the coming weeks. But for now, let me share the schedule of upcoming events, in case any blog readers happen to be able to join. I'll aim to update this post later with links to any outputs from the sessions.

Book events

Thursday 16th May – 09:00 – 11:00 – Future directions for open data research and action

Roundtable at the Harvard Berkman Klein Center, with chapter authors David Eaves, Mariel Garcia Montes, Nagla Rizk, and response from Luminate’s Laura Bacon.

Thursday 16th May – Developing the Caribbean

I’ll be connecting via hangouts to explore the connections between data literacy, artificial intelligence, and private sector engagement with open data

Monday 20th May – 12:00 – 13:00 – Let's Talk Data – Does open data have an identity crisis?, World Bank I Building, Washington DC

A panel discussion as part of the World Bank Let’s Talk Data series, exploring the development of open data over the last decade. This session will also be webcast – see detail in EventBrite.

Monday 20th May – 17:30 – 19:30 – World Cafe & Happy Hour @ OpenGovHub, Washington DC

We’ll be bringing together authors from lots of different chapters, including Shaida Baidee (National Statistics), Catherine Weaver (Development Assistance & Humanitarian Action), Jorge Florez (Anti-corruption), Alexander Howard (Journalists and the Media), Joel Gurin (Private Sector), Christopher Wilson (Civil Society) and Anders Pedersen (Extractives) to talk about their key findings in an informal world cafe style.

Tuesday 21st May – The State of Open Data: Open Data, Data Collaboratives and the Future of Data Stewardship, GovLab, New York

I’m joining Tariq Khokhar, Managing Director & Chief Data Scientist, Innovation, The Rockefeller Foundation, Adrienne Schmoeker, Deputy Chief Analytics Officer, City of New York and Beth Simone Noveck, Professor and Director, The GovLab, NYU Tandon (and also foreword writer for the book), to discuss changing approaches to data sharing, and how open data remains relevant.

Wednesday 22nd May – 18:00 – 20:00 – Small Group Session at Data & Society, New York

Join us for discussions of themes from the book, and how open data communities could or should interact with work on AI, big data, and data justice.

Monday 27th May – 17:00 – 19:30 – Book Launch in Ottawa

Join me and the other co-editors to celebrate the formal launch of the book!

Exploring Arts Engagement with (Open) Data

[Summary: Over the next few months I’m working with Create Gloucestershire with a brief to catalyse a range of organisational data projects. Amongst these will be a hackathon of sorts, exploring how artists and analysts might collaborate to look at the cultural education sector locally. The body of this post shares some exploratory groundwork. This is a variation cross-posted from the Create Gloucestershire website.]

Update: For part 2 – about the event we held, see Creative Lab Report: Data | Culture | Learning.

Pre-amble…

Create Gloucestershire have been exploring data for a while now, looking to understand what the ever-increasing volume of online forms, data systems and spreadsheets arts organisations encounter every day might mean for the local cultural sector. For my part, I’ve long worked with data-rich projects, focussing on topics from workers co-operatives and youth participation, to international aid and corruption in government contracting, but the cultural sector is a space I’ve not widely explored.

Often, the process of exploring data can feel like a journey into the technical, where data stands in opposition to all things creative. So, as I join CG for the next three months as a 'digital catalyst', working on the use of data within the organisation, I wanted to start by stepping back and exploring, in this blog post, the different places at which data, art and creativity meet.

…and a local note on getting involved…

In a few weeks (late February 2019) we’ll be exploring these issues through a short early-evening workshop in Stroud: with a view to hosting a day-long data-&-art hackathon in late Spring. If you would like to find out more, drop me a line.

Post: Art meets data | Data meets art

For some, data and art are diametrically opposed. Data is about facts. Art about feelings.

Take a look at writings from the data visualisation community [1], and you will see some suggest that data art is just bad visualisation. Data visualisation, the argument runs, uses graphical presentation to communicate information concisely and clearly. Data art, by contrast, places beauty before functionality. Aesthetics before information.

Found on Flickr: “I’m not even sure what this chart says … but I think its gorgeous!” (Image CC-BY Carla Gates / Original image source: ZSL)

I prefer to see data, visualisation and art all as components of communication. Communication as the process of sharing information, knowledge and wisdom.

The DIKW pyramid proposes a relationship between Data, Information, Knowledge and Wisdom, in which information involves the representation of data into ‘knowing that’, whilst knowledge requires experience to ‘know how’, and wisdom requires perspective and trained judgement in order to ‘know why’. (Image CC BY-SA. Wikimedia Commons)

Turning data into information requires a process of organisation and contextualisation. For example, a collection of isolated facts may be made more informative when arranged into a table. That table may be made more easily intelligible when summarised through counts and averages. And it may communicate more clearly when visualisation is included.

An Information -> Data -> Information journey. GCSE Results in Arts Subjects. (Screenshots & own analysis)
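As a small illustration of that journey from isolated facts to information, here is a rough sketch in Python (using pandas, with entirely made-up results data rather than the GCSE figures shown above): the same records become easier to read once arranged as a table, and a simple grouped summary starts to surface a trend that a chart could then communicate at a glance.

```python
# Illustrative only: invented records standing in for 'isolated facts'.
import pandas as pd

records = [
    {"subject": "Art & Design", "year": 2017, "entries": 180},
    {"subject": "Art & Design", "year": 2018, "entries": 165},
    {"subject": "Music", "year": 2017, "entries": 90},
    {"subject": "Music", "year": 2018, "entries": 75},
]

# Arranged into a table, the facts become easier to scan...
table = pd.DataFrame(records)
print(table)

# ...and summarising (total entries per year) begins to reveal a trend.
summary = table.groupby("year")["entries"].sum()
print(summary)

# summary.plot(kind="bar")  # one line further takes us towards visualisation
```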

But when seeking to communicate a message from the data, there is another contextualisation that matters: contextualising to the recipient – to what they already know, or what you may want them to come to know. Here, the right tools may not only be those of analysis and visualisation, but also those of art: communicating a message shaped by the data, though not entirely composed of it.

Artistic expression could focus on a finding, or a claim from the data, or may seek to support a particular audience to explore, interrogate and draw interpretations from a dataset. (Image CC BY-SA Toby Oxborrow)

In our upcoming workshop, we’ll be taking a number of datasets about the state of cultural education in Gloucestershire, and asking what they tell us. We’ll be thinking about the different ways to make sense of the data, and the ways to communicate messages from it. My hope is that we will find different ways to express the same data, looking at the same topic from a range of different angles, and bringing in other data sources of our own. In that way, we’ll be able to learn together both about practical skills for working with data, and to explore the subjects the data represents.

In preparing for this workshop I’ve been looking at ways different practitioners have connected data and art, through a range of media, over recent years.

The Open Data Institute: Data as Culture

Since its inception, the Open Data Institute in London has run a programme called 'Data as Culture', commissioning artists to respond to the increasing datification of society.

Some works take a relatively direct approach to representation, selecting particular streams of data from the web and using different media to represent them. Text Trends, for example, selects and counterposes different Google search trends on a simple graph over time. And the ODI's infamous vending machine provides free crisps in response to news media mentions of recession.

Text Trends. From ODI Website and Data Soliloquies book.

In representative works, the artist has chosen the signal to focus on, and the context in which it is presented. However, the underlying data remains more or less legible, and depending on the contextual media and the literacies of the ‘reader’, certain factual information can also be extracted from the artwork. Whilst it might be more time-consuming to read, the effort demanded by both the act of creation, and the act of reading, may invite a deeper engagement with the phenomena described by the data. London EC2 explores this idea of changing the message through changing the media: by woodblock printing twitter messages, thus slowing down the pace of social media, encouraging the viewer to rethink otherwise ephemeral information.

In other works that are directly driven by datasets, data is used more to convey an impression than to convey specific information. In the knitted Punchcard Economy banners, a representation of working hours is combined with a pre-defined message, resulting in data that can be read as texture more than it can be read as pattern. In choosing how far to 'arrange' the data, the work finds its place on a spectrum between visualisation and aesthetic organisation.

Punchcard Economy, Sam Meech, 2013. ODI: 3.5 x 0.5m knitted banner, FutureEverything: 5 x 3m knitted banner & knitting machines.

Other works in the Data as Culture collection start not from datasets, but from artists' responses to wider trends of datification. Works such as metography, flipped clock and horizon respond to forms of data and its presentation in the modern world, raising questions about data and representation – but not necessarily about the specific data which happens to form part of the work.

Flipped Clock, Thomson & Craighead, 2008. ODI Data as Culture.

Other works still look for the data within art, such as pixelquipu, which takes its structure from pre-Columbian quipu (necklace-shaped, knotted threads from the Inca empire, thought to contain information relating to calendars and accounting). In these cases, turning information into data, and then representing it back in other ways, is used to explore patterns that might not otherwise have been visible.

YoHa: Invisible Airs

Although it has also featured in the ODI’s Data as Culture collection, I want to draw out and look specifically at YoHa’s ‘Invisible Airs’ project. Not least because it was the first real work of ‘open data art’ I encountered, stumbling across it at an event in Bristol.

As newly released public spending records appear on screen, a pneumatically powered knife stabs a library book, sending a message about budget cuts, and inviting scrutiny of the data on screen.

It is a hard project to describe, but fortunately YoHa have a detailed project description and video on their website, showing the contraptions (participatory kinetic sculptures?) they created in 2014, driven by pneumatic tubes and actuated by information from Bristol City Council’s database of public spending.


In the video, Graham Harwood describes how their different creations (from a bike seat that rises up in response to spending transactions, to a pneumatic knife stabbing a book to highlight library service cuts) seek to 'de-normalise' data, not in the database designer's sense of finding a suitable level of data abstraction, but in the sense of engaging the participant to understand otherwise dry data in new ways. The learning from the project is also instructive: in terms of exploring how far the works kept the attention of those engaging with them, or how far they were able to communicate only a conceptual point before viewers' attention fell away, and messages from the underlying data were lost.

Ultimately though, Invisible Airs (and other YoHa works engaging with the theme of data) are not so much communicating data, as communicating again about the role, and power, of data in our society. Their work seeks to bring databases, rather than the individual data items they contain, into view. As project commissioner Prof Jon Dovey puts it, “If you are interested in the way that power works, if you are interested in the way that local government works, if you are interested in the way that corporations work, if you are interested in the way that the state works, then data is at the heart of it…. The way your council tax gets calculated… the way your education budget gets calculated, all these things function through databases.”

Everyday data arts

Data as art need not involve costly commissions. For example, the media recently picked up on the story of a German commuter who had knitted a 'train-delay scarf', with the choice of wool and colour representing the length of delays. The act of creating was both a means to record and a means to communicate – and in the process it communicated much more effectively than the same data might have done if simply recorded in a spreadsheet, or even placed onto a chart with data visualisation.

‘Train Delay Scarf’ – a twitter sensation in January 2019.

Data sculpture and data-driven music

In a 2011 TED Talk, Nathalie Miebach explored both how weather data can be turned into a work of art through sculpture and music, and how the setting in which the resulting work is shown affects how it is perceived.

She describes the creation of a vocabulary for turning the data into a creative work, but also the choice of a medium that is not entirely controlled by the data, such that the resulting work is determined not by the data alone, but also by its interaction with other environmental factors.

Dance your PhD, and dancing data

When reflecting on data and art, I was reminded of the annual Dance your PhD competition. Although the focus is more on expressing algorithms and research findings than underlying datasets, it offers a useful way to reflect on ways to explain data, not only to express what it contains.

In a similar vein, AlgoRythmics explain sorting algorithms using folk dance – a playful way of explaining what’s going on inside the machine when processing data.

There is an interesting distinction, though, between these two. Whilst Dance your PhD entries generally 'annotate' the dance with text to explain the phenomena that the dance engages the audience with, in AlgoRythmics the dance itself is the entirety of the explanation.

Visualisation

The fields of InfoViz and DataViz have exploded over the last decade. Blogs such as InformationIsBeautiful, Flowing Data and Visualising Data provide a regular dose of new maps, charts and novel presentations of data. However, InfoViz and DataViz are not simply synonyms: they represent work that starts from different points of a Data/Information/Knowledge model, and often with different goals in mind.

Take, for example, David McCandless' work in the 'Information is Beautiful' book (also presented in this TED Talk). The images, although often based on data, are not a direct visualisation of the data, but an editorialised story. The data has already been analysed to identify a message before it is presented through charts, maps and diagrams.

 

By contrast, in Edward Tufte’s work on data visualisation, or even statistical graphics, the role of visualisation is to present data in order to support the analytical process and the discovery of information. Tufte talks of ‘the thinking eye’, highlighting the way in which patterns that may be invisible when data is presented numerically, can become visible and intelligible when the right visual representation is chosen. However, for Tufte, the idea of the correct approach to visualisation is important: presenting data effectively is both an art and a technical skill, informed by insights and research from art and design, but fundamentally something that can be done right, or done wrong.

Graphical Practices: page 14 of Edward Tufte's 'The Visual Display of Quantitative Information'.

Other data visualisation falls somewhere between the extremes I've painted here. Exploratory data visualisations can seek both to support analysis and to tell a particular story through their selection of visualisation approach. A look at the winners of the recent 360 Giving Data Visualisation Challenge illustrates this well. Each of these visualisations draws on the same open dataset about grant making, but where 'A drop in the bucket' uses a playful animation to highlight the size of grants from different funders, Funding Themes extracts topics from the data and presents an interactive visualisation, inviting users to 'drill down' into the data and explore it in more depth. Others, like trend engine, use more of a dashboard approach to present data, allowing the user to skim through and find, if not complete answers, at least refined questions that they may want to ask of the raw dataset.

Funding Trends for a ‘cluster’ of arts-related grants, drawing on 360 Giving data. Creator: Xavi Gimenez

Arts meet data | Data meet arts | Brokering introductions

Writing this post has given me a starting point to explore some data-art dichotomies, and to survey and link to a range of shared examples that might be useful for conversations in the coming weeks.

It’s also sparked some ideas for workshop methods we might be able to use to keep analytical, interpretative and communicative modes in mind when planning for a hackathon later this year. But that will have to wait for a future post…

 

Footnotes

[1]: I am overstating the argument in the blog post on art and data visualisation slightly for effect. The post and its comments in fact offer a nuanced dialogue, worth exploring, on the relationship between data visualisation and art, although the post still seeks to draw a clear disjunction between the two.