Overcoming posting-paralysis?

[Summary: I’m trying to post a bit more critical reflection on things I read, and to write up more of my learning in shared space. I’ve been exploring why that’s been feeling difficult of late.] 

Reading, blogging and engaging through social media used to be a fairly central part of my reflective learning practice. In recent years, my reading, note-taking and posting practices have become quite frayed. Although many times I get as far as a draft post or tweet thread of reflections, I’m often hit by a posting-paralysis – and I stop short both of engaging in open conversation, and solidifying my own reflections through a public post. As I return to a mix of freelance facilitation, research and project work (more on that soon), I’m keen to recover an open learning practice that makes effective use of online spaces.

Inspired by Lloyd Davis’ explorations in ‘learning how to work out loud again’ (appropriately so, since Lloyd’s earlier community convening and event hosting was a big influence on much of my earlier practice), I’m taking a bit of time in my first few weeks back at work to identify what I want from a reflective learning practice, to try and examine the barriers I’ve been encountering, and to prototype the tools, processes and principles that might help me recapture some of the thinking space that, at its best, the online realm can still (I hope) provide.

Why post anyway?

The caption of David Eaves’ blog comes to mind: “if writing is a muscle, this is my gym”. And linked: writing is a tool of thought. So, if I want to think properly about the things I’m reading and engaging with, I need to be writing about them. And writing a blog post, or constructing a tweet thread, can be a very effective way to push that writing (and thinking) beyond rough bullet points, to more complete thoughts. Such posts often work well as external memory: more than once I’ve searched for an idea, and come upon a blog post I wrote about it many years ago – rediscovering content I might not have found had it been buried in a personal notebook. (It turns out comments from spam bots are also a good ‘random-access-memory-prompt’ on a wordpress blog.)

I’ve also long been influenced by my colleague Bill Badham’s commitment to shared learning. My work often affords me the privilege to read, research and reflect – and there’s something of an obligation to openly share the learning that arises. On a related note, I’m heavily influenced by notions of open academic debate, where there’s a culture (albeit not uncomplicated) of raising questions or challenging data, assumptions and conclusions in the interest of getting to better answers.

So what’s stopping you?

At risk of harking back to a golden age of RSS and blogging that died along with Google Reader, I suspect I need to consciously adapt my practices to a changed landscape.

Online platforms have changed. I felt most fluent in a time of independent bloggers, slowly reading and responding to each other over a matter of days and weeks. Today, I discover most content via Tweets rather than RSS, and conversations appear to have a much shorter half-life, often fragmenting off into walled garden spaces, or fizzling out half completed as they get lost between different timezones. I’m reluctant to join discussions on walled garden platforms like Facebook, and often find it hard to form thoughts adequately in tweet length.

My networks have changed. At the macro level, online spaces (and public discourse more generally) feel more polarised and quick to anger: although I only find this when I voyage outside the relatively civil online filter bubble I seem to have built. On the upside, I feel as though the people I’m surrounded with online are more global, and more diverse (in part, from a conscious effort to seek more gender balance and diversity in who I follow): but on the flip-side, I’m acutely aware that when I write I can’t assume I’m writing into a common culture, or that what I intend as friendly constructive critique will be read as such. Linked to this:

I’m more aware of unintended consequences of a careless post. In particular, I’m aware that, as a cis white male online, I don’t experience even half of the background aggression, abuse, gaslighting or general frustration that many professional women, people of colour, or people from minority communities may encounter daily. What, for me, might be a quick comment on something I’ve read, could come across to others as ‘yet another’ critical comment rather than the ‘Yes, and’ I meant it to be.

There are lots of subtleties to navigate around when an @ mention might be seen as a hat-tip credit, vs. when it might be an unwelcome interruption.

My role has changed. I still generally think of myself as a learner and junior practitioner, just trying to think out loud. But I’ve become aware from a couple of experiences that sometimes people take what I write more seriously! And that can be a little bit scary, or can place a different pressure on what I’m writing. Am I writing for my own process of thinking? Writing for others? Or writing for impact? Will my critical commentary be taken as having a weight I did not intend? And at the times when I do intend to write in order to influence, rather than just offer a viewpoint, do I need different practices?

My capacity, and focus, have changed. The pandemic and parenthood have squeezed out many of the time-slots I used to use for reflective writing: the train back from London, the early evening and so on. I’m trying to keep social media engagement to my working hours too, to avoid distractions and disruption during time with family.

Over-editing. A lot of the work I’ve done over recent years has involved editing text from others, and it’s made me less comfortable with the flow-of-writing, overly subclaused, and less-than-perfectly-clear sentences I’m prone to blogging with. (Though I can still resist that inner editor, as this mess of a paragraph attests: I am writing mainly for my own thinking after all.)

So what do I do about it?

Well – I’m certainly not over posting paralysis: this post has been sitting in draft for a week now. But in the process of putting it together I’ve been exploring a few things:

  • A more conscious reading practice
  • Improving my note-taking tools
  • Linking blogging and social media use
  • Not putting too much pressure on public posting

I’ve brought scattered notes from the last few years together into a tiddlywiki instance, and have started trying to keep a daily journal there for ad-hoc notes from articles or papers I’m reading – worrying less about perfect curation of notes, and more about just capturing reflections as they arise. I’ve reset my feed reader, and bookmarking tools to better manage a reading list, and am trying to think more carefully about the time to give to reading different things.

I’ve also tried getting back to a blog-post format for responding to things I’m reading, rather than trying twitter threads, which, whilst they might have a more immediate ‘reach’, often feel to me both a bit forced, and demand more immediate follow-up to engage with than my capacity allows.

I was considering setting myself an artificial goal of posting daily or weekly, but for now I’m going to allow a more organic flow of posting, and review in a few weeks to see if developing the diagnosis, and some of the initial steps above, are getting my practice closer to where I want it to be.

I’ve posted this.

Reflections on “Participatory data stewardship: A framework for involving people in the use of data”

I read with interest the new Ada Lovelace report Participatory data stewardship: A framework for involving people in the use of data, not least because it connects two fields I’ve spent a good while exploring: participation & data governance.

Below I’ve shared a few quick notes in a spirit of open reflection (read mostly as ‘Yes, and…’ rather than ‘No, but’):

The ladder: Arnstein, Hart and Pathways of Participation

[Figures: Arnstein’s ladder of participation; the RSA ‘remix’ of Arnstein’s ladder.]

The report describes drawing on Sherry Arnstein’s ‘ladder of citizen participation’, but in practice uses an RSA simplification of the ladder into a five-part spectrum that cuts off the critical non-participation layers of Arnstein’s model. In doing this, it removes some of the key critical power of the original ladder as a tool to call out tokenism, and push for organisations to reach the highest appropriate rung that maximises the transfer of power.

I’ve worked with various remixes of Arnstein’s ladder over the years (particularly building on Hart’s youth engagement remix) that draw attention to the distinction between ‘participant initiated’ vs. ‘organisationally initiated’ decision making. In one remix we put forward with Bill Badham and the NYA Youth Participation Team we set the ladder against the range of methods of participation, and explored the need for any participation architecture to think about the pathways of participation through which individuals grow in their capacity to exercise power over decisions.

It would be great to see further developments of the Ada Lovelace framework consider the mix of participatory methods that are appropriate to certain data use contexts, and how these can be linked together. For example, informing all affected stakeholders about a data use project can be the first rung on the ladder towards a smaller number becoming co-designers, joint decision makers, or evaluators. And to design a meaningful consultation reaching a large proportion of affected stakeholders might require co-design or testing with a smaller group of diverse collaborators first: making sure that questions are framed and explained in legitimate and accessible ways.

Data collection, datasets, and data use

“Well-managed data can support organisations, researchers, governments and corporations to conduct lifesaving health research, reduce environmental harms and produce societal value for individuals and communities. But these benefits are often overshadowed by harms, as current practices in data collection, storage, sharing and use have led to high-profile misuses of personal data, data breaches and sharing scandals.”

It feels to me as though the report falls slightly (though, to be fair, not entirely) into the trap of seeing data as a pre-existing fixed resource, where the main questions to be discussed are who will access a dataset, on what terms and to what end. Yet, data is under constant construction, and in participatory data stewardship there should perhaps be a wider set of questions explicitly on the table such as:

  • Should this data exist at all?
  • How can this data be better collected in ways that respect stakeholders’ needs?
  • What data is missing that should be here? Are we considering the ‘lost opportunities’ as well as the risks of misuse?
  • Is this data structured in ways that properly represent the interests of all stakeholders?

Personally, I’m particularly interested in the governance role of data standards and structures, and exploring models to increase diverse participation in shaping these foundational parts of data infrastructure.

Decentering the dataset

The report argues that:

“There are currently few examples of participatory approaches to govern access and use of data…”

yet, I wonder if this comes from looking through a narrow lens for projects that are framed as just about the data. I’d hazard that there are numerous social change and public sector improvement projects that have drawn upon data-sharing partnerships – albeit framed in terms of service or community change, rather than data-sharing per se.

In both understanding existing practice, and thinking about the future of participatory data governance practices, I suspect we need to look at how questions about data use are embedded within wider questions about the kinds of change we are seeking to create. For example, if a project is planning to pool data from multiple organisations to identify ‘at risk families’ and to target interventions, a participatory process should take in both questions of data governance and intervention design – as to treat the data process in isolation from the wider use process makes for a less accessible, and potentially substantially biased, process.

Direct participation vs. representatives

One of the things the matrix of (youth) participation model tries to draw out is the distinction between participatory modalities based on ad-hoc individual involvement where stakeholders participate directly, through to those that involve sustained patterns of engagement, but that often move towards models of representative governance. Knowing whether you are aiming for direct or representative-driven participation is an important part of then answering the question ‘Who to involve?’, and being clear on the kind of structures needed to then support meaningful participation.

Where next?

It’s great to see participatory models of data governance on the agenda of groups like Ada Lovelace – although it also feels like there’s a way still to go to see many decades of learning from the participation field better connecting with the kinds of technical decisions that affect so many lives.

Integrating Loss and Hope

Five years ago this week we lost our first daughter to a late miscarriage. I quietly collapsed. Unable to write, I abandoned my PhD dissertation – and have rarely felt fluent in writing since. We named our daughter Hope, because, after a long time trying to conceive, she had given us hope of having a family. Over the next two years we had two more miscarriages. I put my energy into work and politics. In 2019, walking the Coast to Coast route, we decided to explore adoption. Later this year we’re hoping to be granted an adoption order for two children placed with us last Autumn.

I’ve tried, and failed, to write about these experiences before. Miscarriage is a complicated and often very private loss. Adoption equally lacks simple narratives: in many cases everyone, adult and child, comes to it from a place of both loss and of hope. All stories involving family are shared stories where each party to them has different experiences, needs for acknowledgement, and needs for privacy. And adopting during a pandemic complicates the hope of building new family, with the loss of normal social interactions and clarity about the future.

However, to leave these experiences out of the public ‘biography’ created through various writings and work here or elsewhere, creates a gap in my own story that I’ve found increasingly difficult: particularly as my focus in the coming years will be much more on the equal parenting of two children with my partner than on the kinds of projects and work I’ve done in the past. As this comes to transform both what I work on and how, there is both hope for new adventures and growth, and a loss to acknowledge, familiar to many parents I’m sure, of established identity, roles and routines.

As the personal is never separate from the social, I also find it important to recognise the last year as one of profound societal and individual losses across the world: both directly from the pandemic, and from the wider environmental and political challenges it has placed into sharp relief. At the same time, the last year has provided glimpses of hope for new ways of living more connected and sustainable lives.

Returning to the personal: in many ways, it feels as though my last five years have been a time of living with loss, but acting with hope. I look towards future years of living with hope, but acting every day in recognition of, and learning from, loss.

Coda

In the short liturgy we held to remember our first daughter, we used words from Emily Dickinson that I’m now reminded of daily when our children take joy in seeing the birds in the garden:

“Hope” is the thing with feathers –
That perches in the soul –
And sings the tune without the words –
And never stops – at all –

And sweetest – in the Gale – is heard –
And sore must be the storm –
That could abash the little Bird
That kept so many warm –

I’ve heard it in the chillest land –
And on the strangest Sea –
Yet – never – in Extremity,
It asked a crumb – of me.

Emily Dickinson, “Hope” is the thing with feathers

How might a Data Pledge function?

[Summary: Reflections on the design of ITU Data Pledge project]

The ITU, under their “Global Initiative on AI and Data Commons”, have launched a process to create a ‘Data Pledge’, designed as a mechanism to facilitate increased data sharing in order to support “response to humanity’s greatest challenges” and to “help support and make available data as a common global resource”.

Described as complementary to existing work such as the International Open Data Charter, the Pledge is framed as a tool to ‘collectively make data available when it matters’, with early scoping work discussing the idea of conditional pledges linked to ‘trigger events’, such that an organisation might promise to make information available specifically in a disaster context, such as the current COVID-19 Pandemic. Full development of the Pledge is taking place through a set of open working groups.

This post briefly explores some of the ways in which a Data Pledge could function, and considers some of the implications of different design approaches.

[Context: I’ve participated in one working group call around the data pledge project in my role as Project Director of the Global Data Barometer, and this is written up in a spirit of open collaboration. I have no formal role in the data pledge project.]

Governments, civil society or private sector

Should a pledge be tailored specifically to one sector? Frameworks for governments to open data are already reasonably well developed, as are mechanisms that could be used for governments to collaborate on improving standards and practices of data sharing.

However, in the private sector (and to some extent, in civil society), approaches to data sharing for the public good (whether as data philanthropy, or participation in data collaboratives) are much less developed – and are likely the place in which a new initiative could have the greatest impact.

Individual or collective action problems

PledgeBank, a MySociety project that ran from 2005 to 2015, explored the idea of pledging as a solution to collective action problems. Pledges of the form: “I’ll do something, if a certain number of people will help me” are now familiar in some senses through crowdfunding sites and other online spaces. A Data Pledge could be modelled on the same logic – focussing on addressing those collective action problems either where:

  • A single firm doesn’t want to share certain data because doing so, when no-one else is, might have competitive impacts: but if a certain share of the market are sharing this data, it no longer has competitive significance, and instead its public good value can be realised.
  • The value of certain data is only realised as a result of network effects, when multiple firms are sharing similar and standardised data – but the effort of standardising and sharing data is non-negligible. In these cases, a firm might want to know that there is going to be a Social Return on Investment before putting resources into sharing the data.

However, this does introduce some complexity into the idea of pledging (and the actions pledged) and might, as PledgeBank found, lead also to lots of unrealised potential.

Pledging can also be approached as a means of solving individual motivational problems: helping firms to overcome inertia that means they are not sharing data which could have social value. Here, a pledge is more about making a statement of intent, which garners positive attention, and which commits the firm to a course of action that should eventually result in shared data.

Both forms of pledging can function as useful signalling – highlighting data that might be available in future, and priming potential ecosystems of intermediaries and users.
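As a rough illustration of the collective-action form of pledge described above, here is a minimal Python sketch of a pledge that only becomes binding once enough of a market has signed up. The class name, the `threshold_share` parameter, the firm names and the market-share figures are all hypothetical; this is not how PledgeBank or the ITU project models pledges.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConditionalPledge:
    """Hypothetical model: a pledge that binds once enough of a market has signed."""

    dataset: str            # illustrative label for the data to be shared
    threshold_share: float  # share of the market required, between 0 and 1
    signatories: Dict[str, float] = field(default_factory=dict)  # firm -> market share

    def sign(self, firm: str, market_share: float) -> None:
        """Record that a firm with the given market share has joined the pledge."""
        self.signatories[firm] = market_share

    def pledged_share(self) -> float:
        """Total market share of all firms that have signed so far."""
        return sum(self.signatories.values())

    def is_triggered(self) -> bool:
        """True once the signatories together reach the required share."""
        return self.pledged_share() >= self.threshold_share


# Illustrative values only.
pledge = ConditionalPledge(dataset="regional supply-chain data", threshold_share=0.5)
pledge.sign("Firm A", 0.25)
print(pledge.is_triggered())  # False: only a quarter of the market has signed
pledge.sign("Firm B", 0.25)
print(pledge.is_triggered())  # True: half of the market has now signed
```

A pledge aimed at individual motivation would need none of this threshold logic, which is one way of seeing the extra complexity the collective-action model brings.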

An organisational or dataset-specific pledge

Should a Pledge be about a general principle of data sharing for social good? Or about sharing a specific dataset? It may be useful to think about the architecture of the Data Pledge involving both: or at least, optionally involving data-specific pledges, under a general pledge to support data sharing for social good.

Think about organisational dynamics. Individual teams in a large organisation may have lots of data they could safely and appropriately share more widely for social good uses, but they do not feel empowered to even start thinking about this. A high-level organisational pledge (e.g. “We commit to share data for social good whenever we can do so in ways that do not undermine privacy or commercial position”) that sets an intention of a firm to support data philanthropy, participate in data collaboratives, and provide non-competitive data as open data, could provide the backing that teams across the organisation need to take steps in that direction.

At the same time, there may be certain significant datasets and data sources that can only be shared with significant high-level leadership from the organisation, or where signalling the specific data that might be released, or purposes it might be released for, can help address the collective action issues noted above. For these, dataset specific pledging (e.g. “We commit to share this specific dataset for the social good in circumstance X ”) can have significant value.

Triggers as required or optional

Should a pledge be structured to place emphasis on ‘trigger conditions’ for data sharing? Some articulations of the Data Pledge appear to think of it as a bank of data that could be shared in particular crisis situations. E.g. “We’ll share detailed supply chain information for affected areas if there is a disaster situation.” There are certainly datasets of value that might not be listed as a Pledge unless trigger conditions can be described, but it’s important that the design of a pledge does not present triggers as essentially shifting any of the work on data sharing to some future point. Preparing for data to be used well and responsibly in a crisis situation requires work in advance of the trigger events: aligning datasets, identifying how they might be used, and accounting carefully for possible unintended consequences that need to be mitigated against.

There are also many global crises we face that are present and ongoing: the climate crisis, migration, and our collective failure to be on track against the Sustainable Development Goals.

Brokering and curating

Data is always about something, and different datasets exist within (and across) different data communities and cultures. To operationalise a pledge will involve linking actors pledging to share data into relevant data communities: where they can understand user needs in more depth, and be able to publish with purpose.

The architecture of a Data Pledge, and of any supporting initiative around it, will need to consider how to curate and connect the many organisations that might engage – building thematic conversations, spotting thematic spaces where a critical mass of pledges might unlock new social value, or identifying areas where there are barriers stopping pledges turning into data flows.

Incorporating context, consent and responsible data principles

Increased data sharing is not an unalloyed good. Approaching data for the public good involves balancing openness and sharing, with robust principles and practices of data protection and ethics, including attention to data minimisation, individual rights, group data privacy, indigenous data sovereignty and dataset bias. Data should also be shared with clear documentation of its context, allowing an understanding of its affordances and limitations, and supporting debate over how data ecosystems can be improved in service of social justice.

A Pledge has an opportunity to both set the bar for responsible data practice, and to incentivise organisational thinking about these issues, by including terms that require pledging organisations to uphold high standards of data protection, only sharing personal data with clear informed consent or personal-derived data after clear processes that consider privacy, human rights and bias impacts of data sharing. Similarly, organisations could be asked to commit to putting their data in context when it is shared, and to engaging collaboratively with data users.

There may also be principles to incorporate here about transparency of data sharing arrangements – supporting development of norms about publishing clearly (a) who data is shared with and for what purpose; and (b) the privacy impact assessments carried out in advance of such shares.

Conditional on capacity?

Should pledging organisations be able to signal that they would need resources in order to make certain data available? I.e. “We have Dataset X which has a certain social value, but we can’t afford to make this available with our internal resources.” For low-resource organisations, including SMEs or organisations operating in low income economies, this could be a way to signal to philanthropic projects like data.org a need for support. But it could also be used by higher-resource organisations to put a barrier in front of data sharing. However, if a Pledge targets civil society pledgees, then allowing some way to indicate capacity needs if data is to be shared is likely to be particularly important.

A synthesis sketch

Ideologically, I’d favour a focus on building and governing data commons, more directly addressing the modern ‘enclosure’ of data by private firms, and not forgetting the importance of proper taxation of data-related businesses to finance provision of public goods. But if it’s viable to treat a data pledge as a pragmatic tool to increase the availability of data for social good uses, then I’d sketch the following structure:

  • Target private sector organisations
  • A three part pledge
    • 1. A general organisational commitment to treat data as a resource for the public good;
    • 2. A linked organisational commitment to responsible data practices whenever sharing data;
    • 3. An optional set of dataset specific pledges, each with optional trigger conditions
  • A platform allowing pledging organisations to profile their pledges, detail contact points for specific datasets and contact points for organisation-wide data stewards, and to connect with potential data users;
  • A programme of work to identify pre-work needed to allow data to be effectively used if trigger conditions are met.
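To make the three-part structure above a little more concrete, here is a minimal Python sketch of how a pledge record might be represented. All class names, field names and example values are hypothetical, and the sketch covers only the pledge itself, not the platform or the programme of pre-work.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DatasetPledge:
    """Part 3: an optional pledge about one specific dataset."""

    dataset: str
    contact: str                   # contact point for this dataset
    trigger: Optional[str] = None  # None means no trigger condition is attached


@dataclass
class DataPledge:
    """Hypothetical record of an organisation's three-part pledge."""

    organisation: str
    public_good_commitment: bool       # part 1: data as a resource for the public good
    responsible_data_commitment: bool  # part 2: responsible data practices when sharing
    data_steward_contact: str          # organisation-wide contact point
    dataset_pledges: List[DatasetPledge] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Parts 1 and 2 are required; dataset-specific pledges stay optional."""
        return self.public_good_commitment and self.responsible_data_commitment

    def active_datasets(self, current_events: Set[str]) -> List[str]:
        """Datasets with no trigger, or whose trigger is among the current events."""
        return [
            p.dataset
            for p in self.dataset_pledges
            if p.trigger is None or p.trigger in current_events
        ]


# Illustrative values only.
pledge = DataPledge(
    organisation="Example Logistics Ltd",
    public_good_commitment=True,
    responsible_data_commitment=True,
    data_steward_contact="steward@example.org",
    dataset_pledges=[
        DatasetPledge("annual emissions summary", "data@example.org"),
        DatasetPledge("supply chain detail", "ops@example.org", trigger="declared disaster"),
    ],
)
print(pledge.active_datasets(set()))
print(pledge.active_datasets({"declared disaster"}))
```

Keeping the trigger optional on each dataset pledge, rather than on the pledge as a whole, reflects the point that triggers should not become a way to defer all of the work of sharing.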

Rhodes must fall

17 years ago I was an undergraduate at Oriel College, Oxford. I lived for my first year in the ‘Rhodes building’ – not many metres from the statue of Cecil Rhodes that adorns the front of the building.

The only narrative of Rhodes I recall from that time, was one of the college’s proud connection to its alumnus and benefactor. To my shame, whilst with student campaigners I was active against contemporary donations to the University that appeared to buy naming rights and launder the reputations of questionable modern day donors – I left unexplored how the ongoing honouring of past donors had allowed them to ‘buy’ a ‘controversial’ reputation instead of the condemnation their actions deserve. Nor did I consider then how the memorialisation of Rhodes plays a part (even if small compared to other factors) in perpetuating the continued exclusion of marginalised communities from Oxford, and in reinforcing barriers to people from (Oxford) minorities taking greater ownership over the institutions of the University.

The college has a (belated) opportunity to make the right statement with the removal of the Rhodes statue. Leave it there, and Rhodes remains a ‘controversial figure’ and the college an institution concerned only with reproducing “an educated ruling class” (to quote from the college’s essay on Rhodes). Move it to a museum where it belongs, and the conversation with every undergraduate can be about the importance of questioning and learning from history – using education as a means of creating a more just future. The teachable moment will be all the stronger when the statue’s niche stands empty.

Rhodes must fall.

Open Contracting & Inclusion – notes from an online discussion

[Summary: Exploring inclusion impacts of data and standards in response to a paper on Open Contracting & inclusion]

Yesterday I had the pleasure of joining a call hosted by HIVOS, and chaired by ILDA’s Ana Sofia Ruiz, to discuss a recent paper from Michael Canares and François van Schalkwyk on “Open Contracting and Inclusion”. The paper is well worth a read, and includes a review of five cases against a theoretical framework looking across data flows, opportunities for action, infomediary presence, and through to inclusion outcomes (see table below for example of how these play out in a few of the cases reviewed).

Table 2: Summary of conditions met by the cases with regard to open contracting and social inclusion

After the discussion, we were asked to summarise some of our inputs – hopefully feeding into a wider write-up. However, in case what I’ve written up doesn’t really fit the format of that, I’m posting a cleaned up and slightly expanded version of the remarks I made below:

This paper, and the discussion around it, raises a number of valuable questions – drawing on a rich theoretical landscape to pose them.

Firstly, it asks us “How are data flows being disrupted?”. This question is important, because in many ‘open contracting’ projects it is rarely explicitly asked. We’re living in a time of mass disruption, yet open contracting is often ‘sold’ as a kind of reform. One of the widely used success stories for work on open contracting data comes from Ukraine, where there was a true disruption in data flows – using the moment of revolution to reconfigure patterns of procurement, and to create data infrastructures that enabled those new more open practices. 

Secondly, this paper calls on us to question “what is the value of data in bringing about inclusion?” In the past we’ve talked about whether open data is either necessary or sufficient to create change. The answer I take from this paper is that increased accessibility of information and data is ‘a very useful, but nowhere near sufficient’ condition for inclusive change. 

Thirdly, the use of Castells’ framework from Communication Power of ‘network power’ (shaping the information that can be transmitted), ‘networking power’ (gatekeeping which information is transmitted), and ‘networked power’ (control by one node in the network of others), and ideas of ‘programming the network’ and ‘reprogramming the network’, raise some critical questions about the role of data standards. Often treated as neutral artefacts, standards are in fact sites of power, and of the negotiation of network and networking power. A standard defines what can be expressed, and its implementation involves choosing what will be expressed. Standards can be at once tools that cross contexts, taking with them the potential of inclusion and exclusion (network power), and at the same time, have that potential left inert if the localised networking power decides not to take up inclusion oriented features.

To put this more concretely (if still a little complex I fear), the Open Contracting Data Standard was explicitly designed with a technical architecture that permits data about any given contracting process to be published by any actor, not only the ‘official’ information provider, and with a mechanism for extensions, supporting new fields of data to be attached to a contracting process. The ‘protocol’ sought to be inclusive. However, in practice, most tools have not been built to exploit this feature – meaning that in practice, the ‘platforms’ that exist don’t support inclusion of alternative perspectives on the state of a contracting process. This highlights that even at the level of the technical infrastructures, these are not made once, but have to be constantly remade, and their inclusive potential reinforced.
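As a rough illustration of what exploiting that feature could look like, the Python sketch below gathers releases about the same contracting process from two different publishers and lists them side by side. It assumes heavily simplified release packages: the `ocid`, `date`, `tag`, `publisher` and `extensions` fields exist in OCDS, but the identifiers, publisher names, extension URL and the `perspectives_by_process` helper are hypothetical, and this is not the standard’s official merge routine.

```python
from collections import defaultdict

# Simplified, hypothetical release packages from two different publishers.
official_package = {
    "publisher": {"name": "Example Procurement Agency"},
    "extensions": [],
    "releases": [
        {"ocid": "ocds-abc123-0001", "id": "1",
         "date": "2020-01-10T00:00:00Z", "tag": ["tender"]},
    ],
}
community_package = {
    "publisher": {"name": "Example Community Monitor"},
    # Hypothetical extension declaring extra fields for monitoring data.
    "extensions": ["https://example.org/monitoring-extension/extension.json"],
    "releases": [
        {"ocid": "ocds-abc123-0001", "id": "cm-1",
         "date": "2020-03-02T00:00:00Z", "tag": ["implementationUpdate"]},
    ],
}


def perspectives_by_process(packages):
    """Group releases from any publisher by the contracting process (ocid) they describe."""
    grouped = defaultdict(list)
    for package in packages:
        publisher = package["publisher"]["name"]
        for release in package["releases"]:
            grouped[release["ocid"]].append((release["date"], publisher, release["tag"]))
    # ISO 8601 date strings in the same timezone sort chronologically as text.
    return {ocid: sorted(entries) for ocid, entries in grouped.items()}


for ocid, entries in perspectives_by_process([official_package, community_package]).items():
    print(ocid)
    for date, publisher, tags in entries:
        print(f"  {date}  {publisher}: {', '.join(tags)}")
```

A platform built only around the official package would never surface the second entry, which is the gap between the inclusive ‘protocol’ and the tools in use.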

Fourth, the paper calls for a renewed focus on both governance context, and on intermediaries. Whilst technical artefacts can cross contexts, intermediary capacity building needs significant investment setting-by-setting. Equally, the discussion brought into view that this cannot be a short-term process. Intermediaries need not only skills, but also stocks of trust, in order to broker connections and communication. One of the evaluation team who had worked on a case covered by the paper discussed how it was individuals’ ability to maintain trusted relationships across different stakeholder groups that was critical to connecting information and empowerment. The importance of this cannot be overstated. 

Fifth, and finally, in his opening statement, Michael Canares challenged us to consider whether Open Contracting is different from other public sector reforms? After all – there have been decades of procurement reform. To this, I’m prepared to advance an answer: There is a meaningful qualitative difference with government reforms that start from the premise of openness. When a commitment to being open by default is put into practice, the configuration of actors involved in creating change is different, and conventional patterns of bureaucratic reform can be disrupted. Whether they are disrupted or not depends still on individual internal and external actors, and on whether the culture, as well as the practice, of openness has been brought into play. Nevertheless, Open Contracting has certain potential that is simply absent from past procurement reforms – and that is something to continue to build on.

The challenge ahead now is to work out what to do with these questions. We’re starting to unpack the complexity of open contracting practice – and the nuances for each individual setting. But, if all we have are critical questions, we risk inaction rather than advances in inclusion. During the early development of the Open Contracting Data Standard we often turned to the mantra that we should not let the perfect be the enemy of the good. This carries forward: we should not let the perfect be the enemy of making things better. I’d contend that we need to continue turning our learning into tooling – whether technical tools, evaluation frameworks, or simple planning tools for new initiatives. Only then can we be part of taking on the large scale reforms that this time of disruption needs. 

Inclusive AI needs inclusive data standards

[Summary: following the Bellagio Center thematic month on AI last year, I was asked to write up some brief notes on where data standards fit into contemporary debates on AI governance. The below article has just been published in the Rockefeller ‘notebook’ AI+1: Shaping our Integrated Future*]

Copy of the AI+1 Publication, open at this chapter

Modern AI was hailed as bringing about ‘the end of theory’. To generate insight and action no longer would we need to structure the questions we ask of data. Rather, with enough data, and smart enough algorithms, patterns would emerge. In this world trained AI models would give the ‘right’ outcomes, even if we didn’t understand how they did this. 

Today this theory-free approach to AI is under attack. Scholars have called out the ‘bias in, bias out’ problem of machine-learning systems, showing that biased datasets create biased models — and, by extension, biased predictions. That’s why policy makers now demand that if AI systems are used to make public decisions, their models need to be ‘explainable’, offering justifications for the predictions they make. 

Yet, a deeper problem is rarely addressed. It is not just the selection of training data, or the design of algorithms, that embeds bias and fails to represent the world we want to live in. The underlying data structures and infrastructures on which AI is founded were rarely built with AI uses in mind, and the data standards — or lack thereof — used by those datasets place hard limits on what AI can deliver. 

Questionable assumptions

From form fields for gender that only offer a binary choice, to disagreements over whether or not a company’s registration number should be a required field when applying for a government contract, data standards define the information that will be available to machine-learning systems. They set in stone hidden assumptions and taken-for-granted categories that make possible certain conclusions, while ruling others out, before the algorithm even runs. Data standards tell you what to record, and how to represent it. They embody particular world views, and shape the data that shapes decisions. 

For corporations planning to use machine-learning models with their own data, creating a new data field or adapting available data to feed the model may be relatively easy. But for the public good uses of AI, which frequently draw on data from many independent agencies, individuals or sectors, syncing data structures is a challenging task. 

Opening up AI infrastructure

However, there is hope. A number of open data standards projects have launched since 2010. 

They include the International Aid Transparency Initiative (IATI) — which works with international aid donors to encourage them to publish project information in a common structure — and HXL, the Humanitarian eXchange Language, which offers a lightweight approach to structure spreadsheets with ‘Who, What, Where’ information from different agencies engaged in disaster response activities. 

When these standards work well, they allow a broad community to share data that represents their own reality, and make data interoperable with that from others. But for this to happen, standards must be designed with broad participation so that they avoid design choices that embed problematic cultural assumptions, create unequal power dynamics, or strike the wrong balance between comprehensive representation of the world and simple data preparation. Without the right balance certain populations may drop out of the data sharing process altogether. 

To use AI for the public good, we need to focus on the data substrata on which AI systems are built. This requires a primary focus on data standards, and far more inclusive standards development processes. Even if machine learning allows us to ask questions of data in new ways, we cannot shirk our responsibility to consciously design data infrastructures that make possible meaningful and socially just answers.

 

*I’ve only got print copies of the publication right now: happy to share locally in Stroud, and will update with a link to digital versions when available. Thanks to Dor Glick at Rockefeller for the invite and brief for this piece, and to Carolyn Whelan for editing.

Aligning Insight: standardisation and data collection for COVID-19 responses

[Summary: a brain-dump of thoughts on approaches to data standardisation relevant in the current coronavirus context.]

Over the last few weeks I’ve talked with a number of initiatives that are seeking to bring greater coherence to data collection on the impacts that coronavirus is having on their constituencies. Thousands of organisations, from chambers of commerce, to charity networks, and international agencies, are sending out surveys, or soliciting inputs, to help them understand the social, economic, organisational and operational impacts of the current pandemic – and to start charting ways forward in response.

This has led to a number of conversations asking how data standards could help. Common fears of wasted effort in duplicate data collection, missed insights from siloed data, or confusion created by incompatible categorisations, are all being compounded by the rapid data collection needs in this crisis. Yet, creating new standards can be a time-consuming process: involving in-depth negotiation of different user needs and capacities, careful drafting of definitions, and rigorous testing of schemas, in order to develop something that can function as an equitable tool for long-term communication and collaboration. That doesn’t mean, however, that it’s not possible to iterate towards more aligned and standardised data right now.

In this post I’ll try and set out a few (non-exhaustive) considerations on where some of the data standardisation practices I’ve engaged with over recent years fit in the current landscape, and some approaches to move towards aligning data collection initiatives.

Documentation, documentation, documentation

There are a couple of different parts of a data standard, including definitions that describe what the data should cover and what each field is about, and schemas that determine how the data should be encoded, serialised and shared. But it is documentation that brings these together, and makes them widely usable.

Good documentation should allow people designing data collection instruments (surveys, studies etc.) to quickly identify the building blocks of standardisation that they can draw upon, and should make following the standard the path of least resistance, rather than an uphill struggle.

Ideally documentation should be clearly versioned, and, if intended for global use, published in ways that support language translation.

Start from user needs

It’s easy to fall into the trap of being ‘data driven’, and trying to work out ways to bring together ‘all the data’ by imposing top-down structures on data collection or aggregation. But, in working out where to prioritise alignment of definitions and structures it’s crucial to be driven by user need. In a crisis context, it may help to identify the primary user need that data pipelines are being built to meet (e.g. a dashboard for operational decision making), and secondary user needs that it is desirable to meet too (e.g. evaluating whether support has been provided equitably; gathering baselines for future research; supporting advocacy for funding certain needs). This will help guide decisions on…

…’just enough standardisation’

Standards are about the distribution of costs and benefits between data producers, intermediaries and data users. Without any standards, data users wanting to draw on data from different sources have to do all the work of reconciling differences and inconsistencies – and sometimes find different datasets are simply irreconcilable. Where multiple datasets have compatible definitions, but different schemas, it may be possible for intermediaries to do the work of creating a consistent dataset by standardising non-standard data. Where data producers are made responsible for data standardisation, they have to do the work of reconciling their own business needs and local definitions, with the definitions and structures provided by a standard.

In the early stages of a crisis, the focus should be on what intermediaries can do: keeping the burden on data producers and users as low as possible, and focussing only on essential standardisation (guided by an understanding of user needs). By seeking to reconcile data from different sources, intermediaries will quickly learn which gaps in data alignment or standardisation are most costly to creating interoperable datasets.
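To make the intermediary role a little more concrete, here is a minimal sketch of what that reconciliation work can look like. Everything in it is hypothetical: the source names, field names and shared schema are invented for illustration, and it assumes the sources already share compatible definitions and differ only in how fields are named.

```python
from collections import Counter

# Hypothetical mappings from each source's local field names to a shared schema.
FIELD_MAPS = {
    "chamber_survey": {"Business name": "org_name", "Staff furloughed": "staff_affected"},
    "charity_survey": {"organisation": "org_name", "employees_on_leave": "staff_affected"},
}

# Count the fields we could not map, to show where alignment gaps are most costly.
unmapped = Counter()

def standardise(source, row):
    """Rename the fields of one row from a known source into the shared schema."""
    mapping = FIELD_MAPS[source]
    out = {"source": source}
    for local_name, value in row.items():
        if local_name in mapping:
            out[mapping[local_name]] = value
        else:
            unmapped[(source, local_name)] += 1
    return out

rows = [
    standardise("chamber_survey", {"Business name": "Acme Ltd", "Staff furloughed": 12}),
    standardise("charity_survey", {"organisation": "Helping Hands", "employees_on_leave": 3, "notes": "n/a"}),
]
```

The `unmapped` counter is the useful by-product here: it gives the intermediary a running list of the fields that producers are collecting but that the shared model cannot yet hold.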

Whilst adopting standards like the Open Contracting Data Standard or Beneficial Ownership Data Standard involves working with organisations over many months and even years to align their data (and in some cases, underlying business processes) with a shared model – in a crisis response, data producers need light-weight building blocks that make their job easier – giving them content to copy and paste into surveys, or data structures that can be easily implemented.

One well-developed approach for alignment in a crisis context comes from HXL – the Humanitarian eXchange Language – which provides a simple approach to mark up columns in spreadsheets using a collection of known # hash-tags, and then provides tools to combine and filter tagged data.
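As a rough illustration of the hash-tag idea (and not of the official HXL tooling), the sketch below uses only the Python standard library. The two spreadsheets are made up, and it assumes the row of hash-tags sits directly beneath the header row. Because rows are keyed by tag rather than by header text, sheets with different headers and column orders can be combined and filtered together.

```python
import csv
import io

# A made-up spreadsheet: human-readable headers, then a row of HXL-style hash-tags.
SHEET_A = """Organisation,Sector,Region,People affected
#org,#sector,#adm1,#affected
Red Cross,Health,North,120
Local Food Bank,Food,South,45
"""

# A second made-up sheet with different headers and column order, but the same tags.
SHEET_B = """Where,Who,How many,What
#adm1,#org,#affected,#sector
North,Shelter Trust,30,Shelter
"""

def read_tagged(text):
    """Return rows as dicts keyed by hash-tag rather than by header text."""
    rows = list(csv.reader(io.StringIO(text)))
    tags = rows[1]  # assumption: the tag row is directly under the header row
    return [dict(zip(tags, r)) for r in rows[2:]]

# Combine the two sources, then filter on a shared tag.
combined = read_tagged(SHEET_A) + read_tagged(SHEET_B)
north = [r for r in combined if r["#adm1"] == "North"]
total_affected_north = sum(int(r["#affected"]) for r in north)
```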

(For more on ‘just enough’ thinking see Rachel Coldicutt’s post on just enough internet)

(Critically) re-use existing standards

It’s rare that you will need to ‘invent’ any standards from scratch: standardisation is often an assembly job: working out which existing standards to align with and which pieces are aligned enough to work together. As a starting point I often turn to schema.org, the ad-hoc effort by search engines to create a common (and relatively loose) vocabulary of terms to describe everything from people, local businesses and books, to pandemic related data, or I look at conventions in use in existing datasets in the domain I’m helping create data models for.

Certain lower-level conventions, like using ISO Dates, unicode for text, and ISO language and country codes, are also worth encouraging and documenting: although in most cases as long as a data source is internally consistent in how it encodes countries, dates, languages and so-on, intermediaries will be able to more-or-less map the data to common codes over the short-term.
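A minimal sketch of that short-term mapping work might look like the following. The source names, the date format used by each source, and the small country lookup are all assumptions made for the example. The key point is that it relies on each source being internally consistent.

```python
from datetime import datetime

# Assumption: each source is internally consistent, so one date format per source is enough.
SOURCE_DATE_FORMATS = {
    "survey_a": "%d/%m/%Y",  # day first
    "survey_b": "%m-%d-%Y",  # month first
}

# Assumption: a hand-built lookup from the labels sources use to ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"united kingdom": "GB", "uk": "GB", "france": "FR", "kenya": "KE"}

def to_iso_date(source, value):
    """Convert a source-specific date string to an ISO 8601 date (YYYY-MM-DD)."""
    return datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date().isoformat()

def to_country_code(label):
    """Map a free-text country label to a code, or None when the label is unknown."""
    return COUNTRY_CODES.get(label.strip().lower())
```

The same string, "03/04/2020" or "03-04-2020", means a different day depending on the source, which is why the format has to be recorded per source rather than guessed per value.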

I say that one should ‘critically’ re-use existing standards, because, as the fantastic Data Feminism book underscores, definitions of data are about power: about whose lived experience and accounts of the world will be represented and shared. There is often a balance to strike between adopting common ways of representing the world, and challenging oppressive and problematic representations.

Particularly when building standards for use across national and cultural boundaries, this calls for an awareness of the many falsehoods embedded in data models, and consideration of the embedded assumptions in off-the-shelf data models. It can also call for a sensitivity to when standards, even in a crisis, should not take the path of least resistance, but should introduce some friction in deciding which categories to use, or how to disaggregate data. For example, where user needs (and here is where considering diverse secondary user needs can be important, as ‘primary user needs’ may often represent dominant power perspectives) require an understanding of how data varies by gender, or the ability to provide intersectional disaggregation, then standards should make clear how this should be recorded and shared.

Look for the keys

One way to lower the burden on data collectors is to look for the keys that unlock additional existing open datasets. For example:

  • Postcodes in many countries allow data to be geocoded, and allow you to integrate a range of local classifications and statistics. In the UK, collecting the postcode of where a service is delivered allows you to look up the socio-economic status of the area, the local authority responsible for service delivery there, and a whole host of other information. In other countries, location data may be possible to match with satellite observation data to infer other relevant classifications for a survey respondent.
  • Organisation identifiers – which, if collected and well validated, can be reconciled against public databases to find information on companies, charities and other entities. In the UK, a Charity number can be used to look up classification data on the organisation’s beneficiaries taken from annual charity returns. For many nations, company numbers can be reconciled against OpenCorporates to provide detailed corporate information.
  • URLs and Social Media IDs can be useful in some use-cases to crawl web pages and social networks and find signals about the networks an organisation is part of, or the topics they work on.

Each sector and domain is also likely to have some of its own ‘keys’ that can hook into existing datasets (e.g. the Common Procurement Vocabulary for classifying public procurements in Europe). If you are lucky, they will be attached to relevant open datasets.

Care still needs to be taken to consider gaps in the lookup data (e.g. some countries lack open corporate register data; satellite data coverage varies; not all organisations have websites), and to avoid introducing biases through faulty assumptions (e.g. if assuming the ‘registered office’ postcode of UK charities is where their beneficiaries are, then it looks like London gets more funding than it does). It’s also important to consider how easy it will be for those providing data to enter it. For example, do organisations know their registration number? (On the organisation identifiers point, this is one of the reasons I was involved in creating org-id.guide, and there remains a lot still to do in this area.)
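The sketch below shows the general shape of using a key such as a postcode to pull in extra fields, while flagging gaps in the lookup data rather than guessing. The lookup table, its values and the postcodes are made up for illustration: in practice the table would be loaded from an open dataset.

```python
# Hypothetical lookup table keyed by postcode; the values are invented for this example.
POSTCODE_LOOKUP = {
    "GL5 1AA": {"local_authority": "Stroud", "deprivation_decile": 7},
    "M1 1AE": {"local_authority": "Manchester", "deprivation_decile": 3},
}

def normalise_postcode(raw):
    """Upper-case and re-space a UK-style postcode so that 'gl51aa' matches 'GL5 1AA'."""
    compact = raw.replace(" ", "").upper()
    return compact[:-3] + " " + compact[-3:]

def enrich(response):
    """Attach looked-up fields, recording whether the key was found."""
    match = POSTCODE_LOOKUP.get(normalise_postcode(response["postcode"]))
    enriched = dict(response)
    enriched["lookup_matched"] = match is not None
    if match:
        enriched.update(match)
    return enriched

responses = [{"id": 1, "postcode": "gl51aa"}, {"id": 2, "postcode": "ZZ9 9ZZ"}]
enriched = [enrich(r) for r in responses]
```

Keeping an explicit `lookup_matched` flag makes it possible to report how much of a dataset could not be enriched, which is one simple check against the coverage gaps described above.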

Decide on your approach to categories

At the heart of many standardisation processes is classification: sorting needs, organisations, events or people into categories. Standardising categories can be notoriously difficult: and is often hard to do in a rush. You might find there are existing classification schemes you can draw upon, or you might find a need to create your own (or, as LandVoc has done, albeit over a number of years, to engage with an existing classification scheme to get the elements you need included).

Good documentation of the boundaries of each category (ideally with examples) is vital for categories to be used in interoperable ways.

Many of the standards I’ve worked on have stepped back from settling categorisation debates, instead representing classification elements in terms of:

  • A vocabulary – to allow different datasets to use different classification schemes
  • A code – that stays constant across languages
  • A label – that can be translated into local languages

This offers a way to at least avoid two people talking about different things with the same terms, but leaves the alignment problem to later.
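As a sketch of how those three elements might sit together in a data structure (the class, vocabulary names and codes here are all hypothetical), two classifications count as the same concept only when both vocabulary and code match, whatever their labels say:

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    vocabulary: str  # which classification scheme the code comes from
    code: str        # stays constant across languages
    labels: dict = field(default_factory=dict)  # language code -> translated label

    def same_concept(self, other):
        """Compare on vocabulary and code only, never on the label text."""
        return (self.vocabulary, self.code) == (other.vocabulary, other.code)

    def label(self, lang, fallback="en"):
        """Return the label in the requested language, falling back if it is missing."""
        return self.labels.get(lang, self.labels.get(fallback, self.code))

# Invented examples: the same code string in two different vocabularies.
a = Classification("example-needs-v1", "FOOD", {"en": "Food support", "fr": "Aide alimentaire"})
b = Classification("example-needs-v1", "FOOD", {"en": "Food"})
c = Classification("other-scheme", "FOOD", {"en": "Food retail sector"})
```

Here `a` and `b` describe the same concept despite different labels, while `c` does not, even though it shares a code string with them.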

In an ideal world, a rapid standardisation project might be able to provide ‘good enough’ categories for data collectors to start with, but then offer them some level of flexibility so that individual data collection exercises can address their local user needs by adapting core categorisations.

Semantic standards such as SKOS have a lot to offer to efforts to bring together data using heterogeneous classification schemes: allowing not only hierarchical relationships (i.e. the ability to add a ‘narrower’ concept under a headline category), but also broad and narrow matches between neighbouring concepts. However, tools and skills for working well with this kind of data and classification structure are, in my experience, quite scarce.

Meta-data matters

One of the most important things to help intermediaries align different datasets is ‘data about the data’. Knowing who collected a dataset (ideally with ability to contact them), knowing when and where it was collected, and ideally having pointers to the survey forms or data collection instruments used can make the process of ingesting and reconciling disparate datasets a lot, lot easier.

Conventions like MetaTab provide an easy way to get started providing standardised meta-data when circulating spreadsheets, and there are well established standards for meta-data in most domains.
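To give a feel for how little is needed to get started, here is a minimal sketch of a check an intermediary might run on incoming meta-data. This is not the MetaTab format: the field names and the example record are assumptions made purely for illustration.

```python
# Hypothetical minimum set of 'data about the data' fields an intermediary might ask for.
REQUIRED = [
    "collected_by",    # who collected the dataset
    "contact",         # how to reach them
    "collected_from",  # start of the collection period (ISO date)
    "collected_to",    # end of the collection period (ISO date)
    "coverage",        # where the data was collected
    "instrument_url",  # pointer to the survey form or collection instrument
]

def missing_metadata(meta):
    """List the required fields that are absent or empty."""
    return [key for key in REQUIRED if not meta.get(key)]

# An invented record with one gap.
meta = {
    "collected_by": "Example Charity Network",
    "contact": "data@example.org",
    "collected_from": "2020-03-23",
    "collected_to": "2020-04-03",
    "coverage": "GB",
    "instrument_url": "",
}
missing = missing_metadata(meta)
```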

Meta-data should also include clear information on restrictions or permissions that apply to re-use of a dataset, which brings me onto:

Don’t forget standards of data governance

The first question to ask before making use of any dataset that might contain sensitive information from individuals or organisations is: do I have the right to use this data? Does using or sharing this data (or analysis based on it), put anyone at risk?

As the responsible data initiative puts it, there is a:

…collective duty to account for unintended consequences of working with data by:

1) prioritising people’s rights to consent, privacy, security and ownership when using data in social change and advocacy efforts,

2) implementing values and practices of transparency and openness.

Working out early on a set of shared procedures for assessing the need for, obtaining and recording consents from data subjects for data sharing and re-use can avoid hitting barriers later on. This might take a number of forms, such as:

  • Suggested privacy policy terms that describe how data might be shared and re-used;
  • Identifying the different states that consent might take (e.g. consent for data to be ‘shared’ with identified partners, or consent for non-personal data to be ‘open’ – drawing on the ODI’s data spectrum) and how these should be encoded in each relevant row of a dataset;
  • Adding a section to meta-data templates for those sharing data to indicate who else data can be shared with, and if any fields should be masked from an open version of a dataset.
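The last two points can be sketched in code. The example below assumes a per-row `consent` field with three states loosely following a closed / shared / open spectrum, plus a list of fields to mask. The field names, the ordering of the states and the sample rows are assumptions for illustration, not part of any published standard.

```python
# Assumption: consent states are ordered from most to least restrictive.
CONSENT_LEVELS = {"closed": 0, "shared": 1, "open": 2}

def release(rows, audience, masked_fields=()):
    """Keep only rows whose consent covers the audience, dropping any masked fields."""
    required = CONSENT_LEVELS[audience]
    released = []
    for row in rows:
        if CONSENT_LEVELS[row["consent"]] >= required:
            released.append({k: v for k, v in row.items() if k not in masked_fields})
    return released

# Invented rows, each carrying its own consent state.
rows = [
    {"org": "A", "contact_email": "a@example.org", "impact": "high", "consent": "open"},
    {"org": "B", "contact_email": "b@example.org", "impact": "low", "consent": "shared"},
    {"org": "C", "contact_email": "c@example.org", "impact": "high", "consent": "closed"},
]

open_version = release(rows, "open", masked_fields=("contact_email",))
partner_version = release(rows, "shared")
```

Encoding consent on each row, rather than once per dataset, is what allows an open version and a partner-only version to be generated from the same source without re-contacting respondents.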

Standards are about people

Lastly, but by no means least – it is important to think of standards as a process, not a product. That documentation I mentioned at the start? That’s not for users: that’s for you. Because most of the time people don’t read documentation: they don’t have the time, or don’t know where to start. In reality, most of the standards I’ve worked on require conversations, engagement and feedback to help people align their data with them.

If someone is designing a data collection survey, the prime opportunity for standardisation is between their first draft, and it going out in the field. If you can get into a conversation then, and provide prioritised feedback on how it can align more with the documented standard, how it could incorporate some ‘key fields’ that will unlock other data, or how the consent questions could be worded to be compatible with shared data governance, then there is a chance that the data that flows from that data collection can be brought together as part of a wider aligned insight dataset.

In all the standards I’ve worked on, the ‘Helpdesk’ team have been as vital as the documentation and schema to making standards truly work as tools of coordination and collaboration.

 

 

2019 in Review

I started writing this just before the Christmas break, but got interrupted by both festivities and flu. So, below, a slightly belated look back at 2019: where yet again my blogging has been far too sporadic.

January – FOI & Javelin Park Protests

Last Christmas Eve, I was poring over the newly released details of a £100m+ cost increase in the contract for the Javelin Park incinerator I’ve written about before. Over Christmas, we put together calls for an Independent Inquiry into the project, and come January, I was outside the plant, taking part in protests at the price rise.

Since then, the County Council have been taken to court over the contract, putting the calls for an inquiry on hold (although questions were finally put to the Chief Executive of the Council in March), with updates on the court case expected in early 2020.

My other FOI adventures of 2019 have been less conclusive:

  • Gloucestershire’s refusal to provide prices and buyers of the public land they have sold off means the only way to piece this together would be by spending £100s on land registry records: something I’ve not had space to pursue. Promises that this information would be published proactively from September have been broken by Cabinet – and our experiment in using the Local Audit and Accountability Act in June to look at relevant documents didn’t appear to provide a full overview. It seems profoundly odd that there is so little transparency over how public assets are being disposed of.

February – Exploring Arts and Data

At the start of the year, I kicked off a part-time role as ‘Data Catalyst’ with Create Gloucestershire working on a number of fronts to support their internal data practices, but also to scope out ways to connect artists with debates around data. I shared some initial research back in February and in September had great fun co-facilitating a ‘Creative Lab’ at Atelier in Stroud, where we co-created a range of data-informed art works – from VR Design Teachers, to fabric chromatography creations that visualised data on school subject choice.

March – TicTec & The State of Open Data

Much of March was spent working on final editing of chapters for The State of Open Data, and then, late in the month, heading to Paris for The Impacts of Civic Technology (TicTec) conference to present initial findings with my co-editor, Mor. An evening reception and hearing about digital democracy and participation projects at the French National Assembly was particularly inspiring.

April – Printing and Driving

2019 was supposed to be a bit of a sabbatical year (learning point: I’m not very good at sabbaticals), but in late March and April I did finally get round to my two main goals of: (a) learning a bit about printmaking; (b) passing my driving test.

A wonderful two-day workshop with Rod Nelson had me working on woodcut designs exploring field patterns and the Stroud landscape.

And Bob Waters got me through my test first time.

I’ve promptly failed to do any more printing or driving this year, but at least I now know a bit more about how to!

First ‘field patterns’ print drying in Rod Nelson’s studio

May – State of Open Data Book Tour and OGP

May took me to the US, for a few weeks of #slowTravel by train around the East Coast, and then up to Canada, for the full launch of the State of Open Data book. It was a real pleasure to catch up with old friends, and to take part in some really stimulating workshops, including a fascinating Belfer Center session on ‘Data as Development’ which gave rise to this note on the idea of a ‘data extraction transparency initiative’.

Getting hold of physical copies of The State of Open Data book was a great moment: as at times the project has felt quite beyond delivery. I’m pretty pleased indeed with how it turned out – with contributions from 60+ authors, and many more reviewers and contributors.

I’ve still got a few hard copies that can go free to University or organisational libraries, so if you’ve read this far, and you would like one – do drop me a note.

At IDRC for book talk on State of Open Data
With the editors of State of Open Data sharing findings from the book at IDRC HQ.

June – Facilitation fun with IATI

In June I took another #slowTravel trip – heading to Copenhagen by train to facilitate a workshop for the International Aid Transparency Initiative’s technical community on the draft strategy.

This followed some online facilitation work for strategy dialogues earlier in the year. I’ve also had chance this year to co-facilitate an online dialogue for Land Portal: reminding me how much I enjoy this kind of blended online and offline facilitation work. Perhaps something to explore more in 2020.

July – Coast to Coast

In July, Rachel and I set out walking across the UK on Wainwright’s Coast to Coast path – raising funds for Footsteps Counselling and Care.

The weather and walk were stunning – and a real chance for reflection.

Photos from the coast to coast walk

 

August – Impact Bonds and Waste Management

Besides the annual August pilgrimage to Greenbelt, it was a month of interesting UK projects – including work with the Government Outcomes Lab at the University of Oxford to scope out ways to improve transparency and data sharing around Social Impact Bonds, and contributing to a (sadly unsuccessful) pitch by Open Data Manchester and Dsposal to secure innovation funding to build on their prototype KnoWaste standard.

September – Civic Media Observatory

In September, I had my first opportunity to work in-depth on a project with the fantastic Global Voices team – using AirTable to rapidly prototype a database and workflow for tracking and analysing mainstream media, social media, and offline events through a local lens, and understanding the context and subtext of the media that platform moderators may be asked to make snap judgements over.

A three-day workshop in Skopje, North Macedonia, looking at coverage of the EU Accession talks, put the prototype to the test (and introduced me to some quite remarkable monumental architecture…).

October – AI at Bellagio

I spent all of October in Italy, first as a residential fellow at the Rockefeller Bellagio Center, and then with a brief vacation in Verona, and a quick trip to Rome to work with Land Portal.

Taking part in the Bellagio Center’s thematic month on Artificial Intelligence was quite simply a once in a lifetime opportunity. I didn’t write much about it at the time (as I was busy trying to pull together the outline of a new book proposal) and with an election called in the UK just as we were heading home, haven’t had the space to follow up. Hopefully some point next year I’ll be publishing a few outputs from the month.

However, I can’t leave my fellow residents’ work un-shared, so if I’ve not already signposted the below to you, do take time to:

I should also mention one of the other highlights of the residency: enjoying two shows, numerous tricks, and sage advice from ‘Magician in residence’ Brad Barton, Reality Thief – go see him if you are ever in the Bay Area!

November – Elections!

I returned from Italy right into the middle of the biggest General Election campaign Stroud District Green Party have ever run, for the fantastic Molly Scott Cato. It was a month spent both on the doorstep, and juggling spreadsheets – exploring the reality of values-based volunteer-driven political campaigning in an era of data.

December – Global Data Barometer

Over November and December I was also working on the scoping for a potential new project – the Global Data Barometer – a successor to the Open Data Barometer study I helped create at the Web Foundation back in 2013. The goal is to explore how a 100+ country study could provide insight into patterns of ‘responsible re-use’ of data around the world – capturing both use of data as a resource for sustainable development – and efforts to manage the risks that the unregulated collection and processing of ever increasing quantities of data might create. I published the initial draft research framework just before Christmas, and will be exploring the project more in a workshop in Washington next week.

2020 plans

Over 2020 I’m looking forward to more work on the Global Data Barometer, and with the Open Ownership team, as well as some further facilitation projects, and, hopefully, a bit more writing time! We’ll see.

Algorithmic systems, Wittgenstein and Ways of Life

I’m spending much of this October as a resident fellow at the Bellagio Centre in Italy, taking part in a thematic month on Artificial Intelligence (AI). Besides working on some writings about the relationship between open standards for data and the evolving AI field, I’m trying to read around the subject more widely, and learn as much as I can from my fellow residents. 

As the first of a likely series of ‘thinking aloud’ blog posts to try and capture reflections from reading and conversations, I’ve been exploring what Wittgenstein’s later language philosophy might add to conversations around AI.

Wittgenstein and technology

Wittgenstein’s philosophy of language, whilst hard to summarise in brief, might be conveyed through reference to a few of his key aphorisms. §43 of the Philosophical Investigations makes the key claim that: “For a large class of cases–though not for all–in which we employ the word ‘meaning’ it can be defined thus: the meaning of a word is its use in the language.” But this does not lead to the idea that words can mean anything: rather, correct use of a word depends on its use being effective, and that in turn depends on a setting, or, as Wittgenstein terms it, a ‘language game’. In a language game participants have come to understand the rules, even if the rules are not clearly stated or entirely legible: we engage successfully in language games through learning the techniques of participation, acquired through a mix of instruction and of practice. Our participation in these language games is linked to the idea of ‘forms of life’, or, as it is put in §241 of the Philosophical Investigations, “It is what human beings say that is false and true; and they agree in the language they use. That is not agreement in opinions but in form of life.”

As I understand it, one of the key ideas here can be expressed by stating that meaning is essentially social, and it is our behaviours and ways of acting, constrained by wider social and physical limits, that determine the ways in which meaning is made and remade.

Where does AI fit into this? Well, in ‘Wittgenstein as a Philosopher of Technology: Tool Use, Forms of Life, Technique, and a Transcendental Argument’, Coeckelbergh & Funk (2018) draw on Wittgenstein’s tool metaphors (and professional history as an engineer as well as philosopher) to show that we can apply a Wittgensteinian analysis to technologies, explaining that “we can only understand technologies in and from their use, that is, in technological practice which is also culture-in-practice.” (p 178). At the same time, they point to the role of technologies in constructing the physical and material constraints upon plausible forms of life:

Understanding technology, then, means understanding a form of life, and this includes technique and the use of all kinds of tools—linguistic, material, and others. Then the main question for a Wittgensteinian philosophy of technology applied to technology development and innovation is: what will the future forms of life, including new technological developments, look like, and how might this form of life be related to historical and contemporary forms of live?  [sic] (p 179)

It is important, though, to be attentive to the different properties of different kinds of tools in use (linguistic, material, technological) within any form of life. Mass digital technologies, in particular, appear to spread in less negotiable ways: that is, a newly introduced technology, whilst open to being embedded in forms of life in some subtly different ways, often has core features presented only on a take-it-or-leave-it basis, and, once introduced, can be relatively brittle and resistant to shaping by its users.

So – as new technologies are introduced, we may find that they reconfigure the social and material bounds of our current forms of life, whilst also introducing new language games, or new rules for existing games, into our social settings. And with contemporary AI technologies in particular, a number of specific concerns may arise.

AI Concerns and Critical Responses

Before we consider how AI might affect our forms of life, a few further observations (and statements of value):

  • The plural of ‘forms’ is intentional. There are variations in the forms of life lived across our planet. Social agreements in behaviour and action vary between cultural settings, regions or social strata. Many humans live between multiple forms of life, translating in word and behaviour between the different meanings each requires. Multiple forms are not strictly dichotomous: different forms of life may have many resemblances, but their distinctions matter and should be valued (this is an explicit political statement of value on my part).
  • There have been a number of social projects to establish certain universal forms of life over past centuries. For example, the development of consensus on human rights frameworks is one of these, seeking equitable treatment of all (I also personally subscribe to the view that a high level of respect for universal human rights should feature as a constraint on all forms of life).
  • Within this trend, there are also a number of significant projects seeking to establish greater acceptance of different ways of living, including action to reverse the Victorian imposition of certain normative family structures, work to afford individuals greater autonomy in defining their own identities, and activity to embed much more ecological models of thinking about human society.

These trends (or ongoing social struggles if you like) seeking to make our ways of living more tolerant, open, inclusive and sustainable are important to note when we consider the rise of AI systems. Such systems are frequently reliant on categorised data, and on a reductive modelling of the human experience based on past, rather than prospective, data.

This noted, it appears then that we might point to two distinct forms of concern about AI:

(A) The use of algorithmic systems, built on reductive data, risks ossifying past ways of life (with their many injustices), rather than supporting struggles for social justice that involve ongoing efforts to renegotiate the meaning of certain categories and behaviours.

(B) Algorithmic systems may embody particular ways of life that, because of the power that can be exercised through their pervasive operation, cause those forms of life to be imposed over others. This creates pressure for humans to adapt their ways of life to fit the machine (and its creators/owners), rather than allowing the adaptation of the machine to fit into different human ways of life.

Brief examples

Gender detection software is AI trained to judge the gender of a person from an image (or from analysing names, text or some other input). In general, such systems define gender using a male-female binary. Such systems are being widely used in research and industry. Yet, at the same time as the task of judging gender is being passed from human to machine, there are increasingly present ways of life that reject the equation of gender and sex identity, and the idea of a fixed gender-binary. The introduction of AI here risks the ossification of past social forms.

Predictive text tools are increasingly being embedded in e-mail and chat clients to suggest one-click automatic responses, instead of requiring the human to craft a written response. Such AI-driven features are at once a tool of great convenience and an imposed shift in our patterns of social interaction.

Such forms of ‘social robot’ are addressed by Coeckelbergh & Funk when they write: “These social robots become active systems for verbal communication and therefore influence human linguistic habits more than non-talking tools.” (p 185). But note the material limitations of these robots: they can’t construct a full sentence representative of their user. Instead, they push conversation towards the quick short response, creating a pressure to change patterns of human interaction.

Auto-replies suggested by Google Mail based on a proprietary algorithm.

The examples above, suggested by Gmail for me to use in reply to a recent e-mail, might follow terms I’d often use, but they push towards a form of e-mail communication that, at least in my experience, represents a particularly capitalist and functional form of life, in which speed of communication is of the essence, rather than social communication and exploration of ideas.

Reflections and responses

Wittgenstein was not a social commentator, but it is possible to draw upon his ideas to move beyond conversations about AI bias, to look at how the widespread introduction of algorithmic and machine-learning driven systems may interact with different contemporary forms of living.

I’m always interested, though, in the critical leading to the practical, and so below I’ve started to sketch out possible responses the analysis above leads me to consider. I also strongly suspect that these responses, and the justification for them, can be elaborated much more directly and accessibly without getting here via Wittgenstein. Writing that may be a task for later, but as I came here via the Wittgensteinian route, I’ll stick with it.

(1) Find better categories

If we want future algorithmic systems to represent the forms of life we want to live, not just those lived in the past, or imposed upon populations, we need to focus on the categories and data structures used to describe the world and train machine-learning systems.

The question of when we can develop global categories that have meaning that is ‘good enough’ in terms of alignment in use across different settings, and when it is important to have systems that can accommodate more localised categorisations, is one that requires detailed work, and that is inherently political.
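
To make the contrast between fixed and more accommodating categories a little more concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than a description of any real system: the names BinaryGender, OpenCategory, encode_closed and encode_open, and the SHARED_CODES mapping, are assumptions made up for this example. It simply compares a closed, binary category list with a record that keeps a person's own term and only optionally maps it to a shared code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BinaryGender(Enum):
    # A closed category list of the kind many systems assume.
    MALE = "male"
    FEMALE = "female"


def encode_closed(value: str) -> Optional[BinaryGender]:
    """Return the matching fixed category, or None when the value has no slot."""
    try:
        return BinaryGender(value.strip().lower())
    except ValueError:
        return None


@dataclass
class OpenCategory:
    # Hypothetical alternative: keep the person's own term, and only
    # optionally map it to a broader shared code for comparison.
    self_described: str
    shared_code: Optional[str] = None


# Illustrative mapping only; agreeing a list like this is the hard part.
SHARED_CODES: Dict[str, str] = {
    "male": "man",
    "female": "woman",
    "non-binary": "non-binary",
}


def encode_open(value: str, shared_codes: Dict[str, str]) -> OpenCategory:
    """Keep the original term; attach a shared code only if one is agreed."""
    term = value.strip()
    return OpenCategory(self_described=term,
                        shared_code=shared_codes.get(term.lower()))


# The closed scheme has no slot for this term; the open one retains it.
print(encode_closed("genderqueer"))
print(encode_open("genderqueer", SHARED_CODES))
```

Under the closed scheme a self-described term with no matching slot is simply lost, whereas the open record keeps the term even when no shared code exists. The sketch does not settle which terms should map to which shared codes, or who decides: that remains the detailed and political work.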

(2) Build a better machine

Some objections to particular instances of AI may arise because the technology is, ultimately, too blunt in its current form. Would my objection to the predictive text tools be the same if they could express more complete sentences, more in line with the way I want to communicate? For many critiques of algorithmic systems, there may be a plausible response to suggest that a better designed or trained system could address the problem raised.

I’m sceptical, however, of whether it is plausible for most current instantiations of machine-learning to be adaptable enough to different forms of life: not least on the grounds that for some ways of living the sample-size may be too small to gather enough data points to construct a good model, or the collection of the data required may be too expensive or intrusive for theoretical possibilities of highly adaptive machine-learning systems to be practically feasible or desirable.

(3) Strategic rejection

Recognising the economic and political power embedded in certain AI implementations, and the particular form of life they embody, may help us to see technologies we want to reject outright. If a certain tool makes moves in a language game that are at odds with the game we want to be playing, and only gains agreement of action through its imposition, then perhaps we should not admit it at all.

To put that more bluntly (and bringing in my own political stance), certain AI tools embody a late-capitalist form of life, rooted in the cultures and practices of a small stratum of Silicon Valley. Such tools should have no place in shaping other ways of life, and should be rejected not because they are biased, or because they have not adequately considered issues of privacy, but simply because the form of life they replicate undermines both equality and ecology.

Where next

Over my time here at Bellagio, I’ll be particularly focussed on the first of these responses – seeking better categories, and understanding how processes of standardisation interact with AI. My goal is to do that with more narrative, and less abstraction, but we shall see…