Weeknotes – 22nd July 2022

[Cross-posted from Connected by Data blog]

Well, as Jonathan said but two weeks ago, a week’s a long time… Just as we thought ministerial mayhem might mean we had a bit longer before the ‘Data Reform Bill’ (DRB) would be out, on Monday this week the ‘Data Protection and Digital Information Bill’ (DPDIB) dropped, revealing not only the new name, but also the scope, of the DRB. We’ve got a team retreat next week where we’ll be digging into the detail of the Connected by Data response, but suffice to say that, right now, collective impacts and public voice do not feature as strongly as we think they could and should.

As I skipped writing up weeknotes last week, there are a couple of different themes to reflect on this time around, plus lots of assorted extra bits.

Digging into dialogue

One of the big challenges in seeking to embed participatory mechanisms for data governance into legislation is the risk of creating yet another tick-box exercise and ending up with low-quality, compliance-oriented engagement rather than transformative forms of participation.

Over the last three weeks I’ve been an observer of the NHS AI Lab Public Dialogue on data stewardship: a process involving around 50 members of the public meeting for 12 hours (across four sessions) to share their ‘thoughts, aspirations, hopes and concerns’ about how access to healthcare data for AI purposes should be managed. I’ve got a full write-up in the works, but it’s been a really interesting opportunity to watch a ‘dialogue on dialogue’ as members of the public explored different models for public engagement in governing access to health data.

I’ve also been trying to read up more on the history of public dialogue, as our expression of interest in partnership with OpenSAFELY to the RSA Rethinking Public Dialogue fund has made it through to the second round of bidding. Here’s the one paragraph summary of what we’re trying to develop:

“Connected By Data and OpenSAFELY will collaboratively develop a protocol for ‘dialogue on demand’: agile and inclusive mini-dialogues on data governance and research design decisions that are developed based on bottom-up input from affected groups, and that feed into both iterative data governance process refinement, and into focussed operational decision making.”

Plus, last week I had the opportunity to sit in on a training session delivered by Simon Burral of Involve for the Data Trusts Initiative on governance and engagement design.

All this has been really useful for starting to think about the different factors that might help deliver the ‘powerful say’ for data-affected communities that we’re calling for. For now, I’ve captured this as an opinionated statement on what meaningful and effective participation looks like:

Generally the more concrete the issue or situation that discussion can focus on, and the more ‘moving parts’ of that issue/situation that can be made legible to participants, the more meaningful the discussion is likely to be. And the more that points made in a discussion can be grounded in relatable lived experiences, the more powerful the messages from a discussion are likely to be.

Sector specifics

Over the last fortnight Jonathan and I have been round a few loops of trying to articulate simple (hypothetical) stories of how current data practices affect real people in our target sectors (debt; housing; education). It’s proven (surprisingly?) challenging to articulate the narratives for debt in short prose, I think for a number of reasons:

  • I’ve been trying to focus on present problems rather than future fears. Reports like the fantastically useful Governing data and artificial intelligence for all: Models for sustainable and just data governance arguably have an easier job of it by looking primarily at (reasonably) imagined future AI harms, rather than quantified current harms.
  • In many cases, I’ve been finding that present problems are covered by regulation in some form, even if the data component of the problem has limited governance. For example, we started looking at targeted loan advertising, but found that industry self-regulation has led to voluntary action not to take adverts from payday lenders. This governance at the application layer doesn’t remove the underlying issue: data that could be used to target people with risky financial products is still collected, pooled and shared, although it does mean that right now this harm isn’t generally observed.
  • There are multiple stages, and multiple actors, in any story of how people are Connected by Data. Where I started by trying to present stories of single named individuals, I’ve now been experimenting with sketching scenarios with visual representations of the data flows that connect people, and that raise data governance questions.
  • The data problems are often indirect. Jonathan did some great work developing problem trees for debt and data, demonstrating that there are a couple of steps between the abuse of data and the ‘crunch’ of relatable harms. Those ‘harms’ are even trickier to land when they are the absence of actions (e.g. the missing data-supported provision of support to someone in debt).

It’s feeling like (a) it might be a few more iterations before we land really clear example stories for each sector; and (b) we might be discovering some of the challenges of getting robust stories to land in the debt sector specifically. I’ll be looking next week at whether this means we should revisit some of our focus sector selection.

Reading and reflections

Critical data studies and outsider action

I skimmed through a fantastic new Critical Data Studies Reading List from Frances Corry and colleagues, focussed on papers that critically explore the data pipeline for machine learning. I found lots of background reading to add to my backlog of papers that I hope to get to over the summer, but I was also looking out for any papers that might hint towards a participatory response to the many problems in the AI data pipeline. The few that did jump out could be said to take a more or less outsider advocacy approach: representing voices and perspectives from populations affected or harmed by the data choices of an AI system, to highlight that they were not considered in the initial dataset selection or design. Such advocacy has led, in a number of cases, to significant AI training datasets being withdrawn or substantially modified.

I’ve been reflecting on how such independent and outsider activism is a key part of a spectrum of participation: able to set the agenda for discussions in more formalised participative spaces, to hold those spaces to account for their outcomes, and to provide a check on corporate capture of participatory processes. There’s more to think about here, but it’s also worth explicitly noting that more formalised participation of groups affected by AI systems in dataset governance did not appear (on a read of titles and abstracts) to be part of the repertoire of solutions put forward by the ML researchers covered by this particular reading list.

Legislating data loyalty

This interesting new paper on Legislating Data Loyalty has a lot of resonances with the ideas behind Connected by Data, framing data loyalty as made up of three key components: “a (1) relational duty; (2) that prohibits self-dealing (3) at the expense of a trusting party”.

The concept felt slightly limited by restricting the duty of loyalty owed by a firm solely to those whose data it collects (data subjects), rather than those affected by the data (data stakeholders), but it makes for an interesting and challenging re-articulation of privacy law, with a focus on US privacy law debates.

Where the paper gets into the details of implementation (p. 374), it explores an approach to dealing with “inevitable conflicts between [the interest of] trusting parties” by proposing that firms have reference to the “collective best interests of trusting parties”, although how this is to be determined is not explored. From a Connected by Data perspective, we might suggest that one way a firm can establish that it has sought to understand collective interests is through some form of robust, independent dialogue with a broad cross-section of its ‘trusting parties’.

AI in the City: Building Civic Engagement & Public Trust

Lots of interesting short essays in this colloquium collection from Ana Branduescu and Jess Reia, including points on the importance of power, doubt, open processes and the voices of the marginalised when governing the introduction of technology into urban settings.

Other things from the fortnight

Filed under ‘listing out stuff mainly so I don’t forget it’, and just in case it sparks a useful connection anywhere…
