Tim's Blog – working for social change; exploring the details; generally quite nuanced

What role does artificial intelligence play in transparency and accountability?

[Summary: looking at the limits of large language models, and finding the place they might play.]

Joseph Foti, Principal Advisor on Emerging Issues for the Open Government Partnership, has been exploring the role of artificial intelligence as a tool for government accountability in low-data environments. His conclusions, that “AI won’t replace accountability actors, especially in low-data contexts, but done right, it can help them see further, move faster, and be more accurate”, made me think of a recent conversation with Tiago Peixoto on the possibility that AI might overcome the ‘infomediary gap’ that arguably acted as a significant brake on the potential of open data.

In short, if the potential for transparency through open data to drive accountability at scale was stymied by the difficulty of finding and sustaining intermediaries who could translate raw data into actionable information, then can recent developments in AI, and the arrival of generative AI in particular, fill that gap?

Joe takes as a starting point that journalists, policy makers and activists are likely to use generative AI as an information source in any case, and that this creates both a renewed case for investing in the production of good quality data, and a need to find ‘workarounds’ for using AI for accountability research in low-resource contexts.

I want to offer a slightly sceptical build on Joe’s post and, in turn, a reflection on whether and how recent AI developments might revitalise open data.

1) We should name and accept the fundamental limits of generative AI until or unless they are overcome

The narrative that AI is an inevitable component of all workflows, or that AI tools are capable of just about any task, should not be taken for granted. Since breaking into popular awareness through ChatGPT, large language models (LLMs) have yet to overcome the powerful 2021 ‘stochastic parrots’ critique: that while they generate plausible language, they don’t have any ‘understanding’ of concepts or truth. In short, bare generative AI tools are capable of error-prone (but often quite useful) synthesis of information and knowledge by virtue of patterns in their training texts, but are not capable of generating knowledge.

This is a really important distinction for thinking about accountability work, and one reflected in some of Joe’s workarounds. In a high-resource environment, an LLM may be able to synthesise existing documented knowledge on corruption or accountability issues (or indeed, knowledge that can provide useful background for a researcher investigating a potential issue). But where the issues have not hitherto been documented in prose, a general-purpose LLM isn’t going to serve up new knowledge.

We also need to consider the limitations of different approaches to get more local context into LLMs. We can identify broadly three routes by which data from low-resource topics might end up in generative AI responses:

Model training. GPT-4 is estimated to have trained on roughly 1 petabyte of data, and cost tens of millions of dollars in compute power. Retraining models only takes place periodically, and so there can be a significant lag between data making it into ‘crawls’ and the datasets fed into models, and then into a released model.
Fine-tuning. Relatively small datasets, and much lower compute costs, are involved in fine-tuning an LLM, though generally this requires some amount of ‘labelled data’ (i.e. data already classified by humans, rather than just a corpus of documents), so it may involve greater human labour costs.
Retrieval Augmented Generation (RAG). This is a common feature of how we experience many LLMs today, when they either draw on user-supplied documents, or search the web for extra content, to fit into the context window used to respond to user prompts. RAG essentially combines AI-supported search (of your own, or an open, repository of information) with the text synthesis capability of generative AI.
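To make the RAG route more concrete, here is a minimal sketch of its retrieval half, assuming a toy word-overlap scorer in place of the embedding-based search that real systems typically use. The licence documents, document ids and function names are hypothetical, and the generation step is left out: the assembled prompt is simply what would be handed to whichever LLM is in use.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens: a deliberately crude stand-in for embeddings."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Count overlapping word occurrences between the query and a document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[w], d[w]) for w in q)


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k highest-scoring documents."""
    ranked = sorted(corpus, key=lambda doc_id: score(query, corpus[doc_id]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Place the retrieved passages into the context window ahead of the question."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(query, corpus, k))
    return f"Answer using only the sources below.\n{context}\nQuestion: {query}"


# Hypothetical licence documents, for illustration only
corpus = {
    "lic-001": "Mining licence granted to Example Minerals Ltd for copper extraction in District A.",
    "lic-002": "Oil exploration licence awarded to Sample Petroleum in the offshore block.",
    "lic-003": "Copper mining licence renewal for Example Minerals Ltd, District A, 2022.",
}
print(build_prompt("Who holds copper mining licences in District A?", corpus))
```

Even in this toy form, the labour question above is visible: the quality of the answer depends on someone having gathered and cleaned the licence corpus, and that corpus could equally be searched or browsed without an LLM at all.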

For each of these approaches we should consider (a) the labour involved in improving LLM outputs; and (b) the alternative uses of that labour and/or the outputs of that labour.

For example, if, to help address corruption in the extractives sector, we plan to bring together a collection of licence documents into a database for RAG, do we also consider other non-AI ways we could use the resulting corpus? Or rather than putting resources into click-work labelling of data to fine-tune an LLM, might we be better off creating tools and processes that help active citizens to engage with the meaning of, and potential response to, the documents we’ve collected?

When we approach specific forms of AI as distinctly limited technologies within any given workflow, rather than magical general purpose agents, we can better evaluate where they expand our capabilities, and where they risk misleading and eroding our effectiveness.

2) LLMs are not great at structured data

As this paper highlights, in general, LLMs are not tailored to working with structured data. As a simplification: it appears that an LLM sees tabular data as a set of relationships between neighbouring words and terms, and not as a set of columns, rows and abstracted data points. This has some interesting consequences both for how open data might be fed into LLM training and the right models for using AI as a tool of accountability where our source material is structured.
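The simplification above can be illustrated (not as a description of how any particular model tokenises its input) by contrasting a table flattened into one run of text, where a value’s column is only implied by its position, with code that reads the same table with explicit rows and columns. The budget table and function names are hypothetical.

```python
import csv
import io

# Hypothetical district budget table, for illustration only
TABLE = """district,year,allocated,spent
North,2023,1000,950
South,2023,1200,400
"""


def as_flat_text(table: str) -> str:
    """Join every cell into one sequence of words: column membership is now
    only implied by position, loosely as a table appears inside a prompt."""
    return " ".join(cell for row in csv.reader(io.StringIO(table)) for cell in row)


def spend_rate(table: str) -> dict[str, float]:
    """Compute spent / allocated per district using explicit column names."""
    rows = csv.DictReader(io.StringIO(table))
    return {r["district"]: int(r["spent"]) / int(r["allocated"]) for r in rows}


print(as_flat_text(TABLE))
print(spend_rate(TABLE))
```

In the flattened string, knowing that “400” is a spend figure for “South” means counting positions back to the header; the structured reading makes that relationship explicit, which is why code (or an extraction step that produces clean tables) may be the better home for this kind of analysis.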

Perhaps the more interesting application of generative AI is not as an interface to structured data, but as a tool for extracting structured data from documents. For example, a fine-tuned vision model might be capable of extracting structured data from printed financial records or beneficial ownership disclosures for later analysis, whilst a general purpose LLM might be poorly suited to RAG-based search of the same documents.

3) Using AI for ‘signal’ detection involves an algorithmic approach whether with Narrow or General Purpose AI

One of the main infomediary roles in transparency and accountability is finding signal in the noise of large datasets or flows of information. We’ve long been able to develop, and, in cases with enough labelled data, train, algorithms that can flag data points worthy of further investigation, but this has often proven very labour intensive, and led to a limited supply of sustainable tools. In this context, generative AI tools, which at first glance appear to let signal be found simply by prompting, can look appealing. But, in practice, if we want to avoid false positives and negatives, it takes a more systematic and technical approach.

For example, this recent paper on Automation of Systematic Reviews with Large Language Models describes an AI agent-based workflow, chaining together different LLM-enabled processing steps to identify, assess and evaluate evidence. Whilst this shows some of the promise of AI tools, it also demonstrates the level of careful workflow design required to use AI in truth-seeking contexts. In many cases we should keep in mind that running step-by-step data processing instructions on top of probabilistic LLMs may be a more computationally (and environmentally) costly, and still more error-prone, approach than running them directly in code.
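As a hypothetical example of what running a step “directly in code” can look like: a commonly cited procurement red flag, contracts awarded with only a single bidder above some value threshold, can be written as an explicit rule. The field names, threshold and records below are illustrative assumptions, not drawn from any real dataset or standard.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    contract_id: str
    bidders: int
    value: float  # hypothetical: value in local currency units


def flag_single_bidder(contracts: list[Contract], threshold: float = 100_000) -> list[str]:
    """Return ids of contracts with exactly one bidder and a value above the threshold.

    The same inputs always produce the same flags, and the rule itself can be
    read, questioned and adjusted by the people using it.
    """
    return [c.contract_id for c in contracts if c.bidders == 1 and c.value > threshold]


# Hypothetical records, for illustration only
contracts = [
    Contract("C-1", bidders=1, value=250_000),
    Contract("C-2", bidders=4, value=900_000),
    Contract("C-3", bidders=1, value=20_000),
]
print(flag_single_bidder(contracts))  # ['C-1']
```

A rule like this is cheap to run over millions of records and its false positives can be traced to a specific condition; where an LLM adds value may be upstream, in turning messy documents into the structured fields the rule needs.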

4) LLMs might lower the technical barriers to structured data analysis

One of the big barriers to the emergence of infomediaries has been the technical skills and domain knowledge required to make good use of data. It is perhaps here that LLMs are quite well-suited as a tool: trained on a wealth of resources that discuss how to analyse data, and able to suggest ‘good enough’ code or formulae to get started with a dataset, albeit with the risk that they leave users with many important ‘unknown unknowns’ that might affect the quality of the analysis produced.

But focusing on the LLM as a tool that provides contextual teaching on the analysis of documents and data, rather than one that takes on that analysis itself, is a different approach, and perhaps a more appropriate use.

5) We should support local data inputs: but not restrict our vision to generative AI use

In his post, Joe reflects on the need to ‘build a data corpus based on a limited set of local priorities’, and the need for local data partnerships to develop and sustain this kind of resource.

At the hyperlocal level in Gloucestershire, sparked by the NewCommons.ai challenge, we’ve also been thinking about how to bring together collections of data that could support AI tools to generate richer local community profiles and provide actionable information for community groups through generative AI interfaces. But we’ve also been thinking about the other ways that data might be used: making sure we don’t only shape our data practices around centralising AI systems – and focussing on data collection and sharing as a social process.

If the data-demands of AI are to become a driver for new community and open data efforts, we have important questions to answer about whether we’re handing over data and power to centralised systems, or whether we are developing, maintaining and governing data in ways that empower communities.

Working with, working around and working in other ways

The notes above are preliminary and incomplete. If I can try and draw out some initial conclusions, they might be that:

Alongside recognising the momentum of AI practice and seeking to shift this towards responsible practice, we should keep developing alternative narratives of our informational future;
We need to get really specific about which parts of accountability workflows current generative AI tools are suited to, and be clear where they don’t fit;
AI isn’t a drop-in fix for the infomediary gap in open data theories of change, but it might have a role in structuring data, and supporting analysts to generate insight.

In closing, let me quote from the 2025 Human Development Report:

“…we should not task machines with decisions simply because they now seem capable of making them; we should instead do so based on whether ceding those decisions expands or contracts our agency and freedoms.” Human Development Report 2025

Protecting democracy: where are “Crypto Bros for Good?”

[Summary: assorted thinking aloud about emerging technologies and democracy in 2025.]

In a LinkedIn post reflecting on the Copenhagen Democracy Summit, Blair Glencorse includes the question:

Where are “Crypto Bros for Good?”

following up with:

(Apparently this group does actually exist). There are some fascinating new ways that tech, AI and crypto are being used to support democracy- but we need to better support coordination of, narratives around and amplification of the positive pieces of the tech ecosystem.

It’s an interesting question to unpack, and I thought I should have a quick go at capturing some thoughts, as it relates to some of the work I’m currently doing supporting the World Bank’s Coalitions for Reforms program on research and writing for a brief around Emerging Technology and the Social Contract, and ties also into a mapping I was working on earlier this year for the Open Government Partnership on participatory governance of digital technology.

I read the question as fundamentally pointing to the gap between communities traditionally thinking about the protection and development of democracy, and the latest computerisation movements, exploring the potential application of emerging technologies to questions of social and political organisation. However, invoking the ‘crypto bro’ idea, pointing to an often derided, tribal, hype-centred, male-dominated, technically skilled, and seemingly well-funded community (drawing often on private rather than state or philanthropic capital), it might be asking more about why this wealth of resources for, experimentation with, and excitement about, ‘democratic’ innovation is nowhere near the traditional democracy field. Although framed around ‘crypto bros’, the same question might also be posed about AI innovations. There is marginally more talk of artificial intelligence in spaces near to traditional democracy reform groups, but much of the work around technologically enabled collective intelligence or AI & Democracy is focussed more on governing the power of technology firms than on addressing the democratic quality of states.

I also read the question in light of the much closer connection between the past computerisation movements, such as e-government/e-democracy and open data, and the democracy and open government fields. In the past, where the ‘natural resource’ of interest to technologists was government data there was perhaps a tighter collaboration between governance reformers and technology innovators. In current waves of technological innovation, that link feels less apparent.

All that said, where are the “crypto bros”? Or perhaps I can say ‘techno-idealists’, for want of a better term that allows us to look beyond both crypto and the ‘bros’ for promising practice around democratic renewal through technology? And how can the democracy field amplify positive pieces of the tech ecosystem?

Crypto unpacked

Perhaps the best-known manifestation of crypto culture is the blockchain, and Bitcoin in particular. Bitcoin has roots as a political project: one grounded in libertarian ideas of freedom from centralised government control. Central to many crypto projects is the idea that instead of relying on the coercive power of the state (or on any other form of centralised power) to stabilise social systems (including money), cryptography and decentralisation allow ‘trustless trust’. For example, distributed cryptographic signatures can provide assurance that some value or vote has not been counted twice, without needing to trust in the guarantees of some external authority. This leads in three different directions.

Firstly, within crypto-communities, there have been efforts to extend the distributed organisation idea from currency to other forms of social practice, resulting in ideas such as Decentralised Autonomous Organisations (DAOs), often deploying textured forms of internal democracy to govern collective associations without hierarchy or individual ownership. However, generally these new forms of organising are rooted in “one-token, one-vote”, rather than “one-person, one-vote”: reflecting both the Bitcoin culture of anonymity/pseudonymity, and a tight connection between crypto-libertarian ideas and an ideological faith in markets. Widely discussed innovations such as quadratic voting (which seeks to weight votes based on strength of preference, rather than token holdings) can be read as essentially attempts to re-introduce elements of fairness/equity into systems bootstrapped from unequal holdings of initial resources.
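Quadratic voting is usually described as giving each participant a fixed budget of ‘voice credits’, where casting n votes on a single issue costs n² credits, so expressing a strong preference gets progressively more expensive. A minimal sketch of that rule, assuming an equal budget per participant and treating over-budget ballots as invalid; the budget size, issue names and ballots are hypothetical.

```python
def vote_cost(votes: int) -> int:
    """Casting n votes (for or against) on one issue costs n squared credits."""
    return votes * votes


def within_budget(ballot: dict[str, int], budget: int) -> bool:
    """Check a voter's allocation of votes across issues against their credit budget."""
    return sum(vote_cost(abs(v)) for v in ballot.values()) <= budget


def tally(ballots: list[dict[str, int]], budget: int) -> dict[str, int]:
    """Sum votes per issue, skipping ballots that exceed the budget (an assumption of this sketch)."""
    totals: dict[str, int] = {}
    for ballot in ballots:
        if not within_budget(ballot, budget):
            continue
        for issue, v in ballot.items():
            totals[issue] = totals.get(issue, 0) + v
    return totals


# Hypothetical ballots with a 16-credit budget each
ballots = [
    {"park": 4},                 # 16 credits: one strong preference
    {"park": -2, "library": 2},  # 4 + 4 = 8 credits: two moderate preferences
    {"library": 5},              # 25 credits: over budget, discarded
]
print(tally(ballots, budget=16))  # {'park': 2, 'library': 2}
```

Note that the sketch assumes every participant starts with the same budget; if credits were instead bought with tokens, the quadratic cost would soften, but not remove, the advantage of larger holdings.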

The second direction is the use of cryptographic approaches to improve more traditional democratic systems, such as models enabling digital voting. A recent OECD report on emerging technologies for civic participation includes a case study of Vochain, a digital ballot platform that’s been used for non-binding referenda at the municipal level, and in voting within voluntary associations. However, as critics of digital voting have long noted, replacing legible processes backed by public authority with processes that demand citizen trust in complex and, to most, illegible code may have a long path to travel before it commands trust at scale.

The third path is the use of decentralised/crypto approaches to address emergent problems of our digital public sphere, such as misinformation, or asserting identity and credentials in the digital world. Here we find perhaps the most promising elements of a crypto offer to democracy communities: holding out the possibility of checks-and-balances when governments seek to expand their powers (such as in the creation of national identity schemes) by proposing technical architectures that enforce the distribution of governance power. We saw some of this in debates over whether COVID-19 applications should rely on decentralised cryptographic models, or on centralised government systems. Ultimately, the realisation of such models relies on governments participating in schemes that constrain their power and involve trade-offs against other policy objectives (e.g. fraud prevention). To date, we’ve seen limited willingness to do this, and little advocacy from democracy communities arguing that the legitimacy of governments extending their reach into digital identity (for example) should rest on having new forms of cryptographically enforceable controls.

At the end of this brief survey, I’m left more or less in agreement with the OECD analysis that “Blockchain has yet to demonstrate substantial real-world impact in the context of participation”, and with interviewees I’ve spoken to who argue that some of the inherent market ideology and logic of blockchain thinking is hard to get away from. It’s not clear that many of the Blockchain for Good projects that exist are addressing the right problems, or the right part of problems. Whilst the possibility of bootstrapping trust through crypto may have value in environments (e.g. post-conflict states) where trust and economic systems have collapsed, in practice it may act as a shortcut to the wrong destination.

A-Idealists

So, if I’m more-or-less dismissing crypto, what of the other part of Blair’s question, about the need “to better support coordination of, narratives around and amplification of the positive pieces of the tech ecosystem”, including work with AI amongst other technologies?

Drawing on a social contract analysis framework, and a focus on the alignment between citizen expectations and the outcomes produced by states, much of the potential of emerging technologies may turn out to be on the delivery and outcome side of the equation: offering opportunities for states to deliver services in new, more efficient and more tailored ways, increasing satisfaction with democratic governance. However, automation of delivery comes with trade-offs, with minority needs and rights often the first casualties. Substantial critiques exist of inherent biases in AI-driven automation of public goods, and of the extent to which current arrangements around AI involve greater surveillance, datafication and corporate capture of state services.

This presents a challenge for democracy communities: do we amplify a narrative about AI-driven public service delivery – potentially over civil society critiques of the limitations of these emerging technologies? Do we focus attention on the need to govern, or build alternatives to, big tech led delivery? Or do we focus more on narratives around technology at the citizen-state interface?

In this latter space, there are a few particular roles of emerging technology worth considering.

Firstly, the role of technology in safeguarding or building an informed public sphere. Ironically, the focus of many projects here may be on undoing the damage done by past waves of technology. Despite the belief, shared by many at the time, that the Internet and social media might usher in a more inclusive, and global, public sphere – supporting democratic dialogue and debate – many current assessments would point to algorithmically-driven social media as a driver of disinformation, division and at least part of our current democratic crisis. Groups like Full Fact have been exploring the potential of AI to scale fact-checking practices, and a 2024 systematic review found that AI systems with human oversight could be effective in tackling misinformation.

When it comes to more proactively building an informed public, there are some interesting experiments going on with using LLMs to increase the accessibility of existing democratic processes (see for example the AI-driven summaries of UK local council meetings at Open Council Network), as well as to inform citizens taking part in deliberative democracy fora such as citizens’ assemblies. Projects coming out of Google DeepMind, such as the Habermas Machine, hold out a promise of machine-facilitated dialogue, although in practice they are very early-stage experiments.

Secondly, we can look at the role of technology in facilitating ‘listening at scale’. Ethan Zuckerman points out that “listening at scale” is “one of the hardest problems of democracy since its inception”, noting that:

From listening to those voices in the agora, to thousands of citizens crashing the Congressional phone system, ensuring that every voice in a democracy is heard has been an unsolved problem. Many of the systems we associate with democracies – voting, polling, petitions, the structure of representation itself – are technologies designed to enable listening at scale.

Aggregating inputs at scale has long been a design goal of many e-democracy projects, but Large Language Models are offering a new set of tools with some capability for summarising, categorising and sense-making across large volumes of content from citizens. Government-led projects to streamline consultation analysis, and start-ups providing tooling for deliberative dialogues all look to AI to speed up, and potentially deepen, citizen engagement in shaping or making policy – and the legibility of citizen inputs to policy makers.

Thirdly, there is work on the role of technology in bridging perspectives and building consensus. This is where we find the relatively high-profile work of vTaiwan, ably represented globally by Audrey Tang, using tools such as pol.is to build forms of civic engagement that prioritise bridging between different interests, rather than polarising into conflicting positions. Tang reports that digitally-supported citizen participation in Taiwan boosted levels of civic trust from 9% to 70%. This narrative, linking structured, at-scale and digitally mediated citizen participation to strengthening trust, is perhaps one of the most powerful at play today: although digging into both the theory and practice behind the Taiwan experience suggests that it is as much about leadership, values and organisation – as about technological fixes.
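One way to picture ‘prioritising bridging’ is to rank statements not by overall agreement but by the agreement they achieve in the least-convinced opinion group. The sketch below is a simplified illustration of that idea, not pol.is’s own calculation; the groups, statements and votes are hypothetical, with +1 for agree, -1 for disagree and 0 for pass.

```python
def agreement_rate(votes: list[int]) -> float:
    """Share of agree votes among agree/disagree votes; passes (0) are ignored."""
    cast = [v for v in votes if v != 0]
    return sum(1 for v in cast if v == 1) / len(cast) if cast else 0.0


def bridging_score(votes_by_group: dict[str, list[int]]) -> float:
    """A statement scores highly only if every group tends to agree with it."""
    return min(agreement_rate(v) for v in votes_by_group.values())


# Hypothetical votes from two opinion groups on two statements
statements = {
    "Fund more bus routes": {"group_a": [1, 1, 1, -1], "group_b": [1, 1, -1, 0]},
    "Ban cars from the centre": {"group_a": [1, 1, 1, 1, 1, 1], "group_b": [-1, -1, 1]},
}
ranked = sorted(statements, key=lambda s: bridging_score(statements[s]), reverse=True)
print(ranked)
```

Here the car ban has more agree votes overall (7 of 9, against 5 of 7 for the bus proposal), but it is strongly rejected by one group, so the more broadly shared statement ranks first.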

Governing (with) AI

As I set out to write this reflection, I was hoping I might find more to be excited about in emerging technologies for democratic renewal: more bits of the tech ecosystem to amplify. I’ll still be searching for those in the ongoing work for the research projects I mentioned above. But I’m increasingly brought back to reflecting that the need is not to seek out ‘at scale’ tech-fixes for democratic crisis, but to recognise the prevalent power of tech (and big tech firms) within our democracies, and to focus some of our attention on governing this power as a means to strengthen citizen feelings of control over their lives.

Here, perhaps, the work of groups like the Public AI Network is offering one positive narrative to amplify: of collectively resourced and public interest-based AI infrastructures. Growing movements for community-level AI practice, cutting out big-tech platforms and building capacity to use the power and potential of AI tools in citizen groups and civil society, also feel like promising territory.

When it comes to accessing the kinds of venture funding for democratic innovation that an initial turn to ‘crypto bros’ may have been seeking, there are perhaps lessons from the open data movement, where it was not the largest incumbents, but finance seeking to disrupt incumbent industry players, that was aligned with reform.

Returning to the question

Looking back at Blair’s post that sparked off these reflections, the answers to where tech narratives fit into protecting democracy may in fact be found in many of the earlier bullet points. Rather than leading with tech narratives, we should show how technologies can be shaped to support action on corruption, outcome delivery, communicating the poetry, engaging wider groups shaping policy, and driving better coordination between progressive regimes. I’m leaving these reflections with a slightly different question, of how we can better bring those seeking to develop new narratives of technology closer to those working on narratives on safeguarding and charting a future for democracy.

Exploring future UK open government actions on digital governance (Nairobi workshop)

I’ve been spending this week at a workshop on governing new and emerging digital technologies, organised by the Open Government Partnership, in Nairobi, Kenya.

Over the first 1.5 days we had the great privilege to hear from civil society and government presenters from across 11 countries about current digital governance actions, agendas and challenges. Then our focus turned to prospective future open government and digital governance actions, working in country clusters.

The current UK Open Government National Action Plan, which runs from 2024 to 2025, notes that while “there are currently no commitments on climate change or digital governance … we look forward to pursuing these in the next plan”. Below I’ve tried to capture some of the potential themes for UK focus that came up in the discussion. These are shared (in no official capacity, simply as an independent civil society attendee at the workshop) not as fully-formed proposals, but as rough outlines that could be explored further, and discussed to see whether any deserve a little more colouring in.

An Open Government mission in a renewed Roadmap for Digital and Data

With the current UK Roadmap for Digital and Data (based around six ‘missions’ for government digital) also running until the end of 2025, the UK Open Government process could provide an opportunity to feed public priorities into the roadmap, or call for a mission that better embeds open government values of Transparency, Participation and Accountability into the roadmap.

Renewing the open data agenda

Open data was a big feature of the first four UK National Action Plans, but a lot of momentum has been lost. At the same time, as Renata Ávila and I recently argued in the conclusion to the revised edition of The State of Open Data, the creation of critical open data infrastructures remains as important as ever.

Building on thinking about a potential ‘Fourth Wave of Open Data’ (GovLab), government progress on developing an internal data marketplace, as well as design work to rethink the potential role of open data portals, there is both a need and opportunity to convene conversations around how to refresh, refine and renew an open data agenda in the UK.

Embedding Transparency, Participation and Accountability in AI Governance

We face not so much a shortage of evidence on public attitudes to AI in general, or on public perspectives about particular applications of AI in public services, as a gulf between the places where public engagement is happening and the places where the promotion and governance of AI are taking place.

For example, the introduction to the work of the AI Safety Institute mentions public input just once, and while the ai.gov.uk incubator promotes a recent demonstrator of AI for consultation analysis, it makes no mention of recent public deliberations (commissioned by DfT) exploring public perspectives on the use of AI in consultations and correspondence.

Some of the recent feedback I’ve heard on the Perspectives on the AI Fringe report has highlighted the value that having a public perspectives chapter (from the People’s Panel on AI) has had. Building on this – could we not be asking that the reports of the AI Safety Institute should include a public perspectives chapter, and that central government experiments with AI should demonstrate how they have either built on existing relevant public engagement, or carried out direct engagement with affected communities during the innovation process?

Other areas for action

Drawing on both our country discussions, and listening to action plan ideas from other countries at the workshop, there are a number of other ideas that could be in the mix (though I’ve not worked up these more than a bullet point right now):

Improving public input into AI procurement processes at the national level, and/or providing frameworks and support for greater participatory practice around AI procurement in local public services.

Adopting a thematic approach, focussing for example on actions that explore transparency, participation and accountability around EdTech, or on the intersection between digital governance and climate change.

Where to go from here?

One of the OGP speakers earlier today reflected on the journey towards strong actions and commitments: noting the need to build civil society and government coalitions around particular actions, and to find the champions. It’s notable that the UK Roadmap for Digital and Data adopts a mission-oriented approach that names responsible government stakeholders.

One of the key challenges then for the multi-stakeholder groups around open government in the UK is not to wait for the next national action plan cycle to start, but to be thinking now (albeit accepting possible temporary election distractions…) about building conversations and coalitions that might own and advance improved digital governance commitments in future.

Reflections on two reunions

[Cross-posted from my Connected by Data weeknotes]

Are we capable of governing ourselves?

Asking this of the emerging online world was one of the driving questions that led Charlie Nesson (and colleagues) to establish the Berkman-Klein Center (BKC) for Internet and Society at Harvard University 25 years ago. It was a question that was posed again on Thursday morning to the gathered community of current and former fellows, affiliates and faculty in Cambridge, MA to celebrate the 25th anniversary of BKC.

While we gathered in celebration, and with much joy at reconnecting with old colleagues and meeting new ones, we also gathered, I hazard to say, with a degree of weariness carried in from our respective work. For a community that has, collectively, been studying, building, litigating, organising and advocating in the hope that we might develop, deploy and govern technologies for the public good, 2023 offers a tough reality check.

Online discourse feels more fractious. Digital power more concentrated. Our climate in crisis. And our diagnosis of priorities for action less unified. Openness, conventionally the go-to tool of the place that gave birth to Creative Commons, feels less effective, and even counterproductive, in confronting and challenging the power dynamics at play. And as much as BKC has operated for over two decades as an impressive institutional hack, seeking to share out some of the privilege embodied in Harvard to unusual suspects, the institutional dynamics and legacy relationships can at times feel in the way of, rather than in service of, transformational scholarship and action.

I could perhaps tell a similar story of the start of my week, spent with open government advocates in Tallinn as we hosted a fringe workshop on the sidelines of the Open Government Partnership Summit. Although I didn’t get the chance to stay for the full OGP Summit, I had the opportunity to reconnect with old colleagues, and meet some new ones. Amongst the energy from meeting together, we had little illusion: the stakes, and the challenges, have rarely been so big. And there was ample recognition that ad-hoc tools of openness alone do not deliver the kinds of accountability, reallocations of power and social justice interventions so desperately needed.

Are we capable of governing ourselves?

Perhaps first we have to (re-)learn how to facilitate…

Our workshop venue in Tallinn was a little unconventional. More often used as a yoga studio than as a meeting space, the room had a mix of armchairs, sofas and futons in place of the usual tables and chairs. As I was starting to unpack flip-charts and post-it notes, Veronica Cretu, who had kindly arrived early to help us set up, took one look at the arrangement, with two rows of chairs and obstructed sight-lines, and set about re-arranging it to help our conversations flow. Into this re-arranged room we brought findings from background interviews, expressed, thanks to the insight of Helena Hollis, through large-printed mind-maps that invited discussion, addition and elaboration. And we tried to structure the day to flow from building a shared understanding of a problem, to considering solutions, and sketching potential actions. By the end of the day we had sketched out some promising policy proposals.

In Cambridge, within the imposing setting of Harvard Law School, the BKC team demonstrated characteristic thoughtful action to disrupt the formality, and bring the whimsy and warmth for which the center has a quiet reputation (a lot more costumes than I see at most academic gatherings for starters…). But as the programme itself progressed through a series of conventional panel sessions, I started to wonder if we were missing the critical role of facilitation in fully unlocking the wisdom, ideas and energy of the assembled group? And why?

This year is also ten years since my own fellowship year at BKC, and so the ‘class of 2013’ were well represented, and were recalling our months of weekly ‘fellows hour’ (2 hours duration), working groups and collaborations. I was reminded by Amy Johnson that one of the particular interventions we made as a group was to create a small ‘facilitation team’ that would hold a weekly standing meeting to help the rotating host of the next week’s fellows hour think of a creative and engaging way to lead their topic (not for us a weekly seminar: think instead hands-on workshops, learning games and curated discussions). There was an important recognition here of content presentation and facilitation as distinct roles, and of chances to gather not just as a moment for a transfer of thinking, but as generative moments of deeper exchange.

Over drinks in Tallinn, I had the chance to briefly reflect with Alex Howard on OGP Summits past. One notable feature of early summits was the national or regional sessions: slots on the agenda to share what had made it onto the open government National Action Plans of different states, and, crucially, where governments and civil society shared the room and stage in talking about them. These have dropped from the agenda in recent years, and with them went a critical moment around which to structure other conversations in the run up to, and follow up from, a summit. Formal panels have their place.

For many years, BKC had an active unmoderated discussion e-mail list, linking current and former fellows, affiliates and faculty. In the last few years, the list was paused, after a number of heated discussions and clashes, and it has not (yet?) returned. I don’t think it’s an exaggeration to say that, at least for those not in residence in Cambridge, the listserv felt like the heart of the BKC community – albeit not a perfect forum. Yet, its role was not addressed from the stage over two days of discussion about the past, future and present of the Center, and while I heard some side discussions exploring what kinds of moderation a renewed list might need, I heard little on the question of facilitation: an active and justice oriented process to build conversations across community.

Effective facilitation frequently also involves thinking about the who as well as the how of discussion. Part of this is about boundaries and curation. The Open Government Partnership has a set of criteria for which countries can, or cannot, be formal members, although civil society from non-members are not excluded. One of the reasons I suspect a self-organised listserv has not emerged in place of BKC’s discussion list is that the gatekeeping function of a BKC affiliation helps draw the boundaries of an official list, in a way that a self-organised list could not easily achieve. Equally, facilitation also needs to consider invitation, space-making, and sharing of power. In our workshop in Tallinn, we tried to get a balance of countries, sectors and disciplines represented, while looking for enough shared interest and background to enable productive conversation.

Are we capable of governing ourselves?

The BKC reunion did not just start with the question. The opening panel also pointed towards one possible answer. For too long, it was argued, we’ve been looking at how to build governance top-down. Instead, one panellist urged, we need to start from the bottom up. I heard this as an educator’s answer: to start from cultivating the virtue and capability of the individual student, and to build out from this to collectives, organisations and states that can be self-governing. It is a good answer. Yet, bringing in the lens from our work at Connected by Data, I wonder if we also need to start from community and collective. When we focus on governing from the group-up, then the facilitation of sense-making and position-taking both within, and between, groups becomes central to the question.

I suspect my own approach remains deeply informed by early engagement with informal education, youth work and group work. We learn in groups. Not open, unbounded groups. But intentional groups, with the right mix of support and challenge. The groups we belong to also confer privileges, interests, burdens and oppressions: and the extra labour that some collectives and individuals face in joining dialogue, particularly racialised or majority world citizens, needs recognition[1].

This is something I’ll be reflecting on more as we come to think about resources for deliberative engagement on data and AI governance: thinking about how inputs for dialogue might be received (and responded to) differently in groups brought together by sortition, or by those connected through solidarity and shared experience.

Are we capable of governing ourselves?

Given the challenges and crises we face as both local and global communities, we can only keep trying to work that out.

[1] I must acknowledge here my own imperfect journey with being attentive enough to the impacts of privilege in my work and practice, or being clear and direct enough as an ally of communities facing inequity and injustice.

Participation and the swimming pool problem…

[An occasional weeknotes cross-post from Connected by Data]

At a school governors meeting last week, I was reminded of what, when I first got involved in youth council participation in the late 1990s, we used to talk of as ‘The swimming pool problem’. The conversation that triggered this memory went something like this:

Governor 1: “We need to do something about Parent and Pupil Voice.”

Governor 2: “Well, we often ask the kids what they want in the school. But they always come back asking for a swimming pool: and that’s just impossible.* So not sure it’s worth us spending much time on pupil voice right now.”

*There is literally no space on the school site for a pool, and perhaps more importantly, it would take at least 1000x the available budget.

The idea that participatory practice isn’t worth pursuing because, when asked the open question ‘what do you want to see?’, the people consulted come back with an unrealistic suggestion, is one we often had to fight against as youth councillors. Over time, we learnt that instead of responding to “But we can’t build a swimming pool in every park!” with “Why not?”, we needed to ask “What did you tell the group about your actual budget? Did you share information about what you do have the power to change?”.

This points to a challenge at the heart of any participatory practice: designing processes that are open enough to allow participants to express views rooted in their authentic experience and interests and that are constrained enough to focus discussions on decisions that can be made, and that give real power and influence to participants.

This theme has come up in three different pieces of work this week.

Firstly, in this write-up of my observations on the NHS AI Lab Public Dialogue on Data Stewardship I discuss an example of public dialogue work that sought to equip participants with background information on a complex topic (use of AI to analyse imaging datasets), and then to scaffold a meaningful discussion about models of data governance to be applied to this.

Secondly, in our evaluation of a deliberative engagement exercise commissioned by Justice Lab (the first Justice Data Matters report) we look at the challenges of supporting a diverse group of members of the public to engage with details of how access to machine-readable data from court records should be governed. In particular, we highlight the value of background materials that can provide shared reference points to enable ‘experts’ and ‘non-experts’ to talk effectively about key concepts like open justice, or kinds of court data use.

Lastly, in this write-up of the ‘Discovery’ workshop we held to inform Joseph Rowntree Foundation’s work on developing an insight infrastructure, we talk about how we used a set of example websites (selected based on prior interviews and survey responses) as the anchor for a discussion about ‘what works’ in provision of insight infrastructure. The JRF team have been keen to avoid imposing too strong a notion of what an insight infrastructure might be at the start of the engagement process, conscious of their power as a funder to (intentionally or not) steer discussions in ways that might prematurely close down important avenues of exploration. However, given the term insight infrastructure is under-defined, we also needed starting points concrete enough to allow comments and ideas raised in the workshop to speak to the kinds of programmes or activities JRF might develop.

In reflecting on the development of public dialogue and deliberative workshop approaches over the last few decades, it is good to see that a lot of participation has moved on from simply asking (and then dismissing the answer to) the question: “So what do you want?”. However, I’m also left observing that the seeming ‘intangibility’ of so many data questions, and the way they are often interrelated with other complex questions (open justice; health economics; the politics of poverty etc.), means that developing materials and methods that will enable both inclusive and powerful citizen voice on data governance is an ongoing challenge.

In other news

Tickets are now available for Gloucestershire Data Day on 26th April – the event I’ve been co-organising with Create Gloucestershire, Active Gloucestershire and Barnwood Trust. It’s shaping up into a great agenda to mix practical and critical conversations on the role of data in community action.

Plus, we’ve got a lovely logo designed by the fantastic Joe Magee, who is more usually found creating films and backdrops for Bill Bailey tours: setting the creative bar high for the day.

Data governance in the everyday: beyond big platform conversations

[Summary: another occasional cross-post of my Connected by Data weeknotes]

I work three days a week for Connected by Data. Outside that, as well as parenting two active under 10s, over the last few years I’ve been trying to get more involved in my local community, whether supporting democratic engagement by delivering leaflets or wrangling data for Stroud District Green Party, helping out as a Parent, Teacher & Friends Association (PTFA) member and parent-governor at my child’s school, or joining the board of Create Gloucestershire, a county-wide non-profit with a mission to expand access to arts, culture and creativity.

In the last few months I’ve been struck by how often data governance issues have been coming up in these roles – and how rarely it has been possible to resolve those issues simply with the conceptual tools to hand in the form of GDPR, or a data protection policy.

In work time, a lot of the conversations I encounter about rethinking data governance focus on relatively large-scale interventions: like establishing new institutional forms (data trusts, co-ops etc), or changing policy to better regulate big tech. However, the idea that we are ‘connected by data’ can perhaps also apply very productively to everyday data governance.

To look at just two examples:

Class WhatsApp groups

My phone regularly pings with alerts from the WhatsApp group set up by parents of other children in my son’s class. Most classes at the school have a group like this. These informal, unofficial groups provide a stream of information and interaction: from questions about PE day or school trips, to confirming the week’s homework spellings, and sharing news of events.

When I was at primary school 30 years ago, this information might have been flowing as parents waited in the playground for school pickup. But today, whether because we’re all still standing a little further apart after COVID lockdowns, or because changing family structures and after-school clubs mean there isn’t a common cohort of parents that meet each afternoon, the natter networks are somewhat broken, and platform-mediated WhatsApp groups are filling that gap.

This raises some challenges. Last week I got an e-mail from ClassList, the school-based social network, highlighting a legal opinion they commissioned that suggests school-based use of WhatsApp groups may not comply with GDPR. Amongst the concerns they raise are the fact that joining a WhatsApp group shares members’ phone numbers, and that some families might be excluded if groups are the only formal route for sharing information.

Invoking GDPR might be a good marketing strategy to encourage risk-averse schools to adopt a platform that promises to ease compliance – but it converts a set of questions about how best to support communication and connection between families, into one about data controllers, notice, consent, and individual privacy controls. And in practice, questions of inclusive communication, and understanding the needs of different families, are likely to remain unaddressed.

Instead, I wonder how we might create light-weight models for conversations that allow the ad-hoc collectives convened, for example, in a WhatsApp group, to explore the norms and behaviours they want to jointly operate by, and the data governance (small d, small g) implications of those choices. For example, a set of conversation prompts might cover:

What do we want this group to be here for?
What impact does the information shared here have on others? On teachers? On students?
What is it ok, and not-ok to share in this space?
Should we move to another platform (e.g. Signal) that has more privacy-preserving features?

I’m not sure how starting a conversation like this would be received – and what other resources (e.g. background explainers etc.) might be needed to support a meaningful conversation. But it is the kind of discussion of platform data governance in the everyday that I think we need to be having alongside the big picture work to secure better platform defaults. Perhaps a bit of action research is required.

Catalysing creativity

On Wednesday I had a meeting of the board working group on data for Create Gloucestershire (CG). The group was set-up to support CG, as a small non-profit infrastructure organisation, to make better internal use of data and to catalyse good data practice amongst partners. GDPR was on our agenda this week, triggered by a need to work through the NHS Data Security and Protection Toolkit as part of new work with the NHS.

However, it quickly became clear that the conversation was not just about protection of personal data. Instead, it was also about a sense of data extractivism, and the feeling that voluntary sector organisations risk ending up in the middle of processes that capture data from communities, but that don’t provide insights or identified benefits in return. And it was about making sure data practices were the right-fit, applying strong protections to sensitive data, but not inhibiting sharing of community-level insights, or co-operative working on non-personal data. I was struck that, although the CG team introduced the item with questions about GDPR compliance, the language used to talk about what practices to encourage or require from partners, was much more a language of community, capacity building and collective responsibility.

The conversation ended with the idea for a local data unconference, to create a space that could both share practical data security and data management skills and give practitioners greater confidence in handling data at the individual level, at the same time as building a stronger collective voice amongst voluntary sector organisations to talk about how data collection and sharing could work for them.

Just as the individuals whose lives are captured in the same dataset, or whose choices are shaped by its analysis, are connected by data, so too are organisations reporting to the same funders, or operating in policy landscapes governed by the same centralised metrics. If these organisations can find common voice, then there may be opportunities to shape more equitable data infrastructures that more effectively deliver the public good.

I took the idea of this unconference to Jeni and Jonathan at our regular check-in on Thursday, and they liked it. So I’m hoping to work with the CG team in the next few weeks to develop the idea further. If you might be interested in collaborating too – as a co-host, sponsor or attendee of a data-focussed day-long unconference in Gloucestershire, do drop me a line!

Weeknotes – 22nd July 2022

[Cross-posted from Connected by Data blog]

Well, as Jonathan said but two weeks ago, a week’s a long time… Just as we thought ministerial mayhem might mean we had a bit longer before the ‘Data Reform Bill’ (DRB) would be out, on Monday this week the ‘Data Protection and Digital Information Bill’ (DPDIB) dropped, revealing not only the new name, but the scope, for the DRB. We’ve got a team retreat next week where we’ll be digging into the detail of the Connected by Data response, but suffice to say that, right now, collective impacts and public voice do not feature as strongly as we think they could and should.

As I skipped writing up weeknotes last week, there are a couple of different themes to reflect on this time around, and lots of assorted extra bits.

Digging into dialogue

One of the big challenges in seeking to embed participatory mechanisms for data governance into legislation, is that there is a big risk of creating yet-another-tick-box and ending up with low quality compliance-oriented engagement, rather than transformative forms of participation.

Over the last three weeks I’ve been an observer of the NHS AI Lab Public Dialogue on data stewardship: a process involving around 50 members of the public meeting for 12 hours (across four sessions) to share their ‘thoughts, aspirations, hopes and concerns’ about how access to healthcare data for AI purposes should be managed. I’ve got a full write up in the works, but it’s been a really interesting opportunity to watch a ‘dialogue on dialogue’ as members of the public explored different models for public engagement in governing access to health data.

I’ve also been trying to read up more on the history of public dialogue, as our expression of interest in partnership with OpenSAFELY to the RSA Rethinking Public Dialogue fund has made it through to the second round of bidding. Here’s the one paragraph summary of what we’re trying to develop:

“Connected By Data and OpenSAFELY will collaboratively develop a protocol for ‘dialogue on demand’: agile and inclusive mini-dialogues on data governance and research design decisions that are developed based on bottom-up input from affected groups, and that feed into both iterative data governance process refinement, and into focussed operational decision making.”

Plus, I had the opportunity last week to sit in on a training session delivered by Simon Burral of Involve for the Data Trusts Initiative on governance and engagement design.

All this has been really useful for starting to think about the different factors that might help deliver the ‘powerful say’ for data-affected communities that we’re calling for. For now, I’ve captured this as an opinionated statement on what meaningful and effective participation looks like:

Generally the more concrete the issue or situation that discussion can focus on, and the more ‘moving parts’ of that issue/situation that can be made legible to participants, the more meaningful the discussion is likely to be. And the more that points made in a discussion can be grounded in relatable lived experiences, the more powerful the messages from a discussion are likely to be.

Sector specifics

Over the last fortnight Jonathan and I have been round a few loops of trying to articulate simple (hypothetical) stories of how current data practices affect real people in our target sectors (debt; housing; education). It’s proven (surprisingly?) challenging to articulate the narratives for debt in short prose, I think for a number of reasons:

I’ve been trying to focus on present problems rather than future fears. Reports like the fantastically useful Governing data and artificial intelligence for all: Models for sustainable and just data governance arguably have an easier job of it by looking primarily at (reasonably) imagined future AI harms, rather than quantified current harms.
In many cases, I’ve been finding that present problems are covered by regulation in some form, even if the data component of the problem has limited governance. For example, we started looking at targeted loan advertising, but found that industry self-regulation has led to voluntary action not to take adverts from payday lenders. This limited governance at the application layer doesn’t remove the issue that data is collected, pooled and shared that could be used to target people with risky financial products, although it means that right now this harm isn’t generally observed.
There are multiple stages, and multiple actors, in any story of how people are Connected by Data. Where I started by trying to present stories of single named individuals, I’ve now been experimenting with sketching scenarios with visual representations of the data flows that connect people, and that raise data governance questions.
The data problems are often indirect. Jonathan did some great work developing problem trees for debt and data; demonstrating that there are a couple of steps between the abuse of data, and the ‘crunch’ of relatable harms. Those ‘harms’ are even trickier to land when they are the absence of actions (e.g. missing data-supported provision of support to someone in debt).

It’s feeling like (a) it might be a few more iterations before we land really clear example stories for each sector; and (b) we might be discovering some of the challenges with getting robust stories to land in the debt sector specifically. I’ll be looking next week at whether this means we should revisit some of our focus sector selection.

Reading and reflections

Critical data studies and outsider action

I skimmed through a fantastic new Critical Data Studies Reading List from Frances Corry and colleagues that is focussed on papers that critically explore the data pipeline for machine learning. I’ve found lots of background reading to add to my own backlog of papers to hopefully get to reading over this summer, but was also looking out for any papers that might hint towards a participatory response to the many problems in the AI data pipeline. The few that did jump out could be said to take, more or less, an outsider advocacy approach: representing voices and perspectives from populations affected or harmed by the data choices of an AI system to highlight that they were not considered in the initial dataset selection or design. Such advocacy has led, in a number of cases, to significant AI training datasets being withdrawn or substantially modified.

I’ve been reflecting on how such independent and outsider activism is a key part of a spectrum of participation: able to set the agenda for discussions in more formalised participative spaces, and to hold those spaces to account for their outcomes, to provide a check on corporate capture of participatory processes. There’s more to think about here, but it’s also worth explicitly noting that more formalised participation of groups affected by AI systems in dataset governance did not appear (at a read of titles / abstracts) to be part of the repertoire of solutions being put forward by researchers in the ML community covered by this particular reading list.

Legislating data loyalty

This interesting new paper on Legislating Data Loyalty has a lot of resonances with ideas around Connected by Data, framing data loyalty as made up of three key components: “a (1) relational duty; (2) that prohibits self-dealing (3) at the expense of a trusting party”

The concept felt slightly limited by restricting the duty of loyalty from a firm solely to those whose data they collect (data subjects) rather than those affected by the data (data stakeholders), but it makes for an interesting and challenging re-articulation of privacy law, with a focus on US privacy law debates.

Where the paper gets into the details of implementation (p 374) it explores an approach to dealing with “inevitable conflicts between [the interest of] trusting parties”, by proposing that firms have reference to the “collective best interests of trusting parties” although how this is to be determined is not explored. From a Connected by Data perspective, we might suggest that one way a firm can establish that it has sought to understand collective interests is through some form of robust independent dialogue with a broad cross-section of its ‘trusting parties’.

AI in the City: Building Civic Engagement & Public Trust

Lots of interesting short essays in this colloquium collection from Ana Brandusescu and Jess Reia, including points on the importance of power, doubt, open processes and voices of the marginalised when governing the introduction of technology into the urban setting.

Other things from the fortnight

Filed under ‘listing out stuff mainly so I don’t forget it’, and just in case it sparks a useful connection anywhere…

I did some work this week on potential metrics and measurement tools aligned with our theory of change, looking at the outputs, outcomes and impacts we might want to track.
I shared some feedback with colleagues at Research ICT Africa for a paper looking at African perspectives on data trusts and other collective data governance mechanisms.
I’ve reviewed a couple of papers for Data & Policy
I took part in an Institute for Government roundtable discussion on Data sharing during the pandemic and the confusingly named GPDPR (which, it turns out, is just the start of the confusion…)
The Data Values Project has published their final white paper on Reimagining Data and Power which we shared some input on earlier in the year. It also includes a good example of showing how consultation has been taken into account with a table of changes in response to feedback.
Our proposal for a session on Collective Data Governance at the 2022 Internet Governance Forum was unfortunately not selected. I’ll be looking for other venues where we might take this conversation forward.

Weeknotes – July 8th 2022

[Cross-posted from Connected by Data blog]

A bumper two-week weeknotes today, as I was travelling last week (delightful Rail+Sail journey over to the Netherlands for a workshop with The Land Portal and Eurostar back for my first international trip in more than two years).

Researching collective narratives

The first thing to share is that I’ve just posted a call for a contract researcher (or team) to help us map out existing cases and media stories that talk about the impacts of data with a collective lens. We’re looking for an individual or team who can work between August and October searching out relevant stories (in focus areas of health, housing, debt and education), and build a framework for analysing how they address the collective dimensions of data impact.

The idea for this broad piece of mapping work came from our team day on Monday, where we looked at our current plans to commission a series of stories that help land the point that data needs to be governed collectively, rather than solely through individual consents and controls. We identified the need to both track down existing stories that we might amplify, and to understand more of how a collective lens is currently being adopted in popular stories about where data is being used or abused to help or harm communities.

Building a more diverse network

We had some discussions over whether to just reach out to researchers we know for this project, or whether to run an open call. The deciding factor was that we have a better chance of reaching a more diverse network of potential researchers with an open call, so, drawing on the fantastic guide Gavin Freeguard developed for mySociety on commissioning research, we put together the full CfP and an application process.

We’ve set up the application form both for this particular opportunity, and to allow people to opt in to being part of a ‘research pool’ we could draw on in future, and we’ve included a question that can help us, when other factors are equal, to prioritise applications that help us use our position and privilege to increase the diversity of the data policy field.

Are you a member of a community that is under-represented in work on data, digital and AI in the UK and Europe, and if so, how?

We are asking this question because we are particularly keen to work with a diverse and inclusive network of partners. Please only provide details you are comfortable with sharing.

I’ve also, alongside the obligatory data processing consent statement, included an experimental ‘collective data governance’ question. After all, people will be taking time to submit their information to this form, and might have ideas for what more they would like to see done with it.

Collective Data Governance question in application form

I have no idea what this will generate, if anything: but it will be interesting to see if it triggers any interesting ideas and responses.

Narratives and frames

As background preparation for the CfP, I spent some time going through an interesting paper from Skurka, Niederdeppe and Winett called ‘There’s More to the Story: Both Individual and Collective Policy Narratives Can Increase Support for Community-Level Action’, which uses an experimental design to explore whether individualised or collective narratives about food deserts, and narratives framed using left (equity) or right (loyalty) based language, were more likely to elicit support for policy proposals based on a Social Determinants of Health (SDH) model. They present a detailed theoretical case for thinking through individual and collective storytelling, and mapping mechanisms such as identification, empathy, transportation, hostility and counterarguing that shape how an audience processes a story into policy support.

They highlight the concern that “telling stories about individual cases – even when emphasising system and policy-level solutions – may inadvertently reinforce beliefs about personal responsibility for health, thereby undermining public willingness to support community-level efforts to address factors in the environment.”, although their experimental evidence does not appear to bear out this concern.

They also outline the distinction between narrative frameworks (which include a setting, characters, plot and moral), and message framing (the particular aspects of a story that are given emphasis). Critically, this highlights that both narrative frameworks, and message framing, may vary in their approach to an individual vs. collective dichotomy. For example, it is possible to have a narrative centred on an individual, but where the framing draws attention to collective level issues, or it is possible to have a narrative story told at the level of the community, but that emphasises issues of individual responsibility or action.

Whilst we’ve left things fairly broad in the CfP, just giving examples of the kinds of stories we hope to find, and planning to iterate with the selected researcher on the exact approach to categorising stories, I’m anticipating we might draw on some of the approaches and learning from the Global Voices Civic Media Observatory, which has developed a workflow for sourcing and annotating media stories to uncover the different frames at play.

Dialogue, decisions and design

I put the finishing touches to our expression of interest for the RSA’s call on Rethinking Public Dialogue this week, developed with Jess Morley at OpenSAFELY. In a nutshell, our proposal is to explore a model of ‘dialogue on demand’: agile and inclusive mini-dialogues on data governance and research design decisions, developed from bottom-up input from affected groups, that feed both into iterative refinement of data governance processes and into focussed operational decision making.

Many of the public dialogues I’ve been looking at while building our participation cases database take the form of large-scale engagement activities with a broadly representative population, run over multiple weeks and months. Indeed, yesterday I spent my first three hours as an observer of a current NHS AI-focussed dialogue, run by Ipsos with the Open Data Institute, which is due to have three more three-hour sessions (12 hours of online dialogue in all). Our working hypothesis for the RSA proposal is that, while this kind of model may be good at establishing general principles for how data should be governed, “a streamlined protocol for responsive informed dialogue, shaped by bottom-up inputs, can provide a scalable model for public engagement to be applied to live data and research governance.”

Our proposal explores developing or adapting methods to map out the data flows involved in particular data-rich health research studies, and the potential outputs or outcomes of that research, and then using visual and text artefacts from this mapping to solicit initial input from people who might be affected by a particular study, the data it uses, or the issues it might raise. Where this surfaces communities affected by a given set of data governance decisions, we would then design shorter, focussed dialogues rooted in very concrete cases. We hypothesise that these will be more tangible than discussions about data sharing or governance ‘in general’, while still generating higher-level lessons for data governance practice.

Working out, in practice, how different publics can be engaged in data governance decisions is going to be really important to our work in the coming year, and is a piece of the work I’m particularly excited about.

We’ll hear more in the next few weeks about whether we can take this particular idea forward to a full proposal for the RSA, or whether we might need to find other ways to take it forward.

The intersection of data and AI governance

I’ve still got an outstanding task of trying to map where data and platform governance intersect, but this week I’ve been looking a bit more at how current work on data and AI governance might connect. Key to that was reading the new paper “Who Audits the Auditors? Recommendations from a field scan of the algorithmic auditing ecosystem” from Sasha Costanza-Chock, Inioluwa Deborah Raji and Joy Buolamwini (who, as an aside I must note, are among the most inspiring, thoughtful and engaged scholars and humans anyone could hope to learn from). It has a number of useful insights for our thinking about the potential to embed collective data governance into organisational practice.

In their interviews with ten leading algorithmic auditors, and a survey of more than 150 people connected to algorithmic audit, they find significant gaps in the involvement of affected stakeholders in the algorithmic audit process, with just 30% of auditors saying that they consider real-world harm to stakeholders when auditing algorithms, and only two providing examples of this. As a result, Sasha, Deborah and Joy recommend that “It should be a priority for regulators to ensure that audits include affected stakeholders, and for organisations to establish internal policy that promotes direct involvement of the stakeholders most likely to be harmed by AI systems.” They go on to argue that, whilst participatory practice can be messy, “Solutions should be informed by the existing field of participatory design, and by the growing community of design justice practitioners, and should be supported by a field-wide investment in strategies to meaningfully engage community partners and support community-led processes for algorithmic accountability.”

The paper also describes some of the challenges faced by internal (first-party) or contracted (second-party) teams involved in algorithmic audit, including organisational resistance to audit processes that might require changes to profitable practices, and restrictions on making audit findings public. This resonates with themes I’ve found in Waldman’s Industry Unbound about the way corporate structures can significantly inhibit the freedom of workers to insert public interests into private enterprise, and points to some of the significant challenges that efforts to embed collective and participatory models of data governance will face.

I’ve also got a few other FAccT papers on my reading list thanks to Catherine D’Ignazio’s fantastic thread that picks out a number of the key findings. In particular, as we explore the point in our Theory of Change (update on that coming soon) that addresses developing a community of practice, this piece on tech worker organising looks particularly important to consider.

Learning to govern

For the last two Monday evenings I’ve been doing mandatory online training as a new parent governor at my son’s school. In the UK, over a quarter of a million people volunteer as school governors, taking on a strategic and oversight role for finance, staffing and school development. The training, unsurprisingly, was heavy on running through all the processes and practical activities of governance: from making and writing up school observation visits, to plotting a calendar of policy reviews and a cycle of meetings for setting and tracking progress against improvement plans. We also spent some time exploring the different structures of governing boards depending on the type of school (local authority, foundation, multi-school trust etc.), and the different kinds of governor (some appointed by parents, others by the local authority or trust, others from the staff body etc.).

Of course, school governance is very well established, and models have more or less settled into place (albeit with constant government reforms leading to updates and changes). But reflecting on this day-to-day bit of the national governance infrastructure, and its strengths and weaknesses in practice (the break-out sessions offered a chance to hear from other governors about how well the theory presented in the training matches the reality in their schools), has me wondering what scale and structure collective data governance might take. Do we need thousands of people on standing structures, with robust training and development programmes in place, to govern our shared data infrastructures? Or is collective data governance most often going to be a ‘function’ that fits into existing governance structures? Or are there entirely new models that can take the best of new technical approaches, while remaining inclusive, accessible and accountable?

Perhaps most importantly, the training, and my recent conversations with other people who have experience of school governance, highlight that governance in practice is, of course, about people. Personalities, a group’s desire to ‘get on’, and a recognition of the need to support resource-constrained teams can all help governance work well while, at the same time, creating barriers to effective scrutiny and accountability.

Other things

Thanks to a kind invite from Asaf Lubin, I was on a Datasphere panel for the American Society of International Law last week, where our discussions touched on the interaction between agile regulation and public participation, and the need for data policy built on new narratives that understand the global and cross-boundary nature of contemporary data.
For a couple of freelance projects with organisations that have defined their strategies around open data, I’ve been trying to write about some of the big trends of the last decade that have been reframing openness. I hope to have that in blog post form soon.
We had a team meeting day in Reading, which is written up in other team members’ weeknotes, and ahead of which I spent some time digging into consultation responses to the Data Reform Bill (thanks to Peter Wells for this super helpful spreadsheet).
I’ve been working on updates to our sectoral scoping on debt, again hopefully with more to share soon.
I managed to follow most of the launch event for the Education Data Reality report from the Digital Futures Commission while on the train back from Amersfoort (thanks to the superb reliability of Dutch 4G). It was packed full of useful insights to feed into our scoping of work on education as a sector. In short, there are a lot of questions to ask about how education data is being gathered and used, without much good oversight right now. (Note to self: explore whether the school governing board thinks about this at all!)

Weeknotes – June 24th 2022

[Cross-posted from Connected by Data blog]

A couple of themes have run through this week that I’ve been trying to reflect on for these weeknotes: the first is the role of narratives and imagination, and the second is approaches to legislating around data protection and sharing.

Narratives and imagination

Over the last few weeks I’ve been reading Ari Ezra Waldman’s Industry Unbound, which provides an account of how even strong privacy advocates within the technology industry become co-opted into serving the goals of data-hungry corporations. This occurs through the reframing of privacy in terms of security, and the articulation of compliance regimes that sidestep substantive privacy issues and instead cast privacy narrowly in terms of transparency/notice and consent. Ari argues that the policy space for thinking about meaningful privacy practice has been intentionally eroded by corporate lobbying, and that space for meaningful privacy action within firms has been shut down by bureaucratic organisational practice, which means privacy practitioners inside firms are excluded from design decision-making, or downplay concerns to avoid being cut out of future discussions.

In the context of Data Reform Bill proposals to reduce the independence (or even existence) of Data Protection Officers, and to shift towards a more US-style framework of organisational ‘privacy programmes’, Industry Unbound feels like essential reading. I’m not all the way through yet, but I’m already taking away a deeper appreciation of the hard work we have ahead to make sure any policy proposals Connected by Data brings forward are, as far as possible, designed with potential patterns of corporate resistance in mind, and shaped to protect against the risk that they are simply translated into compliance checkboxes, with their force ultimately blunted.

This has got me thinking more about the importance of Connected by Data’s work on developing and embedding narratives that tap into a broader view of both what we mean by protecting data and what we mean by data sharing. It’s not enough to have policies that provide the ‘letter’ of participatory data governance if we’ve not also secured engagement with the ‘spirit’ of the proposals.

Right now, this feels like quite an uphill task. I was struck at the Living with Data panel I attended at the Data Power conference by how difficult it appeared to be for people to imagine collective control over data, and indeed, in many cases, to imagine control over data at all beyond outright resistance to data collection. In a week when the MyData 2022 conference has been taking place in Helsinki, essentially doubling down on models of individual data sovereignty that do little to disrupt narrow data discourses, we’ve been spending some time, led by Jonathan, on the Connected by Data brand narrative. Central to this is working out how to bring the problems of current data practice more clearly into view, and thinking about ways to support a clearer collective imagination of how community-centred data governance could transform things.

On my to-do list for the coming weeks is a blog post on ‘Questions to ask about data governance?’, to try and capture some critical tools that bring into relief the problems with the status quo (reliance on notice and consent; narrowing of both privacy and data sharing concepts; failures of transparency etc.) as a first step towards supporting exploration of alternatives. In preparing for the Open Future salon on Thursday, I also found it useful to look at the flow-chart of data governance processes they set out in their proposal for future Business to Government (B2G) data sharing in Europe, and to reflect on the kinds of participatory governance that might be possible at each level.

I found the State of Open Data panel I chaired on Wednesday was also a powerful reminder of the importance of ‘re-imagination’. Where I had anticipated that our discussions might get drawn into a focus on the deficits around open data and AI, the inputs from Renata Ávila, Jeni Tennison and Feng Gao all offered a number of points of hope around building more inclusive data futures, putting particular emphasis on cultures of openness, and the power of openness to support collaborative and imaginative problem solving. Rather than presenting a case to ‘go back’ to the open data of old, they each offered a view of an open data landscape that has become more nuanced, and that has, in practice, adapted to a much more complex landscape of data access and use, even while overarching narratives around an open/closed binary, and open licences, have not been wholly updated, at least at the global level. From this point, the session started to sketch out a way forward, building on the collaborative potential of open data: something also picked up in a blog post from Leigh Dodds this week. This session has me thinking about how to make sure the Connected by Data narrative is about the future of data governance, not about recapturing a lost (and fictional) past.

Legislating lists or processes

I noticed an interesting resonance between the two bits of proposed legislation I’ve had on my radar this week. Both the EU Data Act and the updated government proposals for the Data Reform Bill get into listing particular categories of data: respectively, data that might be covered by B2G data sharing, and data that firms could use without needing to carry out legitimate interest balancing tests. In both cases, the process of creating such lists to ‘bake into’ legislation is problematic. Either the legislation is inflexible or, if mechanisms are put in place (as proposed for the Data Reform Bill) for secondary legislation to add categories, there are significant concerns about inadequate scrutiny of new categories, and risks that corporate lobbying will be able to extend or limit data sharing and processing.

In general, then, there may be a case to be developed for setting out robust participatory processes that can sit in the place of legislated lists, or at least that can be embedded as part of the way in which lists may be extended (or indeed curtailed). I’ve more to explore on whether there is precedent in the governance innovation space for this kind of approach (ping me if you’ve got experience here and would be up for a chat!), and to work out some ideas more concretely. But it seems we should be making the case that legislation that embeds space for dialogue and participatory decision making is more likely to cope with the pace of technological change than legislation that tries, a priori, to identify all the boundaries between frictionless and barrier-encountering data use and sharing.

Other things

I had a catch-up with Michael Canares, where we talked a little about the Data Empowerment framework, which I need to dig into a bit more, particularly to explore the interaction of individual and collective empowerment around data.
After almost two months without touching a line of code, I worked up some Google Apps Script to get project-classified data from our accounting tool, FreeAgent, into a Google Spreadsheet to help with our financial tracking and reporting. There’s still some tidying up to do, and then I’ll try and share a version.
As yet, I’ve not had any responses to the e-mails I sent last week asking for details of company balancing tests.

Next week I’m off to Amsterdam (Monday) and Utrecht (Tuesday) for a bit of freelance work supporting Land Portal with their data strategy. On the off-chance, I should have a bit of free time both days if any Netherlands-based collective data governance folk fancy catching up for a chat. Let me know!

Weeknotes – 17th June 2022

[Cross-posted from Connected by Data blog]

It’s been a week of planning and strategising, sandwiched between two conference- and panel-heavy weeks (last week and next). On that note, do join me for the State of Open Data panel on AI next Wednesday (1pm BST), and at Open Future’s first salon looking at Business to Government Data Sharing on Thursday. Plus, I’m hoping to make it along to some of the online components of the Data Power conference.

Iterating on the case database

It looks like we’re getting into a good pattern of Monday and Wednesday team meetings, which offers a mix of focus on what we need to deliver (Monday meetings with a work planning spreadsheet) and a space to reflect on what we’re learning through the week (Wednesday meetings, where I experimented this week with bringing a sketch of the case database development for team feedback).

I’ve been getting a bit stuck on how to move forward the work I’ve been doing to build a dataset of cases of participatory data governance, particularly on how to align this with our wider advocacy and practice work. So, picking up on the suggestion that it is sometimes easier to brainstorm in slides than in a prose document, I pulled together a short deck outlining where I’ve got to, with some rough mock-ups of possible ways to expose the case study research on the Connected by Data website.

Caption: Rough mock up of Connected by Data website with four ‘calls to action’ that build on the case database work.

Feedback from Jeni and Jonathan pointed to a number of useful areas to explore more, including thinking about how far we editorialise cases to highlight our opinions on what best practice is, how we might work with partners to provide a long-term home to any case and method library resource we create, and how, when allowing users to browse by methods, we clearly communicate that effective participatory governance often requires a mix of methods.

In the deck I shared a few experiments that try to get at this latter point, visually presenting the ‘structure’ of the different cases I’ve surveyed to highlight that they involve multiple related components. I had initially thought that it might be possible to generate a ‘graph’ of relationships between components, but experimenting with mermaid.js graphs (and its nifty text-to-graph syntax) quickly revealed that it was going to be tricky to generate elegant presentations this way. Instead, I turned to a more linear approach to showing the structure of an example case, using icons from the Noun Project to start to pull out relevant facts about each component of a participatory data governance case, such as whether engagement activities are one-off, repeated or ongoing, and whether they involve a single group over time or multiple groups.

Image showing network graph, and linear graph, of case components: rapid review; dialogues (weighted sample); participant-led research, specifically impacts group sessions; and analysis and report.

I’m going to do some work in the coming weeks to explore engaging with a designer on a next iteration of this, helping to firm up some of the key concepts we want to communicate about getting the practice of participatory governance right.

Sector selection

As Jeni has explored in her weeknotes, we spent some time this week selecting a small number of sectors on which to focus our work over the next year, settling on a shortlist of debt, education and housing. I’ve started writing up a scoping document for our sectoral focus on debt (incorporating consumer finance and gambling) to sketch out some of the key data governance issues, key stakeholders, and potential opportunities for policy influence. At this stage, the focus is on rapid research to validate whether or not this should be a focus sector for us, and to develop our shared understanding of the scope of the sector.

Campaign strategy

I also spent a bit of time this week talking with Jonathan about our next steps of campaign planning, and how to facilitate our next stage of work on the Data Rights Bill. More on that in the coming weeks.

Other notes

Workshop on Governing Knowledge Commons

On Monday I dropped into an online session of the Workshop on Governing Knowledge Commons, set up for discussions of ‘half-baked research ideas’ linked to smart cities and knowledge commons. There were a couple of really useful insights from the discussions, including tips from Brett Frischmann on making sense of complex phenomena (like adoption of smart city technology, or, indeed, collective governance of data) by analysing different action arenas, from macro (i.e. how is the city as a whole adopting a collective approach to data governance?), to meso (how is work in the housing sector in the city adopting collective data governance?), to micro (how is a particular project making use of a collective approach to data governance?).

Katherine Strandburg pointed to the particular features of the Governing Knowledge Commons (GKC) framework, as opposed to Ostrom’s commons governance work, in dealing with the fact that “knowledge commons are especially likely to have impact (positive or negative) beyond the community obviously involved in creating the knowledge”, such as in cases of patients involved in rare disease research.

In response to some of my musing on how we can use our Connected by Data case research to understand the kinds of governance appropriate to different situations, Brett offered the concept of externalities as one tool to use. Depending on the data and context in play, there may be different positive or negative externalities from data collection and use to worry about, and different kinds of governance institutions may be more or less effective at managing these.

Indigenous Data Sovereignty

Thanks to Jeff Doctor for sparing the time to chat through some of the ways Indigenous tech firm Animikii are thinking about data governance, and about some of the data (and wider) issues facing Indigenous communities. We touched on the challenge of identifying the legitimate collectives that have a role in governing data, particularly in cases where the claim of states to jurisdiction over territory and peoples remains contested, and the need to recognise the ongoing struggle that many Indigenous people face to find security, and to avoid being criminalised or marginalised through data-driven forms of surveillance and control. This brings into relief some of the challenges of designing participatory data governance approaches that engage those most affected by data use, whilst respecting that the point at which individuals and communities most experience data-based harms may be the point at which they have least capacity to engage in wider governance debates.

It was also insightful, amidst the talk that sometimes comes up around the Datasphere initiative of navigating data governance in a post-Westphalian order, to be reminded of the many Indigenous nations’ claims on land, which have long challenged the settled international boundaries taken for granted in so much work. Jeff pointed me in particular to the Land Rights statement of the Council of Chiefs of the Haudenosaunee, making a connection between data rights and land rights.

RightsCon

Building on last week’s weeknotes, I added a few bits into our write up of RightsCon which you can find here.

Visualising processes

On Wednesday I had a catch up with Mel Flanagan of Nook Studios, whose work seeks to make complex processes much more accessible through careful information design. Mel shared updates on the work they have been doing to join the dots between different open government initiatives and data silos, but we also talked briefly about ways the process-visualisations developed for this could be applied in data governance dialogue processes.

Legitimate interests research

I ended the week by firing off a few ‘test requests’ for legitimate interest balancing tests to a selection of companies whose privacy policies invite users to request these.

Using the Princeton-Leuven Longitudinal Privacy Policy Dataset I’ve searched for “balancing test” and then identified a number of large websites that have text in their current privacy policies to the effect that: they process certain data on the basis of legitimate interests; they have carried out balancing tests; these balancing tests can be requested by emailing them.
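For anyone curious about the mechanics, the three-part check above could in principle be partly automated. What follows is only a minimal Python sketch, not the process I actually used: it assumes policies have already been loaded into a simple dictionary mapping a domain to its current policy text (the dataset’s real format isn’t modelled here), and the domains, policy wording and regular expressions are all hypothetical, chosen for illustration.

```python
import re

# Hypothetical sample data: domain -> current privacy policy text.
# These domains and texts are invented; the real dataset's format is not modelled.
POLICIES = {
    "example-shop.com": (
        "We process your data on the basis of our legitimate interests. "
        "We have carried out a balancing test, which you can request "
        "by emailing privacy@example-shop.com."
    ),
    "example-news.com": "We rely on consent for all processing.",
    "example-travel.com": (
        "Our legitimate interests include fraud prevention. "
        "A balancing test was completed."
    ),
}

# One crude, case-insensitive pattern per condition (illustrative assumptions only):
# 1) processing on the basis of legitimate interests,
# 2) a balancing test has been carried out,
# 3) the test can be requested by e-mail.
CRITERIA = {
    "legitimate_interests": re.compile(r"legitimate interests?", re.IGNORECASE),
    "balancing_test": re.compile(r"balancing tests?", re.IGNORECASE),
    "request_by_email": re.compile(r"request.*e-?mail", re.IGNORECASE),
}


def matching_domains(policies):
    """Return the domains whose policy text matches every criterion, sorted."""
    return sorted(
        domain
        for domain, text in policies.items()
        if all(pattern.search(text) for pattern in CRITERIA.values())
    )


if __name__ == "__main__":
    print(matching_domains(POLICIES))
```

Here only the first hypothetical policy meets all three conditions. Simple pattern matching like this would produce false positives and negatives on real policy wording, so any matches would still need reading by hand before sending requests.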

Conscious of the controversy from the Princeton-Radboud Study on Privacy Law Implementation, which sent simulated messages requesting GDPR-related implementation information to a large number of websites, triggering significant work by in-house legal teams, I’ve taken care to clearly identify in the outgoing messages that this is part of Connected by Data research work, and is not strictly a customer request, so I will be interested to see what replies, if any, we get.

This will help shape any future research into how balancing tests are currently used, which is particularly relevant given the upcoming details of the Data Reform Bill.