A data portal deep dive

Over the last week I’ve been sharing a short series of articles exploring the past, present and future of (open) data portals. This comes as part of a piece of work I’m doing for the Open Data Institute on ‘Data Platforms and Citizen Engagement’.

The work starts from the premise that data portals have been an integral part of the open data movement. Indeed, for many (myself included) the open data movement was crystallised with, or first discovered through, the launch of platforms like Data.gov and Data.gov.uk. However, we go on to ask whether, a decade on, portals still have a role to play, and if so, what that role might most usefully be. Ultimately, we’re asking if, and how, portals might be (re-)shaped into effective platforms that support ongoing ambitions for open data to enable meaningful citizen participation in all its forms.

Over the course of a short, rapid research sprint I’ve been pulling at a couple of threads that might contribute to that inquiry. The goal has been to carry out some groundwork to support the next stage of the project, which we hope will take the form of design exercises, accompanied by a number of deeper conversations and possibly further research. I overshot my initial plan of spending five days ‘catching up’ with what’s been happening in the portal landscape since I last looked, not least because the simple answer is: a lot. And yet, if you compare a portal from 2012 with the same one today, the answer to the question ‘What’s changed?’ often seems to be: not very much. The breadth and depth of work constructing and critiquing portals across the world is both impressive and oppressive. It seems that, collectively, we know there are problems with portals, but there is much less consensus on the way forward.

Each post in this series has tried to look at ‘the portals problem’ from one specific perspective, aiming to provide some shared context that might assist in future conversations. The posts are all over on PubPub, where they’re open to comment (free sign-up needed):

Terminology: When is a portal not a portal?

Technology: A genealogy of data portals

Research: The pressure on portals: an hourglass approach

Academia: Evidence and insights: other findings from research

Experiments: Selected examples of data portals

Organisational: The people and processes behind the portals

Engagement: Portals and participation

Speculation: Focussed futures: the portal as…

If, after exploring some of these, you think you might be interested in joining some of the open design sprint work we’re planning for next year – building on this exploration, and on parallel strands of research that have been taking place (likely involving some online or in-person full and half-day sessions in early February) – do drop me a line via Twitter or (for this project only) my ODI e-mail address: tim.davies@theodi.org, and I can share more info as plans firm up.

Data portals and citizen engagement: participation in context

I’m cross-posting this from a deep-dive series of working drafts I’ve been developing for the Open Data Institute, providing groundwork for exploring potential future developments that could support data portals and platforms to function better as tools of civic participation. It provides a general history of the development of citizen participation, primarily in the UK context, that I hope may be of interest to a wide range of readers of this blog, as well as setting this in the context of data portals as participation tools (possibly more of a niche interest...). You can find the full series of posts, which talk a lot more about data portals, here.

A key cause of data portal dissatisfaction is the apparent failure of portals to provide effective platforms for citizen participation in government and governance. The supposed promise of portals to act as participatory platforms can be read into the 2009 Obama Open Government Memo on transparent, participatory and collaborative government, and the launch of data.gov.uk amongst the hackathons and experiments with online engagement that surrounded the Power of Information report and taskforce. Popular portal maturity models have envisioned them evolving to become participatory platforms [1] [2] and whilst some work has acknowledged that there are different forms of participatory engagement with the state, ranging from monitorial democracy, to the co-production of public services [3], the mechanisms by which portals can help drive participation, and the forms of participation in focus, have been frequently under-theorised.

In the current policy landscape, there is a renewed interest in some forms of participatory engagement. Citizens assemblies, deliberative fora, and other forms of mini-public are being widely adopted as ways to find or legitimate ways forward on thorny and complex issues. Amidst concerns about public trust, democratic control, and embedded biases, there are calls for participatory processes to surround the design and deployment of algorithmic systems in particular [4], creating new pressure on participatory methods to engage effectively with data. However, public participation has a long history, and these latest trends represent just one facet of the kinds of processes and modes of engagement we need to have in mind when considering the role of data portals in supporting citizen engagement. In this short piece I want to briefly survey the history of public participation, and to identify potential insights for the development of data portals as a support for participatory processes. My focus here is primarily on the UK landscape, although I’ll try and draw upon wider global examples where relevant.

A short history of citizen participation

In the blog post ‘A brief history of participation’, historian Jo Guldi explores the roots of participatory governance ideas, tracing them as far back as the early mediaeval church, and articulating ideas of participatory governance as a reaction to the centralised bureaucracies of the modern nation state. Guldi points to the emergence of “a holistic political theory of self-rule applicable to urban planning and administration of everyday life” emerging in the 1960s, driven by mass youth movements, mass media, and new more inclusive notions of citizenship in an era of emerging civil rights. In essence, as the franchise, and education, expanded, default models of ‘elite governance’ came to be challenged by the idea that the public should have a greater voice in day to day decision making, if not greater direct ownership and control of public authority.

In Guldi’s global narrative, the emphasis of the 1970s and 80s was then on applying participatory ideas within the field of International Development, particularly participatory mapping – in which marginalised citizens are empowered to construct their own maps of territory: in a sense creating counter-data to secure land rights, and protect customary resources from logging or other incursions. Guldi points in particular to the role of institutions such as the World Bank in promoting participatory development practices, a theme also found in Leal’s ‘Participation: the ascendancy of a buzzword in the neo-liberal era’ [5]. Leal highlights how, although participatory methods have their roots in the emancipatory pedagogy of Paulo Freire and in Participatory Action Research – which aims at a transformation of individual capabilities alongside wider cultural, political and economic structures – the adoption of participation as a tool in development can act in practice as a tool of co-option: depoliticising critical decisions and offering participants only the option to modify, rather than fundamentally challenge, directions of development. Sherry Arnstein’s seminal ‘A ladder of citizen participation’ article [6], published in 1969 in an urban planning journal, has provided a reliable lens for asking whether participation in practice constitutes decoration, tokenism, or genuine citizen power.

Illustration of the ladder of participation from Arnstein’s original article, showing eight rungs, and three categories of participation, from ‘nonparticipation’, to ‘degrees of tokenism’ and up to ‘degrees of citizen power’.

In the UK, whilst radical participatory theory influenced grassroots community development work throughout the 1980s, it was with the election of the New Labour Government in 1997 that participation gained significant profile in mainstream policy-making: with major initiatives around devolution, the ‘duty to consult’, and an explosion of interest in participatory methods and initiatives. Fenwick and McMillan describe participation for New Labour as ‘something at the heart of the guiding philosophy of government’, framed in part as a reaction to the consumer-oriented, marketised approach to public management of the Thatcher era. Yet they also highlight a tension between an ideological commitment to participation, and a managerial approach to policy that sought also to ‘manage’ participation and its outcomes. Over this period, a particular emphasis was placed on participation in local governance, leading top-down participation agendas to meet grassroots communities and community development practices that had been forged through, and often in opposition to, recent decades of Conservative rule. At its best, this connection of participatory skill with the space to apply it enabled more radical experiments with community power. At its worst, and increasingly over time, it led to the co-option of independent community actors within state-directed participation: leading ultimately to a significant loss of both state-managed and community-driven participatory practice when the ‘era of austerity’ arrived in 2010.

The 2000s saw a proliferation of guides, handbooks and resources (e.g.) outlining different methods for citizen participation: from consultation, to participatory budgeting, citizens panels, appreciative inquiries, participatory research, and youth fora. Digital tools were initially seen broadly as another ‘method’ of participation, although over time understanding (albeit still relatively limited) has developed of how to integrate digital platforms as part of wider participatory processes – and as digital development has become more central in policy making, user-involvement methodologies from software development have to be critically considered as part of the citizen participation toolbox. Concepts of co-production, co-design and user-involvement in service design have also increasingly provided a link-point between trends in digital development and citizen participation.

Looking at the citizen participation landscape in 2021, two related models appear to be particularly prominent: deliberative dialogues, and citizens assemblies. Both are predicated on bringing together broadly representative groups of citizens, and providing them with ‘expert input’, generally through workshop-based processes, and encouraging deliberation to inform policy, or to generate recommendations from an assembly. Notably, deliberative methods have been adopted particularly in relation to science and technology, seen as a way to secure public trust in emerging scientific or technological practice, including data sharing, AI and use of algorithmic systems. Whilst deliberative workshops and citizens assemblies are by no means the only participatory methods in use in 2021, they are notable for their reliance on expert input: although the extent to which direct access to data features in any of these processes is perhaps a topic for further research.

By right, or by results

Before I turn to look specifically at the intersection of data and participation, it is useful to briefly remark on two distinct lines of argument for participation: values- or rights-based, vs. results-based.

The rights-based approach can be found both in theories of participatory democracy that argue democratic mandate is not passed periodically from voters to representatives, but is constantly renewed through participatory activities engaging broad groups of citizens, and in human-rights frameworks, including notably the UN Convention on the Rights of the Child (UNCRC), which establishes children’s rights to appropriate participation in all decisions that affect them. Guidance on realising participation rights adopted in 2018 by the UN Human Rights Council explicitly makes a link with access to information rights, including proactive disclosure of information, efforts to make this accessible to marginalised groups, and independent oversight mechanisms.

A results-based approach to citizen participation is based on the idea that citizen engagement leads to better outcomes: including supporting more efficient and effective delivery of public services, securing greater citizen trust in the decisions that are made, or reducing the likelihood of decisions being challenged. Whilst some user- and human-centred design methodologies may make reference to rights-based justifications for the inclusion of often-marginalised stakeholders, in general these approaches are rooted more in a results-based than a rights-based framework: in short, many firms and government agencies have discovered that projects have a greater chance of success when they adopt consultative and participatory design approaches.

Participation, technology and data

Although there have been experiments with online participation since the earliest days of computer mediated communication, the rise of Web 2.0 brought with it substantial new interest in online platforms as tools of citizen engagement: both enabling insights to be gathered from existing online social spaces and digital traces, and supporting more emergent, ad-hoc or streamlined modes of co-creation, co-production, or simply communication with the state (as, for example, in MySociety’s online tools to write to public representatives, or report street scene issues in need of repair). There was also a shift to cast the private sector as a third stakeholder group within participatory processes – primarily framed as originator of ideas, but also potentially as the target of participation-derived messages. As the Open Government Partnership’s declaration puts it, states would “commit to creating mechanisms to enable greater collaboration between governments and civil society organizations and businesses.”

With rising interest in open data, a number of new modes and theories of participation came to the fore: the hackathon [7][8][9], the idea of the armchair auditor [10], and the idea of ‘government as a platform’ [11][12] each invoke particular visions of citizen-state and private-sector engagement.

A focus in some areas of government on bringing in greater service-design approaches, and rhetoric, if not realities, of data-driven decision making have also created new spaces for particular forms of participatory process, albeit state-initiated, rather than citizen created. And recent discussions around data portals and citizen participation have often centred on the question of how to get citizens to engage more with data, rather than how data can support existing or potential topic-focussed public participation.

In my 2010 MSc thesis on ‘Open Data, Democracy & Public Sector reform: open government data use from data.gov.uk’ I developed an initial typology of civic Open Government Data uses, based on a distinction between formal political participation (representative democracy), collaborative/community based participation (i.e. participatory democracy or utility-based engagement), and market participation (i.e. citizen as consumer). In this model, the role data plays, and the mechanisms it works through, vary substantially: from data being used through media to inform citizen scrutiny of government, and ultimately discipline political action through voting; to data enabling citizens to collaborate in service design, or independent problem solving beyond the state; and to the consumer-citizen driving change through better informed choices of access to public services. In other words, greater access to data theoretically enables a host of different genres of participation (albeit there’s a normative question over how meaningful or equitable each of these different forms of participation are) – and many of these do not rely on the state hosting or convening the participation process.

What is notable about each of these ‘mechanisms of change’ is that data accessed from a portal is just one component of a wider process: be that the electoral process in its entirety, a co-design initiative at the community level, or some national market-mechanism supported by intermediaries translating ‘raw data’ into more accessible information that can drive decisions over which hospital to use, or which school to choose for a child. However, whilst many participatory initiatives have suffered in an era of austerity, and enthusiasm for the web as an open agora for public debate has waned in light of a more hostile social media environment, portals have persisted as a primary expression of the ‘open government’ era: leaving considerable pressure on the portal to deliver not only transparency, but participation and collaboration too.

Citizen participation and data portals

What can we take from this brief survey of citizen participation when it comes to thinking about the role of data portals?

Firstly, the idea that portals as technical platforms can meaningfully ‘host’ participation in its entirety appears more or less a dead-end. Participation takes many varied forms, and whilst portals might be designed (and organisationally supported) in ways that position them as part of participatory democracy, they should not be the destination.

Secondly, different methods of citizen participation have different needs. Some require access to simple granular ‘facts’ to equalise the balance of power between citizen and state. Others look for access to data that can support deep research to understand problems, or experimental prototyping to develop solutions. Whilst in the former case quick search and discovery of individual data-points is likely to be the priority, in the latter cases a greater understanding of the context of a dataset is likely to be particularly valuable, as, in many cases, would the ability to be in contact with a dataset’s steward.

Third, the current deliberative wave appears as likely to have data as its subject (or at least, the use of data in AI, algorithmic systems or other policy tools), as it is to use open data as an input to deliberation. This raises interesting possibilities for portals to surface and support greater deliberation around how data is collected and used, as a precursor to supporting more effective use of that data to drive policy making.

Fourth, citizen participation has rarely been a ‘mass’ phenomenon. Various studies suggest that at any time less than 10% of the population is engaged in any meaningful form of civic participation, and only a fraction of these are likely to be involved in forms of engagement that particularly benefit from data. Portals should not carry the burden of solving a participation deficit, but there may be avenues to design them so that they connect with a wider group of active citizens than their current data-focussed constituency.

Fifth, and finally, citizen participation was not invented with the portal – and we need to be conscious of both the long history, and the contested conceptualisations, of citizen participation. The government portal that seeks to add participatory features is unlikely to escape the charge that it is seeking to ‘manage’ participation processes: although independently created or curated portals may be able to align with more bottom-up community action, and operate within a more emancipatory, Freirean notion. Both data and participation are, after all, about power. And given that power is almost always contested, the configuration of portals as a participatory tool may be similarly so.

Citations

  1. Alexopoulos, C., Diamantopoulou, V., & Charalabidis, Y. (2017). Tracking the Evolution of OGD Portals: A Maturity Model. In Lecture Notes in Computer Science (pp. 287–300). Springer International Publishing. https://doi.org/10.1007/978-3-319-64677-0_24

  2. Zhu, X., & Freeman, M. A. (2018). An evaluation of U.S. municipal open data portals: A user interaction framework. Journal of the Association for Information Science and Technology, 70(1), 27–37. https://doi.org/10.1002/asi.24081

  3. Ruijer, E., Grimmelikhuijsen, S., & Meijer, A. (2017). Open data for democracy: Developing a theoretical framework for open data use. Government Information Quarterly, 34(1), 45–52. https://doi.org/10.1016/j.giq.2017.01.001

  4. Wilson, C. (2021). Public engagement and AI: A values analysis of national strategies. Government Information Quarterly, 101652. https://doi.org/10.1016/j.giq.2021.101652

  5. Leal, P. A. (2007). Participation: The Ascendancy of a Buzzword in the Neo-Liberal Era. Development in Practice, 17(4/5), 539–548.

  6. Arnstein, S. R. (1969). A Ladder Of Citizen Participation. Journal of the American Institute of Planners, 35(4), 216–224. https://doi.org/10.1080/01944366908977225

  7. Johnson, P., & Robinson, P. (2014). Civic Hackathons: Innovation, Procurement, or Civic Engagement? Review of Policy Research, 31(4), 349–357. https://doi.org/10.1111/ropr.12074

  8. Sieber, R. E., & Johnson, P. A. (2015). Civic open data at a crossroads: Dominant models and current challenges. Government Information Quarterly, 32(3), 308–315. https://doi.org/10.1016/j.giq.2015.05.003

  9. Perng, S.-Y. (2019). Hackathons and the Practices and Possibilities of Participation. In The Right to the Smart City (pp. 135–149). Emerald Publishing Limited. https://doi.org/10.1108/978-1-78769-139-120191010

  10. O’Leary, D. E. (2015). Armchair Auditors: Crowdsourcing Analysis of Government Expenditures. Journal of Emerging Technologies in Accounting, 12(1), 71–91. https://doi.org/10.2308/jeta-51225

  11. O’Reilly, T. (2011). Government as a Platform. Innovations: Technology, Governance, Globalization, 6(1), 13–40. https://doi.org/10.1162/inov_a_00056

  12. The OECD digital government policy framework. (2020, October 7). OECD Public Governance Policy Papers. Organisation for Economic Co-Operation and Development (OECD). https://doi.org/10.1787/f64fed2a-en

Fostering open ecosystems around data: The role of data standards, infrastructure and institutions

[Summary: an introduction to data standards, their role in development projects, and critical perspectives for thinking about effective standardisation and its social impacts]

I was recently invited to give a presentation as part of a GIZ ICT for Agriculture talk series, focussing on the topic of data standards. It was a useful prompt to try and pull together various threads I’ve been working on around concepts of standardisation, infrastructure, ecosystem and institution – particularly building on recent collaboration with the Open Data Institute. As I wrote out the talk fairly verbatim, I’ve reproduced it in blog form here, with images from the slides. The slides with speaker notes are also available here. Thanks to Lars Kahnert for the invite and the opportunity to share these thoughts.

Introduction

In this talk I will explore some of the ways in which development programmes can think about shaping the role of data and digital platforms as tools of economic, social and political change. In particular, I want to draw attention to the often dry-sounding world of data standards, and to highlight the importance of engaging with open standardisation in order to avoid investing in new data silos, to tackle the increasing capture and enclosure of data of public value, and to make sure social and developmental needs are represented in modern data infrastructures.

Mind map of Tim's work, and logos of organisations worked with, including: Open Data Services, Open Contracting Partnership, Open Ownership, IATI, 360 Giving, Open Data in Development Countries, Open Data Barometer, Land Portal and UK Open Government Civil Society Network.

By way of introduction to myself: I’ve worked in various parts of data standard development and adoption – from looking at the political and organisational policies and commitments that generate demand for standards in the first place, through designing technical schema and digging into the minutiae of how particular data fields should be defined and represented, to supporting standard adoption and use – including supporting the creation of developer and user ecosystems around standardised data. 

I also approach this from a background in civic participation, and with a significant debt to work in Information Infrastructure Studies, and currently unfolding work on Data Feminism, Indigenous Data Sovereignty, and other critical responses to the role of data in society.

This talk also draws particularly on some work in progress, developed through a residency at the Rockefeller Bellagio Centre, looking at the intersection of standards and Artificial Intelligence. It’s a point I won’t labour, as I fear a focus on ‘AI’ (in inverted commas) can distract us from looking at many more ‘mundane’ (also in inverted commas) uses of data. But I will say at this point that when we think about the datasets and data infrastructures our work might create, we need to keep in mind that these will likely end up being used to feed machine learning models in the future: what gets encoded in, and what gets excluded from, shared datasets is a powerful driver of bias before we even get to the scope of the training sets, or the design of the AI algorithms themselves.

Standards work is AI work.

Enough preamble. To give a brief outline: I’m going to start with a brief introduction to what I’m talking about when I say ‘data standards’, before turning to look at the twin ideas of data ecosystems and data infrastructures. We’ll then touch on the important role of data institutions, before asking why we often struggle to build open infrastructures for data.

An introduction to standards

Each line in the image above is a date. In fact – they are all the same date. Or, at least, they could be. 

Showing that 12/10/2021 can be read in the US as 10th December, or in Europe as 12th October.

Whilst you might be able to work out conclusively that most of them are the same date – and we could even write computer rules to convert them – the ways we write dates around the world are so varied that some remain ambiguous.

Fortunately, this is (more or less) a solved problem. We have the ISO8601 standard for representing dates. Generally, developers present ‘ISO Dates’ in a string like this:

2021-10-12T00:00:00+00:00

This has some useful properties. You can use simple sorting to get things in date order, you can include the time or leave it out, you can provide timezone offsets for different countries, and so on.

If everyone exchanging data converts their dates into this standard form, the risk of confusion is reduced, and a lot less time has to be spent cleaning up the data for analysis.
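To make those properties concrete, here’s a minimal Python sketch of that normalisation step. The list of source formats is an assumption for illustration – but once everything is converted to ISO 8601, plain string sorting gives chronological order.

```python
from datetime import datetime

# Formats we might expect from different data sources (assumed for illustration)
KNOWN_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y"]

def to_iso(raw: str) -> str:
    """Normalise a date string in one of the known formats to ISO 8601."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {raw!r}")

dates = ["31/10/2021", "2021-10-12", "12.10.2021"]
iso_dates = sorted(to_iso(d) for d in dates)

# ISO 8601 strings sort chronologically with plain string comparison
print(iso_dates)  # ['2021-10-12', '2021-10-12', '2021-10-31']
```

Note that a genuinely ambiguous input like ‘12/10/2021’ can’t be rescued by code alone: the converter has to know (or guess) which convention the source used.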

It’s also a good example of a building block of standardisation for a few other reasons:

  • The ISO in the name stands for ‘International Organization for Standardization’: behind this apparently simple standard sits a vast international governance and maintenance effort – it was first released in 1988, and last revised just two years ago.
  • The ‘8601’ is the standard number. There are a lot of standards (though not all ISO standards are data standards).
  • Use of this one standard relies on lots of other standards: such as the way the individual numbers and characters are encoded when sent over the Internet or other media, and even standards for the reliable measurement of time itself.
  • And, like many standards, ISO 8601 is, in practice, rarely fully implemented. For example, whilst developers talk of using the ISO standard, what they actually implement is usually RFC 3339, which leaves out lots of things in the ISO standard, such as imprecise dates. As a rule of thumb: people copy implementations rather than read specifications.

Diagram showing ISO8601 as an interchange standard

ISO 8601 is called an interchange standard – that is, most systems don’t store dates internally in ISO 8601 when they want to process them; instead, everyone writes dates out in ISO format when exchanging data. The standard sits in the middle, limiting the need to understand the specific quirks of each origin of data, and allowing receivers to streamline the import of data into their own models and methods.

And to introduce the first critical issue of standardisation – as actually implemented – it constrains what can be expressed: sometimes for good, and sometimes problematically.

Worked example: (a) The event took place in early November ; (b) 2021-11 fails validation; (c) To enter data, user adds arbitrary day: 2021-11-01; (d) Data from multiple sources can be analysed (all dates standardised) but the data might mislead: “The 1st of the Month is the best day to run events.”

For example, RFC 3339 omits imprecise dates. That is, if you know that something happened in November 2021, but not on which day, your data will fail validation if you leave out the day. So to exchange data using the standard you are forced to make up a day – often the 1st of the month. A paper form would have no such constraint: users would just leave the ‘day’ box blank. The impact may be nothing, or, if you are trying to exchange data from places where, for legacy reasons, the day of the month of an event is not easily known, that data could end up distorting later analysis.
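A sketch of that failure mode, using a deliberately simplified pattern (a real RFC 3339 validator checks much more than this):

```python
import re

# RFC 3339 full-date requires YYYY-MM-DD; this is a simplified stand-in pattern
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def validates(value: str) -> bool:
    """Does the value look like an RFC 3339 full-date?"""
    return bool(FULL_DATE.match(value))

print(validates("2021-11"))     # False – the honest, imprecise date is rejected
print(validates("2021-11-01"))  # True – the made-up day passes validation
```

The validator cannot tell a genuinely known date from an invented one – which is exactly how the ‘1st of the month’ artefact ends up in downstream analysis.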

So – even these small building blocks of standardisation can have significant uses, and significant implications. But when we think of standardisation in, for example, a data project to better understand supply chains, we might also be considering standards at the level of the schema – the agreed way to combine lots of small atoms of data to build up a picture of the actions or phenomena we care about.

Diagrams showing table-based and tree-structured data.

A standardised data schema can take many forms. In data exchange, we’re often not dealing with relational database schemas, but with schemas that allow us to share an extract from a database or system.

That extract might take the form of tabular data, where our schema can be thought of as the common header row in a spreadsheet accompanied by definitions for what should go in each column.

Or it might be a schema for JSON or XML data: where the schema may also describe hierarchical relationships between, say a company, its products, and their locations. 

At a very simplified and practical level, a schema usually needs three things:

  • Field names (or paths)
  • Definition
  • Validation rules

Empty table with column headings: site_code, commodity_name, commodity_code, quantity and unit

For example, we might agree that whenever we are recording the commodities produced at a given site we use the column names site_code, commodity_name, commodity_code, quantity and unit.

We then need human-readable definitions for what counts as each. E.g.:

  • site_code – (string) a unique identifier for the site.
  • commodity_name – (string) the given name for a primary agricultural product that can be bought and sold. Note: this is used for labelling only, and might be over-ridden by values taken from the commodity_code reference list.
  • commodity_code – (string; enum) a value from the approved codelist that uniquely identifies the specific commodity.
  • quantity – (number) the number of units of the commodity.
  • unit – (string; enum) the unit the quantity is measured in, from the list: kg for kilograms, or tonne for metric tonne. If quantities were collected in an unlisted unit, they should be converted before storage.

And we need validation rules to say that if we find, for example, a non-number character in the quantity column – the data should be rejected – as it will be tricky for systems to process.
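
Pulling those three ingredients together, here's a rough sketch of what a validator for our hypothetical commodity schema might look like – the field names come from the example above, but the implementation and sample data are purely illustrative:

```python
# A minimal sketch (stdlib only) of the three schema ingredients:
# field names, definitions (here reduced to types), and validation rules.
# The schema and codelist are the hypothetical ones from the worked example.

UNIT_CODELIST = {"kg", "tonne"}

def validate_row(row):
    """Return a list of validation errors for one data row."""
    errors = []
    # Field names: every expected column must be present
    for field in ("site_code", "commodity_name", "commodity_code",
                  "quantity", "unit"):
        if field not in row:
            errors.append("missing field: " + field)
            return errors
    # Validation rules: quantity must be a number; unit from the codelist
    if not isinstance(row["quantity"], (int, float)):
        errors.append("quantity must be a number")
    if row["unit"] not in UNIT_CODELIST:
        errors.append("unit must be one of " + ", ".join(sorted(UNIT_CODELIST)))
    return errors

good = {"site_code": "S001", "commodity_name": "Apples",
        "commodity_code": "080810", "quantity": 120, "unit": "kg"}
bad = {"site_code": "S002", "commodity_name": "Apples",
       "commodity_code": "080810", "quantity": "lots", "unit": "boxes"}

print(validate_row(good))  # no errors: []
print(validate_row(bad))   # two errors: bad quantity, bad unit
```

In practice, this logic would usually live in a shared schema language (such as JSON Schema) rather than hand-written code, so every tool in the ecosystem applies the same rules.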

Note that three of these potential columns point us to another aspect of standardisation: codelists.

A codelist is a restricted set of reference values. You can see a number of varieties here:

The commodity code is a conceptual codelist. In designing this hypothetical standard we need to decide how to be sure apples are apples, and oranges are oranges. We could invent our own list of commodities, but often we would look to find an existing source.

We could, for example, use ‘080810’, taken from the World Customs Organization’s Harmonized System codes.

Or we could use ‘c_541’, taken from AGROVOC – the FAO’s Agricultural Vocabulary.

The choice has a big impact: aligning us with export applications of the data standard, or perhaps more with agricultural science uses of the data.

site_code, by contrast, is not about concepts – but about agreed identifiers for real-world entities, locations or institutions. 

Without agreement across different datasets on how to refer to a farm, factory or other site, data integration and exchange can be complex: but maintaining these reference lists is also a big job, a potential point of centralisation and power, and an often neglected piece of data infrastructure.
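
A small sketch may help distinguish the two kinds of list. The commodity codes follow the worked example above (‘080810’ and ‘c_541’, both standing for apples); the site register and helper function are entirely invented:

```python
# Sketch: a conceptual codelist versus an entity reference list.
# '080810' and 'c_541' follow the worked example above; the site
# register and everything else here is invented for illustration.

# Conceptual codelist: agreed codes for *kinds* of thing (commodities)
COMMODITY_CODES = {
    "080810": "Apples",  # Harmonised System style code, from the example
    "c_541": "Apples",   # AGROVOC style identifier, from the example
}

# Entity reference list: agreed identifiers for *specific* real-world sites
SITE_REGISTER = {
    "S001": "Riverside Farm",     # hypothetical
    "S002": "Hilltop Packhouse",  # hypothetical
}

def resolve(row):
    """Swap local labels for shared reference values where possible."""
    return {
        "site": SITE_REGISTER.get(row["site_code"], "UNKNOWN SITE"),
        "commodity": COMMODITY_CODES.get(row["commodity_code"],
                                         row["commodity_name"]),
    }

print(resolve({"site_code": "S001",
               "commodity_code": "080810",
               "commodity_name": "apples (local name)"}))
```

Note how the lookup falls back to the local label when a code is missing: a small design choice about where the burden of standardisation sits, of exactly the kind discussed below.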

For example: the Open Apparel Registry has developed unique production site identifiers by combining different existing datasets.

Now – I could spend the rest of this talk digging deeper into just this small example – but let’s surface to make the broader points.

  1. Data standards are technical specifications of how to exchange data

The data should be collected in the way that makes the most sense at a local level (subject to the ability to then fit into a standard) – and should be presented in the way that meets users’ needs. But when data is coming from many sources, and going to many destinations, a standard is a key tool of collaboration.

The standard is not the form.

The standard is not the report.

But, well designed, the standard is the powerful bit in the middle that ties together many different forms and sources of data, with many different reports, applications and data uses.

  2. Designing standards is a technical task

It needs an understanding of the systems you need to interface with.

  3. Designing standards goes beyond the technical tasks

It needs an awareness of the impact of each choice – each alignment – each definition and each validation rule.

There are a couple of critical dimensions to this: from thinking about whose knowledge is captured and shared through a standard, to who gains positions of power by being selected as the source of definitions and codelists.

At a more practical level, I often use the following simple economic model to consider who bears the costs of making data interoperable:

Diagram showing 'Creator-->Intermediary-->User'

In any data exchange there is a creator, there may be an intermediary, and there is a data user.

The real value of standards comes when you have multiple creators, multiple users, and potentially, multiple intermediaries.

If data isn’t standardised, either an intermediary needs to do lots of work cleaning up the data….

Diagram showing different 'colours' of data from three creators, made interoperable by the work of an intermediary, to support three users.

…or each user has all that work to do before they can spend any time on data use and analysis.

The design choices made in a standard distribute the labour of making data interoperable across the parties in a data ecosystem. 

There may still be some work for users and intermediaries to do even after standardisation. 

For example, if you make it too burdensome for creators to map their data to a standard, they may stop providing it altogether. 

Diagram showing three creators, two of which have decided not to provide standardised data.

Or if you rely too much on intermediaries to clean up data, they may end up creating paywalls that limit use.

Or, in some cases, you may be relying on an exploitative ‘click-work’ market to clean data that could have been better standardised at source.

So – to work out where the labour of making data interoperable should and can be located involves more than technical design.

You need to think about the political incentives and levers to motivate data creators to go to the effort of standardising their data.

You need to think about the business or funding model of intermediaries.

And you need to understand the wide range of potential data users: considering carefully whose needs are served simply, and who will end up still having to carry out lots of additional data collection and cleaning before their use-cases are realised. 

But, hoping I’ve not put you off with some of the complexity here, my fourth and final general point on standards is that: 

  4. Data standards could be a powerful tool for the projects you are working on

And to talk about this, let’s turn to the twin concepts of ecosystem and infrastructure.

Data ecosystems

Diagram showing: Decentralised and centralised networks. Supporting text: Standards can support decentralisation and innovation.

When the Internet and World Wide Web emerged, they were designed as distributed systems. Over recent decades, we’ve come to experience a web that is much more based around dominant platforms and applications. 

This ‘application thinking’ can also pervade development programmes – where, when thinking about problems that would benefit from better data exchange, funders might leap to building a centralised platform or application to gather data.

Diagram showing closed and open networks. Supporting text: Approaches open to decentralisation can support greater generativity, freedom and resilience

But – this ‘build it and they will come’ approach has some significant problems: not least that it only scales so far, it creates single points of failure, and risks commercial capture of data flows. 

By contrast, an open standard can describe the data we want to see better shared, and then encourage autonomous parties to create applications, tools, support and analysis on top of the standard. 

It can also give data creators and users much greater freedom to build data processes around their needs, rather than being constrained by the features of a central platform.

In early open data thinking, this was referred to as the model of ‘let many flowers bloom’, and as ‘publish once, use everywhere’ – but over time we’ve seen that, just like natural ecosystems – creating a thriving data ecosystem can require more intentional and ongoing action.

Diagrams showing natural and human-built 'ecosystems'.

Just like their natural counterparts, data ecosystems are complex, dynamic, and equilibrium can be hard to reach and fragile. Keystone species can support an ecosystem’s growth; whilst a local resource drought harming some key actors could lead to a cascading ecosystem collapse.

To give a more concrete example – a data ecosystem around the International Aid Transparency Initiative has grown up over the last decade – with hundreds of aid donors big and small sharing data on their projects in a common data format: the IATI Standard. ‘Keystone species’ such as D-Portal, which visualises the collected data, have helped create an instant feedback loop for data publishers to ‘see’ their data, whilst behind the scenes, a Datastore and API layer feeds all sorts of specific research and analysis projects, and operational systems – that on their own would have had little leverage to convince data publishers to send them standardised data.

However, elements of this ecosystem are fragile: much data enters through a single tool – AidStream – which over time has become the tool of choice for many NGOs, and which, if unavailable, would diminish the freshness of data. Many users accessing data rely on ‘the datastore’, which aggregates published IATI files and acts as an intermediary so users don’t need to download data from hundreds of different publishers. If the datastore is down, many applications that depend on it may fail. Recently, when new versions of the official datastore hit technical trouble, an older open-source version was brought back into service, initially by volunteers.

Ultimately, this data ecosystem is more resilient than it would otherwise be because it’s built on an open standard. Even if the data entry tool, or datastore, become inaccessible, new tools can be rapidly plugged in. But that they will be can’t just be taken for granted: data ecosystems need careful management just as natural ones do.

Support standards over apps

The biggest point I want to highlight here is a design one. Instead of designing platforms and applications, it’s possible to design for, and work towards, thriving data ecosystems. 

That can require a different approach: building partnerships with all those who might have a shared interest in the kind of data you are dealing with, building political commitments to share data, investing in the technical work of standard development, and fostering ecosystem players through grants, engagement and support.

Building data ecosystems through standardisation can crowd-in investment: having a significant multiplier effect. 

Screenshot of Open Contracting Partnership worldwide map from https://www.open-contracting.org

For example, most of the implementations of the Open Contracting Data Standard have not been funded by the Open Contracting Partnership, which stewards its design – yet they incorporate particular ideas the standard encodes, such as providing linked information on tender and contract award, and providing clear identifiers for the companies involved in procurement.

For the low millions of dollars invested in maintaining OCDS since its first half-million-dollar, year-long development cycle – many more millions, from a myriad of sources, have gone into building bespoke and re-usable tools, supported by for-profit and non-profit business models right across the world.

And innovation has come from the edges, such as the adoption of the standard in Ukraine’s corruption-busting e-procurement system, or the creation of tools using the standard to analyse paper-based procurement systems in Nigeria.

As a side note here – I’m not necessarily saying ‘build a standard’: often, the standards you need might almost be there already. Investing in standardisation can be a process of adaptation and engagement to improve what already exists.

And whilst I’m about to talk a bit more about some of the technical components that make standards for open data work well, my own experience helping to develop the ecosystem around public procurement transparency with the Open Contracting Data Standard has underscored for me the vitally important human, community-building element of data ecosystem building. This includes supporting governments and building their confidence to map their data into a common standard: walking the narrow line between making data interoperable at a global level, and responding to the diverse situations that different countries find themselves in, in terms of legacy data systems, relevant regulation and political will.

Infrastructures

Icon for concept of Infrastructure

I said earlier that it is productive to pair the concept of an ecosystem with that of an infrastructure. If ecosystems contain components adapted to each niche, an infrastructure, in data terms, is something shared across all sites of action. We’re familiar with physical infrastructures like the road and rail networks, or the energy grid. These can provide useful analogies for thinking about data infrastructure. Well managed, data infrastructures are the underlying public good which enables an ecosystem to thrive.

Some components of data infrastructure: Schema & documentation; Validation tools; Reference implementations & code; Reference data; Data registries; Aggregators and APIs

In practice, the data infrastructure of a standard can involve a number of components:

  • There’s the schema itself and its documentation.
  • There might be validation tools that tell you if a dataset conforms with the standard or not.
  • There might be reference implementations and examples to work from.
  • And there might be data registries, or data aggregators and APIs that make it easier to get a corpus of standardised data.

Just like our physical infrastructure, there are choices to make in a data infrastructure over whose needs it will be designed around, how it will be funded, and how it will be owned and maintained.

For example, if the data ecosystem you are working with involves sensitive data, you may find you need to pair an open standard with a more centralised infrastructure for data exchange, in which standardised data is available through a clearing-house which manages who does and does not have access to the data, or which assures that anonymisation and privacy-protecting practices have taken place.

By contrast, a data standard to support bi-lateral data exchange along a supply chain may need a good data validator tool to be maintained and provided for public access, but may have little need for a central datastore.

There’s a lot more written on the concept of data infrastructures: both drawing on technology literatures, and some rich work on the economics of infrastructure. 

But before sharing some closing thoughts, I want to turn briefly to thinking about ‘data institutions’ – the organisational arrangements that can make infrastructures and ecosystems more stable and effective – and that can support cooperation where cooperation works best, and create the foundations for competition where competition is beneficial.

Institutions

Standards adoption requires trust. Ownership, stewardship and institutions matter

A data standard is only a standard if it achieves a level of adoption and use. And securing that requires trust.

It requires those who might build systems that will work with the standard to be able to trust that the standard is well managed, and has robust governance. It requires users of schema and codelists to trust that they will remain open and free for use – rather than starting open and later being enclosed, like the bait-and-switch that we’ve seen with many online platforms. And it requires leadership committing to adopting a standard to trust that the promises made for what it will do can be delivered.

Behind many standards and their associated infrastructures – you will find carefully designed organisational structures and institutions.

Image showing governance structure of the Global Legal Entity Identifier Foundation

For example, the Global Legal Entity Identifier – designed to identify counter-parties in financial transactions, and avoid the kind of contagion seen in the 2008 financial crash – has layers of international governance to support a core infrastructure and policies for data standards and management, paired with licensed ‘Local Operating Units’ who can take a more entrepreneurial approach to registering companies for identifiers and verifying their identities.

The LEI standard itself has been taken through ISO committees to deliver a standard that is likely to secure adoption from enterprise users. 

Image showing the revision process described at https://standard.open-contracting.org/latest/en/governance/#revision-process

By contrast, the Open Contracting Data Standard I mentioned earlier is stewarded by an independent non-profit.

However, OCDS – seeking, as I would argue it should, to disrupt some of the current procurement technology landscape – has not taken a formal standards-body route, where there is a risk that well-resourced incumbents would water down the vision within the standard. Instead, the OCDS team have developed a set of open governance processes for changes to the standard that aim to make sure it retains trust from government, civil society and private sector stakeholders, whilst also retaining some degree of agility.

We’ve seen over the last decade that standards and software tools alone are not enough: they need institutional homes, and multi-disciplinary teams who can combine ongoing technical maintenance work with stakeholder engagement, and a strategic focus on the kinds of change the standards were developed to deliver.

If you are sponsoring data standards development work that’s aiming for scale, are you thinking about the long-term institutional home that will sustain it?

If you’re not directly developing standards, but the data that matters to you is shaped by existing data standardisation, I’d also encourage you to ask: who is standing up for the public interest and the needs of our stakeholders in the standardisation process?

For example, over the last few weeks we’ve heard how a key mechanism to meet the goals of the COP26 Climate Conference will be in the actions of finance and investment – and a raft of new standards, many likely backed by technical data standards – are emerging for corporate Environmental, Social and Governance reporting.

There are relatively few groups out there like ECOS, the NGO network engaging in technical standards committees to champion sustainability interests. I’ve struggled to locate any working specifically on data standards. Yet, in the current global system of ‘voluntary standardisation’, standards get defined by those who can afford to follow the meetings and discussions and turn up. Too often, that restricts those shaping standards to corporate and developed-country government interests alone.

If we are to have a world of technical and data standards that supports social change, we need more support for the social change voices in the room.

Closing reflections & challenges

As I was preparing this talk, I looked back at The State of Open Data – the book I worked on with IDRC a few years ago to review how open data ecosystems had developed across a wide range of sectors. One of the things that struck me when editing the collection was the significant difference between how far different sectors have developed effective ecosystems for standardised, generative, and solution-focussed data sharing.

Whilst there are some great examples out there of data standards impacting policy, supporting smart data sharing and analysis, and fostering innovation – there are many places where we see dominant private platforms and data ecosystems developing that do not seem to serve the public interest – including, I’d suggest (although this is not my field of expertise) in a number of areas of agriculture.

So I asked myself: why? What stops us building effective open standards? I have six closing observations – and, with them, challenges – to share.

(1) Underinvestment

Data standards are not, in the scheme of things, expensive to develop: but neither is the cost zero – particularly if we want to take an inclusive path to standard development.

We consistently underinvest in public infrastructure, digital public infrastructure even more so. Supporting standards doesn’t deliver shiny apps straight away – but it can prepare the ground for them to flourish.

(2) Stopping short of version 2

How many of you are using Version 1.0 of a software package? Chances are most of the software you are using today has been almost entirely rewritten many times: each time building on the learning from the last version, and introducing new ideas to make it fit better with a changing digital world. But, many standards get stuck at 1.0. Funders and policy champions are reluctant to invest in iterations beyond a standard’s launch.

Managing the versioning of data schemas can involve some added complexity over versioning software – but so many opportunities are lost by a funding tendency to see standards work as done when the first version is released, rather than to plan for the ‘second curve’ of development.

(3) Tailoring to the dominant use case and (4) Trying to meet all use cases

Standards are often developed because their author or sponsor has a particular problem or use-case in mind. For GTFS, the General Transit Feed Specification that drives many of the public transport directions you find in apps like Google Maps, that problem was ‘finding the next bus’ in Portland, Oregon. That might be the question that 90% of data users come to the data with; but there are also people asking: “Is the bus stop and bus accessible to me as a wheelchair user?” or “Can we use this standard to describe the informal Matatu routes in Nairobi where we don’t have fixed bus stop locations?”

A brittle standard that only focusses on the dominant use case will likely crowd out space for these other questions to be answered. But a standard that was designed to try and cater for every single user need would likely collapse under its own weight. In the GTFS case, this has been handled by an open governance process that has allowed disability information to become part of the standard over time, and an openness to local extensions.
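
As an illustration of how that extension plays out in the data itself, GTFS stops data carries an optional wheelchair_boarding column (as I understand the current spec: 0 = no information, 1 = accessible boarding possible, 2 = not possible). The stops below are invented, but a consumer only needs a few lines to surface the accessibility answer alongside ‘next bus’ data:

```python
# Sketch: reading the optional wheelchair_boarding field from a GTFS-style
# stops.txt file. The stop rows are invented; the column and its 0/1/2
# values reflect my reading of the GTFS reference.
import csv
import io

stops_txt = """stop_id,stop_name,wheelchair_boarding
S1,Main St & 1st,1
S2,Main St & 2nd,2
S3,Old Depot,0
"""

LABELS = {"0": "no information", "1": "accessible", "2": "not accessible"}

# Pair each stop name with a human-readable accessibility label
labelled = [(stop["stop_name"], LABELS[stop["wheelchair_boarding"]])
            for stop in csv.DictReader(io.StringIO(stops_txt))]

for name, access in labelled:
    print(name, "->", access)
```

Because the field is optional and has an explicit ‘no information’ value, publishers who can’t yet supply accessibility data aren’t locked out of the standard – a concrete example of not collapsing under the weight of every use case at once.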

There is an art and an ethics of standardisation here – and it needs interdisciplinary teams. Which brings me to my final two learnings on where things can go wrong, and what we need to do to design standards well.

(5) Treating standards as a technical problem and (6) Neglecting the technical details

I suspect many here would not self-identify as ‘data experts’: yet everyone here could have a role to play in data standard development and adoption. Data standards are, at their core, a solution to coordination and collaboration problems, and making sure they work as such requires all sorts of skills, from policy, programme and institution design, to stakeholder engagement and communications. 

But – at the same time – data standards face real technical constraints, and require creative technical problem solving.

Showing that 12/10/2021 can be read in the US as 10th December, or in Europe as 12th October.

Without technical expertise in the room, after all, we may well end up turning up months late.
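
This is exactly the kind of detail a good standard pins down – for example by requiring unambiguous ISO 8601 (YYYY-MM-DD) dates. A quick sketch of the ambiguity:

```python
# The same string, parsed under US (MM/DD/YYYY) and European (DD/MM/YYYY)
# conventions, yields dates two months apart. ISO 8601 removes the ambiguity.
from datetime import datetime

ambiguous = "12/10/2021"
as_us = datetime.strptime(ambiguous, "%m/%d/%Y")  # 10th December
as_eu = datetime.strptime(ambiguous, "%d/%m/%Y")  # 12th October

print(as_us.date(), "vs", as_eu.date())   # ISO 8601 forms are unambiguous
print((as_us - as_eu).days, "days apart")
```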

Coda

I called this talk ‘Fostering open ecosystems’ because we face some very real risks that our digital futures will not be open and generative without work to create open standards. As the perceived value of data becomes ever higher, Silicon Valley capital may seek to capture the data spaces essential to solving development challenges. Or we may simply end up with development data locked in silos created by our failures to coordinate and collaborate. A focus on data standards is no magic bullet, but it is part of the pathway to create the future we need.

Can the UK’s ‘Algorithmic Transparency Standard’ deliver meaningful transparency?

[Summary: a critical look at the UK’s Algorithmic Transparency Standard]

I was interested to see announcements today that the UK has released an ‘Algorithmic Transparency Standard’ in response to recommendations from the Centre for Data Ethics and Innovation (CDEI) “that the UK government should place a mandatory transparency obligation on public sector organisations using algorithms to support significant decisions affecting individuals”, and commitments in the National Data Strategy to “explore appropriate and effective mechanisms to deliver more transparency on the use of algorithmic assisted decision making within the public sector” and the National AI Strategy to “Develop a cross-government standard for algorithmic transparency.”. The announcement is framed as “strengthening the UK’s position as a world leader in AI governance”, yet, at a closer look, there’s good reason to reserve judgement on whether it can deliver this until we see what implementation looks like.

Screenshot of press release: Press release UK government publishes pioneering standard for algorithmic transparency The CDDO has launched an algorithmic transparency standard for government departments and public sector bodies, delivering on commitments made in the National Data Strategy and National AI Strategy.

Here’s a rapid critique based purely on reading the online documentation I could find. (And, as with most that I write, this is meant in a spirit of constructive critique: I realise the people working on this within government, and advising from outside, are working hard to deliver progress, often on limited resources and against countervailing pressures, and without their efforts we could be looking at no progress on this issue at all. I remain an idealist, looking to articulate what we should expect from policy, rather than what we can, right now, reasonably expect.)

There are standards, and there are standards

The Algorithmic Transparency Standard is made up of two parts:

  • An ‘algorithmic transparency data standard’ – which at present is a CSV file listing 38 field names, brief descriptions, whether or not they are required fields, and ‘validation rules’ (given, in all but one case, as ‘UTF-8 string’);
  • An algorithmic transparency template and guidance, described as helping ‘public sector organisations provide information to the data standard’, and consisting of a Word document of prompts for the information required by the data standard.

Besides the required/non-required field list from the CSV file, there do not appear to be any descriptions of what adequate or good free-text responses to the various prompts would look like, nor any stated requirements concerning when algorithmic transparency data should be created or updated (notably, the data standard omits any meta-data about when transparency information was created, or by whom).

The press release describes the ‘formalisation’ route for the standard:

Following the piloting phase, CDDO will review the standard based on feedback gathered and seek formal endorsement from the Data Standards Authority in 2022.

Currently, the Data Standards Authority web pages “recommends a number of standards, guidance and other resources your department can follow when working on data projects”, but appear to stop short of mandating any for use.

The Data Standards Authority is distinct from the Open Standards Board which can mandate data standards for exchanging information across or from government.

So, what kind of standard is the Algorithmic Transparency Standard?

Well, it’s not a quality standard, as it lacks any mechanism to assess the quality of disclosures.

It’s not a policy standard, as its use is not mandated in any strong form.

And it’s not really a data standard in its current form, as its development has not followed an open standards process, it doesn’t use a formal data schema language, nor is it on a data standards track.

And it’s certainly not an international standard, as it’s been developed solely through a domestic process.

What’s more, even the template ultimately isn’t all that much of a template, as it really just provides a list of information a document should contain, without clearly showing how that should be laid out or expressed – leading potentially to very differently formatted disclosure documents.

And of course, a standard isn’t really a standard unless it’s adopted.

So, right now, we’ve got the launch of some fields of information that are suggested for disclosure when algorithms are used in certain circumstances in the public sector. At best this offers the early prototype of a paired policy and data standard, and stops far short of CDEI’s recommendation of a “mandatory transparency obligation on public sector organisations using algorithms to support significant decisions affecting individuals”.

Press releases are, of course, prone to some exaggeration, but it certainly raises some red flags for me to see such an under-developed framework being presented as the delivery of a commitment to algorithmic transparency, rather than a very preliminary step on the way.

However, hype aside, let’s look at the two parts of the ‘standard’ that have been presented, and see where they might be heading.

Evaluated as a data specification

The guidance on use of the standard asks government or public sector employees using algorithmic tools to support decision-making to fill out a document template, and send this to the Data Ethics team at the Cabinet Office. The Data Ethics team will then publish the documents on Gov.uk, and reformat the information into the ‘algorithmic transparency data standard’, presumably to be published in a single CSV or other file collecting together all the disclosures.

Data specifications can be incredibly useful: they can support automatic validation of whether key information required by policy standards has been provided, and can reduce the friction of data being used in different ways, including by third parties. For example, in the case of an effective algorithmic transparency register, standardised structured disclosures could:

  • Drive novel interfaces to present algorithmic disclosures to the public, prioritising the information that certain stakeholders are particularly concerned about (see CDEI background research on differing information demands and needs);
  • Allow linking of information to show which datasets are in use in which algorithms, and even facilitate early warning of potential issues (e.g. when data errors are discovered);
  • Allow stakeholders to track when new algorithms are being introduced that affect a particular kind of group, or that involve a particular kind of risk;
  • Support researchers to track evolution of use of algorithms, and to identify particular opportunities and risks;
  • Support exchange of disclosures between local, national and international registers, and potentially stimulate private sector disclosure in the way the press release suggests could happen.

However, to achieve this, it’s important for standards to be designed with various use-cases in mind, and with engagement with potential data re-users. There’s no strong evidence in this case of that happening – suggesting the current proposed data structure is primarily driven by the ‘supply side’ list of information to be disclosed, and not by any detailed consideration of how that information might be re-used as structured data.

Diagram showing a cycle from Implementation, to Interoperability, to Validation, to Policy and Practice Change - surrounding a block showing the role of policy and guidance supporting an interplay between Standards and Specifications.
Modelling the interaction of data standards and policy standards (Source: TimDavies.org.uk)

Data specifications are also more effective when they are built with data validation and data use in mind. The current CSV definition of the standard is pretty unclear about how data is actually to be expressed:

  • Certain attributes are marked with *, which I think means they are supposed to be one-to-many relationships (i.e. any algorithmic system may have multiple external suppliers, and so it would be reasonable for a standard to have a way of clearly modelling each supplier, their identifier, and their role as structured data) – but this is not clearly stated.
  • The ‘required’ column contains a mix of TRUE, FALSE and blank values – leaving some ambiguity over what is required (and required by whom? With what consequence if not provided?)
  • The field types are almost all ‘UTF-8 string’, with the exception of one labelled ‘URL’. Why other link fields are not validated as URLs is not clear.
  • The information to be provided in many fields is likely to be fairly long blocks of text, even running to multiple pages. Without guidance on (a) the suggested length of text; and (b) how rich text should be formatted; there is a big risk of ending up with blobs of tricky-to-present prose that don’t make for user-friendly interfaces at the far end.
Screenshot of spreadsheet available at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1036242/Algorithmic_transparency_data_standard.csv/preview
Screenshot of current Algorithmic Transparency Data Standard

As mentioned above, there is also a lack of metadata in the specification. Provenance of disclosures is likely to be especially important, particularly as they might be revised over time. A robust standard for an algorithmic transparency register should properly address this.

Data is more valuable when it is linked, and there are lots of missed opportunities in the data specification to create a better infrastructure for algorithmic transparency. For example, whilst the standard does at least ask for the company registration number of external suppliers (although, assuming many will be international suppliers, an internationalised organization identifier approach would be better), it could also be asking for links to the published contracts with suppliers (using Contracts Finder or other platforms). More guidance on the use of source_data_url to make sure that, wherever a data.gov.uk or other canonical catalogue link for a dataset exists, this is used, would enable more analysis of commonly used datasets. And when it comes to potential taxonomies, like model_type, rather than only offering free text, is it beyond current knowledge to offer a pair of fields, allowing model_type to be selected from a controlled list of options, and then more detail to be provided in a free-text model_type_details field? Similarly, some classification of the kinds of services the algorithm affects, using reference lists such as the Local Government Service list, could greatly enhance usability of the data.

Lastly, when defined using a common schema language (like JSON Schema, or even a CSV Schema language), standards can benefit from automated validation and documentation generation – creating a ‘Single Source of Truth’ for field definitions. In the current Algorithmic Transparency Standard there is already some divergence between how fields are described in the CSV file and the Word document template.
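To illustrate what a ‘Single Source of Truth’ buys: a single machine-readable schema fragment can drive validation, controlled lists and documentation together, so the CSV and the template cannot drift apart. A minimal JSON-Schema-style sketch in Python – the field names, descriptions and list values here are illustrative assumptions, not the published standard:

```python
# An illustrative schema fragment: field names, descriptions and the
# controlled list are assumptions for the sake of the sketch.
SCHEMA = {
    "type": "object",
    "required": ["name", "model_type"],
    "properties": {
        "name": {"type": "string", "description": "Name of the tool"},
        "model_type": {
            "type": "string",
            "enum": ["classification", "regression", "ranking", "other"],
            "description": "Broad class of model, from a controlled list",
        },
        "model_type_details": {
            "type": "string",
            "description": "Free-text detail on the model type",
        },
    },
}

def field_docs(schema: dict) -> list:
    """Generate one documentation line per field from the same schema."""
    lines = []
    for name, spec in schema["properties"].items():
        required = "required" if name in schema["required"] else "optional"
        lines.append(f"{name} ({required}): {spec['description']}")
    return lines

for line in field_docs(SCHEMA):
    print(line)
```

The same dictionary could be handed to an off-the-shelf JSON Schema validator, so that field definitions, validation rules and published guidance all trace back to one file.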

There are some simple steps that could be taken to rapidly iterate the current data standard towards a more robust open specification for disclosure and data exchange – but that will rely on at least some resourcing and political will to create a meaningful algorithmic transparency register – and would benefit from finding a better platform to discuss a standard than a download on gov.uk.

Evaluated as a policy standard

The question “Have we met a good standard of transparency in our use of X algorithm?” is not answered simply by asserting that certain fields of information have been provided. It depends on whether those fields of information are accurate, clearly presented, understood by their intended users, and, in some way actionable (e.g. the information could be drawn upon to raise concerns with government, or to drive robust research).

The current ‘Algorithmic transparency template’ neither states the ultimate goal of providing information, nor gives guidance on the processes to go through in order to provide the information requested. Who should fill in the form? Should a ‘description of an impact assessment conducted’ include the Terms of Reference for the assessment, or the outcome of it? Should risk mitigations be tied to individual risks, or presented at a general level? Should a template be signed-off by the ‘senior responsible owner’ of the tool? These questions are all left unanswered.

The list of information to be provided is, however, a solid starting point – and based in relevant consultation (albeit perhaps missing consideration of the role of intermediaries and advocacy groups in protecting citizen interests). What’s needed to make this into a robust policy standard is some sense of the evaluation checklist to be carried out to judge whether a disclosure is meaningful or not, and some sense of how, beyond the pilot, this might become mandatory and part of the business process of deploying algorithmic systems, rather than simply an optional disclosure (i.e. pilots need to talk about the business process, not just the information provision).

Concluding observations

The confusion between different senses of ‘standard’ (gold standard, data standard) can deliver a useful ambiguity for government announcements: but it’s important for us to scrutinise and ask what standards will really deliver. In this case, I’m sceptical that the currently described ‘standard’ can offer the kind of meaningful transparency needed over use of algorithms in government. It needs substantial technical and policy development to become a robust tool of good algorithmic governance – and before we shout about this as an international example, we need to see that the groundwork being laid is both stable, and properly built upon.

On a personal level, I’ve a good degree of confidence in the values and intent of the delivery teams behind this work, but I’m left with lingering concerns that political framing of this is not leading towards a mandatory register that can give citizens greater control over the algorithmic decisions that might affect them.

Tackling the climate crisis with data: what the built-environment sector can do

One of my first assignments now that I’m back working as an independent consultant was to support the Open Data Institute to develop a working paper on data, the built environment and climate action.

The draft paper was released on Monday, to coincide with the start of COP26, and ahead of the Open Data Institute’s Summit – and it will be open for comments and further input until 19th October.

Over here on Twitter I’ve put together a brief thread of some of the key messages in the report.

 

Joining the dance: ecology, AI and minimum shared frameworks?

[Summary: Fragments of reflection on the Decarbonisation and Decolonisation of AI]

I’ve spent some time this morning reading the ‘AI Decolonial Manyfesto’ which opens framed as “a question, an opening, a dance about a future of AI technologies that is decolonial”. Drawing on the insights, positions and perspectives of a fantastic collective authorship, it provides some powerful challenges for thinking about how to shape the future applications of AI (and wider data) technologies.

As I’ve been reading the Manyfesto on Decolonisation in a short break from working on a project about Decarbonisation – and the use of data and AI to mitigate and adapt to the pressing risks of climate breakdown – I find myself particularly reflecting on two lines:

“We do not seek consensus: we value human difference. We reject the idea that any one framework could rule globally.”

and

“Decolonial governance will recognize, in a way that Western-centric governance structures historically have not, how our destinies are intertwined. We owe each other our mutual futures.”

Discussions over the role of data in addressing the global climate crisis may veer towards proposing vast centralising data (and AI) frameworks (or infrastructures), in order to monitor, measure and manage low-carbon transitions. Yet – such centralising data infrastructures risk becoming part of systems that perpetuate historical marginalisation, rather than tools to address systemic injustice: and they risk further sidelining other important forms of knowledge that may be essential to navigate our shared future on a changing planet.

I’m drawn to thinking about the question of ‘minimum shared frameworks’ that may be needed both in national and global contexts to address the particular global challenge of the climate in which all our destinies are intertwined. Yet, whilst I can imagine decentralised, (even decolonised?), systems of data capture, sharing and use helping to accelerate low-carbon transitions, I’m struggling at first-look to see how those might be brought into being at the pace required by the climate crisis.

Perhaps my focus for that should be on later lines of the Manyfesto:

“We seek to center the engineering, design, knowledge-production, and dispute-resolution practices of diverse cultures, which are embedded with their own value systems.”

My own cultural context, social role, academic training and temperament leave me profoundly uncomfortable ending a piece of writing without a conclusion – even if a conclusion would be premature (one of the particular structures of ‘Western, white, male’ thought that perhaps does much harm). But, I suspect that here I need to simply take first steps into the dance, and to be more attuned to the way it flows…

A dirty cloud hangs over Javelin Park as auditors fail to establish legality of £600m contract

In what I think is now the longest running series of posts on this blog, I’m back reviewing documents relating to Gloucestershire’s Javelin Park Incinerator saga (see here). This is a long and particularly wonky post, put together quickly ahead of tomorrow’s Audit Committee Meeting. I’ll try and draw out some of these ad-hoc observations into something more accessible soon.

A brief history

To put the latest developments in context:

The Auditor’s response stops short of issuing a ‘Report in the Public Interest’, and triggering the associated public meetings. This appears to be at odds with the outcome that the objectors, Community R4C, had expected based on their access to earlier drafts of the report from the auditor, which concluded that a Public Interest Report was required [Addition: 29/09/21].

This post is my notes from reading the auditor’s letter and report.

What does the report find?

In short, the auditors conclude that they cannot draw a definitive conclusion with respect to audit objections:

  • Because of the length of time passed since the contract was signed;
  • Because of the complexity of the contract’s financial model, and the assumed cost of assessing whether its renegotiation in 2016 shifted the economic balance in favour of the operator (UBB);
  • Because other Energy from Waste contracts are not transparent, making comparison to other authority contracts difficult.

They write that:

“our main objective is to explain why we have not reached a definitive conclusion, rather than to express such a conclusion”

although they then state that:

“we do not consider that anything further by way of public reporting and consideration by a Committee of the Council is required”

with perhaps an implicit suggestion that the council wanted to avoid public reporting of this at all?

However, on the substantive matters, the report finds (page 12):

  • The council conclusively did not consider whether the 2016 renegotiation shifted the economic balance in favour of UBB
  • The auditors consider it would have been appropriate to conduct such an assessment and to keep records of it;
  • The auditor does not agree with the council’s legal opinion that it was not required to produce such an assessment, but accepts that the council was acting on its own legal advice.

They go on to say:

“From an audit perspective, a decision making process followed by a council which accorded with its legal view at the time is not in itself necessarily a cause for concern simply because that legal view may have been erroneous. Such a process does not necessarily indicate that the council lacks appropriate internal procedures for ensuring the sound management of its resources.”

So, whilst the council relying upon faulty legal advice for a 25-year contract appears not to be grounds for a negative independent audit conclusion – it should surely be a significant matter of serious concern for the Audit and Governance Committee.

Put another way, the auditors conclude that:

“Our view, in line with the advice we have received from independent Counsel, is that the material we have so far considered is insufficient to enable us to reach a firm conclusion as to the lawfulness under procurement law of the modifications.”

Which, although it appears nothing can now be done to exit the Javelin Park contract, leaves the 25-year, £600m commitment by Gloucestershire taxpayers under a significant cloud.

Establishing the legality of their actions is surely the least we should expect from our local authorities, let alone establishing that they operate in the best-interests of local residents and the wider environment.

It is also notable that, had the authority not fought against disclosure of contract details until late 2018, more contemporary examination of the case may have been possible, lessening the auditor’s objection that too much time has passed to allow them to conduct a proper investigation. The auditor however studiously avoids this point by stating:

“It is not our function to ‘regulate’ the Council in terms of whether it is sufficiently committed to transparency, or whether it has unjustifiably refused to release information in response to Freedom of Information Act requests.”

Yet, transparency is a central part of our public governance arrangements (not least in supporting meaningful public engagement with audit objections), and for it to fall entirely outside the scope of the auditor’s comments about whether processes were robust is notable.

Observations to explore further

As I’ve read through, a few different things have struck me – often in connection with past documents I’ve reviewed. I’ve made brief notes on each of these below.

Wider procurement issues

Page 12 of the letter states “We have not seen evidence that suggests that there may be a pattern of non-compliance with procurement law by the Council”, but does not detail whether any evidence was sought, or what steps were taken to be satisfied as to this conclusion. Notably, public news reports covering the period 2015–2019 highlight other governance failings related to procurement (though not necessarily procurement law), and at least from a public perspective raise some red flags about whether appropriate controls and oversight have been in place at GCC.

Recycling incentives

On page 19, the report, considering the impact of the contract on incentives to recycle, states that:

“While the average cost per tonne does clearly reduce as the level of waste increases, which may be as a result of lower recycling rates, the Council does not have direct influence over recycling rates.”

This appears at odds with the fact that the authority provides Waste Incentive Payments to Waste Collection Authorities designed to influence recycling rates, and that these rates have been altered since the incinerator became operational.

What’s a fair comparison?

A considerable part of the case the Council rely upon to prove Value for Money of the incinerator is the report produced by EY that compares the cost of the renegotiated 2016 contract with the cost of terminating the contract and relying on landfill for the next 25 years.

The auditors note that:

“the Council was professionally advised during the negotiation process, including by EY on the VfM of the RPP in comparison to terminating the contract and relying on landfill.”

However, the scope of the EY report, which compares to the “Council’s internal Landfill Comparator” (see covering letter), was set not on the expert advice of EY, but at the instruction of the Council’s procurement lead, Ian Mawdsley. As I established in a 2019 FOI, when I asked for:

the written instructions (as Terms of Reference, e-mail confirmation or other documentary evidence) of the work that was requested from Ernst and Young. Depending on the process of commissioning this work, I would anticipate it would form a written document, part of a work order, or an e-mail from GCC to E&Y.

and the council replied:

“We have completed our investigation into the points you raise and can confirm that the council do not hold any separate written terms of reference as these were initially verbal and recorded in the document itself.”

It seems reasonable to me that an expert advisor, given scope to properly consider alternatives, may have been able to, for example, compare termination against short-term landfill, followed by re-procurement. This should have been informed by the outcome of the Residual Waste Working Group Fallback Strategy that considered alternatives in 2013, but appears to have been entirely ignored by the Council administration.

If the council is to rely on ‘expert advice’ to establish that it, in good faith, sought to secure value for money on the project – then the processes to commission that advice, and the extent to which consultants were given a brief that allowed them to properly consider alternatives, should surely be considered?

Cancellation costs are a range: where are the error bars?

The auditor briefly considers whether councillors were given accurate information when, in meetings in 2015, they were debating contract cancellation costs of £60m – £100m.

My reading of the EY report is that, on the very last page, it gives a range of possible termination costs for force majeure planning permission-related termination, with the lowest being £35.4m (against a high of £69.8m). Higher up, it reports a single figure of £59.8m. The figure of £100m is quoted by EY as relating to ‘Authority Voluntary Termination’, but they note they have not calculated this figure in detail. It therefore seems surprising to me for the auditors to conclude that a meeting in 2015 considering contract cancellation – which was not provided with an officer report explaining either figure, but was told that cancellation costs were in the range £60m to £100m – was:

“not distorted by inaccurate information.”

Surely the accurate information that should have been presented would have simply been:

  • EY have produced estimated costs in a range from £35.4m – £69.8m if we cancel due to passing the planning long-stop date. Their best estimate for a single figure in this range is £59.8m;
  • EY have produced a rough estimate (but no calculations) of a cost of £100m if the authority cancels for other reasons outside the planning delay;
  • The council estimates that sticking with landfill for a period of time, and carrying out another procurement exercise could add up to X to the cost.

Eversheds advice

Re-reading the EY report, I note that it refers to separate advice provided by Eversheds on the issues of State Aid, Documentation Changes and Procurement risks including risks of challenge.

To my knowledge this advice has never been put in the public domain. It may be notable however, that the auditor does not reference this advice in their reply on the objectors allegation that the contract could have constituted illegal state aid.

Perhaps another FOI request if someone else wants to pick up the baton on that?

We should have recorded meetings!

I was present at the March 2019 meeting when the chief executive admitted that the council were in a poor negotiating position in relation to the contract. My partner, Cllr Smith, raised the failure of the minutes to include this point at the subsequent meeting, but it appears the administration were already attempting to remove this admission from the record.

Whilst the auditor states:

“In our view, even assuming that such a statement was made by the Chief Executive (and we make no finding as to whether it was: we note that the Council does not accept that your record of the meeting is accurate), it would not in itself justify our making a finding that the contract modifications shifted the balance of the contract in UBB’s favour.”

That this point is addressed, and that the Council administration have attempted to keep admissions in a public meeting of their weak negotiation position from the record, is of note.

With hindsight, given the Council chose to hold this meeting in a room that was not webcast, we should have arranged independent citizen led recording of the meeting.

A problem with facts?

The final line of the auditor’s letter, in their reasons for not seeking to make an application to the court for a declaration that the council’s acts may have been against the law, is rather curious:

“the issues underlying these matters are very fact specific such that there would be limited wider public interest in a court declaration.”

An argument for or against Open Contracting?

The report appears to make a strong case for wholesale Open Contracting when it comes to large EfW projects. They state:

“We accept that the comparisons included in the WRAP report do have significant limitations, mainly because they are, as the Council notes, quoted at a point in time and in isolation from the underlying contractual terms such as length of contract, risk share etc. Without access to such information on the contracts in place elsewhere, it is impossible to do a conclusive comparison, and even with full information on the various contracts, there would still be a good many judgements and assumptions involved in making a comparison because of, for example, variations in the ‘values’ associated with particular risks.”

In other words – the lack of transparency in Energy from Waste projects makes it nearly impossible to verify that the waste ‘market’ (which, because of geographical constraints and other factors is a relatively inflexible market in any case), has generated value for money for the public.

I’m also not sure why ‘values’ gets scare quotes in the extract above…

However, it appears to me that, rather than calling for greater publication of contracts, the auditors want to go the other way, and argue that contracting transparency could be bad for local authorities:

“Procurement law pursues objectives that are wider than promoting the efficient use of public resources. In particular, procurement law, as applicable at the relevant time, sought to pursue EU internal market objectives and to ensure the compliance of EU member states with obligations under the World Trade Organisation Global Procurement Agreement, by ensuring that contract opportunities were opened up to competition and that public procurement procedures were non-discriminatory and transparent. In some circumstances, public procurement law could potentially operate to preclude an authority from selecting an approach which could reasonably be regarded by the authority as the most economically efficient option available to it in the circumstances.”

This critique of the laws ‘as applicable at the relevant time’ (i.e. during EU membership) also raises a potential red flag about arguments post-Brexit Britain may increasingly see.

Is Local Audit Independent and Effective?

I recall hearing some critique of Grant Thornton’s audit quality – and, struck by some of my concerns about how this objection letter reads, did a brief bit of digging into the regulator’s opinion.

In 2019/20, the Financial Reporting Council reviewed six local audits by Grant Thornton. None were fully accepted, with the regulator concluding that:

“The audit quality results for our inspection of the six audits are unacceptable, with five audits assessed as requiring improvement, although no audits were assessed as requiring significant improvement.”

going on to note that:

“At least two key findings were identified on all audits requiring improvement and therefore areas of focus are the audit of property valuation, assessment and subsequent testing of fraud risks, audit procedures over the completeness and accuracy of expenditure and EQC review procedures.”

Whilst this does not cover assessment of the quality of reports in relation to audit objections, it is notable that in their response to the report Grant Thornton state:

“We consider that VfM audit is at the centre of local audit. We take VfM work seriously, invest time and resources in getting it right, and give difficult messages where warranted. In the last year, we have issued a Report in the Public Interest at a major audit, Statutory Recommendations and Adverse VfM Conclusions.”

Yet, in the Gloucestershire case, the auditors have studiously avoided asking any substantive Value for Money questions about the largest ever contract for the local authority, either at the time the contract was negotiated, or following concerns raised by objectors.

In their response to objectors, Grant Thornton rely a number of times on the time elapsed since the contract was signed as a reason that they cannot conduct any VfM analysis. Yet, they were the auditors at the time significant multi-million capital sums were committed to the project: which surely should have triggered contemporary VfM questions?

It’s notable that local electors are being asked to trust that Grant Thornton have very robust processes in place to protect against conflict of interest: not only because a finding that VfM was not secured would surely call into question the comprehensiveness of Grant Thornton’s past audit work (none of which is referenced in the report), but also because, as we learnt from the £1m Ernst and Young report relied upon to assert that the council had sought suitable independent advice, the financial models of incinerator operator UBB were written by none other than Grant Thornton.

(Oh, and today’s news on Grant Thornton doesn’t add to public confidence either.)

Effective objection, and the need for dialogue

One thing that has come across in years of reading the documents on this process – from Information Tribunal rulings, court rulings and the auditor’s letter – is the ‘frustration’ of the authorities (e.g. judges, auditors) asked to ‘adjudicate’ in this case with one or other of the parties. At times, the Council has come in for thinly veiled or outright criticism for lack of co-operation, and Community R4C appear to have at times undermined their case by making what the auditors view as excessive or out-of-scope objections.

A few takeaways from this:

  • There is a high bar for citizen-objectors to clear in making effective objections, and little support for this. Community R4C have drawn on extensive pro-bono legal advice, crowd-funding and other resources – and yet their core case, that the project is neither Value for Money, nor in-line with the waste hierarchy, has never been properly heard: always ruled out of consideration on ‘technicalities’.
  • Objection processes need to be made more user-friendly: and at the same time, objectors need to be supported with advice and even intermediaries who can help support filtered and strategic use of public scrutiny powers.
  • The lack of openness from Gloucestershire County Council to dialogue has been perhaps the biggest cause of this saga running on: leading to frustrating, irritable and costly interactions through courts and auditors – rather than public discussion of constructive ways forward for waste management in the County.

Where next?

I’ll be interested to see the outcome of tomorrow’s meeting of the audit committee, where, even though there were only a few hours between publication of the report and the question deadline, I understand a substantial number of public questions will be asked.

My sense is that there remains a strong case for an independent process to learn lessons from what is, to my mind, likely a significant set of governance failures at Gloucestershire County Council, and to ensure future waste management is properly in line with the goal of increased recycling and waste reduction.

Overcoming posting-paralysis?

[Summary: I’m trying to post a bit more critical reflection on things I read, and to write up more of my learning in shared space. I’ve been exploring why that’s been feeling difficult of late.] 

Reading, blogging and engaging through social media used to be a fairly central part of my reflective learning practice. In recent years, my reading, note-taking and posting practices have become quite frayed. Although many times I get as far as a draft post or tweet thread of reflections, I’m often hit by a posting-paralysis – and I stop short both of engaging in open conversation, and of solidifying my own reflections through a public post. As I return to a mix of freelance facilitation, research and project work (more on that soon), I’m keen to recover an open learning practice that makes effective use of online spaces.

Inspired by Lloyd Davis’ explorations in ‘learning how to work out loud again’ (appropriately so, since Lloyd’s earlier community convening and event hosting was a big influence on much of my earlier practice), I’m taking a bit of time in my first few weeks back at work to identify what I want from a reflective learning practice, to try and examine the barriers I’ve been encountering, and to prototype the tools, processes and principles that might help me recapture some of the thinking space that, at its best, the online realm can still (I hope) provide.

Why post anyway?

The caption of David Eaves’ blog comes to mind: “if writing is a muscle, this is my gym”. And linked: writing is a tool of thought. So, if I want to think properly about the things I’m reading and engaging with, I need to be writing about them. And writing a blog post, or constructing a tweet thread, can be a very effective way to push that writing (and thinking) beyond rough bullet points, to more complete thoughts. Such posts often work well as external memory: more than once I’ve searched for an idea, and come upon a blog post I wrote about it many years ago – rediscovering content I might not have found had it been buried in a personal notebook. (It turns out comments from spam bots are also a good ‘random-access-memory-prompt’ on a WordPress blog.)

I’ve also long been influenced by my colleague Bill Badham’s commitment to shared learning. My work often affords me the privilege to read, research and reflect – and there’s something of an obligation to openly share the learning that arises. On a related note, I’m heavily influenced by notions of open academic debate, where there’s a culture (albeit not uncomplicated) of raising questions or challenging data, assumptions and conclusions in the interest of getting to better answers.

So what’s stopping you?

At risk of harking back to a golden age of RSS and blogging, that died along with Google Reader, I suspect I need to consciously adapt my practices to a changed landscape.

Online platforms have changed. I felt most fluent in a time of independent bloggers, slowly reading and responding to each other over a matter of days and weeks. Today, I discover most content via tweets rather than RSS, and conversations appear to have a much shorter half-life, often fragmenting off into walled-garden spaces, or fizzling out half-completed as they get lost between different timezones. I’m reluctant to join discussions on walled-garden platforms like Facebook, and often find it hard to form thoughts adequately in tweet length.

My networks have changed. At the macro level, online spaces (and public discourse more generally) feel more polarised and quick to anger: although I only find this when I voyage outside the relatively civil online filter bubble I seem to have built. On the upside, I feel as though the people I’m surrounded with online are more global, and more diverse (in part, from a conscious effort to seek more gender balance and diversity in who I follow): but on the flip-side, I’m acutely aware that when I write I can’t assume I’m writing into a common culture, or that what I intend as friendly constructive critique will be read as such. Linked to this:

I’m more aware of unintended consequences of a careless post. In particular, I’m aware that, as a cis white male online, I don’t experience even half of the background aggression, abuse, gaslighting or general frustration that many professional women, people of colour, or people from minority communities may encounter daily. What, for me, might be a quick comment on something I’ve read, could come across to others as ‘yet another’ critical comment rather than the ‘Yes, and’ I meant it to be.

There are lots of subtleties to navigate around when an @ mention might be seen as a hat-tip credit, vs. when it might be an unwelcome interruption.

My role has changed. I still generally think of myself as a learner and junior practitioner, just trying to think out loud. But I’ve become aware from a couple of experiences that sometimes people take what I write more seriously! And that can be a little bit scary, or can place a different pressure on what I’m writing. Am I writing for my own process of thinking? Writing for others? Or writing for impact? Will my critical commentary be taken as having a weight I did not intend? And at the times when I do intend to write in order to influence, rather than just offer a viewpoint, do I need different practices?

My capacity, and focus, have changed. The pandemic and parenthood have squeezed out many of the time slots I used to use for reflective writing: the train back from London, the early evening, and so on. I’m trying to keep social media engagement to my working hours too, to avoid distractions and disruption during time with family.

Over-editing. A lot of the work I’ve done over recent years has involved editing text from others, and it’s made me less comfortable with the flow-of-writing, overly subclaused, and less-than-perfectly-clear sentences I’m prone to blogging with. (Though I can still resist that inner editor, as this mess of a paragraph attests: I am writing mainly for my own thinking after all.)

So what do I do about it?

Well – I’m certainly not over my posting paralysis: this post has been sitting in draft for a week now. But in the process of putting it together I’ve been exploring a few things:

  • A more conscious reading practice
  • Improving my note-taking tools
  • Linking blogging and social media use
  • Not putting too much pressure on public posting

I’ve brought scattered notes from the last few years together into a TiddlyWiki instance, and have started trying to keep a daily journal there for ad-hoc notes from articles or papers I’m reading – worrying less about perfect curation of notes, and more about just capturing reflections as they arise. I’ve reset my feed reader and bookmarking tools to better manage a reading list, and am trying to think more carefully about the time to give to reading different things.

I’ve also tried getting back to a blog-post format for responding to things I’m reading, rather than trying Twitter threads, which, whilst they might have a more immediate ‘reach’, often feel to me a bit forced, and demand more immediate follow-up than my capacity allows.

I was considering setting myself an artificial goal of posting daily or weekly, but for now I’m going to allow a more organic flow of posting, and review in a few weeks to see whether developing the diagnosis, and taking some of the initial steps above, is getting my practice closer to where I want it to be.

I’ve posted this.

Reflections on “Participatory data stewardship: A framework for involving people in the use of data”

I read with interest the new Ada Lovelace report Participatory data stewardship: A framework for involving people in the use of data, not least because it connects two fields I’ve spent a good while exploring: participation & data governance.

Below I’ve shared a few quick notes in a spirit of open reflection (read mostly as ‘Yes, and…‘ rather than ‘No, but’):

The ladder: Arnstein, Hart and Pathways of Participation

Arnstein’s ladder of participation.
RSA ‘Remix’ of Arnstein

The report describes drawing on Sherry Arnstein’s ‘ladder of citizen participation’, but in practice uses an RSA simplification of the ladder into a five-part spectrum that cuts off the critical non-participation layers of Arnstein’s model. In doing this, it removes some of the key critical power of the original ladder as a tool to call out tokenism, and push for organisations to reach the highest appropriate rung that maximises the transfer of power.

I’ve worked with various remixes of Arnstein’s ladder over the years, particularly building on Hart’s youth engagement remix, which draws attention to the distinction between ‘participant initiated’ vs. ‘organisationally initiated’ decision making. In one remix, put forward with Bill Badham and the NYA Youth Participation Team, we set the ladder against the range of participation methods, and explored the need for any participation architecture to think about the pathways of participation through which individuals grow in their capacity to exercise power over decisions.

It would be great to see further developments of the Ada Lovelace framework consider the mix of participatory methods that are appropriate to certain data use contexts, and how these can be linked together. For example, informing all affected stakeholders about a data use project can be the first rung on the ladder towards a smaller number becoming co-designers, joint decision makers, or evaluators. And to design a meaningful consultation reaching a large proportion of affected stakeholders might require co-design or testing with a smaller group of diverse collaborators first: making sure that questions are framed and explained in legitimate and accessible ways.

Data collection, datasets, and data use

“Well-managed data can support organisations, researchers, governments and corporations to conduct lifesaving health research, reduce environmental harms and produce societal value for individuals and communities. But these benefits are often overshadowed by harms, as current practices in data collection, storage, sharing and use have led to high-profile misuses of personal data, data breaches and sharing scandals.”

It feels to me as though the report falls slightly (though, to be fair, not entirely) into the trap of seeing data as a pre-existing fixed resource, where the main questions to be discussed are who will access a dataset, on what terms and to what end. Yet, data is under constant construction, and in participatory data stewardship there should perhaps be a wider set of questions explicitly on the table such as:

  • Should this data exist at all?
  • How can this data be better collected in ways that respect stakeholders’ needs?
  • What data is missing that should be here? Are we considering the ‘lost opportunities’ as well as the risks of misuse?
  • Is this data structured in ways that properly represent the interests of all stakeholders?

Personally, I’m particularly interested in the governance role of data standards and structures, and exploring models to increase diverse participation in shaping these foundational parts of data infrastructure.

Decentering the dataset

The report argues that:

“There are currently few examples of participatory approaches to govern access and use of data…”

yet, I wonder if this comes from looking through a narrow lens for projects that are framed as just about the data. I’d hazard that there are numerous social change and public sector improvement projects that have drawn upon data-sharing partnerships – albeit framed in terms of service or community change, rather than data-sharing per se.

In both understanding existing practice, and thinking about the future of participatory data governance practices, I suspect we need to look at how questions about data use are embedded within wider questions about the kinds of change we are seeking to create. For example, if a project is planning to pool data from multiple organisations to identify ‘at risk families’ and to target interventions, a participatory process should take in both questions of data governance and intervention design – treating the data process in isolation from the wider use process makes for a less accessible, and potentially substantially biased, process.

Direct participation vs. representatives

One of the things the matrix of (youth) participation model tries to draw out is the distinction between participatory modalities based on ad hoc individual involvement, where stakeholders participate directly, and those that involve sustained patterns of engagement but often move towards models of representative governance. Knowing whether you are aiming for direct or representative-driven participation is an important part of then answering the question ‘Who to involve?’, and being clear on the kind of structures needed to support meaningful participation.

Where next?

It’s great to see participatory models of data governance on the agenda of groups like Ada Lovelace – although it also feels like there’s still a way to go before many decades’ learning from the participation field is better connected with the kinds of technical decisions that affect so many lives.

Integrating Loss and Hope

Five years ago this week we lost our first daughter to a late miscarriage. I quietly collapsed. Unable to write, I abandoned my PhD dissertation – and have rarely felt fluent in writing since. We named our daughter Hope, because, after a long time trying to conceive, she had given us hope of having a family. Over the next two years we had two more miscarriages. I put my energy into work and politics. In 2019, walking the Coast to Coast route, we decided to explore adoption. Later this year we’re hoping to be granted an adoption order for two children placed with us last Autumn.

I’ve tried, and failed, to write about these experiences before. Miscarriage is a complicated and often very private loss. Adoption equally lacks simple narratives: in many cases everyone, adult and child, comes to it from a place of both loss and hope. All stories involving family are shared stories, where each party to them has different experiences, needs for acknowledgement, and needs for privacy. And adopting during a pandemic complicates the hope of building a new family, with the loss of normal social interactions and clarity about the future.

However, to leave these experiences out of the public ‘biography’ created through various writings and work here or elsewhere creates a gap in my own story that I’ve found increasingly difficult: particularly as my focus in the coming years will be much more on equally parenting two children with my partner than on the kinds of projects and work I’ve done in the past. As this comes to transform both what I work on and how, there is both hope for new adventures and growth, and a loss to acknowledge, familiar to many parents I’m sure, of established identity, roles and routines.

As the personal is never separate from the social, I also find it important to recognise the last year as one of profound societal and individual losses across the world: both directly from the pandemic, and from the wider environmental and political challenges it has placed into sharp relief. At the same time, the last year has provided glimpses of hope for new ways of living more connected and sustainable lives.

Returning to the personal: in many ways, it feels as though my last five years have been a time of living with loss, but acting with hope. I look towards future years of living with hope, but acting everyday in recognition of, and learning from, loss.

Coda

In the short liturgy we held to remember our first daughter, we used words from Emily Dickinson that I’m now reminded of daily when our children take joy in seeing the birds in the garden:

“Hope” is the thing with feathers –

That perches in the soul –

And sings the tune without the words –

And never stops – at all –

And sweetest – in the Gale – is heard –

And sore must be the storm –

That could abash the little Bird

That kept so many warm –

I’ve heard it in the chillest land –

And on the strangest Sea –

Yet – never – in Extremity,

It asked a crumb – of me.

Emily Dickinson, “Hope” is the thing with feathers