Can the UK’s ‘Algorithmic Transparency Standard’ deliver meaningful transparency?

[Summary: a critical look at the UK’s Algorithmic Transparency Standard]

I was interested to see announcements today that the UK has released an ‘Algorithmic Transparency Standard’. This responds to recommendations from the Centre for Data Ethics and Innovation (CDEI) “that the UK government should place a mandatory transparency obligation on public sector organisations using algorithms to support significant decisions affecting individuals”, to commitments in the National Data Strategy to “explore appropriate and effective mechanisms to deliver more transparency on the use of algorithmic assisted decision making within the public sector”, and in the National AI Strategy to “Develop a cross-government standard for algorithmic transparency.” The announcement is framed as “strengthening the UK’s position as a world leader in AI governance”, yet, on closer inspection, there’s good reason to reserve judgement on whether it can deliver this until we see what implementation looks like.

Screenshot of press release: Press release UK government publishes pioneering standard for algorithmic transparency The CDDO has launched an algorithmic transparency standard for government departments and public sector bodies, delivering on commitments made in the National Data Strategy and National AI Strategy.

Here’s a rapid critique based purely on reading the online documentation I could find. (And, as with most of what I write, this is meant in a spirit of constructive critique: I realise the people working on this within government, and advising from outside, are working hard to deliver progress, often with limited resources and against countervailing pressures, and without their efforts we could be looking at no progress on this issue at all. I remain an idealist, looking to articulate what we should expect from policy, rather than what we can, right now, reasonably expect.)

There are standards, and there are standards

The Algorithmic Transparency Standard is made up of two parts:

  • An ‘algorithmic transparency data standard’ – which at present is a CSV file listing 38 field names, brief descriptions, whether or not each is a required field, and ‘validation rules’ (given, in all but one case, as ‘UTF-8 string’);
  • An ‘algorithmic transparency template and guidance’ – described as helping ‘public sector organisations provide information to the data standard’, and consisting of a Word document of prompts for the information required by the data standard.

Besides the required/non-required field list from the CSV file, there do not appear to be any descriptions of what adequate or good free-text responses to the various prompts would look like, or any stated requirements concerning when algorithmic transparency data should be created or updated (notably, the data standard omits any meta-data about when transparency information was created, or by whom).
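To make the gap concrete: the only machine-checkable rule the current standard supports is the presence of required fields. A minimal sketch in Python (the field names here are illustrative stand-ins, not the actual 38 fields from the CSV):

```python
# Minimal required-field check -- the only kind of validation the current
# CSV definition supports. Field names here are illustrative, not the
# actual fields from the published standard.
REQUIRED_FIELDS = {"tool_name", "organisation", "description"}

def missing_required(disclosure: dict) -> set:
    """Return the required fields that are absent or empty in a disclosure."""
    return {f for f in REQUIRED_FIELDS
            if not str(disclosure.get(f, "")).strip()}

disclosure = {"tool_name": "Example triage tool", "organisation": ""}
print(sorted(missing_required(disclosure)))  # -> ['description', 'organisation']
```

Note that this says nothing about whether a provided value is accurate, intelligible, or useful – which is exactly the limitation of a presence-only standard.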

The press release describes the ‘formalisation’ route for the standard:

Following the piloting phase, CDDO will review the standard based on feedback gathered and seek formal endorsement from the Data Standards Authority in 2022.

Currently, the Data Standards Authority web pages state that the Authority “recommends a number of standards, guidance and other resources your department can follow when working on data projects”, but appear to stop short of mandating any for use.

The Data Standards Authority is distinct from the Open Standards Board which can mandate data standards for exchanging information across or from government.

So, what kind of standard is the Algorithmic Transparency Standard?

Well, it’s not a quality standard, as it lacks any mechanism to assess the quality of disclosures.

It’s not a policy standard, as its use is not mandated in any strong form.

And it’s not really a data standard in its current form: its development has not followed an open standards process, it doesn’t use a formal data schema language, and it is not on a data standards track.

And it’s certainly not an international standard, as it’s been developed solely through a domestic process.

What’s more, even the template ultimately isn’t all that much of a template, as it really just provides a list of information a document should contain, without clearly showing how that should be laid out or expressed – leading potentially to very differently formatted disclosure documents.

And of course, a standard isn’t really a standard unless it’s adopted.

So, right now, we’ve got the launch of some fields of information that are suggested for disclosure when algorithms are used in certain circumstances in the public sector. At best this offers the early prototype of a paired policy and data standard, and stops far short of CDEI’s recommendation of a “mandatory transparency obligation on public sector organisations using algorithms to support significant decisions affecting individuals”.

Press releases are, of course, prone to some exaggeration, but it certainly raises some red flags for me to see such an under-developed framework being presented as the delivery of a commitment to algorithmic transparency, rather than a very preliminary step on the way.

However, hype aside, let’s look at the two parts of the ‘standard’ that have been presented, and see where they might be heading.

Evaluated as a data specification

The guidance on use of the standard asks government and public sector employees using algorithmic tools to support decision-making to fill out a document template, and send it to the Data Ethics team at the Cabinet Office. The Data Ethics team will then publish the documents on Gov.uk, and reformat the information into the ‘algorithmic transparency data standard’, presumably to be published as a single CSV or other file collecting together all the disclosures.

Data specifications can be incredibly useful: they can support automatic validation of whether key information required by policy standards has been provided, and can reduce the friction of data being used in different ways, including by third parties. For example, in the case of an effective algorithmic transparency register, standardised structured disclosures could:

  • Drive novel interfaces to present algorithmic disclosures to the public, prioritising the information that certain stakeholders are particularly concerned about (see CDEI background research on differing information demands and needs);
  • Allow linking of information to show which datasets are in use in which algorithms, and even facilitate early warning of potential issues (e.g. when data errors are discovered);
  • Allow stakeholders to track when new algorithms are being introduced that affect a particular kind of group, or that involve a particular kind of risk;
  • Support researchers to track evolution of use of algorithms, and to identify particular opportunities and risks;
  • Support exchange of disclosures between local, national and international registers, and properly stimulate private sector disclosure in the way the press release suggests could happen.

However, to achieve this, it’s important for standards to be designed with various use-cases in mind, and with engagement from potential data re-users. There’s no strong evidence of that happening in this case – suggesting the current proposed data structure is primarily driven by the ‘supply side’ list of information to be disclosed, and not by any detailed consideration of how that information might be re-used as structured data.

Diagram showing a cycle from Implementation, to Interoperability, to Validation, to Policy and Practice Change - surrounding a block showing the role of policy and guidance supporting an interplay between Standards and Specifications.
Modelling the interaction of data standards and policy standards (Source: TimDavies.org.uk)

Data specifications are also more effective when they are built with data validation and data use in mind. The current CSV definition of the standard is pretty unclear about how data is actually to be expressed:

  • Certain attributes are marked with * which I think means they are supposed to be one-to-many relationships (i.e. any algorithmic system may have multiple external suppliers, and so it would be reasonable for a standard to have a way of clearly modelling each supplier, their identifier, and their role as structured data) – but this is not clearly stated.
  • The ‘required’ column contains a mix of TRUE, FALSE and blank values – leaving some ambiguity over what is required (and required by whom? with what consequence if not provided?)
  • The field types are almost all ‘UTF-8 string’, with the exception of one labelled ‘URL’. It is not clear why other link fields are not also validated as URLs.
  • The information to be provided in many fields is likely to be fairly long blocks of text, even running to multiple pages. Without guidance on (a) the suggested length of text, and (b) how rich text should be formatted, there is a big risk of ending up with blobs of tricky-to-present prose that don’t make for user-friendly interfaces at the far end.
Screenshot of spreadsheet available at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1036242/Algorithmic_transparency_data_standard.csv/preview
Screenshot of current Algorithmic Transparency Data Standard
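One way to resolve the ambiguity of the starred attributes would be to model them explicitly as arrays of objects, rather than as flat fields that are somehow repeated. A sketch of what that might look like, with invented field names and example identifiers (the actual standard does not specify any such structure):

```python
# A hypothetical structured representation of the one-to-many supplier
# relationship hinted at by the '*' markers: each supplier is an object
# with its own name, identifier and role, rather than a repeated flat
# field. Field names and values here are invented for illustration.
disclosure = {
    "tool_name": "Example triage tool",
    "suppliers": [
        {"name": "Acme Analytics Ltd",
         "identifier": "GB-COH-01234567",
         "role": "model development"},
        {"name": "Example Hosting Co",
         "identifier": "GB-COH-07654321",
         "role": "hosting"},
    ],
}

# Re-users can then query the structure directly, e.g. listing all
# supplier identifiers for cross-referencing against contract data.
ids = [s["identifier"] for s in disclosure["suppliers"]]
print(ids)  # -> ['GB-COH-01234567', 'GB-COH-07654321']
```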

As mentioned above, there is also a lack of meta-data in the specification. Provenance of disclosures is likely to be particularly important, particularly as they might be revised over time. A robust standard for an algorithmic transparency register should properly address this.

Data is more valuable when it is linked, and there are lots of missed opportunities in the data specification to create a better infrastructure for algorithmic transparency. For example, whilst the standard does at least ask for the company registration number of external suppliers (although, assuming many will be international suppliers, an internationalised organization identifier approach would be better), it could also be asking for links to the published contracts with suppliers (using Contracts Finder or other platforms). More guidance on the use of source_data_url to make sure that, wherever a data.gov.uk or other canonical catalogue link for a dataset exists, this is used, would enable more analysis of commonly used datasets. And when it comes to potential taxonomies, like model_type, rather than only offering free text, is it beyond current knowledge to offer a pair of fields, allowing model_type to be selected from a controlled list of options, and then more detail to be provided in a free-text model_type_details field? Similarly, some classification of the kinds of services the algorithm affects using reference lists such as the Local Government Service list could greatly enhance usability of the data.
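The paired-field idea is simple to sketch: a controlled model_type list alongside a free-text model_type_details field. The option values below are invented for illustration, not drawn from any official taxonomy:

```python
# Hypothetical controlled list for model_type, paired with a free-text
# details field. The option values are illustrative only.
MODEL_TYPES = {"rules_based", "regression", "tree_ensemble",
               "neural_network", "other"}

def validate_model_type(record: dict) -> list:
    """Return a list of validation problems for the model_type field pair."""
    problems = []
    if record.get("model_type") not in MODEL_TYPES:
        problems.append(f"model_type must be one of {sorted(MODEL_TYPES)}")
    if record.get("model_type") == "other" and not record.get("model_type_details"):
        problems.append("model_type_details is required when model_type is 'other'")
    return problems

print(validate_model_type({"model_type": "tree_ensemble",
                           "model_type_details": "Gradient boosted trees"}))  # -> []
```

The controlled value supports filtering and aggregation across disclosures, while the details field preserves the nuance that free text allows.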

Lastly, when defined using a common schema language (like JSON Schema, or even a CSV schema language), standards can benefit from automated validation and documentation generation – creating a ‘single source of truth’ for field definitions. In the current Algorithmic Transparency Standard there is already some divergence between how fields are described in the CSV file and the Word document template.
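To illustrate the single-source-of-truth point: a schema fragment can drive both validation and documentation. In practice a library such as jsonschema would enforce a JSON Schema definition; the tiny hand-rolled checker below keeps the sketch self-contained, and the field names are illustrative:

```python
# A fragment of what a machine-readable schema for the standard might
# look like (field names illustrative). One schema definition can drive
# validation, documentation generation, and form rendering alike.
SCHEMA = {
    "required": ["tool_name", "organisation"],
    "properties": {
        "tool_name": {"type": "string"},
        "organisation": {"type": "string"},
        "source_data_url": {"type": "string", "format": "uri"},
    },
}

def check(record: dict) -> list:
    """Validate a record against the schema, returning error messages."""
    errors = [f"missing: {f}" for f in SCHEMA["required"] if f not in record]
    for field, rules in SCHEMA["properties"].items():
        value = record.get(field)
        if value is None:
            continue
        if rules["type"] == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
        if rules.get("format") == "uri" and not str(value).startswith(("http://", "https://")):
            errors.append(f"{field}: expected a URL")
    return errors

print(check({"tool_name": "Example tool",
             "organisation": "Example Council",
             "source_data_url": "not-a-url"}))  # -> ['source_data_url: expected a URL']
```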

There are some simple steps that could be taken to rapidly iterate the current data standard towards a more robust open specification for disclosure and data exchange – but that will rely on at least some resourcing and political will to create a meaningful algorithmic transparency register – and would benefit from a better platform for discussing the standard than a download on gov.uk.

Evaluated as a policy standard

The question “Have we met a good standard of transparency in our use of X algorithm?” is not answered simply by asserting that certain fields of information have been provided. It depends on whether those fields of information are accurate, clearly presented, understood by their intended users, and, in some way, actionable (e.g. the information could be drawn upon to raise concerns with government, or to drive robust research).

The current ‘Algorithmic transparency template’ neither states the ultimate goal of providing information, nor gives guidance on the processes to go through in order to provide the information requested. Who should fill in the form? Should a ‘description of an impact assessment conducted’ include the Terms of Reference for the assessment, or the outcome of it? Should risk mitigations be tied to individual risks, or presented at a general level? Should a template be signed-off by the ‘senior responsible owner’ of the tool? These questions are all left unanswered.

The list of information to be provided is, however, a solid starting point – and based in relevant consultation (albeit perhaps missing consideration of the role of intermediaries and advocacy groups in protecting citizen interests). What’s needed to make this into a robust policy standard is some sense of the evaluation checklist used to judge whether a disclosure is meaningful or not, and some sense of how, beyond the pilot, disclosure might become mandatory and part of the business process of deploying algorithmic systems, rather than simply optional (i.e. pilots need to address the business process, not just the information provision).

Concluding observations

The confusion between different senses of ‘standard’ (gold standard, data standard) can deliver a useful ambiguity for government announcements: but it’s important for us to scrutinise and ask what standards will really deliver. In this case, I’m sceptical that the currently described ‘standard’ can offer the kind of meaningful transparency needed over use of algorithms in government. It needs substantial technical and policy development to become a robust tool of good algorithmic governance – and before we shout about this as an international example, we need to see that the groundwork being laid is both stable, and properly built upon.

On a personal level, I’ve a good degree of confidence in the values and intent of the delivery teams behind this work, but I’m left with lingering concerns that political framing of this is not leading towards a mandatory register that can give citizens greater control over the algorithmic decisions that might affect them.

Exploring participatory public data infrastructure in Plymouth

[Summary: Slides, notes and references from a conference talk in Plymouth]

Update – April 2020: A book chapter based on this blog post is now published as “Shaping participatory public data infrastructure in the smart city: open data standards and the turn to transparency” in The Routledge Companion to Smart Cities.

Original blog post version below: 

A few months back I was invited to give a presentation to a joint plenary of the ‘Whose Right to the Smart City‘ and ‘DataAche 2017‘ conferences in Plymouth. Building on some recent conversations with Jonathan Gray, I took the opportunity to try and explore some ideas around the concept of ‘participatory data infrastructure’, linking those loosely with the smart cities theme.

As I fear I might not get time to turn it into a reasonable paper anytime soon, below is a rough transcript of what I planned to say when I presented earlier today. The slides are also below.

For those at the talk, the promised references are found at the end of this post.

Thanks to Satyarupa Shekar for the original invite, Katharine Willis and the Whose Right to the Smart Cities network for stimulating discussions today, and to the many folk whose ideas I’ve tried to draw on below.

Participatory public data infrastructure: open data standards and the turn to transparency

In this talk, my goal is to explore one potential strategy for re-asserting the role of citizens within the smart-city. This strategy harnesses the political narrative of transparency and explores how it can be used to open up a two-way communication channel between citizens, states and private providers.

This not only offers the opportunity to make processes of governance more visible and open to scrutiny, but it also creates a space for debate over the collection, management and use of data within governance, giving citizens an opportunity to shape the data infrastructures that do so much to shape the operation of smart cities, and of modern data-driven policy and its implementation.

In particular, I will focus on data standards, or more precisely, open data standards, as a tool that can be deployed by citizens (and, we must acknowledge, by other actors, each with their own, sometimes quite contrary interests), to help shape data infrastructures.

Let me set out the structure of what follows. It will be an exploration in five parts, the first three unpacking the title, and then the fourth looking at a number of case studies, before a final section summing up.

  1. Participatory public data infrastructure
  2. Transparency
  3. Standards
  4. Examples: Money, earth & air
  5. Recap

Part 1: Participatory public data infrastructure

Data infrastructure

infrastructure. /ˈɪnfrəstrʌktʃə/ noun. “the basic physical and organizational structures and facilities (e.g. buildings, roads, power supplies) needed for the operation of a society or enterprise.” 1

The word infrastructure comes from the Latin ‘infra-’, meaning below, combined with ‘structure’. It provides the shared set of physical and organizational arrangements upon which everyday life is built.

The notion of infrastructure is central to conventional imaginations of the smart city. Fibre-optic cables, wireless access points, cameras, control systems, and sensors embedded in just about anything, constitute the digital infrastructure that feed into new, more automated, organizational processes. These in turn direct the operation of existing physical infrastructures for transportation, the distribution of water and power, and the provision of city services.

However, between the physical and the organizational lies another form of infrastructure: data and information infrastructure.

(As a sidebar: Although data and information should be treated as analytically distinct concepts, as the boundary between the two concepts is often blurred in the literature, including in discussions of ‘information infrastructures’, and as information is at times used as a super-category including data, I won’t be too strict in my use of the terms in the following).

(That said,) It is by being rendered as structured data that the information from the myriad sensors of the smart city, or the submissions by hundreds of citizens through reporting portals, is turned into management information, and fed into human or machine based decision-making, and back into the actions of actuators within the city.

Seen as a set of physical or digital artifacts, the data infrastructure involves ETL (Extract, Transform, Load) processes, APIs (Application Programming Interfaces), databases and data warehouses, stored queries and dashboards, schema, codelists and standards. Seen as part of a wider ‘data assemblage’ (Kitchin 5) this data infrastructure also involves various processes of data entry and management, of design, analysis and use, as well relationships to other external datasets, systems and standards.

However, it is often very hard to ‘see’ data infrastructure. By its very nature, infrastructure moves into the background, often only ‘visible upon breakdown’, to use Star and Ruhleder’s phrase 2. (For example, you may only really pay attention to the shape and structure of the road network when your planned route is blocked…). It takes a process of “infrastructural inversion” to bring information infrastructures into view 3, deliberately foregrounding the background. I will argue shortly that ‘transparency’ as a policy performs much the same function as ‘breakdown’ in making the contours of infrastructure more visible: taking something created with one set of use-cases in mind, and placing it in front of a range of alternative use-cases, such that its affordances and limitations can be more fully scrutinized, and, building on that scrutiny, its future development shaped. But before we come to that, we need to understand the extent of ‘public data infrastructure’ and the different ways in which we might understand a ‘participatory public data infrastructure’.

Public data infrastructure

There can be public data without a coherent public data infrastructure. In ‘The Responsive City’ Goldsmith and Crawford describe the status quo for many as “The century-old framework of local government – centralized, compartmentalized bureaucracies that jealously guard information…” 4. Datasets may exist, but are disconnected. Extracts of data may even have come to be published online in data portals in response to transparency edicts – but it exists as islands of data, published in different formats and structures, without any attention to interoperability.

Against this background, initiatives to construct public data infrastructure have sought to introduce shared technology, standards and practices that provide access to a more coherent collection of data generated by, and focusing on, the public tasks of government.

For example, in 2012, Denmark launched their ‘Basic Data’ programme, looking to consolidate the management of geographic, address, property and business data across government, and to provide common approaches to data management, update and distribution 6. In the European Union, the INSPIRE Directive and programme has been driving creation of a shared ‘Spatial Data Infrastructure’ since 2007, providing reference frameworks, interoperability rules, and data sharing processes. And more recently, the UK Government has launched a ‘Registers programme’ 8 to create centralized reference lists and identifiers of everything from countries to government departments, framed as part of building government’s digital infrastructure. In cities, similar processes of infrastructure building, around shared services, systems and standards are taking place.

The creation of these data infrastructures can clearly have significant benefits for both citizens and government. For example, instead of citizens having to share the same information with multiple services, often in subtly different ways, through a functioning data infrastructure governments can pick up and share information between services, and can provide a more joined up experience of interacting with the state. By sharing common codelists, registers and datasets, agencies can end duplication of effort, and increase their intelligence, drawing more effectively on the data that the state has collected.

However, at the same time, these data infrastructures tend to have a particularly centralizing effect. Whereas a single agency maintaining their own dataset has the freedom to add in data fields, or to restructure their working processes, in order to meet a particular local need – when that data is managed as part of a centralized infrastructure, their ability to influence change in the way data is managed will be constrained both by the technical design and the institutional and funding arrangements of the data infrastructure. A more responsive government is not only about better intelligence at the center, it is also about autonomy at the edges, and this is something that data infrastructures need to be explicitly designed to enable, and something that they are generally not oriented towards.

In “Roads to Power: Britain Invents the Infrastructure State” 10, Jo Guldi uses a powerful case study of the development of the national highways networks to illustrate the way in which the design of infrastructures shapes society, and to explore the forces at play in shaping public infrastructure. When metaled roads first spread out across the country in the eighteenth century, there were debates over whether to use local materials, easy to maintain with local knowledge, or to apply a centralized ‘tarmacadam’ standard to all roads. There were questions of how the network should balance the needs of the majority, with road access for those on the fringes of the Kingdom, and how the infrastructure should be funded. This public infrastructure was highly contested, and the choices made over its design had profound social consequences. Guldi uses this as an analogy for debates over modern Internet infrastructures, but it can be equally applied to explore questions around an equally intangible public data infrastructure.

If you build roads to connect the largest cities, but leave out a smaller town, the relative access of people in that town to services, trade and wider society is diminished. In the same way, if your data infrastructure lacks the categories to describe the needs of a particular population, their needs are less likely to be met. Yet that town might also not want to be connected directly to the road network, and to see its uniqueness and character eroded; much as some groups may want to resist their categorization and integration into the data infrastructure in ways that restrict their ability to self-define and develop autonomous solutions, in the face of centralized data systems that are necessarily reductive.

Alongside this tension between centralization and decentralization in data infrastructures, I also want to draw attention to another important aspect of public data infrastructures. That is the issue of ownership and access. Increasingly public data infrastructures may rely upon stocks and flows of data that are not publicly owned. In the United Kingdom, for example, the Postal Address File, which is the basis of any addressing service, was one of the assets transferred to the private sector when Royal Mail was sold off. The Ordnance Survey retains ownership and management of the Unique Property Reference Number (UPRN), a central part of the data infrastructure for local public service delivery, yet access to this is heavily restricted, and complex agreements govern the ability of even the public sector to use it. Historically, authorities have faced major challenges in relation to ‘derived data’ from Ordnance Survey datasets, where the use of proprietary mapping products as a base layer when generating local records ‘infects’ those local datasets with intellectual property rights of the proprietary dataset, and restricts who they can be shared with. Whilst open data advocacy has secured substantially increased access to many publicly owned datasets in recent years, when the datasets the state is using are privately owned in the first place, and only licensed to the state, the potential scope for public re-use and scrutiny of the data, and scrutiny of the policy made on the basis of it, is substantially limited.

In the case of smart cities, I suspect this concern is likely to be particularly significant. Take transit data for example: in 2015 Boston, Massachusetts did a deal with Uber to allow access to data from the data-rich transportation firm to support urban planning and to identify approaches to regulation. Whilst the data shared reveals something of travel times, the limited granularity rendered it practically useless for planning purposes, and Boston turned to senate regulations to try and secure improved data 9. Yet, even if the city does get improved access to data about movements via Uber and Lyft in the city – the ability of citizens to get involved in the conversations about policy from that data may be substantially limited by continued access restrictions on the data.

With the Smart City model often involving the introduction of privately owned sensor networks and processes, the extent to which the ‘data infrastructure for public tasks’ ceases to have the properties that we will shortly see are essential to a ‘participatory public data infrastructure’ is a question worth paying attention to.

Participatory public data infrastructure

I will posit, then, that the growth of public data infrastructures is almost inevitable. But the shape they take is not. I want, in particular, to examine what it would mean to have a participatory public data infrastructure.

I owe the concept of a ‘participatory public data infrastructure’ in particular to Jonathan Gray ([11], [12], [13]), who has, across a number of collaborative projects, sought to unpack questions of how data is collected and structured, as well as released as open data. In thinking about the participation of citizens in public data, we might look at three aspects:

  1. Participation in data use
  2. Participation in data production
  3. Participation in data design

And, seeing these as different in kind, rather than different in degree, we might for each one deploy Arnstein’s ladder of participation [14] as an analytical tool, to understand that the extent of participation can range from tokenism through to full shared decision making. As for all participation projects, we must also ask the vitally important question of ‘who is participating?’.

At the bottom ‘non-participation’ rungs of Arnstein’s ladder we could see a data infrastructure that captures data ‘about’ citizens, without their active consent or involvement, that excludes them from access to the data itself, and then uses the data to set rules, ‘deliver’ services, and enact policies over which citizens have no influence in either their design or delivery. The citizen is treated as an object, not an agent, within the data infrastructure. For some citizens’ contemporary experience, and in some smart city visions, this description might not be far from a direct fit.

By contrast, when citizens have participation in the use of a data infrastructure they are able to make use of public data to engage in both service delivery and policy influence. This has been where much of the early civic open data movement placed their focus, drawing on ideas of co-production, and government-as-a-platform, to enable partnerships or citizen-controlled initiatives, using data to develop innovative solutions to local issues. In a more political sense, participation in data use can remove information inequality between policy makers and the subjects of that policy, equalizing at least some of the power dynamic when it comes to debating policy. If the ‘facts’ of population distribution and movement, electricity use, water connections, sanitation services and funding availability are shared, such that policy maker and citizen are working from the same data, then the data infrastructure can act as an enabler of more meaningful participation.

In my experience though, the more common outcome when engaging diverse groups in the use of data is not an immediate shared analysis – but instead a lot of discussion of gaps and issues in the data itself. In some cases, the way data is being used might be uncontested, but the input might turn out to be misrepresenting the lived reality of citizens. This takes us to the second area of participation: the ability to not just take from a dataset, but also to participate in dataset production. Simply having data collected from citizens does not make a data infrastructure participatory. That sensors tracked my movement around an urban area does not make me an active participant in collecting data. But by contrast, when citizens come together to collect new datasets, such as the water and air quality datasets generated by sensors from Public Lab 15, and are able to feed this into the shared corpus of data used by the state, there is much more genuine participation taking place. Similarly, the use of voluntarily contributed data on Open Street Map, or submissions to issue-tracking platforms like FixMyStreet, constitute a degree of participation in producing a public data infrastructure when the state also participates in use of those platforms.

It is worth noting, however, that most participatory citizen data projects, whether concerned with data use or production, are both patchy in their coverage, and hard to sustain. They tend to offer an add-on to the public data infrastructure, but to leave the core substantially untouched, not least because of the significant biases that can occur due to inequalities of time, hardware and skills needed to contribute and take part.

If, then, we want to explore participation that can have a sustainable impact on policy, we need to look at shaping the core public data infrastructure itself – looking at the existing data collection activities that create it, and exploring whether or not the data collected, and how it is encoded, serves the broad public interest, and allows the maximum range of democratic freedom in policy making and implementation. This is where we can look at a participatory data infrastructure as one that enables citizens (and groups working on their behalf) to engage in discussions over data design.

The idea that communities, and citizens, should be involved in the design of infrastructures is not a new one. In fact, the history of public statistics and data owes a lot to voluntary social reformers focused on health and social welfare, who collected social survey data in the eighteenth and nineteenth centuries to influence policy, and then advocated for government to take up ongoing data collection. The design of the census and other government surveys has long been a source of political contention. Yet, with the vast expansion of connected data infrastructures, which rapidly become embedded, brittle and hard to change, we are facing a particular moment at which increased attention is needed to the participatory shaping of public data infrastructures, and to considering the consequences of seemingly technical choices on our societies in the future.

Ribes and Baker [16], writing about the participation of social scientists in shaping research data infrastructures, draw attention to the aspect of timing: highlighting the limited window during which an infrastructure may be flexible enough to allow substantial insights from social science to be integrated into its development. My central argument is that transparency, and the move towards open data, offers a key window within which to shape data infrastructures.

Part 2: Transparency

transparency /tranˈspar(ə)nsi/ noun “the quality of being done in an open way without secrets” [21]

Advocacy for open data has many distinct roots: not only in transparency. Indeed, I’ve argued elsewhere that it is the confluence of many different agendas around a limited consensus point in the Open Definition that allowed the breakthrough of an open data movement late in the last decade [17] [18]. However, the normative idea of transparency plays an important role in questions of access to public data. It was a central part of the framing of Obama’s famous ‘Open Government Directive’ in 2009 [20], and transparency was core to the rhetoric around the launch of data.gov.uk in the wake of a major political expenses scandal.

Transparency is tightly coupled with the concept of accountability. When we talk about government transparency, it is generally as part of government giving account for its actions: whether to individuals, or to the population at large via the third and fourth estates. To give effective account, government can’t just make claims; it has to substantiate them. Transparency is a tool allowing citizens to exercise control over their governments.

Sweden’s Freedom of the Press law of 1766 was the first to establish a legal right to information, but it was a slow burn until the middle of the last century, when ‘right to know’ statutes started to gather pace, such that over 100 countries now have Right to Information laws in place. Increasingly, these laws recognize that transparency requires not only access to documents, but also access to datasets.

It is also worth noting that transparency has become an important regulatory tool of government: government may demand transparency of others. As Fung et al. argue in ‘Full Disclosure’, governments have turned to targeted transparency as a way of requiring that certain information (including from the private sector) is placed in the public domain, with the goal of disciplining markets or influencing the operation of marketized public services, by improving the availability of information upon which citizens will make choices [19].

The most important thing to note here is that demands for transparency are often not just about ‘opening up’ a dataset that already exists – but ultimately are about developing an account of some aspect of public policy. To create this account might require data to be connected up from different silos, and may require the creation of new data infrastructures.

This is where standards enter the story.

Part 3: Standards

standard /ˈstandəd/ noun

something used as a measure, norm, or model in [comparative] evaluations.

The first thing I want to note about ‘standards’ is that the term is used in very different ways by different communities of practice. For a technical community, the idea of a data standard more-or-less relates to a technical specification or even schema, by which the exact way that certain information should be represented as data is set out in minute detail. To assess if data ‘meets’ the standard is a question of how the data is presented. For a policy audience, talk of data standards may be interpreted much more as a question of collection and disclosure norms. To assess if data meets the standard here is more a question of what data is presented. In practice, these aspects interrelate. With anything more than a few records, to assess ‘what’ has been disclosed requires processing data, and that requires it to be modeled according to some reasonable specification.

The second thing I want to note about standards is that they are highly interconnected. If we agree upon a standard for the disclosure of government budget information, for example, then in order to produce data to meet that standard, government may need to check that a whole range of internal systems are generating data in accordance with the standard. The standard for disclosure that sits on the boundary of a public data infrastructure can have a significant influence on other parts of that infrastructure, or its operation can be frustrated when other parts of the infrastructure can’t produce the data it demands.

The third thing to note is that a standard is only really a standard when it has multiple users. In fact, the greater the community of users, the stronger, in effect, the standard is.

So – with these points in mind, let’s look at how a turn to transparency and open data has created both pressure for application of data standards, and an opening for participatory shaping of data infrastructures.

One of the early rallying cries of the open data movement was ‘Raw Data Now’. Yet it turns out that raw data, as a set of database dumps of selected tables from the silo datasets of the state, does not always produce effective transparency. What it does do, however, is create the start of a conversation between citizen, private sector and state over the nature of the data collected, held and shared.

Take for example this export from a council’s financial system in response to a central government policy calling for transparency on spend over £500.

| Service Area | BVA Cop | ServDiv Code | Type Code | Date | Transaction No. | Amount | Revenue / Capital | Supplier |
|---|---|---|---|---|---|---|---|---|
| Balance Sheet | 900551 | Insurance Claims Payment (Ext) | 47731 | 31.12.2010 | 1900629404 | 50,000.00 | Revenue | Zurich Insurance Co |
| Balance Sheet | 900551 | Insurance Claims Payment (Ext) | 47731 | 01.12.2010 | 1900629402 | 50,000.00 | Revenue | Zurich Insurance Co |
| Balance Sheet | 933032 | Other income | 82700 | 01.12.2010 | 1900632614 | -3,072.58 | Revenue | Unison Collection Account |
| Balance Sheet | 934002 | Transfer Values paid to other schemes | 11650 | 02.12.2010 | 1900633491 | 4,053.21 | Revenue | NHS Pensions Scheme Account |
| Balance Sheet | 900601 | Insurance Claims Payment (Ext) | 47731 | 06.12.2010 | 1900634912 | 1,130.54 | Revenue | Shires (Gloucester) Ltd |
| Balance Sheet | 900652 | Insurance Claims Payment (Int) | 47732 | 06.12.2010 | 1900634911 | 1,709.09 | Revenue | Bluecoat C Of E Primary School |
| Balance Sheet | 900652 | Insurance Claims Payment (Int) | 47732 | 10.12.2010 | 1900637635 | 1,122.00 | Revenue | Christ College Cheltenham |

It comes from data generated for one purpose (the council’s internal financial management), now being made available for another purpose (external accountability), but that might also be useful for a range of further purposes (companies looking to understand business opportunities; other councils looking to benchmark their spending, and so on). Stripped of its context as part of internal financial systems, the column headings make less sense: what is BVA COP? Is the date the date of invoice? Or of payment? What does each ServDiv Code relate to? The first role of any standardization is often to document what the data means: and in doing so, to surface unstated assumptions.

But standardization also plays a role in allowing the emerging use cases for a dataset to be realized. For example, when data columns are aligned, comparison across councils’ spending is facilitated. Private firms interested in providing such comparison services may also have a strong interest in seeing each of the authorities providing data do so to a common standard, to lower their costs of integrating data from each new source.
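The alignment step described above can be sketched in a few lines: a minimal mapping from one council’s raw export headings into a shared schema, so that rows from different sources become comparable. The target field names here are assumptions for illustration, not any real published standard.

```python
from datetime import datetime

# Hypothetical mapping from one council's export headings to shared,
# documented field names (illustrative only).
COLUMN_MAP = {
    "ServDiv Code": "service_division",
    "Date": "payment_date",
    "Amount": "amount_gbp",
    "Supplier": "supplier_name",
}

def normalise_row(raw):
    """Rename columns and parse values so rows from different councils
    can be compared on a like-for-like basis."""
    row = {new: raw[old] for old, new in COLUMN_MAP.items()}
    # Parse the UK-style dd.mm.yyyy date into a proper date object.
    row["payment_date"] = datetime.strptime(row["payment_date"], "%d.%m.%Y").date()
    # Strip thousands separators so amounts can be summed and compared.
    row["amount_gbp"] = float(row["amount_gbp"].replace(",", ""))
    return row

raw = {"ServDiv Code": "Insurance Claims Payment (Ext)", "Date": "31.12.2010",
       "Amount": "50,000.00", "Supplier": "Zurich Insurance Co"}
record = normalise_row(raw)
```

Once every publisher emits this shape, benchmarking across councils becomes a matter of concatenating rows rather than writing a bespoke parser per authority.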

If standards are just developed as the means of exchanging data between government and private sector re-users of the data, the opportunities for constructing a participatory data infrastructure are slim. But when standards are explored as part of the transparency agenda, and as part of defining both the what and the how of public disclosure, such opportunities are much richer.

When budget and spend open data became available in Sao Paulo in Brazil, a research group at the University of Sao Paulo, led by Gisele Craviero, explored how to make this data more accessible to citizens at a local level. They found that by geocoding expenditure, and color coding based on planned, committed and settled funds, they could turn the data from impenetrable tables into information that citizens could engage with. More importantly, they argue that in engaging with government around the value of geocoded data, “moving towards open data can lead to changes in these underlying and hidden process [of government data creation], leading to shifts in the way government handles its own data” [22].
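The geocode-and-color-code approach described here might be sketched roughly as follows; the field names, the color scheme, and the stub geocoder are all hypothetical stand-ins for whatever the real project used.

```python
# Color-code each budget line by its stage in the spending cycle,
# mirroring the planned/committed/settled distinction described above.
STAGE_COLOURS = {"planned": "yellow", "committed": "orange", "settled": "green"}

def for_map(record, geocode):
    """Return a plottable point, given a geocoding function that maps
    an address string to a (lat, lon) pair."""
    lat, lon = geocode(record["address"])
    return {
        "lat": lat,
        "lon": lon,
        "colour": STAGE_COLOURS[record["stage"]],
        "amount": record["amount"],
    }

# Stub geocoder for illustration: always returns central Sao Paulo.
fake_geocode = lambda addr: (-23.55, -46.63)

point = for_map(
    {"address": "Rua X, 100", "stage": "settled", "amount": 1200.0},
    fake_geocode,
)
```

The argument in the text is precisely that this geocoding step should not fall to citizen volunteers after publication: if transactions carried coordinates inside government systems, the map would come almost for free.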

The important act here was to recognize open data-enabled transparency not just as one-way communication from government to citizens, but as an invitation for dialogue about the operation of the public data infrastructure, and an opportunity to get involved: explaining that, if government took more care to geocode transactions in its own systems, it would not have to wait for citizens to participate in data use and expend substantial labour manually geocoding some small amount of spending; instead, the opportunity for better geographic analysis of spending would become available much more readily both inside and outside the state.

I want to give three brief examples of where the development, or not, of standards is playing a role in creating more participatory data infrastructures, and in the process to draw out a couple of other important aspects of thinking about transparency and standardization as part of the strategic toolkit for asserting citizen rights in the context of smart cities.

Part 4: Examples

Contracts

My first example looks at contracts for two reasons. Firstly, it’s an area I’ve been working on in depth over the last few years, as part of the team creating and maintaining the Open Contracting Data Standard. But, more importantly, it’s an under-explored aspect of the smart city itself. For most cities, how transparent is the web of contracts that establishes the interaction between public and private players? Can you easily find the tenders and awards for each component of the new city infrastructure? Can you see the terms of the contracts and easily read up on who owns and controls each aspect of emerging public data infrastructure? All too often the answer to these questions is no. Yet, when it comes to procurement, the idea of transparency in contracting is generally well established, and global guidance on Public Private Partnerships highlights transparency of both process and contract documents as an essential component of good governance.

The Open Contracting Data Standard emerged in 2014 as a technical specification to give form to a set of principles on contracting disclosure. It was developed through a year-long process of research, going back and forth between a focus on ‘data supply’ and understanding the data that government systems are able to produce on their contracting, and ‘data demand’, identifying a wide range of user groups for this data, and seeking to align the content and structure of the standard with their needs. This resulted in a standard that provides a framework for publication of detailed information at each stage of a contracting process, from planning, through tender, award and signed contract, right through to final spending and delivery.
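To give a flavour of what “publication of detailed information at each stage” looks like in practice, here is a much-simplified sketch of a single OCDS release expressed in Python. The real schema, maintained by the Open Contracting Partnership, has many more fields and validation rules; the identifiers and values below are invented.

```python
# A simplified OCDS-style release describing the tender stage of one
# contracting process. A full contracting record would chain several
# such releases (planning, tender, award, contract, implementation).
release = {
    "ocid": "ocds-abc123-0001",   # hypothetical contracting-process id
    "id": "0001-tender-01",       # id of this particular release
    "date": "2017-08-17T00:00:00Z",
    "tag": ["tender"],            # which stage this release describes
    "buyer": {"name": "Example City Council"},
    "tender": {
        "id": "tender-1",
        "title": "Smart city sensor network",
        "value": {"amount": 250000, "currency": "GBP"},
    },
}

def stage(release):
    """Return the contracting stage a release describes."""
    return release["tag"][0]
```

The design choice worth noticing is the shared `ocid`: because every release about one contracting process carries the same identifier, planning, tender, award and spending data can be joined up even when published at different times or by different systems.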

Meeting this standard in full is quite demanding for authorities. Many lack existing data infrastructures that provide common identifiers across the whole contracting process, and so adopting OCDS for data disclosure may involve updating internal systems and processes. The transparency standard has an inwards effect, shaping not only the data published, but the data managed. In supporting implementation of OCDS, we’ve also found that the process of working through the structured publication of data often reveals as-yet-unrecognized data quality issues in internal systems, and issues of compliance with existing procurement policies.

Now, two of the critiques that might be offered of standards are that, as highly technical objects, their development is only open to participation from a limited set of people, and that, in setting out a uniform approach to data publication, they are a further tool of centralization. Both of these are serious issues.

In the Open Contracting Data Standard we’ve sought to navigate them by working hard on having an open governance process for the standard itself, and using a range of strategies to engage people in shaping the standard, including workshops, webinars, peer-review processes and presenting the standard in a range of more accessible formats. We’re also developing an implementation and extensions model that encourages local debate over exactly which elements of the overall framework should be prioritized for publication, whilst highlighting the fields of data that are needed in order to realize particular use-cases.

This highlights an important point: standards like OCDS are more than the technical spec. There is a whole process of support, community building, data quality assurance and feedback going on to encourage data interoperability, and to support localization of the standard to meet particular needs.

When standards create the space, then other aspects of a participatory data infrastructure are also enabled and facilitated. A reliable flow of data on pipeline contracts may allow citizens to scrutinize the potential terms of tenders for smart city infrastructure before contracts are awarded and signed, and an infrastructure with the right feedback mechanisms could ensure, for example, that performance-based payments to providers are properly influenced by independent citizen input.

The thesis here is one of breadth and depth. A participatively developed open standard allows a relatively small-investment intervention to shape a broad section of public data infrastructure, influencing the internal practice of government and establishing the conditions for more ad-hoc, deep-dive interventions that allow citizens to use that data to pursue particular projects of change.

Earth

The second example explores this in the context of land. Who owns the smart city?

The Open Data Index and Open Data Barometer studies of global open data availability have had a ‘Land Ownership’ category for a number of years, and there is a general principle that land ownership information should, to some extent, be public. However, exactly what should be published is a tricky question. An over-simplified schema might ignore the complex realities of land rights, trying to reduce a set of overlapping claims to a plot number and owner. By contrast, the narrative accounts of ownership that often exist in the documentary record may be too complex to render as data [24]. In working on a refined Open Data Index category, the Cadasta Foundation [23] noted that opening up property owners’ names in the context of a stable country with functioning rule of law “has very different risks and implications than in a country with less formal documentation, or where dispossession, kidnapping, and or death are real and pervasive issues” [23].

The point here is that a participatory process around the standards for transparency may not, from the citizen perspective, always drive at more disclosure; at times, standards may also need to protect the ‘strategic invisibility’ of marginalized groups [25]. In the United Kingdom, although individual titles can be bought for £3 from the Land Registry, no public dataset of title-holders is available. However, there are moves in place to establish a public dataset of land owned by private firms, or foreign owners, coming in part out of an anti-corruption agenda. This fits with the idea that, as Sunil Abraham puts it, “privacy should be inversely proportional to power” [26].

Central property registers are not the only source of data relevant to the smart city. Public authorities often have their own data on public assets. A public conversation on the standards needed to describe this land, and share information about it, is arguably overdue. Again looking at the UK experience, the government recently consulted on requiring authorities to record all information on their land assets through the Property Information Management system (ePIMS): centralizing information on public property assets, but doing so against a reductive schema that serves central government interests. In the consultation on this I argued that, by contrast, we need an approach based on a common standard for describing public land, but one that allows local areas the freedom to augment a core schema with other information relevant to local policy debates.

Air

From the earth, let us turn very briefly to the air. Air pollution is a massive issue, causing millions of premature deaths worldwide every year. It is an issue that is particularly acute in urban areas. Yet, as the Open Data Institute note, “we are still struggling to ‘see’ air pollution in our everyday lives” [27]. They report the case of decision making on a new runway at Heathrow Airport, where policy makers were presented with data from just 14 NO2 sensors. By contrast, a network of citizen sensors provided much more granular information, and information from citizens’ gardens and households, offering a different account from those official sensors by roads or in fields.

Mapping the data from official government air quality sensors reveals just how limited their coverage is: and backs up the ODI’s calls for a collaborative, or participatory, data infrastructure. In a 2016 blog post, Jamie Fawcett describes how:

“Our current data infrastructure for air quality is fragmented. Projects each have their own goals and ambitions. Their sensor networks and data feeds often sit in silos, separated by technical choices, organizational ambition and disputes over data quality and sensor placement. The concerns might be valid, but they stand in the way of their common purpose, their common goals.”

He concludes “We need to commit to providing real-time open data using open standards.”

This is a call for transparency by both public and private actors: agreeing to allow re-use of their data, and rendering it comparable through common standards. The design of such standards will need to carefully balance public and private interests, and to work out how the costs of making data comparable will fall between data publishers and users.
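The normalisation work that such common standards would have to do can be sketched as follows. Both feeds, their field names, and the unit mismatch are invented for illustration; the point is simply that heterogeneous sensor networks only become comparable once mapped into one agreed record shape.

```python
# Two hypothetical air quality feeds with different field names and
# units, normalised into a single comparable record format.

def from_official(r):
    """Official monitoring stations: already report NO2 in µg/m³."""
    return {"pollutant": "NO2", "ug_m3": r["no2_ugm3"],
            "lat": r["lat"], "lon": r["lon"], "source": "official"}

def from_citizen(r):
    """Citizen sensors: in this made-up feed, NO2 arrives in mg/m³
    and coordinates use longer field names, so both need converting."""
    return {"pollutant": "NO2", "ug_m3": r["no2_mgm3"] * 1000,
            "lat": r["latitude"], "lon": r["longitude"], "source": "citizen"}

readings = [
    from_official({"no2_ugm3": 41.0, "lat": 51.47, "lon": -0.45}),
    from_citizen({"no2_mgm3": 0.038, "latitude": 51.48, "longitude": -0.44}),
]
```

With every source emitting the same shape, questions like “which neighbourhoods exceed a threshold?” can be asked of the combined network, official and citizen sensors alike.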

Part 5: Recap

So, to briefly recap:

  • I want to draw attention to the data infrastructures of the smart city and the modern state;
  • I’ve suggested that open data and transparency can be powerful tools in performing the kind of infrastructural inversion that brings the context and history of datasets into view and opens them up to scrutiny;
  • I’ve furthermore argued that transparency policy opens up an opportunity for a two-way dialogue about public data infrastructures, and for citizen participation not only in the use and production of data, but also in setting standards for data disclosure;
  • I’ve then highlighted how standards for disclosure don’t just shape the data that enters the public domain, but they also have an upwards impact on the shape of the public data infrastructure itself.

Taken together, this is a call for more focus on the structure and standardization of data, and more work on exploring the current potential of standardization as a site of participation, and an enabler of citizen participation in future.

If you are looking for a more practical set of takeaways that flow from all this, let me offer a set of questions that can be asked of any smart cities project, or indeed, any data-rich process of governance:

  • (1) What information is pro-actively published, or can be demanded, as a result of transparency and right to information policies?
  • (2) What does the structure of the data reveal about the process/project it relates to?
  • (3) What standards might be used to publish this data?
  • (4) Do these standards provide the data I, or other citizens, need to be empowered in relation to this process/project?
  • (5) Are these open standards? Whose needs were they designed to serve?
  • (6) Can I influence these standards? Can I afford not to?

References

[1] https://www.google.co.uk/search?q=define%3Ainfrastructure, accessed 17th August 2017

[2] Star, S., & Ruhleder, K. (1996). Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces. Information Systems Research, 7(1), 111–134.

[3] Bowker, G. C., & Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. The MIT Press.

[4] Goldsmith, S., & Crawford, S. (2014). The Responsive City. Jossey-Bass.

[5] Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications.

[6] The Danish Government. (2012). Good Basic Data for Everyone – a Driver for Growth and Efficiency (October 2012).

[7] Bartha, G., & Kocsis, S. (2011). Standardization of Geographic Data: The European INSPIRE Directive. European Journal of Geography, 22, 79–89.

[10] Guldi, J. (2012). Roads to Power: Britain Invents the Infrastructure State.

[11] Gray, J., & Davies, T. (2015). Fighting Phantom Firms in the UK: From Opening Up Datasets to Reshaping Data Infrastructures?

[12] Gray, J., & Venturini, T. (2015). Rethinking the Politics of Public Information: From Opening Up Datasets to Recomposing Data Infrastructures?

[13] Gray, J. (2015). Democratising the Data Revolution: A Discussion Paper.

[14] Arnstein, S. R. (1969). A Ladder of Citizen Participation. Journal of the American Institute of Planners, 35(4), 216–224.

[16] Ribes, D., & Baker, K. (2007). Modes of Social Science Engagement in Community Infrastructure Design. Proceedings of the 3rd Communities and Technologies Conference, C and T 2007, 107–130.

[17] Davies, T. (2010, September 29). Open Data, Democracy and Public Sector Reform: A Look at Open Government Data Use from data.gov.uk.

[18] Davies, T. (2014). Open Data Policies and Practice: An International Comparison.

[19] Fung, A., Graham, M., & Weil, D. (2007). Full Disclosure: The Perils and Promise of Transparency (1st ed.). Cambridge University Press.

[22] Craveiro, G. S., Machado, J. A. S., Martano, A. M. R., & Souza, T. J. (2014). Exploring the Impact of Web Publishing Budgetary Information at the Sub-National Level in Brazil.

[24] Hetherington, K. (2011). Guerrilla Auditors: The Politics of Transparency in Neoliberal Paraguay. London: Duke University Press.

[25] Scott, J. C. (1987). Weapons of the Weak: Everyday Forms of Peasant Resistance.

Principles for responding to mass surveillance and the draft Investigatory Powers Bill

[Summary: notes written up on the train back from Paris & London, and following a meeting with Open Rights Group focussing on the draft Investigatory Powers Bill]

It can be hard to navigate the surveillance debate. On the one hand, whistleblower revelations, notably those from Edward Snowden, have revealed the way in which states are accumulating mass communications data, creating new regimes of deeply intrusive algorithmic surveillance, and unsettling the balance of power between citizens, officials and overseers in politics and the media. On the other, as recent events in Paris, London, the US and right across the world have brought into sharp focus, there are very real threats to life and liberty posed by non-state terrorist actors – and meeting the risks posed must surely involve the security services.

Fortunately, rather than following the common pattern of rushing legislative proposals through after terrorist attacks, the UK has, after the attacks in Paris, kept to the planned timetable for debate of the proposed Investigatory Powers Bill.

The Bill primarily works to put on a legal footing many of the actions that surveillance agencies have already been engaged in when it comes to bulk data collection and bulk hacking of services (equipment interference, and obtaining data). But the Bill also proposes a number of further extensions of powers, including provisions to mandate storage of ‘Internet Connection Records’ – branded as creating a ‘snoopers charter’ in media debates because of the potential for law enforcement and other government agencies to gain access to this detailed information on individuals’ web browsing histories.

Page 33 of the draft includes a handy ‘Investigatory Powers at a Glance’ table, setting out who will have access to Communications Data, powers of Interception and Bulk Datasets – and what the access and oversight processes might be.


Reading through the case for new powers put in the preamble to the Bill, it is important to critically unpack the claims made. For example, point 47 notes that “From a sample of 6025 referrals to the Child Exploitation and Online Protection Command (CEOP) of the NCA, 862 (14%) cannot be progressed”. The document extrapolates from this “a minimum of 862 suspected paedophiles, involved in the distribution of indecent imagery of children, who cannot be identified without this legislation.”, yet this is premised on the proposed storage of Internet Connection Records being a ‘magic bullet’ that would secure investigation of all these suspects. In reality, the number may be much lower.

Yet, getting drawn into a calculus of costs and benefits, trading off the benefits of protecting one group against the harms of surveillance to another, is a tricky business, and unlikely to create a well-reasoned surveillance debate. As a society, we’re generally not very good at calculating where risks are involved. And there will always be polarisation between those who weight apparently opposing goods (security/liberty?) particularly highly.

The alternative to this cost/benefit calculus is to develop a response based on principles. Principles we can check against evidence, but clear guiding principles none-the-less.

Here’s my first attempt at four principles to consider in exploring how to respond to the Investigatory Powers Bill:

(1) Data minimisation without suspicion. We should collect and store the minimum possible amount of data about individuals where there is no reason to suspect the threat of harm to others, or of serious crime.

This point builds upon both principles and pragmatism. Individuals should be innocent until proven guilty, and space for individual freedom of thought and action respected. Equally, surveillance services need more signal, not more noise.

When it comes to addressing terrorism, creating an environment in which whole communities feel subject to mass surveillance is an entirely counterproductive strategy: undermining rather than promoting the liberal values we must work to protect.

(2) Data maximisation with suspicion. Where there is suspicion of individuals posing a threat, or of serious crime, then proportionate surveillance is justified, and should be pursued.

As far as I understand, few disagree with targeted surveillance. Unlike mass surveillance, targeted approaches can be intelligence-led rather than algorithmically led, and more tightly connect information collection, analysis and consideration of the actions that can be taken against those who pose threats to society.

(3) Strong scrutiny. Sustained independent oversight of secret services is hard to achieve – but is vital to ensure targeted surveillance capabilities are used responsibly, and to balance the power this gives to those who wield them.

The current Investigatory Powers Bill includes notable scrutiny loopholes, in which, once issued, a warrant can be modified to include new targets without new review and oversight.

(4) A safe Internet. Bulk efforts to undermine encryption and Internet security are extremely risky. Our societies rely upon a robust Internet, and it is important for governments to be working to make the network stronger for all.


Of course, putting principles into practice involves trade-offs. But identifying principles is an important starting point to a deeper debate.

Do these principles work for you? I’ll be reflecting more on whether they capture enough to provide a route through the debate, and what their implications are for responding to the Investigatory Powers Bill in the coming months.

(P.S. If you care about the future of the Investigatory Powers Bill in the UK, and you are not already a member of the Open Rights Group – do consider joining to support their work as one of very few dedicated groups focussing on promoting digital freedoms in this debate.

Disclosure: I’m a member of the ORG Advisory Council)

Creating the capacity building game…

[Summary: crowdsourcing contributions to a workshop at Open Development Camp]

There is a lot of talk of ‘capacity building’ in the open data world. As the first phase of the ODDC project found, there are many gaps between the potential of open data and its realisation: and many of these gaps can be described as capacity gaps – whether on the side of data suppliers, or potential data users.

But how does sustainable capacity for working with open data develop? At the Open Development Camp in a few weeks time I’ll be facilitating a workshop to explore this question, and to support participants to share learning about how different capacity building approaches fit in different settings.

The basic idea is that we’ll use a simple ‘cards and scenarios’ game (modelled, as ever, on the Social Media Game), where we identify a set of scenarios with capacity building needs, and then work in teams to design responses, based on combining a selection of different approaches, each of which will be listed on one of the game cards.

But, rather than just work from the cards, I’m hoping that for many of these approaches there will be ‘champions’ on hand, able to make the case for that particular approach, and to provide expert insights to the team. So:

  • (1) I’ve put together a list of 24+ different capacity building approaches I’ve seen in the open data world – but I need your help to fill in the details of their strengths, weaknesses and examples of them in action.
  • (2) I’m looking for ‘champions’ for these approaches, either who will be at the Open Development Camp, or who could prepare a short video input in advance to make the case for their preferred capacity building approach;

If you could help with either, get in touch, or dive in direct on this Google Doc.

If all goes well, I’ll prepare a toolkit after the Open Development Camp for anyone to run their own version of the Capacity Building Game.

The list so far

Click each one to jump direct to the draft document

A Data Sharing Disclosure Standard?

[Summary: Iterations on a proposal for a public register of government data sharing arrangements, setting out options for a Data Sharing Disclosure Standard to be used whenever government shares personal data. Draft for interactive comments here (and PDF for those in govt without access to Google Docs).]

At the instigation of the UK Cabinet Office, an open policy making process is currently underway to propose new arrangements for data sharing in government. Data sharing arrangements are distinct from open data, as they may involve the limited exchange of personal and private data between government departments, or outside of government, with a specific purpose of data use in mind.

The idea that new measures are needed is based on a perception that many opportunities to make better use of data (for research, addressing debt and fraud, or tailoring the design of public services) are missed, either because of legal or practical barriers to data being exchanged or joined up between government departments. Some departments in particular, such as HMRC, require explicit legal permissions to share data, whereas in other departments and public bodies a range of existing ‘legal gateways’ and powers support the exchange of data.

I’ve been following the process from afar, but on Monday last week I had the chance to attend one of the open full-day workshops that Involve are facilitating as part of the open policy making process. This brought together representatives of a range of public bodies, including central government departments and local authorities, with members of the Cabinet Office team leading on data sharing reforms, and a small number of civil society organisations and individuals. Monday’s discussions centred on the introduction of new ‘permissive powers’ for data sharing to support tailored public services. For example, powers that would make it easier for local government to request and obtain HMRC data on 16–19 year olds in order to identify which young people in their area were already in employment or training, and so to target their resources on contacting those young people outside employment or training who they have a statutory obligation to support.

The exact wording of such a power, and the safeguards that need to be in place to ensure it is neither too broad, nor open to abuse, are being developed through the open policy making process. One safeguard I believe is important comes from introducing greater transparency into government data sharing arrangements.

A few months back, working with Reuben Binns, I put together a short note on a possible model for an ‘Open Register of Data Sharing‘. In Monday’s open policy making meeting, the topic of transparency as an important aspect of tailored public service data sharing came up, and provided an opportunity to discuss many of the ideas that the draft proposal had contained. Through the discussions, however, it became clear that there were a number of extra considerations needed to develop the proposal further, in particular:

  • Noting that public disclosure of planned data sharing was not only beneficial for transparency and scrutiny, but also for efficiency, coordination and consistency of data sharing: by allowing public bodies to pool data sharing arrangements, and to easily replicate approved shares, rather than starting from scratch with every plan and business case.
  • Recognising the concerns of local authorities and other public bodies about a centralised register, and the need to accommodate shares that might take place between public bodies at a local level only, without involvement of central government.
  • Recognising the need for both human and machine-readable information on data sharing arrangements, so that groups with a specific interest in particular data (e.g. associations looking out for the rights of homeless people) could track proposed or enacted arrangements without needing substantial technical know-how.
  • Recognising the importance of documents like Privacy Impact Assessments and Business Cases, but also noting that mandatory publication of these during their drafting could distort the drafting process (with the risk they become more PR documents making the case for a share, than genuine critical assessments), suggesting a mix of proactive and reactive transparency may be needed in practice.
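To make the machine-readable side of these considerations concrete, here is a minimal sketch in Python of what a single disclosure record, and a basic completeness check, might look like. Everything here is an assumption for illustration: the field names, identifier URLs and status values are hypothetical, and are not drawn from the draft standard itself.

```python
# Illustrative sketch only: field names, identifier URLs and status values
# below are hypothetical, not taken from the draft Disclosure Standard.
disclosure = {
    "data_controllers": [
        "https://example.org/register/controller/1234",  # hypothetical identifier
    ],
    "dataset": "https://data.gov.uk/dataset/example-dataset",      # hypothetical
    "legal_basis": "http://www.legislation.gov.uk/ukpga/2000/36",  # example Act URL
    "purpose": "Identify young people outside employment or training",
    "status": "proposed",  # e.g. proposed | approved | enacted
}

# Fields a disclosure might be required to carry before publication (assumed).
REQUIRED_FIELDS = {"data_controllers", "dataset", "legal_basis", "purpose", "status"}

def validate_disclosure(record: dict) -> list:
    """Return a sorted list of missing required fields (empty if complete)."""
    return sorted(REQUIRED_FIELDS - record.keys())

print(validate_disclosure(disclosure))  # prints []
```

A central register, a Gazette notice feed, or a locally published page could all carry the same underlying record: the point is that a common set of required fields makes disclosures comparable across public bodies and machine-checkable.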

As a result of the discussions with local authorities, government departments and others, I took away a number of ideas about how the proposal could be refined, and so this Friday, at the University of Southampton Web and Internet Science group annual gathering and weekend of projects (known locally as WAISFest) I worked in a stream on personal data, and spent a morning updating the proposals. The result is a reframed draft that, rather than focusing on the Register, focuses on a Data Sharing Disclosure Standard, emphasising the key information that needs to be disclosed about each data share, and discussing when disclosure should take place, whilst leaving open a range of options for how this might be technically implemented.

You can find the updated document here, as a Google Doc open to comments. I would really welcome comments and suggestions for how this could be refined further over the coming weeks. If you do leave a comment and want to be credited / want to join in future discussion of this proposal, please also include your name / contact details.

The Gazette provides semantically enriched public notices: readable by humans and machines.

A couple of things of particular note in the draft:

  • It is useful to identify (a) data controllers; (b) datasets; (c) legislation authorising data shares. Right now the Register of Data Controllers seems to provide a good resource for (a), and thanks to recent efforts at building out the digital information infrastructure of the UK, it turns out there are often good URLs that can be used as identifiers for datasets (data.gov.uk lists unpublished datasets from many central government departments) and legislation (through the data-all-the-way-down approach of legislation.gov.uk).
  • It considers how the Gazette might be used as a publication route for Data Sharing Disclosures. The Gazette is an official paper of record, established in 1665 but recently re-envisioned with a semantic publishing platform. Using such a route to publish notices of data sharing has the advantage that it combines the long-term archival of information in a robust source with making enriched, openly licensed data available for re-use. This potentially offers a more robust route to disclosures, in which the data version is a progressive enhancement on top of an information disclosure.
  • Based on feedback from Javier Ruiz, it highlights the importance of flagging when shared data is going to be processed using algorithms that will determine individuals’ eligibility for services or trigger interventions affecting citizens, and raises the question of whether the algorithms themselves should be disclosed as a matter of course.
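As a sketch of how a group with a specific interest in particular data might track disclosures without substantial technical know-how, the following Python fragment assumes disclosures are published as a simple JSON-style feed in which each entry carries a dataset identifier URL. The feed structure, field names and URLs are assumptions for illustration, not part of the proposal.

```python
# Hypothetical watch list of dataset identifier URLs a group cares about.
WATCHED_DATASETS = {
    "https://data.gov.uk/dataset/homelessness-statistics",  # hypothetical URL
}

def matching_disclosures(feed: list) -> list:
    """Return feed entries whose dataset identifier is on the watch list."""
    return [entry for entry in feed if entry.get("dataset") in WATCHED_DATASETS]

# In practice the feed would be fetched from the publication route
# (e.g. an enriched Gazette notice feed); a local example here:
feed = [
    {"dataset": "https://data.gov.uk/dataset/homelessness-statistics",
     "status": "proposed"},
    {"dataset": "https://data.gov.uk/dataset/road-traffic-counts",
     "status": "enacted"},
]
print(len(matching_disclosures(feed)))  # prints 1
```

Because dataset and legislation identifiers would be stable URLs, this kind of filtering needs no scraping or fuzzy text matching, which is part of what makes the identifier choices in the draft attractive.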

I’ll be sharing a copy of the draft with the Data Sharing open policy process mailing list, and with the Cabinet Office team working on the data sharing brief. They are working to draft an updated paper on policy options by early September, with a view to a possible White Paper – so comments over the next few weeks are particularly valued.

New Paper – Mixed incentives: Adopting ICT innovations for transparency, accountability, and anti-corruption


[Summary: critical questions to ask when planning, funding or working on ICTs for transparency and accountability]

Last year I posted some drafts of a paper I’ve been writing with Silvana Fumega at the invitation of the U4 Anti-Corruption Center, looking at the incentives for, and dynamics of, adoption of ICTs as anti-corruption tools. Last week the final paper was published in the U4 Issue series, and you can find it for download here.

In the final iteration of the paper we have sought to capture the core of the analysis in the form of a series of critical questions that funders, planners and implementers of anti-corruption ICTs can ask. These are included in the executive summary below, and elaborated more in the full paper.

Adopting ICT innovations for transparency, accountability, and anti-corruption – Executive Summary

Initiatives facilitated by information and communication technology (ICT) are playing an increasingly central role in discourses of transparency, accountability, and anti-corruption. Both advocacy and funding are being mobilised to encourage governments to adopt new technologies aimed at combating corruption. Advocates and funders need to ask critical questions about how innovations from one setting might be transferred to another, assessing how ICTs affect the flow of information, how incentives for their adoption shape implementation, and how citizen engagement and the local context affect the potential impacts of their use.

ICTs can be applied to anti-corruption efforts in many different ways. These technologies change the flow of information between governments and citizens, as well as between different actors within governments and within civil society. E-government ICTs often seek to address corruption by automating processes and restricting discretion of officials. However, many contemporary uses of ICTs place more emphasis on the concept of transparency as a key mechanism to address corruption. Here, a distinction can be made between technologies that support “upward transparency,” where the state gains greater ability to observe and hear from its citizens, or higher-up actors in the state gain greater ability to observe their subordinates, and “downward transparency,” in which “the ‘ruled’ can observe the conduct, behaviour, and/or ‘results’ of their ‘rulers’” (Heald 2006). Streamlined systems that citizens can use to report issues to government fall into the former category, while transparency portals and open data portals are examples of the latter. Transparency alone can only be a starting point for addressing corruption, however: change requires individuals, groups, and institutions who can access and respond to the information.

In any particular application of technology with anti-corruption potential, it is important to ask:

  • What is the direction of the information flow: from whom and to whom?
  • Who controls the flow of information, and at what stages?
  • Who needs to act on the information in order to address corruption?

Different incentives can drive government adoption of ICTs. The current wave of interest in ICT for anti-corruption is relatively new, and limited evidence exists to quantify the benefits that particular technologies can bring in a given context. However, this is not limiting enthusiasm for the idea that governments, particularly developing country governments, can adopt new technologies as part of open government and anti-corruption efforts. Many technologies are “sold” on the basis of multiple promised benefits, and governments respond to a range of different incentives. For example, governments may use ICTs to:

  • Improve information flow and government efficiency, creating more responsive public institutions, supporting coordination.
  • Provide open access to data to enable innovation and economic growth, responding to claims about the economic value of open data and its role as a resource for private enterprise.
  • Address principal-agent problems, allowing progressive and reformist actors within the state to better manage and regulate other parts of the state by detecting and addressing corruption through upward and downward transparency.
  • Respond to international pressure, following the trends in global conversations and pressure from donors and businesses, as well as the availability of funding for pilots and projects.
  • Respond to bottom-up pressure, both from established civil society and from an emerging global network of technology-focussed civil society actors. Governments may do this either as genuine engagement or to “domesticate” what might otherwise be seen as disruptive innovations.

In supporting ICTs for anti-corruption, advocates and donors should consider several key questions related to incentives:

  • What are the stated motivations of government for engaging with this ICT?
  • What other incentives and motivations may be underlying interest in this ICT?
  • Which incentives are strongest? Are any of the incentives in conflict?
  • Which incentives are important to securing anti-corruption outcomes from this ICT?
  • Who may be motivated to oppose or inhibit the anti-corruption applications of this ICT?

The impact of ICTs for anti-corruption is shaped by citizen engagement in a local context. Whether aimed at upward or downward transparency, the successful anti-corruption application of an ICT relies upon citizen engagement. Many factors affect which citizens can engage through technology to share reports with government or act upon information provided by government. ICTs that worked in one context might not achieve the same results in a different setting (McGee and Gaventa 2010). The following questions draw attention to key aspects of context:

  • Who has access to the relevant technologies? What barriers of connectivity, literacy, language, or culture might prevent a certain part of the population from engaging with an ICT innovation?
  • What alternative channels (SMS, offline outreach) might be required to increase the reach of this innovation?
  • How will the initiative close the feedback loop? Will citizens see visible outcomes over the short or long term that build rather than undermine trust?
  • Who are the potential intermediary groups and centralised users for ICTs that provide upward or downward transparency? Are both technical and social intermediaries present? Are they able to work together?

Towards sustainable and effective anti-corruption use of ICTs. As Strand (2010) argues, “While ICT is not a magic bullet when it comes to ensuring greater transparency and less corruption . . . it has a significant role to play as a tool in a number of important areas.” Although taking advantage of the multiple potential benefits of open data, transparency portals, or digitised communication with government can make it easier to start a project, funders and advocates should consider the incentives for ICT adoption and their likely impact on how the technology will be applied in practice. Each of the questions above is important to understanding the role a particular technology might play and the factors that affect how it is implemented and utilised in a particular country.

 

You can read the full paper here.

ICTs and Anti-Corruption: Uptake, use and impacts

[Summary: The fourth section of our draft paper on ICTs and Anti-corruption looks at the evidence on uptake, use and impacts. We’d love your comments…]

I’m currently posting draft sections of a report on ICTs and anti-corruption to invite comments before the final paper is written up in a few weeks’ time. If you’ve any comments on the draft, please do add them into the Google Doc draft or leave a note below. This fourth and final section looks at the uptake of anti-corruption ICTs in developing country contexts, and issues concerning who uses these technologies.

4. UPTAKE, USE AND IMPACTS

Government incentives aside, it is important for advocates and funders of ICT-enabled anti-corruption activity to consider the factors that may affect the impact of these interventions in developing countries. As previously outlined, ICT-based reforms tend to focus on either transactions or transparency. Both rely upon the engagement of citizens. Citizens are crucial either to access and respond to information that is made available through transparency, or to originate and communicate to government their own experience through transactional channels. Therefore, it is important to ask what incentives and barriers citizens have for such engagement, and to explore what kinds of citizen engagement are important to the success of certain ICTs.

 

4.1 THE CITIZEN ROLE

Much of the limited evidence we do have on citizen engagement with transparency and accountability ICTs comes from cases where those tools/platforms have been deployed by civil society. Avila et al. divide interventions into two kinds: push and pull transparency (Avila, Feigenblatt, Heacock, & Heller, 2011). In the former, citizens speak up and communicate their experience of an issue; in the latter, citizens ‘pull’ down information from an available pool and use it to act in some way. In practice, many interventions require both: citizens to access information, and citizens to act by exercising their voice and pushing issues onto the agenda (Avila, R. et al, 2009). An ICT intervention might be designed around the idea of citizens acting individually (e.g. in transactional citizen reporting channels), or around the idea of citizens acting collectively, as in the idea that, on identifying corrupt activity through information on a transparency portal or an open data catalogue, citizens speak out politically on the need for change. Citizen action in these cases may be direct, or mediated. In mediated cases, technical intermediaries, sometimes termed “infomediaries”, play a particularly important role in theories of change around how open data may be used by citizens (Steinberg, 2011).

 

4.2 WHICH CITIZENS?

The effort, as well as the skills, that each of these different models (push or pull; individual or collective action) demand from citizens varies significantly across ICT interventions. Users can be passive consumers of information, accumulating it to use at some future point, such as when voting. Or, as Fung et al. (2010) outline, they can be asked to act on information that they receive, drawing on a range of resources to change their behaviour as a result of transparent information, for example in citizens’ reporting channels (from government or civil society) or in participatory budgeting exercises.

 

Differences emerge not only between the users of different models, but also amongst users within each of them. The skills, resources and capacity to influence others are not the same for mass users (the general public) and organised entities (such as NGOs, journalists, companies and public officials). According to Fung et al. (2011), interventions that aim to increase political accountability (understood as demands over the “behaviour of political officials whose policies have more generalized effects”) generally rely upon centralised users (media, NGOs, among others), while the general public (decentralised actors) tends to be more inclined towards interventions designed to demand service accountability (ibid.). This distinction seems to correlate with the assumption that people value information that is directly relevant to their well-being, and are interested in a few select political issues directly relevant to their lives.

 

Beyond the incentives of each user, there is certainly a disparity in the resources available to disseminate information, and in the capacity to channel demands through the appropriate institutional channels. As Fung et al. put it: “political campaigns and candidates, for example, may be far more sensitive and responsive to the criticisms that journalists make than to the more diffuse, harder to discern views of mass voters” (Fung et al., 2011).

 

In terms of the characteristics of mass users, there is limited analysis of the demographics of users of ICT-led transparency initiatives. Some reports argue that poorer demographics are the most affected by corruption (Knox, 2009). Despite that, the analysis that does exist suggests that more educated, higher-income and more technologically comfortable segments of the population are more inclined to engage with ICT-led interventions (Kuriyan, Bailur, Gigler, & Park, 2012). This is perhaps not surprising, as these groups are the most likely to be online and to engage with Internet applications more frequently, as well as more likely to participate in politics. However, the implications of this for the design of technology for anti-corruption projects are often an afterthought, rather than a key design consideration from the start. The fact that ICT-based innovations may primarily reach relatively predictable (and relatively affluent) proportions of the population (at least in the short term) may play a role in making such approaches appealing to governments who believe they can manage any input they may receive within existing institutional processes.

 

4.3 BARRIERS TO UPTAKE

Figures on Internet penetration show that in 2013 there was still a big gap in users between developing and developed countries (ITU, 2013): approximately 70% penetration in developed countries, but only around 30% in developing ones.

 

Traditionally, the digital divide has correlated with difficulties in accessing (and using[1]) an Internet connection. Those difficulties can relate to access only to old computers, or to high-priced connections, among other factors. Some analysts (Gurstein, 2011) argue that some of these initiatives (open data initiatives in particular) might create a new divide among the population. Together with the digital divide, the rapid development of ICT tools seems to add new barriers to entry.

 

Current discourses on ICT tools for transparency and accountability suggest (implicitly or sometimes explicitly) that with these new tools everybody can make use of the data and information provided, as well as act upon them. However, there are numerous barriers that relate not only to access to the Internet or other technologies (the digital divide) but also, as Gurstein notes, to the educational resources and skills that would allow for the effective use of those resources.

“…the lack of these foundational requirements means that the exciting new outcomes available from open data are available only to those who are already reasonably well provided for technologically and with other resources.” (Gurstein, 2011)

 

For the community of potential users to be able to interact with the project, they need the skills to use digital technology, as well as to manage and assess information on public interest issues. That is, it is important to have an ICT-literate community. This is relevant for government projects as well as civil society initiatives.

 “..the release of public sector information without a commensurate increase in data literacy will do little to empower the average citizen.” (Gigler, Custer, & Rahemtulla, 2011)

 

Furthermore, in developing contexts, ICT literacy is not the only key element for the success of a project: language differences matter, as do material factors such as access to low-cost technologies (a digital divide not only in terms of access to technology, but also in the skills to make effective use of those tools). As explained in the Ugandan context:

“A major constraint mentioned […] was funding shortages. This was followed by the high cost of accessing the tools, the capability to use (language and literacy) the mainly Internet or mobile based platforms.” (Kalemera, Nalwoga, & Wakabi, 2012)

 

In that sense, according to Courtney Tolmie, director at the Research for Development Institute, websites that allow reporting in local languages, receive high levels of publicity, and accept SMS texting (a much more accessible technology in many developing countries) should prove more successful (Dawson, 2012).

 

Even where some of the above-mentioned barriers are absent, and an ICT-literate community has easy access to technology, robust citizen engagement is not guaranteed.

“… increasing the availability of Internet based information does not necessarily mean that citizens will use it to demand greater accountability. The proportion of citizens who are prepared to be consistently engaged in the process of governance is relatively small. Even where there are high rates of Internet penetration, experience has shown that creating a good website or online portal does not guarantee its use” (Bhatnagar, 2003)

 

4.4 CONTEXT

All of the above-mentioned factors can provide insights into user trends and the pre-conditions for uptake. However, when considering technological interventions it is important to consider the legal, policy and social context in which the technology is introduced. In that sense, low engagement could also be a result of distrust of, or poor relationships with, the government disclosing the information. Following Finnegan (2012): “Distrust, animosity and secrecy are commonly cited issues for technology projects working towards government accountability.”

 

A clear example of this limitation in engaging the general public comes from the experience of the civil society initiative “Map Kibera”, a community-mapping project. The local mappers working on the project were originally met “with suspicion by residents, and questioned about their right to collect and record information. Some mappers were asked whether they were being paid for their work, or were asked for payment in return for the data they received” (Finnegan, 2012).

 

This poor relationship with government might also be related, among other reasons, to frustration arising from the absence of institutional mechanisms for submitting input, demands or grievances from the community of users.

 

Even when those mechanisms are in place, the lack of a timely response (or the complete absence of feedback) can lead to apathy among users. Clear evidence that the data and input collected contribute to correcting and/or punishing wrongdoing could incentivise users to engage more with anti-corruption ICT projects in future. For example, in Bangalore, Bhaskar Rao, the Transport Commissioner for the state of Karnataka, used the data collected on I Paid a Bribe to push through reforms in the motor vehicle department. As a result, and in order to avoid bribes, licenses are now applied for online (Strom, 2012), and citizens have seen an impact from their use of transactional ICTs to report corruption.

 

Anupama Dokeniya explains that “transparency policies will achieve little if the political system does not create the incentives for officials to be sanctioned when corruption is exposed, for service providers to be penalized when poor performance or absenteeism is revealed, or for safeguards or structural reforms to be adopted when evidence of systemic governance problems emerge” (Dokeniya, 2012). The same logic can be applied to all the ICT-led projects we have surveyed. Technology just provides the tools for a greater number of citizens to access a large amount of information, but the pivotal drivers of success in these initiatives are broadly the same as for any other transparency policy.

 

Furthermore, following Finnegan, in many cases, even when there is significant interest from communities of users, if the application or platform is unable to produce any change, the interest and support of those once-enthusiastic users starts to fade. Conversely, when participants realise that their contribution could lead to a relevant outcome, esteem for the tool increases (Finnegan, 2012).

 

4.5 INTERMEDIARIES

To lower these barriers (the absence of an ICT-literate community, the lack of easy access to technology, and/or the high cost of accessing the Internet and other technologies), when a project is focused on government disclosure of public information (open data initiatives, transparency portals), it is important to have intermediaries (centralised users) who can amplify and simplify the disclosed data and information. Intermediaries are key actors in creating awareness among citizens, in providing the tools for citizens to scrutinise, assess and hold governments accountable, and in engaging users with that information, especially in political accountability initiatives, as they translate sometimes abstract ideas and data into simple messages and stories that other citizens can relate to.

 

“Genuinely promoting transparency requires the hard work of doing investigative research, publishing reports, and promoting them to the media. Bubble 2.0 hype aside, the fanciest pop-up windows and Google Maps mashups won’t change that.” (Swartz, 2006)

 

These intermediaries can be socially or technically skilled groups. Some may focus on creating applications to simplify access to and use of the raw data, while others may help with information distribution and citizens’ engagement in demanding accountability. As previously mentioned, not every citizen is eager to engage with transparency initiatives (due to a lack of interest, skills or resources), so intermediaries play a key role in the use of these ICT tools. The existence and capacity of technically skilled intermediaries is likely to be an important determining factor in the success of many ICT-led interventions, particularly open data interventions.

 

4.6 IMPACT

Presenting a clear idea of the above-mentioned questions on incentives and desired outcomes could help in assessing these interventions. There is no proper impact assessment without a theory of change.

 

Anecdotal evidence can be found about particular initiatives and some of the changes they produce; however, there is a lack of systematic assessment of these policies and their relationship to greater government transparency, accountability and participation in decision-making. In that sense, there are several accounts of individual initiatives, but little academic research developing frameworks to assess each type of ICT initiative.

 

Moreover, for initiatives related to the disclosure of information (transparency portals and open data initiatives), counting visits to a website and/or the number of “downloads” of certain datasets or documents cannot be presented as an indicator of usage, much less of the impact of any of these policies. In many cases, these initiatives are compared to one another in terms of the number of published documents and datasets, as well as the number of visits. However, these numbers could lead to wrong results, or partial ones at best.

 

 

References

 

Avila, R., Feigenblatt, H., Heacock, R., & Heller, N. (2011). Global mapping of technology for transparency and accountability: New technologies.

Bhatnagar, S. (2003). E-government and access to information. In Global Corruption Report (pp. 24–32).

Dawson, S. (2012). Citizens wield web tools to combat petty bribery. Thomson Reuters Foundation.

Dimaggio, P., & Hargittai, E. (2001). From the “Digital Divide” to “Digital Inequality”: Studying Internet Use as Penetration Increases.

Dokeniya, A. (2012). #6 from 2012: Opening Government Data. But Why? People, Spaces, Deliberation World Bank Blog. Retrieved from http://blogs.worldbank.org/publicsphere/opening-government-data-why

Finnegan, S. (2012). Using technology for collaborative transparency?: Risks and opportunities. In GIS Watch 2012 (Vol. 8, pp. 29–33).

Fung, A., Gilman, H. R., & Shkabatur, J. (2011). Impact case studies from middle income and developing countries New technologies.

Gigler, B.-S., Custer, S., & Rahemtulla, H. (2011). Realizing the Vision of Open Government Data: Opportunities, Challenges and Pitfalls (Abridged Version).

Gurstein, M. (2011). Open data: Empowering the empowered or effective data use for everyone? First Monday, 16(2).

ITU. (2013). ICT Facts and Figures – The World in 2013.

Kalemera, A., Nalwoga, L., & Wakabi, W. (2012). How ICT tools are promoting citizen participation in Uganda.

Knox, C. (2009). Dealing with sectoral corruption in Bangladesh: Developing citizen involvement. Public Administration and Development, 29(2), 117–132. doi:10.1002/pad.523

Kuriyan, R., Bailur, S., Gigler, B.-S., & Park, K. R. (2012). Technologies for Transparency and Accountability. Washington DC.

Steinberg, T. (2011). Asking the wrong question about Data.gov. Premise (blog). Retrieved from http://steiny.typepad.com/premise/2011/04/asking-the-wrong-question-about-datagov.html

Strom, S. (2012, March 6). I Paid a Bribe and Similar Corruption-Exposing Sites Spread. New York Times. New York.

Swartz, A. (2006). Disinfecting the Sunlight Foundation. Aaron Swartz’s Raw Thought. Retrieved from http://www.aaronsw.com/weblog/dissunlight



[1] However, it is important to note that access and use are not necessarily synonymous. Some studies have shown that: “…more people have access than use it (NTIA 1998); and, second, that whereas resources drive access, demand drives intensity of use among people who have access” (Dimaggio & Hargittai, 2001).

 

Thoughts? Reflections? Add a comment on the draft by 23rd November.

Exploring the incentives for adopting ICT innovation in the fight against corruption

[Summary: Invite for comments on a new draft report exploring incentives for ICT use in the fight against corruption]

Back in January, in response to a blog post by Doug Hadden, I wrote down a few reflections on the incentives for technology for transparency in developing countries. That led to a conversation with Silvana Fumega and the U4 Anti-Corruption Resource Centre about a possible briefing note on the topic, which quickly turned into a full paper – designed to scope out issues for donors and governments to consider in looking at supporting ICT-based anti-corruption efforts, particularly in developing countries. Together with Silvana, I’ve been working on a draft over the last few months – and we’ve just placed a copy online for comments.

I’ll be blogging sections of the draft over the coming week, and you can find the full draft as a Google Document with comments enabled (until 18th November 2013) here.

Here’s the introduction, setting out the focus of the paper:

Information and Communication Technology (ICT) driven initiatives are playing an increasingly central role in discourses of transparency, accountability and anti-corruption. The Internet and mobile phones are widely hailed as powerful tools in the fight against corruption. From mobile phone based corruption crowd-sourcing platforms, to open government data portals providing citizens with access to state datasets, technology-centric interventions are increasingly attracting both political attention and donor funding flows. The Open Government Partnership (OGP) declaration, launched in 2011, commits the 60 OGP member states to “…seizing this moment to strengthen our commitments to promote transparency, fight corruption, empower citizens, and harness the power of new technologies to make government more effective and accountable” (Open Government Partnership, 2011). In an analysis of the first action plans published by OGP members (Global Integrity, 2012), e-government and open data related commitments were markedly the most common, illustrating the prominence given to ICTs in creating more open and accountable government.

However, the ‘sales pitch’ for governments to adopt ICTs is far broader than their anti-corruption applications, and the fact that a government adopts some particular technology innovation does not necessarily mean that its potential corruption-reducing role will be realised. Criticisms have already been levelled at open data portals that give an initial appearance of government transparency, whilst either omitting any politically sensitive content, or remaining, in practice, inaccessible to the vast majority of the population; and there are numerous examples to be found of crowd-sourcing platforms designed to source citizen feedback on public services, or corruption reports, languishing with just a handful of reports, or no submissions made for months on end (Bailard et al., 2012; Brown, 2013). Yet, as Strand argues, “while ICT is not a magic bullet when it comes to ensuring greater transparency and less corruption…it has a significant role to play as a tool in a number of important areas” (Strand, 2010). The challenge is neither to suppose that ICTs will inevitably drive positive change, nor to ignore them as merely high-tech distractions. Rather, there is a need to look in detail at the motivations for ICT adoption, and the context in which ICTs are being deployed, seeking to understand the ways in which strategic and sustainable investments can be made that promote the integrity of public services, and the capacity of officials, citizens and other stakeholders to secure effective and accountable governments.

In this issue paper we consider the reasons that may lead governments to adopt anti-corruption related ICT innovations, and we look at the evidence on how the uptake and use of these ICTs may affect their impacts. In doing so, we draw upon literature from a range of fields, including open government, transparency and anti-corruption, e-government and technology for transparency, as well as more speculative insights from our observations of the open government field over the last five years. To ground our argument, we offer a range of illustrative case studies that show some of the different kinds of ICT interventions that governments are engaging with.

Comments? Questions? Add your notes on the Google Doc version of this draft here.

References

Bailard, C., Baker, R., Hindman, M., Livingston, S., & Meier, P. (2012). Mapping the Maps: A meta-level analysis of Ushahidi and Crowdmap.

Brown, G. (2013). Why Kenya’s open data portal is failing — and why it can still succeed | Opening Parliament Blog Post. Retrieved from http://blog.openingparliament.org/post/63629369190/why-kenyas-open-data-portal-is-failing-and-why-it

Global Integrity. (2012). So What’s In Those OGP Action Plans, Anyway? Global Integrity Blog. Retrieved from http://globalintegrity.org/blog/whats-in-OGP-action-plans

Open Government Partnership. (2011). Open Government Declaration (pp. 1–2).

Strand, C. (2010). Introduction. In C. Strand (Ed.), Increasing transparency and fighting corruption through ICT: empowering people and communities (Vol. 8). SPIDER. doi:10.1016/0083-6656(66)90013-4

Opening the National Pupil Database?

[Summary: some preparatory notes for a response to the National Pupil Database consultation]

The Department for Education are currently consulting on changing the regulations that govern who can gain access to the National Pupil Database (NPD). The NPD holds detailed data on every student in England, going back over ten years, and covering topics from test and exam results, to information on gender, ethnicity, first language, eligibility for free school meals, special educational needs, and detailed information on absences or school exclusion. At present, only a specified list of government bodies are able to access the data, with the exception that it can be shared with suitably approved “persons conducting research into the educational achievements of pupils”. The DFE consultation proposes opening up access to a far wider range of users, in order to maximise the value of this rich dataset.

The idea that government should maximise the value of the data it holds has been well articulated in its open data policies, and in the open data white paper, which suggests open data can be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement”. However, the open data movement has always been pretty unequivocal in the claim that ‘personal data’ is not ‘open data’ – yet the DFE proposals seek to apply an open data logic to what is fundamentally a personal, private and sensitive dataset.

The DFE is not, in practice, proposing that the NPD be turned into an open dataset, but it is consulting on the idea that it should be available not only for a wider range of research purposes, but also to “stimulate the market for a broader range of services underpinned by the data, not necessarily related to educational achievement”. Users of the data would still go through an application process, with requests for the most sensitive data subject to additional review, and users agreeing to hold the data securely: but the data, including individual-level records that could easily be de-anonymised, would still be given out to a far wider range of actors, with increased potential for data leakage and abuse.

Consultation and consent

I left school in 2001 and further education in 2003, so as far as I can tell, little of my data is captured by the NPD – but, if it were, it would have been captured based not on my consent to it being handled, but simply on the basis that it was collected as an essential part of running the school system. The consultation documents state that “The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices”, yet it would be hard to argue that this constitutes informed consent for the data to now be shared with commercial parties for uses far beyond the delivery of education services.

In the case of the NPD, it would appear particularly important to consult with children and young people on their views of the changes – as it is, after all, their personal data held in the NPD. However, the DFE website shows no evidence of particular efforts being taken to make the consultation accessible to under-18s. I suspect a carefully conducted consultation with diverse groups of children and young people would be very instructive in guiding decision making in the DFE.

The strongest argument in the consultation document for reforming the current regulations is that, in the past, the DFE has had to turn down requests to use the data for research which appears to be in the interests of children and young people’s wellbeing. For example, “research looking at the lifestyle/health of children; sexual exploitation of children; the impact of school travel on the environment; and mortality rates for children with SEN”. It might well be that, consulted on whether they would be happy for their data to be used in such research, many children, young people and parents would be happy to permit a wider wording of the research permissions for the NPD, but I would be surprised if most would happily consent to just about anyone being able to request access to their sensitive data. We should also note that, whilst some of the research the DFE has turned down sounds compelling, this does not necessarily mean the research could not happen in any other way: nor that it could not be conducted by securing explicit opt-in consent. Data protection principles that require data to be used only for the purpose it was collected cannot just be thrown away because they are inconvenient, and even if consultation does highlight that people may be willing to see some wider sharing of their personal data for good, it is not clear this can be applied retroactively to data already collected.

Personal data, state data, open data

The NPD consultation raises an important issue about the data that the state has a right to share, and the data it holds in trust. Aggregate, non-disclosive information about the performance of public services is data the state has a clear right to share and is within the scope of open data. Detailed data on individuals that it may need to collect for the purpose of administration, and generating that aggregate data, is data held in trust – not data to be openly shared.

However, there are many ways to aggregate or process a dataset – and many different non-personally-identifying products that could be built from one. Many of these, government will never need to create – yet they could bring social and economic value. So perhaps there are spaces to balance the potential value in personally sensitive datasets with the necessary primacy of data protection principles.
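To make the distinction between disclosive pupil-level records and shareable aggregates concrete, here is a toy sketch of the kind of aggregation step involved. Everything in it is invented for illustration – the records, the group labels and the suppression threshold are my assumptions, not DfE practice – but small-cell suppression of this kind is a common disclosure-control technique in official statistics:

```python
from collections import defaultdict

# Illustrative only: a threshold of 5 is a common rule of thumb in official
# statistics for suppressing small counts; real disclosure-control rules differ.
SUPPRESSION_THRESHOLD = 5

def aggregate_with_suppression(records, threshold=SUPPRESSION_THRESHOLD):
    """Count pupils per (school, attribute) group, suppressing small cells."""
    counts = defaultdict(int)
    for school, attribute in records:
        counts[(school, attribute)] += 1
    # Publish None rather than a small, potentially disclosive count.
    return {
        group: (n if n >= threshold else None)
        for group, n in counts.items()
    }

# Entirely invented pupil-level records: (school, attribute) pairs.
records = (
    [("School A", "FSM-eligible")] * 12
    + [("School A", "SEN")] * 3       # small cell: would be suppressed
    + [("School B", "FSM-eligible")] * 7
)
print(aggregate_with_suppression(records))
# {('School A', 'FSM-eligible'): 12, ('School A', 'SEN'): None, ('School B', 'FSM-eligible'): 7}
```

In practice, robust anonymisation requires far more than this – complementary suppression, and checks against linkage with other datasets, for instance – which is why open peer review of any anonymisation methods matters.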

Practice accommodations: creating open data products

In his article for the Open Data Special Issue of the Journal of Community Informatics I edited earlier this year, Rollie Cole talks about ‘practice accommodations’ between open and closed data. Getting these accommodations right for datasets like the NPD will require careful thought and could benefit from innovation in data governance structures. In early announcements of the Public Data Corporation (now the Public Data Group and Open Data User Group), there was a description of how the PDC could “facilitate or create a vehicle that can attract private investment as needed to support its operations and to create value for the taxpayer”. At the time I read this as exploring the possibility that a PDC could help private actors with an interest in public data products that were beyond the public task of the state, but were best gathered or created through state structures, to pool resources to create or release this data. I’m not sure that’s how the authors of the point intended it, but the idea potentially has some value around the NPD. For example, if there is a demand for better “demographic models [that can be] used by the public and commercial sectors to inform planning and investment decisions” derived from the NPD, are there ways in which new structures, perhaps state-linked co-operatives, or trusted bodies like the Open Data Institute, can pool investment to create these products, and to release them as open data? This would ensure access to sensitive personal data remained tightly controlled, but would enable more of the potential value in a dataset like NPD to be made available through more diverse open aggregated non-personal data products.

Such structures would still need good governance, including open peer-review of any anonymisation taking place, to ensure it was robust.

The counter argument to such an accommodation might be that it would still stifle innovation, by leaving some barriers to data access in place. However, the alternative – DFE staff assessing each application for access to the NPD, and having to decide whether a commercial re-use of the data is justified and whether the requestor has adequate safeguards in place to manage the data effectively – also involves barriers to access, and involves more risk, so the counter argument may not take us that far.

I’m not suggesting this model would necessarily work – but I introduce it to highlight that there are ways to increase the value gained from data without simply handing it out in ways that inevitably increase the chance it will be leaked or misused.

A test case?

The NPD consultation presents a critical test case for advocates of opening government data. It requires us to articulate more clearly the different kinds of data the state holds, to be much more nuanced about the different regimes of access that are appropriate for different kinds of data, and to consider the relative importance of values like privacy over ideas of exploiting value in datasets.

I can only hope DFE listen to the consultation responses they get, and give their proposals a serious rethink.

 

Further reading and action: Privacy International and Open Rights Group are both preparing group consultation inputs, and welcome input from anyone with views or expert insights to offer.

What should a UK Open Government Partnership Forum look like?

[Summary: Open space events across the whole UK that provide access for all ages are key to an effective UK OGP forum]

A key step in a country’s participation in the Open Government Partnership (OGP) involves establishing ongoing public consultation between government, citizens, civil society organisations and the private sector on the development and implementation of OGP action plans. Given the UK is currently co-chair of OGP, and will be hosting the next OGP plenary meeting in London in March next year, establishing an effective, credible and dynamic forum for ongoing multi-stakeholder participation in OGP should be a top priority.

 

Members of the informal network of UK-based Civil Society Organisations (CSOs) engaging with the OGP process have been thinking about what such a forum could look like, and in this post I want to offer one possible take, based on my experience of taking part in a range of open space and unConference events over recent years.

Proposal: At the heart of the UK OGP forum should be a series of regular open space events, taking place across the UK, with a focus on getting out of London. Events should be open to anyone to take part – from active citizens and community groups, to social entrepreneurs, private sector firms, national and local government representatives, and local and international CSOs.
Simple principles of inclusion should be established to ensure the events provide a welcoming environment for all, including children, young people and older people.

What is an open space or unConference?

Open space events are created by their participants. Rather than having a set agenda, the discussion agenda for an open space event is set on the day by participants announcing sessions and discussions they would like to take part in. Participants then self-select to take part in the sessions they have the most interest in. Simple principles encourage participants, wherever they come from, to take shared ownership of the discussions and the outcomes of the day. Open space events and unConferences can have an overall theme to guide the focus of the specific sessions that take place.

I first encountered open space on a large scale at the UKGovCamp unconferences, which, as it turns out, are in many ways a paradigmatic example of key aspects of digital open government in action. At the annual UKGovCamp events (and their spin-off LocalGovCamp events around the UK), civil servants, citizens, CSOs, social innovators, business people, and even a few politicians, spend a day in practical conversation about how to make government work better – sharing knowledge, developing plans and deepening shared commitment to common problems.

See the Wikipedia article on Open-space technology for more on open space, and links to examples of open space events in action.

Why should open space events be part of the UK OGP forum?

Open Government is about more than a few action plan commitments to better ICT systems or increasing access to data. It involves actively rethinking the relationship between citizen and state, both as democracy continues to evolve and as technologies, globalisation and other social forces reconfigure the capabilities of both citizens and governments. Open Government needs mass participation – and open space events are one way to develop action-focussed dialogues that support large-scale participation.

  • A UK OGP Forum needs to be not only about feeding demands up to government, but also about disseminating OGP ideas and commitments across the whole of the public sector. For many people, it is open local government which will have most impact on their lives, and taking the OGP conversation on the road to events that can include all tiers of government provides an opportunity to join up open government practice across government.
  • Open space events are also very cost-effective. You need a room, some refreshments, some flip-chart paper – and, well, that’s about it.
  • Open space events are powerful network building opportunities – helping develop both civil society open government networks, and build new connections between civil society and government (and even across different parts of government)
  • With social media and a few social reporters, open space events can also become largely self-documenting, and with good facilitation it is possible to include remote participation, using the Internet to make sure anyone with a contribution to make to a topic under discussion can input into the dialogue.
  • Most of all, open space events embody principles of openness, collaboration and innovation – and so are an ideal vehicle for developing a dynamic UK OGP forum.

How could it work in practice?

Well, there’s nothing to stop anyone organising their own Open Government unConference, inviting civil servants and a whole range of other stakeholders, recording the key outcomes of the discussions, and then sending that all to the Cabinet Office team working on the UK’s OGP participation. However, to make open space a core part of the UK OGP process a number of elements may be worth considering. Here’s one sketch of how that could work:

  • In partnership with the OGP team in government, planning a series of quarterly OGP open space events, which central civil servants commit to take part in. These would take place in each of the nations of the United Kingdom, and should have as their core theme the commitments of the UK Action Plan. Events should issue an open invite, and should be designed to ensure maximum diversity of participants from across all sectors.
  • In addition, government, CSOs and other stakeholders should agree to provide sponsorship for thematic OGP open space meetings. Anyone could organise a thematic meeting, providing they apply key principles of inclusiveness, open participation and transparency in the organisation of the events.
  • The OpenGovernment.org.uk site becomes a platform to collate notes from all the discussion sessions, drawing on social media content and notes captured by facilitators and rapporteurs at the events.
  • Each individual open space discussion within the events does not have to reach a consensus on its topic, but would have the option of producing a 1/2 page summary of discussions that can be shared online. Government commit to reading all these notes when reviewing the action plan.
  • Existing open space events (e.g. UKGovCamp) could choose to add an OGP track of discussions, feeding in as any thematic event would.

What about formal representation and accountability? How do decisions get made?

Some of the other ideas for a UK OGP Forum are far more focussed on formal structures and procedures. I don’t reject the value of formal structures where questions of accountability and representation are in play. However, unless actual authority to decide what goes into country action plans is shared with an OGP forum, then, as a consultative body, a more open model would seem more appropriate.

Established CSOs have existing channels through which they are talking with government. A forum should help them co-ordinate their asks and offers on open government issues through existing channels, rather than add another narrow channel of communication.

Open processes are not without their problems: they can suffer from those who shout loudest being those who are heard most, or from those in power being able to pick and choose which voices they engage with. However, finding ways to deal with these issues in the open is an important challenge and learning journey for us to go on if we truly want to find inclusive models of open governance and open government that work…

A realistic proposal?

I’ve written this outline sketch up as a contribution to the debate on what an OGP forum should look like. Government tendencies to control processes and to manage engagement in neat boxes can be strong. But to an extent open government has to be about challenging that – and as a process that will involve a shared learning journey for government, civil society and citizens alike, I hope this does make for a realistic proposal…