[Summary: Reflections on the design of the ITU Data Pledge project]
The ITU, under their “Global Initiative on AI and Data Commons”, have launched a process to create a ‘Data Pledge’, designed as a mechanism to facilitate increased data sharing in order to support “response to humanity’s greatest challenges” and to “help support and make available data as a common global resource”.
Described as complementary to existing work such as the International Open Data Charter, the Pledge is framed as a tool to ‘collectively make data available when it matters’, with early scoping work discussing the idea of conditional pledges linked to ‘trigger events’, such that an organisation might promise to make information available specifically in a disaster context, such as the current COVID-19 Pandemic. Full development of the Pledge is taking place through a set of open working groups.
This post briefly explores some of the ways in which a Data Pledge could function, and considers some of the implications of different design approaches.
[Context: I’ve participated in one working group call around the data pledge project in my role as Project Director of the Global Data Barometer, and this is written up in a spirit of open collaboration. I have no formal role in the data pledge project.]
Governments, civil society or private sector
Should a pledge be tailored specifically to one sector? Frameworks for governments to open data are already reasonably well developed, as are mechanisms that could be used for governments to collaborate on improving standards and practices of data sharing.
However, in the private sector (and to some extent, in civil society), approaches to data sharing for the public good (whether as data philanthropy, or participation in data collaboratives) are much less developed – and are likely the place in which a new initiative could have the greatest impact.
Individual or collective action problems
PledgeBank, a mySociety project that ran from 2005 to 2015, explored the idea of pledging as a solution to collective action problems. Pledges of the form “I’ll do something, if a certain number of people will help me” are now familiar in some senses through crowdfunding sites and other online spaces. A Data Pledge could be modelled on the same logic – focussing on addressing those collective action problems either where:
- A single firm doesn’t want to share certain data because doing so, when no-one else is, might have competitive impacts: but if a certain share of the market is sharing this data, it no longer has competitive significance, and instead its public good value can be realised.
- The value of certain data is only realised as a result of network effects, when multiple firms are sharing similar and standardised data – but the effort of standardising and sharing data is non-negligible. In these cases, a firm might want to know that there is going to be a Social Return on Investment before putting resources into sharing the data.
However, this does introduce some complexity into the idea of pledging (and the actions pledged) and might, as PledgeBank found, lead also to lots of unrealised potential.
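The threshold logic behind this kind of collective-action pledge can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class name, its fields, and the idea of measuring the threshold as a share of a known market size are all hypothetical, and are not drawn from PledgeBank or from any ITU design.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdPledge:
    """Hypothetical collective-action pledge: data is shared only once
    a required share of the market has signed up."""
    dataset: str
    market_size: int        # assumed: number of firms in the market is known
    required_share: float   # fraction of the market that must sign (0.0 to 1.0)
    signatories: set = field(default_factory=set)

    def sign(self, firm: str) -> None:
        # A set means a firm signing twice is only counted once.
        self.signatories.add(firm)

    def is_activated(self) -> bool:
        # The pledge becomes binding once the threshold is met or exceeded.
        return len(self.signatories) / self.market_size >= self.required_share


pledge = ThresholdPledge("delivery-route data", market_size=10, required_share=0.5)
for firm in ["A", "B", "C", "D"]:
    pledge.sign(firm)
activated_at_four = pledge.is_activated()   # 4 of 10 firms: below threshold
pledge.sign("E")
activated_at_five = pledge.is_activated()   # 5 of 10 firms: threshold met
```

Even this toy version surfaces the complexity noted above: someone has to define the market, set the threshold, and decide what happens to pledges that never reach it.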
Pledging can also be approached as a means of solving individual motivational problems: helping firms to overcome inertia that means they are not sharing data which could have social value. Here, a pledge is more about making a statement of intent, which garners positive attention, and which commits the firm to a course of action that should eventually result in shared data.
Both forms of pledging can function as useful signalling – highlighting data that might be available in future, and priming potential ecosystems of intermediaries and users.
An organisational or dataset-specific pledge
Should a Pledge be about a general principle of data sharing for social good? Or about sharing a specific dataset? It may be useful to think of the architecture of the Data Pledge as involving both: or at least, as optionally involving dataset-specific pledges, under a general pledge to support data sharing for social good.
Think about organisational dynamics. Individual teams in a large organisation may have lots of data they could safely and appropriately share more widely for social good uses, but they do not feel empowered to even start thinking about this. A high-level organisational pledge (e.g. “We commit to share data for social good whenever we can do so in ways that do not undermine privacy or commercial position”) that sets an intention of a firm to support data philanthropy, participate in data collaboratives, and provide non-competitive data as open data, could provide the backing that teams across the organisation need to take steps in that direction.
At the same time, there may be certain important datasets and data sources that can only be shared with strong high-level leadership from the organisation, or where signalling the specific data that might be released, or the purposes it might be released for, can help address the collective action issues noted above. For these, dataset-specific pledging (e.g. “We commit to share this specific dataset for the social good in circumstance X”) can have significant value.
Triggers as required or optional
Should a pledge be structured to place emphasis on ‘trigger conditions’ for data sharing? Some articulations of the Data Pledge appear to think of it as a bank of data that could be shared in particular crisis situations, e.g. “We’ll share detailed supply chain information for affected areas if there is a disaster situation.” There are certainly datasets of value that might not be listed as a Pledge unless trigger conditions can be described, but it’s important that the design of a pledge does not present triggers as essentially shifting any of the work on data sharing to some future point. Preparing for data to be used well and responsibly in a crisis situation requires work in advance of the trigger events: aligning datasets, identifying how they might be used, and accounting carefully for possible unintended consequences that need to be mitigated.
There are also many global crises we face that are present and ongoing: the climate crisis, migration, and our collective failure to be on track against the Sustainable Development Goals.
Brokering and curating
Data is always about something, and different datasets exist within (and across) different data communities and cultures. To operationalise a pledge will involve linking actors pledging to share data into relevant data communities: where they can understand user needs in more depth, and be able to publish with purpose.
The architecture of a Data Pledge, and of any supporting initiative around it, will need to consider how to curate and connect the many organisations that might engage – building thematic conversations, spotting thematic spaces where a critical mass of pledges might unlock new social value, or identifying areas where there are barriers stopping pledges turning into data flows.
Incorporating context, consent and responsible data principles
Increased data sharing is not an unalloyed good. Approaching data for the public good involves balancing openness and sharing with robust principles and practices of data protection and ethics, including attention to data minimisation, individual rights, group data privacy, indigenous data sovereignty and dataset bias. Data should also be shared with clear documentation of its context, allowing an understanding of its affordances and limitations, and supporting debate over how data ecosystems can be improved in service of social justice.
A Pledge has an opportunity to both set the bar for responsible data practice, and to incentivise organisational thinking about these issues, by including terms that require pledging organisations to uphold high standards of data protection, only sharing personal data with clear informed consent or personal-derived data after clear processes that consider privacy, human rights and bias impacts of data sharing. Similarly, organisations could be asked to commit to putting their data in context when it is shared, and to engaging collaboratively with data users.
There may also be principles to incorporate here about transparency of data sharing arrangements – supporting development of norms about publishing clearly (a) who data is shared with and for what purpose; and (b) the privacy impact assessments carried out in advance of such shares.
Conditional on capacity?
Should pledging organisations be able to signal that they would need resources in order to make certain data available? For example: “We have Dataset X, which has a certain social value, but we can’t afford to make it available with our internal resources.” For low-resource organisations, including SMEs or organisations operating in low income economies, this could be a way to signal to philanthropic projects like data.org a need for support. But it could also be used by higher-resource organisations to put a barrier in front of data sharing. However, if a Pledge targets civil society pledgers, then allowing some way to indicate the capacity needed for data to be shared is likely to be particularly important.
A synthesis sketch
Ideologically, I’d favour a focus on building and governing data commons, more directly addressing the modern ‘enclosure’ of data by private firms, and not forgetting the importance of proper taxation of data-related businesses to finance provision of public goods. However, if it’s viable to treat a data pledge as a pragmatic tool to increase availability of data for social good uses, then I’d sketch the following structure:
- Target private sector organisations;
- A three-part pledge:
  1. A general organisational commitment to treat data as a resource for the public good;
  2. A linked organisational commitment to responsible data practices whenever sharing data;
  3. An optional set of dataset-specific pledges, each with optional trigger conditions;
- A platform allowing pledging organisations to profile their pledges, detail contact points for specific datasets and contact points for organisation-wide data stewards, and to connect with potential data users;
- A programme of work to identify pre-work needed to allow data to be effectively used if trigger conditions are met;
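To make the shape of this structure more tangible, here is a minimal sketch of how a pledging platform might record it. Everything named here is a hypothetical illustration: the class and field names, the free-text trigger labels, and the choice to treat the first two commitments as mandatory and only the dataset pledges as optional are my own reading of the list above, not part of any agreed design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetPledge:
    """Part 3 (optional): a pledge about one specific dataset."""
    dataset: str
    contact: str                    # contact point for this dataset
    trigger: Optional[str] = None   # None: no trigger condition attached


@dataclass
class OrganisationPledge:
    """A three-part pledge profile for one organisation."""
    organisation: str
    data_steward_contact: str           # organisation-wide data steward
    public_good_commitment: bool        # part 1
    responsible_data_commitment: bool   # part 2
    dataset_pledges: List[DatasetPledge] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Assumption: both general commitments are needed for a valid pledge.
        return self.public_good_commitment and self.responsible_data_commitment

    def triggered_by(self, event: str) -> List[DatasetPledge]:
        # Dataset pledges whose trigger condition matches the given event.
        return [p for p in self.dataset_pledges if p.trigger == event]


profile = OrganisationPledge(
    organisation="Example Logistics Ltd",
    data_steward_contact="data-steward@example.org",
    public_good_commitment=True,
    responsible_data_commitment=True,
    dataset_pledges=[
        DatasetPledge("supply chain data", "supply@example.org", trigger="disaster"),
        DatasetPledge("aggregate emissions data", "climate@example.org"),
    ],
)
disaster_datasets = [p.dataset for p in profile.triggered_by("disaster")]
```

A real platform would need a much richer way of describing trigger conditions than a single label, and the pre-work programme would sit outside any such record, but the sketch shows how organisation-wide and dataset-specific commitments could live side by side.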