Monthly Archives: February 2014

Five critical questions for constructing data standards

I’ve been spending a lot of time thinking about processes of standardisation recently (building on the recent IATI Technical Advisory Group meeting, working on two new standards projects, and conversations at today’s MIT Center for Civic Media & Berkman Center meet-up). One of the key strands in that thinking is around how pragmatics and ethics of standards collide. Building a good standard involves practical choices based on the data that is available, the technologies that might use that data and what they expect, and the feasibility of encouraging parties who might communicate using that standard to adapt their practices (more or less minimally) in order to adopt it. But a standard also has ethical and political consequences, whether it is a standard deep in the Internet stack (as John Morris and Alan Davidson discuss in this paper from 2003[1]), or a standard at the content level, supporting exchange of information in some specific domain.

The five questions below seek to (in a very provisional sense) capture some of the considerations that might go into an exploration of the ethical dimensions of standard construction[2].

(Thanks to Rodrigo Davies, Catherine D’Ignazio and Willow Brugh for the conversations leading to this post)

For any standard, ask:

Who can use it?

Practically, I mean. Who, if data in this standard format were placed in front of them, would be able to do something meaningful with it? Who might want to use it? Are people who could benefit from this data excluded from using it by its complexity?

Many data standards assume that ‘end users’ will access the data through intermediaries (i.e. a non-technical user can only do anything with the data after it has been processed by some intermediary individual or tool) – but not everyone has access to intermediaries, or intermediaries may have their own agendas or understandings of the world that don’t fit with those of the data user.

I’ve recently been exploring whether it’s possible to turn this assumption around, and make simple versions of a data standard the default, with more expressive data models available to those with the skills to transform data into these more structured forms. For example, the Three Sixty Giving standard (warning: very draft/provisional technical docs) is based around a rich data model paired with a simple, flat-as-possible serialisation. This means most of the common forms of analysis someone might want to do with the data can be done in a spreadsheet: in 90%+ of cases, data can be exchanged in flat(ish) forms, with richer structures used only where needed.
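To make the idea concrete, here is a minimal sketch (with invented field names, not the actual Three Sixty Giving schema) of how a nested record can be flattened into spreadsheet-friendly columns, reserving nesting for the cases that need it:

```python
# Flatten a nested grant record into spreadsheet-style columns,
# joining nested keys with a slash. Field names are illustrative,
# not taken from the actual Three Sixty Giving documentation.
def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested structures, extending the column name
            flat.update(flatten(value, prefix=name + "/"))
        else:
            flat[name] = value
    return flat

grant = {
    "id": "GRANT-001",
    "amount": 5000,
    "recipient": {"name": "Example Trust", "postcode": "AB1 2CD"},
}

print(flatten(grant))
# {'id': 'GRANT-001', 'amount': 5000,
#  'recipient/name': 'Example Trust', 'recipient/postcode': 'AB1 2CD'}
```

The point of the design is that the flat form is the default a spreadsheet user sees, while the nested form remains available for the minority of cases that need extra structure.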

What can be expressed?

Standards make choices about what can be expressed, usually at two levels:

  • Field choice
  • Taxonomies / codelists

Both involve making choices about how the world is sliced up, and what sorts of things can be represented and expressed.
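A closed codelist makes this slicing-up visible in code: anything outside the list must be dropped, forced into an “other” bucket, or rejected outright. The codes below are invented for illustration:

```python
# A closed codelist constrains what can be expressed: values that
# don't fit must be coerced or discarded. These codes are invented,
# not drawn from any real standard.
SECTOR_CODES = {"education", "health", "agriculture"}

def validate_sector(value):
    if value in SECTOR_CODES:
        return value
    return "other"  # information about the original value is lost here

print(validate_sector("health"))      # -> health
print(validate_sector("mutual aid"))  # -> other
```

Whoever drafts `SECTOR_CODES` decides, in effect, whose view of the world the data can carry.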

A thought experiment: If I asked people in different social situations an open question inviting them to tell me about the things a standard is intended to be about (e.g. “Tell me about this contract?”) how much of what they report can be captured in the standard? Is it better at capturing the information seen as important to people in certain social positions? Are there ways it could capture information from those in other positions?

What social processes might it replace or disrupt?

Over the short-term, many data standards end up being fed by existing information systems – with data exported and transformed into the standard. However, over time, standards can lead to systems being re-engineered around them. And in shifting the flow of information inside and outside of organisations, standards processes can disrupt and shift patterns of autonomy and power.

Sometimes the ‘inefficient’ processes of information exchange, which open data standards seek to rationalise, can be full of all sorts of tacit information exchange, relationship building and so on, which the introduction of a standard could affect. It may be important to think about how the technical choices in a standard affect its adoption, and how far they allow for distributed patterns of data generation and management. For example: which identifiers in a standard have to be maintained centrally, placing pressure on centralised information systems to maintain the integrity of data, and which can be managed locally, making it easier to create more distributed architectures? It’s not simply a case of which architectures a standard does or doesn’t allow, but which it makes easier or trickier: in budget-constrained environments, implementations will often go down the path of least resistance, even if it’s theoretically possible to build standard-using tools in ways that better respect the existing structures of an organisation.
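One way a standard can avoid forcing a central registry is to let identifiers be minted locally, namespaced by an organisation prefix, so that uniqueness holds without central coordination. The prefix scheme below is an invented illustration of the pattern, not any particular standard’s rule:

```python
import uuid

# Locally minted identifiers: an organisation prefix plus a random
# UUID gives globally unique IDs with no central registry involved.
# The prefix format here is invented for illustration.
def mint_identifier(org_prefix):
    return f"{org_prefix}-{uuid.uuid4()}"

id_a = mint_identifier("ORG-GB-123")
id_b = mint_identifier("ORG-GB-123")
assert id_a != id_b  # no central lookup needed to stay unique
```

A standard that requires identifiers to be issued from a single registry pushes implementers towards centralised systems; a scheme like this keeps the distributed option cheap.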

Which fields are descriptive? Which fields are normative?

There has recently been discussion of Facebook’s introduction of a wide range of options for describing gender, with Jane Fae arguing in the Guardian that, rather than providing a restricted list of options, the field should simply be dropped altogether. Fae’s argument is that gender categories are used primarily to target ads, and that the field has little value as a category otherwise.

Is it possible to look at a data standard and consider which proposed fields import strong normative worldviews with them? And then to consider omitting these fields?

It may be that for some fields, silence is a better option than forcing people, organisations or events (or whatever it is that the standard describes) into boxes that don’t make sense for all the individuals/cases covered…
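In schema terms, that suggests making a contested field optional (and free-text rather than a closed enumeration) so that silence is a valid answer. A small sketch, with illustrative field names:

```python
# One design response to a normative field: make it optional and
# free-text, so records can simply stay silent. Field names are
# illustrative, not from any real standard.
def describe_person(name, gender=None):
    record = {"name": name}
    if gender is not None:          # silence is a valid answer
        record["gender"] = gender   # free text, not a codelist
    return record

print(describe_person("Alex"))
# {'name': 'Alex'}
print(describe_person("Sam", gender="non-binary"))
# {'name': 'Sam', 'gender': 'non-binary'}
```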

Does it permit dissent?

Catherine D’Ignazio suggested this question. How far does a standard allow itself to be disputed? What consequences are there to breaking the rules of a standard or remixing it to express ideas not envisaged by the original architects? What forms of tussle can the standard accommodate?

This is perhaps even more a question of the ecosystem of tools, validators and other resources around a standard than of the standard specification itself, but these are interrelated.


[1]: I’ve been looking for more recent work on ‘public interest’ and politics of standard creation. Academically I spend a lot of time going back to Bowker and Star’s work on ‘infrastructure’, but I’m on the look out for other works I should be drawing upon in thinking about this.

[2]: I’m talking particularly about open data standards, and standards at the content level, like IATI, Open 311, GTFS etc.

How can we make Internet Governance processes more legible?

[Summary: Links and reflections on the need for an improved information and engagement architecture for Internet Governance]

At a Berkman lunchtime talk today, Veni Markovski, ICANN vice-president for Russia, discussed ‘high-level conferences on ICT and the Internet’ and what they mean for the Internet as we know it. The two diagrams below, which Veni had on screen during his talk, capture the increasing complexity of the Internet Governance process, with a mix of open and closed meetings of overlapping participants and stakeholders.


You can find Nate Matias’s live-blog of the talk here, including reporting from the Q&A, where Ethan Zuckerman put the question: given the importance of upcoming decisions, what should people who care about the Internet do? And what should foundations be doing in this space? Veni’s response was a call for interested parties to get involved in Internet Governance, following mailing lists and taking advantage of remote participation in upcoming meetings.

Yet, with the complexity visible above, doing that is no small task. Keeping up with Internet Governance mailing lists could easily be a full-time job, and meeting information, participation opportunities and meeting records are scattered across the web. The ‘information architecture’ of Internet Governance is far from intelligible to outsiders trying to work out which issues matter to them, where they should get involved, and what the history of an issue is. Given the potential of the web to link up information and make it more navigable, and to support global engagement and interaction, it seems not a little ironic that Internet Governance processes and their online presences (and particularly those launched recently) feel very old-fashioned. Whilst the early multi-stakeholderism of many Internet Governance fora was innovative, that innovation feels very much on the wane as governments increasingly shape the agenda, and civil society capacity is spread ever more thinly.

So: what process and technical innovations should the Internet Governance field be engaging with to make it possible for more people to be involved?

The recently launched Friends of the IGF project is trying to address some of the problems that exist when it comes to the Internet Governance Forum, bringing together and curating transcripts from past fora, and trying to tag content and speakers, providing new entry points into the governance debates. Tomorrow we’ll be having a skill-share workshop at the Berkman Center with Susan Chalmers, who heads up the project, exploring how an open and user-centred design process might help focus that project on meeting key needs of IGF followers. But it feels like we also need a much broader conversation, and work on design, to join the dots between different Internet Governance silos for those approaching from outside, and to really work on institutionalisation of improved and open working practices.