20 ways to connect open data and local democracy

[Summary: notes for a workshop on local democracy and open data]

At the Local Democracy for Everyone (#notInWestminster) workshop in Huddersfield today I led a session titled ‘20 ways to connect open data and local democracy’. Below is the list of ideas we started the workshop with.

In the workshop we explored how these, and other approaches, could be used to respond to priority local issues, from investing funds in environmental projects, to shaping local planning processes, and dealing with nuisance pigeons.

Graphic recording from break-out session by [@Jargonautical](http://www.twitter.com/jargonautical)

There is more to do to re-imagine how local open data should work, but the conversations today offered an interesting start.

1. Practice open data engagement

Data portals can be very impersonal things. But behind every dataset is a council officer or a team working to collect, manage and use the data. Putting a human face on datasets, linking them to the policy areas they affect, and referencing datasets from reports that draw upon them can all help put data in context and make it more engaging.

The Five Stars of Open Data Engagement provides a model for stepping up engagement activities, from providing better and more social meta-data, through to hosting regular office-hours and drop-in sessions to help the local community understand and use data better.

2. Showing the council contribution

A lot of the datasets required by the Local Government Transparency Code are about the cost of services. What information and data is needed to complete the picture and to show the impact of services and spending?

The Caring for my Neighbourhood project in São Paulo looked to geocode government budget and spending data to understand where funds were flowing, and has opened up a conversation with government about how to collect data in ways that make connecting budget data and its impacts easier in future.

Local government in the UK has access to a rich set of service taxonomies which could be used to link together data on staff salaries, contracts and spending, with stats and stories on the services they provide and their performance. Finding ways to make this full picture accessible and easy to digest can provide the foundation for more informed local dialogue.

3. Open Data Discourses

In Massachusetts the Open Data Discourse project has been developing the idea of data challenges: based not just on app-building, but also on using data to create policy ideas that can address an identified local challenge.

For Cambridge, Mass, the focus for the first challenge in fall 2014 was on pedestrian, bicycle, and car accidents in the City. Data on accidents was provided, and accessed over 2,000 times in a six-week challenge period. The challenge resulted in eight submissions “that addressed policy-relevant issues such as how to format traffic accident data to enable trend analysis across the river into Boston, or how to reduce accidents and encourage cycling by having a parked car buffer.”

The challenge process culminated in a Friday evening meeting that brought together community members who had worked on challenge ideas with councillors and representatives of the local authority, to showcase the solutions and present an award for the winning idea.

4. Focus on small data

There’s a lot of talk out there about ‘big data’ and how big data analytics can revolutionise government. But many of the datasets that matter are small data: spreadsheets created by an officer, or records held by community groups in various structures and formats.

Rahul Bhargava defines small data as:

“the thing that community groups have always used to do their work better in a few ways:

  • Evaluate: Groups use Small Data to evaluate programs so they can improve them
  • Communicate: Groups use Small Data to communicate about their programs and topics with the public and the communities they serve
  • Advocate: Groups use Small Data to make evidence-based arguments to those in power”

Simple steps to share and work with small data can make a big difference: and keep citizens rather than algorithms in control.

5. Tactile data and data murals

The Data Therapy project has been exploring a range of ways to make data more tactile: from laser-cutting food security information into vegetables to running ‘low tech data’ workshops that use pipe-cleaners, lego and crayons to explore representations of data about a local community.

Turning complex comparisons and numbers into physical artefacts, and finding the stories inside the statistics, can offer communities a way into data-informed dialogue, without introducing lots of alienating graphs and numbers.

The Data Therapy project’s data murals connect discussions of data with traditional community arts practice: painting large scale artworks that represent a community interpretation of local data and information.

6. Data-driven art

The Open Data Institute’s Data as Culture project has run a series of data art commissions: leading to a number of data-driven art works that bring real-time data flows into the physical environment. In 2011 Bristol City Council commissioned a set of art works, ‘Invisible Airs’, that included a device stabbing books in response to library cuts, and a spud gun triggered by spending records.

Alongside these political art works that add an explicit emotional dimension to public data, low-cost network connected devices can also be used to make art that passively informs – introducing indicators that show the state of local data into public space.

7. Citizen science

Not all the data that matters to local decision making comes from government. Citizens can create their own data, via crowdsourcing and via citizen-science approaches to data collection.

The Public Lab describes itself as a ‘DIY Environmental Science Community’ and provides How To information on how citizen groups can build their own sensors or tools for everything from aerial mapping to water quality monitoring. Rather than ‘smart cities’ that centralise data from sensor networks, citizen science offers space for a collaboration between government and communities – creating smart citizens who can collect and make sense of data alongside local officials.

In China, citizens started their own home water quality testing to call for government to recognise and address clean water problems.

8. Data dives & hackathons

DataKind works to bring together expert analysts with social-sector organisations that have data in order to look for trends and insights. Modelled on a hackathon, where activity takes place over an intense day or weekend of work, DataDives can generate new findings, new ideas about how to use data, and new networks for the local authority to draw upon.

Unlike a hackathon, where the focus is often on developing a technical app or innovation and where programming skill is often a pre-requisite, a Data Dive might be based around answering a particular question, or around finding what data means to multi-disciplinary teams.

It is possible to design inclusive hackathons which connect up the lived experience of communities with digital skills from inside and outside the community. The Hackathon FAQ explores some of the common pitfalls of holding civic hackathons: encouraging critical thought about whether prizes and other common features are likely to incentivise contributions, or distort the kinds of team building and collaboration wanted in a civic setting.

9. Contextualised consultation

Too often local consultations ask questions without providing citizens with the information they might need to explore and form their opinions. For example, an online consultation on green spaces could, simply by asking for the ward or postcode of a respondent, provide tailored information (and questions) about the current green spaces nearby.

Live open data feedback on the demographics and diversity of consultation respondents could also play a role in incentivising people to take part to ensure their views are represented.

It’s important, though, not to make too many assumptions when providing contextualised data: a respondent might care about the area where their parents or children live as much as their own, so interfaces should offer the ability to look at data for areas other than the respondent’s home.
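The idea of tailoring a consultation by area can be sketched in a few lines. This is a minimal illustration: the ward names and green-space records below are invented placeholder data, not a real council dataset.

```python
# Sketch: tailoring a green-spaces consultation to a respondent's area.
# The wards and spaces here are invented placeholders.

GREEN_SPACES = {
    "Newsome": ["Beaumont Park", "Hart Street Playground"],
    "Greenhead": ["Greenhead Park"],
}

def consultation_context(ward, home_ward=True):
    """Return the tailored information and question to show a respondent.

    `home_ward` is False when the respondent asks about an area other
    than their own (e.g. where their parents or children live).
    """
    spaces = GREEN_SPACES.get(ward, [])
    intro = "Green spaces near you:" if home_ward else f"Green spaces in {ward}:"
    return {
        "intro": intro,
        "spaces": spaces,
        "question": f"How could these {len(spaces)} spaces be improved?",
    }
```

The same lookup, pointed at a different ward with `home_ward=False`, supports respondents who care about areas other than their own.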

10. Adopt a dataset

When it snows in America, fire hydrants on the street can get frozen under the ice, so it’s important to dig them out after snowfall. However, city authorities don’t always have the resources to reach all the hydrants in time. Code for America found an ingenious solution: taking an open dataset of fire hydrants and creating a campaign for people to ‘Adopt a Hydrant’, committing to dig it out when the blizzards come. They combined data with a social layer.

The same approach could work for many other community assets, but it could also work for datasets. Which dataset could be co-created with the community? Could walkers adopt footpath data and help keep it updated? Could the local bus user group adopt data on the accessibility of public transport routes, helping keep it updated?

The relationships created around a data quality feedback loop might also become important relationships for improving the services that the data describes.

11. Data-rich press releases

Local authorities are used to putting out press releases, often with selected statistics included. But how can those releases also contain links to key datasets, and even interactive assets that journalists and the public can draw upon to dig deeper into the data?

Data visualisation expert David McCandless has argued that interactivity plays an important role in allowing people to explore structured data and information, and to turn it into knowledge. The Guardian Data Blog has shown how engaging information can be created from datasets. Whilst the Data Journalism Handbook offers some pointers for journalists (and local bloggers) to get started with data, many local newspapers don’t have the dedicated data-desks of big media houses – so the more the authority can do to provide data in ready-to-reuse forms, the more it can be turned into a resource to support local debate.

12. URLs for everything – with a call to action

Which is more likely to turn up on Twitter and get clicked on:

“What do you think of the new cycle track policy? Look on page 23, paragraph 2 of the report at the bottom of this page: http://localcouncil.gov/reports/1234”? or

“What do you think of the new cycle track policy? http://localcouncil.gov/policy/ab12”

Far too often the important information citizens might want may be online, but buried away in documents or provided in ways that are impossible to link to.

When any proposal, policy, decision or transaction gets a permanent URL (web address) it can become a social object: something people can talk about on Twitter and Facebook and in other spaces.

For Linked Data advocates, giving everything in a dataset its own URL plays an important role in machine-to-machine communication, but it also plays a really important role in human communication. Think about how visitors to a data item might also be offered a ‘call to action’, whether it’s to report concerns about a spending transaction, or volunteer to get involved in events at a park represented by a data item.
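A permanent URL scheme with calls to action attached can be sketched very simply. The base domain below reuses the illustrative address from the tweets above, and the URL pattern and actions are my own assumptions, not a real council convention.

```python
# Sketch: every policy, spending transaction or park gets a short,
# permanent URL plus a call to action. The domain and scheme are
# illustrative, following the example URL used earlier in the post.

BASE = "http://localcouncil.gov"

CALLS_TO_ACTION = {
    "policy": "Tell us what you think",
    "spending": "Report a concern about this transaction",
    "park": "Volunteer to get involved in events here",
}

def permalink(kind, identifier):
    """Return (url, call_to_action) for a data item of a given kind."""
    url = f"{BASE}/{kind}/{identifier}"
    action = CALLS_TO_ACTION.get(kind, "Find out more")
    return url, action
```

For example, `permalink("policy", "ab12")` yields the short, shareable address from the second tweet, paired with a prompt inviting a response.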

13. Participatory budgeting – with real data

What can £5000 buy you? How much does it cost to run a local carnival? Or a swimming pool? Or to provide improved social care? Or cycle lanes? Answers to these questions might exist inside spending data – but often when participatory budgeting activities take place the information needed to work out what kinds of options may be affordable only comes into the picture late in the process.

Open Spending, the World Bank, NESTA and the Finnish Institute have all explored how open data could change the participatory budgeting process – although as yet there have been few experiments to really explore the possibilities.
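Answering the “what does it cost?” questions above from published spending data is, at its simplest, a matter of aggregation. The transactions below are invented, but Transparency Code spending files have similar service and amount columns.

```python
# Sketch: totalling published spending data to answer "how much does
# a swimming pool (or carnival) cost?" ahead of a participatory
# budgeting exercise. The transactions are invented examples.

SPENDING = [
    {"service": "Leisure - swimming pool", "amount": 18500.0},
    {"service": "Leisure - swimming pool", "amount": 4200.0},
    {"service": "Events - carnival", "amount": 6500.0},
]

def cost_of(service, transactions):
    """Total recorded spend for a service area."""
    return sum(t["amount"] for t in transactions if t["service"] == service)
```

With figures like these surfaced early, participants can judge which options a £5000 pot might realistically cover before ideas are locked in.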

14. Who owns it?

Kirklees Council have put together the ‘Who Owns My Neighbourhood?’ site to let residents explore land holdings and to “help take responsibility for land, buildings and activities in your neighbourhood”. Similar sites, with the goal of improving how land is used and addressing the problem of vacant lots, are cropping up across American cities.

These tools can enable citizens to identify land and government assets that could be better used by the community: but unchecked they may also risk giving more power to wealthy property speculators as a widely cited case study from Bangalore has warned.

15. Social audits

In many parts of the developing world, particularly across India, the Social Audit is an important process, focussed on “reviewing official records and determining whether state reported expenditures reflect the actual monies spent on the ground” (Aiyar & Samji, 2009).

Social Audits involve citizens groups trained up to look at records and ‘ground truth’ whether or not resources have been used in the way authorities say. Crucially, Social Audits culminate in public hearings: meetings where the findings are presented and discussed.

Models of citizen-led investigation, followed by formal public meetings, are also a feature of the London Citizens community organising approach, where citizens assemblies put community views to people in power. How could key local datasets form part of an evidence gathering audit process, whether facilitated by local government or led by independent community organisations?

16. Geofenced bylaws, licenses and regulations: building the data layer of the local authority

After seeing some of the projects to open up the legal codes of US cities I started to wonder where I would find out about the byelaws in my home town of Oxford. As the page on the City Council website that hosts them explains: “Byelaws generally require something to be done – or not done – in a particular location.” Unfortunately, in Oxford, what is required to be done, and where, is locked up inside scanned PDFs of typewritten minutes.

There are all sorts of local rules and regulations, licenses and other information that authorities issue which are tied to a particular geographic location: yet this is rarely a layer in the Geographic Information Systems that authorities use. How might geocoding this data, or even making it available through geofencing apps, help citizens to navigate, explore and debate the rules that shape their local places?
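A geofenced byelaw lookup could be as simple as testing a location against polygons attached to each rule. The sketch below uses a standard ray-casting point-in-polygon test; the zone and byelaw text are made up for illustration.

```python
# Sketch: "which byelaws apply where I'm standing?" Each byelaw zone
# is a polygon of (lon, lat) points; the zone below is invented.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edge crossings to the right of the point.
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

BYELAW_ZONES = [
    {"rule": "No cycling on the towpath",
     "polygon": [(-1.27, 51.74), (-1.25, 51.74),
                 (-1.25, 51.76), (-1.27, 51.76)]},
]

def byelaws_at(lon, lat):
    """Return the byelaw texts that apply at a given location."""
    return [zone["rule"] for zone in BYELAW_ZONES
            if point_in_polygon(lon, lat, zone["polygon"])]
```

The same structure could sit behind a geofencing app that notifies users as they enter a zone, rather than leaving the rules locked in scanned PDFs.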

17. Conversations around the contracts pipeline?

The Open Contracting project is calling for transparency and participation in public contracting. As part of the UK Local Government Transparency Code authorities have to publish the contracts they have entered into – but publishing the contract pipeline and planned procurement offers an important opportunity to work out if there are fresh ideas or important insights that could shape how funds are spent.

The Open Contracting Data Standard provides a way of sharing a flow of data about the early stages of a contracting process. Combine that information with a call to action, and a space for conversation, and there are ways to get citizens shaping tenders and the selection of suppliers.
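Reading an Open Contracting Data Standard release package to surface the planned procurements worth discussing is straightforward. The package below is a minimal hand-made example rather than real council data; the field names (`releases`, `ocid`, `tag`, `tender`) follow the OCDS release schema.

```python
# Sketch: pulling planned procurements out of an OCDS release
# package, so citizens can engage before a tender is finalised.
# The package is a minimal invented example.
import json

PACKAGE = json.loads("""
{
  "releases": [
    {"ocid": "ocds-xxx-001", "tag": ["planning"],
     "tender": {"title": "Grounds maintenance 2016-19",
                "value": {"amount": 120000, "currency": "GBP"}}},
    {"ocid": "ocds-xxx-002", "tag": ["award"],
     "tender": {"title": "Street lighting PFI"}}
  ]
}
""")

def planned_tenders(package):
    """Return (ocid, title) pairs for releases still at the planning stage."""
    return [(r["ocid"], r["tender"]["title"])
            for r in package.get("releases", [])
            if "planning" in r.get("tag", [])]
```

Pairing a feed like this with a call to action and a space for conversation is what turns the contract pipeline into something citizens can actually shape.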

18. Participatory planning: visualising the impacts of decisions

What data should a local authority ask developers submitting planning applications to provide?

For many developments there might be detailed CAD models available which could be shared and explored in mapping software to support a more informed conversation about proposed building projects.

19. Stats that matter

Local authorities often conduct one-off surveys and data collection exercises. These are a vital opportunity to build up an understanding of the local area. What opportunities are there to work in partnership with local community groups to identify the important questions that they want to ask? How can local government and community groups collaborate to collect actionable stats that matter: pooling needs, and even resources, to get the best sample and the best depth of insight?

20. Spreadsheet scorecards and dashboards

Dig deep enough in most local organisations and you will find one or more ‘super spreadsheets’ that capture and analyse key statistics and performance indicators. Many more people can easily pick up the skills to create a spreadsheet scorecard than can become overnight app developers.

Google Docs spreadsheets can pick up data live from the web. What dashboards might a local councillor want? Or a local residents association? What information would make them better able to do their job?
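The logic of a spreadsheet scorecard translates directly into a few lines of code. In a Google Docs spreadsheet the live-data step would be an `IMPORTDATA` call against a published CSV URL; here a CSV string stands in for that feed, and the services and targets are invented.

```python
# Sketch: a minimal red/green scorecard over a published CSV of
# performance indicators. The indicators and thresholds are invented.
import csv
import io

CSV_DATA = """service,target_days,actual_days
Pothole repairs,10,14
Missed bin collections,2,1
Planning decisions,56,49
"""

def scorecard(csv_text):
    """Rate each service green (on target) or red (over target)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["service"]:
            "green" if float(row["actual_days"]) <= float(row["target_days"])
            else "red"
        for row in rows
    }
```

A councillor or residents’ association could maintain exactly this logic in a spreadsheet, with no app development needed: the point is the feedback loop, not the technology.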

Five reflections for an open data hackathon

I was asked to provide a short talk at the start of the Future Food Hackathon that kicked off in Wageningen, NL today, linked to the Global Open Data on Agriculture and Nutrition workshop taking place over the next few days.

Below are the speaker notes I jotted down for the talk.

On open data and impact

I want to start with an admission. I’m a sceptic about open data.

In the last five years we’ve seen literally millions of datasets placed online as part of a broad open data movement – with grand promises made about the way this will revolutionise politics, governance and economies.

But, when you look for impact, with the exception of a few specific domains such as transport, the broad society wide impact of that open data is hard to find. Hundreds of hack-days have showcased what could be possible with data, but few have delivered truly transformative innovations that have made it to scale.

And many of the innovations that result often seem to focus on #FirstWorldProblems – if not purely ‘empowering the already empowered’, then at least not really engaging with social issues in ways that are set to tip the balance in favour of those with least advantage.

I’m sceptical, but I’m not pessimistic. In fact, understood as part of a critique of the closed way we’ve been doing aid, policy making, production and development – open data is an incredibly exciting idea.

However, far too much open data thinking has stopped at the critique, without moving on to propose something new and substantive. It offers a negation (data which is not proprietary; not in PDF; not kept from public view), without talking enough about how new open datasets should be constructed. Because opening data is not just about taking a dataset from inside the government or company and putting it online, in practice it involves the creation of new datasets: selecting and standardising fields and deciding how to model data. This ultimately involves the construction of new systems of data.

And this links to a second blind spot of current open data thinking: the emphasis on the dataset, to the exclusion of the social relationships around it.

Datasets do not stand alone. They are produced by someone, or some group, for some purpose. They get meaning from their relationship to other data, and from the uses to which they are put. As Lisa Gitelman and colleagues have put it in ‘Raw Data is an Oxymoron’, datasets have histories, and we need to understand these to reshape their futures.

Matthew Smith and colleagues at the IDRC have spent a number of years exploring the idea of openness in development. They distinguish between openness defined in ‘universal legal and technical terms’, and openness as a practice – and argue that we need to put open practices at the centre of our theory of openness. These practices are, to some extent, enabled by the formalities of Creative Commons licenses, or open data formats, but they are something more, and draw upon the cultures of peer-to-peer production and open source, not just the legal and technical devices.

Ultimately, then, I’m optimistic about the potential of open data if we can think about the work of projects like GODAN not just as a case of gaining permission to work with a few datasets, but as building new open and collaborative infrastructures through which we can use data to communicate, collaborate and reshape our world.

I’m also hopeful about the potential of colliding cultures from open source and open data, with current cultures in the agriculture and nutrition communities. Can we bring these into a dialogue that builds shared understanding of how to solve problems, and lets us rethink both openness, and agriculture, to be more effective, inclusive and just?

Five observations on hacking with open data

Ok: so let me pause. I recognise that the last few minutes might have been a bit abstract and theoretical for 9am on a Monday morning. Let me try, then, to offer five somewhat more practical thoughts about approaching an open data hackathon:

1. Hacking is learning.

A common experience of the hackathon is frustration at the data not being ready to use. Yet the process of struggling with data is a process of learning about the world it represents – and sometimes one of the most important outcomes of a hack is the induction of a new community of people, from different backgrounds, into a shared understanding of some data and domain.

One of the most fascinating things about the open government data processes I’ve been tracking in the UK has been the way in which it has supported civic learning amongst technology communities – coming to understand more how the state works by coming to understand its data.

So – at an interdisciplinary hack like this, there is the opportunity to see peculiarities of the data as opportunities to understand the process and politics of the agriculture and nutrition field, and to be better equipped to propose new approaches that don’t try to make perfect data out of problematic situations – but that try and engage with the real challenges and problems of the field.

2. Hacking is political.

I’ve had the pleasure over the last few years of working a number of times with the team at the iHub in Nairobi, and of following the development of Kenya’s open data initiative. In their study of an ‘incubator’ project to encourage developers to use Kenyan open government data, Leo Mutuku and her team made an interesting discovery.

Some developers did not understand their apps as products to be taken to scale – but instead saw them as rhetorical acts: a demonstration to government of how ICTs could be used, and a call on government to rethink its own ICTs, rather than an attempt by outside developers to replace those ICTs for government.

Norfolk-based developer Rupert Reddington once referred to this as ‘digital pamphleteering’, in which the application is a provocation in a debate – rather than primarily, or at all, a tool for everyday use.

Think about how you present an openness-oriented provocation to the status quo when you pitch your ideas and creations.

3. You are building infrastructure.

Apps created with open data are just one part of the change process. Even a transport app that lets people know when the next bus is due only has an impact if it becomes part of people’s everyday practice, and they rely on it in ways that change their behaviour.

Infrastructure is something which fades into the background: when it becomes established and works well, we don’t see it. It is only when it is disrupted that it becomes notable (as I learned trying to cross the channel yesterday – when the Channel Tunnel became a very visible piece of infrastructure exactly because it was blocked and not working).

One of the questions I’m increasingly asking in my research work, is how we can build ‘inclusive infrastructures’, and what steps we need to take to ensure that the data infrastructures we have are tipped in favour of the least advantaged rather than the most powerful. Sometimes the best innovations are ones that complement and extend an existing infrastructure, bringing hitherto unheard voices into the debate, or surfacing hitherto unseen assumptions.

Sustainability is also important to infrastructure. What you create today may just be a prototype – but if you are proposing it as part of a new infrastructure of action – consider if you can how it might be made sustainable. Would building for sustainability change the concept or idea?

4. Look at the whole value chain.

There is a tendency in hackathons to focus on the ‘end user’ – building consumer-oriented apps and platforms. Often that approach makes sense: disintermediation can make many systems work better. But it’s not always the way to make the most difference.

When I worked with CABI and the Institute for Development Studies in 2013 to host a ‘Research to Impact’ hackathon at the iHub in Nairobi, we brought together people involved in improving the quality of agriculture and the lives of smallholder farmers. After a lot of discussion, it became clear that between ‘research’ and the ‘farm’ were all sorts of important intermediaries, from seed-sellers, to agricultural extension workers. Instead of building direct-to-farmer information systems, teams explored the kinds of tools that could help an agriculture extension worker deliver better support, or that could help a seed-seller to improve their product range.

Apps with tens or hundreds of back-office users may be much more powerful than apps with thousands of ‘end users’.

When the two Open Data in Developing Countries project research partners in Kenya launched their research in the middle of last year, an interesting argument broke out between advocates of ‘disintermediation’ and ‘empowering intermediaries’. On the one hand, intermediaries contextualise information, and may be trusted: helping communities adopt information as actionable insights when they may not understand or trust the information direct from source. On the other hand, intermediaries are often seen as a problem: middle-men using their position for self-interest, and limiting the freedoms of those they mediate for.

Open approaches can offer an important ‘pressure valve’ in these contexts: focussing on creating platforms for intermediaries, but not restricting information to intermediaries only.

5. Evolution can be as powerful as revolution.

The UN Secretary General has led the call for a ‘data revolution for development’, with the Independent Expert Group he appointed proposing a major update in practices of data use.

This revolution narrative often implies that organisations need to shift direction: completely transforming data practices, throwing out existing report-writing and paper-based approaches in favour of new ‘digital by default’ technology-driven processes. But what happens if we think differently and start from the existing strengths of organisations:

  • What is going well when it comes to data in the international potato trade?
  • Who are the organisations with promising practice in localising climate-change relevant information for farmers?
  • What have been the stories of progress in tracking food-borne disease?

How can we extend these successes? What innovations have made their first iteration, but are just waiting for the next?

One of the big challenges of ‘data revolution’ is the organisational change curve it demands, and the complex relationship between data supply and demand. Often the data available right now is not great. For example, if you are currently running a crop monitoring project with documents and meetings, but a new open dataset becomes available that is relevant to your work, starting a ‘data revolution’ tomorrow will involve lots of time working with bad data and finding new ways to work around the peculiarities of the new system: the investment this year to do the same work you were doing with ‘inefficient’ analogue approaches last year might be double, as you scale the learning curve.

Of course, in year 3 or 4, the more efficient way of working may start to pay off: but often projects never get there. And because use of the new open dataset dropped away in year 2, when early adopters realised they could not afford to transform their practices to work with it, government publishers get discouraged, and by year 3 and 4 the data might not be there.

An evolution approach works out how to change practices year-by-year: iterating and negotiating the place of data in the future of food.

(See Open Data in Developing Countries – Insights from Phase I for more on this point)

In conclusion

Ok. Still a bit abstract for 9.15am on a Monday morning: but I hope the general point is clear.

Ultimately, the most important thing about the creations at a hackathon is their ‘theory of change’: how does the time spent hacking show the way towards real change? I’m certainly very optimistic that when it comes to the pitch back tomorrow, the ideas and energy in this room will offer some key pointers for us all.

Internet Monitor 2014 chapter on Data Revolutions: Bottom-Up Participation or Top-Down Control?

[Summary: cross-posting an article from the 2014 Internet Monitor report]

The 2014 Internet Monitor Report has just been launched. It’s packed with over 35 quick reads on the landscape of contemporary Internet & Society issues, from platforms and policy, to public discourse. This year’s edition also includes a whole section on ‘Data and privacy’. My article in the collection, written earlier this year, is reproduced below for the archive. I encourage you to explore the whole collection – including some great inputs from Sara Watson and Malavika Jayaram exploring how development agencies are engaging with data, and making the case for building better maps of the data landscape to inform regulation and action.

Data Revolutions: Bottom-Up Participation or Top-Down Control?

In September 2015, through the United Nations, governments will agree upon a set of new Sustainable Development Goals (SDGs) replacing the expired Millennium Development Goals and setting new globally agreed targets on issues such as ending poverty, promoting healthy lives, and securing gender equality.1 Within debates over what the goals should be, discussions of online information and data have played an increasingly important role.

Firstly, there have been calls for a “Data Revolution” to establish better monitoring of progress towards the goals: both strengthening national statistical systems and exploring how “big data” digital traces from across the Internet could enable real-time monitoring.2 Secondly, the massive United Nations-run MyWorld survey, which has used online, mobile, and offline data collection to canvass over 4 million people across the globe on their priorities for future development goals, consistently found “An honest and accountable government” amongst people’s top five priorities for the SDGs.3 This has fueled advocacy calls for explicit open government goals requiring online disclosure of key public information such as budgets and spending in order to support greater public oversight and participation.

These two aspects of “data revolution” point to a tension in the evolving landscape of governments and data. In the last five years, open data movements have made rapid progress spreading the idea that government data (from data on school and hospital locations to budget datasets and environmental statistics) should be “open by default”: published online in machine-readable formats for scrutiny and re-use. However, in parallel, cash-strapped governments are exploring the greater use of private sector data as policy process inputs, experimenting with data from mobile networks, social media sites, and credit reference agencies amongst others (sometimes shared by those providers under the banner of “data philanthropy”). As both highly personal and commercially sensitive data, these datasets are unlikely to ever be shared en masse in the public domain, although this proprietary data may increasingly drive important policy making and implementation.

In practice, the evidence so far suggests that the “open by default” idea is struggling to translate into widespread and sustainable access to the kinds of open data citizens and civil society need to hold powerful institutions to account. The multi-country Open Data Barometer study found that key accountability datasets such as company registers, budgets, spending, and land registries are often unavailable, even where countries have adopted open data policies.4 And qualitative work in Brazil has found substantial variation in how the legally mandated publication of spending data operates across different states, frustrating efforts to build up a clear picture of where public money flows.5 Furthermore, studies regularly emphasize the need not only to have data online, but also the need for data literacy and civil society capacity to absorb and work with the data that is made available, as well as calling for the creation of intermediary ecosystems that provide a bridge between “raw” data and its civic use.

Over the last year, open data efforts have also had to increasingly grapple with privacy questions.6 Concerns have been raised that even “non-personal” datasets released online for re-use could be combined with other public and private data and used to undermine privacy.7 In Europe, questions over what constitutes adequate anonymization for opening public data derived from personally identifying information have been hotly debated.8

The web has clearly evolved from a platform centered on documents to become a data-rich platform. Yet, it is public policy that will shape whether it is ultimately a platform that shares data openly about powerful institutions, enabling bottom up participation and accountability, or whether data traces left online become increasingly important, yet opaque, tools of governance and control. Both open data campaigners and privacy advocates have a key role in securing data revolutions that will ultimately bring about a better balance of power in our world.

Notes

  • 1: UN High-Level Panel of Eminent Persons on the Post-2015 Development Agenda, “A New Global Partnership: Eradicate poverty and transform economies through sustainable development,” 2013, http://www.un.org/sg/management/pdf/HLP_P2015_Report.pdf.
  • 2: Independent Expert Advisory Group on the Data Revolution, http://www.undatarevolution.org.
  • 3: MyWorld Survey, http://data.myworld2015.org/.
  • 4: World Wide Web Foundation, “Open Data Barometer,” 2013, http://www.opendatabarometer.org.
  • 5: N. Beghin and C. Zigoni, “Measuring open data’s impact of Brazilian national and sub-national budget transparency websites and its impacts on people’s rights,” 2014, http://opendataresearch.org/content/2014/651/measuring-opendatas-impact-brazilian-national-and-sub-national-budget.
  • 6: Open Data Research Network, “Privacy Discussion Notes,” 2013, http://www.opendataresearch.org/content/2013/501/open-data-privacy-discussion-notes.
  • 7: Steve Song, “The Open Data Cart and Twin Horses of Accountability and Innovation,” June 19, 2013, https://manypossibilities.net/2013/06/the-open-data-cart-and-twin-horses-of-accountability-and-innovation/.
  • 8: See the work of the UK Anonymisation Network, http://ukanon.net/.

(Article under Creative Commons Attribution 3.0 Unported)

Do we need eligibility criteria for private sector involvement in OGP?

I’ve been in Costa Rica for the Open Government Partnership (OGP) Latin America Regional Meeting (where we were launching the Open Contracting Data Standard), and on Tuesday attended a session around private sector involvement in the OGP.

The OGP was always envisaged as a ‘multi-stakeholder forum’ – not only for civil society and governments, but also to include the private sector. But, as Martin Tisne noted in opening the session, private sector involvement has so far been limited – although an OGP Private Sector Council is currently being developed.

In his remarks (building on notes from 2013), Martin outlined six different roles for the private sector in open government, including:

  1. Firms as mediators of open government data – making governance-related public data more accessible;
  2. Firms as beneficiaries and users of open data – building businesses on top of data releases, and fostering demand for, and sustainable supply of, open data;
  3. Firms as anti-corruption advocates – particularly rating agencies, whose judgements that poor governance makes investment in a country risky can strongly incentivise governments to institute reforms;
  4. Firms practising corporate accountability – including by being transparent about their own activities;
  5. Technology firms providing platforms for citizen-state interaction – from large platforms like Facebook, which have played a role in democracy movements, to specifically civic private-sector platforms like change.org or SeeClickFix;
  6. Companies providing technical assistance and advice to governments on their OGP action plans.

The discussion panel then went on to look at a number of examples of private sector involvement in open government, ranging from Chambers of Commerce acting as advocates for anti-corruption and governance reforms, to large firms like IBM providing software and staff time to efforts to meet the challenge of Ebola through data-driven projects. A clear theme in the discussion was the need to recognise that, like government and civil society, the private sector is not monolithic. Indeed, I have to remember that I’ve participated in the UK OGP process as a result of being able to subsidise my time via Practical Participation Ltd.

Reflecting on public and private interests

Notwithstanding the positive contributions and points made by all the panelists in the session, I find myself approaching the general concept of private sector engagement with OGP with a constructive scepticism – one that I hope supports wider reflection about the role and accountability of all stakeholders in the process. Many of these reflections are driven by a concern about the relative power of different stakeholders: in a world where the state is often in retreat, civil society is spread increasingly thin, and wealth is accumulated in vastly uneven ways, ensuring a fair process of multi-stakeholder dialogue requires careful institutional design. In light of the uneven flow of resources in our world, these reflections also draw on an important distinction between public and private interest.

Whilst there are institutional mechanisms in place (albeit flawed in many cases) that mean both government and non-profits should operate in the public interest, the essential logic of the private sector is to act in private interest. Of course, the extent of this logic varies by type of firm, but large multi-nationals have legal obligations to their shareholders which can, at least when shareholders are focussed on short-term returns, create direct tensions with responsible corporate behaviour. This is relevant for OGP in at least two ways:

Firstly, when private firms are active contributors to open government activities, whether mediating public data, providing humanitarian interventions, offering platforms for citizen interaction, or providing technical assistance, mechanisms are needed in a public interest forum such as the OGP to ensure that such private sector interventions provide a net gain to the public good.

Take for example a private firm that offers hardware or software to a government for free to support it in implementing an open government project. If the project has a reasonable chance of success, this can be a positive contribution to the public good. However, if the motivation for the project comes from private rather than a public interest, and leads to a government being locked into future use of a proprietary software platform, or to an ongoing relationship with the company who have gained special access as a result of their ‘CSR’ support for the open government project – then it is possible for the net-result to be against the public interest.

It should be possible to establish governance mechanisms that address these concerns, and allow the genuine public interest, and win-win contributions of the private sector to open government and development to be facilitated, whilst establishing checks against abuse of the power imbalance, whether due to relative wealth, scale or technical know-how, that can exist between firms and states.

Secondly, corporate contributions to aspects of the OGP agenda should not distract from a focus on key issues of large-scale corporate behaviour that undermine the capacity and effectiveness of governments, such as the use of complex tax avoidance schemes, or the exploitation of workforces and suppression of wages such that citizens have little time or energy left after achieving the essentials of daily living to give to civic engagement.

A proposal

In Tuesday’s session these reflections led me towards thinking about whether the Open Government Partnership should have some form of eligibility criteria for corporate participants, as a partial parallel to those that exist for states. To keep this practical and relevant, the criteria could relate to the existence of key disclosures by the firm for all the settings it operates in: such as the amount of tax paid, the beneficial owners of the firm, and the amount of funding the firm is putting towards engagement in the OGP process.

Such requirements need not necessarily operate in an entirely gatekeeping fashion (i.e. it should not be that participants cannot engage at all without such disclosures), but could be instituted initially as a recommended transparency practice, creating space for social pressures to encourage compliance, and giving extra information to those considering the legitimacy of, and weight to give to, the contributions of corporate participants within the OGP process.

As noted earlier, these critical reflections might also be extended to civil society participants: there can also be legitimate concerns about the interests being represented through the work of CSOs. The Who Funds You campaign is a useful point of reference here: CSO participants could be encouraged to disclose information on who is funding their work, and again, how much resource they are dedicating to OGP work.

Conclusions

This post provides some initial reflections as a discussion starter. The purpose is not to argue against private sector involvement in OGP, but rather, in engaging proactively with a multi-stakeholder model, to raise the need for critical thinking in the open government debate not only about the transparency and accountability of governments, but also about the transparency and accountability of the other parties who are engaged.

OCDS – Notes on a standard

Today sees the launch of the first release of the Open Contracting Data Standard (OCDS). The standard, as I’ve written before, brings together concrete guidance on the kinds of documents and data that are needed for increased transparency in processes of public contracting, with a technical specification describing how to represent contract data and meta-data in common ways.

The video below provides a brief overview of how it works (or you can read the briefing note), and you can find full documentation at http://standard.open-contracting.org.

When I first jotted down a few notes on how to go forward from the rapid prototype I worked on with Sarah Bird in 2012, I didn’t realise we would actually end up with the opportunity to put some of those ideas into practice. However: we did – and so in this post I wanted to reflect on some aspects of the standard we’ve arrived at, some of the learning from the process, and a few of the ideas that have guided at least my inputs into the development process.

As, hopefully, others pick up and draw upon the initial work we’ve done (in addition to the great inputs we’ve had already), I’m certain there will be much more learning to capture.

(1) Foundations for ‘open by default’

Early open data advocacy called for ‘raw data now‘, asking governments essentially to export and dump existing datasets online, with issues of structure and regular publishing processes to be sorted out later. Yet, as open data matures, the discussion is shifting to the idea of ‘open by default’. Taken seriously, this means more than openly licensing by default whatever data dumps happen to be created: it means that data is released from government systems as a matter of course, as part of their day-to-day operation.

The full OCDS model is designed to support this kind of ‘open by default’, allowing publishers to provide small releases of data every time some event occurs in the lifetime of a contracting process. A new tender is a release. An amendment to that tender is a release. The contract being awarded, or then signed, are each releases. These data releases are tied together by a common identifier, and can be combined into a summary record, providing a snapshot view of the state of a contracting process, and a history of how it has developed over time.

This releases and records model seeks to combine together different user needs: from the firm seeking information about tender opportunities, to the civil society organisation wishing to analyse across a wide range of contracting processes. And by allowing core stages in the business process of contracting to be published as they happen, and then joined up later, it is oriented towards the development of contracting systems that default to timely openness.
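
The releases-and-records pattern can be sketched in a few lines. This is an illustrative toy, not the official schema: the field names (ocid, date, tag) follow the spirit of OCDS as described above, but the merge logic and all the values are simplified assumptions.

```python
from copy import deepcopy

# Toy sketch of the OCDS releases-and-records model described above.
# Field names (ocid, date, tag) are illustrative; the real standard's
# compiled-record merge rules are more involved.

releases = [
    {"ocid": "ocds-abc123-0001", "date": "2014-01-10", "tag": "tender",
     "tender": {"title": "Office refurbishment", "value": 100000}},
    {"ocid": "ocds-abc123-0001", "date": "2014-02-01", "tag": "tenderAmendment",
     "tender": {"title": "Office refurbishment", "value": 120000}},
    {"ocid": "ocds-abc123-0001", "date": "2014-03-15", "tag": "award",
     "award": {"supplier": "Acme Ltd", "value": 115000}},
]

def compile_record(releases):
    """Combine releases sharing an ocid into a record: the full history,
    plus a compiled snapshot where later releases overwrite earlier ones."""
    compiled = {}
    for release in sorted(releases, key=lambda r: r["date"]):
        for key, value in release.items():
            if key != "tag":
                compiled[key] = deepcopy(value)
    return {"ocid": releases[0]["ocid"],
            "releases": releases,
            "compiledRelease": compiled}

record = compile_record(releases)
```

Here the snapshot reflects the amended tender value and the eventual award, while the record’s release list preserves the step-by-step history – the property that lets publishers emit small releases as events happen, and still gives users a current view.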

As I’ll be exploring in my talk at the Berkman Centre next week, the challenge ahead for open data is not just to find standards to make existing datasets line-up when they get dumped online, but is to envisage and co-design new infrastructures for everyday transparent, effective and accountable processes of government and governance.

(2) Not your minimum viable product

Different models of standard

Many open data standard projects adopt either a ‘Minimum Viable Product‘ approach, looking to capture only the few most common fields between publishers, or are developed by focussing on the concerns of a single publisher or user. Whilst MVP models may make sense for small building blocks designed to fit into other standardisation efforts, when it came to OCDS there was a clear user demand to link up data along the contracting process, and this required an overarching framework into which simple components could be placed, or from which they could be extracted, rather than the creation of ad-hoc components, with the attempt to join them up made later on.

Whilst we didn’t quite achieve the full abstract model + idiomatic serialisations proposed in the initial technical architecture sketch, we have ended up with a core schema, and then suggested ways to represent this data in both structured and flat formats. This is already proving useful for example in exploring how data published as part of the UK Local Government Transparency Code might be mapped to OCDS from existing CSV schemas.

(3) The interop balancing act & keeping flex in the framework

OCDS is, ultimately, not a small standard. It seeks to describe the whole of a contracting process, from planning, through tender, to contract award, signed contract, and project implementation. And at each stage it provides space for capturing detailed information, linking to documents, tracking milestones and tracking values and line-items.

This shape of the specification is a direct consequence of the method adopted to develop it: looking at a diverse set of existing data, and spending time exploring the data that different users wanted, as well as looking at other existing standards and data specifications.

However, OCDS by no means covers all the things that publishers might want to state about contracting, nor all the things users may want to know. Instead, it focusses on achieving interoperability of data in a number of key areas, and then providing a framework into which extensions can be linked as the needs of different sub-communities of open data users arise.

We’re only in the early stages of thinking about how extensions to the standard will work, but I suspect they will turn out to be an important aspect: allowing different groups to come together to agree (or contest) the extra elements that are important to share in a particular country, sector or context. Over time, some may move into the core of the standard, and potentially elements that appear core right now might move into the realm of extensions, each able to have their own governance processes if appropriate.

As Urs Gasser and John Palfrey note in their work on Interop, the key in building towards interoperability is not to make everything standardised and interoperable, but is to work out the ways in which things should be made compatible, and the ways in which they should not. Forcing everything into a common mould removes the diversity of the real world, yet leaving everything underspecified means no possibility to connect data up. This is both a question of the standards, and the pressures that shape how they are adopted.

(4) Avoiding identity crisis

Data describes things. To be described, those things need to be identified. When describing data on the web, it helps if those things can be unambiguously identified and distinguished from other things which might have the same names or identification numbers. This generally requires the use of globally unique identifiers (guid): some value which, in a universe of all available contracting data, for example, picks out a unique contracting process; or, in the universe of all organizations, uniquely identifies a specific organization. However, providing these identifiers can turn out to be both a politically and technically challenging process.

The Open Data Institute have recently published a report underlining how important identifiers are to processes of opening data. Yet, consistent identifiers often have key properties of public goods: everyone benefits from having them, but providing and maintaining them has costs attached which no individual identifier user has an incentive to cover. In some cases, such as goods and service identifiers, projects have emerged which take a proprietary approach to fund the maintenance of those identifiers, selling access to the lookup lists which match the codes for describing goods and services to their descriptions. This clearly raises challenges for an open standard: when proprietary identifiers are incorporated into data, users may face extra costs to interpret and make sense of that data.

In OCDS we’ve sought to take as distributed an approach to identifiers as possible, only requiring globally unique identifiers where absolutely necessary (identifying contracts, organizations and goods and services), and deferring to existing registration agencies and identity providers, with OCDS maintaining, at most, code lists for referring to each identity ‘scheme’.

In some cases, we’ve split the ‘scheme’ out into a separate field: for example, an organization identifier consists of a scheme field with a value like ‘GB-COH’ to stand for UK Companies House, and then the identifier given in that scheme, like ‘5381958’. This approach allows people to store those identifiers in their existing systems without change (existing databases might hold national company numbers, with the field assumed to come from a particular register), whilst making explicit the scheme they come from in the OCDS. In other cases, however, we look to create new composite string identifiers, combining a prefix, and some identifier drawn from an organizations internal system. This is particularly the case for the Open Contracting ID (ocid). By doing this, the identifier can travel between systems more easily as a guid – and could even be incorporated in unstructured data as a key for locating documents and resources related to a given contracting process.
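
Both approaches can be made concrete with a small sketch. The ‘GB-COH’ scheme code and Companies House number come from the text above; the ocid prefix and internal tender ID are invented for illustration.

```python
# Sketch of the two identifier patterns described above.

def org_identifier(scheme, org_id):
    """Split approach: keep the register's own identifier unchanged,
    and make the scheme it comes from explicit in a separate field."""
    return {"scheme": scheme, "id": org_id}

def make_ocid(prefix, internal_id):
    """Composite approach: a prefix plus a system-internal identifier
    yields a globally unique string that can travel between systems,
    or even sit in unstructured documents as a lookup key."""
    return "{0}-{1}".format(prefix, internal_id)

buyer = org_identifier("GB-COH", "5381958")          # UK Companies House number
ocid = make_ocid("ocds-a1b2c3", "TENDER-2014-0042")  # hypothetical prefix and ID
```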

However, recent learning from the project is showing that many organisations are hesitant about the introduction of new IDs, and that adoption of an identifier schema may require as much advocacy as adoption of a standard. At a policy level, bringing some external convention for identifying things into a dataset appears to be seen as affecting the, for want of a better word, sovereignty of a specific dataset: even if in practice the prefix approach of the ocid means it only needs to be hard-coded in the systems that expose data to the world, not necessarily stored inside organizations’ databases. However, this is an area I suspect we will need to explore more, and keep tracking, as OCDS adoption moves forward.

(5) Bridging communities of practice

If you look closely you might in fact notice that the specification just launched in Costa Rica is actually labelled as a ‘release candidate‘. This points to another key element of learning in the project, concerning the different processes and timelines of policy and technical standardisation. In the world of funded projects and policy processes, deadlines are often fixed, and the project plan has to work backwards from there. In a technical standardisation process, there is no ‘standard’ until a specification is in use and has been robustly tested. The processes for adopting a policy standard, and setting a technical one, differ – and whilst perhaps we should have spoken from the start of the project of an overall standard, embedding within it a technical specification, we were too far down the path towards the policy launch before this point. As a result, the Release Candidate designation is intended to suggest the specification is ready to draw upon, but that there is still a process to go (and future governance arrangements to be defined) before it can be adopted as a standard per se.

(6) The schema is just the start of it

This leads to the most important point: that launching the schemas and specification is just one part of delivering the standard.

In a recent e-mail conversation with Greg Bloom about elements of standardisation, linked to the development of the Open Referral standard, Greg put forward a list of components that may be involved in delivering a sustainable standards project, including:

  • The specification, with its various components and subcomponents;
  • Tools that assess compliance according to the spec (e.g. validation tools, and more advanced assessment tools);
  • Some means of visualizing a given set of data’s level of compliance;
  • Incentives of some kind (whether positive or negative) for attaining various levels of compliance;
  • Processes for governing all of the above;
  • And, of course, the community through which all of this emerges and is sustained.

To this we might also add elements like documentation and tutorials, support for publishers, catalysing work with tool builders, guidance for users, and so-on.
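
As a minimal illustration of the compliance-assessment tools in Greg’s list, a required-fields check might look like the sketch below. The required field names are assumptions for the example; a real validator would check data against the standard’s published schema.

```python
# Toy compliance checker in the spirit of the assessment tools listed above.
# REQUIRED_FIELDS is invented for illustration, not taken from the spec.

REQUIRED_FIELDS = {"ocid", "date", "tag"}

def compliance_report(release):
    """Report whether a data release carries the required fields,
    and list any that are missing."""
    missing = sorted(REQUIRED_FIELDS - release.keys())
    return {"compliant": not missing, "missing": missing}

ok = compliance_report({"ocid": "ocds-xyz-1", "date": "2014-11-18", "tag": "tender"})
incomplete = compliance_report({"ocid": "ocds-xyz-1"})
```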

Open government standards are not something to be published once, and then left, but require labour to develop and sustain, and involve many social processes as much as technical ones.

In many ways, although we’ve spent a year of small development iterations working towards this OCDS release, the work now is only just getting started, and there are many technical, community and capacity-building challenges ahead for the Open Contracting Partnership and others in the open contracting movement.

Upcoming talks: October/November 2014

[Summary: quick links to upcoming talks]

The next month is shaping up to be a busy one with project deadlines, and lots of interesting opportunities to share reflections on research projects from the last year. Below are details of a few talks and activities I’m involved in over the coming weeks:

29th October 2014: ICT for Transparency, Accountability and Anti-Corruption: Incentives and Key Features for Implementation (Webinar)

Tomorrow (29th October) at 2pm BST (10am EST) I’ll be sharing an outline of the paper I wrote with Silvana Fumega that was published earlier this year, questioning how the motivations of government in adopting open government ICTs may affect the way those ICTs are implemented and the effects they can have, as well as looking at the different factors that shape adoption and implementation of these technologies. The session will also include Savita Bailur, sharing brand new research into the mySociety Alaveteli platform for FOI requests, and its use around the world.

The session will consist of short presentations, followed by an opportunity for discussion.

Registration to take part is open here.

25th November 2014: Unpacking open data: power, politics and the influence of infrastructures

I’ll be back at the Berkman Center to talk about some of my research from the last year, and to explore some of the new directions my work on open data is taking. Here’s the blurb for the talk:

“Countries, states & cities across the globe are embracing the idea of ‘open data’: establishing platforms, portals and projects to share government managed data online for re-use. Yet, right now, the anticipated civic impacts of open data rarely materialise, and the gap between the promise and the reality of open data remains wide. This talk, drawing on a series of empirical studies of open data around the world, will question the ways in which changing regimes around data can reconfigure power and politics, and will explore the limits of current practice. It will consider opportunities to re-imagine the open data project, not merely as one of placing datasets online, but as one that can positively reshape the knowledge infrastructures of civic life.”

The talk will be webcast, but if you happen to be in Cambridge, MA, you can also join in person at the Berkman Center over lunch. More details and in-person sign-up is here.

November 4th 2014: Sheffield iSchool Seminar

I’ll be joining Jo Bates and Danny Antrobus at the Sheffield iSchool for a seminar on open data theory and practice, taking place at 1pm. More info should be up soon on the iSchool blog, and the blurb of what I’ll be talking on is below:

“Open data has rapidly become a global phenomenon, driven by both top-down policy transfer and bottom-up demands for greater access to vital information. Drawing on research from the Open Data in Developing Countries (ODDC) project, which has supported case-study research into open data use and impacts in 12 countries across the global South, this presentation will explore how far the models for open government data that are promoted through global institutions are aligned with the needs and realities of different communities around the world. By moving beyond a ‘narrow model’ of open data, focused on datasets, portals and apps, a richer picture of both the potential and the pitfalls of particular approaches to opening up data can be uncovered.”

November 18th 2014: Launch of the Open Contracting Data Standard

At the Open Government Partnership regional meeting in Costa Rica, I’ll be joining with the team who have been working on prototyping a data standard for public contracting to see the public release of the standard launched, and I hope to engage in conversation about how to keep developing it further in open and collaborative ways.

Creating the capacity building game…

[Summary: crowdsourcing contributions to a workshop at Open Development Camp]

There is a lot of talk of ‘capacity building’ in the open data world. As the first phase of the ODDC project found, there are many gaps between the potential of open data and its realisation: and many of these gaps can be described as capacity gaps – whether on the side of data suppliers, or potential data users.

But how does sustainable capacity for working with open data develop? At the Open Development Camp in a few weeks time I’ll be facilitating a workshop to explore this question, and to support participants to share learning about how different capacity building approaches fit in different settings.

The basic idea is that we’ll use a simple ‘cards and scenarios’ game (modelled, as ever, on the Social Media Game), where we identify a set of scenarios with capacity building needs, and then work in teams to design responses, based on combining a selection of different approaches, each of which will be listed one of the game cards.

But, rather than just work from the cards, I’m hoping that for many of these approaches there will be ‘champions’ on hand, able to make the case for that particular approach, and to provide expert insights to the team. So:

  • (1) I’ve put together a list of 24+ different capacity building approaches I’ve seen in the open data world – but I need your help to fill in the details of their strengths, weaknesses and examples of them in action.
  • (2) I’m looking for ‘champions’ for these approaches, either who will be at the Open Development Camp, or who could prepare a short video input in advance to make the case for their preferred capacity building approach;

If you could help with either, get in touch, or dive in direct on this Google Doc.

If all goes well, I’ll prepare a toolkit after the Open Development Camp for anyone to run their own version of the Capacity Building Game.

The list so far

Click each one to jump directly to the draft document

Two senses of standard

[Summary: technical standards play a role in both interoperability, and in target-setting for policy.]

I’ve been doing lots of thinking about standardisation recently, particularly as part of work on the Open Contracting Data Standard (feedback invited on the latest draft release…), and thanks to the opportunity to work with Samuel Goëta on a paper around data standards (hopefully out some time next year).

One of the themes I’ve been seeking to explore is how standards play both a technical and a political role, and how standards processes (at least at the level of content standards) can sensitively engage with this. Below is a repost of my earlier contribution to a GitHub thread discussing some of this in the context of Open Contracting.

Two senses of standard

In Open Contracting I believe we’re dealing with two different senses of ‘standard’, and two purposes which we need to keep in balance. Namely:

  • Standards as a basis for interoperability – as in, “their data complies with the standard, and can be used by standards-compliant tools.”
  • Standards as targets – as in, “they have achieved a high standard of disclosure”.

To unpack these a bit:

(Note: the arguments below are predominantly theoretical, and so some of the edge cases considered may not come up at all in practice in the Open Contracting Data Standard, but considering them is a useful exercise to test the intuitions and principles directing our action.)

Standards as interoperability

We’re interested in interoperability in two directions: vertical (can a single dataset be used by other actors and tools in a value-chain of re-use), and horizontal (can two datasets from different publishers be easily analysed alongside one another).

Where data is already published, then the goal should be to achieve the largest possible set of data publishers who can richly represent their data in the standard, and of data users who can draw on data in the standard to meet their needs. This supports the idea that for any element in the standard where (a) data already exists; and (b) use cases already exist; we should be looking for reference implementations to test that data can be rendered in the standard, and that users (or tools they create) can read, analyse and use that data effectively.

However, it is important that we look at both horizontal and vertical interoperability in making this judgement. E.g. there could be a country that is the sole publisher of a field that is used by 5 different users in their country. This should clearly not be a required field in the standard, but articulating how it is standardised is useful to this community of users (one way to accommodate such cases may be in extensions, although the judgement on whether or not to move something to an extension might come down to whether it is likely that other publishers could be providing this data in future).

In many cases, underlying data from different sources is not perfectly interoperable, or there is a mismatch between the requirements of users and the requirements of data holders. In these cases, the way a standard is designed affects the distribution of labour between publishers and users with respect to rendering data interoperable. For example, a use case might involve ‘identifying which different government agencies, each publishing data independently, have contracts with a particular firm’. Here, a standard could require all publishers, who may store different identifiers in their systems, to map these to a common identifier, or it could allow publishers to use whatever identifier they hold, leaving the costs of reconciling these on the user. Making things interoperable can then involve a process of negotiation, and this process may play out differently in different places at different times, leaving certain elements of a standard less stable than others. The concept of ‘designing for the tussle’ (PDF) may be relevant here: thinking about how we can modularise stable (or ‘neutral’) and unstable elements of a standard (this is what the proposed Organisation ID standard does, by having a common way to represent identifiers, but separating this off from the choice of identifier itself, and then allowing for the emergence of a set of third-party tools and validation routines to help manage the tussle).
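As a toy illustration of the second design (letting publishers use the identifiers they hold, with reconciliation handled downstream), the sketch below separates the representation of an identifier – a scheme plus an id, as in the proposed Organisation ID approach – from a hypothetical third-party crosswalk that maps identifiers onto a common scheme. The scheme names and mappings are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrgIdentifier:
    """An identifier as a (scheme, id) pair: the way identifiers are
    represented is separated from the choice of scheme itself."""
    scheme: str
    id: str

# Invented third-party crosswalk table: the 'tussle' over which scheme
# wins is pushed out to tools like this, not forced on every publisher.
CROSSWALK = {
    OrgIdentifier("register-a", "12345"): OrgIdentifier("canonical", "org-001"),
    OrgIdentifier("register-b", "XY-9"): OrgIdentifier("canonical", "org-001"),
}

def reconcile(identifier: OrgIdentifier) -> OrgIdentifier:
    """Map a publisher-supplied identifier to a canonical one if known,
    otherwise pass it through unchanged."""
    return CROSSWALK.get(identifier, identifier)
```

Two publishers using different registers can then be matched on the canonical identifier, without either being required to change what they store.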

In seeking to maximise the set of publishers and users interoperable through the standard we need to be critically aware of both short-term and long-term interoperability, as organisations modify their practices in order to be able to publish to, or draw upon, a common standard. We need to balance a ‘Lowest Common Denominator’ (LCD) or ‘Minimum Viable Product’ (MVP) approach, which means that the majority of publishers can achieve substantial coverage of the standard, with a richer standard that supports the greatest chance of different producer and consumer groups being able to exchange data through the standard.


(Initial attempt to sketch distinction between maximising set of common fields across publisher and users, and maximising set of publishers and users)

Standards as targets

Open Contracting is a political process. The Open Contracting Partnership have articulated a set of Global Principles which set out the sorts of information about contracting that governments and other parties should disclose, and they are working to secure government sign-up to these principles. In policy circles, a standard is often seen as a form of measure, qualitative or quantitative, against which progress towards some policy goal is assessed. Some targets might be based on ‘best practice’; others are based on ‘stretch goals’: things which perhaps no-one is yet doing particularly well, but which a community of actors agree are worth aiming for. A standard, whether specified in terms of indicators and measures, or in terms of fields and formats, provides a means of agreeing what meeting the target will look like.

The Open Contracting Principles call for a lot of things which no governments appear to yet be publishing in machine-readable forms. In many cases we’ve not touched the standardisation of these right now (e.g. “Risk assessments, including environmental and social impact assessments”), recognising either that standards for these will exist in different domains and can be linked or embedded into our standard, or that interoperability of such information is hard to achieve, and that ultimately what is needed for most use cases may be legal text or plain language documents, rather than structured data. However, there may be cases where something is a strong candidate for standardisation, having both the potential to be published (i.e. this is something which evidence suggests governments either do, or could, capture in their existing information systems), and clearly articulated use cases. In these cases a proposed field-level standard can act as an important target for those seeking to provide this data to move towards. It also acts to challenge unwarranted ‘first mover advantage’, where the first to publish, even if publishing less than an ideal target would require, gets to set the standard; instead it makes the ‘target’ subject to community discussion.

Clearly any ‘aspirational’ elements should not predominate or make up the majority of a standard if it seeks to effectively support interoperability; but in standards that play a part in policy and political processes (as, in practice, all standards do to some extent – c.f. Lessig), such target elements have a legitimate place.

Implications for Open Contracting Data Standard

There are a number of ways we might respond to a recognition of the dual role that standardisation plays in Open Contracting.

Purposes and validation sets

One approach, suggested in the early technical scoping, is to identify different sets of users, or ‘purposes’, for the standard, and for each of these to identify the kinds of fields (subset of the data) these purposes require. As Jeni Tennison’s work on the scoping describes: “…each purpose can have a status (eg proposed vs implemented) and … purposes are only marked as implemented when there are implementations that use the given subset of data for the specified purpose”.

If there are neither purposes requiring a field, nor datasets providing it, then it would not be suitable for inclusion in the standard. And if a purpose either went unimplemented for a long period, or required a field that no supplier could publish, then careful evaluation would be needed of whether to remove that purpose (or to remove that field from the purpose), since purposes are the basis against which elements of the standard are evaluated for continued relevance in the model.

Purposes could also be used to validate datasets, identifying how many datasets are fit for which purpose.
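A minimal sketch of how that validation might work, assuming each purpose is simply a named set of required fields (the purpose names and field paths below are invented, not taken from the standard):

```python
# Invented purposes, each mapped to the subset of fields it requires.
PURPOSES = {
    "who-won-contracts": {"award/supplier", "award/value"},
    "track-spending": {"contract/value", "contract/period", "transaction/amount"},
}

def fit_purposes(dataset_fields: set) -> set:
    """Return the names of the purposes a dataset is fit for: those
    whose required fields are all present in the dataset."""
    return {name for name, required in PURPOSES.items()
            if required <= dataset_fields}
```

Run across a corpus of published datasets, this gives exactly the fitness-for-purpose counts described above.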

Stable, ordinary and target elements

We could maintain a distinction in how the standard is described between fields and elements which are ‘stable’ (and thus very unlikely to change), ‘ordinary’ elements (which may have reference implementations, but could change if there was some majority interest amongst those governing a standard in seeing changes), and ‘target’ elements, which may lack any reference implementations, but which are considered useful to help publishers move towards implementing a political commitment to publish.

Q: Could we build this information into the schema meta-data somehow?

We might need to have quite a long time horizon for keeping target elements provisionally in the standard, and to only remove them if there is agreement that no-one is likely to publish to them. However, being able to represent them visually as distinct in the schema, and clearly documenting the distinction may be valuable.
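One possible answer to the question above, sketched below: since JSON Schema validators ignore unrecognised keywords, a custom annotation (here called ‘stability’ – the keyword name and the field names are my own invention) could carry this status inside the schema itself, where documentation tooling could pick it up for distinct visual display.

```python
# A JSON-Schema-style fragment with an invented 'stability' annotation
# on each property. Standard validators would simply ignore it.
SCHEMA = {
    "properties": {
        "ocid":           {"type": "string", "stability": "stable"},
        "buyer":          {"type": "object", "stability": "ordinary"},
        "riskAssessment": {"type": "object", "stability": "target"},
    }
}

def fields_by_stability(schema: dict, level: str) -> list:
    """List the field names annotated with a given stability level."""
    return [name for name, spec in schema["properties"].items()
            if spec.get("stability") == level]
```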

Extensions

Some ‘target’ elements may best belong in extensions, with some process for merging extensions into the core standard if they are widely enough adopted.

Regular implementation monitoring

The IATI team run a dashboard which tracks the use of particular fields in the data. Doing similar for Open Contracting would be valuable, and it may even be useful to feed such information into the display of the schema or documentation (or at least to make it easy for publishers and users to look up who is implementing a given property).
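The core of such a dashboard is simple: count, for each field, how many publishers actually populate it. A sketch with invented publishers and fields:

```python
from collections import Counter

def field_usage(datasets: dict) -> Counter:
    """Count, across publishers, how many populate each field.
    `datasets` maps publisher name -> set of fields they populate."""
    usage = Counter()
    for fields in datasets.values():
        usage.update(fields)
    return usage

# Invented example data: two publishers with overlapping coverage.
datasets = {
    "publisher-a": {"award/value", "buyer/name"},
    "publisher-b": {"award/value"},
}
```

Fields with low counts are candidates for review (or for flagging as ‘target’ elements), while widely used fields have de facto reference implementations.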

Implementation schedules

Another approach IATI uses for ‘target elements’ is to ask publishers to prepare ‘Implementation Schedules’, which outline which fields they expect to be able to publish by when. This gives an indication of whether there is political will to reach some of the ‘stretch targets’ that might be involved in a standard, and holds out the potential to convene those most likely to publish that data in the near to medium term, to define and refine target standardisations.

Discussion

What theoretical writing on standardisation could I be drawing on here?

What experience from other standards could we be drawing upon in Open Contracting and in other standard processes?

Exploring Wikidata

[Summary: thinking aloud – brief notes on learning about the Wikidata project, and how it might help address the organisational identifiers problem]

I’ve spent a fascinating day today at the Wikimania Conference at the Barbican in London, mostly following the programme’s ‘data’ track in order to understand in more depth the Wikidata project. This post shares some thinking aloud to capture some learning, reflections and exploration from the day.

As the Wikidata project manager, Lydia Pintscher, framed it, right now access to knowledge on wikipedia is highly skewed by language. The topics of articles you have access to, the depth of meta-data about them (such as the locations they describe), the detail of those articles, and their likelihood of being up to date, are all greatly affected by the language you speak. Italian or Greek wikipedia may have great coverage of places in Italy or Greece, but go wider and their coverage drops off. In terms of seeking more equal access to knowledge, this is a problem. However, whilst the encyclopedic narrative of a French, Spanish or Catalan page about the Barbican Centre in London will need to be written by someone in command of that language, many of the basic facts that go into an article are language-neutral, or translatable as small units of content, rather than sentences and paragraphs. The date the building was built, the name of the architect, the current capacity of the building – all the kinds of things which might appear in infoboxes – are all things that could be made available to bootstrap new articles, or that, when changed, could have their changes cascaded across all the different language pages that draw upon them.

That is one of the motivating cases for Wikidata: separating out ‘items’ and their ‘properties’ that might belong in Wikipedia from the pages, making this data re-usable, and using it to build a better encyclopedia.

However, wikidata is also generating much wider interest – not least because it is taking on a number of problems that many people want to see addressed. These include:

  • Somewhere ‘institutional’ and well governed on the web to put data – and where each data item also gains the advantage of a discussion page.
  • The long-term preservation, and versioning, of data;
  • Providing common identifiers on the web for arbitrary things – and providing URIs for these things that can be looked up (building on the idea of DBPedia as a crystallisation point for the web of linked data);
  • Providing a data model that can cope with change over time, and with data from heterogenous sources – all of the properties in wikidata can have qualifiers, such as when the statement is true from, or until, source information, and other provenance data.

Wikidata could help address these issues on two levels:

  • By allowing anyone to add items and properties to the central wikidata instance, and making these available for re-use;
  • By providing an open source software platform for anyone to use in managing their own corpus of wikified, versioned data*;

A particular use case I’m interested in is whether it might help in addressing the perennial Organisational Identifiers problem faced by data standards such as IATI and Open Contracting, where it turns out that having shared identifiers for government agencies, and lots of existing, but non-registered, entities like charities and associations that give and receive funds, is really difficult. Others at Wikimania spoke of potential use cases around maintaining national statistics, and archiving the datasets underlying scientific publications.

However, in thinking about the use cases wikidata might have, it’s important to keep in mind its current scope:

  • It is a store of ‘items’ and then ‘statements’ about them (essentially a graph store). This is different from being a place to store datasets (as you might want to do with the archival of the dataset used in a scientific paper), and it means that, once created, items are the first class entities of wikidata, able to exist in multiple collections.
  • It currently inherits Wikipedia’s notability criteria for items. That is, the basic building blocks of wikidata – the items that can be identified and described, such as the Barbican, Cheese or Government of Grenada – can only be included in the main wikidata instance if they have a corresponding wikipedia page in some language wikipedia (or similar: this requirement is a little more complex).
  • It can be edited by anyone, at any time. That is, systems that rely on the data need to consider what levels of consistency they need. Of course, as wikipedia has shown, editability is often a great strength – and as Rufus Pollock noted in the ‘data roundtable’ session, updating and versioning of open data are currently big missing parts of our data infrastructures.

Unlike the entirely distributed open world assumption on the web of data, where the AAA assumption holds (Anyone can say Anything about Anything), wikidata brings both a layer of regulation to the statements that can be made, and the potential of community driven editorial control. It sits somewhere between the controlled description sets of Schema.org, and an entirely open proliferation of items and ontologies to describe them.

Can it help the organisational identifiers problem?

I’ve started to carry out some quick tests to see how far wikidata might be a resource to help with the aforementioned organisational identifiers problem.

Using Kasper Brandt’s fantastically useful linked data rendering of IATI, I queried for the names of a selection of government and non-government organisations occurring in the International Aid Transparency Initiative data. I then used Open Refine to look up a selection of these on the DBPedia endpoint (which, it seems, now incorporates Wikidata info as well). This was very rough-and-ready (just searching for full name matches), but by cross-checking negative results (where there were no matches) by searching Wikipedia manually, it’s possible to get a sense of how many organisations might be identifiable within Wikipedia.

So far I’ve only tested the method, and haven’t run a large scale test – but I found around 1/2 the organisations I checked had a Wikipedia entry of some form, and thus would currently be eligible to be Wikidata items right away. For others, Wikipedia pages would need to be created, and whether or not all the small voluntary organisations that might occur in an IATI or Open Contracting dataset would be notable for inclusion is something that would need to be explored more.
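The matching itself was nothing more sophisticated than comparing full names, roughly as in the sketch below. The organisation names and entry titles here are invented stand-ins; the real lookups ran against the DBPedia endpoint via Open Refine.

```python
def normalise(name: str) -> str:
    """Lower-case and collapse whitespace to get a crude exact-match key."""
    return " ".join(name.lower().split())

def match_rate(org_names, entry_titles) -> float:
    """Fraction of organisation names with an exact (normalised) match
    among a set of encyclopedia entry titles."""
    titles = {normalise(t) for t in entry_titles}
    matched = [n for n in org_names if normalise(n) in titles]
    return len(matched) / len(org_names)
```

Exact matching under-counts, of course – which is why negative results were cross-checked by manual searching.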

Exploring the Wikidata pages for some of the organisations I did find threw up some interesting additional possibilities to help with organisation identifiers. A number of pages were linked to identifiers from Library Authority Files, including VIAF identifiers such as this set of examples returned for a search on Malawi Ministry of Finance. Library Authority Files would tend to only include entries when a government agency has a publication of some form in that library, but at a quick glance coverage seems pretty good.

Now, as Chris Taggart would be quick to point out, neither wikipedia pages, nor library authority file identifiers, act as a registry of legal entities. They pick out everyday concepts of an organisation, rather than the legally accountable body which enters into contracts. Yet, as they become increasingly backed by data, these identifiers do provide access to look up lots of contextual information that might help in understanding issues like organisational change over time. For example, the Wikipedia page for the UK’s Department for Education includes details on the departments that preceded it. In wikidata form, a statement like this could even be qualified to say whether that relationship of being a preceding department is one that passes legal obligations from one to the other.
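To make that concrete, a qualified statement of the kind Wikidata supports might be sketched like this. Property labels are spelled out in English rather than using real Wikidata P-numbers, and the ‘transfers legal obligations’ qualifier is the hypothetical addition discussed above, not an existing Wikidata property.

```python
# A Wikidata-style statement: a claim about an item, plus qualifiers
# that scope and contextualise it.
statement = {
    "item": "Department for Education",
    "property": "replaces",
    "value": "Department for Children, Schools and Families",
    "qualifiers": {
        "point in time": "2010",
        # Hypothetical qualifier of the kind suggested above:
        "transfers legal obligations": True,
    },
}

def qualifier(statement: dict, name: str):
    """Look up a qualifier on a statement, returning None if absent."""
    return statement["qualifiers"].get(name)
```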

I’ve still got to think about this a lot more, but it seems that:

  • There are many things it might be useful to know about organisations, but which are not going to be captured in official registries anytime soon. Some of these things will need to be the subject of discussion, and open to agreement through dialogue. Wikidata, as a trusted shared space with good community governance practices, might be a good place to keep these things, albeit recognising that in its current phase it has no goal of being a comprehensive repository of records about all organisations in the world (and other spaces such as OpenCorporates are already solving the comprehensive coverage problem for particular classes of organisation).

  • There are some organisations for which, in many countries, no official registry exists (particularly Government Departments and Agencies). Many of these things are notable (Government Departments for example), and so even if no Wikipedia entry yet exists, one could and should. A project to manage and maintain government agency records and identifiers in Wikidata may be worth exploring.

Whether a shift from seeking to solve some aspects of the organisational identifiers problem through finding some authority to provide master lists, to developing a distributed best-efforts community approach is one that would make sense to the open government community is something yet to be explored.

Notes

*I here acknowledge SJ Klein’s counsel that this (encouraging multiple domain-specific instances of a wikidata platform) is potentially a very bad idea, as the ‘forking’ of wiki-projects has rarely been a successful journey: particularly with respect to the sustainability of forked content. As SJ outlined, even though there may be technical and social challenges to a mega graph store, these can be compared to the apparent challenges of making the first encyclopedias (the idea of a 50,000-page book must have seemed crazy at first), or the social challenges envisioned for Wikipedia at its genesis (‘how could non-experts possibly edit an encyclopedia?’). On this view, it is only by setting the ambition of a comprehensive shared store of the world’s propositional data (with the qualifiers that Wikidata supports to make this possible without a closed world assumption) that such limits might be overcome. Perhaps with data there is a greater possibility of supporting forking, and re-merging, of wikidata instances, permitting short-term pragmatic creation of datasets outside the core wikidata project which can later be brought back in if they are considered, as a set, notable (although this still carries risks that forked projects diverge in their values, governance and structure so far that re-connecting later becomes prohibitively difficult).

A Data Sharing Disclosure Standard?

[Summary: Iterations on a proposal for a public register of government data sharing arrangements, setting out options for a Data Sharing Disclosure Standard to be used whenever government shares personal data. Draft for interactive comments here (and PDF for those in govt without access to Google Docs).]

At the instigation of the UK Cabinet Office, an open policy making process is currently underway to propose new arrangements for data sharing in government. Data sharing arrangements are distinct from open data, as they may involve the limited exchange of personal and private data between government departments, or outside of government, with a specific purpose of data use in mind.

The idea that new measures are needed is based on a perception that many opportunities to make better use of data for research, addressing debt and fraud, or tailoring the design of public services, are missed because of legal or practical barriers to data being exchanged or joined up between government departments. Some departments, such as HMRC, require explicit legal permissions to share data, whereas in other departments and public bodies a range of existing ‘legal gateways’ and powers support exchange of data.

I’ve been following the process from afar, but on Monday last week I had the chance to attend one of the open full-day workshops that Involve are facilitating as part of the open policy making process. This brought together representatives of a range of public bodies, including central government departments and local authorities, with members of the Cabinet Office team leading on data sharing reforms, and a small number of civil society organisations and individuals. Monday’s discussions centred on the introduction of new ‘permissive powers’ for data sharing to support tailored public services. For example, powers that would make it easier for local government to request and obtain HMRC data on 16–19 year olds in order to identify which young people in their area were already in employment or training, and so to target their resources on contacting those young people outside employment or training whom they have a statutory obligation to support.

The exact wording of such a power, and the safeguards that need to be in place to ensure it is neither too broad, nor open to abuse, are being developed through the open policy making process. One safeguard I believe is important comes from introducing greater transparency into government data sharing arrangements.

A few months back, working with Reuben Binns, I put together a short note on a possible model for an ‘Open Register of Data Sharing‘. In Monday’s open policy making meeting, the topic of transparency as an important aspect of tailored public service data sharing came up, and provided an opportunity to discuss many of the ideas that the draft proposal had contained. Through the discussions, however, it became clear that there were a number of extra considerations needed to develop the proposal further, in particular:

  • Noting that public disclosure of planned data sharing was not only beneficial for transparency and scrutiny, but also for efficiency, coordination and consistency of data sharing: by allowing public bodies to pool data sharing arrangements, and to easily replicate approved shares, rather than starting from scratch with every plan and business case.
  • Recognising the concerns of local authorities and other public bodies about a centralised register, and the need to accommodate shares that might take place between public bodies at a local level only, without involvement of central government.
  • Recognising the need for both human and machine-readable information on data sharing arrangements, so that groups with a specific interest in particular data (e.g. associations looking out for the rights of homeless people) could track proposed or enacted arrangements without needing substantial technical know-how.
  • Recognising the importance of documents like Privacy Impact Assessments and Business Cases, but also noting that mandatory publication of these during their drafting could distort the drafting process (with the risk they become more PR documents making the case for a share, than genuine critical assessments), suggesting a mix of proactive and reactive transparency may be needed in practice.

As a result of the discussions with local authorities, government departments and others, I took away a number of ideas about how the proposal could be refined. So this Friday, at the University of Southampton Web and Internet Science group annual gathering and weekend of projects (known locally as WAISFest), I worked in a stream on personal data and spent a morning updating the proposals. The result is a reframed draft that, rather than focusing on the Register, focuses on a Data Sharing Disclosure Standard, emphasising the key information that needs to be disclosed about each data share, and discussing when disclosure should take place, whilst leaving open a range of options for how this might be technically implemented.

You can find the updated document here, as a Google Doc open to comments. I would really welcome comments and suggestions for how this could be refined further over the coming weeks. If you do leave a comment and want to be credited / want to join in future discussion of this proposal, please also include your name / contact details.

The Gazette provides semantically enriched public notices: readable by humans and machines.

A couple of things of particular note in the draft:

  • It is useful to identify (a) data controllers; (b) datasets; (c) legislation authorising data shares. Right now the Register of Data Controllers seems to provide a good resource for (a), and thanks to recent efforts at building out the digital information infrastructure of the UK, it turns out there are often good URLs that can be used as identifiers for datasets (data.gov.uk lists unpublished datasets from many central government departments) and legislation (through the data-all-the-way-down approach of legislation.gov.uk).
  • It considers how the Gazette might be used as a publication route for Data Sharing Disclosures. The Gazette is an official paper of record, established since 1665 but recently re-envisioned with a semantic publishing platform. Using such a route to publish notices of data sharing has the advantage that it combines the long-term archival of information in a robust source, with making enriched openly licensed data available for re-use. This potentially offers a more robust route to disclosures, in which the data version is a progressive enhancement on top of an information disclosure.
  • Based on feedback from Javier Ruiz, it highlights the importance of flagging when shared data is going to be processed using algorithms that will determine individuals’ eligibility for services or trigger interventions affecting citizens, and raises the question of whether the algorithms themselves should be disclosed as a matter of course.
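As a sketch of what each disclosure might contain in machine-readable form, using URLs as identifiers in the way suggested above: the field names and all the values below are my own invention, not drawn from the draft standard, and in practice the controller, dataset and legislation URLs would come from the Register of Data Controllers, data.gov.uk and legislation.gov.uk respectively.

```python
# Invented example of a Data Sharing Disclosure record.
disclosure = {
    "data_controller": "https://example.org/data-controllers/Z0000000",
    "dataset": "https://example.org/dataset/pupil-records",
    "legal_gateway": "https://example.org/legislation/example-act/section-1",
    "purpose": "Identify 16-19 year olds not in employment, education or training",
    "recipient": "Example Borough Council",
    "algorithmic_processing": True,  # will algorithms determine eligibility?
    "status": "proposed",            # proposed vs enacted share
}

# Invented minimum field set a disclosure would need to carry.
REQUIRED = {"data_controller", "dataset", "legal_gateway",
            "purpose", "recipient", "status"}

def is_complete(record: dict) -> bool:
    """Check a disclosure record carries the minimum required fields."""
    return REQUIRED <= set(record)
```

A machine-readable shape like this is what would let interest groups track proposed or enacted shares of particular datasets without substantial technical know-how on the publisher’s side.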

I’ll be sharing a copy of the draft with the Data Sharing open policy process mailing list, and with the Cabinet Office team working on the data sharing brief. They are working to draft an updated paper on policy options by early September, with a view to a possible White Paper – so comments over the next few weeks are particularly valued.