Joined Up Philanthropy data standards: seeking simplicity, and depth

[Summary: technical notes on work in progress for the Open Philanthropy data standard]

I’m currently working on sketching out an alpha version of a data standard for the Open Philanthropy project (soon to be 360giving). Based on work Pete Bass has done analysing the supply of data from trusts and foundations, a workshop on demand for the data, and a lot of time spent looking at existing standards at the content layer (eGrant/hGrant, IATI, Schema.org, GML etc.) and deeper technical layers (CSV, SDF, XML, RDF, JSON, JSON-Schema and JSON-LD), I’m getting closer to having a draft proposal. But – ahead of that – and spurred on by discussions at the Berkman Center this afternoon about the role of blogging in the idea-formation process, here’s a rough outline of where it might be heading. (What follows is ‘thinking aloud’ from my work in progress, and does not represent any settled views of the Open Philanthropy project.)

Building Blocks: Core data plus

Joined Up Data Components

There are lots of things that different people might want to know about philanthropic giving, from where money is going, to detailed information on the location of grant beneficiaries, information on the grant-making process, and results information. However, few trusts and foundations have all this information to hand, and very few are likely to have it all in a single system, so creating a single open data file covering all these different areas of the funding process would rarely be an easy task. And if presented with a massive spreadsheet with hundreds of columns to fill in, many potential data publishers are liable to be put off by the complexity. We need a simple starting point for new publishers of data, and a way for those who want to say more about their giving to share deeper and more detailed information.

The approach to that should be a modular, rather than monolithic, standard based on common building blocks. Indeed, in line with the Joined Up Data efforts initiated by Development Initiatives, many of these building blocks may be common across different data standards.

In the Open Philanthropy case, we’ve sketched out seven broad building blocks, in addition to the core “who, what and how much” data that is needed for each of the ‘funding activities’ that are the heart of an open philanthropy standard. These are:

  • Organisations – names, addresses and other details of the organisations funding, receiving funds and partnering in a project
  • Process – information about the events which take place during the lifetime of a funding activity
  • Locations – information about the geography of a funded activity – including the location of the organisations involved, and the location of beneficiaries
  • Transactions – information about pledges and transfers of funding from one party to another
  • Results – information about the aims and targets of the activity, and whether they have been met
  • Classifications – categorisations of different kinds that are applied to the funded activity (e.g. the subject area), or to the organisations involved (e.g. audited accounts?)
  • Documents – links to associated documents, and more in-depth descriptions of the activity

Some of these may provide more in-depth information about some core field (e.g. ‘Total grant amount’ might be part of the core data, but individual yearly breakdowns could be expressed within the transactions building block), whilst others provide information that is not contained in the core information at all (results or documents for example).
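By way of a purely illustrative sketch (none of these field names are settled – they are assumptions for the example, not draft standard fields), core data plus a transactions building block might look like this:

```python
# Purely illustrative: core "who, what and how much" fields, plus an
# optional "transactions" building block giving a yearly breakdown of
# the core total grant amount. All field names are assumptions.
activity = {
    "id": "example-fund-001",
    "title": "Community garden project",
    "fundingOrganisation": "Example Trust",
    "totalGrantAmount": 30000,
    "currency": "GBP",
    "transactions": [
        {"year": 2013, "amount": 10000},
        {"year": 2014, "amount": 20000},
    ],
}

# The building block refines, rather than replaces, the core field:
assert sum(t["amount"] for t in activity["transactions"]) == activity["totalGrantAmount"]
```

A publisher with only the core fields could stop at the total amount; one with richer systems could add the transactions block without breaking anything for consumers of the simpler data.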

An ontological approach: flat > structured > linked

One of the biggest challenges with sketching out a possible standard data format for open philanthropy is in balancing the technical needs of a number of different groups:

  • Publishers of the data need it to be as simple as possible to share their information. Publishing open philanthropy must be simple, with a minimum of technical skills and resources required. In practice, that means flat, spreadsheet-like data structures.
  • Analysts like flat spreadsheet-style data too – but often want to be able to cut it in different ways. Standards like IATI are based on richly structured XML data, nested a number of levels deep, which can make flattening the data for analysts to use it very challenging.
  • Coders prefer structured data. In most cases, for web applications, that means JSON. Whilst some expressive path languages for JSON are emerging, ideally a JSON structure should make it easy for a coder to simply drill down the tree to find what they want, so being able to look for activity.organisations.fundingOrganisation[0] is better than having to iterate through all the activity.organisation nodes to find the one which has "type": "fundingOrganisation".
  • Data integrators want to read data into their own preferred database structures, from noSQL to relational databases. Those wanting to integrate heterogeneous data sources from different ‘Joined Up Data’ standards might also benefit from Linked Data approaches, and graph-based data using cross-mapped ontologies.
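To make the coders’ point concrete, here is a small sketch (organisation names and field shapes are made up for illustration) contrasting the two JSON shapes described above:

```python
# Shape A: organisations grouped by role - a coder can drill straight down.
activity_a = {
    "organisations": {
        "fundingOrganisation": [{"name": "Example Trust"}],
        "beneficiaryOrganisation": [{"name": "Local Charity"}],
    }
}
funder_a = activity_a["organisations"]["fundingOrganisation"][0]["name"]

# Shape B: a flat list of typed nodes - every lookup becomes a search.
activity_b = {
    "organisation": [
        {"type": "fundingOrganisation", "name": "Example Trust"},
        {"type": "beneficiaryOrganisation", "name": "Local Charity"},
    ]
}
funder_b = next(
    org["name"]
    for org in activity_b["organisation"]
    if org["type"] == "fundingOrganisation"
)

assert funder_a == funder_b == "Example Trust"
```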

It’s pretty hard to see how a single format for representing data can meet the needs of all these different parties: if we go with a flat structure it might be easier for beginners to publish, but the standard won’t be very expressive, and will be limited to use in a small niche. If we go with richer data structures, the barriers to entry for newcomers will be too high. Standards like IATI have faced challenges through the choice of an expressive XML structure which, whilst able to capture much of the complexity of information about aid flows, is both tricky for beginners and programmatically awkward for developers to parse. There are a lot of pitfalls an effective, and extensible, open philanthropy data standard will have to avoid.

In considering ways to meet the needs of these different groups, the approach I’ve been exploring so far is to start from a detailed, ontology based approach, and then to work backwards to see how this could be used to generate JSON and CSV templates (and as JSON-LD context), allowing transformation between CSV, JSON and Linked Data based only on rules taken from the ontology.

In practice that means I’ve started sketching out an ontology using Protege in which there are top entities for ‘Activity’, ‘Organisation’, ‘Location’, ‘Transaction’, ‘Documents’ and so on (each of the building blocks above), and more specific sub-classed entities like ‘fundedActivity’, ‘beneficiaryOrganisation’, ‘fundingOrganisation’, ‘beneficiaryLocation’ and so on. Activities, Organisations, Locations etc. can all have many different data properties, and there are then a range of different object properties to relate ‘fundedActivities’ to other kinds of entity (e.g. a fundedActivity can have a fundingOrganisation, and so on). If this all looks very rough right now, that’s because it is. I’ve only built out a couple of bits in working towards a proof-of-concept (not quite there yet): but from what I’ve explored so far it looks like building a detailed ontology should also allow mappings to other vocabularies to be easily managed directly in the main authoritative definition of the standard, and should mean that, when converted into Linked Data, heterogeneous data using the same or cross-mapped building blocks can be queried together. Now – from what I’ve seen, ontologies can tend to get out of hand pretty quickly – so as a rule I’m trying to keep things as flat as possible: ideally just relationships between Activities and the other entities, and then data properties.
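As a rough sketch of what ‘keeping things flat’ means in practice, the object properties can all run from the Activity outwards (the property and class names here are illustrative, not taken from the actual ontology file):

```python
# Illustrative only: a flat ontology holds object properties relating the
# funded Activity to the other top entities, plus data properties on each
# entity - rather than deep chains of nested relationships.
OBJECT_PROPERTIES = {
    # property name:           (domain,           range)
    "hasFundingOrganisation": ("FundedActivity", "FundingOrganisation"),
    "hasBeneficiaryLocation": ("FundedActivity", "BeneficiaryLocation"),
    "hasTransaction":         ("FundedActivity", "Transaction"),
    "hasDocument":            ("FundedActivity", "Document"),
}

# The 'flatness' rule: every relationship starts at the Activity.
assert all(domain == "FundedActivity"
           for domain, _ in OBJECT_PROPERTIES.values())
```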

What I’ve then been looking at is how that ontology could be programmatically transformed:

  • (a) Into a JSON data structure (and JSON-LD Context)
  • (b) Into a set of flat tables (possibly described with Simple Data Format if there are tools for which that is useful)

The idea is that, using the ontology, it should be possible to take a set of flat tables and turn them into structured JSON and, via JSON-LD, into Linked Data. If the translation to CSV takes place using the labels of ontology entities and properties, rather than their IDs, as column names, then localisation of spreadsheets should also be within reach.
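Here is a minimal sketch of that flat-tables-to-JSON step. The column labels, property IDs and the dotted-path convention are all assumptions made up for the illustration, not part of any draft standard:

```python
import csv
import io

# Hypothetical mini-ontology: human-readable column labels mapped to
# property IDs, with dots marking nesting. All names are assumptions.
LABEL_TO_ID = {
    "Title": "title",
    "Funding organisation": "fundingOrganisation.name",
    "Total amount": "totalAmount",
}

def row_to_json(row):
    """Turn one flat CSV row into a nested dict using dotted property paths."""
    out = {}
    for label, value in row.items():
        path = LABEL_TO_ID[label].split(".")
        node = out
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return out

flat = io.StringIO(
    "Title,Funding organisation,Total amount\n"
    "Garden project,Example Trust,30000\n"
)
records = [row_to_json(row) for row in csv.DictReader(flat)]
# records[0] -> {"title": "Garden project",
#                "fundingOrganisation": {"name": "Example Trust"},
#                "totalAmount": "30000"}
```

Because the mapping runs from labels to IDs, a localised spreadsheet would only need a translated LABEL_TO_ID table; the JSON output stays the same.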

[Figure: rough work in progress – from ontology to JSON structure, and then onwards to a flat CSV model. Full worked example coming soon…]

I hope to have a more detailed worked example of this to post shortly, or, indeed, a post detailing the dead-ends I came to when working this through further. But – if you happen to read this in the next few weeks, before that occurs – and have any ideas, experience or thoughts on this approach – I would be really keen to hear your ideas. I have been looking for any examples of this being done already – and have not come across anything: but that’s almost certainly because I’m looking in the wrong places. Feel free to drop in a comment below, or tweet @timdavies with your thoughts.

Joined Up Philanthropy – a data standards exploration

Earlier this year, Indigo Trust convened a meeting with an ambitious agenda: to see 50% of UK foundation grants detailed as open data, covering 80% of foundation grant-making by value, within five years. Of course, many of the grant-giving foundations in the UK already share details of the work they fund, through annual reports or pages on their websites – but every funder shares the information differently, which turns bringing together a picture of the funding in a particular area or sector, understanding patterns of funding over time, or identifying the foundations that might be interested in a project idea you have, into a laborious manual task. Data standards for the publication of foundations’ giving could change that.

Supported by The Nominet Trust and Indigo Trust, at Practical Participation I’m working with non-profit sector expert Peter Bass on a series of ‘research sprints’ to explore what a data standard could look like. This builds on an experiment back in March to help scope an Open Contracting Data Standard. We’ll be using an iterative methodology to look at:

  • (1) the existing supply of data;
  • (2) demand for data and use-cases; and
  • (3) existing related standards.

Each research sprint focusses primarily on one of these, consisting of around 10 days of data collection and analysis, designed to generate useful evidence that can move the conversation forward, without pre-empting future decisions or trying to provide the final word on the question of what a data standard should look like.

Supply: What data is already collected?

The first stage, which we’re working on right now, involves finding out about the data that foundations already collect. We’re talking to a number of different foundations large and small to find out about how they manage information on the work they fund right now.

By collating a list of the different database fields that different foundations hold (whether the column headings in the spreadsheets they use to keep track of grants, or the database fields in a comprehensive relational database) and then mapping these onto a common core, we’re aiming to build up a picture of which data might be readily available right now and easy to standardise, and where there are differences and diversities that will need careful handling in the development of a standard. Past standards projects like the International Aid Transparency Initiative were able to benefit from a large ‘installed base’ of aid donors already using set conventions and data structures drawn from the OECD Development Assistance Committee, which strongly influenced the first version of IATI. We’ll be on the look-out for existing elements of standardisation that might exist to build upon in the foundations sector, as well as seeking to appreciate the diversity of foundations and the information they hold.

We’re aiming to have a first analysis from this exercise out in mid-October, and whilst we’re only focussing on UK foundations, we will share all the methods and resources that would allow the exercise to be extended to other contexts.

Demand: what data do people want?

Of course, the data that is easy to get hold of might not be the data that it is important to have access to, or that potential users want. That motivates the second phase of our research – looking to understand the different use cases for data from the philanthropic sector. These range from projects seeking to work out who to send their funding applications to, through philanthropists seeking to identify partners they could work with, to sector analysts looking to understand gaps in the current giving environment and catalyse greater investment in specific sectors.

Each use case will have different data needs. For example, a local project seeking funding would care particularly about geodata that can tell them who might make grants in their local area; whereas a researcher may be interested in knowing in which financial year grants were awarded, or disbursements made to projects. By articulating the data needs of each use-case, and matching these against the data that might be available, we can start to work out where supply and demand are well matched, or where a campaign for open philanthropy data might need to encourage philanthropists to collect or generate new information on their activities.

Standards: putting the pieces together

Once we know about the data that exists, the data that people want, and how they want to use it – we can start thinking in-depth about standards. There are already a range of standards in the philanthropy space, from the eGrant and hGrant standards developed by the Foundation Centre, to the International Aid Transparency Initiative (IATI) standard, as well as a range of efforts ongoing to develop standards for financial reporting, spending data, and geocoded project information.

Developing a draft standard involves a number of choices:

  • Fields and formats – a standard is made up both of the fields that are deemed important (e.g. value of grant; date of grant etc.) and the technical format through which the data will be represented. Data formats vary in how ‘expressive’ they are, and how extensible a standard is once determined. However, more expressive standards also tend to be more complex.

  • Start from scratch, or extend existing standards – it may be possible to simply adapt an existing standard. Deciding to do this involves both technical and governance issues: for example, if we build on IATI, how would a domestic philanthropy standard adapt to version upgrades in the IATI standard? What collaboration would need to be established? How would existing tools handle the adapted standard?

  • Publisher capacity and needs – standards should reduce rather than increase the burdens on data suppliers. If we are asking publishers to map their data to a complex additional standard, we’re less likely to get a sustainable supply of data. Understanding the technical capacity of people we’ll be asking for data is important.

  • Mapping between standards – sometimes it is possible to entirely automate the conversion between two related standards. For example, if the fields in our proposed standard are a subset of those in IATI, it might be possible to demonstrate how domestic and international funding flows data can be combined. Thinking about how standards map together involves considering the direction in which conversions can take place, and how this relates to the ways different actors might want to make use of the data.
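As a sketch of the subset idea: if every field in a domestic standard maps onto a field in IATI, conversion in that direction can be automated. The field names on both sides below are illustrative assumptions – the IATI-style paths are simplified for the example, not the real schema:

```python
# Hypothetical, one-way mapping from an illustrative domestic standard
# into simplified IATI-style paths. All names are assumptions.
DOMESTIC_TO_IATI = {
    "title": "iati-activity/title",
    "totalAmount": "iati-activity/budget/value",
    "fundingOrganisation": "iati-activity/participating-org",
}

def convert(record):
    """Rename every domestic field to its IATI-style counterpart.

    Conversion this way is lossless because the mapping is total over the
    domestic fields; going the other way would drop IATI fields that have
    no domestic counterpart.
    """
    return {DOMESTIC_TO_IATI[key]: value for key, value in record.items()}

converted = convert({"title": "Garden project", "totalAmount": 30000})
```

The direction of the mapping matters: a subset standard can always be ‘upgraded’ into the superset, but not reliably the reverse.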

We’ll be rolling our sleeves up as we develop a draft standard proposal, seeking to work with real data from Phase 1 to test out how it works, and checking the standardised data against the use cases identified in Phase 2.

The outcome of this phase won’t be a final standard – but instead a basis for discussion of what standardised data in the philanthropy sector should look like.

Get involved

We’ll be sharing updates regularly through this blog and inviting comments and feedback on each stage of the research.

If you are from a UK based Foundation who would like to be involved in the first phase of research, just drop me a line and we’ll see what we can do. We’re particularly on the look out for small foundations who don’t do much with data right now – so if you’re currently keeping track of your grant-making records on spreadsheets or post-it notes, do get in touch.