July 12, 2010

Data, URIs & Permanence

There is a strong drive in the UK at the moment to turn public sector data into semantically marked up and globally accessible resources (see OpenPSI for examples of use). There is also a heavy push from academic funders to make the outputs of research projects available in a similar format, as with the JISCexpo call under which this project is funded.

Whilst there seems to be a lot of focus on the creation of URI schemes (Jeni Tennison, data.gov.uk), there doesn’t seem to be the same consideration given to the permanence of the resources created. Typically a project concerned with the creation of Linked Data will be funded for a fixed period of time. During this time the funding pays for staff to conduct the work required, which ultimately results in the production of some Linked Data. Once the project has finished and the funding has stopped, what happens to the data that has been produced?

One of the primary requirements of useful Linked Data is that the URIs created exist in perpetuity, so that future data sets can be linked to them. Assuming a project is funded for a year and that there is a commitment to host the data locally for a further year, what happens after this period has elapsed?

What we need is provision for UK academic data to be hosted on the JANET network under a suitable .ac.uk domain. For example the data outputs of this project could be hosted on http://musicnet.ac.uk, ensuring that:

  1. The URIs are institution-independent
  2. There is no ongoing administration cost for renewing a project domain name (.ac.uk domains are a one-off payment)


The current rules make it hard for short-term projects to acquire an ac.uk domain, because a project must be funded for two years to qualify. It is important that there be a process to decide which projects should and shouldn’t get a domain name. However, as the way we use the web changes, with the focus shifting towards semantically marked up data rather than just human-consumed HTML, the academic research community needs to discuss and rethink the metric on which this decision is made.

This issue of persistent URI hosting is a new and increasingly important problem for the emerging Semantic Web. If we started to assess projects based on their impact over time, as opposed to just their funding duration, we might encourage the creation of more short-term projects that expose Linked Data, which can only be a good thing for the community at large!

There is also the issue of hosting: who will actually provide the server space where the data is to reside? However, this is a secondary issue, and one that is easier to discuss once a system is in place for maintaining the actual URIs.
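
To illustrate why hosting can be treated separately from the URIs themselves, here is a minimal sketch in Python. It assumes a small lookup table that maps a persistent domain to wherever the data currently lives, and answers requests with a redirect. The table, the `resolve` function and the `example` hosts are all hypothetical and are not part of any existing JANET or ac.uk service.

```python
from urllib.parse import urlsplit

# Hypothetical table: persistent domain -> current hosting location.
# Only this table needs updating when the data moves to a new host.
CURRENT_HOSTS = {
    "musicnet.ac.uk": "https://data.host-a.example/musicnet",
}


def resolve(persistent_uri, hosts=CURRENT_HOSTS):
    """Return (HTTP status, redirect target) for a persistent URI.

    Query strings and fragments are ignored in this sketch.
    """
    parts = urlsplit(persistent_uri)
    base = hosts.get(parts.hostname)
    if base is None:
        # Unknown domain: nothing to redirect to.
        return 404, None
    # 303 See Other: point the client at the current location of the data.
    return 303, base.rstrip("/") + parts.path
```

Under these assumptions, a link such as http://musicnet.ac.uk/composer/123 keeps working after the data moves, because only the entry in the table changes and the published URI stays the same.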
