Peter Murray-Rust is a chemist, a reader in molecular informatics at the University of Cambridge, and a Senior Research Fellow of Churchill College.
I'm a chemist. I'm very interested in how the enormous amount of information that's being put on the web can be used for science. The possibilities of doing things with that information are enormous. What we need to do, however, is to be able to access it. One of the frustrations that many scientists have is that they find the key bits of data they want aren't available. I discovered this in chemistry: there's lots of data out there, but only a very small proportion of it is easy to get without having to pay for it, or without having to ask permission to use it. As a result of this I, along with others, came up with the idea of open data: the assertion, if you like, that certain types of information should be inexorably free for the human race.
Data sharing is discipline dependent, and behavior varies among scientists: chemistry is a fairly conservative discipline in this area, while astronomy and particle physics make all of their data freely available. But there's an increasing realization that if work is funded from public or charitable sources, then there's a requirement on the researchers to make their data available. The various parties responsible for grant making in the United Kingdom, the United States, and many other places are now actively starting to put requirements on grantees to make available not only the text of their publications, but also the data on which it rests.
There are two or three objective problems. One is that making data available is not trivial. It's easier to make a single document (a copy of your publication, for example) available than it is to package your data in a form that other people would want to use. So there are technical aspects. There's also inertia. It's not common for scientists to share their data; many don't realize the value of doing it. So they need to change the way in which they work and the culture of how they reach out to people. Of course, many scientists are naturally competitive because funding depends on publication; the more you get published, the more funding you are likely to receive. So people are naturally jealous of their results, and in many cases they don't want to make their data available because their competitors might then be able to see things in the data that they hadn't been able to see themselves.
It's fair to say that not all data can be made universally available, and that's particularly true when you've got patient or sociological data which relate to human services. There do have to be areas where privacy makes it impossible to share data universally. But in many branches of science (and this is particularly true of physical science, materials science, and so on) there's no reason in principle why the data shouldn't be made available. There has been a history of controlling data through commercial means, and there are a lot of organizations which up until now have made an income by collecting data from the community and then packaging it and selling it back. That was a reasonable thing to do in the 20th century. But in the 21st century, so much information is now born digital that it makes sense to think of an economy where, as we create the data, we release it to the community rather than locking it up.
Story as told to Eric Steuer.