Two new developments announced at the beginning of 2008 on which to keep an eye

By administrator | January 30, 2008

First, on January 18th, blog.wired.com announced that Google will be hosting terabytes of science data.

Sources at Google have disclosed that the humble domain, , will soon provide a home for terabytes of open-source scientific datasets. The storage will be free to scientists and access to the data will be free for all. The project, known as Palimpsest and first previewed to the scientific community at the Science Foo camp at the Googleplex last August, missed its original launch date this week, but will debut soon.

Google may have just violated one of its operating principles, namely, “do no harm.” I am concerned that Google’s treatment of data as just another Internet commodity may undermine the work that data archivists have been labouring at over recent decades to get researchers and research councils to take data preservation and access seriously. While the announcement does not address Google’s approach to data preservation, it is clear that the author sees this as an open access success story. How will data be transferred to Google?

(Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in “suitcase” and shipped to anyone who wants to send they data to Google. Anyone interested gives Google the file tree, and they SLURP the data off the drive. I believe they can extend this to a larger array (my memory says 20TB).

Can we expect metadata to be slurped up with the data? Overall, the Google approach simply confounds science data with other digitally published resources, i.e., just another digital commodity. I would be much more reassured if science data were being cared for by a non-profit, public institution. This is an issue that we should debate within IASSIST and one that we might well have to address in the wider scientific community. After all, Google will only succeed if scientists give them access to research data.

The second development was announced on January 22nd in a press release from the University of Manchester: our good friends at MIMAS will be building an Internet search engine to compete with Google.

The launch follows high profile criticism by a senior academic at Brighton University, who argued that students need to be taught to challenge the facts taken from Google or Wikipedia… [Executive Director Caroline Williams] said, “Google isn’t discriminating about the material it chooses - and with no systematic quality control processes it is very difficult for people to explore and discover trusted information. But automation combined with human value judgments, can be more responsive and dynamic in meeting the needs of higher and further education.”

Time for a rhetorical question: if the academic community can’t trust Google to deliver scholarly search results, why would this community trust Google with research data?

  • submitted by Chuck Humphrey