Feedback on Data Storage

By SMcGinty | February 6, 2014

I posted the following question to the listserv:

“I’m in the early days of exploring what I and our library can do for our faculty and grad students. In my case I’m particularly interested in the social sciences.

It seems there are three main choices:

ICPSR (or other domain-specific site)
Dataverse with my own school’s branding
Local, campus-funded storage through an Institutional Repository or something else that can handle larger amounts of data.

Our university is kind of in the vast middle as far as flagship state universities go in budgets and research activity.

What are the pros and cons of these archiving choices? What would best suit a non-wealthy institution? Which requires more training and expertise?”

From the very informative feedback I received from my IASSIST colleagues, I concluded that it is best to stay open to all kinds of possibilities. I was probably naïve in my initial hope that there would be one solution on which I could train my energies. However, that is not the case. The best solution depends on several factors, including the data in question, local staff skills, and library budgets.

There were many voices that supported the domain-specific repository idea represented by ICPSR. Researchers can get exposure to colleagues in their areas of expertise. There is no need to reinvent the wheel if the expertise and the longevity that ICPSR can provide are out there. In addition, ICPSR is launching “openICPSR,” a new open access repository for researchers and institutions that need to comply with Federal requirements to make data publicly available. Data deposited in “openICPSR” will be discoverable in the ICPSR catalog, but not restricted to ICPSR members – anyone will be able to download. ICPSR staff will edit the metadata appearing in the catalog, and depositors can commission full curation of their collections (e.g. full codebooks, variable-level metadata for searching) by ICPSR staff. In addition to accepting individual projects, openICPSR will also offer packages to meet institutional needs. They are planning at least two options: 1) A multiple deposit option whereby an entity can purchase several project deposits (fees will be discounted for member institutions), and 2) A branded repository page that will list datasets under an institution’s own logo and color scheme.

Many others outlined the Dataverse picture. If you can get a good match between what your campus needs and what Dataverse can provide, this can be a crucial part of an overall solution. Dataverse has ease of entry through a self-service deposit structure, not to mention that the price is right (free)! Many institutions are starting with pilot projects in order to assess the labor impact on the library. A few librarians noted that there are issues of long-term storage, sustainability, and metadata uniformity that can arise with Dataverse.

Some respondents hastened to add that Dataverse will be offering improved services. Dataverse is extending support for additional metadata standards in various scientific domains, including biomedical ontologies and astronomy, and is updating to DDI Codebook 2.5 (with support for DDI Lifecycle planned for the future). They are also extending search, data exploration, and analysis for tabular datasets (with histograms, cross-tabs, enhanced descriptive stats, and model selection). In addition, they are extending the Data/Metadata API, the data deposit API, and rich ingest for additional data types.

Local solutions, including formal Institutional Repositories (IRs) and other storage services offered through a variety of campus resources, did not emerge as a popular topic in the posts I received. One librarian commented on the personnel and money that IRs may need in order to deliver strong service for larger deposits.

Steve McGinty

Social Sciences Librarian

University of Massachusetts - Amherst
