We’re gonna need a bigger boat: Scaling up repository support for larger data

As technologies and computational methods continue to improve, researchers are producing both more data and larger datasets. These larger datasets strain the sustainability and capacity of the data repositories that researchers are increasingly expected to use for openly publishing their data. The Texas Data Repository (TDR) is one such repository grappling with these challenges and is, in response, working to refine its service model, technical infrastructure, and data retention policy. The Texas Digital Library, which hosts the Texas Data Repository, strives to develop collaborative solutions and relies on the expertise of its service users to address community needs. Following this approach, the Texas Data Repository Steering Committee’s subgroup for Larger Data has developed recommendations for scaling up support for large datasets while preserving institutional-level control in our multi-institutional repository. In this presentation, we will share these recommendations, the progress made so far, and our strategy for working within the open source Dataverse community to expand the system beyond our own service needs. The material covered here should be of interest not only to managers of other Dataverse instances, but to all who manage data repositories and rely on them for preserving and publishing large datasets.

Michael Shensky
University of Texas at Austin
United States

Courtney Mumma
Texas Digital Library
United States

Laura Sare
Texas A&M University
United States

Robert Kalescky
Southern Methodist University
United States

Millicent Weber
Baylor University
United States

Bryan Gee
University of Texas at Austin
United States