Data management and curation: lessons from government, academia, and research
Michele Hayslett (University of North Carolina at Chapel Hill)
Stefan Kramer (American University)
Dan Gillman (U.S. Bureau of Labor Statistics)
Marcel Hebing (DIW Berlin)
Chuck Humphrey (University of Alberta)
Steven McEachern (Australian Data Archive)
The management, publication, and preservation of datasets have become issues of increasing importance for universities, research institutions, and government agencies. While the reasons and mandates for these activities, and the kinds of datasets collected, differ among these types of institutions, other aspects of data management throughout the research lifecycle concern all of them, including (but not limited to): the discoverability of their data; the choice of metadata standard(s) and the creation of metadata; providing visualization of and interaction with data; the selection and migration of data formats for long-term preservation; policy development; and storage requirements. Yet these types of institutions tend to follow different paths in data management and curation, choosing different infrastructures, metadata standards and platforms. Are these different approaches inevitably rooted in the differences between these types of organizations and their missions and cultures? Or are there lessons they could learn from each other to improve their own practice? The purpose of this symposium is to explore that question. We will begin with presentations, then form breakout groups along the lines of different aspects such as platform choice, policy development, and metadata creation. At the end, all will come back together to share the results of their discussions.
Teaching an introductory workshop in digital preservation
Laurence Horton (The London School of Economics and Political Science)
Alexia Katsanidou (GESIS – Leibniz Institute for the Social Sciences)
GESIS’s Archive and Data Management Training Center provides introductory-level two-day training events in “First steps towards digital preservation”. This workshop is an overview of our training, introducing participants to the design and intended target audience of our events, and showcases our digital preservation support and training. Adopting a “train the trainers” approach, the workshop addresses those interested in conducting organizational-level training: archivists, librarians, repository or research data center staff, and anyone responsible for planning the curation and preservation of digital assets, independent of disciplinary background. Intended as a primer, the workshop requires no previous experience, introducing participants to the “organizational dimension” of digital preservation. Participants have the chance to try our training materials and exercises on:
* What is digital preservation and why do we need it?
* Introduction to the OAIS Reference Model
* Preserving information for a designated community
* Acquisition policies and selection criteria
* Sustainable digital preservation and cost models
* Licensing for preservation and re-use
* Trusted digital repositories
Benefits:
* learning the conceptualization and structure of introductory workshops on digital preservation
* familiarity with the content of the workshop and an overview of workshop materials and exercises
* use of the materials to design their own training workshops.
This workshop will focus on principles and techniques for the visualization of data, with equal emphasis on theory and implementation. Drawing on classic works by Cleveland (Visualizing Data), Tufte (The Visual Display of Quantitative Information), and Wilkinson (The Grammar of Graphics), a range of best practices for visualization will be illustrated. Recently developed techniques for large-scale, 3D, and interactive visualization will also be discussed. This discussion will be based on works such as Graphics of Large Datasets: Visualizing a Million (Unwin, Theus, and Hofmann), the Handbook of Data Visualization (Chen, Härdle, and Unwin), and Trends in Interactive Visualization: A State-of-the-Art Survey (Liere, Adriaansen, and Zudilova-Seinstra). For each of these approaches, methods for creating similar graphics in the R open-source statistical language will be demonstrated, using packages such as ggplot2, lattice, and rggobi. Interactive visualization packages such as shiny and healthvis will also be explored. Prior familiarity with R is helpful but not required.
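As a taste of the layered, grammar-of-graphics approach the workshop draws on, here is a minimal ggplot2 sketch; the dataset (ggplot2's built-in mpg) and the aesthetic choices are illustrative only, not taken from the workshop materials.

    library(ggplot2)  # install.packages("ggplot2") if needed

    # A layered plot in the spirit of Wilkinson's grammar of graphics:
    # points for the raw data, plus a per-class loess smoother.
    p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
      geom_point() +
      geom_smooth(method = "loess", se = FALSE) +
      labs(x = "Engine displacement (litres)",
           y = "Highway miles per gallon")
    print(p)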
Open source geographic information systems (GIS) tools are maturing to the point of being viable for widespread use. QGIS is a cross-platform desktop GIS that is free and extensible and interoperates well with other GIS tools. Data services that support desktop GIS software packages are increasingly supporting QGIS, and ideas for promoting open source desktop GIS use and support will be discussed. In this workshop, participants will gain hands-on experience with QGIS. Exercises will include working with vector and raster data, doing very basic spatial analysis, and producing maps. Participants will leave with resources for learning more about QGIS. No previous GIS experience is required.
Data librarians are often called on to help users generate customized summaries of variables contained in large public datasets. Major data archives such as IPUMS and ICPSR now make many such datasets available for online analysis in SDA. Although users can easily generate simple tables using SDA, more complex analyses often require the use of other analytic procedures and the recoding of variables into more usable categories. The purpose of this workshop is to give data librarians greater facility with the SDA procedures for recoding variables and generating new variables, so that they can produce the customized summaries that are often requested. The generation of subsets of data for input into other analysis systems (like Stata, SAS, and SPSS) will also be covered. Workshop participants will practice using those procedures on U.S. Census data and some international datasets available in the IPUMS archive. Some basic familiarity with SDA will be presumed; however, no special expertise in SDA is required.
Introduction to Terra Populus: integrated data on population and environment
Tracy Kugler (University of Minnesota, Minnesota Population Center)
In this half-day workshop, Tracy Kugler will demonstrate the new Terra Populus data access system. Building on the MPC’s past experience with demographic data infrastructure projects such as IPUMS and NHGIS, Terra Populus seeks to lower the barriers for conducting interdisciplinary human-environment research by making data from different domains easily interoperable. It incorporates a variety of data types, including census microdata, census summary data, and raster data describing land cover, land use, and climate. The data access system allows users to create customized data extracts blending variables from all data types and providing the output in the user’s preferred format. In this workshop, attendees will learn about the content and data processing capabilities of Terra Populus and learn how to obtain and use the data. Attendees will create extracts, download data over the internet, and analyze it in a statistical, spreadsheet, or GIS software package. (Please note: participants must bring their own laptops.)
2014-06-04: Plenaries 2014
Plenary 1
Barbara Entwisle (University of North Carolina at Chapel Hill)
Barbara Entwisle is the Vice Chancellor for Research at the University of North Carolina and a social demographer who has been a data integrator throughout her career of interdisciplinary research. She chaired the OECD Global Science Forum on Data and Research Infrastructure for the Social Sciences and co-edited the final report, “New Data for Understanding the Human Condition: International Perspectives” (which we mentioned in our call for proposals).
Plenary 2
Seamus Ross (University of Toronto)
Seamus Ross, Dean of the Faculty of Information at the University of Toronto, has agreed to speak about the changing nature of research data professionals and their education. We are looking forward to a discussion about the variety of career categories around research data management. Who should be educating these professionals? What role do the iSchools have? How much of the preparation has to come from the current data community?
Plenary 3
Myron Gutmann
Myron Gutmann, former NSF and ICPSR director, will speak about the changing nature of research data infrastructure. Myron has extensive international involvement in shaping today’s social science research infrastructure and will talk about emerging access and analysis technologies, better alignment of the networks of data repositories, and new ways of honouring societal norms around confidentiality while enabling legitimate research.
2014-06-04: 1A: Building data collections
Restoring and preserving historic data for research purposes
Arne Wolters (UK Data Archive)
Matthew Woollard (UK Data Archive)
The Enhancing and Enriching Historic Census Microdata (EEHCM) project is a joint project between the UK Data Archive, the Office for National Statistics (ONS), National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency (NISRA). The project consists of two phases: the first focusses on the restoration and preservation of the 1961, 1966, 1971 and 1981 censuses of Great Britain; the second will create public use and researcher samples for each census. This paper will outline the key issues in the restoration of the original data, which were first read from backup tapes earlier in the millennium, and discuss the key issues surrounding the necessity of documentation for the successful recovery of historic data. It will also discuss the key lessons learned both in restoring files and in preserving and documenting current datasets for future use. The second part of the paper will explain which public use and researcher samples have been made available, and the principles behind their construction.
ClimoBase: lessons learned while rescuing observational data from extinction
Katie Smitt (University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science)
ClimoBase is a dataset comprising approximately 7,000 files of climate-related measurements collected over a 14-year period at sites in Churchill, Manitoba; Marantz Lake, Manitoba; and Inuvik, Northwest Territories. The original dataset was created in 1999 with an extraction program built in Fortran. By 2013, these valuable data were barely machine-readable and in dire need of rescue. This paper will describe the steps that were taken to rescue, migrate and preserve the unique observational data within ClimoBase for future use in climate science, and the many lessons that were learned in the process. It will address best practices for a data rescue workflow and the importance of preserving research datasets and their metadata.
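To illustrate one common step in such a rescue, the hedged R sketch below reads a legacy fixed-width file and re-exports it as CSV with descriptive column names; the file name and column layout are invented for the example, since the abstract does not describe ClimoBase's actual Fortran record structure.

    # Hypothetical fixed-width layout: date (8 chars), site code (6),
    # temperature (6), wind speed (6). Not the real ClimoBase format.
    climo <- read.fwf("churchill_1985.dat",
                      widths = c(8, 6, 6, 6),
                      col.names = c("date", "site", "temp_c", "wind_ms"))

    # Re-export as plain CSV, a far more durable format for preservation,
    # to be accompanied by a human-readable data dictionary.
    write.csv(climo, "churchill_1985.csv", row.names = FALSE)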
Coding + Metadata = datasets: the Amnesty International data and digitization project
Amy Barton (Purdue University)
Ann Marie Clark (Purdue University)
Paul Bracke (Purdue University)
The Amnesty International Research Data and Digital Collection Project is a collaborative venture involving Amnesty International USA (AI-USA) and Purdue University researchers. A Purdue University political scientist, in her role as a member of AI-USA’s Archives Advisory Committee, discovered the need to preserve the early records of AI-USA and, in her professional capacity, was also aware of the research value of the documents. In particular, she was interested in patterns of change in the culture of human rights appeals and legal instrumentation. Through a contractual agreement, the researcher was able to obtain the documents. She was also able to obtain two data sources, both containing data about the Urgent Action Bulletins dating back to 1974. In addition to these data sources, a third data source was created through NVivo coding. This paper will discuss the collaborative relationship with AI-USA, the merging of the data sources, additional metadata development, and the final dataset created for the Purdue researcher. Additionally, three research products resulting from the project will be discussed: a faceted research dataset, a raw dataset inclusive of the merged data sources and metadata, and a human rights controlled vocabulary honed from several existing vocabularies developed for this project.
Overcoming challenges in global health data to empower research: Collection, attribution, and dissemination
Matthew Israelson (University of Washington, Institute for Health Metrics and Evaluation)
The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington. In 2011, IHME launched the Global Health Data Exchange (GHDx), the world’s most comprehensive health data catalog and repository. The GHDx includes entries (records) for surveys, censuses, vital statistics, and many other sources of health-related data, and provides context for those datasets through pages on countries, series and systems, and organizations. For the 40th IASSIST Conference, IHME will present on the development and implementation of the GHDx, showing how the GHDx enables the successful collection, storage, and attribution of social science data and demonstrating visualizations linking input data to research products. This session will cover the challenges of data sensitivity, management, and the “value chain” of data in research. Presenters will further explore how skilled cataloging, technology and data management improve data linkages, standardize attribution, integrate sources, and ensure data discovery for IHME’s researchers, thereby improving IHME’s publicly available estimates and reports.
New and/or unique data-capture practices on a limited budget
Paula Lackie (Carleton College)
This presentation collects four special data projects that exhibit innovation in gathering data from human subjects, each of which has solved, or hopes to solve, some nagging issues on a limited budget:
1) the responsible execution and processing of a survey for native ASL communicators – issues: interpreting cultural differences, applying scientific processes to an interview-style survey, and then converting these data into numbers;
2) innovations in a census of three rural Bengali villages on remittances and perceptions of wealth – issues: survey design and managing local graduate students, mitigating a propensity toward data fabrication, validating data on hand-written survey forms, training and norming processes, and working with little electricity and no internet access;
3) experiences with non-technically inclined researchers conducting interviews and managing their data using smart pens – issues: making the process easy and fruitful while remaining technically responsible for qualitative data analysis;
4) an experimental technique to capture data on paper and convert it to CSV using smart pen technology.
2014-06-04: 1B: Panel. Data Service Infrastructure for the Social Sciences and Humanities
DASISH - data service infrastructure for the Social Sciences and Humanities
Johan Fihn (Swedish National Data Service (SND))
John Shepherdson (Data Archiving and Networked Services (KNAW-DANS))
Vigdis Kvalheim (Norwegian Social Science Data Services (NSD))
Alexia Katsanidou (GESIS – Leibniz Institute for the Social Sciences)
Katrine Utaaker Segadal (Norwegian Social Science Data Services (NSD))
Catharina Wasner (GESIS – Leibniz Institute for the Social Sciences)
We are currently seeing an explosion of research infrastructure initiatives at the domain, local, national and international levels. How does a data archive/repository provide services to multiple infrastructures while aligning the requests with its long-term strategies and responsibilities? How does a research infrastructure ensure viability and long-term sustainability by ensuring that it has an engaged community of service providers as well as users? Five European research infrastructures in the social sciences and humanities have come together in the form of DASISH – Data Service Infrastructure for the Social Sciences and Humanities – to find solutions to some of the prevailing issues that we face as service providers and research infrastructures, and to support shared development. The activities in DASISH are consequently broad and cover areas such as: a reference architecture for research infrastructure alignment, preservation challenges, deposit service convergence, metadata quality improvement, legal and ethical challenges, multilinguality in questionnaire tools and question databanks, discovery of data, tools and services, annotation frameworks, and training workshops for data managers, service providers and those working in infrastructures.
2014-06-04: 1C: Integrated Data Discovery and Access: Building Data Collections
Introducing da|ra SearchNet: the integrated data portal for the social sciences
Tanya Friedrich (GESIS – Leibniz Institute for the Social Sciences)
Brigitte Hausstein (GESIS – Leibniz Institute for the Social Sciences)
Daniel Hienert (GESIS – Leibniz Institute for the Social Sciences)
Data sharing is largely dependent on infrastructure that facilitates the search for and retrieval of research data. Currently, however, the landscape of data repositories and metadata services is uneven and incoherent, even within disciplinary boundaries. In our project da|ra SearchNet we address this problem for the case of the social sciences by designing and implementing an integrated search infrastructure that aims at fostering data sharing within the discipline. We build on the outcomes of the completed project “da|ra – development of a registration agency for social science data”, which already integrates metadata from nearly 20,000 data files and other resources from more than 30 data providers in one database and search application. We plan to extend this service by automating our existing metadata-recording workflows and by incorporating even more metadata from large comparative survey programmes, from data archives around the world, from qualitative data providers, and from other relevant players. Our goal is to establish da|ra SearchNet as a comprehensive, easy-to-use metadata store for secondary researchers. In our presentation we describe in detail how we will approach this task in terms of international networking and cooperation, metadata standardization, and search engine technology.
Archiving, dissemination and reuse of research data enhance the opportunities for secondary analyses on new subjects. By using the DDI-Lifecycle standard, meta-analyses and comparative analyses can be made of data materials from different research areas. At the Danish Data Archive we archive and disseminate health data as well as data from the social sciences. By linking these kinds of data we can gain new knowledge about social impacts on public health. The Danish Data Archive facilitates the use of registries and databases from the health care system and other administrative resources, along with health surveys, and the possibility of linking them to classic social surveys, such as surveys of cultural habits. The different kinds of data material make comparison complicated: data have been collected by different researchers and institutions for different purposes. DDI-Lifecycle’s standardized metadata and comprehensive study descriptions are the key to making linking and comparison possible.
Questasy is a data dissemination tool based on DDI 3, developed at CentERdata. It is written in CakePHP and uses a MySQL database. It supports documentation of longitudinal studies as well as the creation of custom datasets. OAI-PMH support for the harvesting of studies can be easily set up, and the metadata of studies can be exported as DDI 3.1 XML (since version 4.2). Since CakePHP is an MVC framework and supports theming, Questasy can be easily modified to meet specific customer needs. Version 5.0 saw two main features implemented: multilingual support and DDI 3.1 import. For future development we will investigate whether we can implement DDI 3.2 export and import. Further features to be implemented are better documentation of questionnaire routing, summary statistics of variables, and dissemination of datasets and syntax created by researchers.
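For readers unfamiliar with OAI-PMH, the hedged R sketch below shows what harvesting study metadata from such an endpoint looks like; the base URL and the "oai_ddi" metadata prefix are assumptions for illustration, not the configuration of any actual Questasy installation.

    library(httr)   # HTTP client
    library(xml2)   # XML parsing

    # Hypothetical endpoint; OAI-PMH itself standardizes the verb and
    # metadataPrefix parameters used here.
    resp <- GET("https://example.org/questasy/oai",
                query = list(verb = "ListRecords",
                             metadataPrefix = "oai_ddi"))
    doc <- read_xml(content(resp, as = "text", encoding = "UTF-8"))

    # Count the harvested records using the standard OAI-PMH namespace.
    ns <- c(oai = "http://www.openarchives.org/OAI/2.0/")
    length(xml_find_all(doc, "//oai:record", ns))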
A metadata portal for complex research data in Germany: An application of DDI
David Schiller (Institute for Employment Research (IAB))
Dana Mueller (Institute for Employment Research (IAB))
Ingo Barkow (German Institute for International Educational Research (DIPF))
The research data centre of the Federal Employment Agency at the Institute for Employment Research provides different types of data for the scientific community: register data, survey data, and data linking surveys and registers, for example. Available metadata tools do not accommodate the variety of research data at the research data centre, the workflow of data documentation within departments, or the collaboration between departments. We have therefore started an international development project, in cooperation with TBA 21 Assessment Systeme GmbH (Germany), OPIT Consulting Kft. (Hungary) and Colectica (USA), in 2012 to handle and standardize the metadata for all kinds of data in one technical application. Besides the metadata portal, there will be a web application for the data users. The presentation focuses first on the appropriate subset of the DDI standard and second on the application of the metadata portal for users, as well as the workflow within departments (e.g. user administration), rather than on the technical structure behind the metadata portal.
2014-06-04: 4N: Integrated Data Discovery and Access
Creating catalog records that reflect data use restrictions
Michele Hayslett (University of North Carolina at Chapel Hill)
Amanda Henley (University of North Carolina at Chapel Hill)
Wanda Gunther (University of North Carolina at Chapel Hill)
Margaretta Yarborough (University of North Carolina at Chapel Hill)
Joe Collins (University of North Carolina at Chapel Hill)
Many researchers don’t realize the restrictions involved in using licensed data for research. The Data Services staff at UNC-Chapel Hill are working with catalogers to present that information within our catalog records, giving researchers the opportunity to discover restrictions before deciding on a given data set and better aligning the research and data infrastructures. This session will discuss this collaborative effort, the specific MARC fields we’re using, and the implications for sustainability and for outreach (to increase visibility).
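The abstract leaves the specific fields unnamed; in standard MARC practice, access restrictions and use terms are typically carried in the 506 (Restrictions on Access) and 540 (Terms Governing Use and Reproduction) notes, so an illustrative record fragment might look like the following (wording invented for the example):

    506 ## $a Access restricted to current university affiliates under license.
    540 ## $a Data may be used for non-commercial research and teaching only; redistribution prohibited.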
Powerful access to qualitative data: What's behind the UK QualiBank
Darren Bell (UK Data Archive)
In this paper we discuss how we have implemented a digital data browsing system for qualitative data based on highly structured data and metadata. Our system also enables paragraph-level citation. The crux of this exciting project has been the incorporation of object- and sub-object-level metadata using the QuDEx metadata schema in addition to DDI study-level metadata, and the use of the Text Encoding Initiative (TEI) for encoding textual data. The QuDEx schema was initially released in 2006 following a project undertaken by the UK Data Archive and Metadata Technology. QuDEx enables simple description of collections, data objects, and parts of data objects, and captures formal relationships between them as well as analytical elements such as categories, codes and memos. The TEI has further provided a powerful tool for marking up bodies of text to enable rich web display. We will discuss how we have implemented DDI, QuDEx and the TEI in the UK QualiBank browsing system, and describe our use of technologies: an XML database (BaseX) to store and deliver both metadata and textual data, and Solr and XQuery for powerful searching.
A question database for the German Longitudinal Election Study
Wolfgang Zenk-Moeltgen (GESIS - Leibniz Institute for the Social Sciences)
The GLES Question Database contains all the questions from the German Longitudinal Election Study (http://www.gesis.org/en/elections-home/gles/). GLES is a large and ambitious election study in Germany, structured in eleven components conducted in different modes and connected by a common core questionnaire. The GLES Question Database enables users to search for questions and is currently used as an internal tool for the GLES project; a public release is planned for summer 2014. Questions are shown with their answer categories, their variables, and their association with studies, along with some basic study-level information. Studies may also be compared to see differences in methodology and topics covered. The GLES Question Database is a first product based on the STARDAT development shown at IASSIST previously. It is based on DDI-Lifecycle re-usable components that build a basic common infrastructure for several applications. The GLES documentation was imported from a project-specific database into DDI-Lifecycle format and combined with study-level information from the GESIS Data Archive. The presentation will show challenges and solutions in the development of the GLES Question Database and the possibilities for using it in similar data collection projects.
New infrastructure for harmonized longitudinal data with MIDUS and DDI
Jeremy Iverson (Colectica)
Barry Radler (University of Wisconsin)
Dan Smith (Colectica)
Researchers wishing to use data from longitudinal studies or to replicate others’ research must currently navigate thousands of variables across multiple waves and datasets to answer simple analysis questions. A tool that allows researchers to create documented and citable data extracts directly related to their queries would allow more time to be spent on public health research questions instead of data management. MIDUS (Midlife in the United States) is a national longitudinal study of approximately 10,000 Americans designed to study aging as an integrated biopsychosocial process. The study has a unique blend of social, health, and biomarker data collected over several decades. In late 2013, the United States National Institutes of Health funded MIDUS to create a DDI-based, harmonized data extraction system. This tool will facilitate identification and harmonization of similar MIDUS variables, while enhancing the MIDUS online repository with a data extract function. This will accomplish something unprecedented: the ability to obtain customized cross-project downloads of harmonized MIDUS data that are DDI-compliant. Doing so will greatly enhance efficient and effective public use of the large longitudinal and multi-disciplinary datasets that comprise the MIDUS study. This session will discuss project background and demonstrate the current state of the software.
Integrating PROV with DDI: Mechanisms of data discovery within the US Census Bureau
Bill Block (Cornell University)
Warren Brown (Cornell University)
Jeremy Williams (Cornell University)
Lars Vilhuber (Cornell University)
Carl Lagoze (University of Michigan)
Within the United States Census Bureau, datasets are often derived by complex methods that are not always well documented. This derivation process, or provenance, can be hard to understand for a researcher attempting to use or explore a given dataset. Without understanding the provenance of a dataset, it can be impossible to establish whether it is appropriate to use for a given investigation, because its history remains a black box with no way to see inside. The infrastructure upon which the semantic web is built provides a means to label the relationships of social science datasets with logical meaning according to standardized ontologies and controlled vocabularies. This paper outlines the work of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) to integrate provenance metadata encoded according to the W3C PROV ontology with a DDI-based repository, with the aim of making US Census data more discoverable and accessible.
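As a hedged illustration of the approach (not CED2AR's actual data or code), the R sketch below encodes a small derivation chain in the W3C PROV-O vocabulary and asks the classic provenance question with SPARQL; the dataset URIs are invented for the example.

    library(rdflib)  # R bindings to the redland RDF library

    # Illustrative provenance triples; the prov: terms are standard PROV-O.
    ttl <- '
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/census/> .
    ex:public_extract a prov:Entity ;
        prov:wasDerivedFrom ex:master_file ;
        prov:wasGeneratedBy ex:disclosure_review .
    '
    g <- rdf_parse(ttl, format = "turtle")

    # Where did this dataset come from?
    rdf_query(g, '
      PREFIX prov: <http://www.w3.org/ns/prov#>
      SELECT ?dataset ?source
      WHERE { ?dataset prov:wasDerivedFrom ?source . }
    ')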
2014-06-04: 1D: Teaching Data Management and Statistical Literacy
Creating a large-scale collection of genuinely teacher-ready teaching datasets
Bronia Flett (SAGE Publications)
Patrick Brindle (SAGE Publications)
SAGE Publications is beginning to assemble a large collection of hundreds of teaching datasets that will enable faculty, new researchers and students to teach or self-teach across a wide range of analytic techniques. The collection will cover quantitative and qualitative data, but comes with many developmental challenges in terms of working out how best to develop different datasets to showcase markedly different methodological techniques, and to do so in a way that is easy to understand, easy to use and, perhaps, even fun for the user. This paper will summarise the challenges that the editorial team face in pulling together what will be an unprecedentedly large collection of teaching datasets, and will canvass audience feedback on what might be seen as must-haves or best practice from a librarian and library-user perspective. This aims to be an open-minded session and an opportunity for delegates to work closely with a publisher on thinking through some difficult yet important issues at the very start of a new project.
Managing and curating undergraduate-generated qualitative data
Peter Rogers (Colgate University)
This paper reports on a project at Colgate University. The Department of Sociology and Anthropology (SOAN), working with the library and Information Technology Services, is creating an archive for students’ qualitative data. SOAN has clear pedagogical goals for this project. One, allow students doing local research to build upon the work of previous students who have done similar projects and to avoid duplication of effort. Two, make greater use of student data. Often the student who does the data collection spends most of their time on that and has little time left over for data analysis. Three, give students a fuller data collection, analysis, and management experience. I will report upon my experiences as the data librarian on this project and the lessons that have been learned. At the moment, we are using a Dataverse on the Harvard Dataverse Network. This has been easy to create, but it means that we have to conform to certain standards built into this network. There are also challenges associated with the focus on qualitative data which can include interview transcripts, scanned magazine ads, and text from newspapers and other documentary sources.
Developing data literacies for graduate students in the social sciences
Hailey Mooney (Michigan State University)
Jake Carlson (Purdue University)
What competencies in working with data do graduate students in the Social Sciences need to acquire before they graduate? What roles can librarians and other information professionals play in teaching these competencies to graduate students? This paper will report on preliminary findings from an investigation into the data management competencies and skill gaps of graduate students in the social sciences. Building from the work of the Data Information Literacy (DIL) project (http://datainfolit.org), this study uses an interview-based approach to discern how competencies in working with data are understood and valued by graduate students and their faculty advisors. The DIL project identified and employed 12 data competencies as starting points for interviews and for developing educational programming on data literacies for graduate students. As the original DIL project focused on students in five different STEM fields, this extended study into the social sciences (DIL-SS) will allow for comparisons of perceptions and practices between these disciplines. In addition, DIL-SS presents an opportunity to further develop the 12 DIL competencies and test their relevance to educational needs in the social sciences. Our findings will inform the work of librarians and others involved in offering data management education and consulting services in academic settings.
This paper focuses on the use of qualitative research data and related teaching tools to enhance instruction in undergraduate and graduate social science courses. We discuss the pedagogical benefits that using qualitative data in the classroom can provide. First, combining exemplary articles and books with the data that underlie those studies allows for a more effective description of the methods being taught and better demonstration of how they work. Second, with access to data, students can more effectively practice using the methods about which they are learning through applying them in context to real social science problems; they are also introduced to the notion and practice of replication earlier in their academic trajectory. Third, through exposure to “real” qualitative data, students learn about the process of generating data in the context of the use of particular analytic methods, as well as about how to clearly document that generation.
Teaching data literacy skills in a lab environment
Heather Coates (Indiana University-Purdue University Indianapolis (IUPUI))
Equipping researchers with the skills to effectively utilize data in the global data ecosystem requires proficiency with data literacies and electronic resource management. This is a valuable opportunity for libraries to leverage existing expertise and infrastructure to address a significant gap in data literacy education. This session will describe a workshop for developing core skills in data literacy. In light of the significant gap between common practice and the effective strategies emerging from specific research communities, we incorporated elements of a lab format to build proficiency with specific strategies. The lab format is traditionally used for training procedural skills in a controlled setting, which is also appropriate for teaching many daily data management practices. The focus of the curriculum is to teach data management strategies that support data quality, transparency, and re-use. Given the variety of data formats and types used in health and social sciences research, we adopted a skills-based approach that transcends particular domains or methodologies. Attendees applied selected strategies using a combination of their own research projects and a carefully defined case study to build proficiency.
A microdata computation centre for de-centralized data sources
David Schiller (Institute for Employment Research (IAB))
Anja Burghardt (Institute for Employment Research (IAB))
The European Data without Boundaries (DwB) project proposes a Remote Access Network (Eu-RAN) for access to confidential microdata from different sources. A central service hub within the Eu-RAN should host different services that support researchers and research projects. One of the attached services is a Microdata Computation Centre (MiCoCe), designed to enable analysis of distributed data sources. For data security reasons, some of the most interesting data must stay physically in the facilities of the data owner. At the same time, there is an increasing demand for analysing data from different European sources simultaneously. The challenge is therefore twofold: enabling analysis across distributed data sources while fulfilling the requirement of not moving the data. This talk will present the outcomes of a workshop held in Nuremberg, Germany, that focused on this topic. Data managers, IT experts and statisticians discussed approaches that could support scientific research with decentralized data sources; solutions come from the areas of grid computing, federated databases, statistical modelling and so forth. Having a service like MiCoCe would enable the use of already available data sources in Europe and, in addition, make access to Big Data sources possible.
The beginnings of a European remote access network
Anja Burghardt (Institute for Employment Research (IAB))
David Schiller (Institute for Employment Research (IAB))
Richard Welpton (UK Data Archive)
Data about individuals and organisations are routinely collected across the Member States of Europe, through surveys and administrative and financial transactions. Yet access to these micro-data for research purposes, particularly across national borders, is often restricted for confidentiality or legal reasons. Despite the benefits that accrue to society from allowing comparative research to be undertaken using cross-national data sources (such as policy evaluation), researchers face significant barriers in making comparative analyses of data collected in more than one Member State. Legal restrictions on the dissemination and transfer of data, and the consequential cost of visiting the data within its country of origin, prohibit access. Through the “Data without Boundaries” (DwB) initiative, the European Remote Access Network (Eu-RAN) offers a vision of cross-border data access. The realisation of this European data infrastructure dream will evolve step by step. Making the first jump requires a connection between two research data centres. The Research Data Centre (FDZ) at the IAB and the Secure Lab at the UK Data Archive have joined forces to provide connections to each other’s centres: a researcher in the UK can access sensitive German micro-data held at the IAB, and vice versa. In this talk, we present the results of our joint initiative.
Single Point of Access (SPA): A service hub for a remote access network
Anja Burghardt (Institute for Employment Research (IAB))
David Schiller (Institute for Employment Research (IAB))
The European Data without Boundaries (DwB) project proposes a Remote Access Network (Eu-RAN) for access to confidential microdata from different sources. A centralized Single Point of Access (SPA) will make it easier to reach several decentrally organized network points and will ease the work of researchers running projects with data from different data owners within the network. To support sophisticated transnational research, the IT infrastructure of the Eu-RAN contains a service hub attached to the SPA, which provides different tools for accredited researchers. The services offered include, but are not limited to, secured virtual research environments, user account and contract management, interfaces to research data access, text editors, statistical software packages, and tools for cooperation such as forums, wikis, or an instant messaging service. This talk will highlight the added value an SPA will generate for researchers using the Eu-RAN. In addition, outcomes of DwB discussions about detailed requirements and functionalities for such an SPA are presented. Having a centralized access point equipped with the described service hub will ease research with different data sources in Europe and bring together researchers, National Statistical Institutes, data archives, data owners and access facilities by offering the basis for a vibrant community.
A new CESSDA portal for European research data discovery: progress to date
John Shepherdson (UK Data Archive)
Ornulf Risnes (Norwegian Social Science Data Services (NSD))
Pascal Heus (Metadata Technology)
Work package 12 of the Data without Boundaries project (http://www.dwbproject.org) focuses on the development of a one-stop discovery portal for social and economic research data held by agencies across Europe. Implementation is a collaborative effort led by the Norwegian Social Science Data Services, the UK Data Archive and Metadata Technology, with assistance from others. The portal offers researchers an easy-to-use faceted search and browse interface, powered by Solr. Workflow activities (metadata harvesting, quality assurance, storage, versioning, harmonization, indexing and retrieval) are implemented as separate components – accessible via RESTful web services – and orchestrated by an administration component providing a dashboard showing activity health and an error notification mechanism. Summary information (such as metadata quality, geographic coverage, list of providers, and study languages) is visible to all users. The provider portfolio is a value-added service intended to encourage take-up by metadata providers, as it will allow them to see the quality scores for their metadata records, along with information that will help them fix any problems and improve future versions, plus usage statistics. This talk will provide a progress update and a brief demonstration, plus a roadmap for future development within the lifetime of the DwB project.
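To make the search layer concrete, here is a minimal R sketch of the kind of faceted query such a Solr-backed portal issues; the endpoint URL and the facet field name are hypothetical, though q, rows, facet, and facet.field are standard Solr parameters.

    library(httr)

    resp <- GET("https://example.org/dwb-portal/select",
                query = list(q = "unemployment",
                             rows = 10,
                             facet = "true",
                             `facet.field` = "country",
                             wt = "json"))
    results <- content(resp, as = "parsed")
    # In Solr's JSON response, results$facet_counts$facet_fields$country
    # holds the per-country counts that drive a faceted browse interface.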
2014-06-04: 2F: Harmonization, Thesauri and Indexing
Taxonomy / Lexicon project at the US Bureau of Labor Statistics
Daniel Gillman (US Bureau of Labor Statistics)
The US Bureau of Labor Statistics is building a taxonomy and lexicon of the terms and concepts of its technical language. The terms will also be linked to the plain-English words the public tends to use in place of them. The taxonomy will first be organized as a thesaurus, with possible expansion as resources permit and need demands. The lexicon will be an alphabetical listing of the terms. The main drivers for the project are to support 1) the development of an agency-wide series dissemination system, and 2) the tagging of documents, reports, and periodicals. An aim is to ensure consistent retrieval of documents and data. A team was assembled over the summer to address these issues. First, we gathered the terms and underlying concepts describing time series. These will be organized, and concurrently, the plain-English words the public uses will be identified. Finally, terms, concepts, and plain English describing other data will be added. The initial phase of this project is due in January, but additional work and improvements will be necessary. Organizations that build thesauri find the system must be continually maintained, so this will be an ongoing effort.
Linking thesauri: ELSST as a hub for social science data terms
Lorna Balkan (UK Data Archive)
Without controlled index terms, data retrieval within a data catalogue becomes at best hit and miss. The UK Data Archive manages two thesauri: the multilingual ELSST thesaurus, and the monolingual HASSET thesaurus, from which it is derived. Over the last year, through funding from the UK’s ESRC, the Archive has developed both of these thesauri, plus their management applications, and created linkages between them. Extending ISO 25964, the UKDA has developed a way of mapping its social science thesauri to facilitate cross-national data retrieval. It has also created SKOS formats and is developing a new and innovative application for thesaurus management which combines term visualisation with tree structures. The two thesauri now share a clearly defined, common set of core concepts, but have room for divergence. The new application will allow terms to be promoted to the core set, where they exhibit partial or exact equivalence, or be demoted to ‘non-core’. The application allows authorised language equivalents to be added to ELSST terms. Bundled suggestions will also be made for changes to the thesaurus terms or structure, linked to the tree structure. This paper describes the new application and the processes used within the UKDA for thesaurus management.
Improving precision and recall in study retrieval: A concept for thesaurus-based syntactic indexing
Tanja Friedrich (GESIS - Leibniz Institute for the Social Sciences)
Pascal Siegers (GESIS - Leibniz Institute for the Social Sciences)
Current practice in the subject indexing of study descriptions in data catalogues often consists of assigning a limited number of non-linked subject terms. To control for semantic ambiguity and improve recall in retrieval, ideally a thesaurus is used to perform this task. However, this practice does not solve the problem of syntactic ambiguity in subject indexing, which is of particular relevance to questionnaire-based study descriptions. For example, the general terms attitude, behaviour, and experience may co-occur with subject terms like democracy, homosexuality, and religion without its being apparent to which of these subjects the attitudes, behaviours, and experiences refer. This kind of syntactic ambiguity results in imprecise retrieval, in particular when in-depth indexing is employed. We suggest a concept of thesaurus-based syntactic indexing for study descriptions that uses high specificity and pre-coordination of terms with the intention of improving recall as well as precision in retrieval. We are working with role indicators (indicating general or subject terms) and with term linking in order to index concepts at the item or variable level (e.g. ‘democracy: attitude’; ‘homosexuality: behaviour’; ‘religion: experience’). We plan to employ our concept of thesaurus-based syntactic indexing to enable sophisticated retrieval techniques like faceted searching.
The case of CharmStats or how the process of harmonization can document itself using the right tool
Kristi Winters (GESIS - Leibniz Institute for the Social Sciences)
Martin Friedrich (GESIS - Leibniz Institute for the Social Sciences)
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Comparative social researchers often confront the challenge of making key concepts equivalent across different contexts; usual examples are geographical spaces, stretches of time, or different codings. GESIS has developed the software program CharmStats (Coding and Harmonizing of Statistics) to provide social researchers with a structured, documentable and publishable solution to data harmonization. By enabling researchers to simultaneously perform and document the process of variable coding and harmonization, CharmStats produces projects that can be used for publication and citation, and that also provide syntax for proprietary software packages. The presentation reviews the CharmStats software and the logic behind it. We will demonstrate how the application allows users to browse stored harmonization projects. Using the concept ‘Education’ and data from the CSES (Comparative Study of Electoral Systems) as our basis, we will present, step by step, a fully publishable CharmStats project.
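As a hedged illustration of what a documented harmonization boils down to (CharmStats generates syntax for proprietary packages; this is not its output), the R sketch below maps two invented education codings onto a common three-category target variable.

    # Two hypothetical source codings for 'Education' (scheme A: 6-point
    # scale; scheme B: 9-point scale), harmonized to low/medium/high.
    # The cut points are invented for the example, not taken from CSES.
    harmonize_education <- function(code, scheme) {
      if (scheme == "A") {
        cut(code, breaks = c(0, 2, 4, 6), labels = c("low", "medium", "high"))
      } else {
        cut(code, breaks = c(0, 3, 6, 9), labels = c("low", "medium", "high"))
      }
    }
    harmonize_education(c(1, 3, 5), scheme = "A")   # low, medium, high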
With a study collection fully documented in DDI-L, a handful of advantages for dissemination services and support comes along. DDA has built the infrastructure to facilitate dissemination of DDI-L on the web. This infrastructure incorporates:
* multi-faceted search
* a landing page with microformat mark-up prepared for search-engine harvesting
* DDI Codebook rendering
* DDI URN resolution
* API access
These elements are built into one single platform, on which DDI-L collection deployment is a breeze. The infrastructure is published open source and is in production at DDA. We invite you to come and hear about the architecture behind the scenes and see the features the infrastructure offers.
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
Mary Vardigan (University of Michigan, ICPSR)
The DDI specification of the future will be based on an information model. This is a common strategy for standards development, and it offers several benefits: improved communication with other disciplines and standards efforts, flexibility in terms of technical expressions of the model, and streamlined development and maintenance, among others. Goals for the new model-based specification include:
* a robust and persistent information model
* complete data life cycle coverage
* a broadened focus on new research domains
* a simpler specification that is easier to understand and use, including better documentation
Development of the model has already begun with a recent “sprint” held in Germany in October 2013. This session will provide an overview of the modeling project with an emphasis on both content and technical perspectives.
2014-06-04: 2H: RDM across boundaries and disciplines
Tracking the effectiveness of research data management training
Laurence Horton (The London School of Economics and Political Science)
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Astrid Recker (GESIS - Leibniz Institute for the Social Sciences)
Much work has been done on scoping researcher requirements when it comes to Research Data Management (RDM) training and support. However, despite community agreement that implementation of effective RDM techniques, rather than just training, is the critical factor in producing good-quality reusable data, there has been less research on the effectiveness and impact of training and support. This paper contributes to this literature by asking participants in RDM training workshops organized by GESIS – Leibniz Institute for the Social Sciences how, and whether, they implemented knowledge gained from our workshops. GESIS launched its Archive and Data Management Training Center in 2011. The center provides introductory-level RDM training for a pan-European audience of researchers. Since its establishment, the center has held four workshops attended by 60 people from 12 European nations. As part of our workshops we gather profiles of participants — their research type, discipline, and reasons for attending — and initial workshop evaluations, thereby building up a profile of our audience that allows us to gauge the effectiveness of training in ways that could hold wider lessons for the RDM community.
It takes a village: strengthening data management through collaboration with diverse institutional offices
Alicia Hofelich Mohr (University of Minnesota)
Thomas Lindsay (University of Minnesota)
Successful data management involves actions throughout the entire research lifecycle, but often providers of data management services directly interact with researchers only at isolated points in their study. This is a challenge for service providers when trying to help researchers integrate best practices into their workflow, as much of this work takes place in their own offices, labs, or with the help of other specialized support services on campus. Therefore, an important part of promoting data management on campus is to make specialized institutional support offices aware of the impact their services can have on the management of research data. This paper will describe efforts at a large, decentralized, mid-western American university to promote data management practices and awareness by reaching out to offices across campus, including those involved in grants consulting, human subjects protection, data security, survey services, statistical consulting, and technology commercialization. From the perspective of a unit specializing in direct research support, we will also discuss specific ways in which our office is integrating data management into our services. We suggest similar offices at other universities may be overlooked but important allies of libraries in promoting data management practices.
Transcending boundaries: Institutional-level engagement in a transdisciplinary data network at Bielefeld University, Germany
Johanna Vompras (University of Bielefeld)
Jochen Schirrwagen (University of Bielefeld)
Najko Jahn (University of Bielefeld)
A growing number of universities are seeking ways to deal with research data, and some have recently launched policies and dedicated services. But challenges remain in how best to address research data management on a multidisciplinary scale and across the research lifecycle. In this talk, we present our experiences in designing, governing, and maintaining institutional services for research data management at Bielefeld University. They are the result of a bottom-up development strategy in cooperation with scientists, the library, the computing center, and the research administration department. We focus on the support for data management planning, providing a web-based data management plan tool. The tool is template-based and flexible, allowing researchers to cover the specific needs of a project arising from funding requirements, subject matter, or the nature of the project’s data. For data publication, the institutional repository has been extended: data can be published under an Open Database License and is registered with DataCite. For several disciplines, special services exist; for instance, the repository interoperates with research platforms. Federated author profiles make research data easily accessible from project webpages and the staff directory. With this approach, the first requirements from Bielefeld University’s researchers and funders are met.
Local assessment of science RDM practices as examined through journal policies and recommendations
Dylanne Dearborn (University of Toronto)
Steve Marks (University of Toronto)
It is well established that research data management practices vary greatly across areas within the physical and applied sciences. Challenges for librarians and other data service providers include understanding the differences in disciplinary practices and identifying researchers receptive to engaging with RDM services. Prominent journals requiring that supplementary data be made publicly available could be one possible driver of increased interest in the development of institutional support for RDM practices in the sciences. Using this driver as an instrument to identify potential “clients”, we propose a replicable methodology for local assessment by examining high impact journal policies and harvesting institutional publishing data to determine current local RDM practice across disciplines. The results of the data gathered in this analysis can supplement traditional researcher interviews and identify clusters to engage in further RDM discussion or projects, as well as allow targeted services, training and/or advocacy efforts.
Exploring the data management and curation practices of scientists in research labs
Plato Smith II (Florida State University)
This presentation will discuss survey results from Phase 1 of my dissertation research, “Exploring the Data Management and Curation (DMC) Practices of Scientists in Research Labs within a Research University”. Data management and curation (DMC) comprises (1) data management planning, (2) data curation, (3) digital curation, and (4) digital preservation concepts. The survey included a 25-question adapted Data Asset Framework (DAF) questionnaire. The survey was administered in Fall 2013 to five high-profile research labs at Florida State University and to scientists affiliated with the National Science Foundation (NSF) EarthCube project. The survey yielded 107 completes (an 83% completion rate) and 10 scientists who agreed to participate in Phase 2 of my dissertation research, which includes interviews currently in progress. The purpose of this survey is to build a better understanding of research data held in departments across multiple disciplinary domains, to inform strategic planning for data management at research labs at Florida State University, and to inform the wider research data management community. This presentation will present survey results from an adapted DAF survey as part of a mixed-methods research approach investigating the data management and curation practices of scientists in research labs and the NSF EarthCube project.
2014-06-04: 3I: Collaboration and Networked Infrastructure
An introduction to the UK Administrative Data Research Network
Tavi Desai (University of Essex)
The Administrative Data Research Network (ADRN) is a major new investment in UK data access infrastructure. Comprising an Administrative Data Research Centre in each country of the UK and a coordinating body, the Administrative Data Service, the ADRN aims to provide a secure, integrated environment for the analysis of linked administrative data for research. The ADRN was funded in October 2013 as part of a major step change in data access and analysis in the UK which will also include infrastructure and research funding for data from commercial providers, social networks, and the third sector. This presentation will describe the aims and structure of the new Network, outline some of the challenges to be faced, and report on the first few months of progress.
Working with data across the humanities and creative arts: the Humanities Networked Infrastructure (HuNI)
Toby Burrows (University of Western Australia)
The Humanities Networked Infrastructure (HuNI) is one of the national “Virtual Laboratories” which have been developed as part of the Australian government’s National e-Research Collaboration Tools and Resources program. Their aim is to integrate existing capabilities (tools, data and resources), support data-centered workflows, and build virtual communities to address well-defined research problems. HuNI is being developed by a consortium of thirteen institutions, led by Deakin University. It aggregates heterogeneous data from thirty different datasets which have been developed by academic research groups and collecting institutions (libraries, archives, museums and galleries) across the full range of humanities disciplines. The datasets include Design and Art Australia Online, the Australian Dictionary of Biography, AustLit, AusStage, the Dictionary of Sydney, the PARADISEC linguistics archive, and the AUSTLang database. Incoming records are mapped to a common data model but are not merged, so their provenance and distinctive content are retained. HuNI also provides tools for working with the aggregated data. These include tools for data linkage, visualization, and annotation developed at the University of Queensland through the LORE project, and tools for creating and uploading new datasets for aggregation (using the Heurist package developed at the University of Sydney).
Marion Wittenberg (Data Archiving and Networked Services (DANS))
Eric Balster (CentERdata)
Data Archiving and Networked Services (DANS) and CentERdata are working together on Survey Data Netherlands (SDN), a service for the dissemination and long-term preservation of survey data in the Netherlands. The two institutes complement each other in the field of data management for survey data: CentERdata has extensive expertise in the collection, documentation, and dissemination of survey data, while DANS has extensive knowledge and experience in sustainable long-term archiving. Survey Data Netherlands will have a three-layer architecture with a portal that can be searched for survey questions and survey data. This portal will harvest a collection of dedicated survey repositories associated with various projects or institutes. In each repository, the surveys of a project can be stored, documented, and disseminated; for long-term preservation, the content of the repositories will be archived at DANS. This presentation will deal with the various aspects of this collaboration, technical as well as organizational. Issues such as the semi-automatic exchange of data and metadata between the different survey repositories and the DANS Electronic Archiving System, the use of persistent identifiers, and the one-stop principle of Survey Data Netherlands for users of this infrastructure will be discussed.
Wagging the long tail: Current practices and ways forward
Kathleen Shearer (Confederation of Open Access Repositories)
A growing number of institutional services aim to collect research data sets that fall outside the scope of large discipline-specific or government data repositories. Such data sets represent a diversity of formats, may have documentation and curation weaknesses, and are often not easily found. In addition, there are significant variations in approaches to collecting and managing those data. A 2011 survey by Science found that, “even within a single institution there are no standards for storing data, so each lab, or often each fellow, uses ad hoc approaches.” The European Commission notes, “Different institutions archive their research data in different ways – making access difficult from outside the institution.” This paper will present the results of an international survey of research data repositories that collect and manage heterogeneous data sets. The survey will be conducted in January/February 2013 under the auspices of the RDA Long Tail of Research Data Interest Group. The results will provide the broader community with a deeper understanding of the metadata and vocabularies currently in use within these contexts, with the aim of developing community-based recommendations to ensure lightweight interoperability and visibility of this type of research data across the globe.
Building a national trusted digital repository for SSH data
Aileen O'Carroll (Digital Repository of Ireland)
The Digital Repository of Ireland (DRI) is an interactive national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions, providing a central internet access point and interactive multimedia tools for use by the public, students, and scholars. It is a four-year exchequer-funded project comprising six Irish academic partners and is supported by the National Library of Ireland, the National Archives of Ireland (NAI), and the Irish national broadcaster RTÉ. A key task is to link together and preserve the rich data held by Irish institutions; enabling access to and reuse of research data is a central challenge. This paper outlines the key challenges and lessons learned in the process of building this national repository. It argues that a process of qualitative interviews conducted by DRI allowed us to develop a nuanced understanding of the barriers that might limit the sharing of data. An unexpected outcome of this process was that it facilitated community engagement, which in turn assisted in developing the relations of trust that are so central to overcoming barriers to access and data sharing.
2014-06-04: 3J: Developing Meaningful Data Support Roles and Services
What is a Canadian data dude?
Jane Fry (Carleton University)
Using the results of the 2012 DLI (Data Liberation Initiative) Contacts Survey, I will paint a picture of the “data dude”, aka data librarian or data specialist, in a Canadian university or college. The background of the survey will be reviewed briefly, followed by a description of what the data profession entailed in 2012, including: the amount of time spent offering data services; the frequency of data-related questions; the types of instruction-related activities; what other job duties are performed; the level of confidence in helping others with data; where data dudes go to get help with data; and whether or not they receive regular professional development. Suggestions for a picture of the Canadian data dude of the future will also be offered.
Shedding our skins? Reflections on a liaison librarian's attempt to learn how to scrape data from the web using Python
Jeremy Darrington (Princeton University)
In recent years, there has been a sharp increase in the amount of data being created and exposed on the web. Many useful data sources–such as campaign finance and lobbying records, speeches, court rulings, legislative bills, news, geocoded event data, etc.–are of interest to social scientists and the librarians who assist them. These sources come in a variety of formats and often require considerable work to extract, organize, and clean up for analysis. As more of my clientele have expressed interest in using these kinds of sources, I decided to embark on a personal training effort to learn how to scrape and process text from the web using the Python programming language. This paper will present reflections on my experience, the utility of this kind of training for liaison librarians, and issues surrounding professional development and the acquisition of new skills by librarians.
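To give a flavour of the kind of exercise such self-training involves, here is a minimal standard-library scraper that pulls an HTML table into a CSV file ready for cleanup and analysis. This is an editorial sketch, not the author’s own code, and the URL and table layout are hypothetical.

```python
import csv
import urllib.request
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped into rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], [], None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []          # start buffering a cell's text

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None and data.strip():
            self._cell.append(data.strip())

# Hypothetical source page listing, e.g., lobbying records in a table.
html = urllib.request.urlopen("https://example.org/lobbying-records").read()
scraper = TableScraper()
scraper.feed(html.decode("utf-8"))

# Write an analysis-ready CSV of the extracted rows.
with open("records.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(scraper.rows)
```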
Chasing the research question with data: The development of a data assignment for undergraduates
Katharin Peter (University of Southern California)
For many students, the library’s contribution to the research process begins and ends with the literature review. In these cases, librarians might assist students who have pre-defined research questions or broad research topics to locate existing literature, primary sources, and data. However, as studies have shown, students often struggle with the first step of research: developing a viable question. This presents a unique opportunity for librarians and, especially, data librarians to provide guidance while at the same time supporting campus-wide initiatives to improve critical thinking, quantitative reasoning, and information literacy skills. Specifically, a small, data-based assignment for undergraduates can be employed to help students identify a research question. This paper will present the results of a series of pilot projects in which undergraduate international relations students were tasked with developing research proposals based on exploratory analyses of statistical databases from international and non-governmental organizations. The paper will discuss the development, outcomes, and assessment of the data assignments piloted over the past two years, including traditional lectures, hands-on data labs, and online modules.
From data to the creation of meaning part I: unit of analysis as epistemological problem
Justin Joque (University of Michigan)
Aligning data and research infrastructure is, as we are often reminded in our work, an incredibly difficult process. While we often focus on research lifecycles, incentives, storage and transmission technologies, metadata, and data sharing, we tend to overlook the epistemological incongruities of diverse research and data practices. All data creation processes, even if unknowingly, make assumptions about the world and about what exists as a unique unit that can be analyzed. In attempting to make data meaningful to different audiences, especially across disciplines, we must pay attention to these epistemological assumptions; failure to do so will inevitably frustrate our attempts to develop meaningful infrastructure for research data and may even undermine effective research through misunderstandings of data. Taking spatial data as its example, this presentation will explore issues of resolution and unit of analysis as instances of such disciplinary epistemological assumptions. Based on experiences working with spatial data across disciplines, it will examine some of the misunderstandings that arise and suggest ways in which they are indicative of larger issues surrounding data collection, management, and interpretation.
From data to the creation of meaning part II: data librarian as translator
Kristin Partlo (Carleton College)
While institutions, methodology and geography all present barriers for communication and development of infrastructure, sometimes the greatest barriers may be in reaching not across the world but across the hallway. Engaging in the work of unified infrastructure requires finding language that bridges modes of inquiry and meaning, so that all participants see their place in the whole. This work of finding shared language involves translation at many levels. Data professionals know that not everyone means the same thing by ‘data’ and increasingly we seek language that spans the practices of social science, sciences, humanities, and performing arts. This paper aims to highlight some of the ways in which data professionals are already adept at translation. Drawing on examples from work as a subject librarian and data professional at an undergraduate institution, I will elaborate on ways in which translation permeates our daily work, from helping new researchers learn the language and methods of a field, to supporting faculty as they expand their teaching and research across disciplines. Additionally, our role as semi-outsiders within the institution situates us well to help drive conversations spanning disciplinary modes of thinking, in which faculty may also find themselves as semi-outsiders.
Arne Wolters (UK Data Archive, University of Essex)
Jo Wathan (University of Manchester)
Using UK census data from 1961-1981 as a case study, this paper argues that the age of data limits both the availability of matching data and the disclosure risk through spontaneous recognition. Dissemination of data from which individual identity can be deduced, whether from the data alone or in combination with published information, is prohibited under the Statistics and Registration Service Act (SRSA) in the UK and by similar legislation elsewhere. Disclosure can arise from matching data on common keys; from spontaneous recognition of oneself, of acquaintances, or of those in the spotlight; or from recognizing identifiable combinations of characteristics known to an intruder. The factors contributing to reduced disclosure risk are: (i) mortality of data subjects (using the English Life Tables, this paper estimates mortality rates for data subjects in the UK census data); (ii) lack of availability of matching data, since organisations are limited by law in how long they can keep personal data; (iii) memory loss, i.e. forgetting one’s own past as well as that of other data subjects; and (iv) churn, the tendency for data to become outdated as the circumstances of data subjects change. Arguably, these factors are independent, and their effects on disclosure risk are therefore additive.
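A toy calculation makes the additivity argument concrete. The factor weights below are invented for illustration only; the paper derives its actual mortality estimates from the English Life Tables.

```python
baseline_risk = 1.0  # normalised disclosure risk at the time of release

# Hypothetical proportional reductions attributed to the four factors
# named above; the values are illustrative, not taken from the paper.
reductions = {
    "mortality of data subjects": 0.35,
    "lack of matching data (statutory retention limits)": 0.25,
    "memory loss": 0.15,
    "churn (outdated circumstances)": 0.15,
}

# If the factors are independent, their effects add, so the residual
# risk is the baseline minus the summed reductions (floored at zero).
residual = max(0.0, baseline_risk - sum(reductions.values()))
print(f"residual disclosure risk: {residual:.0%} of the risk at release")
```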
Laurence Horton (The London School of Economics and Political Science)
Katharina Kinder-Kurlanda (GESIS – Leibniz Institute for the Social Sciences)
In this paper we investigate the effects of differing legal requirements and attitudes towards data protection, intellectual property, and ethical review processes and procedures in Germany and the UK, to illustrate the divergence in regulation. We focus especially on formal and informal ways of sharing data across disciplines and countries, highlighting paths to ethical and responsible data sharing. Data archives often provide discipline-specialized research data management support services; one aspect of that support is addressing legal data protection requirements, intellectual property standards, and ethical research practices. Researchers encounter laws concerning data protection and intellectual property throughout the data lifecycle, but established research practices can clash with new legal frameworks (see the debate over the proposed reform of European Union data protection rules). Likewise, changing public attitudes to research and notions of consent can affect conceptions of what constitutes ethical research. Archives can help researchers navigate an environment which simultaneously pushes for data sharing and for consideration of the individual’s right to privacy and protection. Doing so, however, often requires negotiating the gaps and contradictions of different national laws, research cultures, and funding environments.
Building a cross border data access system for improved scholarship and policy: the case of the German IAB network of RDCs
Joerg Heining (Institute for Employment Research (IAB))
Warren Brown (Cornell University)
Bill Block (Cornell University)
Stefan Bender (Institute for Employment Research (IAB))
The Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB) provides data on individuals, households, and establishments, as well as data that combine establishment and personal information. The FDZ data originate from three different sources: process-generated data are obtained from the notification process of the social security system and from the internal procedures of the Federal Employment Agency, and the IAB also acquires data by conducting its own surveys. The use of weakly anonymized data is subject to restrictions under data protection legislation; due to these regulations, the data can be analyzed only on site. For this purpose, the FDZ provides separate workplaces for guest researchers in Nuremberg and at further locations in Germany and the USA. As this network of research data centers is built out, the expectation is that improved access to restricted data, managed with a high degree of data security, will yield benefits to policy makers as well as to scholars. These benefits, and the means for documenting them, are a central focus of the paper.
Advancing access to restricted data: compliance, regulations, continuous monitoring….Oh my!!
Janet Heslop (Cornell University, Cornell Institute of Social and Economic Research (CISER))
In this session I will demonstrate the restricted-access research computing infrastructure (CRADC, the Cornell Restricted Access Data Center) that CISER (Cornell Institute of Social and Economic Research) has built to serve Cornell social science researchers and their collaborators worldwide. CRADC is based upon a local private-cloud service running on state-of-the-art computing systems that provides researchers (individuals or teams) with an environment where they can perform basic-to-complex, highly secure computing tasks. The private-cloud service is accessible via remote access according to the researcher’s data provider agreement. In this presentation I will show how and why CRADC’s clientele continues to grow rapidly; discuss compliance, regulations, and standards based upon a specific data provider agreement; and review the need for continuous monitoring. Topics will additionally include a description of the infrastructure, the value of shared economies of scale, the straightforward access to restricted data, and the administration/support services required to keep CRADC known as “the gateway to restricted data” at Cornell University.
Re-using qualitative data: Qualitative researchers' understanding of, and attitudes toward, data re-use
Ayoung Yoon (University of North Carolina at Chapel Hill)
In recent years, data sharing and data reuse in scientific research have been discussed with increasing frequency, driven by the turn towards data-intensive research and the growth of data in “big science”. Discussions regarding qualitative data sharing and reuse have also been very active, especially in Europe, where data archiving practices are well established. There has been significant interest in issues related to the reuse of qualitative data, and organizations such as the Social Research Council in the UK and the Australian Research Council have been trying to support this interest. However, the discussions do not seem to be as prominent in the United States, despite its long history of depositing and curating scientific data. This study aims to understand researchers’ thoughts and perceptions regarding qualitative data reuse in the social sciences in the United States. In particular, it focuses on the barriers or hindrances to reusing qualitative data, the appropriate conditions for reusing qualitative data, and the perspectives of qualitative researchers on data reuse. Preliminary results from in-depth interviews with qualitative researchers in social science will be presented.
Discovering and accessing Social Science data of East Asian countries: Trends and obstacles
Jungwon Yang (University of Michigan)
The increasing use of geographic information systems (GIS), combined with the wider availability of census and economic statistics, has recently opened up new possibilities for more interdisciplinary academic research in the social sciences. Social science researchers have become more and more interested in combining geographic analysis with traditional quantitative and statistical methods in order to test hypotheses and present arguments in more effective ways. Yet discovering, accessing, and using international geospatial data and statistics remains a challenge for researchers: as an OECD Global Science Forum report (2013) noted, information about the existence of microdata and their availability for re-use is often difficult to find, and language, legal, cultural, and technological obstacles often exacerbate the difficulty of re-using the data once discovered. In this paper, I review what kinds of new data on East Asian countries have recently been developed by national statistical agencies, government departments, and academic institutions, and investigate the obstacles researchers have encountered in using these data in their research.
Data sharing and citation practices: An application of the theory of planned behaviour to social science research practice
Steven McEachern (Australian Data Archive)
Janet McDougall (Australian Data Archive)
This paper will showcase the results of a recent project run by the Australian Data Archive that aims to better understand the data sharing and data citation behaviours and attitudes of social science researchers in Australia and internationally. We define these two behaviours as follows: (a) data sharing: “the voluntary provision of information from one individual or institution to another for purposes of legitimate scientific research” (Boruch, 1985); (b) data citation: “the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers” (ANDS, 2011). Drawing on the theory of planned behaviour (Ajzen, 1991), the study explores researchers’ current and intended data sharing and data citation practices, attitudes towards each behaviour, and perceived social and institutional barriers and supports for data sharing and data citation within their organisation and discipline. The paper will present the results of the ADA survey and compare them with two recent US studies to explore cross-national similarities and differences in data sharing and citation behaviour.
Methodology and outcomes of using the Data Seal of Approval to benchmark and guide trust issues in an evolving European Research Data Infrastructure
Herve L'Hours (UK Data Archive)
This paper will outline the methodology and processes used to develop and define the requirements for trust in the emerging Consortium of European Social Science Data Archives (CESSDA). As CESSDA evolves from a “Council” to a “Consortium”, one goal is to ensure that national service providers can align to the same (or similar) practices and standards to maximise cross-national efficiency and interoperability. A key part of this collaboration is trust. The 16 guidelines of the Data Seal of Approval were used to benchmark current practice, identify potential gaps, and support the development of mutual trust structures. The paper outlines the progress to date on this project and discusses the potential impact of audit and certification of trusted digital repositories on data management within this research infrastructure.
Using identifiers to connect researchers, authors and contributors with their research data
Elizabeth Newbold (The British Library)
The ORCiD and DataCite Interoperability Network (ODIN) project is a collaboration between The British Library, CERN, ORCiD, DataCite, Dryad, arXiv, and the Australian National Data Service, with the aim of using persistent, open, and interoperable identifiers for people and for datasets to connect researchers, authors, and contributors with their research data. This paper will outline the key results from the first year of the project, including the proofs of concept in the Humanities and Social Sciences (HSS) and High Energy Physics (HEP), and will outline the commonalities between these very different disciplines that will inform a way forward for implementing an identifier ecosystem across disciplines. The paper will highlight the technical work already performed in making the identifier systems interoperable and will showcase value-added services built on the open APIs provided by the identifier systems. It will also outline gaps that currently hinder the adoption of such systems and a way forward for libraries and data centres to help implement these initiatives. The ODIN project ran a session at IASSIST 2013 and is grateful for the opportunity to update the community on the progress made in its first full year.
Rich metadata from Blaise
Beth-ellen Pennell (University of Michigan)
Gina Cheung ()
This presentation will discuss the process and challenges faced during the harmonization and preparation of the metadata and data files of the Collaborative Psychiatric Epidemiology Surveys (CPES), http://www.icpsr.umich.edu/CPES/index.html. The CPES joins together three nationally representative surveys of adults living in the United States: the National Comorbidity Survey Replication, the National Survey of American Life, and the National Latino and Asian American Study. These data were collected face-to-face using the Blaise software, and the Blaise data were transformed into XML capturing the rich metadata available in Blaise data models. The combined CPES dataset contains approximately 20,000 interviews. The initial combined dataset had 9,400 raw variables distributed over 92 sections of the three surveys; the final dataset contains approximately 5,600 harmonized variables, 400 constructed variables, and 14 separate weights. The website contains rich metadata, including an interactive cross-walk of all harmonized variables with question text in 5 languages, response options, missing data codes, descriptive statistics (frequencies, etc.), universes, and detailed documentation of all constructed variables, among a wide variety of other products. These products will be discussed in light of the upcoming release of DDI 3.2 and version 5 of Blaise.
DDI Handbook: Overview and examples of recommended best practices
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
This session introduces the DDI Handbook Project. Building upon previous efforts at various DDI workshops, the project will produce a collection of best practices on using DDI. These best practice descriptions will be modular with a homogeneous format, allowing reorganization in multiple ways; the primary structure for the collection will be organized in alignment with the DDI Lifecycle. A goal is to involve the DDI community in producing a shared body of resources for all organizations and individuals using the DDI specification. Best practices will be reviewed by a team of editors and reviewers and published on a dedicated website. The presentation will describe the overall project and some of the specific best practices which will appear in the initial collection, including guidelines for archives introducing DDI into their workflow and for institutions already using DDI Codebook that are shifting some of their workflow to DDI Lifecycle. Another area will be utilizing DDI for data discovery.
2014-06-05: 5Q: Community Source meets Open Source
Community source meets open source: An inspired approach to collaborative funding
Carla Graebner (Simon Fraser University)
Allen Bell (University of British Columbia)
Alex Garnett (Simon Fraser University)
Geoff Harder (University of Alberta)
The University of British Columbia Library, the University of Alberta Library, and Simon Fraser University Library are collaborating with a local open-source developer of digital collections and archival software platforms in order to meet their respective needs for new digital repository initiatives at all three sites. The projects present an intersection of open and community source development. Although each institution has some local development expertise with which to take full advantage of open-source software, the collaboration remains novel in respect to its funding model, which allows each institution to contribute its own site-specific requirements and benefit from the resulting common architecture. The partners will discuss their experiences and expected outcomes from this collaboration, touching briefly on their own implementation plans: Simon Fraser University is developing a Research Data Repository (RaDar), the University of British Columbia is creating a digital preservation program which includes integration with cIRcle, and the University of Alberta continues to develop the Canadian Polar Data Network along with other collaborations.
2014-06-05: 5R: Innovative Approaches to Promoting Transparency in Research
Data access and research transparency in political science
Colin Elman (Syracuse University)
In a range of academic disciplines, including the social sciences, research transparency is increasingly seen as an indispensable element of credible inquiry and rigorous analysis, and hence as essential to making and demonstrating scientific progress. As part of this movement, the American Political Science Association (APSA) assembled a Data Access and Research Transparency (DA-RT) Ad Hoc Committee and tasked it with developing standards to encourage openness in the discipline. DA-RT standards call on political scientists to show the information underpinning evidence-based claims, describe how that evidence was collected (if a scholar collected it him/herself), and demonstrate how that evidence supports empirical claims and conclusions. Of course, the discipline of political science includes diverse research traditions with very different views on how social inquiry is best conducted, and its commitment to transparency is not unique to one episteme. Accordingly, it was understood from the outset that DA-RT cannot be a one-size-fits-all proposition: while APSA’s new standards apply across the research traditions, those traditions will instantiate them differently. In order to begin the conversation about how to do so, APSA authorized the DA-RT Ad Hoc Committee to develop finer-grained guidance for DA-RT in different research traditions. These draft guidelines, it is hoped, will serve as a jumping-off point for rich and fruitful discussions about the practices and promise of research transparency in the discipline.
Badges to acknowledge open practices
Andrew Sallans (Centre for Open Science)
Openness is a core value of scientific practice. There is no central authority determining the validity of scientific claims. Accumulation of scientific knowledge proceeds via open communication with the community. Sharing evidence for scientific claims facilitates critique, extension, and application. Despite the importance of open communication for scientific progress, present norms do not provide strong incentives for individual researchers to share data, materials, or their research process. As an example, journals can provide such incentives by acknowledging open practices with badges in publications. Badges do not define good practice; badges certify that a particular practice was followed. This talk will introduce this strategy and the initial three open practices badges specified by the committee of open science community members: Open Data, Open Materials, and Preregistration. It will also include examples of implementation and community reaction.
Over the last decade, the Data Science team at Harvard’s Institute for Quantitative Social Science has been iteratively developing Dataverse, a data repository framework to facilitate and enhance research transparency through data sharing, preservation, citation, reuse and analysis. During the last two years, the group has implemented extensible data publishing workflows and effective ways to link articles to data. This talk will focus on the latest data publishing workflows and data publishing for sensitive data.
A playbook on obtaining funding to archive a prominent longitudinal study
Chiu-chuang Chou (University of Wisconsin-Madison, Center for Demography of Health and Aging)
The Center for Demography of Health and Aging (CDHA) at the University of Wisconsin-Madison recently received a small research grant (R03AG045503) from the National Institute on Aging (NIA) to archive three waves of the National Survey of Families and Households (NSFH). The project will evaluate, organize, and prepare all public-use data and documentation files from the NSFH project website (http://www.ssc.wisc.edu/nsfh) for deposit in publicly accessible archives. I will share our experience in writing grant proposals and describe NIA’s evaluation procedure for R03 grant proposals; the methods and specific aims of the project will be described as well.
What is the place for data visualization in information literacy? Data librarians are typically expert in explaining the use of specific databases and software tools to locate and analyze information, but data visualization outside of the specialized context of GIS is typically given short shrift. This Pecha Kucha illustrates the elements of data visualization best practice that deserve to be made part of the data librarian’s standard teaching toolkit, and also makes recommendations for which aspects of data visualization are valuable for general inclusion in information literacy goals.
Andreas Perret (FORS, Swiss Center of Expertise in the Social Sciences)
Switzerland has been running a research inventory in the social sciences for the last 20 years, but little has been published on its contents. We chose to explore this dataset using the open-source network graphing tool Gephi, as well as the Sci2 visualization toolkit for scientometrics made available by Indiana University (and introduced in a now famous MOOC on information visualization). The first results paint an interesting picture of the collaboration and financing flows within the country, and also show the limits of analysing research abstracts with tools built for an Anglo-American context: research in Switzerland is described in English, German, French, and sometimes Italian, with frequent inclusion of foreign expressions. Such situations certainly occur in other countries, and we intend to explore ways to solve these issues. Our aim is also to share some of the lessons learned in this work, among them the use of SQL queries to build the input data and the effects on the visual outputs of choices made while processing the data. In the absence of proper documentation, these choices turn scientific visualizations into black boxes that become a new challenge for the curious scientist.
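To illustrate the SQL-to-Gephi step mentioned above, the sketch below builds a weighted collaboration edge list from a relational inventory. The database schema and file names are hypothetical; the only firm assumption is that Gephi’s spreadsheet importer accepts CSV files with Source/Target/Weight columns.

```python
import csv
import sqlite3

# Hypothetical inventory database with a project_members table
# (columns: project_id, institution).
conn = sqlite3.connect("research_inventory.db")

# Pair institutions that share a project; the inequality avoids
# self-pairs and counts each undirected pair only once.
edges = conn.execute(
    """
    SELECT a.institution AS source, b.institution AS target, COUNT(*) AS weight
    FROM project_members a
    JOIN project_members b
      ON a.project_id = b.project_id AND a.institution < b.institution
    GROUP BY a.institution, b.institution
    """
).fetchall()

# Write headers Gephi's importer recognises, then the edge rows.
with open("edges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
```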
When I grow up I want to be a data scientist/ work in policy-related research/ make a difference. Can you help me?
Jackie Carter (University of Manchester)
In 2013, at The Campaign for Social Science event in the UK, David Willetts, Minister for Higher Education and Skills, pronounced that in a world of increasing volumes of social data there is an urgent need to “have properly qualified people to exploit and use the data”. At present [the UK has] a serious shortage of social science graduates with the right quantitative skills to evaluate evidence and analyse data. This is not a new finding: significant and shared efforts in the last decade have resulted in a large national initiative being funded to tackle this problem in the UK (Q-Step, Nuffield 2013). This Pecha Kucha will highlight the void in the teaching of quantitative social science and show how we (the ESSTED team at Manchester) are addressing the challenge of embedding numbers in the social science curriculum. A variety of techniques have been adopted and trialled: using real-world survey data in the classroom alongside making students part of the dataset; flipping lectures; and adopting the mantra of ‘practice’ for what is a real-world and employable skill. The presentation will offer results based on the evidence collated, conclusions based on our experiences, and ideas about how this is informing the University of Manchester’s Q-Step centre.
Exploring how to raise awareness of a data service through academic libraries in the UK
Margherita Ceraolo (UK Data Service)
The UK Data Service, as part of its marketing strategy, is investigating how to raise awareness of the service through academic libraries. This research, conducted by an intern, takes the form of a case study exploring how to embed the UK Data Service within four UK academic libraries. The institutions were selected based on data usage, and the methodology is qualitative: data were collected through semi-structured interviews. Background information gathered about the libraries’ structures and strategies suggests that the need to justify expenditure shapes which data providers libraries support, posing a challenge for free-at-the-point-of-use (“free”) data services: it could result in less focus on support in the form of training through tutorials or focus groups. The question that arises is whether “free” services receive less library support than fee-based services because of the need to justify membership expenditures. This presentation explores how data services can improve their interactions with academic libraries in the UK. By closely examining the strategies of four UK universities, it sheds light on the challenges of raising awareness of a data provider like the UK Data Service through collaboration with libraries, and considers whether the study’s findings can be applied to other institutions.
Streamlining the research data archival process at Johns Hopkins University
Jonathan Peters (Johns Hopkins University)
Among other research data management services, Johns Hopkins University Data Management Services (JHUDMS) provides its researchers the opportunity to preserve and share their data through the JHU Data Archive. This archive is a research-data-specific repository that can host a wide variety of quantitative and qualitative data and is both format- and discipline-agnostic. JHUDMS has begun archiving research data originating from two NSF-funded engineering research projects. The archiving process has begun with data associated with publications, which may become a typical model for library research data archives. These efforts have been an opportunity to understand the time and effort required for activities that can add value to a research data collection (e.g. discussions with the researcher, development of data flow diagrams, migration of data to non-proprietary formats). We will discuss these collections and the steps taken to create them. It is of benefit to both the researchers and JHUDMS to scope and streamline the archiving process with publication-associated research data in mind. We will discuss our current understanding of the most effective curation activities we can efficiently accomplish, the parameters for those activities, and the elements perceived by researchers to be most valuable.
The School of Information and the Institute for Social Research at the University of Michigan sponsored a Data Dive, a 30-hour hackathon-style service event to help three local non-profits make sense of the data in their administrative records. The volunteer data scientists, coders, designers, and consultants chose which project to work on; the needs of the non-profits ranged from visualizations to analysis. As a participant observer I will report on (a) how the non-profits compare to our normal consultations; (b) who the hackers are: potential IASSISTers, the future of IASSIST, or something else; (c) what tools the hackers used; (d) what parts of the Data Dive were most like an IASSIST conference; and (e) what it takes to put on a Data Dive besides cool dry-erase table tops.
DDI3 metrics
Claude Gierl (Centre for Longitudinal Studies, Institute of Education)
Jon Johnson (Centre for Longitudinal Studies, Institute of Education)
The Centre for Longitudinal Studies (CLS) and the CLOSER (Cohorts and Longitudinal Studies Enhancement Resources) programme are, in the United Kingdom, in the process of translating legacy paper questionnaires to electronic format on a large scale. The ultimate goal of this programme is to gather the metadata of nine birth cohort studies into a joint DDI3 repository for online searching. In order to monitor the progress of this operation, we define metrics which reflect the characteristics of the DDI3 schema, in particular its heavy reliance on references, whereby, wherever possible, a DDI3 element is defined once and reused by reference wherever it is required. We measure the volume of the metadata ingested with a Cell Count of the DDI3 XML elements produced, and the quality of the metadata with the Synaptic Density, the ratio of the DDI3 references over the Cell Count. The Synaptic Density is expected to play a dual role over the course of the ingestion: in the earlier phases, it measures the lack of redundancy and monitors the cleaning and deduplication processes; in the more mature phases of the repository, it is expected to rise again as re-use, derivations, and harmonisations gradually interweave and enrich the metadata.
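Read literally, the two metrics are straightforward to compute. The sketch below assumes that the Cell Count is the total number of XML elements in an instance document and that DDI3 reference elements can be recognised by tag names ending in “Reference”; both are simplifying assumptions about the instance documents, not the authors’ implementation.

```python
import xml.etree.ElementTree as ET

def ddi3_metrics(path: str) -> tuple[int, float]:
    """Return (cell_count, synaptic_density) for one DDI3 instance file."""
    root = ET.parse(path).getroot()
    elements = list(root.iter())
    cell_count = len(elements)  # volume of ingested metadata
    # Assumption: reference elements have local names ending in 'Reference';
    # the split strips any XML namespace prefix from the tag.
    references = sum(
        1 for e in elements if e.tag.split("}")[-1].endswith("Reference")
    )
    synaptic_density = references / cell_count  # quality / re-use measure
    return cell_count, synaptic_density

# Hypothetical instance document name for illustration.
cells, density = ddi3_metrics("cohort_study.ddi3.xml")
print(f"Cell Count: {cells}, Synaptic Density: {density:.3f}")
```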
In this presentation we will introduce ‘UKDS.Stat’, the new data delivery platform for international macrodata at the UK Data Service. We will describe the exciting new features of UKDS.Stat, including: fully integrated metadata; data visualisation; search across all datasets in one platform; the ability to save and share data subsets as queries; and the ability to combine data from different datasets. We will then describe how we have made it possible for users to access both open and protected data within the same platform. Finally, we will illustrate how our implementation of authentication for UKDS.Stat was shared with the Statistical Information Systems Collaboration Community (SIS-CC), a group of organisations co-developing ‘.Stat’. The OECD-led SIS-CC was set up so that members could benefit from a broad collaboration, sharing experiences, knowledge, and best practices, and to enable cost-effective innovation in minimal time. Members include the International Monetary Fund, the European Commission, the Australian Bureau of Statistics, and UNESCO.
All [global] data is local: How academic libraries are enabling discovery and access for institutional data collections
Terrence Bennett (College of New Jersey)
Shawn W. Nicholson (Michigan State University)
This poster will report on a methodical analysis of the role and function of academic libraries in support of the global data ecosystem. Specifically, we’ll examine how academic libraries (particularly at large research universities in North America and the UK, where data curation and research data services are functions that have become embedded into the research infrastructure) are enabling cross-institutional and interdisciplinary discovery and use of locally produced research data collections. The analysis will also consider how (or if) academic libraries are positioning their local holdings–particularly digital texts, image files, audio archives, and other non-numeric collections–as research datasets (rather than as artifacts of limited local and/or historical interest); and, more importantly, how libraries are making these datasets discoverable by researchers.
This poster presents an overview of the new Administrative Data Research Network being established in the UK. The Network aims to build a world-leading infrastructure for enabling research access to data which is routinely collected by UK and devolved government departments and agencies. The Network highlights the importance of linked administrative data in answering key policy-related research questions, and aims to provide a streamlined and coherent pathway through the disparate and difficult requirements for research access to these kinds of resources – everything from data negotiation to ethical and governance review to data security requirements, secure data analysis facilities, and statistical disclosure control. The poster will highlight the network brand and upcoming website launch and present our continuing development plans.
Working across boundaries: Public and private domains
Flavio Bonifacio (Metis Ricerche)
This paper will illustrate some problems that arise when building the infrastructure necessary for data preservation and data reuse across the public and private domains. As a private initiative, we are now building a new Nesstar-based archive in order to preserve the data collections of public and private research centers. For this purpose, the old separation between the academic and the “normal” world creates trouble and appears anachronistic and obsolete, especially when working in the context of the global data ecosystem.
Jonathan Crabtree (University of North Carolina, Chapel Hill, Odum Institute; United States of America, Library of Congress; NDSA)
The vast amount of research data generated across the globe demands a collaborative approach to digital curation and research data infrastructures. The social science community has enormous knowledge to share in this field; at the same time, we have dynamic needs that stretch our resources. A powerful approach is to exchange knowledge and practices with other data curation and preservation professionals in a joint effort to improve our results. The National Agenda for Digital Stewardship, produced annually by the NDSA with the perspectives of dozens of experts and hundreds of institutions convened through the Library of Congress, identifies the highest-impact opportunities to advance the state of the art, the state of practice, and the state of collaboration within the next 3-5 years. The poster highlights emerging technological trends, identifies gaps in digital stewardship capacity, and provides funders and decision-makers with insight into the work needed to ensure that today’s valuable digital content remains accessible and comprehensible in the future, supporting a thriving economy, a robust democracy, and a rich cultural heritage. Founded in 2010, the National Digital Stewardship Alliance (NDSA) is a consortium of more than 160 organizations committed to the long-term preservation of digital information.
CRADLE: Curating research assets and data using lifecycle education
Thu-Mai Christian (University of North Carolina, Chapel Hill, Odum Institute)
Helen Tibbo (University of North Carolina, Chapel Hill)
Jonathan Crabtree (University of North Carolina, Chapel Hill)
Michele Hayslett (University of North Carolina, Chapel Hill)
Barrie Hayes (University of North Carolina, Chapel Hill)
Paul Mihas (University of North Carolina, Chapel Hill)
The proposed poster will highlight the Curating Research Assets and Data Using Lifecycle Education (CRADLE) Project. An IMLS-funded project, CRADLE will produce high-quality massive open online courses (MOOCs) and face-to-face workshops focused on data management best practices, both for researchers and for the information professionals who support them. These products will be placed at the center of a broader initiative to establish networks of data management education and practice. For the CRADLE project to be sustainable, the MOOC and other materials will serve as a nexus from which such networks can emerge. These networks are indispensable for aligning efforts to promote standards of practice and to shift the culture towards one that recognizes research data as valued assets critical to the research enterprise.
A data librarian’s dream come true: Data access made effortless
Jane Fry (Carleton University)
Alexandra Cooper (Queen's University)
A collaborative effort among the Ontario data community created ODESI (Ontario Data Documentation, Extraction Service and Infrastructure), a data portal. This revolutionary process has transformed data access for researchers, empowering them with ready access to this vital data library service. The poster illustrates the cumbersome process that was necessary to obtain data 15 years ago and compares it with the quick, innovative way data are obtained today, which frees up the data librarian’s time to educate researchers in the intricacies of data literacy.
The Data Seal of Approval (DSA) provides a mechanism for repositories to demonstrate their trustworthiness in a transparent way. The assessment criteria are made up of 16 guidelines for which repositories supply evidence of compliance. Once the self-assessment is complete, a peer reviewer evaluates the evidence, and if the repository is found to be in compliance with the 16 guidelines, the Data Seal of Approval is awarded with the seal displayed on the repository’s Web site. Twenty-four repositories have received the DSA so far with more in the pipeline. This poster will provide more information about the DSA initiative to encourage new DSA applicants.
As Canadian agencies prepare to enact a joint policy on the management of data collected through agency funds, universities must ensure they have the human and technical resources in place to support their researchers in meeting agency requirements for data deposit. This exploratory research attempts to describe the range of RDM services currently offered in selected Canadian universities. Beyond conducting a review of what is available, the purpose of this study is to provide more insight on the required building blocks, including the collaborative models, needed to create a sustainable research data management service. The poster presents the methodology, study framework and results.
Bringing data to the DANCE: Implementing a data acquisition model at the Federal Reserve Bank of Chicago
Jim Obst (Federal Reserve Bank of Chicago)
In August 2011, the Federal Reserve’s librarians were asked to come together to manage purchased data acquisitions. The new initiative required one librarian from each of the 12 Reserve Banks and one from the Board of Governors to implement a common process for data procurement; it also formalized the use of a common data catalog. The overall goal was to avoid unnecessary duplication and to leverage purchasing power as a whole. In response to the mandate, the managing librarians of the Federal Reserve System formed a new, collaborative data work group made up of their 13 chosen data librarians. Together the group created and implemented innovative system-wide policies and workflows, including a unique online catalog, and each data librarian was in turn responsible for initiating new policies and workflows for the data management process within their own Reserve Bank. In the Chicago Fed, implementation of the new data acquisition regime has been successful, with local variations such as a District Data Management Group and educational initiatives about data use and procurement protocols. In the first two years, the need for collaborative tools has given way to workflows in SharePoint, and other solutions are planned.
Architecture of the European Remote Access Network (Eu-RAN)
David Schiller (Institute for Employment Research (IAB))
Anja Burghardt (Institute for Employment Research (IAB))
The European Data without Boundaries (DwB) project proposes a Remote Access Network (Eu-RAN) to bring together researchers and research projects with confidential microdata from different European sources. At IASSIST 2012 in Washington, first ideas for the Eu-RAN concept were presented; in 2014 a more detailed description of the Eu-RAN architecture can be given. Separately run European remote access solutions could be improved by connecting them into a network. A centralized Single Point of Access (SPA) will make it easier to reach the several decentralized network points and will ease the work of researchers running projects with data from different data owners within the network. A central service hub within Eu-RAN will host different tools that support researchers and research projects. One of the attached services is a Microdata Computation Centre; additional services include virtual research environments, user account and contract management, interfaces to research data access, text editors, statistical software packages, and tools for cooperation such as forums, wikis, and instant messaging. This poster shows the architecture of the Eu-RAN. The infrastructure will improve access to confidential microdata by mastering the trade-off between researchers’ needs and data security; in addition, it will lead to legal and organizational harmonization.
There is a growing desire, and there are growing requirements, for scientific research data collected with federal funds to be shared publicly and without charge. Agencies such as the NSF and NIH require data management plans as part of research proposals, and the Office of Science and Technology Policy (OSTP) is requiring federal agencies to develop plans to increase public access to the results of federally funded scientific research. To be effectively shared, data must be described and documented, discoverable online, and accessible, both today and into the future. In short, data must be curated. Data curation requires that data-sharing entities be sustainable, and sustainability requires funding. In early 2014, ICPSR launched a fee-for-deposit service that provides the public free access to data and documentation and is sustained by deposit fees. openICPSR is a research data-sharing service for the social and behavioral sciences. openICPSR data are: widely and immediately accessible at no cost to data users; safely stored by a trusted repository dedicated to long-term data stewardship; and protected against confidentiality and privacy concerns. This session will demonstrate the openICPSR system and discuss how researchers can take advantage of this new means of archiving data to comply with federal data sharing and preservation standards.
Colectica for Excel: A free tool for increasing data accessibility using open standards
Jeremy Iverson (Colectica)
Dan Smith (Colectica)
Traditionally, data in spreadsheets and plain text formats do not carry rich documentation. Often, single-word column headers are the only hint given to data users, making it difficult to make sense of the data. Colectica for Microsoft Excel is a free tool for documenting spreadsheet data using DDI, the open standard for data documentation. With this Excel add-in, users can add extensive information about each column of data: variables, code lists, and datasets can be globally identified and described in a standard format. This documentation is embedded with the spreadsheet, ensuring the information is available when data are shared. The add-in also adds support for SPSS and Stata formats to Excel: when opening an SPSS or Stata file in Excel, standard metadata is automatically created from the variable and value labels. Colectica for Excel can create print-ready reports based on the data documentation, and the information can also be exported to the DDI standard for ingest into other standards-based tools. This booth will include live demonstrations of the latest version of the Colectica for Excel tool, showing how to document the contents of a spreadsheet, publish the information, and use the documentation to access data in an informed way.
Data documentation and metadata use in research data management
Christie Wiley (University of Illinois)
The role of librarians in research data management has been a growing topic of discussion and participation within libraries and universities. Libraries and universities have formed many initiatives, committees, and groups to support data deposit and data management. The University of Illinois created an e-research implementation group to bring together subject specialists, research data librarians, and functional specialists to advance the library’s data initiatives. To support the broader goal of educating others about the research data services available to them and offering tools to meet their data needs, a smaller group of librarians within the e-research implementation group began a project to update the data services website. The update provides information and education on the definition of data, intellectual property, data sharing, funding requirements, files and formats, and preservation and storage. This poster illustrates how a website can be used as an instructional model for data documentation and metadata, and aims to serve as a point of reference when librarians, researchers, data managers, curators, and scientists meet individuals with data needs.
Deepening collaborative relationships in providing Research Data Management support
Carol Perry (University of Guelph)
Wayne Johnston (University of Guelph)
This poster will trace new patterns of collaboration in establishing programs for research data management in a Canadian context. The University of Guelph Library Research Enterprise and Scholarly Communications team has broadened its relationships with other campus units as well as other institutions to strengthen training program development for graduate students and faculty. Our work includes partnership in a multi-university team creating training modules for graduate students in Ontario. We have worked with a provincial government ministry to create a data repository for agri-environmental research and are working with a non-profit group assisting in the development of a discipline-specific repository. These are just a few examples of initiatives we have undertaken over the past year.
Developing incentives for data stewardship and sharing: Library engagement beyond liaison relationships
Heather Coates (Indiana University-Purdue University Indianapolis (IUPUI))
Ted Polley (Indiana University-Purdue University Indianapolis (IUPUI))
Many of the obstacles slowing the adoption of more democratic dissemination of scholarly products are cultural, not technological. While libraries have extended their technological capacity to new methods of dissemination, we have been less proactive in fostering the cultural change necessary for significant adoption. Two particular groups of constituents and communities of practice have been engaged with the library profession, but the personal contact between faculty and librarians at the institutional level is inconsistent and often hinges upon liaison relationships. This poster will describe opportunities for librarians to engage with institutional units and research communities extending beyond institutional boundaries to advance incentives rewarding new forms of dissemination, including data as a valued community resource. Examples of relating changes in dissemination to various community missions will be provided.
Defining security requirements for a remote access system
Katharina Kinder-Kurlanda (GESIS – Leibniz-Institute for the Social Sciences)
Andreas Poller (GESIS – Leibniz-Institute for the Social Sciences)
Philipp Holzinger (GESIS – Leibniz-Institute for the Social Sciences)
Laura Kocksch (GESIS – Leibniz-Institute for the Social Sciences)
Stefan Triller (GESIS – Leibniz-Institute for the Social Sciences)
Sven Turpe (GESIS – Leibniz-Institute for the Social Sciences)
This paper presents first results of the one-year project “Empirical Secure Software Engineering (ESSE)”, which had two aims: (1) to define security requirements for a planned Secure Data Center remote access system at GESIS in Germany, and (2) to evaluate different threat modelling techniques. Such techniques are intended to assist software developers in defining and evaluating security risks for a system and in deriving the necessary requirements for its design, implementation, and operation. Using several different modelling techniques, a group of participating GESIS staff from various archiving and IT backgrounds generated a collection of threat models. We then interviewed participants about their viewpoints, aggregated the models, and discussed them in a group session. Through this process we defined security requirements and translated them into implementable technical and organizational security recommendations. Our approach also enabled us to evaluate the strengths and weaknesses of the applied techniques. We will explain some of the security requirements we defined and show how our process made different stakeholders’ viewpoints visible, supported meaningful discussion, and facilitated decision making. Our process can be useful for other archives looking for ways to define security requirements in the fields of archiving and data sharing.
Matti Heinonen (Finnish Social Science Data Archive (FSD))
Tuomas J. Alatera (Finnish Social Science Data Archive (FSD))
This poster introduces Aila and Metka, two brand-new tools at the Finnish Social Science Data Archive (FSD). Aila is FSD’s new web-based customer service portal and will be our main tool for data dissemination. Aila allows customers to search, browse, and review our data descriptions at the study and variable level and, after registration, to download data directly from the portal. We will demonstrate the functionality of the portal and share the experiences gained during the building phase and the first few months of operation. Metka will be FSD’s tool for managing metadata. Metadata will be entered into Metka, which in turn will feed other systems at FSD (e.g., Aila). This will greatly simplify building services based on our rich metadata, as it allows metadata to be repurposed from a single authoritative source. We will present Metka’s features and show how it connects to FSD’s services and other systems. The system will be operational in January 2015. Metka is open source and can be obtained from GitHub. With Aila and Metka, we have defined the software platform FSD will use to build new tools for the archive and services for our customers. For example, Shibboleth will be used consistently for user authentication.
Training for an infrastructure: The CESSDA Training Centre
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Astrid Recker
Jessica Trixa
Dafina Kurti
The CESSDA Training Centre has the mission of creating a virtual place where archivists, researchers, and students across the wide field of social science and humanities data can find training, advice, and educational resources. The Centre offers academic and service excellence on three subjects: Research Data Management, Data Discovery, and Digital Preservation. This thematic structure also allows us to expand to other societal stakeholders interested in data, such as NGOs, journalists, government organizations, and the general public. The challenge for the training centre is not only to provide adequate services to the designated user communities but also to act as a connecting link between all experts and centres of excellence within the CESSDA infrastructure community. Establishing collaborations with experts interested in training will allow the maximum flow of information and the constant improvement of training quality. The target is to make the Centre a reference point of infrastructure training for experts and users. Activities include coordination and centralized publicising of events, common training and online materials, as well as common research and publicising of activities, so that information reaches all relevant audiences.
Forging a community: Current developments in MTNA's OpenDataForge suite of applications
Andrew DeCarlo (Metadata Technology North America)
Statistical data exist in many different shapes and forms, such as proprietary software files (SAS, Stata, SPSS), ASCII text (fixed, CSV, delimited), databases (Microsoft, Oracle, MySQL), and spreadsheets (Excel). Such a wide variety of formats presents producers, archivists, analysts, and other users with significant challenges in terms of data usability, preservation, and dissemination. These files also commonly contain essential information, such as the data dictionary, that can be extracted and leveraged for documentation purposes, task automation, or further processing. In 2013 Metadata Technology launched its new software utility suite, “OpenDataForge”, for reading and writing data across packages, producing various flavors of DDI metadata, and performing other useful operations on statistical datasets, in support of data management, dissemination, and analysis activities. Metadata Technology has continued to revise and expand the OpenDataForge product suite based on user feedback and new technologies. This presentation will focus on OpenDataForge and the updates, new products, and future products being developed in 2014.
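As a generic illustration of the kind of data-dictionary extraction such utilities perform (this is not OpenDataForge’s own code, which the abstract does not show), the sketch below pulls variable and value labels out of a Stata file with pandas; the file name is hypothetical.

    # A minimal sketch of data-dictionary extraction from a Stata file.
    # "survey.dta" is a hypothetical input; OpenDataForge itself works differently.
    import pandas as pd

    with pd.read_stata("survey.dta", iterator=True) as reader:
        var_labels = reader.variable_labels()   # {variable name: descriptive label}
        val_labels = reader.value_labels()      # {label set name: {code: label}}

    for name, label in var_labels.items():
        print(f"{name}: {label}")

Once extracted this way, the labels can be rewritten into DDI metadata, a codebook, or another package’s format without touching the data values themselves.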
Global popularization of e-Repositories in the marine information environment: Structures, goals, opportunities
Kateryna Kulakova (YugNIRO)
This poster examines current trends in marine information management at an institute specializing in fisheries and oceanographic research. It considers opportunities to take part in international projects that make more of the institution’s digital output globally accessible and, conversely, allow fuller retrieval of external data. YugNIRO’s activities within the ASFA National Partnership are presented in relation to the FAO Secretariat’s recent requirements on record submission with regard to full-text access and timeliness. YugNIRO’s achievements within the CEEMaR e-Repository are shown as an initiative for the storage, preservation, and global accessibility of the institute’s born-digital and digitized collections.
From curation to publication of DDI-L metadata
Jannik Jensen (Danish National Archive)
The DDA DdiEditor was developed as a generic suite of tools for producing DDI-Lifecycle metadata for survey data sets. The DdiEditor supports curation of variables, questions, codes, categories, instrumentation, universes, and concepts, as well as linkage of these metadata elements. It has been supplemented by an indexing platform allowing the user to run simple or advanced searches across study descriptions, questions, variables, categories, universes, and concepts. The study-level description is published as a landing page that lets the user move interactively from one study description to another by clicking on a metadata element, e.g., a keyword. Similarly, the traditional codebook presenting questions, variables, universes, concepts, and instrumentation has been enriched with interactivity and with graphics showing descriptive statistics for all variables. All metadata are published in DDI-L XML, ready to be imported by external search engines and other services. Every study is given a persistent identifier in the form of a DOI (digital object identifier). We would like to invite more organisations (that is, other than DDA) to try out the DdiEditor and indexing platform as tools for producing and disseminating DDI-L metadata.
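A rough sketch of the indexing idea: scan published DDI-L XML files and map keywords to study identifiers, so that a keyword click on one landing page can link to other studies. The namespace URI and element name below follow DDI-Lifecycle 3.x conventions but should be treated as assumptions; a real DDI-L instance is considerably richer, and the DdiEditor’s own indexer is not shown in this abstract.

    # A minimal sketch, not DdiEditor code: build a keyword -> study index
    # from DDI-Lifecycle XML files. The namespace and element name are
    # assumptions based on DDI 3.2; adjust to the actual instance documents.
    import glob
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    NS = {"r": "ddi:reusable:3_2"}   # assumed DDI 3.2 reusable namespace

    index = defaultdict(set)
    for path in glob.glob("studies/*.xml"):       # hypothetical directory of DDI-L files
        root = ET.parse(path).getroot()
        study_id = root.get("id", path)           # fall back to the file name
        for kw in root.iterfind(".//r:Keyword", NS):
            if kw.text:
                index[kw.text.strip().lower()].add(study_id)

    print(sorted(index.get("election", set())))   # studies tagged with a given keyword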
Your IASSIST today
Hailey Mooney (Michigan State University)
Paula Lackie (Carleton College)
Stop by this poster to speak with members of the IASSIST Administrative Committee. Learn about and create opportunities for professional development through involvement. Share your professional needs with us and help shape IASSIST for your needs!
IASSIST 2015 Minneapolis
Thomas Lindsay (IASSIST Treasurer)
The 41st annual conference of IASSIST will be June 2-5, 2015 in Minneapolis, MN, USA, hosted by the Minnesota Population Center at the University of Minnesota. Find out about all that the City of Lakes and the Twin Cities region have to offer at our conference presentation poster.
Variable shopping basket architecture: A hybrid approach for CNSS 2.0
Jeremy Williams (Cornell University)
Florio Arguillas (Cornell University)
The Cornell National Social Survey (CNSS), an annual survey conducted by the Survey Research Institute (SRI), is now in its 6th wave. As the designated repository of CNSS datasets, the Cornell Institute for Social and Economic Research (CISER) is responsible for creating and curating the public-use and integrated versions of this series and for applying disclosure avoidance techniques to them. In addition, CISER provides the mechanism for finding, discovering, and disseminating the series, complete with a variable shopping basket that allows users to select and download only the variables they need. In late 2013, CISER decided to update its variable shopping basket architecture to improve the user experience and the delivery of the collection. In this paper we describe in detail our updated infrastructure for the CNSS search, discovery, and download tool. At the back end we employ an XML repository housing DDI 3 XML files to deliver metadata content to users and a relational database to deliver the data itself. The XML repository and the relational database can be queried using XQuery and SQL, respectively, through a CNSS API that we developed so that the tool can be accessed from other computing devices.
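To illustrate the hybrid back end, a basket of selected variables might drive two queries: an XQuery against the DDI 3 repository for metadata and a SQL statement against the relational store for the data. This is a sketch only; CISER’s actual API, schema, table, and element names are not published in this abstract, so everything named below is hypothetical.

    # A minimal sketch of the shopping-basket idea: one metadata query, one data query.
    # The table name, database file, and DDI element names are hypothetical.
    import sqlite3

    basket = ["age", "gender", "trust_gov"]       # variables the user put in the basket

    # Metadata side: an XQuery string to pull matching variable descriptions
    # from the DDI 3 XML repository (it would be executed by an XML database).
    xquery = (
        "for $v in //*:Variable "
        "where $v/*:VariableName = (" + ", ".join(f"'{v}'" for v in basket) + ") "
        "return $v"
    )

    # Data side: fetch only the selected columns from the relational store.
    # (A production API would validate names and use parameterized queries.)
    sql = f"SELECT {', '.join(basket)} FROM cnss_wave6"   # hypothetical table name
    conn = sqlite3.connect("cnss.db")                     # hypothetical database file
    rows = conn.execute(sql).fetchall()
    print(len(rows), "records retrieved")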
Data management in the liberal arts: Current practices and attitudes at a Big 10 American University
Alicia Hofelich Mohr (University of Minnesota)
Thomas Lindsay (University of Minnesota)
The diverse nature of liberal arts research makes identifying needs and providing support for data management a complex task. Attesting to this diversity, in our survey of 29 departments at Minnesota we found differing practices, attitudes, and awareness about managing data and research materials. Many respondents identified a need for data management support across the research lifecycle, with the largest needs in data security, preservation, and sharing. However, there are striking differences across the social sciences, arts, and humanities in attitudes and perceptions toward data management and what it entails, perhaps due to differing requirements and cultures in these fields. These differences demonstrate that a one-size-fits-all approach to supporting data management is not appropriate for a broadly diverse liberal arts college; rather, the services we develop should be sensitive to discipline-specific needs. To further explore these possibilities, we plan to administer our survey to other colleges within the University. As Minnesota is one of the few Big 10 universities to institutionally separate the liberal arts from the sciences, comparing our results with those of more discipline-specific colleges should allow us to evaluate the roles that institutional organization and disciplinary expectations may play in the emergence of data management needs and support.
The American National Election Studies at 65: Looking back and moving forward
Darrell Donakowski (American National Election Studies)
Pat Luevano (American National Election Studies)
Jaime Ventura (American National Election Studies)
Laurie Pierson (American National Election Studies)
In 1948, under the direction of Angus Campbell and Robert Kahn, the Survey Research Center (SRC) at the University of Michigan carried out what it viewed as a pilot study of the national electorate. The 1948 study provided a trial of the method; the 1952 study built on that trial, embedding contending theoretical frameworks and fine-tuning measurement. Those studies, 65 years ago, were the beginning of what is now known as the American National Election Studies (ANES). This poster session will provide information on the history of the ANES, the advances in the understanding of politics due to the study, and the innovations that have come from it. It will also present information on the difficulties that arise in conducting a longitudinal study of this nature and provide insight into how the ANES has worked to adapt to the constantly changing world of social science research.
In this poster, visitors are encouraged to bring their own DDI XML to test the community transformations in DDI-XSLT, which allow them to leverage their metadata in other formats.
Samuel Spencer (Open Source Developer/Freelance Researcher)
A combined demonstration and poster session outlining the advantages of the Canard Question Module Editor and the Simple Questionnaire Building Language. The Canard Question Module Editor is a free, open-source questionnaire design tool that allows for the drag-and-drop creation of rich questionnaires. Using the domain-specific and minimal Simple Questionnaire Building Language [http://sqbl.org] as its target language, Canard supports transformation to numerous formats using XSLT, allowing custom import and export transformations. The minimal language and its adherence to the principles of Structured Questionnaire Design mean that routing is predictable and can be transformed into any format; this also supports real-time updates to questionnaires during creation, providing designers with functional example questionnaires and routing diagrams as they edit content. Demonstrations will include: * Import/export functionality for supported standards including DDI Codebook and Lifecycle, CS-Pro, HTML, and PDF. * Design of multilingual content supporting any number of languages, with checking for gaps in translations. * Live previewing of surveys including complex routing and filtering. * Drag-and-drop creation of routing logic, word substitutions, and derived data elements. * Automatic creation of questionnaire metadata, including flowcharts illustrating respondent routing.
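Since both the DDI-XSLT community transformations and Canard rely on XSLT for format conversion, here is a minimal sketch of applying a stylesheet to a metadata document with lxml. The file names are hypothetical, and the stylesheet stands in for the real transformations, which are not reproduced in these abstracts.

    # A minimal sketch of an XSLT-driven transformation, in the spirit of the
    # DDI-XSLT community transformations. File names are hypothetical.
    from lxml import etree

    source = etree.parse("questionnaire.xml")     # e.g., an SQBL or DDI instance
    stylesheet = etree.parse("to_html.xslt")      # a transformation to HTML

    transform = etree.XSLT(stylesheet)            # compile the stylesheet
    result = transform(source)                    # apply it to the document

    with open("questionnaire.html", "wb") as out:
        out.write(etree.tostring(result, pretty_print=True))

The design point is that the source document stays authoritative: adding a new output format means writing one more stylesheet, not changing the questionnaire definition.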
2014-06-06: 6T: Metadata Portal for the Social Sciences
Re-envisioning the ANES Data Collection Process
Darrell Donakowski (University of Michigan)
For over 65 years, the American National Election Studies (ANES) has produced high quality data on voting, public opinion, and political participation. Most recently, attention has turned to creating an environment that will allow us to not only produce high quality metadata for the research community, but to also better integrate the creation and use of metadata throughout the lifecycle of our research. On our own, we have taken steps to create tools that allow for more efficient questionnaire development using metadata. This presentation will provide a brief history of our efforts to provide the highest quality of data and metadata and how this current collaboration is further advancing these efforts.
This presentation will focus on DDI-based tools built at ICPSR to enhance discovery and understanding of ANES and GSS data, with special emphasis on the Reverse Universe Engineering (RUG) tool, an application that attempts to recreate skip patterns in a survey by analyzing the dataset using information extracted from the DDI metadata. A review of the tool development process will highlight some preliminary findings that led to a revision of its features and will include a discussion of the tool’s potential and its limitations.
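The abstract does not spell out the RUG algorithm, but the core idea of recreating a skip pattern from data plus metadata can be sketched: for a candidate filter variable, check whether a follow-up question is answered only when the filter takes a given value. The variable names and data below are invented for illustration and are not ICPSR’s implementation.

    # A minimal sketch of skip-pattern detection, not the RUG tool itself:
    # infer that "q12_followup" was asked only of respondents with q12 == 1.
    import pandas as pd

    df = pd.DataFrame({
        "q12":          [1, 1, 2, 2, 1, 2],
        "q12_followup": [3, 5, None, None, 4, None],   # missing when skipped
    })

    for value in df["q12"].dropna().unique():
        asked = df.loc[df["q12"] == value, "q12_followup"]
        skipped = df.loc[df["q12"] != value, "q12_followup"]
        if asked.notna().all() and skipped.isna().all():
            print(f"q12_followup appears to be asked only when q12 == {value}")

Real survey data are noisier than this, which is one reason the abstract stresses the tool’s limitations as well as its potential.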
Andrew DeCarlo (Metadata Technology North America)
Metadata is a fundamental component of any study, but it is hard to leverage if it is not comprehensive, of high quality, and captured in standard formats. For the META-SSS project, Metadata Technology North America focused on modernizing the metadata for the ANES and GSS studies, developing automated processes to clean and combine information from multiple sources and produce accurate and robust DDI 2.5 files. These metadata are in turn used to drive a new discovery portal aimed at increasing user efficiency by allowing users to quickly investigate study components, access documentation, and create data subsets. Open search and retrieval web services underlying the portal are also available to developers for driving other related tools or web applications. This presentation will outline the procedures used to create structured metadata and will display some of the tools created for the Metadata Portal.
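The abstract mentions open search and retrieval web services but gives no endpoint details, so the sketch below is entirely hypothetical: it only shows the general shape of querying such a portal service over HTTP, with invented URL and parameter names.

    # A hypothetical sketch of calling a search web service; the URL, query
    # parameters, and response fields are invented, as the abstract does not
    # document the actual API.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({"q": "party identification", "study": "ANES"})
    url = f"https://example.org/metadata-portal/search?{params}"   # hypothetical endpoint

    with urllib.request.urlopen(url) as response:
        results = json.load(response)

    for hit in results.get("variables", []):
        print(hit.get("name"), "-", hit.get("label"))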
New workflows to capture metadata
Tom W. Smith (University of Chicago, NORC)
Metadata are an essential part of any modern release of survey research data, and DDI facilitates the construction of and access to such documentation. While documentation was originally an extra phase added after traditional data collection, metadata are now recognized as an integral and integrated part of conducting survey research. Increasingly, it is possible to extract metadata during the survey research workflow as it unfolds, to retain that information, and to append it to the collected data at the end of the survey research cycle. The goal is to develop a series of procedures so that, via the use of DDI metadata, surveys can be designed, collected, documented, and released as part of one seamless process rather than as a series of loosely related, stop-and-go steps. This presentation will provide recommendations for changes to existing workflows to accomplish this goal.
2014-06-06: 6U: National Data Management Policies
National data management policies: a cross-national comparison of their impact on data archives and institutional repositories
Steven McEachern (Australian Data Archive)
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Jonathan Crabtree (Odum Institute)
Vigdis Namtvedt Kvalheim (IFDO)
Recent developments in government open data and open access policies have increased the emphasis on data management policies and procedures internationally. Shifts in practices have come at both ends of the research lifecycle. The expectations of funders have placed increasing demands on institutions to understand, support and manage the data outputs of projects within their institutions more effectively. Research funders have also increased emphasis on data management planning at the start of research projects, often as a condition of the provision of grant funding. This symposium, convened by the International Federation of Data Organisations, will explore the impact of these shifts in national policies on the policies and practices of two major groups in this space: data archives and institutional repositories. The panel will begin with a detailed analysis of the range of policies in existence internationally, drawing on the recent IFDO data policy survey. This is followed by presentations from four international speakers – two from each of the data archive and institutional repository communities. The panel will then conclude with an open discussion of the likely directions for both national data management policies and directions for supporting these policies in the archive and repository communities.
2014-06-06: 6V: Planning and Assessing Research Data Services
Now we are six: integrating Edinburgh DataShare into local and internet infrastructure
Robin Rice (University of Edinburgh, EDINA and Data Library)
Edinburgh DataShare, an institutional data repository, is six years old. It was built as a demonstrator in DSpace by EDINA and Data Library and has been given new life by the University of Edinburgh’s Research Data Management initiative. Following testing by pilot users in various departments last year, DataShare is confirmed as a key RDM service. Since 2008 much external infrastructure has grown up around data sharing, and software developers, publishers, and librarians are creating new innovations around the sharing and re-use of data daily. How can DataShare be shaped to fit into this ever-more-sophisticated environment? A number of ongoing developments are helping us integrate the repository into the global context. DataShare is being indexed in Thomson Reuters’ Data Citation Index. We aspire to attain the Data Seal of Approval for DataShare, a badge that confers trustworthiness through peer review. It is listed in the re3data.org and Databib registries of data repositories. By extension, we offer our depositors peer review of datasets by listing journals that publish ‘data papers’, such as F1000 Research. Locally, as Information Services builds new data services such as the Data Store, the [private data] Vault, and the [metadata-only] Register, we can focus DataShare on its named purpose.
Building support for research data management: knitting disparate narratives of eight institutions
Natsuko Nicholls (University of Michigan)
Fe Sferdean (University of Michigan)
Katherine Akers (University of Michigan)
Academic research libraries are quickly developing support for research data management (RDM), including both new services and new infrastructure. This presentation will include examples from eight universities (including our own institution) that characterize the approach, method, and strategy each institution has applied to the process of developing RDM support programs. We focus on the prominent role of the library in educating and assisting researchers with managing their data throughout the research lifecycle. Based on these examples, we construct timelines for each institution depicting key events and milestones over the course of building support for RDM, and we discuss similarities and differences among universities in the motivation to provide RDM support, collaborations among campus units, assessment of needs and services, and changes in staffing. Our case studies will help to outline how academic libraries have generally articulated the conceptual foundations, roles, and responsibilities involved in RDM, while highlighting the specific ways in which we have used research findings to improve four key RDM areas: 1) education, awareness, and community building; 2) infrastructure; 3) policy and strategy; and 4) consultation and services. Our findings should provide useful insights for other institutions that are considering or are in the process of developing RDM services.
Challenges in developing a new library infrastructure for research data services
Daniel Tsang (University of California Irvine)
John P. Renaud (University of California Irvine, AUL for Research Resources)
Two 2013 research data management reports, from ARL and from OCLC Research, point to the need to restructure research library data services. We discuss types of restructuring needed, utilizing in part the experience of the University of California, Irvine, Libraries, in establishing an E-Research Digital Scholarship Services component, but also analyzing the changes at other research libraries as they grapple with the new mandate to make research data more accessible, and all that implies. We also analyze recent job announcements to illustrate how the profession of social science data librarian has changed and the recruitment implications. No longer is it sufficient for us to just help academic users find datasets for secondary analysis. Rather, our roles have changed inasmuch as many of us may become active collaborators with faculty during the research life cycle. Not only must research data be aligned with research infrastructure but the traditional liaison model of social science librarianship needs to be enhanced or restructured. We discuss various strategies underway so that libraries can actively participate in the global data ecosystem.
Moving beyond research: building an enterprise data service from a research foundation
San Cannon (Federal Reserve Board)
Data play a critical role in fulfilling the Federal Reserve Board’s mission across a broad range of functions, including monetary policy, financial stability, supervision, consumer protection, and economic research. The current data environment was designed to allow business lines to manage relatively small and predictable data sets that required limited sharing across silos. The Office of the Chief Data Officer was created in May 2013 to address the data needs of the Board, post-financial crisis, with an enterprise focus and a clear set of mandates to enhance data governance, data management, and data integration. The OCDO started operations with a small staff of data management professionals who traditionally supported the research function and now must shift gears to provide data services to a broader base of users and a wider range of analytical work. New infrastructures, programs, processes and staffing are being developed and deployed to ensure that data needs across the lifecycle are met and that a variety of analytical approaches can be supported.
Web 2.0: A little less conversation, a little more action please
Margaret Ward (UK Data Service)
Jack Kneeshaw (UK Data Service)
Web 2.0 is far from a new idea and yet online social science archives, resources and tools have been relatively slow to turn it into something tangible. In 2014, the UK Data Service plans to introduce user ‘portfolios’ – an extension to the traditional user account – that will eventually allow registered users to interact with online content and with each other, collaborating with colleagues and sharing with the public. We know that many of our users are less the passive consumer and more the engaging contributor and that this shift will likely continue as younger cohorts become data professionals. Our planned user portfolio seeks to harness this new mood by, among other things, giving users the opportunity to: (1) create a public profile, incorporating other IDs (e.g., LinkedIn, ORCID); (2) make their current and past research projects more visible; (3) upload and share syntax/code; (4) comment, tag and add content on blogs, resources, records; (5) provide advice to other users; (6) store and share links to searches, resources etc. This presentation outlines the approach being taken by the UK Data Service – from an evaluation of needs through to technical implementation – and provides the planned timetable for delivery.
Brigitte Mathiak (GESIS - Leibniz Institute for the Social Sciences)
In this talk, we would like to present our current work on a virtual research environment designed to enable collaboration among over 20 social scientists working on social indicators. The system centers on the storage and re-use of syntax files. Syntax may describe filtering processes, weighting schemes, data harmonization and, of course, the data analysis that ultimately leads to the publication of scientific insight. Sharing it is the only way to achieve consistency in results across a group of people. We will also briefly explore some of the challenges involved in syntax metadata and how some of it can be extracted semi-automatically. From this starting point, we will argue that to reproduce the results in scientific publications, we need not only the research data used as a starting point, but also documentation of how those data were transformed into the numbers we read in the paper.
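As a rough illustration of semi-automatic syntax metadata extraction (the talk does not specify its method, so the patterns below are assumptions), one can scan statistical syntax files for commands that reveal which datasets are read and which variables are derived, which is exactly the provenance needed to reproduce a published number.

    # A minimal sketch of semi-automatic metadata extraction from Stata syntax.
    # The regular expressions are illustrative assumptions, not the GESIS tooling.
    import re

    syntax = """\
    use "soep_wave29.dta", clear
    generate income_eq = hhincome / sqrt(hhsize)
    regress income_eq age education
    """

    datasets = re.findall(r'^\s*use\s+"([^"]+)"', syntax, flags=re.MULTILINE)
    new_vars = re.findall(r'^\s*gen(?:erate)?\s+(\w+)', syntax, flags=re.MULTILINE)

    print("Input datasets:", datasets)     # ['soep_wave29.dta']
    print("Derived variables:", new_vars)  # ['income_eq']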
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
At the beginning of this year, DATORIUM, the data sharing repository of the Data Archive of GESIS – Leibniz Institute for the Social Sciences in Germany, went online at http://datorium.gesis.org/. It enables researchers to manage their research datasets, document basic study-level information, and publish their data and structured metadata on a single web-based platform. The metadata schema of DATORIUM is compatible with other current metadata schemata in the area, such as DataCite, da|ra, and DDI-Lifecycle. Datasets in DATORIUM may be shared with others according to the needs of the researchers, and each dataset is assigned a DOI name so that other users can cite the source and give credit to the primary data collectors. In general, DATORIUM encourages the use of open licenses that make the data available for a broad range of purposes and to the widest possible audience. Alternatively, researchers can use more restrictive licenses if privacy issues or other restrictions apply. In any case, DATORIUM provides medium-term data availability and makes the datasets citable. In addition, long-term archiving of the data can be achieved by using the standard archiving services of the GESIS Data Archive.
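To illustrate what a DataCite-compatible record minimally contains (the mandatory DataCite kernel properties are identifier, creator, title, publisher, publication year, and resource type), here is a sketch; all values, including the DOI, are invented, and DATORIUM’s exact schema may differ.

    # A minimal sketch of the DataCite-style fields a DOI registration needs.
    # All values are invented for illustration; DATORIUM's schema may differ.
    record = {
        "identifier": {"identifierType": "DOI", "value": "10.9999/example.0000"},
        "creators": [{"creatorName": "Doe, Jane"}],
        "titles": [{"title": "Example Survey on Civic Participation, 2013"}],
        "publisher": "GESIS Data Archive",
        "publicationYear": "2014",
        "resourceType": {"resourceTypeGeneral": "Dataset", "value": "Survey data"},
    }

    # The citation string that a DOI makes possible:
    print(f'{record["creators"][0]["creatorName"]} ({record["publicationYear"]}): '
          f'{record["titles"][0]["title"]}. {record["publisher"]}. '
          f'doi:{record["identifier"]["value"]}')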
An overview of the Translating Research in Elder Care (TREC) Measuring System (TMS) data platform
James Doiron (University of Alberta, Health Research Data Repository)
Shane McChesney (Nooro Online Research)
The initial Translating Research in Elder Care (TREC 1.0) (http://www.trecresearch.ca) research program was a 5-year (2008-2012), $4.7 million (CAD) CIHR-funded project examining the effects of context upon resident and care provider outcomes in the Canadian long-term care sector. A second phase of the project, TREC 2.0, is scheduled to commence in 2014. The TREC Measuring System (TMS) Data Platform project, a collaborative effort between the University of Alberta’s Knowledge Utilization Studies Program (KUSP) and Health Research Data Repository (HRDR), Metadata Technology North America (MTNA), and Nooro Online Research, focuses upon the application of DDI-based metadata to TREC to support the automated collection/ingestion, quality assurance, harmonization, and merging of TREC 2.0 data, as well as the timely delivery of reports/outputs and real-time ‘TREC-boards’ (dashboards) based on these data. This session will offer a comprehensive synopsis of the Data Platform project, including an overview of the TREC research program, its data types/sources, the HRDR virtual research environment which supports the project, challenges encountered, a demonstration of tools, and how the project will serve as a ‘proof of principle’ for a transferable, metadata-driven management framework for application within future KUSP- and HRDR-housed research activities and beyond.
A review and redesign of Roper Center infrastructure
Elise Dunham (Roper Center for Public Opinion Research)
Cindy Teixeira (Roper Center for Public Opinion Research)
Like many institutions, the Roper Center has been considering its sustainability over time. Looking to the future, the rapid proliferation of data and emerging technologies in the research community will put foreseeable pressure on the Center’s aging systems, processing workflows, and limited resource allocation. In response to these challenges, the Roper Center, guided by consultant Ann Green, began “A Review and Redesign of Roper Center Infrastructure” in 2011 with funding provided by the Robert Wood Johnson Foundation. The project began with an internal review of operations as well as an environmental scan of standards and practices in the digital curation field. Based on this collaborative effort, Green developed a detailed report including a framework and recommendations to facilitate institutional change. The Roper Center team is developing a new single-stream, DDI-compliant data processing infrastructure that will streamline quality review of incoming public opinion materials. By implementing a more policy-driven and standards-based workflow, the Center will ensure the long-term preservation of, and improve access to, its materials. This presentation will provide an overview of the policy changes and technological improvements the Roper Center has made based on its workflow analysis and offer recommendations relevant to any institution considering an internal review process.
Working with data at its source: empowering social science researchers to share and document their data for archiving and discovery...
Ron Nakao (Stanford University)
Matt Marostica (Stanford University)
Enticing social science researchers to share their data has become significantly easier given the contextual changes spurred by grant foundations, associations, publishers, and government agencies. One major challenge is the documentation of the data. Ideally, researchers would create the quality metadata that enables their data to be discovered and re-used correctly. However, researchers often lack the time, expertise, or support to create quality metadata. Stanford University Libraries recently added the ability for researchers to self-deposit their data in the university’s institutional repository (SDR, Stanford Digital Repository). However, the metadata required for many social science studies go far beyond what the SDR self-deposit process supports. Librarians, faculty, and developers at Stanford have collaborated to address this issue via a Drupal-based website: data.stanford.edu. We will share our experiences, assessments, and future plans.
CESSDA archives and research data management activities
Laurence Horton (The London School of Economics and Political Science)
Alexia Katsanidou (GESIS – Leibniz Institute for the Social Sciences)
Mari Kleemola (Finnish Social Science Data Archive)
Veerle Van den Eynden (UK Data Archive)
Alexandra Stam (Swiss Foundation for Research in Social Sciences)
Henrik Sejersen (Danish Data Archive)
Annette Servan (Norwegian Social Science Data Services)
The Council of European Social Science Data Archives (CESSDA) is an association of European national archives working in cooperation to develop a Europe-wide data infrastructure. Part of CESSDA’s activities will be promoting Research Data Management training and support through, and along with, member archives. The aim of this panel is to provide a comparative forum, informing participants about funding environments and data sharing in different European countries and thereby providing a cross-national perspective that is sometimes ignored. The panel brings together representatives from CESSDA member archives to illustrate the social science data sharing requirements and reuse culture in their countries, as well as to present the work they are doing archiving data for reuse and providing Research Data Management support services. From this panel session we hope to identify commonalities, differences, obstacles, solutions, and potential progress strategies in the European experience as CESSDA addresses how best to coordinate Europe-wide training and support on RDM, including support for European Union Horizon 2020 research projects. Although the session is focused on the European experience, it welcomes and strongly encourages international perspectives.
Addressing geospatial data needs: fashions and factions in GIS collection development
Maria A. Jankowska (University of California Los Angeles)
Andy Rutkowski (University of Southern California)
Geographic Information Systems (GIS) researchers and users have been at the forefront of creating and using data. GIS collection development within university libraries has evolved and advanced greatly as user demands have increased and become more specialized. Most major universities now have either a GIS specialist or a “data lab” staffed by librarians. The push to address data at the university level, whether data collection, management, or storage, has become commonplace. This presentation focuses on new and developing trends in users’ demand for geospatial data. We will address questions and issues concerning how these new demands affect academic libraries’ collection development policies. In particular we focus on four issues: (1) the move from macro data collections to more micro-oriented collections; (2) the nature of geospatial data; (3) the structures in place to provide access; and (4) the need for policy guiding collection development for geospatial data. We give a brief overview of these issues and focus on the challenges that have persisted through users’ demand for geospatial data. The presentation will examine these challenges through the lens of two specific case studies: the Charles E. Young Research Library at the University of California, Los Angeles, and the University of Southern California GeoPortal.
Can you fix it? Yes, you can!: repurposing user support materials
Richard Wiseman (University of Manchester, UK Data Service, Mimas)
The UK Data Service is a resource funded to support researchers, teachers, and policymakers who depend on high-quality social and economic data. It is made up of the former services ESDS, Census.ac.uk, and the Secure Data Service. With the birth of the new service, we set about repurposing existing content, as well as creating new content, to meet the needs of the UK Data Service. This presentation will discuss our approach to supporting our users with high-quality and relevant materials on our website. These materials include dataset guides, videos, and slide packs for teaching, as well as other web resources. In addition, we discuss how we have repurposed materials from webinars to reach a larger audience. We also discuss the steps taken to create these materials, the practicalities and lessons learnt, and our future plans.
Visualizing survey data: disseminating results from a population health survey on HIV and AIDS in Canada
Berenica Vejvoda (University of Toronto)
Dan Allman (University of Toronto)
Bharath Kashyap (University of Toronto)
Caroline Godbout (University of Toronto)
Effective knowledge dissemination of population survey results benefits end users when results are engaging and visually appealing, as this enhances understanding and moves research into action. In 2011, the CIHR Social Research Centre in HIV Prevention (SRC) at the University of Toronto’s Dalla Lana School of Public Health and the Canadian Foundation for AIDS Research (CANFAR) conducted a national population health survey to gain a better understanding of Canadians’ behaviours, attitudes, knowledge, and perceptions of HIV and AIDS. To maximize the dissemination of the survey results, the team was funded by a Canadian Institutes of Health Research (CIHR) grant to build a prototype of an open-source, web-based data visualization tool. The interactive tool visualizes the survey data using both spatial and non-spatial elements and uses Drupal together with Google Maps and Google Charts scripts. The tool is currently being evaluated by its target knowledge users: staff at organizations that provide HIV- and AIDS-related services across Canada. Future plans are to further build out the non-spatial visualization components and to add more data to the platform. The project is a multi-disciplinary collaboration between public health researchers, geographers, librarians, and professionals from community-based AIDS organizations. This session will describe the process of developing the data visualization tool, share the results of the evaluation, and discuss the challenge of designing a tool that engages users through an easily accessible and visually pleasing representation without losing the multidimensional complexity of the data.
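As a generic illustration of the data preparation behind such a tool (the project’s actual stack uses Drupal with Google Maps and Charts, which are not shown here), survey responses might be aggregated by region before being handed to a mapping or charting component. The variable and region names below are invented.

    # A minimal sketch of aggregating survey responses by region for visualization.
    # Variable and region names are invented; the real tool feeds Google Charts.
    import pandas as pd

    responses = pd.DataFrame({
        "province":  ["ON", "ON", "QC", "QC", "BC"],
        "aware_hiv": [1, 0, 1, 1, 0],            # 1 = aware of HIV transmission facts
    })

    by_region = (responses.groupby("province")["aware_hiv"]
                          .mean()
                          .mul(100)
                          .round(1)
                          .rename("percent_aware"))

    print(by_region)   # one row per province, ready for a choropleth or bar chart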
The landscape of research data visualization and considerations for strategic interventions
Justin Joque (University of Michigan)
Data visualization has become an increasingly important part of working with research data. While universities, libraries, and data providers are investing in data and visualization infrastructure, the term now encompasses a broadening range of activities, from the design of graphics for publication to the real-time rendering of terabytes of data in interactive 3D environments. Librarians are increasingly being asked to participate in and support research data visualization, but the breadth of the landscape creates difficulties for developing services and meeting campus expectations around visualization. As libraries and academic institutions develop infrastructure, support, and services for visualization, we will need both to understand the complexity of the broad space of visualization and to strategically develop scalable support for a diversity of types of visualization. The activities that constitute visualization each have unique economies of scale and cultural practices that directly influence our ability to provide support. This presentation will attempt to describe the diversity of the academic visualization landscape and, building on our experiences at the University of Michigan, will suggest some of the opportunities and challenges of providing library support for data visualization.