Plenary 1: The History of Data: Technological Change and the Census, 1790-2020
Steven Ruggles (University of Minnesota)
The most important innovations in data and data processing between 1850 and 1975 were responses to the needs of the U.S. Census, which posed the world's greatest data-processing challenges for much of this period. Among other innovations, the U.S. Census led directly to the development of the first punch cards, the first high-speed card tabulators, the first commercial electronic computer, the first high-speed optical mark recognition system, the first large-scale publicly accessible electronic data files, and the first digital street map. Today, census data are at the center of a big microdata revolution that promises to transform social science research. The Minnesota Population Center is leading a collaborative project to make the 2.2 billion records of U.S. census data collected from 1790 to 2020 available to researchers in a consistent format, and to link them across time and to other sources, including administrative records, vital records, surveys, and environmental data. The new data will open extraordinary opportunities for spatiotemporal longitudinal analysis, allowing researchers to trace individuals across the life course and families across generations, and enabling the study of economic and geographic mobility, the impact of early-life conditions on later outcomes, and the effects of policy on health and well-being.
Plenary 2: Measuring the Digital Divide: Using Existing Data Sources and New Data Collection to Understand Between-Country Differences
Curtiss Cobb (Facebook)
The Internet has already changed many aspects of people's lives in developed economies and has provided far-reaching economic and social benefits. Extending these opportunities is critical to accelerating economic and social growth in developing economies as well. Many international organizations have set ambitious plans to promote Internet access globally; they pore over reports and expend considerable money, time, and talent exploring new ways to connect the unconnected (e.g., blimps, drones, satellites). But raw enthusiasm and aggregate statistics fail to capture the reality of the digital divide in the developing world. Facebook's commitment to connecting the developing world includes a desire to understand the complexity of the issue as it relates to the cultural, structural, and technological inequalities between and within countries. This approach requires bringing together insights from a large number of publicly available data sources that employ different methodologies to understand the multi-faceted nature of the digital divide, even when the assembled sources of data reach different conclusions. In this talk, researchers from Facebook will discuss the difficulties and limitations often faced when aggregating numerous country-specific data sources to measure the extent, causes, and consequences of differences in Internet adoption between countries and populations. They will explain how Facebook evaluates the quality of existing publicly available data sources (e.g., national statistics, academic studies, and industry reports), aggregates multiple sources to obtain relevant estimates, and supplements data "holes" with original data collection efforts. This multi-faceted approach allows Facebook to conduct scalable and comprehensive comparative analyses at multiple levels, which in turn leads to more culturally sensitive and context-specific approaches for bridging the digital divide.
Plenary 3: Politics of Open Data
Andrew Johnson (City of Minneapolis)
Hear from an elected official who oversaw the implementation of an open data policy in the City of Minneapolis. This presentation will discuss the dynamics and challenges of policy creation and passage, and answer the critical question of "what's next?", including measuring success and evolving the policy, portal, and culture along the way. Andrew Johnson was elected November 5th, 2013, and is the first millennial to serve on the Minneapolis City Council. He brings a unique perspective as the first IT professional to serve on the council, with eight years of experience as a systems engineer at Target Corporation. Andrew believes in the importance of a transparent, responsive government. Nationally, he pioneered the concept of a federal taxpayer receipt, a concept that was later implemented by the Obama Administration. He has been working with his colleagues on open data to ensure government information is more accessible to the public. Andrew has been a champion of this issue, with recent progress including the City's launch of its open data portal at the end of 2014.
2015-06-02: Workshops
Hands-on Big Data
Ryan Womack (Rutgers University)
This workshop is for those of you who, having read about Big Data and seen some of its results in academic studies and the commercial world, would like to get a sense of what actually working with Big Data entails. The workshop will provide an overview of key technologies for the handling and analysis of large-scale datasets, including Hadoop/MapReduce, the RHadoop package, other R packages used for large-scale analysis, and Big Data handling environments such as Cloudera, Hortonworks, Tessera, and Amazon Web Services. We will also discuss a few of the primary challenges in successfully completing analysis of large-scale data, such as integrating and structuring heterogeneous data, handling sparse matrices, and devising effective analytical routines using parallel processing and splitting data. Participants will work with a live demonstration environment that provides a realistic introduction to Big Data analytics using scripts that will run both on a scaled-down demonstration dataset and on truly large-scale data.
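To give a concrete flavor of the split-apply-combine pattern that underlies MapReduce (and that the workshop's Hadoop and RHadoop tools scale up to a cluster), here is a minimal sketch in base R; it is an illustration of the pattern, not workshop material, and the input file name is a hypothetical stand-in:

    # Minimal MapReduce-style word count using only base R and the
    # 'parallel' package; real Hadoop/RHadoop jobs distribute the same
    # map and reduce steps across many machines.
    library(parallel)

    lines <- readLines("sample.txt")  # hypothetical input file

    # Map: split the input into chunks and count words within each chunk
    chunks <- split(lines, cut(seq_along(lines), 4, labels = FALSE))
    partial <- mclapply(chunks, function(chunk) {
      words <- unlist(strsplit(tolower(chunk), "[^a-z]+"))
      table(words[words != ""])
    }, mc.cores = 2)  # on Windows, substitute lapply() or parLapply()

    # Reduce: merge the per-chunk counts into a single tally
    counts <- Reduce(function(a, b) {
      keys <- union(names(a), names(b))
      out <- setNames(numeric(length(keys)), keys)
      out[names(a)] <- out[names(a)] + a
      out[names(b)] <- out[names(b)] + b
      out
    }, partial)

    head(sort(counts, decreasing = TRUE))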
Where Everybody Knows Your Name: Building Credible and Sustainable Data Services in a Liberal Arts College
Kristin Partlo (Carleton College)
Danya Leebaw (Carleton College)
Paula Lackie (Carleton College)
Peter Rogers (Colgate University)
Diana Symons (College of Saint Benedict/Saint John's University)
Aaron Albertson (Macalester College)
Providing data services within a liberal arts college setting presents unique challenges and opportunities. Residential liberal arts colleges are characterized by a focus on teaching undergraduates, small class sizes, and individualized support from staff and faculty provided with a fraction of the technical infrastructure of research institutions. This workshop will cover topics particularly relevant for those with emerging or established data services in a liberal arts college. Practicing librarians from four institutions will lead discussion and interactive activities designed to help participants learn more about the following as they pertain to the particular institutional context of liberal arts colleges: developing a sustainable and credible model, building on the strengths of a small community, outreach to faculty and students, identifying allies, empowering other colleagues to respond to data questions and needs, establishing data management practices, partnering with related campus initiatives like digital scholarship, integrating data into a traditional collection development model, and curating campus data projects. Participants will leave with strategies to advance data services on their own campuses. Beyond addressing these topics, an important goal for the workshop is for liberal arts data practitioners to build relationships with their colleagues at similar institutions.
Introduction to International Microdata: IPUMS-International and the Integrated Demographic Health Surveys
Lara Cleveland (University of Minnesota)
Patricia Kelly Hall (University of Minnesota)
Miriam King (University of Minnesota)
The IPUMS-International (Integrated Public Use Microdata Series -International) and the IDHS (Integrated Demographic Health Surveys) are international microdata dissemination projects of the Minnesota Population Center (MPC). IPUMS-International provides large samples of census microdata from 79 countries, from the 1960s through the latest census rounds. These records, covering over 500 million individuals, report on demographics, education, household structure, labor force participation, dwelling characteristics, and other topics. IDHS offers data on African and Indian women of childbearing age and children under 5, with information on health topics ranging from contraceptive use and prenatal care to HIV and intimate partner violence. Data from IPUMS-International and IDHS are ideal for comparative analyses across time and space. The user-friendly web interface shows variable availability at a glance, offers variable-specific information on question wording, codes and frequencies, and comparability issues, and merges files to create customized data extracts. This is a hands-on session that will introduce participants to the power and ease-of-use of IPUMS and IDHS. After an introduction to the datasets, participants will do a series of exercises to showcase the interactive metadata, customized microdata extract system, online tabulator, and classroom registration system.
Using NVivo 10 for Qualitative Data Analysis
Mandy Swygart-Hobaugh (Georgia State University)
Many social scientists like to "get their hands dirty" by delving into deep analysis of qualitative data – be it discourse analysis, in-depth interviews, ethnographic observations, visual and textual media analysis, etc. Manually coding these data sources can become cumbersome and cluttered – and may even hinder drawing out the rich content in the data. Through hands-on work with provided qualitative data, participants will explore ways to organize, analyze, and present qualitative research data using NVivo 10 analysis software. The workshop will cover the following topics: coding of text and multimedia sources; using queries to explore and code data; creating attribute value classifications to facilitate comparative analyses; and data visualizations.
The DDI Lifecycle metadata standard enables creating, documenting, managing, distributing, and discovering data. Colectica is a software tool that is built on open metadata standards, and helps facilitate adopting DDI into the research data management process. This workshop starts with a high-level overview of the DDI content model, and then teaches how to create DDI XML, both manually and with Colectica. Finally, participants will learn how to publish DDI metadata. This workshop covers the following topics: introduction to DDI 3.2; introduction to Colectica; documenting concepts and general study design; designing and documenting data collection instruments and surveys; documenting variables and creating linkages; ingesting existing resources; publishing resources; and hands-on use of Colectica and DDI to manage a sample study.
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Access Tools, and Long-Term Availability
Johanna Bleckman (ICPSR, University of Michigan)
Kaye Marz (ICPSR, University of Michigan)
Federal data sharing requirements increase public access to federally-funded scientific data. For researchers, data sharing is a key resource in translating research into knowledge, policies, and practices. This workshop will assist participants in facilitating data sharing in the cycle of science that starts with deposited data, which, through additional use, leads to the sharing of knowledge that inspires new data collection. The workshop will cover several deposit options (to fully-curated archives and the public access archive, openICPSR), differences between sharing public-use and restricted-use data, and benefits to depositors through the ICPSR website. A hands-on demonstration of making a deposit is planned. Finding data for the unique needs of a research project can be challenging, particularly in a world that values both the liberal use and protection of research data. The workshop will describe and demonstrate the array of discovery and exploration tools that leverage ICPSR's vast data catalog, metadata, and online analysis options; discuss the discovery, use, and publishing from restricted-use data; and include group discussion of disclosure issues and hands-on time with ICPSR data tools. Participants will become more familiar with: federal data sharing requirements; options for sharing data; data discovery tools; and protection of confidentiality when sharing data.
While data access and research transparency are becoming standard practices across the social sciences, the transition has been easier in the quantitative tradition. In part this is because most scholars who use quantitative data and analytical techniques have long accepted the norm, even if they have not regularly complied with it. Standards for making quantitative data accessible are widely acknowledged, and substantial infrastructure for that sharing has been in place for many years. The idea that qualitative data should be shared is much more recent and controversial. Part of the debate arises from the absence of widely shared understandings of the concrete operational practices for sharing qualitative data. Of course, many of the best practices for dealing with data that librarians, archivists, data center staff, and other information professionals typically employ remain applicable. However, qualitative data present a variety of additional challenges due to their close proximity to the social world from which they were drawn. Their often-textual nature likewise poses special challenges to sharing, particularly internationally. The workshop highlights these challenges and provides a basic framework research data professionals can make use of when called upon to advise their user community about managing qualitative data. Workshop organizers are associated with the Qualitative Data Repository (QDR). Funded by the National Science Foundation, QDR was established in 2014 to provide the infrastructure to safely store and share qualitative data and to contribute to developing the expertise and tools needed to share such data. Specific techniques, tools, and resources will be presented on the following topics: planning to manage qualitative data before a research project begins; organizing qualitative data for analysis and writing, research transparency, and potential sharing; sharing qualitative data ethically, legally, and in a way that facilitates broad international access; and the uses to which shared qualitative data can be put.
New data from IPUMS-CPS, ATUS-X, and IPUMS-SESTAT
Sarah Flood (University of Minnesota)
Devon Kristiansen (University of Minnesota)
The IPUMS-CPS (Integrated Public Use Microdata Series – Current Population Survey), ATUS-X (American Time Use Survey Extract System), and IPUMS-SESTAT are microdata dissemination projects of the Minnesota Population Center (MPC). The IPUMS-CPS data project was recently expanded and now includes the March Annual Social and Economic Supplement data from 1962 to 2014 and CPS Basic Monthly Samples from 1989 to 2013. In addition to the Basic Monthly data, 13 supplements, including the food security, veterans, fertility, tobacco use, and voter supplements, are currently available. The ATUS-X contains annual time diary data from 2003 to 2014 and includes newly available health and well-being data. IPUMS-SESTAT is a new project to make information about college graduates in the United States more easily accessible and includes data since 1993. All MPC data are harmonized for consistency across time, fully documented, and easily accessible online for the research community. This is a hands-on session that will introduce participants to IPUMS-CPS, ATUS-X, and IPUMS-SESTAT with an overview of the data available and the topics of interest to researchers covered by these data. The presenters will lead attendees through a series of exercises to learn how to obtain the data, access the web-based documentation and metadata, and use the data in basic analyses.
The Art of the Merge: How to Merge Data in Three Statistical Software Programs
Ashley Jester (Columbia University)
Tara Das (Columbia University)
Starr Hoffman (Columbia University)
nbsp;"I need to add additional years to my dataset for a longitudinal analysis..." nbsp;nbsp; nbsp;"I need to add additional variables to my dataset…" nbsp;nbsp; nbsp;"I’ve found all of my variables and need to bring them into a single file…" Have you ever heard (or said) this? Most researchers will need to merge data at some point in their research process as it is rare that all of the variables relevant to an analysis will be found in a single source.nbsp; This workshop will focus on merging datasets using three statistical software packages: Stata, R, and SAS. It will teach the basic research principles and data requirements necessary to execute a successful merge and will apply this knowledge. Instructors will provide sample datasets and guide participants step-by-step through preparing data, completing a merge successfully, and validating results. This will be of use to researchers as well as to librarians and others who support research. If you need to merge data or assist those who do, this workshop will give you the knowledge to make your data merge a success. Learning objectives: Able to execute successful data merges in Stata, R, and SASUnderstand general principles necessary to complete a data merge in any application
2015-06-03: A1: Secure virtual research environments
An Overview of the University of Alberta Health Research Data Repository (HRDR) Secure Virtual Research Environment
James Doiron (University of Alberta)
Located within the Faculty of Nursing at the University of Alberta, Canada, the Health Research Data Repository (HRDR) is a secure virtual research environment (VRE) developed to support the security, confidentiality, access, and management of health-related research data. The HRDR's operational phase commenced in January 2013, and at the time of writing it has supported over forty-five multi-disciplinary and collaborative health-related research projects, both quantitative and qualitative in nature, with more than 125 users across local, national, and international institutions accessing these. Project-level services provided by the HRDR include support for grant writing and ethics submissions; data management planning, guidance, and training; comprehensive assessments of resource needs, including security, project space set-up, access, and analytic software requirements; detailed user orientations; completion of privacy impact assessments; data acquisitions; and secure file transfers (ingests/extracts). Examples of health-related research projects that have benefited from these services will be presented. Additionally, a brief overview of the development and current status of the HRDR, including its policies and procedures, technical infrastructure, and cost recovery model, will be discussed.
Improving Access to Documentation on Restricted Labor Market Data
Stephanie Jacobs (Cornell University)
Warren Brown (Cornell University)
The Cornell Institute for Social and Economic Research (CISER) and the Institute for Employment Research (IAB) of the German Federal Employment Agency have developed a data service enabling approved researchers to securely access confidential administrative microdata on labor markets. The data files contain detailed information on employment, unemployment benefit receipts, participation in labor market programs and registered job search, and a large number of socio-economic characteristics. Remote access to IAB's Scientific Use Files is available to any researcher approved by IAB; no affiliation with Cornell is required. This presentation demonstrates how researchers can use CED2AR to search through the metadata for IAB's Scientific Use Files (SUF) to locate variables of interest and other documentation essential to formulating and carrying out a research plan. CED2AR, funded by the National Science Foundation (NSF) and developed by the Cornell Node of the NSF Census Research Network (NCRN), is designed to improve the discoverability of both public and restricted data. The project is based upon leading metadata standards and ingests data from a variety of sources. The addition of the IAB SUF to CED2AR enables researchers to search across multiple labor market data series, including the US Census Bureau's LEHD.
Shared research data in academia is associated with considerable benefits. It makes studies reproducible and enables other researchers to ask new questions based on old data. Data sharing thereby makes research more transparent and fosters innovation. However, curating, archiving, and making data available to others is still far from standard scientific practice. The research project "Data Sharing in Academia" (http://data-sharing.org) aims to identify factors for efficient data re-use. After a systematic review of scientific publications on data sharing and a qualitative analysis as part of the SOEP User Survey 2014, we conducted a quantitative survey of researchers from all disciplines (n ~ 1500). In this presentation we present a generic framework for data sharing in academia and the first results from our survey.
2015-06-03: A2: Life-cycle view of data management
A Web-based Data Management Tool for Collaborative Studies
Dafina Kurti (GESIS Leibniz Institute for the Social Sciences)
Alexia Katsanidou (GESIS Leibniz Institute for the Social Sciences)
In large-scale collaborative research projects, be they national or cross-national data collections, each project phase - study planning, fieldwork, data processing and documentation, and data depositing - requires careful data management and good coordination among different teams. The web-based data management portal will provide a virtual collaborative workspace for research teams and the different stakeholders of projects. By including modularised management tools, the platform will support research projects according to their requirements, allowing easy, structured, time-saving, and secure communication, workflow, and data transfer, no matter which phase of the research life cycle a project is in. We will present the final concept of this infrastructure with fully fledged use cases, developed based on the evaluation of a) practices in existing data management portals (EVS, ISSP, ESS), b) researcher needs (European Election Study, Eurofund), and c) the recommendations we worked out together with project managers and principal investigators.
The ch-x Experiment: Building Experience in Good Data Management Practices
Alexandra Stam (Swiss Centre of Expertise in the Social Sciences (FORS))
The Swiss Federal Surveys of Adolescents (ch-x) are long-established large-scale surveys conducted among 19- to 21-year-old Swiss citizens. They consist of near-full coverage of young men drafted to the army (about 40,000), as well as a sample of 2,000 women of the same age. While data service staff at FORS (Swiss Centre of Expertise in the Social Sciences) are usually occupied with the archiving of national survey data, they won the open competition to lead the 2016-17 ch-x edition. This was a great opportunity for FORS, not only to produce and make available fascinating data on youth mobility for secondary use, but also to strengthen staff expertise in key areas of data management. The ch-x project provides a unique opportunity to reflect on data management throughout the life cycle, whether by challenging accepted best practices or by better understanding the pressures that prevent good data management. The presentation will address our data management practices during the first phase of the project, from study conception to the finalization of a lengthy paper questionnaire. Of particular interest is the aspect of documentation, and how our experiences of good - and bad - data management practices can benefit the larger community.
First Forays into Research Data Dissemination: a Tale from the Kansas City Fed
San Cannon (Federal Reserve Bank of Kansas City)
Deng Pan (Federal Reserve Bank of Kansas City)
The Federal Reserve System has a long tradition of doing economic research; each of the 12 Reserve Banks and the Board of Governors has a research department, and together they publish more than 400 working papers and journal articles annually. Unfortunately, there has never been a tradition of regularly making the data from those papers publicly available. A pilot program being undertaken by the Research Division at the Federal Reserve Bank of Kansas City aims to make such data available for research reuse. Working as a pilot participant for a new dissemination platform, we have had to educate economists, build metadata specifications, recruit contributors, collaborate with technology and legal staff, and coordinate and build coalitions across multiple functions at our institution and others. This presentation will outline the challenges faced and obstacles overcome as we worked to create the infrastructure and workflow, as well as to start the paradigm shift needed, to make research data publication a regular part of the research life cycle.
2015-06-03: A3: Enabling public use of public data
Enhancing Dissemination of Statistical Information in Uganda by Uganda Bureau of Statistics
Winny Akullo (Public Procurement and Disposal of Public Assets Authority)
A quantitative study was carried out to investigate ways of enhancing the dissemination of statistical information in Uganda by the Uganda Bureau of Statistics (UBOS). The objectives were: to establish the extent to which statistical information is available in the country; to establish the challenges UBOS faces in disseminating statistical information; to establish users' level of satisfaction in accessing statistical information; to identify the challenges users face in accessing statistical information; and to propose strategies for enhancing dissemination of statistical information. 119 users of statistical information participated in the study, and 17 UBOS staff were purposely selected because they are charged with the dissemination of statistical information. Questionnaires and interviews were used to collect data. Data were analyzed using Microsoft Excel and presented in the form of frequencies, figures, and tables. The study established that the UBOS website is one of the main channels used for disseminating statistical information; however, it is often inaccessible. Worse still, most of the publications disseminated cannot be found on the website. Among the recommendations, UBOS should use as many available channels as possible to disseminate statistical information in multiple formats and languages, and should establish regional resource centers to increase accessibility.
This paper will discuss the experiences of the UBDC at the University of Glasgow in supplying "Open" data generated by the City of Glasgow council to public and academic users. Areas covered in this paper will include the standardisation of metadata, data linkage, and visualisation services supplied to both communities, and will include real-world use cases of how the data has been used and how outreach to the public, specifically, was undertaken. In addition to this, the paper will also touch upon the utilisation of additional visualisation techniques, such as a fully geo-referenced Minecraft map of the UK, to enable users to interpret data in a different manner from more traditional techniques. Finally, the paper will cover additional examples of enabling the public to engage with data, and how this growing field is key for academic and non-academic study to understand how data can better be utilised to improve the understanding of the urban environment and how individuals interact with it.
Finding Space in an Open Data World
Margherita Ceraolo (UK Data Service)
There is a global momentum toward open data, with national governments and IGOs such as the IMF, World Bank and UN embracing and promoting open data. As part of the UK Data Service's commitment to the principle that data which are publicly funded should be publicly accessible, the Service offers an increasing range of open data including UK census data, qualitative data, survey datasets and international macrodata. In the open data environment, there is a need for the Service to continue to adapt and find innovative ways to improve and enhance the users' data experience. What are the benefits of accessing open data via the UK Data Service? With this question in mind, we have begun to explore ways in which we can develop our open data offering for the Social Science community. This presentation focuses on the international macrodata provision and illustrates our approach to developing the delivery platform, UKDS.Stat, with the aim of making it an invaluable resource for anyone interested in international socio-economic data. It will describe the specific methods we are considering such as providing APIs, visualisation, integrating social media, and acting as brokers to highlight the impact of our data.
2015-06-03: A4: Training data users 1
Teaching Users to Work with Research Data
Sarah King-Hele (UK Data Service)
The UK Data Service is a resource funded to support researchers, students, lecturers and policymakers who depend on high-quality social and economic data. This presentation will discuss the methods we use to teach users about what data we have available and how to get started using the data for research. Our approaches include a rolling series of webinars, face to face presentations and practical workshops. We also provide online materials to help users to get the most out of the service. These training methods allow the users to learn about our range of data and how to use them in a variety of formats so that we can provide support to meet the needs of different kinds of users. We also discuss how this training programme fits in with other methods training and how we can develop it for new kinds of data and a wider range of users.
The Carrot: Outcomes from a Campus-Wide Grant Program for Creating Data-Driven Assignments
Katharin Peter (University of Southern California Libraries)
Efforts to embrace big data, data analytics, and data visualization methods often overlook the widespread need to develop foundational data literacy competencies. This presentation will share the results of one university's efforts to promote data literacy through a competitive, campus-wide grant program for faculty implementing data-driven assignments in undergraduate courses. As part of the grant program, 12 faculty from a variety of disciplines received support from instructional designers and data librarians to develop and implement data-driven assignments in support of their course learning outcomes. This presentation will discuss the outcomes of the grant program as well as opportunities and strategies for promoting and supporting the creation of data-driven assignments.
Sustainability of Social Science Data Archives: A Historical Network Perspective
Kristin R. Eschenfelder (University of Wisconsin-Madison)
Kalpana Shankar (University College Dublin)
Greg Downey (University of Wisconsin-Madison)
Rebecca Lin (University of Wisconsin-Madison)
This paper will summarize preliminary results from a study analyzing the history of sustainability in social science data archives (SSDA). The purpose of the study is to draw out what sustainability challenges SSDA have faced and what strategies they have employed to remain sustainable and relevant given massive changes in technologies, users, data types, revenue sources, and data markets. The paper will summarize historical analysis of documentation from the 1960s to the early 2000s from ICPSR, the UK Data Archive, and the Roper Center for Public Opinion Research. The paper will also include a historical network analysis of interactions among SSDA as represented in the full run of IASSIST Quarterly articles from 1960 to the early 2000s. The project's broader goal is to understand how the history of SSDA can contribute to current conversations on the long-term sustainability of other knowledge infrastructures. To this end, the project seeks to address the successes and failures that SSDA experienced in trying to remain relevant and funded, and the longitudinal changes in relationships among SSDA as they have collaborated and competed to support research in the social sciences.
Stakeholder Analysis for Research Data Management Services for Public Policy Researchers
Jungwon Yang (University of Michigan)
Since the National Science Foundation began requiring a data management plan for grant applications submitted after January 18, 2011, many academic libraries have started to develop research data management services. One of the emerging issues related to the new service is the role of the liaison librarian. Articles have noted that, to enhance scholarly productivity, liaison librarians need to participate in the entire life cycle of research. Liaison librarians also need to be team builders among library experts for an effective data management service. Yet it is not clear how a liaison librarian can identify faculty needs and who needs to be on the library team for a faculty member's research data management, since researchers' knowledge of data management varies with personal experience as well as academic discipline. Moreover, the topic and scope of the research will strongly affect the decision of which library experts will be needed for its data management. Given these circumstances, stakeholder analyses of the faculty are useful for determining the scope and degree of library service. I will report how stakeholder analysis helped me customize data management services for the Public Policy faculty at the University of Michigan.
Developing Research Data Services Vision(s): An Analysis of North American Academic Libraries
Inna Kouper (Indiana University)
Mayu Ishida (University of Manitoba)
Kathleen Fear (University of Rochester)
Sarah Williams (University of Illinois at Urbana Champaign)
Christine Kollen (University of Arizona)
Many libraries are implementing or getting ready to implement research data services (RDS) (see, for example, http://www.acrl.ala.org/acrlinsider/archives/6297). Oftentimes, these initiatives are reactive, responding to pressures originating outside the library, such as national or funder mandates for data management planning and data sharing. To provide effective support for researchers, libraries must be proactive and develop a shared vision of what they are trying to accomplish. Can such a vision supersede institutional differences while still accommodating diversity in implementation? In this presentation we discuss a set of vision statements grounded in an analysis of the drivers of RDS vision as well as libraries' current goals and activities in RDS. We developed these statements based on our examination of documents that advance the need for RDS, such as the funding agencies' requirements, the US Office of Science and Technology Policy memo, and the Canadian Tri-Agency's proposal of a data management plan mandate; a content analysis of North American academic library webpages; and interviews with library deans and other administrators. Finally, we describe how our five institutions are responding to this vision and how our implementations of the vision vary depending on the disciplinary and institutional context.
A Coordinated, Decentralized Approach to Data Management Services: From Education to Everyday
Jon Jeffryes (University of Minnesota)
Alice Motes (University of Minnesota)
Amy Neeser (University of Minnesota)
Amy West (University of Minnesota)
We will describe the strategies, methods, and outcomes of the University of Minnesota Libraries’ coordinated, decentralized approach to providing data management education to library staff and users. This presentation outlines the educational challenges in navigating organizational structures, disciplinary commonalities/differences, staff training, and researcher training in a large research institution. Staff are increasingly engaging in data management activities across the libraries system, including collaborative workgroups focused on data management, liaisons’ work with data producers, and library staff’s own research data needs. We will discuss how to coordinate these diffused activities without stifling flexibility or creativity and how to incorporate these practices into routine work. One challenge facing libraries has been disciplinary differences regarding data management and sharing practices. We will discuss what strategies and methods can address the commonalities and disciplinary differences of researchers’ needs. Finally, we will discuss our approach to staff and researcher training as our data management services have grown and developed. We use scenario-based exercises, webinars, and user-facing workshops to incorporate this into everyday library work to better serve the research community. Through staff education, we are building capacity to train our researchers through workshops, data management consultations, and comprehensive data management plans.
Transparency from Scratch: Encouraging Openness and Enhancing Publications in Qualitative Political Science
Colin Elman (Syracuse University)
Diana Kapiszewski (Georgetown University)
The American political science community is engaged in a rigorous and wide-ranging conversation about research transparency, involving communities from across the epistemic spectrum. Broad consensus exists on the need for openness and for the project's general principles to be instantiated in research tradition-specific practices that preserve methodological diversity. Nonetheless, transparency is a novel project for most qualitative political scientists, requiring the development of new practices and strategies. This essay highlights the epistemic, intellectual, and sociological challenges of augmenting transparency in qualitative research -- and some of the pragmatic and operational difficulties of doing so. A central challenge is representing digital documents in online publications. We highlight an innovative transparency technique, active citation (Moravcsik 2010), which involves hyperlinking citations to central or controversial text in a publication to an accompanying "transparency appendix" (TRAX). A TRAX comprises an overview of the trajectory of the research project underlying the publication, an excerpt from the cited source, an annotation identifying the micro-connection between the cited source and the textual claim, and ideally a link to/copy of the source itself. We conclude by discussing the implications for international scholars of more data being made available, and research being made more transparent, in this novel fashion.
Making Data Citation Connections
Anne Etheridge (UK Data Archive)
Melanie Wright (UK Data Archive)
The UK Data Service is exploring ways of citing data, from study level to subsets of data to paragraphs of text. We produce citations for each of our data catalogue records in Discover. Each citation includes a persistent identifier, created via DataCite, to give a unique access code for the data. We are working on downloading the citations in multiple formats and adapting the tools we have for our qualitative citations to make these citations easier to find and use. We have been working with the Research Data Alliance Data Citation Working Group to find the best ways to cite subsets of data and apply them to our Nesstar records and international macrodata. We have tools to dynamically create a citation from paragraphs in qualitative text. Users select a passage and we then mint a unique identifier on the fly that can be used to cite, precisely, that piece of text. Others reading subsequent research can then go straight to that particular paragraph to read the text in context. Our tools allow the citation to be simply copied and pasted into any reference list.
Bridging Disciplines: Assessing the Interdisciplinary Impact of Open Data
Robert R. Downs (Columbia University)
Robert S. Chen (Columbia University)
Freely disseminating scientific data can contribute to multiple disciplines across the physical, social, health, and engineering sciences. If the impact of data centers is not measured, stakeholders will not know whether data centers, archives, and libraries, and the data that they disseminate, are having a positive impact on the conduct of science. Data citations provide evidence on the use of data in various stages of the research process, including problem definition, statistical analysis, modeling, and validation. Measuring the interdisciplinary citation of scientific data disseminated by a data center can reveal the degree to which the data center is supporting cross-disciplinary research. Analysis of a decade of data citations demonstrates the interdisciplinary use of scientific data and the impact that one data center has had across disciplinary boundaries.
2015-06-03: B2: Web archiving, audio visual and image collections
Streaming Access to Oral History Data
Marion Wittenberg (Data Archiving and Networked Services (DANS))
DANS, the research data archive in the Netherlands, has a growing collection of audiovisual data. This includes the witnesses' stories of the Second World War Heritage Program, the Oral History Project Indonesia, and interviews with Dutch veterans. The collection, with almost 2000 interviews, is accessed by various users. For privacy reasons, not all datasets are open access. In my presentation I will introduce the way in which we treat the audio and video data, the difference between high-resolution archival storage and streaming access, restricted access control for privacy-sensitive data, and future plans for subtitle search.
Freedom on the Move: Discovering the Plight of Runaway Slaves in the United States
Ed Baptist (Cornell University)
Jeremy Williams (Cornell University)
Bill Block (Cornell University)
Slavery is one of the most traumatic and defining aspects of United States history. Despite this fact, there is a paucity of machine-actionable data about the individuals who were bought and sold as slaves in the United States. Substantial information does exist, however, in the form of advertisements, placed by enslavers, in antebellum newspapers. These advertisements included any detail that might help readers identify the fugitive: the name, height, build, appearance, clothing, literacy level, language, accent, and so on of the runaway. They are not, however, in formats amenable to analysis. Led by Cornell University, Freedom on the Move (FOTM) is a comprehensive and highly collaborative effort to transcribe and parse an estimated 100,000 advertisements using OCR and crowd-sourcing to create a new academic data resource. The data are stored in a relational database that is described and published using the DDI-DISCO and W3C-PROV ontologies. This paper will introduce the project, provide an overview of the system architecture, and describe how FOTM hopes to use semantic metadata to facilitate discovery by researchers, data citation, and interoperability with other datasets.
Web Archiving for Collection Development: Capturing Event Data on the Umbrella Movement
Daniel Tsang (University of California, Irvine)
Bibliographers have been slow to recognize web archiving as a function of collection development, beyond personnel in Special Collections (archiving the university domain) or government documents (archiving government sites). Yet as more and more data is generated online, including in the social sciences, it is timely to look at how web archiving can fit into a collection development policy and be part of a selector's duties. I assess existing collection development policies on web archiving in selected academic libraries and national institutions. This presentation focuses on archiving web content relating to Hong Kong's Umbrella Movement and assesses the complications of such an endeavor in collection development and what can actually be captured in a web crawl. It evaluates the research value of such a collection while highlighting some key criteria for selecting sites to crawl. It discusses the issue of international crawling of sites in another country or region and the potential benefits and risks of such a project. Finally, it offers a case study of how social media can be captured and made accessible to researchers in years to come.
2015-06-03: B3: Systems and standards for metadata management
Aristotle Metadata Registry - A New Contender for Government Metadata Management
Samuel Spencer (National Health Performance Authority)
The ISO/IEC 11179 specification remains the gold standard for the definition of metadata registries. However, to date there have been relatively few open and conformant implementations. The AIHW METEoR metadata registry has a strong reputation as a leading, standards-conformant, public-facing registry for government metadata; however, its growth has pushed it further than its original scope and technological base can support. Based on the system architecture of METEoR, the Aristotle Metadata Registry is a rebuilt implementation that provides a free, open-source, easy-to-install, and scalable metadata registry. With an enterprise-level search engine to improve discoverability, a thoroughly tested permissions suite that ensures security around the publication of information, and a rich authoring environment, Aristotle-MDR aspires to usher in a new phase of metadata registries. The use of the object-oriented principles of the Python-based Django web framework complements the principle of extensible metadata described by the ISO/IEC 11179 standard. This design allows Aristotle-MDR to support the inclusion of third-party modules that provide additional metadata objects, including health indicators, datasets, and questionnaires, as well as a wide range of export formats such as Adobe PDFs and multiple versions of the Data Documentation Initiative XML format.
RAIRD: Implementing GSIM for Norwegian Administrative Registers
Arofan Gregory (Metadata Technology North America)
Ørnulf Risnes (NSD)
The Generic Statistical Information Model (GSIM) is a conceptual model for describing statistical data and metadata, created by the UNECE's High-Level Group. It is having a profound effect on standards such as DDI and is being widely implemented by statistical agencies. One such implementation is the RAIRD project: a joint effort between the Norwegian Data Archive and the Norwegian Statistical Agency to provide online analysis tools for a huge set of Norwegian administrative data. Like many registers, much of the Norwegian data describes events ("event history data"), which are not well described using traditional approaches such as those found in GSIM. The RAIRD project used GSIM as the basis of its implementation, which involved extending and refining the model. This presentation shows how traditional data and metadata models can be extended to better describe administrative registers and the metadata needed by systems supporting their online analysis.
Metadata in Action: Driving TREC Survey Data Production and Dissemination
Shane McChesney (Metadata Technology North America)
In the context of the Translating Research in Elder Care (TREC) survey, we have been collaborating with the Knowledge Utilization Studies Program (KUSP), part of the Faculty of Nursing at the University of Alberta, and NOORO Online Research, towards the establishment of a metadata-driven platform for facilitating the production, dissemination, and analysis of the TREC2 survey data. The first wave of data collection is currently in progress. This presentation will demonstrate: (1) how metadata is leveraged to facilitate loading data into a MySQL-based data warehouse, enabling high-performance access to all the survey program data; (2) tools for exporting microdata subsets to statistical packages, in particular R, SAS, and SPSS, for computing/aggregating complex indicators or analysis by researchers; and (3) bridging the platform with R-Shiny and R-Markdown, two open-source products leveraging the R statistical platform, for the publication of data in dynamic web dashboards and the production of reports. This project is supported by the Canada Foundation for Innovation (CFI).
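As an illustrative sketch of the dashboard publication step, a minimal R-Shiny app that summarizes one indicator from a loaded extract might look like the following; the data frame and column names are hypothetical stand-ins, and the real platform draws on the MySQL warehouse rather than simulated data:

    # Minimal Shiny dashboard sketch; 'trec_extract' and its columns are
    # hypothetical stand-ins for data pulled from the survey warehouse.
    library(shiny)

    trec_extract <- data.frame(
      unit  = sample(c("A", "B", "C"), 200, replace = TRUE),
      score = rnorm(200, mean = 3.5, sd = 0.8)
    )

    ui <- fluidPage(
      titlePanel("Survey indicator by unit (illustrative)"),
      selectInput("unit", "Unit:", choices = sort(unique(trec_extract$unit))),
      plotOutput("hist")
    )

    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(trec_extract$score[trec_extract$unit == input$unit],
             main = paste("Score distribution, unit", input$unit),
             xlab = "Score")
      })
    }

    shinyApp(ui, server)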
2015-06-03: B4: Increasing openness and connections throughout the scientific workflow
Increasing Openness and Connections throughout the Scientific Workflow
Courtney Soderberg (Center for Open Science)
We can improve scientific communication to increase efficiency in the accumulation of knowledge. This requires at least two changes to the present culture. One change is conceptual - embracing that progress is made more rapidly via identifying errors in current beliefs than by finding support for current beliefs. Such a shift could reduce confirmation bias, unproductive theory testing, and the blinding desire to be right. The other change is practical - science will benefit from improving technologies to document and connect the entire lifecycle of research projects. This presentation will focus on the practical aspects, illustrated through the efficiencies gained via the Open Science Framework and its add-on connections to Dataverse and Figshare. The presentation will specifically talk about how research support teams (i.e., data librarians, repository managers, and others) can utilize these tools to help their users improve daily workflows.
2015-06-03: B5: Building on common ground: Integrating principles, practices, and programs to support research data management
Building on Common Ground: Exploring The Intersection of Archives And Data Curation
Lizzy Rolando (Georgia Tech Library)
Wendy Hagenmaier (Georgia Tech Library)
Research data management continues to emerge as a distinct information discipline with unique needs, policies and practices, but there are many ways in which it overlaps with the existing disciplines of records management and archives. Examining areas where policies, practices, and resources can be shared between them is increasingly valuable as the digital information universe becomes more complex. This session will examine those shared areas, highlighting efforts to engage with different information communities and programs. Kelly Chatain, Associate Archivist, University of Michigan, will present her work as an ‘embedded’ archivist within the Survey Research Center, focusing on records management tools and archiving principles used to facilitate a practical and cultural shift in the creation of data. Bethany Anderson, Visiting Archival Operations and Reference Specialist, University of Illinois at Urbana-Champaign, will discuss ways of integrating the work of academic archives and research data services to appraise, manage, and steward data. Research Data Librarian Lizzy Rolando will discuss Georgia Tech’s efforts to identify areas of convergence between the functional and policy requirements of a research data repository ecosystem and the requirements of a born-digital archives repository ecosystem.
Sharing Practice: Records Management in a Research Data Management World
Kelly Chatain (Institute for Social Research, University of Michigan)
Bethany Anderson (University of Illinois at Urbana-Champaign)
Sharing data has become the norm in academia, and managing shared data requires an infrastructure. Because data are shared across different platforms, interoperability is critical to ensure continued collaboration and data sharing across disciplines and institutions. The Data Documentation Initiative (DDI) standard will be examined with regard to its use in Ontario universities and in the Microdata Access programs at Statistics Canada. The development and current state of best practices and tools for collaboration among the institutions marking up data will be reviewed, including how the shared infrastructure has reduced the cost of data staging and vastly improved data access, not only for our local communities of users but also for user communities nationally and internationally. Remaining challenges are discussed, along with potential solutions. Finally, we will consider the evolving state of best practices and suggest ways to move forward in partnership with those responsible for tagging datasets.
Improving the Visibility of Data: A Case Study of an International Birth Cohort Survey
Hersh Mann (UK Data Service)
The Young Lives birth cohort study is an innovative project investigating the changing nature of childhood poverty in four developing countries. When the data were released by the Economic and Social Data Service (ESDS) in 2006, only 10 researchers accessed them in the first year. In 2007 the depositors of the survey worked with the ESDS to implement a plan aimed at raising its visibility and increasing its use. The survey became an ESDS 'major study' with its own set of specialist support web pages, and an immediate spike in usage followed: by the end of 2007, the study was attracting 40+ users per annum. The ESDS (now UK Data Service) has continued to build on the relationship with the data producers, and the Young Lives data portfolio now includes a teaching dataset as well as new waves of data. At the end of 2014, usage stood at 225 users per annum, with a substantial proportion coming from the countries being studied. This case study is just one example of how collaboration with the data producer can greatly enhance data visibility and use. It is an illustration of how data services can do much more than simply 'get, store, provide'.
The Makerere University Institutional Repository: Benefits, Challenges and Way Forward
Eric Haumba (YMCA Comprehensive Institute, Kampala)
Sekikome Patrick (Makerere University)
Universities function as focal points for academic research in Africa. Egwunyenga (2008) attributes this to the fact that research is compulsory for lecturers and postgraduate students, by job description and by academic requirement respectively. The nature of studies at Makerere University requires students to engage actively in research in partial fulfilment of the requirements of the degree being sought. For academic staff, the imperative to "publish or perish" governs promotion within the academic environment. Consequently, the volume of research output originating from the university and addressing local problems in Uganda is expected to continue to increase. Research outputs addressing issues endemic to the region should be given wide circulation so that the results can be applied to the problems they sought to tackle. Unfortunately, these outputs gather dust in departmental offices and on library shelves without being published (Gideon, 2008). These findings subsequently die at the institutional level, as those in need of the knowledge cannot access it owing to institutional and external challenges associated with the Institutional Repository; hence the need to investigate its practical benefits and challenges and to propose strategies for improvement. (Presented by Winny Akullo-Nekesa)
2015-06-03: C2: The useful in-between: Where data, arts and humanities meet
The Useful In-Between: Where Data, Arts, and Humanities Meet
Justin Joque (University of Michigan)
David Pavelich (Duke University)
Heather Tompkins (Carleton College)
Margaret Pezalla-Granlund (Carleton College)
Kristin Partlo (Carleton College)
In the last three to five years, data librarians, long accustomed to working primarily with social scientists and scientists, have increasingly been called to work with people across the disciplines who are interested in using data. This raises many challenges, among them discerning when a problem of access requires a technological, methodological, or cultural solution. Working across disciplinary boundaries also opens up new possibilities for engaging with data, by uncovering new uses for familiar data and by introducing new ways of appreciating and critiquing our understanding of data and how we put it to use. The presenters on this multidisciplinary panel will speak from their experiences in this fertile zone where data science meets the arts and humanities. A digital humanities librarian, a special collections librarian, a visualization librarian, and a curator of library exhibitions will each talk about their experiences reaching across disciplinary practices to get at and connect with data. Their case studies will shed light on common questions and experiences regarding working with new partners, managing expectations around such work, and helping patrons find data in places they may never have thought to look before.
Heather Tompkins: Digital humanities often results in the production of rich collections of digital objects, metadata, and data, but digital humanists may not always see this digital output as data. This space between digital humanities and data services creates an occasion for librarians to expand conceptions of data on campus to include materials beyond quantitative information. The work takes on additional pedagogical significance when mentoring and teaching undergraduate research assistants who will support faculty projects. This presentation explores one approach to the intersection between digital humanities and data services and raises questions about what DH can borrow from the tradition of data services in this area.
David Pavelich: For decades, archives and special collections libraries have been collecting data in diverse formats, sometimes purposely, sometimes incidentally. The content is equally, endlessly diverse, from diary reckonings of the value of slaves, to nineteenth-century weather data, to unpublished financial data collected by twentieth-century economists. Many such archival items (like ledgers) are passed over by researchers because of their complexity or inscrutability. Yet these collections of under-explored data hold pedagogical potential for undergraduate (and even graduate) instruction. This paper offers a way for special collections librarians and data librarians to work together to teach students about using primary sources from two very different perspectives within the research library.
Margaret Pezalla-Granlund: Many artists are interested in the way information is represented and explore techniques for visualizing data through their artwork. Some of the most interesting artwork about data gets at questions about how we read data, how it is understood (and misunderstood), and the possibility of uncertainty. Is there an art behind data? Can a graph be expressive? What can artists tell us about how we look at numbers? For this session, I will choose three key artist's books as case studies to explore the ways in which artists visualize and interpret data.
Justin Joque: From text mining projects to the creation of interactive websites, humanists are turning towards data as a way to understand and augment their research. Offering data visualization and mapping support as part of our Spatial and Numeric Data Services, we often assist on substantial portions of these projects. Especially as various sources and types of textual data become available, including those with interesting topological features such as link networks for websites, and as methodologies for processing large corpora develop, humanists are increasingly using and thinking critically about data. The vast amounts of data that can be computationally processed are pushing the boundaries of what reading and analyzing textual information means in the humanities. This presentation will explore some of the interesting uses of data in the humanities we have developed and supported at the University of Michigan Library, and the ways in which humanists, along with data librarians, are thinking about data and its relation to the humanities.
The road to data sharing is paved with good intentions: Looking at UK and German University Research Data Policies
Laurence Horton (London School of Economics and Political Science)
Astrid Recker (GESIS - Leibniz Institute for the Social Sciences)
Chloe Dumas (Enssib)
As of late 2014, 20 percent of UK Higher Education Institutions (HEIs) had adopted a Research Data Policy. In contrast, only one percent of German HEIs had adopted one. We examine policies in the context of national differences in funder requirements and the overall research funding landscape. Whereas recommendations exist on what should go into a policy, there has been no analysis of what actually goes into policies. This presentation compares the content of policies from both countries for similarities and differences, to see whether -- regardless of the differences in the environments -- a standard form and language is emerging. The presentation will illustrate the adoption of two distinct approaches. The first is a 'general principles' approach: the policy is short, strong on normative values for data re-use and preservation and on general goals, but weak on policy detail and enforcement mechanisms. The other approach is a formal 'legalistic' style: longer, specific in requirements, and strong in definitions, but not necessarily clear in direction or easy for researchers to work with. Policies are tested by type of university (research-intensive vs non-research-intensive institutions) and by age (university cohort). The results of this research fed into LSE's own draft research data policy.
Data Sharing Practices across the Social Science Disciplines
Amy Pienta (ICPSR, University of Michigan)
Data sharing has become an increasingly important issue facing scientists in recent years, and understanding what factors affect data sharing behavior remains an important goal in informing those setting data sharing policy. The present analysis examines survey data ICPSR collected from social scientists in the United States who collected primary research data under funding from the National Science Foundation or the National Institutes of Health. Building on our prior work, we examine here whether certain social science disciplines embraced data sharing more readily than others early on. Results from multivariate regression models suggest that political scientists and economists are the most likely to share their data, and psychologists and health scientists the least likely. Implications for discipline-specific policies are discussed.
Data Sharing Behavior: A Social Psychology Approach
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Previous work on journal data sharing focused on the relation between data policies and research data availability (Gherghina and Katsanidou 2013; Zenk-Möltgen and Lepthien 2014). A clear gap in the literature is the omission of individual researchers' intrinsic motivations for data sharing. Social psychology offers an analytical framework for investigating how personal beliefs shape individuals' intentions and how these intentions influence their behavior. Drawing on the theory of planned behavior of Ajzen and Fishbein, which emphasizes the impact of the peer group, this paper sets out to explain data sharing behavior by authors in political science and sociology journals. Authors of publications from pre-selected ISI-indexed journals will form the sample for a survey exploring their personal beliefs, intentions, and behavior regarding sharing the data on which their analyses are based. We hope to shed some light on a previously obscure component of data sharing behavior.
2015-06-03: C4: Data repository models and infrastructure
Developing a Repository Lifecycle Model and Metadata Description: Modeling and Describing Changes
Juliane Schneider (Research Data Curation Program, University of California San Diego)
In the past decade a wealth of data repositories and open datasets have been created across all disciplines. Registries of repositories have also been established, mostly by discipline (medical, social sciences) or by ownership (academic, governmental). We have reached a point where a lifecycle model should be constructed for these resources, along with a set of agreed-upon metadata to describe them. We will present our repository lifecycle model and propose the most likely existing metadata schemas for constructing an overall description for repositories.
Research Data Repositories: Review of Current Features, Gap Analysis, and Recommendations for Minimum Requirements
Amber Leahey (Scholars Portal, Ontario Council of University Libraries)
Scientific reproducibility and data sharing are increasingly recognized as integral to scientific research and publishing, ensuring new knowledge discovery. This goes far beyond making data publicly available: it requires informed and thoughtful preparation, from initial research planning to the collection of data and metadata, considerations of interoperability, and publication in curated repositories. Research Data Canada (RDC) is a collaborative, non-government organization interested in access to and preservation of Canadian research data. The RDC Standards and Interoperability Committee (RDC-SINC) assessed 30 Canadian and international research data repositories for data transfer, storage, curation, preservation, and access. We identified the data submission requirements, standards, features, and functionality implemented by the repositories, and performed a gap analysis. Results are discussed in light of current and evolving needs, and recommendations are made for minimum research data and repository requirements. Terminology complies with RDC's new glossary of research data terms and definitions. This paper provides a practical multi-disciplinary compendium of core research data submission and repository requirements currently in use. Given this rapidly developing field, the paper will be updated just prior to submission.
Re-Shaping the Landscape of Research Data Repositories
Louise Bolger (UK Data Service)
ReShare is the UK Data Service's online repository for archiving and sharing research data produced by researchers, primarily ESRC grant holders. It is designed for 'short-term management', whereby researchers self-deposit and prepare data files themselves; however, ReShare's metadata profile and discovery system are fully integrated with the UK Data Service. We optimise linkages with other systems to maximise standardisation and minimise the 'burden' on depositors of completing a metadata record. Once depositors complete their project record, ReShare administrators review it for disclosure risks and quality of documentation. Currently there are 595 collections in ReShare, 500 of which were migrated from the ESRC Data Store, which ReShare replaced. To incentivise and reward depositors who provide complete, well-constructed project records containing data and robust supporting documentation, we are introducing a quality mark for projects that meet the criteria. This paper discusses the criteria for this quality mark in more depth, reflects upon the common issues faced in the ReShare review process, and provides recommendations on how depositors can avoid the common errors seen in ReShare. Finally, an overview of ReShare is provided, covering its purpose and functionality and reflecting on its first year.
2015-06-03: C5: No tools, no standards - Software from the DDI community
Web-based Solutions for Data Archiving and Dissemination Using DDI
Adrian Dușa (Romanian Social Data Archive)
The acceptance and adoption of a standard like DDI depends heavily on the availability of software tools that use it. The DDI Developers Community is a part of the DDI Alliance where software developers from around the world can meet and swap ideas on working with DDI in various programming environments and languages. This session will introduce our work and give an overview of tools available from the community. Most of the presenters will be available during the subsequent poster session for detailed questions or further demonstrations of their tools.
TERESAH - Authoritative Knowledge Registry for Researchers
Johan Fihn (SND Swedish National Data Service)
Building a community platform for DDI Moving Forward
Olof Olsson (SND Swedish National Data Service)
Ingo Barkow (DIPF, German Institute for International Educational Research)
Easy DDI Organizer (EDO): Metadata management and survey planning tool based on DDI-Lifecycle
Akira Motegi (University of Tokyo, Institute of Social Science)
We will introduce a metadata management software project launched at the Social Science Japan Data Archive (SSJDA), affiliated with the Institute of Social Science at the University of Tokyo. The aim of the project is to develop Easy DDI Organizer (EDO), a tool that helps researchers edit and manage metadata based on Data Documentation Initiative (DDI) 3.1. The implementation of DDI 3.1 is one of EDO's greatest features: researchers can record metadata such as study purpose, sampling procedure, mode of data collection, questions, question sequence, variable descriptions, and bibliographic information along the scheme of the data lifecycle. A file import/export function is another salient feature: EDO supports importing variable-level metadata from SPSS files and exporting codebooks and questionnaires. Our poster session will introduce the context and features of EDO, with some demonstrations provided. We will also discuss future improvements such as a public release and other function enhancements.
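A minimal sketch of the kind of variable-level SPSS metadata import EDO describes, using the pyreadstat library; the file name is a placeholder, and the real EDO importer may work quite differently:

```python
# Illustrative sketch only: pulling variable-level metadata out of an
# SPSS file, the import step described above. "survey.sav" is a
# placeholder file name; pyreadstat is one library that exposes SPSS
# metadata without loading the data rows.
import pyreadstat

# metadataonly=True reads the file header and skips the data rows
_, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

for name in meta.column_names:
    label = meta.column_names_to_labels.get(name, "")
    print(f"{name}: {label}")
    for code, value_label in meta.variable_value_labels.get(name, {}).items():
        print(f"    {code} = {value_label}")
```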
Benjamin Perry (Cornell University Institute for Social and Economic Research)
Venkata Kambhampaty (Cornell University Institute for Social and Economic Research)
Lars Vilhuber (Cornell University Institute for Social and Economic Research)
William Block (Cornell University Institute for Social and Economic Research)
The Comprehensive Extensible Data Documentation and Access Repository (CED2AR) is an online repository for metadata on surveys, administrative microdata, and other statistical information. CED2AR runs directly from DDI 2.5 through a single, non-relational database. While the DDI schema is well developed for documentation purposes, it is not ideal for semantic web applications. Using the schema.org microdata markup, CED2AR allows search engines to parse semantic information from DDI. This further enhances the discoverability of DDI metadata, as the data become machine readable by providers such as Google, Yahoo, and Bing. The schema.org markup is not directly embedded within the DDI, so it does not export when a user downloads a codebook. CED2AR can also run as a zero-install desktop application: users can download their own copy of CED2AR, quickly import codebooks, and instantly see the schema.org enhancements the system offers. The only prerequisites for the software are Java 7 and a web browser. This presentation will demonstrate the advantages schema.org adds to DDI and the ease of deployment CED2AR allows.
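The abstract does not show the markup CED2AR emits; as a rough sketch of the underlying idea, a few study-level DDI Codebook fields can be mapped to a schema.org Dataset description (shown here as JSON-LD for brevity, whereas CED2AR embeds microdata attributes in its HTML pages; all field values are invented):

```python
# Hypothetical mapping of DDI 2.5 study-level elements (titl, abstract,
# AuthEnty) to a schema.org Dataset description. Values are invented;
# CED2AR's actual markup is embedded in its HTML pages as microdata.
import json

ddi_study = {
    "titl": "Example Household Survey, 2014",
    "abstract": "A hypothetical survey used only for illustration.",
    "AuthEnty": "Example Research Institute",
}

dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": ddi_study["titl"],
    "description": ddi_study["abstract"],
    "creator": {"@type": "Organization", "name": ddi_study["AuthEnty"]},
}

print(json.dumps(dataset, indent=2))
```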
ICPSR: A common data platform serving unique data needs
Linda Detterman (ICPSR - University of Michigan)
ICPSR is a collection of data collections related to one another by a common data platform, common guidelines for data ingest and processing, and a common presentation of data for discovery (data catalog) and use by the data community. ICPSR's collections number almost 20, including a member (subscriber) collection, several agency and foundation collections, and, most recently, a self-deposit, public-access collection. This poster will diagram and present ICPSR's shared infrastructure approach, which enables unique research data collections to be served from a common platform.
Using undergraduate students to provide data services
Julia Bauder (Grinnell College)
Since 2010, the Data Analysis and Social Inquiry Lab (DASIL, pronounced "dazzle") at Grinnell College has employed peer mentors to provide certain data services to students and faculty. These peer mentors -- undergraduate students with backgrounds in statistics, economics, geographic analysis, computer science, and/or qualitative analysis -- provide drop-in assistance to students working on data-driven projects in economics, GIS, psychology, and other social science disciplines; prepare data visualizations and data-driven exercises for faculty to use in their courses, and datasets for them to use in their research; and provide hands-on assistance during in-class labs that involve data. The poster will cover the who, what, where, when, and why of using this model to provide data services at Grinnell, and also discuss how this experiment has worked out for us.
The carrot that encourages data sharing and its support environment
Florio Arguillas Jr. (Cornell University)
Less than a year after its implementation, CISER's data curation and reproduction-of-results service -- a carrot that encourages Cornell social science researchers to share the code and data associated with their publications -- has evolved in response to experience gained along the way. This poster discusses how the service works; the issues encountered and how they were resolved; the costs involved and the factors influencing them; how the cost of the service has given rise to other services and trainings (data management, curation, metadata creation, code documentation, and version control) that shifted labor to researchers and/or their assistants; and the supplementary post-reproduction services that promote and increase discoverability of the publications and their associated files.
High value, high risk: Options for restricted data dissemination at ICPSR
Johanna Davidson Bleckman (ICPSR - University of Michigan)
As the body of social science research data grows, so does interest in obtaining data for replication and secondary analysis. Research data involving individuals tend to be highly disclosive and often sensitive. Traditional methods of coarsening, truncating, or otherwise altering the data for public consumption limit both the utility of a study and the impact of the initial investment. There is growing interest in innovative ways of sharing research data in their fullest form while minimizing disclosure risk and honoring the confidentiality assurances given to research subjects. ICPSR employs three main methods of restricted data dissemination; this poster will highlight these methods and demonstrate the features of the newest, the ICPSR Virtual Data Enclave (VDE), a virtual computing environment offering full access to disclosive and sensitive restricted data that never leave ICPSR servers.
Bridging the administrative data gap: supporting researchers and data custodians at the Administrative Data Research Network
Kakia Chatsiou (Administrative Data Research Network)
Carlotta Greci (Administrative Data Research Network)
John Sanderson (Administrative Data Research Network)
Linda Winsor (Administrative Data Research Network)
Researchers go the extra mile to access administrative data in the UK: administrative databases are operational records and need to be processed and documented before they can be used for research. There is little support for users of administrative data: researchers often do not know how to access and analyse such data and have to negotiate access with depositors themselves. Support staff working with administrative data need a variety of skills, too: good knowledge of the data is not sufficient, and staff must be approachable and proficient in translating requirements, mediating and negotiating access, and thinking laterally to solve problems as they arise. Finally, legal pathways to access for research purposes are still in development and often vary between data custodians. The Administrative Data Research Network is a UK-wide partnership between universities, government departments and agencies, national statistics authorities, funders, and the wider research community that aims to overcome these barriers. This paper will describe how the ADRN User Services team has brought together academic, government, and third-sector researchers, data custodians, and data professionals to enable access to linked, de-identified administrative data for research in the UK.
Bridging the data divide: Economical repository management in the openICPSR Cloud
Linda Detterman (ICPSR - University of Michigan)
Organizations want to meet changing federal requirements for public-access data sharing and to satisfy the growing global call for replication and transparency of data analyses. However, they have limited resources to spend on user support, technical staffing, and the infrastructure development and maintenance needed to support public-access data-sharing services. This environment has made hosted data sharing in the cloud a solid alternative to costly do-it-yourself (build-it-yourself) repository management. Responding to a growing need at the organizational level for effective, experienced, and economically feasible data sharing, ICPSR researched and developed a public-access data-sharing service for institutions and journals. This poster will highlight the research findings and demonstrate the features and benefits of openICPSR for Institutions and Journals, a fully hosted data-sharing service for organizations, departments, and journals of all sizes.
DANS Strategic Plan 2015-2020: Sharing data together
Ingrid Dillo (DANS - Data Archiving and Networked Services)
DANS promotes sustained access to digital research data. To this end, DANS encourages researchers to archive and reuse data in a sustained form, for instance via the online archiving system EASY (easy.dans.knaw.nl) and the Dutch Dataverse Network (dataverse.nl). With NARCIS (narcis.nl), DANS also provides access to thousands of scientific datasets, e-publications, and other research information in the Netherlands. The institute furthermore provides training and consultancy and carries out research on sustained access to digital information. Elements in our new strategic plan 2015-2020 are: (1) the federated data infrastructure in the Netherlands; (2) the care for living data; and (3) the need to change the business model by charging institutions that deposit data in the DANS repository for the basic storage costs. Driven by data, DANS, with its services and participation in (inter)national projects and networks, ensures the further improvement of access to digital research data. Please visit dans.knaw.nl for more information and contact details.
TERESAH - Authoritative knowledge registry for researchers
Johan Fihn (Swedish National Data Service)
TERESAH (Tools E-Registry for E-Social science, Arts and Humanities) is a cross-community tools knowledge registry aimed at researchers in the Social Sciences and Humanities. It aims to provide an authoritative listing of the software tools currently in use in those domains, and to allow their users to make transparent the methods and applications behind them. TERESAH has been developed as part of the Data Service Infrastructure for the Social Sciences and Humanities (DASISH), a Seventh Framework Programme funded project. DASISH collaborates with the five ESFRI Infrastructures in the field of Social Science and Humanities (CESSDA, CLARIN, DARIAH, ESS, and SHARE). The tools and knowledge registry is aimed at researchers from all disciplines and sectors, research infrastructure builders and users, as well as IT personnel. It aims to include information about tools, services, methodologies, and current standards and makes use of existing social media for dissemination and discussions. TERESAH is open source software and has been developed with a reusability plan in mind, meaning that anyone can install and run a TERESAH instance of their own with minimal effort required. This poster will give an overview of TERESAH's structure and features including live demos.
Overcoming issues and challenges facing social sciences data services in the near future
Maria Jankowska (University of California Los Angeles (UCLA))
Social sciences and humanities data services and collections are changing rapidly owing to data-intensive research, new research data drivers, and the volume, variety, and velocity of data. This poster focuses on libraries' transition from the traditional model of data services, focused mostly on secondary data, to a new model supporting research communities in discovering primary and secondary data and in the stewardship of research data. Additionally, the poster presents the challenges facing data services and collections librarians in the near future and proposes strategies for managing these challenges.
Understanding researcher needs in data management: A comparison of four colleges in a large academic American university
Lisa Johnston (University of Minnesota)
Carolyn Bishoff (University of Minnesota)
Steven Braun (University of Minnesota)
Alicia Hofelich (University of Minnesota)
Josh Bishoff (University of Minnesota)
The diverse nature of research makes identifying needs and providing support for data management a complex task in an academic setting. To better understand this diversity, we compare the findings from three surveys on research data management delivered to faculty across 104 departments on the University of Minnesota - Twin Cities campus. Each survey was run separately in the Medical School, the College of Liberal Arts, the College of Food, Agricultural, and Natural Resource Sciences, and the College of Science and Engineering, and modified to use language that paralleled the different cultural understandings of research and data across these disciplines. Our findings reveal common points of need, such as a desire for more data management support across the research lifecycle, with the strongest needs related to preparing data for sharing, data preservation, and data dissemination. However, the results also reveal striking differences across the disciplines in attitudes and perceptions toward data management, awareness of existing requirements, and community expectations. These survey results can be used by others to demonstrate that a one-size-fits-all approach to supporting data management is not appropriate for a large research university and that the services developed should be sensitive to discipline-specific research practices and perceived needs.
DRUM-roll Please: Introducing an interdisciplinary data repository with a focus on curation for reuse
Lisa Johnston (University of Minnesota)
The Data Repository for the University of Minnesota (DRUM) (http://z.umn.edu/drumposter) is a service that launched in November 2014 enabling campus researchers to provide long-term, open access to their research data. DRUM reflects the Libraries' commitment to providing broad and enduring access to the intellectual output of the University. Making research data openly available in DRUM has numerous benefits, including: the ability to provide a persistent identifier (DOI) to data for citation purposes, compliance with data sharing and preservation requirements of funding agencies, and tracking the downloads of data in order to demonstrate impact. DRUM is one of a number of library data services that include support for writing data management plans (DMPs), training faculty, staff and students in data management best practices, and digital preservation and curation of the digital objects. This poster will describe the open source architecture behind the data archive, the policies for deposit, and the importance of acceptance criteria and curation actions that are taken to ensure that data are discoverable in a way that maximizes potential for re-use.
Telling tales: The power of data stories to illustrate and reach out
Inna Kouper (Research Data Alliance; Indiana University)
Monika Duke (Digital Curation Centre; University of Bath)
Sarah Jones (Digital Curation Centre; University of Glasgow)
In September 2014 a librarian asked for examples of good and bad data management practices on the JISC Research Data Management mailing list, to be used in training courses and engagement efforts. Following this exchange and several suggested links to examples, a session at the plenary of the Research Data Alliance (RDA), a global organization to facilitate data sharing, further highlighted the need for a repository of such examples. The repository would serve as a community resource to promote best data management practices, but also serve the larger goal of effecting change in cultures around research data. Responding to this need, the UK Digital Curation Centre and the RDA Engagement Interest Group are launching a service to collect and organize stories about failures and successes in research data management, sharing, and re-use. This poster will describe our effort to date in launching this service. We will outline the framework for organizing the stories and our initiatives to collect them. We will also share preliminary results from the first round of story solicitations and highlight the challenges of making the stories useful. We hope that the poster will stimulate a discussion about education, engagement, and outreach in social science data exchanges.
Implementing a data citation workflow within the State Politics and Policy Quarterly Journal
Sophia Lafferty-Hess (University of North Carolina, Odum Institute for Research in Social Science)
Thu-Mai Christian (University of North Carolina, Odum Institute for Research in Social Science)
Journals are increasingly instituting data sharing policies to encourage replication and verification of research results. Workflows that support citing and archiving data alongside the publication of peer-reviewed articles can help researchers receive scholarly acknowledgement for data products and ensure data are properly preserved. In this poster, we summarize a project sponsored by the Alfred P. Sloan Foundation and the Inter-university Consortium for Political and Social Research (ICPSR) to implement a prototype data citation workflow within the State Politics and Policy Quarterly (SPPQ) journal publication workflow. The project developed a human-driven workflow to archive, share, and link underlying replication data to their associated scholarly publications. In developing the workflow, the project team examined some of the challenges and opportunities of integrating data archiving and sharing into existing publishing systems. This poster will present the prototype workflow and key lessons learned, such as the importance of relationships and the challenge of working with multiple systems.
National Archive of Data on Arts and Culture: A resource for researchers, policymakers, practitioners, and the general public
Alison Stroud (ICPSR, University of Michigan)
Amy Pienta (ICPSR, University of Michigan)
The mission of the National Archive of Data on Arts and Culture (NADAC) is to share research data on arts and culture with researchers as well as with those not experienced with statistical packages, such as policymakers, people working for arts and culture organizations, and the general public. Funded by the National Endowment for the Arts, this data repository within ICPSR is also designed to be accessed through a computer or mobile device. This poster will show how ICPSR has designed a repository to increase data access for the arts and culture field, a field not traditionally known for producing or using research data. Methods to discover and learn about data available for download from NADAC, and various techniques to explore the data online, will be described. The poster will also highlight several user-friendly tools for analyzing and visualizing data for this wider spectrum of experienced and novice data users.
In today's digital world, the creation of media is increasingly easy; in a university environment, managing that content is increasingly complex. For researchers, materials may at first be shared with only a select few, but eventually they will be more broadly available. Some content will need to be preserved for the long term; other items will have a very short shelf life. Media that are managed close to the time of creation, with proper description by their creators, have a much better chance of being useful over their lives. Many solutions are available through libraries, IT, and commercial entities, but these options rarely offer customizable metadata templates, granular access controls, and superior searching options. Elevator is one option that does. It offers a flexible framework that allows content to move from a privately managed collection into more curated collections that can be more widely searched. Based on open source tools, and backed by Amazon cloud storage, Elevator is a powerful, easy to use digital asset management tool that works for most types of research and teaching materials.
I teach a 1-credit-hour undergraduate course on how to use Stata for empirical research in economics at the University of Notre Dame. I will share my experience teaching this course on three fronts: (i) course logistics, (ii) student outcomes, and (iii) lessons learned: what worked well and what did not. My aim is twofold: to offer information that data librarians and instructors may find useful for their own data literacy initiatives, and to learn from the experiences of conference participants whose work involves promoting data literacy.
Co-curricular RDM: A pilot service for graduate students at the University of Toronto
Andrew Nicholson (University of Toronto)
Leslie Barnes (University of Toronto)
Dylanne Dearborn (University of Toronto)
The University of Toronto Libraries (UTL) designed a co-curricular Research Data Management (RDM) workshop aimed at introducing graduate students to RDM principles and best practices. This poster will outline our method of developing the workshop and will detail the preliminary results gathered through student feedback. Findings presented will include the domains and divisions expressing interest in such a workshop and the RDM facets or areas of support that have increased demand for RDM services at the University of Toronto. Envisioned as part of a larger initiative to address gaps in graduate professional skills training and resources, this workshop is an experiment in linking instruction and RDM service development in a large, distributed research university. Key areas covered include a research data overview and best practice pointers for collecting, describing, storing, and sharing research data, with an emphasis on creating sound data management plans. Graduate students also learn about emerging research data policies in Canada, as well as RDM requirements implemented by other funding agencies and publishers. This workshop is being offered through the University of Toronto's Graduate Professional Skills (GPS) program, which provides graduate students with training in areas such as teaching and advanced research for co-curricular credit on their transcript.
This poster presents the DDI-XSLT project, which transforms DDI-L into various formats via stylesheets. The technology is currently in use at the Danish Data Archive, the Swedish National Data Service, and the DDI Alliance. The poster will bring updates from the DDI-XSLT platform, including METS, DISCO, and other formats. With the release of the DDI Discovery Vocabulary (DISCO), the project will display its mapping from DDI-L to DISCO. In this context the project is reaching out towards a JSON mapping and invites participants to join an informal idea-generation session on the scope and purpose of a future JSON mapping of DDI-L. For online content see: https://github.com/MetadataTransform/ddi-xslt/wiki/ The stylesheets are released as LGPL software and are available for public and commercial use; download at: https://github.com/MetadataTransform https://github.com/linked-statistics/DDI-RDF-tools.
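As a rough illustration of the pipeline described above, an XSLT stylesheet can be applied to a DDI-L instance in a few lines of Python; the file names below are placeholders, and the real stylesheets live in the repositories linked above:

```python
# Minimal sketch of applying a DDI-XSLT stylesheet with lxml.
# "ddi-instance.xml" and "ddi-to-disco.xsl" are placeholder names.
from lxml import etree

source = etree.parse("ddi-instance.xml")
transform = etree.XSLT(etree.parse("ddi-to-disco.xsl"))
result = transform(source)
print(str(result))
```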
Someday we'll find it, the data connection: Information literacy frameworks and data
Kristin Partlo (Carleton College)
Lynda Kellam (University of North Carolina at Greensboro)
Hailey Mooney (Michigan State University)
What implications does the Association of College and Research Libraries' (ACRL) new Framework for Information Literacy for Higher Education have for our approach to teaching about data? Building a strong data literacy instruction program involves bridging the language, standards, and goals of data science, statistical literacy, and information literacy. Practicing instruction librarians are likely to draw on concepts, skills, and competencies from across these areas. The holistic nature of the Framework situates students as both consumers and producers of information, which ties closely to data-focused learning outcomes such as those based on competencies from the Data Information Literacy program. This poster will investigate this and other parallels and divergences between the Framework and learning outcomes in data and statistical literacy. Participants will be invited to share their views and reactions on how the Framework addresses the data information literacy needs they encounter in their work.
Visualizing social science research in an institutional repository
Ted Polley (Indiana University-Purdue University)
Using text mining and visualization techniques to identify the topical coverage of text corpora is increasingly common in a number of disciplines. When these approaches are applied to the titles and abstracts of articles published in an academic journal, they yield insight into the evolution of the journal's scholarly content. Similarly, text mining and visualization can reveal the topical coverage of items archived in an institutional repository. This poster will present initial results from mining and visualizing the abstracts of social science research in one university's institutional repository. Generating a topic map visually demonstrates how research in a repository clusters around specific domains in the social sciences. These topic maps are potentially useful to librarians and researchers seeking to learn more about the topical coverage of their repository and to determine whether the research reflects the scholarly output of the institution more broadly.
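The poster's exact pipeline is not specified; one minimal version of the underlying technique, clustering abstracts by weighted term frequencies, might look like this (the abstracts and cluster count are placeholders):

```python
# Illustrative sketch: cluster repository abstracts with tf-idf and
# k-means. The actual poster's text mining pipeline may differ.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [  # placeholders for harvested repository abstracts
    "Survey of household income and labor market outcomes",
    "Voting behavior and political attitudes in state elections",
    "Longitudinal study of childhood health and schooling",
]

matrix = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
for text, label in zip(abstracts, labels):
    print(label, text)
```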
How Canadians access Statistics Canada research data
David Price (Statistics Canada)
Donna Dosman (Statistics Canada)
Statistics Canada has been providing access to Canadian research microdata for 18 years. This session will explore the principles through which Statistics Canada allows access to information, the governance of the access programs, the different types of research data that are available, and the technological solutions used to implement researcher access to microdata. It starts with the Data Liberation Initiative and how Public Use Microdata Files are distributed and used in the research community, then follows up with a detailed look at how research data can be accessed through the Research Data Centres and the Real Time Remote Access system.
Archival GIS: Discovering gay LA through Bob Damron's address guide
Andy Rutkowski (University of Southern California)
This poster outlines the development of a mapping project using archival material from the ONE Archives at the University of Southern California. The project began by taking one item, Bob Damron's address book of gay-friendly bars and other destinations, and mapping out those locations. These locations were used as a starting point to explore and map other archival holdings. As more locations were mapped from subsequent address books, opportunities arose for spatial analysis. The poster also discusses the possibilities of using GIS as an approach for introducing students to archival collections. #mappingtheone
Solr Cloud working for the UK Data Service: problems experienced and a different way forward
John Sexton (UK Data Archive, University of Essex)
Solr Cloud is supposed to provide a fault-tolerant, high-availability system: a stable platform for blazing-fast full-text and faceted searches. A Solr Cloud implementation typically consists of a number of Solr servers (where the indexed data are held), a number of ZooKeeper servers (used to share data between Solr servers and maintain system state), and a load-balancer server (providing an evenly distributed load). The implementation of this type of system at the UK Data Service proved to be anything but stable, with indexes frequently getting out of synchronization and an extremely high maintenance overhead required just to keep the service usable. A simpler, more maintainable system was required: simple replication. A single master server provides the source for the indexed data, and a number of slave servers automatically replicate it. Simple load balancing across the slaves is achieved with a lightweight in-house software component. Extensive testing has shown no detrimental effect on query retrieval speed; the system is more stable and reliable, the maintenance overhead is very low, and the flexibility for alternative configurations is greatly improved.
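The in-house load-balancing component is not described in detail; the basic pattern, round-robin dispatch of queries across slave replicas, can be sketched as follows (hostnames and the core name are invented):

```python
# Toy round-robin dispatcher over replicated Solr slaves, sketching the
# pattern behind the light-weight in-house component mentioned above.
# Hostnames and the core name "collection1" are invented.
import itertools
import urllib.parse
import urllib.request

SLAVES = itertools.cycle([
    "http://solr-slave-1:8983",
    "http://solr-slave-2:8983",
])

def query(q: str) -> bytes:
    """Send a select query to the next slave in rotation."""
    base = next(SLAVES)
    params = urllib.parse.urlencode({"q": q, "wt": "json"})
    with urllib.request.urlopen(f"{base}/solr/collection1/select?{params}") as r:
        return r.read()
```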
Insights into data recovery projects: A case study from the Roper Center
Cindy Teixeira (Roper Center for Public Opinion Research)
Recently, the Roper Center staff embarked on one of the largest data and documentation recovery projects in its history. In 2014, our host institution announced that it would decommission the local IBM mainframe system over the next year. The mainframe had been the main access point to our IBM 3480 and 3490E tape cartridge collection, our main storage solution from 1977 to 2001. With limited descriptive and technical metadata, the Roper Center recovered over 30,000 files, including data, documentation, published materials, and work process files. Based on this case study, this poster will identify the unique challenges and explore important factors to consider in planning a large-scale data and documentation recovery project. We will discuss the various techniques and solutions used during the file recovery process and share our recommendations for any organization engaging in such a project.
DLF E-Research Network: Developing a sustainable community of practice for research data services
Rita Van Duinen (Council on Library and Information Resources / Digital Library Federation)
To address the need for libraries to be engaged in developing e-research services, the Council on Library and Information Resources (CLIR) and the Digital Library Federation (DLF) developed the DLF E-Research Network, a peer-driven mentoring group focused on sharing information about implementing research data management services and on participant-directed learning and shared skill development. Launched in 2014, the Network brought together institutional teams from academic libraries in the US and Canada in a program aimed at building a mutually supportive community engaged in continuous learning about e-research support. Through a series of in-person meetings, webinars, practical activities, and virtual discussions, participating institutions were able to evaluate, refine, and further implement research data services. The E-Research Network provides members working in research libraries and data centers the opportunity to develop data management strategies, policies, tools, and services. This poster will describe the DLF E-Research Network, its membership benefits, and its contributions to the field, as well as highlight participants' objectives and outcomes. The poster will also demonstrate how the E-Research Network is designed to help build the capacity needed for professional development in research data services.
A. Michelle Edwards (Cornell University, Cornell Institute for Social and Economic Research)
The journey of a data librarian or data specialist is certainly not a straight one; it can be winding, and extremely exciting and challenging all at once. If we follow the experiences of many data librarians, we see a trend that closely mimics the data lifecycle. Whether you are the "accidental" data librarian, the individual for whom data is merely one of many hats, or the experienced data specialist, we see many common threads. We embrace the new challenge that data present (concept), we learn everything we can about that challenge (collection), we develop new and often very unique skills (processing), we develop dynamic services to conquer the challenge (distribution, discovery), we evaluate the service (analysis), and then we look forward to the next challenge (concept) that is already knocking down our doors. But what happens when you change departments within your institution, or change institutions? Can we repurpose what we have learned and created? The goal of this paper is to present the approaches taken when a data librarian engages in the "repurposing" stage of the Data Librarian Lifecycle.
Data professionals' training challenges in dynamic work environments
Adetoun Oyelude (University of Ibadan)
The use of Information and Communications Technology (ICT) by data professionals such as data scientists, data curators, data librarians, and data archivists has been the focus of researchers worldwide in the past few decades. Workspaces, workplaces, and workflows are evolving daily and often struggle to cope with emerging technologies. In workplaces where career advancement is a sign of progress, data professionals are required to undergo further training and, with enhanced skills, move up the career ladder. Training data professionals to meet the expectations of the work environment is thus a challenge, and the challenges data professionals face in the course of training themselves are the focus of this paper. Using an extensive literature review and survey methods, ten data professionals working in different types of work environments were interviewed about the challenges they faced in training. The challenges they identified and described, as well as proposed solutions, are discussed. Recommendations are made on ways in which future challenges can be surmounted, especially in the face of the dynamic, technology-driven work environment.
Comparing policies for open data from publicly accessible international sources
Line Pouchard (Purdue University)
Megan Sapp Nelson (Purdue University)
Yung-Hsiang Lu (Purdue University)
The Continuous Analysis of Many Cameras (CAM2) project is a research project at Purdue University for Big Data and visual analytics. CAM2 collects over 60,000 publicly accessible video feeds from many regions around the world. These data come from ten national and international sources, including New York City, the city of Hong Kong, Colorado, New South Wales, Ontario, and the National Park Service. The video feeds were originally collected for improving the scalability of image processing algorithms and are now becoming of interest to ecologists, city planners, and environmentalists. With CAM2's ability to acquire millions of images or many hours of video per day, collecting such large quantities of data raises questions about data management. The data sources all have heterogeneous policies for data use, and separate agreements had to be negotiated between each source and the data collector. In this paper, we compare the data use policies attached to the video streams and study their implications for open access. One restriction is that some sources limit the longevity of the data. As the value of these data is realized over the long term, issues of storage capacity and cost of stewardship arise.
Since 2010, my partner and I have been recording information about the wine consumed by our household, making a point of gathering data about each unique bottle. Together, we have accumulated detailed information about over 400 bottles of wine, including tasting notes, varietals, origin, and importer. While this data collection was originally intended to keep us from buying "bad" wines again, it has turned out to be a rich trove of information about the varietals we like, the importers we can trust, and the years that have proven to be good vintages. This Pecha Kucha will present an overview of the data, revealing both substantive findings from our dataset and the methodologies applied in the analysis. It will be a quick and fun tour of the international landscape through the lens of wine, with a focus on finding the best way to use data to make more informed consumption choices.
The Data Service Centre (DSC) at Statistics Netherlands: Storing and Exchanging Statistical Data and Metadata
Harold Kroeze (Statistics Netherlands)
The Data Service Centre (DSC) is the central repository for datasets across the entire statistical field of Statistics Netherlands (SN). Its purpose is to archive the datasets as well as to enable easy, secure, and monitored exchange of data and metadata. The DSC has the following characteristics:
- Metadata first, data second.
- Datasets are stored as text files (csv or fixed-width) and are described according to a metadata model.
- Public access to metadata within SN; data access only after authorisation by the data owner.
- Service-oriented approach: the backend system uses web services for communication with client tools.
- It promotes re-use of variables and definitions.
An organisation-wide project ("The treasure chest unlocked") was set up to describe and store the microdata sets that form the basis of our published data. The project also produced a number of tools to manage metadata and data (for example a metadata editor and a catalogue). This resulted in a very noticeable increase in the volume of metadata and datasets stored at the DSC. Data and metadata can now easily be shared within the organisation, but also with external researchers through our remote access facility.
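As a small illustration of the 'metadata first' principle, a fixed-width text file is unreadable without its metadata description; given such a description, parsing is mechanical (the variable names and column widths below are invented):

```python
# Illustrative sketch of the "metadata first" idea: the layout metadata
# is what makes a fixed-width record interpretable. Names and column
# widths are invented for the example.
layout = [("region", 4), ("year", 4), ("population", 8)]  # (name, width)

def parse_line(line: str) -> dict:
    """Slice one fixed-width record according to the layout metadata."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

print(parse_line("NL01201416800000"))
# -> {'region': 'NL01', 'year': '2014', 'population': '16800000'}
```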
2015-06-04: D2: First products of the Research Data Alliance (RDA): Foundations
Overview of Data Foundations and Terminology (DFT) WG/IG
Gary Berg-Cross ()
In an era of Big Data we still lack widely used best practices and a common model in key areas of data organization. Without common terminology used to communicate between data communities, some aspects of data management and sharing are inefficient and costly, especially when integrating cross-disciplinary data. To advance the data community discussion towards a consensus core model with basic, harmonizing principles, we developed basic terminology based on more than 20 data models presented by experts from different disciplines and on about 120 interviews and interactions with different scientists and scientific departments. From this we crafted a number of simple definitions around digital repository data, based on an agreed conceptualization of such terms as digital object, persistent ID, state information, and digital repository.
Practical Policy Working Group
Reagan Moore ()
Rainer Stotzka ()
Computer-actionable policies are used to enforce management, automate administrative tasks, validate assessment criteria, and automate scientific analyses. The benefits of using policies include minimization of the amount of labor needed to manage a collection, the ability to publish to the users the rules that are being used, and the ability to automate process management. Currently scientific communities use their own sets of policies, if any. A generic set of policies that can be revised and adapted by user communities and site managers who need to build up their own data collection in a trusted environment does not exist. Thus, the goals of the working group are to bring together practitioners in policymaking and policy implementation; to identify typical application scenarios for policies such as replication, preservation, etc.; to collect and to register practical policies; and to enable sharing, revising, adapting, and reuse of computer-actionable policies. This presentation will provide an overview of the working group and its activities, including a recent survey to elicit the types of policies currently being enforced as well as policy areas considered to be the most important.
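As a rough illustration of what a computer-actionable policy can look like, the sketch below encodes a hypothetical "keep at least two replicas of every file" rule in plain Python. The threshold, callbacks, and log format are invented for illustration; real deployments would express such rules in a rule engine rather than ad hoc code.

```python
# A hedged sketch of a computer-actionable replication policy. Generic Python,
# not the syntax of any particular rule engine; names are illustrative.
MIN_REPLICAS = 2  # assessment criterion, revisable by a site manager

def enforce_replication(collection, replicate, audit_log):
    """collection: iterable of (file_id, [storage_locations]);
    replicate: callback that creates one new replica of a file."""
    for file_id, locations in collection:
        missing = MIN_REPLICAS - len(locations)
        for _ in range(missing):            # automate the administrative task
            replicate(file_id)
        audit_log.append((file_id, max(missing, 0)))  # record the assessment

# usage: one under-replicated file triggers one replication
log = []
enforce_replication([("file-1", ["siteA"])], lambda file_id: None, log)
```

Publishing a rule in this form lets users see exactly which criteria are being enforced, which is one of the benefits the group identifies.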
Data Type Registries Working Group
Larry Lannom (Corporation for National Research Initiatives)
Daan Broeder (Max Planck Institute for Psycholinguistics)
Giridhar Manepalli (Corporation for National Research Initiatives)
Allison Powell (Corporation for National Research Initiatives)
A Data Type Registry provides a way to easily register detailed and structured descriptions of data that can range from simple single-value elements up to complex multi-dimensional scientific datasets. The benefits of registration include enabling those who did not create a given instance of typed data to understand and potentially reuse it, encouraging others to use established data types in their own data collections and analysis efforts, and enabling services and applications that can be applied to standardized data types. This presentation will focus on the need for and advantages of Data Type Registries and will provide a demonstration of the latest version of a registry prototype.
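The following Python sketch, with invented identifiers and fields, suggests how registering and resolving a structured type description might look. It is an assumption-laden toy: the working group's prototype is a web service with persistent identifiers, not an in-memory dictionary.

```python
# Toy sketch of registering and resolving a data type description.
# The identifier and field names below are invented for illustration.
REGISTRY = {}

def register_type(type_id: str, description: dict) -> None:
    """Store a structured description so others can understand and reuse the type."""
    REGISTRY[type_id] = description

def resolve(type_id: str) -> dict:
    """Look up a registered type by its identifier."""
    return REGISTRY[type_id]

register_type(
    "example.org/temperature-series",  # hypothetical persistent identifier
    {
        "name": "Hourly temperature series",
        "unit": "degrees Celsius",
        "structure": {"timestamp": "ISO 8601", "value": "float"},
    },
)
print(resolve("example.org/temperature-series")["unit"])  # -> degrees Celsius
```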
Metadata Standards Directory Working Group
Alex Ball (Inter-university Consortium for Political and Social Research (ICPSR))
Jane Greenberg (Inter-university Consortium for Political and Social Research (ICPSR))
Keith Jeffery (Inter-university Consortium for Political and Social Research (ICPSR))
Rebecca Koskela (Inter-university Consortium for Political and Social Research (ICPSR))
The RDA Metadata Standards WG (MSWG) comprises individuals and organizations involved in the development, implementation, and use of metadata for scientific data. The continued proliferation of content-driven metadata standards for scientific data presents significant challenges for individuals seeking guidance in the selection of appropriate metadata standards and automatic processing capabilities for manipulating digital data. A collaborative, open directory of metadata standards applicable to scientific data can help address these challenges. A directory listing metadata standards applicable to research data will be of tremendous benefit to the global community facing data management challenges. Previous efforts provide evidence of this need, although those undertakings were not intended to be collective, sustainable directories. Discipline-specific metadata efforts have led to duplicative work because of the lack of communication across communities. The RDA Metadata Directory can begin to address these limitations. The RDA's global platform and cross-disciplinary reach, combined with the capacity to leverage social technology, can support the development of a community-driven and sustainable RDA Metadata Directory. This presentation will demonstrate the latest version of the MSWG prototype directory and discuss the use cases collected for the various ways this directory can be used.
First products of the Research Data Alliance (RDA): Foundations
Mary Vardigan (Inter-university Consortium for Political and Social Research (ICPSR))
An international group of collaborating data professionals launched the Research Data Alliance (RDA) in March 2013 with the vision of sharing data openly across technologies, disciplines, and countries to address the grand challenges of society. RDA is supported by the European Commission, the U.S. National Science Foundation, and the Australian government, and it meets in plenary twice a year. Members of the RDA voluntarily work together in Working Groups with concrete deliverables or in exploratory Interest Groups. Some of the foundational RDA Working Groups have completed the first phase of their projects and have produced results. This session is intended to highlight their activities and accomplishments.
- Data Foundation and Terminology Working Group -- Gary Berg-Cross
- Practical Policy Working Group -- Reagan Moore and Rainer Stotzka
- Data Type Registries Working Group -- Larry Lannom, Daan Broeder, and Giridhar Manepalli
- Metadata Standards Directory Working Group -- Rebecca Koskela, Jane Greenberg, Keith Jeffery, and Alex Ball
2015-06-04: Pecha Kuchas
University Data Ownership and Management Policies
Abigail Goben (University of Illinois-Chicago)
Lisa Zilinski (Carnegie Mellon University)
Kristin Briney (University of Wisconsin–Milwaukee)
Data ownership and management policies can affect how research data are supported at a university. This Pecha Kucha presentation will highlight the preliminary results of our current research on university data ownership and management policies. In contrast to previous studies on institutional data management policies, we examined the university websites of 206 institutions with a Carnegie Classification of Institutions of Higher Education of either "High" or "Very High" research level as of July 2014. Some of the major questions we asked included: Does the institution have a data sharing or management policy? What does the policy cover? Who owns the policy (e.g. Office of Research, Information Technology, Libraries)? What happens to the ownership of the data if a researcher leaves the institution? Are universities with data management services provided by the library more likely to have a policy on data management? Ultimately, our goal is to determine whether universities support data management comprehensively with complementary policies and services. The topics covered will include data stewardship, ownership, retention, and sharing with regard to university research data policies.
Call Me Maybe? It's Not Crazy! Data Collection Offices Are a Good Partner in Data Management
Alicia Hofelich Mohr (University of Minnesota)
Andrew Sell (University of Minnesota)
For data management professionals, attention is largely focused on the beginnings and ends of the research process, as many researchers are worried about meeting federal requirements for data management plans (DMPs) and are looking for ways to share and archive their data. As a university office specializing in survey and experimental data collection, we have seen how the "middle" steps of data collection and analysis can be influenced by, and be an influence on, these upstream and downstream data management processes. In this Pecha Kucha, we will present relevant data management lessons we have learned from designing, developing, and hosting data collection tools. Challenges of anonymity and paying participants, quirks of statistical files produced by data collection tools, and transparency in the research process are among the issues we will discuss. As many of these challenges directly impact later sharing and curation of the data collected, we emphasize that data collection offices can be important partners in data management efforts.
Trends in Data Submissions at a Social Science Data Archive
Amy Pienta (ICPSR, University of Michigan)
In recent years, new data sharing policies in the US have encouraged scientists to share research data with others, many accomplishing this by archiving their data with a domain repository. Related to this trend, there is strong demand from social scientists for access to research data for a variety of secondary data analysis uses, including support of new grant applications, classroom research papers, and research projects that lead to conference presentations and publications. Given that many users search for potential secondary data through Google or through the search feature of a data repository, it is possible to create and mine a database of emerging patterns in search behavior that helps us better understand the demand for data and how well a domain repository is able to meet that demand. We explore data from the 100 most frequently searched keywords/phrases at ICPSR in 2014. We match these popular terms to the depth of the ICPSR holdings related to these searches to determine areas where ICPSR may be lacking data. We also identify common search terms where users exit the ICPSR web site after searching for data. We find, for example, that "demoralization" was searched for 323 times in 2014 and 94% of users exited the ICPSR web site after results from the search were returned. Looking forward, ICPSR expects the number of scientists wanting access to research data collected by others to increase, and this user search model may provide a greater understanding of data user needs.
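The core of such a user search model can be sketched in a few lines: count searches per term and the share of sessions that exit after the results page, then rank terms by exit rate as candidate gaps in the holdings. The log format and helper names below are assumptions, not ICPSR's actual pipeline.

```python
# Sketch of the kind of search-log mining described above; log format invented.
from collections import Counter

searches, exits = Counter(), Counter()

def record(term: str, exited: bool) -> None:
    """Tally one search, noting whether the session ended on the results page."""
    searches[term] += 1
    if exited:
        exits[term] += 1

def exit_rate(term: str) -> float:
    """Share of searches for a term after which the user left the site."""
    return exits[term] / searches[term] if searches[term] else 0.0

# e.g. the abstract reports 323 searches for "demoralization", 94% exit rate;
# terms with high volume and high exit rate suggest unmet demand for data
unmet_demand = sorted(searches, key=exit_rate, reverse=True)
```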
Yes, illustrating a two-hour virtual class session on the history of the Census Bureau and its surveys with cat pictures was initially a gimmick to maintain student engagement. Turns out, though, cats are particularly effective at illustrating the more complex aspects of how the Census Bureau has developed, the functions that decennial censuses serve and the controversies they engender. I'll demonstrate this unique qualification with comparisons to other charismatic fauna such as puppies, red pandas and otters.
New ICPSR tools for data discovery and classification
Sandra Ionescu (Inter-university Consortium for Political and Social Research (ICPSR))
Capitalizing on rich, standardized DDI-XML metadata, ICPSR continues to develop its suite of tools for data discovery and analysis with new features and applications. Recently, ICPSR launched an innovative tool that enables linking individual variables with concepts to help increase granularity in the comparison of variables and/or questions across studies and series of studies. The tool allows users to create personalized concept lists and tag variables from multiple studies with these concepts; interactive crosswalks display the variable-concept associations to further assist in data analysis, comparison, and harmonization projects. In addition to personal concept lists, it is possible to create public lists so that an organization can apply its own authoritative tagging and make this resource publicly available. The concept tagging tool is integrated with ICPSR's variable search and comparison functions that have also been upgraded with a novel feature allowing retrieval of separate lists of variables measuring different concepts within the same study. We will present and discuss the tagging tool and the enhanced search features using live examples, and will also introduce the public ICPSR classification of the American National Election Studies and General Social Survey collections and the resulting crosswalk displaying the ANES Time Series and the GSS iterations by individual years.
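A minimal sketch of the data structure behind such a crosswalk: variables from multiple studies are tagged with concepts, and inverting the mapping lists all variables that measure each concept. The study and variable names below are invented; they are not actual ANES or GSS identifiers.

```python
# Sketch of a variable-concept crosswalk; study/variable names are invented.
from collections import defaultdict

# concept tags applied to variables from multiple studies
tags = {
    ("ANES-2012", "V0123"): {"political trust"},
    ("GSS-2014", "TRUST"): {"political trust", "social capital"},
}

# invert the tagging to get, per concept, the variables that measure it
crosswalk = defaultdict(list)
for (study, variable), concepts in tags.items():
    for concept in concepts:
        crosswalk[concept].append((study, variable))

# variables measuring the same concept across studies, ready for harmonization
print(crosswalk["political trust"])
```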
Public APIs: Extending access to the UK Data Service
John Shepherdson (UK Data Archive, University of Essex)
The UK Data Service is providing access to its data and metadata holdings by making public some of its web service APIs. These REST APIs facilitate a self-service approach for our data producers and researchers alike, whilst also enabling third-party developers to write applications that consume our APIs and present novel and exciting ways of accessing and viewing some of the data collections that we hold. We have put new infrastructure in place to enable the provision of these APIs and have already run an App Challenge (for external developers to build mobile applications against our APIs) and added a data collection usage "leader board" as initial tests of the functionality, capacity, account management, developer documentation and performance aspects of our public APIs. The main infrastructure elements are an API management service, HTTP caching and routing, and various API endpoints. The other major consideration was a set of design principles for the APIs, so that developers have a consistent and predictable experience. This presentation will elaborate on the key components of the infrastructure and the API design guidelines.
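By way of illustration only, a third-party application might consume such a REST API as below. The base URL, resource path and response fields are placeholders, not the UK Data Service's actual endpoints, which are specified in its own developer documentation.

```python
# Hypothetical client for a REST API of this kind; URL and fields are placeholders.
import json
from urllib.request import urlopen

BASE = "https://api.example.org/v1"  # placeholder, not a real endpoint

def search_collections(keyword: str) -> list:
    """GET a keyword search against a hypothetical /collections resource."""
    with urlopen(f"{BASE}/collections?q={keyword}") as response:
        return json.load(response)["results"]
```

Consistent, predictable resource paths of this shape are the kind of design principle the presentation describes.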
Building a public opinion knowledge base at the Roper Center
Elise Dunham (Roper Center for Public Opinion Research)
Marmar Moussa (Roper Center for Public Opinion Research)
A central and ongoing priority of the Roper Center for Public Opinion Research is the development and enhancement of state-of-the-art online retrieval systems that promote the discovery and reuse of public opinion data. It has become clear that foundational changes to the way the Center produces and manages its descriptive metadata throughout the data lifecycle would provide new and more efficient avenues for web application and tool development. In a collaborative effort to solidify the connection between cataloging and retrieval system development goals, the Center is developing a knowledge base system for managing and facilitating access to our vast collection of public opinion datasets. This presentation will provide an overview of the networked system of thesauri and controlled vocabularies that the Center is implementing to create the knowledge base as well as describe the automated classification process the team has developed using machine-learning techniques to repurpose existing metadata and enhance process integration throughout the metadata production workflow.
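As a hedged sketch of the machine-learning step, the following uses a standard scikit-learn text-classification pipeline to suggest topics for uncataloged survey questions. The training examples and category labels are invented, and the Center's actual models and vocabularies are not described here.

```python
# Illustrative text-classification pipeline for repurposing existing metadata.
# Requires scikit-learn; training data and categories are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# question texts with topics previously assigned by catalogers (toy examples)
questions = [
    "Do you approve of the way the president is handling the economy?",
    "How often do you attend religious services?",
]
topics = ["economy", "religion"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(questions, topics)

# suggest a controlled-vocabulary topic for an uncataloged question
print(classifier.predict(["Is the national economy getting better or worse?"]))
```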
Colectica Portal vNext: Addressing new data discovery challenges
Dan Smith (Colectica)
A data portal designed to present managed research data has many tasks, including making the data discoverable, documenting the research data management process, data access policies, standardized metadata, data linkages, longitudinal data support, programmatic access to the data and metadata, and integrating the data with existing systems. Colectica Portal has historically focused solely on providing standardized metadata and metadata discovery, while many other tasks were left to other systems. This sole focus on metadata created challenges in integrating the rich DDI-Lifecycle information stored in the Portal with other applications that do not support the standard. This presentation will describe how the Colectica vNext project addresses these challenges in two distinct ways. One aspect of the vNext project is to present an integrated view of metadata and data. While the Colectica Portal historically presented DDI metadata in a metadata-centric fashion, the vNext project creates focus areas centered around surveys, research datasets, and study documentation. This gives users a familiar, user-friendly view layered on top of the more advanced metadata descriptions. The second aspect is a focus on data discovery. The Portal vNext project supports a new programmatic API for both metadata and data search, allowing easier integration with existing systems.
2015-06-04: D5: Curation and research data repositories
New curation software: Step-by-step preparation of social science data and code for publication and preservation
Limor Peer (Yale University)
Stephanie Wykstra (Innovations for Poverty Action)
As data sharing becomes more prevalent across the natural and social sciences, the research community is working to meet the demands of managing and publishing data in ways that facilitate sharing. Despite the availability of repositories and research data management plans, fundamental concerns remain about how best to manage and curate data for long-term usability. The value of shared data is very much linked to its usability, and a big question remains: what tools support the preparation and review of research materials for replication, reproducibility, repurposing, and reuse? This paper describes new data curation software designed specifically for reviewing and enhancing research data. It is being developed by two research groups, the Institution for Social and Policy Studies at Yale University and Innovations for Poverty Action, in collaboration with Colectica. The software includes curation steps designed to improve the research materials and thus enable users to derive greater value from the data: checking variable-level and study-level metadata, replicating code to reproduce published results, and ensuring that personally identifiable information (PII) is removed. The tool is based upon the best practices of data archives and fits into repository and research workflows. It is open source, extensible, and will help ensure shared data can be used.
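One of the named curation steps, checking that PII has been removed, might in its simplest form look like the pattern-based screen sketched below. Real disclosure review is considerably more sophisticated; the patterns and field names here are illustrative assumptions, not the software's actual checks.

```python
# Minimal sketch of a PII-screening curation step; patterns are illustrative.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US social security number
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
}

def flag_pii(rows):
    """Yield (row_number, field, kind) for every value matching a PII pattern."""
    for i, row in enumerate(rows):
        for field, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    yield (i, field, kind)

# flagged values go to a human reviewer before the dataset is published
suspects = list(flag_pii([{"name": "ok", "contact": "jane@example.org"}]))
```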
Using CED²AR to improve data documentation and discoverability within the United States Federal Statistical System Research Data Center (FSS-RDC)
William Block (Cornell University)
Todd Gardner (U.S. Census Bureau)
The secure environment within the Federal Statistical System Research Data Center (FSS-RDC) supports qualified researchers in the United States while protecting respondent confidentiality with state-of-the-art tools and processes. While the FSS-RDC contains data from an increasing variety of sources, few standards exist for the format and detail of metadata that RDC researchers have at their disposal. Data producers do not, as a rule, consider future research use of their data; rather, the metadata they produce is oriented toward the immediate objective at hand. Still, the RDCs need thorough documentation in order for researchers to carry out their projects. This presentation provides an update on the Comprehensive Extensible Data Documentation and Access Repository (CED²AR), a lightweight DDI-driven web application designed to improve the documentation and discoverability of both public and restricted data from the federal statistical system. CED²AR is part of Cornell's node of the NSF-Census Research Network (NCRN) and is now available within the FSS-RDC environment. CED²AR is being used by researchers not familiar with XML or DDI to document their data; it supports variable-level searching and browsing across codebooks, passively versions metadata, offers an open API for developers, and is simple to get up and running.
DDI as RDM: Documenting a multi-disciplinary longitudinal study
Barry Radler (University of Wisconsin-Madison)
Adhering to research data management principles greatly clarifies the processes used to capture and produce datasets, and the resultant rich metadata provides users of those datasets the information needed to analyze, interpret, and preserve them. These principles are even more important with longitudinal studies that contain thousands of variables and many different data types. MIDUS (Midlife in the United States) is a national longitudinal study of approximately 12,000 Americans that studies aging as an integrated bio-psychosocial process. MIDUS has a broad and unique blend of social, health, and biomarker data collected over 20 years through a variety of modes. For nearly 10 years, MIDUS has relied on DDI to help manage and document these complex research data. In late 2013, the National Institute on Aging funded MIDUS to improve its DDI infrastructure by creating a DDI-based, harmonized data extraction system. Such a system allows researchers to easily create documented and citable data extracts that are directly related to their research questions and allows more time to be spent analyzing data instead of managing it. This presentation will explain the rationale, methods, and results of the project.
Mixed method approaches to GIS: Qualities, quantities, and quandaries
Andy Rutkowski (University of Southern California)
Geographic Information Systems (GIS) have the potential to make sense out of large collections of data. Historically GIS projects have been focused on quantitative data and analysis, whereas qualitative data has been mostly limited to classifying or labeling categories or types. More recently GIS work has shown how different types of qualitative data (such as interviews, Tweets, archival newspaper classifieds, photographs, etc.) can improve our understanding of quantitative data and therefore produce more meaningful maps. I will outline some recent cases of mixed methods approaches to GIS projects and discuss how these approaches benefited from including qualitative data. I will also consider the challenges of collecting, using, and archiving qualitative data. Lastly, I will consider the politics of mixing your methods in academic and other settings.
The landscape of geospatial research: A content analysis of recently published articles
Mara Blake (University of Michigan)
Nicole Scholtz (University of Michigan)
Justin Joque (University of Michigan)
Researchers at all levels frequently refer to existing journal articles for references to data sources, tools and methods, but the lack of clear information about these often prevents continuity and reproducibility in research practices. The authors undertook this study to capture information about the body of published literature utilizing geospatial research methods. The paper presents the results of content analysis on published articles that used methods of geospatial analysis as a major part of their research methodology. In order to better understand the landscape of current publishing practices and methodological approaches, the authors coded a sample of articles from a selection of journals drawn from a variety of disciplines that utilize geospatial analyses. They coded the articles for content, including: data citation; software and tools used; specificity of research methodology description; and methodological errors. In addition to the coded variables, the authors also compiled metadata about the articles, including: journal title; journal subject area; primary author subject affiliation; primary author sex; and number of authors. The paper presents an exploration of the current state of data and geospatial related practices, especially the transparency and quality of sources and methods.
GoGeo: A Jisc-funded service to promote and support spatial data management and sharing across UK academia
Tony Mathys (EDINA, University of Edinburgh)
The implementation and encouragement of good data management practices and data sharing in the social sciences is a formidable challenge, especially for spatial data within academic disciplines that embrace the use of Geographical Information Systems (GIS), image processing and statistical software for research and teaching. The Joint Information Systems Committee (Jisc) has taken the lead in providing resources to support data management and sharing across UK academia. The GoGeo service is an example of Jisc's commitment to provide resources to securely manage and share spatial data. These resources include the Geodoc online metadata tool, which allows users to create, edit, manage, import, export and publish standards-compliant (ISO 19115, UK GEMINI, INSPIRE, DDI and Dublin Core) metadata records; the GoGeo portal, which allows users to publish their records into public or private metadata catalogues; and ShareGeo, a repository for users to upload and download spatial data. The service also offers geospatial metadata workshops to introduce academics and students to geospatial metadata and standards and to the GoGeo service's resources. This presentation will provide an overview of the GoGeo service, which started as a project between EDINA and the UK Data Archive in 2002. Its successes and shortcomings will be summarised as well.
Qualitative research: The Jan Brady of social sciences data services?
Mandy Swygart-Hobaugh (Georgia State University)
Librarians providing data services for researchers and learners in the social sciences should be offering data support and management services to qualitative researchers as well as quantitative ones. But is this the case in practice? Do social sciences data services librarians devote their primary attention to quantitative researchers to the detriment of qualitative researchers? Is qualitative research the Jan Brady of social sciences data services? This presentation will share findings from: (1) a content analysis of IASSIST job repository postings from 2005-2014, gauging their requirements/responsibilities regarding qualitative data services; (2) a content analysis of the social sciences data services professional literature from 2005 forward, gleaning the discussion of qualitative data services; and (3) a survey of social sciences data services librarians, exploring the extent of qualitative data and research support they presently provide at their academic institutions and their thoughts regarding the relevance of qualitative data and research for the future of data support services.
2015-06-04: E2: First products of the Research Data Alliance (RDA): Integration and sustainability
Dynamic Data Citation Working Group: Approaches to data citation in non-trivial settings: How to precisely identify subsets in static and dynamic data
Ari Asmi (Research Data Alliance)
Andreas Rauber (Research Data Alliance)
Dieter van Uytvanck (Research Data Alliance)
Reagan Moore ()
Being able to reliably and efficiently identify entire or subsets of data in large and dynamically growing or changing datasets constitutes a significant challenge for a range of research domains. To repeat an earlier study, or to apply data from an earlier study to a new model, we need to be able to precisely identify the very subset of data used. While verbal descriptions of how the subset was created are hardly precise enough and do not support automated handling, keeping redundant copies of the data in question does not scale up to the big data settings encountered in many disciplines today. Furthermore, we need to handle situations where new data gets added or existing data gets corrected or modified over time. Conventional approaches are not sufficient. We will review the challenges identified above and discuss solutions that are currently elaborated within the context of the Working Group of the Research Data Alliance (RDA) on Data Citation: Making Dynamic Data Citable. The approach is based on versioned and time-stamped data sources, with persistent identifiers being assigned to the time-stamped queries/expressions that are used for creating the subset of data. We will further review results from the first pilots evaluating the approach.
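In outline, and with storage and identifier minting radically simplified, the working group's approach can be sketched as follows: persist the query together with a timestamp, return an identifier for that query, and re-execute the query against the versioned source when the identifier is resolved. The store, identifier scheme and execution callback below are stand-ins, not the RDA recommendation itself.

```python
# Simplified sketch of citing a time-stamped query against a versioned source.
import hashlib
from datetime import datetime, timezone

QUERY_STORE = {}  # stand-in for a persistent query registry

def cite_subset(source_id: str, query: str) -> str:
    """Persist the query with a timestamp; return a PID-like identifier."""
    stamped = (source_id, query, datetime.now(timezone.utc).isoformat())
    pid = "pid:example/" + hashlib.sha256(repr(stamped).encode()).hexdigest()[:12]
    QUERY_STORE[pid] = stamped
    return pid

def resolve(pid: str, execute):
    """Re-run the stored query against the source as of the stored timestamp.
    `execute(source_id, query, as_of=...)` is a caller-supplied function that
    queries the versioned data source at a given point in time."""
    source_id, query, timestamp = QUERY_STORE[pid]
    return execute(source_id, query, as_of=timestamp)
```

Because the source is versioned and time-stamped, resolving the identifier yields exactly the subset originally cited, even after the data have grown or been corrected.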
Data Fabric Interest Group (DFIG)
Early work by five RDA Working Groups (DTR, DFT, PIT, PP, MDSD) developed a foundation that was important for progress and common understanding. As these groups completed their efforts, continued interaction and expansion to other groups was deemed useful to form a more integrated view. As a result, a new Interest Group entitled "Data Fabric" was formed. Starting with a white paper, the DFIG will broadly consider and illustrate possible directions for making data practices more efficient and cost-effective. We will describe important common components and their services, along with principles of component and service interaction and associated best practices. Over time we will seek consensus on conceptual views of the ecological landscape of components and services that are required. The intent is to promote ingredients, such as policy-based automatic procedures adhering to basic data organization principles, that are necessary to professionally deal with large datasets in ways based on well-accepted concepts and mechanisms. These discussions, concretized by spin-off WGs, are expected to benefit RDA groups as well as the broader research and data community.
DSA-WDS Partnership on Repository Certification Working Group
Mary Vardigan (Inter-university Consortium for Political and Social Research (ICPSR))
Lesley Rickards (Research Data Alliance)
Created under the auspices of the RDA Interest Group on Audit and Certification, this Working Group is a partnership between the Data Seal of Approval (DSA) and the World Data System (WDS) to develop a common set of requirements for basic assessment and certification of data repositories. Both the DSA and the WDS are lightweight certification mechanisms and their criteria have much in common, so it makes sense to bring them together. In addition the Working Group seeks to develop common assessment procedures, a shared testbed for assessment, and ultimately a framework for certification that includes other standards like Nestor and ISO 16363 as well. This presentation will provide an overview of the activities of the working group, including a review of the harmonized requirements and procedures.
Data Publication: Cost Recovery for Data Centres Interest Group
Ingrid Dillo (DANS)
Simon Hodson ()
A lot of work is going on to understand the costs of maintaining long-term accessibility to digital resources, to identify different cost components, and on the basis of this to develop cost models. However, in a broader context that considers data as part of research communication, the identification of costs and development of cost models address only part of the problem. In times of tightening budgets, it is important to address the challenge of ensuring the sustainability of data centres -- and to consider this in the context of the broader processes for data publication. Many established national and international data centres have reliable sources of income from research funders. However, these income sources are generally inelastic and may be vulnerable. There is concern that basic funding of data infrastructure may not keep pace with increasing costs. And there is a need, therefore, to consider alternative cost recovery options and a diversification of revenue streams. The RDA/WDS Interest Group on Cost Recovery for Data Centres aims to contribute to strategic thinking on cost recovery by conducting research to understand current and possible cost recovery strategies for data centres. This presentation will provide an overview of the activities of the interest group.
Bridging the business data divide: insights into primary and secondary data use by business researchers
Linda Lowry (Brock University)
Academic librarians and data specialists use a variety of approaches to gain insight into how researcher data needs and practices vary by discipline, including surveys, focus groups, and interviews. Some published studies have included small numbers of business school faculty and graduate students in their samples, but provided little, if any, insight into variations within the business discipline. Business researchers employ a variety of research designs and methods and engage in quantitative and qualitative data analysis. The purpose of this paper is to provide deeper insight into primary and secondary data use by business graduate students at one Canadian university based on a content analysis of a corpus of 32 Master of Science in Management theses. This paper explores variations in research designs and data collection methods between and within business subfields (e.g., accounting, finance, operations and information systems, marketing, or organization studies) in order to better understand the extent to which these researchers collect and analyze primary or secondary data sources, including commercial and open data sources. The results of this analysis will inform the work of data specialists and liaison librarians who provide research data management services for business school researchers.
Listening to the user-voice to improve user support and training
Sarah King-Hele (UK Data Service)
Vanessa Higgins (UK Data Service)
The UK Data Service is a resource funded to support researchers, students, lecturers and policymakers who depend on high-quality social and economic data. This presentation will discuss the methods we use to consult with users and track their behaviour on the website in order to improve our services to them. Our approaches include an annual stakeholder consultation, a continuous pop-up survey on the website, ad-hoc consultations with specific user groups, regular user-testing of the website, monitoring of Google Analytics, user conferences and monitoring of feedback and attendance figures from training events. These approaches allow the "user voice" to come through loud and clear in a variety of formats, and by listening to the user voice we are able to deliver an improved and targeted service. We also discuss future plans to reach new audiences, including expanding our use of data visualisation and a new dissertation zone.
Was it good for you? User Experience Research to improve dissemination of census aggregate data via InFuse
Richard Wiseman (UK Data Service)
InFuse (infuse.mimas.ac.uk) provides easy access to aggregate data from the UK's 2001 and 2011 censuses, based on a fundamental remodelling of the thousands of disparate aggregate datasets produced by the three UK census agencies into a single, integrated, standards-compliant dataset suitable for global and automated operations. To date, efforts have mainly been focussed on the enormous task of data processing. This presentation will outline the next phase of development, aimed at enhancing users' experiences of InFuse. It will include details of user experience research already carried out and the ways in which results have guided current development, as well as describing future plans, challenges and opportunities.
Understanding academic users' data needs through Virtual Reference transcripts
Margaret Smith (New York University)
Samantha Guss (University of Richmond)
Jill Conte (New York University)
New York University Libraries has a very high-volume chat reference service, averaging more than 14,500 transactions per academic year for the past few years. This popularity offers a unique opportunity for insight into our patrons' conceptualization of their data needs and how these needs are changing. Through analysis of four years' worth of chat transcripts, we assessed user needs and familiarity related to locating secondary data and statistics, performing data analysis, and using existing data services. We used a grounded theory approach, exploring the data through coding and categorization. We will discuss the process and results of our investigation, as well as implications for training virtual reference service staff on the data reference interview and other data topics, and for improving overall service quality.
Publishing codebooks via CED2AR to enable variable cross-searching between datasets
Janet Heslop (Cornell University)
Ben Perry (Cornell University)
The Comprehensive Extensible Data Documentation and Access Repository (CED2AR) is designed to improve the discoverability of data collections based upon codebooks and metadata of the holdings. CED2AR utilizes the DDI 2.5 metadata standard for documenting the holdings, along with schema.org microdata markup to allow search engines to parse the semantic information from the DDI metadata. This combined solution enhances the discoverability of DDI metadata and displays it through a user-friendly web interface. In addition to making individual codebooks searchable, CED2AR also facilitates cross-codebook searching and browsing. The Cornell Institute for Social and Economic Research (CISER) is currently in the midst of bringing its data archive metadata into DDI 2.5 through the CED2AR application. The presentation will describe the steps taken to accomplish this task and include a demonstration of the status of producing an extensible data archive documented down to the variable level.
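As an illustration of the schema.org step, the sketch below maps a few codebook fields into schema.org Dataset markup, rendered here as JSON-LD for brevity (CED2AR itself embeds microdata in its pages). The field mapping is a simplified assumption, not CED2AR's actual crosswalk.

```python
# Illustrative mapping from a few DDI codebook fields to schema.org Dataset
# markup (JSON-LD form). The mapping is simplified for the sketch.
import json

def ddi_to_schema_org(ddi: dict) -> str:
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": ddi["titl"],              # DDI 2.5 study title
        "description": ddi["abstract"],   # study-level abstract
        "variableMeasured": ddi["var_labels"],  # variable-level metadata
    }
    return json.dumps(dataset, indent=2)

print(ddi_to_schema_org({
    "titl": "Example Survey, 2014",
    "abstract": "Illustrative codebook record.",
    "var_labels": ["age", "income"],
}))
```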
Update on Taxonomy / Lexicon Project at the US Bureau of Labor Statistics
Daniel Gillman (U.S. Bureau of Labor Statistics)
The taxonomy and lexicon project at the US Bureau of Labor Statistics was started in summer 2013 with the goal of providing consistent access to BLS data and documents: each search criterion should return data and documents that are related. The taxonomy portion of the work is to improve searching for data, and the lexicon portion is to improve tagging, cataloging, and searching for documents. The work has advanced significantly since it was initially described at IASSIST 2014. There are seven areas of note:
1) the development of a three-level hierarchy over all the measures and characteristics encompassing BLS data;
2) linkage of all low-level characteristics across measures;
3) the identification of common confusions and plain-language similarities for all BLS data;
4) cognitive evaluations of the three-level hierarchies;
5) the development of a web-based implementation of the taxonomy;
6) the inclusion of the taxonomy in the new DataFinder series dissemination tool; and
7) assessment of the impact of standardizing all BLS terms.
Each of these developments will be discussed in more detail, and in particular the impact of each will be described.
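A toy sketch of how a three-level hierarchy with cross-measure linkage of low-level characteristics (items 1 and 2 above) might be represented; all terms and identifiers below are invented and do not reflect the actual BLS taxonomy.

```python
# Toy three-level hierarchy: measure -> characteristic group -> characteristic.
# Shared identifiers link low-level characteristics across measures.
taxonomy = {
    "Employment": {                        # level 1: measure
        "Status": {                        # level 2: characteristic group
            "Full time": "char:fulltime",  # level 3: characteristic with shared id
            "Part time": "char:parttime",
        },
    },
    "Earnings": {
        "Worker status": {
            "Full time": "char:fulltime",  # same id links the characteristic
        },
    },
}

def measures_for(char_id: str) -> list:
    """All measures whose hierarchy contains the given characteristic id."""
    return [m for m, groups in taxonomy.items()
            for chars in groups.values() if char_id in chars.values()]

print(measures_for("char:fulltime"))  # ['Employment', 'Earnings']
```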
Doing DDI: Operationalising DDI for longitudinal studies
Gemma Seabrook (Institute of Education)
Funders are rightly concerned that they get the maximum value for their investments. The UK longitudinal studies represent a unique data collection that continues to give value, and it is important to ensure that this collection remains relevant. Providing the quality of documentation that modern researchers expect for studies that began as early as 1946 presents a significant challenge. The CLOSER project brings together nine of these studies. It seeks to enhance the metadata available, providing complete questionnaire and variable metadata that can support ongoing data management and be used to populate a search platform, enabling discovery in general and cross-cohort research in particular. Bringing historic metadata up to the DDI standard being adopted presents a variety of challenges, such as the condition of supporting documentation for the oldest parts of the studies, the wide variety of methods and formats, the sheer scale of the number of questionnaires and tools used, and the priorities and capacity of the various stakeholders. This paper will detail how CLOSER has addressed these at an operational level (protocols, processes, planning, etc.) and how others might learn from these experiences and make use of the outputs CLOSER provides.
Big Metadata: Bringing researchers CLOSER to longitudinal data with an Advanced Discovery Platform
Jeremy Iverson (Colectica)
CLOSER (www.closer.ac.uk), funded by the ESRC and MRC, aims to maximise the use, value and impact of nine of the UK's longitudinal studies. A central component of CLOSER will be a metadata discovery platform that will enable the discovery of a range of data collection and variable metadata from each of the participating studies. The scale and detail of the metadata to be included will make it amongst the largest and most detailed of such repositories in the world. The discovery platform, a customisation of the Colectica search portal, will offer cutting-edge search technologies and innovative and user-friendly ways to discover, navigate and display the metadata. This presentation will describe how the metadata were created and harmonized, discuss how the search portal was built, and showcase the innovative search and discovery interfaces that are critical to allowing researchers to understand and leverage a massive data resource.
Training for de-identifying human subjects data for sharing: a viable library service
David Fearon (Johns Hopkins University)
Jennifer Darragh (Johns Hopkins University)
Since 2011, Johns Hopkins Data Management Services (DMS) has provided consulting and training on managing, sharing and preserving research data, and operates the JHU Data Archive. Last year, DMS consultant Dave Fearon collaborated with Jen Darragh, JHU's Data Services and Sociology Librarian, to provide training on removing identifiers from human subjects data for sharing and archiving. Both presenters attended ICPSR's 3-day summer program on Assessing and Mitigating Disclosure Risk, drawing upon course materials and additional resources to develop a one-hour session. The training emphasizes how researchers can make disclosure assessment, de-identification and sharing through repositories a viable option through techniques applied at each stage of the research cycle, in order to better meet expanding funder expectations and improve dataset impact. JHU's IRB offices vetted the content and expressed appreciation for training on areas of disclosure assessment that they do not support extensively. A broad audience across JHU's social science, education, medicine and public health divisions spurred some customization of content and a flexible blending of introductory and advanced material. We have been expanding online resources on de-identification software and exploring in-depth consulting on de-identification projects. We will discuss the training's topics and its context within library and institutional research support services.
Have data skills will travel: one summer, 19 stories
Jackie Carter (University of Manchester)
Q-Step is the 5-year national UK programme supporting more social science and humanities students to use quantitative data in their undergraduate studies. Q-Step at The University of Manchester is working across politics, sociology, criminology and linguistics degree programmes to 'make numbers normal' in the classroom. In this session I will report on progress in the first 18 months, presenting what happened with our 19 summer placement students when we placed them in think-tanks, polling organisations, research consultancies, city councils, the UK Data Service and market research organisations. Each student produced a poster demonstrating their experience; we celebrated these findings in an event entitled 'Stepping Out'. Moreover, they produced briefing papers, blog posts, news articles, public presentations, a book chapter and, in one case, an evidence-based report for MPs. Some returned to their third year and chose to undertake a dissertation involving data analysis. They exceeded our, the employers', and their own expectations, setting the blueprint for 2015, when we will double the number of students and increase the number of organisations we place them with. This presentation tells our students' and employers' stories from our 2014 pilot year, demonstrating how data skills acquired in the classroom travel into the workplace.
Teaching users to work with research data: case studies in architecture, history and social work
Jennifer Moore (Washington University in St. Louis)
Aaron Addison (Washington University in St. Louis)
A tailored approach is ideal for teaching users to work with research data, which varies by project and discipline depending on methodology, data sources and intended outcomes. In this presentation and paper three case studies will be put forth, each with different approaches to research data: primary collection, digital data reuse and mined textual data. In each example, researchers are not only working to implement a functional methodology, but also to engage students in practices that equip them with tools and skills to advance their own research trajectory. Further, these examples are from researchers in distinctly different disciplines: an architect working on climate change in Midwest river basins, a historian reconstructing the creation of our government and a professor of social work collecting wood fuel data for villages in India. The Data and GIS Services team at Washington University has partnered with each project presented to support analyses, visualization, management, preservation and sharing of research data. Methods, challenges and opportunities will be discussed.
SowiDataNet - Bringing social and economic research data together
Monika Linne (GESIS: Leibniz Institute for the Social Sciences)
Flexible data distribution and the reuse of research data are becoming increasingly relevant in the social sciences. Therefore, GESIS, in collaboration with the Social Science Centre Berlin, the German Institute for Economic Research, and the German National Library of Economics, started the development of SowiDataNet. The overarching, and so far in Germany unique, objective is the construction of an infrastructure for decentralized research data from the social and economic sciences in Germany. At present, research data holdings in Germany are heavily fragmented, which precludes user-friendly, centralized and therefore quick data retrieval. Because of this major hurdle, data reuse by other scholars entails considerable complexity and effort, or in the worst, but not very uncommon, case is simply impossible. SowiDataNet aims to resolve this unsatisfactory situation by integrating decentralized research data within one repository network. The core of this network will be a web-based, independent infrastructure that allows for low-threshold self-archiving, standardized documentation and distribution of research data. SowiDataNet is community-driven: it focuses on the specific needs of social and economic scientists in order to promote the ideal of data sharing and long-term data archiving.
The challenges of reducing the public's data trust deficit: The experience of communications and public engagement across the Administrative Data Research Network
Trazar Astley-Reid (Administrative Data Research Network)
Ilse Verwulgen (Administrative Data Research Network)
Judith Knight (Administrative Data Research Network)
Chris Coates (Administrative Data Research Network)
Securing an understanding of public attitudes to the use and linking of administrative data has been the cornerstone of setting up the new Administrative Data Research Network. The Network is a UK-wide partnership between universities, government bodies, national statistics authorities and the wider research community (www.adrn.ac.uk). Accessing and linking administrative data can bring benefits to society, but people worry that data sharing is a risk to their privacy and security. Central to our work is the need to communicate with a broad church, e.g. the general public, government bodies, academia, the third sector and our own Network. This is born of a need to be transparent, inclusive and trusted. One of our challenges in reducing the public's data trust deficit is to balance the communications messages so as not to increase fear by increasing awareness. This is an exercise in risk management, as without widespread communications targeted at all levels of society the benefits may not be realised. The Network's role is both to secure the public's trust and to provide a service to researchers that is secure, lawful and ethical, run by experts in the field who ensure privacy is protected.
Improving efficiency and accuracy of administrative data linkage: can methods from other disciplines help?
Kakia Chatsiou (Administrative Data Research Network)
One of the great challenges of enabling access to linked de-identified administrative data is the accurate and quick delivery of pre-processing and linkage of such large datasets. While quite a lot of work needs to be done in cleaning these datasets and getting them ready to be linked with other datasets, only some of the records are successfully linked using automated methods (i.e., deterministic or probabilistic methods), while most are linked by indexing professionals. Clerical data linkage, while more accurate, is more resource-intensive and time-consuming, and adds to the preparation time needed before researchers can access the data they need for their research. This paper will provide an overview of current methods for preparing and linking administrative records as used by the Administrative Data Research Network, a UK-wide partnership between universities, government departments and agencies, national statistics authorities, funders and the wider research community. We will present information on quality, accuracy and performance, and discuss how methods from other disciplines, such as Natural Language Processing, have dealt with similar challenges and similar goals, for example when disambiguating named entities in large corpora/datasets.
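A minimal sketch of the probabilistic scoring that the paper contrasts with clerical review: compare fields with a string-similarity measure, weight the agreements, and link pairs scoring above a threshold. The weights, threshold and field names below are invented; production systems estimate such parameters from the data rather than hard-coding them.

```python
# Illustrative probabilistic record-linkage score; parameters are invented.
from difflib import SequenceMatcher

WEIGHTS = {"surname": 0.5, "birth_year": 0.3, "postcode": 0.2}
THRESHOLD = 0.8  # pairs below this would be sent to clerical review

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real systems use tuned comparators."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted agreement across the compared fields."""
    return sum(w * similarity(str(rec_a[f]), str(rec_b[f]))
               for f, w in WEIGHTS.items())

a = {"surname": "MacDonald", "birth_year": 1970, "postcode": "CO4 3SQ"}
b = {"surname": "McDonald", "birth_year": 1970, "postcode": "CO4 3SQ"}
print(match_score(a, b) >= THRESHOLD)  # True: likely the same person
```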
2015-06-05: F3: Social science data archives in transition
From being an archive to becoming an archive
Anne Sofie Fink (Danish Data Archive/National Archive of Denmark)
Since 1993 the Danish Data Archive (DDA) has been part of the National Archive in Denmark. The DDA has worked as a (small) European-style data archive, acquiring, curating and disseminating survey data produced by social science, as an independent organisational unit. In May 2014 the National Archive implemented a new organisational strategy with the aim of specialising activities across the whole organisation. This means that acquisition, curation, dissemination and software development are now carried out across administrative data and research data by four organisationally separate units. Therefore the data archive needs to become a new kind of archive. At the moment we are standing in the middle of this implementation of new ways of working. The unit for data dissemination services for administrative data and research data has kept the name DDA and has taken on the responsibility for our international activities, including being a service provider for CESSDA ERIC, DDI-L based software development and taking part in the DDI Alliance. The presentation will outline the challenges and risks in the process of change and point to new ways of becoming an archive in a new context.
European Research Services for Distributed Data: A Semantic Approach
Anja Burghardt (Research Data Centre of the German Federal Employment Agency)
Europe is struggling with societal challenges in fields such as health, migration and demographic change. Developing policy solutions to tackle these challenges at the European level requires innovative pan-European research. The challenges are not limited to one specific discipline or European country; consequently, interdisciplinary research at a European level is necessary. This kind of research needs data of different types and from multiple sources. A future challenge for the European research community and related institutions is to build a research infrastructure able to integrate data in different forms and from multiple sources, such as data archives, research data centres, national statistical institutes, the corporate sector or the Internet. We therefore propose a European Research Services network equipped with semantic tools organizing multiple ontologies and data flows. This will improve the European research infrastructure and allow researchers to make use of relational information and data, bringing the research experience to a new level. This talk will focus on the harmonization of data access forms, data styles, distributed sources, data documentation and other necessary information through a semantic model approach.
Data collection today: An overview of data collections and acquisition procedures in health libraries in the South-West, Nigeria
Joseph Olorunsaye (University of Ibadan)
Data collection and acquisition practices in the electronic age are changing worldwide, and growing equity in access to data for effective information service delivery and global relevance is a central concern of this study. The effects of current economic and political challenges in Nigeria on the data community, and the need to bridge the gap in the literature, make this study timely. The purpose of the study is to determine the extent to which health libraries in South-West Nigeria have formalized data collection and acquisition in the electronic age; to highlight the guidelines and policies used for collection and acquisition in the electronic age as against traditional purchasing models; and to determine the extent to which current challenges affect collection development and acquisition for improved access to relevant data. There are scores of medical/health libraries in the south-west of Nigeria, but the guidelines for collection and acquisition for effective information service delivery are underdeveloped. Given the growing importance of this study, a questionnaire and interview approach will be used to gather data from sectional heads and medical/health library directors.
Lynn Woolfrey (DataFirst, University of Cape Town)
2015-06-05: F5: Using data management plans as a research tool for improving data services in academic libraries
Using data management plans as a research tool for improving data services in academic libraries
Amanda Whitmire (Oregon State University)
Lizzy Rolando (Georgia Institute of Technology)
Brian Westra (University of Oregon)
Jake Carlson (University of Michigan)
Patricia Hswe (Pennsylvania State University)
Susan Wells Parham (Georgia Institute of Technology)
To provide research data management (RDM) support services, libraries need to develop expertise in data curation and management within the library. Many academic libraries are reorganizing to initiate RDM service structures, but may lack staff expertise in this area. Funding agencies increasingly require a data management plan (DMP) with funding proposals; these plans describe how data generated in the proposed work will be managed, preserved and shared. We have developed an analytic rubric for assessing DMPs. An analysis of DMPs can identify common gaps in researcher understanding of RDM principles and practices, and identify barriers for researchers in applying best practices. Our rubric allows librarians to utilize DMPs as a research tool that can inform decisions about which research data services they should provide. This tool enables librarians who may have no direct experience in applied research or RDM to become better informed about researchers' data practices and how library services can support them. This panel will consist of five data specialists from academic libraries who will introduce the rubric, share the results of our individual analyses, and describe how the results informed the evolution of services at our respective libraries.
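In spirit, an analytic rubric of this kind can be represented as criteria scored against defined performance levels, with an aggregate score per plan, as in the illustrative sketch below. The criteria and levels shown are assumptions for the sketch, not the panel's actual rubric.

```python
# Illustrative DMP-assessment rubric; criteria and levels are invented.
RUBRIC = {
    "describes data types and formats": (0, 1, 2),  # absent / addressed / complete
    "names a preservation repository": (0, 1, 2),
    "addresses sharing and access": (0, 1, 2),
}

def assess(dmp_scores: dict) -> float:
    """dmp_scores maps criterion -> awarded level; returns share of points earned."""
    earned = sum(dmp_scores[c] for c in RUBRIC)
    possible = sum(max(levels) for levels in RUBRIC.values())
    return earned / possible

print(assess({c: 1 for c in RUBRIC}))  # 0.5: partially addressed throughout
```

Aggregating such scores across many plans is what surfaces the common gaps and barriers the panel describes.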
2015-06-05: G1: DDI moving forward: Progress on a new model-based DDI
DDI Moving Forward: Progress on a new model-based DDI
Arofan Gregory ()
Joachim Wackerow (GESIS: Leibniz Institute for the Social Sciences)
Wendy Thomas (University of Minnesota)
Barry Radler (University of Wisconsin - Madison)
Dan Gillman ()
Mary Vardigan (Inter-university Consortium for Political and Social Research (ICPSR))
Jay Greenfield ()
The future development of the DDI metadata standard will be based on an information model. This is a common strategy for standards development, and it offers several benefits, including improved communication with other disciplines and standards, flexibility in terms of technical expressions of the model, and streamlined development and maintenance. This new model for DDI is being developed through the project "DDI Moving Forward", running from 2013 through 2015. Virtual teams from around the globe have been developing the model content, technical production systems and documentation, complemented by a series of face-to-face sprints. The purpose of this session is to provide an overview of the current state of the Moving Forward project. The session will include:
- an introduction to the project, including the organisation of the modelling framework, bindings and production process;
- an overview of the major content areas developed so far, including Conceptual objects, Data Description, Data Capture (for surveys and other measurement instruments), a Simple Codebook, and Discovery;
- the proposed new DDI process model; and
- an overview of future activities for the Moving Forward project.
The session will conclude with an open panel discussion with presenters and the audience in a question and answer format.
2015-06-05: G2: Planning research data management services
RDM meets Open Access
Katherine McNeill (MIT)
One way to reduce the international digital divide is to increase open access to data and publications for wider use. Many institutions work to help their researchers make their results publicly accessible, but historically, services enabling open access to data and services enabling open access to publications have often been provided separately. What synergies exist between institutional services for research data management and those for scholarly publishing and open access to publications? How do the issues coincide or differ when providing open access to data versus publications? How might universities address open access more holistically and unify outreach to researchers? This presentation will describe new efforts in the MIT Libraries to formalize collaboration between the groups that provide services for research data management and those for scholarly publishing. Discussion will cover collaboration in areas such as strategic planning, supporting compliance with funder requirements for open access to data and publications, outreach, repository services, linking data and publications, organizational models, and more.
Partnerships in a Data Management Village: Exploring how research and library services can work together
Alicia Hofelich Mohr (University of Minnesota)
Thomas Lindsay (University of Minnesota)
Lisa Johnston (University of Minnesota)
Providing data management services takes a village: a distributed model of support, involving collaboration among diverse institutional offices, is needed to do it well. Researchers benefit especially when specialized institutional support offices are aware of other relevant providers and of the impact their services have on the management of data across the research lifecycle. However, once a village is assembled, how do we work with its members so that they become committed collaborators rather than a passive referral network? In this presentation, we will describe a case study of the in-depth collaboration between the University Libraries and the College of Liberal Arts (CLA) at the University of Minnesota. Both groups are developing new suites of data management services to meet evolving researcher needs and rising demand for data management support. Working together has provided many advantages for sharing resources and knowledge, but it has also presented challenges, including how to define the respective roles of college-level and university-wide data management services and how formalized collaborations should work. We will describe these challenges and how the collective and complementary skills of our offices will provide researchers with support across a much larger portion of the research lifecycle than either office could provide alone.
Data management on a shoestring budget
Carol Perry (University of Guelph)
Providing data management services at a university with limited resources can be a daunting challenge. With a little ingenuity, however, a fairly comprehensive data service can be established, scaled to the available resources. At the University of Guelph, we have drawn on resources and expertise available through the greater data community to build our service over a four-year period. Now, as we await implementation of the Canadian Tri-Council granting agencies' new open access policy, the service we have built will be put to the test in earnest as researchers prepare for new data management planning requirements. This presentation will review the processes and practices we established as we built our service from scratch, address the challenges and successes encountered along the way, and examine the challenges we face moving forward.
2015-06-05: G3: Building data capacity internationally through repository management training and collaboration
Building data capacity internationally through repository management training and collaboration
George Alter (Inter-university Consortium for Political and Social Research)
Lynn Woolfrey (DataFirst)
Samuel Kobina Annim (University of Cape Coast, Ghana)
William Block (Cornell University)
Research organizations and universities across the globe show increasing desire to disseminate and archive research data, although they often lack the training and resources to begin. This session will present case studies in training and collaboration between archives (the Inter-university Consortium for Political and Social Research (ICPSR), the Cornell Institute for Social and Economic Research (CISER)) and African universities (the University of Cape Coast (Ghana), IFORD (Cameroon)). Experiences from the case studies point to the need to understand the country context and sequence of needs in relation to resources.
2015-06-05: G4: The benefits of remote data processing: A comparison and look into the future
The benefits of remote data processing: a comparison and look into the future
Donna Dosman (Statistics Canada)
David Price (Statistics Canada)
Atle Alvheim (Norwegian Social Science Data Services)
Ornulf Risnes (Norwegian Social Science Data Services)
Amadou Gaye (University of Bristol)
Vincent Ferretti (Ontario Institute for Cancer Research)
This session presents four remote data processing systems, highlighting the different benefits each implementation offers, and concludes with a discussion of possible future developments. The systems presented are: the job submission system of the Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB), which provides access to highly detailed labour market data; RAIRD, a web-based system for confidential research on full-population event data from a set of Norwegian administrative registers, whose platform supports on-the-fly import (and conversion) of event data into a disclosure-limiting, web-based statistical package for remote data processing and analysis; the Real Time Remote Access program at Statistics Canada, which uses technology to enable fast, online access to detailed microdata for researchers through a balance of controlling disclosure risk (automated confidentiality rules) and managing disclosure risk (contracts with individuals and institutions); and DataSHIELD, a novel solution that allows an analyst to perform pooled analyses of data held at different locations without ever seeing the microdata or transferring them to the analyst's own computer (the data remain at their original location under the control of the data owner).
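To give a flavour of the pooled-analysis idea behind systems such as DataSHIELD, the sketch below shows a minimal scheme in which each site releases only non-disclosive aggregates and the analyst combines them. The data, disclosure threshold, and function names are invented for illustration; this is not DataSHIELD's actual protocol or API.

# A schematic sketch of federated analysis in the spirit of DataSHIELD:
# the analyst receives only non-disclosive aggregates from each site and
# never sees row-level microdata. Everything below is hypothetical.

def site_summary(values, min_count=5):
    """Runs locally at each site: releases only (sum, count) for a
    variable, refusing to answer when the cell is too small."""
    if len(values) < min_count:
        raise ValueError("cell too small to release (disclosure control)")
    return (sum(values), len(values))

def pooled_mean(summaries):
    """Run by the analyst: combines per-site (sum, count) pairs into a
    single pooled mean without ever touching the microdata."""
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    return total / n

# Simulated local holdings at three sites (in practice these never leave
# the data owners' servers).
site_a = [34.1, 29.8, 41.0, 38.2, 30.5, 27.9]
site_b = [45.3, 39.9, 42.1, 40.0, 36.6]
site_c = [31.2, 33.8, 35.5, 29.9, 34.4, 36.1, 30.0]

summaries = [site_summary(s) for s in (site_a, site_b, site_c)]
print("pooled mean: %.2f" % pooled_mean(summaries))

Real systems layer authentication, auditing, and far more sophisticated disclosure checks on top of this basic pattern, but the division of labour is the same: computation travels to the data, and only safe aggregates travel back.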
2015-06-05: G5: Dataverse: A repository framework for all
Odum Institute Archives services overview
Jonathan Crabtree (University of North Carolina at Chapel Hill; Odum Institute for Research in Social Science)