Using the Data Curation Profile as a Means to Engage Researchers
D. Scott Brandt (Purdue University Libraries)
Engaging researchers in discussions about data may be new, unfamiliar territory for many librarians. This workshop provides training in the application and use of the Data Curation Profile Toolkit, an instrument for eliciting information about data from researchers. The Toolkit can be used to facilitate data discussions, to identify research data needs, and to help plan the development of data services. It provides a flexible structure for conducting an interview and can facilitate discussion with researchers about what they may want to do with data beyond its immediate use. Profiles developed from interviews with researchers can be published in the Data Curation Profiles Directory. The goal of the workshop is to build the knowledge and skills needed to discuss data with researchers. Learning is facilitated by presenting scenarios and working through hands-on exercises with the Toolkit, which includes an Interviewer Manual and an Interview Worksheet.
Access Policies and Licensing for Archives and Repositories
Laurence Horton (Data Service Infrastructure for the Social Sciences and Humanities project (DASISH))
The workshop combines expertise from the DASISH project, covering five European Social Science and Humanities research infrastructures, and will focus on data administration policies for user management. The workshop is of interest to two groups of people in organizations/institutions: those who are in the process of setting up a data archive or repository, and those with an existing repository (for data/publications) who wish to systematically consider access policies and licensing as part of a long-term digital preservation and reuse strategy. Topics covered include:
- Data submission: license and acquisition agreements between data producers and archives/preservation services. How do you get data into your archive/repository and ensure you can continue preserving and disseminating it in the future?
- Introducing the DASISH training module: presentation of an Access Policies and Licensing training module, with discussion of its content and structure.
- Responsibilities for subsequent data reuse: What can archives do to ensure that license terms are understood and respected? How can legal requirements on data protection and security be met?
- Comparing existing license schemes: Which licenses are currently in use, and what are the relevant differences between them?
- Secure data licenses: access policies and conditions of reuse for sensitive personal data; legal requirements on data protection and security and how they can be met.
Using OLAP Techniques for Data Presentation and Analysis
Chris Leowski (University of Toronto)
Andreea Gheorghe (University of Toronto)
Following a crash course in the underlying theory of OLAP (On-Line Analytical Processing) cubes, the workshop will focus on using OLAP cubes in the social sciences for the presentation and analysis of (mostly) Canadian census data and some economic tables from CANSIM. Online hands-on exercises in slicing, dicing, aggregation, and disaggregation of data will be offered to participants.
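The core cube operations can be illustrated in base R with a small multidimensional table; a minimal sketch, using synthetic counts (the dimensions and values below are invented for illustration):

```r
# A tiny 'cube': counts by region x age group x census year
counts <- array(
  sample(50:500, 12),
  dim = c(3, 2, 2),
  dimnames = list(
    region = c("Ontario", "Quebec", "Alberta"),
    age    = c("under40", "over40"),
    year   = c("2006", "2011")
  )
)

# Slicing: fix one dimension (a single year) to get a 2-d slice
counts[, , "2011"]

# Dicing: select a sub-cube (two regions, one age group, both years)
counts[c("Ontario", "Quebec"), "under40", ]

# Aggregation (roll-up): collapse the age dimension, keeping region x year
apply(counts, c(1, 3), sum)  # margins 1 and 3 are region and year
```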
Introduction to R and Reproducible Research
Harrison Dekker (UC Berkeley)
Tim Dennis (UC Berkeley)
R is an open-source statistical computing environment, comparable to SAS, Stata, and SPSS, widely used across disciplines as disparate as bioinformatics and finance. A key feature of the R platform is a "package" system that allows users to easily share code, data, and documentation via community-supported repositories like CRAN, R-Forge, and Bioconductor. Thousands of high-quality, well-documented packages are currently available, and the number continues to grow. The workshop will focus on R commands for data manipulation and descriptive statistics and the use of the RStudio development environment. Examples and exercises will cover common social science data tasks like subsetting, merging, and creating categorical variables. In addition, the workshop will introduce workflow practices that promote reproducibility, which can be implemented in R or any comparable statistics package. No previous experience with R will be assumed, but participants should be familiar with at least one other statistics package.
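As a taste of the material, here is a minimal base-R sketch of the three example tasks, using small invented data frames (all names and values are hypothetical):

```r
# Two small synthetic tables sharing an 'id' column
respondents <- data.frame(id = 1:6,
                          age = c(23, 35, 52, 41, 67, 29),
                          region = c("N", "S", "N", "S", "N", "S"))
incomes <- data.frame(id = 1:6,
                      income = c(18, 42, 55, 38, 27, 31))

# Subsetting: keep only respondents aged 30 and over
over30 <- subset(respondents, age >= 30)

# Merging: join the two tables on the shared id column
combined <- merge(respondents, incomes, by = "id")

# Creating a categorical variable: bin age into labelled groups
combined$age_group <- cut(combined$age,
                          breaks = c(0, 30, 50, Inf),
                          labels = c("young", "middle", "older"))

summary(combined)
```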
da|ra: How to obtain a DOI name for my social and economic research data?
Brigitte Hausstein (GESIS - Leibniz Institute for the Social Sciences)
The GESIS - Leibniz Institute for the Social Sciences and the ZBW Leibniz Information Center for Economics offer a DOI registration service for social and economic research data. As members of DataCite, GESIS and ZBW pursue the goal of promoting and establishing uniform standards for the acceptance of research data as independent, citable scientific objects. The DOI registration service da|ra was introduced in 2010, and most of the leading German social science research data centres already use it. The workshop will introduce da|ra, its policy and metadata schema, and the functionality of the system. The main focus will be on hands-on exercises, complemented by presentations, so that participants learn how to register DOI names with da|ra:
1. da|ra policy and metadata schema
2. Registering a DOI name for test data sets generated in the social sciences and in economics
Participants will get the chance to try out the different workflows provided by da|ra: the web interface and the XML upload/web service. The workshop will also show the interaction between da|ra and the DataCite Metadata Store. Another component will be a discussion of the specific requirements for metadata when it comes to registration.
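To give a flavour of the XML-upload/web-service route, here is a minimal sketch in R; the endpoint URL, credentials, and metadata fields are placeholders for illustration, not da|ra's actual interface:

```r
library(httr)

# Placeholder metadata record for a hypothetical test dataset
metadata_xml <- '
<resource>
  <titles><title>Example Survey 2013 (test dataset)</title></titles>
  <creators><creator>Doe, Jane</creator></creators>
  <publicationDate>2013</publicationDate>
</resource>'

response <- POST(
  url = "https://dara.example.org/api/datasets",  # placeholder endpoint
  body = metadata_xml,
  content_type("application/xml"),
  authenticate("demo_user", "demo_password")      # placeholder credentials
)
status_code(response)  # a 2xx code would indicate a successful registration
```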
UK Institutional partnership training workshop: costing, appraising and managing data for social science research
Louise Corti (UK Data Archive)
Jared Lyle (ICPSR)
When it comes to dealing with the ever-increasing commitments of research data, both the UK Data Service and ICPSR continue to see institutions struggle with the challenges of domain specificity: how do we help our local researchers cost, plan and manage social science data? How do we then appraise and curate the mixed bag of data that a social scientist might have created? In this workshop we will showcase our collaborative support and training materials, which are being used to support both institutional repository managers charged with appraising, ingesting and managing social science research data from local academics, and research support staff who must ensure compliance with the data management responsibilities set out in almost all research applications. In this session participants will get a chance to try our exercises on:
- Costing short- and longer-term data management
- How to write a good data management plan
- Appraising data for social science research
- Creating sufficient context for data collections
- Creating Data Centre 'compliant' metadata records for local repositories
Data Visualization and R
Ryan Womack (Rutgers University)
This workshop will focus on principles and techniques for the visualization of data, with an equal emphasis on theory and implementation. Drawing on classic works by Cleveland (Visualizing Data), Tufte (The Visual Display of Quantitative Information), and Wilkinson (The Grammar of Graphics), a range of best practices for visualization will be illustrated. Recently developed techniques for large-scale, 3D, and interactive visualization will also be discussed. This discussion will be based on works such as Graphics of Large Datasets: Visualizing a Million (Unwin, Theus, and Hofmann), the Handbook of Data Visualization (Chen, Hardle, and Unwin), and Trends in Interactive Visualization: A State of the Art Survey (Liere, Adriaansen, and Zudilova-Seinstra). For each of these approaches, methods for creating similar graphics in the R open-source statistical language will be demonstrated, using packages such as ggplot2, lattice, and rggobi. Prior familiarity with R is helpful but not required.
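As a small illustration of the grammar-of-graphics style implemented by ggplot2, here is a minimal sketch using R's built-in mtcars data (the variable choices are arbitrary):

```r
library(ggplot2)

# Map data to aesthetics, then layer geoms: points plus per-group trends
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")
```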
CharmStats: Coding and Harmonization of Statistics
Kristi Winters (GESIS - Leibniz Institute for the Social Sciences)
The software program CharmStats 1.0 (Coding and Harmonizing Statistics) will provide a structured approach to data harmonization by allowing researchers to: 1) download harmonization protocols; 2) document variable coding and harmonization processes; 3) access variables from existing datasets for harmonization; and 4) create harmonization protocols for publication and citation. It will be open source and free to all users. The workshop will explain the software interface, show participants how to find variables using the program, and walk them through creating a harmonized variable.
2013-05-29: Plenary I
Plenary 1: Research Infrastructures (RI) for Social Sciences and Humanities (SSH): From FP7 (2007-2013) to Horizon 2020 (2014-2020)
Maria Theofilatou (European Commission, Directorate-General Research and Innovation, Research Infrastructures Unit)
The European Commission supports the European Research Area (ERA) through strategic funding measures, specifically two programs called FP7 (2007-2013) and Horizon 2020 (2014-2020). The money provided by these two programs comes from the member states of the European Union (EU). The goal is to coordinate investments in European Research Infrastructures in the most efficient way, while keeping the budget in mind. This plenary describes the two funding programs and highlights changes in the mode of support given to the ERA. Due to the organizational structure, the focus for fundable projects lies on both national and transnational projects. The main goals are to enable world-leading research, and research at the transnational level within the EU. The session presents the development targets for the ERA over the past seven years as well as for the next seven years, to 2020. Since 1998, Maria Theofilatou has worked for the European Commission in the Directorate-General for Research and Innovation. Currently working in Unit B3 (Research Infrastructures), she is responsible for research infrastructure (RI) activities in the Social Sciences and Humanities. She represents the European Commission at the Strategic Working Group of the European Strategy Forum for Research Infrastructures (ESFRI) on 'Social and Cultural Innovation' and has represented the European Commission at the OECD Global Science Forum's Experts Group on 'Data and Research Infrastructures for the Social Sciences'. Born in Greece, she studied at Athens University, where she gained a degree in Economics. She worked as a researcher at Maastricht University in The Netherlands, obtaining a doctorate in Political Science, European politics and policy making.
2013-05-29: A1: Panel: Data Centers and Institutional Partnerships
International Perspective from Open Access Repositories
2013-05-29: A3: A DDI Tools Session: Examples and Application Challenges
A Business Perspective on Use-Case-Driven Challenges for Software Architectures to Document Study and Variable Information
Thomas Bosch (GESIS – Leibniz Institute for the Social Sciences)
Matthäus Zloch (GESIS – Leibniz Institute for the Social Sciences)
Dennis Wegener (GESIS – Leibniz Institute for the Social Sciences)
The DDI Discovery Vocabulary represents the most important parts of DDI-Codebook and DDI-Lifecycle in the Web of Data, covering the discovery use case. Researchers now have the possibility to publish and link their data and metadata in the Linked Open Data Cloud. In various software projects in the statistical domain these basic DDI concepts are reused to a large extent, so the DDI Discovery Vocabulary could serve as a common abstract data model for all of these software projects. As software products have individual requirements, the common abstract data model has to be customized in the form of individual data models. The projects MISSY and StarDat are used to document variable and study information, respectively. They serve as representative use cases to show how abstract data models and project-specific data models cover the requirements of these kinds of projects. This presentation gives an overview of the projects' software architecture and the interaction of their layers and components. In addition, the succeeding talk on the detailed technical perspective (on use-case-driven challenges for software architectures to document study and variable information) describes how to implement the proposed software architecture and how to implement different persistence formats such as DDI-XML, DDI-RDF, and relational databases.
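To make the discovery use case concrete, here is a minimal sketch of querying DDI-RDF metadata from R via a SPARQL endpoint; the endpoint URL is a placeholder, and the prefix URI reflects the DDI-RDF Discovery draft as we understand it:

```r
library(SPARQL)  # simple SPARQL client for R

endpoint <- "http://example.org/sparql"  # placeholder endpoint
query <- '
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?study ?title WHERE {
  ?study a disco:Study ;
         dcterms:title ?title .
} LIMIT 10'

# Returns a data frame of studies documented with the Discovery Vocabulary
studies <- SPARQL(endpoint, query)$results
studies
```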
A Technical Perspective on Use-Case-Driven Challenges for Software Architectures to Document Study and Variable Information
Matthäus Zloch (GESIS - Leibniz Institute for the Social Sciences)
Thomas Bosch (GESIS - Leibniz Institute for the Social Sciences)
Dennis Wegener (GESIS - Leibniz Institute for the Social Sciences)
Leveraging software architecture techniques, such as the model-view-controller pattern, to build well-defined software projects has several advantages: the separation of self-contained functionality and the generation of interactive modules. The main intention in creating an abstract application programming interface (API) is to develop individual software projects that can profit from shared functionality. As shown in the previous presentation on the business perspective, the idea is to create a reusable core data model, the DDI Discovery Vocabulary, which can be extended and adjusted to the requirements of an individual project. In this presentation we will show how the abstract implementation of the DDI Discovery Vocabulary model integrates into the structured software architecture of a software project and how it might be extended. This is shown by means of the technical implementation of the use case project MISSY. Based on the business perspective and the requirements on a project's data model, some possible physical persistence implementations to store data are presented as an API. We will also give step-by-step guidance on how a project that uses the DDI Discovery Vocabulary as an exchange format and core data model can be built from scratch.
Statistical data exist in many different shapes and forms, such as proprietary software files (SAS, Stata, SPSS), ASCII text (fixed, CSV, delimited), databases (Microsoft, Oracle, MySQL), or spreadsheets (Excel). Such a wide variety of formats presents producers, archivists, analysts, and other users with significant challenges in terms of data usability, preservation, and dissemination. These files also commonly contain essential information, such as the data dictionary, that can be extracted and leveraged for documentation purposes, task automation, or further processing. Metadata Technology will be launching in mid-2013 a new software utility suite, "DataForge", for facilitating reading/writing data across packages, producing various flavors of DDI metadata, and performing other useful operations around statistical datasets, to support data management, dissemination, and analysis activities. DataForge will initially be made available as desktop-based products under both freeware and commercial licenses, with a web-based version to follow. IASSIST 2013 will mark the initial launch of the product. This presentation will provide an overview of DataForge capabilities and describe how to get access to the software.
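DataForge itself is demonstrated in the session; as a stand-in illustration of the kind of cross-package conversion task it targets, here is a minimal R sketch using the foreign package (the file path is hypothetical):

```r
library(foreign)

# Read an SPSS file into a data frame, keeping value labels as factors
spss_data <- read.spss("survey2012.sav", to.data.frame = TRUE)

# Variable labels travel along as an attribute: a simple data dictionary
attr(spss_data, "variable.labels")

# Write a plain-text copy for tools that cannot read .sav directly
write.csv(spss_data, "survey2012.csv", row.names = FALSE)
```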
Colectica for Excel: Increasing Data Accessibility Using Open Standards
Jeremy Iverson (Colectica)
Dan Smith (Colectica)
Traditionally, data in spreadsheets and plain text formats do not contain rich documentation. Often, single-word column headers are the only hint given to data users, making it difficult to make sense of the data. Colectica for Microsoft Excel is a new, free tool to document your spreadsheet data using DDI, the open standard for data documentation. With this Excel add-in, users can add extensive information about each column of data. Variables, Code Lists, and the datasets can be globally identified and described in a standard format. This documentation is embedded with the spreadsheet, ensuring the information is available when data are shared. The add-in also adds support for SPSS and Stata formats to Excel. When opening an SPSS or Stata file in Excel, standard metadata is automatically created from the variable and value labels. Colectica for Excel can create print-ready reports based on the data documentation. The information can also be exported to the DDI standard, which can be ingested into other standards-based tools. This presentation will include a live demonstration of the Colectica for Excel tool, showing how to document the contents of a spreadsheet, publish the information, and use the documentation to access data in an informed way.
Integrating Colectica, Nesstar, and DDI-Lifecycle Pt 1
Dan Smith (Colectica)
Both Colectica and Nesstar are software applications based on the DDI metadata standard. Many organizations use Nesstar for publishing data on the web, documented with DDI Codebook. Many organizations also use Colectica to document their studies and datasets using DDI Lifecycle. With the creation of the new Nesstar API, an integration opportunity exists to publish data described in DDI Lifecycle to the DDI Codebook based Nesstar server. This presentation will demonstrate the integration that is now implemented between Nesstar and Colectica. In this demo, a dataset and a study documented in Colectica using DDI Lifecycle metadata will be published to a Nesstar server using the Nesstar API. This integration allows users of both software packages greater opportunities for expanded metadata documentation and improved data publication and visualization.
Integrating Colectica, Nesstar, and DDI-Lifecycle Pt2: Nesstar - a Dissemination Toolkit
Ørnulf Risnes (Nesstar)
2013-05-29: B1: Data Visualization and Mixed Methods Analysis: Using Geographic Data
Geocoding: Adding Another Dimension to Non-Spatial Data
Peter Peller (University of Calgary)
The big difference between non-spatial and spatial data is the absence of geographic coordinates; however, non-spatial data frequently have some kind of geographic reference embedded in them, such as an address, postal code, or place name. Geocoding is the process by which non-spatial data with this type of implicit geography are converted into geographic coordinates or linked to a geographic space. Geocoding enriches non-spatial data because it provides researchers with additional possibilities for visualization and analysis. This paper reviews the current methods being used to geocode both structured and unstructured data, as well as some of the tools, including open-source ones. It also documents the ways in which researchers are repurposing their original non-spatial data through geocoding. Finally, it discusses the importance of geocoding as a service offered by data centers and libraries.
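As a minimal sketch of the basic operation, here is an R example that geocodes one address against the public OpenStreetMap Nominatim service (the address is arbitrary, and real use should respect the service's usage policy):

```r
library(httr)
library(jsonlite)

address <- "2500 University Dr NW, Calgary, Canada"
resp <- GET("https://nominatim.openstreetmap.org/search",
            query = list(q = address, format = "json", limit = 1))

# The service returns a JSON array of candidate matches
hit <- fromJSON(content(resp, as = "text"))
hit[, c("lat", "lon", "display_name")]
```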
Votes and Values and Pretty Maps: Applying Mixed Methods to Canadian Political Data
Daniel Edelstein (University of Windsor)
Barack Obama's 2012 re-election campaign relied heavily on applying social science findings and methods, drawing on large, linked data sets to target, persuade, and turn out voters. As the Canadian Election Study (CES) data sets include detailed data on respondents' geographic location, they are well suited for research that integrates statistical and GIS analysis. Using CES data, Canadian census data, election results, and geospatial data, we will demonstrate how multiple, diverse data sets can be combined and analyzed with a mix of statistical and GIS methods. The CES and other data, and the methods, should be of interest to political scientists, sociologists, and other social scientists. Our illustrated examples are also structured to guide others in using similar combined methods on different data, or in helping users take full advantage of comparably rich data for better research and more vivid presentation of results.
Building Out a Library Based Data Visualization Service
Justin Joque (University of Michigan)
This paper outlines the development of data visualization services as part of the data services provided by the University of Michigan Library. Over the past two years we have expanded our data visualization services to include consultations, workshops and course-based instruction. While a large proportion of our patrons are from the social sciences, we provide both numerical and geographic data support for the entire breadth of disciplines at our university. In addition to disciplinary diversity, library constituents include undergraduates, graduate students, faculty and the wider community. The myriad skills and perspectives our patrons bring to a data visualization problem often demand that we help them develop abstract ways of thinking about visualization and focus on their data and intentions, rather than fixating on a one-size-fits-all tool that does not exist. This presentation will focus on our current service model and a number of the key difficulties and solutions we arrived at in building an interdisciplinary data visualization service.
2013-05-29: B2: Research Data Management Infrastructures: Facilitating Access and Preservation
Using the New SDA to Make Data More Accessible
Tom Piazza (University of California-Berkeley)
A major revision of the SDA online analysis system will be released this year (version 4.0). It includes new features both for data archives and for the end users of the data such as researchers and students. For data users, there will be an updated user interface with richer, more dynamic components and a more modern look. At the same time, this will simplify the interface for beginning users by hiding more advanced options until they're needed. Users will also be able to store computed and recoded variables in personal workspaces (with archive permission). For data archives, the new SDA will simplify the setup of an SDA data archive by replacing the current combination of CGI programs and Java servlets with a single servlet-based Java Web application. In addition, a new "sdamanager" application will provide a unified control panel for all SDA archive management functions: creating and configuring SDA datasets, managing search options, etc. From this control panel, archive managers will also be able to specify easily which users can access which datasets. Note that existing SDA datasets will not have to be modified. We will demonstrate these new features during the presentation.
Research Data Management with DATORIUM: Filling a Gap by Developing a Data Sharing Repository at GESIS - Leibniz Institute for the Social Sciences
Monika Linne (GESIS - Leibniz Institute for the Social Sciences)
One of the current projects for digital data preservation at the Data Archive of GESIS - Leibniz Institute for the Social Sciences is the data sharing repository DATORIUM. This repository is a web-based application that enables researchers to manage, document, archive and publish their data and structured metadata autonomously. The data will be freely accessible to the scientific community, so that the culture of data sharing, which the Data Archive has supported and promoted over the past 50 years, will be pushed forward, facilitating the re-use of archived data. The aims of DATORIUM are to ensure long-term preservation of the data and metadata and to provide wide-ranging dissemination possibilities for scientists, in order to increase the visibility and availability of their research projects. By facilitating access to their research data, scientists can support new research and secondary analysis; beyond that, they profit from a rise in citations of their work and thereby improve their reputation. In keeping with one of the core priorities of the Data Archive, ensuring high quality of the provided data and metadata, material uploaded to DATORIUM will be reviewed against defined quality criteria.
Research Data Management using CKAN: A Datastore, Data Repository and Data Catalogue
Joss Winn (University of Lincoln)
This paper offers a full and critical evaluation of the open source CKAN software (http://ckan.org) for use as a Research Data Management (RDM) tool within a university environment. It presents a case study of CKAN's implementation and use at the University of Lincoln, UK, and highlights its strengths and current weaknesses as an institutional Research Data Management tool. The author draws on his prior experience of implementing a mixed media Digital Asset Management system (DAM), Institutional Repository (IR) and institutional Web Content Management System (CMS), to offer an outline proposal for how CKAN can be used effectively for data analysis, storage and publishing in academia. This will be of interest to researchers, data librarians, and developers, who are responsible for the implementation of institutional RDM infrastructure. This paper is presented as part of the dissemination activities of the Jisc-funded Orbital project (http://orbital.blogs.lincoln.ac.uk).
Harnessing Data Centre Expertise to Drive Forward Institutional Research Data Management: A Case Study from the University of Essex
Thomas Ensom (UK Data Archive)
The Research Data @Essex project, funded under the Jisc MRD Program, piloted a research data management and sharing infrastructure at the University of Essex. The project team was led by the UK Data Archive in collaboration with support services at the University. It built on the Archive's extensive experience in enabling data re-use, now being carried forward by the new UK Data Service. The project demonstrated that an exchange of knowledge between data centers and institutional data services is mutually beneficial, particularly in accelerating institutional infrastructure development. A major focus was the development of an institutional research data repository based on the EPrints software. Key among our innovations has been the expansion of the EPrints metadata profile to allow the capture of the detail necessary for describing diverse research data, while also meeting relevant standards. The metadata profile adopted is compliant with the DataCite and INSPIRE schemas, and also leverages the descriptive power of the Data Documentation Initiative (2.1) schema, in a novel use of metadata developed within the social science community. Our solution offers a full-featured and easy-to-deploy data repository package, now being explored as a replacement for the technology behind the UK Data Service's ESRC Data Store self-archiving facility.
2013-05-29: B3: Harnessing the Power of Data: Expanding Linkages
Indicator-Based Monitoring of an Interdisciplinary Field of Science: the Example of Educational Research
Andreas Oskar Kempf (GESIS - Leibniz Institute for the Social Sciences)
Ute Sondergeld (German Institute for International Educational Research (DIPF))
Monitoring an interdisciplinary field of research is challenging on several levels: different scholarly communication cultures come into effect, and knowledge stored in disciplinary databases is indexed in different ways. To create an analytical basis for such a field of science, e.g. educational research, it is necessary to reconcile heterogeneous metadata by taking into account distinct semantic spheres of concepts and understanding. Scholarly databases contain a multitude of information that is relevant for monitoring. However, owing to conventional user interfaces, databases are usually not interlinked for analytic purposes. Hence, they can only inform about changes in research and publication if respective indicators are deduced, standardized and visualized. The contribution presents findings from the interdisciplinary scientometric project "Educational Research Monitoring" (MoBi). Based on multi-dimensional analyses of projects and publications, MoBi identifies adequate characteristics for describing the field of research, its output, and its reception. In a second step, these aspects are translated into standard indicators for the description of science, e.g. research activity, networking and the proportion of external funding. These indicators provide a conceptual basis for a web-based monitoring service. MoBi is conducted collaboratively by the Leibniz institutes GESIS, DIPF and ZPID, in co-operation with iFQ.
DataBridge: Building an E-science Collaboration Environment Tool for Linking Diverse Datasets into a Socio-metric Network
Jonathan Crabtree (Odum Institute UNC Chapel Hill)
There are currently thousands of scientists creating millions of data sets describing an increasingly diverse matrix of social and physical phenomena. This rapid increase in both the amount and diversity of data implies a corresponding increase in the potential of data to empower important new collaborative research initiatives. However, the sheer volume and diversity of data present a new set of challenges in locating all of the data relevant to a particular line of research. Taking full advantage of the unique data managed by the "long tail of science" requires new tools specifically created to assist scientists in their search for relevant data sets. DataBridge is an e-science collaboration environment designed specifically for the exploration of a rich set of socio-metric tools and the corresponding space of relevance algorithms, and their adaptation to define semantic bridges that link large numbers of diverse datasets into a socio-metric network. Data from large NSF-funded projects will be analyzed to develop relevance-based data discovery methods. This paper will discuss the design of DataBridge and the socio-metric network analysis algorithms that will be used to explore the space of relevancy by metadata and ontology, by pattern analysis and feature extraction, and via human connections.
Terra Populus: Integrated Data for Population and Environmental Research
Peter Clark (University of Minnesota)
Alex Jokela (University of Minnesota)
Terra Populus (TerraPop) is one of several projects funded by the National Science Foundation under the DataNet initiative. This initiative seeks to build a network of partners that will create infrastructure and tools for long-term digital data preservation, access, and re-use. The specific goal of Terra Populus is to lower barriers to conducting interdisciplinary human-environment research by making data from different domains easily interoperable. Building on the Minnesota Population Center's past experience with IPUMS and NHGIS, TerraPop will incorporate census, geospatial, land use, land cover, and climate data, along with other environmental, agricultural, and economic datasets. These data currently exist with disparate formats and structures, have generally inadequate metadata, and have incompatible geographic identifiers. This session will focus on the technology infrastructure developed for the initial beta release of the TerraPop system. We'll provide an overview of the software architecture and data models underlying the system, encompassing micro-data, area-level data, raster data, and associated metadata. We'll discuss the structure of the initial data available in the prototype, why these datasets were chosen, and how they're linked. Lastly, we'll discuss the kinds of research that we hope to support via the beta release.
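As a small illustration of one kind of interoperability TerraPop targets, linking area units to raster environmental data, here is a minimal R sketch with the raster and sp packages, using synthetic data throughout:

```r
library(raster)
library(sp)

# A synthetic 'climate' raster over a 10 x 10 unit area
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- runif(ncell(r), min = 10, max = 30)

# Two synthetic rectangular 'districts' as polygons
west <- Polygons(list(Polygon(cbind(c(0, 5, 5, 0), c(0, 0, 10, 10)))), "west")
east <- Polygons(list(Polygon(cbind(c(5, 10, 10, 5), c(0, 0, 10, 10)))), "east")
districts <- SpatialPolygons(list(west, east))

# Mean raster value per district: an area-level environmental covariate
extract(r, districts, fun = mean)
```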
2013-05-29: B4: Qualitative and Atypical Data: Expanding and Facilitating Usage
What Do They Do With It? How People Re-Use Qualitative Data from the UK Data Service
Libby Bishop (UK Data Archive)
Re-use of qualitative data has grown significantly in recent years, as demonstrated by numerous publications, conference sessions, and funded projects. At the UK Data Service, over 1000 qualitative data collections were delivered to users in 2009-2010. For the first time, we have undertaken a systematic analysis of how people are re-using qualitative data collections. The analysis included type of user (e.g., student, higher education staff member, government staff), discipline, whether the re-use was for education (such as post-graduate theses or teaching) or for new research, and whether the topic of the re-use was related to the original research or to entirely different investigations. Initially, the data were collected in order to provide feedback to data creators about how their data have been repurposed. The response to these reports was very positive; data creators want to know what happens to their data. Some intend to use the report as part of their own impact reporting for research assessment. We value this information for internal purposes as well, such as better understanding usage levels of our holdings, locating innovative uses to develop into case studies, and fostering relationships between data creators and users.
Sharing Qualitative Data of Business and Organizational Research Problems and Solutions, Bielefeld University
Tobias Gebel (The German Data Service Center for Business and Organizational Data (DSZ-BO))
In German empirical organizational research, qualitative methods predominate. It is typical for this field that samples are often very small and sensitive, and the data are often not usable by other researchers. The consequences are that the analysis potential of interesting research data goes unexploited, interviews are repeated, and the field is strained, causing an ongoing decline in respondents' willingness to participate in interviews. Data sharing can help relieve overstressed research populations and exploit the analysis potential of available data. Nevertheless, data sharing has no tradition in qualitative organizational research. Our presentation focuses on four central requirements for data sharing: visibility of the data's existence, data documentation, data protection and data access. We will address the specific features of data documentation as well as the data protection procedure. Additionally, we would like to discuss specifics and possible solutions for those requirements in qualitative organizational studies.
Use with Caution: A Multi-disciplinary Analysis of Data Use and Access Conditions
Tiffany Chao (University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science)
The rising expectations for public sharing of research data have triggered greater awareness of describing and documenting data for potential use by a global community. For researchers, it is an opportunity to detail special notes, limitations, and conditions that facilitate appropriate reuse. This study examines what conditions for the use of and access to publicly available data are made visible through metadata description, and how these limitations may differ across disciplines and data types. The content for analysis consists of data records from the Earth and social sciences, drawn from the Global Change Master Directory (http://gcmd.nasa.gov/), a public metadata repository. The comparative investigation not only brings forth prevalent issues associated with data within a domain research area, but also what commonalities may exist across fields and the types of data produced and used. The content of these use and access descriptions also serves as a point of comparison with findings in the literature from studies on domain-specific data practices and related perceptions. By bringing together these different perspectives, a more cohesive multi-disciplinary data landscape is developed that can inform curation support structures for the stewardship of research data.
2013-05-29: B5: Data Citation: In Principle and Practice
Data Citation in Australian Social Science Research: Results of a Pilot Study
Steven McEachern (Australian Data Archive)
The importance of data citation for understanding the impact of social surveys has become increasingly recognized as a priority concern among research infrastructure providers and funders (ANDS, 2012; NSF, 2012; Ball and Duke, 2012). For data archives, data citation provides a mechanism to understand the dissemination activities of the archive, particularly in enabling access to data for secondary use. While social science data archives have long recommended or required the use of citations as a condition of access to datasets, compliance with this condition is minimal (Piwowar, 2011). For this reason, many data archives and repositories have implemented, or are currently exploring, new mechanisms for enabling data citation, such as DOIs. One such pilot study is being conducted by the Australian Data Archive (ADA). This project involves three elements:
- a review of the current literature on data citation practices in Australian and international social science
- a survey of current practice among users of 5 major Australian social science data sets
- a pilot study of the use of DOIs with ADA datasets
The paper will present the current results of this project, recommendations for the ADA regarding data citation, and implications for data archives and repositories more generally.
Making Data Citable. The Technical Architecture of the da|ra Information System
Dimitar Dimitrov (GESIS - Leibniz Institute for the Social Sciences)
Erdal Baran (GESIS - Leibniz Institute for the Social Sciences)
Dennis Wegener (GESIS - Leibniz Institute for the Social Sciences)
Today, exact citation and referencing of the datasets used for research is becoming more and more important, due to the enormous growth of data we are experiencing. Identification mechanisms such as the DOI can be used to uniquely identify datasets in a persistent way. However, users need support from information systems for attaching identifiers and discovering data. The da|ra information system allows registering research datasets, searching for registered datasets, and following links to the landing pages of the registered datasets. We present the technical architecture of the da|ra information system and the third-party services the system builds on. The architecture follows SOA principles and is implemented with the Grails framework. The functionality of the information system is exposed by the da|ra portal as well as by service interfaces, which allow users to build their own tools for registering datasets or to integrate the functionality into existing environments.
Databib: A Global Catalog of Research Data Repositories
Jochen Schirrwagen (Bielefeld University)
Databib (http://databib.org) is a curated, global, online catalog of research data repositories. Librarians and other information professionals have identified and cataloged over 500 data repositories that can be easily browsed and searched by users or integrated with other platforms or cyberinfrastructure. Databib can help researchers find appropriate repositories in which to deposit their data, and it gives consumers of data a tool to discover repositories of datasets that meet their research or learning needs. Users can submit new repositories to Databib, which are reviewed and curated by an international board of editors. All information from Databib has been contributed to the public domain using the Creative Commons Zero protocol. Supported machine interfaces and formats include RSS, OpenSearch, RDF/XML, Linked Data (RDFa), and social networks such as Twitter, Facebook, and Google+. In this session, we will demonstrate Databib and give an overview of how researchers, librarians, funding agencies, students, data centers, software developers, and other users can use and integrate it.
2013-05-29: C1: ODIN: 'Identifiers' Connecting Researchers and Research Outputs
2013-05-29: C2: Panel: Strategies and Models for Data Collection Development
Strategies and Models for Data Collection Development
Hailey Mooney (Michigan State University)
Karen Hogenboom (University of Illinois Urbana Champaign)
Bobray Bordelon (Princeton University)
Kristen Partlo (Carleton College)
Michelle Hudson (Yale University)
Maria Jankowska (University of California Los Angeles)
Issues around managing and providing access to data are receiving a lot of attention from academic libraries and information technology departments. This session will discuss how academic libraries handle data collection development and acquisitions. Panelists will share institutional case studies to illustrate various experiences and practices in the development of data collections. Issues include navigating diverse format types and licensing issues, funding and budgets, selection responsibility, and collection development policy statements. Innovative models, such as contests to identify data sets for acquisition, will be shared. Everything from small individual data sets, to large subscription databases, to freely available online resources will be considered. This session will provide an opportunity for participants to engage in open discussion of a key aspect of their responsibility to ensure access to data resources for their communities.
2013-05-29: C3: Integrating Data Management and Discovery
Utilizing DDI-Lifecycle in the STARDAT Project to Manage Data Documentation
Wolfgang Zenk-Möltgen (GESIS-Leibniz Institute for the Social Sciences)
The STARDAT project at GESIS - Leibniz Institute for the Social Sciences is an effort to develop an integrated metadata management system for social science datasets at the Data Archive for the Social Sciences. It will transfer the features of current applications and tools into a modular software suite that is compatible with current metadata standards such as DDI-Codebook and DDI-Lifecycle. It covers multi-language documentation at study and variable level, and enables long-term preservation and export to different publishing portals such as the GESIS Data Catalogue, ZACAT, the CESSDA Data Portal, Sowiport, and da|ra. During the development phase of the project, the Data Archive faces additional demands on data archiving. Among these challenges are new data types, persistent identifiers, highly structured or multi-level datasets, data collected in experimental research, and process data. These requirements lead to an update of the metadata schema and additional functional requirements for STARDAT. The presentation will focus on the development of the conceptual model and will show comparisons and mappings with other metadata models, e.g. from the Data Catalogue, da|ra, DataCite, Dublin Core, and the DDI ontology. Technical considerations and the implementation of the model will be discussed according to their stage in the development process.
UK Data Service Discover: Visible Connections and a Structuralist Approach to Discovery/Making Data Visible - Building an Enterprise Search Solution from the Ground Up
Lucy Bell (UK Data Archive)
Matthew Brumpton (UK Data Archive)
This paper describes the journey taken by the UK Data Service to redesign its resource discovery. "Discover", the new, openly available, faceted search/browse application, makes metadata more accessible for all users. It provides a single point of access to a wide range of high-quality economic and social data, including large-scale government surveys, international macro-data, business micro-data, and census data (1971-2011), plus related resources. We have moved from a federated search environment to centralization and, borrowing the principles of FRBR, have created a systematized way of browsing connected metadata. The new search links data collections to publications, case studies of use, support guides, and beyond. It also encourages serendipity, allowing the user to discover items indexed with similar terms. The system is simple and straightforward, but also innovative in encouraging users to make connections between resources. This paper describes the work undertaken to create efficiencies in metadata use and to develop multi-core functionality, which allows users to search metadata encoded in different schemas simultaneously. It also puts the work into the context of current information management theory, including a move to a wider, macro-view of metadata which supports the creation and sustainability of connections within and between resources of differing natures.
In the Mix - Developing Open Source Search Technologies on the Microsoft Platform
Matthew Brumpton (UK Data Archive)
Matthew Brumpton will demonstrate the development environment and architecture of the UK Data Service's single search interface, Discover™. This will cover the architecture of the n-tier search application, built with both open-source and Microsoft technologies and tools. It involves a walk-through of the Discover™ technology stack and how to manage a scalable Solr implementation on the Microsoft platform. He will also show tools and techniques for integrating Java and .NET technologies to build scalable search applications in a distributed environment, along with some tips on how to monitor and debug the whole system.
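For orientation, here is a minimal sketch of what a client-side query against a Solr core looks like over HTTP, written in R; the host, core name, and field are placeholders, not the actual Discover configuration:

```r
library(httr)
library(jsonlite)

resp <- GET("http://solr.example.org:8983/solr/catalogue/select",
            query = list(q = 'title:"census"', wt = "json", rows = 5))

result <- fromJSON(content(resp, as = "text"))
result$response$numFound  # total number of matching records
result$response$docs      # the first five matching documents
```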
Metadata Driven Tools Developed for the Canada Research Data Centre Network
Donna Dosman (Statistics Canada)
Pamela Moren (Statistics Canada)
Over the past 5 years the Canadian RDC Network, in partnership with Statistics Canada, has developed a metadata-driven catalogue for its data collection, which will be incorporated into a soon-to-be-developed data management tool for their newly centralized data repository. The goal of this project was to create a suite of tools with which the data and metadata are managed more efficiently and researchers can discover the collection more easily by exploiting machine-actionable applications. The tools developed include an RDC metadata editor tool in DDI 3.1, an ingester tool to convert DDI 2 to DDI 3, a researcher discovery tool, and a conversion tool which converts Statistics Canada metadata to DDI 3. This presentation will focus on the workflow used to populate the metadata catalogue, the tools developed in the process of building the metadata catalogue, and the tools developed for researchers to discover the data and metadata. We will also discuss lessons learned throughout the project and our next steps.
2013-05-29: C4: Beyond Theory: Data Management in the "Real World"
Data Management 2.0: Real-World Adaptation and User Feedback
Stefan Friedhoff (Bielefeld University)
Due to the demands of progressively more sophisticated data management, many researchers face problems when adapting existing DM strategies to their own research processes. The INF project (Information and Data Infrastructure), which assists data documentation across 17 projects within a Collaborative Research Center (SFB 882) in the social sciences, identified three main problems in implementing data management strategies: (1) methodological problems; (2) acceptance problems; and (3) problems of granularity. Based on open interviews, focus groups and surveys, we were able to identify specific problems in these areas and to develop both technical and methodological solutions. In this presentation we offer a systematization of the problems and the corresponding resolution strategies, show to what extent documentation can be standardized in a research center holding heterogeneous data, and identify where it becomes necessary to adapt specific solutions to overcome methodological differences.
Bringing Researchers into the Game with FORSbase: An Integrated System for Archiving, Networking, and Survey Construction
Brian Kleiner (Swiss Centre for Expertise in the Social Sciences (FORS))
Small data archives in Europe often lack the resources for adequate documentation and delivery of data. FORSbase is an IT project in the works at the Swiss Centre for Expertise in the Social Sciences (FORS) in Lausanne that will facilitate and automate documentation and access in order to free resources for promotional and training activities. Its goal is to combine within a single system and database a wide range of archiving functions and tools for researchers: to document and deposit their data; to access data and metadata; to establish contacts and communicate with other researchers; and to create and carry out surveys. All of this is done within individual researcher workspaces, where specific project descriptions and data are safely stored. Within the workspaces, researchers will also have access to a messaging system, a question data bank, a survey management tool, and other resources to assist them in their work. The benefits of such a system for researchers are the ease with which they can manage and store their data, as well as search for and directly download the data of others. In addition, the system will provide tools that help in designing and implementing surveys that are documented throughout the life cycle.
Do We Need a Perfect Metadata Standard or is Good Enough Good Enough?
Samuel Spencer (Open Source Developer)
The role of the data archivist focuses on the collation and sharing of research information. Historically, the quality of incoming digital content has been poor, a state that has driven the need for standards which can adequately capture research activities. One such standard, the Data Documentation Initiative, is held up as the ideal standard for archivists due to its rich support for the reuse of survey metadata. However, its complexity has made the creation of software targeted at researchers difficult. As such, with minimal uptake of the standard in the research community, the onus still falls on archivists to transcribe incoming data into this complex standard. This presentation takes an alternative approach and examines how it might be possible to create software targeting basic researcher needs by simplifying the task of survey research. The aim is to create and promote an XML standard that is "good enough" for survey researchers, increasing software adoption in the survey research community and thereby improving the quantity and overall quality of the metadata delivered to social science archives.
2013-05-29: C5: Facilitating Access to Sensitive Data
Implementing a Secure Data Enclave with Columbia University Central Resources
Rajendra Bose (Columbia University)
Our approach to implementing a Secure Data Enclave (SDE) pilot for Columbia social science researchers during the 2012-13 academic year builds on the work presented at previous IASSIST workshops and panels on access to sensitive data. The SDE is a scalable alternative to existing "cold room" solutions, and provides access to sensitive or restricted data over the campus network (or over Columbia's virtual private network) with widely used secure remote access software. Our SDE pilot was designed and implemented with guidance from the University's Information Security Office and makes use of other central IT resources, including an expanding virtual machine infrastructure. The project was initiated by Columbia's social science computing community, which spans a number of departments and research centers. This group engaged university administrators and has proposed the goal of expanding a successful pilot into a research service at the university level. This paper will share Columbia's experiences and present guidelines for other academic institutions interested in implementing an SDE using existing central resources, including IT and the libraries.
Expanding the Research Data Center in Research Data Center Approach
Joerg Heining (Institute for Employment Research (IAB))
Since 2011, the Research Data Centre (FDZ) of the Federal Employment Agency (BA) at the Institute for Employment Research (IAB) in Nuremberg, Germany has provided remote access to confidential micro-data for approved data users. By implementing the so-called Research Data Center-in-Research Data Center approach, data users can access FDZ data from the premises of institutions (other RDCs, data enclaves, etc.) which share standards comparable to the FDZ's with regard to data protection and the preservation of confidentiality. Starting with four external sites in Germany and one site in the US, the FDZ will expand this remote network both in Europe and in the US in 2013. In order to achieve this ambitious goal, several technical and organizational measures need to be implemented. In addition, legal concerns have to be addressed and funding needs to be ensured. The presentation will describe how the lessons learned from setting up the initial sites influenced this expansion and to what extent new challenges have to be faced. Starting and establishing an internationally operating remote network turned out to be a complex and ambitious task; expanding and sustaining such a network is an additional challenge to overcome.
The UK Data Service: Delivering Open and Restricted Data (and Everything In-between)
Richard Welpton (UK Data Archive)
With the Open versus Restricted data debate at full throttle, we are in danger of forgetting about the intermediate access options (not quite open, not quite restricted, but access with conditions attached as appropriate to the data). Providing an array of data access options, varying in stringency with the confidentiality of the data, is a useful mechanism by which researchers self-select access depending on their data requirements. This actually opens up data while helping to protect the security of confidential data. Formally launched in October 2012, the UK Data Service is a new service mandated to provide an entire spectrum of access options, from open data to restricted sensitive data, including a number of options in between. This paper presents the new arrangements for economic and social data access via the new UK Data Service, and demonstrates why offering a range of access options (the full Data Access Spectrum), from downloadable open data to more restricted access, increases the range of data available and widens research possibilities, while maximizing returns from investments made in data delivery.
The State of the Art of Remote Access to Confidential Microdata in Europe
David Schiller (Institute for Employment Research (IAB))
There are different ways to access and work with highly detailed and confidential micro-data. Researchers can work on-site in the facilities of the data providers, or they can submit their program code to the data provider, where it is run on the provider's servers automatically or by the staff of the RDC. A smoother way of working with the data is Remote Access (RA): the data stay at the facilities of the data provider and only a live stream is transferred to the user's screen. At the same time, the user can see the micro-data and work directly with it, i.e. see intermediate results without delays. The FP7-funded European project Data without Boundaries (DwB) has carried out a survey on the state of the art of Remote Access in Europe. Eight RA centers participated and provided information about their solutions, with a focus on the technical point of view. The presentation will highlight and discuss the relevant findings of the survey. In addition, possible developments in Remote Access to confidential micro-data that build on the existing solutions will be discussed.
The ESS.VIP Programme: a response to the challenges facing the ESS
Eduardo Barredo Capelot (Eurostat Director of Social Statistics)
Eduardo Barredo Capelot is Director for Social Statistics at Eurostat, the European Commission's statistics office. His role covers population and demography, labour market, living conditions and quality of life, education, health, and social protection statistics. He has spent most of his professional career at Eurostat, where prior to his current position he was responsible for fiscal statistics and business statistics as head of the unit for business statistics, co-ordination and registers. Before that, he was assistant to the Director General of Eurostat and head of the units dealing with government finance statistics and business statistics. An economist and geographer by academic training, he holds a postgraduate degree from the College of Europe in Bruges and joined Eurostat in 1991.
2013-05-30: D1: Facilitated Discussion: Research Data Management: Sharing Our Experiences
2013-05-30: D2: Opening Access to Non-Digital and Historic Data
Realizing digital futures: Digitizing and building an online system for key post-1945 social science data sources
Louise Corti (UK Data Archive)
The Digital Futures project was funded by ESRC following a gap identified in the Methods Infrastructure portfolio for qualitative data. It aims to maximize the impact from existing research and resource investments. We promised to deliver access to key post-war qualitative data via online data browsing and exploration, using robust data standards identified by the DDI qualitative working group. In this paper we will describe how we: prioritized data for large-scale digitization by targeting scholarly communities; specified an open-source publishing and data delivery system; employed a mechanism for reliably citing data at the sub-collection level to enable the publication of enhanced outputs; and published data as linked data resources. The work will be fully integrated into the new UK Data Service's ingest, data discovery and delivery architecture.
2013-05-30: D4: DASISH: Data Service Infrastructure for the Social Sciences and Humanities
DASISH: The Big Picture
Hans Jørgen Marker (Swedish National Data Service)
DASISH is an EU FP7-funded project that brings together the five ESFRI infrastructures within the social sciences and humanities to provide common solutions to common problems. The five infrastructures are CESSDA, CLARIN, DARIAH, ESS and SHARE. Major areas addressed by DASISH include: occupation coding, questionnaire design, survey translation, question form and quality, survey management, deposit services, a model for a common deposit service, rules and guidelines for proper data management, a trust federation for data access, a robust PID service, improving metadata quality, a joint metadata domain, workflow implementation, an annotation framework, and ethical and legal issues, including creating a legal and ethics competence centre and addressing the legal and ethical issues involved in data presentation. The results are being continuously presented in various workshops and through other channels. DASISH commenced in January 2012 and has results to report that are of interest to the IASSIST community.
Data Archives in an Environment of multiple Research Infrastructures: Towards a reference architecture for e-Infrastructures in the Social Sciences and Humanities
Mike Priddy (Data Archiving and Networked Services (DANS))
Maarten Hoogerwerf (Data Archiving and Networked Services (DANS))
The developing research infrastructure landscape is diverse, with each infrastructure having particular requirements for data, metadata and protocols; it is therefore difficult for a data archive to find a manageable solution that meets all the disparate demands. DASISH is working towards a reference architecture for infrastructures in the social sciences and humanities that recognizes the role data archives play and the challenges that they face. However, the heterogeneity and requirements of the communities that the infrastructures serve must also be maintained. We present our initial findings on the current state of the architectures of the five European social sciences and humanities infrastructures that are part of DASISH, and a vision for a reference architecture from the perspective of data archives.
New Legal Challenges: New EC Privacy Regulation. Data Preservation and Data Sharing in Danger?
Vigdis Kvalheim (Norwegian Social Science Data Services (NSD))
On January 25th the European Commission published its proposal for a new European Data Protection Regulation. In December the European Parliament published its proposed amendments to the EC draft. The proposed regulation will replace the 1995 Data Protection Directive. The new EC regulation will affect large areas of European research, and how it subsequently will be implemented and practiced is of great interest to European researchers as well as research infrastructures. The amendments proposed by the Parliament include restrictive provisions for historical, statistical and scientific purposes. The reactions in the academic community across Europe have therefore been those of surprise and deep concern, in particular over the possible negative impacts on register-based research and on the possibilities for long-term preservation and data sharing across Europe. The old Directive from 1995 was not drafted with research interests in mind, but following an extensive consultation procedure that resulted in systematic submissions and political actions from several key players in the research sector, it nevertheless contains several important exemption provisions for scientific purposes. Today the EC regulation affords research a privileged position in order to meet and highlight the research sector's legitimate need to process personal data. For empirical research across Europe it is crucial that the research exemptions are continued and, if possible, improved in the new Regulation. In DASISH WP6 we focus on legal and ethical issues, constraints and requirements for data use, data preservation and sharing for all types of data in the SSH domain. In that context, the proposed legal framework and how it balances the interests of privacy and research is important. In this presentation we look at this balance and how it may shift in favor of privacy if the amendments proposed by the European Parliament are implemented in their present form. We go on to argue that this may have unintended consequences for research and its contribution to society.
Education and Training for Research Infrastructures
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Laurence Horton (GESIS - Leibniz Institute for the Social Sciences)
The DASISH WP7 brings together representatives from the five social sciences and humanities European Research Infrastructure Consortia (ERICs) to produce training modules and workshops for infrastructure projects on research data management, long-term digital preservation and data reuse. This presentation will illustrate the activities of partners in WP7 as they bring together existing training resources and integrate them with their own outputs and the material created by other DASISH work packages. The presentation will cover the structure of the work package, introduce its outputs so far, and outline its future activities.
2013-05-30: D5: Perspectives: Challenges for Multi-Disciplinary Research Data Infrastructures
Multi-Disciplinary Research Data Infrastructures: Results from a Roadmap Project
Jens Klump (German Research Centre for Geosciences (GFZ))
The flood of digital data, which arises from studies in the social sciences or results from satellite missions in the earth or space sciences, is growing rapidly. The permanent storage of these data and their provision to future generations of researchers represent a challenge to the entire science system; many questions still remain unresolved. Financial aspects, organizational and technology issues in creating multi-disciplinary research data infrastructures, as well as the legal and political framework, need to be clarified. These challenges will be discussed in the context of this session, which will take the form of a discussion panel introduced by four presentations. The data life cycle will serve as the guideline for the presentations, which take a closer look at its specific challenges. The overall objective is the development of a multi-disciplinary research data infrastructure. The first presentation describes the Private Domain, i.e. the challenges in handling the data deluge from the perspective of the scientist.
Challenges for Multi-Disciplinary Research Data Infrastructures
Harry Enke (Leibniz Institute for Astrophysics Potsdam (AIP))
Jochen Klar (Leibniz Institute for Astrophysics Potsdam (AIP))
The flood of digital data, which arises from studies in the social sciences or results from satellite missions in the earth or space sciences, is growing rapidly. The permanent storage of these data and their provision to future generations of researchers represent a challenge to the entire science system; many questions still remain unresolved. Financial aspects, organizational and technology issues in creating multi-disciplinary research data infrastructures, as well as the legal and political framework, need to be clarified. These challenges will be discussed in the context of this session, which will take the form of a discussion panel introduced by four presentations. The data life cycle will serve as the guideline for the presentations, which take a closer look at its specific challenges. The overall objective is the development of a multi-disciplinary research data infrastructure. The second presentation takes a closer look at the Group Domain, represented by virtual research environments (VREs).
Challenges for Multi-Disciplinary Research Data Infrastructures: Preservation = Persistent Domain
Torsten Rathmann (German Climate Computing Center (DKRZ))
The flood of digital data, which arises from studies in the social sciences or results from satellite missions in the earth or space sciences, is growing rapidly. The permanent storage of these data and their provision to future generations of researchers represent a challenge to the entire science system; many questions still remain unresolved. Financial aspects, organizational and technology issues in creating multi-disciplinary research data infrastructures, as well as the legal and political framework, need to be clarified. These challenges will be discussed in the context of this session, which will take the form of a discussion panel introduced by four presentations. The data life cycle will serve as the guideline for the presentations, which take a closer look at its specific challenges. The overall objective is the development of a multi-disciplinary research data infrastructure. The third presentation deals with the challenges of the Persistent Domain, such as cost structures and risk management.
Challenges for Multi-Disciplinary Research Data Infrastructures: The Private Domain
Dieter Van Uytvanck (Max Planck Institute for Psycholinguistics (MPI-PL))
The flood of digital data, which arises from studies in the social sciences or results from satellite missions in the earth or space sciences, is growing rapidly. The permanent storage of these data and their provision to future generations of researchers represent a challenge to the entire science system; many questions still remain unresolved. Financial aspects, organizational and technology issues in creating multi-disciplinary research data infrastructures, as well as the legal and political framework, need to be clarified. These challenges will be discussed in the context of this session, which will take the form of a discussion panel introduced by four presentations. The data life cycle will serve as the guideline for the presentations, which take a closer look at its specific challenges. The overall objective is the development of a multi-disciplinary research data infrastructure. This presentation will discuss the challenges of the Private Domain.
Challenges for Multi-Disciplinary Research Data Infrastructures: The Public Domain
Ralph Müller-Pfefferkorn (Technische Universität Dresden (ZIH))
The flood of digital data, which arises from studies in the social sciences or results from satellite missions in the earth or space sciences, is growing rapidly. The permanent storage of these data and their provision to future generations of researchers represent a challenge to the entire science system; many questions still remain unresolved. Financial aspects, organizational and technology issues in creating multi-disciplinary research data infrastructures, as well as the legal and political framework, need to be clarified. These challenges will be discussed in the context of this session, which will take the form of a discussion panel introduced by four presentations. The data life cycle will serve as the guideline for the presentations, which take a closer look at its specific challenges. The overall objective is the development of a multi-disciplinary research data infrastructure. This presentation covers the challenges of the Public Domain and presents best practices.
2013-05-30: E1: IFDO: Institutional Data Policies in 40+ Countries
IFDO Survey on Research Funders' Data Policies
Vigdis Namtvedt Kvalheim (International Federation of Data Organizations (IFDO))
In 2012 and early 2013, the International Federation of Data Organizations (IFDO) carried out an expert survey on research funders' data policies in 40+ countries. The survey aims to give an overview of the frequency and quality of such data policy requirements and of the guidelines which promote data sharing.
Dynamics of Data Sharing and Data Policies in Germany
Ekkehard Mochmann (International Federation of Data Organizations (IFDO))
In early 2012 and early 2013, the International Federation of Data Organizations (IFDO) carried out an expert survey on research funders' data policies in 40+ countries. This presentation is a case study on country- and funder-specific policies in Germany.
2013-05-30: E2: Making Complex Confidential Microdata Useable
Towards a Procedure to Anonymize Micro Data: Anonymizing Data from Official Statistics for Public Use
Katelijn Gysen (Swiss Centre of Expertise in the Social Sciences (FORS))
In general, the data collected by a national statistical office rely on large samples, regular data collection and mostly long time series, which makes the data interesting for secondary use. In order to give public access to these data, a good balance has to be found between the re-identification risk for respondents and the utility of the data. Several projects and publications in the field of Statistical Disclosure Control successfully describe the statistical possibilities and have developed tools to anonymize data, whereas little has been published on thresholds that guarantee an acceptable balance between re-identification risk and data utility. This presentation will give insight into the work that has been done by FORS in collaboration with the Swiss Federal Statistical Office (SFSO) towards a procedure to anonymize data such that the data can be provided to the public. The procedure describes how to define and reach the required level of anonymity for a respondent. Of course, this procedure and the thresholds can be adjusted for any other dissemination of micro-data, e.g. survey data collected by researchers interested in making their data available for secondary use.
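As a rough illustration of the risk side of that balance, the following sketch flags records whose combination of quasi-identifiers occurs fewer than k times in the sample, a standard starting point in Statistical Disclosure Control. It is a minimal sketch of the general technique, not the FORS/SFSO procedure; the variable names and the threshold are hypothetical.

    import pandas as pd

    def risk_report(df, keys, k=3):
        """Flag records whose quasi-identifier combination occurs
        fewer than k times in the sample (higher re-identification risk)."""
        sizes = df.groupby(keys)[keys[0]].transform("size")
        return df.assign(cell_size=sizes, at_risk=sizes < k)

    # Hypothetical quasi-identifiers; real choices depend on the survey.
    survey = pd.DataFrame({
        "region":     ["ZH", "ZH", "BE", "BE", "GE"],
        "age_group":  ["30-39", "30-39", "30-39", "60-69", "60-69"],
        "occupation": ["teacher", "teacher", "nurse", "farmer", "farmer"],
    })
    print(risk_report(survey, keys=["region", "age_group", "occupation"]))

Records flagged as at risk would then be treated with recoding, suppression or similar measures until the chosen threshold is met.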
Ingo Barkow (German Institute for International Educational Research (DIPF))
David Schiller (Institute for Employment Research (IAB))
IT processes in Research Data Centers require extensive administrative resources, especially if there are also services for external customers such as remote access or remote execution users. Some of these issues could be solved by external cloud services. This means all metadata and research data are shifted from the internal data center to an outside provider who guarantees high service levels, redundancy and availability. Nevertheless, handing out the data raises legal and technological challenges. This presentation will discuss those implications from scientific and technical perspectives using the example of the German situation, considering the limitations imposed by data protection laws and other regulations. Furthermore, we will also discuss how this change in paradigm, and especially its advantages, can be communicated to data producers, decision makers, research data institutions, data archives and scientists, as we anticipate considerable mistrust of cloud-based external data hosting.
Legally Bound? Data Protection Legislation and Research Practice
Laurence Horton (GESIS - Leibniz Institute for the Social Sciences)
Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences)
This paper explores how data archives provide a service bridging the gap between legal data protection requirements and research practices. Researchers encounter legal frameworks concerning data protection in all phases of the data lifecycle, but research practice - without due care - can clash with these frameworks. Social science archives can intervene, helping researchers navigate an environment which simultaneously pushes data sharing and consideration of the individual's right to privacy. These aims are not mutually exclusive, but pressure on researchers to 'get it right' when collecting, storing, analyzing, and anonymizing data has never been greater. We examine how archives can intervene in stages of the data lifecycle, against the context of German and British regulatory requirements. We propose that whilst differences exist in the substance of laws (e.g. Bundesdatenschutzgesetz and Data Protection Act), research cultures, and funding environments, archives face similar challenges in the data reuse/privacy dynamic. With research innovations such as geo-referenced data and, increasingly, cross-national collaborative projects existing either across national laws or outside established legal frameworks, the regulatory grounding is not firm. Here facilitation becomes an act of setting best-practice standards as guidance and, we suggest, data archives are best suited to be guides.
Generating Useful Test Data for Complex Linked Employer-employee Datasets
Peter Jacobebbinghaus (German Data Service Center for Business and Organizational Data (DSZ-BO))
When data access for researchers is provided via remote execution or on-site use, it can be beneficial for data users if test datasets that mimic the structure of the original data are disseminated in advance. With these test data researchers can develop their analysis code and avoid delays due to otherwise likely syntax errors. It is not the aim of test data to provide any meaningful results or to preserve statistical inferences. Instead, it is important to maintain the structure of the data in a way that any code developed with these test data will also run on the original data without further modifications. Achieving this goal can be challenging and costly for complex datasets such as linked employer-employee datasets (LEED), as the links between the establishments and the employees also need to be maintained. We illustrate how useful test data can be developed for complex datasets in a straightforward manner at limited cost. Our approach mainly relies on traditional statistical disclosure control (SDC) techniques such as data swapping and noise addition. The structure of the data is maintained by adding constraints to the swapping procedure.
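A minimal sketch of that idea follows: wages are permuted only within establishments (a constrained swap that keeps the employer-employee links intact) and multiplicative noise is added, while a categorical variable is swapped globally. This illustrates constrained swapping plus noise addition in general, not the authors' actual procedure; all variable names and values are hypothetical.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)

    def make_test_data(persons, noise_sd=0.1):
        """Build structure-preserving test data for a toy LEED extract."""
        out = persons.copy()
        # Constrained swap: permute wages only within each establishment,
        # so the person-to-establishment links survive.
        out["wage"] = (out.groupby("estab_id")["wage"]
                          .transform(lambda s: rng.permutation(s.to_numpy())))
        out["wage"] *= rng.normal(1.0, noise_sd, len(out))  # noise addition
        # Unconstrained swap for a categorical variable.
        out["occupation"] = rng.permutation(out["occupation"].to_numpy())
        return out

    persons = pd.DataFrame({
        "person_id":  range(6),
        "estab_id":   [1, 1, 1, 2, 2, 2],
        "wage":       [2500.0, 3100.0, 2800.0, 4200.0, 3900.0, 4500.0],
        "occupation": ["clerk", "driver", "clerk", "engineer", "clerk",
                       "manager"],
    })
    test = make_test_data(persons)  # same columns, dtypes and links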
2013-05-30: E3: Case Studies: Maximizing Usage of Important Datasets
Development of the Health Research Data Repository (HRDR) and the Translating Research in Elder Care (TREC) Longitudinal Monitoring System (LMS)
James Doiron (University of Alberta)
Pascal Heus (Metadata Technologies North America)
The Health Research Data Repository (HRDR), located within the Faculty of Nursing, University of Alberta, Canada, entered its operational phase in January 2013. The HRDR employs secure remote access for its approved users and provides a secure and confidential environment for supporting health-related research projects and the management of their data/metadata. Additionally, the HRDR has a mandate to promote educational opportunities regarding research data management best practices. One of the initial projects underway within the HRDR is a collaboration with Metadata Technologies North America (MTNA) and Nooro Online Research to develop a data infrastructure platform supporting a Longitudinal Monitoring System (LMS) using data collected within the Translating Research in Elder Care (TREC) project (http://www.trecresearch.ca). Specifically, the LMS data infrastructure platform uses DDI-based metadata to support the collection/ingestion, harmonization, and merging of TREC data, as well as the timely delivery of reports/outputs based on these data. The development of the HRDR, as well as a current overview of its status and projects, will be discussed. Specific focus will be placed upon the development, current status, and future work relating to the TREC Longitudinal Monitoring System project.
From 1911 to 2013: Renewing UK Birth Cohort Studies Metadata
John Johnson (University of London)
Jack Kneeshaw (UK Data Archive)
CLOSER (Cohorts and Longitudinal Studies Enhancement Resources) is a five-year program which aims to maximize the use, value and impact of these studies both within the UK and abroad. The program is run by a network of nine of the UK's leading studies (eight cohorts and one panel study), with participants born between 1911 and 2007. A major strand will be documenting these surveys and data (over 250 survey instruments and around 250,000 data variables) in DDI-L. The surveys cover a wide range of survey collection methods, from paper questionnaires to CAI, as well as biomedical and administrative linked data. The presentation will cover the workflow and systems used to capture paper questionnaires, archived documents, available DDI 2.0 and other electronic metadata from the surveys, and metadata captured from data, into DDI-L. This includes an in-house application for questionnaire capture written in Ruby on Rails, Python-based data quality tools to interface with SPSS, and Colectica for overall data management and its co-ordination across the studies. The presentation will also highlight how some recent changes to the DDI specification will assist in the management of these projects.
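To make the idea of Python-based data quality tools interfacing with SPSS concrete, here is a hedged sketch of what such a check might look like, reading only the metadata of an SPSS file with the pyreadstat library and flagging variables with missing labels before conversion to DDI-L. The file name is hypothetical and the checks are illustrative, not CLOSER's actual tooling.

    import pyreadstat

    # Read SPSS metadata only (no data rows) from a hypothetical wave file.
    df, meta = pyreadstat.read_sav("cohort_wave3.sav", metadata_only=True)

    for name, label in zip(meta.column_names, meta.column_labels):
        value_labels = meta.variable_value_labels.get(name, {})
        if not label:
            print(f"{name}: missing variable label")   # quality flag
        if not value_labels:
            print(f"{name}: no value labels defined")  # quality flag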
2013-05-30: E4: Case Studies in Research Data Management
Erasmus University Rotterdam's approach to supporting researchers with data management and storage
Paul J. Plaatsman (Erasmus University Rotterdam)
As at other Dutch academic institutions, we have talked a lot about research data. Following cases of fraud within our university and other universities in the Netherlands, policy makers and university boards became more insistent that university libraries help out with better storage of research data and with educating PhD students and young researchers about proper ways of handling their research data. So from talk we had to get into action. We are presently doing so by offering an information course about research data within our existing Research Matters portal. We also want to offer our researchers a safe environment to store their research data for the mid-long term (5 to 10 years), and are thinking about solutions for datasets that need to be stored indefinitely in the national data archive, DANS. We are running a pilot with three types of datasets - experimental, survey and qualitative - from researchers of the Erasmus Research Institute of Management (ERIM) in the Dutch Dataverse Network, hosted by the University of Utrecht. The Dataverse Network facility is now being used by four Dutch universities.
The first institutional Research Data Management (RDM) policy by a UK Higher Education Institution was passed by the Senate of the University of Edinburgh in May 2011. This paper discusses plans to implement this policy by developing services needed to support researchers and fulfill the University's obligations within a changing national and international setting. Significant capital funding has been committed to a major RDM and storage initiative led by Information Services (IS) for the academic year 2012-13. An RDM steering group, made up of academic representatives from the three colleges and IS, has been established to ensure that proposed services meet the needs of university researchers. It also oversees the activity of an IS cross-divisional RDM Policy Implementation Committee, charged with delivering those policy objectives. An RDM Roadmap (http://www.ed.ac.uk/polopoly_fs/1.101223!/fileManager/UoE-RDM-Roadmap201121102.pdf) was published in November 2012 to provide a high-level overview of the work to be carried out. The roadmap focuses on four strategic areas: data management support, data management planning, active data infrastructure and data stewardship. IS will take requirements from research groups and IT professionals, and is conducting pilot work involving volunteer research units within the three colleges to develop functionality and presentation for the key services.
Dataverse Network and Open Journal Systems Project to Encourage Data Sharing and Citation in Academic Journals
Eleni Castro (Institute for Quantitative Social Science (IQSS) Harvard University)
As data sharing technology and data management practices have developed over the past decade, academic journals have come under pressure to disseminate the data associated with published articles. Harvard University's Institute for Quantitative Social Science (IQSS) recently received a two-year grant from The Alfred P. Sloan Foundation to partner with Stanford University's Public Knowledge Project (PKP) in order to help make data sharing and preservation an intrinsic part of the scholarly publication process, and to create awareness specifically among journal editors and publishers. This presentation will provide an overview of the collaboration between PKP's Open Journal Systems (OJS) and IQSS's Dataverse Network (DVN) teams, who are currently building the technology needed to support seamless joint publication of research data and articles, and to support new forms of social science data, readership and analysis. The immediate impact of the project will be to increase the number of readily replicable articles published, and the number of social science journals that adopt best data management and citation practices. The broadest impacts of the project will be to increase the pace of discovery in the social sciences, and to broaden the research opportunities for younger scholars.
Promoting data accessibility, visibility and sustainability in the UK: the Jisc Managing Research Data Programme
Laura Molloy (University of Glasgow)
Simon Hodson (Jisc)
Driven by new research objectives and opportunities requiring the interdisciplinary reuse of data as well as research funder and (increasingly) journal policies, the case for skills in research data management (RDM) is becoming clearer to researchers of all disciplines. Some disciplines are historically well-served by national data centers and perpetuate a culture of organized data deposit, management, sharing and re-use. Many other researchers, however, work in disciplines without this heritage or produce data which is not appropriate for data center hosting. Institutions face a concomitant rise in responsibility for the formulation and delivery of appropriate and accessible RDM services and infrastructure for their researchers. Across the UK, the Jisc Managing Research Data program is stimulating improved RDM practice across disciplines and staff groups via development of tailored policy, services, technical infrastructure and training. Our paper will describe the work of the program and complementary work by the Digital Curation Centre. We shall discuss emerging models in institutional approaches which may be of use elsewhere. Above all, we shall examine how data management planning and training activities may be enhanced by a consideration of disciplinary differences and suggest the benefits of drawing on expert partners beyond the institution.
2013-05-30: E5: Never Say Never: Working with Seemingly Disparate Data
Towards making African longitudinal population-based demographic and health data sharable: Data Documentation practices in the past, present and future
Chifundo Kanjala (London School of Hygiene and Tropical Medicine, ALPHA Network)
African longitudinal population-based studies have been collecting demographic, socioeconomic and health data for, on average, over a decade. Efforts are currently being made to make these data more sharable. This study assesses the extent of the implementation of structured data documentation using the Data Documentation Initiative (DDI) and other related specifications/standards. This is done by describing efforts currently underway among members of the two main networks uniting these studies: the INDEPTH (International Network for the continuous Demographic Evaluation of Populations and their Health) and ALPHA (African longitudinal population-based studies) networks.
Researchers create analysis files that are not always based on numeric data. An example would be a database of abortion laws by state and time. US states vary in their abortion regulations: age limits, ultrasound requirements, mandatory waiting periods, etc. Typically, this cornucopia of regulations has been added to, modified, and deleted over time. And, to complicate matters, sometimes there are jurisdictional variations which may or may not stay constant. Another database would be state-based legislation centered on specific topics and subtopics, which allows for comparisons over time or across states. A final example that incorporates both data and information would be the yes/no county breakdowns for citizen votes on amendments to state constitutions on a variety of topics. Important information the researcher might collect would be the text of the statute, language on the ballot, source for the vote, legislative vs. citizen-based origin, year, and type of election. Are data documentation initiatives flexible enough to import this type of information? Clearly, the information is structured if it is disseminated via a searchable database. Should social science archives care about preserving these efforts? Would these be considered data under NSF/NIH data sharing legislation?
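One way to make the question concrete: a single record in such a database might be structured as in the sketch below. The field names are hypothetical, chosen only to mirror the examples in the abstract (statute text, ballot language, vote source, origin, year, election type); whether such a record maps cleanly onto DDI-style documentation is exactly the open question.

    from dataclasses import dataclass

    @dataclass
    class StatuteRecord:
        state: str             # e.g. "TX"
        year: int
        topic: str             # e.g. "abortion regulation"
        provision: str         # e.g. "mandatory waiting period"
        statute_text: str      # full text of the statute
        ballot_language: str   # wording on the ballot, if any
        origin: str            # "legislative" or "citizen-based"
        source: str            # citation for the vote or enactment
        election_type: str     # e.g. "general", "primary"

    record = StatuteRecord(
        state="TX", year=2011, topic="abortion regulation",
        provision="ultrasound requirement", statute_text="...",
        ballot_language="", origin="legislative",
        source="(session law citation)", election_type="",
    )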
Distributed archiving of social science research data: On the way to best-practice guidelines
Reiner Mauer (GESIS - Leibniz Institute for the Social Sciences)
Oliver Watteler (GESIS - Leibniz Institute for the Social Sciences)
Distributed archiving is a common topic for most institutions taking care of research data. Organizational and technical solutions are available, but intellectual input is still necessary to keep creation contexts coherent for third parties. Institutions with similar research interests may hold similar data. The Council of European Social Science Data Archives (CESSDA) is one example of an international collaboration; the German Data Forum takes care of a national data infrastructure. The connection of metadata is the key to accessing distributed data sources. Social science data archives commenced work on an XML standard for this purpose in the mid-1990s, the Data Documentation Initiative (DDI). Digital Object Identifiers and other persistent identifiers (DOI, URN, etc.) facilitate the technical linkage of objects in various locations. But how do you connect qualitative and quantitative data from the same project which are archived in different locations? Where do you best document an international project context if datasets are preserved in national archives? How does a scholar learn about the variations in data holdings when a full version is accessible through a Research Data Centre and a reduced version is publicly available? These are some of the intellectual challenges in planning distributed services. Answers to these challenges are necessary to assure the cohesion of creation contexts. Best-practice guidelines are needed.
Faculty practices and perspectives on Research Data Management
Jennifer Doty (Emory University)
Katherine Akers (Emory University)
Our university library has a long history of supporting the data acquisition and analysis needs of campus researchers. To address the emerging trend in academic libraries to support research data management needs, and to focus our development efforts on the most effective services, we initiated a campus assessment in fall of 2012 to solicit input from faculty conducting research across all disciplines. The library and institutional research office invited science, social science, and humanities faculty to complete a brief online survey of their data management practices and perspectives, including questions on data management plans, documentation and metadata, data sharing and preservation. This presentation will share findings from our analysis of the survey data, including general and discipline specific responses. Preliminary analyses show that the majority of our researchers are not very familiar with funders’ requirements and do not share their data with people outside of their research group. Although most researchers do not use disciplinary data repositories, many are somewhat or very interested in this option. The results of our survey give insight into researchers’ data management practices and help identify services with the greatest potential to effectively support the creation, preservation, and dissemination of research data within the university.
In its continuing effort to support faculty, ICPSR has built data exploration tools aimed at streamlining data use in the classroom. Based on SDA, ICPSR's online data analysis tool, the Custom Crosstab Creator enables faculty to set up a crosstab from ICPSR data to be viewed and manipulated by their students. The instructor selects an appropriate dataset and identifies relevant variables. The students can be given varying degrees of autonomy: instructors can designate the placement of specific variables (row, column, control) or they can leave the choice up to students. Students then receive a URL that provides access to the limited-choice table, where they can modify the display and create charts or graphs as appropriate. Students can also download the small subset of data used in the exercise for further use in another statistical program. The searchable SDA version of the study's codebook is available on both the instructor and student interfaces. Other tools offered by ICPSR to support faculty include Short Exercises, also known as Data Driven Learning Guides, which use data to illustrate a variety of social science concepts; Exercise Sets, such as SETUPS; and TeachingWithData.org, a web portal to data-related teaching and learning materials.
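For readers unfamiliar with the row/column/control layout the tool exposes, the following is a minimal pandas analogy to the same kind of three-way table; it is not ICPSR's SDA implementation, and the variables are hypothetical.

    import pandas as pd

    data = pd.DataFrame({
        "vote":   ["yes", "no", "yes", "yes", "no", "no", "yes", "no"],
        "gender": ["F",   "M",  "F",   "M",   "F",  "M",  "F",   "M"],
        "region": ["N",   "N",  "S",   "S",   "N",  "S",  "S",   "N"],
    })

    # Row = vote, column = gender, control = region: one sub-table of
    # vote-by-gender counts per region, as in a limited-choice crosstab.
    table = pd.crosstab([data["region"], data["vote"]], data["gender"])
    print(table)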
Data are like parachutes: They work best when open
Reiner Mauer (GESIS - Leibniz Institute for the Social Sciences)
Oliver Watteler (GESIS - Leibniz Institute for the Social Sciences)
The call for “open data” is very popular these days. The Open Access movement has gained a lot of momentum during the last decade, and publishing under this model has partly become routine for researchers. But even though, e.g., the Berlin Declaration (2003) mentions free and unrestricted access to research data, there is still a long way to go for the advocates of data sharing. A significant part of research data is still not accessible and will probably fall into “digital oblivion” in the near future. What is actually meant by “open data”? The Open Knowledge Foundation defines “open data” as “data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.” However, in practice openness can have different meanings. These meanings form a continuum ranging from the mere documentation of the existence of data to unrestricted and direct access to the actual datasets. We will analyze different access scenarios for research data to disentangle the legal, organizational and economic aspects that need to be taken into account when talking about “opening” access to data.
Getting some bang for the buck: Reaching out to journalists
Lisa Neidert (University of Michigan Population Studies Center)
Much of the emphasis on statistical and data literacy involves teaching students how to use, manipulate, and interpret data, and university settings are appropriate venues for this effort. However, for many, the horse has already left the barn. How do we reach folks who are no longer in a university setting but potentially have greater impact than our statistically literate researchers? A typical journalist, even at a relatively weak web-only newspaper, reaches a wider audience than publications in journals, which may have only 10 citations five years after publication. And journalists for major newspapers reach very large audiences, on the order of millions. Thus, it is well worth our time to reach out to bloggers and journalists. This presentation will provide (a) examples of focused training/follow-up for journalists and (b) subject-specific content. It will close with an example of academics who reach out to wide audiences with blogs and Twitter. This could be a model for data professionals and/or for research faculty.
Lisa Neidert (University of Michigan Population Studies Center)
National Public Radio had an open call to Tweet the 2012 Election at NPR headquarters. This presentation will describe the experience for @MsDrData. The presentation will cover the wide range of expertise of the Election2012 meet-up participants; research/preparation; technology and shared resources; an information fire hose; misinformation; live tweeting; production tours; and learning new technologies for recording the experience. This presentation will also put this experience in the larger context of Twitter: information, archival resource, analysis and prediction, and outreach.
As part of the new UK Data Service, a fresh and innovative platform for disseminating country-level international data has now been released. The new interface, UKDS.Stat, uses OECD data warehousing technology and the Statistical Data and Metadata Exchange (SDMX) standard to provide an enhanced user experience, with the additional benefit of improvements to the hidden task of data ingest and processing. This presentation sets out the approach taken to choose the new DotStat software: the process of requirements gathering, assessing contenders and finally signing a Memorandum of Understanding with the OECD to work alongside other members of the Statistical Information Systems Community Collaboration team to help develop this non-commercial product. The characteristics and features of the DotStat software will be described, particularly its integrated metadata, visualization and searching capabilities. We shall also mention some of the ups and downs we've encountered working with a non-commercial, community-based platform. Finally, we will set out a possible vision for the future of the UKDS.Stat platform, whereby intergovernmental data delivered in SDMX format can be ingested and checked with a minimum of manual intervention, enabling the UK Data Service to make available a huge amount of freely available country-level international data.
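As a hedged illustration of what SDMX-based delivery implies for a client, the sketch below queries an SDMX REST web service of the kind .Stat platforms expose. The base URL, dataflow identifier and series key are hypothetical placeholders, not the live UKDS.Stat endpoint.

    import requests

    BASE = "https://stats.example.org/restsdmx/sdmx.ashx"  # hypothetical host
    # GetData/<dataflow>/<series key>: identifiers are made up for the example.
    url = f"{BASE}/GetData/UN_POP/A.TOTAL.GBR/all"

    resp = requests.get(
        url,
        headers={"Accept": "application/vnd.sdmx.genericdata+xml"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.text[:500])  # start of the SDMX-ML generic data message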
Andreas Perret (Swiss Center of Expertise in the Social Sciences (FORS))
Whenever a researcher presents their work, some visual support almost inevitably comes along, and the audience is gratified with at least one chart or graphic. Fancy or rudimentary, these visualizations are never innocent; we know it, researchers know it, but the topic is seldom addressed in discussions. To understand the issues of visualization in the social sciences, we have chosen to follow two paths that seem to venture in opposite directions. As society is a very tough subject to describe, social scientists have spent decades mastering a complex vocabulary that is (somewhat) unanimously accepted. Introducing graphics into that well-groomed landscape brings up touchy issues, such as how we convince each other of the quality of our findings. Social scientists use statistical packages to analyze data; these offer different visualization possibilities, but differences arise in other dimensions as well. So choosing one over another has implications that range from attitudes towards intellectual property to the kind of journal that is likely to publish a paper. Through interviews with social scientists we intend to show how five communities have shaped - and been shaped by - their statistical tools.
Do-It-Yourself Research Data Management Training Kit for Librarians
Robin Rice (University of Edinburgh)
As part of the University of Edinburgh's Research Data Management Roadmap, a pilot training of four liaison librarians was facilitated by the University's Data Library over a four month period in 2012-13. The training design was discussed and agreed by the participants and a mid-period evaluation helped to fine tune the remaining training sessions and affirm the overall approach. The components of the University of Edinburgh training sessions have been packaged for re-use by librarians at other institutions to conduct a do-it-yourself training course in research data management. The design of the course takes advantage of the librarians' professional experience of working with student and research communities to apply their existing skills towards a new arena for them: research data and its management. Course components include: units from MANTRA, an open, online research data management course hosted at the University of Edinburgh; reflective writing assignments; face to face group discussion; selected group exercises from the UK Data Archive's Train the Trainers suite; and expert speakers who provide reinforcement of the learning points at the start of each face to face session. Each speaker has been voice-recorded to create an audio-visual presentation which can be re-used by other institutions as well.
Achieving real data security via community self-enforcement
Richard Welpton (UK Data Archive)
Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences)
Providers of sensitive/confidential data typically rely on IT-based measures to control the security of their data. Examples include swipe-card access to “controlled” rooms, CCTV monitoring, sign-in procedures, etc. When providing access to research data, such a focus on security reinforces the message that researchers are a security problem. We argue that these measures not only are costly and increase barriers to sensible research, but also create the wrong incentives for researchers, who are seen as a threat to, rather than collaborators in, creating and maintaining security. We argue that fostering a community of trusted researchers is the most effective way of achieving security. With the right incentives in place, researchers, when considering themselves part of a community, will enforce standards upon each other, lest their entire community be denied access to data. In addition, cumbersome or hard-to-comprehend measures are more likely to result in security breaches, as researchers make mistakes or deliberately flout rules for the sake of convenience. We will explain how to build trusted communities of researchers around the secure data services of both the UKDS and GESIS - Leibniz Institute for the Social Sciences, and how these fit with providers' and researchers' interests in secure and accessible data.
Marion Wittenberg (Data Archiving and Networked Services (DANS))
Data Archiving and Networked Services (DANS) in the Netherlands originated out of the Dutch social science data archive and historical data archive. Nowadays we are discipline-independent. Driven by data, DANS ensures that access to digital research data keeps improving, through its services and by taking part in (international) projects and networks. In this Pecha Kucha we want to illustrate what advantages this broadening has for our work.
How to Make the Most of Your IASSIST Membership between Conferences
Robin Rice (University of Edinburgh, EDINA Data Library)
Tuomas Alatera (Finnish Social Science Data Archive (FSD))
Thomas Lindsay (University of Minnesota)
What does it mean to be a member of IASSIST these days? Does IASSIST do anything other than host annual conferences? Come talk with members of the Communications Committee, the Membership Chair and other IASSIST officials to discuss these and other burning questions you may have. This interactive poster session will reveal ways to get more deeply involved in IASSIST in between conferences by joining committees or interest groups. Perhaps you want to give something back to the organization you love so much. Perhaps you have an idea how the organization can do things better (and even better, the time to help us put it into practice). Perhaps you want to know if others share an important professional interest and how to take that forward. Or, perhaps you are a new member and are just curious to know what makes IASSIST tick. The poster will explain how the committee structure of IASSIST works, how you can contribute your time or expertise to benefit the organization, and will present results of the recent members’ survey. More importantly, IASSIST committee members will be at the poster session ready to talk to you about YOUR needs and interests regarding any aspect of the organization. A suggestion box will be available for anyone wishing to provide anonymous feedback. IASSIST is currently 100 percent volunteer run. We know we’re not perfect. Help us improve through your active participation and feedback.
A Good Practice of Cooperation Between Social Science Data Archives and a National Statistics Office: The Slovenian Example
Sebastian Kočar (Slovene Social Science Data Archives (ADP))
The Statistical Office (SORS) and the Social Science Data Archives (ADP) are both partners in the DwB project. The cooperation between the organizations dates back to the nineties. Since then ADP has been distributing PUF micro- and metadata for surveys such as the LFS. The cooperation was intensified in 2012 to achieve the main goals of the DwB project at the national level more easily. As the project's main objective is to assist European researchers in accessing official statistics micro-data, meta- and micro-data, including a list of available micro-data, have been prepared. ADP will soon distribute metadata and PUF micro-data for the most important SORS micro-data. In the poster presentation I will present the problems and challenges we have come across while improving the research environment. We will comment on how the cooperation between archives and NSOs could be improved, and also discuss what those organizations could offer each other. Both Slovene organizations believe that the cooperation should be extended beyond the requirements of the project and become part of a long-term commitment to assist researchers. It is thus an example of good practice which could be implemented in other countries across Europe as well.
Collaborative Research: Metadata Portal for the Social Sciences
Sanda Ionescu (ICPSR)
ICPSR, ANES/ISR, and NORC are currently engaged in a new collaborative effort to create a common metadata portal for two of the most important data collections in the U.S. - the American National Election Studies (ANES) and General Social Survey (GSS). Technical support is provided by Metadata Technology. This pilot project, funded by the National Science Foundation, proposes to build a combined library of machine-actionable DDI metadata for these collections, and demonstrate DDI-based tools for advanced searching, dynamic metadata presentation, and other functions meant to facilitate discovery and analysis of these data. The project will also lay a foundation for developing new metadata-driven workflows for both ANES and GSS. This poster will present our plan of action and the roles of the partners involved in different phases of the project. The progress made in the first stages will be discussed both in terms of accomplishments and any difficulties that had to be overcome.
An Interdisciplinary Repository for Research on Social Dimensions of Emerging Technologies: Challenges and Opportunities
Peter Granda (ICPSR)
By 2015, National Science Foundation Centers for Nanotechnology in Society in the United States, as well as other institutes and researchers conducting social dimensions research, will have spent ten years collecting qualitative and quantitative data and developing analytic and methodological tools for examining the ethical, legal and social impacts (ELSI) of nano-science and emerging technologies. Much of this interdisciplinary inquiry extends beyond quantitative approaches with an established practice of reuse and verification to include research and pedagogical tools for use in planning, informal learning, and decision-making settings. This poster will report on activities associated with a National Leadership Planning Grant that the UMass Amherst Libraries received from the Institute of Museum and Library Services to study what infrastructure, funding, and partnerships will be necessary to develop standards to carry out data archiving for such digital objects. A central activity of this grant will be a dedicated planning workshop, scheduled for June 2013, to discuss the technical and administrative requirements for implementing an interdisciplinary repository for Nano ELSI data.
The Comprehensive Extensible Data Documentation and Access Repository (CED2AR), version 1.0
Jeremy Williams (Cornell University)
Bill Block (Cornell University)
Warren Brown (Cornell University)
Florio Arguillas (Cornell University)
This poster will demonstrate the latest DDI-related technological developments of Cornell University's $3 million NSF-Census Research Network (NCRN) award, dedicated to improving the documentation, discoverability, and accessibility of public and restricted data from the federal statistical system in the United States. The current internal name for our DDI-based system is the Comprehensive Extensible Data Documentation and Access Repository (CED2AR). CED2AR ingests metadata from heterogeneous sources and supports filtered synchronization between restricted and public metadata holdings. Currently supported CED2AR connector workflows include mechanisms to ingest IPUMS, zero-observation files from the American Community Survey (DDI 2.1), and SIPP Synthetic Beta (DDI 1.2). These disparate metadata sources are all transformed into a DDI 2.5-compliant form and stored in a single repository. The repository can then be filtered, allowing the creation of derived public-use metadata from an original confidential source. This repository is currently searchable online through a web application and an application programming interface, demonstrating the ability to search across previously heterogeneous metadata sources. In addition, we will demonstrate an extension to DDI 2.5 that allows for the labeling of elements within the schema to indicate confidentiality.
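To illustrate the filtering idea in the last two sentences, here is a hedged sketch of deriving public-use metadata by dropping elements marked confidential in a DDI 2.5 codebook. It is not CED2AR's actual code: the access attribute stands in for the kind of confidentiality labeling the poster describes, and the file names are hypothetical.

    from lxml import etree

    NS = {"c": "ddi:codebook:2_5"}  # DDI 2.5 Codebook namespace

    tree = etree.parse("restricted_codebook.xml")  # hypothetical input
    # Remove every variable description carrying a (hypothetical)
    # restricted-access marker before publishing the public copy.
    for var in tree.xpath("//c:var[@access='restricted']", namespaces=NS):
        var.getparent().remove(var)

    tree.write("public_codebook.xml", xml_declaration=True, encoding="utf-8")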
Expansion of the Odum Institute Dataverse Network: Forming Partnerships, Harnessing Infrastructures, and Increasing Preservation of Research Data
Jonathan Crabtree (University of North Carolina Chapel Hill)
Lynda Kellam (University of North Carolina Greensboro)
In this poster, we will summarize the expansion of the Odum Institute Dataverse Network (DVN) to support new inter-institutional and inter-disciplinary collaborations. The Odum Institute and the North Carolina Online Collection of Knowledge and Scholarship (NC DOCKS) group have formed an inter-institutional collaboration to address data management needs of researchers within NC DOCKS schools. This collaboration allows researchers, assisted by institutional library staff, to deposit data into Dataverses hosted by the Odum Institute Dataverse Network (DVN) thereby ensuring research data is properly preserved and accessible. Odum is also expanding the Odum DVN to support multi-disciplinary research data. Odum recently formed a partnership with the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study investigators. This partnership will allow the Odum data archive staff, in collaboration with the OPPERA team, to deposit phenotype and genotype data into the DVN as a preservation strategy. These expansions of the DVN show the growing role of collaboration in data management and preservation and the opportunities to harness current infrastructures and grow partnerships instead of creating institutional and discipline-based silos.
The Survey Research Data Archive (SRDA) of Academia Sinica holds Taiwan's largest collection of survey data. The collection comes from both academia and government agencies. Since its establishment in 1994, SRDA has become an important resource for teaching and research. In order to facilitate resource sharing among researchers and to reduce the time and energy individual researchers have to put into similar and complex analysis tasks, we propose to build an SRDA platform for learning and sharing in quantitative research methods. The platform has two objectives. First, it allows SRDA members to share programs they have written themselves by uploading them, assigning them to one or more purpose categories, and describing their contents and purposes in words. Other members can search for a specific program by using the purpose categories and by searching the descriptions. Those who contribute programs will manage their own files, so that only the most recent versions are available. Second, it allows members to post comments or questions, whether specific to a program or not, and all members can respond. Members and visitors can also learn by searching these questions and responses.
Qualitative Data in the Context of Mixed Methods Research: The Concept of Research Data Centre for Education (RDC Education)
Doris Bambey (German Institute for International Educational Research (DIPF))
Against the background of the triangulation of methods, qualitative micro-genetic approaches such as video studies have become significantly more relevant in educational research over the last ten years. At the same time, a desideratum is evident in the field of prepared and re-usable qualitative research data. Accordingly, a major task is still the development of specific metadata standards for this type of object, as well as a reliable way of dealing with the particular problems of data protection in the case of audio-visual research data. FDZ Bildung places particular emphasis on audio-visual and audio data and their numerical-textual assessment and documentation materials, such as transcripts, narrative descriptions of observed settings, codings and ratings. This contribution aims to show how FDZ Bildung is dealing with the requirements of bringing together all instruments and data of a study, thus offering researchers efficient access to the entire quantitative and qualitative output at the level of a study. The concept takes into account the results of interviews with educational researchers which were systematically carried out in 2012.
DDI Tools Catalogue: A Sharing Platform for Everyone
Andias Wira-Alam (GESIS – Leibniz Institute for the Social Sciences)
In recent years the use of the Data Documentation Initiative (DDI) metadata formats has spread across research communities. To date, many DDI developers from different countries have created tools to help active and prospective DDI users to understand and implement the specification. We are pleased to present the DDI Tools Catalogue, a platform for sharing tools among DDI developers, DDI users, DDI prospective users, and others. Currently, there are 46 DDI tools registered in the Catalogue. Among those, 30 tools are under a Freeware License. In terms of DDI Codebook and Lifecycle support, there are 5 tools that support DDI 1.x, 8 tools for DDI 2.0, 15 tools for DDI 2.1, 21 tools for DDI 3.0, and 12 tools for DDI 3.1. Our aim is to make the DDI Tools Catalogue a useful resource not only for current developers but also for prospective developers who are interested in building tools and submitting them to the Catalogue. We also encourage researchers, archivists, data librarians, and others to investigate the Catalogue to identify tools to meet their specific documentation needs. We are confident that in coming years there will be significant growth in the number of DDI tools available.
Linking Research Data and Literature: Integration of da|ra and Sowiport based on Link Information from InFoLiS
Dimitar Dimitrov (GESIS – Leibniz Institute for the Social Sciences)
Daniel Hienert (GESIS – Leibniz Institute for the Social Sciences)
Katarina Boland (GESIS – Leibniz Institute for the Social Sciences)
Dennis Wegener (GESIS – Leibniz Institute for the Social Sciences)
Finding and using relations between digital information systems and their respective data sources is a major issue in the area of scientific information management. A use case in the social sciences is the linking of publications with the underlying research data, and making the link information directly accessible via integrated information systems. We present an architecture that allows integrating metadata on research datasets from the public information system da|ra with metadata on literature from the Sowiport portal. The relations are extracted from study metadata and publication full texts using link detection algorithms developed in the InFoLiS project. In this way, we show how existing information systems can be extended to allow users to navigate between different datasets via relations.
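A deliberately simplified sketch of what link detection over full texts involves (the InFoLiS algorithms are considerably more sophisticated): scan a publication for study names and DOIs that can then be matched against da|ra records. The study-name list and the example sentence are hypothetical.

    import re

    # Hypothetical study names; a real list would come from da|ra metadata.
    STUDY_PATTERN = re.compile(r"(ALLBUS|SOEP|Eurobarometer)\s*(\d{4})?")
    DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>)]+')

    def detect_dataset_mentions(fulltext):
        """Return candidate dataset references found in a publication."""
        mentions = [" ".join(filter(None, m))
                    for m in STUDY_PATTERN.findall(fulltext)]
        mentions += DOI_PATTERN.findall(fulltext)
        return mentions

    text = "Analyses are based on the ALLBUS 2010 (doi:10.4232/1.10760)."
    print(detect_dataset_mentions(text))
    # -> ['ALLBUS 2010', '10.4232/1.10760']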
Data Without Boundaries: Supporting Transnational Research in Europe
David Schiller (Institute for Employment Research (IAB))
Under the European FP7-funded Data without Boundaries (DwB) project, academic researchers resident in European Union member states or the European Free Trade Association are invited to apply for access to highly detailed microdata from Research Data Centers (RDCs) in the UK, Germany, the Netherlands, and France. This is a unique opportunity for researchers to receive specialist support and reimbursement of costs to conduct comparative research across borders. Depending upon the RDC, the available datasets are social survey, census, and business microdata considered too detailed, confidential, or sensitive to be provided through standard access mechanisms. Researchers apply to access specific datasets at one or more RDCs outside their country of residence. Successful applicants will visit the RDC to conduct their research onsite and/or receive training, and will receive specialist support. Onsite research visits last for one to three weeks, depending upon the research and the RDC. This support is offered with two goals: first, to foster transnational research in Europe, and second, to collect information about the needs and obstacles encountered when doing transnational research. The poster will describe the program and report on the first findings.
The Next Generation Microdata Information System (MISSY): Towards a Best-Practice Open-Source Software Architecture for DDI-Driven Data Models
Matthäus Zloch (GESIS – Leibniz Institute for the Social Sciences)
Thomas Bosch (GESIS – Leibniz Institute for the Social Sciences)
Dennis Wegener (GESIS – Leibniz Institute for the Social Sciences)
The DDI Discovery Vocabulary represents the most important parts of DDI-Codebook and DDI-Lifecycle in the Web of Data, covering the discovery use case. Various software projects in the statistical domain re-use these elementary concepts to a large extent. Utilizing the DDI Discovery Vocabulary as the core, the idea is to create a reusable data model which can be extended and adjusted to the requirements of an individual project. This poster shows how the abstract data model can be implemented and how individual software products might leverage it as a foundation for their own data models. By means of the MISSY use case, it shows what a well-structured software architecture based on the model-view-controller design pattern might look like, how the software layers interact, and how to implement different persistence formats such as DDI-XML, DDI-RDF, and relational databases. The poster also gives step-by-step guidance on how a project that uses the DDI Discovery Vocabulary as an exchange format and core data model can be built up from scratch. We will also show a live demonstration of the next generation of the Microdata Information System.
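As a rough illustration of that layering, the core model can be kept independent of the persistence format behind an abstract store interface (all names below are hypothetical, not taken from MISSY):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Variable:
    """A minimal stand-in for a DDI Discovery Vocabulary concept."""
    name: str
    label: str

class MetadataStore(ABC):
    """Persistence interface the controller layer programs against."""
    @abstractmethod
    def save(self, variable: Variable) -> None: ...

class XmlStore(MetadataStore):
    def save(self, variable: Variable) -> None:
        print(f"<var name='{variable.name}' label='{variable.label}'/>")

class RdfStore(MetadataStore):
    def save(self, variable: Variable) -> None:
        print(f'ex:{variable.name} rdfs:label "{variable.label}" .')

# Because the controller depends only on the abstract interface,
# DDI-XML, DDI-RDF, or a relational backend can be swapped per project.
for store in (XmlStore(), RdfStore()):
    store.save(Variable("age", "Age of respondent"))
```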
DDI-Lifecycle Migration, Curation and Dissemination Production Systems at the Danish Data Archive
Jannik Jensen (Danish Data Archive (DDA))
Anne Sofie (Danish Data Archive (DDA))
The DdiEditor is the key tool in a framework of tools and processes for the data processing of survey datasets. The DdiEditor is also utilized as middleware to migrate the DDA collection into DDI-L. It lays the foundation for enhanced machine-actionable dataset landing pages, as well as search-and-retrieval infrastructure. In this way, support for curation and dissemination processes is enhanced, leading to quality-assured products and workflows.
Kristi Winters (GESIS – Leibniz Institute for the Social Sciences)
Comparative social researchers are often confronted with the challenge of making key theoretical concepts comparable across nations and/or time. Further, researchers have multiple ways to recode a concept such as education into a harmonized variable. GESIS - Leibniz Institute for the Social Sciences is launching two electronic resources to assist social researchers. The website DataCoH (Data Coding and Harmonization) will provide a centralized online library of data coding and harmonization for existing variables to increase transparency and variable replication. DataCoH will initially contain socio-demographic variables used across the social sciences and then expand to discipline-specific variables. The software program CharmStats (Coding and Harmonizing Statistics) will provide a structured approach to data harmonization by allowing researchers to: 1) download harmonization protocols; 2) document variable coding and harmonization processes; 3) access variables from existing datasets for harmonization; and 4) create harmonization protocols for publication and citation. This paper explains DataCoH and CharmStats and demonstrates how they work.
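A minimal sketch of the kind of recode such a harmonization protocol documents, with hypothetical source codes from two national surveys mapped onto a shared three-category education scheme:

```python
# Hypothetical recode tables: survey-specific education codes mapped
# onto a shared low / medium / high scheme.
PROTOCOL = {
    "survey_de": {1: "low", 2: "low", 3: "medium", 4: "high", 5: "high"},
    "survey_uk": {1: "low", 2: "medium", 3: "medium", 4: "high"},
}

def harmonize(source, code):
    """Apply the documented recode; fail loudly on undocumented codes."""
    try:
        return PROTOCOL[source][code]
    except KeyError:
        raise ValueError(f"no harmonization rule for {source} code {code}")

print(harmonize("survey_de", 3))  # medium
print(harmonize("survey_uk", 4))  # high
```

Publishing the mapping tables themselves, rather than only the recoded data, is what makes the harmonization transparent and replicable.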
Come in and Find out about Research Data: Documenting and Searching for Data in the German Data Reference System
Sophia Kratz (GESIS – Leibniz Institute for the Social Sciences)
Social science research data is produced by various institutions and projects. Although archives enable access to some data in well-established ways, many datasets which could be interesting for other researchers cannot be found because there is no central platform where datasets - archived or not - can be documented. As a consequence, there is no easy way for data creators to showcase the data they have collected. Similarly, it is very difficult for users to find the datasets best fitting their research. The existing information is highly scattered in a dispersed data landscape and cannot be found without further knowledge about institutions and projects. To solve these problems, GESIS - Leibniz Institute for the Social Sciences has started a project to identify and document data sources in Germany, which are currently not documented and made available in a systematic way. This database will also be open for researchers to describe their own data by using a tailored metadata schema and to search for already existing data available for secondary analysis. The aim of the poster is to present the current approach to building the data reference system, its position within the data services of GESIS, as well as its structure and features.
New Requirements Regarding Research Data Management and Data Access in Sweden
Mattias Persson (Swedish National Data Service (SND))
Since 2012, grant applications to the Swedish Research Council (SRC) must include a specific data publication plan if a major component of the project involves collecting data. The aim is to ensure that the data can be used in the future by researchers other than those who participated in the project. This means that, within a reasonable time, research data should be made available through relevant national and/or international data organizations. The Swedish National Data Service (SND) is part of the research data infrastructure financed by the SRC, with the University of Gothenburg as host university. SND's responsibility is to provide service and support for researchers regarding management of and access to research data within the social sciences, humanities, and medicine. SND is one of the data organizations the SRC recommends researchers use, which has increased the demand for service and support at SND. In the research and innovation bill from October 2012, the government stated that the SRC should be assigned to develop forms and national guidelines for how researchers can gain access to research results and research data, so-called open access. Further down the road, this will improve both data management and data access.
Focusing Services and Expertise: The Research Data Centre International Survey Programs at GESIS - Leibniz Institute for the Social Sciences
Markus Quandt (GESIS – Leibniz Institute for the Social Sciences)
The poster will showcase the diverse added-value services provided by the GESIS RDC International Survey Programs as a central access point for cross-national and cross-cultural survey databases with a wide thematic scope of attitudinal data and subjective social indicators, encompassing an ever-expanding number of countries across several decades. The RDC International Survey Programs was established to focus the institution's expertise and activities in this field, benefiting from prominent involvement in almost all steps of the research data life cycle: from standard demography development for the ISSP and multilingual instrument development and documentation in the EVS, through data harmonization and detailed documentation in close cooperation with the principal investigators (CSES, EVS, ISSP), to online access facilities for individual-level datasets and related materials, even across distributed platforms where necessary (European Elections Study). To complement the infrastructure components, the analysis potential of the comparative database will be exemplified in research-oriented data reports and workshops. The poster will also point to future potential, such as strengthening the links between comparative survey programs and other data types for multi-level analysis.
Under Lock and Key? Setting up a Secure Data Center at GESIS in Germany
Katharina Kinder-Kurlanda (GESIS – Leibniz Institute for the Social Sciences)
Christina Eder (GESIS – Leibniz Institute for the Social Sciences)
This poster is about the specific challenges in setting up a secure data service at GESIS – Leibniz Institute for the Social Sciences. German data protection law (Bundesdatenschutzgesetz) gives every person the right to protection of their personal data. Re-use of survey data, for example, must be provided in such a way that no individual person is identifiable, which usually means that the data is anonymized. However, anonymization is often difficult to achieve without compromising the quality of the data. To enable research access to disclosive data, GESIS – Leibniz Institute for the Social Sciences is establishing a secure data center with three controlled access points: a low-level solution employing a comprehensive legal contract, a safe room providing particularly strict access control, and a remote access system. Using German electoral survey data as a practical example, we will examine the technical and organizational challenges, opportunities, and questions that arise when building a secure data service in the context of the German data protection framework, which comprises both federal and state laws. We introduce our strategy of setting up pilots for the usage models. This strategy enables us to better understand the legal framework and GESIS's role within it, and also to improve the service's usability by integrating test users' feedback.
Laurence Horton (GESIS – Leibniz Institute for the Social Sciences)
Astrid Recker (GESIS – Leibniz Institute for the Social Sciences)
Alexia Katsanidou (GESIS – Leibniz Institute for the Social Sciences)
The Archive and Data Management Training Center exists to ensure excellence in the creation, management, and long-term preservation of social science research data. Our aim is to increase data quality and data availability for the benefit of the social sciences. At the center of activities lies the importance of sharing publicly funded research data and meeting funder requirements on data management, preservation, and re-use as well as awareness of good practice in data licensing, documentation and data enhancement, methods of data sharing, file formats, physical and digital data storage, and preservation planning. We offer training and consulting on data management planning for researchers, projects, and centers. Additionally, we provide courses on long-term archiving and preservation, data dissemination and security, licensing data for use, managing access systems, format migration and verification. Our target audiences include (a) principal investigators for research projects who plan the data management and are responsible for its implementation and oversight, (b) individual researchers or researchers who are members of project teams and who actually implement data management procedures, (c) social science and humanities data archivists responsible for digital curation, data enhancement, and long-term preservation.
The ENGAGE project aims to build an information infrastructure for Public Sector Information (PSI), which is typically aggregated data published by national and local governments or other public bodies. Automated or semi-automated discovery of PSI datasets would cater for the needs of researchers in social science, behavioral science, and economics who want to consider freely available PSI sources for their study, in addition to other data collected or compiled via a dedicated research project. The researchers could then provide either explicit or implicit feedback on the relevance of the discovered PSI to their research needs. This would add to the quality of data exposed via the ENGAGE infrastructure and empower other ENGAGE users (researchers, as well as citizens, who are the second major ENGAGE user category) with links from aggregated PSI data to related concepts and other datasets, including well-curated microdata. We see our poster presentation as a means to suggest an approach to data linking and to gather requirements from data practitioners for the rest of the project.
Rob Dymond-Green (Mimas, University of Manchester )
Richard Wiseman (Mimas, University of Manchester )
UK Data Service Census Support provides integrated access to recent UK censuses in ways that make the data easier to understand and use. Census Support is part of the Economic and Social Research Council's new UK Data Service, bringing a long history of managing, supporting, and facilitating access to census data to the UK research community. This poster illustrates the challenges we are facing while ingesting the 2011 census data into InFuse, our interface. On 27 March 2011, the three UK statistical agencies (ONS, NISRA, and NRS) simultaneously conducted the census. We have developed an integrated data model that is flexible enough to cope with the issues raised by combining data from multiple census agencies. Some of the issues we are dealing with are: consistency of definitions within and across censuses, complicated geographic hierarchies, and different thresholds used to generate the small area statistics. InFuse demonstrates the benefits of a data-feed approach to dissemination, such as being able to filter across the entire dataset. The integration of metadata enhances the discoverability of data. It also provides initial solutions to some generic challenges, including management of the sparsity of multi-dimensional datasets through guided queries, and complex operations upon hierarchical structures.
Across the world, statistical organizations undertake similar activities, albeit with variation in the processes each uses. Each of these activities uses and produces similar information (for example, all agencies use classifications, create datasets, and publish products). Although the information used by statistical organizations is at its core the same, all organizations tend to describe this information slightly differently (and often in different ways within a single organization). There is no common means to describe the information we use. GSIM is a conceptual model that provides a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. GSIM must be implementable. In order to support the implementation of GSIM, known standards and tools have also been examined, to ensure that the reference framework is complete and useful in this respect. The relationship between GSIM and other models and standards is two-fold: the standards and models serve as inputs to the creation of GSIM, and also act as targets for the use of GSIM within organizations. This session aims to introduce the model, discuss the relationship with DDI, and look at implementations in national statistical institutes.
Establishing a National Statistical Information Repository in Uganda: Prospects and Challenges
Winny Akullo Nekesa ( Uganda Bureau of Statistics)
Uganda Bureau of Statistics is a semi-autonomous government agency established under the 1998 Act of Parliament to spearhead the development and maintenance of the National Statistical System (NSS). In order to develop a coherent, reliable, efficient, and demand-driven NSS that supports management and development initiatives, the Bureau, in collaboration with key Ministries, Departments, and Agencies (MDAs) under the Plan for National Statistical Development, namely the Ministry of Health, Ministry of Education, Ministry of Tourism, Trade and Industry, Ministry of Gender, Labor and Social Development, Ministry of Agriculture, Animal Industry and Fisheries, Ministry of Local Government, Bank of Uganda (BOU), and Uganda Police Force, generates a lot of statistical information, which is scattered across various locations in the MDAs. Access to this wide array of statistical information is a challenge because there is no established mechanism or facility where this information, or its access modalities, can be centralized, coordinated, and managed. Such a facility, once set up, would act as a digital preservation repository/hub for data and information to inform and support evidence-based decision making for better socio-economic development outcomes. This paper outlines the proposed digitization process, the prospects of this system, and the challenges that are likely to be met.
Evaluation of Repository for Inclusion in Data Citation Index
Irena Vipavc Brvar (Slovene Social Science Data Archives (ADP))
The Slovene Social Science Data Archives (ADP) is one of the smallest CESSDA archives, but it still wants to follow the big ones. ADP is registered as a local repository at the Slovene Ministry of Education and Science, and authors of social science surveys whose data are preserved in our archive gain scientific points. To expand beyond country borders, we would like to be included as a repository in Thomson Reuters' Data Citation Index. In this poster we will present the requirements of Thomson Reuters, and our path from where we are now to inclusion in this register of data repositories. We would like to make our data more visible to foreign researchers, and to make links between publications and data possible. Some CESSDA archives have already taken this path, and we would like to follow them.
The primary aim of the work undertaken was to offer a production-strength generic service and associated toolset for the benefit of a wider social science audience. We discuss the creation of an open web service, a supporting 'galleria' web site, and tools that allow social scientists to create, share, and reuse custom, bespoke cartograms. Our objectives in undertaking this work were to: 1. Make more robust (in terms of software implementation and service quality) an existing proof-of-concept service, previously funded by the UK's Economic and Social Research Council under its Census Program Innovation awards; 2. Ensure that tools exist which are both readily available and comprehensible for novice and non-expert users; 3. Showcase user-generated datasets in a way that is engaging and permits onward sharing, reuse, and remixing of original and derived data; and 4. Deliver a sustainable infrastructure that supports both machine-to-machine and human-machine interaction.
Easy DDI Organizer (EDO): Metadata Management and Survey Planning Tool Based on DDI-Lifecycle
Yuki Yonekura (The University of Tokyo)
In 2010, the Social Science Japan Data Archive started to introduce DDI and to develop the Easy DDI Organizer (EDO). EDO is a tool which helps researchers conduct social surveys and manage their metadata. It enables researchers to record survey metadata along the data lifecycle, such as study purpose, sampling procedure, mode of data collection, questions, question sequence, variable descriptions, and bibliographic information. It also supports importing variable-level metadata from SPSS files and exporting codebooks and questionnaires. We will introduce and demonstrate these features at the poster session.
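EDO itself is a dedicated application, but the variable-level import it describes can be sketched in Python with the pyreadstat library (the file name is hypothetical):

```python
import pyreadstat

# Read only the metadata from an SPSS file; no data rows are loaded.
_, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

# Variable names, labels, and value labels are the raw material for
# DDI variable descriptions in a codebook.
for name in meta.column_names:
    label = meta.column_names_to_labels.get(name, "")
    values = meta.variable_value_labels.get(name, {})
    print(name, "|", label, "|", values)
```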
Colectica is a standards-based platform for creating, documenting, managing, distributing, and discovering data. Colectica aims to create publishable documentation as a by-product of the data management process. This booth will provide live demonstrations of the various components of the Colectica platform. Colectica for Excel is a new, free add-in for Microsoft Excel that allows you to document spreadsheet data using the DDI-Lifecycle standard. Colectica Repository is a centralized storage system for managing data resources, enabling team-based data management, and providing automatic version control. Colectica Designer interacts with Colectica Repository to provide advanced data management and documentation functionality. Colectica Designer can import data and metadata from a variety of formats, and can generate documentation and source code in a variety of formats. Colectica Portal is a web-based application, powered by Colectica Repository, which enables data and metadata publication and discovery. Colectica Portal integrates with several social networking technologies to provide enhanced collaboration and discovery.
Showcasing the UK Data Service: New Pastures, New Horizons
Louise Corti (UK Data Archive)
In the poster session we will be on hand to talk about the new UK Data Service, a five-year funding opportunity offering a comprehensive and unified resource for users and creators of social science data. We will tell of our experiences in transitioning to a new operational infrastructure based on OAIS, our work toward establishing Trusted Digital Repository status, and our hard-earned Concordat with the Office for National Statistics. In addition to all that more heavy-duty stuff, come and view our: new web site and shiny new branding; Discover portal; case studies database; Dot-stat international databanks browsing system; test Digital Futures qualitative data browsing system; and new training materials for data appraisal. We look forward to seeing you at our stall!
DDI Class Library for .NET
Johan Fihn (Swedish National Data Service (SND))
An open source .NET class library for DDI-Lifecycle has been developed at SND. The class library helps developers who want to create applications based on the DDI standard. It handles validation, serialization, de-serialization, and the generation of URNs and unique IDs.
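The library itself is .NET, but the URN generation it mentions is easy to illustrate: DDI-Lifecycle identifiers follow a urn:ddi:agency:id:version pattern. A minimal sketch, with a hypothetical agency identifier:

```python
import uuid

def ddi_urn(agency, object_id, version):
    """Compose a DDI-Lifecycle style URN from agency, ID, and version."""
    return f"urn:ddi:{agency}:{object_id}:{version}"

# A fresh unique ID plus a hypothetical agency identifier.
print(ddi_urn("se.snd", str(uuid.uuid4()), 1))
```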
The Data Deposit Workflow: Involving Researchers in Timely Dataset Upload and Description
Christina Ribeiro (DEI-FEUP University of Porto/INESC TEC )
Data repositories are established to ease data sharing, to preserve research data, and to promote the visibility of institutional data assets. Research projects that generate datasets may lack funding for the organization and long-term preservation of the collected data. On the other hand, researchers are more motivated to deposit data as they become aware of the impact that associated datasets can have on the visibility of their research. One of the hard issues in research data management is the setup of an easy communication channel between researchers and data curators. Following the setup of an experimental data repository at U.Porto, we reflected on the need for intuitive tools for researchers to contribute to data deposit and data description. We present two experimental online tools for streamlining the data deposit workflow. The evaluation of the tools with a panel of researchers will allow us to assess their usefulness before planning the integration of the tools with our extended DSpace repository.
UK Data Archive Keyword Indexing with a SKOS Version of HASSET Thesaurus
Mahmoud El-Haj (UK Data Archive)
We show the evaluation results, tools and techniques used to automatically index data collections. We examine the efficiency and the accuracy of keyword automation. We tested the capacity and quality of automatic indexing using a controlled vocabulary called HASSET. We began by applying SKOS to HASSET. The automatic indexing, using the SKOS version of HASSET, provided a ranked list of candidate keywords to the human expert for final decision-making. The accuracy or effectiveness of the automatic indexing was measured by the degree of overlap between the automated indexing decisions and those originally made by the human indexer. We investigated text mining techniques to automatically index the data collection. These included applying the tf.idf model and Keyphrase Extraction Algorithm (KEA) in a Java development environment. We used Machine Learning and Natural Language Processing tools. The tools were used to build a classifier model using training documents with known keywords and then used the model to find keywords in new documents. Extensive manual and automatic evaluation was performed to calculate recall and precision scores. This poster explains how and why we applied the chosen technical solutions, and how we intend to take forward any lessons learned from this work in the future.
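A minimal sketch of tf.idf-based candidate keyword ranking restricted to a controlled vocabulary, using scikit-learn; the vocabulary terms and documents below are hypothetical stand-ins for HASSET and the Archive's collections:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-ins for HASSET terms and catalogue documents.
CONTROLLED_VOCABULARY = ["unemployment", "housing", "education", "crime"]
documents = [
    "A longitudinal survey of unemployment and housing conditions.",
    "Attitudes to crime and education in urban areas.",
]

# Restricting the vectorizer to the controlled vocabulary means only
# thesaurus terms can become candidate keywords.
vectorizer = TfidfVectorizer(vocabulary=CONTROLLED_VOCABULARY)
tfidf = vectorizer.fit_transform(documents)

for i, text in enumerate(documents):
    row = tfidf[i].toarray()[0]
    ranked = sorted(zip(CONTROLLED_VOCABULARY, row), key=lambda p: -p[1])
    print(text, "->", [term for term, score in ranked if score > 0])
```

In the workflow described above, such a ranked list would go to a human indexer for the final decision, and the overlap with the indexer's choices gives the recall and precision scores.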
Statistical data exist in many different shapes and forms, such as proprietary software files (SAS, Stata, SPSS), ASCII text (fixed, CSV, delimited), databases (Microsoft, Oracle, MySQL), or spreadsheets (Excel). Such a wide variety of formats presents producers, archivists, analysts, and other users with significant challenges in terms of data usability, preservation, and dissemination. These files also commonly contain essential information, like the data dictionary, that can be extracted and leveraged for documentation purposes, task automation, or further processing. Metadata Technology will be launching in mid-2013 a new software utility suite, "DataForge", for facilitating reading/writing data across packages, producing various flavors of DDI metadata, and performing other useful operations on statistical datasets, to support data management, dissemination, and analysis activities. DataForge will initially be made available as desktop-based products under both freeware and commercial licenses, with a web-based version to follow. IASSIST 2013 will mark the initial launch of the product. This presentation will provide an overview of DataForge's capabilities and describe how to get access to the software.
Interdisciplinarity: Ways to Improve Data and Statistical Literacy
Flavio Bonifacio (METIS Ricerche Srl)
Working on Numbers (IQ, Fall 2009) and modeling multi-shaped reality through data, I discovered an unexpected and widespread desire to recognize the charming appeal of numbers. This paper describes how curiosity for a better knowledge of numbers comes about and what we can do to transform this curiosity into a desire to learn. Curiosity is often the first step toward knowledge, and it is evenly distributed among disciplines: the poster will illustrate this natural and parallel interest in numbers to show genuine interdisciplinary ways to improve both data use and statistical literacy. I will present a collection of samples extracted from literature (Paulos and others) and the media world (TV, newspapers, magazines, social networks) to show why knowledge of numbers and statistical data is needed to understand the real world. Furthermore, I will use two examples from my teaching experience to show how it is possible to teach the "feeling" for statistical numbers: the first is the Numbers Meaning course held by METIS Ricerche; and the second is the Master in Data Analysis and Business Intelligence designed and conducted by METIS Ricerche in cooperation with the University of Turin.
2013-05-31: F1: Integrated Efforts: Discovery, Distribution and Preservation
Innovation in thesaurus management
Lucy Bell (UK Data Archive)
This paper gives an overview of recent high-profile, future-focused initiatives undertaken at the UK Data Archive to further the usefulness and usability of its digitally delivered thesauri. The Archive has recently received funding from two separate sources (Jisc and ESRC) to enhance its thesaurus products. The paper starts by describing the work of the Jisc-funded SKOS-HASSET project (June 2012 - March 2013). This R&D project had three aims: to apply SKOS to HASSET; to improve its online presence; and to test SKOS-HASSET's automated indexing capabilities. The paper outlines in more detail the project's aims, objectives, and activities, and the uses to which its deliverables have been put post-project. Building on this, a second, five-year project is also underway at the Archive, with wider and more ambitious deliverables. In 2012, the ESRC awarded the Archive funds to improve the content and delivery of ELSST, under the CESSDA ELSST project. The deliverables expected from this work are a new and improved thesaurus management interface, an established annual release process, a review of the thesaurus structures and hierarchies, and the creation of a system for remote access. The project will also build on the SKOS-HASSET work in extending our community of thesaurus users.
A Nordic collaboration on data archiving and preservation of data on medicine and health
Elisabeth Strandhagen (Swedish National Data Service (SND))
Bodil Stenvig (Danish Data Archive (DDA))
The data archives in the Nordic countries, the Danish Data Archive (DDA), the Swedish National Data Service (SND), the Finnish Social Science Data Archive (FSD), and the Norwegian Social Science Data Services (NSD), have started a collaboration on data archiving and preservation of data on medicine and health. The first meeting was held in Odense in November 2012, with two to three representatives from each country. One area that could benefit from cooperation on preserving health data is the preparation of common keywords and a track for the health sciences, and the preparation and description of teaching content for a data management program. The group will also focus on support for secondary use of health science data as an important resource for medical scientists. A common goal is to develop a platform for collaboration within the framework for dissemination of research data presented by the Council of European Social Science Data Archives (CESSDA)/CESSDA-ERIC. The Nordic data archives want to be represented at NordicEpi 2013 and will also report to the NordForsk project "NORIA-net on Registries". In cooperation with Nordic epidemiological research, the Nordic data services will qualify the Nordic research infrastructures for health science.
2013-05-31: F3: Panel: Virtual Research Environment for Research in the Social Sciences
2013-05-31: F4: Expanding Scholarship: Research Journals and Data Linkages
Research data management in economics journals: Data policies and data description as prerequisites of reproducible research
Sven Vlaeminck (Leibniz - Information Centre for Economics - ZBW)
Ralf Toepfer (Leibniz - Information Centre for Economics - ZBW)
Replication of research results is essential for empirical science. But in disciplines like economics, replication is a vision rather than a reality. One reason for this is that research data are often not available, due to the lack of mandatory data policies and archives. Even when data are available, descriptions with sufficient metadata are often missing. The e-infrastructure for providing datasets and other materials is also still underdeveloped and offers few of the necessary features. Our talk focuses on academic journals in economics. We present results of a study of more than 140 journals regarding their research data management and suggest good practices for data availability policies. Subsequently, we propose concepts for improving the journals' e-infrastructures. In particular, we address the problem of metadata creation. Often, the creation of metadata is not accepted by researchers because it is too time-consuming; on the other hand, it must be comprehensive enough for reproducibility purposes. In view of this tension, we define different levels of metadata schemata depending on the different purposes they should serve, from ensuring the citation of research data to the requirements for replication of data and results.
Perspectives on the role of trustworthy repository standards in data journal publication
Angus Whyte (Digital Curation Centre)
Sarah Callaghan (Digital Curation Centre)
Jonathan Tedds (Digital Curation Centre)
Matthew Mayernik (Digital Curation Centre)
Data journals are a focus for innovation in data sharing and publication across a growing range of disciplines. They offer a number of significant opportunities to researchers, data centers/repositories, institutions, and publishers. We report on progress in the PREPARDE project, which is addressing key issues including the common ground between 'trustworthy' data repository standards and effective peer review of datasets. The project has an initial focus on earth science disciplines and the Geoscience Data Journal, a partnership between the UK Royal Meteorological Society and Wiley-Blackwell, and involves major geoscience data centers in the UK and US. We discuss findings of an international interdisciplinary workshop and its contribution to our aim of producing guidelines on a) dataset review criteria and the associated cross-repository workflows; and b) the roles of trusted repository standards, e.g., the Data Seal of Approval and ISO 16363, in supporting the peer review of data. These focus on how the responsibilities for both technical and scientific review of data can be met effectively through collaboration between the various stakeholders. These include research institutions, many of which are developing infrastructure for research data management to fulfill their policy obligations towards sharing publicly funded research data as a public good.
2013-05-31: G3: Data Longevity: Tools, Processes, Practical Experiences
How would you like to have your DDI today?
Olof Olsson (Swedish National Data Service (SND))
Jannik Jensen (Danish Data Archive)
Johan Fihn (Swedish National Data Service)
Stefan Jakobsson (Swedish National Data Service)
Akira Olsbanning (Swedish National Data Service)
DDI-Lifecycle is great for creating complete lifecycle documentation and handles a lot of different use cases. But sometimes your consumers want other formats, and like a good chef you should be able to serve your dinner guests their favorite type of dish. It may be some delicious DDI-Codebook, a small portion of DataCite metadata, a spicy PDF codebook, or a complete three-course meal of HTML with JavaScript on the side. The need for transforming DDI to other formats is widespread, and institutions like libraries have their own established formats for their information systems. This presentation discusses the need to standardize and provide open solutions for transformations of DDI to other formats. The project will present XSLT transformations from DDI-Lifecycle to an interactive HTML codebook with jQuery, MARC-XML, DataCite metadata, and DDI-Codebook.
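Applying such a stylesheet is a few lines in any XSLT processor; a minimal Python sketch with lxml (the file names are hypothetical, standing in for the project's published stylesheets):

```python
from lxml import etree

# Load a DDI-Lifecycle instance and an XSLT stylesheet, for example
# a DDI-to-DataCite transformation like those described above.
document = etree.parse("study.ddi.xml")
transform = etree.XSLT(etree.parse("ddi-to-datacite.xsl"))

# Apply the transformation and serialize the result.
result = transform(document)
print(etree.tostring(result, pretty_print=True).decode())
```

Publishing the XSLT files openly means any institution can plug the same transformation into its own pipeline, whatever the processor.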
Introducing OAIS and DDI into an on-going research process: The MPC experience
Wendy Thomas (Minnesota Population Center)
The new requirements of the National Science Foundation and the National Institutes of Health that funding proposals include a data management plan have resulted in increased interest in data preservation. Many organizations are looking more closely at OAIS and implementation models such as PREMIS as a means of addressing their internal, long-term preservation responsibilities. This in turn has raised new interest in DDI as a means of capturing metadata on the data they create, transform, and deliver. At the Minnesota Population Center we are working on ways of capturing metadata to support the preservation and provenance needs of the OAIS archival model. This presentation uses the MPC as a case study on the issues of integrating DDI and PREMIS into an ongoing research process. It will focus on decisions regarding the evaluation of current organizational activities, determining what archival responsibilities we have in relation to various projects, and means of beginning to integrate the capture of needed metadata without major disruptions to ongoing production processes. The three projects included in the discussion will be IPUMS-International, NHGIS, and the new grant project TerraPopulus.
Record Linkage – the Key to Future Research in the Social Sciences
Timothy M. Mulcahy (NORC)
Data about individuals (persons or institutions) is collected routinely, be it through surveys, administrative processes, data collection for statistical purposes, or the documentation of transactions. Accordingly, rich data sources that could enhance research in the social sciences are out there. Recent developments have made those sources available and usable for scientific research (keywords are data access and data documentation). In addition, sophisticated methods for investigation at the level of individuals have been developed. But the landscape for research data is still a fragmented one: data is collected for different reasons and by different institutions. To unlock the full power of modern microdata research, methods for record linkage are needed that bring together information about single individuals distributed across different data sources. The plenary will discuss the developments and needs for linking data at the individual level, and therefore for enhanced record linkage techniques, with a focus on the state of play in the USA. Timothy M. Mulcahy is a Senior Research Scientist in the Economics, Labor, and Population Studies department, Project Director of the NORC Data Enclave, and Co-Principal Investigator of an R21 grant examining illicit retail drug markets across the U.S., sponsored by the National Institute on Drug Abuse. He has nearly 20 years of experience in social science research, developing and implementing complex, data-centric projects involving sensitive data, evidence-based research, and data warehousing. His areas of expertise include criminal justice and drug policy, secure remote data access technology, data privacy, statistical disclosure control, and confidentiality. He has served as an invited speaker, keynote, panel chair, and panelist at numerous conferences, workshops, and seminars, and has published widely on digital-age dissemination, data access modalities, data privacy and confidentiality, and statistical disclosure control. Prior to 2004, he served as Senior Analyst at Justice Studies, Inc., where he completed two congressionally mandated studies for the National Institute of Justice, one examining the federal death penalty system and the other involving human trafficking in the US. He earned his undergraduate degree in English from the University of Virginia and his graduate degree from the Institute for Policy Studies at Johns Hopkins University, where he specialized in public policy studies and economics.
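A minimal flavor of the deterministic record linkage the plenary discusses, matching on a normalized name plus birth year (the records and fields are hypothetical; production systems typically use probabilistic methods such as Fellegi-Sunter scoring):

```python
def normalize(name):
    """Lowercase, strip punctuation, and sort name tokens so that
    'SCHMIDT, Maria' and 'Maria Schmidt' produce the same key."""
    tokens = "".join(c if c.isalpha() else " " for c in name.lower()).split()
    return " ".join(sorted(tokens))

# Two hypothetical sources holding complementary information.
survey = [{"name": "Maria Schmidt", "born": 1975, "income": 42000}]
register = [{"name": "SCHMIDT, Maria", "born": 1975, "region": "Bavaria"}]

# Index one source by the linkage key, then look up records from the other.
index = {(normalize(r["name"]), r["born"]): r for r in register}
for record in survey:
    match = index.get((normalize(record["name"]), record["born"]))
    if match:
        # A linked record combines information from both sources.
        print({**record, **match})
```

The hard part in practice is exactly what this sketch glosses over: noisy identifiers, missing fields, and the disclosure risks that arise once sources are joined.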