Workshop: Improving a Good Thing: the New American FactFinder Interface
Michele Hayslett (University of North Carolina Chapel Hill)
Anyone who uses data from the U.S. Census Bureau will by this time have encountered the new American FactFinder interface. Come learn tips and tricks for getting the most out of it. Michele Hayslett will lead a tour with an example data search and then participants will have a chance to practice with hands-on exercises. If you missed Michele's webinar last June for the North Carolina Libraries Association Government Resources Section, this will be a similar session but will include new material.
Formalizing the process of data and metadata management has become increasingly important. The DDI metadata standard was designed to support metadata management from simple stand-alone studies to major statistical production systems. This workshop will look at how DDI supports the data and metadata management process from a high level business perspective. Use cases covering different organizational structures and processes will be used to provide a checklist of options for introducing DDI into an organization.
Colectica is a platform for creating, documenting, managing, distributing, and discovering data. Colectica is built on open standards including DDI 3. This training course covers the following topics: introduction to Colectica; introduction to DDI 3; documenting concepts and general study design; designing and documenting data collection instruments; creating and documenting data products; ingesting existing resources; publishing resources; and a hands-on exercise using Colectica to manage a sample study.
Workshop: Introduction to IPUMS and NHGIS: Analyzing and Mapping Demographic Data
Katie Genadek (Minnesota Population Center)
David VanRiper (Minnesota Population Center)
In this day-long, two-part workshop, representatives from the Minnesota Population Center (MPC) will demonstrate the very latest harmonized resources for social, demographic, geographic and health research. The MPC is one of the world's leading developers of demographic data resources. In this workshop, attendees will learn about the content of the MPC data resources and receive basic information about how to get and use the data, which are available free over the internet. The first part of this workshop will focus on the IPUMS projects, which include census and survey microdata from over 80 countries. Attendees will download the data over the internet and analyze it in a statistical package, and will use the online analysis system to perform analyses without a statistical package. The second part of this workshop will focus on the NHGIS, which provides aggregate census data from 1790 to the present. Attendees will learn how the data are constructed and how to obtain the data, and then they will map it using GIS.
Workshop: Online Data Tools for Curricular Use: Raising the Bar on Data Literacy for Undergraduates
Paula Lackie (Carleton College)
Adrienne Brennecke (Federal Reserve Bank of St. Louis)
Chiu-chuang (Lu) Chou (University of Wisconsin-Madison)
With tools like SDA, NESSTAR, and DataVerse, and more specific tools like ICPSR's Online Learning Center, BADGIR, the Earth Exploration Toolbook, and FRED, we can help improve the undergraduate experience with data. These tools empower students to reference compelling evidence and overcome the temptation to use weasel words. Join us to compare and contrast these web-based resources, with particular focus on undergraduate research and writing.
Workshop: NORC and Secure Data Service: Unlocking Access to Sensitive Data
Richard Welpton (UK Data Archive)
Johannes Fernandes-Huessy (University of Chicago)
Daniel Gwynne (University of Chicago)
We know that government agencies collect a huge amount of information about individuals, companies and organisations. Many of us know the potential of unlocking these data to help us understand the world in which we live, but too often access to these sources is denied by the data owners, with security and confidentiality the main reasons cited. Both NORC and the UK Data Archive's Secure Data Service have pioneered secure access to sensitive sources of data, allowing academic researchers to undertake innovative analyses that better inform public policy. This workshop will focus on how to set up a secure data enclave (a remote access method that allows researchers to access sensitive data without the data leaving a secure home), based on the experiences of NORC and the SDS, which are recent success stories. The workshop will cover the following areas: 1. Re-visiting the data access spectrum: why do we need a secure enclave? 2. Technologies. 3. Security: standards, maintenance, auditing. 4. Convincing data owners to provide the data. 5. Staffing a secure enclave. 6. Working with researchers to ensure safe access. 7. Establishing a community of safe researchers. 8. Lessons learned: what we would do differently next time.
Workshop: Keeping Your Archive Safe (and on TRAC) with SafeArchive and LOCKSS
Micah Altman (Massachusetts Institute of Technology)
This ½-day workshop focuses on how to protect your archival content and how to formalize, document and audit storage policies. The workshop is appropriate for curators of content who wish to replicate their content and/or document storage policies for compliance with TRAC, the Data Seal of Approval, and other archival standards. It provides hands-on practice with, and example configurations for, LOCKSS installation and configuration; SafeArchive installation and configuration; generating and interpreting policy reports; and TRAC documentation. At the end of the workshop the student will be able to install the SafeArchive system; use it to replicate archival content exposed through DVN, OAI-PMH or the web; and produce formal audits and reports to determine compliance with archival storage policies.
2012-06-06: Plenary I
Creating new types of data from documents and administrative records: a use case from science
Julia I. Lane (American Institutes for Research (AIR))
In common with many countries, the substantial United States investment in R&D is characterized by limited documentation of the nature and results of those investments (MacIlwain 2010, Marburger 2005). Despite the increased calls for reporting by key stakeholders, current data systems cannot meet the new requirements; indeed, the conclusion of the Science of Science Policy interagency group's Federal Research Roadmap (National Science and Technology Council 2008) was that the science policy data infrastructure was inadequate for decision-making. In response to this need, a new data system is being built (STAR METRICS) drawing from administrative records and new computational approaches to analyzing unstructured text; this paper describes the initial results of that effort - focusing on expenditures during the 2011 Federal fiscal year from awards made by the National Science Foundation.
2012-06-06: A1: Research Data Management: assessments and planning
Data Management Planning for Secure Services (DMP-SS)
Fortunato D Castillo (UCL Institute of Child Health)
Stelios Alexandrakis (UCL Institute of Child Health)
Anthony Thomas (UCL Institute of Child Health)
Michael Waters (UCL Institute of Child Health)
Phil Curran (MRC Unit for Lifelong Health and Ageing)
Kevin Garwood (MRC Unit for Lifelong Health and Ageing)
The UK's Digital Curation Centre has recently developed DMPOnline, a web-based service that guides researchers through the process of writing a structured and standardised data management plan, mapped to a wide range of major research funding agencies. Our project seeks to augment the DMPOnline tool with an information risk management tool that would assist data managers in the development and maintenance of formal Information Security Management Systems in line with the ISO 27001:2005 international standard for information security. Since DMPOnline is currently a simple checklist of questions, we aim to refine this model. This project will assess the viability of creating an extensible architecture for data management planning through the use of DDI (an established metadata standard) and related tooling. A pre-existing open source DDI 3.1 registry and editors will be extended, allowing for the creation of semantically rich, project-specific representations of data management plans through a DDI broker, providing the basis of interoperability with a set of data management services. We hope to demonstrate the value of data management planning in the provision and maintenance of effective information security.
DataONE: A Glimpse into the Practices of Data Managers
Eleanor J. Read (University of Tennessee)
Data Observation Network for Earth (DataONE; https://www.dataone.org/) is supported by the National Science Foundation and will ensure the preservation of and access to multi-scale, multi-discipline, and multi-national science data about Earth. DataONE will make biological data available from the genome to the ecosystem; make environmental data available from atmospheric, ecological, hydrological, and oceanographic sources; and engage scientists, land-managers, policy makers, students, educators, and the public through logical access and intuitive visualizations. The DataONE Usability and Assessment Working Group is tasked with conducting a variety of assessments on aspects of the DataONE project. This presentation will discuss the results of a survey sent to data managers to assess current data management and data sharing needs, practices, and attitudes. IASSIST members were invited to participate in the survey via the listserv. Questions asked included how much data is deposited on public, internal, or personal websites, access/use conditions, the reasons data managers do not make their data available to others electronically, policies and processes for data deposition and storage, use of metadata to describe data and the metadata standards and tools used, training on best practices and adequate funding for data management, and data management plans, among other things.
DMVitals: A Data Management Assessment Recommendations Tool
Andrew Sallans (University of Virginia Library)
Sherry Lake (University of Virginia Library)
DMVitals is a key component of the UVa Library's Scientific Data Consulting (UVa SciDaC) Group's research data interviews. It is a tool for assessing data management practices against a series of best-practice statements. Using DMVitals, a data management consultant associates a list of data management practices with portions of the research data interview and ranks researchers' current practices according to their level of data "sustainability". The DMVitals tool then generates customized and actionable recommendations to help researchers improve their data management practices. The tool also creates objective, repeatable recommendations that can be generated rapidly. This paper will detail the development of the DMVitals tool and describe how UVa SciDaC integrates it into their existing data interview and data management plan process to expedite the data management recommendation report and to provide actionable feedback which researchers can use almost immediately to improve the sustainability of their data. The paper will also illustrate how DMVitals can easily be integrated with other data management assessment tools such as the Digital Curation Centre's (DCC) CARDIO.
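To make the assessment-and-recommendation idea concrete, here is a minimal hypothetical sketch of how practices could be scored against sustainability levels and turned into ranked recommendations; the level names, practice identifiers, and recommendation text are illustrative only and do not reproduce the actual DMVitals instrument.

```python
# Hypothetical DMVitals-style assessment: each practice response is scored
# against a sustainability scale and low-scoring practices yield recommendations.
SUSTAINABILITY_LEVELS = ["ad hoc", "documented", "standardised", "institutional"]

def assess(responses, best_practices):
    """responses: {practice_id: level_index}; best_practices: {practice_id: (category, recommendation)}."""
    report = []
    # Lowest-scoring practices first, so the most urgent recommendations lead the report.
    for practice_id, level in sorted(responses.items(), key=lambda kv: kv[1]):
        category, recommendation = best_practices[practice_id]
        if level < len(SUSTAINABILITY_LEVELS) - 1:      # below the target level
            report.append({
                "category": category,
                "current level": SUSTAINABILITY_LEVELS[level],
                "recommendation": recommendation,
            })
    return report

best_practices = {
    "file_naming": ("Organisation", "Adopt a documented file-naming convention."),
    "backups": ("Storage", "Schedule automated backups to managed storage."),
}
print(assess({"file_naming": 0, "backups": 2}, best_practices))
```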
Data in Common(s): Collaborative Models for Robust Data Support
Samantha Guss (New York University)
Nicole Scholtz (University of Michigan)
Jennifer Green (University of Michigan)
Michelle Hudson (Yale University)
This panel session will explore the issues relating to providing data services within a library research commons by examining three new facilities in academic libraries in the United States: The University of Michigan's Stephen S. Clark Library for Maps, Government Information, and Data Services; New York University's Data Service Studio; and Yale University's Center for Science and Social Science Information. Presentations will focus on aspects such as service models, interactions and relationships to other service providers in the commons, facilities, and levels of data support as they relate to the data lifecycle and needs of library patrons including data management, curation, re-use, and education. This comparison will provide insight into successes, challenges, and lessons learned and will be valuable to those planning and working to improve or implement their own research commons with data service components.
Data's Different Missions in E-Science, E-Social Sciences and E-Humanities
Minglu Wang (Rutgers University)
John Cotton (Rutgers University)
Data has been closely related to all the e-research fields, be it e-science, e-social sciences or e-humanities. There is an increasing number of descriptive studies on the current data needs of different disciplines, and there are also best practices for data management developed with the mindset of taking care of data throughout its whole life cycle. But among all these lively discussions of the opportunities and challenges that data is bringing to the world, there has not been enough philosophical reflection on data's advantages and limitations in light of the different goals of scientific, social scientific, and humanistic inquiry. This paper will start from Jürgen Habermas' epistemology of knowledge and human interests, then examine the different e-research fields and their data usage trends through the new theoretical lens of their ultimate missions: how well they have helped human beings instrumentally control the world, ideally change society, and understand each other across temporal and spatial dimensions. We could then also have a higher-level vision of what data might help each research field accomplish in the future.
Data Sharing Across the Disciplines, Revisited: Academic Journals and Replication Policies. An Empirical Study
Rob O'Reilly (Emory University)
Academics in the social sciences have long argued for increased sharing of research data as a means of increasing transparency and methodological rigor (see, for instance, the symposium on "Data Collection and Collaboration" in PS: Political Science & Politics 43:1: http://dx.doi.org/10.1017/S1049096510990586). One oft-proposed means of encouraging data sharing is for academic journals to encourage or require authors to make data publicly available as part of the publication process. But to what extent have journals heeded the call for replication policies? This presentation will revisit prior work, presented at IASSIST 2009, that compared journals in Economics, Political Science, and Sociology in terms of the presence or absence of policies requiring authors to make data available for replication purposes. Using updated data on a larger sample of journals, we will examine both the extent to which journals are adopting replication policies and whether such adoption varies across disciplines.
Network Analysis of Data Reuse in the Social Sciences
Kathleen Fear (University of Michigan)
Studies of the citation networks of scientific papers show that science is slowly but surely becoming more interdisciplinary: publications increasingly incorporate ideas from disciplines outside of the primary authors', and authors from different disciplines more frequently collaborate and co-author papers. Few, if any, studies, however, have distinguished between interdisciplinary collaboration or knowledge sharing and interdisciplinary data reuse. Given the significant barriers to data sharing and reuse that can arise even within a given discipline, it is possible that patterns of interdisciplinary data reuse differ somewhat from those of other kinds of interdisciplinarity. This paper examines patterns of interdisciplinary data reuse in the social sciences through a network analysis of data citation information from ICPSR. The results will offer insight for data producers and data managers in curating data to foster interdisciplinary reuse.
The Influence of Scholarly Output on Scientific Dataset Communication
Tiffany Chao (University of Illinois Champaign-Urbana)
Research datasets possess enduring value beyond their originally designed purpose, especially as they typically will never be analyzed or utilized to their fullest potential within a given project time frame. The extent of dataset reuse, as a mechanism for understanding long-term usefulness within and beyond formal channels of scholarly communication, is an area where further investigation is needed, particularly in relation to domain-based and subdisciplinary differences. Based on the use metric categories and publication types presented by the Inter-university Consortium for Political and Social Research (ICPSR) for the dissemination of social science datasets, this study presents a parallel analysis of the distribution of traditional and non-traditional publications affiliated with publicly available datasets in the atmospheric sciences. These affiliated publications are works identified by the dataset creator as related to a dataset and include a variety of items such as peer-reviewed journal articles, technical reports, and theses. Preliminary results reveal the potential audiences and communities that dataset contributors reach through selection of publication type and quantity. As the boundaries of scholarly communication are continually pushed by the visibility and intellectual power of scientific datasets, a clearer understanding of their lasting value and extensive influence takes shape.
2012-06-06: A4: Panel: Toward Trusted Digital Repositories: Three Perspectives
Data Seal of Approval
Matthew Woollard (UK Data Archive)
Herve L'Hours (UK Data Archive)
This presentation will describe the Data Seal of Approval (DSA) initiative in the context of several options for repository audit and certification. The DSA is a lightweight means for repositories to demonstrate their trustworthiness through a process of self-assessment and peer review. Displaying the Data Seal of Approval enables a repository to communicate its compliance with archival best practices in a transparent and open way.
Colectica is a suite of software for managing statistical data for a single user or a large institution. It is based on the DDI 3 metadata standard and provides tools for documenting data, importing metadata from existing sources, and publishing documentation on the Web and in other formats. In this session we will demonstrate features added to Colectica in the past year, including: doubling of the coverage of the metadata content model, user-based access controls, integration with organizational user management such as LDAP and ActiveDirectory, enterprise database support, streamlined synchronization and versioning for multi-user metadata editing, improved full-text search, a new Dataset Explorer view for Colectica Web, linked data publishing, and new releases of the free DDI Reader and DDI multiple-level Validation tools.
Metadata Editing Framework: An open source project for building metadata tools
Jack Gager (Metadata Technology North America)
As part of the DDI4RDC development project initiated in 2009, a set of editors and researcher tools based on the DDI-Lifecycle model were developed to meet the specific metadata documentation needs of the Canadian Research Data Centre Network. As the project evolved, the core components of the platform morphed into a flexible and generic framework on which numerous types of metadata management and user tools can be built. The core framework, as well as the RDC-specific editors, will be released as an open source project in the second quarter of 2012. Our presentation will detail the underlying principles and architecture of the framework, as well as the design of the RDC-specific editors. Particular focus will be paid to how these components can be leveraged by other organizations to develop metadata tools meeting their specific management and user needs. Examples will be provided detailing how the framework is being leveraged to develop stand-alone DDI editing tools, a data management plan documentation tool, and a researcher toolkit.
Making the best of existing tools for enhancing documentation
Andreas Perret (FORS)
Using commonly available tools, shareware and standard office solutions, FORS has created detailed codebooks describing public statistics and set up a multi-lingual portal. The tools needed are mostly known and available to the research community: starting with statistical packages such as SPSS, adding the editing platform provided by NESSTAR complemented by the open-source Microdata Management Toolkit from the IHSN, and combining these with the functionality of spreadsheet applications such as Excel, everything needed to publish data, even XML handling, is provided. Our presentation aims to show that high quality standards are not in contradiction with modest resources. Once the needs of the researchers are well understood, a little tweaking of existing tools allows huge improvements in visual efficiency and user-friendliness. As the skills needed lie far below the level of trained IT specialists, there is no absolute need to wait for the industry to come up with new software; competent and creative social scientists with modest aims are all it takes.
Tools for extracting ASCII data and DDI metadata from Stata, SPSS, and other proprietary statistical file formats
Andrew DeCarlo (Metadata Technology North America)
Statistical data are often stored in proprietary file formats such as SAS, Stata, SPSS, and others. While useful for processing and analytical purposes, this makes the data challenging to access unless you have the right software or utility, which often requires commercial licensing. While statistical packages are not particularly metadata aware, these files hold a significant amount of variable-level information. Having the ability to extract this information into a DDI-friendly XML format, complemented with summary statistics computed from the data, is highly desirable. Extending previous efforts, Metadata Technology North America has enhanced and developed new Java-based packages for reading Stata and SPSS files that can export the data in ASCII text format and extract variable-level DDI-Codebook and DDI-Lifecycle metadata (data dictionary and summary statistics). Various options are available in terms of ASCII flavors and metadata generation, providing features beyond the typical export capabilities of statistical packages or utilities. This enables the conversion of data files into an open format combining ASCII+DDI, fit for long-term preservation, dissemination, or further processing by DDI-aware tools. Our presentation will provide an overview of these utilities, describe use cases, share lessons learned, and discuss future development.
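The tools described above are Java packages; as a rough illustration of the same idea, the following Python sketch uses the open-source pyreadstat library to pull variable-level metadata and plain-text data out of an SPSS file (the file name is a placeholder, and the summary statistics shown are only a hint of what a full DDI export would carry).

```python
# Rough illustration (not the Java tools described above) of extracting
# variable-level metadata and ASCII data from an SPSS file with pyreadstat.
import pyreadstat

df, meta = pyreadstat.read_sav("survey.sav")        # placeholder file name

# Export the data portion as delimited ASCII text.
df.to_csv("survey.txt", sep="\t", index=False)

# Collect a simple data dictionary with summary statistics, the kind of
# variable-level information a DDI-Codebook <var> element would carry.
for name, label in zip(meta.column_names, meta.column_labels):
    print(name, label, meta.variable_value_labels.get(name, {}))
    if df[name].dtype.kind in "if":                 # numeric variables only
        print("  min/mean/max:", df[name].min(), df[name].mean(), df[name].max())
```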
How to Infer an Ontology of the Data Documentation Initiative?
Thomas Bosch (Gesis - Leibniz Institute for the Social Sciences)
In close collaboration with domain experts, ontology engineers have designed a first draft of an ontology describing the DDI data model's conceptual components which are most relevant in queries and use cases. XML Schemas express the vocabulary and the syntactic structure of XML documents representing both DDI data and metadata. There is a huge number of XML document instances which have to be mapped to the RDF representation of the developed DDI domain ontology, in order to reuse already existing DDI data and metadata and to profit from the benefits associated with the RDF format. A manual mapping, however, requires a lot of time, is very error-prone, and is therefore not applicable. The authors devised a generic approach which automatically converts any XML Schema, without exception, into a generated ontology using XSLT transformations. Domain ontologies, such as an ontology of the DDI, can then be derived automatically on the basis of the generated ontologies using SWRL rules. As a consequence, all the information located in the underlying XML Schemas and associated XML documents is also expressed in the DDI ontology and its RDF representation. Subsequently, ontology engineers will add supplementary domain-specific semantic information, not represented in the XML Schemas, to the DDI ontology.
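A minimal sketch of the schema-to-ontology idea, assuming a simplified mapping in which every named element declaration becomes an OWL class; the actual project uses XSLT transformations and SWRL rules, and the base URI and schema file name below are placeholders.

```python
# Simplified illustration: map named xs:element declarations in an XML Schema
# to OWL classes in a generated ontology (placeholders throughout).
from lxml import etree
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import OWL, RDF, RDFS

XS = "{http://www.w3.org/2001/XMLSchema}"
GEN = Namespace("http://example.org/generated-ontology#")   # assumed base URI

schema = etree.parse("ddi-codebook.xsd")                     # placeholder schema file
g = Graph()
for element in schema.iter(XS + "element"):
    name = element.get("name")
    if name:                                                 # skip ref-only declarations
        cls = URIRef(GEN + name)
        g.add((cls, RDF.type, OWL.Class))
        g.add((cls, RDFS.label, Literal(name)))

print(g.serialize(format="turtle"))
```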
Linking Study Descriptions to the Linked Open Data (LOD) Cloud
Johann Schaible (Gesis - Leibniz Institute for the Social Sciences)
Benjamin Zapilko (Gesis - Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moeltgen (Gesis - Leibniz Institute for the Social Sciences)
The Data Catalogue contains the study descriptions for all archived studies at GESIS. These descriptions include information about primary researchers, research topics and objects, methods used and the resulting dataset. They are primarily used for archiving and retrieval. However, for this purpose the existing metadata can be enriched with further information about the study content, investigators, involved affiliations, collection dates, and more from other sources such as DBpedia, GeoNames or the Name Authority File (PND) of the German National Library. In this paper we present how to enrich a study description with datasets from the LOD cloud. To accomplish this, we expose selected elements of the study description in RDF (Resource Description Framework) by applying commonly used vocabularies. This optimizes interoperability with other RDF datasets and hence the possibility of expressing links between them. For link detection we use existing algorithms and tools which are most promising in discovering relevant links to related data. Once links are detected, the study description is linked to external datasets and therefore holds additional information for the user, e.g. events that occurred before or during the collection dates of a study which are relevant to its topic.
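A minimal sketch of what such an enriched study description might look like in RDF, using common vocabularies; all URIs, identifiers and study details below are invented for illustration and are not actual Data Catalogue records or the project's chosen vocabulary mapping.

```python
# Illustrative RDF for a study description enriched with LOD links (rdflib).
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS, OWL, RDF

study = URIRef("http://example.org/study/ZA9999")          # invented study URI
investigator = URIRef("http://example.org/person/42")      # invented person URI

g = Graph()
g.add((study, RDF.type, URIRef("http://purl.org/dc/dcmitype/Dataset")))
g.add((study, DCTERMS.title, Literal("Example Election Survey 2010")))
g.add((study, DCTERMS.spatial, URIRef("http://sws.geonames.org/2921044/")))   # GeoNames: Germany
g.add((study, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Election")))
g.add((study, DCTERMS.creator, investigator))
# A link detected against an authority file, expressed as owl:sameAs (invented ID).
g.add((investigator, OWL.sameAs, URIRef("http://d-nb.info/gnd/000000000")))

print(g.serialize(format="turtle"))
```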
Accessing DDI 3 as Linked Data - Colectica RDF Services
Dan Smith (Colectica)
Linked Data represents an opportunity to take the structured data of DDI 3 and publish it in a way that can be interlinked to become even more useful. Linked Data makes use of existing web technologies standardized by the W3C, such as HTTP and URIs, and extends them to enable the machine actionability of web information. DDI 3 has a very robust information model containing many relationships between metadata items, as can be seen in DDI 3's use of references to enable reuse. This DDI model can also be represented in RDF. This paper examines why serializing DDI as RDF can be beneficial, and how DDI 3 can be represented in RDF, stored in a repository, and queried using the standard SPARQL query language. An example of these concepts is demonstrated using Colectica RDF Services.
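To illustrate the querying step, here is a minimal sketch that loads DDI metadata already serialized as RDF and runs a SPARQL query over it with rdflib as a stand-in for a triple store; the ddi: namespace URI, property names and file name are placeholders, not the vocabulary used by Colectica RDF Services.

```python
# Minimal sketch: query DDI-as-RDF with SPARQL using rdflib (placeholder vocabulary).
from rdflib import Graph

g = Graph()
g.parse("study-ddi.ttl", format="turtle")     # DDI items previously serialized as RDF

query = """
PREFIX ddi: <http://example.org/ddi-rdf#>
SELECT ?variable ?label WHERE {
    ?variable a ddi:Variable ;
              ddi:label ?label .
}
"""
for variable, label in g.query(query):
    print(variable, label)
```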
Thomas Bosch (GESIS - Leibniz Institute for the Social Sciences)
Benjamin Zapilko (GESIS - Leibniz Institute for the Social Sciences)
Arofan Gregory (ODaF - Open Data Foundation)
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
In Schloss Dagstuhl and in Gothenburg before EDDI 2011, ontology engineers and experts from the social, behavioral and economic sciences developed an ontology of both DDI Codebook and DDI Lifecycle and implemented a rendering of DDI instances in RDF (Resource Description Framework). The main goals of the design process for the DDI ontology were to reuse widely adopted and accepted ontologies like DC or SKOS and to define meaningful relationships to the RDF Data Cube vocabulary. Organizations now have the possibility to publish their DDI data and metadata in RDF and link it with many other datasets from the Linked Open Data (LOD) cloud. As a consequence, a huge amount of related DDI instances can be discovered, queried, connected, and harmonized. Only the semantic combination of DDI data and metadata from several organizations will enable the derivation of surprising and previously unimaginable implicit knowledge from explicitly stated pieces of information.
2012-06-06: B3: Planning for Preservation: TDR/OAIS in data archives
Improving Operations Using Standards and Metrics: Self-Assessment of Long-Term Preservation Practices at FSD
Mari Kleemola (Finnish Social Science Data Archive)
Preserving digital data is a challenge. For the outcome to be successful, many organisational and practical issues need to be in place. The Finnish Social Science Data Archive (FSD, http://www.fsd.uta.fi/) is dedicated to supporting the life cycle of digital research data in the social sciences. It is therefore critical that FSD's operations and procedures are up to date and consistent with relevant standards and best practices. The key standard for long-term preservation of digital data is the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This presentation will explore FSD's conformance to OAIS and describe how FSD's functions map to the seven OAIS functions (ingest, archival storage, data management, administration, preservation planning, access and common services), with special attention to ingest and access. We will also discuss the process and results of a self-assessment that was conducted analysing FSD policies, plans and procedures within the framework of the Audit and Certification of Trustworthy Digital Repositories (TDR) checklist (CCSDS 652.0-M-1). We will summarise the weaknesses, risks and strengths identified, discuss the use of the TDR metrics, and analyse how adopting standards and best practices could facilitate co-operation and collaborative partnerships.
Improving the Trustworthiness of an Interdisciplinary Scientific Data Archive
Robert R Downs (CIESIN, Columbia University)
Robert S. Chen (CIESIN, Columbia University)
The opportunity to deposit scientific data with an archive or scientific data center enables researchers and scholars to focus on their intellectual pursuits while trusting that the archive will attend to the details of providing stewardship, management, and long-term access services for their valuable digital assets. Whether an archive is worthy of such trust is a question that needs to be answered so that data producers will know where to submit their data for safekeeping and continuing dissemination to current and future communities of interest. A new standard released by the International Organization for Standardization, ISO 16363, specifies the requirements for certification of a digital repository as trustworthy. With the establishment of the new standard, archives have measurable targets for attaining trustworthiness—and users of archives, including depositors, have a way to determine if a particular archive meets their need for a trustworthy archive. An initial set of test audits using the new standard offers insight into the audit process and the level and type of effort required for archives to become certified as trustworthy. The authors will describe the test audit of an interdisciplinary scientific data center for compliance with the draft standard and summarize the test results.
Data Seal of Approval (DSA) - The assessment procedure
Lisa F. de Leeuw (Data Archiving and Networked Services (DANS))
Henk Harmsen (Data Archiving and Networked Services (DANS))
The Data Seal of Approval ensures that in the future, research data can still be processed in a high-quality and reliable manner, without this entailing new thresholds, regulations or high costs. The Data Seal of Approval and its quality guidelines may be of interest to research institutions, organizations that archive data, and users of those data. It can be granted to any repository that applies for it via the assessment procedure. Achieving the DSA means that the data concerned have been subjected to the sixteen guidelines of which the assessment procedure consists. The data archive as an organization should take care of the overall implementation of the DSA in its own specific field. In the integrated Framework for Auditing and Certification, which is being set up at the moment, the DSA will be the first step in a three-tiered certification process; see http://www.trusteddigitalrepository.eu. The DSA Board has developed an online self-assessment tool through which the DSA can be applied for. In this paper we will present the online application tool, focusing among other things on problems and solutions, as well as elaborating on the procedures around the DSA.
The project infinitE aims at improving access to microdata in Germany. At present, researchers have the possibility to gain access to microdata via remote execution. By this means, the researcher is provided with a so-called "data structure file" that enables him or her to develop code that is then applied, by the staff of the Research Data Centers, to the statistical software of the researcher's choice. Afterwards, the output is manually checked for confidentiality and handed over to the researcher. Unfortunately, this imposes a substantial waiting period on the researcher. In addition, the process of manual output checking demands a lot of manpower in the Research Data Centers. That is why, in the course of infinitE, the foundation was laid for automating both the running of the researcher's code and the output checking. This comes along with the need to apply new methods of output checking instead of the cell suppression method currently used in the Research Data Centers. The outcomes of infinitE have demonstrated that a partial, or even a full, automation of German microdata access is not a question of feasibility, but of costs.
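As a concrete illustration of what an automated output check can look like, here is a minimal sketch of one generic disclosure-control rule (a minimum cell frequency threshold); this is only a stand-in for the idea, not the specific methods evaluated by infinitE, and the threshold and example table are invented.

```python
# Illustrative automated output-checking rule: blank table cells whose
# underlying frequency falls below a minimum threshold before release.
import pandas as pd

MIN_CELL_COUNT = 5   # assumed threshold

def check_frequency_table(table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the table with unsafe cells blanked out (NaN)."""
    return table.mask(table < MIN_CELL_COUNT)

counts = pd.DataFrame({"employed": [120, 3], "unemployed": [45, 7]},
                      index=["region A", "region B"])
print(check_frequency_table(counts))
```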
Scholars GeoPortal: Discovering data one layer at a time
Leanne Hindmarch (Ontario Council of University Libraries)
Jennifer Marvin (University of Guelph Library)
Launched this spring, Scholars GeoPortal (http://geo.scholarsportal.info), the newest service of the Ontario Council of University Libraries (OCUL), allows students, staff, and faculty at Ontario universities to discover, manipulate, and download a wide range of geospatial datasets. The result of a collaborative project involving participants from libraries across the province, the GeoPortal presents consortially licensed data collections to the academic community using exciting new tools that offer state-of-the-art web mapping features. Rich in data and highly visually engaging, Scholars GeoPortal is perfect material for a pecha kucha session! We'll provide a whirlwind tour of the portal itself, while touching on lessons learned while undertaking this complex project.
Statistical data are a valuable research resource; however, they are usually contained in discrete datasets, disconnected from the people, places and things they describe. This can make finding, accessing and using statistical data a difficult and time-consuming process. This presentation will show how ESDS International, a specialist service of the Economic and Social Data Service which disseminates and supports aggregate and survey international datasets for the UK academic community, has begun to address this problem by using semantic technologies to expose the World Bank World Development Indicators as Linked Data. We will look at the benefits to be gained from exposing statistical datasets to the web of Linked Data, using the World Development Indicators as an example of how linking to other datasets can lower the barrier to usage of statistical data and increase the utility of a dataset. Finally, we will examine the process of exposing statistical datasets as Linked Data, looking specifically at the Data Cube Vocabulary, a framework which has been created to enable the publishing of statistical information as Linked Data, and showing how this relates to and can be used with SDMX, the main standard for dissemination of aggregate statistical data and metadata.
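For orientation, here is a minimal sketch of a single indicator value expressed as an RDF Data Cube observation; the dataset, dimension and measure URIs and the numeric value are invented for illustration and do not reproduce the ESDS International model (only the qb: namespace is the real Data Cube Vocabulary).

```python
# Minimal RDF Data Cube observation for one indicator value (placeholder URIs).
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")    # real Data Cube namespace
EX = Namespace("http://example.org/wdi/")               # invented dataset namespace

g = Graph()
obs = EX["obs/GB/2010/SP.POP.TOTL"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["dataset/WDI"]))
g.add((obs, EX.refArea, URIRef("http://dbpedia.org/resource/United_Kingdom")))
g.add((obs, EX.refPeriod, Literal("2010", datatype=XSD.gYear)))
g.add((obs, EX.indicator, EX["indicator/SP.POP.TOTL"]))
g.add((obs, EX.value, Literal(62000000, datatype=XSD.integer)))   # illustrative value

print(g.serialize(format="turtle"))
```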
Lisa J Neidert (Population Studies Center, University of Michigan)
A picture is worth 1,000 words; 20 images x 20 seconds is a Pecha Kucha. This Pecha Kucha will present examples of a data service unit engaging with various audiences using a range of tools: webinars, web sites, online voting/surveys, contests, a blog, Twitter, a crossword puzzle, audio interviews and even a roast. These examples include triumphs, whimpers, and lots in between. The initial impetus to embrace varied communication tools was to liven things up in workshops, especially workshops crammed with too much material in too little time; selected techniques were also helpful in an 8:30am required quantitative reasoning class. But over time, I've embraced technology as a way to enhance communication even when the audience is eager and engaged. In ten years, the technology and tools will be different, but the need for data professionals (and others) to reach their audiences will remain. So the major issue is really the message, not the medium. But it is important to learn, evaluate, and embrace new communication technologies. Alas, a technique I haven't ever used is the Pecha Kucha. It is time for one more tool for audience engagement.
As data becomes an increasingly important element within the scholarly record, there is growing recognition that researchers need to easily locate the original dataset that results are drawn from, and to verify and reproduce those results in order to build upon research work. Due to the lack of standards for citing data, locating the original dataset is not yet easy to do, nor widely embedded in researchers' practice. This presentation will describe the approach taken to address this concern by ESDS International, a specialist service of the Economic and Social Data Service which disseminates and supports aggregate and survey international datasets for the UK academic community. Key issues in citing ESDS International-hosted datasets, such as the OECD's Main Economic Indicators, will be highlighted, including: revisions to historical series, frequently updated datasets, multiple publishers, data access restrictions and granularity of the citation. We will explain how we worked with DataCite UK to create bibliographic citations conforming to the DataCite Metadata Schema (http://dx.doi.org/10.5438/0005) and used their Metadata Store to mint Digital Object Identifiers (DOIs). In addition we will illustrate how we have integrated citation information into our data delivery software. Finally we will describe future challenges, primarily user adoption and publisher buy-in.
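To show what the end product of such a workflow can look like, here is a minimal sketch that assembles a bibliographic data citation from DataCite-style metadata fields; the example record (creators, title, version, publisher, DOI) is invented and is not an actual ESDS International citation.

```python
# Assemble a data citation string from DataCite-style metadata fields (invented record).
def format_citation(record: dict) -> str:
    return ("{creators} ({year}): {title}. Version {version}. {publisher}. "
            "http://dx.doi.org/{doi}").format(**record)

record = {
    "creators": "Example Statistical Agency",
    "year": "2012",
    "title": "Main Economic Indicators",
    "version": "March 2012 edition",
    "publisher": "Example Data Service",
    "doi": "10.0000/example-doi",
}
print(format_citation(record))
```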
OpenMetadata.org: An online portal for sharing and managing statistical data
Abdul Rahim (Metadata Technology North America)
Extensive efforts are being undertaken by data management agencies and users to leverage metadata standards to provide better access to statistical data and knowledge. This, however, requires combining domain expertise, infrastructures, tools, and technologies, an endeavor that can be challenging even for larger organizations. Metadata Technology North America and Integrated Data Management Services are at the forefront of such efforts. As such, we recognize the need to provide free or low-cost access to shared infrastructure and tools for fostering metadata-driven data management and sharing. OpenMetadata is our attempt at achieving this objective. The web-based portal, to be unveiled at IASSIST 2012, aims to provide agencies and users with an online space for sharing metadata and gaining access to useful tools and utilities around the DDI, SDMX and related standards. Furthermore, it will provide detailed provenance information, directing users to sources of data to apply for access and/or leverage other services. The presentation will provide an overview of the initial set of tools and services available through the portal, including OpenDDI, and outline our long-term vision.
Bringing them in: what 5 years of data can tell us about growing a data service
Kristi A Thompson (University of Windsor)
The University of Windsor Academic Data Centre, a walk-in service for help with data analysis and statistical software, opened in 2006. Since opening we have kept data on every user to walk in our doors, recording their status, department or major, project type and how they found out about the service. This presentation will delve into the data with graphs, charts and other analysis to see what it can tell us about how to keep bringing them in!
DDI Lifecycle and Qualitative Data: Development of a Formal Model
Arofan Gregory (Metadata Technology North America)
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
In December of 2011 the DDI Qualitative Working Group began the formalization of a model describing qualitative studies and mixed-method studies, to complement the existing DDI-Lifecycle model describing quantitative studies. From a large number of collected use cases, it was determined that a very general model would be needed. While not yet in final form, this model borrows from several existing sources such as QuDEX and TEI, and aims to support the description of CAQDAS-documented multi-media data, text analysis, and more traditional interviews, texts, video, audio, and image collections. It will support collections and sub-collections, annotations and coding at the level of the files, and also within files of different types. Further, it will provide a rich toolset for linking between and among qualitative and quantitative data objects. The end result of this work will serve as the basis for the enhancement of DDI-Lifecycle in a future release.
Election Studies: A Research Data Management Challenge
Laurence Horton (GESIS-Leibniz Institute for the Social Sciences)
Alexia Katsanidou (GESIS-Leibniz Institute for the Social Sciences)
Election studies present a research data management challenge due to the size and diversity of the data they collect in a concentrated period. We consider how data infrastructure can support election studies to ensure the data they produce is good quality, reusable comparative data. Election studies are critical large-scale data investments. When fully realised, election studies provide an unparalleled spatial and temporal resource for studying social attitudes and political behaviour. They have contemporary value for researchers but also serve as long-term data investments with prospective comparison across time and in comparative cross-national analyses. However, election studies operate under circumstances exceptional among long-term longitudinal or repeated cross-sectional studies. Elections may occur at irregular intervals or as sudden 'snap' elections. Consequently, studies may have a concentrated period in which to design and collect data. Here the risk is that data management concerns are relegated to irrelevant or secondary consideration under the pressure to produce publications, with the result that data is not comparable across either time or spatial units. We seek to ensure election studies produce well-documented data, collected in compliance with recognised data harmonisation standards, by ensuring data management planning and metadata are incorporated and implemented in the design and execution of the research.
Sensitive Data: Organisational Aspects of Confidentiality Protection and Privacy Concerns
Reza Afkhami (UK Data Archive)
The purpose of this study is to extend the legacy threat-vulnerability model to incorporate human and social factors. This is achieved by presenting the dynamics of threats and vulnerabilities in the human and social context. We examine costs and benefits as they relate to threats, exploits, vulnerabilities, defence measures, incidents, and recovery and restoration. We discuss privacy concerns and the implications of implementing employee surveillance technologies and we suggest a framework of fair practices which can be used for bridging the gap between the need to provide adequate protection for information systems, while preserving employees' rights to privacy. Organizations have to prioritize the security of their computer systems in order to ensure that their information assets retain their accuracy, confidentiality, and availability. While the importance of the information security policy in ensuring the security of information is acknowledged widely, there has been little empirical analysis of its impact or effectiveness in this role.
Nicole Quitzsch (GESIS - Leibniz Institute for the Social Sciences)
The classic form of dissemination of scientific data is to publish only the results of data collection. These results are published in professional journals, usually without the underlying data. Data should no longer be exclusively part of a scientific publication. This is why GESIS - Leibniz Institute for the Social Sciences and the ZBW - Leibniz Information Centre for Economics decided to implement da|ra, the DOI registration portal for German social and economic data. The datasets receive unique DOIs as citable identifiers, together with all relevant metadata information. Persistent identifiers, together with their bibliographic information, provide the opportunity to find and to cite primary data in scientific publications; the data can be cited unambiguously via the DOI. A precondition for the citation of data is the creation of quality metadata. The da|ra metadata schema consists of mandatory and optional fields. It is compliant with other important metadata standards to ensure metadata interoperability. This presentation will deal with the different steps that need to be taken in the data publishing process and focus on how new features are included in the already existing DOI service. It will show that using the service enables scientists to comfortably discover, find and access scientific data.
Johns Hopkins University Data Management Services: Reviewing Our First Year
David S Fearon (Johns Hopkins University)
Betsy Gunia (Johns Hopkins University)
Of the growing number of academic libraries helping researchers with data management, Johns Hopkins University has one of the first “full service” infrastructures providing data planning consultation and a repository, JHU Data Archive, built specifically for research data. Although drawing upon the expertise of our partner, the Data Conservancy, we are continually evolving our service model, and testing by trial its sustainability and accommodation of diverse practices among disciplines. We will report on the first year of our Data Management Services program, focusing on planning support for NSF's data management plan requirements, and the particular needs of social science. We have developed tools, such as a questionnaire and in-person meetings, for helping researchers with NSF's 2-page plan, and project management workflows for depositing data into the JHU Data Archive. We will discuss outreach strategies for publicizing our services, and incentives for researchers to invest in data preservation and sharing. Case examples from working with a range of social sciences illustrate data management issues distinct from “big data” sciences, such as sharing data with personal identifiers, managing qualitative research, and multi-disciplinary collaborations. With data management requirements expanding among funders, innovations by academic libraries are of broad interest to data curation professionals.
Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students
Maria A Jankowska (Charles E. Young Research Library, UCLA)
This paper argues for collaboration among faculty, academic subject specialists, data librarians, GIS specialists, and data curators in order to respond to growing graduate student demand for digital statistical information and data. It presents weaknesses and strengths of a model operating at the Charles E. Young Research Library at the University of California, Los Angeles. The paper outlines a process in which graduate students benefit from having access to multiple points of service. The diversity of service points fosters relationships among all participants and improves communication between instructors and library staff, ultimately strengthening services offered to graduate students. Major challenges to the ongoing successful partnership include the availability of needed resources and sustainability of the model, which fulfills graduate students' needs for numeric, statistical, and geospatial data.
Establishing Collaborative Networks in Sporting Data
Carol Perry (University of Guelph)
Michelle Edwards (University of Guelph)
Over the past decade, it has been standard practice in academic institutions for data centres to be the primary location for services related to supporting data. With the advent of changing funder requirements related to research data, data support has quickly become foremost on the minds of administrators across campuses. At the University of Guelph, seemingly disparate groups are now collaborating to provide a suite of services in support of research and the data it produces. This presentation will probe the emerging trend of bringing together expertise from different stakeholders in order to streamline services while enriching the level and depth of support available to researchers.
Collaborative Data Management: Best Practices throughout the Data Life Cycle
Amber Leahey (Scholar Portal, Ontario Council of University Libraries)
Jacqueline Whyte Appleby (Scholar Portal, Ontario Council of University Libraries)
Steve Marks (Scholar Portal, Ontario Council of University Libraries)
While there is increased recognition of the value of rigorous data management, budgets and resources for this kind of activity are stagnant or decreasing. Perhaps because of this, there has been a growing interest in pursuing collaborative efforts to implement best practices throughout the research data life cycle. Effective collaborations can be local, involving individual researchers or research teams, or large-scale initiatives involving multiple institutions in either informal relationships or formal partnerships such as consortia. When data is collected, processed, archived, or disseminated as part of a collaborative process, the potential for problems is heightened, but so are the rewards. This session will look at examples of effective collaborative data management at all stages of the data life cycle, and consider some of the challenges and potential successes at play when we work together to improve data collection, preservation, and access. Examples will range from landmark projects to emerging initiatives, and include case studies from the Ontario Council of University Libraries (OCUL), an academic library consortium.
2012-06-07: C2: DDI Implementation, Production, and Migration
Migrating a Large Collection to DDI-Lifecycle
Wolfgang Zenk-Möltgen (GESIS, Leibniz Institute for the Social Sciences)
The GESIS Data Archive holds about 5000 studies, mainly social science surveys. The documentation of these studies consists of study descriptions, variable descriptions with questions and answers, and other material like methodological information. The datasets are documented by tools that use the DDI-Codebook metadata standard (formerly known as DDI 2). Since 2008, the DDI Alliance has published the DDI-Lifecycle standard (DDI 3/DDI-L), that focuses on re-usable documentation and the support of the full research data lifecycle. To use some of the many advantages that DDI-L provides, a migration of the available documentation should be conducted. The talk will focus on the benefits and challenges of such a migration project, and will show possible options during that process. The use of the recently published DDI version 2.5 will be considered because it aims at making the migration to DDI-L easier. The support of software for the format conversion and for the necessary re-arrangement of documentation parts will be investigated. The consequences of such a migration project for the future maintenance of the data and documentation will be shown.
Since 2010, the Social Science Japan Data Archive has been studying DDI and developing the Easy DDI Organizer (EDO). EDO is a tool which helps researchers to conduct social surveys. It enables researchers to record survey metadata such as study purpose, sampling procedure, data collection, question, and variable descriptions along the data lifecycle. This year, the following new features were added to EDO: SPSS file import, a question sequence manager, codebook/questionnaire export, and an English interface. We will introduce these new features in the presentation.
Integrating DDI 3-based Tools with Web Services: Connecting Colectica and eXist-db
Johan Fihn (Swedish National Data Service)
Jeremy Iverson (Colectica)
The Swedish National Data Service (SND) maintains metadata about its holdings in the Data Documentation Initiative's DDI-Lifecycle format. The holdings amount to over one thousand studies, both quantitative and qualitative. SND stores and indexes this metadata using eXist-db, an open source XML database. Colectica is another DDI 3-based tool, but by default it uses a different repository structure for storing metadata. In order to allow Colectica tools to interact with SND metadata, we implemented a set of web services on top of eXist-db that allow Colectica to store and load information using eXist-db. We will demonstrate the functionality provided by the eXist-db system, discuss the steps we took to integrate with Colectica, and demonstrate the resulting functionality with the two systems working together. We will also present recommendations on how DDI repositories and DDI tools in general can interact. Our approach could also be implemented by other DDI repositories.
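As a rough sketch of the underlying storage layer, the following Python snippet stores and retrieves a DDI document through eXist-db's built-in REST interface; the host, collection path, file names and credentials are placeholders, and the actual SND web services layer adds its own Colectica-facing operations on top of this.

```python
# Rough sketch: store and load a DDI 3 instance via eXist-db's REST interface.
import requests

BASE = "http://localhost:8080/exist/rest/db/ddi"        # assumed eXist-db instance
AUTH = ("admin", "password")                            # placeholder credentials

# Store (or overwrite) a DDI-Lifecycle instance in the collection.
with open("study-instance.xml", "rb") as f:
    resp = requests.put(f"{BASE}/study-instance.xml", data=f,
                        headers={"Content-Type": "application/xml"}, auth=AUTH)
resp.raise_for_status()

# Load it back, as a DDI tool consuming the repository would.
ddi_xml = requests.get(f"{BASE}/study-instance.xml", auth=AUTH).text
print(ddi_xml[:200])
```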
Metadata-driven Survey Design at the Australian Bureau of Statistics
Samuel C Spencer (Australian Bureau of Statistics)
DDI provides survey methodologists with ample metadata to describe statistical survey design. However, it is widely recognised that description of processes is not enough; we must be able to use metadata to drive statistical workflows. One particular goal is the ability to automatically generate personalised and dynamic electronic forms from structured metadata. The Australian Bureau of Statistics has conducted research examining how this can be achieved through the novel use of XML technologies to enhance standard DDI 3.1 XML. By examining the use of XSL transforms and XPath specifications embedded in DDI Lifecycle metadata, we can bring metadata-driven 'industrialisation' to statistical processes. To demonstrate this capability, this talk presents a case study from the Australian Bureau of Statistics featuring a late-stage prototype of an XSLT system that automatically creates dynamic web forms with complex question sequencing and word substitution in questions from DDI Lifecycle XML. This is demonstrated using DDI 3.1 describing a complex series of ABS instruments: the Monthly Population Survey, which includes the Australian Labour Force Survey as well as supplementary questionnaires.
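A minimal sketch of the transformation step itself, applying an XSLT stylesheet to a DDI 3.1 instrument description to produce an HTML form; the instance and stylesheet file names are placeholders, and the ABS prototype's own transforms (question sequencing, word substitution) are considerably more elaborate.

```python
# Apply an XSLT stylesheet to a DDI 3.1 instance to generate an HTML web form.
from lxml import etree

ddi_instance = etree.parse("labour-force-instrument.ddi.xml")   # placeholder DDI 3.1 file
transform = etree.XSLT(etree.parse("ddi-to-webform.xslt"))      # placeholder stylesheet

html_form = str(transform(ddi_instance))
with open("webform.html", "w", encoding="utf-8") as out:
    out.write(html_form)
```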
2012-06-07: C3: Technological approaches to enhancing data interoperability
Supporting the Sharing of Longitudinal Health Data
Veerle Van den Eynden (UK Medical Research Council)
The Data Support Service project of the UK Medical Research Council (MRC DSS) developed a Research Data Gateway to enable deep discovery of MRC-funded population and patient studies and their datasets and variables. The Gateway enables researchers to find and explore variables across longitudinal cohort studies, to support data linkage for new research. A federated approach is used, whereby studies are responsible for storing, preserving, curating and disseminating data, and publish standardised metadata into the gateway. The system uses the Drupal content management system and Apache Solr search and browse functionality, with metadata organised into modular units representing studies, time periods, collection events and variables. Users can search and discover variables across studies and export baskets of variables to request access to data. The directory holds over 45,000 variables for four case studies: the Avon Longitudinal Study of Parents and Children (ALSPAC), the National Survey of Health and Development (NSHD), the Southampton Women's Survey (SWS), and Whitehall II. Variables for a further ten cohort studies are being incorporated. Development towards a DDI 3.1 metadata exchange standard is ongoing, enabling metadata from diverse formats and structures to be ingested into the gateway. MRC DSS also works closely with research units towards integrated data management planning.
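To give a feel for the search side of such a gateway, here is a minimal sketch of a variable search against an Apache Solr index; the host, core name and field names are assumptions for illustration, not the MRC DSS configuration.

```python
# Illustrative variable search against a Solr index (placeholder core and fields).
import requests

SOLR_SELECT = "http://localhost:8983/solr/variables/select"   # placeholder Solr core

params = {
    "q": "variable_label:smoking",      # free-text search over variable labels
    "fq": "study:ALSPAC",               # filter to a single cohort study
    "fl": "variable_name,variable_label,study,collection_event",
    "wt": "json",
    "rows": 20,
}
response = requests.get(SOLR_SELECT, params=params).json()
for doc in response["response"]["docs"]:
    print(doc["variable_name"], "-", doc["variable_label"])
```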
Data without Boundaries: A DDI-Based Metadata Model for Supporting Cross-National Data Discovery
Arofan T Gregory (Metadata Technology North America)
This presentation discusses the work of Data without Boundaries Work Package 8, which is exploring the requirements for a joint Europe-wide portal for the discovery of microdata held by statistical agencies and social science data archives across Europe. While European researchers may be familiar with the data holdings in their own national archives and statistical offices, they may be less familiar with holdings in other European countries. In support of this work, a survey of the various organizations' metadata holdings has been conducted, and work undertaken to produce a metadata model for implementation in Work Package 12. This metadata model will span both the Data Documentation Initiative, for documenting microdata, and the Statistical Data and Metadata Exchange (SDMX) model, for aggregate data holdings in the statistical offices. Aggregate data will be indexed and linked to microdata holdings to provide improved discovery capabilities for European researchers. Similarities to the ongoing work on RDF expressions of SDMX and DDI are also explored.
Going Local with a World Class Data Infrastructure: Enabling SDMX for Research Support
Rob Grim (Tilburg University)
At Tilburg University, tools are needed to support the workflows of researchers. This paper reports on the use of SDMX to build the World Taxation Indicators portal. The project aims to fill data gaps that limit research on taxation and to enhance the visibility of taxation research methods and concepts. SDMX is used to capture and register both metadata and research data that are collected in addition to data that are publicly available. An SDMX registry is used to populate a metadata repository, and an SDMX repository is used to store the taxation indicators and the time series data collected by a macroeconomic research group. SDMX was chosen as the preferred technology because the standard interoperates with the existing infrastructure for statistical data exchange and can be used for cross-disciplinary research support. The CARDS (Controlled Access to Research Data Storage) project was funded by SURFfoundation and ran from January to December 2011.
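A hedged sketch of what retrieving one such indicator might look like over a standard SDMX 2.1 RESTful data query; the endpoint, dataflow identifier and series key are invented, not Tilburg's actual service:

    # Query a time series from a hypothetical SDMX 2.1 REST endpoint and print the values.
    import requests
    import xml.etree.ElementTree as ET

    ENDPOINT = "https://sdmx.example.org/rest"   # hypothetical registry/repository
    FLOW = "WTI"                                 # hypothetical World Taxation Indicators dataflow
    KEY = "NLD.CIT_RATE.A"                       # hypothetical key: country.indicator.frequency

    url = f"{ENDPOINT}/data/{FLOW}/{KEY}?startPeriod=2000&endPeriod=2011"
    resp = requests.get(url, headers={"Accept": "application/vnd.sdmx.genericdata+xml;version=2.1"})
    resp.raise_for_status()

    # In SDMX-ML generic data, each observation carries its value on an ObsValue element.
    root = ET.fromstring(resp.content)
    values = [obs.get("value") for obs in root.iter() if obs.tag.endswith("ObsValue")]
    print(values)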
Administrative and Survey Data: DDI-based Documentation for a Combined Analysis
Marcel Hebing (German Institute for Economic Research (DIW Berlin))
David Schiller (The Institute for Employment Research (IAB))
In the search for more powerful data, resources are often created by combining data from different sources, e.g. administrative and survey data. Such merged data sets can only serve the scientific community if they are of high quality. Data documentation is therefore of vital importance, and no easy task: data drawn from two different sources need adjusted, standardized and easy-to-understand documentation. The DDI standard can fulfil these needs. The Institute for Employment Research (IAB) and the German Institute for Economic Research (DIW Berlin) are two major data providers in Germany, the IAB for administrative data and the DIW Berlin for survey data (the German Socio-Economic Panel, SOEP). In this presentation the authors will show the challenges in implementing standardized metadata documentation, the importance of well-suited documentation for data quality, and the advantages of agreed data documentation for the comparison and combination of datasets. The focus will lie on the Data Documentation Initiative (DDI), a metadata standard for research data.
2012-06-07: C4: Practical approaches of record linkage in RDCs
Panel: Practical Approaches of Record Linkage in RDCs
Stefan Bender (Research Data Centre (FDZ), the Institute for Employment Research)
Christopher Gürke (Federal Statistical Office)
The planned session deals with the application of record linkage by several German institutions. The paper "German Record Linkage Center (GRLC)" describes the GRLC, a long-term infrastructure facility whose main goal is to increase the number and quality of record-linkage applications, in order to increase the analytical power of existing data and to unlock new data sources for research. Afterwards, two practical applications of record linkage are presented. The paper "The project 'Combined firm data for Germany': Access to combined business micro data" is about a research project carried out by different institutions that provide researchers with enterprise-level micro data. In the course of the project, data of the participating institutions have been merged. Because of the lack of a direct identifier, the process of data integration has been very complex and time consuming, which is why record linkage was used. In this context, different string comparisons had to be tested and evaluated. The final paper "German census 2011 as a mixed method design" has a related background: for this year's census it was decided, for the first time, not to interview every household but to use a register-based design. Thus, the final data will be a mix of register-based complete census and sample survey. Alongside a description of the assessment method used, the presentation will introduce the way the collected datasets are merged. Methodologically, this procedure is appealing because the data used have no common identifiers. Additionally, the presentation will introduce the statistical generation of households. Overall, the session should be of interest for all conference participants dealing with data integration and the application of record linkage.
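A toy illustration of the string-comparison step mentioned above (the names, normalisation rule and threshold are invented and do not reflect the GRLC's actual procedures):

    # Link firm records that lack a common identifier by comparing normalised names.
    from difflib import SequenceMatcher

    def normalise(name):
        """Crude normalisation: lower-case, drop a legal-form token, unify '&'."""
        return " ".join(name.lower().replace("gmbh", "").replace("&", "und").split())

    def similarity(a, b):
        return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

    register_a = ["Mueller Maschinenbau GmbH", "Schmidt & Sohn"]
    register_b = ["Müller Maschinenbau", "Schmidt und Sohn GmbH"]

    for a in register_a:
        for b in register_b:
            score = similarity(a, b)
            if score > 0.8:                       # illustrative acceptance threshold
                print(f"candidate link: {a!r} <-> {b!r} (score {score:.2f})")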
2012-06-07: D1: Supporting online access to geospatial, micro-, and qualitative data
VizLab: A Tool for the Interactive Exploration of Geospatial Election Data on the Web
Adam Schaal (The Center for Socio-Political Data, Sciences Po University)
Advanced desktop tools for geospatial data management and visualization are nothing new. What is new is the development of advanced online tools that bring some of the same functionality to the web and, in doing so, make the data more accessible to a wider audience. Each of these solutions has its strengths and weaknesses, depending on the goals of the application. With a lack of complete solutions (particularly non-commercial ones), custom application development is sometimes needed to meet an institution's goals. Requiring an online visualization and exploration tool for its election and demographic data holdings, the CDSP has developed an application that provides features key to its research community. Unlike most existing tools, it lets the user create custom variables from those already available. Further, users can analyze multiple variables simultaneously through a combination of choropleths, proportional circles, charts, and tables. The analysis of geospatial data across changing administrative boundaries is also supported. Additionally, the user is able to save the online application state for later analysis, as well as for sharing with colleagues. The application was developed using entirely free and open source tools.
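A toy sketch of the custom-variable idea described above: derive turnout from two existing columns and bin it into classes for shading (the areas, figures and three-class scheme are invented):

    # Derive a custom variable (turnout) and assign equal-interval classes for a choropleth.
    votes_cast = {"Paris": 1_200_000, "Lyon": 310_000, "Lille": 150_000}   # invented figures
    registered = {"Paris": 1_650_000, "Lyon": 420_000, "Lille": 230_000}

    turnout = {area: votes_cast[area] / registered[area] for area in votes_cast}

    lo, hi = min(turnout.values()), max(turnout.values())
    width = (hi - lo) / 3 or 1                    # three equal-interval classes
    classes = {area: min(int((value - lo) / width), 2) for area, value in turnout.items()}
    print(turnout)
    print(classes)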
Open Source Solutions for Open Microdata: The IHSN Tools
Matthew Welch (World Bank)
Olivier Dupriez (World Bank)
A key objective of the International Household Survey Network (IHSN) is to provide data management tools to producers of microdata. These cover all phases of survey implementation, from survey design to data dissemination. Products include a suite of free, open source, DDI-compliant data curation tools known as the IHSN Microdata Management Toolkit. The Toolkit is used by national statistics offices, the main data producers in developing countries, and increasingly by universities and large international development agencies. This presentation will discuss the latest version of our data dissemination application, the National Data Archive, as well as our future roadmap for this tool. We will also introduce our Question Bank and our new citations tool and central survey catalogue. Usage examples will include those at a large international development agency, a university and some national statistics offices.
Implementation of DDI in the National Institute of Statistics and Geography of Mexico
Eric M Rodriguez (National Institute of Statistics and Geography)
The National Institute of Statistics and Geography (INEGI) of Mexico has more than 10 years of experience in producing metadata. This work has been carried out to satisfy the different needs of information users. Adoption of DDI began in 2009, with the objective of integrating the existing metadata projects to produce more detailed metadata and to develop a metadata system. Adopting the standard implied reviewing the different metadata, producing a metadata vocabulary for INEGI, and developing materials to facilitate documentation. This presentation will detail the activities carried out for the adoption of DDI at INEGI, as well as progress to date and perspectives for the future.
2012-06-07: D2: Infrastructure to support research data management
Data Service Infrastructure for the Social Sciences and Humanities, DASISH
Michelle L Coldrey (Swedish National Data Service)
DASISH (Data Service Infrastructure for the Social Sciences and Humanities) brings together five ESFRI infrastructures in the social sciences and humanities, namely CLARIN, SHARE, ESS, CESSDA and DARIAH, with a focus on common activities across the disciplines and infrastructures and the aim of providing solutions to common problems, solutions that will strengthen international collaboration. Among the participants are six CESSDA members: DANS, FSD, GESIS, NSD, SND and UKDA. DASISH is an EC 7th Framework project with a budget of €6 million and is coordinated by SND. This presentation will elaborate on four main areas of the DASISH collaboration: 1. the digital research environment, which deals with data quality assurance in the production of primary data in the social sciences; 2. tools and services, which deal with continuous data enrichment, metadata quality assurance, persistent identifiers and discipline-specific research workflows; 3. common data services, which concern digital preservation techniques, quality assessment of data centres, and access to and availability of data through networks of community-based solutions and how this will be accomplished; and 4. legal and ethics, which deals with the legal and ethical issues involved in preserving data and making them available.
Managing and Sharing Data within the Collaborative Research Center SFB882: Concepts and Requirements of a VRE for the Social Sciences
Johanna Vompras (University Library Bielefeld)
Wolfram Horstmann (University Library Bielefeld)
The recently founded Collaborative Research Center (SFB) 882 "From Heterogeneities to Social Inequalities" at Bielefeld University provides a framework for 17 sub-projects on social inequalities. Through the Virtual Research Environment (VRE) promoted by the INF project (an integral part of the SFB), advisory and development services in the domain of information infrastructure will be provided. The VRE combines a work platform with a project-specific research platform. The work platform bundles IT resources by bringing together various tools for administration, project management, and time- and location-independent collaborative work in a single environment adapted to researchers' specific working processes. The research component combines data management with further developments of social science methodologies. It provides services for archiving and re-use of datasets and is responsible for the infrastructural and methodological coordination of the data documentation. Almost the entire data life cycle is documented by implementing and applying DDI 3 across all projects. In this talk, we present the challenges and barriers in designing and building a VRE for the social sciences. We mainly focus on documentation aspects, especially on the unification of the varying documentation requirements that arose. Additionally, various key issues of research collaboration, as an outcome of the user requirement analysis, will be discussed.
ADA-Lab - A Virtual Laboratory for Australian Social Science Research
Steven McEachern (Australian Data Archive, Australian National University)
This paper presents an overview of ADA-Lab, a virtual research laboratory for Australian social scientists. The recently completed ASeSS project established the Australian Data Archive (ADA), launched in August 2011 (www.ada.edu.au). ADA is now working to extend these services by developing ADA-Lab, which will advance our current services by: [a] continuing the extension and diversification of ADA's holdings into new thematic sub-archives and data formats; [b] further developing the ADA visualisation tools; and [c] creating the underlying architecture and capacity of the existing cloud service to support a viable virtual laboratory for ADA. The new ADA-Lab environment will extend the existing ADA storage and online analysis facilities to provide an integrated environment for data access and computation using ADA and external data sources in a high-performance setting. The ADA-Lab environment consists of four elements: (1) a data storage foundation; (2) a web analysis environment; (3) a remote access environment; and (4) a secure environment. The ADA-Lab development program will also provide a new set of researcher and archivist tools for enabling existing and new ADA data to be used in ADA-Lab. The paper will provide an overview of the core ADA-Lab infrastructure, implementation plans, and progress to date.
2012-06-07: D3: Towards an integrated model for access to official microdata in Europe: First findings from the FP7 DwB project
Panel: Towards an Integrated Model for Access to Official Microdata in Europe: First Findings from the FP7 DwB Project
Paola Tubaro (Centre National de la Recherche Scientifique (CNRS) - Réseau Quetelet)
David Schiller (Institute for Employment Research (IAB))
Roxane Silberman (Centre National de la Recherche Scientifique (CNRS) - Réseau Quetelet)
Marion Wittenberg (Koninklijke Nederlandse Akademie van Wetenschappen - Data Archiving and Networked Services (KNAW-DANS))
Raphaelle Fleureux (Centre National de la Recherche Scientifique (CNRS) - Réseau Quetelet)
The session will present for discussion some preliminary outputs from the FP7 Data without Boundaries project on standards and harmonization issues for discovery, metadata, accreditation and access to official microdata within the European context. M. Wittenberg, M. Priddy and J. Shepherdson ("Desired portal functionality for effective resource discovery on OS data across Europe") present the work undertaken on desired portal functionality for effective resource discovery: researchers' expectations, constraints due to diversity in the content and technical level of the various resources, and metadata issues; they also identify the next steps towards a possible architecture for the portal. C. Jayet, R. Fleureux, A. Mack and Ch. Wolf ("Servicing researchers in the use of European OS microdata") set forth the tools employed to cope with the difficulties encountered in using the existing metadata standards (DDI 2 and 3) in the specific European case, when documenting study programmes, studies and datasets from numerous institutions and countries, and the metadata scheme proposed for documenting European official microdata at both national and European level with the aim of building a Service Centre for OS microdata in Europe. R. Silberman, P. Tubaro, M. Cros and B. Kleiner ("Access to official microdata and researcher accreditation: State of the art and future perspectives in Europe") map similarities and differences in national access and accreditation arrangements while pointing at commonalities for interoperability and, ideally, the adoption of common standards. D. Schiller ("A pilot for a European Remote Access Network - first results and upcoming challenges") presents preliminary outputs about the concept and architecture of a European distributed remote access system for confidential official microdata, coping with current legal constraints within Europe, security and standards issues, technical feasibility and researchers' needs.
2012-06-07: D4: Preserving and organising qualitative social science data resources for sharing and re-use
Panel: Preserving and Organising Qualitative Social Science Data Resources for Sharing and Re-use
Ruth Geraghty (National University of Ireland)
The session will be the shared responsibility of the DDI Alliance Qualitative Data Exchange Working Group (contact: Anne Sofie Fink, asf@dda.dk). Qualitative social science research generates complex data in a wide range of formats, including text, audio and visual materials. Making such materials available and usable to other researchers for secondary analysis requires professional standards of documentation and preparation of data, appropriate licensing arrangements, and ongoing data management and user support. A key initiative on standards and data management has been taken by the DDI Alliance Qualitative Data Working Group. The group has proposed a conceptual meta-model adding to DDI-Lifecycle for qualitative data resources. An accompanying paper presents the challenges for a unified approach towards managing, archiving and accessing qualitative data, the use of a controlled vocabulary for producing metadata, and a requirements analysis. Additionally, the paper introduces a number of cutting-edge tools for sharing and re-using qualitative data developed by data archives, universities and other organisations. The session invites papers addressing all issues related to the title: “Preserving and organising qualitative social science data resources for sharing and re-use”.
2012-06-07: E1: National Data Landscapes: Policies, Strategies, and Contrasts
Open Data in Slovenia: An Assessment of Accountability among Stakeholders
Janez Stebe (Social Science Data Archives, University of Ljubljana)
The Slovenian Social Science Data Archives (ADP) is running a project with the short title 'Open Data'. Its goal is to articulate a general strategy for open research data access in Slovenia, across all scientific disciplines and in accordance with OECD principles. In order to assess the initial conditions in the area of data handling in the country, we conducted a series of semi-structured interviews with different target stakeholders. The emphasis was on accounts of the problems foreseen in open data access and on suggested solutions to overcome barriers. We would like to present the results of that preliminary study. The questions concerned the accountability and mutual expectations of various stakeholders such as data creators and users, specialised services (e.g. research libraries) and policymakers. We inquired about the existence of specialised competencies and the need for collaboration among different professional communities. Comparison with studies of the situation in other countries shows prevailing similarity. The Slovene scientific community expressed a high level of awareness of best practices in data handling that stem from intensive international cooperation. Concern was expressed about the lack of incentive policies to motivate additional data sharing and the limited support infrastructure, together with the need to develop additional competencies in information-service-oriented activities.
Strategies for Promoting the Use of the Survey Research Data Archive
Meng-Li Yang (Center for Survey Research, RCHSS, Academia Sinica)
The Survey Research Data Archive (SRDA) at Academia Sinica in Taiwan holds large collections of data from both government and academia. In a recent survey of more than 3,000 active researchers in the humanities and social sciences, however, 52% had not heard of SRDA. Among those who had tried SRDA but gave up (N=257), 17% complained that the information provided about data sets was insufficient for an effective search, and another 17% similarly said that they could not find what they needed. The improvements that SRDA must make are sketched in these survey results: promotion of SRDA, and better and more metadata for the archived datasets. For these goals, we have devised several strategies. First, produce demonstrations of uses of survey data sets on the website. Second, prepare PowerPoint introductions to SRDA and send them to college professors for use in class or for research reference. Third, revise the abstracts and keywords of datasets. Fourth, construct metadata in Nesstar. Fifth, add a search function that bridges the gap between the restricted and non-restricted data collections.
Infrastructure for Evidence-based Research and Statistical Education in Russia
Anna Bogomolova (Moscow State University)
Tatyana Yudina (Moscow State University)
Natalia Dyshkant (Moscow State University)
Statistical knowledge is one of the main competences of the next generation of specialists. One country after another has declared statistical education a national priority and launched programmes to improve teaching and training in data understanding and analysis, with universities playing a leading role. Following other countries, the Russian university community is working to enhance statistical education. The first step is to build a modern statistical information infrastructure. Its core is a set of databases designed and maintained under the Moscow State University-based Information System RUSSIA (UIS RUSSIA). The databases hold social and economic indicators at regional and local levels, updated annually and monthly, and are complemented by graphics- and map-based data visualization. Tools for systematic and comparative analysis are provided, and tutorials are available to train users; the first tutorial assists evidence-based investigations in public administration. Work is underway on a public administration domain ontology to integrate statistical indicators and analytical publications. The databases are available free of charge to all Russian universities, colleges, think tanks, academic institutes, NGOs, public libraries, specialists and citizens, and serve to disseminate statistical culture in Russian society. The resource is of special value for local-level authorities, as the Russian national statistical agency Rosstat started to publish local-level data only three years ago. The UIS RUSSIA databases provide for more developed analytical services.
A Tale of Two Eagles: Comparing and Contrasting the Social Science Data Landscapes in the USA and Germany
Stefan Kramer (German Institute for Economic Research (DIW Berlin))
Denis Huschka (German Data Forum)
The landscape of research, resources, involved organizations, and services related to social science data has developed quite differently in the USA and in Germany in the last few decades. For instance, while established (even if still developing) in the USA, the role of a designated “data librarian” in the academic/research environment is something quite new in Germany; while the federal statistical system in the USA is based on the activities of several dozen individual agencies, Germany has a single Federal Statistical Office, along with at least equally important central statistical offices for each of its states; and the practice of conducting a nationwide census of the population is established and largely unchallenged in the USA, but was so controversial in (West) Germany in the 1980s that it would not be undertaken again until 2011. Meanwhile, the ideas and challenges of sharing research data, planning for its management, providing curation services and persistent identifiers at appropriate levels for social science datasets, and developing and sharing best practices increasingly cross national boundaries and provide increasing opportunities for collaboration - between people and institutions in these two nations, and well beyond.
The increase and profusion of flavors of data librarianship, in the forms of service, management, curation and visualization, have left many current data librarians wondering: what's the difference? Did the name of the profession get co-opted into something brand new? Are these "new breed" data librarians remixing or rebranding skills that social science data librarians have acquired over time? Or does the proliferation and change of descriptions actually signal a maturation of the profession? In this paper we will analyze position descriptions from job ads for data-related positions (largely in the US and Canada) from the last four years, present some in-depth case studies of data librarians in various roles, map the positions in their organizational contexts, and submit some suggestions for future job description language and content to the IASSIST audience for scrutiny and reflection.
The State of Education for Data Curation and Librarianship
Susan R Rathbun-Grubb (University of South Carolina)
Funding agencies increasingly require detailed plans for data management, sharing, and archiving during the grant submission process. Researchers may lack the knowledge required to design and implement a feasible plan for long term storage and secondary analysis of their data, such as technical specifications, metadata standards, and archival challenges pertinent to the types of data they will collect. Librarians, information specialists, and grants compliance administrators in university settings are the natural partners of these researchers; however, these partners may also lack the necessary expertise in managing the data lifecycle. This paper will report the extent to which programs in schools of library and information science (LIS) are formally preparing students for positions in data curation and data librarianship. The results of a content analysis (currently in progress) of North American LIS program websites (iSchools caucus members and American Library Association-accredited programs) will be presented. This presentation will offer a comprehensive snapshot of the courses, certifications, tracks of study, centers/institutes, and grant-funded initiatives that are currently available to those who wish to work in data curation, in the hopes of initiating a conversation about the skills and knowledge needed by data professionals and how LIS programs can help to prepare them.
Data Management Training to Support Faculty Research Needs: Lessons Learned
Ryan Womack (Rutgers University)
To build the skills of its Data Team, the Rutgers University Libraries have developed an internal Research Data Management course. The RUresearch Data Team brings together subject librarians, metadata librarians, and the technical staff working with the RUresearch data portal. The goal of the course is to instill in all team members a baseline knowledge of all aspects of data management necessary to support faculty research data needs. The entire team meets approximately monthly for a two-hour class. Between class sessions, group homework assignments and discussion reinforce the concepts. The class modules cover these topics and more: the data model; metadata; controlled vocabularies, ontologies, and linked data; data preservation and reuse; the data lifecycle; use cases; designing a research portal; and workflow management. This presentation will share findings from interviews with course participants and instructors exploring their reactions to the course. What do subject librarians without prior exposure to data issues take away from it? How does learning about the data needs of researchers affect technical staff? What lessons did the Libraries learn from running such a course? This presentation will provide the answers.
Archives as a Market Regulator, or How Can Archives Connect Supply and Demand?
Laurence Horton (GESIS-Leibniz Institute for the Social Sciences)
What do researchers need from archives? What do archives need from researchers? These questions cover two types of researchers that encounter data archives: data creators and data reusers. These groups have different needs, and it is archives that mediate between them. The role of an archive for creators is to support them in producing quality data, metadata and documentation and to facilitate wide and multipurpose data dissemination. By supporting multipurpose reuse to the fullest extent possible, archives help realise the value of public investment in academic research. This paper discusses the optimisation of research data management training and support for research data creators, and of data dissemination and long-term preservation for social science data archives. It outlines the GESIS plan to create a research data management and archive training centre for the CESSDA-ERIC European research area, catering for both data supply and data demand. The training centre will look to ensure excellence in the creation and long-term preservation of reusable data in the CESSDA-ERIC area, contribute to promoting and adapting standards in research data management, and promote data availability and reuse. Finally, the centre will provide and coordinate training on technologies and tools used by data professionals.
2012-06-07: E3: Latin America, Spain, and Portugal Data Organizations and Resources: An Evolving Discussion
Panel: Latin America, Spain, and Portugal Data Organizations and Resources: An Evolving Discussion
Stuart Macdonald (University of Edinburgh)
Luis Martinez Uribe (Juan March Institute, Madrid)
Paola Bongiovani (Universidad Nacional de Rosario, Argentina)
Alyson Williams (Inter-American Development Bank)
Aída Villanueva (Inter-American Development Bank)
Todd Hines (Princeton University)
Patricia Bermudez (Latin-American Faculty of Social Sciences (FLACSO), Ecuador)
Stuart Macdonald will report on the progress of the IASSIST Engaging Spanish Speakers Action Group, which he co-chairs with Luis Martinez Uribe, including the organization of a series of webinars, the preparation of an IASSIST session, and the translation of the main IASSIST landing pages. We will then hear about practice in action in Latin America. Patricia Bermúdez Arboleda will discuss the explosion of research in Latin America and the Andean Region. In particular, for the Latin-American Faculty of Social Sciences (FLACSO) in Ecuador, the challenge is taken up through the institutional project of the Andean Virtual Academic Centre, FLACSO ANDES. The present work explores the creation process of the virtual centre as well as the concrete situations being experienced during its implementation, appropriation and use. Paola Bongiovani will discuss research data access and management initiatives in Argentina. The National Database Systems initiative of the Ministry of Science, Technology and Productive Innovation (MINCyT) includes the National System of Biological Data, the National System of Sea Data, the National System of Digital Repositories, and the National System of Climate Data. Research data access and management legislation was promoted by MINCyT and is now being discussed by the Argentine Congress. The National Council of Scientific and Technological Research is developing the Interactive Platform for Social Sciences Research to create an appropriate environment for data sharing, to allow interdisciplinary approaches and to contribute to the understanding of complex problems. The National University of Rosario is conducting a study to learn researchers' needs regarding repository services for data management and access. This will be followed by presentations from three librarians who will provide information you can take back to your institutions to help answer questions. An overview of the data produced at the Inter-American Development Bank (IDB), as well as the resources compiled by librarians at the IDB's Felipe Herrera Library, will be provided by IDB's Alyson Williams and Aída Villanueva. Todd Hines will conclude with an overview of the major Latin American finance data resources available to non-commercial users.
2012-06-07: E4: Panel: Unlocking the Power of Restricted Data: A Discussion among Researchers, Producers, and Data Service Providers
Panel: Unlocking the Power of Restricted Data: A Discussion among Researchers, Producers, and Data Service Providers
Stefan Bender (Institute for Employment Research, Germany)
Bill Block (Cornell University)
Warren Brown (Cornell Institute for Social and Economic Research)
Tim Mulcahy (NORC at the University of Chicago)
Chuck Pierret (Bureau of Labor Statistics)
Melanie Wright (UK Data Archive)
Never before has there been a time when social scientists had greater access to confidential microdata. Analysts may access these data in real time through data archives, research data centers, remote access facilities, secure data enclaves, virtual data centers, and more. They may also access confidential microdata through buffered access (e.g., remote batch execution), wherein users submit code through an online system and the output is reviewed manually for statistical disclosure concerns. Some data access systems, for example online tabulation engines and remote analysis systems, now even automate the batch execution process and provide output straight back to the user after it has passed through an automated disclosure review process. Speakers in this session will address the advantages and disadvantages of each of these data access modalities, focusing specifically on data protection, analytic utility, cost effectiveness, and researcher convenience.
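As a toy illustration of one common automated disclosure rule, the sketch below suppresses table cells based on fewer observations than a minimum threshold before output is released; the threshold and table are invented, and real systems apply many more checks:

    # Apply a minimum-cell-count rule to a frequency table before releasing it.
    MIN_CELL_COUNT = 10                            # illustrative threshold

    table = {("female", "self-employed"): 4, ("female", "employee"): 152,
             ("male", "self-employed"): 23, ("male", "employee"): 141}

    released = {cell: (count if count >= MIN_CELL_COUNT else "suppressed")
                for cell, count in table.items()}
    print(released)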
2012-06-08: F1: Accessing Historic Records using Modern Tools
Crowdsourcing the Past with Addressing History
Stuart Macdonald (EDINA, University of Edinburgh)
The JISC-funded AddressingHistory project, led by EDINA at the University of Edinburgh in partnership with the National Library of Scotland, has created an online crowdsourcing tool and API which enable a broad spectrum of users (particularly local and family history groups, and genealogists) to combine data from digitised historical Scottish Post Office Directories (PODs) for Edinburgh (1785, 1865 and 1905 in the first instance) with contemporaneous historical maps. The technologies deployed are scalable to the full collection of 670 Post Office Directories covering the whole of Scotland. Phase 2 funding has developed functionality complementary to the original work and broadened the geographic coverage of content. Work included spatial searching and enhancing the geo-parsing process via discrete configuration files. Multiple addresses (i.e. entries where individuals have more than one domestic address) were also made explicit for searching purposes. Additional content for Edinburgh as well as Glasgow and Aberdeen (1881, 1886, 1891), coinciding with census (and an inter-census) years, has been incorporated into the web tool, and a mobile Augmented Reality application will be added shortly. This presentation will discuss in more detail the social and technical approaches adopted in both phases of the project.
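A toy sketch of the parsing step behind such a tool: split one directory entry into fields and look its street up in a small gazetteer (the entry, pattern and coordinates are invented, not the AddressingHistory geo-parser):

    # Parse a Post Office Directory entry and attach approximate coordinates.
    import re

    entry = "Macdonald, John, bookseller, 12 South Bridge"        # invented entry
    pattern = r"(?P<surname>[^,]+), (?P<forename>[^,]+), (?P<occupation>[^,]+), (?P<address>.+)"
    fields = re.match(pattern, entry).groupdict()

    gazetteer = {"South Bridge": (55.9486, -3.1875)}              # illustrative gazetteer
    street = re.sub(r"^\d+\s+", "", fields["address"])            # strip the house number
    fields["coordinates"] = gazetteer.get(street)
    print(fields)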
Three Layers: Investigating the Potential of Data, Records and Context
Michael Jones (University of Melbourne)
Gavin McCarthy (University of Melbourne)
The University of Melbourne's eScholarship Research Centre is currently working on a number of projects requiring the preparation, submission, preservation and dissemination of multiple types of information. Work on the Saulwick Age-Poll Archive involves paper-based and digital archives, a digital guide to the records, micro-data from political polls managed by the Australian Data Archive (ADA), and the proposed development of a context layer to maintain authoritative information on key people, organisations, events and subjects. Similarly, work on important historical social science data collected by Wilfred Prest and the Reverend Robert Richard U'Ren has involved extracting that data from hard-copy archival records for deposit with ADA. Drawing on these examples, we will explore the benefits of treating context, records and data as separate (but interrelated) 'layers', each with their own requirements and limitations. There are challenges involved, particularly in managing the boundaries and information flows between the layers. Addressing these through effective collaboration between 'traditional' archives, data archives and contextual information managers is essential to success, and the needs of all three layers must be considered and balanced to fully realise the potential of historical and current research data and related material.
Digital Reproductions of Authentic Materials for Teaching Early American History: Opportunities and Challenges for Networking Multilingual Records and Historic Maps
Laina Madeline W Padgett (University of Louisiana at Lafayette)
Traditionally, textbooks treating early American history have been written with Anglo-American biases. While many facts are irrefutable, certain socio-cultural perspectives have tainted the multicultural reality of American history. In conventional textbooks, initial English colonies are prominent, yet settlements established contemporaneously by the French and Spanish receive little attention. Growth of Britain's thirteen colonies is presented. However, Spanish development (Florida, Mexico) and French establishments (Canada, Mississippi, Gulf of Mexico) are hardly mentioned. After the American Revolution, the focus is on the nation's westward expansion, including the Louisiana Purchase and Lewis and Clark's Expedition. Yet germane facts remain neglected: Napoleon sold Louisiana due to political uprisings in Saint-Domingue. Sacagawea was indispensable to the success of Lewis and Clark, who might not have survived without her unique communicative capabilities in several Native American languages. Appreciation for America's multicultural past could be enriched by linking multi-perspective narratives to digital reproductions of historic maps, letters, journals, treatises, etc. written in English and other languages. Unfortunately, many historic documents remain inaccessible. This study first considers incorporation of authentic digital resources into educational materials and then explores tensions between “fair use” and institutions' claims of copyright protection over reproductions of public domain works housed in their collections.
2012-06-08: F2: Data management and curation interest group presents: Managing government data assets
Reuse and Remix of Government and Public Sector Data
Minglu Wang (Rutgers University)
Joshua Horowitz (Rutgers-Newark New Jersey DataBank)
Danielle Farrie (Education Law Center)
Peijia Zha (Newark Schools Research Collaborative, Rutgers-Newark)
Data management is integral to the provision of services in most organizations. With the increased global focus on managing data assets, and financial pressures internationally forcing governments to consider innovative methods of maximising returns to investment in data, this panel will draw upon experiences from a variety of service providers to examine some of these issues and highlight strategies that have evolved, and are evolving, in response to these challenges. Minglu Wang will present on data management issues in the reuse and remix of government and public sector data, especially systems built up by research centers and schools for public use.
Maximising Returns to Government Investment in Data: Data Management
Tanvi Desai (London School of Economics)
Data management is integral to the provision of services in most organizations. With the increased global focus on managing data assets, and financial pressures internationally forcing governments to consider innovative methods of maximising returns to investment in data, this panel will draw upon experiences from a variety of service providers to examine some of these issues and highlight strategies that have evolved and are evolving in response to these challenges. Tanvi Desai will examine managing government administrative data to maximise returns to investment in data collection.
2012-06-08: F3: Towards Seamless Connections Between Born-Digital and Hard-Copy Records
2012-06-08: G1: Classification, Harmonization
Research on Cognitive Aspects of Classification: Effects on Metadata Practice and Standards
Daniel W Gillman (US Bureau of Labor Statistics)
John Bosley (US Bureau of Labor Statistics)
Scott Fricker (US Bureau of Labor Statistics)
Metadata practitioners and standards developers typically take classifications as given. Rarely do they look at how these were created and whether they make sense for respondents or data users. This talk will break tradition and discuss this issue. We considered a question in the Current Population Survey in the US on self-employment, called Class of Worker (COW). The COW question reads: “Were you employed by government, a private company, a non-profit organization, or were you self-employed (or working in the family business)?” The basic question was whether these four response options make sense together. Said another way, does the COW classification make sense for data users? Based on research done by the Small Business Administration and independent researchers, we suspected the answer was No. To investigate this, we paid 90 volunteers to come to BLS and classify a set of twelve job description vignettes based on two different groupings of the COW classification. The data show answering COW is a difficult task. Using research from cognitive psychology, we are now able to provide reasons for this, and we propose new roles for metadata practitioners and new considerations for metadata standards developers.
Data Coding and Harmonization: How DataCoH and Charmstats Are Transforming Social Science Data
Kristi M Winters (GESIS - Leibniz Institute for the Social Sciences)
Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Martin Friedrichs (GESIS - Leibniz Institute for the Social Sciences)
Comparative social researchers are often confronted with the challenge of making key theoretical concepts comparable across nations and/or time. One example is the socio-demographic variable 'education'. To operationalize 'education', researchers must review multiple educational systems across nations and/or changing educational structures within one nation over time. Further, researchers have multiple ways to recode education into a harmonized variable, including (inter alia) the Hoffmeyer-Zlotnik/Warner matrix, the CASMIN education scheme, the International Standard Classification of Education, or a harmonized variable provided by the dataset itself. GESIS is developing two electronic resources to assist social researchers. The website DataCoH (Data Coding and Harmonization) will provide a centralized online library of data coding and harmonization for existing variables, to increase transparency and variable replication. DataCoH will initially contain socio-demographic variables used across the social sciences and then expand to discipline-specific variables. The software program Charmstats (Coding and Harmonizing Statistics) will provide a structured approach to data harmonization by allowing researchers to: 1) download harmonization protocols; 2) document variable coding and harmonization processes; 3) access variables from existing datasets for harmonization; and 4) create harmonization protocols for publication and citation. This paper explains DataCoH and Charmstats and demonstrates how they work.
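A toy version of what a harmonization protocol boils down to: an explicit, citable mapping from country-specific codings to one harmonized scheme (the source categories and target scheme below are invented, not DataCoH or Charmstats output):

    # Recode two country-specific education variables into one harmonized attainment variable.
    HARMONIZED = {"low": 1, "medium": 2, "high": 3}       # illustrative target scheme

    protocol = {
        ("A", "no qualification"): "low",                 # hypothetical survey A coding
        ("A", "secondary school"): "medium",
        ("A", "university degree"): "high",
        ("B", "Hauptschule"): "low",                      # hypothetical survey B coding
        ("B", "Abitur"): "medium",
        ("B", "Hochschulabschluss"): "high",
    }

    def harmonize(survey, value):
        return HARMONIZED[protocol[(survey, value)]]

    print(harmonize("A", "university degree"), harmonize("B", "Abitur"))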
A DDI resource package for the International Standard Classification of Education (ISCED)
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
Hilde Orten (Norwegian Social Science Data Services (NSD))
The International Standard for Classification of Education (ISCED) is at the heart of national and international statistical agencies' reporting on education. Over the last years, ISCED is also increasingly used by official and non-official survey programmes in the measurement of educational attainment. This even if ISCED 1997, the current version of the standard up to recently, is quite complex and does not contain a classification of educational attainment. In November 2011, UNESCO launched a new ISCED version. ISCED 2011 has a numeric coding framework, separate classifications for educational programmes and educational attainment, and more details at the level of tertiary education. Conceptual clarifications compared to the previous version are also made, and in sum ISCED has become more user-friendly. It is thus expected that its use will increase in the coming years. The DDI Lifecycle has a storing module called a resource package, that structures materials for publication that are intended for reuse by multiple studies, projects or user communities. This presentation focuses on how ISCED 2011 metadata components usefully can be structured in a DDI resource package, for the benefit of reuse by reference by national and international statistical agencies, as well as official and non-official survey programmes.