WB1: ICPSR at 50: Facilitating Research and Data Sharing
Peter Granda (ICPSR)
Mary Vardigan (ICPSR)
Linda Detterman (ICPSR)
Presented by ICPSR, this workshop will provide instruction in three primary areas: data exploration tools, data sharing, and data management. Participants in the workshop will learn to employ unique approaches to explore and teach with data using two of ICPSR’s most popular data tools: 1) the Social Science Variables Database (SSVD), enabling the data community to search for variables across datasets, and 2) the Bibliography of Data-Related Literature, a continuously updated database of tens of thousands of citations to publications using data held in ICPSR’s collections. Next, participants will explore upcoming 2010 Census data products available at ICPSR as well as data and distinctive offerings found within several of ICPSR’s publicly accessible data collections. Lastly, as ICPSR begins to celebrate its 50th year as a member of the global data community, the workshop will turn to current and future data management strategies by sharing ICPSR’s experience in preparing data management plans, computing and data sharing in secure environments, and administering restricted data contracts electronically.
WC1: Sharing Data with DDI 3 and Colectica
Jeremy Iverson (Algenta Technologies)
Dan Smith (Algenta Technologies)
Colectica is a platform for documenting and sharing data using open standards such as DDI 3. This workshop covers an introduction to DDI 3-based data sharing, followed by a hands-on tutorial in which participants will: document concepts and general study design; design and document surveys; share questions with question banks; create and document datasets; share variables with variable banks; ingest existing datasets, variables, and questions; and publish data documentation on the Web.
WD1: Thematic Mapping of U.S. Census Data in ArcGIS
Nicole Scholtz (University of Michigan)
ArcGIS software is a great tool for making customized maps and doing spatial analysis, but it has a fairly steep initial learning curve. In this hands-on workshop you will gain skill and confidence in creating a very specific kind of map: a thematic map with current or historical U.S. Census data, similar to those made in Social Explorer and SimplyMap. We will also review some basic concepts fundamental to geographic information systems (GIS). You will learn the appropriateness of various free Census tabular and spatial data resources and practice downloading and preparing data for use in ArcGIS. You will join tabular data to spatial data and create thematic maps. We will troubleshoot common projection and join issues, learn best practices for classification and color ramps, and practice adding context layers such as roads and place names. We will create map layouts suitable for export in a variety of file formats. We will look at scenarios that might lend themselves to using ArcGIS and discuss the appropriateness of this and other mapping tools.
WE1: Helping Empower Researchers and Their Institutions to Manage and Share Research Data
Louise Corti (UK Data Archive)
Bethany Brett (UK Data Archive)
In this workshop we will showcase the materials we have produced at the UK Data Archive as part of our support and outreach work on managing research data from the social sciences. The areas we are focusing on are: consent and ethics; IPR; data description; and data formatting. The workshop will use a combination of PowerPoint presentations (reflecting materials given to researchers and support staff) and hands-on activities such as assessing and evaluating authentic consent forms for data sharing, anonymisation exercises, and assessing data formats such as interview transcription. Participants will learn about practical strategies that can help research centres or hubs set up in-house procedures to manage their own live project and legacy data. The workshop will showcase how the UK Data Archive has worked with: 1) Research Ethics Committees, to help them balance data protection against data sharing in their own advice; 2) Continuous Professional Development training programmes within institutions, to support core skills of data handling and data security; and 3) PhD and early career research training courses, to support data awareness and data management.
WA2: Basic Principles of Survey Design
Thomas Lindsay (University of Minnesota)
Andrew Sell (University of Minnesota)
Michelle Edwards (University of Guelph)
What makes a good survey? This interactive workshop will explore the basic tenets of survey design for an audience of data professionals who are not familiar with survey design principles. In the course of the workshop, participants will work with each other and with the leaders to design a simple feedback instrument from beginning to end. Though the primary focus of the workshop will be on instrument design, the group will start by addressing the research questions that drive instrument creation and will work first to formulate a cogent set of research objectives and analytical goals to reach those objectives. With these objectives and goals established, participants will create a focused questionnaire designed to meet them. Using principles largely based on the theoretical work of Dillman and Krosnick, the leaders will guide participants through the various question types and answer methods to explain when and why survey designers use various question types and structures. Working collaboratively in small groups and as a large group, we will start with larger issues of bias, fatigue, and order effects, and will work inward to narrower issues such as Likert scale polarity and how to word scale items as we create a basic survey instrument. Finally, we will briefly discuss issues of usability testing, recruitment, and analysis.
WC2: Prepare and Publish Multilingual Metadata and Aggregate Data in Nesstar. Embed Live Data into Your Website.
Ørnulf Risnes (Norwegian Social Science Data Services (NSD))
Nesstar 4.0, a new version with a great deal of new functionality, has recently been released. The metadata editing and publishing tool Nesstar Publisher is now also available as freeware. The new version includes, among other things, support for multilingual metadata, powerful support for aggregate data, subscriptions/notifications, cell notes/missing data symbols, and embedding of live data into web pages. This will be a hands-on workshop, and participants will learn how to use Nesstar to: prepare and publish survey data with multilingual metadata; produce, prepare, and publish aggregate data (cubes); notify subscribing users about changes in the published material; add thematic mapping capabilities to microdata and aggregate datasets; and embed tables, analyses, charts, and thematic maps into ordinary web pages.
WA1: Free Stars in the Data Universe - AKA – Open Sources of Data
Jane Fry (Carleton University)
Wendy Watkins (Carleton University)
All Canadian universities (and some colleges) are members of the Data Liberation Initiative (DLI) of Statistics Canada. This rich resource answers many of the data questions from our clients – but not all of them! This workshop will show you other stars in the data universe. And keeping in mind the budget restrictions under which many of us operate, we will be concentrating on the open data sources, that is, the free data resources, that are out there. We will be examining open data sources from various countries, with a focus on Canada. Some of the areas we will look at are: social surveys; election surveys; socio-economic country level data; public opinion polls; geographic and environmental information; and student financial surveys. The workshop will be part presentation and part hands-on so the participants will be able to examine these data stars themselves. Another component will be the participants showing us any open data sources that they want to share.
Harrison Dekker (University of California, Berkeley)
R, the open source statistical environment, is growing rapidly in use and increasing in visibility. R has been named one of “7 Computing Languages on the Rise,” has been profiled in the New York Times, and will be featured in an upcoming issue of Forbes. This program will begin with an introduction to R, its features, and the characteristics that have led to its popularity in the data world. Hands-on work will introduce R programming, data manipulation capabilities, basic statistical operations, and graphing functionality. Add-on packages, including graphical interfaces and data mining tools, will be illustrated. Finally, the workshop will discuss how R is being used with “big data”, interacting with databases and large data feeds to perform analysis.
WE2: Developing Effective Data Management Plans using DMP Online
Martin Donnelly (University of Edinburgh)
Sarah Jones (University of Glasgow)
In 2010 the US National Science Foundation (NSF) announced that it will mandate the inclusion of a data management plan with all new funding applications. The NSF is not alone in its efforts to improve accountability for data management, and we are beginning to see Research Councils and funding bodies around the world seeking evidence that adequate and appropriate provisions for data management and curation have been considered from the earliest stages of any publicly funded activity. The Digital Curation Centre has developed DMP Online, a Web-based data management planning tool that assists researchers in creating personalized data management plans to meet specific funders’ requirements. DMP Online has three main functions: to assist researchers in the preparation of basic data management plans at the grant application stage; to help them build and maintain a more detailed DMP during the project’s lifetime; and to enable customized reporting of these activities. This workshop will provide participants with: an introduction to DMP Online and related resources; the chance to work through the development of a plan drawing upon their own experience; and opportunities to share data management concerns and experiences.
A Look at Census Taking in Canada: The Recent Past And Looking Ahead
Ian McKinnon (National Statistics Council)
Ian McKinnon is the Chair of the Canadian National Statistics Council, the group tasked to provide Statistics Canada with its best advice. This advice did NOT include cancelling the long-form census and substituting a voluntary National Household Survey. He is uniquely placed to give IASSIST a view of how this change came about and its implications for Canada and Statistics Canada.
2011-06-01: A1: Recent Developments in the DDI Implementation Landscape I
Can DDI eXist?
Johan Fihn (Swedish National Data Service)
Olaf Olsson (Swedish Language Bank)
Leif-Jöran Olsson (Swedish Language Bank)
This paper describes how the XML database eXist can store and index DDI instances. Technical advantages of the implementation include very fast full-text search in documents, fast and flexible indexing with the built-in Lucene indexing engine, and rapid development of XML web services directly within eXist. We will look at the implementation of the Swedish National Data Service's DDI 3-based question bank in eXist and show examples of indexing and accessing DDI using eXist.
Recent Developments in the DDI Implementation Landscape 1
Arofan Gregory (Open Data Foundation)
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
This panel addresses recent developments in the DDI implementation landscape: new tool developments and related developments in the standards. There has been much activity in the DDI community within the past year. Some existing tools have been extended with major new features; there are several new tools that support and use DDI 3; and there have been developments to the DDI standards by the DDI Alliance itself that will influence ongoing tool support. Discussion covers open-source and freely available tools as well as commercial tools that support DDI 3. This panel summarizes these recent developments and highlights the points of interest. The panel is divided into three blocks: A1, C1, and D1.
Collaboration in Data Documentation: Developing STARDAT - The Data Archiving Suite
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
Providing high-quality data and documentation is a major demand on archives and researchers in the field of survey research. GESIS has been developing tools to standardize documentation processes at the study level and the dataset level (e.g. DBKEdit, a web-based editing system for bilingual study descriptions, or DSDM for language-specific documentation at the variable level). However, the challenges of DDI Version 3 and the collaboration needs at different stages of the data life cycle led to the recognition that an integrated management system for metadata is needed. GESIS has therefore started a project to develop "STARDAT - The Data Archiving Suite". It will be based on DDI 3 and will contain modules for structured metadata capture, management, and administration. STARDAT will be integrated into the workflow between data depositors, data managers, and data users. To this end, a web-based data ingest module will be provided that allows researchers to deliver metadata as soon as a project starts. Requirements of data curation, data documentation, data publication, and long-term availability will be incorporated. STARDAT will allow multi-language documentation at the study and variable levels, as well as the inclusion of further information, e.g. about related publications, classifications, continuity guides, scales, or trends.
Converting MS Word based Questionnaires to DDI: A demonstration application for uses of metadata throughout the data lifecycle
Benjamin Clark (INDEPTH Network)
This tool demonstrates how a structured questionnaire design can be leveraged to harvest metadata, which can then be used to drive tasks downstream in the data lifecycle with greater ease. The application takes in a questionnaire document designed in Microsoft Office Word 2007 (.docx format) and uses LINQ to extract the different meta components that make up the questionnaire. The extracted components are then translated into a XAML document that describes the data entry screen and a corresponding DDI document that describes the questionnaire. These two documents are then used to drive the data entry process and perform basic validations such as restricting entered values to the codes in a coding scheme, basic data typing, data length checks, and skips. Once the data is entered, the DDI is again leveraged to configure the export process, which allows for selecting which variables to export and filtering by error status. The export process also uses the DDI to construct setup files for Stata and produce customized user documentation. Overall, this application demonstrates that creating DDI documentation does not have to be a painstaking, time-consuming process done by hand, and shows the advantages of documenting early in the data lifecycle so that the documentation can be used to drive onward data management activities consistently and efficiently.
Using DDI3 in a Technology Based Assessment environment - opportunities and problems
Ingo Barkow (DIPF - German Institute for International Educational Research)
The Technology Based Assessment (TBA) group at the German Institute for International Educational Research (DIPF) uses DDI 3 in several settings to record metadata for paper-and-pencil environments. Nevertheless, as the original aim of the group is to transition studies from paper and pencil to computer-based assessment, it is planned to evolve electronic questionnaires (e.g. CAPI and CATI instruments), like those used in studies such as PISA or PIAAC, toward a common standard. At the moment TBA uses its own proprietary XML structure to describe the items within a questionnaire, but it is considering moving toward DDI 3 in the rendering engine as well, which means the metadata and item development tool programmed for the paper-and-pencil instruments can be used as an item editor for computer-based questionnaires. This presentation will show the opportunities and also the challenges of this process, and will introduce a workflow from the metadata editor to the rendering engine. Furthermore, the newest version of the Metadata Editor as well as a first prototype of the rendering engine will be shown. All tools in this presentation are open source software and can be requested for one's own use.
2011-06-01: A2: Data Management Services: New Roles and Connections
Curation in the age of complexity: reworking an ancient art
Michael Jones (Australian Data Archive - Melbourne)
Gavan McCarthy (Australian Data Archive - Melbourne)
Data curation can trace a lineage back to the very earliest records and the beginnings of materially recorded history. In the world of art galleries and museums, curation evolved to have specific meanings associated with selection, presentation, and interpretation for public exposure; that is, making sure that objects are understandable to the audience of the time. However, in Australia, curation has not been a term associated with traditional archival practice. With the advent of digital data archive programs and the cyberinfrastructure movement in the late twentieth century, curation was a term appropriated to cover a grab bag of roles and functions associated with data capture, documentation, preservation, and access. Mapping complexity, that vast array of interconnectedness that characterises human endeavour, has become acknowledged as necessary in establishing contexts of meaning and interpretation. Visualised networks of connectivity provide a means of understanding larger-scale worlds, lifting data out of isolation. These tools provide us with an opportunity to reinstate data curation within the intellectual, scholarly research endeavour as seen in the art world, releasing it from the more technical aspects of preservation and access. Network visualisations of some Australian data worlds will guide an interpretative reworking of curation.
Data Management and Preservation: Creating a new service for researchers
Carol Perry (University of Guelph)
In January 2010, the University of Guelph Library launched a new data management and preservation service for researchers on campus. While the new service was in its initial stages of development, research groups on campus were eager to volunteer their research projects to serve as pilot test cases. Ultimately, two large, long-term research groups were selected and, following the administration of needs assessments, management plans have been developed around the issues identified for each research group. This paper will track the development of the service with particular attention being paid to the complex needs of the different groups involved in the pilot project. Subsidiary services for graduate students which have been incorporated into the program will also be discussed. Reactions to these new services will be explored.
UVa Library SciDaC: New Partnerships and Services to Support Scientific Data in the Library
Andrew Sallans (University of Virginia Library)
Sherri Lake (University of Virginia Library)
The University of Virginia Library created a new unit called the Scientific Data Consulting Group (SciDaC) in May 2010 to respond to the increasing need for data management support in the sciences. Since its creation, the SciDaC group has focused on three main priorities: 1) data assessment interviews to establish a baseline on scientific research data management processes, 2) development of institutional support for researchers in response to new data management regulations (e.g. from the NSF), and 3) integration of institutional scientific research data with the new institutional repository. This effort built upon previous experience providing scientific research support through a partnership with UVA’s central IT group. In order to build and scale this new endeavor, we have worked aggressively to establish and grow new partnerships inside and outside the institution. The talk will focus on our experiences providing these consulting services and the challenges we have faced in integrating data management expertise into the library environment. It will also cover some specifics of the partnerships that we have developed and the opportunities that we are looking toward in the future.
Data archiving and cooperation with medical researchers - An example from Denmark
Bodil Stenvig (Danish Data Archives)
Within the DDA, a special unit called DDA Health collects, preserves, and disseminates health-related research data. A very important and necessary part of DDA Health's effort aims at increasing medical researchers’ awareness of the need for good archiving practices. We have therefore stimulated voluntary archiving through collaboration with medical research groups and medical researchers in many ways. The efforts have focused on the visibility of DDA Health and its values; the means have been a website, a newsletter, site visits, presentations in connection with various kinds of scientific gatherings, contributions to undergraduate courses and research training courses, and communication via a network of interested researchers. In the paper I will focus on the development of cooperation between DDA Health and medical researchers, for instance by making it visible that DDA Health can add value to research data from medical research by supporting them with DDI 3 and GCP.
2011-06-01: A3: Building Capacity to Link, Visualize, Identify, and Discover
Statistical Data Analysis based on Linked Open Data
Benjamin Zapilko (GESIS - Leibniz Institute for the Social Sciences)
Brigitte Mathiak (GESIS - Leibniz Institute for the Social Sciences)
Oliver Hopt (GESIS - Leibniz Institute for the Social Sciences)
Research in the social sciences is based on the analysis of social phenomena via quantitative evidence. Scientists typically need to perform major and complex analyses on statistical data, but alongside these main tasks they also face tedious secondary examinations of heterogeneous and distributed datasets, for example to verify prior or referenced assumptions or to detect correlations between two or more datasets. Many tools already exist that support researchers in processing and analysing their data effectively, but raw data often has to be converted to particular formats in order to be processed and analysed. In this paper, we propose a method to perform such statistical analyses on Linked Open Data resources in order to support common tasks that researchers encounter when working with heterogeneous and distributed datasets. The idea of Linked Open Data provides a technical basis for exposing, sharing, and linking data on the web, based on an established web architecture comprising standardised formats and interfaces. However, statistical calculations cannot yet be performed directly on these data sources. Our prototype covers some common exemplary tasks for data analysis in the social sciences.
Creating a Linchpin for Financial Data. The need for a Legal Entity Identifier (LEI)
Linda Powell (Federal Reserve Board)
The need for reference metadata has been recognized as crucial within the financial industry. Unique and standardized identification of legal entities has become a top priority for the financial industry. It is now widely recognized that such an identifier is critical to efficient financial transaction processing and to clear and unambiguous identification of parties and counterparties involved in all financial activities. Although the need for unique entity identification across government agencies, data vendors, and financial market participants has been discussed for years, there is currently no entity identification schema used across the financial industry. The events of the last several years and the recent financial reform highlight the need for legal entity identifiers that cross organizations. The paper is the collaborative result of several financial regulators and: explores the current state of entity identification throughout the financial industry; presents the business cases for why unique and standardized identification is necessary; summarizes industry best practices and requirements for entity identification; and identifies alternative approaches to implementing industry-wide identifiers.
Longitudinal and Time-series Documentation Protocols at the ADA
Melanie Spallek (The Australian Data Archive)
Steve McEachern (The Australian Data Archive)
This paper will provide an overview of the longitudinal data processing protocols being implemented at the Australian Data Archive as part of the development of the ADA Longitudinal Archive. As part of the migration to a new storage foundation, ADA has been reviewing its longitudinal data holdings to streamline longitudinal data processing, particularly focussing on improvements to the metadata associated with these studies to take advantage of emerging discovery and visualisation tools. This paper will examine ADA's experience with this review process, experiences with the development of guidelines for data dictionaries and variable mapping systems, and the implementation of the procedures with 3 sample longitudinal studies within the ADA holdings. The paper will then conclude with a review of ADA's plans for documentation and support of future longitudinal and cross-sectional time-series data holdings, including new longitudinal panel studies, public opinion poll data and major cross-sectional social surveys.
Expanding computational resources, web interfaces, and spatially enabled data have provided an increasing capability for spatial visualisation in social science data analysis. This has prompted the Australian Data Archive to develop visualisation tools for spatial information that complement and extend the traditional analysis methods used in the social sciences. These improved methods, however, have major implications for the processing workflow for archiving survey data, from the design of surveys to incorporate accurate recording of geospatial identifiers, to maintaining the confidentiality of geo-located respondents' information to prevent identification by unauthorised users, to allowing researchers access to the data in new and powerful ways. This paper will present the recent work of the ADA and the ANU Supercomputing Facility in this area, providing an overview of progress in developing the ADA GIS data framework as well as a demonstration of new online visualisation tools for exploring spatial social science data.
Collaborating with Subject Librarians to Provide Undergraduates with Appropriate International Statistical Resources
Joe Hurley (Georgia State University)
Subject librarians often seek the assistance of data services librarians when faced with reference questions concerning statistical information. As more university courses emphasize international awareness, subject librarians frequently receive questions from undergraduate students seeking statistical information on developing and non-western nations. Often unfamiliar with and sometimes intimidated by international statistical information, some subject librarians are unsure where to begin. In addition, many undergraduate students are unskilled in how to properly interpret a statistical chart. Moreover, undergraduate students often need statistics that include contextual information. The many United Nations agencies and divisions produce an abundance of publications that contain highly sought-after international statistics and also provide the reader with background information explaining what the numbers mean, how they were collected, and the shortcomings of the statistical information. This presentation will focus on the importance of increasing the awareness of both subject librarians and students of these United Nations publications and will also provide advice on how to access and search for these publications.
A Multidisciplinary Analysis of Data Reuse Activities
Nicholas Weber (University of Illinois Champaign-Urbana)
Tiffany Chao (University of Illinois Champaign-Urbana)
The reuse and secondary analysis of digital data in the environmental and social sciences is aided greatly by well-established data repositories and a research culture that fosters trusted data sharing, respectively. However, relatively few studies in either of these disciplines have considered the component activities of reusing data beyond an initial phase of discovery; more often, studies have identified barriers to access or focused on the need to properly attribute datasets. We present a comparative analysis of those activities and practices surrounding secondary use of publicly available data in the environmental and social sciences. Identified activities in these two disciplines are grounded in the current literature, which include: selection criteria for reuse, methodological approaches as they vary by discipline, transfer protocols, citation practices, and explicit barriers to access and use of secondary data. This comparison of data practices provides a formalization of the implicit activities surrounding reuse that will prove highly valuable to data librarians and data scientists in their interactions with an increasingly interdisciplinary and collaborative research community. An analysis of data reuse also offers much needed insight to the development and maintenance of deployed cyberinfrastructures, particularly as these large-scale systems are geared toward data-centric research.
For many years, data archives have concentrated on the preservation and documentation of research data from the social sciences and on its dissemination for secondary analysis. Research libraries hold the researchers’ publications based on these data. In Europe, research libraries have begun to take an interest in either holding research data or providing open access to research data alongside open access to publications. This paper will first present the ongoing projects and efforts of research libraries that aim to investigate the whole area of data discovery, data preservation, and the linking of publications and data. The paper will then discuss the roles of the data archives and the research libraries: what is the role of the data archives in this process? How can the data archives’ experience with long-term preservation contribute? Should we cooperate more closely with the research libraries? Can DDI be used by both communities?
Lost in Translation? Experiences in documenting qualitative data at the ADA
Steven McEachern (Australian Data Archive)
Lynda Cheshire (Australian Data Archive)
Melanie Spallek (Australian Data Archive)
The expansion of data holdings to incorporate qualitative content has been a major emphasis of the Australian Data Archive since 2007, focussed on the establishment of ADA Qualitative (formerly AQuA). While there have been significant challenges during this time in encouraging qualitative researchers to deposit content with the archive, the deposit of these new data forms has also created new challenges for the archive in ingesting, processing, and dissemination. These challenges have been threefold: methodological (what changes do researchers need to make in their methods to support archival practice?); technical (how does ADA adapt its existing metadata schema and data management software, DDI 2 and Nesstar, to support qualitative content?); and practical (how are processing procedures for archivists changed when documenting qualitative content?). This paper explores each of these challenges in turn, focussing particularly on the adoption of the QuDEx schema developed by the UK Data Archive to support qualitative data archiving. The paper will discuss ADA's experience with the use of the QuDEx schema to address these three challenges, and provide suggestions for future developments of the schema and qualitative archiving more generally.
The next generation Timescapes Archive - supporting the complex structures and relationships of qualitative longitudinal data
Ben Ryan (University of Leeds)
Timescapes, an ESRC funded study, has developed an archive to hold the data and documentation outputs of seven empirical longitudinal qualitative projects. One limitation that has become clear is that the platform used to archive the data (DigiTool) does not support the rich diversity, structure and inter-relationships that the data sets require to be of maximum use to the research community. The archive platform treats all files as "digital objects" and does not allow the modelling of complex structures of information and their inter-relationships. It is not possible to clearly display the connections between artefacts produced from a number of interviews and cohort activities over a number of phases. A consequence of this lack of relationships is the inability to present the artefacts using time as a dimension beyond simplistic date-based searching, severely constraining the usability of longitudinal data. Adopting the Fedora Commons platform will allow concepts, and the relationships between them, such as collections, waves, and longitudinal case studies, to be directly represented in the archive. The new architecture will support the representation of standards (e.g., DDI) and emerging initiatives (e.g., QuDEx, an XML standard for qualitative data exchange) directly within the archive.
2011-06-01: B2: The IASSIST SIGDC Presents: Perspectives on Data Citation
Building Data Citations for Discovery
Hailey Mooney (Michigan State University )
Mark Newton (Purdue University )
Authors who choose to cite the research data behind their published reports have a variety of options to entertain: domain style guides, publisher requirements, and data provider citation recommendations. Instructions from these sources may differ in terms of the range of required citation elements and guidance to authors on when and how research data merit citation. This presentation will compare the elements of recommended data citations with actual citations in published articles drawn from targeted disciplinary bodies of literature. By creating a window into the practice of data citation, this presentation seeks to understand what guidance is offered to authors who want to cite data and how authors actually compose these citations.
Tracking Data Reuse: Motivations, Methods, and Obstacles
Heather Piwowar (NESCent, University of British Columbia)
Rewarding investigators who share data, assessing the impact of data repositories, and measuring the intended and unintended effects of data policy decisions all depend on being able to track dataset reuse. Unfortunately, tracking data reuse is currently extraordinarily difficult due to diverse attribution practices, tool limitations, and data source restrictions. Through a Pecha Kucha overview and subsequent details in a more traditional format, this talk will summarize recent experiences in tracking data reuse. Examples will be drawn from the ongoing project Tracking Data Reuse: Following one thousand datasets from public repositories into the published literature (winner of the 2010 ASIST SIGUSE Elfreda A. Chatman Research Proposal Award).
Panel Discussion - IASSIST Special Interest Group on Data Citation (SIGDC)
Robert Downs (Center for International Earth Science Information Network (CIESIN), Columbia University )
Michelle Haslett (University of North Carolina Libraries )
Ron Nakao (Stanford University Libraries )
Jan Brasse (German National Library of Science and Technology )
ICPSR's Efforts to Encourage Data Citation
Elizabeth Moss (ICPSR)
With its long-held commitment to linking social science data to the publications based on them, ICPSR has been encouraging stakeholders to make data citation common practice. This presentation will outline ICPSR's efforts to change outmoded citing practices by partnering with other archives to influence the whole community, including researchers, editors, journal publishers, and database aggregators.
2011-06-01: B3: Question and Variable Level Discovery and Access
Creating Personal Extracts While Keeping Confidentiality
David Schiller (Institute for Employment Research (IAB))
The German National Educational Panel Study (NEPS) collects longitudinal data for educational research. With six starting cohorts and 60,000 target persons, a more and more complex data structure emerges with every new wave. Data for scientific research is normally offered as flat files with a fixed number of variables and values. Data providers therefore have to build appropriate flat files for the scientific community, which is a difficult manual task because the data structure is complex. New ways to solve this problem open up when the data are stored in a relational database. Variables or sets of variables, out of the entire range of data, can be selected and combined via a user interface similar to a shopping basket system or a "variable browser". The results are exported as individual datasets in different formats that meet the individual needs of researchers (e.g., Stata, SPSS). Database functionality and business intelligence tools provide a wide range of possibilities to support the researcher during the selection process (e.g., descriptive statistics or metadata information). Confidentiality can be maintained by using a combination of database tools and statistical disclosure control methods (e.g., by using "security levels" for variables, or by using synthetic data).
Findings of the original language documentation for the European Values Study (EVS)
Evelyn Brislinger (GESIS - Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moeltgen (GESIS - Leibniz Institute for the Social Sciences)
The questionnaire of the recent EVS wave 2008 was translated into 38 languages and adjusted to 46 different cultural contexts to ensure that questions measure the same phenomena. This translation process was closely monitored and well documented, and it was the starting point for a workflow designed by GESIS that aims at high-quality documentation of data with original question texts. This documentation is valuable for secondary users who want to look at the question wording used in different languages, countries or waves. We presented the ongoing project at IASSIST/IFDO 2009 and are now able to report on the findings. With the full data release of EVS 2008 in December 2010 the documentation process will be finished and the original language documentation will be published through online retrieval systems, an interactive online overview, and English/original language variable reports (http://www.europeanvaluesstudy.eu/). With the release of the longitudinal file 1981-2008 in June 2011 the original questions for two waves will be available and comparisons of questions across time will be possible. The workflow includes long-term preservation of all key information and databases. It also allows for the re-use of the original language documentation as a basis for questionnaire translation in future waves.
Social Science Question Database and Research Tools
Xavier Schepler (Réseau Quetelet)
Laurent Lesnard (Réseau Quetelet)
Anne-Sophie Cousteaux (Réseau Quetelet)
The increase in the number of surveys archived and disseminated raises new challenges for data archives. Indeed, without new tools, the growth in the number of surveys makes it increasingly difficult for users to identify surveys that are relevant for them. Without new tools, combining different surveys with similar questions to conduct comparative research is also an increasingly daunting task. Survey designers who wish to reuse questions asked in previous surveys also need tools to find similar questions. To address these issues the CDSP (Réseau Quetelet) has developed a Social Science Question Database and Research Tools that allows users to search for questions (question texts, answer texts, variable labels) across datasets, compare results, and save them. The analysis can be extended to the roots of words or to include stop words. Information on each question includes: question text, answer categories, location of the variable in the dataset, links to the preceding and following variables, instructions given to interviewers, text before and after the question, the universe of the question, and links to questionnaires. Users can store questions and export them (CSV or XLS). The question database is based on DDI (version 2) and the research module on Apache Solr.
2011-06-01: B4: Taking the Pulse of Nations: Issues and Approaches to Census Taking in the 21st Century
The US Experience with the American Community Survey and Test of Voluntary Response
2011-06-01: C1: Recent Developments in the DDI Implementation Landscape II
Metadata Management Platform for the Canadian Research Data Centre Network
Pascal Heus (Metadata Technology North America Inc.)
The DDI4RDC project aims at the implementation of open source solutions for the deployment of a DDI3-driven framework for the management of data and metadata across the Canada Research Data Centre Network, providing secure access to microdata from Statistics Canada. This year's session will provide a progress report on the project (initiated in 2009), demo the DDI editors and repository services, and share experiences and lessons learned during development. This project is funded by the Canadian Foundation for Innovation under the umbrella of the University of Manitoba and is a collaborative effort between the Canadian RDC Network, Metadata Technology North America (US), Breckenhill (Canada), Algenta Technologies (US), and Ideas2evidence Ltd (Norway).
QDDS - combining questionnaire development and survey documentation
Oliver Hopt (GESIS - Leibniz Institute for Social Sciences)
Brigitte Mathiak (GESIS - Leibniz Institute for Social Sciences)
QDDS is a general approach to creating and designing questionnaires and to documenting the changes and design decisions made in the process. This documentation is especially important for panels and other surveys that re-use instruments, as it allows users to benefit from decisions already made and from existing questionnaire elements. The alternative, documenting the changes manually, is labor intensive and complex. A further goal is a multiple-purpose questionnaire editor that can be used to design surveys regardless of the target distribution system. Currently, we use the DDI 2.1 standard as the general file format and are working on a switch to DDI 3.1. Using these standards makes it possible to import questionnaires into structured survey documentation without any loss of information. In this paper, we describe how we manage to hide the complexity of DDI by focusing on the main entities. We also describe the architecture and how it leads to the easy-to-use interface that allows primary researchers to use this tool efficiently, without in-depth documentation skills. Furthermore, we sketch out the changes necessary to switch to DDI 3.1 and how that will improve interaction with future information systems based on DDI 3.1. At the last IASSIST we presented mainly the overall concept and the reasons for supporting DDI 3. This year we want to show a first view of the actual solution and the corresponding changes at the user-interface level.
Colectica is a DDI 3-based platform for creating, documenting, managing, distributing, and discovering data. Colectica aims to create publishable documentation as a by-product of the data management process. This demonstration will focus on features that have been added to Colectica over the past year, including: - Metadata repository for multi-user collaboration - Workflow management - Automated metadata harmonization - Improved Web-based data discovery and dissemination.
Questasy is a data dissemination website application based on DDI 3. It manages studies, questions, variables, publications and more for longitudinal panel surveys. It was primarily developed for the LISS Data Archive (http://www.lissdata.nl), but is freely available for other organizations. This presentation will relate the new developments in Questasy since the 2010 IASSIST.
da|ra - The German Registration Portal for Social- and Economic Data
Brigitte Hausstein (GESIS - Leibniz Institute for Social Sciences)
Anja Wilde (GESIS - Leibniz Institute for Social Sciences)
Wolfgang Zenk-Moltgen (GESIS - Leibniz Institute for Social Sciences)
In the interest of good scientific practice there is a demand to make collected primary data publicly accessible, so that one sees not only the final, conclusive research results, but can also follow the entire research process. This is why the GESIS – Leibniz Institute for the Social Sciences (GESIS) and the ZBW – Leibniz Information Centre for Economics (ZBW) decided to implement the DOI registration portal for German social and economic data – da|ra. This infrastructure will lay the foundations for permanent identification, storage and localization and, ultimately, reliable citability of research data. A DOI is a name that permanently identifies an object (such as research data) in the digital environment. The DOI name won't change, even though the information about the data, including where to find it, may change from time to time. In cooperation with DataCite, the international initiative to establish easier access to digital research data, GESIS is already operating the non-commercial registration agency for social data. The service was initiated in 2010 with a pilot phase in order to set up the technical and organizational framework. Meanwhile 4780 studies are registered with DOI names. From 2011 onwards the registration service will be expanded to economic data. The paper will deal with the technical and organizational solutions of the DOI Name Registration Portal and focus on how the new features are included in the already existing DOI service. http://www.gesis.org/dara and http://www.zbw.eu
2011-06-01: C3: Trusted and Valued: Data Quality Issues
Recruitment, Participation, and Sampling: Researchers' Results in General Practice
Thomas Lindsay (University of Minnesota)
Andrew M. Sell (University of Minnesota)
Theoretical works on survey methodology lay out best practices and expected outcomes for researchers who wish to conduct social science research projects. While national surveys and other large-scale projects have the resources to ensure best practice, most social science researchers face compromises relating to cost, time, and availability of respondents. Over the past five years Survey Services at the University of Minnesota's College of Liberal Arts has conducted surveys for approximately 200 research projects using a variety of methods, with divergent outcomes. Using the metadata from these research projects, we have begun to test a variety of theoretical tenets of survey methodology against the empirical outcomes of the projects we have supported. Additionally, we are working with some of our researchers to experimentally test specific approaches to sampling and recruitment. In our presentation we will compare outcomes of various methodological choices made by our researchers. We will also discuss the results of our experimental tests within the framework of theoretical best practices and expectations.
Improving Data Quality by Data Reviews and Tagging: First Pilot Experiences
Rutger Kramer (Data Archiving and Networked Services (DANS))
Marion Wittenberg (Data Archiving and Networked Services (DANS))
Marjan Grootveld (Data Archiving and Networked Services (DANS))
Part of our data archive is open for self-archiving, which requires that researchers who deposit their data add metadata themselves. In line with this practice and current trends such as Web 2.0, we are interested in enriching our current metadata with user reviews. Moreover, since research quality is also measured by the number and quality of publications, it makes sense to assess the quality of the underlying data and the degree to which they are fit for re-use. We therefore recently carried out a pilot study among researchers who downloaded data sets from our archive. We asked them to review the downloaded data set on several dimensions and also to provide keywords ("tags") that should help other researchers find the data set. We then presented both the tags and the averaged review scores as part of that set's metadata. In this presentation we will give an impression of the reviews and the user tags and of how these tags relate to professional keywords.
Karsten Boye Rasmussen (University of Southern Denmark)
Sometimes the platform for conference sessions is a scene changing underneath the presenters. Because our "data quality" session had a cancellation, it will now include a non-advertised presentation by the session chair. The presentation will show some of the common dimensions of, and reasons behind, the demand for data quality. However, because of the short notice for this presentation it will move in the direction of a short stand-up-sing-and-dance show with audience participation. The subject of this closing presentation will be an attack - hopefully a surprise attack - on the rationality that we expect from ourselves and other requestors of data. Especially when we repeatedly demand higher data quality, we have to ask the question: "Are we worth it?".
2011-06-01: C4: Building Data Services for Confidential and Organizational Data
Dealing with Business and Organizational Data - Insights from the Data Service Centre for Business and Organizational Data, Bielefeld University
Alexia Meyermann (Bielefeld University)
Organizational research in Germany has up to now not developed a tradition of secondary analysis: most studies rely on primary data collection but are confronted with dramatically decreasing participation rates, with most reporting response rates between 5 and 15 percent. This, together with the fact that standards for documenting organizational data do not exist, increases the need for institutionalized data sharing. Thus, the Data Service Centre for Business and Organizational Data at Bielefeld University began in 2010 to collect, archive and distribute business and organizational data from the social sciences and several other disciplines. The goal is to bridge the gap between data producers and data users across different disciplines by providing an institutional and technical framework. Our presentation focuses on problems that specifically arise from dealing with business and organizational data. We would like to discuss the specifics of documenting studies with organizations as the unit of analysis, as compared with individual or household data. This will yield some insights into the complexities of organizational data (hierarchical structures). The requirements of metadata documentation will be discussed from a methodological and a substantive point of view, and our solutions will be presented.
Advancing restricted access data computing at CISER: The technology, expertise, and tools of the Cornell Restricted Access Data Center (CRADC)
Janet Heslop (Cornell Institute for Social and Economic Research (CISER))
Jeremy Williams (Cornell Institute for Social and Economic Research (CISER))
Building the infrastructure, expertise, and tools to meet the multifaceted needs of researchers using confidential data confronts challenges in cost, security and legality. To answer these challenges, the Cornell Institute for Social and Economic Research (CISER) staffs and hosts the Cornell Restricted Access Data Center (CRADC). Since its inception in 1999, the CRADC has provided secure access to confidential research data and has become the Cornell University custodian of restricted access data sets. The endeavor to balance a resource-rich and highly-usable collaborative computing environment that also meets security requirements demands technological leadership as well as expert service related to a great variety of data providers and software packages. This presentation will describe how CISER facilitates this balance, via a synthesis of technology, expert service and tools to empower sanctioned and secure scientific research on restricted access data at Cornell University.
Exploring New Methods for Protecting and Distributing Confidential Research Data
Steve Burling (ICPSR)
Bryan Beecher (ICPSR)
ICPSR is building and testing a data storage and dissemination system for confidential data, which obviates the need for users to build and secure their own computing environments. Recent advances in public utility (or “cloud”) computing now make it feasible to provision powerful, secure data analysis platforms on demand. We will leverage these advances to build a system which collects “system configuration” information from analysts using a simple web interface, and then produces a custom computing environment for each confidential data contract holder. Each custom system will secure the data storage and usage environment in accordance with the confidentiality requirements of each data file. When the analysis has been completed, this custom system will be fed into a “virtual shredder” before final disposal. This prototype data dissemination system will be tested for (1) system functionality (i.e., does it remove the usual barriers to data access?); (2) storage and computing security (i.e., does it keep the data secure?); and (3) usability (i.e., is the entire system easier to use?).
2011-06-02: Plenary II
Research Data Infrastructure: Are the Social Sciences on Main Street or a Side Road?
Chuck Humphrey (University of Alberta)
Chuck Humphrey is passionate about data and has been examining research data infrastructure with a global perspective. His talk will locate the social sciences in the broader E-science picture and give us a glimpse of the future.
2011-06-02: D1: Recent Developments in the DDI Implementation Landscape III
DDI + API: building services on top of your existing DDI holdings
Ornulf Risnes ( Norwegian Social Science Data Services (NSD))
A Nesstar Server is an example of a web-based container that can store and make available DDI metadata to consuming applications. Once a DDI document is published to a Nesstar Server, its contents are made available via a Java API. Developers of third-party clients may use this API to connect to any such server and programmatically navigate and harvest collections of DDI documents held there, or simply extract information from specific DDI fields relevant to the consuming application. For many years, most clients developed to interact with the Nesstar API were developed in-house and integrated as part of the Nesstar Software Suite. Recently, however, third-party clients have started to emerge, and they use the DDI container and the API for very different purposes, including, but not limited to:
- variable "shopping carts" for simplified downloads of complex data sets;
- automated harvesters for indexing in Solr-powered search systems, including searchable study- and question/variable-level databases;
- a harvester taking snapshots for persistent archiving in DataVerse (beta);
- search-engine-optimized rendering of DDI content for exposing DDI holdings to generic search engines like Google.
The presentation will demonstrate the basic architecture of the DDI-driven API and show examples of some of the current services built to interact with it.
Prototype of Open Source Metadata Editor for Individual Researchers in Japan
Yuki Yonekura (Institute of Social Science, the University of Tokyo)
Keiichi Sato (Institute of Social Science, the University of Tokyo)
The Social Science Japan Data Archive (SSJDA) plays the role of a major data provider for those who seek to analyze Japanese society using microdata. Datasets deposited with the SSJDA are described in various formats, so considerable work is required to create metadata for each survey. This situation makes managing our archive difficult. We believe it can be changed by disseminating the Data Documentation Initiative (DDI). Because of the complexity of the DDI specification, metadata are often created with software such as Nesstar, Colectica and so on. With these applications researchers can easily create DDI documents, but the applications are relatively expensive for individual researchers, and Japanese researchers have difficulty using them because of the language barrier. We are therefore developing free software to edit DDI documents. In this presentation, we will show a prototype of an open source metadata editor and the process behind its development.
Arisddi is a DDI editor for the Macintosh, Windows, and Linux operating systems. Using a graphical user interface, users can build DDI codebooks using the full DDI 2.1 specification. No knowledge of XML is required. Support for DDI 2.5 and 3.1 is planned. Arisddi is open source software, built on the Eclipse platform.
2011-06-02: D2: Power of Partnerships in Data Creation and Sharing
Partnership in Data Access --- Combining two Data delivery Services - Going Bilingual
Vincent Gray (University of Western Ontario)
Maryna Beaulieu ()
Sébastien Nadeau ()
Elizabeth Hill (University of Western Ontario)
Gaston Quirion (Bibliotheque de l'Universite Laval)
Heather Stevens ()
Academic access to Statistics Canada Data Liberation Initiative (DLI) data at Quebec universities (CREPUQ) and the University of Western Ontario was delivered via long-established data delivery systems, Sherlock and the Internet Data Library System (IDLS). In 2007, a partnership between CREPUQ and the University of Western Ontario was established to replace IDLS and Sherlock with a new bilingual system which would draw on the strengths of the former systems and provide an improved interface. The Equinox Data Delivery System (http://equinox.uwo.ca) is the result of this partnership. Equinox was formally launched in Montreal on May 12, 2010. It provides access to DLI data and geospatial files to users at fifty Canadian academic institutions. The presentation will discuss:
- the project structure;
- timelines;
- enhancements realized within the new system; and
- the benefits and challenges of working with partners from multiple organizations, in different places.
OCUL's Geospatial Portal Project: From Vision to Reality
Leanne Hindmarch (Ontario Council of University Libraries / Scholars Portal )
Elizabeth Hill (University of Western Ontario)
Jenny Marvin (University of Guelph)
The Ontario Council of University Libraries (OCUL) is a consortium of twenty-one university libraries in the province of Ontario that collaborates on resource sharing, access to resources and technologies. OCUL was represented at IASSIST 2010 with a pecha kucha session that explored and demonstrated our vision for an online geospatial portal, a project that is currently in development. Our vision includes the provision of tools for identifying, exploring, and downloading licensed geospatial data, as well as collaboration and teaching tools. The vision also involves the creation and open sharing of standards-based geospatial metadata. In Fall 2011, OCUL will be launching its first release of the geospatial portal. The spring 2011 IASSIST session will provide a preview of the portal, demonstrating the available tools. The session will also discuss the expected impact of the portal at Ontario universities, assessment plans, and future development goals.
Working Within and Across: the Data Difficulties (and Rewards) of the NECASL Project
Rachael Barlow (Trinity College (Hartford))
Heather Lindkvist (Trinity College (Hartford))
The New England Consortium on Assessment and Student Learning (NECASL) involves qualitative and quantitative data creation and sharing within and across seven liberal-arts colleges in the northeast U.S. (see http://www.wellesley.edu/NECASL/). NECASL is composed of data professionals of varied ilk: Directors of Institutional Research and IRBs, faculty supervisors and analysts, and student interviewers. This group began following a panel of 36 students at each institution in Fall 2006, documenting their experiences through hour-long, open-ended, biannual interviews and annual surveys of the entire Class of 2010. Currently, NECASL is linking the transcriptions from the interviews to both the survey data and administrative data on demographics and academic performance, while preparing for a final “one year out” interview. We will describe NECASL’s data challenges, including: creating survey data across institutions embedded in different data consortiums (HEDS, COFHE, etc.), meshing qualitative data coming from institutions with different IRB procedures for de-identification, transferring large NVIVO (qualitative software) files from one campus to another, and altering data sharing agreements as project participants fall in and out of the project and as consortial ideas about data sharing change.
Canadian National Collaborative Data Infrastructure: a distinctively Canadian approach
Lynn Copeland (Simon Fraser University )
Canadian research libraries have a strong record of success in obtaining funding for collaborative national or provincial projects which support research and innovation. Canadian data librarians have been active in providing institutional support and similarly collaborating on a regional and national level. Characteristic of these initiatives is that they are invariably ‘bottom up’ and require determined consensus building among disparate institutions. The Canadian National Collaborative Data Infrastructure (CNCDI) initiative will build on these successes. Led by the Canadian Association of Research Libraries, we have been consulting with potential partners to create a proposal to demonstrate the feasibility of our goal: to build a national infrastructure to support the innovative re-use of data created through publicly-funded research. The project will build on and enhance the existing patchwork of data management services and infrastructures in Canada to create a comprehensive, integrated network of data repositories capable of supporting Canadian research across all disciplines far into the future. In this presentation, I will discuss the demonstrated need, and the vision and model we are developing, as well as the approach we have taken to building support for this proposal and opportunities and challenges in the current academic, economic and political environment.
2011-06-02: D3: Teach This! Teaching Data in the Library and Across the University
Embedded Data Librarianship
Lynda Kellam (University of North Carolina at Greensboro)
Drawing on discussions of information literacy and curriculum integration, Lynda Kellam will discuss the role of the library in promoting statistical and data literacy. She will relate this larger discussion to the real life experiences of embedded librarians at UNCG, including herself and the business librarian.
Models and opportunities for one-shot group instruction
Katharin Peter (University of Southern California)
Katharin Peter will provide a look at models and opportunities for one-shot group instruction within the library and across campus including course-specific instruction, workshops, and instructional partnerships.
Strategies for teaching spatial data resources and software
Nicole Scholtz (University of Michigan)
Nicole Scholtz will describe strategies for teaching spatial data resources and software in a campus-wide drop-in workshop series and in course-specific instruction.
Necessary data and statistical skills for social science students
Jackie Carter (University of Manchester)
Jackie Carter will discuss recent work with the World Bank that has collated evidence from learners and researchers regarding data and statistical skills for UK social science students, and how students can acquire these using socioeconomic data resources like Economic and Social Data Service (ESDS) and the Census Dissemination Unit.
2011-06-02: D4: Latin America, Spain, and Portugal: An Overview of Data Organizations and Resources
IASSIST Latin Engagement Strategic Action Group
Luis Martinez Uribe (Instituto Juan March, Centro de Estudios Avanzados en Ciencias Sociales)
Stuart Macdonald (University of Edinburgh)
The IASSIST Latin Engagement Strategic Action Group was charged with "proposing a set of concrete activities that IASSIST and its members could undertake to further the organizations engaging with data professionals from Spanish speaking institutions from Spain and Latin America". An overview of the findings will be presented.
Mitchell Seligson (Latin American Public Opinion Project, Vanderbilt University)
The AmericasBarometer surveys currently cover all independent nations in North, Central and South America, as well as an important cross-section of the Caribbean. In addition, the Latin American Public Opinion Project archives many other political surveys for Latin America.
Data Curation at U.Porto: Identifying current practices across disciplinary domains
Cristina Ribeiro (Porto University)
Eugénia Matos Fernandes (Porto University)
The University of Porto is currently concerned with the curation of and the access to the scientific data generated by its researchers. There is a growing awareness of the fragility of digital archives, and researchers feel that they need to keep their data assets alive as the scientific infrastructure becomes more sophisticated. The possibilities of scientific impact derived from open datasets are also becoming evident to them. As a result of an identification task, we present a preliminary study on the datasets which are being used in current research, picking examples from life sciences, engineering, social sciences and arts. The identification also provides insight on current models for data curation, both formal and informal, and on the sensitivity of researchers with respect to open access to their data.
More with Less: Collaborative Trends in Research Data Management
Martin Donnelly (University of Edinburgh)
A range of factors contributes to the recent growth in collaborative activities across the data management landscape. This Pecha Kucha presentation outlines the drivers behind new collaborations in research data management, in the UK and beyond, and the ways in which these new collaborations currently manifest themselves: the 'crowdsourcing' of feedback on the Digital Curation Centre's (DCC's) comprehensive Data Management Planning (DMP) checklist; collaboration between the DCC and UKDA on generic data management guidance for researchers, support staff and bid reviewers; joint training programmes pitched at various levels; the development of subject-specific guidance within JISC's Managing Research Data (MRD) programme, and repurposing of existing DCC materials; and international cooperation between the DCC and US colleagues in mapping the generic DMP Checklist to the various National Science Foundation directorates' data-related requirements. The presentation concludes with an overview of future initiatives, including an open registry of data management plans, intended to prevent reinvention of the wheel and to offer exemplar best practices (without reducing the exercise to a boilerplate/tickbox level), online tutorials, the development of componentised, interoperable data management tools, and the repositioning of data management as a shared service, hosted and offered via the cloud.
Fight for your right!: Marketing data and data resources to non-data users
Lynda Kellam (UNCG)
As data professionals, our work often requires explaining and justifying our positions and our data resources. Whether we are data librarians in large research institutions with well-established services or social science librarians at small institutions with few data users, we all must fight for resources that may seem overly expensive and esoteric to non-data users. In this pecha kucha I will highlight efforts at various libraries to make data services and resources accessible (and appealing) to non-data users. These efforts are not limited to the use of creative marketing techniques; they also include approaches to teaching and virtual reference that introduce users to data resources without overwhelming them. At the most basic level, the way that we communicate and explain our services on our websites impacts how our users will interact with us and our resources. In a world of 20% budget reductions, data professionals must fight for the right to data while developing relationships with partners throughout our user communities that will lead to better understanding of, and as a result, higher use of data resources.
Collaborating Across Formats: the Cultivation of a New Department
Katherine McNeill (Massachusetts Institute of Technology)
What do social science numeric data, e-science, bioinformatics, GIS data, maps, music, images, and video have in common? What issues cross formats and disciplines? How do these areas lend themselves to being coordinated? In the MIT Libraries, these domains now are managed under the newly-formed department of Specialized Content and Services, a product of a recent reorganization. The presenter will discuss the process of bringing together these formats; work done to coordinate services, metadata, and content; and lessons learned about how to leverage commonalities among these areas while still attending to particular needs in each domain.
Harsha Ummerpillai (ICPSR, University of Michigan)
2011-06-02: E1: Enriching Metadata: Controlled Vocabularies and Ontologies
Representation of the Data Documentation Initiative using Semantic Web Technologies
Thomas Bosch (GESIS - Leibniz Institute for the Social Sciences)
The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social and behavioral sciences. In order to establish DDI as a de facto metadata standard in this field, DDI should reach a broader audience. The target group could be expanded if DDI structures are used in the increasingly popular Linked Open Data network. The approach is to publish data and metadata in the form of a standards-based exchange format such as the widely accepted and applied Resource Description Framework (RDF), a Semantic Web technology specified by the W3C. Use cases will be exemplified in which specific problems either cannot be resolved at all, or can be resolved more effectively, using the RDF representation of data and metadata specified in DDI 3. In order to describe data and metadata specified in DDI 3 in the form of RDF, an ontology has to be built based on the conceptual model of DDI 3. This ontology should encompass the most relevant DDI 3 components. The outline of this approach will be described. Possible applications using the RDF representation of data and metadata will be discussed to show solutions for the issues associated with the identified use cases.
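The RDF approach described above can be caricatured in miniature. The vocabulary URI and property names below are invented for illustration and are not the ontology the presentation proposes; the sketch simply shows how DDI-style study and variable descriptions might be expressed as triples and serialized as N-Triples, one of the simplest RDF exchange formats.

```python
# Illustrative only: an invented mini-vocabulary, not the official DDI-RDF work.
EX = "http://example.org/ddi-sketch#"          # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/terms/title"

study = "http://example.org/studies/1234"
variable = "http://example.org/variables/age"

triples = [
    (study, RDF_TYPE, EX + "Study"),
    (study, DC_TITLE, '"National Election Survey 2010"'),
    (variable, RDF_TYPE, EX + "Variable"),
    (variable, EX + "belongsToStudy", study),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.
    Objects already wrapped in double quotes are treated as literals."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)

print(to_ntriples(triples))
```

Once metadata is exposed this way, it can be linked to and queried alongside other Linked Open Data sources, which is the expanded audience the abstract has in mind.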
Controlled Vocabularies - A New Product of the DDI Alliance
Sandra Ionescu (ICPSR)
The DDI Alliance is publishing controlled vocabularies for a number of descriptive elements that are frequently used in data documentation. These vocabularies are issued as a separate product from the DDI schemas. They are declared in Genericode - an XML standard for defining code lists and an OASIS committee specification - and they are suitable to use in conjunction with DDI 3 and DDI 2 elements, as well as any matching elements in other data documentation standards. Multiple language translations are also supported. Our presentation will review the first set of published vocabularies, demonstrate how they can be accessed and used, and will offer a preview of those lists on which work is still in progress. We will also discuss the Controlled Vocabularies Working Group's approach to building these code lists, some of the challenges we faced, as well as plans to manage and develop the vocabularies moving forward.
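As a rough illustration of how such Genericode code lists can be consumed, the sketch below parses a heavily simplified, namespace-free fragment loosely modeled on the Genericode Row/Value/SimpleValue layout. The element arrangement is trimmed and the example codes are invented; the real OASIS schema (namespaces, column definitions, identification block) is richer, so consult the specification rather than this sketch.

```python
# Simplified, namespace-free stand-in for a Genericode code list; not the real schema.
import xml.etree.ElementTree as ET

GENERICODE_XML = """\
<CodeList>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="Code"><SimpleValue>Interview.FaceToFace</SimpleValue></Value>
      <Value ColumnRef="Term"><SimpleValue>Face-to-face interview</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="Code"><SimpleValue>Interview.Telephone</SimpleValue></Value>
      <Value ColumnRef="Term"><SimpleValue>Telephone interview</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</CodeList>
"""

def load_code_list(xml_text):
    """Return {code: term} for each Row in a SimpleCodeList."""
    root = ET.fromstring(xml_text)
    vocabulary = {}
    for row in root.iter("Row"):
        cells = {v.get("ColumnRef"): v.findtext("SimpleValue")
                 for v in row.findall("Value")}
        vocabulary[cells["Code"]] = cells["Term"]
    return vocabulary

print(load_code_list(GENERICODE_XML))
```

Because the code lists live outside the DDI schemas, a lookup table like this can validate or populate the matching DDI 2 and DDI 3 elements without hard-coding the vocabulary into either standard.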
Structuring Unstructured Data Using Controlled Vocabularies
Johann Schaible (GESIS - Leibniz Institute for the Social Sciences)
The Data Documentation Initiative (DDI) is a metadata specification expressed in XML for describing data from fields such as the social sciences. DDI metadata supports collecting, processing, analyzing, discovering, distributing and archiving data. The current version, DDI 3, includes controlled vocabularies that allow data sets to be categorized, yielding more structured data sets and hence additional information. In practice, DDI 3 documents still contain a lot of unstructured data stored in uncategorized plain-text fields, especially when converted from DDI 2. This means the document contains less information than it should, but categorizing those plain-text fields manually would be too inefficient and error-prone. In this paper, we present a solution for categorizing the free texts automatically. This solution is based on the Recommind Mindserver, which uses sophisticated text-mining algorithms to categorize the text based on a training sample. This way, the manual task of categorization can be automated, enriching documents in which this metadata is missing.
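Since the Mindserver is proprietary, the stdlib-only sketch below shows the general idea only: learn a bag-of-words profile per category from a small labeled sample, then assign each free-text field to the category with the greatest word overlap. The categories and training texts are invented, and real text-mining systems are far more sophisticated than this toy.

```python
# Toy categorizer: invented categories and training data, for illustration only.
from collections import Counter

TRAINING = [
    ("face to face interview with respondents", "ModeOfCollection"),
    ("telephone interview conducted by trained staff", "ModeOfCollection"),
    ("simple random sample of households", "SamplingProcedure"),
    ("stratified sample drawn from the population register", "SamplingProcedure"),
]

def train(samples):
    """Build one bag-of-words profile (word counts) per category."""
    profiles = {}
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def categorize(text, profiles):
    """Assign the category whose profile shares the most words with the text."""
    words = set(text.lower().split())
    return max(profiles, key=lambda lab: sum(profiles[lab][w] for w in words))

profiles = train(TRAINING)
print(categorize("respondents were interviewed by telephone", profiles))  # prints "ModeOfCollection"
```

Applied at scale, the same pattern lets a converter route DDI 2 plain-text fields into the appropriate DDI 3 categorized elements without manual review.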
The Research-Data-Centre in Research-Data-Centre Approach: A First Step Towards Decentralised International Data Sharing
Stefan Bender (German Federal Employment Agency (BA), Institute for Employment Research (IAB))
Jörg Heining (German Federal Employment Agency (BA), Institute for Employment Research (IAB))
This presentation will give an overview of the transnational remote access components and present first experiences of establishing a German research data center in the US (at ISR). The Data without Boundaries (DwB) project will have a great impact on the data access landscape in Europe.
2011-06-02: E4: Challenges and Capabilities for Long-term Preservation of Scientific Data
Library of Congress Strategies for Working with Geospatial Data: A Collaborative Engagement
Erin Engle (Library of Congress)
William Lefurgy (Library of Congress)
William Lazorchak (Library of Congress)
For the past ten years, the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) has been working to understand the challenges and to explore strategies of collecting, preserving and making available significant digital content, including geospatial data, for current and future generations. NDIIPP has built a national network of partners in academia, the private sector and in federal, state and local government who are cooperating on best practices and standards and developing shared tools and services for digital preservation. A pillar of NDIIPP is that cooperation and engagement with communities of practice are necessary to select and preserve at-risk digital content. NDIIPP has been very interested in geospatial data since the program's inception. This presentation will discuss NDIIPP's strategies for working with the geospatial community on preservation and access issues, including a discussion of recent activities such as the summit meetings with recognized experts to discuss framing a national preservation and access strategy for geospatial data, the launch of the National Digital Stewardship Alliance and the recent work of the Federal Geographic Data Committee (FGDC) Users/Historical Data Working Group.
Improving Practice through Experiential Learning: Library of Congress Geospatial Data Preservation Projects
Steven Morris (North Carolina State University Libraries)
The North Carolina Geospatial Data Archiving Project (NCGDAP) was a joint effort between North Carolina State University Libraries and the North Carolina Center for Geographic Information Analysis, in cooperation with the Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP). That initial project served to catalyze discussions about data preservation needs within the network of data producers and custodians that form spatial data infrastructure, and led to the formal involvement of state archives in data preservation activities. Cross-fertilization with activities in other states led to a subsequent NDIIPP initiative called GeoMAPP (Geospatial Multistate Archive and Preservation Partnership), which has built on the initial learning experiences of NCGDAP and has focused more explicitly on formal archival processes such as selection, appraisal, retention scheduling, and content transfer. GeoMAPP has recently expanded to include five participant states, and is now focused on preservation and data transfer, storage and access, industry outreach, mentoring peer states, and business planning. The experience of both NCGDAP and GeoMAPP has shown the value of providing producers and managers of geospatial data with access to information resources that support data preservation efforts.
Developing an Online Resource Center about Geospatial Data Preservation
Robert R. Downs (Columbia University, CIESIN)
Robert C. Chen (Columbia University, CIESIN)
Like other scientific artifacts that are in digital form, geospatial data, maps, and other spatial information products and services are at risk of being lost if not preserved for future use. Enabling the future use of geospatial data can foster new opportunities for learning and facilitate capabilities for scientific investigations to build on the results of previous research. A key challenge is to promote awareness of the need for preservation and the approaches, methods, and tools available to support preservation efforts. Providing developers and managers of geospatial information with web-based tools and information resources to preserve geospatial data can contribute to capabilities for enabling long-term access to geospatial data. The Geospatial Data Preservation Resource Center is being developed to provide communities of geospatial data professionals, scientific data librarians, and others interested in the preservation of geospatial assets with resources to assist them in preserving our geospatial information heritage. The presentation describes the design and development of an online resource center about the preservation of geospatial data, including a survey of the geospatial data management community conducted to inform its design and identify expectations for use.
The International Federation of Data Organizations for the Social Sciences (IFDO) was established in 1977. It supports data exchange and cross-national comparative research through cooperation between national social science data archives. IFDO was established to promote projects and procedures for enhancing exchange of data and technologies among data organizations, to stimulate development and use of these procedures throughout the world, and to encourage new data organizations to further these objectives. Now IFDO is thinking through its strategy, governance and position. The future IFDO aims to work in multilateral cooperation with other organizations (like CESSDA, IASSIST) and activities (like International Data Forum). This poster presents thoughts on the new position of IFDO, and calls on data archives and data professionals to contribute ideas to the planning process.
Harmonization Potential of 53 Large Population-Based Studies Using the DataSHaPER
Dany Doiron (Public Population Project in Genomics)
The DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research; http://www.datashaper.org) was developed to provide a flexible, but structured approach to the harmonization and pooling of selected information between studies. In this poster/demonstration, this methodological tool is used to demonstrate the potential of sharing harmonized data (148 reference variables) between 53 large population-based studies (6.9 million participants). The DataSHaPER approach to retrospective harmonization is threefold. Firstly, rules reflecting the formal criteria that determine if a particular reference variable can be recreated from the assessment items of each study are defined. These rules also determine the quality of the match between reference variables and assessment items. Secondly, rules are applied for each reference variable and for each study participating in the harmonization process. Finally, results from this exercise are tabulated to illustrate the data sharing potential between participating studies. Results from this harmonization exercise show that a number of important reference variables can potentially be shared and co-analyzed by a large number of participating studies. The data harmonization potential thus demonstrated by the DataSHaPER tool offers the promise of greatly enhanced collaborative research generated through synthesized databases in many fields including health, environmental, and social sciences.
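The threefold matching logic described above might be caricatured as follows. The reference variables, rules, and studies here are invented for illustration and bear no relation to the actual DataSHaPER DataSchema; the sketch only shows the shape of the exercise: a rule per reference variable, applied to each study's assessment items, with the results tabulated across studies.

```python
# Invented reference variables, rules, and studies; not the real DataSHaPER content.
REFERENCE_RULES = {
    # reference variable -> rule deciding if it can be recreated from a study's items
    "age_years": lambda items: "birth_date" in items or "age" in items,
    "bmi": lambda items: {"height_cm", "weight_kg"} <= items,
    "current_smoker": lambda items: "smoking_status" in items,
}

STUDIES = {
    # study -> set of assessment items it collected
    "Study A": {"birth_date", "height_cm", "weight_kg", "smoking_status"},
    "Study B": {"age", "smoking_status"},
    "Study C": {"height_cm", "weight_kg"},
}

def harmonization_potential(rules, studies):
    """Tabulate, per study, which reference variables can be recreated."""
    return {study: sorted(ref for ref, rule in rules.items() if rule(items))
            for study, items in studies.items()}

for study, matched in harmonization_potential(REFERENCE_RULES, STUDIES).items():
    print(study, "->", matched)
```

Scaled up to 148 reference variables and 53 studies, a tabulation of exactly this kind is what reveals which variables can be shared and co-analyzed across the participating studies.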
Implementing DdiEditor in the Danish Data Archive: Demonstration and Experience Gained
Nana Floor Clausen (Danish Data Archive)
Jannik V. Jensen (Danish Data Archive)
At the beginning of 2011 the Danish Data Archive (DDA) implemented its first release of the DdiEditor. This has resulted in new perspectives on documenting and managing data, and this demonstration and presentation seeks to share the knowledge and experience gained so far. One of the major challenges has been how the DdiEditor can maintain the high quality of data and documentation that has been one of DDA's trademarks. This concern has been a key factor in both past and further development of the DdiEditor. DdiEditor is an Open Source project facilitating the editing of DDI 3; for further information see the project homepage: http://www.samfund.dda.dk/dditools/default.htm.
The DDI Tools Catalog: development of a resource for the social science (meta)data community
Stefan Kramer (Cornell Institute for Social and Economic Research)
Katherine McNeill (MIT Libraries)
At its annual meeting on May 31, 2010, the DDI Alliance Expert Committee agreed to charge a newly formed group with the revision of the DDI Tools Catalog (http://www.ddialliance.org/resources/tools), which aims to support the social science data community by providing a comprehensive, web-accessible database of tools for utilizing DDI metadata. This live preview of the revised Tools Catalog will show its new features and workflow for developer submissions. Feedback from the social science data community on what would make the DDI Tools Catalog even more useful in the future, and volunteers for editing its contents, will also be invited.
Case Study in Assessing Scientific Data Management Practices and Needs
Sherry Lake (University of Virginia Library)
The University of Virginia Library is working to support new data management requirements in science and engineering by developing a model that first draws upon close collaboration between data experts and subject librarians, and culminates in policy and infrastructure recommendations to the University's Office of the Vice President for Research (VPR) and the Office of the Vice President/Chief Information Officer (VP/CIO). This model begins with a data interview to assess the researcher's data management practices and needs and to establish a baseline awareness of current practice. After collecting this information, the results are furnished to the institutional repository team and NSF Data Management Plan working group to inform their processes. In aggregate form, this information is provided to the VPR and VP/CIO as policy and infrastructure recommendations. Ultimately, the entire process cycles back to the researcher with specific recommendations and solutions that will help improve the research process. This presentation will offer a case study following a scientist through this consulting process with the hope that it will be useful as a means of identifying user needs and as a model for the evolving data profession across many disciplines.
Query-based access is an alternative to dissemination-based access for making statistics available. It takes advantage of high-performance computing and new information privacy protection methods to reduce the amount of up-front work required from the provider and to increase the level of access to data for end users. It is particularly useful when statistics are in high demand from researchers.
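A bare-bones sketch of the idea, with an invented four-record dataset: the provider exposes only aggregate query answers, optionally perturbed with noise (Laplace noise, one family of the privacy protection methods alluded to), rather than releasing the microdata itself. The records and noise scale are illustrative, not a vetted privacy mechanism.

```python
# Illustrative only: invented microdata, and not a calibrated privacy mechanism.
import random

# Microdata held by the provider; never released directly.
MICRODATA = [
    {"age": 34, "employed": True},
    {"age": 51, "employed": False},
    {"age": 29, "employed": True},
    {"age": 42, "employed": True},
]

def laplace_noise(scale):
    """Laplace(0, scale) sample: difference of two exponential draws."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def count_query(predicate, noise_scale=0.0):
    """Answer a COUNT query over the microdata, optionally adding noise."""
    true_count = sum(1 for record in MICRODATA if predicate(record))
    if noise_scale > 0:
        return true_count + laplace_noise(noise_scale)
    return true_count

print(count_query(lambda r: r["employed"]))                    # exact count: 3
print(count_query(lambda r: r["age"] > 40, noise_scale=1.0))   # noisy answer
```

Because end users submit queries rather than download files, the provider avoids the up-front work of preparing anonymized public-use files while still controlling what leaves the system.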
This poster session showcases some of the key services offered by the UK Data Archive. This year we have undergone a complete rebranding which in itself has given us the opportunity to really think hard about who we are! The focus has been on reaching out to a much wider audience, and giving a more open tone to our web site and communications. The poster highlights a number of new activities: our new website and resources; new data management capacity-building grants; our work on providing case studies of data usage; our ESDS Research Methods and teaching resources; our new Secure Data Service, funded by the ESRC, which promotes excellence in research by enabling safe and secure remote access to sensitive, detailed and confidential data; and our new Question Bank search.
Exploring digital curation definition across research centers, university, government, and commercial industries in the U.S.
Plato Smith II (Florida State University)
A cursory review of the literature and of selected US and foreign institutions' websites on the emerging field of digital curation reveals varied definitions of digital curation, some open to ambiguous interpretation when "digital curation" and "data curation" appear to be used interchangeably. Hence, a survey titled "Defining Digital Curation Understanding" was developed to address this research issue. The goal of this survey is to understand the public's perception of the terms data curation, digital preservation, digital curation, and life cycle. The survey was created to illuminate common terminology development across multiple research disciplines for deeper interdisciplinary research exploration and collaboration within the broader context of data management. Since digital curation involves multiple research disciplines, institutions and organizations, there is a need to assess the public's response to the development of a common nomenclature of definitions and interpretations across multiple disciplines. The data from this survey will be used to support and stimulate discussions that inform decisions on the establishment of baseline terminology agreement across disciplines, institutions, and organizations in the US, with practical implications for contributions to curriculum, theoretical, and professional development. This exploratory research poster will use text, graphics, and survey results to explore digital curation understanding across disciplines, institutions, and organizations in the US, provide insight into future digital curation curriculum development, and contribute to existing literature in the emerging field of digital curation research in the US.
Restructuring SDA for Easier Collaboration in Data Analysis
Charlie Thomas (CSM and UCDATA, Univ. of California, Berkeley)
Jon Stiles (CSM and UCDATA, University of California, Berkeley)
SDA has been successful in facilitating easy online analysis of survey data for a wide range of users -- researchers, faculty, students, journalists and others. SDA's ability to provide analysis "in the cloud" -- without the need to download data or install statistical software -- has proven very popular. Now we are extending SDA's capabilities in a number of ways for various types of users. For students and other "beginners" we are simplifying the user interface by hiding more advanced options until they are needed. For more advanced users we are: adding new options for complex standard errors, enabling private workspaces where analysts can create recoded and computed variables and selectively share them with collaborators, and providing ways to save and share analysis options so a particular analysis can be easily recreated. And for archivists who are setting up an SDA archive, we are simplifying the installation procedure by consolidating the SDA distribution package into a single Java Web application.
Shared Digital Technologies for Data Curation, Preservation, and Access: A Proof of Concept
Mary Vardigan (Inter-university Consortium for Political and Social Research)
Bryan Beecher (Inter-university Consortium for Political and Social Research)
Nathan Adams (Inter-university Consortium for Political and Social Research)
Nancy McGovern (Inter-university Consortium for Political and Social Research)
Peggy Overcashier (Inter-university Consortium for Political and Social Research)
ICPSR is building Fedora Commons data models for social science research data and documentation that conform to the OAIS reference model, and which will facilitate the generation of key OAIS products such as the Archival Information Package (AIP). This presentation will share the results of an NSF-funded INTEROP grant which has funded this work since late 2009.
Embedded in a trans-disciplinary strategy for institutional research data management, the Data Service Center for Business and Organizational Data (DSZ-BO) is currently being established at Bielefeld University. The goal is to bridge the gap between data producers and data consumers by providing an infrastructural framework for the acquisition, standardization, preservation, dissemination and, where appropriate, discovery and reuse of scientific data. Besides the identification of the requirements and workflows within sociological research, there is a technical challenge in developing a DDI 3.x-based infrastructure which covers as many as possible of the peculiarities in the description and management of quantitative/qualitative data and its metadata from (longitudinal) surveys. Furthermore, the description of organizational data is very complex and needs a different documentation focus, e.g. the use of controlled vocabularies for the classification of employers. In this presentation we will explain our multi-layer architecture for efficient storage, retrieval, presentation and enabling secondary analysis on the basis of DDI, as it relates to DSZ-BO research problems. Furthermore, generic add-on services will be considered which connect the data archive with a repository. This enables persistent identification and bidirectional linking of research data and publications as well as access rights management, versioning and data export functionalities.
Melanie Wright (UK Data Archive, University of Essex)
The UK Economic and Social Research Council has recently funded a new Secure Data Service to provide remote secure access to data previously considered too sensitive or detailed to be allowed offsite. After a 2-year pilot and a long journey of collaboration with the UK Office for National Statistics and other major UK data producers, the Secure Data Service has launched a full-blown service for UK academic researchers. The poster will demonstrate what the service offers and how.
Marc Maynard (Roper Center for Public Opinion Research)
One goal of the Data-PASS Partnership is to find and preserve potentially "at-risk" data sets for future generations of social science researchers. These data sets can take on many forms and be found on obsolete media including paper tape and punched cards, as well as magnetic tape and disk. Based on the collaborative efforts of Data-PASS partners and focused on both the physical and logical aspects of recovery, this poster describes and documents efforts to rescue, read, process, and migrate multi-punched card data to modern formats.
Lois Timms-Ferrera (The Roper Center, University of Connecticut)
Marc Maynard (The Roper Center, University of Connecticut)
There are a variety of new tools available to access the more than a half million US questions in iPOLL and 20,000 US and international dataset files archived at the Roper Center. This poster and live demonstration will display these newer services and present options for assessing user needs and discerning which services will best meet those needs. It will encapsulate the various finding aids and analysis tools that support the discovery and utilization of public opinion surveys, and will focus on the latest service enhancements, iPOLLplus and RoperExplorer, the new interface utilizing SDA to analyze surveys.
2011-06-03: F1: Data Management Plans: UK, US, Australia
Of Policy, Practice and Tools: Data Management Planning in the Social Sciences in the UK
Martin Donnelly (University of Edinburgh)
Veerle Van den Eynden (co-author) (UK Data Archive )
Public funders place increasing importance on data management planning (DMP) for research projects as a means of improving the longevity of research data and enabling widespread access and reuse. The UK's Economic and Social Research Council's new data policy continues the trend by mandating DMPs as an integral part of all research award applications. Support services have followed suit by developing tools and guidance for researchers to plan and implement data management throughout their work. The Digital Curation Centre has developed a web-based tool, DMP Online, which helps researchers develop data management plans according to their funders' requirements. The UK Data Archive, where data resulting from ESRC-funded research are archived and made available to the academic community, works closely with researchers on data sharing, providing data management guidance and advice. Ongoing efforts combine the strengths of all involved, integrating DMP Online into the ESRC application form, with UKDA and DCC providing guidance for researchers to develop strong plans - and for reviewers to evaluate these. Discussions also focus on how ESRC might monitor how plans are operationalised, and how good data management is demonstrated. In the longer term, this collaboration may provide a model for an integrated approach to DMP across all funders.
Gabrielle Gardiner (University of Technology Sydney)
Elizabeth Mulholland (presenter) (University of Technology Sydney)
This paper will describe the tools, resources and communication strategy designed to support researchers across multiple disciplines in thinking about their data from project inception and planning through to publication and promotion. It describes a staged approach to data management planning, just-in-time information design, and tools and techniques for collaborating. This project, based at the University of Technology, Sydney, was designed to improve data management planning, capture and discovery across the University as well as influence management policies and processes, but it also demonstrates the value of designing a user-needs approach rather than relying on a compliance-based system. The deliverables from the project, including data management checklists and guidelines, guides to data archives, metadata approaches, protocols and tools for promotion will be discussed.
The Elements of the Data Management Plan: A Gap Analysis and Recommendations
Amy Pienta (University of Michigan)
Mary Vardigan (presenter) (University of Michigan)
Linda Detterman (University of Michigan)
Peter Granda (University of Michigan)
Many federal funding agencies, including NIH and most recently NSF, are requiring that grant applications contain data management plans for projects involving data collection. To support researchers in meeting this requirement, ICPSR is providing guidance on creating such plans. ICPSR published a list of elements for creating a data management plan. To determine the list of elements, ICPSR conducted a gap analysis of existing recommendations for data management plans and other forms of guidance made available for researchers generating data. The result of the gap analysis was a comparison of existing forms of guidance around the world. Findings from the gap analysis will be discussed in this presentation.
Kathleen Fear (School of Information, University of Michigan)
This paper reports the results of a large-scale survey of researchers conducted in spring 2010 at the University of Michigan, aimed at understanding the variety of data management practices and concerns across disciplines. We found differences in how researchers go about managing and preserving their data and, importantly, differences in what different fields felt their most important needs for support were. For example, some groups felt a secure repository for data would be the ideal solution to their problems, while others were enthusiastic about the idea of consulting services that could help them create data management plans for particular projects. Understanding disciplinary differences in data management and the impact those differences have on the kind of support researchers need from the university is critical to implementing a successful data management program. This paper contributes to the literature on disciplinary difference in data practices and directly explores the problem of structuring services for different groups.
2011-06-03: F2: Statistical Metadata Strategies and Benefits
Experimenting with DDI3 at the UK Data Archive: Moving Forward While Accommodating Legacy
John Shepherdson (UK Data Archive)
This paper will present the UK Data Archive's work on building a new metadata infrastructure that accommodates new features of DDI 3 while enabling legacy metadata to be seamlessly created and used in our everyday services. Pilot work has concentrated on enhancing the Archive's Question Bank using DDI 3 and exposing the richer features of selected longitudinal data series. Adopting new, enriched metadata capabilities is a challenge for established archives, which are often bound by the ways they have chosen to expose and publish metadata. Any new systems must either complement or replace existing systems that use older versions of DDI, such as online catalogues and data browsing and display tools like Nesstar. The older the archive, the greater the legacy! Ensuring that older systems and their workflows are richly documented is an important part of being able to move on.
Since the publication of DDI 3 in 2008, a lot of attention has focused on the expanded features and tools being developed around this new structure. At the same time DDI 2 and earlier versions continue to support the work of data archives worldwide and, with the advent of the IHSN Microdata Toolkit, they also facilitate the collection and management of survey data in many developing and transitional countries. DDI 2 has a stable and growing community of users who will either continue to use this development line or will eventually work in an environment where both DDI development lines are used. In order to support this user base and its evolving needs, DDI 2.5 was developed. This new version adds coverage for features of simple surveys found in DDI 3 and better supports the translation of metadata between the two DDI development lines. DDI 2.5 is backwards-compatible with earlier versions while adding support for new elements and better communication with the DDI 3 structures. This presentation will highlight new features and explore some of the use cases in which DDI 2.5 can benefit the current user community.
Application of Technological Standards to Improve Documentation and Exchange of Statistical Information: A Perspective from Mexico
Abel Alejandro Coronado Iruegas (INEGI)
INEGI, the National Statistical Office of Mexico, is implementing two technological standards supported by several international organizations (UN, IMF, World Bank, Eurostat, ECB and BIS), DDI and SDMX. They will be part of the technological core of a National Information System, and will be used to improve the documentation and interchange of statistical information. Both standards are part of international efforts that will help national institutes of statistics and international organizations to improve understanding, quality, accessibility and comparability of the statistics. DDI is a standard developed to improve the quality of metadata documentation for the whole of a statistical project. SDMX has been created to integrate data and metadata when exchanging statistical information, mainly in the form of series of indicators. Even though DDI and SDMX have been designed for different purposes they complement each other and both have some commonalities that are being analyzed to establish an integral system that goes from beginning to end of the statistical lifecycle.
Introducing metadata standards to a National Statistical Organisation
Adam Brown (Statistics New Zealand)
After years of development, the advent of good quality and comprehensive standards for describing statistical data has led us to the next challenge: introducing these standards across our diverse and varied organisations. This paper will examine the experience of introducing improved statistical metadata management at Statistics New Zealand, including the initial introduction of the DDI and SDMX standards. Organisations widely support the idea of data reuse and sharing. However, the value of data reuse will only be maximised with widespread acceptance of the importance of high quality and comprehensive metadata. The development of high quality metadata can be facilitated through the application of metadata standards, but it is vital to overcome the barriers, build understanding and demonstrate value to ensure these standards can fit into an organisation with mature processes and an entrenched culture. Differences in terminology must be overcome, attitudes must be changed and the complex must be reduced to the explainable. This paper will be a practically focussed case study on the challenges faced and opportunities gained through this process at Statistics New Zealand.
2011-06-03: F3: Government Data Dissemination
Keeping the User out of the Ditch: The Importance of Front-End Alignment
Daniel Coyle (ProQuest)
Web-based, stand-alone datasets offer novice data users a dizzying array of choices before they find the data they're looking for. This presentation will describe the evolution of ProQuest Statistical DataSets, which employs a single interface to access over 630 datasets. Beginning with the three-part screen commonly used in executive information systems, ProQuest and Conquest Systems have altered that interface design to meet the research needs of undergraduates, an effort guided in large part by videotaped sessions with students as they use the product for the first time. The presentation will include screen shots of government dataset sites as well as Statistical DataSets.
Data Feed Collaboration between Academia and Government to Improve Dissemination of the UK 2011 Census
Dave Rawnsley (Mimas, The University of Manchester)
The Census Dissemination Unit (CDU) at Mimas provides access to the Census aggregate datasets for the UK academic community and champions the needs of that community in discussions with the Office for National Statistics (ONS), the UK statistics authority. This paper will describe how the CDU engaged with the ONS to develop a new way of storing, delivering and using aggregate census statistics, working in collaboration with them to revolutionise the data delivery mechanism for the 2011 Census. The ONS is now committed to using these ideas to create a ‘data feed’ approach to delivering the 2011 Census. What has developed is a synergistic relationship that allows the transfer of knowledge from the academic community and utilises the skills of both parties to develop ideas and products that will enhance the way the 2011 Census is used, not only by the academic community but by business, government and third-party census application developers. The ONS is currently co-funding a project at the CDU to create test datasets for use in developing an Application Programming Interface that the CDU and other developers can utilise to develop applications for delivering census data to their respective communities.
A New Initiative: Access to Statistics Canada's Public Use Microdata Files Collection
Michel Séguin (Statistics Canada)
Jennifer Pagnotta (Statistics Canada)
For many years, users reported difficulties in accessing the Public Use Microdata Files from Canada in order to conduct international comparisons or studies of Canadian society. To respond to this need for access to the full Statistics Canada Public Use Microdata collection, a subscription-fee service has been put in place. This service is aimed at national and international organisations that are not members of the Canadian Data Liberation Initiative and that would like to use and share Statistics Canada's Public Use Microdata Files within their organisations for non-commercial purposes. The service offers access as well as support through a listserv. We hope that through this one-stop-shop service, users will be able to add the Canadian perspective to their studies.
Feel the Feed: Dimensionalisation, Dissemination and Definitional Comparison of Aggregate Statistical Datasets
Richard Wiseman (Mimas, The University of Manchester)
Rob Dymond-Green (Mimas, The University of Manchester)
InFuse is a radical new interface to the aggregate datasets from UK censuses. It has been developed to exploit the potential of the ‘data feeds’ created by the Census Dissemination Unit, which combine restructured, multi-dimensional versions of the original census aggregate datasets with open standards descriptions and publication via web services. InFuse demonstrates some of the end user benefits of a data feed approach to dissemination, such as simple, meaning-based search across entire datasets, and the integration of data and metadata for use in interface design and supply to users. It also provides initial solutions to some generic challenges, including management of the sparsity of multi-dimensional datasets through guided queries, and complex operations upon hierarchical structures. An important challenge to be addressed in developing the information environment is to make it easier to use information from multiple, disparate aggregate datasets in combination. Further research aims to develop new measures of similarity between the various definitional elements of the multi-dimensional census aggregate datasets. New structures will also be required within the data feed to store and disseminate this information to make it available and useful. The aim of this research is to enable cross searching of multiple datasets to return equivalent aggregate counts, together with information about the nature and strength of their comparability.
2011-06-03: F4: Curate, Manage, and Share: Support and Repository Services
Building an Open Data Repository for a Specialized Research Community: Process, Challenges, and Lessons
Limor Peer (Yale University)
Ann Green (Digital Life Cycle Research and Consulting)
In 2009, the Institution for Social and Policy Studies (ISPS) at Yale University began building a specialized repository. The goal was to create an open access digital collection of social science experimental data, metadata, and associated files produced by ISPS researchers for the purpose of replicating research findings, further analysis, and teaching. Files are submitted to a rigorous process of quality assessment and normalization, including transformation of statistical code into R. Other requirements include: (a) that the repository is integrated with the current database of publications and projects publicly available on the ISPS website, (b) that it offers open access to datasets, documentation, and statistical software program files, (c) that it utilizes persistent linking services and redundant storage provided within the Yale Digital Commons infrastructure, and (d) that it operates in accordance with the prevailing standards of the digital preservation community. In partnership with Yale’s Office of Digital Assets and Infrastructure (ODAI), the ISPS Data Archive was launched in the fall of 2010. It currently holds 360 files for about 20 studies. We describe the process of creating the repository, discuss prospects for future similar projects, and explain how this specialized repository fits into the larger digital landscape at Yale.
Robin Rice (University of Edinburgh, EDINA and Data Library)
For data sharing, access and management in the future to become a higher priority within scholarly communication than it is now, new generations of scientists and scholars need to learn to do research in ways that support these ends. The Research Data MANTRA project (2010-2011) aims to develop online learning materials which reflect best practice in research data management grounded in three disciplinary contexts: social science, clinical psychology, and geoscience. The resulting materials will be embedded in three participating postgraduate programmes, made available through the university's Transkills programme for use by all postgraduate and early career researchers, and released generally under an open licence and deposited in JorumOpen, a national repository for open educational resources. In addition to web-based 'chapters' that students can work through at their own pace, the course will include video interviews with leading academics about data management challenges, and practical exercises in handling data in four software analysis environments: SPSS, NVivo, R and ArcGIS. The project is a partnership between the Data Library in Information Services and the Institute for Academic Development at the University of Edinburgh and is funded by JISC as part of its UK programme, Managing Research Data.
Women Pioneers in Canadian Sociology: A Case Study for Qualitative Research Data Management, Sharing and Re-Use
Berenica Vejvoda (University of Toronto, Map Data Library)
The University of Toronto's Map Data Library, Information Technology Services (ITS) and T-Space (University of Toronto's Research Repository) are working collaboratively to preserve and facilitate future access to the primary research data of Margaret Eichler (Professor, Department of Sociology and Equity Studies in Education, University of Toronto), collected in the mid-1980s. The original research data consists of 30 interviews with then-leading pioneers in Canadian Sociology, all born before 1930. This presentation will address the value of curating qualitative research data, especially for future secondary analysis. A data management plan specific to qualitative interview data will also be developed and presented based on a life-cycle approach to data. Information technology solutions for data preservation and access will be addressed. Furthermore, since anonymization would diminish the richness of this particular qualitative dataset, special attention will also be given to solving special privacy and security concerns and how they relate to the provision of access to future researchers, whilst complying with university ethics regulations. The main objective of this proposal is therefore to address qualitative research data management issues as well as to unveil the future access and re-use benefits for researchers.
2011-06-03: G1: Continuity and Change - Tales from the Development of the New Australian Data Archive
Data archives in a web services world - An overview of the new ADA
Steven McEachern (Australian Data Archive)
Deborah Mitchell (Australian Data Archive)
This session provides a set of three presentations on the Australian Data Archive, the successor to the Australian Social Science Data Archive. The three papers chart the development of three aspects of the change in the archive from ASSDA to ADA: shifting technologies, changing data types and formats, and expansion across research disciplines. At the same time, the papers also address the challenges associated with the growth of ADA, and provide insight into the changing role of data archives in the new open data environment.
Building a Criminal Justice Data Archive for Australia
Toby Burrows (University of Western Australia)
Leanne den Hartog (University of Western Australia)
This paper will look at the new Criminal Justice Data Archive for Australia, which is being developed as part of the Australian Data Archive. Among the topics covered will be the various types of data being included, and the sources of the datasets. The archive draws on datasets from multiple agencies in both State and Federal jurisdictions, and the paper will discuss the complexities involved in obtaining and delivering data from a range of different government bodies. The new archive is a partnership between the Australian Data Archive and the National Criminal Justice Research Data Network. The paper will discuss the respective roles of the two partners and their expectations from the service. Requirements and arrangements for managing security and access will also be examined. We will conclude by comparing the Australian service with international models, especially the U.S. National Archive of Criminal Justice Data.
Gabrielle Gardiner (University of Technology, Sydney)
Elizabeth Mulhollann (University of Technology, Sydney)
Kirsten Thorpe (University of Technology, Sydney)
Alex Byrne (University of Technology, Sydney)
Len Smith (University of Technology, Sydney)
Mike Jones (University of Technology, Sydney)
You enter the dark and dusty building, confronted with room after room of a researcher's life's work. Their office has been packed up in a rush by someone with no care or consideration for the content. Amongst the detritus of research life - magazines, coffee mugs, baseball caps and other interesting but irrelevant markers of a life well lived - there is material of remarkable historical value. Imagine: a treasure of inter-war census material relating to Australia's Indigenous population. Hear the story of the lost Aboriginal Population Register. ATSIDA and ASSDA staff will describe the process of extracting relevant material and compiling these resources into vital digital datasets for Australian Indigenous research. Behold as our intrepid archivists traverse: respectful engagement with data relating to Australia's Indigenous population, many of whom were forcibly removed from their families; how to capture, document and then disseminate numeric data embedded in paper records; establishing workflows for print-to-digital conversion; workflows for working across multiple locations; and recovering lost data formats - who still has 80-column cards these days? Rated PG-13.
2011-06-03: G2: Facilitating Secure Access to Confidential Data
DDI for Restricted Data Contracts
Lisa Neidert (University of Michigan)
We have developed a database of the conditions of use based on all of the restricted data contracts we hold. The conditions are represented in natural language, such as "Disclosure Limits", with a link to the disclosure conditions specific to each contract. The database is available from the web. The interface allows one to compare contracts to see if they share the same conditions; to create easy-to-read reports of contract conditions for contract users; or to pick and choose conditions for a new restricted data contract. The database also allows one to present contracts in a structured language. We have met with data providers to let them see how their contracts compare with others; provided "penalty" and "disclosure" language across multiple contracts to researchers and data providers respectively; and produced reports to remind researchers of the contract. We plan on sharing this with the campus IRB, as some of the decisions they make are based on contract conditions of which they are unaware. At a later point, we will add "security" conditions. These usually reside in an appendix and tend to morph as best practices in computing security evolve over time.
Richard Welpton (UK Data Archive, University of Essex)
Felix Ritchie (UK Office for National Statistics)
Considerable demand exists from researchers for detailed data collected by European-wide surveys. However, the benefits from analysing such data are little realised because of poor access. The problem is often compounded by the risk-averse nature of data owners. Researchers face considerable hurdles in accessing data, especially from more than one country - often they must travel considerable distances to access data at 'safe rooms' located in National Statistics Institutes, which consumes precious time and money, and deters young researchers. Fortunately, Research Data Centres (RDCs) have emerged which provide innovative access solutions for the convenience of researchers and data owners. This paper considers how more extensive 'Decentralised Access' could work in practice. We explore how the research community can reap benefits from an integrated network of RDCs, striving to deliver access to data of different sensitivities, and from different countries, throughout the European Union. We argue that 'friendly' competition amongst RDCs to provide better and innovative solutions for researchers and data owners can only lead to better provision of access to micro-data.
Human Security in Protecting Confidentiality of Data Sharing
Reza Afkhami (UK Data Archive)
Much research on information security has shown that the human element is central to the majority of confidentiality compromises. Most organisations focus on technical security while neglecting the human elements of information security. Awareness building and training play an important role in protecting people against attacks directed at the subconscious. This paper aims to demonstrate the need to complement technical IT security countermeasures with human security protection, to assess the level of risk connected with human security as the weakest link in the data security model, and to improve confidentiality protection in human security management across different data sharing scenarios. The Secure Data Service (SDS) approach will be discussed as an exemplar of establishing safe people and safe use of data.
2011-06-03: G3: Social Networks and User Engagement: Sharing Data and Knowledge
Digital monitoring of societal discussions in Online Social Networks
Timo Wandhoefer (GESIS - Leibniz Institute for the Social Sciences)
Peter Mutschke (GESIS - Leibniz Institute for the Social Sciences)
Mark Thamm (GESIS - Leibniz Institute for the Social Sciences)
York Sure (GESIS - Leibniz Institute for the Social Sciences)
Online Social Networks like Facebook, Twitter and YouTube are increasingly used as platforms for discussing societal issues with a broad online community. The challenge now is to handle the “flood of information” in Online Social Networks so that citizens’ opinions will be heard by stakeholders. Closing the loop between stakeholders and society is the aim of the European Commission-funded project WeGov (“Where eGovernment meets the eSociety”). The goal of the project is to develop a software toolkit that allows stakeholders to enter into a two-way dialogue with citizens in Online Social Networks. The paper presents use cases for detecting, tracking, mining and monitoring opinions on societal topics that take place in online communities. Furthermore, it is discussed how these techniques of digital monitoring could be enhanced by Social Science data services, such as online surveys allowing a stakeholder to get citizens’ opinions on a specific topic, or search services that may enrich discussions with research data, literature and experts retrieved from Social Science databases.
ScholarLib: Sharing Resources and Data by Linking Scientific Information Portals with Online Social Networks
Peter Mutschke (GESIS - Leibniz Institute for the Social Sciences)
Timo Wandhoefer (presenter) (GESIS - Leibniz Institute for the Social Sciences)
Mark Thamm (GESIS - Leibniz Institute for the Social Sciences)
York Sure (GESIS - Leibniz Institute for the Social Sciences)
Many studies have shown an increasing use of Online Social Networks for scientific work. Social networking platforms therefore provide a strategic opportunity to enhance knowledge exchange and networking processes in science by entering into a digital dialogue with a broader community and, in particular, by making use of the viral effects of Online Social Networks. However, Online Social Networks are usually not linked with scientific databases, so larger amounts of scientific content cannot easily be transferred to and shared via Online Social Networks. The major goal of ScholarLib is therefore to provide a framework for closely coupling Online Social Networks with scientific information portals: to make the search functionality of portals available within Online Social Networks and, the other way around, to enrich scientific portals with social information provided by Online Social Networks. The paper presents a first prototype that links existing social networking platforms (e.g. Xing, iversity) to the German Social Science information portal sowiport, allowing users in Online Social Networks to search, share and annotate publications provided by sowiport.
User Engagement and Collaboration: Challenges and Tools
Sarah King-Hele (Centre for Census and Survey Research (CCSR), University of Manchester)
We describe conventional approaches to user engagement and contrast these with other methods being explored by the MethodBox team at the University of Manchester. Approaches that ESDS Government uses to engage with users include formal dialogues through user meetings and consultations, presentations and other forms of broadcast. This has more recently included the use of social media. We have also attempted to encourage collaboration between users by sharing information about users where possible and encouraging users to meet physically and share expertise at events. MethodBox is a new tool designed by a team at Manchester under the auspices of the e-Social Science Programme and the myGrid platform. ESDS Government has collaborated with this team in order to make the Health Survey for England available in a Virtual Research Environment. This tool provides scope for researchers to collaborate within an online environment, sharing ideas, syntax and data. Users are encouraged to share their syntax and workflow to allow reuse by other registered users. The designers have been mindful of the needs of policy users who may find traditional access methods daunting.
Open Data in Vancouver: The Inspiration and the Vision
Andrea Reimer (City of Vancouver)
Andrea Reimer is a Councillor for the City of Vancouver and a passionate advocate for democracy and civic engagement. The City of Vancouver has led the way with the adoption of a resolution in May that endorsed open and accessible data, open standards, and open source software. Ms Reimer has been heavily involved in this initiative and will share her passion with IASSIST.