Conference Presentations 2014

  • IASSIST 2014: Aligning Data and Research Infrastructure, Toronto
    Host Institutions: University of Toronto, Ryerson University, and York University

Workshops 2014 (Tue, 2014-06-03)

  • Data management & curation: lessons from government, academia, and research
    Michele Hayslett (University of North Carolina at Chapel Hill)
    Stefan Kramer (American University)
    Dan Gillman (U.S. Bureau of Labor Statistics)
    Marcel Hebing (DIW Berlin)
    Chuck Humphrey (University of Alberta)
    Steven McEachern (Australian Data Archive)


    The management, publication, and preservation of datasets have become increasingly important issues for universities, research institutions, and government agencies. While the reasons and mandates for these activities, and the kinds of datasets collected, differ among these types of institutions, other aspects of data management throughout the research lifecycle concern all of them, including (but not limited to): the discoverability of their data; the choice of metadata standard(s) and the creation of metadata; visualization of and interaction with data; selection and migration of data formats for long-term preservation; policy development; and storage requirements. Yet these types of institutions tend to follow different paths in data management and curation, and to choose different infrastructures, metadata standards, and platforms. Are these different approaches inevitably rooted in the differences between these types of organizations, their missions, and their cultures? Or are there lessons they could learn from each other to improve their own practice? The purpose of this symposium is to explore that question. Presentations will come first; participants will then form breakout groups around specific aspects such as platform choice, policy development, and metadata creation. At the end, all groups will reconvene to share the results of their discussions.

  • Teaching an introductory workshop in digital preservation
    Laurence Horton (The London School of Economics and Political Science)
    Alexia Katsanidou (GESIS – Leibniz Institute for the Social Sciences)


    GESIS’s Archive and Data Management Training Center provides introductory-level, two-day training events on “First steps towards digital preservation”. This workshop gives an overview of our training, introducing participants to the design and intended target audience of our events and showcasing our digital preservation support and training.

    Adopting a “train the trainers” approach, the workshop is aimed at those interested in conducting organizational-level training: archivists, librarians, repository or research data center staff, and anyone responsible for planning the curation and preservation of digital assets, regardless of disciplinary background. Intended as a primer, the workshop requires no previous experience and introduces participants to the “organizational dimension” of digital preservation.

    Participants have the chance to try our training materials and exercises on:

        What is digital preservation and why do we need it?
        Introduction to the OAIS Reference Model
        Preserving information for a designated community
        Acquisition policies and selection criteria
        Sustainable digital preservation and cost models
        Licensing for preservation and re-use
        Trusted digital repositories


    Expected outcomes:

        Learning the conceptualization and structure of introductory workshops on digital preservation
        Familiarity with the content of the workshop and an overview of workshop materials and exercises
        Use of the materials to design their own training workshops

  • Data visualization and R
    Ryan Womack (Rutgers University)


    This workshop will focus on principles and techniques for the visualization of data, with equal emphasis on theory and implementation. Drawing on classic works by Cleveland (Visualizing Data), Tufte (The Visual Display of Quantitative Information), and Wilkinson (The Grammar of Graphics), it will illustrate a range of best practices for visualization. Recently developed techniques for large-scale, 3D, and interactive visualization will also be discussed. This discussion will draw on works such as Graphics of Large Datasets: Visualizing a Million (Unwin, Theus, and Hofmann), the Handbook of Data Visualization (Chen, Härdle, and Unwin), and Trends in Interactive Visualization: State-of-the-Art Survey (van Liere, Adriaansen, and Zudilova-Seinstra). For each of these approaches, methods for creating similar graphics in the open-source R statistical language will be demonstrated, using packages such as ggplot2, lattice, and rggobi. Interactive visualization packages such as shiny and healthvis will also be explored. Prior familiarity with R is helpful but not required.

  • Introduction to QGIS
    Nicole O. Scholtz (University of Michigan)


    Open-source geographic information systems (GIS) tools are maturing to the point of being viable for widespread use. QGIS is a desktop GIS that is cross-platform, free, extensible, and interoperable with other GIS tools. Data services that support desktop GIS software are increasingly supporting QGIS, and the workshop will also discuss ideas for promoting the use and support of open-source desktop GIS.

    In this workshop, participants will gain hands-on experience with QGIS. Exercises will include working with vector and raster data, doing very basic spatial analysis, and producing maps. Participants will leave with resources for learning more about QGIS. No previous GIS experience is required.

  • Advanced SDA usage for data librarians
    Tom Piazza (University of California, Berkeley)


    Data librarians are often called on to help users generate customized summaries of variables contained in large public datasets. Major data archives such as IPUMS and ICPSR now make many such datasets available for online analysis in SDA. Although users can easily generate simple tables using SDA, more complex analyses often require the use of other analytic procedures and the recoding of variables into more usable categories.

    The purpose of this workshop is to give data librarians greater facility with the SDA programs for recoding variables and generating new variables, so that they can produce the customized summaries that are often requested. Generating subsets of data for input into other analysis systems (such as Stata, SAS, and SPSS) will also be covered. Workshop participants will practice these procedures using U.S. Census data and some international datasets available in the IPUMS archive.

    Some basic familiarity with SDA will be presumed. However, no special expertise in SDA is required.

  • Introduction to Terra Populus: integrated data on population and environment
    Tracy Kugler (University of Minnesota, Minnesota Population Center)


    In this half-day workshop, Tracy Kugler will demonstrate the new Terra Populus data access system. Building on the MPC’s past experience with demographic data infrastructure projects such as IPUMS and NHGIS, Terra Populus seeks to lower the barriers to interdisciplinary human-environment research by making data from different domains easily interoperable. It incorporates a variety of data types, including census microdata, census summary data, and raster data describing land cover, land use, and climate. The data access system allows users to create customized data extracts that blend variables from all data types and deliver the output in the user’s preferred format. In this workshop, attendees will learn about the content and data processing capabilities of Terra Populus and learn how to obtain and use the data. Attendees will create extracts, download data over the internet, and analyze it in a statistical, spreadsheet, or GIS software package. (Please note: participants must bring their own laptops.)


1A: Building data collections (Wed, 2014-06-04)
Chair: Maria A. Jankowska

  • Restoring and preserving historic data for research purposes
    Arne Wolters (UK Data Archive)
    Matthew Woollard (UK Data Archive)


    The Enhancing and Enriching Historic Census Microdata (EEHCM) project is a joint project between the UK Data Archive, the Office for National Statistics (ONS), National Records of Scotland (NRS), and the Northern Ireland Statistics and Research Agency (NISRA). The project consists of two phases: the first phase focusses on restoring and preserving the 1961, 1966, 1971, and 1981 censuses of Great Britain; the second phase will create public use and researcher samples for each census. This paper will outline the key issues in restoring the original data, which were first read from backup tapes earlier in the millennium, and discuss why documentation is necessary for the successful recovery of historic data. It will also discuss the key lessons learned both in restoring files and in preserving and documenting current datasets for future use. The second part of the paper will explain which public use and researcher samples have been made available and the principles behind their construction.

  • ClimoBase: lessons learned while rescuing observational data from extinction
    Katie Smitt (University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science)


    ClimoBase is a dataset of approximately 7,000 files of climate-related measurements collected over a 14-year period at sites in Churchill, Manitoba; Marantz Lake, Manitoba; and Inuvik, Northwest Territories. The original dataset was created in 1999 with an extraction program written in Fortran. By 2013, these valuable data were barely machine-readable and in dire need of rescue. This paper will describe the steps taken to rescue, migrate, and preserve the unique observational data within ClimoBase for future use in climate science, and the many lessons learned in the process. It will address best practices for a data rescue workflow and the importance of preserving research datasets and their metadata.

  • Coding + Metadata = datasets: the Amnesty International data and digitization project
    Amy Barton (Purdue University)
    Ann Marie Clark (Purdue University)
    Paul Bracke (Purdue University)


    The Amnesty International Research Data and Digital Collection Project is a collaborative venture involving Amnesty International USA (AI-USA) and Purdue University researchers. A Purdue University political scientist, in her role as a member of AI-USA’s Archives Advisory Committee, discovered the need to preserve the early records of AI-USA; in her professional capacity, she was also aware of the research value of the documents. In particular, she was interested in patterns of change in the culture of human rights appeals and legal instrumentation. Through a contractual agreement, the researcher was able to obtain the documents. She was also able to obtain two data sources, both containing data about the Urgent Action Bulletins dating back to 1974. A third data source was then created through NVivo coding. This paper will discuss the collaborative relationship with AI-USA, the merging of the data sources, additional metadata development, and the final dataset created for the Purdue researcher. Additionally, three research products resulting from the project will be discussed: a faceted research dataset, a raw dataset comprising the merged data sources and metadata, and a human rights controlled vocabulary, developed for this project and honed from several existing vocabularies.

  • Overcoming challenges in global health data to empower research: Collection, attribution, and dissemination
    Matthew Israelson (University of Washington, Institute for Health Metrics and Evaluation)


    The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington. In 2011, IHME launched the Global Health Data Exchange (GHDx), the world’s most comprehensive health data catalog and repository. The GHDx includes entries (records) for surveys, censuses, vital statistics, and many other sources of health-related data, and provides context for those datasets through pages on countries, series and systems, and organizations. At the 40th IASSIST Conference, IHME will present an individual paper on the development and implementation of the GHDx, showing how the GHDx enables the successful collection, storage, and attribution of social science data and demonstrating visualizations that link input data to research products. This session will cover the challenges of data sensitivity, data management, and the “value chain” of data in research. Presenters will further explore how skilled cataloging, technology, and data management improve data linkages, standardize attribution, integrate sources, and ensure data discovery for IHME’s researchers, thereby improving IHME’s publicly available estimates and reports.



