Once Upon A Data Point: Sustaining our data storytellers
Host city: Montreal
Host: McGill University, Concordia University, and Université de Montréal in association with the Association of Canadian Map Libraries and Archives | Association des cartothèques et archives cartographiques du Canada (ACMLA-ACACC)
Excel has some surprisingly useful capabilities for working with data and is a great tool to master before moving on to more sophisticated statistical software. In this beginner-friendly workshop, we will use an interactive self-teaching Excel file to learn formulas and functions useful for manipulating data. We will also cover how to create macros (super easy) that will help you save hours of your time by automating your work, and we will explore some good practices around modifying or programming macros (not as easy but very rewarding). We will practice all concepts covered in the workshop by working through some fun exercises. Topics covered include: string functions, removing duplicates from a list, using criteria in formulas, Vlookup, locating special cells (blanks, formulas, etc.) in a worksheet, recording Excel macros, saving Excel macros, and modifying Excel macros. Registration note: This workshop will be most enjoyed by participants who are already able to enter a simple formula in a cell.
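The workshop itself is entirely Excel-based, but since it is pitched as a stepping stone toward statistical software, a minimal illustration of two of the listed topics (removing duplicates and a Vlookup-style merge) in Python/pandas may be useful for comparison. File and column names here are hypothetical placeholders, not workshop materials.

```python
# A pandas sketch of two workshop topics; all file and column names are hypothetical.
import pandas as pd

orders = pd.read_csv("orders.csv")        # e.g. columns: order_id, customer_id, amount
customers = pd.read_csv("customers.csv")  # e.g. columns: customer_id, name

# "Removing duplicates from a list" -> drop repeated rows
orders = orders.drop_duplicates()

# Vlookup equivalent -> left join on a shared key column
merged = orders.merge(customers, on="customer_id", how="left")

# "Using criteria in formulas" (like SUMIF) -> boolean filtering plus a sum
print(merged.loc[merged["amount"] > 100, "amount"].sum())
```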
W2: Mixing GIS and Text Analytics for Better Analysis and Results
Normand Peladeau (Provalis Research)
Workshop participants will learn how to make full use of geographic information embedded in text and images to enhance research findings by relating textual information with GIS mapping. Geographic information systems (GIS) are digital technologies for storing, analyzing, and displaying geographic information. They allow one to reveal the spatial and geographical nature of complex social phenomena through the production of static and dynamic maps. These tools often use data that is quantitative or categorical in nature. They are less well suited for leveraging the richness of information stored in unstructured text data (such as interview transcripts, tweets, crime narratives, customer feedback, and field notes) or in photographic images. Text Analytics techniques have proven more valuable for extracting insightful information from such unstructured sources. Provalis Research’s QDA Miner and WordStat provide a unique integration of GIS mapping, qualitative analysis, and text mining features. By using QDA Miner and WordStat, researchers can quickly extract useful information from unstructured text data and relate it to geographic information to create interactive plots of data points, distribution maps, heatmaps, timelines, and other graphic displays that enhance their analysis and the presentation of results.
W3: Working with Messy Data in OpenRefine
Kelly Schultz (University of Toronto)
Leanne Trimble (University of Toronto)
This workshop will provide an introduction to OpenRefine, a powerful open source tool for exploring, cleaning and manipulating “messy” data. Through hands-on activities using a variety of datasets, participants will learn how to: (1) explore and identify patterns in data; (2) normalize data using facets and clusters; (3) manipulate and generate new textual and numeric data; (4) transform and reshape datasets; (5) use the General Refine Expression Language (GREL) to undertake advanced manipulations; and (6) use APIs to augment existing datasets. The workshop will include a discussion of the applications of this software and a comparison with other tools that can be used for similar purposes. The presenters will also share their experiences teaching this material to students. Come prepared to share your own similar experiences with your colleagues!
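As a rough illustration of topic (6), the sketch below augments a dataset by calling a web API, done here in plain Python rather than through OpenRefine's own interface; the endpoint URL, response shape, and column names are all hypothetical.

```python
# Hypothetical sketch: add a column to a dataset by querying a web API per row.
import pandas as pd
import requests

df = pd.read_csv("cities.csv")  # hypothetical column: city_name

def lookup_population(city):
    # assumed JSON API returning {"population": <int>}; not a real endpoint
    resp = requests.get("https://api.example.org/cities",
                        params={"q": city}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("population")

df["population"] = df["city_name"].apply(lookup_population)
df.to_csv("cities_augmented.csv", index=False)
```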
W4: Data Storytelling with PolicyMap across Disciplines
Lisa DeLuca (Seton Hall University)
Katie Wissel (Seton Hall University)
Elizabeth Nash (PolicyMap)
This workshop will connect the data points of a cross-disciplinary rollout of PolicyMap (a GIS-lite mapping tool) spearheaded by Seton Hall University Libraries. The business and social science librarians will discuss how they reach academic departments and help to create and support PolicyMap assignments. The discussion of the campaign will cover several avenues of outreach, including: highlighting the tool via web and social media channels; direct outreach for PolicyMap by liaison librarians; and partnering with the Digital Humanities Committee. Strategies for encouraging faculty to create assignments using the tool will be covered, including in-class instruction, one-on-one consultation, and the warehousing of assignments in an Institutional Repository as Open Educational Resources (OER). The value of shared resources aimed at fostering collaboration and discussion about mapping tools will be discussed by liaison librarians and PolicyMap data experts. In this hands-on workshop, participants will learn how to manage a rollout, view examples of assignments, and understand how these assignments can enhance instruction. Participants will view and compare specific PolicyMap examples from different data sources and understand how mapping can enable data storytelling. Participants will build their own maps during the workshop to understand the power of data storytelling with PolicyMap’s public and subscription editions.
W5: Finding, Analyzing, and Understanding Polling from the Roper Center
Kathleen Weldon (Roper Center for Public Opinion Research)
The Roper Center for Public Opinion Research is the world’s largest archive devoted exclusively to public opinion survey research data, with a digital collection of 23,000+ datasets and iPOLL, a question bank with over 700,000 entries. The collection covers 100 countries and includes data dating to 1935. In this workshop, participants will learn about the Roper collection, how to find questions or datasets on particular topics, how to identify trends, and how to use the Center’s online analysis tools. Participants will receive an overview of polling methodologies of the past and present and develop an understanding of the factors to consider when judging polling data quality. The workshop will also cover questions about how to access restricted data, how to help researchers meet publication requirements for replication data for research based on Roper datasets, and how archiving with Roper can help researchers meet data management plan requirements.
W6: Collecting and Processing of Survey Data using the Open Data Kit (ODK) eSurvey Software
Peter Smyth (University of Manchester)
This introductory workshop is aimed at anyone interested in creating, collecting, and processing data from their own survey designs using ODK (Open Data Kit). Participants will learn the following: (1) the overall structure of an ODK implementation, (2) how to install and use the XLSForm application, and (3) how to download and analyse survey results in JSON format. In the age of eEverything, it is not surprising that there is now a plethora of software available to produce, gather, and collate survey information across delivery methods ranging from SMS to Android applications. Some of these tools are ‘free’, provided you ask only a limited number of simple questions and want no more than about ten responses. ODK (Open Data Kit) is an open source set of tools that allows the creation and deployment of surveys to Android devices, as well as the collation of survey data to a central server from where it can be downloaded in a variety of formats for local processing. The survey design can be arbitrarily complex, with groups and repeating groups of questions as well as optional questions based on previous responses. REGISTRATION NOTE: No specific prior experience is required, but an ability to understand basic commands in Python will be useful.
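Since the registration note mentions basic Python, here is a minimal sketch (not part of the official ODK toolset) of what step (3) can look like: flattening downloaded JSON submissions into a table. The field names depend entirely on your form design and are hypothetical.

```python
# Flatten ODK-style JSON submissions for analysis; field names are hypothetical.
import json
import pandas as pd

with open("submissions.json") as f:
    submissions = json.load(f)  # assumed: a list of submission dicts

# json_normalize flattens nested question groups, e.g. {"household": {"size": 4}}
# becomes a column named "household.size"
df = pd.json_normalize(submissions)
print(df[["household.size", "respondent.age"]].describe())
```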
W7: Processing Lidar Data using ArcGIS
Gerald Romme (University of Toronto)
This workshop will cover using ArcGIS to process lidar (Light Detection and Ranging) data. Attendees will learn to (1) reclassify classified data, (2) extract building heights, (3) generate shorelines, (4) create a tree canopy using lidar, and (5) use ArcGIS to create digital terrain models and digital surface models. REGISTRATION NOTE: A good understanding of GIS principles and a basic-to-advanced working knowledge of ArcGIS 10.4 or higher is expected.
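For orientation, a sketch of how steps like (5) and (2) can be scripted with arcpy is shown below. It runs only inside an ArcGIS Python session, paths are hypothetical, and the exact parameter strings should be checked against the Make LAS Dataset Layer and LAS Dataset To Raster tool documentation; this is not the workshop's material.

```python
# arcpy sketch: derive a DTM (ground returns, LAS class 2), a DSM, and a
# normalized height surface. Paths and parameter choices are illustrative.
import arcpy
arcpy.env.workspace = r"C:\lidar"

# DTM: filter the LAS dataset to ground points, then rasterize
arcpy.MakeLasDatasetLayer_management("tiles.lasd", "ground_lyr", [2])
arcpy.LasDatasetToRaster_conversion("ground_lyr", "dtm.tif", "ELEVATION",
                                    "BINNING AVERAGE LINEAR", "FLOAT",
                                    "CELLSIZE", 1)

# DSM: rasterize all returns, keeping the highest hit per cell (roofs, canopy)
arcpy.LasDatasetToRaster_conversion("tiles.lasd", "dsm.tif", "ELEVATION",
                                    "BINNING MAXIMUM LINEAR", "FLOAT",
                                    "CELLSIZE", 1)

# Building/canopy heights: DSM minus DTM (requires Spatial Analyst)
arcpy.CheckOutExtension("Spatial")
heights = arcpy.sa.Raster("dsm.tif") - arcpy.sa.Raster("dtm.tif")
heights.save("ndsm.tif")
```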
W8: Supporting Data Storytellers with Fedora
David Wilcox (DuraSpace)
Fedora is a flexible, extensible, open source repository platform for curating digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. Fedora supports data storytellers by providing infrastructure for curating, archiving, and sharing data. Workshop participants will be introduced to Fedora and learn how to create, manage, and share content in accordance with linked data best practices and the Portland Common Data Model. Participants will also learn how Fedora supports digital preservation, including exporting resources from Fedora to external systems and services as part of a digital curation workflow.
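To make the "create, manage, and share content" part concrete, here is a minimal sketch (not the workshop's materials) of creating a resource with RDF metadata through Fedora's LDP-based REST API; the local endpoint, path, credentials, and metadata are assumed placeholders.

```python
# Hypothetical sketch: create a Fedora container with Turtle metadata via HTTP.
import requests

FEDORA = "http://localhost:8080/rest"  # assumed local Fedora endpoint

turtle = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:title "Example dataset" .
"""

# PUT creates (or replaces) a resource at a chosen path
resp = requests.put(f"{FEDORA}/example-dataset",
                    data=turtle.encode("utf-8"),
                    headers={"Content-Type": "text/turtle"},
                    auth=("fedoraAdmin", "secret"))  # assumed credentials
resp.raise_for_status()
print("Resource created at:", f"{FEDORA}/example-dataset")
```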
W9: Data Literacy for All, with R
Ryan Womack (Rutgers University)
Introducing general audiences to their first hands-on data work often faces formidable barriers. New users typically must spend their time installing, configuring, and learning the programming conventions of specific software environments that may themselves present barriers of cost and compatibility. Importing and wrangling data into a form suitable for use is another barrier. As data professionals, we can apply our skills to develop relatively painless introductions to data that focus on understanding the data itself and analytical concepts, instead of the mechanics of a program. We can customize and tailor our presentations to the needs of particular audiences by developing wrappers around data and functions that simplify their use, and we can develop techniques and interfaces that allow easy data exploration. Using R, this workshop will explore the following: (1) building packages for distributing data and functions; (2) using sample data and functions to illustrate basic data literacy concepts such as descriptive statistics, modeling, and visualization, while keeping the focus on meaning, not mechanics; and (3) building tools for interactive exploratory data analysis by end users. As open source software, R is easily available and can be locally distributed where internet access and computing resources are scarce.
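The workshop builds its wrappers in R; purely to illustrate the wrapper idea in one consistent example language for this program, here is a Python sketch of a single friendly call that hides the import and wrangling mechanics. Dataset names and columns are hypothetical.

```python
# Illustrative wrapper: one call shows the data, not the mechanics.
import pandas as pd

_DATASETS = {"commutes": "data/commutes.csv"}  # bundled sample data (hypothetical)

def explore(name):
    """Load a bundled dataset and print first-look summaries so a novice can
    focus on the data itself rather than on import and cleaning steps."""
    df = pd.read_csv(_DATASETS[name])
    print(df.head())
    print(df.describe(include="all"))
    return df

df = explore("commutes")
```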
W10: Host Your Own Digital Repository Without Installing Any Software
Abay Israel (ICPSR)
Harshakumar Ummerpillai (ICPSR)
This workshop will include a hands-on demonstration of the national archives hosted at ICPSR and the free self-publication software products implemented on the Archonnex platform. Participants will learn the steps required to set up a digital repository at their home institutions without the need to build everything from scratch. Participants will also learn about the history and the future of the Archonnex technology and about additional resources that are currently available. Since 1962, ICPSR has been an integral part of the infrastructure of social science research, with its vast digital archive supporting over 700 member institutions worldwide. With the release of our new digital asset management system, Archonnex, ICPSR continues this tradition by extending our expertise and digital technology capabilities as a service to the larger community. For the first time, researchers, institutions, organizations, and even nations will be able to host their own repositories and set up data services for their members. We call it RaaS - Repository as a Service. The next generation of data management is here, with tools that cover data deposit and discovery, metadata management, related citations, restricted and public access, curation, reporting, and more. Archonnex supports any digitally born file, is ADA compliant, and is constantly evolving to support the needs of the data storyteller community. ICPSR’s RaaS fundamentally changes how we think about data science by breaking down some of the more technical barriers experienced across all disciplines.
2018-05-30: Plenaries
Plenary 1: Data journalism: Dispatches from the frontiers of data science and storytelling
Roberto Rocha (CBC)
Data journalism is the marriage of social science and public-interest storytelling, using data as a source and employing reproducible analysis methods to arrive at a conclusion. This is a radical change from traditional ways of reporting, which often rely on anecdotes and paper documents. But despite being at its heart journalism, where crafting a gripping narrative is just as important as getting the facts right, data journalists haven't always been good at telling stories. This is changing, and we're entering a golden age of storytelling based on data and stats.
Plenary 2: Cybercartography and inclusion: The Residential Schools Land Memory Mapping Project
Stephanie Pyne (Carleton University)
Providing an inclusive space for many perspectives and encouraging people to express themselves in their own ways is a key ingredient of the kind of deliberative democracy envisioned in Inclusion and Democracy by Iris Marion Young, who recommended a broad approach to rationality that can accommodate alternative forms of communication such as greeting, rhetoric, and narrative. Inclusion is also central to the critical cartographic view of the map as both a process and a narrative vehicle capable of reflecting multiple dimensions. The critical turn in cartography has led to a vast new range of possibilities for mediating reconciliation and social justice initiatives worldwide. Cybercartography is an evolving theoretical and practical framework for transdisciplinary collaborative projects to develop online interactive multimedia atlas websites intended to present multiple perspectives and dimensions of socioeconomic, political and cultural issues. Research began in 2007 on the prototype Cybercartographic Atlas of Indigenous Perspectives and Knowledge (Great Lakes-St. Lawrence Region), which provided the basis for the five-year SSHRC-funded Lake Huron Treaty Atlas Project (2009-2014), and in turn led to the current five-year SSHRC-funded Residential Schools Land Memory Mapping Project (RSLMMP, 2015-2020). This discussion tracks the evolution of the RSLMMP with a focus on inclusivity, emergence and the transdisciplinary research relationships that have developed through this evolution.
2018-05-30: A1: Infrastructure for Geospatial Data Discovery and Reuse
One store has all? The backend story of managing geospatial information toward an easy discovery
Nicole Kong (Purdue University)
Geospatial data comes in many formats, from historical paper maps to digital information collected by various sensors. Many libraries have started to build spatial data portals to connect users with information. For example, GeoBlacklight and OpenGeoportal are two open-source projects initiated by academic institutions that have been adopted by many universities and libraries for spatial data discovery. While several recent studies have focused on the metadata, usability, and data collection perspectives of spatial data portals, few have explored the backend stories about the data management that supports a data discovery platform. The objective of this paper is to summarize the geospatial data management strategies involved in spatial data portal development by reviewing case studies combined with the experience gained from developing our own institutional spatial data portal. These strategies include managing historical paper maps, scanned maps, aerial photos, research-generated spatial information, and web map services. This paper focuses on data organization, storage, cyberinfrastructure configuration, preservation, and sharing, with the goal of providing a range of options and best management practices for information managers curating geospatial data at their own institutions.
Building a historical (20th-century) Mongolian GIS database
Susan Powell (University of California, Berkeley)
In this presentation I will give an overview of a project I have been working on over the past year to build a GIS database of 20th-century Mongolian base data. My goals for this project are to create a useful set of data for researchers and to develop a set of best practices that can be used in other similar projects. For the first part of the project, I identified and scanned historical maps of Mongolia from the University of California Berkeley Library collections. I also spent six weeks in Mongolia in May-June 2017 researching historical maps located in the collections of Mongolian institutions and acquiring scans. This part of the project had many unexpected twists and turns. With the help of a student assistant, I have been geo-referencing the scanned maps and digitizing features to create vector layers. The completed datasets will be made freely available through the UC Berkeley Library geoportal, where scholars can access them to make maps, perform analysis, and tell some of the rich stories of 20th-century Mongolia. I will share lessons learned during this process - from planning and funding to building the datasets and creating documentation - as well as next steps.
The redesign of Géoindex+: Toward a shared platform that carries stories
Stefano Biondo (Université Laval)
In Quebec, the agreement signed in August 2015 between the Ministry of Energy and Natural Resources (MERN) and the Interuniversity Cooperation Office (BCI) opened the door to a new mode of management and dissemination of geospatial data. Overnight, all 18 university institutions in Quebec had access to more than 250 layers of geospatial data occupying more than 50 TB. How can this mass of data be discovered, visualized, and disseminated effectively? In a vision of interuniversity collaboration and pooling of processes and resources, the Université Laval Library offered to share its expertise and know-how in the geospatial field to create a shared platform for participating libraries, which it currently manages. This presentation reports on the progress of the new platform, which hosts the MERN data as well as the local data of each institution. The speaker will share his experience as project lead and collect feedback from colleagues on both the project’s implementation and the features of the new platform.
Gamification and fake news: A thriller about the challenges of developing an in-house quiz app
Cristina Magder (UK Data Service)
As the result of an international mobile application development contest, the UK Data Service (UKDS) succeeded in setting up a simple yet functional Data Quiz App. From the beginning of 2017, we began to learn that designing, developing, populating, and maintaining an app as part of a data service is not an easy task. This presentation will cover the challenges we have faced in populating the app (resources and skills needed) and promoting it (reaching non-academic audiences), as well as enhancements to the app's design. The demands of maintaining an app to the current in-house standards of a data service provide both constraints and opportunities. Topics explored will include best practices in content management, using RESTful APIs, implementing ISO-approved software in web apps, and the Android/iOS publishing processes. The presentation will focus on the demand both for new technologies and for "truth": in a world of fake news, gamifying data is a useful device for encouraging and reassuring people that "archived" data are powerful sources of accurate knowledge. This story will be a happy-ending narrative, aiming to remind us that, as one wise man once said, "Difficulties vanish when faced boldly."
Helping researchers utilize virtual reality (VR) for data visualization
Madeline Wrable (Massachusetts Institute of Technology)
This presentation seeks to inspire others to get started with virtual reality (VR) and to show the wide range of applications and collaborations that have come from our own experiences. The idea of bringing VR into the GIS Lab at MIT Libraries grew out of enabling patrons to "walk" around the maps they created. Given the nature of a GIS lab, it took very little to get started: namely, the purchase of a VR headset and a powerful graphics card. This presentation will detail the workflow from GIS, through 3D modeling, to VR. It will discuss how VR can be used to inform space planning, as we did with our new GIS lab. Another application was to recreate landscapes from drone imagery to allow students to virtually revisit their prior field sites. Movie clips will be included in the presentation so that viewers can experience these visualization capabilities. We will also discuss how we have navigated sharing this technology with students, staff, and faculty.
Programming language instruction is a relatively new offering in academic libraries. A variety of approaches have been adopted, with varying outcomes. In this panel, staff from three university libraries will present programming language instruction efforts. Columbia University Libraries initiated the Open Lab concept two years ago, a weekly session to facilitate learning languages such as R and Python. Participants are given a brief lesson with time to practice, ask questions, and learn from one another in an open environment. The University of North Carolina at Chapel Hill Library staff members adopted the Open Labs model for teaching R with campus partner the Odum Institute for Research in Social Science. Other efforts include traditional workshops and participation in a campus working group. Librarians from Duke University explored a variety of teaching formats for providing R and Python instruction ranging from workshops, boot camps, and community engagement events designed to expand the R/Python community on campus while simultaneously increasing their ability to consult in these areas. Panelists will share their experiences and lessons learned in providing programming language support by describing their approaches, attendee characteristics, staffing and partnerships, and community-building endeavors, and close by relating outcomes and plans for the future.
2018-05-30: A3: Road Maps for RDM Services: What Next?
Scaling up RDM services at Duke University: Where we are now, what we have learned, and where we are going
Jennifer Darragh (Duke University)
Sophia Lafferty-Hess (Duke University)
In 2016 the Provost of Duke University provided funds to the Duke University Libraries (DUL) to develop an infrastructure for research data management (RDM) support. In 2017, the DUL hired four new staff positions as part of this effort -- two consultants in a public-service role, and two behind-the-scenes staff to assist with the ingest of data into the Duke Digital Repository. This presentation will cover how this new team (along with existing library stakeholders) worked to build an infrastructure that strives to meet the needs of Duke researchers across all phases of the research data lifecycle, the lessons learned during the building and implementation process, and thoughts on next steps for the program.
Jeff Moon (Canadian Association of Research Libraries Portage Network)
Shahira Khair (Canadian Association of Research Libraries Portage Network)
Lee Wilson (Canadian Association of Research Libraries Portage Network/ACE-NET)
The story of Portage is one of successes, challenges, and hope. As Canada's national, library-based research data management network, the goal of Portage is to coordinate and expand existing expertise, services, and infrastructure in support of academic researchers across Canada. As Portage enters its fourth year of operation, it is useful to reflect upon our journey with stories that illustrate Portage successes, that describe how we're addressing current challenges, and that anticipate, with hope and determination, continued progress toward a collaborative and productive RDM ecosystem in Canada. Speakers in this session will rely upon Portage-sponsored surveys and initiatives, and consultations with stakeholders from across the country, to conduct a retrospective examination of our journey to date in order to share stories of successes, challenges, and gaps in the Canadian RDM ecosystem. Our vision of the future will be informed by these experiences and by emerging policy statements and funding scenarios. Participants will leave with a better sense of what Portage is about, what Portage has accomplished, and what's coming around the next turn in our RDM journey.
RDM roadmap to the future, Or, Lords and Ladies of the data
Robin Rice (University of Edinburgh)
This is a story about a university's brave and dogged journey in research data management service provision. Like The Lord of the Rings, the story has three parts, as indicated by the production of its first, second, and third research data management strategic roadmaps. Since parts 1 and 2 have been told at a different place and time (http://dx.doi.org/10.2218/ijdc.v6i2.199; http://dx.doi.org/10.2218/ijdc.v8i2.283), this story will focus on part 3, the University of Edinburgh Research Data Service Roadmap: August 2017-July 2020, approved and made public by the service's steering group. The team designated to deliver the outcomes is made up of intrepid adventurers from across the data, library, and computing shires of the university. The narrator -- one of the shorter, brown-haired members, a data librarian without any magical powers -- was tasked with leading the delegation on the journey, through 32 challenging objectives with associated actions, milestones, and deliverables (https://www.ed.ac.uk/is/rdm-roadmap). Five overarching themes characterise the Roadmap: unification of the service, data management planning, working with active data, data stewardship, and research data support. If the many frightening and powerful challenges can be overcome, then surely victory will be grasped like a ring!
Implementing new data management and curation services at the University of Arizona (UA): Lessons learned from the UA data management and data curation pilot
Christine Kollen (University of Arizona)
The development of efficient data management and curation services will allow researchers to devote more time to research, support the goal of developing a research data ecosystem that facilitates data reproducibility, and have a positive impact on the institution's overall prestige and success in obtaining grant funds. The University of Arizona (UA) Libraries, in collaboration with the UA Office of Research, Discovery, and Innovation and the UA University Information and Technology Services, has been providing data management services to campus for the past several years. Are we providing the data management services that UA researchers need? What other services should we offer? To answer these questions, we conducted a campus survey, followed by the Data Management and Data Curation Pilot. The pilot, working with research projects from education, wildlife biology, entomology, cancer research, and public health, provided an in-depth assessment of the specific services, tools, training, support, and technology infrastructure researchers need to effectively and efficiently manage and curate their research data. This presentation will provide details on the research projects involved with the pilot, what services and tools were implemented, results of exit interviews with each research project, the development of general and specific recommendations, and progress toward implementation.
Breaking the wall: Building an infrastructure to enable multi-disciplinary analyses for social sciences and the Internet of Things (IoT)
Darren Bell (University of Essex)
The UK Data Service has been investing in a unified repository infrastructure (DSaaP -- Data Services as a Platform) that enables true domain-agnostic research. Starting from first principles at the datum level, we have worked on the proposition that the richest impact on policy is generated when analysis can be conducted in one place on multiple domains (in this case social sciences and energy, although the principle is extensible to any domain). Both "big data" from household smart meters and traditional social science surveys can now be explored, linked and analysed in a single user interface, in a single infrastructure, referencing a single architecture and a single metadata model. Traditionally, this kind of "collective intelligence" has been difficult to achieve without expensive and resource-intensive data brokerage services which are hopelessly unscalable. We demonstrate a real-world portal where you can link streaming smart meter energy data and traditional survey data to create derived data products in real-time and explain how this fundamentally alters the way we enable cross-disciplinary research.
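As a rough, hypothetical sketch of the kind of linkage described (not DSaaP's actual implementation), the snippet below derives a household-level product from meter readings and joins it to survey responses; all file and column names are placeholders.

```python
# Link derived smart-meter aggregates to survey data; columns are hypothetical.
import pandas as pd

meters = pd.read_csv("smart_meter_readings.csv", parse_dates=["timestamp"])
survey = pd.read_csv("household_survey.csv")

# Derived data product: mean daily energy use per household
daily = (meters.set_index("timestamp")
               .groupby("household_id")["kwh"]
               .resample("D").sum())
mean_daily = (daily.groupby("household_id").mean()
                   .rename("mean_daily_kwh").reset_index())

# Join the derived product to survey variables on the shared household key
linked = survey.merge(mean_daily, on="household_id", how="inner")
print(linked.groupby("income_band")["mean_daily_kwh"].mean())
```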
The Census Program Data Viewer (CPDV) is Statistics Canada's new web-based data visualization tool that will make statistical information more interpretable by presenting key indicators in a visual dashboard. Driven by geography and analytical indicators, the CPDV allows casual users to see complex conceptual relationships with ease. The presentation will provide an overview of the product, how it works, and how it can be used by different user communities. We discuss the process of working with Geocortex Essentials technology to refine a product that meets accessibility standards and hosts the large volume of data that has been made available. The effort to produce large volumes of information and integrate it with geospatial information was a considerable challenge, and there are lessons learned that we would like to share. The CPDV is envisioned as a tool to allow a great number of non-expert data users to easily access and interpret Census data for reference and research purposes. Feedback from the IASSIST and ACMLA communities will be invaluable to help meet this vision and provide a great experience for users.
Providing high quality data and metadata through the ExploreData search portal
Julia Hermann (GESIS - Leibniz-Institute for the Social Sciences)
Christina Eder (GESIS - Leibniz-Institute for the Social Sciences)
Wolfgang Zenk-Moltgen (GESIS - Leibniz-Institute for the Social Sciences)
The GESIS -- Leibniz-Institute for the Social Sciences Data Archive prepares extensive national and international study collections (cross-sectional and longitudinal) down to the variable level according to international DDI standards and offers data and documents for reuse. Due to continuous developments in data collection methods and the increasing amount of data and documents, the quality requirements for data documentation and data visualization are rising. To meet these demands, the GESIS project ExploreData developed an online search portal in which the complex metadata of different large-scale survey programs are offered in a systematic and user-friendly way. Users can quickly get an overview of survey programs and their different components, e.g. temporal, geographic, or thematic units, or populations, and can browse and use those components to filter search results. Additionally, users can search for studies, documents, single variables, concepts, topics, trends, and other metadata using the free text search, which covers all studies held by the data archive. On the variable level, users can analyze research data online, compare variables from different study collections, and compile customized data sets. Such a portal is unique in Germany because it combines the functions of browsing, searching, filtering, downloading, analyzing, and comparing variables and studies.
Roper Center: Where are we now? Where are we going?
Tim Parsons (Roper Center for Public Opinion Research)
Over two years ago, the Roper Center for Public Opinion Research relocated to Cornell University. At IASSIST 2016, we reviewed the myriad challenges of maintaining uninterrupted service while migrating a 71-year-old archive from one location to another. Since that time, with the migration successfully completed, we have systematically assessed existing technologies, identified opportunities for improvement, and created an execution plan that will guide our development over the coming year. With the ultimate goals of a completely redesigned archive, improved internal workflow, and improved user experience, our members will enjoy a host of modern tools that can be leveraged to drive their research forward. In this presentation, we will pull back the development curtain to describe the discovery process, explain our method for prioritizing tasks, review items completed and those we plan to execute, and conclude with a picture of what the Roper Center of the future may look like.
2018-05-30: A5: Tales of Improving and Implementing Metadata
Metadata Improvements on historical polling at the Roper Center
Kathleen Weldon (Roper Center for Public Opinion Research)
William Block (Cornell Institute for Social and Economic Research (CISER))
The Roper Center for Public Opinion Research is the largest archive devoted exclusively to public opinion survey research data, with a collection of 23,000+ datasets dating back to 1937 and iPOLL, a question bank with over 700,000 entries. After relocating to Cornell University in 2015, a comprehensive redesign of Roper's archival data model was undertaken to map the metadata to the standards of the Data Documentation Initiative (DDI), using American Association for Public Opinion Research (AAPOR) Transparency Initiative (TI) disclosure elements to guide metadata requirements and presentation. This paper will describe the Center's efforts to make the archive's metadata more comprehensive, normalized, and granular, specifically focusing on the challenges of enriching metadata and documentation related to historical data from the 1930s-1960s. These improvements, including more complete descriptions of quota sampling methods and sample weighting, will more effectively support emerging consensus in the polling community about necessary levels of transparency as defined in the AAPOR TI standard, providing more thorough methodological information for those studies that predate current disclosure requirements. These archival enhancements will also allow the Center to provide new tools, facilitate new analysis, and help researchers develop better understanding of these important collections.
How the Euro Question Bank project makes survey questions available to the social sciences
Seyedeh Azadeh Mahmoud Hashemi (GESIS: Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moeltgen (GESIS: Leibniz Institute for the Social Sciences)
Alexander Muehlbauer (GESIS: Leibniz Institute for the Social Sciences)
Esra Akdeniz (GESIS: Leibniz Institute for the Social Sciences)
This presentation will give an overview of how the different CESSDA data archives can import their survey questions, held in heterogeneous technologies and formats, into the Euro Question Bank (EQB). The Consortium of European Social Science Data Archives (CESSDA) Euro Question Bank project has implemented a central search facility across all CESSDA survey questions from different datasets in different languages. For this, it uses a metadata schema based on the DDI-Lifecycle standard and provides conversion mappings from other metadata standards. All CESSDA service providers can document their contents using those standards with different technologies to supply their survey questions to EQB. The EQB system architecture consists of two main parts, the EQB front end and the EQB back end. The front end is implemented with Vaadin, a user interface technology for interacting with different web services. The back end comprises different technologies, such as Elasticsearch and the Open Source Metadata Harvester (OSMH). Elasticsearch is a search engine used to index the JSON-formatted questions. The OSMH is a CESSDA project that enables harvesting of different formats: it classifies the entities and objects to be harvested and enables repository owners to write their own repository handlers for the technology they use. By providing easy access to a comprehensive collection of survey questions from studies all over Europe, the Euro Question Bank will contribute to greater comparability between past and future survey data within the social sciences.
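For a flavor of the back end (a sketch only, not EQB's actual code), indexing and searching a harvested question with the Python Elasticsearch client might look like this; the index name, document fields, local endpoint, and 8.x-style client API are assumptions.

```python
# Hypothetical sketch: index one survey question and run a full-text search.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

question = {
    "question_text": "How satisfied are you with your life as a whole?",
    "study": "Example Values Study 2017",  # fabricated example metadata
    "language": "en",
}
es.index(index="eqb-questions", document=question)

hits = es.search(index="eqb-questions",
                 query={"match": {"question_text": "life satisfaction"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["study"], "-", hit["_source"]["question_text"])
```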
An evaluation of a metamodel for documenting data transformations in the ALPHA Network
Chifundo Kanjala (London School of Hygiene and Tropical Medicine)
Jay Greenfield (Consultant)
David Beckles (Consultant)
Basia Zaba (London School of Hygiene and Tropical Medicine)
The network for Analysing Longitudinal Population-based HIV/AIDS data on Africa (ALPHA) brings together ten African collaborating research institutions conducting community-based HIV surveillance. At the heart of the network's activities are extensive after-the-fact data harmonisation processes comprising transformations of data from disparate sources into predefined uniform formats. While the documentation of the source data is generally understood, the documentation of the data transformations involved in creating the ALPHA harmonised datasets is still shrouded in mystery. In an effort to address this issue, our study evaluates the applicability of a generic and customisable framework for describing Extract, Transform, and Load (ETL) scenarios to documenting ALPHA transformations. We seek to examine whether the framework covers the common ALPHA data harmonisation scenarios. Further, we assess its capabilities regarding facilitating reproduction of ALPHA data transformation processes. We will also consider the potential for further formalisation of the ALPHA-adapted framework within metadata standards such as the Data Documentation Initiative (DDI 4) model.
Susan Hautaniemi Leonard (ICPSR, University of Michigan)
Abay Israel (ICPSR, University of Michigan)
Margaret Levenstein (ICPSR, University of Michigan)
Trent Alexander (ICPSR, University of Michigan)
Attendees will learn about an exciting new venue for collaboration around data linkage methodology and evaluation, a joint project between ICPSR and Texas A&M. The presentation will focus on lessons learned by ICPSR while building the DLRep, a repository and community space for researchers involved in linking and combining datasets, built as a collaboration between social, statistical, and computer scientists. A common benchmarking repository of linkage methodologies will propel the field to the next level of rigor by facilitating comparison of different algorithms, improving understanding of which types of algorithms work best under different conditions and problem domains, promoting transparency and replicability of research, and encouraging proper citation of methodological contributions and their resulting datasets. It will bring together the diverse scholarly communities (e.g., computer scientists, statisticians, and social, behavioral, economic, and health (SBEH) scientists) who are currently addressing these challenges in disparate ways that do not build on one another's work. Improving linkage methodologies is critical to the production of representative samples, and thus to unbiased estimates of a wide variety of social and economic phenomena. The repository will accelerate the development of new record linkage algorithms and evaluation methods, improve the reproducibility of analyses conducted on integrated data, allow comparisons on the same and different data, and move forward the provision of privacy-aware integrated data.
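As a toy illustration of the kind of algorithm such a repository would benchmark (real linkage methods are far more sophisticated, and the names below are fabricated), here is a naive string-similarity matcher using only the Python standard library.

```python
# Toy record linkage: greedy best-match by string similarity above a threshold.
from difflib import SequenceMatcher

census_1900 = ["John Smyth", "Mary O'Brien", "Wm. Johnson"]
census_1910 = ["Jon Smith", "Mary OBrien", "William Johnson"]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for name in census_1900:
    best = max(census_1910, key=lambda cand: similarity(name, cand))
    score = similarity(name, best)
    if score > 0.6:  # arbitrary threshold for the toy example
        print(f"{name!r} -> {best!r} (score {score:.2f})")
```

Benchmarking in a shared repository would mean running many such algorithms against the same gold-standard linked pairs and comparing their precision and recall.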
Measuring the wage gap: Building a metadata registry for national gender equality indicators
Samuel Spencer (Commonwealth Scientific and Industrial Research Organisation/Aristotle Metadata Enterprise)
Within Australia the gender wage gap presents a broad social problem under the management of the Workplace Gender Equality Agency (WGEA). This agency is tasked with measuring and reporting on wage inequality across employers at a national level. Now in its fifth year of reporting, in 2016 the WGEA underwent a data management assessment that recommended improvements to the metadata management and governance within the agency. This presentation covers a joint project of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the Workplace Gender Equality Agency and Aristotle Metadata Enterprises exploring the technical and organisational challenges in the development of a national registry for gender equality indicators. During the talk we cover methods for searching and rebuilding metadata standards from recent legislation, data collection instruments and collected data sources. We also explore methods for recording and publishing governance processes, as well as looking at metadata structures for recording key national performance indicators. The result is a system that accurately describes the provenance of data from collection, transformation, and data analysis to final reporting. Lastly, we look at how this has been packaged within a public facing metadata registry and used to augment existing data analysis tools using modern application programming interface design.
Calls to action: How map and data professionals can participate in the truth and reconciliation process
Rosa Orlandini (York University)
In December 2015, the Canadian Truth and Reconciliation Commission released its final report, as well as 94 calls to action, urging all Canadians to work together to repair the harm caused by residential schools and move towards reconciliation. The calls to action provide data and map experts with the opportunity to build relationships, collaborate, and participate in the reconciliation process, as well as advance the cause of social justice for Indigenous peoples in Canada and abroad. This presentation will discuss several calls to action that are of particular relevance to individuals in the map and data services community, specifically the actions that refer to microdata, aggregate data, and cartographic information about residential schools and their legacy for Indigenous people in Canada. This presentation will highlight datasets that require better access, as well as priority data and cartographic resources that need to be rescued. These data and resources are essential so that the National Centre for Truth and Reconciliation can fulfill its mandate, educators can provide enriched curricular materials for students, academic researchers have access to quality data, and Indigenous Canadians can tell their own stories in richer detail.
Connection of curation and metrics in telling the stories of unique data sets: Two specific case studies
Lily Troia (Altmetric)
Dan Valen (Figshare)
Data sharing and the tracking of geospatial and other geographic datasets can reveal conversations and patterns of digital attention around vital societal issues, and help inform future strategies for publishing datasets that can potentially impact social justice efforts. By fostering collaboration among data repositories and those tasked with tracking and managing metrics assessment, we can better understand the rich narratives that emerge around sharing geographic data. This presentation will include data visualizations and explorations of data across subject areas and geographic regions, and discuss the need for a critical lens that addresses privacy concerns, illustrating these themes via two case studies. One featured dataset maps patterns of implicit racial bias across Europe, exhibiting how open data can support the social science research necessary to tackle crucial, difficult-to-discuss social justice issues. In the second example, highlighted in a collection of widely used and impactful datasets, geographic data and visualizations of U.S. urban commuting reveal remarkable insight into economic megaregions. By charting the dataset's path from curation through publication to its broad circulation and influence via usage and altmetrics, a rich narrative emerges around this data, illuminating the broad downstream impact potential associated with sharing data for the social good.
The moral of the story: Assessing the social value of data services
Cameron Tuai (Drake University)
As issues of data ethics grow, so does our need for instruments capable of assessing the social value of data services. Drawing upon concepts of legitimacy, this presentation will outline a practical and conceptually rigorous model for telling the social story of data services in a manner that is both recognized and supported by stakeholders.
The role of data supplements in reproducibility: Curation challenges
Courtney Butler (Federal Reserve Bank of Kansas City)
Christina Kulp (Federal Reserve Bank of Kansas City)
While raw data and its collection methods are increasingly made available and well described, output materials created from third-party data are more often provided to comply with journal policies on replication, with little thought given to reuse. Thus, they are often treated as secondary supplements to the primary research object (the published paper) and are lower priorities for curation. However, research may be verified and built upon through a number of methods that go beyond basic replication, such as reanalysis and extension. For curation activities to properly support these progressions, the discussion of what constitutes a primary research object must broaden so that all necessary components of a research project, including supplements, are discoverable and usable. This presentation will discuss the curation challenges in treating all types of data and code as primary research objects and strategies to overcome these obstacles.
Scientific prognostication: The challenges of pre-registering research studies
Thomas Lindsay (University of Minnesota)
Alicia Hofelich Mohr (University of Minnesota)
Social science researchers are responding to concerns about failures to replicate research, p-hacking, and other dubious practices by developing new procedures to increase transparency and accountability. Among these is pre-registration of research projects before beginning collection or analysis of research data, which forces researchers to better distinguish between findings resulting from a priori hypotheses and those resulting from exploration. In pre-registering our own research, we found it difficult to fully describe analysis procedures before viewing the messy and complex data that often result from social science collection. Decisions about how to deal with unexpected properties of the data, such as outliers and skew, are difficult to predict beforehand. We have found ourselves conflicted about how to handle this, as we failed to address these possibilities in our pre-registration. We expect that other researchers confront these and other dilemmas with their pre-registrations; how do they handle them? To answer this, we examined analysis plans from pre-registrations filed on the Open Science Framework (OSF) website. These plans varied widely in their detail and description of conditional analyses, demonstrating that others likely faced similar dilemmas. This presentation will summarize patterns across existing OSF pre-registrations and open a discussion on strategies for finding a balance between sufficient detail and analytical flexibility.
Transparency in practice: Testing annotation for transparent inquiry (ATI)
Colin Elman (Qualitative Data Repository)
Sebastian Karcher (Qualitative Data Repository)
This paper presents initial results of a study assessing a new approach to achieving transparency in qualitative research in the health and social sciences: Annotation for Transparent Inquiry (ATI), which is being developed at the Qualitative Data Repository (QDR). Data sharing and transparency have demonstrated benefits in quantitative social science. Scholars have used replication data to generate new findings, to find errors in existing studies, to teach applied quantitative methods, and in some cases to detect outright fraud. As norms for transparency and data sharing are spreading to qualitative research, we need to properly understand the costs and benefits that transparency provides in such inquiry. To assess ATI, we contracted the authors of 15 recently published qualitative studies in health and social science to annotate their work following ATI guidelines. We matched them with 15 reviewers. Authors were asked to catalog their observations and effort as they annotated, and to reflect on the experience once they completed their project. Reviewers evaluated the article both with and without annotations, considering benefits and undesirable effects of annotation. The paper includes an analysis of both groups' answers, and draws on discussions from a related two-day workshop in February 2018.
Bringing documentation, live code and data together using Jupyter Notebooks
Harsha Ummerpillai (ICPSR)
Thomas Murphy (ICPSR)
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. This presentation will discuss and demonstrate how ICPSR is leveraging the Jupyter platform to deliver interactive codebooks and to facilitate code contributions alongside data. We also hope to share our experience with JupyterHub infrastructure, which can provide ready-to-consume cloud computing environments where users can click and launch Jupyter notebooks without any local setup. ICPSR also plans to provide options to download Jupyter notebooks in R and Python along with our data bundles, allowing users to easily dive into analysis and data reuse with the help of the documentation included in the notebooks.
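A sketch of what one cell of such an interactive codebook might contain (the variable names, codes, and labels here are hypothetical, not drawn from an actual ICPSR study):

```python
# Hypothetical interactive-codebook cell: label coded values and summarize them.
import pandas as pd

df = pd.read_csv("study_12345_data.csv")  # placeholder file name

# Pair raw numeric codes with human-readable labels from the codebook
labels = {1: "Strongly agree", 2: "Agree", 3: "Disagree", 4: "Strongly disagree"}
counts = df["Q7"].map(labels).value_counts()

print(counts)
counts.plot(kind="barh", title="Q7: Trust in local government")
```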
My first year as a data visualization librarian: Challenges and opportunities
Kelly Schultz (University of Toronto)
As academic libraries have been exploring innovative ways to support and engage with researchers in data and digital scholarship across disciplines, new roles have been created to address growth in these areas. One such exciting area is the provision of data visualization services. Last year the University of Toronto Libraries posted a newly created position of Data Visualization Librarian for their Map and Data Library. And I got the job! This presentation will focus on my experiences over the last year in this new role, working in the Map and Data Library to build upon and expand data and GIS services. I will discuss what I have been working on, what has worked, what hasn't worked, and plans for the future, covering topics such as workshops, consultations, events, partnerships, and space considerations.
What is the best visualization tool for your project?
Allison Xu (Boston College)
Are you interested in making visualizations, but unsure what kind of visualization tools to choose? Have you ever had questions about which tool would work for a certain visualization? If so, this presentation will help you find the answers. I have compiled a list of tools that I have used to create a variety of visualizations (e.g. Gephi, QGIS, Tableau, Microsoft Power BI, Google Data Studio, D3.js, Carto, etc.). The presentation is designed to help audiences explore different visualization tools and find the right tools for their work.
From paper map to geospatial vector layer: Demystifying the process
Peter Peller (University of Calgary)
With paper map use in decline, one strategy libraries can adopt to make the information contained in those maps more accessible and usable is to extract features of interest from scanned raster maps and convert them to geospatial vector data. This process adds valuable, unique data to library geospatial collections and enables previously map-bound features to be used separately in geographic information systems (GIS) software for custom mapping and analysis. Advances in partially automating most of the process have made this a much more viable option for libraries. Although there is no one-size-fits-all automated solution for all maps and map features, this paper provides a complete description of the entire process, incorporating examples of the various techniques and software used in selected studies that would be applicable in the library environment.
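One way the final vectorization step can be scripted is sketched below with GDAL/OGR (one of many tool choices, not the paper's prescribed method). It assumes the scanned map has already been georeferenced and classified into a single-band raster in which the feature of interest has a distinct pixel value; the paths are hypothetical.

```python
# Hypothetical sketch: trace classified raster pixels into polygon features.
from osgeo import gdal, ogr, osr

src = gdal.Open("classified_map.tif")
band = src.GetRasterBand(1)

drv = ogr.GetDriverByName("ESRI Shapefile")
dst = drv.CreateDataSource("features.shp")
srs = osr.SpatialReference(wkt=src.GetProjection())
layer = dst.CreateLayer("features", srs=srs, geom_type=ogr.wkbPolygon)
layer.CreateField(ogr.FieldDefn("value", ogr.OFTInteger))

# Polygonize contiguous runs of equal pixel values; using the band as its own
# mask skips zero-valued (background) pixels
gdal.Polygonize(band, band, layer, 0)
dst = None  # close the datasource to flush features to disk
```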
The state of data visualization support in libraries
Angela Zoss (Duke University)
Andy Rutkowski (University of Southern California)
Justin Joque (University of Michigan)
Data visualization is a rapidly expanding field whose skills, techniques, and insights are increasingly intertwined with disciplines across and beyond academia. Given the interdisciplinary nature of data visualization, libraries are uniquely situated to support building skills and communities of practice in this area. A growing number of academic libraries provide instruction, consultation, technology, and spaces to support data visualization as part of a broader expansion in library data services. Since few graduate programs in library science offer training in data visualization, many later-career librarians are finding it necessary to develop data visualization skills on the job. Due to factors including a lack of resources, time, and training opportunities, one of the great challenges for researchers and librarians is developing the necessary skill set and expertise in data visualization to engage critically in the discipline. This presentation will provide an overview of current visualization services in a number of academic libraries and the state of the field. It will include an overview of opportunities and challenges as well as a discussion of ways that libraries can expand both their tool-based and critical support for data visualization and data visualization literacy.
Caring for sharing: Improving self-deposit repository publishing systems
Anca Daniela Vlad (UK Data Archive)
ReShare is the self-deposit data repository of the UK Data Service, now in its third year and holding some 1,200 collections of data spanning a wide disciplinary range of research assets. It has become the primary publishing system for social science research data in the United Kingdom, and over time has continuously sought to improve its publishing process. Areas of focus have included an easy-to-use interface, an intuitive workflow, and useful metrics for us as data publishers. The tool itself has also been an excellent opportunity to raise awareness of creating high-quality sharable data, fulfilling our mission to support and train researchers in good data management practices. The presentation will walk through the main features of ReShare and what has been improved, provide an overview of the data collections ingested, and speak to possible future improvements. Finally, it will demonstrate the tracking metrics collected via Google Analytics on active users, sessions, locations, and devices being used, as well as a segment on broader ReShare metrics showing compliance with the COUNTER Code of Practice for metrics.
FORSbase: Implementing the FAIR principles for social science data archives
Stefan Buerli (Swiss Centre of Expertise in the Social Sciences)
For a social science data archive, Open Science in the strict sense is generally a no-go for obvious data protection reasons. However, the FAIR principles can be implemented while at the same time satisfying the data producer, the data user, and data protection laws. In order to reach that objective, FORS -- the Swiss Centre of Expertise in the Social Sciences -- developed FORSbase, an online web application that makes data discovery, deposit, and delivery as convenient and as open as possible for data in the social sciences. Its goal is to combine within a single system and database a wide range of archiving functions and tools for researchers themselves to document and deposit their data, to choose between different degrees of openness regarding delivery, to access data and metadata, and to establish contacts and communicate with other researchers. In sum, the social sciences need appropriate infrastructure to mediate the specific needs of data producers and users in relation to the FAIR principles and data protection laws. FORSbase provides just such a technical solution, facilitating the sharing and use of secondary data while at the same time protecting the confidentiality of research participants.
Facilitating research data access – supporting data management and curation with community driven solutions
Wolfgang Zenk-Moltgen (GESIS Leibniz-Institute for the Social Sciences)
Monika Linne (ZBW Leibniz Information Centre for Economics)
Jonas Recker (GESIS Leibniz-Institute for the Social Sciences)
In the area of the empirical social sciences, research data have always played a major role in advancing knowledge. However, expectations regarding the availability of data for research have increased tremendously over the past years. At the same time, the higher complexity of datasets and more diverse origins of data lead to even more demanding challenges. In response to those challenges, a great number of services were developed to support researchers with the task of research data management. Examples at the Data Archive for the Social Sciences at GESIS -- Leibniz-Institute for the Social Sciences are da|ra (a DOI registration agency), datorium (a data repository service), and SowiDataNet (a comprehensive research data infrastructure). The latter supports institutional data management in research organizations from an early stage of empirical projects. The presentation will show the current state of affairs with the implemented SowiDataNet service and the interplay between the tools. Special consideration will be given to data curation workflows and the roles of data depositors -- institutional or individual -- and curators. In addition, the presentation will focus on the functionalities provided by SowiDataNet, e.g. checklists for basic and more advanced workflow steps, versioning of data and documentation, and assignment of persistent identifiers (DOIs). Finally, future developments towards an integration of the different services will be presented.
Data ingest -- the process by which data and related materials are deposited with an archive -- is the crucial entry point upon which all subsequent curation, preservation, and dissemination activities rely. Ingest includes describing and transferring files, as well as obtaining the legal permissions necessary to establish adequate legal and intellectual control over digital objects. This presentation discusses recent activities at ICPSR to optimize the data ingest process. By optimization, we mean making the process as easy and convenient as possible for the depositor while gathering as much information about the data collection as possible to enable reuse and preservation. Specific discussion points will include (1) a brief history of ICPSR's data ingest process; (2) a comparison of other archives' data ingest processes; (3) recent improvements and enhancements to the ICPSR data ingest process, including from a major update in June 2017; and (4) lessons learned from the June 2017 update.
2018-05-30: B5: C2 Metadata: Automating the Capture of Data Transformations from Scripts for Statistical Packages
C²Metadata: Automating the capture of data transformations from scripts for statistical packages
George Alter (University of Michigan)
Jeremy Iverson (Colectica)
Dan Smith (Colectica)
Ornulf Risnes (Norwegian Centre for Research Data)
Pascal Heus, Jack Gager, Carson Hunter (Metadata Technology North America)
Sanda Ionescu (University of Michigan)
Jie Song (University of Michigan)
The Continuous Capture of Metadata Project (C²Metadata) aims to automate the process of describing and documenting research data that are managed and analyzed in leading statistical software packages (SPSS®, SAS®, Stata®, R). Statistics packages offer limited ways of describing data, and they provide little help in documenting data transformations and provenance. At best the operations performed by the statistical package are described in a script, which more often than not is unavailable to future data users. Even if DDI metadata exists, updating it to reflect changes made by a statistics package is a manual process. C²Metadata is creating tools that read scripts for the four main statistics packages and insert data transformation metadata into DDI files. We are creating a Standard Data Transformation Language (SDTL), which represents data transformations in a JSON format that is independent of the original statistics package. Our software will also render SDTL metadata into human-readable forms for inclusion in codebooks and other forms of documentation. This session will describe and demonstrate these new tools.
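To make the approach concrete, the following is a minimal sketch, in Python, of what a package-independent transformation record in the spirit of SDTL might look like, together with a simple human-readable rendering of the same record. The field names and structure here are illustrative assumptions, not the published SDTL schema.

```python
import json

# Hypothetical, simplified transformation record in the spirit of SDTL:
# a package-independent JSON description of one recode step. The field
# names below are illustrative only, not the published SDTL schema.
transform = {
    "command": "Recode",
    "consumesDataframe": ["wave1_raw"],
    "producesDataframe": ["wave1_clean"],
    "variable": "income",
    "rules": [
        {"fromValue": -9, "toValue": None, "label": "Missing"},
        {"fromRange": [0, 24999], "toValue": 1, "label": "Low"},
        {"fromRange": [25000, 999999], "toValue": 2, "label": "Higher"},
    ],
}

def describe(t):
    """Render the record as a human-readable codebook line."""
    rules = "; ".join(
        f"{r.get('fromValue', r.get('fromRange'))} -> {r['toValue']} ({r['label']})"
        for r in t["rules"]
    )
    return f"{t['command']} {t['variable']}: {rules}"

print(json.dumps(transform, indent=2))  # machine-readable form
print(describe(transform))              # codebook-ready form
```

The point of such a neutral record is that the same JSON could be produced from an SPSS, SAS, Stata, or R script and then embedded in a DDI file or rendered into documentation.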
2018-05-30: C1: Telling the Tale of a Collaborative Geospatial Data Discovery Tool
Telling the tale of a collaborative geospatial data discovery tool: Harmonizing metadata, investigating usability, and ensuring sustainability
Kathleen Weessies (Michigan State University)
Amanda Tickner (Michigan State University)
Tim Kiser (Michigan State University)
Nicole Smeltekop (Michigan State University)
When at first you succeed, what comes next? Twelve academic libraries are collaborating on a data discovery portal interface and harmonized metadata schema for geospatial data from a wide range of academic and government sources, including digitized historic maps. Following the successful 2016 launch of the Big Ten Academic Alliance Geoportal (https://geo.btaa.org/), we now turn our attention to sustaining and growing the project. This presentation will detail the Geoportal's content and interface and the collaboration structure of the project. Additionally, we will discuss a novel user study with results showing interesting differences between users of varied expertise, which fueled changes in the Geoportal interface. Attendees will be oriented to various considerations involved in establishing and maintaining an aggregation of geospatial information from diverse sources, including the access stability issues involved in data hosted by local governments, the generation of metadata for geospatial datasets and digitized maps, and project structure evolution. We hope our experiences will inspire others toward similar kinds of large-scale collaborative projects.
2018-05-30: C2: Enhancing Data Discovery Through the Lens of User Experience
Improving discovery at the UK Data Service: A systematic UX journey
Jeannine Beeken (UK Data Service)
Katherine McNeill (Harvard University)
Repositories and archives must continually review their tools and services in order to stay most relevant to evolving user needs and changes in the environment. To that end, the UK Data Service has launched a robust User Experience Programme designed to study user needs systematically as part of service development and improvement. The organisation piloted the programme through a recent project to enhance and redesign its centralised discovery portal, a process which included a highly successful user survey and qualitative interviews. This review examined the various components of the discovery system, including the user interface, browsing and search functionalities, presentation of results, filtering, cross-referencing, and more. Overall, the process to develop requirements for enhancement involved a complex, yet interactive set of tasks and work packages involving multiple sources of information (e.g. survey, interviews, and various internal records). Such a large-scale enterprise required centralised coordination, agile project management, the creation of use cases and associated requirements, and -- vitally -- teamwork and communication amongst staff in various areas (e.g. requirements developers, software engineers, user support staff, user experience specialists). This presentation will discuss the overall process and share lessons learned for other service providers seeking to review and improve their products or services.
Data stories from 100 year old Finland: Lessons from FSD user survey 2017
Eliisa Haapaa (Finnish Social Science Data Archive)
Hannele Keckman-Koivuniemi (Finnish Social Science Data Archive)
Henri Ala-Lahti (Finnish Social Science Data Archive)
Seppo Antikainen (Finnish Social Science Data Archive)
Tuomas J. Alaterä (Finnish Social Science Data Archive)
How could we serve students, teachers and researchers better? What kinds of stories do our users have to tell? What kinds of narratives are told with the data available at the Finnish Social Science Data Archive (FSD)? Data archives support the scientific community in managing the data life cycle in various ways. Data archives are not burial grounds for data. Instead, stories live on in archived data sets. FSD conducted a survey of its users in Fall 2017, and the results will be introduced in the presentation. Respondents included both data depositors and data re-users. Special themes of the survey were communication and the impact of FSD data and services. Some of the questions are similar to those previously asked in the user and impact surveys of other data organisations (Arhiv družboslovnih podatkov, European Social Survey, Economic and Social Research Council). Initial results indicate that the vast majority of our users are satisfied with our services and data, and see FSD as a trustworthy institution. Most users thought that the data and services provided by FSD save time and money. However, users would like to have access to a wider range of up-to-date qualitative data (e.g. interview transcripts, textual data).
Daniel Gillman (Bureau of Labor Statistics, United States Department of Labor)
Daniel Chow (Bureau of Labor Statistics, United States Department of Labor)
Jean Fox (Bureau of Labor Statistics, United States Department of Labor)
Ronald Johnson (Bureau of Labor Statistics, United States Department of Labor)
Brandon Kopp (Bureau of Labor Statistics, United States Department of Labor)
Karen Kosanovich (Bureau of Labor Statistics, United States Department of Labor)
William Mockovak, Jesus Ranon, Garrett Schmitt, Thomas Tedone, Clayton Waring (Bureau of Labor Statistics, United States Department of Labor)
As reported previously, the U.S. Bureau of Labor Statistics is developing a taxonomy of concepts and terms describing the time-series data the agency produces. The taxonomy will be used for: (1) the user interface for the DataFinder general data dissemination tool; (2) a source of terms to tag documents and reports consistent with the data they describe; and (3) a guide for the redesign of the BLS web site. Currently, it is not possible to know whether all data concerning some subject have been found, and it is difficult to specify and download multiple time series at once. The taxonomy is expected to help overcome these limitations. The BLS cognitive laboratory is used to test question wording, systems interfaces, and the look and feel of web sites. Testing a resource like the taxonomy is new. In December, development of the taxonomy was paused and work began on testing it in the cognitive laboratory. The design of the cognitive testing is described in the paper. Many kinds of tests are possible, but the taxonomy structure limits those to a few. The paper also describes how incorporating the results of the tests will alter the taxonomy. Finally, the future of continued taxonomy work is discussed.
2018-05-30: C3: Putting the CURATE Model in Practice: Training for Data Curation
Putting the CURATE model in practice: Training for data curation
Wendy Kozlowski (Cornell University)
Jennifer Moore (Washington University, St. Louis)
Mara Blake (Johns Hopkins University)
In order for data to be fully publicly accessible to search, retrieve, and analyze, most data require specialized curatorial treatments. Supporting researchers with data curation is an important role that information specialists aspire to fill as the workforce transforms to assume greater digital stewardship responsibilities. Information specialists are experts at identifying, selecting, organizing, describing, preserving, and providing access to information materials, print and digital. Yet, the results of our recent Association of Research Libraries SPEC Kit indicate that staffing and training for data curators is the biggest challenge facing institutions in the next 3-5 years. To address this need the Data Curation Network (DCN), in collaboration with IASSIST, held a one-and-a-half day data curation training event in December 2017 aimed at instructing a wide range of information specialists in the DCN CURATE method. This method sets a foundation for curation treatments that many individuals can build upon in order to appropriately curate research data at their institution. Panelists will report on this training event, highlighting the curriculum, participant curation treatment adoption post-workshop, and lessons learned.
2018-05-30: C4: DDI: Current Products, Future Developments, and Strategic Directions
DDI: Current products, future developments, and strategic directions
Joachim Wackerow (GESIS - Leibniz-Institute for the Social Sciences)
Barry Radler (University of Wisconsin-Madison)
Wendy Thomas (Minnesota Population Center, University of Minnesota)
Jay Greenfield (Independent Consultant)
Steven McEachern (Australian Data Archive)
This session presents an integrated picture of the current activities and future plans of the Data Documentation Initiative (DDI) Alliance. Presentation 1: DDI's current product line: DDI Codebook, DDI Lifecycle and related products. The Alliance continues to support, develop, and encourage the adoption of these specifications. Presentation 2: Preliminary release of DDI 4, the newest DDI version based on an information model. Slated for a mid-2018 release, this "prototype" is not intended for production but provides an opportunity to test and provide feedback on how DDI 4 describes data capture, data stores, transformation processes, studies, and classification management. Functionality will be presented through use cases. The presentation will highlight the model-driven production framework which generates representations like XML and RDF, and the documentation. Presentation 3: The new strategic plan of the DDI Alliance and the vision of a DDI-based infrastructure for the empirical social sciences. The strategic plan covers the standards (i.e. maintaining multiple lines of specifications), community goals (i.e. engagement with the global digital research infrastructure, and solving common problems with current DDI users), and organizational goals of the DDI Alliance. The infrastructure vision proposes a combination of a curated common data element registry with a portal of existing DDI metadata repositories.
Mapping the clinical narrative: Using ArcGIS tools in health education
Alexandra Williams (Frontier Nursing University)
James White (RetroFit Labs)
It is often difficult for students to make connections between data sets, abstract or difficult-to-grasp concepts, and their patients. Narrative pedagogy offers a way to address this challenge. Storytelling is a powerful tool -- one that is well suited for use in simulation-based education. Using ArcGIS tools, it is possible to combine map data and narrative medicine, bringing new meaning and clinical and community relevance to data evidence. ArcGIS offers free-to-use online tools that allow educators to combine multiple methodologies to create interactive activities. This method can be used at low to no cost across a wide variety of courses. Through the combined use of mapped data, storytelling in clinical simulation scenarios (in video format), interactive question sections, and narrative text selections, data evidence can be combined with case-based scenarios. These activities can be used to help students visualize abstract concepts, test clinical reasoning, and develop an understanding of patient centered care and cultural competence.
From literacy to acumen: Opportunities for librarians in undergraduate data science education
Sarah Young (Carnegie Mellon University)
With the rapid growth in data access and availability, there has been a growing need across sectors for expertise in turning data into actionable knowledge. Data science has emerged as a discipline to address this need and data science programs are increasingly common in institutions of higher education. In early 2018, the National Academies of Sciences, Engineering and Medicine will present a report to "set forth a vision for the emerging discipline of data science at the undergraduate level". A recently released interim report highlights the need for a curriculum that teaches "data acumen", or the ability to "make good judgements and decisions with data". Given the infancy of this discipline, and current efforts to shape its future, librarians have an opportunity to work with educators to enrich the data science curriculum to realize the National Academies' vision. I will map existing information literacy and data information literacy frameworks to the forthcoming framework of the National Academies for data science education. I will identify areas of synergy and potential new competencies that will be needed to support this cross-domain discipline. I will also examine curricula in undergraduate data science programs to consider current challenges and opportunities for librarian engagement.
Data ever after: Tackling data reference questions at the library
Alicia Kubas (University of Minnesota)
Jenny McBurney (University of Minnesota)
Questions about locating hard-to-find data, including international data, historical and time series data, and microdata, are on the rise. These questions range across subject areas, including economics, politics, health, business, and agriculture, and in addition to the challenge of merely finding this data, users often lack general data literacy skills when first asking their research questions. In order to learn more about the landscape of secondary data reference and how librarians can better support researchers and users in this area, data-focused librarians at the University of Minnesota developed a survey targeting library staff in the United States who work with data-related questions. This poster summarizes and draws conclusions from the survey, which focused on the types of data questions received and from whom, how librarians tackle these questions, frustrations they experience, and opportunities for increasing expertise in this area of librarianship.
Deposit, curate, publish, repeat: Developing a workflow for research data in the Duke Digital Repository
Sophia Lafferty-Hess (Duke University)
Jennifer Darragh (Duke University)
Moira Downey (Duke University)
Susan Ivey (Duke University)
Mara Sedlins (Duke University)
In 2017 the Duke University Libraries (DUL) launched a suite of research data management services. One facet of our service model is to provide permanent archiving and curation of Duke research data in the Duke Digital Repository (DDR). This poster will illustrate the workflow we developed for accepting deposits, performing a curatorial review of data, and publishing data within the DDR. Reviewing data deposits prior to publication has allowed the research data management (RDM) team at DUL to help ensure deposits meet (at least) minimum standards for access and reuse. Our current curation workflow helps catch accidental submission errors, allows us to educate researchers on best practices for data documentation, and enables us to make suggestions for how to best format data for portability, interoperability, and preservation as well as enhance dataset metadata for discovery. The staffing model for supporting these curation services will also be detailed on the poster.
Cycling infrastructure in the Ottawa-Gatineau area: A complex assemblage of data
Sylvie Lafortune (Carleton University)
Joel Rivard (University of Ottawa)
The Ottawa-Gatineau National Capital area has a well-developed and well-used cycling network of over 1,000 km of cycling routes which spans both sides of the Ontario and Quebec provincial boundary. The cycling network started in the 1980s and over the past two decades has expanded with varying levels of investment. More recently, the City of Ottawa has received provincial sustainable transportation funding and has strategically decided to invest in cycling infrastructure. The purpose of this research is to map out the complex data landscape behind the cycling infrastructure in the Ottawa-Gatineau area, which is largely based on inter-jurisdictional cooperation and partnerships with cycling advocacy groups. The questions we try to answer are: What data are used for infrastructure decisions? How are data collected? Who are the principal stakeholders in data collection? Do stakeholders share data which become part of infrastructure decisions? And finally, are the data standardized? To gain a better understanding of the data, we attempt to create a basic classification system for the data that are collected by all stakeholders. The years examined will be 2001-2017, which reflects the current corporate entities for both Ottawa (amalgamated in 2001) and Gatineau (amalgamated in 2002).
Penna: A qualitative data collecting platform
Jarkko Paivarinta (Finnish Social Science Data Archive)
This poster presents Penna, a new qualitative data collecting platform created by the Finnish Social Science Data Archive (FSD) in collaboration with computer science students at Tampere University. The tool is designed principally for collecting written narratives, but it adapts easily to other kinds of qualitative textual data too, such as open-ended questionnaires. Using Penna for data collection gives the FSD a great advantage when it comes to archiving data. With Penna we get the data already in a harmonised format, including all the metadata needed. We can also be assured that the research participants have been properly informed and that consent to data archiving and sharing has been granted. The Penna platform is maintained by the FSD and is offered free of charge to researchers, students, and data collecting organisations that are willing to archive their data afterwards for academic secondary use. This is a great way to promote reciprocity in the academic field; we help researchers with their data gathering, and in return, they open the data for secondary use. At the IASSIST 2018 poster session we will explain more precisely how Penna works and share the benefits of this approach with our colleagues in the data repository network.
A pilot study towards cross-searchable social science data archive in Japan
Makoto Asaoka (Rikkyo University)
Yutaka Maeda (Rikkyo University)
Miho Funamori (National Institute of Informatics)
Masaharu Hayashi (National Institute of Informatics)
Kazutsuna Yamaji (National Institute of Informatics)
Rikkyo University Data Archive (RUDA) is a Japanese data archive established by the Rikkyo University Center for Statistics and Information (CSI) in 2011. Currently, RUDA and the National Institute of Informatics (NII) are collaborating to establish a cross-searchable data repository integrating different archives in Japan. In this poster, we share the lessons learned from introducing RUDA datasets into the NII repository. RUDA is built on DSpace, and its metadata is therefore described in (qualified) Dublin Core. In order to establish a cross-searchable infrastructure in which compatibility between different metadata schemes is achieved, an appropriate mapping from DC to DDI is required. This poster discusses how RUDA formulates mapping schemes in which semantics are strongly emphasized. In addition, we discuss to what degree (meta)data sharing is possible. In the collaborative project above, a cloud-type repository system developed by NII is used to accommodate and publish metadata offered by data archives. Because projects in which data archives offer their metadata to another institution are still uncommon, it must be considered to what extent data archives can offer their metadata while balancing responsibility and usability.
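As a rough illustration of the mapping problem, here is a minimal sketch of a Dublin Core to DDI Codebook crosswalk expressed as a Python dictionary. The DDI target paths are plausible Codebook locations chosen for illustration, not RUDA's actual mapping scheme.

```python
# A minimal sketch of a Dublin Core -> DDI Codebook crosswalk. The DDI
# targets are plausible XPath-style locations for illustration only;
# they are not RUDA's actual mapping.
DC_TO_DDI = {
    "dc:title":       "stdyDscr/citation/titlStmt/titl",
    "dc:creator":     "stdyDscr/citation/rspStmt/AuthEnty",
    "dc:date":        "stdyDscr/citation/prodStmt/prodDate",
    "dc:description": "stdyDscr/stdyInfo/abstract",
    "dc:rights":      "stdyDscr/dataAccs/useStmt/conditions",
}

def map_record(dc_record):
    """Translate a flat DC record (dict) into DDI-addressed fields."""
    return {DC_TO_DDI[k]: v for k, v in dc_record.items() if k in DC_TO_DDI}

print(map_record({"dc:title": "Survey of Living Conditions, 2011"}))
```

A dictionary like this captures the structural side of the crosswalk; the harder, semantics-focused decisions the poster describes concern which DC fields genuinely correspond to which DDI elements when meanings only partially overlap.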
Gold (AU)DRIPSS: A decision-making framework for knowledge management
Brett Currier (Federal Reserve Bank of Kansas City)
Courtney Butler (Federal Reserve Bank of Kansas City)
The Federal Reserve Bank of Kansas City is developing a digital preservation strategy for its Research Division. As part of this development, we are working from an internally produced framework (Gold (AU)DRIPSS) to guide decision-making. This framework is an extension of the FAIR Data Principles combined with other concerns from the library community and comprises the principles of accessibility, usability, discoverability, reproducibility, interoperability, preservability, scalability, and sustainability. It differs from existing frameworks by taking a more holistic approach, including incorporating more institutional and project management considerations as well as recognizing that an optimal preservation strategy may require subjective assessment and trade-offs. This presentation will review the development of the Gold (AU)DRIPSS Framework and discuss how it has been internally applied to support digital preservation.
Curating police shooting data in R: Letting the data speak louder than assumptions
John Bradford (Mississippi Valley State University)
Sheeji Kathuria (Mississippi State University)
Since the killing of Michael Brown by police officer Darren Wilson in Ferguson, Missouri in August 2014 and the subsequent rise of the Black Lives Matter (BLM) movement, the use of lethal force by police against civilians, especially against African American civilians, has become one of the most salient public issues in the United States today. While information about the individuals involved in fatal police shooting incidents has become more widely accessible to the public, the data are often disparate and poorly curated. The presenter used the R package Shiny to create an interactive data analysis and visualization tool which centralizes seven different police shooting data sources: The Guardian, The Washington Post, MappingPoliceViolence.org, Lott and Moody (2016), KilledbyPolice.net, the U.S. Police Shootings Database, and Police Killings in Context. The online application enables users to filter, cross-tabulate, and benchmark police shooting fatalities by population or arrest frequencies. The furnished data and visualizations can also be directly downloaded. The application is intended for multiple purposes, including but not limited to community activism, academic research, and pedagogy. The poster session will include appropriate technology to facilitate participation from conference attendees. Participants who attend this session will be able to interact with the visualizations using an iPad and/or laptop and will be provided with a link so they can continue to explore the visualizations after the conference.
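The application itself is built with R and Shiny; as a language-neutral sketch of the benchmarking idea it describes, the following Python fragment normalizes fatality counts from two sources by group population. All numbers are placeholders, not real data.

```python
import pandas as pd

# Minimal sketch of benchmarking by population (not the Shiny app itself):
# combine fatality counts from several sources and normalize each by the
# group's population to make rates comparable. Placeholder numbers only.
fatalities = pd.DataFrame({
    "source": ["Washington Post", "The Guardian"],
    "group": ["A", "A"],
    "count": [25, 27],
})
population = pd.DataFrame({"group": ["A"], "pop": [1_000_000]})

merged = fatalities.merge(population, on="group")
merged["rate_per_100k"] = merged["count"] / merged["pop"] * 100_000
print(merged[["source", "group", "rate_per_100k"]])
```

Normalizing by population (or by arrest frequency, as the tool also allows) is what lets the data, rather than raw counts, drive comparisons across groups and sources.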
Building high-quality survey data and promoting secondary analysis: Activities of the Social Survey Research Department at the Center for Social Research and Data Archives, University of Tokyo
Kenji Ishida (Institute of Social Science, University of Tokyo)
Hiroshi Ishida (Institute of Social Science, University of Tokyo)
Toshiyuki Shirakawa (Institute of Social Science, University of Tokyo)
Satoshi Miwa (Institute of Social Science, University of Tokyo)
The aim of this poster is to give a comprehensive overview of the activities of the Social Survey Research Department at the Center for Social Research and Data Archives (CSRDA). In addition to managing the Social Science Japan Data Archive (SSJDA), CSRDA conducts the Japanese Life Course Panel Surveys (JLPS). The JLPS comprises four panel surveys: the youth panel, the middle-aged panel, the high school graduate panel, and the panel for ninth grade students and their mothers. In addition to data creation and primary analysis, we disseminate the JLPS data through our archive (SSJDA) and organize the Secondary Data Analysis Workshops in cooperation with the SSJDA and the Quantitative Social Research Department. The JLPS is one of the most popular surveys in the SSJDA, used by 926 researchers and students in fiscal year 2016. In addition, we have organized six workshops for secondary analysis since 2010; over 70 researchers have participated in these workshops. We will continue to provide high-quality panel datasets and contribute to the development of the empirical social sciences in Japan. As for our future challenges, we need to strengthen our financial foundation and construct an efficient data management system.
The story of FAFL: Creating real-time and up-to-date end user documentation
Lauren Eickhorst (Aristotle Metadata Enterprises)
Having correct and up-to-date documentation is key to a successful online service. It helps new users navigate a site, and is sometimes the first point of contact for them. But as websites change, documentation that guides users may become out-of-date and inaccurate. Where in-person assistance is not always possible, inaccurate documentation can be a barrier to a user continuing with a service. Additionally, writing technical documentation requires the skill to accurately assess the needs of a broad range of users and to present this as engaging material, which can be time-consuming if you don't know how to program. This poster presents FAFL, a real-time editor that helps writers write code for generating screenshots on the fly when publishing documentation. FAFL uses reStructuredText to store instructions and real-time screenshots that capture what your website looks like, so that a writer can write the code needed to keep documentation for a live system up-to-date. By storing screenshots as code instead of pictures, the documentation stays current as webpages change, or alerts writers to out-of-date help documentation that needs to be updated. In this poster, we demonstrate how this is used in the Aristotle Metadata Registry to keep end user help accurate.
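FAFL's internal format is not described here, but the "screenshots as code" idea can be sketched: store the steps that produce each screenshot and re-run them at publish time so images track the live site. This hypothetical Python example uses Selenium; the instruction structure is an assumption for illustration, not FAFL's actual storage format.

```python
# Hedged sketch of "screenshots as code": keep the steps that produce a
# documentation screenshot, and re-run them whenever the docs are built
# so the images always match the live site. The SHOTS structure is a
# hypothetical format, not FAFL's own.
from selenium import webdriver

SHOTS = [
    {"name": "login_page",
     "url": "https://example.org/login",
     "file": "docs/images/login_page.png"},
]

driver = webdriver.Firefox()
for shot in SHOTS:
    driver.get(shot["url"])               # navigate as a user would
    driver.save_screenshot(shot["file"])  # regenerate the doc image
driver.quit()
```

Because the screenshot is regenerated from instructions rather than stored as a static picture, a broken step at build time is itself a signal that the documentation needs attention.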
Identifying and documenting the locations of Indian residential schools in Canada
Rosa Orlandini (York University)
Between 1851 and 1998, over 139 Indian residential schools operated across Canada. The legacy of these schools is the trauma that has left scars in the communities in which the schools resided, as well as for school survivors and their descendants. The precise location of the schools is known for many of them, but this has yet to be determined for the rest. This poster will discuss the methodology, challenges, and outcomes of a project currently underway, in collaboration with the National Centre for Truth and Reconciliation (NCTR), to continue the work of the Truth and Reconciliation Commission and its allies, by finding and documenting the precise locations of these school buildings and properties. The goal is to produce a geospatial dataset and interactive map of residential school locations that shows the location of the primary school buildings; and a bibliography of maps, plans, aerial photographs, and unpublished documents that show the location of school buildings and properties. The data and bibliography will assist the NCTR and their partners in identifying and documenting the location of residential school cemeteries and unmarked graves, as well as assist educators, students, researchers, elders, and survivors in telling their own stories.
An investigation into the extent to which South African repositories comply with international trust standards
Glenn Tshweu (Human Sciences Research Council)
An institutional repository is seen as a valuable tool for managing digital resources within the organisational context. Repositories can have a positive or negative influence on how an institution manages its digital material in relation to accessibility and dissemination. The functionality and status of digital repositories can be assessed and measured against specific guidelines to determine practicality and efficacy; the guidelines used in this regard are known as international repository assessment standards. This study aimed to develop a South African digital repository trust assessment model based on the criteria of international standards. Using the developed model, the study investigated the level of trust compliance achieved by a small sample of South African digital repositories. The investigation also aims to gather feedback (in the form of recommendations) from digital repository managers to improve the developed model and make it more useful for South African digital repositories.
Shahira Khair (Canadian Association of Research Libraries Portage Network)
The Portage Network (sponsored by the Canadian Association of Research Libraries) is a national research data management (RDM) initiative dedicated to the shared stewardship of research data in Canada. Portage supports researchers and other stakeholders through a library-based network of expertise that fosters the growth of a national research data culture and the development of shared national platforms and services for planning, preserving, and discovering research data. These shared platforms and services are focused on three key areas: planning; data storage and discovery; and training and guidance. This poster focuses on the training and guidance tools and services developed by the Portage Training Expert Group, including the bilingual training aids and online modules that teach librarians and researchers the basics of RDM and how to complete a data management plan. Practicing good RDM will help researchers ensure that they are curating, archiving, and sharing their data so that it is not lost to future generations.
Supporting access to secure data: A review of procedures and practices
Christine Woods (UK Data Service)
Hersh Mann (UK Data Service)
Beate Lichtwardt (UK Data Service)
The UK Data Service Secure Lab is a remote access data enclave that enables researchers to access confidential and sensitive microdata that otherwise would not be easily available to use for academic purposes. Due to the confidential nature of the data, researchers must complete a number of necessary steps before being able to access data for analysis. Their subsequent use of the Secure Lab is highly controlled to meet security requirements, with UK Data Service staff carrying out a variety of work to support their activity, including statistical disclosure control checks of any outputs that have been produced. This work is highly resource intensive, and in this presentation we will outline how we have been reviewing all of this activity to make the secure lab more efficient and more sustainable for the longer term, while still maintaining a robust and reliable security model. As an increasing amount of sensitive data become available for use through other research data centres around the world, the experience of the UK Data Service will provide useful insights into how valuable data sources can be accessed in a safe and practical manner.
No data librarian? No problem! Designing data themed workshops without a dedicated data librarian
Kelly McCusker (Auraria Library)
Library data services are often designed once a dedicated data librarian is hired. But what if a library is receiving data-related requests and questions and there is no data librarian? You can't ignore these requests! Instead, you gather library staff from multiple service areas with different areas of knowledge and design introductory data-related workshops to support the needs of your campus. You also work with people outside of the library who specialize in a data-related field to help design workshops. These groups create learning outcomes, detailed lesson plans, dynamic presentations, creative activities, summary handouts, and online research guides. The materials allow multiple library staff to teach the workshops and provide support for the topics afterwards. In addition, you will need a comprehensive marketing plan to connect with faculty and students on campus. Finally, reviews of the workshops' content and assessment of registrants, attendees, and learning outcomes are key to improving the workshops for the future. Attendees will learn how to collaborate and communicate with individuals within and outside the library to create data workshops relevant to the needs of a campus.
The upside down of data: Honest community engagement
Thu-Mai Christian (Odum Institute for Research in Social Science)
Mandy Gooch (Odum Institute for Research in Social Science)
In our community we hear about the success of initiatives in areas such as data management, new services and tools, training, outreach, and workflows. This is great for discussion and adoption by our community members; however, we don't often hear stories about the small failures or misses, which are just as useful when developing new strategies at different institutions. Sure, some papers or blog posts may include lessons learned, but it would be beneficial to hear honest accounts of what didn't work and why. This interactive poster will collect the "upside down" of data stories -- narratives of efforts that were tried but didn't work. We want to hear stories that tell about strategies that didn't go as planned so as to build an awareness of things "not to do" when pursuing, for example, a new outreach method to engage faculty in data management planning. These stories will hopefully keep our community from repeating the same mistakes, thereby expanding and improving our services more efficiently, with fewer errors or failures. This poster will collect these stories from IASSIST attendees to provide that insight, which will be used to supplement stories shared in a subsequent Pecha Kucha presentation.
Data management planning in Canada: An overview of the Portage DMP assistant
Carol Perry (University of Guelph)
James Doiron (University of Alberta)
Carla Graebner (Simon Fraser University)
Weiwei Shi (University of Alberta)
Launched in 2015, the Canadian Association of Research Libraries (CARL) Portage Network is a pan-Canadian network dedicated to building capacity around the shared stewardship of research data in Canada. Portage activities are organized around developing and sustaining two main components -- a national network of research data management (RDM) expertise and supportive infrastructure platforms. Currently, there are six Portage expert groups focused upon a wide range of RDM-related topics, including data management planning (DMP). The DMP Expert Group (DMPEG) provides direction on a number of DMP-related initiatives, as well as technical support and delivery of the Portage DMP Assistant, which is recognized as the "gold standard" data management planning tool in Canada. This poster will offer an overview of DMPEG activities such as outreach and the development of exemplar DMPs. The current status of the Portage DMP Assistant, including key features such as customizing the tool for use by Canadian institutions, will be highlighted. Additionally, ongoing and future work relating to such things as ORCID iD support and APIs for creating and sharing DMPs will be discussed.
DDIR: R package to use DDI as a personal tool for social research data analysis
Yasuto Nakano (Kwansei Gakuin University)
'DDIR' is an R package which handles information in DDI format in the R environment. Because a DDI file collects and contains all the information we need in social research activities (e.g. research questions, variable conceptualizations, questionnaire sentences, variable names, value labels, etc.), it is efficient to use one DDI file as the source of information at any step of research activities, even for small research project groups or for individual researchers. In the R environment, there is no standard data format for social research data; in many cases, we have to prepare numerical data and label or factor information separately. If we use a DDI file as a data file with DDIR in R, only one DDI file needs to be prepared. DDI could become a standard data format for social research data in the R environment, just as the .sav file is in SPSS. We can retrieve necessary information from a DDI file with DDIR. Furthermore, we can integrate and export information related to the data as a DDI file with DDIR. The DDIR package realizes an integrated social research analysis environment within R and supports reproducible research.
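DDIR itself is an R package; purely to illustrate the underlying idea that a single DDI file can drive analysis setup, here is a minimal Python sketch that pulls variable names and labels from a DDI Codebook 2.5 file. The element and namespace names follow DDI Codebook 2.5, but this is an illustration of the concept, not DDIR's implementation.

```python
import xml.etree.ElementTree as ET

# Sketch of the idea behind DDIR in a language-neutral form: read variable
# names and labels from a DDI Codebook 2.5 file so one DDI document can
# drive analysis setup. Illustration only, not the DDIR package (which is R).
NS = {"ddi": "ddi:codebook:2_5"}  # DDI Codebook 2.5 namespace URI

def variable_labels(path):
    tree = ET.parse(path)
    labels = {}
    for var in tree.getroot().iter("{ddi:codebook:2_5}var"):
        name = var.get("name")
        labl = var.find("ddi:labl", NS)
        labels[name] = labl.text if labl is not None else ""
    return labels

# labels = variable_labels("study.xml")
# e.g. {"income": "Household income", "q12": "Used heroin in past 30 days"}
```

The same file that yields these labels also carries the numeric data description, which is what removes the usual need to manage data and labelling information separately.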
Education for (a) CURE: Developing a prescription for training in data curation for reproducibility
Florio Arguillas (Cornell Institute for Social and Economic Research)
Thu-Mai Christian (Odum Institute for Research in Social Science)
Limor Peer (Institution for Social and Policy Studies, Yale University)
Since its founding, the Curating for Reproducibility (CURE) Consortium has been working to further define its mission and guiding principles, while also engaging in initiatives to achieve the primary CURE goals of establishing standards, sharing practices, and promoting data quality review. One such initiative is the Institute of Museum and Library Services funded Data Curation for Reproducibility (Data CuRe) Training Program Planning Project, which will develop an evidence-based training program for data support practitioners that will equip them with the skills and knowledge necessary to perform intensive data curation tasks that support the demand for high quality data and code. This poster will highlight the objectives of the Data CuRe Training Program Planning Project, with an interactive component that will collect IASSIST community feedback on the training needs of librarians and archivists, enabling them to incorporate data curation and code review into existing workflows to support research reproducibility.
International Federation of Data Organizations (IFDO)
Jonathan Crabtree (Odum Institute for Research in Social Science)
The International Federation of Data Organizations (IFDO) is made up of a diverse group of data repositories and archives with a passion for research data and a mission of providing access to these data for the research community. Many private organizations, governments, and educational institutions rely on access to quality data. International groups such as the Research Data Alliance have flourished and have been supported by government agencies around the world. IFDO is exploring the opportunity to reach out to these new communities in an effort to help IFDO and IASSIST become a voice for social science research data in these forums. Come to our poster session to discuss these possibilities!
ImPACT (Infrastructure for Privacy-Assured CompuTations) integrates Dataverse
Jonathan Crabtree (Odum Institute for Research in Social Science)
Scientific progress today requires multi-institutional and cross-disciplinary sharing and analysis of data. Many disciplines, such as the social sciences, face a web of policies and technological constraints on data due to privacy concerns. Issues of privacy, safety, integrity, and ownership have led to regulations controlling data location, availability, movement, and access. Compliance poses obstacles to traditional data-processing practices and slows research; yet, increasingly, pressing scientific problems of great concern to society demand collaborative efforts involving data from multiple stakeholders. The National Science Foundation (NSF)-funded ImPACT (Infrastructure for Privacy-Assured CompuTations) will free researchers to focus on science by supporting the analysis of multi-institutional data while satisfying relevant privacy regulations and interests. It is designed specifically to facilitate secure cooperative analysis, meeting a pressing need in the research community. The project will develop methodologies with best practices in networking, data management, security, and privacy preservation to accommodate a variety of use cases. The ImPACT infrastructure will address the privacy requirements of the use cases. Data access, sharing, processing, and publishing will be integrated into privacy-aware systems that allow scientists to use their own tools and that build upon enabling cyber-infrastructure technologies: Dataverse, CyVerse, and ORCA.
Data purchase program: Lessons learned
Jennifer Huck (University of Virginia)
This year, the University of Virginia Library launched a data purchase program. Open to all university affiliates, the program was intended to formalize and rationalize our data purchases. In the past, we were willing to make data purchases, but patrons had to know to ask. The data purchase program gave us something specific to point patrons to. Formalizing the program allowed us to do better outreach and marketing for this service. We also rationalized the process of purchasing patron-requested data by gathering all of the requests at once and comparing costs and impact. In conjunction with the data purchase program, we created a data collection development policy. Along with the reasons why we created the program, I will share successes and lessons learned from the pilot year. The greatest success was getting applications from patrons in a variety of disciplines that do not typically make much use of data discovery services (i.e., beyond the social sciences). Some lessons learned include needing to do some simple user testing in advance, better evaluating timelines, and connecting library liaisons with applicants.
Creating a research data inventory: How much you can get from researchers
Deng Pan (Federal Reserve Bank of Chicago)
The Federal Reserve Bank of Chicago Research Department has developed a Research Data Life Cycle Strategy (RDLCS) covering the identification, acquisition, processing, publishing, storage, and preservation of data. To implement the RDLCS, a project was created as a first step to inventory what data the department has acquired and who has been using them. This poster will demonstrate the project phases, such as conducting a survey, developing metadata elements, and interviewing researchers to collect further information. Researchers' perspectives and insights on inventorying data, as well as their collaborations with the data librarian, will be highlighted through a few dataset examples in the poster.
Florio Arguillas (Cornell Institute for Social and Economic Research)
William Block (Cornell Institute for Social and Economic Research)
The Cornell Institute for Social and Economic Research (CISER) Crosswalk (formerly the Setup File Creator) brings dead data back to life by making it easy, especially for non-programmers, to create programs in SAS, SPSS, and Stata that read ASCII datasets and create SAS, SPSS, and Stata datasets, including a DDI Codebook, simply by entering codebook information into a particular Excel template. The tool has now been restructured and made extensible. With the code restructuring, Python programmers can easily extend Crosswalk's capabilities beyond the three leading software packages stated above by simply creating a module that utilizes the Excel input file and writes setup files for the software they want, for example, R. The poster presentation will show Crosswalk's new code structure and how easily new modules can be added to the software by those who want to extend its capabilities. Extended capabilities of the Crosswalk from its prior version will also be discussed.
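As a hypothetical sketch of what such a plug-in module might look like, the following Python function takes codebook rows (as they might be read from the Excel template) and writes an R setup file. The interface shown is an assumption for illustration, not CISER's actual module API.

```python
# Hypothetical sketch of a Crosswalk-style plug-in that emits an R setup
# file from codebook rows taken from the Excel template. The interface
# (a list of column dicts passed to write_setup) is an assumption for
# illustration, not CISER's actual module API.
def write_setup(columns, ascii_path, out_path):
    """columns: [{'name': 'age', 'width': 2, 'label': 'Age in years'}, ...]"""
    widths = ", ".join(str(c["width"]) for c in columns)
    names = ", ".join(f'"{c["name"]}"' for c in columns)
    lines = [
        f'dat <- read.fwf("{ascii_path}", widths = c({widths}), '
        f'col.names = c({names}))'
    ]
    for c in columns:
        # carry the codebook label into the R dataset as a variable attribute
        lines.append(f'attr(dat${c["name"]}, "label") <- "{c["label"]}"')
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_setup([{"name": "age", "width": 2, "label": "Age in years"}],
            "study.dat", "study_setup.R")
```

The appeal of this structure is that each target package only needs one such writer function; the Excel codebook template stays the single source of truth.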
Building on the rich metadata from decades of health behavior studies: The potential for common data elements (CDEs) to enhance the identification of health data across different research projects
Susan Hautaniemi Leonard (ICPSR, University of Michigan)
Vanessa Unkeless-Perez (ICPSR, University of Michigan)
Kaye Marz (ICPSR, University of Michigan)
James McNally (ICPSR, University of Michigan)
Amy Pienta (ICPSR, University of Michigan)
Continued analyses of key datasets are extremely important to vital research questions, and also multiply the benefits of our nation's investment in science. With funding from the National Institute on Drug Abuse (NIDA), the National Institute on Aging (NIA), and the Office of Behavioral and Social Sciences Research (OBSSR), ICPSR is working to increase the use of extant data for health research by making health-related variables easier to identify. To pilot this work, we are adding variable-level metadata from sub-sets of controlled vocabularies (e.g., Common Data Elements (CDEs) from the NIH CDE Repository and ontology terms from SNOMED, PROMIS, and PICO), focusing on opioid use and abuse in the National Addiction and HIV Data Archive Program (NAHDAP) and dementia and cognitive function in the National Archive of Computerized Data on Aging (NACDA). The enhanced metadata allows search to find individual variables where each question is narrowly focused (e.g., participants are asked about the use of specific types of opioids, but the term "opioid" itself was never used), and to surface relevant variables where search returns would otherwise be overwhelming, spanning hundreds of studies with potentially thousands of variables. The process of piloting this work has yielded interesting insights into the strengths and limitations associated with applying CDEs and ontologies (and other controlled vocabularies) to existing studies to make data more discoverable and more usable. In our presentation, we will discuss the challenges of working with CDEs for identifying existing data, give an overview of lessons learned, and suggest potential pathways forward for associating variables with existing vocabulary systems to increase data use.
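A minimal sketch of the kind of variable-level tagging this enables: map an umbrella vocabulary term to the narrower terms that appear in question labels, so a search for the umbrella term finds variables that never mention it. The tiny vocabulary below is invented for illustration and is not drawn from the NIH CDE Repository.

```python
# Sketch of how variable-level controlled-vocabulary tags catch questions
# that never use the umbrella term. The vocabulary mapping is an invented
# example, not actual NIH CDE Repository or ontology content.
VOCAB = {
    "opioid": {"heroin", "oxycodone", "fentanyl", "methadone"},
}

variables = [
    {"name": "Q12", "label": "Used heroin in past 30 days"},
    {"name": "Q13", "label": "Cups of coffee per day"},
]

def tag(variables, vocab):
    for v in variables:
        words = set(v["label"].lower().split())
        v["tags"] = sorted(t for t, terms in vocab.items() if words & terms)
    return variables

print(tag(variables, VOCAB))
# Q12 is now findable under "opioid" even though its label never says it.
```

Real deployments would match against full controlled vocabularies and handle stemming and phrases, but the retrieval benefit is the same: narrowly worded questions become discoverable under the broader concept a researcher actually searches for.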
Come take a sneak peek at what to expect at the IASSIST 2019 Conference!
The Safe Data Access Professionals (SDAP) competency framework
Carlotta Greci (The Health Foundation)
Richard Welpton (Cancer Research UK)
Arne Wolters (The Health Foundation)
Christine Woods (UK Data Archive)
Access to confidential microdata is increasingly provided in "safe settings", ensuring the confidentiality of data subjects is protected. Analysts access the safe setting to view and analyse data, with the results of their work being returned to them, subject to a review to ensure the results do not breach data confidentiality. Staff supporting these safe settings carry out a variety of work to support analysts using the environment. While staff gain much experience, there is often little formal professional development which can have negative consequences for staff, services and analysts (including high staff turnover and lack of relevant skills). In 2016 the UK working group for Safe Data Access Professionals (SDAP) developed a competency framework to help staff develop their professional skills. The framework sets out competencies for staff at different stages of their career. It can be used for staff development (e.g. setting objectives, assessing achievements, preparing for promotion) as well as recruitment. It helps staff identify new skills to learn, and develop existing skills. The framework will benefit support staff, and analysts accessing data in a safe setting.
Data in the library catalog: What's the story on the numeric data material type?
Carissa Phillips (University of Illinois at Urbana-Champaign)
Cataloging of data has been an issue of discussion for over 40 years. Even today, though, as we buy datasets, we struggle with how best to represent those datasets in the catalog so that users can easily find them and, more generally, with how to advise users on the most effective search terms for finding datasets in a library catalog. This study proposes to analyze the approximately 37,000 records in WorldCat which have been categorized as "numeric data" in the "material type" field. What types of items have been categorized this way? Which genres, file types, and other fields have to this point commonly been assigned to items classified as numeric data? By quantifying other elements of catalog records for items which are being presented as data sources, we can gain a better understanding of the challenges in helping users identify data through the library catalog, and perhaps find better ways to address these challenges in the new catalog records we create.
2018-05-31: D1: Historical GIS and Telling Stories about the Past
Surfing sources from the sofa: Using the Web to research the history of a place
Peter Burnhill (Independent - formerly University of Edinburgh)
This is an account of cartographic evidence, statistical analyses, and textual sources, accessed from afar to support an enquiry into the impact upon the inhabitants of a rural village in 19th century England that became a garrison town and the military centre of an empire. Initial sources included micro-census data for 1851 and 1861, supplemented by vital registration data. Cross-sectional analysis of demographic and occupational structure is supported by an attempt at nominal linkage of families, individuals, and households to assess longitudinal change. The serendipity of the Web uncovered news of the discovery of a strongbox hidden in a church belfry and the subsequent digitisation of manuscripts on land holdings that extend back over 500 years, along with digitised copies of maps and contemporary guidebooks, and the "Garry Owen" providing a link to the U.S. 7th Cavalry. This did mean delay and a need to start anew ('cause writing up is hard to do), but it also enabled rich context for statistical conclusions and a rich experience for the researcher. It also prompted recall of IASSIST 1990, with its then futuristic theme of "numbers pictures words and sounds: all will be digital and accessed from afar".
Post offices, railroads and land offices in the Canadian prairies, 1850-1900
Gustavo Velasco (Winnipeg School Division)
This study is an interdisciplinary investigation that contributes to the analysis of Western Canada settlement by incorporating postal data, homestead records, and the historical railroad network into a Geographic Information System (GIS). For this project, I built a database based on textual records, gazetteers, pamphlets, and government documents. In this way, almost 1,000 post offices and several land offices were georeferenced in GIS. In addition, I reconstructed the historical railroad expansion year by year to 1900 based on the actual railroad network. The locations of post offices show a pattern of occupied space year by year. In addition, they show with some precision the formation of communities, villages, and towns that emerged during the period. Moreover, spatial analyses that consider the distance of post offices, first to rivers and then to the railroad network, allow one to evaluate the importance of means of communication in the evolution of the frontier of settlement. Similarly, by analyzing updated homestead entries and cancellations data during the period, my investigation found that farmers' failures were more frequent than the classical literature assumed, particularly after the 1890s, a period that scholars regarded as one of more stable settlement.
Mapping historical Mississauga: Uncovering the city's changing landscapes online using historical maps and digital data
Kara Handren (Scholars Portal, Ontario Council of University Libraries)
Andrew Nicholson (University of Toronto)
The City of Mississauga is very young by Canadian standards, becoming a town in 1968 and reaching city status in 1974. In this short time, Mississauga has evolved from a loose rural grouping of villages and hamlets with a combined population of about 95,000 to being the sixth largest city in Canada with a population today of over 720,000. For better or worse, Mississauga is often held up as a case study of post-World War II urban change, and as an emergent Canadian suburb. One of the best resources for exploring this change is through the use of maps and digital data. The Scholars GeoPortal, a geospatial data discovery and delivery service, provides access to hundreds of maps and geospatial datasets related both to modern day and historical Mississauga, some of which also provide coverage on a national scale. Our presentation will highlight many of these resources and outline how they can provide research impact in various disciplines. By allowing researchers to track the development of Mississauga using maps and data that span the most important decades of the city's history, the Scholars GeoPortal provides a means to discover, analyse, visualize, and teach historical Mississauga.
Not just points on a map! Stories of mapping the historic Welland canals
Colleen Beard (Brock University)
Contemporary technologies for data capture and mapping have revolutionized the way qualitative information can be collected and shared. As the main economic thrust for Niagara in the early 1800s, the three historic Welland Canals traversed the area, leaving significant landmarks. Although most features have been bulldozed or left to ruin, many of the second and third canal features have survived, if you know where (and how) to look. Exercising HGIS research processes, the Historic Welland Canals Mapping Project web app (https://arcg.is/5auDa) documents the digital reconstruction of canal features as they appear on today's landscape. Collector for ArcGIS was used to capture point data for all surviving canal features. What lies beneath these points tells compelling stories that engage audiences to surprising heights. From terrorist attempts to shipwrecks and discoveries left untold, location-based data is transformed into extraordinary anecdotes. Historical photos of these features augment reality and provide startling "then and now" landscape scenarios. This presentation describes the methodologies used to create the Historic Welland Canals Mapping Project and also demonstrates the stories and public interest this data has revived.
2018-05-31: D2: Professional Development in Data Services for Librarians
Building a bridge among experts: Data boot camp for East Asian studies librarians at the University of Michigan, United States
Jungwon Yang (University of Michigan)
From the perspective of social science and data services librarians, providing effective reference services for international data is often not an easy task because of language barriers, copyright laws and license agreements, and different data sharing cultures across countries. Help from area specialists can improve data librarians' chances of discovering and accessing useful international data sets. Most area specialists, such as East Asian studies librarians, however, are trained in the humanities. They are not familiar with social science data reference services, quantitative research assistance, and data visualization resources. It is therefore quite difficult to promote collaboration between data librarians and area specialists under these conditions. In November 2017, University of Michigan librarians and guest speakers provided a two-day data boot camp for East Asian region specialists in North America. The workshop was designed to help them understand useful and reliable international data resources, to enhance their capacity to collaborate with data librarians, and to provide effective data consultation services to their own clients. In this presentation, I will explain how we designed the workshop and share our experiences of the challenges and opportunities of this program.
Surveying the data liberators: The Canadian academic data landscape in 2017
Alexandra Cooper (Queen's University)
Elizabeth Hill (Western University)
Sandra Keys (University of Waterloo)
Using the results from the 2017 Data Liberation Initiative (DLI) Contacts and Alternates Survey, we will paint a picture of the diversity of data librarians and specialists across the Canadian academic data landscape. A brief background of the survey will be followed by a description of what the data profession entailed in 2017 and whether it has changed since the previous survey in 2012. The DLI Contacts and Alternates Survey explores how data services are implemented in nearly 80 Canadian universities and colleges, the types and frequencies of data-related questions and instruction-related activities, research data management at the institution, and the types of training and services the DLI provides and how they are used.
Professional development of academic data librarians: A Canadian example
Elizabeth Hill (Western University)
Siobhan Hanratty (University of New Brunswick)
For more than twenty years, the Data Liberation Initiative (DLI) has had a significant impact on how data supports teaching, research, and publishing in Canada. A partnership between post-secondary institutions and Statistics Canada, the DLI had as an early goal reducing the financial barriers to using Canadian data, but by its very nature the program also developed a robust communications network and training workshops to respond to DLI community members' needs. Siobhan Hanratty and Elizabeth Hill have supported the Data Liberation Initiative at their institutions since 2001 and 1998 respectively, and have held leadership roles within the DLI Education (now Professional Development) Committee since 2009 and 2010. As part of a review of the current model of professional development opportunities offered to DLI contacts and alternates, the presenters will explore how data services in Canadian post-secondary institutions have changed over time and how the topics of training presented at regional and national DLI workshops reflect changes within the broader data and research community.
2018-05-31: D3: Integrating Research Data Management into the Research Process
Starting with the end in mind: Data services at the Federal Reserve Bank of Kansas City
Christina Kulp (Federal Reserve Bank of Kansas City)
Courtney Butler (Federal Reserve Bank of Kansas City)
As institutions become more involved in providing the full spectrum of data services to academic researchers, it can be a challenge to provide a realistic roadmap for them to follow as they travel through their individual data journey. At the Federal Reserve Bank of Kansas City (FRBKC), we are using a strategy of mapping data services by beginning with "the end" in mind. Our Center for the Advancement of Data and Research in Economics (CADRE), which includes the library, is designed to contain most of the data services that support our economists in one department. For many projects, the Research Library bookends the data pipeline by providing acquisition services on one end and data curation at the other. However, it is becoming apparent that many of the services still require coordination not only within the library, but with other sections such as Technical Support, Data Science, and Economic Research. This presentation will discuss how we are building a cohesive narrative that hopefully lessens the perils of the individual journey so that our researchers can focus on the mission that matters, the never-ending story of regional, national, and global economies.
A saga in sharing: The story of a young researcher's journey to share her data and the information professionals who tried to help
Sebastian Karcher (Qualitative Data Repository)
Sophia Lafferty-Hess (Duke University)
Sharing data can be a journey with various characters, challenges along the way, and uncertain outcomes. These "sagas in sharing" teach information professionals about our patrons, our institutions, our community, and ourselves. In this paper, we tell a particularly dramatic data-sharing story. It is the quest of a young idealistic researcher collecting fascinating sensitive data and seeking to share it, encountering an institution doing its due diligence, helpful library folks, and an expert repository. Our story has moments of joy, such as when our researcher is solely motivated to share because she wants others to be able to reuse her unique data; dramatic plot twists involving Institutional Review Boards; and a poignant ending. It explores major tropes and themes about how researchers' motivations, data types, and data sensitivity can impact sharing; the importance of having clarity concerning institutional policies and procedures; and the role of professional communities and relationships. Like any good story, ours ends with a moral (or rather, a whole set of morals).
From planning to implementation: Data management in real-time research
Alexandra Stam (FORS, Swiss Centre of Expertise in the Social Sciences)
Brian Kleiner (FORS, Swiss Centre of Expertise in the Social Sciences)
Since October 2017, Switzerland's main science funder, the Swiss National Science Foundation (SNSF), has required a data management plan to be submitted along with research proposals. The announcement came as a small tsunami in the research community, with many researchers not trained in data management or comfortable with the concept of data sharing. This is particularly true for qualitative methodologies, which have generally not contributed much to the sharing culture in Switzerland. As a way to catch the next wave, FORS began a reflection some two years ago on how best to implement data management plans in practice. Drawing from researchers' reactions to the new DMP policy, as well as from a pilot study we are currently conducting with researchers to examine day-to-day data management issues, we suggest in this presentation a new planning model -- one that addresses funders' requirements but also guides researchers through the concrete application of data management practices. Our model addresses how original plans should be elaborated and put into practice, moving beyond the usual end goal of data sharing in order to address the immediate data management needs of researchers and their teams.
Downloading has historically been the primary communication between researchers and data archives due, in part, to bandwidth limitations. Today's network connections allow for more two-way communication between researchers and archives. Researchers can continue to download data and codebooks, but they can also contribute notes, programs, scales, and narratives about analyzing archived data. These stories can expand upon documentation and help mentor new researchers by allowing them to learn from others' experience with the data. Stories can become data for analysis; stories about data can also become enhanced documentation and metadata. One challenge for a 21st century archive is how to encourage and enable end-user contributions. Shared community knowledge and experience about data will enrich the scientific enterprise, in addition to specific datasets. Integrating end-user contributions into archival activities will require some "pump priming." The Data Sharing for Demographic Research (DSDR) project at ICPSR (University of Michigan) is engaging data users so that they will contribute information back. The DSDR initiatives focused on end-user contributions include mini-interviews with study PIs that tell the stories of why datasets were collected, and stories from researchers about their experience with the data, particularly where they overcame obstacles during analysis.
Cat among the data: Encouraging researchers to deposit data
Jennifer Doty (Emory University)
Rob O'Reilly (Emory University)
ICPSR encourages representatives to be "sleuthful data stewards" by contacting researchers at their universities about depositing data with the ICPSR. Towards that end, they also provide guidance for representatives on how they might identify data being collected locally. In this presentation, we will talk about our experiences in answering this call, such as how we identified faculty who might be collecting relevant data, how we assessed whether those data might be suitable for the ICPSR or for other locations such as our own Dataverse, and how we worked with subject librarians to develop contacts and get our foot in the door of different departments. We will also share some of the outcomes of these conversations with faculty around managing and sharing their data, conclusions we have reached and insights we have had from this process, as well as some of the questions this work has raised for us in terms of how to provide research data services at our institution.
Why should I share my data if I don't have to? Data sharing as rational choice in a public-goods game
Oliver Watteler (GESIS - Leibniz-Institute for the Social Sciences)
Anja Perry (GESIS - Leibniz-Institute for the Social Sciences)
Data sharing is the provision of a public good. A public good can be used by "everyone" (non-exclusiveness), and its use does not reduce its availability to others (non-rivalry). To share research data, one has to invest time and resources in data management to ensure the reusability of the data (costs). Scientists also partly give up future research ideas, as others can work on the same ideas once the data is public. At the same time, the value of shared data (benefit) increases because of new perspectives on the data. This presentation aims to theoretically analyze motivational factors of data sharing by applying game theory. In a basic public goods game, participants choose how much of their private endowment they want to contribute to a public pot. The contributions in the pot are multiplied and evenly shared among the players. Variations of the public goods game, such as punishments and rewards, have been tested experimentally to research motivations for providing public goods. Different game scenarios relate to real-world regimes such as the UK, where researchers are "punished" when they do not share their data, or Germany, where funders have so far followed a laissez-faire approach.
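The payoff structure of the basic game is easy to make concrete. A minimal sketch, with an illustrative endowment and multiplier (the abstract specifies the mechanics but none of these numbers):

```python
# Minimal sketch of the basic public goods game: each player keeps what
# they do not contribute; the pot is multiplied and shared evenly.
# Endowment, multiplier, and contributions are illustrative assumptions.

def payoffs(contributions, endowment=20.0, multiplier=1.6):
    n = len(contributions)
    share = sum(contributions) * multiplier / n
    return [endowment - c + share for c in contributions]

# Three full contributors and one free-rider:
print(payoffs([20, 20, 20, 0]))
# -> [24.0, 24.0, 24.0, 44.0]: the free-rider earns the most, which is
#    why sharing regimes add punishments (UK) or rely on goodwill (Germany).
```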
Gustavo Durand (Institute for Quantitative Social Science, Harvard University)
The Dataverse Project is taking a three-pronged approach to supporting sensitive data: implementing the DataTags interview tool and the resulting file-level security and access requirements, integrating with the PSI differential privacy tool for sensitive data exploration, and making specific security enhancements recommended by Harvard University Information Technology Security. In Dataverse 5.0, the application will provide researchers with the tools to classify, explore, store, and share sensitive data. The Dataverse Project team will discuss integration plans and the process of UI/UX testing, and will leave plenty of time for questions.
2018-05-31: D5: Sharing Sensitive Data: Interpreting Legal and Policy Frameworks
Privacy and the limits of open data in South Africa
Nobubele Shozi (Council for Scientific and Industrial Research)
Anwar Vahed (Council for Scientific and Industrial Research)
Even though open data has many benefits, its biggest challenge is privacy. Opening access to data involves trading off privacy for utility or vice versa: releasing raw data allows for better engagement with the data but creates privacy risks, while protecting the data limits its usefulness. A balance between privacy and utility must therefore be maintained, which makes it challenging to release data while ensuring that it remains useful. This paper aims to understand the limits of open data in terms of privacy. Given national regulatory requirements in South Africa to ensure data privacy, such as the Protection of Personal Information (POPI) Act, appropriate measures for mitigating privacy violations are particularly relevant for the Data Intensive Research Initiative of South Africa (DIRISA), a government-funded enterprise to manage and provide access to research data repositories. This paper investigates existing South African open data repositories to identify the privacy risks present in the data fields released in their datasets. The repositories are assessed against the POPI Act, and privacy mitigation techniques that protect the data while ensuring utility are recommended.
Current developments in access to administrative data for research
Roxane Silberman (Centre d'Accès Sécurisé aux Données/Groupe des Ecoles Nationales d’Economie et Statistique and Centre national de la recherche scientifique)
Mobilizing the resources of administrative data for official statistics, evidence-based policies, and research is at the top of the agenda. In countries where official statistics are largely based on registers, with a national statistical institute in a central position, access to these data was facilitated earlier than in countries more reliant on surveys for official statistics, possibly decentralized across ministries' statistical departments, regions, or federal systems. This paper focuses on the French case, a country where administrative data are held by various government bodies under different legal frameworks. Building on previous developments regarding confidential microdata and the implementation of the Secure Remote Access Centre (CASD), changes in the legal frameworks have opened access for researchers to a large body of administrative data. Following changes for tax data (2014) and medico-administrative data (2015), the Digital law (2017) is an interesting attempt to provide a general framework dealing with the obstacle posed by the varied implementations of "professional secrecy". The same law allows these data to be merged using the national identifier, for both official statistics and research needs. However, a number of challenges arise: harmonization of procedures and requirements; centralized vs decentralized organisation of secure access facilities; metadata; and preservation issues.
Diana Kapiszewski (Qualitative Data Repository, Georgetown University)
Dessislava Kirilova (Qualitative Data Repository, Syracuse University)
Christiane Page (Qualitative Data Repository, Syracuse University)
Researchers who work with human participants face competing mandates. Funding organizations increasingly require applicants to develop data management plans that discuss data sharing. Journals increasingly require that authors make accessible the data underlying articles. Institutional Review Boards (IRBs), in contrast, neither encourage scholars to discuss data sharing with study participants nor discourage the withholding of data. Because funders and publishers often defer to IRB mandates, data generated through interaction with human participants -- including data that could be unproblematically shared -- are often available only to the researcher who collected them. We report on two workshops for IRB personnel (a third is planned for May 2018) and a study we conducted of the guidance offered by 50 IRBs at R1 (highest research activity) universities. While workshop attendees have been open to the ethical sharing of research data, our initial analysis suggests that IRB guidelines rarely mention data sharing. We also propose guidance and informed consent language that anticipate data sharing, around which we seek to build consensus among IRBs.
2018-05-31: E1: Delivering GIS Services: Theory and Practice
Creating critical thinkers in Geographic Information System (GIS) workshops
Jennie Murack (Massachusetts Institute of Technology)
GIS Services in the Massachusetts Institute of Technology (MIT) Libraries has a long tradition of hosting short, one-session GIS workshops that are open to the entire MIT community. Because GIS support is not available in most departments, many students, faculty, and staff turn to GIS Services to learn software and mapping skills for the first time. Based on workshop feedback and a new vision for the Libraries as a whole, we recently transformed our introductory workshops from software-centric sessions to ones focused on the critical skills necessary for participants to become thoughtful map makers and GIS users. We developed and executed a process to incorporate data literacy, cartography skills, and critical cartography values into the workshops. This presentation will discuss the reasons we decided to revamp our workshops, the process we used to develop them, and the techniques we used to teach both critical thinking skills and software skills in a single three-hour workshop. Attendees will learn how to modify their own teaching from software-focused workshops to those that emphasize critical thinking and data literacy skills. The presentation will also provide specific examples of how to do so in GIS instruction.
Data management best practices for geospatial storytelling
Melinda Kernik (University of Minnesota)
Geospatial storytelling has become increasingly popular in academia as a way to communicate research findings and as a replacement for more conventional class assignments. Integrating digital maps with text, images, and other media, story maps are used in a wide range of disciplines from the digital humanities to the sciences. Story maps exist, however, in an ever-shifting landscape of spatial data services and digital resources -- putting the long-term durability of projects into question. Using Esri Story Maps as a case study, this presentation describes data management best practices for keeping these stories functioning effectively, including strategies for minimizing broken links and navigating the copyright status of "open" data layers. We will also discuss options for backing up story map content and what happens to a project if its creator ceases to have access to an organizational account.
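One simple safeguard against link rot -- our illustration, not necessarily a practice from the talk -- is to routinely verify every layer and media URL a story map references. A minimal sketch with placeholder URLs:

```python
# Toy link checker: verify that every layer/media URL referenced by a
# story map still resolves. The URL list is a placeholder; in practice
# it would be scraped from the story map's configuration.
import requests

urls = [
    "https://example.org/arcgis/rest/services/canal_features/0",  # hypothetical
    "https://example.org/images/lock_photo_1900.jpg",             # hypothetical
]

for url in urls:
    try:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(f"{url} -> {status}")
```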
Andy Rutkowski (University of Southern California)
Zoe Borovsky (University of California, Los Angeles)
Yoh Kawano (University of California, Los Angeles)
Dawn Childress (University of California, Los Angeles)
Alex Gil (Columbia University)
In response to Hurricane Maria in the fall of 2017, Columbia University and other universities organized a series of map-a-thons that helped on-the-ground relief efforts. Utilizing the infrastructure, skills, and people that are part of digital and data libraries, a set of practical guidelines and how-tos has emerged. The resulting "nimble tents toolkit" allows almost anyone, even without much lead time, to hold their own map-a-thon event. This story is about building community, showing how technology and data transform lives, and how libraries can become sites for sustained activist and volunteer efforts. The presentation will go over the first set of map-a-thons, how the toolkit emerged, and how it can be deployed by virtually anyone.
Download free geospatial data from the Open Government portal
Jean Pinard (Natural Resources Canada)
This session is an overview of the federal government's Open Maps portal. Open Maps provides one-stop access to the Government of Canada's geospatial information: you can search, combine, visualize, and analyze geospatial data. The portal presently makes more than 750 geospatial collections from 14 federal agencies available in one central location. We will also present the Toporama interactive map and a demonstration of the Geospatial Data Extraction tool, as well as other federal government applications and sources of data that are useful for students, map librarians, and teachers.
2018-05-31: E2: Open Data: Strategies for Sharing Government and Research Data
Where does Canada's social science research data live? An evaluation of data disposition
Peter Webster (Saint Mary's University)
This paper will bring together information from different sources to evaluate the current disposition of Canadian social science and related research data. It will review the more than 20 repositories hosting Canadian social data. Sources of information about Canadian data include the Re3data international data registry, the National Research Council Canada Gateway to Research Data, the Canadian Association of Research Libraries Portage project, the Fairsharing data directory, and Canadian open government resources; commercial resources like data.mendeley.com add substantial additional information about Canadian social sciences research data. This review will document the subjects covered and the organizational connections between repositories and the consortial efforts working to coordinate Canadian data collecting. The paper will compare social science repositories to the larger body of data repositories. It will compare government-provided data sources; academic, institutional, and subject-based shared consortial data sources; and publisher-based collection approaches. The paper will outline the considerable progress being made in data collection. It will also delineate major issues still to be addressed. Though Canada's data landscape is particular to Canada, its course of development and its problems will be instructive for other countries developing data services and resources.
Enabling cross-search across social science data archives in Japan - Initiative as part of national endeavor to establish open science infrastructure
Miho Funamori (National Institute of Informatics)
Masaharu Hayashi (National Institute of Informatics)
Kazutsuna Yamaji (National Institute of Informatics)
Makoto Asaoka (Rikkyo University)
Yutaka Maeda (Rikkyo University)
Satoshi Miwa (University of Tokyo)
It has long been an issue in Japan that the several existing social science data archives were started by different research groups and thus are not cross-searchable. It also means that an archive risks extinction when the faculty member in charge retires. If these archives could be integrated to form a scholarly base in Japan, it would advance the scholarship of the Japanese social science community and bring long-term sustainability to the datasets in the archives. Integrating the data archives has not been possible because of the independent nature, and also the limited resources, of each archive. However, this could change with the latest national provision of Open Science infrastructure. Japan's national academic service institution -- the National Institute of Informatics -- currently provides a discovery service and an institutional repository cloud service mainly for research publications, which will now be enhanced to accommodate research data as well. If the datasets, or at least the metadata, of the social science data archives in Japan could be integrated into this infrastructure, it would solve the issues raised above. This presentation will elaborate on the idea and the challenges of establishing a cross-searchable social science data archive in Japan.
Data archiving for dissemination within a Gulf nation
Brian Mandikiana (Social and Economic Survey Research Institute)
Lois Timms-Ferrara (Data Independence)
Marc Maynard (Data Independence)
David Howell (Institute for Social Research, University of Michigan)
Since 2008, Qatar University's Social and Economic Survey Research Institute (SESRI) has been collecting nationally representative survey data on social and economic issues. In 2017, SESRI leadership established an archiving unit tasked with data preservation and dissemination, both for internal purposes and with the intent of disseminating select data to the public for secondary analysis. This paper reviews the lessons learned from creating a data archive in an emerging economy where both cultural and political sensitivities exist amid varied groups of stakeholders. Challenges have included recruiting trained personnel, developing policies for data selection and workflow objectives, processing restricted and non-restricted datasets and metadata, data security issues, and promoting usage. Additionally, there is hope that the presence of the archiving unit adds value for other SESRI research staff involved in the design, collection, documentation, and processing of studies. After successfully addressing these challenges over the past year, the Archive met its objective to launch a data center on the institute's website (http://sesri.qu.edu.qa) and to make multiple datasets available for public download from it. Also to be discussed are the tools, processes, and leveraging of resources that are being implemented as the archiving process continues to evolve.
A tale of three data portals: Open government data in U.S. states
Bonnie Paige (Colby College)
Open government data has been studied at the national and municipal levels, but there has been little previous study of open government data initiatives in U.S. states. In this presentation, the open data sites of three northern New England states (Maine, New Hampshire and Vermont) are considered as case studies to explore the successes and problems present in many state-level open government data initiatives. The investigation focuses on whether the legal context of the state encourages proactive release of data in a reusable format, whether the data provided is sufficient in scope and format to support the sites' stated goals, and whether there is an observable political influence on state-level open data. This preliminary study suggests areas of future research for state open government data. Librarians and other information specialists may be able to contribute to state open data initiatives by using their expertise in knowledge organization and information literacy to enhance the existing sites.
2018-05-31: E3: RDM Services: Understanding User Needs
A grassroots approach to building a holistic service model at a liberal arts institution
Samantha Guss (University of Richmond)
Ryan Brazell (University of Richmond)
Andrew Bell (University of Richmond)
A recent series of brainstorming conversations between the IT, library, and academic technology units made visible a lack of coordination in how the University of Richmond (Virginia, USA) provides data-related support services to students. In response, a group of staff representing all three units, drawing on an existing community of practice, came together to imagine and operationalize a flexible, holistic service model that could be implemented from the ground up. Our new service model is meant to respond to the growing number of interdisciplinary programs on our campus and to a disconnect between students' actual data literacy and their instructors' expectations. It serves dual purposes: to provide a single point of entry for students needing data-related support (which is currently haphazard and unsustainable), and to develop new behind-the-scenes connections while strengthening existing ones. In our presentation, we will focus on our approach, our goals, and the challenges encountered in building this model at a liberal arts institution.
Learning how to support RDM across Canada: A multi-institutional survey of health and medical researchers' RDM practices and attitudes
Francine Berish (Queen's University)
Melissa Cheung (University of Ottawa)
Alexandra Cooper (Queen's University)
Dylanne Dearborn (University of Toronto)
K-Lee Fraser (University of Alberta)
Matthew Gertler (Ryerson University)
Vincent Gray (Western University)
In anticipation of forthcoming federal funder requirements that could mandate Canadian researchers to create research data management plans and to archive data, a consortium of Canadian universities surveyed researchers on their RDM practices, attitudes, and interest in data management services. Expanding on the previous consortium efforts surveying researchers in the social sciences, humanities, engineering, and sciences, this presentation details the results of the health and medical science survey from a sample of consortium member institutions. Consortium members include: Dalhousie University, Laurentian University, McGill University, McMaster University, Queen's University, Ryerson University, University of Alberta, University of British Columbia, University of Ottawa, University of Toronto, University of Victoria, University of Waterloo, University of Windsor, and Western University. The amalgamated sample provides insight into both the disciplinary practices of health and medical researchers and the state of research data management across Canada. This national partnership will help inform libraries, researchers, and other stakeholders on the national, provincial, and local level to develop cohesive and reflective data services.
2018-05-31: E4: Joining Forces to Promote Research Transparency
Joining forces to promote research transparency
Harrison Dekker (University of Rhode Island (representing Berkeley Initiative for Transparency in the Social Sciences))
Amy Riegelman (University of Minnesota)
Norm Medeiros (Haverford College (representing Project TIER))
Limor Peer (Yale University (representing Curating for Reproducibility))
Florio Arguillas (Cornell University)
As stated in the "Manifesto for Reproducible Science," the overarching goal of the research transparency movement is to improve "the reliability and efficiency of scientific research" and "increase the credibility of the published scientific literature and accelerate discovery" (Munafò et al, 2017). Given the critical role that curated and shared data plays in conducting reproducible science, not to mention the training and advocacy that data professionals provide, it is clear that there is significant common ground shared by these two communities. Panelists will discuss research transparency and reproducibility by highlighting timely projects, defining the main terms (e.g., replicability, reproducibility), and sharing tools (e.g., Open Science Framework, Project Tier) that could assist in scholar workflow. Panelists represent a variety of disciplines and experience with reproducibility and will share ideas on how the various communities can best work together. Source: Marcus R. Munafò et al. "A Manifesto for Reproducible Science". Nature Human Behaviour, Vol. 1, No. 1. doi:10.1038/s41562-016-0021
2018-05-31: E5: Data Curation Tools and Infrastructure
Updating the classics: A new life for old data
Sharon Bolton (UK Data Archive)
In 2016, a great and unexpected storm hit the UK for the first time since back in the 1970s. In the aftermath, some learned researchers set out to find clues to what had changed between then and now, and how life was back then. The researchers asked an intrepid band of data curators to help them, because the old datasets were dusty and written in a language none of them could read. The curators laboured long and hard to write scripts and routines to convert column binary data to modern-day statistical formats, often with little information to go on. They knew about the Data Rescue Interest Group (https://www.rd-alliance.org/groups/data-rescue.html) and all the valiant work it does, but faced a different challenge: these data were born digital rather than paper-based, yet still could not be easily used. After many hours of painstaking work, over 100 datasets were born again, helping the researchers find out about public opinion at the time of the "first" UK-EU Referendum of 1975. Come and hear about our adventures along the way, enhancing data science skills, upgrading documents, and developing format migration strategies.
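To give a flavour of such conversion work, here is a toy decoder for one common column-binary convention: an 80-column card image stored as two bytes per column, with the low 12 bits marking punches in rows 12, 11, 0-9. The layout, row order, and file name are assumptions for illustration, not the archive's actual routines:

```python
# Toy column-binary decoder. Assumes two bytes per card column, with the
# low 12 bits marking punches in rows 12, 11, 0, 1, ..., 9 -- one common
# convention, not necessarily the layout of any given archived dataset.

ROW_LABELS = ["12", "11"] + [str(d) for d in range(10)]

def decode_column(two_bytes):
    bits = int.from_bytes(two_bytes, "big") & 0x0FFF
    rows = [label for i, label in enumerate(ROW_LABELS)
            if bits & (1 << (11 - i))]
    if not rows:
        return " "                       # no punches: a blank column
    if len(rows) == 1 and len(rows[0]) == 1:
        return rows[0]                   # single punch in rows 0-9: a digit
    return "[" + "/".join(rows) + "]"    # multi-punch codes need a code table

with open("survey1975.cbf", "rb") as f:  # hypothetical input file
    card = f.read(160)                   # one 80-column card, 2 bytes each
    print("".join(decode_column(card[i:i + 2]) for i in range(0, 160, 2)))
```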
Developing a metadata-driven Data Curation Application in Dataverse
Amber Leahey (Scholars Portal, Ontario Council of University Libraries)
Building on efforts in the Dataverse community to enhance data documentation, community input is being sought for the development of an interoperable, API-based data curation application built using the DDI standard in Dataverse (an open-source data repository developed by the Institute for Quantitative Social Science (IQSS) at Harvard University). As datasets are deposited by researchers for publication using the Dataverse system, dataset- and variable-level metadata are stored to enable discovery, access, and preservation of data. To further this, we outline how a data curation application can be developed to support metadata editing and enrichment that will interoperate with components of the Dataverse system, including the Data Explorer visualization tool, and beyond. A web-based data curation application can work with tabular datasets uploaded to Dataverse, allowing curators to edit and create DDI-standard metadata to assist with end-user reuse of the data. The focus of this curation tool will be to enable data depositors -- researchers, data managers, curators, etc. -- to create metadata at the dataset and variable level that will enhance the reproducibility and reuse of published datasets. DDI supports metadata description of granular research data elements, which can be stored as identifiable and reusable objects in Dataverse.
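As a rough illustration of the interoperability involved, Dataverse's native API already exports dataset metadata as DDI, which a curation application could read and enrich. A minimal sketch, with a placeholder server and DOI:

```python
# Fetch the DDI metadata export for a published dataset from a Dataverse
# installation. The export endpoint is part of Dataverse's native API;
# the base URL and DOI below are placeholders.
import requests

BASE = "https://demo.dataverse.org"   # placeholder installation
PID = "doi:10.5072/FK2/EXAMPLE"       # placeholder persistent identifier

resp = requests.get(
    f"{BASE}/api/datasets/export",
    params={"exporter": "ddi", "persistentId": PID},
    timeout=30,
)
resp.raise_for_status()
ddi_xml = resp.text  # DDI Codebook XML, a starting point for enrichment
print(ddi_xml[:300])
```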
Marion Wittenberg (Data Archiving and Networked Services)
One of the projects for 2018 within the Consortium of European Social Science Data Archives (CESSDA) is the DataverseEU project. The aim of the project is to develop a service for CESSDA Service Providers, based on the Dataverse software developed by the Institute for Quantitative Social Science (IQSS) at Harvard University, adjusted to the needs of the CESSDA community, especially small and/or emerging archives. To fulfill this, additional functionalities of the Dataverse software are planned to be developed within the project. Project activities in 2018 include the installation of the software using a Docker container on the CESSDA cloud platform, the enhancement of the system to enable connection with a chosen persistent identifier (PID) provider, the development of an API for the CESSDA controlled vocabularies, topic classifications, and the European Language Social Science Thesaurus (ELSST), and the development of a multilingual user interface. Activities in 2019 aim to expand the metadata fields towards the DDI Lifecycle, to develop different interfaces for national long-term preservation solutions, and more. Project partners are: the Slovenian Social Science Data Archives (Arhiv družboslovnih podatkov -- ADP), the Austrian Social Science Data Archive (AUSSDA), Data Archiving and Networked Services (DANS -- the Netherlands), GESIS -- Leibniz-Institute for the Social Sciences (Germany), the Swedish National Data Service (SND), and the Tárki Data Archive (Hungary). The presentation will focus on the ideas behind the project and the initial project results.
Open infrastructure for research data stewardship with Fedora
David Wilcox (DuraSpace)
Beyond the complexities faced by typical asset management or institutional repository systems, research data presents a number of complications, including complex hierarchies of related objects that must be modeled and displayed, a wider array of data formats that must be supported, and domain-specific metadata that is necessary to make data intelligible. Managing these complications often leads to software that is tailored to a particular domain, making it difficult to maintain or share. This presentation will provide an overview of how Fedora supports research data management, including the roadmap for future development and integrations. Fedora is a flexible, extensible, open source repository platform for storing, managing, and preserving digital content, including research data. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. Fedora supports research data management by providing key repository features such as support for millions of resources and files of any size, native linked data functionality, advanced data modeling, and preservation services. Fedora is also extremely well suited to integration with existing researcher workflows via a well-documented REpresentational State Transfer (REST) API and an event-based messaging service.
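To give a feel for that API, the sketch below creates a container described with a little RDF and attaches a data file using Fedora's LDP-style HTTP interface. The base URI, paths, and local file are placeholders, and a real deployment would also need authentication:

```python
# Create a container with descriptive RDF, then attach a data file, via
# Fedora's LDP-style REST API. Base URI, paths, and the local file are
# placeholders for illustration.
import requests

FEDORA = "http://localhost:8080/rest"  # placeholder Fedora base URI

turtle = """
@prefix dcterms: <http://purl.org/dc/terms/> .
<> dcterms:title "Survey data, wave 1" .
"""

# PUT with a Turtle body creates an RDF container at the given path.
r = requests.put(f"{FEDORA}/datasets/survey-wave-1",
                 data=turtle.encode("utf-8"),
                 headers={"Content-Type": "text/turtle"},
                 timeout=30)
r.raise_for_status()

# PUT with a non-RDF content type creates a binary child resource.
with open("wave1.csv", "rb") as f:     # hypothetical local data file
    r = requests.put(f"{FEDORA}/datasets/survey-wave-1/wave1.csv",
                     data=f,
                     headers={"Content-Type": "text/csv"},
                     timeout=60)
r.raise_for_status()
print("created:", r.status_code)
```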
2018-05-31: F1: Digital Humanities: Practical Lessons
Excavating narratives: An open database of an illustrated oeuvre from Myanmar (Burma)
Yin Ker (Nanyang Technological University)
Hedren Sum (Nanyang Technological University)
What narratives might illustration as a medium and body of works carry, in addition to those divulged by individual illustrations? In Myanmar (Burma), as in many economically developing countries, illustration was the principal site of avant-garde artistic experimentation in a heavily censored society into which the art market had yet to penetrate. Yet it has thus far been omitted from the prevailing art-historical narrative. Through the illustrations of Myanmar's most prolific illustrator and acclaimed trailblazer of modern art, Bagyi Aung Soe (1923-1990), AungSoeillustrations.org (ASi) recounts the emergence of a novel artistic consciousness between 1948, the year the country gained political independence, and 1990, when Myanmar began to open up after almost three decades of isolation under a purportedly socialist regime. In addition, data visualisations of the illustrations created over four decades uncover narratives of a country, a people, and an artist's aspirations and travails. This paper addresses the significance and process of uncovering and reinterpreting the (hi)story of a country's modern art through the digitisation, visual analysis, ontology creation, data curation, database design, and data visualisation of 6,000 illustrations and 60 texts sourced from private and public libraries in Paris and Yangon since 2000.
How not to create a digital media scholarship platform: The history of the Sophie 2.0 project
Jasmine Kirby (Iowa State University of Science and Technology)
Since the mid-2000s, digital platforms have emerged to take advantage of the capabilities of new technology to incorporate media content, tell nonlinear stories, and reinvent the book for the 21st century. Sophie 1.0, from the University of Southern California, the Institute for the Future of the Book (IFB), and computer scientists based in Europe, was an attempt to create a multimedia editing, reading, and publishing platform. Sophie 2.0 was an international collaboration between the University of Southern California and Astea Solutions in Bulgaria to rewrite Sophie 1.0 in the Java programming language. This research explores how the Sophie 2.0 project was unable to become a viable, well-maintained open source product despite receiving over a million dollars in funding from the Mellon Foundation. Problems included the technological difficulty of creating an easy-to-use but completely customizable open source multimedia e-publishing platform, compounded by competing visions of what the project was to be. Stakeholders did not demand a deliverable that actually worked. Funders seemed willing to overlook weaknesses in early releases for the sake of a more encompassing, if impractical, project. The computer scientists wanted to add as many features as possible, while the IFB and the USC Institute for Multimedia Literacy focused on creating a product based on the values of a future they hoped to create. Understanding what went wrong with Sophie 2.0 can help us understand how to create better digital media scholarship tools and start much-needed discussions about failure in the digital humanities.
Curiosity's fulcrum: Leveraging the citizen historian to explore and transcribe the history of the antebellum South
Brandon Kowalski (Cornell University)
Since 2015, the Freedom on the Move project has intrigued students and scholars alike with a harrowing glimpse into the North American slave trade. The project's mission since the beginning has been to curate, transcribe, and make available an estimated 100,000 runaway ads that appeared in newspapers from colonial and pre-Civil War times. After a multi-year beta period that produced invaluable feedback, and having recently received generous funding from both the National Endowment for the Humanities and the National Historical Publications and Records Commission, we are ready to pull back the curtain and show what has been developed and what is in the works for a next-generation transcription and education platform.
Practices of comparisons meet digital humanities
Johanna Vompras (Bielefeld University)
Madis Rumming (Bielefeld University)
The recently founded Collaborative Research Center (SFB 1288) at Bielefeld University in Germany represents a framework of 17 research projects in the humanities addressing "practices of comparison". Over the next 12 years, an interdisciplinary team of more than 40 researchers -- drawn from the fields of history, literary studies, philosophy, historical image studies, political science, and law -- will investigate how historically variable comparative practices are integrated into routines, rules, habits, institutions, and discourses, as well as how these practices can create structures and initiate change. On the infrastructure side, the SFB is guided by a dedicated INF project (Data Infrastructure and Digital Humanities), an integral part of the SFB responsible for overseeing all data- and information-related activities, ranging from data storage and retrieval to a web-accessible tool platform backed by a computing cluster. In this presentation we show -- based on selected research questions from the SFB -- how this platform supports researchers by providing tools for the digital humanities and how it supports new ways of publishing, disseminating, and exploiting the scientific results produced by the SFB.
2018-05-31: F2: Data Loss, Data Rescue, and Digitization
Digital assets in Arabic countries: Workflow and research tools
Michael Nashed (Library of Alexandria (Bibliotheca Alexandrina))
Establishing digital libraries with digital humanities tools for research in Arab countries requires a great deal of initial investment. Developing online Arabic materials, as well as a wide range of universal information in Arabic, and facilitating their accessibility for Arab users can be challenging, but the beneficial outcomes are outstanding and will show significant results at the educational, economic, and social levels within the region. This talk will present the development phases of digitizing Arabic materials in the region, in addition to milestones and challenges for Arabic research projects.
Preserving our electronic government information heritage
Lynda Kellam (University of North Carolina at Greensboro)
Roberta Sittel (University of North Texas)
Access to government information is an integral part of our heritage in the United States. In the 1800s, the federal government began distributing publications to American libraries, and with the Printing Act of 1895, Congress created the Federal Depository Library Program (FDLP) with the purpose of distributing government information and data. Through the FDLP, our researchers have access to a wealth of government products, which provide in many cases the foundational materials for telling their research stories. In recent years, the rapidly changing information landscape has disrupted this workflow, especially as government information and data have become available in digital formats. Librarians and other information professionals from a variety of universities and organizations in the United States are undertaking a two year project to address national concerns regarding the Preservation of Electronic Government Information (PEGI) by cultural memory organizations for long term use. The focus of the PEGI project is at-risk government digital information of long term historical significance, especially for our future data storytellers. This session will provide attendees with an overview of the PEGI project and will encourage discussion about preservation of digital-born information and data that will help to inform the project's outcomes.
In the winter of 2017, academic librarians, information science professionals, data scientists, and stakeholders around the world recognized an urgent need to take immediate action regarding U.S. federal research data at risk from a lack of long-term preservation and access. News about these efforts, and invitations to join forces, spread far and wide. The efforts exposed some fundamental considerations beyond the act of rescue. One year after the alarm call -- where are we and what have we learned? This presentation will look at the different perspectives that permeate our understanding of data at risk within a profession at the intersection of so many disciplines. We will tell the ongoing story of the data rescue experience at Johns Hopkins and bring to light some of the many efforts that were started. We will engage in a discussion about what comes next, in an effort to understand not just what we did, but how we can frame sustainable future action.
This presentation will trace the history of the Alberta Hail Data Project and the ongoing research data management steps involved in its collection, preservation, rescues, re-use, and enrichment for future use. Data "rescue and re-use" projects are a by-product of the rapidly changing technological era in which we live and work. The Alberta Hail Project Meteorological and Barge-Humphries Radar Archive is an example of one such project. Between 1957 and 1991, the Alberta Research Council (ARC) collected valuable meteorological data in central Alberta. After 30 years of data collection, the project was defunded, the original researchers departed, and the ARC was reorganized, putting the data immediately at risk. The University of Alberta Data Library and the ARC embarked on a project to rescue and preserve this "hail data". The data were archived, documented, and made accessible through a dedicated webpage. However, in 2015 the host server was being shut down, and the CDs holding the archived data had been stored in boxes for almost 15 years, putting the data at risk again. A new data re-rescue and reuse effort was undertaken.
2018-05-31: F3: Teacher, Tell Us a Data Story! Moving Toward Best Practices in Data Librarian Education
Teacher, tell us a data story! Moving toward best practices in data librarian education
Mandy Swygart-Hobaugh (Georgia State University)
Abigail Goben (University of Illinois at Chicago)
Margaret Smith (New York University)
Deena Yanofsky (University of Toronto)
This panel explores approaches to teaching data librarianship to aspiring and current librarians. The panelists represent the range of modes of teaching data librarianship: semester-long courses (face-to-face and online), full-day workshops, and "one-shot" sessions. Format: a moderator poses a question, each panelist responds, the moderator solicits discussion from the audience, then proceeds to the next question, and the process repeats. Questions may include the following: (1) What areas of "data services" do you cover in your teaching, and why? (2) What assignments/activities do you have students/attendees do? Is there one particular assignment/activity you find especially effective, and why? (3) Pick one challenge of teaching data librarianship... how do you try to overcome it? (4) Data services cannot be done in a vacuum. How do you prepare data librarians to collaborate both inside and beyond the library? (5) If you had to name one core skill that data librarians should have, what would it be and why? Outcome: summarize the discussion to share with the IASSIST and ACMLA communities, and gauge interest in kick-starting an interest group to delineate best practices in data librarian education.
2018-05-31: F4: Instruction: Data Management Skills
The thrilling tales and derring-do of data librarians and digital scholarship
Heather Whipple (Brock University)
Following Joque and Partlo (IASSIST Quarterly 38.2) and their respective discussions of different conceptualizations of data and of data librarians as translators, this paper considers the stories told about what data librarianship has been and where it might go, focusing on relationships within the library and on the growth and development of digital scholarship as a service or focus in libraries. While some may embrace the wide-open potential for what might constitute "data," others (including library colleagues) are just as likely to turn away out of disinterest or fear. Consequently, the most important stories we need to tell may be those that reveal data to be both exciting and familiar to resistant coworkers, researchers, and students. Digital scholarship shares some of the same challenges: confusion over what it might be and who is or isn't involved. Arguably, just as everything can be data, these days everything can be at least partly digital (even those working with physical objects use digital tools to share their stories). Like data, digital scholarship may benefit from curated narratives that clarify what it is, why it is worth libraries' time to support and promote, and how data librarians are part of that story.
Preserving the agricultural data story at the Ontario Agricultural College
Michelle Edwards (University of Guelph)
Carol Perry (University of Guelph)
Established in 1874, the Ontario Agricultural College (OAC) at the University of Guelph maintains a broad base of research activities, ranging from the social and environmental conditions of rural communities, to the health benefits of new foods, to more traditional plant- and animal-based agricultural research. Increasingly, OAC is recognized as the "research powerhouse" of the university, supported by the Ontario Ministry of Agriculture, Food and Rural Affairs and the three major federal funding agencies: the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Institutes of Health Research (CIHR), and the Social Sciences and Humanities Research Council (SSHRC). With a broad spectrum of research comes a variety of research protocols, data formats, and outputs. Labs and research institutes develop their own data management processes, many without a preservation and access plan. To provide a cohesive approach to managing research projects, we developed a series of research data management (RDM) workshops for the OAC community. The series was piloted during the summer of 2017 and, based on its popularity, was offered again in fall 2017 and winter 2018. This presentation will discuss our tailored approach to teaching RDM along with impressions and feedback from participants. Connecting to the community through RDM is leading us to hidden troves of historical agricultural data, allowing us to preserve almost 150 years of OAC's data story.
Sustaining our data through data management - it's easier than you think!
Jane Fry (Carleton University)
Carol Perry (University of Guelph)
"Help! I've got these boxes of interviews and data records, and I don't know what to do with them! I want my data to be preserved." If this sounds familiar, then this presentation is one you will want to attend. We will show you an aid that will guide you through the steps necessary to curate, archive, and share your data. It is Portage, a Canadian website that is "dedicated to the shared stewardship of research data in Canada". There are many training resources available that allow you to guide yourself through the research data management process, or to obtain help from others. We will highlight the latest developments in training resources and in the DMP Assistant. You will be amazed! Join us to learn about the latest resources and how they will further your journey in ensuring that your data will last through the ages.
A follow-up study on data management and data sharing training in graduate education in the social sciences
Ashley Doonan (ICPSR, University of Michigan)
Evan Cosby (ICPSR, University of Michigan)
Previous research suggests that many social science graduate programs are not providing the data management or data sharing training necessary for producing effective and ethical researchers. Understanding effective data management and the importance of data sharing is crucial and should be integrated into graduate education. This is especially important to the mission of data repositories and data libraries wishing to limit obstacles to data sharing and to respond to evolving research needs and data trends. The current study expands on our previous assessment of how social science graduate programs include formal and informal training in data sharing, management, and the use of repositories by conducting a syllabus review and analysis. Syllabi were gathered from databases such as OER Commons and from the department websites of a random sample of American social science graduate programs. Text analysis was used to identify the inclusion of relevant content within coursework. Consistent with our earlier work, we found that social science graduate programs often did not include content about data management or data sharing within their coursework. This new study provides insight into the need for supplemental coursework or certification in data management skills geared towards graduate students. We discuss potential strategies for implementing this instruction.
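The flavour of such a syllabus scan can be sketched in a few lines. A minimal sketch, assuming plain-text syllabi in a local folder and a hand-picked keyword list (the study's actual coding scheme is not described in this abstract):

```python
# Flag which syllabi mention data-management or data-sharing content.
# Folder layout and keyword list are assumptions for illustration.
import glob

KEYWORDS = ["data management", "data sharing", "metadata",
            "replication", "repository", "documentation"]

for path in sorted(glob.glob("syllabi/*.txt")):  # hypothetical folder
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    hits = [k for k in KEYWORDS if k in text]
    print(path, "->", ", ".join(hits) if hits else "no data-related content")
```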
2018-06-01: G1: Collaborative Approaches to Metadata Infrastructure Development
The moments of joy and despair of the CESSDA metadata project
Mari Kleemola (Finnish Social Science Data Archive)
Kerrin Borshewski (GESIS - Leibniz-Institute for the Social Sciences)
The CESSDA (Consortium of European Social Science Data Archives) Metadata Management project (CMM1) produced, in May 2017, the CESSDA Metadata Standards Portfolio Version 1, containing the CESSDA Metadata Core Model and the multilingual CESSDA Controlled Vocabularies. They are based on DDI Lifecycle 3.2 and the DDI CVs. The second phase of the project is now under way. This presentation will provide a rough overview of the extended CESSDA Core Metadata Model, the CVs, and the accompanying documentation and guidance. We will show how the model supports other CESSDA services such as the Product and Service Catalogue and the Euro Question Bank. CMM has its roots in previous metadata work within CESSDA and among individual CESSDA Service Providers. The challenge is to look both backwards and into the future. Add to the equation project partners from eight different organisations and countries, the quickly changing metadata world, and CESSDA's evolution from a network of European data archives into a legal entity and large-scale infrastructure, and you have an interesting setting with plenty of potential and plenty of potential pitfalls. We will discuss our metadata journey, our experiences and lessons learned, and share the moments of joy and despair we have experienced along the way.
Implementing DDI Lifecycle to achieve interoperability between projects at the GESIS - Leibniz-Institute for the Social Sciences Data Archive for the Social Sciences
Esra Akdeniz (GESIS - Leibniz-Institute for the Social Sciences)
Kerrin Borshewski (GESIS - Leibniz-Institute for the Social Sciences)
The GESIS -- Leibniz-Institute for the Social Sciences Data Archive has developed various kinds of tools for discovery, documentation, data access, and other services. These provide many possibilities for development and diversity, but also create some disunity and overlap, not least in the area of metadata: so far, different DDI formats have been used at GESIS. In order to promote interoperability between different GESIS applications, we have created a common metadata schema in the DDI Lifecycle format. These integrative metadata efforts have also served as an orientation for Europe-wide projects (e.g. CESSDA Metadata Management and the Euro Question Bank). In this presentation, we provide an overview of our metadata schema. We will show which aspects of data documentation it contains and which other metadata standards and schemas we took into account (DataCite, Dublin Core, etc.). To demonstrate how the common metadata schema is put to use and what its benefits are, we will describe how some GESIS applications (e.g. DataCatalogue, da|ra) implemented the DDI Lifecycle mapping recommendations. Additionally, we will consider how projects outside of GESIS have benefited from this work and take a look at the challenges that arose during and after the creation of our metadata schema.
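As a toy illustration of what such schema mapping involves (not the actual GESIS schema; the field and element names below are simplified assumptions), a crosswalk can be expressed as a small lookup table:

    # A toy crosswalk between an internal record and two target standards.
    # Field and element names are simplified assumptions, not the GESIS schema.
    CROSSWALK = {
        "title":    {"dc": "dc:title",       "datacite": "titles/title"},
        "creator":  {"dc": "dc:creator",     "datacite": "creators/creator/creatorName"},
        "abstract": {"dc": "dc:description", "datacite": "descriptions/description"},
    }

    def map_record(record: dict, target: str) -> dict:
        """Re-key an internal metadata record to a target standard's element names."""
        return {CROSSWALK[k][target]: v for k, v in record.items() if k in CROSSWALK}

    study = {"title": "Example Survey 2016", "creator": "GESIS", "abstract": "A general population survey."}
    print(map_record(study, "dc"))
    print(map_record(study, "datacite"))

The benefit of maintaining one common schema, as the abstract describes, is that each application only has to implement one such mapping instead of a pairwise mapping to every other tool.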
German Network for Educational Research Data: Building a federated research data infrastructure for educational studies in Germany
Karoline Harzenetter (GESIS - Leibniz-Institute for the Social Sciences)
Marcus Eisentraut (GESIS - Leibniz-Institute for the Social Sciences)
Reiner Mauer (GESIS - Leibniz-Institute for the Social Sciences)
Doris Bambey (German Institute for International Educational Research)
Alexia Meyermann (German Institute for International Educational Research)
In the multidisciplinary field of educational research in Germany, a diverse range of research data infrastructures and services has developed over the last decade. These differ with regard to topical scope, types of data, range of services, and the scientific communities they address. This manifold but rather fragmented service landscape is in some regards dysfunctional from a researcher's perspective. To overcome these barriers to optimal use of existing data infrastructures, the German Network for Educational Research Data (a project funded by the German Ministry of Research) aims to build a federated research data infrastructure offering harmonized support services on the basis of common standards and procedures. The network is based upon a collaboration of three established data centers. In its first funding period, the project focused on harmonizing metadata schemes and standardizing curation procedures in order to offer joint archiving services, and on advancing the comprehensibility, quality, and visibility of the studies and data within the field. In its current funding period, the project aims to broaden the network of participating data centers in educational science and adjacent disciplines. In our presentation, we will discuss our concept of a federated infrastructure and highlight challenges, opportunities, and risks.
2018-06-01: G2: Using Digitization and Visualising Tools to Develop Interactive Data Stories
Air photo digitization takes flight: Niagara's journey from paper to points to digital mosaics
Sharon Janzen (Brock University)
As snapshots in time, historical air photos capture undisputed changes in our landscape. This story begins with a century-old air photo, tattered, torn, and begging for preservation. Thousands of air photos later, these images have been transformed from 9x9-inch contact prints into seamless digital mosaic datasets spanning the Niagara region. Digitization is not only a hot topic for data creation but also an innovative approach to preserving our local heritage resources. Elements of the process include scanning, mosaicking, geo-referencing, and publishing to online mapping environments. Many of these processes foster a collaborative workflow as thousands of traditional resources are scanned for archiving and repurposed into far-reaching digital exhibits. The Niagara Air Photo Index offers a point-and-click environment for researchers to gather information, access individual air photo images, and browse regional mosaics spanning 50 years of imagery. What was once a fragile historical document has become a cutting-edge web resource with many stories to tell! This presentation will describe the workflow, processes, and resurfaced stories that this creative initiative has fostered.
Data storytelling techniques: Presenting survey reports with hundreds of pages in a short time
Davaasuren Chuluunbat (Mongolian Marketing Research Organization)
Amgalanbaatar Dagdandori (Mongolian Marketing Consulting Group Company)
One key skill for data analysts is often overlooked: the ability to communicate findings clearly and effectively. If we as data analysts cannot deliver the knowledge extracted from data and persuade users and audiences, then our analysis reports will collect dust on a shelf. The solution is data storytelling: using the power of narrative to communicate findings in a way that resonates with stakeholders. Storytelling is the mechanism for sharing "knowledge" in the most engaging, memorable, and persuasive way possible, a technique that dates back to Aristotle's rhetoric. This presentation will show applications of Aristotle's pathos, ethos, and logos to current data-based market research presentations. Analysts and organizations conducting social research base their insights on large, systematic data collection and study. The full representation of this work is often a voluminous report, but the most effective communication of that work often comes in a brief summary presentation that can be enhanced by data storytelling. We will review examples of how to use data storytelling techniques effectively when presenting marketing and social research reports, drawing on the public opinion and focus group research conducted in Mongolia by MMCG Company.
Presenting content using Story Maps has become increasingly popular with governments at all levels, NGOs and IGOs, community interest groups, and members of the press, among others. Many academics have used the resource in their teaching and assignments, but how well received is it for scholarly publishing? The various story map templates allow a user to include as much text as desired; however, doing so meaningfully and creatively remains a challenge. In 2016, the presenter collaborated with colleagues in charge of the University of New Brunswick (UNB) Libraries' Loyalist Collection in the hopes of producing a work that would offer an attractive solution to their need to publish text-heavy biographies of New Brunswick Loyalists, and a proof of concept to pique the interest of other scholars at UNB. This presentation examines the process and end result of the New Brunswick Loyalist Journeys project, including successes and failures, and reports on the status of the presenter's attempt to lure more colleagues to this resource.
2018-06-01: G3: Beyond the Numbers: Building Collections of Non-Numeric Licensed Data
Beyond the numbers: building collections of non-numeric licensed data
Mara Blake (Johns Hopkins University)
Bobray Bordelon (Princeton University)
Karen Hogenboom (University of Illinois)
Laura Wrubel (George Washington University)
Joel Herndon (Duke University)
How does your library collect data? Many conversations about collecting data in libraries center on archiving research data, but libraries are also building collections of licensed and purchased data for campus use. Continuing the conversation from the 2017 IASSIST panel on building numeric data collections, this panel will share best practices and engage in discussion on emerging issues related to collecting non-numeric data. The panel will provide an overview of the landscape of library collections of licensed data and then focus on three types of non-numeric data: text, social media, and geospatial. Panelists will compare the similarities and differences between collecting these types of data and collecting numeric data. The panel will also leave time for attendees to participate in guided discussion around issues raised by the panelists.
2018-06-01: G4: Challenges in Harmonizing Geospatial Metadata
Challenges in harmonizing geospatial metadata
Karen Majewicz (University of Minnesota)
Andrew Battista (New York University)
Kimberley Durante (Stanford University)
Taylor Hixson (New York University, Abu Dhabi)
Melinda Kernik (University of Minnesota)
The development and adoption of online data distribution platforms has vastly improved access to geospatial resources. Metadata is the key to discovery of these resources, but the accompanying geospatial metadata ecosystem lags behind in terms of tools, authoring practices, and dedicated development efforts. As a result, it suffers from interoperability issues due to competing standards, inconsistent applications of schemas, and varying levels of descriptive quality. Our presentations will describe research into mitigating these issues and our progress on establishing consortial approaches to the creation of a shared corpus of geospatial metadata records. This work emerges from the growing community of developers and users of open source geospatial data discovery applications, particularly GeoBlacklight. We will provide an overview of the GeoBlacklight metadata schema and the main challenges that obstruct sharing metadata between projects. We will also introduce a proposed metadata scoring strategy, which provides a framework for evaluating metadata quality and can serve to inform decisions about cross-institutional metadata record ingest. The presentations will be followed by a moderated discussion on recommendations for remediating metadata, how this work can improve user experience, and opportunities for collaboration.
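To make the scoring idea concrete, here is a minimal sketch of a completeness-style score over GeoBlacklight-like records; the field names follow the public GeoBlacklight schema, but the weights and rubric are invented for illustration and are not the proposal described above:

    # A minimal completeness score over GeoBlacklight-style JSON records.
    # Field names follow the public GeoBlacklight schema; the weights and the
    # rubric are invented for illustration, not the proposed scoring strategy.
    WEIGHTS = {
        "dc_title_s": 3, "dc_description_s": 2, "dct_provenance_s": 2,
        "solr_geom": 3, "dc_subject_sm": 1, "dct_temporal_sm": 1,
    }

    def score(record: dict) -> float:
        """Return a 0..1 score: weighted share of populated fields."""
        earned = sum(w for field, w in WEIGHTS.items() if record.get(field))
        return earned / sum(WEIGHTS.values())

    rec = {"dc_title_s": "Soil Survey, Ramsey County",
           "solr_geom": "ENVELOPE(-93.2, -92.9, 45.1, 44.8)"}
    print(f"{score(rec):.2f}")  # 0.50 -- title and geometry present, rest missing

A score like this gives repositories a simple, comparable number when deciding whether to ingest another institution's records as-is or to remediate them first.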
Danielle Gomes (Administrative Data Research Network, University of Essex)
Louise Corti (UK Data Archive)
In the UK, the world of national and local administrative data for policy still has many untold tales. Intriguing and informative stories are locked or buried away where researchers may not find them, and their keepers may not be willing to hand out their precious treasure, just yet. But all is not lost... the data enablers are here to help unlock the rich resources and maybe rescue and clean the gems and pearls. In other words, help is needed to make these data for policy "research ready". Our data heroes believe that building trust and transparency is the key to success. Using the formalised data appraisal techniques and trusted data access governance frameworks used by the UK Data Service, we plan to build a proof of concept at a local level that can be extended to the national level. The "Data Enablers" work with the Challenge Lab research initiative, a partnership between the University of Essex and Essex County Council that aims to improve the services provided by the public authority, such as Adult Social Care and Children and Families. Come and find out more about who these enablers are and what they are doing. We hope this story has a happy ending!
Restricting data's use: A spectrum of concerns in need of flexible approaches
Dharma Akmon (ICPSR)
Susan Jekielek (ICPSR)
As researchers consider making their data available to others, they are concerned with the responsible use of those data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR's data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants' confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. We will describe data producers' range of needs to restrict data access and discuss how systems can better accommodate these needs.
The 4 Rs: Researcher credentials for replication and re-use of restricted data
Allison Tyler (ICPSR, University of Michigan)
The value of research data comes not just from the original results of the research but from the utility of the data over time. Reuse of restricted data is challenged by the legal and ethical requirements to protect the research subjects, including lengthy data access request timeframes and oft-redundant identity verification processes within and between restricted data repositories. This paper will present the results of an organizational study of 23 U.S. and international restricted data services and repositories' data access processes that will lead to the development of a transferable researcher repository credential. The results of this study highlight repository practices, including the classification of data at different levels of restriction, expected researcher training and experience, and tracking of prior data use, that require reconciliation in order for the access credentials of one repository to be accepted at other repositories. Our analysis identifies inconsistencies in language and terminology as well as requirements. We present these points of conflict and provide suggestions for their harmonization.
'SDC for dummies': How to safeguard the safe release of statistics from confidential data
James Scott (UK Data Archive)
Richard Welpton (Cancer Research UK)
Arne Wolters (The Health Foundation)
Christine Woods (UK Data Archive)
Carlotta Greci (The Health Foundation)
Analysts are increasingly demanding access to more detailed data about individuals and organisations. However, the level of detail is such that these data are often considered "personal, sensitive, or confidential"; access is therefore restricted to a controlled environment or "safe setting". Many safe settings have statistical disclosure control (SDC) systems in place to review statistical results, to mitigate the risk of releasing confidential information and to prevent a data subject from being identified. While SDC is used by many organisations in the UK, there is no common code of practice for the review of statistical results. Since 2017, the UK working group for Safe Data Access Professionals has been developing a handbook to help analysts and the staff who check statistical results in a safe setting, offering practical examples and tips. It also provides guidance to organisations on how to implement efficient SDC processes. This presentation will give an overview of the contents of the SDC handbook, the general approach and aims behind this work, and potential uses within the UK community.
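As a flavour of what one such check can look like, here is a minimal sketch of a minimum-frequency threshold rule, one common SDC test; the handbook covers many more rules, and the threshold of 10 below is purely illustrative:

    import pandas as pd

    # One common SDC output check: a minimum-frequency threshold. The handbook's
    # actual rules are richer; the threshold of 10 here is purely illustrative.
    THRESHOLD = 10

    def check_table(counts: pd.Series) -> pd.DataFrame:
        """Flag cells in a frequency table that fall below the release threshold."""
        out = counts.to_frame("n")
        out["safe_to_release"] = out["n"] >= THRESHOLD
        return out

    cells = pd.Series({"region A": 54, "region B": 7, "region C": 12})
    print(check_table(cells))  # region B would be suppressed or aggregated

In practice such automated flags only triage outputs; a trained output checker still makes the final release decision, which is exactly the audience the handbook addresses.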
Research data management requirements in social and behavioral sciences academic journals
Spencer Acadia (University of Kentucky)
Abigail Goben (University of Illinois at Chicago)
Considerations of sharing research data when publishing in academic science journals have become increasingly common in the last decade thanks to mandates by national and international granting agencies. Yet similar directives for social and behavioral sciences journals have been slow to gain traction. This presentation explores the presence of research data management language in the manuscript submission directions of the top 136 academic journals in sociology, psychology, political science, and general social sciences, as determined by 2015/2016 Scimago and Eigenfactor journal rankings. Official journal websites were examined via content analysis for specific data policies, as well as general mentions of data sharing and data management. The presentation makes unique contributions to the existing data management and scholarly communications literature by: 1) drawing attention to data language used by publishers of social and behavioral sciences journals; 2) examining journals' existing data policies and requirements for publication; and 3) discussing how brief or non-existent journal data policies interrupt the data lifecycle.
Splendor in the grass? Troubled tales of turf grass dissertation data discoverability
Terrence Bennett (The College of New Jersey)
Shawn Nicholson (Michigan State University)
A dissertation tells the story of a scholar's entrée into academic life. As the major distinguishing feature of doctoral education, the dissertation serves the dual role of making a significant contribution to knowledge and providing training in scholarly techniques. New scholars wish to expose their ideas, but what barriers do dissertators face in disseminating dissertations and their attendant data? How do libraries integrate available datasets into retrieval services? And are there common schemas to promote linkages across the scholarly communication ecosystem? For this presentation, we revisit the oft-told tale of exploration into the availability and findability of dissertation-related data. To tell our version of this story, we focus on the arcane domain of turfgrass science -- chosen because of its appeal beyond the academy to a large multinational practitioner base -- to investigate dissertations across an international sample of research libraries. We examine their presentation of dissertations and, perhaps more importantly, the discoverability and availability of related data files. We then examine and analyze the means by which data files are attached to dissertations. Results will inform data literacy praxis and curation protocols, and address concerns around ensuring that these scholars' early stories can be told to future generations.
Data policy implementation: Practices and perspectives from the scholarly publishing community
Mandy Gooch (Odum Institute for Research in Social Science)
Thu-Mai Christian (Odum Institute for Research in Social Science)
Scholarly research is only as strong as its underlying data and code. Open access efforts in the scholarly community aid the dissemination of research; however, limited access to the data behind the research is still a hurdle that publishers and journals are attempting to address. In the social, medical, and life sciences, journal editors are beginning to administer data policies. These policies can require authors to submit their data and analysis code alongside their manuscripts -- sometimes to be shared in a repository, and sometimes also to be verified before publication. As part of a grant-funded project, the Odum Institute for Research in Social Science and Dryad administered a survey and conducted interviews with journal editors and authors to better understand data policy workflows, identify pain points, and highlight streamlined processes. In this presentation, we will discuss the results of this research, which contributed to the development of an implementation model for journals interested in enhancing their current data policies or developing new ones.
Data policies of highly ranked social science journals
Julian Gautier (Harvard University)
Sebastian Karcher (Syracuse University)
Gerard Otalora (Harvard University)
Dessislava Kirilova (Syracuse University)
Mercè Crosas (Harvard University)
Abigail Schwartz (Harvard University)
By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness of data and the reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? What influences these journals' decisions to adopt such policies and instructions? And what do those policies and instructions look like? We will discuss the results of our analysis of the instructions and policies of 291 highly ranked journals publishing social science research, in which we looked for and measured the quality of journal data policies and instructions across 14 variables, such as when and how authors are asked to share their data, and what role journal ranking and age play in the existence and quality of data policies and instructions.
The myth of the data scientist: The importance of teams in providing data science support
John Cocklin (Dartmouth College)
This presentation takes a strategic look at the need for teams in supporting data science programs. We begin with the influential (and well known) 2012 Harvard Business Review article, "Data Scientist: The Sexiest Job of the 21st Century," and the author's later (and less well known) regrets. Next, we discuss some of the challenges support services face in developing strategy, and the importance of teams in providing data science support. Finally, to help with planning for the future, we take a look at forecasts for the field of data science from sources such as Gartner, Frost & Sullivan, and Forrester.
Through time and space: Data as a narrative device
Marcela Isuster (McGill University)
Michael Groenendyk (McGill University)
With the rise of digital scholarship, the way students interact with and produce information has changed, reframing the meaning of storytelling. In the context of a humanities course, two librarians were tasked with finding and teaching tools and resources that would allow students to produce multimedia storytelling projects using spatial and temporal data. While the process was challenging and the learning curves were steep, students learned to discover data, create metadata, and combine multimedia information to create a compelling narrative. And they learned to code too! This presentation will discuss the process of introducing humanities students to the world of data. It will explore balancing learning curves with project needs, and opportunities to integrate data into the humanities curriculum. The presenters will also discuss instruction challenges and best practices for librarians and instructors wanting to participate in similar projects.
(Less) naked and (less) afraid: Giving graduate students the clothes and confidence for data success
Mandy Swygart-Hobaugh (Georgia State University)
The premise of the reality television show Naked and Afraid -- two people released into the wilderness, with no clothing and minimal supplies, tasked with surviving for 21 days -- is an apt analogy for the experience of greenhorn graduate students: dropped into an unfamiliar academic wilderness with little to no survival skills, in a fleeting two to five years they are expected to emerge holding a graduate degree. This presentation offers (1) an overview of the services Georgia State University Library's Research Data Services team provides to help social science graduate students with their data needs and (2) an examination of one year of data services consultations with graduate students that further elucidates their pressing data needs and how Georgia State University Library is endeavoring to meet those needs. This close look at one library's experience demonstrates the potential for academic librarians to focus data services efforts on this vulnerable student population, helping them gain the survival skills to feel less naked and less afraid so they may emerge from the graduate-school wilderness clothed and confident in their data-related abilities.
Introduction to the community standards for 3D data preservation (CS3DP) project
Jennifer Moore (Washington University in St. Louis)
Adam Rountrey (University of Michigan)
Hannah Scates Kettler (University of Iowa)
Community Standards for 3D Data Preservation (CS3DP) is an IMLS-funded project which aims to bring together 3D model practitioners, librarians, and curators to build consensus on 3D preservation needs and to work toward standards. Two national forums will take place in 2018, the first of them in February. That forum will focus broadly on topics identified in a community survey: data management, preservation best practices, copyright and ownership, metadata, access, and discoverability. The forum will establish working groups and set the stage for the work to follow. This presentation will report on the outcomes of the first forum and subsequent developments, and discuss topics for the summer forum.
CoreTrustSeal: From academic collaboration, to sustainable services
Hervé L'Hours (UK Data Service)
Mari Kleemola (Finnish Social Science Data Archive)
Lisa de Leeuw (Data Archiving and Networked Services)
The design and delivery of sustainable services that provide a foundation for a range of scientific and data management infrastructures, while reducing costs and avoiding duplication of expenditure, is a frequently discussed topic at the national and transnational level. The CoreTrustSeal (CTS), launched in 2017, manages requirements and offers core-level certification for trustworthy digital repositories (TDR) holding data requiring long-term preservation. This paper traces the journey of the CoreTrustSeal through the Data Seal of Approval (DSA), the ICSU World Data System (WDS), Research Data Alliance (RDA) working groups, and community engagement, towards becoming a sustainable service supporting global data infrastructure. We outline the design and delivery of the service, current activities, the benefits of certification to a range of communities, and future plans and challenges. As well as providing a historical narrative and a current and future perspective, the CoreTrustSeal experience offers valuable lessons for those developing standards and best practices, or seeking to develop cooperative, community-driven efforts that transcend disciplines and the academic, governmental, and private-sector data curation spheres.
Data visualization classes often take the form of teachers showing students charts and graphs and describing when to use one graphic versus another. Another common format is for teachers to introduce students to a new piece of software for creating visualizations. Others allow students to interact with data in the classroom to get real-time feedback from their classmates or instructors about the visualizations they are creating or improving. What format works best when some of the class is attending in person and some of the class is online? How do you create a class in which everyone is engaged with the course material? Mentimeter is a polling tool that students use on their phones to answer multiple choice questions or take quizzes during a class. It allows interaction in a fun, engaging way, but is it as effective as PowerPoint? This presentation will discuss the use of Mentimeter as a teaching tool in data visualization classes.
A tour of the National Archive of Computerized Data on Aging to the rhyme of When I'm 64 by The Beatles
Kathryn Lavender (National Archive of Computerized Data on Aging, ICPSR)
The goal is to familiarize the audience with the variety of data on aging that NACDA houses and to promote gerontological research. This presentation will introduce the National Archive of Computerized Data on Aging (NACDA) at ICPSR (funded by the National Institute on Aging) to the IASSIST audience by highlighting collections curated by NACDA, via The Beatles song "When I'm 64". Tentative outline; lyrics with possible accompanying data references:
• "When I get older" -- retirement planning from Midlife in the United States (MIDUS 3), 2013-2014 -- Aggregate Data
• "Losing my hair" -- hair loss and graying hair from Images of Aging in America, 1994 (ICPSR 3094)
• "Will you still be sending me a Valentine" -- romance among elderly respondents from Family Exchanges Study Wave 1 -- Offspring Dataset
• "Birthday greetings bottle of wine" -- birthday celebration data from the Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community in South Africa (HAALSI) Baseline Survey: Agincourt, South Africa, 2015; wine data from Study of Women's Health Across the Nation (SWAN), 2003-2005: Visit 07 Dataset
• "Will you still need me" -- partner support from the National Social Life, Health, and Aging Project (NSHAP): Wave 1 (ICPSR 20541)
Secure remote access to confidential data in France
Yacine El Bouhari (Groupe des Écoles Nationales d'Économie et Statistique/Centre d’Accès Sécurisé aux Données)
CASD (Secure Data Access Centre, "Centre d'Accès Sécurisé aux Données") is a service designed to allow researchers to work remotely and securely with highly detailed microdata. Such data are confidential insofar as they are often subject to specific secrecy or confidentiality measures: professional, statistical, banking or medical confidentiality; business or tax secrecy; etc. In brief, the data available through CASD are very precise, directly or indirectly re-identifying, and constitute a rich mine of detailed information. Making such data available requires the highest standards of security to guarantee confidentiality and traceability. Today, CASD provides access to data from the National Statistical Institute (Insee) and the Ministries for Justice, Education, Agriculture, and Finance, as well as to a growing volume of health microdata. New data sources are constantly being added to the CASD catalogue, enlarging the field of investigation offered to research communities. CASD users have various profiles: researchers, consultants, data scientists, physicians working on health data, geo-statisticians, etc. The remote access technology widens access to microdata for researchers all across Europe, with more and more data sources made available in a centralized place, facilitating comparison and data matching.
The CESSDA online expert tour guide on Data Management
Ellen Leenarts (Data Archiving and Networked Services)
Marion Wittenberg (DANS)
Many members of the Consortium of European Social Science Data Archives (CESSDA ERIC) host workshops for researchers in the social sciences on how to manage, store, organise, document, publish, and reuse data. The CESSDA online research data management (RDM) expert tour guide is an online tutorial based on the archives' experiences engaging with researchers in their research data management workshops. The guide is structured around the research data life cycle and can be used by individual researchers for self-study, as part of an online or face-to-face workshop on research data management, or as part of a university course. When it comes to RDM, many sources of information exist, but researchers often need to comply with specific requirements, e.g. related to funders, national legislation, and common practices within their particular domain. The tutorial describes the diversity found across Europe in the practical implementation of the research data life cycle. A second storyline describes what a researcher needs to do when creating and adapting a data management plan at the various stages of their research, taking into account discipline-specific context (e.g. privacy issues, data documentation). The guide also features practical examples and checklists. After the PechaKucha, the audience will have an overview of the content of the online guide: researchers will be able to decide whether it would be useful as self-study material, and trainers will be able to decide whether it would be a useful resource for their training.
Mapping LGBTQ St. Louis: Stitching together histories using archival resources, community memory and GIS
Jennifer Moore (Washington University in St. Louis)
Mapping LGBTQ St. Louis describes the spatial history of the LGBTQ community in St. Louis over a fifty-year period beginning in the post-WWII era. The project team, led by Andrea Friedman and Miranda Rectenwald, stitched together histories using archival resources and community memory. A project that began with fewer than 100 bar locations grew into over 800 locations of socializing, activism, erotic encounters, and experiences of violence. The project is part of Washington University's Divided City Initiative, funded by the Mellon Foundation. This brief presentation will describe turning analogue resources into digital data to create an interactive map narrative guided by interpretive essays. Workflows, successes, and challenges will be discussed.
Chasing a wild goose: The quest for good data management in a Software as a Service world
Alicia Hofelich Mohr (University of Minnesota)
From cloud storage to online data collection platforms, software as a service (SaaS) tools are becoming core parts of the research workflow. These tools offer powerful and convenient services at low cost, but come at the expense of control. Trying to maintain precise and consistent workflows using these tools can often feel as futile as a wild goose chase. SaaS companies host tools centrally, and can (and do) change functionality without warning. Parts of a workflow or documentation may break down when interfaces, buttons, or functionalities are "updated", providing a challenge for data management and reproducibility. This PechaKucha will describe highs and lows of pursuing good data management in SaaS tools -- maintaining the hope that we can at least stay close to, although perhaps never catch, the goose in the end.
Disney does data: Translating the tech to expose the value
Gemma Hakins (UK Data Archive, University of Essex)
Disney, Pixar, and Aardman Animations -- the formula is universal: hero, sidekick, villain, hope, magic, transformation, love, risk, setbacks, sacrifice, eventually ending happily ever after. When communicating with the myriad of stakeholders about data, technologies, and methods, the fairy-tale ending isn't always inevitable. Effective strategic communications have the potential to contribute to the foundations of strong data services and to enable the successful impact of a project. As data scientists are the storytellers, so communications professionals are the translators. As in animation, there are complex stories to tell in a language each audience understands -- from funders to researchers, ministers to the public. Drawing on successes and lessons from prominent data projects including the UK Data Service Big Data work and the Administrative Data Research Network, this presentation talks about hiding the technology to expose the value of your work through simplification, visual impact, articulating benefits, and multiple tones. Like animations, projects must be explained well to capture attention and imagination. Creating a step change in awareness, attitudes, or behaviours, whilst enhancing the reputation of people, organisations, and partnerships, contributes significantly to future funding opportunities. To unlock the true potential of your data, technology, and tools, see what your communications can really do.
Researchers seeking single sheets of large paper map sets have benefited little from the development of library union catalogs, discovery layers, and other digital finding aids. Finding specific sheets is critical for researchers conducting fieldwork anywhere in the world in nearly any discipline. Accurate geographic tagging of individual sheets, however, presents a vast metadata creation burden and interface challenges. Several efforts over the years have tried and failed to build a self-sustaining, multi-institutional, geographically indexed map database. This presentation will describe a project currently under way to create and freely share index shapefiles for paper map sets between libraries. Using the fishnet function, the project has already made and shared over a hundred standard map indexes; each institution need only mark its individual holdings.
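A minimal sketch of the fishnet idea follows, assuming geopandas/shapely and an invented extent, cell size, and sheet-naming scheme rather than any real map series' sheet lines:

    import geopandas as gpd
    from shapely.geometry import box

    # A fishnet: regular grid cells covering a map set's extent. The extent,
    # cell size, and sheet-naming scheme below are invented for illustration.
    def fishnet(minx, miny, maxx, maxy, dx, dy):
        cells, names = [], []
        row, y = 0, miny
        while y < maxy:
            col, x = 0, minx
            while x < maxx:
                cells.append(box(x, y, x + dx, y + dy))
                names.append(f"sheet_{row}_{col}")
                col, x = col + 1, x + dx
            row, y = row + 1, y + dy
        return gpd.GeoDataFrame({"sheet": names}, geometry=cells, crs="EPSG:4326")

    grid = fishnet(-80.0, 43.0, -78.0, 44.0, 0.5, 0.25)
    grid["held"] = 0  # each institution sets 1 for the sheets it holds
    grid.to_file("map_index.shp")

Generating the grid once and sharing the shapefile means each library only fills in its own holdings column, which is what makes the multi-institutional approach sustainable.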
The upside down of data: Honest community engagement
Thu-Mai Christian (Odum Institute for Research in Social Science)
Mandy Gooch (Odum Institute for Research in Social Science)
Building off of our poster presentation earlier at IASSIST 2018, this PechaKucha will share the narratives of our community members -- their honest accounts of what has and hasn't worked in their efforts in areas such as outreach, service and tool development, data management, training, and workflow implementation. These stories will hopefully inform our community and build awareness of the methods and strategies that did not work at other institutions. In talking about the "upside down" of data, we might just be able to keep from repeating the same mistakes, thereby expanding and improving our services more efficiently. Let's flip these data stories upside down and be open about our failures and misses so we can improve our efforts going forward.
The curse of reference rot: When Web data are here today but gone tomorrow
Peter Burnhill (Independent - ex University of Edinburgh)
This is an alert about the effect of link rot and of content drift for managers of research data, and a pointer to how transactional archiving should accompany references made to web resources. The Hiberlink Project analysed citations made in journal articles and in c.7,000 e-theses to establish the presence of reference rot. Not only were links broken (404 etc.) but content at the end of URLs included in scholarly statement changed over time. Examples include: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0167475 https://insights.uksg.org/articles/10.1629/uksg.237/ As more government publications and news media are issued on the web, reference rot undermines integrity in the evidence base. In his blog post following a talk, The Amnesiac Civilization, David Rosenthal notes "The terms "link rot" and "content drift" suggest randomness but in many cases they hide deliberate suppression or falsification of information." http://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html What needs to be investigated and understood is how this undermines research data and codebooks, and of course the "semantic web". All links to the web are subject to rot over time, just as fruit on a tree or fish taken at sea. The key is take snapshots of what is regarded as important -- at the time -- and commit those into network-accessible long-term archives. Routine web-archiving is not enough.