Plenary 1: Data Writ Large: Technology, Culture and Collision
Daniel Reed (University of Iowa)
Big data and deep learning are the memes of the day, as we shift from a world where data was rare, precious, and expensive to one where it is ubiquitous, commonplace, and inexpensive. Massive digital data, powerful multilayer classification networks, and inexpensive hardware accelerators are bringing new data-driven approaches to discovery, challenging some long-held beliefs, and illuminating old questions in new ways. These approaches span our social experiences (think Netflix or Facebook), business and economics (automation of both physical and intellectual tasks), and research and scholarship (manuscript analysis via feature extraction or digital analysis of large text corpora). Like any new tool or technology, big data challenges and reshapes both our social and technical expectations. This talk will discuss these challenges, how we reached this point, and where we are likely to go, including privacy and security, streaming data (sometimes called the Internet of Things), and the likely and possible futures of current and future data technologies.
Plenary 2: Digital Agriculture: The Midwest Big Data Hub and Global Food Security
Jennifer Clarke (University of Nebraska Lincoln)
This presentation will describe the Midwest Big Data Hub and its focus on Digital Agriculture as a critical challenge for the 21st century. If the global population reaches over 9.5 billion by 2050, as expected, it is estimated that world food production must increase by 70%. Meeting these projected demands for food and feed will be a challenge in the face of dwindling natural resources and climate variability. Significant advances in basic and applied interdisciplinary research, as well as in data and computational capabilities, must be achieved in order to better understand agro-ecosystems and leverage the services they provide. Underlying this economic and societal challenge are multiple data challenges, from data sharing to data privacy to data fusion. Brief overviews will be given of several high-impact data projects involving the Midwest and Great Plains regions, and of ways to participate in the Big Data Hubs.
Collecting and Storing Data from Internet Based Sources
Peter Smyth (UK Data Service)
Many websites allow researchers and developers to download data from them using their Application Programming Interface (API). This data is often in formats that social scientists are unfamiliar with (e.g. JSON). Downloaded data can be processed immediately or stored in a database for later processing in a package like R or Stata. Data can be collected at regular intervals over a period of time, using the built-in functionality of the Windows or Linux operating systems. This introductory workshop is aimed at anyone interested in collecting data from the internet via APIs.
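The workflow the workshop describes (parsing JSON that an API returns, then storing it in a database for later analysis in R or Stata) can be sketched in a few lines of Python. This is a minimal illustration, not part of the workshop materials: the sample payload stands in for an API response, and the `responses` table name is hypothetical.

```python
import json
import sqlite3

def store_records(json_text, conn):
    """Parse a JSON array of records and append each one to a SQLite table."""
    records = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (id INTEGER PRIMARY KEY, body TEXT)"
    )
    # Store each record as its own JSON string so later processing can reshape it.
    conn.executemany(
        "INSERT INTO responses (body) VALUES (?)",
        [(json.dumps(r),) for r in records],
    )
    conn.commit()
    return len(records)

# A small inline payload standing in for a downloaded API response.
sample = '[{"user": "a", "text": "hello"}, {"user": "b", "text": "hi"}]'
conn = sqlite3.connect(":memory:")
n = store_records(sample, conn)
```

Run on a schedule (cron on Linux, Task Scheduler on Windows, the built-in functionality the abstract mentions), such a script accumulates responses over time in a single database file.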
Curating for Reproducibility: Why and How to Review Data and Code
Limor Peer (Yale University)
Florio Arguillas (Cornell University)
Thu-Mai Christian (Odum Institute, UNC Chapel Hill)
Joshua Dull (Yale University)
Developments in digital scholarship, advances in computational science, mandates for open data, and the reproducibility crisis require more attention to code as a research object. We consider activities that ensure that statistical and analytic claims about given data can be reproduced with that data, curating for reproducibility (CURE). This 3-hour workshop will teach participants practical strategies for curating research materials for reproducibility. The workshop will be based on the data quality review, a framework for helping ensure that research data are well documented and usable and that code executes properly and reproduces analytic results. The workshop will introduce three models for putting this framework into practice (the Institution for Social and Policy Studies (ISPS) at Yale University, the Cornell Institute for Social and Economic Research (CISER), and the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill). Participants will learn about the basic components of the CURE workflow using examples and hands-on activities. The workshop will also demonstrate a tool that structures the CURE workflow.
2017-05-24: A2: Data Curation: Perception and Practice
Data curation: Perception and practice
Cynthia Hudson-Vitale (Washington University in St Louis)
Lisa Johnston (University of Minnesota)
Jacob Carlson (University of Michigan)
Wendy Kozlowski (Cornell University)
Understanding the importance researchers place on specific data curation treatments (such as peer-review, persistent identifiers, chain of custody, etc.) is an essential step in building institutional curatorial services that are trusted, meet expectations, and address needs. Additionally, comparing this information to the treatments curators indicate are important, as well as the actual treatments they take locally, provides key insight into perceptions and practices related to data curation. Specifically, this information can tell us what stakeholders value in terms of curation and how this aligns with the local approach for curation services. To uncover this information, the Data Curation Network (DCN) used a mixed methods approach that included faculty focus groups, library surveys, and hands-on curation treatments. This panel will provide the detailed results and a comparative analysis of (1) both the importance of specific curatorial treatments from faculty and curator perspectives, and (2) the curation activities taking place at the six participating institutions, thus comparing perceived importance of curation activities to actual practices. This session will conclude with an audience discussion of the results, limitations of the information, and general project feedback.
2017-05-24: A3: Documentation Challenges in Complex Data Collection Efforts
A school survey management system for educational assessments in Switzerland
Ingo Barkow (HTW Chur)
Catharina Wasner (HTW Chur)
Currently, two large educational assessment programs institutionalized by the cantons exist in Switzerland: the well-known Programme for International Student Assessment (PISA), an OECD initiative that involves a large number of nations, and the Swiss National Core Skills Assessment Program (in German: ÜGK, Überprüfung der Grundkompetenzen). Following the completion of PISA 2015, the core skills assessment program was initiated to focus assessment on Swiss measurement instruments and obtain more results with national relevance. Both programs have been computer-based assessments since 2016, but the IT systems for both programs are not yet optimized to support the fieldwork in an adequate manner. Therefore, a software tool will be developed to support, on the one hand, the administration and field monitoring during data collection and, on the other hand, to optimize the data documentation process. In this presentation we will show which processes should be modeled and where documentation and metadata could be generated as a byproduct without additional effort. This includes, in particular, paradata, which provide interesting opportunities for analysis.
Mobile SMS survey data management and preservation
Inna Kouper (Indiana University)
Charitha Madurangi (Indiana University)
Kunalan Ratharanjan (Indiana University)
Tom Evans (Indiana University)
Beth Plale (Indiana University)
The growing availability of mobile phones provides new opportunities for data collection, particularly in developing countries where it is challenging to reach respondents in rural areas. Text messaging or short message service (SMS) allows for high-frequency, automated data collection with large sample sizes at relatively low cost. However, the use of SMS technology also raises critical data quality issues, strongly suggesting the need for ongoing data management that can help evaluate robustness of responses over time, select appropriate tools for storage and data analysis, and avoid dependency on specific platforms and standards. In this presentation we will discuss our approach to long-term management and preservation of mobile SMS survey data that are part of a larger project that examines adaptation of small-scale farmers in Africa to climate change. We will describe the development of an automated pipeline to ingest weekly data from a cloud-based platform to local data servers, while maintaining security and confidentiality. We will also demonstrate tools to enhance survey metadata and monitor and visualize data health and trends. The audience will be invited to discuss how robust data management can serve both the needs of the research team and the needs of potential data users and stakeholders.
Tales from the lab: A case study of metadata data management in complex behavioral studies
Pernu Menheer (University of Minnesota)
Andrew Sell (University of Minnesota)
The nature of complex human subjects research experiments makes them inherently ambiguous and dynamic for those developing and administering the experiments. There is often no blueprint on how to best develop and program these complex experimental studies; most are undertaken precisely because they have not been done before. However, innovation in experimental data collection often comes at the expense of the ability to use established methods and tools to compile metadata and paradata. Thus, the programmers who decide how to develop the code and programs used to administer the experiments also have considerable discretion in how to collect and compile the associated data and metadata. We will describe our experiences with developing complex human subjects research experiments and how adopting good practices in metadata and data management has improved the quality of our research support.
2017-05-24: A4: Data Instruction in the Age of Data Science
Cheap, fast, or good - Pick two: Data instruction in the age of data science
Joel Herndon (Duke University)
Justin Joque (University of Michigan)
Angela Zoss (Duke University)
While data-driven research has always required training and a complex understanding of methodologies and epistemologies, the environment for data services in research libraries seems to be growing increasingly turbulent. The demand for data instruction is widening to all disciplines. Even disciplines that have strong traditions in data creation and analysis are experiencing methodological crises. Tools and best practices change rapidly and make it difficult to maintain a stable curriculum for instruction. Finally, the librarians and staff who provide data services are often doing so in an environment where they have to divide their time between many different types of services and where resources for professional development are limited. In this panel, librarians and staff in different data services positions will present briefly on their experiences addressing the complexities of the current data instruction environment, including teaching about data management, data gathering/creation, data cleaning and analysis, and data visualization. The panel chair will then lead a discussion between the panelists and audience to explore other challenges and opportunities in this area. A recorder will capture discussion and resources in an openly accessible document that can serve as a resource for data services staff moving forward.
The Council of European Social Science Data Archives (CESSDA) provides large scale, integrated and sustainable data services to the social sciences. Training for data discovery is a new area for CESSDA and aims to help researchers (or other end-users) locate and navigate data collections relevant to their own research/teaching interests; data collections may be stored in different locations and subject to different access conditions.
Does graduate training in the Social Sciences prepare students for data management and sharing?
Ashley Ebersole (ICPSR University of Michigan)
Jai Holt (ICPSR University of Michigan)
Graduate programs in the social sciences aim to produce well-rounded scholars and researchers. Although the majority of these programs offer curricula in research methods and data collection, as well as opportunities for professional development, it is less clear how many students leave graduate school with a background in data management and data sharing. The ethical codes of the social science disciplines, including psychology, sociology, anthropology, political science, geography, and history, call for open and accessible data. This is crucial as research becomes more collaborative, and evidence suggests that the research culture can be a barrier to data sharing. Early training in data management and sharing could help alleviate this problem. The current study assesses how master's and doctoral programs in the social sciences include formal and informal training in data sharing and the use of data repositories. We developed a survey instrument to measure the extent to which programs are likely to provide this training, using a mix of open- and closed-ended questions. We sent the survey to 30 programs in the United States across the six social science disciplines. We report on our results and discuss implications for developing training materials that can be used in graduate programs.
Reworking the workshop: designing data management workshops to align with behavioral change models
Elizabeth Wickes (University of Illinois at Urbana-Champaign)
Used both to raise awareness about a variety of topics and to provide important experience with tools or skills, the humble workshop is often the centerpiece of the outreach repertoire. However, it is often difficult to critically analyze its effectiveness or understand how to align content. Behavioral change models, including self-efficacy stage models, are an adaptable framework for this assessment because they provide well-researched classification approaches and have scientific support that stage-specific interventions provide more impact than non-targeted interventions (Bamberg, 2013; Nachreiner et al., 2015; Schwarzer, 2008). This is especially true for data management training, because it attempts to promote long-term behavioral change rather than specific tool training. Exploring a stage model of data management behavior change immediately provides outlines for specific behavioral questions, and workshop content can be critically analyzed as a targeted intervention. This presentation will summarize the redesign process to identify and support individuals within specific stages of data management behavior change and showcase some of the open materials published to fit within this framework. While evaluation continues to be difficult, analyzing our content within these models yielded more informed and critical judgment of content and activities and a basis for future assessment.
The use of statistical data for teaching and research purposes in Canada has substantially increased with the creation of the Data Liberation Initiative (DLI) program, established in 1996 as a partnership among Statistics Canada, other federal departments, and Canada’s academic community. While some Canadian universities have a long history in the provision of data services, that was not the norm everywhere, especially in small universities and colleges. To maximize the use of data, it was evident that increased education, resources, and tools were required to support those delivering data services on their campuses. The development of the DLI Survival Guide will be examined in this presentation. The Survival Guide is an innovative and collaborative project driven by academic librarians and staff of the DLI program to address the diverse needs of those providing data services. We will present the mixed-methods approach used to address the evolving needs of the community and describe the challenges and opportunities that were encountered and how they were addressed over the years. Suggestions for maintaining the currency of the Survival Guide and new ways to utilize it will also be given.
2017-05-24: B2: Repository Strategies across Communities
The Data Seal of Approval in the Australian context - Assessing the Australian Data Archive as a Trusted Digital Repository.
Steven McEachern (Australian Data Archive)
Data archives and funding agencies are increasingly interested in certification of data archives and repositories as "trusted digital repositories". There is now interest in Australia in understanding certification models for Australian archives and repositories.
DataverseNL – New developments of a data management support system for Dutch universities, research organisations, and higher education
Marion Wittenberg (Data Archiving and Networked Services (DANS))
Peter Doorn (Data Archiving and Networked Services (DANS))
Vyacheslav Tykhonov (Data Archiving and Networked Services (DANS))
Twelve years ago, Data Archiving and Networked Services (DANS) in the Netherlands developed its first self-deposit archiving system, EASY. Over the last few years, most universities and research institutes have developed research data management policies. Many institutions want to offer a repository service to their staff for storing and sharing research data. University libraries or other university departments usually want to have control of such a repository solution. To meet this demand, we started working according to a front-office/back-office model. A practical implementation of this is DataverseNL, built upon software developed by Harvard University (IQSS). DataverseNL is a shared service provided by the participating institutions and DANS. DANS performs back-office tasks, including server and software maintenance and administrative support. The participating institutions are responsible for managing the content: the data deposited by their staff. The repositories (or dataverses) within DataverseNL are positioned for data storage and sharing during research and for about ten years after the conclusion of a research project. With a SWORD interface to EASY, long-term archiving is secured. In this presentation, we will focus on future developments such as data tagging for privacy-sensitive data, visualization of data, and building Virtual Research Environments within DataverseNL.
FAIR Data in Trustworthy Repositories: Everybody wants to play FAIR, but how do we put the principles into practice?
Ingrid Dillo (Data Archiving and Networked Services (DANS))
Peter Doorn (Data Archiving and Networked Services (DANS))
There is a growing demand for quality criteria for research datasets. We will argue that the Data Seal of Approval (DSA) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any Trustworthy Digital Repository. In 2014 the FAIR principles were formulated. The well-chosen FAIR acronym is attractive: it almost automatically gets stuck in your mind once you have heard it. In a relatively short time, the FAIR data principles have been adopted by many stakeholders, including research funders. The FAIR principles are remarkably similar to the underlying principles of the DSA (2005): the data can be found on the Internet, are accessible, are in a usable format, are reliable, and are identified in a unique and persistent way. The DSA presents quality criteria for repositories, whereas the FAIR principles target individual datasets. The two sets of principles will be discussed and compared, and a tangible operationalization will be presented.
2017-05-24: B3: Infrastructure to Support Restricted Data Sharing
Using the 5-Safes framework: a case study of health data access in the UK
Carlotta Greci (The Health Foundation)
Arne Wolters (The Health Foundation)
The Health Foundation is an independent charity committed to bringing about better health care provision for people in the United Kingdom. Our in-house research relies on the analysis of patient-level data, which can provide insights into health utilisation and outcomes. It is important for any organisation processing patient information to be able to keep these data safe, and to demonstrate this to data providers. This paper discusses the application of the ‘5 Safes framework’ in health services research. This framework provides guidance on risk mitigation when using health records, whether in aggregated form, fully identifiable, or de-identified. Fundamental to the framework is creating an appropriate balance of controls in all five dimensions of safe data access (safe data, safe people, safe projects, safe settings, and safe outputs). The Health Foundation applies this framework in the design and operation of its secure data environment. Combined with active data provider engagement and adherence to national and international standards of best practice, the secure data environment enables the safe processing of de-identified patient medical records linked across various health care services in the United Kingdom.
Identifying less common types of restricted data
Trace Crago (Boise State University)
Amber Sherman (Boise State University)
Jean Barney (Boise State University)
Most researchers are aware that personally identifiable information needs to be protected and restricted. There are established ways to automatically scan for information like Social Security or credit card numbers, and best practices exist for de-identifying or masking variables which could reveal identity. However, there are many other laws and policies which restrict the publishing of certain types of data, including information about endangered animals and protected areas. More awareness of the less common types of restricted data is needed among researchers and data management professionals. The national trend toward interdisciplinary research programs makes it more likely that a single researcher will lack a complete knowledge of applicable laws. This paper will discuss the research taking place to create a database of sensitive information types, along with corresponding laws restricting that information and suggestions for identifying each of those data types.
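As a minimal illustration of the kind of automated scan the abstract mentions (this sketch is not part of the paper), simple regular expressions can flag candidate Social Security and credit card numbers in free text. Real scanners use validated detectors, such as Luhn checksums for card numbers; the patterns and function name here are simplified and hypothetical.

```python
import re

# Illustrative patterns only: a production scanner would use validated detectors.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def flag_sensitive(text):
    """Return labels for the sensitive patterns found in a string of free text."""
    hits = []
    if SSN_PATTERN.search(text):
        hits.append("ssn")
    if CARD_PATTERN.search(text):
        hits.append("card")
    return hits

flags = flag_sensitive("Contact: 123-45-6789, card 4111 1111 1111 1111")
```

The abstract's broader point is precisely that such pattern-based scans do not exist for many legally restricted data types, such as locations of endangered species, which is why a curated database of restricted-data categories would help.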
Facilitating collaboration with restricted-use data
John Marcotte (ICPSR / University of Michigan)
Data protection requirements often impede collaboration. A typical security plan requires a standalone (non-networked) computer in a locked office. This requirement makes collaborating with colleagues at the same institution difficult; moreover, it makes collaborating across institutions practically impossible. The challenge is to provide a platform that facilitates collaboration while also meeting security requirements. Cloud computing with virtualized machines can meet these challenges. Virtual machines can be configured to prevent them from accessing the Internet, so that researchers cannot copy files to a server on the Internet. In a virtual environment, vetting of output by authorized reviewers can easily be incorporated into the protocols. Researchers from different institutions can access the virtual machines in the cloud and have access to shared project resources and files. Virtual environments have been recognized for their security. The most important feature of this type of virtual environment is that the restricted-use data never leave the system. Access to the data may be modified instantly: when a data use agreement expires, access can be immediately terminated. Another facet is controlling output so that it can be vetted for disclosure risk. In addition to these security controls, virtual environments can be set up with common disk space for projects. This disk space is a way for researchers at different locations to collaborate. Since this space is within the virtual environment, it meets security requirements.
UK Data Service responses to changes in the data landscape
Hersh Mann (UK Data Archive)
The Approved Researcher scheme is used by the United Kingdom Office for National Statistics to grant access to microdata that cannot be published openly. Following on from reviews of this scheme and of data that fall within its remit, there have been changes to the mechanisms by which the UK Data Service provides access to these data sources. These changes relate to the process of gaining permission to access data, and to a statistical disclosure review of the licences under which sensitive variables are held. Using these reviews as exemplars, this presentation will discuss how the impact of the changes affects the operation of the UK Data Service (in acquisition, licensing, ingest, access, and support) and how the user experience is altered in parallel. This exercise demonstrates the value of working closely with data depositors at all stages of the data lifecycle to strike a balance between preserving data security and ensuring that sensitive information can be shared safely and practically for legitimate research needs. As legislation and attitudes evolve to encompass new forms of data, there will be a continuing need for data producers and data services to provide dynamic responses to new developments.
2017-05-24: B4: IASSIST: Data Professionals and Collaboration
Building a bigger data tent: What can IASSIST learn from CODATA?
Ernie Boyko (Carleton University (Retired))
CODATA, an interdisciplinary scientific committee of the International Council for Science (ICSU), has many things in common with IASSIST. Established in 1966, CODATA promotes and encourages the compilation, evaluation, and dissemination of reliable data of importance to science and technology on a world-wide basis. IASSIST is an international organization of professionals working with information technology and data services to support research and teaching. Membership in CODATA is country and scientific union based, while IASSIST is made up of individual memberships. In spite of these differences in organizational structure, the focus on technology in support of data stewardship and on capacity-building are shared strategic directions of both organizations. A current proposal to merge the International Social Science Council (ISSC) with ICSU has the potential to bring IASSIST and CODATA even closer together in mission. This presentation will update the audience on the merger process and will outline the strategic directions and recent achievements of CODATA. It will conclude by identifying new doors in data development and service that could be opened by working more closely with CODATA. We look forward to an engaging discussion of our new opportunities.
Maximizing on The IASSIST Way: Data Support for All (without burning out)
Paula Lackie (Carleton College)
Adetoun Oyelude (University of Ibadan)
Libby Bishop (UK Data Archive, U of Essex)
Dessi Kirilova (Syracuse University)
The demand for expert support in all phases of the data lifecycle, and in increasingly diverse settings, has never been greater, and it continues to grow exponentially. Meanwhile, our professional staffing levels have rarely kept pace with this demand. To maintain our professional standards, IASSIST members can do more to rely on one another and to maximize our shared expertise. We have a broad set of expertise in working with written guides, engaging students as peer leaders, reaching across academic disciplines, and being fluid with our vocabulary. Surely there is a way that we can pool these resources and expertise for our mutual benefit! Join us in a discussion of models for local/institutional support, as well as mechanisms to employ through our membership in IASSIST, to continue to nurture the responsible use of data for good, never for evil.
2017-05-24: C1: Data, the Common Language with Different Dialects: Views of Data from Outside of the Social Sciences
Data and metadata standards for biodiversity inventory, modeling, and analysis: Darwin Core and EML
James Beach (University of Kansas)
Humanities linguistics data standards: State of the art challenges
Arienne Dwyer (University of Kansas)
Clinical integrated data repositories and observations regarding data sharing and national collaboration
Russ Waitman (University of Kansas)
2017-05-24: C2: Data Rescue
Accessing historical Canadian census boundaries just got a whole lot easier! A journey in data migration and cross-institutional collaboration
Amber Leahey (Scholars Portal, Ontario Council of University Libraries)
Finding and mapping Canadian historical census data can be a little difficult. This presentation will discuss the project to migrate older spatial boundary data, gather digital data from across libraries and institutions in Canada, and publish decades worth of census boundaries to a central spatial data portal, Scholars GeoPortal (http://geo.scholarsportal.info), for open access. Future work will also be discussed including digitization and georeferencing of "lost" census years and boundaries.
Documenting data rescue. The Ontario Data Community Data Rescue Group and the Data Rescue Curation Guide for Data Rescuers
Kristi Thompson (University of Windsor)
Leanne Trimble (University of Toronto)
Alexandra Cooper (Queen's University)
This presentation will describe the efforts of a group of Ontario data professionals to rescue a collection of Government of Canada survey data files, and will present the guide to data rescue and curation that grew out of their efforts. This group, representing several different institutions, initially came together on a project to rescue various historically important Government of Canada survey data files that were available only in states ranging from unusable to incomprehensible. As the project progressed, the members documented the steps they were taking in order to develop a streamlined procedure. This internal document, which began as a set of guidelines for working with a relatively uniform set of surveys, grew into a detailed guide to data rescue as the project expanded. This presentation will give an overview of the project and data collections involved, describe the group’s work with government staff to obtain necessary files and educate them on data curation, review the work undertaken to rescue a single survey, and present the Ontario Data Community Data Rescue Group’s Data Rescue Curation Guide for Data Rescuers.
2017-05-24: C3: Ethical Sharing & Management of Data
Restricted Data Contracts: Current and Future Directions
Lisa Broniszewski (Penn State Population Research Institute)
Lisa Neidert (University of Michigan Population Research Center)
Jennifer Darragh (Duke University)
Loren Masters (Penn State Methodology Center)
Contracts for access to restricted data are a growing need for researchers in multiple disciplines. This panel will begin with a discussion of common restricted data contract components such as Data Use Agreement terms, Institutional Review Board (IRB) approvals, and Data Protection/Security Plans. We will present current and future directions in these three areas and, time permitting, spend some time discussing issues that attendees have experienced. This panel discussion will benefit those who are new to restricted data contracts, helping provide the common ground and language needed to build relationships at your institution, as well as giving those who have experience working with these contracts a platform to express their sticking points so we can determine how we might be able to help one another. Lisa Neidert, University of Michigan Population Research Center, has over 20 years of experience working with data use agreement terms and conditions. Loren Masters, Penn State Methodology Center, has assisted researchers in the public health field with their contracted data as well as the IRB protocols associated with their projects. Jen Darragh, Senior Research Data Management Consultant with Duke University Libraries, has many years of experience working with researchers from various institutions and disciplines in navigating data protection plans associated with restricted data contracts.
Whose data ethics do you mean? Building common language with RCR
Nina Exner (North Carolina Agricultural and Technical State University)
Do some researchers seem confused about data ethics? They may be getting contradictory messages! Data management usually focuses on the data lifecycle, but there is another campus perspective on managing data: Responsible Conduct of Research (RCR) ethics compliance professionals view data management differently. Data management perspectives come from data usability and re-usability, while the RCR perspective draws from government ethics regulations such as human subjects/IRB guidelines and other ethical protocols. Even though both views of data ethics inform lifecycle data management, they diverge considerably once they get down to details. This presentation will share what happens when scholarly communications and RCR professionals realize they hold these very different ideas of what data management means. If you come to this session, you’ll learn how we harmonized the two views to build a local partnership around our common language of ethics and data. The shared understanding we built underpins our shared "ethical management of data" workshop, which doubles our reach as we co-teach audiences interested in ethics, in data, or in both.
2017-05-24: C4: Standards Based DDI Tools
No tools, No standard. An introduction to standards based tools
Johan Fihn (Swedish National Data Service)
The acceptance and adoption of a standard such as DDI depends heavily on the availability of software tools for using it. In this session we would like to introduce work done on tools that facilitate the use of standards and present a selection of them.
Efficient and flexible DDI handling for the development of multiple applications
Oliver Hopt (GESIS)
Claus-Peter Klas (GESIS)
Wolfgang Zenk-Möltgen (GESIS)
Alexander Mühlbauer (GESIS)
The current usage of DDI is heterogeneous. It varies across different versions of DDI, different groupings, and unequal interpretations of elements. Providers of DDI-based services therefore implement complex database models for each application they develop, resulting in high costs and application-specific, non-reusable models.
Continuum of Statistics Canada’s Microdata Data Access Services
Chantal Ripp (Statistics Canada)
Statistics Canada recognizes that sometimes researchers require access not only to aggregate statistics, but also to microdata at the individual business, household or person level. In order to preserve the privacy and confidentiality of respondents while at the same time encouraging the use of microdata, a range of data access options are offered by Statistics Canada. This poster session will present the continuum of microdata access services, including access to public use microdata files (Data Liberation Initiative and Access to Public Use Microdata Files Collection), direct access to detailed microdata in a secure physical environment (Research Data Centres and the Centre for Data Development and Economic Research) and remote access solutions (Real Time Remote Access system).
A Complex Use Case - Documenting the Consumer Expenditure Survey at BLS
Daniel Gillman (US Bureau of Labor Statistics)
Evan Hubener (US Bureau of Labor Statistics)
Reginald Noel (US Bureau of Labor Statistics)
Bryan Rigg (US Bureau of Labor Statistics)
Arcenis Rojas (US Bureau of Labor Statistics)
Lucilla Tan (US Bureau of Labor Statistics)
Taylor Wilson (US Bureau of Labor Statistics)
The Consumer Expenditure Survey (CE) is a Bureau of Labor Statistics (BLS) program that measures how US families spend their money. These data are also an input to the CPI. BLS selected DDI-3.2 to document CE, including the entire life-cycle. CE is conducted as 2 separate surveys, Interview and Diary. The data are combined during processing and packaged in 2 ways, one for CE dissemination and one for CPI. Changes in design occur every odd-numbered year. Yearly estimates are created every 6 months, PUMD is issued yearly, and data are sent to CPI monthly. CE processing is divided into 4 sub-systems: 1) sample selection and collection; 2) initial edit subsystem; 3) estimation and edit subsystem, with data sent to CPI; and 4) final edits, tables, and microdata. Data are processed in packages by expenditure type. A documentation system needs to handle all these features. For development, BLS is taking a phased approach, adding complexity from phase to phase. The incremental systems are designed to establish that DDI and the Colectica system are sufficiently sophisticated to account for each feature of CE. This paper will go into detail about the particulars of the CE survey, describe progress made, and lay out plans for the future.
Ernie Boyko (Canada National Committee for CODATA)
Simon Hodson (CODATA)
CODATA, an interdisciplinary scientific committee of the International Council for Science (ICSU), has many things in common with IASSIST. Established in 1966, CODATA promotes and encourages the compilation, evaluation, and dissemination of reliable data of importance to science and technology on a world-wide basis. This poster will outline the scope of CODATA activities with the aim of identifying areas of mutual interest with IASSIST and explore possible areas of collaboration.
Courtney Butler (Federal Reserve Bank of Kansas City)
Brett Currier (Federal Reserve Bank of Kansas City)
The Federal Reserve Bank of Kansas City built LaTeX citations for acquired data so that economists could copy and paste data citations into LaTeX, their preferred document preparation system. Copy-and-paste citations for traditional academic articles and books are available from services like Citation Machine or Google Scholar, which provide researchers with the code they need for various citation styles and word processing programs, including LaTeX. We could not identify a similar plugin for datasets. We reviewed all active contracts and open data sources for publication permissions, restrictions or limitations, post-termination rights, and specific data citation guidelines. Information was then compiled and made available on a private intranet site to avoid violating non-disclosure agreements. Citation information was translated into LaTeX markup in Modified Chicago Style when specific citation requirements were not indicated by the Licensor. This poster will explain that process and provide a template for data citations and their LaTeX markup.
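As a rough illustration of the kind of template involved, a dataset citation can be carried as a BibTeX entry that economists paste into their LaTeX sources. The entry below is a hypothetical sketch, not one of the Bank's actual templates; every field value is a placeholder.

```latex
% Hypothetical dataset citation entry (all values are placeholders).
% Standard BibTeX has no @dataset type, so @misc is a common fallback.
@misc{examplevendor2017,
  author       = {{Example Data Vendor}},
  title        = {Example Economic Indicators Database},
  year         = {2017},
  howpublished = {Proprietary dataset, accessed via institutional license},
  note         = {Accessed 2017-05-24}
}
```

In the document, `\cite{examplevendor2017}` then renders the citation in whatever bibliography style is in use.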
Data Data Data! But Little to Work With
Adetoun Oyelude (University of Ibadan, Ibadan)
The poster presents a model of data (in)accessibility in institutions where lots of data is produced and (not) stored. It explains the difficulty of accessing available data, due to factors such as lack of expertise on the part of both the data professionals in charge and the data users. A solution to the "data glut" is proffered through proper planning and management of data in institutions that generate or gather data, as well as adequate capacity building for the staff who handle data and for the users of the data that has been generated. A case study of adequate data management and training is shown pictorially in the poster.
Developing Research Data Life Cycle Strategy: A Collaborative Approach at Fed Chicago
Deng Pan (Federal Reserve Bank of Chicago)
Research data are critical for researchers at the Federal Reserve System conducting empirical analysis for monetary policy related work and long-term projects. However, the management of research data at Fed Chicago had largely been handled on an ad hoc basis, lacking a systematic and consistent approach to planning, acquiring, processing, publishing, storing, and preserving the data. In 2016, a Research Data Life Cycle Strategy (RDLCS) was developed collaboratively by data librarians, IT staff, and researchers at Fed Chicago. This poster will diagram the six stages required for successful management of research data tailored to the Fed environment, and highlight the key steps undertaken by all stakeholders to address existing issues and challenges and to optimize users' experience with research data.
As the use of big data in social research continues to grow, challenges are emerging around the sharing of these forms of research data. Data may be shared in response to funder mandates, journal requirements, or researchers’ preferences. Sharing big data may pose legal and ethical challenges for three reasons: 1) the data are not created by researchers; 2) the data are not created for research purposes; and 3) the data are not created in discrete bundles. When data are produced outside traditional research frameworks, the conventional protections (e.g., informed consent and anonymisation) may not be feasible or possible. Moreover, data that were previously collected for specific research purposes are now increasingly capable of being linked with other data sources, potentially increasing disclosure risks. This poster will consider diverse genres of big data (e.g., social media, geo-spatial, and administrative). The key challenges of data sharing will be shown, with practical tools – such as checklists and flowcharts – to guide researchers through the steps of sharing big data.
Colleen Fallaw (University Library, University of Illinois at Urbana-Champaign)
To participate in centralized data search and access, the Illinois Data Bank (the research data repository for the University of Illinois at Urbana-Champaign) contributes and maintains records for datasets in the DataCite Metadata Store, using the EZID API through Purdue University. The organizational and technical hand-offs through the various layers can be complex to navigate. On the way from research data producers to consumers, Illinois Data Bank metadata is formatted, stored, subsetted, reformatted, and passed through several organizations: University Library → EZID at Purdue → DataCite → International DOI Foundation. If a researcher needs to correct metadata or adjust a publication delay, or a curator needs to suppress a dataset while reviewing concerns, the Illinois Data Bank propagates the adjustments along the chain. Supporting researcher self-deposit, along with a slate of curator controls, exploits the breadth of the API and requires an understanding of the connections among components. The goal of this poster is to present these intricacies, along with some of the technical strategies the Illinois Data Bank uses with the EZID API to implement its features.
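To make the hand-off concrete, here is a minimal sketch of what a metadata update through EZID's HTTP API can look like. The `/id/` endpoint and ANVL text format follow EZID's public API documentation, but the escaping is simplified, and the DOI, credentials, and helper names are placeholders for illustration, not the Illinois Data Bank's actual code.

```python
import base64
from urllib import request

EZID_BASE = "https://ezid.cdlib.org"  # public EZID endpoint

def to_anvl(metadata):
    """Encode a metadata dict as ANVL text (simplified: escapes only
    percent signs and line terminators in values)."""
    def esc(value):
        return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return "\n".join(f"{key}: {esc(value)}" for key, value in metadata.items())

def update_identifier(doi, metadata, user, password):
    """POST updated ANVL metadata for an existing identifier.
    Requires network access and real EZID credentials."""
    req = request.Request(
        f"{EZID_BASE}/id/{doi}",
        data=to_anvl(metadata).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    # Build (but do not send) an update request body, e.g. to suppress a dataset.
    print(to_anvl({"datacite.title": "Example Dataset", "_status": "unavailable"}))
```

A curator-initiated suppression or a researcher's metadata correction would each reduce to one such call, which EZID then forwards up the chain to DataCite.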
Promoting Data Usage in SSJDA: Introducing Our Secondary Analysis Workshops
Izumi Mori (The University of Tokyo)
Natsuho Tomabechi (The University of Tokyo)
Satoshi Miwa (The University of Tokyo)
Social Science Japan Data Archive (SSJDA) has released microdata since 1998. While we initially held no more than 200 datasets being used by a maximum of 10 users per year, we currently hold over 1900 deposited datasets, with a data usage count of approximately 2900 per year. One of our major initiatives in promoting such data use includes Secondary Analysis Workshops, which are held to encourage researchers and graduate students in social sciences to make the best use of the survey data kept in our archive. We seek participants and research themes from all over Japan and analyze the target datasets every year. Researchers from depository institutions who are knowledgeable about the data serve as advisors for the workshops. SSJDA staff also support the workshops as they provide expertise in social research and quantitative data analysis. Through these efforts, participants are able to work together to pursue their own research agenda, as they receive advice on the characteristics of the data as well as on choosing methodologies. The number of participants and research themes for the workshops have been increasing each year, suggesting that such initiatives are highly regarded by Japanese researchers and graduate students in social sciences.
Store It in a Cool Dry Place - Processing and Long-term Preservation of Research Data
Tuomas J. Alaterä (Finnish Social Science Data Archive)
One crucial component of making research data accessible and reusable is preserving it properly. We know that the recipe is heavy on metadata. But there are other ingredients too, and even the ripest data need to be carefully prepared for preservation. Furthermore, without the right tools and a cool, dry place for storage, the mission is in jeopardy from the beginning. This poster highlights the recent progress of the Finnish Social Science Data Archive. We run an institutional data repository, but have been systematically preparing our collection for sustainable long-term preservation by taking advantage of an emerging national long-term preservation solution. There are four major, partly parallel, areas of development: 1) choosing sustainable file formats, migrating existing content, and updating data processing policies and software accordingly; 2) producing software for harvesting rich metadata and wrapping it with the data and contextual files into a METS container for transfer; 3) influencing the adoption of national standards and services; and 4) training staff and handling administrative tasks. Better data management, increased trustworthiness, and automated processes should allow us to allocate more human resources to other critical software development and data services. Our effort focuses on traditional social science data: data matrices, code, and textual materials. However, the principles adopted can be taken up by other disciplines as well, given that the formats are the same. The work has been carried out in collaboration with the National Digital Library Initiative.
Switching from Field Work Using ODK Powered Electronic Data Collection to Data Documentation in DDI: A Junior Data Documentation Officer’s Initial Impressions of DDI Codebook, Malawi Epidemiology Interventions and Research Unit (MEIRU)
Themba masangulusko Chirwa (Malawi Epidemiology Interventions and Research Unit (MEIRU)/ KPS)
Chifundo Kanjala (Malawi Epidemiology Interventions and Research Unit (MEIRU)/ KPS)
Dominic Nzundah (Malawi Epidemiology Interventions and Research Unit (MEIRU)/ KPS)
In this paper, I give my perspective on how our organisation started using metadata standards to support data management and data sharing. MEIRU runs a health research programme encompassing a rural site in northern Malawi and an urban site in Lilongwe, the capital city of Malawi. It has a rich collection of longitudinal data dating back to as early as 1979. Work is now underway to convert the vast unstructured documentation into DDI Codebook format for data sharing with researchers outside the project. I relate my educational background and prior work experience to my current work as a metadata officer. I identify parallels and differences between my current and former jobs and highlight areas where training and closer supervision are required to strengthen my capacity. Finally, I attempt to identify opportunities for capturing metadata during the field work phase to reduce confusion down the line when the data are being prepared for sharing. The perspectives shared here could be of use to researchers working on projects similar to MEIRU, and also to DDI developers, who will see how we are implementing the specification in our setting. I hold a Malawi School Certificate of Education (MSCE) and a Certificate in Computing and Information Systems (CIS).
Using Backward Design to Create Research Data Management Professional Development for Information Professionals
Abigail Goben (University of Illinois-Chicago)
Megan Sapp Nelson (Purdue University)
This poster details the design process used to develop the Association of College and Research Libraries “Building Your Research Data Management Toolkit: Integrating RDM into Your Liaison Work” road show. It starts with the development of learning objectives and highlights the multiple assessments offered prior to the road show experience, during the road show itself, and as follow-up at the one-month and six-month post-show marks. The poster then shows the links between the learning objectives, the assessments, and the learning activities developed to help learners meet the learning objectives.
Using Data to Make Sense of Data: The Case of Video Records of Practice in Education
Allison Tyler (University of Michigan)
Researchers and teacher educators use video records of practice documenting classroom activity to study and improve teaching across grade levels and subject areas. Their use of video records of practice is often accompanied by supplemental data, such as school/classroom demographics, seating charts, lesson plans, and interviews, to achieve research or teaching aims. Educational researchers use these data as case studies, to test research questions and frameworks, and to develop research protocols. Teacher educators use the data as teaching exemplars, to allow pre-service teachers to practice and evaluate teaching methods, and to reflect upon pedagogy. This poster will evaluate patterns in the use of supplemental data by researchers, teacher educators, and others who use video records of practice for research and teaching, depending on the purpose of that data reuse. The results of this analysis will provide a baseline for how and what supplemental data will meet the research and/or teaching needs of schools of education. The findings also have implications for repositories’ data collection strategies and how best to make video records of practice available to these designated communities.
Curation, Collaboration, and Coding—The Secret Sauce for Scholarship Support
Megan Potterbusch (Association of Research Libraries)
Cynthia Hudson-Vitale (Washington University in St. Louis Libraries)
This half-day workshop is an overview and hands-on introduction to the Open Science Framework and the SHARE data set, two tools that form a powerful combination for supporting scholarship and research locally as well as improving scientific integrity and allowing for new forms of meta-research. Developed by the Center for Open Science, the Open Science Framework (OSF; http://osf.io) is a free, open source tool that works within the research workflow to allow for better management, curation, streamlining, and sharing of scholarly outputs. SHARE builds its free, open data set (https://share.osf.io/) by gathering, cleaning, linking, and enhancing metadata that describe research activities and outputs – from data management plans and grant proposals to research data and code, to preprints, presentations, and journal articles. In this workshop, participants will learn to use the OSF to develop embedded data stewardship and research management services for faculty. Attendees will also learn how to leverage and enhance SHARE data to improve their institutions’ understanding of the whole scholarship ecosystem on their campuses. The workshop will be divided into two parts. First, attendees will learn strategies for providing curation and research services within the faculty workflow by operating in the OSF. Practical approaches to faculty collaboration and curation assistance throughout the research life cycle will be discussed. The second part will focus on harnessing the power of the SHARE data set to discover and act upon the research outputs of an institution or organization. This hands-on portion of the workshop will use IPython/Jupyter Notebooks to access the SHARE API, search across 129+ different providers, and export and clean the metadata. Participants are encouraged to bring laptops in order to follow along. No previous programming experience is necessary.
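For a sense of what the hands-on portion involves, the sketch below builds a simple search query of the kind a notebook might send to the SHARE API. The endpoint URL and the field name in the query are assumptions for illustration (the API's exact shape may differ from what is shown); only the query-building helper is exercised without a network connection.

```python
import json
from urllib import request

# Assumed search endpoint; check the current SHARE API documentation.
SHARE_SEARCH_URL = "https://share.osf.io/api/v2/search/creativeworks/_search"

def build_share_query(institution, size=10):
    """Build an Elasticsearch-style query for works whose contributors
    match an institution name ("contributors.name" is an assumed field)."""
    return {
        "query": {"match": {"contributors.name": institution}},
        "size": size,
    }

def search_share(institution, size=10):
    """POST the query and return the decoded JSON response
    (requires network access to the SHARE service)."""
    body = json.dumps(build_share_query(institution, size)).encode("utf-8")
    req = request.Request(
        SHARE_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Show the query body a notebook would send, without sending it.
    print(json.dumps(build_share_query("Washington University"), indent=2))
```

From a notebook, the returned JSON can then be flattened into a table for the export-and-clean step the workshop describes.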
5 Minute Metadata: Informative Videos to Meet the Metadata Novice in the Middle
Lauren Eickhorst (Aristotle Metadata Registry)
Samuel Spencer (Aristotle Metadata Registry)
When searching online for information about metadata – what it is and why it’s useful – it is hard for people new to the concept to find accurate information. Video results on metadata are often heavily technical or business-oriented in nature, such as narrated PowerPoint slides, or heavy with text. Occasionally videos contain factual errors, confusing descriptive and structural metadata concepts, or present too much information too quickly for people to absorb. The “5 Minute Metadata” videos are a new take on how to introduce metadata to people in a non-confrontational way, whether they are seasoned data experts or completely new to the concept. These videos are a way to improve metadata literacy by meeting people halfway: they might have heard about metadata but be unsure of what it is, or of the differences between descriptive and structural metadata. The videos introduce metadata in a fun and light-hearted way, help convey information about it, and expand communication surrounding it.
Archive of Data on Disability to Enable Policy and Research: Creating a Common Resource for Disability and Rehabilitation Stakeholders
Jai Holt (ICPSR)
Alison Stroud (ICPSR)
The Archive of Data on Disability to Enable Policy and Research (ADDEP) is a new ICPSR initiative to build a central repository for quantitative and qualitative data about disability that have been dispersed across disciplines. The mission of ADDEP is to improve and enable further research on disability for researchers, policymakers, and practitioners by acquiring, enhancing, preserving, and sharing data. This poster will display ADDEP’s newly launched website and available resources. The poster also describes ways to discover data available for download from ADDEP and how the data can be used to better understand and inform the implementation of major disability-related policies such as the Americans with Disabilities Act. It will highlight how user-friendly data exploration tools and other resources on the ADDEP website help break down barriers to research within the cross-disciplinary disability and rehabilitation research community.
The Curating for Reproducibility (CURE) Consortium
Thu-Mai Christian (Odum Institute, University of North Carolina at Chapel Hill)
Florio Arguillas (Cornell Institute for Social and Economic Research, Cornell University)
Sophia Lafferty-Hess (Odum Institute, University of North Carolina at Chapel Hill)
Limor Peer (Institution for Social and Policy Studies, Yale University)
In July 2016, the Institution for Social and Policy Studies (ISPS) at Yale University, the Cornell Institute for Social and Economic Research (CISER), and the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill formed the Curating for Reproducibility (CURE) Consortium. These academic institutions all maintain data archives and have implemented workflows that put into practice data quality review, a framework that includes research data curation and code review. This framework helps ensure that research data are well documented and usable and that code executes properly and reproduces analytic results. The proposed poster will outline the goals of the consortium and provide examples of how these institutions have integrated data quality review into workflows, tools, and protocols.
The State of Data Curation in ARL Libraries
Cynthia Hudson-Vitale (Washington University in St Louis)
Lisa Johnston (University of Minnesota)
Wendy Kozlowski (Cornell University)
Heidi Imker (University of Illinois, Urbana-Champaign)
Jacob Carlson (University of Michigan)
Robert Olendorf (Pennsylvania State University)
Claire Stewart (University of Minnesota)
The Data Curation Network surveyed members of the Association of Research Libraries (ARL) on their Data Curation Activities and Infrastructure as part of the ARL SPEC Kit program in January 2017. The openly accessible results of the study (link forthcoming) demonstrate the current state of data curation services in ARL institutions by addressing the current policy and technical infrastructure at ARL member institutions for data curation, treatment activities, the current level of demand for data curation services, and how often specialized curatorial actions are taken. This poster dives deeper into the qualitative responses and analyzes the trends and challenges that institutions currently face when providing data curation services. As libraries seek to define their mission and service levels in support of data curation activities, understanding the challenges that other institutions face in supporting this effort will be essential. Finally, the poster will describe how the current partner institutions of the Data Curation Network will use the results of this survey to gain a more extensive understanding of the curation ecosystem beyond ARL institutions.
Documentation in the Middle: Active Phase Project Documentation for Inclusive and Effective Team-Based Research
Hailey Mooney (University of Michigan Library)
Jacob Carlson (University of Michigan Library)
Karen Downing (University of Michigan Library)
Lori Tschirhart (University of Michigan Library)
Documentation is an essential component of good data management and yet data service providers often struggle to provide effective support to researchers. There are materials available for creating or assisting researchers with documentation at the beginning and end of a project; from data management plans to documenting data for archival purposes. However, we don’t yet have a solid understanding of how research teams incorporate (or not) documentation into their everyday work. This poster reports on a project to investigate, analyze, and synthesize real and ideal documentation practices within research teams in order to develop a universal project manual documentation template. It is our contention that a “lab manual” or “project organization protocol” will enhance the effectiveness and efficiency of research teams, while creating an inclusive environment by making local practices and expectations clear to all team members regardless of previous research experience and disciplinary background. The goal of this project is to identify the basic considerations that any researcher from any discipline should consider for their local documentation in support of team-based research projects.
Finding a Data Sharing Solution: Connecting Journals to Harvard's Dataverse
Sonia Barbosa (Harvard Dataverse Repository)
The Harvard Dataverse Repository offers journals several workflow options to enhance their data sharing and preservation experience: 1) journals can create a customized dataverse that allows use of the journal publishing workflow; 2) journals can pair option 1 with reproducibility verification provided by the Odum Archives; 3) journal systems can make use of our integration API, currently used by OJS and OSF, for seamless data deposits; and 4) journals can recommend that authors deposit data into Harvard's repository. Journal-specific features include a private URL for dataset review and, coming soon, data file widgets that can be included within the published journal article.
Let’s Meet in the Middle: Facilitating Access to Administrative Data in the UK
Rowan Lawrance (UK Data Archive/ADRN)
Sabrina Iavarone (UK Data Archive/ADRN)
The Administrative Data Research Network (ADRN) facilitates access to de-identified administrative data for researchers. Operating under a complex and dynamic data sharing legal framework in the UK, the Network is a partnership of UK universities, government departments, national statistical authorities, funders, and research centres, and it aims to deliver a service enabling secure and lawful researcher access to de-identified linked administrative data. As one of the ‘front doors’ to the ADRN, the Administrative Data Service liaises with data owners, researchers, and experts in data linkage and data governance to facilitate access to administrative data, providing guidance on processes and an infrastructure that addresses some of the concerns about information governance and data security through dedicated ‘secure environments’ as points of access. Quite often, we find ourselves in the ‘middle’ of these discussions, as we negotiate access, translate requirements, and repurpose documentation to ensure a project resonates with a variety of agendas and priorities. The poster will provide an overview of recent work in the area and how we have dealt with challenges so far. We will summarise work done in trying to streamline application processes for different data providers in different data domains in the UK (e.g. education, health, crime, benefits and labour market). We will talk about how ADRN has been working alongside government departments to design and implement streamlined approaches to administrative data access in the UK, and how we are supporting researchers applying to access administrative data for their research in the areas of ethics, consent, legal pathways to access, methodology, and data availability. Ultimately, it is not just about data meeting in the middle; it is primarily about people.
New Approaches to Facilitate Responsible Access to Sensitive Urban Data
Andrew Gordon (Center for Urban Science and Progress, NYU)
Rebecca Rosen (Center for Urban Science and Progress, NYU)
Daniel Castellani (Center for Urban Science and Progress, NYU)
Daniela Hochfellner (Center for Urban Science and Progress, NYU)
Julia Lane (Center for Urban Science and Progress, NYU)
Improving government programs requires analysis of government administrative data. Providing access to these data to academic and public sector researchers is an important first step to robust analysis. At the same time, these data contain Personally Identifiable Information and so great care must be taken in obtaining, storing, and providing access to these data. The Data Facility at NYU’s Center for Urban Science and Progress (CUSP) is building on a long history of research on how to facilitate data curation, ingestion, storage, and controlled access in a safe and trustworthy environment. The poster describes how CUSP combines computer science, information science, and social science approaches which include (i) building a data model that accommodates sharing research data across disciplines, (ii) employing data curation and ingestion services so that data providers can confidently share their data with authorized researchers, (iii) converting data restrictions into concise, easy to understand, and searchable metadata to help researchers find appropriate data for their research, and (iv) capturing activity around datasets as contextual metadata so researchers can discover new data to complement their analyses.
Research Data Management and Academic Institutions: A Scoping Review
Leanne Trimble (University of Toronto)
Dylanne Dearborn (University of Toronto)
Ana Patricia Ayala (University of Toronto)
Erik Blondel (University of Toronto)
Tim Kenny (University of North Texas Health Science Center)
David Lightfoot (St. Michael's Hospital)
Heather MacDonald (Carleton University)
This poster will describe the results of a scoping review undertaken at the University of Toronto, Carleton University, and the University of North Texas Health Science Center in 2016-17. The purpose of the study is to describe the volume, topics, and methodological nature of the existing literature on research data management in academic institutions. The specific objectives of the scoping review are: 1) to complete a systematic search of the literature to identify studies on research data management across all disciplines in academic institutions; 2) to identify what research questions and topic areas have been studied in research data management related to academic institutions; and 3) to document what research designs have been used to study these topics. The poster will outline the analysis of the identified literature and describe the results obtained from the scoping review. Note: the 8th author of this poster is Mindy Thuna from the University of Toronto.
Social Science Data Archive Business Models: A Historical Analysis of Change over Time
Kristin Eschenfelder (University of Wisconsin-Madison)
Kalpana Shankar (University College Dublin)
Allison Langham (University of Wisconsin-Madison)
Rachel Williams (University of Wisconsin-Madison)
The sustainability of data archives is of growing concern, and recent reports have raised questions about possible alternative business models for data archives. This study will provide a clearer understanding of how and why data archives made changes in business models from the 1970s to the early 2000s in response to evolving conditions. Business models encompass financial structures such as revenue streams and costs, but also relationships (contractual, partnerships, etc.), mission decisions about whom to serve, and collections decisions about what to maintain. This poster is part of a larger project about how social science data archives have adapted over long periods of time and to a variety of challenges. We will include data on changes in business models at four prominent and long-lived social science data archives: ICPSR at the University of Michigan; the UKDA, part of the UK Data Service at the University of Essex; the LIS Cross National Data Center in Luxembourg; and EDINA at the University of Edinburgh. Our data include historical institutional documents and interviews with current and past staff.
Introducing general audiences to their first hands-on data work often faces formidable barriers. New users typically must spend their time installing, configuring, and learning the programming conventions of specific software environments that may themselves present barriers of cost and compatibility. Importing and wrangling data into a form suitable for use is another barrier. As data professionals, we can apply our skills to develop relatively painless introductions to data that focus on understanding the data itself and analytical concepts, instead of the mechanics of a program. We can customize and tailor our presentations to the needs of particular audiences by developing wrappers around data and functions that simplify their use, and we can develop techniques and interfaces that allow easy data exploration. Using R, this workshop will explore 1) building packages for distributing data and functions; 2) using sample data and functions to illustrate basic data literacy concepts such as descriptive statistics, modeling, and visualization, while keeping the focus on meaning, not mechanics; and 3) building tools for interactive exploratory data analysis by end users. As open source software, R is easily available and can be locally distributed where internet access and computing resources are scarce. Note that workshop attendees will need to provide their own laptop. The workshop leaders will contact attendees with instructions for downloading software (R, version 3.3 or later, from https://cloud.r-project.org, and RStudio, version 1.0 or later, from https://www.rstudio.com/products/rstudio/download/#download) prior to the workshop date, and attendees are also welcome to arrive 15 minutes early for help with software installation.
Dueling CAQDAS – Using Atlas.ti and NVivo for Qualitative Data Analysis
Mandy Swygart-Hobaugh (Georgia State University)
Florio Arguillas (Cornell Institute for Social and Economic Research (CISER))
Many social scientists like to “get their hands dirty” by delving into deep analysis of qualitative data – be it discourse analysis, in-depth interviews, ethnographic observations, visual and textual media analysis, etc. Manually coding these data sources can become cumbersome and cluttered – and may even hinder drawing out the rich content in the data. Consequently, qualitative researchers are increasingly turning to computer-assisted qualitative data analysis software (CAQDAS) to facilitate their analyses. Through hands-on work with provided data, participants will explore ways to organize, analyze, and present qualitative research using both NVivo and Atlas.ti. The workshop will cover the following topics:
• Coding of text and multimedia sources
• Using Queries to explore and code data
• Organizing and classifying sources to facilitate comparative analyses across data characteristics (e.g. demographics)
• Data visualizations and reports
Note that workshop attendees will need to provide their own laptop running Windows or a Windows virtual desktop (for Macs). The workshop leaders will contact attendees with instructions for downloading free trial versions of Atlas.ti and NVivo for installation prior to the workshop date. This workshop is sponsored by the Qualitative Social Science and Humanities Data Interest Group (QSSHDIG).
International Activities in Research Data Management Education: Tools and Approaches
Helen Tibbo (School of Information and Library Science, UNC-Chapel Hill)
Nancy McGovern (MIT)
Thu Mai Christian (Odum Institute, UNC-CH)
Jacob Carlson (University of Michigan)
Merce Crosas (Harvard University)
Robin Rice (University of Edinburgh)
This workshop will present brief overviews of key international RDM education efforts with a synthesizing overview of progress in this area. Tibbo and Christian will report on “Research Data Management and Sharing,” the MOOC (Massively Open Online Course; https://www.coursera.org/learn/data-management) produced by the CRADLE project (cradle.web.unc.edu) and the University of Edinburgh’s MANTRA (datalib.edina.ac.uk/mantra) program. The MOOC is relevant to librarians, archivists, and other information professionals tasked with research data management and preservation, as well as to researchers themselves. Rice will provide an update on MANTRA and RDM efforts at the University of Edinburgh, reflect on her experience with the Coursera MOOC, and discuss how this tool might be enhanced for librarians and especially researchers. McGovern will discuss her work with the Digital Preservation Management Workshop series, for which she has been a driving force for over a decade, and discuss lessons taken from digital preservation for RDM activities and training efforts. Crosas will discuss RDM work at Harvard University, and Carlson will talk about how Data Curation Profiles can help with data management education. These presentations will provide the audience with a starting point for breakout session topics that may include but are not limited to:
• How do you handle data training at your institution?
• What are your professional needs in RDM (education for librarians/archivists)?
• What lessons have you learned from working with researchers on their RDM needs?
Introduction to mapping with QGIS
Megan Gall (Lawyers' Committee for Civil Rights Under Law)
We’re going to make some maps. Historically, there were substantial barriers to incorporating geographic information systems (GIS) into the social sciences. Originally used in the physical sciences, GIS is now well entrenched as a useful suite of analytic tools for all branches of the social sciences, and the list of relevant applications grows continually. Additionally, new and open source software removes many of the monetary barriers. This workshop will delve into QGIS, a powerful and free desktop GIS. We’ll cover topics designed to get new users acclimated to the technology and mapping on their own. We will cover basic and intermediate GIS topics. Basic topics include general mapping concepts, data requirements, useful GIS data repositories, and how to load those data into QGIS. Intermediate topics will cover types of GIS data visualizations, data manipulation techniques, and basic analyses. This is a hands-on session that will introduce participants to the fun and ease of map making. Participants will leave with practical skills, free resources, and a well-developed understanding of GIS principles. Note that workshop attendees will need to provide their own laptop. The workshop leaders will contact attendees with instructions for downloading software prior to the workshop date, and attendees are also welcome to arrive 15 minutes early for help with software installation.
Preparing Qualitative Data For Sharing and Re-Use
Louise Corti (UK Data Archive, University of Essex)
Libby Bishop (UK Data Archive, University of Essex)
Sebastian Karcher (Qualitative Data Repository, Syracuse University)
This workshop is for researchers interested or actively engaged in the creation and management of qualitative research data, and looks at the steps required to prepare data for sharing and reuse. We will cover existing best practices and tools, looking at data preparation, ensuring that non-proprietary formats are used and that raw data are documented to capture as much context as possible. We pay attention to the design of consent forms, and to methods of anonymisation and controlling access, highlighting strategies that researchers can use to share as much research information as possible, ethically and legally. Finally, we show examples drawn from the UK Data Service and the Qualitative Data Repository of how data can be published and the levels of access control required, and look at the impact of sharing data as a valued research output and, of course, a great long-lasting asset! We track examples of successfully archived qualitative data as they make their way through the data assessment, review, processing, curation, and publishing pipeline.
You Can Too! Running a Successful Data Bootcamp for Novices
Ryan Clement (Middlebury College)
Successful outreach on topics such as working with and managing research data can be challenging when faced with novice users. Participants in this workshop will learn about v 1.0 (2015) and 2.0 (2016) of a multi-day Data Bootcamp for novice users in the humanities and humanistic social sciences that was held at Middlebury College. This workshop covered topics such as managing, cleaning, and documenting data, as well as data visualization, mapping, and working with textual data. In addition to discussion about what worked for Middlebury, participants will work together to determine audience needs, learning objectives, and tools. Potential workshop plans will focus on active learning methods and free and/or open-source tools and data to increase accessibility. Participants will also be able to access and share workshop materials from an Open Science Framework project.
2017-05-25: D1: Strategies for Collaboration Across the Research Ecosystem
The staff's knowledge sharing in the Management and Planning Organization of Qazvin province
Shima Moradi (National Research Institute for Science Policy)
Zarrin Zare Poorkazemi (Islamic Azad University Central Tehran Branch)
This study aimed to investigate the staff’s knowledge sharing in the Management and Planning Organization of Qazvin province, determining the components of knowledge sharing as well as the relationship between this variable and organizational posts among staff. A survey method with a descriptive-analytical approach was conducted using a self-made questionnaire with Likert scales. The components examined included motivation, beliefs, skills, information, technology, time, enjoyment, importance, and fear. The organization had 78 staff, of whom 65 were selected as the final research sample. Data analysis was performed using SPSS 22. The study revealed that some components were more important than others, and that there was a significant relationship between “organizational position” and knowledge sharing.
Academic liaison librarians in the middle of research data management on campus
Patrick Griffis (University of Nevada, Las Vegas)
Michael Luesebrink (University of Nevada, Las Vegas)
Cinthya Ippoliti (Oklahoma State University)
Hui-Fen Chang (Oklahoma State University)
Helen Clements (Oklahoma State University)
Pat Hawthorne (University of Nevada, Las Vegas)
Academic libraries exist at the center of research on campus, and academic liaison librarians have begun providing services to assist researchers in managing their data. Managing research data is currently a strategic initiative of the Greater Western Library Alliance (GWLA). This moderated panel will be composed of liaison librarian administrators and liaison librarians from two comparable GWLA libraries, who will provide case studies regarding their respective roles in delivering research data management services to researchers on campus. They will describe the current state of their back-end data services infrastructure while highlighting best practices for providing front-end research data management services to their campus community. Specifically, the panelists will outline front-end research data management services such as workshops, online guides and tutorials, and research consultations and referrals. The panel discussion will provide time at the end of the session for questions and answers with the audience.
Clowns to the left of me, data to the right; stuck in the middle with you: Seeking middle ground for data instruction to non-specialists
Terrence Bennett (The College of New Jersey)
Shawn Nicholson (Michigan State University)
News stories remind us of the exponential growth of collected data—often with a corollary lament that our ability to make sense of that information isn't keeping up. The increased focus on research data services within academic libraries and research centers is an acknowledgment that learners need to gain better data management and manipulation skills in order to succeed beyond the academy. However, this understanding can be diminished by the continued marginalization and isolation of data from the larger realm of information. In reaction to these contradictory messages, this presentation focuses on refinements to library instruction that promote the perception of data as an integral component of information-seeking, rather than perpetuating the message that data represent a specialized domain of knowledge. By purposefully infusing data resources into library instruction, students are better equipped to advance critical thinking skills, and to synthesize and apply information within and across disciplines. Inspired by the conference theme—and with a particular emphasis on connecting with learners who are not data specialists—this presentation will illustrate how pop culture references, humor, and low-tech instruction techniques can be employed to find the middle ground that will result in an engaging and stimulating instruction session.
Knowledge management: Introduction and application for the social sciences and beyond
Spencer Acadia (University of Kentucky)
Frank Cervone (University of Illinois at Chicago)
This presentation will expose attendees to the theory and practice of knowledge management (KM). Though KM has been around for a while in the business management and technology sectors, it has been slow to gain traction in other disciplines, including social and information sciences. This presentation will introduce the concept of KM and provide several example case studies of application in a variety of data-driven settings within the purview of social and information sciences, and will present a framework for KM appropriate for dealing with social sciences data issues. The presentation will approach KM in a global context that is appropriate for and relevant to a wide-range of environments within the social and information sciences. The intended audience for this session is librarians, archivists, researchers, managers, educators, and other professionals who deal with and/or would like to understand more about social sciences data management through a KM perspective. The presentation assumes little to no prior knowledge of KM and, therefore, is widely accessible for all interested attendees.
2017-05-25: D2: Instructional Tools
Picturing data within the ACRL Framework for Information Literacy
Cameron Tuai (Drake University)
The controversy surrounding the new ACRL Framework for Information Literacy suggests that significant change is afoot within the field of library instruction. The critical reconception of information literacy has led to much gnashing of teeth as instruction librarians adapt current instructional practices to the social justice ideals of the Framework. This presentation will explore the advantages of critical data literacy as a means of realizing these ideals in terms of the Framework’s “belief” in information literacy as an “educational reform movement”. Applying the information literacy frame Authority Is Constructed and Contextual, we will first demonstrate how critical data literacy supports user recognition of privilege within the process of data collaboration, production, and sharing. From this example, we will then explore the Framework’s conceptual foundation in metaliteracy and threshold theories in order to explain data literacy’s capacity to support broader critical self-reflection. Lastly, we will summarize the presentation into a legitimacy-based model of data literacy as an educational reform movement. The goal of this presentation is to provide both practical guidance to data literacy instructors and the conceptual grounding necessary for customizing these practices to the local context.
Teaching big data skills in the social sciences
Sarah King-Hele (UK Data Service, University of Manchester)
The UK Data Service is a resource funded to support researchers, students, lecturers, and policymakers who depend on high-quality social and economic data. Over the last year, the service has been running a range of workshops and webinars related to big data to upskill social scientists so they are better able to make use of new and novel forms of data to study societies and people. Our courses have concentrated on elements of the Hadoop ecosystem; basic computing skills such as programming; collecting data from the internet; using databases to store and query data; and ethics in big data research. We have also run courses and a summer school in collaboration with other academic organisations. This presentation will discuss our experiences running big data training and some of the key lessons learned.
The Software/Data Carpentry Movements: How crowdsourced lessons, research-based pedagogy and peer learning are ameliorating deficits in data literacy and software development skills in academia
Tim Dennis (UCSD)
Juliane Schneider (Harvard Catalyst, Clinical and Translational Science Center)
Recognizing that data and computing have become a "central currency" and an "integral part" of science, respectively, but that most early career scientists come ill-prepared to work with data or to build, use, and share software, Software Carpentry and Data Carpentry were created to provide a volunteer network of instructors and collaboratively authored open lessons that teach participants the basics of software and data skills. In this paper, I'll discuss the basics of what makes up a Software and Data Carpentry workshop, including the learning objectives and overall goals behind each carpentry. I'll cover how pedagogical techniques, such as pair programming, collaborative note-taking, live coding, and sticky-notes for signaling, are employed in each workshop. I'll also discuss how course materials and lessons are collaboratively authored and maintained by volunteers worldwide in GitHub. Finally, I'll provide advice on how an academic library or archive can use the Software and Data Carpentry workshops and lessons to provide data instruction to their clientele and build a community of instructors in their organization.
Understanding data literacy requirements for assignments: A business school syllabus study
Meryl Brodsky (Eastern Michigan University)
Syllabus studies have been used to inform librarians’ work in collection development, instruction, and information literacy. Syllabi also provide an opportunity to understand course requirements for data literacy. In this study, syllabi from Eastern Michigan University’s College of Business were analyzed to determine which courses require data literacy for the completion of assignments or projects. The author tested several hypotheses:
1. Data use in online and hybrid class assignments is greater than for in-person class assignments
2. Graduate students have greater data requirements than undergraduate students
3. Different business school disciplines have different data needs (i.e., marketing has more, accounting has less)
Analyzing syllabi and assignments can reveal both stated and implied data literacy competencies. Surfacing these competencies and making them explicit gives the librarian and the teaching faculty the opportunity to co-design relevant teaching and learning activities. Since data literacy instruction is a new initiative at the Eastern Michigan University Library, the author also used this study to bring attention to this capability.
2017-05-25: D3: National Infrastructure Initiatives
Challenges of providing outreach services to data users in Uganda: A case of the Uganda Bureau of Statistics
Winny Nekesa Akullo (Public Procurement and Disposal of Public Assets Authority)
Godfrey Geoffrey Nabongo (Uganda Bureau of Statistics)
Patrick Odong (Uganda Christian University)
Outreach services are one way to enhance access to health statistical information. Better mobilization of urban health workers to serve remote or underserved areas has been recommended as a strategy to improve access to health information for populations in remote and rural areas (WHO, 2012). The goal of Statistics Canada's outreach activities is to generate interest in and add value to its products and services. This has been achieved by publicizing official statistics, not only to increase public awareness, understanding, and use of data, but also to generate interest and encourage greater numbers of businesses and individuals to answer the agency's surveys (Statistics Canada, 2014). This paper examines the challenges the Uganda Bureau of Statistics (UBOS) faces in providing outreach services to data users in Uganda. The objectives of the study were to examine the outreach services provided by UBOS, the challenges it faces in providing those services, and proposals for enhancing outreach services to data users. A total of 10 UBOS respondents charged with providing outreach services took part; an online questionnaire and interviews were used to collect data from UBOS staff. The study found that UBOS provides a number of outreach services to its data users, including exhibitions, a school outreach programme, and training; however, it faces the challenges of inadequate funding to finance these initiatives and to translate information into local languages. The study therefore proposes that UBOS prioritize and allocate funding for outreach services so it can fully achieve its mandate.
Portaging the landscape: Developing and delivering a national RDM training infrastructure in Canada
Carol Perry (University of Guelph)
Jane Fry (Carleton University)
James Doiron (University of Alberta)
Launched in 2014, the Canadian Association of Research Libraries’ (CARL) Portage Network has a mandate to develop a national research data culture and infrastructure. Under the guidance of its inaugural Director, Chuck Humphrey, the broad goals within this mandate are two-fold: 1) to develop a robust pan-Canadian network of Research Data Management (RDM) expertise; and 2) to connect the essential infrastructure and service components needed for RDM, including a national preservation and discovery network. There are currently six active Portage Expert Groups, including one focused upon the development of RDM training in Canada. This presentation will focus upon three key deliverables undertaken by the group: 1) an environmental scan to identify existing RDM resources and tools which may be leveraged; 2) a White Paper that identifies international training activities, gaps in RDM training in Canada, and a high-level view of a national RDM training program for various stakeholders; and 3) the development and delivery of a national RDM training program. The current status of deliverables, lessons learned, and forward work will be discussed.
Putting the puzzle together- a picture of data centres in the UK
Joanne Webb (Administrative Data Research Network)
The UK's Economic and Social Research Council (ESRC) has established a three-strand programme to encourage the use of administrative and big data in research. The first strand funded the Administrative Data Research Network (ADRN, www.adrn.ac.uk), a UK-wide partnership between universities, government bodies, national statistics authorities, and the wider research community. The Network facilitates secure research access to linked, de-identified administrative data to enable real-world analysis that can benefit society. The second strand covers the Big Data Centres: the Business and Local Government Data Centre, the Urban Big Data Centre, and the Consumer Data Research Centre. These concentrate on making data routinely collected by business and local government organisations accessible to academics in order to undertake research in the social sciences. The third strand enables partnerships between academic institutions and citizen and voluntary sector organisations, to establish or build on relationships between academic researchers and civil society organisations. The aim is to demonstrate the value of improved data infrastructure, enabling collection and analysis of data which is of interest to civil society organisations and empowering the sector to better use its own data. All three strands aim to enable research in the social sciences while safeguarding individuals’ identities. Working in a diverse, changing landscape brings its own challenges and possibilities. The University of Essex hosts the Administrative Data Service of the ADRN, the Business and Local Government Data Centre, and the Human Rights, Big Data and Technology Project. The challenge is to describe the boundaries, overlaps, and synergies in the landscape. This paper will review the UK network of these partnerships, describe some of the challenges of taking part in such a diverse, growing landscape, and outline some of the benefits to researchers and society.
Swedish Research Data System
Johan Fihn (Swedish National Data Service)
During 2016, the Swedish National Data Service (SND) ran a national pilot project exploring the possibility of establishing a national infrastructure to make research data more accessible. The project introduced a collaboration between SND, university archives, and university libraries, in which SND works as a back-office knowledge center and university librarians function as a front office towards researchers. Experiences from the pilot project have been analyzed and integrated into building a new collaborative infrastructure for research data in Sweden. The infrastructure consists of four modules:
Module 1 - Swedish Research Data Repository: a joint trusted digital repository in which SND and several universities collaborate to create common data curation and storage solutions.
Module 2 - Swedish Research Data Portal: a national metadata portal for research data.
Module 3 - National Knowledge Center: establishing SND as the national knowledge center for research data activities in Sweden, including training of librarians, archivists, and researchers.
Module 4 - Research Data Collaboration: a national collaboration on all aspects of research data, e.g. preservation formats, standards, common metadata profiles, DMPs, and law.
2017-05-25: D4: Metadata in the Curation Workflow
George Alter (ICPSR, University of Michigan)
Jared Lyle (ICPSR, University of Michigan)
Jeremy Iverson (Colectica)
Accurate and complete metadata is essential for data sharing and for interoperability across different data types. However, the process of describing and documenting scientific data has remained a tedious, manual process even when data collection is fully automated. Researchers are often reluctant to share data even with close colleagues, because creating documentation takes so much time.
Building metadata for economic data description and access
Genevieve Podleski (Federal Reserve Bank of St. Louis)
Making economic time series data accessible and understandable to an ever-widening audience of data users is a large and growing challenge. With input from sophisticated and novice data users and from data and metadata librarians, the Federal Reserve Bank of St. Louis has built a metadata standard for economic time series data that expands on other commonly used schemas to provide a more user-centric framework. This presentation will give an overview of the challenges of data description for cataloging, search, and education, and will present the specific solutions developed for the FRED economic data repository.
Documenting non-survey data in the Social Sciences with DDI-Lifecycle
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
Kerrin Borschewski (GESIS - Leibniz Institute for the Social Sciences)
Non-survey data – such as experimental data, social network data, etc. – are becoming more and more important for the work of social scientists, and their variety increases steadily. Thus, social science archives have to pay further attention to non-survey data and the corresponding data documentation. This presents the Data Documentation Initiative with the challenge of broadening its scope in order to enable the documentation of the different kinds of non-survey data within its metadata standard. This presentation will display a case study of documenting non-survey data with DDI-Lifecycle, based on data examples retrieved from ‘datorium’, the data repository service of the GESIS Leibniz Institute for the Social Sciences (https://datorium.gesis.org). We will determine the information needed at the study and dataset levels to enable data reuse. Furthermore, we will assess whether this information can be sufficiently captured with DDI-Lifecycle. The insights retrieved from this case study can support the further development of the DDI standard. Additionally, they build a basis for future data capture and archival work, by considering and examining new data sources for the social sciences.
2017-05-25: E1: Preservation Matters
Who cares about 3D data preservation?
Jennifer Moore (Washington University)
Hannah Scates Kettler (University of Iowa)
Preservation of 3D research data is a present and emerging need. An increasing number of researchers are generating, capturing, and/or analyzing 3D data, but are rarely focused on preservation or reuse. This paper and presentation will describe models of 3D data creation and use, outline the specific concerns for this data type, unpack the complexities and challenges of preserving it, examine existing resources, and discuss possible standards and solutions while working through local case studies from the field of anthropology.
Preparing data files for preservation with Colectica Datasets
Jeremy Iverson (Colectica)
Dan Smith (Colectica)
Data have no meaning without metadata. Statistical data tools like SAS, SPSS, and Stata provide limited metadata capabilities. Commonly, datasets contain variable and value labels, but even these are often missing. Without metadata, research is less credible. To enhance reproducibility and preservation, it must be simpler to add rich metadata to statistical files.
You can’t replicate what you can’t find: Data preservation policies in economic journals
Courtney Butler (Federal Reserve Bank of Kansas City)
Brett Currier (Federal Reserve Bank of Kansas City)
This presentation will review digital preservation strategies of economic journals that have data availability policies. Long-term data preservation is critical for future reproducibility of economic research. A greater focus is being placed on making research data publicly available, but there is a dearth of official policies and discussion in the literature concerning preservation. A sampling of about 250 economics journals was developed by cross-referencing journal impact factors, h5-indices, IDEAS rankings, and Federal Reserve Bank of Kansas City staff authorship and service to the journal. This sampling analyzes whether data preservation policies are present either independently or as part of a larger data availability policy. Preliminary results indicate that while data availability policies are becoming much more common, data preservation policies are practically nonexistent. This has strong implications for future research reproducibility. In response, the Federal Reserve Bank of Kansas City is developing an institutional data preservation platform as an alternative solution.
2017-05-25: E2: Encouraging Data Publishing in the Social Sciences and Humanities
Publishing and reviewing data papers: experiences from the Research Data Journal for the Humanities and Social Sciences
Louise Corti (UK Data Archive)
Peter Doorn (DANS)
In this paper we will, as lead editors of the Research Data Journal for the Humanities and Social Sciences, hosted by Brill publishers, share our experiences of commissioning, editing, and publishing data papers for this new peer-reviewed, online-only, open access data journal, established by DANS in 2015. The Research Data Journal contains short publications (data papers) in which researchers describe their dataset: the context of the investigation, the problem addressed, and the methods used. This is followed by an overall profile of the dataset, for example in terms of general characteristics or remarkable results. Conclusions as in an ordinary scientific paper are not required, but there is room for concluding remarks, and readers can respond to the content via a comments field. The data must be deposited in a trusted repository, such as DANS or the UK Data Service. Data papers are a relatively new venture for the humanities and social sciences, and we have had to work hard to show potential authors their value as a research output that both complements their own research publications and promotes their published datasets. We showcase some of the papers and set out our requirements and review process.
Scooping the social sciences: how new metrics can help us make sense of data
Lily Troia (Altmetric)
How much impact does your institution have? And how do you know? Are your researchers getting the funding they deserve, or is there room for improvement? In this session we'll look at the role that engagement metrics play in the challenges and opportunities that make up the current research landscape. We'll present case studies of how institutions around the world are using these data to track and showcase the value of their arts, social science and humanities outputs, and discuss how these new approaches have been integrated into existing workflows. As researchers become more connected to each other and to a broader audience, it's crucial that institutions play a more active role in monitoring and supporting the conversation relating to their expertise. This session will provide attendees with practical ideas for how they might go about getting started.
Why do authors of social science journal articles share their data? Explanations by the Theory of Planned Behavior.
Esra Akdeniz (GESIS – Leibniz-Institute for the Social Sciences)
Wolfgang Zenk-Möltgen (GESIS – Leibniz-Institute for the Social Sciences)
Previous work on data sharing of sociology and political science research datasets focused on journal policies in the fields of social and political science and their impact on authors' data sharing behaviors. In order to analyze individual motivations of data sharing, we extended this approach with a survey of authors of articles in academic journals, revealing their views on data sharing. After presenting initial descriptive results at IASSIST2016, we now provide some more in-depth analysis. The Theory of Planned Behavior (TPB) has proved to be a powerful approach to better understand human behavior. Therefore, the survey was conducted with the aim to identify possible factors that can affect researchers' behavior towards data sharing in terms of three aspects: attitude towards data sharing, perceived social norm and perceived behavioral control. The data was analyzed using structural equation modelling (SEM) to outline the role of TPB and to explain data sharing in the social sciences.
2017-05-25: E3: Tools for Reproducible Workflows Across the Research Lifecycle
Building up a Tool Chain to support the Research Data Life Cycle
David Schiller (TBA21 Germany)
Ingo Barkow (HTW Chur - University of Applied Sciences)
The Research Data Life Cycle includes several processes, from data collection plans (questionnaires or alternative data collection techniques) to the actual data collection, the processing of raw data, first data analysis, data archiving, and the curation and dissemination of data. Each of these processes needs to be supported by specialized software, so the quality of data and the efficiency of research depend heavily on the software used and on the interoperability of different software products. A variety of products often leads to complications during the Research Data Life Cycle: different formats and standards, proprietary code, and sometimes the outright lack of appropriate, user-friendly software make it hard, and sometimes impossible, to create and maintain efficient Research Data Life Cycle processes. The recommendation of a Tool Chain to support the complete process aims to solve these challenges. Built on open source and with interoperability in mind, different software modules focus on different processes within the life cycle. The paper first describes the different processes within the Research Data Life Cycle, then names appropriate tools or identifies gaps in the overall workflow, and closes with a summary of important best practice procedures needed to fulfil the requirements of a real Tool Chain.
Reproducing and preserving research with ReproZip
Remi Rampin (New York University)
Vicky Steeves (New York University)
Fernando Chirigati (New York University)
The problem of reproducibility is multifaceted - there are social and cultural obstacles as well as technical inconsistencies that make replicating and reproducing extremely difficult. In this paper, we introduce ReproZip (https://reprozip.org), an open source tool to help overcome the technical difficulties involved in preserving and replicating research, applications, databases, software, and more.
Projects, Packrat, Tidyverse - New ways to do reproducible research in R
Alicia Hofelich Mohr (University of Minnesota)
While the "replication crisis" has called into question the reliability of many scientific findings from psychology to medicine, it has also highlighted the criticality of good data management in the research workflow. As researchers strive to make their work more transparent, open, and reproducible, more tools are being developed to support these efforts. The analysis workflow in particular is one part of the research lifecycle that stands to benefit most from these developments. This presentation will describe how researchers can integrate better data management and reproducibility into their analyses using new extensions in R, a popular tool for statistical computing. These extensions include new support for a variety of tasks, such as file directory management (R Projects), version control and sharing (Git/Github interfaces), reproducible reports (knitr), project portability and longevity (Packrat), as well as visualization and data wrangling (Tidyverse).
2017-05-25: E4: Strategies for Delivering Data Services
Visualization Services in the Harvard College Library: Laying the groundwork
Hugh Truslow (Harvard College Library)
Like many university library systems, the Harvard Library is trying to find ways to more deeply and meaningfully support the many new forms of digital and data-centric scholarship across the disciplines, and data visualization is one important aspect of this. The holders of two newly created positions in the Maps, Media, Data, and Government Information unit in the Faculty of Arts and Sciences are in the early stages of exploring service models, internal partnerships, outreach, and other issues as they try to build a program of services in the complex and decentralized Harvard system, which in many ways has adapted to the landscape of digital scholarship through local responses, often within departments. What is the role that data visualization plays in the various aspects of the research lifecycle, not just in the presentation of research results? What are scalable models of support? What is the balance of training opportunities on specific visualization tools versus more general approaches? This presentation will delve into these and other questions.
Data Librarian in the middle, creating instructional content for the digital humanities
Matthew Gertler (University of Toronto Scarborough)
A data librarian at a small to medium sized campus will receive data-related requests from many disciplines representing a variety of approaches, methodologies, and tools. There may not be enough questions to merit multiple data specialists, but the breadth of questions can be difficult for one individual to handle. Join Matthew Gertler, a data librarian who recently began his career from a Social Sciences Data Services perspective. He will discuss the challenges, opportunities, and joys of delivering reference and instruction for data discovery, literacy, management, manipulation, and visualization for a diverse academic audience. The discussion will be framed around a project creating instructional content for five lectures of a digital humanities course, with the collaboration and input of other librarians. There will be an exploration of how principles related to research data are transferable across disciplines; this commonality aids in the creation of instructional content for discipline-specific approaches. At the same time, the content was targeted by discipline and skill set, even for widely used examples such as web mapping and data management.
Students helping students: Economic Library Student Assistants at Dartmouth College
John Cocklin (Dartmouth College)
The number of undergraduate students taking Economics courses is rising dramatically at Dartmouth College. To better help them with their culminating senior projects, and to assist their faculty, the Library is building a team of undergraduate student assistants. They help their fellow students find data, wrangle it into a form compatible with Stata, and then use it with Stata. When developing the team, the Library looked to examples from other college and university libraries. Since Dartmouth does not offer a Ph.D. in Economics or in Business, and doctoral students are frequently used as consultants elsewhere, we look to undergraduates to provide consultation. Indirectly, this program has proven to be a very effective form of outreach to both students and faculty. Directly, even at this early stage the program is successful on multiple levels. Students feel comfortable working with other students, and they can meet on evenings or weekends when Librarians are unavailable. With the Library handling many of the data and basic Stata questions, faculty are given more time with students to focus on in-depth questions about econometric methodology. Perks for student assistants include advanced training in databases such as Bloomberg, an attractive skill to employers.
2017-05-25: F1: Health Data: An International Comparative Deep Dive
Health data: An international comparative deep dive
Bobray Bordelon (Princeton University)
Jane Fry (Carleton University)
Ron Nakao (Stanford University)
A deep dive into data sources for health from Canada, the USA, and developing countries will be presented. What are the best sources for individual nations? How does one choose which dataset(s) to use? Can comparisons be made between nations? Sources from the United States' National Center for Health Statistics; Statistics Canada; the Demographic and Health Surveys; and other agencies will be explored.
2017-05-25: F2: Building Bridges for Qualitative Social Science and Humanities Researchers
Stuck in the middle with you: Building bridges for qualitative Social Science and Humanities researchers
Lynda Kellam (UNCG)
Mandy Swygart-Hobaugh (Georgia State University)
Louise Corti (UK Data Service)
Sebastian Karcher (Syracuse University)
Dessi Kirilova (Syracuse University)
The Qualitative Social Science and Humanities Data Interest Group was founded in 2016 to explore the challenges and opportunities facing data professionals in these areas. In this panel, Mandy Swygart-Hobaugh introduces how qualitative social science and humanities researchers may feel “stuck in the middle” between more quantitative disciplines, and how the language of data can help build bridges across disciplines and methodological approaches. Lynda Kellam discusses efforts to work with historians to apply principles of research "data" management to archival work. She discusses her efforts to develop and promote best practices with the history graduate students at UNCG, who use non-numerical data, such as PDFs, images etc. Louise Corti addresses disciplinary bridging, showing how older qualitative data, from the 1960s, is being used by social historians, firmly established as a humanities resource. She introduces her collaborative project with humanities scholars looking at automating the publishing of recorded oral histories. Sebastian Karcher and Dessislava Kirilova discuss the recent transparency agenda in the social science and humanities and what this means for different types of qualitative data. They draw on examples from political science and QDR to consider potential solutions for open science, such as the use of Annotation for Transparent Inquiry.
2017-05-25: F3: Open, Public Goods Infrastructure for Research Management & Discovery
Open, public goods infrastructure for research management & discovery
Cynthia Vitale (Washington University in St. Louis)
Victoria Steeves (New York University)
Matthew Spitzer (Center for Open Science)
Open technical infrastructure supports research reproducibility, open access mandates, and data management and sharing requirements. As public goods, these tools build community trust through the openness of their code and infrastructure. For libraries and data centers, they provide practical and beneficial opportunities for supporting the work of researchers at various points throughout the research lifecycle. This panel will highlight the role of open, public goods in the efficient and scalable development of infrastructure to support research management, reuse, and discovery. Specifically, panelists will highlight the Open Science Framework (https://osf.io), ReproZip (https://reprozip.org), and the SHARE data set (www.share-research.org), recent advancements and work, and present on how libraries and librarians can engage and embed themselves in the research workflow. A discussion and brainstorming session will follow the presentations.
2017-05-25: F4: Collecting and Mining Data from the Web
Websites, Twitter, and Facebook - oh my! How to start gathering data from the web
Michael Beckstrand (University of Minnesota)
Alicia Hofelich Mohr (University of Minnesota)
The internet is full of information waiting for exploration, from social media, to newspaper comments, to digitized archives. But how can interested researchers get started without knowledge of APIs, scripting, or programming languages? We will introduce participants to browser-based tools for online data collection, including batch download managers and coding-free screen scrapers. It will also introduce Facepager, an open-source graphical API-access tool for gathering information from Twitter, Facebook, and other social media sites.
Access to social science research data by an open API
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
Reiner Mauer (GESIS - Leibniz Institute for the Social Sciences)
This presentation will show how researchers can get direct, machine-actionable access to social science research data from the GESIS Data Archive for the Social Sciences. All research data available at GESIS are accessible via the Data Catalogue DBK (http://dbk.gesis.org/dbksearch/). After registering with the service, most of the data is directly downloadable; the rest can be ordered. The predominant data formats provided are SPSS and Stata, and others may only be obtained on request. While this service is well established, it may not be suited for more advanced research use cases like data linking or automatically crawling data in interdisciplinary settings. Also, for archiving purposes a vendor-independent data format based on Unicode CSV files would be preferable. Thus, a generic procedure for converting archived SPSS datasets was developed, tested, and applied to the majority of SPSS datasets in the archive. These Unicode-format datasets are now available for download to researchers. Additionally, an open API was developed to give registered users the possibility of directly accessing the data. The system uses the OpenAPI specification (https://www.openapis.org/), a vendor-neutral description format for RESTful APIs. This new service enables researchers to do advanced data analysis while respecting current access restrictions.
2017-05-25: F5: Small Campuses, Small Repositories: Sustainable Processes
Small campuses, small repositories: Sustainable processes
Rachel Walton (Rollins College)
Paula Lackie (Carleton College)
Patti McCall (Rollins College)
Thu-Mai Christian (UNC Chapel Hill)
Even on small or teaching-centered campuses, faculty conduct research and produce data that would be well suited for deposit in a local repository rather than a disciplinary repository or self-service resources site like FigShare or GitHub. But building small institutional data repositories with limited resources and staff has not been an attractive solution to most. At the same time, the need continues to grow for supporting faculty's data management, data storage, archiving, and other research needs with locally managed solutions. As data professionals at small institutions consider options like Dataverse, the following questions arise: (1) what unmet faculty data needs exist at these smaller institutions? (2) how may the Dataverse application provide support for these needs? (3) what needs might NOT be supported by Dataverse? (4) how do we gauge the community's readiness for considering Dataverse as a solution? (5) what are the alternatives? This panel will discuss collected responses to these questions from information professionals within the Oberlin Group, hear from a Dataverse representative at the Odum Institute, reflect on local experiences, and encourage the sharing of similar ideas and experiences among attendees.
2017-05-26: G1: Developments in DDI
Sample Use Cases for the Codebook View in DDI Views (DDI4)
Dan Gillman (BLS - U.S. Bureau of Labor Statistics)
Arofan Gregory (Aeon Technologies)
Larry Hoyle (IPSR, University of Kansas)
Knut Wenzig (DIW Berlin - German Institute for Economic Research)
The DDI Moving Forward project (DDI-4) is the effort to modernize the way DDI is managed. Through the use of UML (Unified Modeling Language), a software-independent representation of DDI is being developed and maintained. Compatibility with the older versions of DDI, DDI 2.x (Codebook) and DDI 3.x (Lifecycle), is a requirement, so XML and RDF bindings to the UML model are being developed. DDI Views (DDI4) includes a Codebook View which can be used to describe the logical and physical structure of a variety of data files, along with information supporting both discovery and understanding of the data. This presentation shows the use of this view to describe examples of a dataset written in CSV and fixed column layouts. The presentation will include a brief tour of the DDI Views model, descriptions of the classes and attributes of a simple codebook, and examples of the XML used to describe each type of file. This presentation is based on work done at Schloss Dagstuhl event 16433, October 23–28, 2016 (http://www.dagstuhl.de/de/programm/kalender/evhp/?semnr=16433), as well as a series of online meetings by the Codebook working group (https://ddi-alliance.atlassian.net/wiki/display/DDI4/Simple+Codebook+View+Team).
Recent Progress on the DDI Moving Forward Program
Steven McEachern (Australian Data Archive)
The DDI Moving Forward program was established in 2012 to move DDI to the next generation of development. Over a series of face to face and online meetings, the Alliance has been establishing the next phase of DDI - the move from an XML standard to a model-based standard. The effort involves developing not only the information model, but also the infrastructure for building that model, the transformation into a set of representations (initially XML schema and RDF/OWL), and the associated documentation. The first two releases of the DDI Moving Forward program, known as DDI-Views, have now been publicly released, and the work program is now moving into development of core content areas for the DDI community. This panel will provide an overview of the new content released as part of the work program, applications of the new content, and the production framework and tools used in development and management of the standard. Papers include: - An introduction to the model-based approach (Achim Wackerow) - The Data Description, Data Capture and Codebook packages (Dan Gillman, Jay Greenfield, Barry Radler) - Applications of in statistical production and health (Jay Greenfield, Dan Gillman) - Production systems and tools (Larry Hoyle, Achim Wackerow)
A Vision for a Future Research Infrastructure
Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)
Ingo Barkow (Swiss Institute for Information Sciences, HTW Chur)
William Block (CISER, Cornell University)
Jay Greenfield (Booz Allen Hamilton)
Steven McEachern (ADA, Australian National University)
Founded in 1995, the Data Documentation Initiative (DDI) has been used by the international social science research community to describe data. Over those years, individuals and organizations have made strides in both evolving the standard and developing DDI-based tools, with a goal of enabling efficient metadata creation and exchange across the life cycle. How can we build on these successes, adapt to a changing environment, and achieve better integration? The answer: a new vision for a large-scale distributed infrastructure for all empirical social science research based on DDI. This interactive panel will present the DDI Alliance's proposed vision for infrastructure and invite attendees to engage and participate in discussing and developing the plan. Topics discussed will include: proposed tools (e.g., questionnaire element registry, datum-level data storage, interoperability with other metadata standards), funding opportunities, engaging with other stakeholders, and strategies for achieving the vision. This is a working draft of an iterative process started by a working group in the DDI Moving Forward workshop in Dagstuhl in October 2016. Join this unique opportunity for participants in the data life cycle to discuss how to collaboratively develop a more robust environment for research.
2017-05-26: G2: Developing Academic Library Data Collections & Finding Licensed Data Resources
Finding Licensed Data Resources: the User Experience
Adrienne Brennecke (Federal Reserve Bank of St. Louis)
Developing Academic Library Data Collections: A Discussion of Current Practices
Harrison Dekker (University of California, Berkeley)
Bobray Bordelon (Princeton University)
Robert O'Reilly (Emory University)
Joel Herndon (Duke University)
With the growth of data intensive teaching and research, some academic libraries are committing more funding and attention to the acquisition of numeric data resources. Given the extent to which data differs from traditional library materials, there are a variety of challenges these libraries are facing, particularly in the areas of cataloging, licensing, discovery and access. Panel participants, all experienced data librarians at large research libraries, will discuss their experiences in developing policies for and building data collections at their respective institutions. Attention will also be given to the similarities and differences between these activities and parallel efforts to develop collections of campus-produced data.
2017-05-26: G3: Innovations in Managing Secure Outputs: What's the Takeaway?
Innovations in Managing Secure Outputs: What's the Takeaway?
Matthew Woollard (UK Data Service)
Beate Lichtwardt (UK Data Service)
Deborah Wiltshire (UK Data Service)
Johanna Eberle (Institute for Employment Research (IAB))
Dana Müller (Institute for Employment Research (IAB))
Amy Pienta (ICPSR, University of Michigan)
A number of Research Data Centres (RDCs) run secure environments which provide valuable access to rich sets of data which, because of their level of detail, carry a significant risk of disclosing the identity of individuals or organisations. Many pioneering areas of research are advanced by using these highly sensitive data. RDC use is on the rise, also due to the growing range of new data available exclusively via this route. One of the foundations of RDCs is the concept of safe outputs, whereby results of analysis undertaken within secure environments are only released once they have passed a thorough Statistical Disclosure Control (SDC). However, SDC checks can be time- and labour-intensive. Therefore, a challenge for many RDCs is to make the process of output checking more efficient while retaining the rigour of disclosure control and excellent service for users. This panel session will feature three RDCs based in the UK, Germany and the United States of America, and show how they tackle those challenges using methods such as automation, evaluation techniques, managing throughput, and working with data owners. Panellists will present their local innovations and discuss how providers internationally can learn from each other in advancing services in this area.
2017-05-26: G4: Data Sharing and Reuse Across Boundaries
Like a Kid in a Sweetshop: Lessons in Managing Researcher Expectations
John Sanderson (UK Data Archive/ADRN)
Sabrina Iavarone (UK Data Archive/ADRN)
Rowan Lawrance (UK Data Archive/ADRN)
The Administrative Data Research Network has been created to arrange access to data which are historically difficult to obtain. In the initial phase of the Network researchers have requested any data which they think will be useful, with the task of establishing the practicalities of data availability, useability and accessibility a responsibility of the ADRN. This 'offer' has presented some unique challenges for the Administrative Data Service - the section of the ADRN which has led many of the discussions with UK government departments - and has created a mix of responsibilities and expectations between User and Service which are different from those experienced by many, traditional, data access services. This presentation will outline: 1) why the ADRN model was adopted and how it was useful in the context of the service, 2) the challenges that became apparent when ADRN began to operate – and the difficulties that were experienced because of the working model and 3) the solutions that the service put in place to enable effective user support and management in a highly challenging environment (as well as some of the blue-sky thinking that we’ve done about the ideal solutions for the future). Food for thought for anyone involved in a data service that needs to cope with the ‘new and novel’.
The Diffusion of Scholarship Across Disciplinary Boundaries through Data Sharing
David Bleckley (ICPSR, University of Michigan)
Susan Jekielek (ICPSR, University of Michigan)
An original data collection effort is often conducted by a scientist or group of scientists representing a single discipline. While secondary analysis of that data may occur within the same field, researchers from additional disciplines may also become interested in the data, creating a diffusion of the data across disciplinary boundaries. This paper investigates this idea using datasets archived in the Civic Learning, Engagement, and Action Data Sharing project at ICPSR. We compare the disciplines of the original researcher(s) involved in a data collection to the disciplines of researchers who have published findings based on analyses of these same datasets. Our analysis shows how some data become utilized by diverse disciplines over time. The paper also describes the extent to which researchers collaborate across disciplines in producing and analyzing data. Finally, we examine whether characteristics of the data (such as the breadth of the data) lead to greater diffusion across disciplinary boundaries. We conclude by discussing the value of sharing and using archival data across disciplinary boundaries.
Designing the Cyberinfrastructure for Spatial Data Curation, Visualization, and Sharing
Yue Li (Purdue University)
Nicole Kong (Purdue University)
Standa Pejsa (Purdue University)
Widely used across disciplines such as natural resources, social sciences, public health, humanities, and economics, spatial data (digital maps) are an important component in many studies and have promoted interdisciplinary research development. Though an institutional data repository provides a great solution for data curation, preservation, and sharing, it usually lacks the spatial visualization capability, which limits the use of spatial data to professionals. To increase the impact of research-generated spatial data, and truly turn them into digital maps for a broader user base, we have developed a workflow and cyberinfrastructure to extend the current capability of our institutional data repository. In this project, spatial data are curated and preserved in the data repository, and shared as map services using a GIS server. Data visualization was created to ensure that general users can browse maps to find location-based information. A data download option, metadata, and DOIs allow researchers to identify, cite, and reuse the datasets. In addition, these data are ingested into the spatial data portal to increase their discoverability for spatial information users. Initial usage statistics suggest that this cyberinfrastructure has greatly improved spatial data usage and extended the institutional data repository to facilitate spatial data sharing.
Facilitating Integration of Socioeconomic and Remote Sensing Data to Support Interdisciplinary Research and Applications
Robert Downs (Columbia University)
Robert Chen (Columbia University)
Pressing environmental and societal problems continue to increase the need for interdisciplinary research that cuts across natural, social, and health science disciplines and for innovative solutions that take into account the interlinked behavior of both natural and human systems. A key challenge for the scientific community is to access and integrate data from diverse disciplines, which have traditionally collected or obtained data for very different units of analysis, on widely varying spatial and temporal scales, and using discipline-focused terminology and measurement frameworks. We highlight a number of examples of how the scientific community has integrated satellite-based remote sensing data with various types of socioeconomic data, ranging from simple visualization to statistical models to process-based simulation models. Geospatial tools and methods are often one way in which socioeconomic and remote sensing data can be integrated. We analyze published studies that cite both remote sensing and social science data to identify the ways in which data are transformed and used together and to assess what barriers and challenges users typically must overcome to achieve their objectives.
National Archive of Data on Arts and Culture: Creating a Common Language through Infographics
Jai Holt (ICPSR)
Alison Stroud (ICPSR)
The mission of the National Archive of Data on Arts and Culture (NADAC) is to share research data on arts and culture with researchers as well as those not experienced with statistical packages, such as policymakers, people working for arts and culture organizations, and the general public. Funded by the National Endowment for the Arts, the infrastructure of this data repository is within ICPSR. This Pecha Kucha presentation will demonstrate for the attendees the steps that NADAC takes to create infographics for its website to make learning about arts and culture data fun and approachable. Also, this presentation will provide ICPSR’s web team’s approaches as the team members collaborate with NADAC staff to create appealing and attractive infographics for the user community. These infographics primarily highlight statistics related to the arts and culture from large national data collections and provide a window into data collections that may otherwise intimidate novice users.
Elizabeth Wickes (University of Illinois at Urbana-Champaign)
Tabling at expos, fairs, and other outreach events is a common method of advertising services, but not something that everyone is experienced with or comfortable performing. From choosing the right swag to designing handouts and practicing elevator pitches, a lot of work is required to deliver an impactful message in a 1-5 minute interaction. Tabling requires the use of physical and visual aids to draw people in, adapt your message to each visitor's particular need, and inspire follow-up interactions. This Pecha Kucha talk will cover the essential elements of a good tabling user experience: setting goals for what you want users to come away with, crafting materials that match those goals, and how to train for and practice your presentations.
When the Kentucky Derby becomes the Grand National: Unexpected Hurdles in Negotiating Access to Government Administrative Data
Tanvi Desai (Administrative Data Service, University of Essex)
Melanie Wright (UK Data Service)
The Administrative Data Service (ADS) is part of the UK's Administrative Data Research Network, a project funded to improve pathways to access for researchers wishing to use administrative data for research with potential public benefit. ADS leads on negotiations for access to government administrative data and has been negotiating with key government departments for up to two and a half years with varying degrees of success. This Pecha Kucha will take a brief look at some of the hurdles that have been placed in the way of negotiations and examine their impact on the negotiation process. The presentation aims to highlight issues that may need to be addressed if researchers are to gain effective access to administrative data for research.
You Just Like Me for My Methods: Experiences of Research Design and Analysis Collaborators
Thomas Lindsay (Research Support Services, College of Liberal Arts, University of Minnesota)
Alicia Hofelich Mohr (Research Support Services, College of Liberal Arts, University of Minnesota)
Although collaboration with researchers can take many forms, most clients approach Research Support Services (RSS) with specific needs in specific parts of their research projects. Occasionally, however, researchers approach RSS to be co-authors or to be otherwise deeply involved in the methods of their research projects. Being the owner of social science research methods on projects largely defined by faculty researchers presents unique challenges, pitfalls, and opportunities. We will discuss a few of these, and the lessons we are learning.
Looking at the Library Data in the Mirror: Taking Our Own Advice about Data Management
Ryan Clement (Middlebury College)
Data librarians consult with and advise users every day on best practices and tools for managing, wrangling, and working with data. As we look to our own extensive stores of data on collections, users, finances and more, though, it seems we have rarely listened to ourselves when it comes to these topics. In this short talk we will cover (1) historical challenges with the management of library data; (2) Middlebury College's Library Data Project, and the strategies and tools to start addressing these challenges; (3) a call to "take our own advice" when it comes to managing and working with data.
Developing Specialized Services to Cultivate Common Skills
Jonathan Cain (University of Oregon)
This talk will describe meeting an obvious need within a specialized department and, in doing so, developing a curriculum to foster basic data literacy skills for students and researchers in underserved programs. In 2016, I lobbied for and won support for the creation of a strategic-alliance Graduate Assistant position with the Department of Public Policy, Planning and Management (PPPM), a program that had historically relied upon a GTF position that was recently discontinued. This brief talk will show how the new position was designed to conduct a knowledge gap assessment for nontraditional recipients of targeted data services and to implement the services that students and faculty need most. The pedagogical method embraces both service-learning and teaching-to-learn methodologies, and capitalizes on developing solutions to "real world" needs rather than strictly discipline-specific needs. While seeking to foster the development of data-focused services for one particular department, we are able to reverse engineer services and a curriculum that are of interest to a wide variety of disciplines that are not traditionally seen as large users of data services.
2017-05-26: H1: Data Management in the Research Process
Putting Metadata on the Map - Producing Enhanced Geospatial Visualisations from Open-source Tools to Encourage Metadata Creation Earlier in the Data Lifecycle
Samuel Spencer (Aristotle Metadata Registry)
Librarians and archivists have long known the benefits of metadata for improving the discoverability and understanding of datasets. However, with some open-data portals, where depositing guidelines are based on policy or legislative requirements, metadata quality may be an additional burden for data depositors with little concern for long-term archival strategies. What is needed is a way to demonstrate short-term benefits for depositors that improve metadata quality while reducing the perceived burden of metadata production. To improve the quality of data and metadata records for data.gov.au, we explored methods to produce rich geospatial visualisations that show users the immediate benefit of structural metadata, while also showing how metadata improved the quality of deposited data records. Additionally, we offered minimal machine-readable metadata profiles that could be created in common office-suite tools to maximise utility while minimising authoring time. This talk covers the challenges of producing structural metadata from records available in the data.gov.au CKAN data repository, methods for importing this into an open-source Aristotle Metadata Registry, and how we connected these to produce metadata-driven interactive maps on NationalMap using Terria.io. Lastly, we look at how this visualisation-focused engagement strategy has improved the quality of open government data.
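The abstract above mentions harvesting records from the data.gov.au CKAN repository. As a minimal sketch of that first step, the snippet below builds a query URL for the standard CKAN Action API's `package_search` endpoint; the base URL and search term are illustrative assumptions, not details taken from the talk itself.

```python
from urllib.parse import urlencode

# Illustrative base URL for a CKAN portal's Action API (data.gov.au runs CKAN;
# the exact path here is the standard CKAN convention, assumed for this sketch).
CKAN_BASE = "https://data.gov.au/api/3/action"

def package_search_url(query, rows=10):
    """Return a CKAN package_search URL for a free-text dataset query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/package_search?{params}"

# Example: search for datasets mentioning "geospatial", capped at 5 results.
url = package_search_url("geospatial", rows=5)
print(url)
```

Fetching this URL (e.g. with `urllib.request`) returns a JSON envelope whose `result.results` list holds the dataset metadata records that a pipeline like the one described could transform into structural metadata.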
From Administrative Burden to Research Excellence: Getting Researchers to Take Data Management Seriously
Alexandra Stam (FORS)
Driven in part by the open access movement, recent years have seen the expansion of initiatives that aim to promote data sharing, with increasing awareness among stakeholders of the importance of making data publicly available. Consequently, many funders have implemented formal data management plans as part of the proposal process, while data repositories and libraries have developed services, guidelines, and training to help researchers fulfill funders' requirements and apply good data management practices so that data can be shared. However, these same parties sometimes neglect the fact that many researchers do not take data management seriously, and perceive and treat it at best as a form of administrative burden, and at worst as an obstacle to doing research. We will call for the repositioning of data management at the heart of the research process, irrespective of data sharing. Researchers should first and foremost see the value of good data management for their own research, as a way to achieve research excellence. We will share some reflections as to how this could be achieved by reconsidering the roles of funding agencies, data repositories, and librarians in encouraging and supporting good data management practices, beyond the goal of data sharing.
The presentation draws on a use case from political science to demonstrate integrated scholarly processes for curating and publishing data. Curation activities are distributed across institutional and national repositories, archives, registries and websites. In workflows which prioritise existing research practice and disciplinary standards, researchers provide structured metadata to the Australian Data Archive (ADA) using a template based on the Data Documentation Initiative (DDI). These metadata are mapped to RDF and RIF-CS standards and sent to the institutional repository at UNSW Australia, for publication and further dissemination. The primary role of the institutional repository is to move the data around – to apply standards and protocols that enable the data to be widely and openly accessible. The implementation supports researchers in comprehensively describing their research methods and data according to a widely-adopted disciplinary standard, and reduces their workload relating to institutional reporting and dissemination. The integrity of the institutional data repository is increased by its direct integration with the rich descriptions of data in the disciplinary archive. Added value for the institution is derived from the reporting capabilities of the repository, which links to enterprise systems to generate statistics about the University's research assets.
2017-05-26: H2: Research Data Management Strategies & Opportunities
Spreading the Knowledge: Overviewing the University of Alberta Libraries’ Research Data Management Services
James Doiron (University of Alberta Libraries)
In June 2016 the Tri-Council Agencies, a major source of research funding for post-secondary institutions in Canada, released a Statement of Principles on Digital Data Management which identifies research data management (RDM) as an essential and shared responsibility between researchers, research communities, research institutions, and research funders. As a major international research library, University of Alberta Libraries (UAL) offers expertise, resources, and services for supporting sound RDM throughout the research lifecycle. In alignment with the Tri-Council statement, UAL has adopted a holistic approach to delivering RDM education, knowledge, and resources across campus, focusing upon a variety of stakeholders. Examples of this include an ongoing series of applied RDM training sessions for liaison librarians, customized information sessions for both Research Services Office and Research Ethics Office staff, and collaborative RDM events and training sessions delivered to researchers and students across campus. Some specific services and platforms offered by UAL include the Portage Data Management Planning (DMP) Assistant, Dataverse, and an open access Education and Research Archive for promoting research discovery, archiving, and preservation. This session will provide a brief overview of UAL's RDM services, methods employed for their delivery and uptake, and both current and emerging RDM initiatives.
Across Canada, across Disciplines: Research Data Management Practices and Needs in the Social Sciences and Humanities
Leanne Trimble (University of Toronto)
Dylanne Dearborn (University of Toronto)
Tatiana Zaraiskaya (Queen's University)
Jane Burpee (McGill University)
Eugene Barsky (University of British Columbia)
Catie Sahadath (University of Ottawa)
Melissa Cheung (University of Ottawa)
Across Canada, ten universities (to date) have worked together to survey their research communities in order to better understand research data management practices and needs. This work builds on a previous collaborative effort designed to delve into RDM habits of researchers in engineering and science, by expanding to researchers in the humanities and social sciences. This session will discuss the survey results from participating universities, providing insight into the Canadian RDM landscape while highlighting disciplinary differences and notable results. Survey sections include working with research data, data sharing, funding mandates and research data management services. Information generated by this survey will help inform Canadian institutional services, infrastructure and policies. Participating universities at the time of writing include: Dalhousie University, McGill University, Queen's University, Ryerson University, University of Alberta, University of British Columbia, University of Ottawa, University of Toronto, University of Waterloo, and the University of Windsor. The session will also discuss the collaboration process, which resulted in the development of a clearinghouse of generic survey documents (questionnaires, ethics review documents) that will be housed by Portage, Canada's emerging national RDM infrastructure project. These documents can be used by other institutions to conduct similar studies. Future initiatives include a further survey of researchers in the health and medical sciences. Additional authors are Marjorie Mitchell, University of British Columbia, and Matthew Gertler, University of Toronto.
In Aggregate: Trends, Needs, and Opportunities from Faculty Research Data Management Surveys
Abigail Goben (University of Illinois-Chicago)
Tina Griffin (University of Illinois-Chicago)
A popular starting point for libraries engaging in research data management (RDM) services is a faculty needs assessment. Preliminary reviews of the literature identified almost fifty individual institutional results, mostly surveys from Highest or Higher Research Activity institutions. Henderson and Knott (2015) explicitly argue that no further surveys are needed because of the breadth and depth already covered by these studies. However, no overarching analysis has yet been conducted to examine cross-institutional trends and identify best practices or gaps in the literature. To address these issues, the authors will compare published faculty RDM needs assessments. Studies to be included will be US-based, in order to retain homogeneity regarding research institution classification and funding mandates. Studies must be specifically about RDM needs and/or services, as opposed to broader library services, and should not be only evaluations of implemented library RDM services. Research questions for this project include: identifying question overlaps; identifying which research data issues are common across institutions; determining whether graduate students and research staff were considered in the needs assessments; and identifying what gaps remain in the literature. Henderson ME, Knott TL. Starting a Research Data Management Program Based in a University Library. Medical Reference Services Quarterly. 2015;34(1):47-59. doi:10.1080/02763869.2015.986783.
2017-05-26: H3: Is Bigger Always Better? Examining Big Data’s Limits in Utility, Quality, and Security
Would Big Data Replace Marketing and Social Surveys? - Potential Usage of Big Data in Marketing and Social Surveys: In Case of Mongolia
Davaasuren Chuluunbat (MMCG Company, Mongolia)
In recent years there have been debates among marketing and social research communities at the international level on whether big data will replace marketing and social surveys. The prevailing view is that big data will not replace market research (MR) but will support it; the main explanation for this view is that big data alone cannot reveal customers' insights. It is agreed, however, that big data can help conduct MR more effectively, and there is ample evidence for this. MR societies are therefore considering cooperating with big data owners, hiring analysts with new data skills, combining quantitative and qualitative data to create value together with big data owners, and investing in big data platforms. In this paper, I will outline possible links between big data and MR and the usage of big data in MR, and discuss some internationally known cases showing how big data and MR are combined. For Mongolia, which shifted to a market economy after the collapse of the socialist regime, big data is quite a new concept, even though the value of market research has been recognized by businesses in recent years. In the international MR communities, big data usage is a trending topic; we have to follow this trend, and hope that its value will soon be recognized by our communities. There are some cases of using big data in marketing and social research in Mongolia, especially in research design. Because big data is such a new concept for us, it presents both opportunities and challenges. This paper will therefore include practices of using big data in our work, along with the opportunities and challenges of doing so, and will suggest solutions for exploiting the opportunities effectively and addressing the challenges in our case.
Data Quality, Transparency and Reproducibility in Large Bibliographic Datasets
Angela Zoss (Duke University)
Trevor Edelblute (Indiana University)
Inna Kouper (Indiana University)
Increasingly, large bibliographic databases are hosted by dedicated teams that commit to database quality, curation, and sharing, thereby providing excellent sources of data. Some databases, such as PubMed or HathiTrust Digital Library, offer APIs and describe the steps to retrieve or process their data. Others of comparable size and importance to bibliographic scholarship, such as the ACM digital library, still forbid data mining. The additional cleaning and expansion steps required to overcome barriers to data acquisition must be reproducible and incorporated into the curation pipeline, or the use of large bibliographic databases for analysis will remain costly, time-consuming, and inconsistent. In this presentation, we will describe our efforts to create reproducible workflows to generate datasets from three large bibliographic databases: PubMed, DBLP (as a proxy for the ACM digital library), and HathiTrust. We will compare these sources of bibliographic data and address the following: initial download and setup, gap analysis, supplemental sources for data retrieval and integration. By sharing our workflows and discussing both automated and manual steps of data enhancements, we hope to encourage researchers and data providers to think about sharing the responsibility of openness, transparency and reproducibility in re-using large bibliographic databases.
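The abstract above mentions PubMed's API as the kind of documented retrieval step a reproducible workflow begins with. As a minimal sketch (not the presenters' actual pipeline), the snippet below constructs a request URL for NCBI's E-utilities ESearch endpoint, which returns PubMed IDs matching a query; the search term shown is illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities base URL, per NCBI's public documentation.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term, retmax=100):
    """Return an ESearch URL that retrieves up to `retmax` PubMed IDs for `term`."""
    params = urlencode({
        "db": "pubmed",       # search the PubMed database
        "term": term,         # free-text or fielded query, e.g. "data curation[Title]"
        "retmax": retmax,     # maximum number of IDs to return
        "retmode": "json",    # JSON rather than the default XML response
    })
    return f"{EUTILS}/esearch.fcgi?{params}"

# Example: build (but don't fetch) a query for title matches on "data curation".
url = esearch_url("data curation[Title]", retmax=50)
print(url)
```

Recording the exact query URLs and retrieval dates alongside the downloaded records is one simple way such a workflow stays reproducible.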
Secure Data Solutions for Social Media Data Analysis
David Schiller (GESIS)
Katharina Kinder-Kurlanda (GESIS)
Social media are used by more and more people, influencing how people communicate, how they gather information, and how they behave, not only on social media platforms but in general. For example, WeChat is used by 700 million people in China; additional services such as payment systems and language training are incorporated into the app, resulting in a huge source of personal information. It is likely that such repositories will be one future of data collection. Research in the social sciences must be enabled to analyse social media data in order to understand how societies in general are changing and developing; these personal data should not be used only by commercial companies. The social sciences therefore need to build methods and infrastructures to access social media data sources and to support analyses. For several reasons (e.g. privacy concerns, proprietary data), existing secure data solutions seem particularly well suited to facilitating social media data access and sharing. Secure access solutions would also ensure proper documentation and quality of the data and the reproducibility of results. The talk will present first approaches to adapting and developing the Secure Data Center at GESIS as a platform for social media data analysis.