Data professionals supporting researchers provide valuable services throughout the data management life cycle. According to recent surveys, up to 80% of a data scientist’s time can be spent cleaning, harmonizing, and integrating data (a.k.a. data wrangling). While there are many useful tools available to assist with these types of workflows, knowledge of basic programming can be extremely empowering. This full-day workshop will provide an introduction to Python, one of the most popular and versatile languages in use today. No prior programming experience is required! The workshop will be split into two parts: “Basic Python Programming” in the morning, and “Working with Data using Python” in the afternoon. Workshop materials are available at https://ucsdlib.github.io/workshops/posts/python/iassist/iassist-python/
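As a flavour of the afternoon material, here is a minimal sketch of the kind of data-wrangling task the workshop targets; it assumes the pandas library and an invented input file, neither of which is specified in the abstract (see the linked materials for the actual exercises).

```python
# A minimal sketch (not from the workshop materials): load a CSV with
# pandas, coerce a messy column, and summarise. The file and column
# names are invented for illustration.
import pandas as pd

df = pd.read_csv("survey_responses.csv")               # hypothetical input file
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # bad entries become NaN
df = df.dropna(subset=["age"])                         # drop rows with unusable ages
print(df.groupby("region")["age"].mean())              # mean age per region
```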
10,000 Steps a Day! A Journey in Data and GIS Literacy Using Non-traditional Data Sources, for the New Data Professional
Michelle Edwards (Cornell University)
Quin Shirk-Luckett (University of Guelph)
Teresa Lewitzky (University of Guelph)
The way that we look at and conceive of data has changed. Each of us is a walking data generator: online data is collected on our every page click and tweet, and our movements are tracked through our phones and at the places we visit. Literally millions of people are joining the data revolution to collect and analyse data on facets of ordinary life such as their house temperatures, health indicators, and daily step counts. "The data available are often unstructured - not organized in a database - and unwieldy, but there's a huge amount of signal in the noise, simply waiting to be released." (McAfee & Brynjolfsson, HBR, 2012). Join us for an interactive workshop where we will take advantage of this data trove to learn strategies used to clean data, run key statistical tests, and visualize the data using basic GIS techniques. Our goal is to show you the fundamentals of working with data so you gain the knowledge of strategies and approaches that will work with these unique types of datasets that may cross your desk. We are proposing to use SPSS and ESRI ArcGIS for the workshop, and will be prepared to discuss open source statistical and GIS software. By the end of this workshop you will be able to: prepare a dataset for analysis; import the data into SPSS and select and run basic statistical tests; and import the data into ArcGIS and prepare a map to visualize the data.
Text Processing with Regular Expressions
Harrison Dekker (University of California, Berkeley)
Regular expressions (regex) are a powerful and ubiquitous programming construct that facilitates a wide range of text manipulation procedures. Essentially, regular expressions provide a means of defining text patterns that can be used to perform text matching and modification operations without having to write a lot of code. Common uses include complex search-and-replace-style data cleaning operations and pattern-based data validation, such as detecting properly formatted telephone numbers, email addresses, or URLs. The most typical way of using a regular expression is through a function call in a programming language or directly on a command line, but they can also be used from within many text editors, such as Sublime, TextMate, or Notepad++. In this workshop you'll learn regular expression syntax and how to use it in R, Python, and on the command line. Participants will use a browser-based notebook, Jupyter, which enables literate computing by letting us experiment with the different flavors of regular expressions as implemented in these languages. The workshop will be example-driven and you will be encouraged to follow along with live-coding demonstrations and complete in-workshop challenges. You will work with real data and perform representative data cleaning and validation operations in multiple languages.
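As an illustration of the kind of matching and cleaning the workshop covers, here is a small Python sketch using the built-in re module; the email pattern is deliberately simplified and is not drawn from the workshop materials.

```python
# Illustrative only: pattern-based validation and search-and-replace
# cleaning with Python's re module. The pattern is a deliberately
# simple sketch, not a production-grade email validator.
import re

EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

addresses = ["ada@example.org", "not-an-email", "j.doe+lists@dept.uni.edu"]
for addr in addresses:
    print(addr, "->", bool(EMAIL.match(addr)))

# Search-and-replace-style cleaning: collapse runs of whitespace to one space.
messy = "too   many\t spaces"
print(re.sub(r"\s+", " ", messy))
```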
Using Stata for Data Work
James Ng (University of Notre Dame)
Stata is a leading statistical software package in the social sciences. Although not free, it has many of the hallmarks of open source software, such as a user-contributed repository of add-on modules, an active community of users, and numerous third-party online guides and tutorials. Stata arguably strikes the best balance between sophistication and usability among statistical software packages. This hands-on workshop will introduce participants to some of the ways Stata is used in empirical research in the social sciences. Participants will work through a series of exercises using data in commonly encountered formats. Many of the exercises will involve reproducing tables and graphs from scratch. Topics to be covered include reading data, cleaning data, manipulating data, combining data, and using the help system. Attention will be paid to reproducibility of results, which means that participants will be writing scripts in a do-file. Detailed notes will be provided to each participant for reference. This workshop's target audience is social science librarians and other data services professionals. By the end of the workshop, participants should have gained enough familiarity with Stata to be able to start using it independently and to provide more in-depth help to their patrons who use Stata. This is not a workshop in statistical methods, hence no knowledge of statistics is assumed. No knowledge of programming is required.
Automating Archive Policy Enforcement Using Dataverse and iRODS
Jonathan Crabtree (University of North Carolina, Chapel Hill)
Helen Tibbo (University of North Carolina, Chapel Hill)
The workshop will highlight the work of the Odum Institute as part of the DataNet Federation Consortium's effort to join the Odum Institute's archive platform with the Integrated Rule-Oriented Data System (iRODS). Participants will see how archive workflows within the Dataverse platform can be connected to iRODS and leverage the policy-based rule enforcement capabilities of iRODS. Participants will be able to create working Dataverse virtual archives that are integrated with the iRODS storage grid technology. The workshop will describe and utilize policy sets selected from the new ISO 16363 audit standards for trustworthy digital repositories. These policies are written into iRODS rules that can be machine-enforced. These data management and preservation rules will enforce and monitor a wide range of policies: number of preservation copies, checksum calculations, frequency of integrity checks, creation of preservation formats, verification of preservation formats, movement of digital objects through a secure firewall, scans for sensitive information to protect human subjects, reporting of preservation status, verification of geographically distributed copies, and enforcement and reporting of access control. Participants will see machine-actionable rules in practice and be introduced to an environment where written policies can be expressed in ways that allow an archive to automate their enforcement.
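iRODS rules are written in iRODS's own rule language, so the following is only a language-neutral Python sketch of what a machine-enforceable preservation policy looks like; the function names and the three-copy threshold are invented for illustration and are not iRODS syntax.

```python
# Hypothetical sketch of a machine-enforceable preservation policy,
# illustrating the kind of checks an iRODS rule automates. All names
# and thresholds are illustrative, not actual iRODS rule language.
import hashlib

MIN_COPIES = 3  # example policy: at least three preservation copies

def checksum(path):
    """SHA-256 of a file, for fixity comparison."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def enforce(replica_paths, recorded_checksum):
    """Return a list of policy violations for one digital object."""
    violations = []
    if len(replica_paths) < MIN_COPIES:
        violations.append(f"only {len(replica_paths)} copies (policy: {MIN_COPIES})")
    for path in replica_paths:
        if checksum(path) != recorded_checksum:
            violations.append(f"integrity check failed for {path}")
    return violations
```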
Digital Data Harmonization with QuickCharmStats Software
Kristi Winters (GESIS, Leibniz Institute for the Social Sciences)
QuickCharmStats 1.1 provides a digital solution to the problem of documenting how variables are harmonized. It is free and open-source software that facilitates organizing, documenting, and publishing data harmonization projects. We demonstrate how the CharmStats workflow collates metadata documentation, meets the scientific standards of transparency and replication, and encourages researchers to publish their harmonization work. Currently, those who contribute original data harmonization work to their discipline are not credited through citations. We review new peer review standards for harmonization documentation, a route to online publishing, and a referencing format for citing harmonization projects. Although CharmStats products are designed for social scientists who must harmonize abstract concepts, our adherence to the standards of the scientific method ensures our products can be used by researchers across the sciences.
Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis
Andrew Battista (New York University)
Stephen Balogh (New York University)
This workshop is a hands-on experience in creating GeoBlacklight metadata, a simplified schema for discovering geospatial data. In developing the GeoBlacklight project, Stanford University implemented a custom element set that is closely related to Dublin Core and is a redaction of much longer and more granular geospatial metadata standards, most notably ISO 19139 and FGDC. GeoBlacklight metadata is required to make the application work, and there are several ways to create records efficiently. Using a re-configured installation of Omeka, we will demonstrate how to capture, export, and store GeoBlacklight metadata. This tool can be leveraged to assist researchers in the submission of GIS data and the creation of geospatial metadata, and it can be used by librarians to generate records at the batch level as they develop collections. In this workshop we will: become familiar with the structure and function of GeoBlacklight metadata in order to create records effectively; learn to translate essential information about GIS files into the GeoBlacklight metadata schema in order to present geospatial data for discovery; and develop strategies for creating GeoBlacklight records in bulk and adding them to OpenGeoMetadata (or another shared repository structure). Materials for this workshop are available at http://tiny.cc/iassist2016
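For readers unfamiliar with the schema, here is an invented GeoBlacklight-style record expressed as a Python dict (GeoBlacklight metadata is typically stored as JSON); the field names follow the GeoBlacklight schema as we understand it, but the values and identifiers are hypothetical.

```python
# A minimal, invented GeoBlacklight-style record. Field names follow the
# GeoBlacklight schema; the dataset, slug, and coordinates are made up.
import json

record = {
    "dc_identifier_s": "example-nyc-boroughs-2016",
    "dc_title_s": "New York City Borough Boundaries (2016)",
    "dc_description_s": "Polygon boundaries of the five NYC boroughs.",
    "dc_rights_s": "Public",
    "dct_provenance_s": "NYU",
    "dc_format_s": "Shapefile",
    "layer_geom_type_s": "Polygon",
    "layer_slug_s": "nyu-example-nyc-boroughs-2016",
    "solr_geom": "ENVELOPE(-74.26, -73.70, 40.92, 40.49)",  # W, E, N, S
}
print(json.dumps(record, indent=2))
```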
Teaching Research Data Management Skills Using Resources and Scenarios Based on Real Data
Veerle Van den Eynden (UK Data Archive)
Jared Lyle (ICPSR)
Lynette Hoelter (ICPSR)
Brian Kleiner (FORS)
The need for researchers to enhance their research data management skills is currently high, in line with expectations for sharing and reuse of research data. Data librarians and data services specialists increasingly provide data management training to researchers. It is widely known that effective learning of skills is best achieved through active learning by making processes visible, through directly experiencing methods and through critical reflection on practice. The organisers of this workshop each apply these methods when teaching good data practices to academic audiences, making use of exercises, case studies and scenarios developed from real datasets. We will showcase recent examples of how we have developed existing qualitative and quantitative datasets into rich teaching resources and fun scenarios to teach research data management practices to doctoral students and advanced researchers; how we use these resources in hands-on training workshops and what our experiences are of what works and does not work. Participants will then actively develop ideas and data management exercises and scenarios from existing data collections, which they can then use in teaching research data management skills to researchers.
R is a powerful tool for statistical computing, but its base capabilities for graphics can be limited, and complicated plots often require a considerable amount of code. The ggplot2 package extends R's capabilities for data visualization, allowing users to produce attractive and complex graphics in a relatively simple way. This workshop will introduce the logic behind ggplot2 and give participants hands-on experience creating data visualizations with this package. This session will also introduce participants to related tools for creating interactive graphics from this syntax (such as plotly, plot.ly/feed). Prerequisites: Participants should be comfortable working with quantitative data and should have some basic familiarity with R, but do not need any experience with ggplot2. ggplot2 uses a slightly different syntax than base R plotting, so participants do not need to have experience using R for data visualization. This workshop will involve reading data into R and working in the RStudio environment. By the end of this workshop, participants will: understand the syntax and logic behind graphics in ggplot2; create a variety of visualizations and learn how to customize features of the graphs, such as color scales and labeling; and learn about extensions for more advanced graphics capabilities using ggplot2 and additional resources for learning more.
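The workshop itself uses R's ggplot2; purely to illustrate the grammar-of-graphics logic in the Python used by the other sketches in this programme, here is the same layered syntax via plotnine, a Python port of ggplot2, with an invented data frame.

```python
# Grammar-of-graphics sketch via plotnine (a Python port of ggplot2),
# shown only as an analogue of the workshop's R syntax. Data invented.
import pandas as pd
from plotnine import ggplot, aes, geom_point, labs

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "score":         [52, 58, 65, 70, 74, 81],
})

plot = (
    ggplot(df, aes(x="hours_studied", y="score"))  # map data to aesthetics
    + geom_point()                                 # add a layer of points
    + labs(x="Hours studied", y="Exam score")      # customize labels
)
plot.save("scores.png")  # or display the object in an interactive session
```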
2016-06-01: Plenaries
Plenary 1: Data for decision-makers: Old practice - new challenges
Gudmund Hernes (Fafo Institute; BI Norwegian School of Management)
Plenary 2: Embracing the 'Data revolution': Opportunities and challenges for research, or what you need to know about the data landscape to keep up to date
Matthew Woollard (UK Data Archive/ UK Data Service)
2016-06-01: 1A: Vintage data/Data rescue
Digitising 100 Years of Parliamentary Data - An Exercise in Producing a Living Digital Record of Political History
Samuel Spencer (Parliamentary Library/Commonwealth of Australia)
Data journalism is a growing field for the improvement of civic engagement in democracy, and the use of open data in political coverage has grown substantially in recent years. At the core of this is the ability to compare and contrast modern events in a historical context, and this requires accurate data to be centrally managed and easily accessible. Currently, historical information on Australian Parliaments has been available in the Parliamentary Library's flagship publication, the Parliamentary Handbook, an extensive almanac with biographies, tables, and records dating back to Australia's federation. This data is used as a way to track key social issues, such as length of service, gender representation in parliament, and historical election information, in an authoritative format. To improve access to this information, the library began development of a mobile app which evolved into a complete data management system for the recording and sharing of information. To complement this, the Parliamentary Library is developing an open-source data management system for managing parliamentary biographies and service histories based on Popolo, a civic data framework for the management and dissemination of parliamentary information. Along with interactive biographies and records of ministries and parties, the system for the first time allows users to build custom tables from complex queries that are dynamically updated as new information is made available. Coupled with this is the development of a biographical data management system that will ensure that records of new parliamentarians and future changes to existing parliamentarians are captured in a single system. In this presentation, we cover the challenges and successes in digitising over 100 years of parliamentary data, including migration, data cleansing, and data trust issues. We also provide a technical breakdown of the chosen framework and infrastructure, and issues encountered during development, especially when dealing with imprecise or incomplete historical records.
Vintage Is Just a Cooler Word for Old: Salvaging the SSLS/SYPS
Laine Ruus (University of Edinburgh)
The first Scottish School Leavers Survey (SSLS) was administered in 1962, and the survey was conducted, usually biennially, until 2005. Various principal investigators and funding agencies have been involved; the last PI, Dr. Linda Croxford, has retired and is cleaning out her office at the University of Edinburgh. Some surveys/waves have been deposited with the UKDA. Much of the documentation is paper only, and because of confidentiality and privacy concerns, access to the microdata files has been restricted. Consequently, the data from these surveys spanning over 30 years, in about 17-20 different files, with comparable questions but various levels of documentation, have been underutilized. Beginning in 2014, the Data Library began a salvage operation. The primary focus has been on those data not in the UKDA, primarily the 1977 through 1983 surveys. In order to maximize access, it was decided to employ an online interactive interface with good metadata display, a wide range of statistical analysis and recoding/computing options, and the best available variable-level confidentiality management capabilities, namely SDA. The processes undertaken to salvage these classic microdata files, ensure their long-term preservation, and enhance access to them, while respecting privacy and confidentiality, will be outlined.
Data Survival on a Seemingly Deserted Island?
A. Michelle Edwards (Cornell University)
Berenica Vejvoda (McGill University)
"Survival, to continue to live or exist, especially in spite of danger or hardship" Data formats and collection methods have changed dramatically over the years, leaving very valuable data behind in the dust. Historical Canadian Agricultural Census and Canadian First Nations surveys are only two examples from one country. The question that may come to mind is: Should all "forgotten or older data survive?" Should data librarians and data archivists invest valuable resources to rescue historical and older data? If we do, how do we evaluate and determine which data survives and which do not? How do we ensure that the surviving data matches today's standards for privacy and access? Should this be a priority? We can only imagine the benefits that adding older data can provide to new data creation and knowledge mobilization, creating valuable links between the past and today's data collections. In order to assist data survival, funding and resources are required. What funding opportunities exist today to help us provide support for surviving data? This paper will discuss the challenges that may be encountered while rescuing older data, provide different scenarios where surviving data could be used, and provide results of a survey of funding opportunities used to rescue data.
2016-06-01: 2D: Data sharing behavior
A Game Theoretic Analysis of Research Data Sharing
Tessa Pronk (Utrecht University)
While reusing research data has evident benefits for the scientific community as a whole, decisions to archive and share these data are primarily made by individual researchers. Is research data sharing to their advantage? To tackle this question, we built a model in which there is an explicit cost associated with sharing datasets, whereas reusing such sets implies a benefit. In our calculations, conflicting interests appear for researchers. Individual researchers are always better off not sharing and avoiding the costs of sharing, whereas at the same time both sharing and non-sharing researchers are better off if (almost) all researchers share: the more researchers share, the more benefit can be gained by the reuse of those datasets. Further simulation results point out that, although policy measures should be able to increase the proportion of researchers who share, and increased discoverability and dataset quality could partly compensate for costs, a better measure would be to directly lower the cost of sharing, or even turn it into a (citation) benefit. Making data available would in that case become the most profitable, and therefore stable, strategy. This means researchers would willingly make their datasets available, and arguably in the best possible way to enable reuse.
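To make the incentive structure concrete, here is a toy Python rendering of the dilemma described above; it is our own illustration with arbitrary cost and benefit values, not the authors' actual model.

```python
# Toy illustration (not the paper's model): sharing costs a fixed c,
# and every researcher gains b per shared dataset they can reuse.
def payoff(shares, n_sharers, c=1.0, b=0.1):
    reusable = n_sharers - (1 if shares else 0)  # you cannot reuse your own dataset
    return b * reusable - (c if shares else 0.0)

# Non-sharers always do better than sharers (the free-rider problem),
# yet everyone's payoff rises as more researchers share.
for n_sharers in (10, 50, 90):
    print(f"{n_sharers} sharers:",
          f"sharer payoff = {payoff(True, n_sharers):.1f},",
          f"non-sharer payoff = {payoff(False, n_sharers):.1f}")
```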
Data Sharing Behavior: the Sociology of Data Sharing
Alexia Katsanidou (GESIS, Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moltgen (GESIS, Leibniz Institute for the Social Sciences)
Some social science researchers are pioneers in sharing their data and promoting the replication and transparency in science movements. Others are still very protective of their data and prefer to keep it safe on their own computers. Previous work on data sharing focused on the relation between journal data policies and research data availability (Gherghina and Katsanidou 2013; Zenk-Moltgen and Lepthien 2014). A clear gap in the literature is the omission of analysis of individual researchers' intrinsic motivations for data sharing. Social psychology offers an analytical framework that allows us to investigate how personal beliefs can shape the intentions of individuals and how these intentions influence their behavior. Based on the theory of planned behavior by Ajzen and Fishbein, which emphasizes the impact of the peer group, this paper sets out to explain data sharing behavior by authors in political science and sociology journals. A set of authors of publications from pre-selected ISI-indexed journals have been surveyed. The aim is to explore the authors' personal beliefs, intentions, and behavior regarding sharing the data their analyses are based upon. By presenting first results of this survey, we hope to shed some light on a previously obscure component of data sharing behavior.
Data Sharing and Data Citation: Linking Past Practices and Future Intentions
Steven McEachern (Australian Data Archive)
Janet McDougall (Australian Data Archive)
The increasing interest within the archives and repository communities in data citation practices has occurred in parallel with interest in data sharing practices among depositors. The relationship between data sharing and data citation has not, however, been considered; this is the issue this paper seeks to address. Recent research by McDougall (2013), drawing on data from Tenopir et al. (2011), suggests that there is a clear link between previous secondary data use and data sharing intentions. Tenopir et al. (2011) also report that the key consideration among researchers in sharing their data is the appropriate citation of the data when their data are reused. This suggests that past data use and data citation practices may be important influences on future data sharing intentions. This paper therefore seeks to explore the relationship between data sharing and data citation in detail. The paper presents the results of a recent survey of Australian social science researchers, which explores three key areas of research practice - use of secondary data, citation of secondary data, and personal data sharing experience - and their relationship to both data citation and data sharing intentions.
McDougall, J. (2013) Sharing social science data: why do researchers share their data with others? Unpublished minor thesis, Masters of Social Research program, Australian Demographic and Social Research Institute. Canberra: Australian National University.
Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, et al. (2011) Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101
2016-06-01: 3G: Big data, big science
Data science: The future of social science?
Aidan Condron (UK Data Archive)
This talk will focus on developing a "big data" architecture for social science. The UK Data Service is currently engaged in a major project to develop "big data" architecture for social science, enabling social scientists to manage, analyse and produce knowledge from large and complex datasets, or combinations of datasets. The work involves scoping social scientists' data requirements, identifying useful datasets, and developing appropriate technological infrastructure and tools. While we are working on producing discipline-agnostic, generic systems and tools, our research and development has focused on proofs of concept using household energy consumption data, including data collected from smart meters throughout the UK. These datasets present great opportunities for exploring energy consumption in detail and, when linked to additional datasets, for understanding issues such as fuel poverty and household responsiveness to changing pricing structures or weather conditions with finer granularity than ever before. Experimentation has presented a host of challenges, not just in the technical domain, but also with regard to the ethics and legality of reusing data for new and novel purposes. The proposed talk will introduce our conceptual and technical work in developing a big data platform for social science, and outline preliminary findings from work using energy data.
Social Media Data in the Academic Environment: Two Institutions and One Big Provider
Stephanie Tulley (University of California, Santa Barbara)
Tim Dennis (University of California, San Diego)
Shari Laster (University of California, Santa Barbara)
Annelise Sklar (University of California, San Diego)
Social media data is a high-profile resource across academic disciplines, in areas as diverse as understanding voter behavior, tracking social communication networks, and identifying sources and effects of pollution on human health. While manual data collection and review from public social media sites can provide some insight into the content of these sources, bulk access to data is preferred for more complex and deeper analysis of the content. A certain amount of data can be accessed directly from some social media companies - whether through an API, screen-scraping, or legally questionable means - but the environment for access to the full "firehose" of social media data is rapidly changing, making social media research an expensive endeavor. This presentation will include an overview of the social media data landscape and the Crimson Hexagon product, a detailed discussion of the policy and access challenges specific to providing access to Crimson Hexagon, and an update on lessons learned and next steps for using this resource at our respective institutions.
Managing 'Big Data' in the Social Sciences: The Contribution of an Analytico-Synthetic Classification Scheme
Suzanne Barbalet (UK Data Archive)
Ben Newman Wright (UK Data Archive)
Rafal Kulakowski (UK Data Archive)
A "Big Data" platform is nascent for the UK Data Service. Our users will require assistance with on-the-fly data linking, extraction and integration, possibly with novel data sources. A suite of bespoke analytical software is in preparation. To complement these powerful tools further development of our Knowledge Organization Systems (KOS) will be required. The success of a recent pilot study to investigate an application of the Universal Decimal Classification System (UDC) to organise an expanded "topics" search of UK Data Service resources led us to consider broader applications in the linked open data environment. The flexibility of an analytico-synthetic scheme, such as UDC, provides granular, language-independent description of data at source that is in both machine-readable and human-readable form. In addition, the recent release of UDC Online (English) facilitates efficient application of the code. UDC has many applications but with an open vocabulary service as our future priority this application is the focus of this paper. Within a vocabulary service UDC will enable our users to negotiate international open data resources which build upon more than a decade of cooperation in developing KOS tools with our CESSDA European colleagues.
2016-06-01: S2: Don't hate the player, hate the game
Don't Hate the Player, Hate the Game: Strategies for Discussing and Communicating Data Services
Terrence Bennett (College of New Jersey)
Shawn Nicholson (Michigan State University)
Joel Herndon (Duke University)
Rob O'Reilly (Emory University)
Some studies of data management services within academic libraries focus on best practices for structuring data services; others consider tools and training needed to successfully offer these services. However, these developments may be underutilized, as the ways libraries talk about data-related research are not always in sync with how scholars think about their work. This panel considers how libraries might strategically reconsider communications about data services. Researchers' debates over the merits of data sharing mention funder mandates only in passing, if at all. This suggests that librarians' focus on mandates for data sharing will connect with only a subset of researchers' data needs. First, Herndon and O'Reilly discuss this difference between librarians and researchers, then suggest different ways that libraries might frame data management services, and consider additional data services that libraries might offer. Next, Bennett and Nicholson consider the premise that "bad information is processed more thoroughly than good" and integrate that premise into an exploration of the alignment of library-emanating data management communications with the data-related expectations of researchers in different academic domains. How could the notion that bad information resonates better be used to inform the ways that libraries approach the promotion of data services across disciplines?
2016-06-01: 1D: Data protection: Legal and ethical review
The Administrative Data Research Network's Citizen's Panel - A Step towards Bridging Public Concerns about Research Using Administrative Data
Judith Knight (Administrative Data Research Network)
The content of administrative records is both confidential and personal; the use of administrative data for research purposes is therefore rightly and naturally of concern to us all. Unless public concerns can be understood and met, and public confidence and support gained, it is highly likely that the role of research using administrative data cannot develop further. As a member of the general public you may well ask: how will this research help me? The Network enables researchers across the UK to gain access to linked de-identified administrative data to benefit society, i.e. research that could change health care systems, improve the distribution of funds to needier areas, or reduce crime. To extend the Network's reach, the Administrative Data Research Network (ADRN), in addition to a breadth of communications and public engagement activities across the UK, is developing a UK National Citizens Panel (CP). The panel will provide a representation of public views on potential changes to Network policy, procedures, governance, and service provision issues. The CP will also assist with testing our public-facing communications, e.g. events, website, and materials. Funded by the Economic and Social Research Council, the ADRN, set up as part of the UK Government's Big Data initiative, is a UK-wide partnership between universities, government bodies, national statistics authorities, and the wider research community. www.adrn.ac.uk
'Sorry, that doesn't seem to fit?': Using Traditional Ethical Review Processes to Screen Administrative Data Projects
Carlotta Greci (UK Data Archive)
The Administrative Data Research Network (ADRN) delivers a service to researchers, providing secure and lawful access to de-identified linked administrative data. Before an ADRN research project can be undertaken it must be approved by the ADRN Approvals Panel, which independently reviews all applications to use the Network. The Approvals Panel does not assess the ethics of a project, but has to ensure that an appropriate ethical review has been satisfactorily carried out and, ultimately, check that researchers are aware of the ethical implications of using these data. However, ethical review processes vary widely in practice, as do the ethical considerations which may arise from using and linking administrative data for secondary analysis. This presentation will outline the ADRN application process and the role of the Approvals Panel in relation to ethical review. We will also describe the main challenges and the initiatives that were put in place to solve some of the problems with the existing coverage of ethical review bodies, e.g. establishing a National Ethics Committee (NSDEC). The aim will be to expand the discussion towards a broader reflection on the ethical dilemmas that administrative data pose, concluding with the steps ADRN has adopted to address these difficulties.
Legal and Ethical Framework for Research in Europe
Katrine Segadal (NSD)
Vigdis Kvalheim (NSD)
The legal basis for the current data protection regime in the EU is the Data Protection Directive (95/46/EC) and the various implementations of this directive in the individual countries. The need for a consistent legal framework across Europe is one important reason why, in January 2012, the European Commission proposed a new General Data Protection Regulation (GDPR). A regulation is (in contrast to a directive) a binding legislative act and must be applied in its entirety across the EU. The completion of this reform is expected by the end of 2015. The GDPR will have a direct impact on the framework conditions for research, and the result of the ongoing reform process is therefore of great importance to the scientific communities of Europe. One central concern is whether the new regulation creates good, secure and predictable conditions for scientific research and research infrastructures. On the other hand, one of the aims of proposing a new legal framework on data protection was to harmonize legal practices across Europe, and thus to ease the transfer of personal data between countries. These aspects could be of great value for cross-national research. This paper will discuss how the new legislation affects data collection, data use, data preservation and data sharing: How will the regulation influence the possibilities for processing personal data for research purposes? How are personal data defined? What conditions apply to an informed consent? In which cases is it legal and ethical to conduct research without the consent of the data subjects? What are the conditions regarding preservation, transfer, and reuse of personal data?
2016-06-01: 2F: Building capacity for RDM across disciplines
Where Do We Start and Where Are We Going? Bringing Data Curation to the Federal Reserve
San Cannon (Federal Reserve Bank of Kansas City)
The development of data curation services in support of research at the Federal Reserve Bank of Kansas City is a new focus and a central activity for a new group at the Center for the Advancement of Data and Research in Economics (CADRE). This presentation will outline the foundational work done to bring a full suite of services to the research community, including the creation of new job families, education of HR and other support staff, development of strategic plans, socialization with senior management, development of new business processes, and evaluation of technology and applications. In just a year, CADRE has moved the Bank from zero data curation activities to a strategically developed and aligned staff of eight that provides support across the research lifecycle using several new processes and platforms. This presentation will outline the achievements, setbacks, opportunities, and lessons learned.
Experiences from An Interdisciplinary, Long-term Research Project: Research Data Management and Services for the CRC/TR32
Constanze Curdt (University of Cologne)
In recent years, the importance of research data management (RDM) has increased in many fields (e.g. social sciences, earth sciences) due to a growing amount of data. Thus, funding organizations such as the German Research Foundation (DFG), the European Commission, and the National Science Foundation (NSF) request data management plans within project proposals to ensure the adequate handling of publicly funded research data. In the context of collaborative, interdisciplinary research projects, proper data management and services should support, for example, accurate data storage, backup, and documentation. This facilitates data sharing within the project and re-use in future studies. In this contribution, we will present experiences gained from establishing RDM and related services for the DFG-funded interdisciplinary, long-term research project Collaborative Research Centre/Transregio 32 (CRC/TR32, www.tr32.de). Since 2007, CRC/TR32 scientists have focused their work on patterns in soil, vegetation, and atmosphere. In this context, the CRC/TR32 sub-project INF ('Information Infrastructure') is responsible for the management of all relevant research data, collected or created by the scientists, with the objective of enabling systematic and long-term use of this project data. In this framework several RDM services were established. This includes the establishment of the project data repository TR32DB (www.tr32db.de) according to the demands of the project participants and the DFG. The TR32DB supports common features such as data storage, backup, documentation, search, exchange, provision, and DOIs for selected datasets. Moreover, guidance and support for project participants on RDM is provided by the INF project. This also covers practical training of the project participants in the usage of the TR32DB data repository.
Translating the DMP Process for Researchers with Images: It's Not Your Grandpa's Slide Carousel Anymore
Paula Lackie (Carleton College)
Berenica Vejvoda (McGill University)
K. Jane Burpee (McGill University)
Academics are getting the hang of digital document management, but image management is often still practiced like an old shoebox of photos in the closet - only worse. Digital images have proliferated at an incomprehensible rate. Along with this stunning and rapid expansion, the lifespan of the same objects has shortened at a similar rate. Our role model for how to manage images was that shoebox or slide carousel, and they stayed put for decades! Adapt this basic strategy to the current era of omnipresent digital images and you have a recipe for the widespread loss of photo histories for the future. The old-school methods were messy but not impossible. Now the next generation is likely to acquire an unlabeled hard drive or worse, many unlabeled hard drives, zip disks, or flash drives... You know this scenario. Now think of it as applied to images in research projects. If we think of "Big Data" as any amount of data beyond what the owner can easily manage, then it is easy to see how collections of images have become a new layer in the "Big Data" conundrum. In this critical situation, we data services professionals have an opportunity. In this presentation we will translate DMP concepts into practical terms for image management in idiosyncratic research collections. Basic metadata for images is an easily transferable concept which may then be used to gain a foothold in other useful applications of DMP work.
2016-06-01: 3A: Opening up open data
Open data and citizen empowerment: Opening National Food Survey data
Sharon Bolton (UK Data Archive)
During 2016, the UK Data Service has been collaborating with a UK government department on an initiative to open National Food Survey data. What are the rewards and challenges of repurposing previously safeguarded data? This presentation will cover elements such as negotiation, re-licencing, privacy and disclosure review, and the upgrade of legacy data to improve the experience for users old and new.
OpenAIRE2020 is an Open Access (OA) infrastructure for research which supports open scholarly communication and access to the research output of European funded projects. With over five years' experience of supporting the European Commission's OA policies, OpenAIRE now has a key role in supporting the EC's Horizon 2020 Open Data Pilot. OpenAIRE's community network works to gather research outputs, highlight the OA mandate, and advance open access initiatives at national levels. It has National Open Access Desks in over 30 countries, and operates a European helpdesk system for all matters concerning open access, copyright, and repository interoperability. At the same time, OpenAIRE harvests metadata from a network of open access repositories, data repositories, aggregators, and OA journals. It then enriches this metadata by linking people, publications, datasets, projects, and funding streams. This interlinked information, which currently encompasses more than 13 million publications and 12 thousand datasets from more than 6 thousand data sources, helps optimise the research process, increasing research visibility, facilitating data sharing and reuse, and enabling the monitoring of research impact. This presentation will outline how an infrastructure like OpenAIRE can help turn OA policy into successful implementation.
101 cool things to do with open data - running an app challenge
Louise Corti (UK Data Archive)
In the summer of 2015 the UK Data Service and a small company, AppChallenge.net, collaborated to launch a developer contest using open data about the quality of life of European citizens. The project involved creating an open dataset certified as 'expert' by the UK's Open Data Institute, to be made available via a new test open API (Application Programming Interface). In this paper I will set out how we opened up and richly documented two years of the European Quality of Life Survey (EQLS) data carried out by Eurofound, through detailed disclosure review (using an SDC R tool) and harmonising variables across years. These data were made available via our new pilot public API, with weights added at the point of making a call. The project used crowdsourcing to generate innovative apps and services from developers who may not have otherwise discovered the UK Data Service. Developers from across the world took part in our EULife AppChallenge competition, with an 18-year-old Polish man winning the contest with his EULife Quizzes and scooping the largest cash prize. I'll share with you how we got this Challenge off the ground, some of the lessons we learned, and some of the great winning ideas. One lesson: don't assume that app developers will read any of your beautiful archive documentation - they won't - they just want rich, self-documenting data through a single API.
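For illustration, this is roughly what 'rich, self-documenting data through a single API' looks like from a developer's side; the endpoint, parameters, and response shape below are entirely hypothetical, not the UK Data Service's actual pilot API.

```python
# Hypothetical sketch of a single-call open-data API request; the URL,
# parameters, and JSON layout are invented for illustration only.
import requests

resp = requests.get(
    "https://api.example.ukdataservice.ac.uk/eqls",  # hypothetical endpoint
    params={"year": 2011, "variables": "Q42", "weighted": "true"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()  # self-documenting payload: values plus embedded metadata
print(data.get("metadata", {}).get("Q42"))
```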
2016-06-01: 3F: Technical data infrastructure frameworks
MMRepo - Storing qualitative and quantitative data into one big data repository
Ingo Barkow (HTW Chur)
In recent years the storage of qualitative data has been a challenge for data archives using repositories based on relational databases, as large files cannot be represented well in these structures. Most of the time, two or more systems have to be in place, e.g. a file server (with versioning) for large files and a relational database for the tabular information, which means handling multiple systems at the same time. With the arrival of Hadoop and other big data technologies there is now the possibility of storing qualitative and quantitative data as mixed-mode data in the same structures. This paper will discuss our findings in developing an early prototype version of MMRepo at HTW Chur. MMRepo is planned as a combination of the Invenio portal solution from CERN with a Hadoop 2.0 cluster, using the DDI 3.3 beta metadata scheme for data documentation.
The CESSDA Technical Framework - what is it and why is it needed?
John Shepherdson (UK Data Archive)
There is a requirement for a delivery capability to provide the compute power, storage and bandwidth required to host the various products and services that will be developed as components of the forthcoming CESSDA Research Infrastructure (CRI), which will make high quality European research data more readily available to researchers. Alongside this, the provision of a development and test environment with a common, shared toolchain will reap many benefits. The ambition of the Technical Framework is to promote good software development practice across the CESSDA member (aka "Service Provider") community, in respect of the delivery of software-based products and services for the CRI. The publication of architectural guidelines and basic standards for source code quality will ensure Service Providers know what is expected of them, whilst the shared development infrastructure will help them achieve the required standards without a lot of upfront cost and effort. That is to say, the goal is to lower the entry barriers for Service Providers. In summary, modern data curation techniques are rooted in sophisticated IT capabilities, and the CRI needs to have such capabilities at its core, in order to better serve its community. The CESSDA Technical Framework is a key enabler for this.
Archonnex at ICPSR - Data Science Management for All
Thomas Murphy (ICPSR - University of Michigan)
Harsha Ummerpillai (ICPSR - University of Michigan)
Archonnex is a Digital Asset Management System (DAMS) architecture defined to transition to a newer technology stack that meets the core and emerging business needs of the organization and the industry. It aims to build a digital technology platform that leverages ICPSR expertise and open source technologies that are proven and well supported by strong open source communities. This component-based design identifies re-usable, self-contained services as components. These components will be integrated and orchestrated using an Enterprise Service Bus and message broker to deliver complex business functions. All components start as a Minimum Viable Product (MVP) and are improved in iterative development phases. This presentation will identify the various operational components, and the associated technology counterparts, involved in running a data science repository. It will consider the process of upfront integration with the researcher to allow better-managed data collection, dissemination and management (see the SEAD poster proposal) during research, and follow the workflow process technologically from the ingestion of data into the repository through curation, archiving, publication and re-use of the research data, including citation and bibliography management along the way. The integration of data management plans and their impact on this workflow should become apparent with this ground-up architecture designed for the data science industry. Conference participants will leave with an understanding of how the Archonnex architecture at ICPSR is strengthening the data services offered to new researchers as well as data re-use, and how repository brokering may be leveraged.
2016-06-01: 1G: Data services: Setting up and evaluating
Maturity Model for Assessing Data Infrastructures - CESSDA as Example.
Marion Wittenberg (DANS)
Mike Priddy (DANS)
Trond Kvamme (NSD)
Maarten Hoogerwerf (DANS)
CESSDA, the consortium of European Social Science Data Archives, aims to provide an infrastructure that enables the research community to conduct high-quality research within the social sciences. Developing such an infrastructure requires all service providers to participate to the best of their ability: some partners have a long history, high ambitions, and solid funding, whereas other partners are in the process of setting up their archives, sometimes with limited funding. Rather than setting fixed requirements for each partner or service, CESSDA must both define the desired state and provide effective guidance for partners on how to improve their services gradually towards the minimal/desired state. Within the SaW project a maturity model will be developed which helps (aspiring) CESSDA members to assess their services and determine the gap(s) between the current and desired state for each individual partner. In this presentation we will show the model and explain how it could be used for assessments.
Roper@Cornell: The Complexities of Moving the World's Largest Archive of Public Opinion Data to Its New Home at Cornell University
William Block (Cornell University)
Tim Parsons (Roper Center)
Brett Powell (Roper Center)
In November of 2015 the Roper Center for Public Opinion Research moved from its home of 38 years, the University of Connecticut, to new digs at Cornell University. This presentation will discuss the complexities of moving an archive, especially the decision-making processes involved in such a move (paper records, administrative information, data producer agreements, etc.). Key decisions have to be driven by the preservation policy and by commitments to the membership and society.
Emerging data archives: Providing Data Services with Limited Resources
Aleksandra Bradic-Martinovic (Institute of Economic Sciences)
Marijana Glavica (University of Zagreb)
Vipavc Brvar (Slovenian Social Science Data Archive)
The establishment of data services is a long and challenging process. There are two possible approaches to establishing them: top-down and bottom-up. The bottom-up approach is more common and involves development of services within one institution, often funded by projects with limited resources and duration. During the initial phase, a potential service provider (SP) is able to gain necessary knowledge and experience, but after that the provider has to offer some services to its users. The problem is that a newly established SP is often not yet fully operational, and it has to be decided which services can and which cannot be offered to potential users. In this paper we will analyze different pathways for providing data services with limited resources. Our main focus will be on two cases of emerging data archives, one in Croatia and one in Serbia. We will offer a systematic review of data services and argue which of them could and should be provided. We will identify a minimum set of services and the way in which they must be delivered in order to build trust with users and to provide long-term preservation and availability of deposited data.
2016-06-01: 1I: Teaching data
A Proposed Scaffolding for Data Management Skills from Undergraduate Education through Post Graduate Training and Beyond
Megan Sapp Nelson (Purdue University)
Initial work in identifying data management or data information literacy skills went as far as identifying a list of proposed competencies without further differentiation between those competencies, whether by discipline, complexity, or use case. This presentation proposes an evolution in existing competencies by identifying a scaffolding built upon existing competencies that moves students progressively from undergraduate training through post graduate coursework and research to post-doctoral work and even into the early years of data stewardship. The scaffolding ties together existing research that has been completed in research data management skills and data information literacy with research into the outcomes that are desirable for individuals to present in data management at each of the levels of education. As a result of this presentation, competencies will be aligned according to application (personal, small group, large group) in such a way that the skills attained at the undergraduate level would give students moving on to graduate work greater familiarity with data management and therefore greater likelihood of success at the graduate and then post graduate and data steward levels.
Data Management... in Writing Studies? A Case Study of Collaboration and Outreach
Alicia Hofelich Mohr (University of Minnesota)
Alice Motes (University of Minnesota)
Graduate students, especially those who are beginning to learn the methods of their fields, are ideal targets for data management education, as they can integrate best practices into their developing research workflows. However, with the diversity of methods and research data being managed, it can be challenging to effectively reach students with a single workshop or series. At the University of Minnesota, we tried to customize our outreach to graduate students by targeting instructors of graduate methods courses. This was a collaborative effort between the Libraries and College of Liberal Arts (CLA), and we approached this outreach with a diverse team of support staff: a data curator with qualitative expertise, a data manager with quantitative expertise, and library liaisons from different areas. We successfully reached five courses within CLA, in the departments of statistics, journalism, communication, and writing studies. This presentation will discuss this effort, along with the surprisingly in-depth collaboration developed with a technical communication course in writing studies. It will also cover takeaways from this experience, such as the benefits of having both qualitative and quantitative viewpoints on a data management task, and how this experience will shape our approaches for providing future data management services to the humanities.
What Is Your 'Unit of Analysis' And, More Importantly, Why? New Tools And Methods for Teaching Undergraduate Social Science Students to Think about Data.
Parvaneh Abbaspour (Lewis & Clark College)
E. J. Carter (Lewis & Clark College)
The proliferation of online datasets has created myriad opportunities for undergraduate social science students to delve into complex, quantitative analysis. While students drawn to these courses are often math and statistics savvy and relatively adept at working with statistical programs, many still lack an understanding of data creation processes, such as why the data were collected, how the populations were delimited and sampled, and precisely how variables are defined and measured such that they might stand in for phenomena. Moreover, the ease of acquiring these datasets can contribute to the abstraction, and more crucially the assumptions, inherent in translating the complexity of human experience into numerical values. Common approaches to teaching undergraduate social science students to find data include referring them to the secondary literature, pointing them to data repositories, and walking them through a 'unit of analysis' worksheet. We argue that while such worksheets may help a student define the parameters of the data they are after, they reinforce the same abstraction inherent in the data dilemma to begin with. We present a range of tools developed this year to support data discovery, with the goal of reinforcing data literacy for undergraduate social science students while helping them find the resources they need. These tools include the data review and the determinant inventory. We describe how we adapted and integrated these tools into a revised data discovery worksheet emphasizing a more holistic conception of how data model real-world phenomena.
2016-06-01: 2B: Partners in research data management
The Erasmus Centre for Strategic Competitiveness Research (ECSCR) Data Centre
Paul Plaatsman (Erasmus Data Service Centre)
Recently the Rotterdam School of Management, the Erasmus Data Service Centre, and the Research Support Office joined forces to establish the Erasmus Centre for Strategic Competitiveness Research (ECSCR). The goal is to develop a data centre with mixed data about regional competition; data from commercial vendors like Bureau van Dijk will be merged with survey data from the Centre's own questionnaires. The data were first identified through Data Management Plans (DMPs): individual, subgroup, and for the whole group. Next, the data have been properly described with metadata, persistent identifiers and versioning, and of course securely stored. The data should remain available for analysis with e.g. Stata throughout the research data life cycle. Different users will have different access rights. The legal aspects of data ownership and privacy issues need to be addressed. From an IT perspective we need to investigate business, functional and technical requirements. Obviously we need several workflows for all the above-mentioned issues. The project aims to make Erasmus University Rotterdam's Research Data Management (RDM) policy more tangible. An official policy has been developed by a taskforce on scientific integrity, but implementation still needs to be done department by department. The deliverables of this project should become available for other departments as well, so generic solutions are our aim. During the presentation I will inform fellow IASSIST members about the present stage of this project.
Embedding Metadata in the Research Process - Archives as Partners in Data Production
Steven McEachern (Australian Data Archive)
Janet McDougall (Australian Data Archive)
Heather Leasor (Australian Data Archive)
In many data archives supporting academic research, the process of collecting metadata has traditionally been the responsibility of the data archive rather than the data producer. While data producers and researchers may produce automatically generated metadata as a by-product of their work - such as in statistical data files, questionnaires or project reports - any expectations of the manual creation of metadata (such as study metadata, in DDI terms) have traditionally been relatively low. However, with the growth in the volume of both research data and content from other sources, the workload demands on archives can only be expected to grow. As such, there is a significant need to reduce the processing and metadata production workload within archives. One means of achieving this is to improve the quality of the automatically generated metadata that is created by producers. This paper reports on two recent projects at the Australian Data Archive that aim to enable this improved production by shifting minor elements of metadata production earlier in the data lifecycle. ADA staff have been working with two data producers involved in significant national survey projects to implement minor changes to standards and practices within their data production processes. The paper will explore the changes in practices that have been proposed to the data producers, the changes in work practices within the producers, and the resulting impacts on the metadata quality of new content provided to the Archive.
Community Data Repositories Working with Libraries: Harvard Dataverse Use Case
Eleni Castro (Harvard University)
Since 2012 the Harvard Dataverse (https://dataverse.harvard.edu), powered by the Dataverse Project open source software and developed at Harvard's Institute for Quantitative Social Science (IQSS), has been collaborating with Harvard Library to provide a solution for sharing, publishing and archiving research data for faculty and affiliated researchers. This collaboration has helped to expand the scope of the Dataverse application to better support research data beyond just the social sciences, initially by adding metadata fields to help describe datasets from the biomedical (ISA-Tab) and astronomy (Virtual Observatory) communities, with the aim of eventually supporting more research communities such as the humanities. The Harvard Dataverse team has also extended its services to provide user support, training, and some data curation services to the Harvard community. This presentation will also highlight some current and upcoming collaborative projects, which include: connecting faculty publications with their underlying research data by integrating Dataverse with Harvard's institutional repository, Digital Access to Scholarship at Harvard (DASH); providing university-wide open data awareness and support via the Harvard Open Data Assistance Program (ODAP); helping researchers meet the requirements of funder-mandated data management plans through customized DMPTool services; and making faculty datasets more widely discoverable by exporting metadata (MARC) into the Harvard Library Catalog, HOLLIS.
2016-06-01: 3H: DDI applications for data access
Experimenting with DDI-L at the French Center of Socio-Political Data (CDSP)
Alexandre Mairot (CDSP - SciencePo)
Alina Danciu (CDSP - SciencePo)
The French Center of Socio-Political Data presented its reflections on the process of shifting from DDI-C to DDI-L at EDDI14. This year, we will discuss the creation and storage of a DDI-L compliant XML record by capturing metadata from a nine-wave political study of the ELIPSS panel. Determining how best to recognise continuities between metadata collections within the same study, including question continuity and methodological continuities, has been a primary challenge. To address it, the starting point was the creation of a questions database. As seen at the 2014 DDI workshop in Dagstuhl, the minimum requirements that a metadata system should meet before being able to import/export DDI-L are uniqueness of items, versioning and granularity. To build such a database, we had to start with simple tools. We first identified metadata in CSV files that include variable-level information. We then performed a semi-manual import from these files into the database using importing scripts. Once we had automatically removed the redundancy, with a further stage of human control, we generated the structure of the DDI-L compliant XML file. Our paper will present this process and discuss its replication for other DDI-C documented studies.
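As a rough illustration of the pipeline described above - variable-level metadata in CSV files, deduplication of repeated questions, then generation of a DDI-L-style XML structure - the following Python sketch shows the general shape of such an importing script. The column names, element names and deduplication rule are invented for illustration; the CDSP's actual scripts and database schema are not described in this abstract.

```python
# Illustrative sketch only: hypothetical CSV columns are name, label,
# and question_text; element names are DDI-L-flavoured, not exact DDI-L.
import csv
import xml.etree.ElementTree as ET

def csv_to_ddi_fragment(csv_path):
    root = ET.Element("VariableScheme")
    seen_questions = {}  # dedupe repeated question texts across waves
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            var = ET.SubElement(root, "Variable", id=row["name"])
            ET.SubElement(var, "Label").text = row["label"]
            # Reuse one question identifier when identical text recurs:
            # one simple way to record question continuity across waves.
            qtext = row["question_text"].strip()
            qid = seen_questions.setdefault(qtext, f"q{len(seen_questions) + 1}")
            ET.SubElement(var, "QuestionReference").text = qid
    return ET.tostring(root, encoding="unicode")

print(csv_to_ddi_fragment("wave1_variables.csv"))
```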
The UK Data Service Variable and Question Bank: Use Cases and Future Enhancements
Hersh Mann (UK Data Archive)
The Variable and Question Bank (VQB) is a search and browse interface that enables researchers to locate and retrieve information about variables and questions from a range of survey data collections held by the UK Data Service. Over a million variables from the most widely used surveys we hold are currently searchable. This tool also allows researchers to directly compare variables, to identify the same variable or question used across several surveys, and to detect questions that belong to a larger defined set. The VQB also enables researchers to easily view associated descriptive data and to see the variable in the wider context of the survey from which it is drawn. We have examined user experience of the tool to inform new enhancements, and are working with external partners in the UK Office for National Statistics (ONS) to promote use of the VQB, to improve the visibility of particular surveys, and to support efforts in harmonisation. A metadata enrichment project utilising the power of DDI 3.2 will augment the capabilities of the VQB to attribute provenance, identify variations between different versions of data collections, track changes over time, clearly map between questions and harmonised items, and create persistent identifiers.
The DASISH Questionnaire Design Documentation Tool: a tool for documenting questionnaire design under development
Hilde Orten (Norwegian Social Science Data Service (NSD))
Stig Norland (Norwegian Social Science Data Service (NSD))
Dag Ostgulen Heradstveit (Norwegian Social Science Data Service (NSD))
Knut Kagraff Skjak (Norwegian Social Science Data Service (NSD))
The DASISH Questionnaire Design Documentation Tool (QDDT) is a tool under development whose aim is to assist large-scale survey projects with their questionnaire design and development processes. Second, researchers and students can use the tool to explore metadata from existing projects, or to design new research. Interoperability with other systems and tools, most importantly the DASISH Question Variable Database and the Translation Management Tool, both currently under development, is another key aim. Work on the QDDT started during the Data Service Infrastructure for the Social Sciences and Humanities (DASISH) project and now continues under Synergies for Europe's Research Infrastructures in the Social Sciences (SERISS). The conceptual model for the tool is based on a subset of the DDI 3.2 specification. The tool is designed to integrate and communicate with other tools using an API. It is designed to be compatible with DDI, and DDI import and export will be implemented as add-ons to the QDDT. A set of modern technologies is used in the development of the tool. This presentation of the QDDT focuses on its conceptual model, system architecture and technologies, the functionality available in the prototype, and plans for further development.
2016-06-02: 1C: Data appraisal/selection
Data-Seeking Behavior of Economics Graduate Students: If you buy it, will they come?
Eimmy Solis (New York University)
Current data needs in the field of Economics are largely met through readily available open sources from government agencies, international organizations, and non-profit organizations like the National Bureau of Economic Research that freely provide the full text of working papers and data. Proprietary data are also important for research in this area, but are often only available through library-licensed databases. When novice economics graduate students independently seek data, where do they look and why? Are library-licensed data sources being used in addition to widely known free web resources? Through a series of focus group interviews, I am investigating the strategies used to find information online by graduate and PhD students in Economics degree programs at New York University. The study seeks to understand what types of information and resources students use to conduct their research, in order to evaluate current library-licensed and publicly available resources related to Economics. The study will assess the quality of the data found and identify data trends and innovative search strategies used by students. The results will enhance the library's outreach and teaching strategies to improve students' skills in finding reliable data, and will lead to data-driven collection development that is more closely tied to the information-seeking behavior of economics students.
You are the potter, data sets are the clay: Shaping a collection of small data sets
Karen Hogenboom (University of Illinois at Urbana-Champaign)
Michele Matz Hayslett (University of North Carolina, Chapel Hill)
Library vendors are starting to compile data into searchable databases, but these will never be complete, and many librarians are purchasing individual datasets. Among the first issues librarians need to address when starting a collection of individual datasets are the scope of the collection and whether a formal collection development statement is necessary to guide its development. Collecting data raises issues that are not present when collecting other types of library materials, so a general template for a collection development statement is not as helpful for data as it is for books or even other kinds of electronic resources. The presenters surveyed and interviewed North American academic data librarians about their data collection practices, and this presentation will describe what the study revealed about the benefits of writing a collection development statement, the issues addressed in data collection development statements, and what librarians use to guide their purchase and retention decisions when they do not have one. Attendees will learn about both the general shape of collection development statements for data and some specific points that can be included, so that they can plan their own collection development policies efficiently and thoroughly.
A Data-Driven Approach to Selecting and Curating Content at a Domain Repository
Justin Noble (Inter-university Consortium for Political and Social Research)
Amy Pienta (Inter-university Consortium for Political and Social Research)
The volume of scientific research data being produced is expanding at a rapid rate in the social sciences. We propose to use administrative repository data to guide selection and appraisal practices, to ensure that curation resources are used effectively to make the most valuable content findable, understandable, accessible, and usable now and in the future. ICPSR captures information about search behavior to guide what content to add to the repository, and also analyzes historic information about data usage to ensure that the data likely to get the widest use are curated to the highest level. This paper will share these two analytic models and their results. Considerations such as data collection methodology, currency of topic, and the breadth and quality of the data surface as key attributes that influence the desirability of data collections. By analyzing a decade of data use patterns, we also present information about attributes that predict longer-term use of data. Finally, we discuss how these data-driven models can be linked to repository practices and policies.
Your Data Wish is Granted: Establishing a Library Data Grants Program at the University of Michigan
Mara Blake (University of Michigan)
In the fall of 2015 the University of Michigan began the Library Data Grants Program as a two-year pilot project. Adding data to the library's collection can prove difficult because of high costs and complex licensing, and those challenges can make the timeframe for acquiring datasets long, frustrating many researchers who request data from the library. The library created this program in an attempt to streamline requests for data and clearly communicate a timeline to requesters. An additional aim of the program is to create closer, positive relationships with our community of data users on campus. The program received applications from researchers for the library to acquire datasets required for their research projects. The Library Data Grants Committee assessed the proposals based on the ability to purchase or license the data, the merit of the research project, cost, and the expected use of the data in order to make awards. The presentation will provide an assessment of the program after the first cycle of applications and awards, and outline the future direction of the project and of broader data collections at the University of Michigan Library.
2016-06-02: 2C: Promoting research data sharing
Incentivize Replication in Economics - Can Data Journals Help?
Ralf Toepfer (ZBW Leibniz Information Centre for Economics)
Though replications and reproducible research are touchstones of the scientific method, up to now there have been few published replications in the pages of economics journals. Even in cases where replication attempts fail to reproduce the results of the original research paper, economists do not seem particularly interested in such replications. The main reason is that replications do not lead to academic prestige. However, awareness among researchers that empirically based research often stands on shaky ground has increased in recent years - not only in economics but also in sociology and psychology. The publication of positive and negative replication attempts can help regain public trust and credibility in empirical economic research. Against this background my talk will discuss how a data journal could incentivize replications in economics. I will present studies that describe the outcomes of replication attempts and discuss the meaning of failed replications in economics. As a possible way forward I will present the idea of establishing data journals to give economists a stronger incentive to conduct replication studies.
Improving Research Data Sharing by Addressing Different Scholarly Target Groups: Individual Researchers, Academic Institutions, Or Scientific Journals
Monika Linne (GESIS - Leibniz Institute for the Social Sciences)
It is gratifying that in recent years a growing awareness of the necessity of research data preservation has led to the implementation of various data sharing repositories in the field of the social and economic sciences. However, some of these repositories address only a specific user group, since issues such as use rights, data access, workflows, data review processes, or metadata schemas are oriented to the needs of a particular target group. Aware of this, GESIS is committed to the continuous development of its data sharing services, aiming at different user groups such as individual researchers, academic institutions, and scientific journals. These groups place diverse demands on a data sharing tool, which have to be taken into consideration. GESIS therefore provides data sharing tools with functionalities adjusted to particular user groups - for instance, individual researchers without an institutional affiliation, or authors who want to publish the datasets underlying results published in academic journals. Additionally, GESIS, in collaboration with the Social Science Centre Berlin, the German Institute for Economic Research, and the German National Library of Economics, has started the development of SowiDataNet. This tool concentrates on the specific demands of data collected in academic institutions. The overarching objective is the implementation of a national data infrastructure for decentralized research data from the social and economic sciences in Germany. SowiDataNet will also include functionality for the organizational management of internal data that is not (yet!) intended to be published. The barriers and concerns about data sharing can only be overcome if repositories respond to the specific requirements of different user groups, and the benefits and possibilities of data sharing can only be exploited if the tool is equipped to render them accessible.
The Role of Case Studies in Effective Data Sharing, Reuse and Impact
Rebecca Parsons (UK Data Service)
Scott Summers (UK Data Service)
The effectiveness and impact of social science research is under constant review. From the sharing, reuse and archiving of social science research data to the outcomes, reach and impact of research, social science professionals are under increasing pressure to realise the maximum potential of their data collections and their research findings. The UK Data Service is playing a key role in supporting researchers in this process and is using detailed and well-received case studies to provide guidance on best practice for sharing and reusing data, and for identifying and capturing the impact of research. Impact is now routinely considered when judging the 'success' of funded projects, but identifying and capturing it can be a challenge. Publishing data in its own right is now recognised as impactful by funders, yet exposing this narrative through 'showcasing' is under-exploited. Such narratives can incentivise others to share data and can also improve the quality of data and documentation. This paper explores the role that case studies of research can play in this regard, by addressing two separate but intertwined questions. First, it considers how to shape and position a powerful case study, identifying common challenges encountered when sharing data and when capturing impact. Second, it examines the role that depositor and user case studies can play in enhancing the reuse of a showcased data collection. To this end, a variety of illustrative depositor and impact case studies are discussed, highlighting the influence these can have on research projects. The paper concludes with some tentative conclusions on how the UK Data Service can continue to develop the role of case studies in its work and assist our users throughout the lifecycle of their own projects.
How to Convince Researchers of the Usefulness of Data Archiving - The Data Archive in Finland (FSD) as a Case Study
Annaleena Okuloff (Finnish Social Science Data Archive, University of Tampere)
Katja Falt (Finnish Social Science Data Archive, University of Tampere)
Data archiving and reuse are not common practice in the humanities and health sciences, and researchers in these disciplines can be hesitant to deposit their research data for archiving and reuse. Data repositories and archives spend a lot of time promoting their services to academic institutions and researchers in order to change attitudes towards open data; this outreach aims to alleviate the concerns researchers may have about archiving research data. We present a case study of the Data Archive in Finland (FSD), which is broadening its services to the health sciences and humanities. Data sharing practices in these disciplines have not been established, so it is vital to introduce the benefits of open research data to them. To chart researchers' attitudes towards and knowledge about data archiving and reuse, a web-based survey was conducted among researchers in the health sciences and humanities. Based on the answers provided in the survey, it has been possible to identify researchers' concerns and to draw up efficient ways to promote data archiving. We present some of the themes emerging from the survey, connected to data archiving, as well as strategies used to approach researchers.
2016-06-02: 2G: Research data management infrastructure and service models
Developing Human Infrastructure to Support Research Data Management Services
Christie Peters (University of Kentucky)
In an effort to develop data management expertise within the library, a team of librarians with expertise in various aspects of research data management at the University of Kentucky (UK) Libraries established a semester-long training program aimed at retooling library faculty in the area of research data management. The project team distributed a survey beforehand to gauge perceived knowledge of and comfort with various aspects of research data management, related training needs, and opinions about the level of support needed for data management services on campus. The initial four-day workshop, which included ten guest speakers from across campus, used hands-on activities to help participants process the information conveyed. The workshop was followed by monthly brown bag sessions on related topics, a semester-long badging program providing additional resources, motivation and recognition for the various skills achieved, and a concluding celebration at the end of the semester with an external speaker. This presentation outlines the results of the pre- and post-workshop surveys, an overview of the training program, feedback from the program's first cohort, and lessons learned throughout the program.
Supporting the Development of a National Research Data Discovery Service - a Pilot Project
Stuart Macdonald (University of Edinburgh)
The Jisc-funded UKRDDS Project aims to develop a national Research Data Discovery Service to allow discovery of research data held in institutions across the UK. The University of Edinburgh is one of the pilot institutions funded to support the development of the service by harvesting metadata records for datasets generated by local researchers as part of the research process. The University of Edinburgh Research Data Management (RDM) Roadmap is a major Information Services-led project to provide a comprehensive RDM storage service. One of the main objectives of the RDM Roadmap falls under the category of 'Data Stewardship', namely tools and services to aid in the description, deposit, and continuity of access to completed research data outputs. This reflects one of the EPSRC funding body's key research data expectations and is a University RDM policy requirement. Currently two RDM services are available to University of Edinburgh researchers to address data stewardship: PURE, the University's Current Research Information System, where descriptive metadata about datasets can be added along with files, persistent identifiers and links to related research outputs or projects; and Edinburgh DataShare, a free-at-point-of-use open access data repository which allows University researchers to upload, share, and license their data resources for online discovery and reuse by others. This paper will outline the prospective service and the pilot project. It will detail the use of PhD interns to support the work of Information Services through engagement with the university research community, in order to help identify and describe data assets for ingest into both PURE and DataShare, and to validate and quality-control metadata records for harvesting by UKRDDS.
Modularizing Archive Services in the Social Sciences
Oliver Watteler (GESIS, Leibniz Institute for the Social Sciences)
Social science projects and public institutions produce increasing amounts of data to answer research questions. Although a growing number of these data sources are made available, some data producers are not looking for the 'classical' full archival service, but only for particular services such as long-term preservation (e.g. in institutional repositories). This development was foreseeable and poses challenges to the organizational structures of existing data service providers. To address these challenges, the Data Archive of GESIS, the Leibniz Institute for the Social Sciences, will modularize its service portfolio. We will move away from offering services only as fixed 'bundles' (e.g. documentation, long-term preservation, registration, and distribution or onsite access for all) and aim instead at offers more accurately customized to depositors. By modularizing our services and introducing improved workflow management we hope to make archiving and data services more efficient and more beneficial for the scientific community: GESIS will deliver more timely services, and research projects will see reduced costs in time and effort (e.g. in data preparation). Other specialized data service providers might profit from hearing of our experience. This presentation reviews the first phase of the services restructuring, in which we are setting up new workflows and developing management capacity. Its second aim is to open an exchange with other institutions that have undergone structural changes or are planning to do so.
Research Data Management Tools and Workflows: a Report from the Front
Cristina Ribeiro (University of Porto)
Joao Rocha da Silva (University of Porto)
Joao Aguiar Castro (University of Porto)
Ricardo Carvalho Amorim (University of Porto)
Joao Correia Lopes (University of Porto)
Gabriel David (University of Porto)
Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data on the long tail, the main issues are still how to make data visible, satisfactorily described, preserved, and searchable. Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with reasonable effort. For small groups with interesting datasets but little time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts. The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, a research data management platform based on open-source tools, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.
2016-06-02: 3D: Data curation: Active phase data management
Sharing code and research data with iPython notebooks
Sandra Schwab (University of Alberta)
Code is hard to share. It is often relegated to a citation or mentioned in an analysis, but it is rarely published with the results of the research it has informed. While making code and data available for scrutiny is a growing requirement of research funding, sharing code and raw data acknowledges the necessity of openness in research data management and scholarly communication. Librarians are at the forefront of the open data movement, and are uniquely positioned to help researchers find tools and resources that make the work of research easier to do and to disseminate. This presentation introduces iPython Notebook as a tool for sharing code-based research. iPython Notebook is an interactive web-based application that combines live code, data visualizations, and rich text and media. Using my own thesis work as an example, I will show how iPython notebooks can be used for text mining and data visualization on a 58-million-word, multi-file corpus. Using the coding language Python, I've conducted a text analysis of the transcripts of the Canadian Parliamentary debates (known as Hansard) to determine how the discourse of privacy has developed over a 10-year period, and I've discovered changes over time in the patterns and frequencies of words used in the discourse around privacy. I have shared my code and my corpus data on GitHub for download and use by others. iPython Notebooks not only provide a platform for code to be openly published; they also allow other researchers to conduct their own investigations on the data. The structure of the notebooks makes them accessible to non-coders, and their multimedia format provides a linear space for explanation and analysis. For research data to be truly open we must openly share and explain our code. iPython Notebook is an important tool that exemplifies the open data movement.
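To make the kind of analysis described concrete, here is a minimal Python sketch of counting term frequencies per year across a multi-file corpus, the sort of computation a notebook cell like those described might contain. The file layout and term list are hypothetical; the author's actual notebooks and corpus are the ones shared on GitHub.

```python
# Minimal sketch: per-year frequencies of privacy-related terms across
# a corpus of plain-text files named like "hansard_1998.txt" (invented).
import re
from collections import Counter
from pathlib import Path

TERMS = {"privacy", "surveillance", "data"}  # hypothetical terms of interest

counts_by_year = {}
for path in sorted(Path("corpus").glob("hansard_*.txt")):
    year = path.stem.split("_")[1]
    words = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
    counts_by_year[year] = Counter(w for w in words if w in TERMS)

for year, freqs in counts_by_year.items():
    print(year, dict(freqs))
```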
Electronic lab notebooks: A solution for active-phase data management
Katherine McNeill (MIT)
Many universities are looking to enable researchers to store data effectively during the active phase of research. Solutions need to provide not only storage, but also features for organization and metadata, collaborative work, and version control. One option is Electronic Lab Notebook (ELN) software, a platform used to collect and house experimental procedures, protocols, and data produced in scientific experiments. This presentation will discuss a recent project at one university to initiate a campus-wide rollout and support of a general-purpose ELN, LabArchives. The project is a collaboration between several university departments: the Libraries, central IT, and the Office of the Vice President for Research. The presentation will address questions such as: How effective are ELNs for active-phase data management? How might ELNs be used within a suite of tools for active-phase data management? How can staff support adoption and use of ELNs? How can campus departments best collaborate to support use of ELNs? How can this work meet the diverse interests of stakeholders? What can rolling out an ELN teach us about how researchers manage their data day-to-day?
The UK Data Service's Unified User Interface - a framework to provide a consistent user experience for data and metadata management
Ashley Fox (UK Data Archive)
As technology platforms continually evolve to support developing trends such as big data, the need for responsive user-facing applications is increasing. This presentation aims to demonstrate how new frameworks like AngularJS can be utilised to build powerful single-page web applications. We will look at how the UK Data Service is developing a unified interface to manage the deposit, ingest and access workflow, along with study metadata for collections and series records, which powers our UK Data Service Discover search engine. Increasing use of technologies such as Chart.js and D3.js allows us to provide data visualisation tools which enrich how users find and interact with our data. In the future, this technology will expand to power the new UK Data Service user account area, where users can manage their deposits and order new data easily. We will explore the possibilities of open-sourcing our unified interface framework, allowing other organisations to provide an integrated experience for their users. For instance, it could be used to provide the CESSDA Research Infrastructure with a common look and feel for the various components delivered by the Service Providers.
Annotation for Transparent Inference (ATI): Selecting a platform for qualitative research based on individual sources
Colin Elman (Syracuse University)
Nicholas Weber (University of Washington)
Diana Kapiszewski (Georgetown University)
Sebastian Karcher (Syracuse University)
Dessislava Kirilova (Syracuse University)
Carole Palmer (University of Washington)
Social scientists working in rule-bound and evidence-based traditions need to show how they know what they know. The less visible the process that produced a conclusion, the weaker the grounds for evaluating that conclusion; a sufficiently diminished view of the process undermines the claim. What an author needs to do to fulfill this transparency obligation differs depending on the nature of the work, the data that were used, and the analyses that were undertaken. For a scholar arriving at a conclusion by using a statistical software package to analyze a quantitative dataset, making the claim transparent would include providing the dataset and software commands. Research transparency is a much newer proposition for qualitative social science, especially where granular data are generated from individual sources and analyzed individually or in small groups. Because the data are not used holistically as a dataset, new ways have to be developed to associate claims with the granular data and their analysis. The Qualitative Data Repository has been working on annotation for transparent inference (ATI) for some time and has made considerable progress, particularly in specifying what information needs to be surfaced for readers to be able to understand and evaluate published claims. With these requirements in mind, this paper will develop a list of functional specifications and a set of criteria for choosing an annotation standard to use as the basis for ATI.
2016-06-02: 1S1: Embracing databrarianship: Professional opportunities and challenges
Embracing Databrarianship: Professional Opportunities and Challenges
Lynda Kellam (University of North Carolina, Greensboro)
Kristi Thompson (University of Windsor)
Hailey Mooney (University of Michigan)
Amber Leahey (Scholars Portal)
Joel Herndon (Duke University)
Rob O'Reilly (Emory University)
Walter Giesbrecht (York University)
This session will explore a variety of issues and responsibilities within data librarianship, drawing on an international community of experts behind a forthcoming book on the theory and practice of the profession. As the number of data librarian positions expands, we bring to light the fundamental opportunities and challenges shaping the field. The session will begin with critical reflection on and framing of the current state of data librarianship, followed by a series of snapshots showcasing a mix of case studies and theoretical explorations. First, Mooney will share the essential history and background of data's place in the scholarly communication environment in order to ground data librarians within the larger context shaping their work. Herndon and O'Reilly will complement this more theoretical approach with their comparative study of journal replication policies across various social science disciplines. Next, Leahey and Fry will discuss suggestions for metadata best practices. The final presentation will focus on future opportunities for data librarianship, with Giesbrecht considering directions in teaching the next generation of data librarians. In closing, the editors of the forthcoming book will highlight common themes and concerns across the various chapters, with the goal of prompting a broader community discussion about the future contours of the field of databrarianship.
2016-06-02: 1B: Data collection challenges
Collecting Community Experiences of Conflict
Celia Russell (JISC)
How do UN peacekeeping missions gather information about local experiences of conflict? What are the effects of personal networks, tensions with national security services and trust in the credibility of the mission? How do these factors influence the reliability and the accuracy of the data produced? Peacekeeping missions gather narratives of security incidents and human rights abuses from local populations in order to monitor the security environment and inform strategic decision-making. The documented incidents create a historical record of the security situation and often constitute one of the few continuous information resources on the conflict. However, relatively little is known about the practices by which this information is amassed, verified and processed. In this key informant study, we talked to five former field officers from the UN mission in Darfur to better understand the methods of incident data collection, evaluation and verification. In particular, we looked at the extent to which the ideals as outlined in formal training and guidelines diverge from the experience of data collection on the ground in Darfur.
This presentation reflects the experience of the Czech Social Science Data Archive (CSDA), based on collaboration with historians from the Institute for the Study of Totalitarian Regimes, an institution documenting the period of communist rule in Czechia. Continuing work consists of the preparation, archiving and publishing of datasets based on data from 'classic' archives.
The Happiness Analyzer: A Proposed Solution to the Challenges of Measuring Well-Being in Developing Countries
Paula Lackie (Carleton College)
Kai Ludwigs (Happiness Research Organisation)
Faress Bhuiyan (Carleton College)
Measuring well-being in developing countries is still a work in progress. There are numerous issues at all stages of the process, but we are developing new mechanisms to cope with them. This presentation will describe the progression of a 2013 census of relative well-being among villagers in Bangladesh from a low-budget, high-labor approach into the next iteration of this research: a smartphone app based on the survey tool Happiness Analyzer. The new tool is designed to track the interviewer's location, help ensure accuracy in inputting information, and inform interviewers whom they should interview, when and where. The tool works offline on PC, tablet and smartphone with high data security and ethical standards, and it allows for the collection of high-quality well-being data in developing countries in a comparatively efficient way. This presentation will also describe how basic components of the data life cycle for this project (a GIS, the database process, its metadata scheme, and security protocols) grew or differ from the first iteration (which used low-power smartpens and paper for data capture) to the second (which will use the new app).
2016-06-02: 2H: Research replication promotion and service development
Reuse of Research Data - Re-Writing the Economic History of Denmark Using Research Data
Steen Andersen (Danish Data Archive)
Reproducible Research: A Replication Server for the Social Sciences
Natascha Schumann (GESIS, Leibniz Institute for the Social Sciences)
Openness is an important aspect of good scientific practice. In this context, data sharing can be seen as a trust-building mechanism: making underlying data findable and accessible supports the reproducibility and confirmability of research results published in journal articles. The presentation gives an overview of the 'Replication Server' project, an initiative by two leading German sociology journals and GESIS to foster reproducibility in the social sciences. As part of this project, both of the involved journals, 'Zeitschrift fur Soziologie' and 'Soziale Welt', developed data policies. Authors who submit articles based on research data have to agree to make their data available to the community if the article is published. In December 2015 a corresponding service was introduced to support the journals in implementing their data policies in practice. datorium is an existing GESIS service which provides a user-friendly tool for the documentation, upload and publication of social science research data. Researchers describe their data in a standardized manner; incoming data are checked with regard to data privacy, coherence and completeness; and all datasets receive a persistent identifier (DOI). For the purposes of the cooperation with the journals, datorium has been extended with additional features. These make it possible to link all datasets to the corresponding articles and vice versa, so that users can easily recognise datasets as belonging to an article from the respective journal. Data are accessible via the datorium webpage, and access conditions are defined in accordance with the policies of the respective journals. The initiative started with the two journals mentioned but is also open to further partners.
Journals in Economic Sciences: Paying Lip Service to Reproducible Research?
Sven Vlaeminck (ZBW Leibniz Information Centre for Economics)
Felix Podkrajac (Oldenburg University)
The talk focusses on the efforts of journals in economics and business studies to foster research integrity. We report the findings of an empirical study in which a sample of 346 journals in economics and business studies was examined. One aim of our study was to determine whether these journals support reproducible research by implementing data policies and data archives. Another aim was to analyse the specifications of these data policies and to determine potential differences and commonalities between the two branches of economic research. In addition, the talk presents the outcome of an evaluation of journals' data archives. In this second study, two issues of each journal equipped with a data policy were checked for accompanying research data. With this additional analysis we aim to estimate whether journals with a data policy really enforce data availability. Based on these studies, the talk provides an overview of recent developments in journals' research data management in economic research. The results indicate that journals in economics, especially, are in a state of flux.
Roles for the Data Services Community in Promoting Openness and Integrity in Social Science Research
Harrison Dekker (University of California, Berkeley)
The goal of this panel is to raise awareness of the nascent movement in economics, political science, sociology, and related disciplines to improve the standards of openness and integrity in research. Certain aspects of this movement, namely those involving data sharing, will be familiar to the IASSIST community. But the movement is also promoting a range of other practices such as study registries, pre-analysis plans, version control, disclosure standards, and replications that may be less well known in IASSIST. Given the significant overlaps between our communities and the potential for a mutually beneficial ongoing relationship, it's an ideal time to begin a conversation. This panel will attempt to do so by bringing together data professionals and academic practitioners who are committed to advancing ethical changes in social science research.
2016-06-02: 3C: Metadata driven systems
Establishing an integrated data sharing process for micro- and metadata at Deutsche Bundesbank
Anja Treffs (Deutsche Bundesbank)
Meike Becker (Deutsche Bundesbank)
Deutsche Bundesbank has a legal mandate to collect monetary, financial and external sector statistical data. These are processed into comprehensive sets of indicators and seasonally adjusted aggregated business statistics. Since the financial crisis, microdata have become increasingly important for evidence-based research, politics and decision making. Establishing a data sharing culture throughout the Bundesbank is therefore crucial. This challenge is addressed by an Integrated Microdata-based Information and Analysis System (IMIDIAS). IMIDIAS is based on the existing statistical data warehouse infrastructure, in which the aggregated data are successfully stored in SDMX. For the common metadata model, SDMX is complemented by DDI in order to address additional needs formulated by researchers and analysts. These combined standards offer ideal means of linking, comparing and consolidating microdata and metadata, which represents a great advantage for matching, comparing, consolidating and analysing the data treasures of Deutsche Bundesbank. This talk will focus on the technical infrastructure of the data and metadata integration process within IMIDIAS, as well as the organisational integration with the newly established Research Data and Service Centre, in order to provide internal and external researchers with access to microdata.
Andreas Franken (German Centre for Research on Higher Education and Science Studies)
Alexandre Mairot (SciencePo)
Michelle Edwards (Cornell University)
During the 2015 DDI training workshop "Facilitating Process and Metadata-Driven Automation in the Social, Economic, and Behavioral Sciences with DDI" at Dagstuhl, participants took part in many discussions of use cases related to their daily work. One of these discussions focused on discovery and data exploration systems for data described by DDI. The use case was as follows: discovery systems use varying sets of metadata for search and display depending on the purpose of the search system and the community that it serves. This use case discusses the metadata objects needed to support different services and the implications for DDI metadata content. Several perspectives were discussed in terms of audience, level of searching, display requirements, and linkages between data objects. The result of this discussion was a description of decision points for the selection of search fields, locations where structured metadata would be useful, and the implications of design decisions for the construction of the DDI metadata. This paper will outline the results of the discussion, with a focus on how the metadata informs and defines the potential capabilities of the search system and, conversely, how the functionality of the desired search design informs how the metadata should be structured. The discussion resulted in recommendations on how to specify search and exploration requirements to programmers in order to achieve the desired results.
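By way of illustration, a decision of the kind the paper describes - which DDI objects feed which search fields - can be sketched in Python as a simple mapping from a DDI-Codebook record to a search-index document. The element names and field choices below are illustrative assumptions, not recommendations from the workshop.

```python
# Sketch: map selected DDI-Codebook elements to a flat search document.
# Namespaces are omitted for brevity; real DDI files typically use them.
import xml.etree.ElementTree as ET

def ddi_to_search_doc(ddi_xml):
    root = ET.fromstring(ddi_xml)
    return {
        # full-text fields chosen for general search
        "title": root.findtext(".//titl", default=""),
        "abstract": root.findtext(".//abstract", default=""),
        # structured fields chosen for faceted browse
        "keywords": [k.text for k in root.iter("keyword") if k.text],
        # variable labels support variable-level search
        "variables": [v.text for v in root.iter("labl") if v.text],
    }

doc = ddi_to_search_doc(
    "<codeBook><stdyDscr><titl>Example Study</titl></stdyDscr></codeBook>"
)
print(doc["title"])  # -> "Example Study"
```

The design point the mapping makes explicit: each field choice commits the producer to supplying that metadata object consistently, which is how the desired search functionality feeds back into how the DDI should be structured.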
Making Nordic health data visible
Dag Kiberg (Norwegian Centre for Research Data (NSD))
Mari Kleemola (Finnish Social Science Data Archive, University of Tampere)
Annaleena Okuloff (Finnish Social Science Data Archive, University of Tampere)
Bodil Stenvig (Danish Data Archive)
Jeppe Klok Due (Danish Data Archive)
Elisabeth Strandhagen (Swedish National Data Service)
Martin Brandhagen (Swedish National Data Service)
The project is a network collaboration between the Nordic social science data services, with the primary aim of developing a discovery portal prototype for Nordic health data. Such a portal requires common metadata standards, the broadening of existing documentation standards with controlled vocabularies, and harmonized formats across repositories; a first prototype has already been built and published. The project focuses on the metadata needed to discover, locate and search for Nordic health data, including access rights. One of the challenges is that DDI as such does not specify or restrict the values, or terms, that can or should be used to describe the data. Inconsistent use of terms leads to misunderstandings and drastically complicates machine-actionability and interoperability. During its first period the project has: 1. charted the controlled vocabularies used by the four participating data archives and compared how they are used; 2. identified the vocabularies that would be most useful for a Nordic Health Data Portal and broadened them to include concepts relevant for health data; and 3. mapped the DDI Codebook and DDI Lifecycle data descriptions. The presentation will focus on the objectives of the project, controlled vocabularies and metadata, and the working method used.
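To illustrate why inconsistent terms hurt machine-actionability, the following Python sketch validates a free-text metadata value against a small controlled vocabulary before harvest. The vocabulary contents are invented for illustration; the participating archives' actual vocabularies are the subject of the charting work described above.

```python
# Invented vocabulary for illustration; DDI itself does not restrict
# the terms used, which is the gap controlled vocabularies fill.
CV_ANALYSIS_UNIT = {"Individual", "Household", "Family", "Organization"}

def to_canonical(term):
    """Return the canonical vocabulary term, or raise if unknown."""
    for canonical in CV_ANALYSIS_UNIT:
        if term.strip().lower() == canonical.lower():
            return canonical
    raise ValueError(f"'{term}' is not in the analysis-unit vocabulary")

print(to_canonical("household"))  # -> "Household"
```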
2016-06-02: S3: Data curation for quality and reproducibility
Data Curation for Quality and Reproducibility
Thu-Mai Christian (University of North Carolina, Chapel Hill)
Sophia Lafferty-Hess (University of North Carolina, Chapel Hill)
Limor Peer (Yale University)
Florio Arguillas (Cornell University)
William Block (Cornell University)
Prompted by initiatives such as the TOP Guidelines, the scientific community has renewed its focus on the replication standard. Defined 20 years earlier by Gary King (1995) in his seminal article 'Replication, Replication', "the replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party could replicate the results without any additional information from the author" (p. 444). For data curators, the replication standard should also hold that dataset files, programming code, codebooks, and all other materials that enhance interpretation and reuse of the data are stored in a trustworthy repository where files are normalized to sustainable formats and described using standard metadata. In addition, verification of replication materials, to ensure that a third-party user can reproduce the tables and figures presented in published articles, gives further assurance of the reliability of data housed in repositories. These addenda extend and operationalize King's replication standard by establishing data curation processes that uphold the 'gold standard' of data quality and reproducibility. Panelists will offer insights into the data curator's role in defining and supporting concepts of data quality as they uphold the replication standard. Each panelist will describe their unique experiences working with journal editors and authors, as well as the ways in which they have been rethinking, refining, and retooling data curation roles and processes to support and promote data quality and research reproducibility.
2016-06-02: S5: CESSDA sets the stage for the data infrastructure of the future
CESSDA Sets the Stage for the Data Infrastructure of the Future
Vigdis Kvalheim (Norwegian Social Science Data Service (NSD))
Rory Fitzgerald (SERISS)
Elena Sommer (SERISS)
Ivana Ilijasic Versic (CESSDA/ big Data Europe)
Mari Kleemola (Finnish Social Science Data Archive, University of Tampere)
CESSDA provides large-scale, integrated and sustainable data services to the social sciences. It brings together social science data archives across Europe, with the aim of promoting the results of social science research and supporting national and international research and cooperation. The vision of CESSDA, as stated in its statutes, is to provide a full-scale sustainable research infrastructure that enables the research community to conduct high-quality research, which in turn leads to effective solutions to the major challenges facing society today. CESSDA is a beneficiary in three European projects under the framework of the Horizon 2020 programme of the European Commission: Big Data Europe (Empowering Communities with Data Technologies); CESSDA SaW (Strengthening and Widening the European Infrastructure for Social Science Data Archives); and SERISS (Synergies for Europe's Research Infrastructures in the Social Sciences). CESSDA also has a Work Plan project specifically dedicated to metadata management.
2016-06-02: Poster Session
Integrated Environment for Social Research Data Analysis: DDIR
Yasuto Nakano (Kwansei Gakuin University)
The purpose of this presentation is to propose an environment for social research data and its analysis. An R package, DDIR, and an IDE, dlcm, which utilize social research information in DDI format, offer an integrated environment for social research data. DDI (Data Documentation Initiative) is an XML protocol for describing information related to social research, including the questionnaire, research data, metadata and summaries of results. Several international research projects use this protocol as a standard format, and ICPSR (Inter-university Consortium for Political and Social Research), one of the biggest archives for social research data, encourages data depositors to generate documentation that conforms with DDI. In the R environment, there is no standard data format for social research data; in many cases, we have to prepare numerical data and label or factor information separately. If we use a DDI file as the data file with DDIR in R, only one DDI file needs to be prepared. The DDI file could become a standard data format for social research data in the R environment, just as the 'sav' file is in SPSS. DDIR realizes an integrated social research analysis environment within R and supports reproducible research.
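DDIR itself is an R package, but the core idea - a single DDI file carrying both the data description and the value labels - can be sketched in Python for illustration. The element names follow DDI-Codebook conventions (namespaces omitted for brevity); the file and variable names below are hypothetical.

```python
# Sketch of the "one DDI file" idea: read category value -> label pairs
# for one variable from a DDI-Codebook file and label a coded column.
import xml.etree.ElementTree as ET
import pandas as pd

def value_labels(ddi_path, varname):
    """Extract {category value: label} for a named variable."""
    root = ET.parse(ddi_path).getroot()
    for var in root.iter("var"):
        if var.get("name") == varname:
            return {
                c.findtext("catValu"): c.findtext("labl")
                for c in var.iter("catgry")
            }
    return {}

# Hypothetical usage: the DDI file alone supplies the labels.
df = pd.DataFrame({"sex": ["1", "2", "1"]})
df["sex_label"] = df["sex"].map(value_labels("study.xml", "sex"))
```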
A tale of two services
Carol Perry (University of Guelph)
Michelle Edwards (Cornell University)
Two universities. Two countries. This case study will examine the evolution of data services at two universities of similar size and research focus from two different countries. We will explore the similarities and differences within the context of the fast-paced and ever-changing research data landscape. A review of factors affecting the decisions made at various stages of development and service delivery will expose the cultural differences between the two institutions and the countries in which they reside. What lessons can be learned from the choices that have been made? Will their paths converge or will their futures be vastly different?
Content Analysis of Online Chat Scripts: Ethical and Practical Perspective
Judy Li (University of Tennessee)
David Atkins (University of Tennessee)
Online reference services such as chat and email supplant what were traditionally face-to-face research interactions in libraries. These transactions have the potential to generate robust datasets comprised of transcripts documenting hundreds or even thousands of library service interactions spanning a single academic term. In this poster, we discuss the issues and challenges encountered, and the solutions devised, in managing chat transcripts, including dataset creation, de-identification to protect patron and librarian privacy, and content analysis.
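As a minimal sketch of the de-identification step (not the authors' actual procedure), regular-expression redaction of obvious identifiers in transcripts might look like the following; real de-identification requires review beyond simple patterns.

```python
import re

# Illustrative patterns only; real transcripts need human review too.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b(?:student|patron)\s*id\s*:?\s*\d+\b", re.I), "[ID]"),
]

def deidentify(transcript):
    """Replace obvious personal identifiers with placeholder tokens."""
    for pattern, replacement in PATTERNS:
        transcript = pattern.sub(replacement, transcript)
    return transcript

print(deidentify("Reach me at jane.doe@utk.edu or 865-555-0199."))
```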
Data Management, Dissemination Linkage in Add Health: The National Longitudinal Study of Adolescent to Adult Health
Ashley Sorgi (University of North Carolina, Chapel Hill)
The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-1995 school year. The Add Health cohort has been followed into young adulthood with four in-home interviews from 1995-2009, and a fifth wave of web-based data collection and in-home visits will be conducted in 2016-2018 to collect social and biological data on the respondents at ages 31-42. This poster provides an overview of our data dissemination strategies, a four-tiered system designed to minimize deductive disclosure risk for respondents. The poster also presents the biomarker data available for each wave of data collection and provides a summary of the Wave V data collection underway. We will discuss the most popular research areas and explore opportunities for new data users. Genome-Wide Association Study (GWAS) data will be disseminated through the NIH database of Genotypes and Phenotypes (dbGaP) in early 2016, and information on data linkage between social science phenotypes and the newly available genetic data will be explored.
Integrating European survey research questions with Euro Question Bank
Azadeh MahmoudHashemi (GESIS - Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moltgen (GESIS - Leibniz Institute for the Social Sciences)
The project Euro Question Bank (EQB) will implement a searchable database of all the survey questions from studies provided by CESSDA member archives. For social science researchers, the EQB will provide an easy, central facility for accessing survey questions in different languages. So far, existing question databases have been provided separately for different collections and in several different countries, which has limited access to existing survey questions as well as the possibilities for cross-national comparative research. By providing different functionalities within EQB, such as "search", "comparison", and "multilingualism", users will be enabled to access the holdings of different research communities and browse by different elements such as "questions", "keywords", "concepts", "variables", "collections" and others. The implementation of EQB is based on the DDI-Lifecycle metadata standard and provides both DDI-Lifecycle and DDI-Codebook import/export functionality. It builds on previous efforts made by GESIS in cooperation with other CESSDA member archives within several international projects. The content of EQB will be provided by the CESSDA member archives once the EQB has been successfully developed. The project aims to make it as easy as possible for CESSDA member archives to supply documentation to the EQB.
A lack of Persistent Identifiers means missing data - PIDs in CESSDA
Kerrin Borschewski (GESIS - Leibniz Institute for the Social Sciences)
Brigitte Hausstein (GESIS - Leibniz Institute for the Social Sciences)
The increasing amount of scientific digital data imposes the need to identify datasets with persistent identifiers (PIDs). The ability to unambiguously locate and access digital resources and to associate them with related metadata is essential for data preservation, retrieval, and citation. CESSDA (Consortium of European Social Science Data Archives) aims to promote the results of social science research; to do so, and to increase the visibility of data, the persistent identification of the CESSDA data holdings is of great importance. Currently, the different CESSDA data archives use varying PID systems without any common approach, which diminishes CESSDA's potential to achieve its aim. The consortium has therefore established a task on PIDs, which aims to develop a common CESSDA PID Policy to foster research in general and the visibility of data. To establish this policy, the task contributors chose diverse approaches (a quantitative survey, personal interviews, group discussion) that give an overview of the CESSDA data archives' needs, wishes, and problems concerning the use of PIDs. The poster provides an overview of the current status of the CESSDA PID Task with respect to the CESSDA PID Policy.
Making Nordic Health Data Visible
Dag Kiberg (Norwegian Social Science Data Services (NSD))
Mari Kleemola (Finnish Social Science Data Archive, University of Tampere)
Annaleena Okuloff (Finnish Social Science Data Archive, University of Tampere)
Bodil Stenvig (Danish Data Archive - DDA)
Jeppe Klok Due (Danish Data Archive - DDA)
Elisabeth Strandhagen (Swedish National Data Service)
Martin Brandhagen (Swedish National Data Service)
The project is a network collaboration among the Nordic social science data services, with the primary aim of developing a discovery portal prototype for Nordic health data. Such a portal requires common metadata standards, the broadening of existing documentation standards with controlled vocabularies, and harmonized formats across repositories. A first prototype has already been built and published. The project focuses on the metadata needed to discover, locate, and search for Nordic health data, including access rights. One of the challenges is that DDI as such does not specify or restrict the values, or terms, that can or should be used to describe the data. Inconsistent use of terms leads to misunderstandings and drastically complicates machine-actionability and interoperability. During its first period, the project has: 1. Charted the controlled vocabularies used by the four participating data archives and compared how they are used; 2. Identified the vocabularies that would be most useful for a Nordic Health Data Portal and broadened them to include concepts relevant for health data; 3. Mapped the DDI Codebook and DDI Lifecycle data descriptions. The presentation will focus on the objectives of the project, controlled vocabularies and metadata, and the working method used.
Building a Metadata Portfolio for CESSDA
Mari Kleemola (Finnish Social Science Data Archive, University of Tampere)
Wolfgang Zenk-Moltgen (GESIS, Leibniz Institute for the Social Sciences)
Anne Etheridge (UK Data Service)
We will present the first outcomes of the Consortium of European Social Science Data Archives (CESSDA) Metadata Management Project (CMM). The project is a CESSDA Work Plan Task with an objective to develop, promote and implement a standardised metadata design, content and practice for all CESSDA data assets. The main output will be the Metadata Standards Portfolio Version 1, which will encompass support for resource discovery, question banks, preservation, data access and multilinguality, be platform independent, and help CESSDA Service Providers meet the Data Seal of Approval (DSA) certification requirements related to metadata. The two main building blocks of the Portfolio are the Core Metadata Portfolio and the Controlled Vocabularies Portfolio. The Core will be built mainly upon the DDI-Lifecycle standard but will include elements from other relevant standards where appropriate. The CV Portfolio will contain CESSDA Controlled Vocabularies for relevant metadata fields, taking into account and supporting the DDI CVG work. The project period is November 2015 - April 2017 and the project is a collaboration between eight CESSDA Service Providers: FSD (lead), ADP, CASD, DDA, GESIS, NSD, SND and UKDS.
UK Data Service: the 'a' (Access) team
Laura Beauchamp (UK Data Archive)
Alix Taylor (UK Data Archive)
The Access team are responsible for processing user requests for access to data available via the UK Data Service and for managing user queries submitted to its Helpdesk. This poster session will present the work of the team in a graphical way to highlight: a) how the workflow changes depending on the level of access for the data requested, e.g. Open, Safeguarded, Controlled access, and b) the types of query submitted to the Helpdesk.
da|raSearchNet - Searching for Research Data
Karoline Harzenetter (GESIS - Leibniz Institute for the Social Sciences)
Brigitte Hausstein (GESIS - Leibniz Institute for the Social Sciences)
The DFG-funded project da|raSearchNet aims at the development and establishment of an integrated search network. It is hosted by the GESIS Leibniz Institute for the Social Sciences and carried out in the framework of the Registration Agency for Social and Economic Data, da|ra, whose service is to allocate persistent identifiers to data using the DOI system. da|raSearchNet enables users to search an up-to-date database of data references in one place and links it to data holdings worldwide. The da|ra interface permits the use of value-added services that allow for individual search, use and management of data references. The search engine is based on the da|ra search index, which includes the metadata of all objects registered with da|ra and metadata harvested via web interfaces from selected national and international repositories. The content of the database is constantly expanded through registration activities and the harvesting process. Before entering the index of da|raSearchNet, the metadata is transformed; this transformation includes, among other tasks, mapping between different metadata standards, linking and enriching metadata, and checking for redundant records. Content extension and metadata transformation are strongly supported by technical solutions. Important current clients of da|ra are the GESIS Data Archive, the ICPSR and Research Data Centers such as the Socio-Economic Panel (SOEP), the Survey of Health, Ageing and Retirement in Europe (SHARE), and many others. da|raSearchNet is a service for data providers, who gain greater visibility for their holdings; for researchers, who can find and manage references; and for service providers, who can access exposed structured metadata via OAI-PMH. The project complements the retrieval services of GESIS for research data that is not itself stored in the GESIS Data Archive.
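To illustrate the harvesting route mentioned above, here is a minimal Python sketch that pulls Dublin Core records from an OAI-PMH endpoint using the Sickle library. The endpoint URL is a placeholder, and the library choice is our assumption for illustration rather than the project's actual implementation.

    from sickle import Sickle

    # Harvest Dublin Core metadata records via OAI-PMH
    # (placeholder endpoint, not da|ra's actual service URL)
    client = Sickle("https://repository.example.org/oai")

    for record in client.ListRecords(metadataPrefix="oai_dc"):
        meta = record.metadata  # dict mapping DC fields to lists of values
        print(meta.get("title"), meta.get("identifier"))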
Update Your Space: Tips for Renovating a Map and Data Centre
Kevin Manuel (Ryerson University)
Are you planning to update your Data Centre? Ryerson University Library (Toronto, Canada) renovated its Map and Data Centre starting in the autumn of 2015. Here are some tips and considerations from our experience that will help you in planning your next renovation. The first stage of planning starts with the physical redesign of the space: the architectural design, the layout of collections and computer workstations, and the selection of new furniture. Next comes planning for the installation of new computer hardware and software and the setting up of new internet connections. Once the physical and technical upgrades are completed, policies need to be established for the data centre. These include setting hours of operation, staffing, a food policy, and setting up an online booking system for users to access GIS and statistical software. Finally, gathering feedback from users about the results can confirm whether the renovation was successful or whether adjustments need to be made.
Using Stata for Web Scraping
Rob O'Reilly (Emory University)
Practitioners of data journalism at The Guardian like to note that working with data is often "80% perspiration, 10% great idea, 10% output" (http://www.theguardian.com/news/datablog/2011/jul/28/data-journalism). Extracting data from web-based sources is a case in point: even if the data are "open" and accessible to all, that does not guarantee that they will be in a usable format for research and analysis. Instead, it often takes extensive effort to extract the contents of such sources and get them into a clean, usable state, even when the original data are presented in tabular form. A common approach to web scraping is to make use of a programming language such as Python or R. However, Stata also possesses functionality that can be useful for this purpose. In this presentation, I will discuss how we have been combining built-in Stata commands for loops, macros, and string functions with user-written Stata commands for parsing text files to grab data from web sites, clean their contents, and turn them into usable datasets for researchers.
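For contrast, here is a minimal sketch of the "common approach" the abstract refers to, in Python with pandas (the URL is a placeholder, not a source used in the presentation; pandas.read_html requires an HTML parser such as lxml to be installed):

    import pandas as pd

    # Pull every HTML table on a page into a list of DataFrames
    tables = pd.read_html("https://example.org/statistics.html")

    df = tables[0]  # select the table of interest
    df.columns = [str(c).strip().lower() for c in df.columns]  # light cleaning
    df.to_csv("scraped_data.csv", index=False)  # a usable dataset for researchers

Stata's loops, macros, and string functions can replicate each of these steps natively, which is the workflow the presentation covers.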
An Introduction to SRDA
Wan Yun Lo (Academia Sinica)
The Survey Research Data Archive (SRDA), founded in November 1984 by the Center for Survey Research, engages in the systematic acquisition, organization, preservation, and dissemination of academic survey data in Taiwan. The datasets collected in SRDA come from donations by researchers and from surveys carried out by SRDA, government departments, and other academic organizations; they are broadly divided into survey data, censuses, and in-house value-added data. Confidentiality and sensitivity are evaluated prior to the release of every survey dataset. Standard data management and cleaning procedures are applied to ensure data accuracy and completeness. In addition, metadata and relevant supplementary files are edited and attached. For access to restricted data containing personal, confidential, or sensitive information, SRDA provides two services, on-site and remote, both available only after an application has been approved. In order to ensure data security, SRDA was awarded ISO 27001:2005 certification for its digital data storage and usage services by BSI Management Systems in 2010, and obtained ISO 27001:2013 transition certification in 2015. Since 2012 SRDA has developed a comprehensive online inquiry service for academic and government survey data. The use of concept terms or keywords as part of a question item allows users to search data more conveniently and efficiently, in addition to the primary services for searching by topic and dataset. SRDA set up Networked Social Science Tools and Resources (Nesstar) in 2009; the Nesstar user interface allows users to search and browse survey data on the web.
CESSDA poster
Ivana Versic (CESSDA)
Hossein Abroshan (CESSDA)
CESSDA has a poster which it would like to hang up at the venue; we have copies at the CESSDA House in Bergen. The poster does not present project or research information as such, but it presents CESSDA as a pan-European infrastructure and would add visibility at the event.
Controlled Vocabularies Published by the DDI Alliance
Sanda Ionescu (ICPSR - Inter-university Consortium for Political and Social Research)
The DDI Alliance is well known as the originator of the DDI metadata standard, which is now widely used across several continents. This poster presentation will focus on updating the audience on another product of the DDI Alliance, the Controlled Vocabularies, with a view to increasing the visibility and usage of this valuable metadata resource among data users and curators. Controlled vocabularies are structured lists of terms or concepts that may be used to standardize metadata content and thus enhance both resource discovery and metadata interoperability. A clear advantage of the DDI Alliance Controlled Vocabularies is that they are published independently of the DDI specification, and therefore may be used in conjunction with any version of the DDI standard, but also with other metadata standards that may have a different structure and need not be expressed in an XML language. Our poster presentation will include a brief review of the published vocabularies and our plans for the future; familiarize the audience with their web presentation and download options; and discuss translations and the possibility of other agencies contributing to the vocabularies' development, with the main goal of encouraging their widespread use.
Connecting Social and Health Sciences Data - This Librarian's Life
Michelle Bass (University of Chicago)
In this presentation, I will highlight my experiences connecting social science data to health science fields and discuss the on-the-job training required to bridge these disciplines in my role as science research services librarian. My perspective as a social scientist who evolved into an information scientist informs my practice, outreach, and collaborative efforts with regard to the library's research data management and data services and its departmental and center partners. I will discuss in depth my role in connecting social science data to health science fields in three projects: 1. A resource guide for students and researchers for the Center for Health and the Social Sciences Program in Oral Health, Systemic Health, Well-Being, and the Social Sciences. 2. The Institute for Translational Medicine's grant reapplication process with the National Center for Advancing Translational Sciences. 3. A research data services workshop series, with resource creation and facilitation, for non-data-focused librarians across the social, biological, and physical sciences and the humanities. Topics of emphasis for these projects include training and skills development for research data management (for myself, librarian peers, and faculty and researchers); challenges and solutions in exchanging research data across disciplines; and how my expertise as an information professional was applied in multi-disciplinary collaborations. I hope my outsider-looking-in perspective will promote productive discussion about the ways in which information professionals can connect social science data to questions and experiments in health science fields.
RDM Needs of Science and Engineering Researchers: a View from Canada
Cristina Sewerin (University of Toronto)
Eugene Barsky (University of British Columbia)
Melissa Cheung (University of Ottawa)
Dylanne Dearborn (University of Toronto)
Angela Henshilwood (University of Toronto)
Christina Hwang (University of Alberta)
Erin MacPherson (Dalhousie University)
Understanding researcher behaviour and workflow is instrumental to developing reflective services. With changes in funding requirements around sharing, preservation and the submission of a data management plan potentially looming, institutions across Canada are engaging with researchers to better understand research data management (RDM) practices and needs. What are the characteristics of the research data produced, and how do researchers manage their data? What are their attitudes towards RDM support services and data sharing? A number of Canadian universities have partnered to survey their respective science and engineering researcher communities, with participating institutions at the time of writing including: University of Toronto, University of British Columbia, University of Waterloo, University of Alberta, Queen's University, University of Ontario Institute of Technology, Dalhousie University, and University of Ottawa. These institutions are collaborating to better understand both national and local needs, as well as to generate a richer understanding of disciplinary practices by producing comparative data for cross-analysis. In this poster, the project development, results, and future steps will be summarized.
Introduction of Additional Primary Services and Derivative Work Inquiry Service of SRDA
Ya-Chi Lin (Academia Sinica)
The Survey Research Data Archive (SRDA) is an electronic library holding the largest collection of digital social science data in Taiwan. The survey data collected in SRDA are broadly divided into academic survey data and government survey data. Academic survey data comprise more than 1,900 datasets from 17 fields, including education, sociology, political science, economics, and so on. In response to the increasing number of archived datasets, SRDA has developed various inquiry services for users to search survey data conveniently and efficiently. In order to maximize the value and utility of released survey data, which can be reused as secondary data, a derivative work inquiry service has been established to assist users in searching for citations based on archived datasets. At present, SRDA collects related publications in three main ways: internet searches, email surveys of users, and online reporting by SRDA members. The information on publications includes title, author, year of publication, and the Digital Object Identifier (DOI) of the cited data. We hope this inquiry service will help SRDA members with secondary data analysis. This poster session will demonstrate the current derivative work inquiry service, covering its establishment, user interface, search results, methodology, and future perspectives.
Keeping Track of Users and Publications at Social Science Japan Data Archive
Yukio Maeda (Social Science Japan Data Archive)
Satoshi Miwa (Social Science Japan Data Archive)
Kenji Ishida (Social Science Japan Data Archive)
Koichi Iriyama (Social Science Japan Data Archive)
This poster presents the background and workflow of user management at the Social Science Japan Data Archive (SSJDA). SSJDA has kept track of its users and publications since its start. The main motive was to earn the trust of data producers at a time when there was no data-sharing institution in Japan. To convince data producers, we formulated a strict policy on data use and the subsequent reporting of publications. This procedure not only enhances the trust of data producers but also provides them with an incentive to deposit their data, as they can see that their contributions are acknowledged. It also helps SSJDA to demonstrate the impact of its activities quantitatively. Initially, the management of users and publications was tedious paperwork. We gradually developed a relational database linking users, datasets, and publications. The system was later revised so that users can submit their reports through the Internet, with the submitted information automatically updated in the database. Currently the collected information is used only within SSJDA, but we plan to set up a web page on which visitors can search and find the publications resulting from the secondary use of specific datasets.
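A hypothetical sketch of the kind of relational structure described, linking users, datasets, and publications (table and column names are illustrative, not SSJDA's actual schema):

    import sqlite3

    con = sqlite3.connect("tracking.db")
    con.executescript("""
    CREATE TABLE IF NOT EXISTS users        (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE IF NOT EXISTS datasets     (dataset_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE IF NOT EXISTS publications (pub_id INTEGER PRIMARY KEY,
                                             user_id INTEGER REFERENCES users(user_id),
                                             dataset_id INTEGER REFERENCES datasets(dataset_id),
                                             citation TEXT);
    """)

    # An online report submission becomes a row linking a user to the dataset used,
    # so the archive can list publications per dataset for its data producers.
    con.execute("INSERT INTO users VALUES (1, 'Example Researcher')")
    con.execute("INSERT INTO datasets VALUES (42, 'Example Survey, Wave 1')")
    con.execute("INSERT INTO publications (user_id, dataset_id, citation) VALUES (?, ?, ?)",
                (1, 42, "Example article based on secondary analysis"))
    con.commit()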
DDI implementation at the SSJDA
Akira Motegi (University of Tokyo)
We will introduce DDI implementation at the Social Science Japan Data Archive (SSJDA), focusing on its two core projects. The first is the Easy DDI Organizer (EDO), a metadata editing and management software development project based on both DDI-L and DDI-C. To stress compatibility, file import and export functions are built into EDO: it can import variable-level metadata from SPSS files and export a codebook and a questionnaire in Word format. The second project is the operation of the Nesstar system. While Nesstar has been widely operated across countries, there was ample room for its implementation in Japan. Operation at the SSJDA started in 2012, and the number of published datasets now amounts to over 70. Based on the results of its current operation, perspectives for its future development will be discussed.
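For illustration, reading variable-level metadata from an SPSS file takes only a few lines in Python with pyreadstat (our choice for the sketch, not a library used by EDO itself; the file name is a placeholder):

    import pyreadstat

    # Read only the metadata (no data rows) from an SPSS file
    df, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

    # Variable names and labels: the raw material for a codebook export
    for name in meta.column_names:
        print(name, "-", meta.column_names_to_labels.get(name))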
FORSbase V2.0: a web-based data archiving platform
Marieke Heers (FORS - Swiss Centre for Expertise in the Social Sciences)
Eliane Ferrez (FORS - Swiss Centre for Expertise in the Social Sciences)
Small data archives in Europe often lack the human resources for adequate documentation and delivery of data. FORSbase V2.0, a new product of the Swiss Centre for Expertise in the Social Sciences FORS in Lausanne, facilitates and automates data documentation, deposit, and access, thus freeing up resources for promotional and training activities. Its goal is to combine within a single system and database a wide range of archiving functions and tools for researchers themselves to document and deposit their data, access data and metadata, and establish contacts and communicate with other researchers. All of this is done within individual researcher "workspaces" where specific project descriptions and data are safely stored. Within the workspaces, researchers also have access to a messaging system and other resources to assist them in their work. The benefits of such a system for researchers are the ease with which they can manage, store, and deposit their data, as well as directly search for and download the data of others. This presentation will highlight and demonstrate the key features of FORSbase V2.0.
The Architecture of Data Science and Archiving - Archonnex Architecture and Technology Stack
Thomas Murphy (University of Michigan, ICPSR)
Harsha Ummerpillai (University of Michigan, ICPSR)
Archonnex is a Digital Asset Management System (DAMS) architecture designed to transition to a newer technology stack that meets the core and emerging business needs of the organization and the industry. It aims to build a digital technology platform that leverages ICPSR expertise and open source technologies that are proven and well supported by strong open source communities. This component-based design identifies re-usable, self-contained services as components, which are integrated and orchestrated using an Enterprise Service Bus and message broker to deliver complex business functions. All components start as a Minimum Viable Product (MVP) and are improved in iterative development phases. This poster will identify the various operational components and the associated technologies involved in running a data science repository. It will consider the process of upfront integration with the researcher to allow better-managed data collection, dissemination and management (see the SEAD poster) during research, and follow the workflow technologically from the ingestion of data into the repository through curation, archiving, publication and re-use of the research data, including citation and bibliography management along the way. Finally, conference participants will leave with an understanding of how the Archonnex architecture and its technology stack are strengthening the data services offered to new researchers as well as data re-use. The integration of data management plans and their impact on this workflow should be apparent in this ground-up architecture designed for the data science industry.
IFDO Poster
Jonathan Crabtree (IFDO)
IFDO was established in the mid-1970s as a response to the research needs of the international social science community. The founders felt it would be advantageous to coordinate worldwide data services and thus enhance social science research. Current efforts of IFDO seek to gather input from the international community and IFDO members to help guide the organization into the future. The dynamic nature of the research data lifecycle will be discussed, and input from the community will be used to develop future IFDO directions and potential services. Come join the discussion and help shape IFDO's future.
An Architecture for Social Science Data Curation
Deirdre Lungley (UK Data Archive)
The UK ESRC's Big Data Network (BDN) has a broad ambition to provide a coherent data infrastructure, harnessing new and novel forms of data, to provide social science researchers with the data, tools, technology and skills they need to undertake excellent, impactful research. The UK Data Service (UKDS), as part of the BDN, has been tasked with creating cost-effective resourcing models. To this end we have adopted the Open Data Platform (ODP) data lake approach. We are utilising the Hortonworks Data Platform (HDP), with its associated data storage, processing and analytics components, to showcase the power of the ODP model in providing a cost-effective, scalable framework. Students and researchers working on this platform are exposed to industry-standard data tools. A hybrid architecture (an HDP cloud installation linked to an on-premises installation, coupled with the ODP Hadoop governance framework of Ambari, Ranger, etc.) allows us to provide the Safe-Setting component of our "5-Safes: Secure Access to Confidential Data" commitment, and to serve both secure and non-secure data services. This poster will illustrate how this powerful, scalable, yet cost-effective architecture can, through generic processing, take Big Data from a raw text state through to powerful data products (per-user aggregates) and information products (result visualisations).
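As a rough sketch of the "generic processing" pattern described (raw text in, per-user aggregates out), consider a minimal PySpark job; the paths, field layout, and tab-separated format are assumptions for illustration only, not the UKDS pipeline:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("per-user-aggregates").getOrCreate()

    # Raw text state: one "user_id<TAB>action" event per line (assumed layout)
    raw = spark.read.text("hdfs:///data/raw/events.txt")
    events = raw.select(F.split("value", "\t").alias("c")) \
                .select(F.col("c")[0].alias("user_id"), F.col("c")[1].alias("action"))

    # Data product: per-user aggregates, written back to the cluster
    per_user = events.groupBy("user_id").agg(F.count("action").alias("n_events"))
    per_user.write.mode("overwrite").parquet("hdfs:///data/products/per_user")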
The Metadata Model of The Dataverse Project: Helping More Data Become Discoverable
Eleni Castro (Harvard University)
Since 2006 the Dataverse Project, an open source data repository application developed at Harvard's Institute for Quantitative Social Science (IQSS), has provided metadata for datasets in the social sciences using the Data Documentation Initiative (DDI) standard. Over time the Dataverse application has expanded to include metadata and file support for additional domains, such as astronomy and the biomedical sciences, and to increase interoperability with other systems. This poster will describe the process of expanding support to other domains for the purposes of interoperability, discovery, preservation and reuse. It will also provide a visual graph outlining all of the currently supported APIs, metadata standards/schemas (based on DDI Codebook, Dublin Core, DataCite 3.1, Virtual Observatory for astronomy data, and ISA-Tab for biomedical data), ontologies, and thesauri, along with what is planned to be supported in the future (e.g., DCAT, RDF, and schema.org).
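For example, dataset metadata in a Dataverse installation can be discovered programmatically through its Search API. A minimal sketch follows; the server URL is a placeholder, and the exact response fields may vary by version:

    import requests

    BASE = "https://dataverse.example.edu"  # placeholder installation
    resp = requests.get(BASE + "/api/search",
                        params={"q": "survey", "type": "dataset", "per_page": 5})
    resp.raise_for_status()

    for item in resp.json()["data"]["items"]:
        # Dataset items carry discovery metadata such as a title and a DOI
        print(item.get("name"), item.get("global_id"))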
Academic Data Services and Makerspaces
Samantha Guss (University of Richmond)
Makerspaces have become increasingly popular on college campuses at the same time that support for data research, teaching, and learning has become a top priority for many academic libraries -- but what do these services have in common? Through an exploration of the Maker Movement and other trends in learning spaces, I situate academic data services within the larger conversations about space use in academic libraries and on college campuses. For example, what can data services providers learn from the philosophies and practices of makerspaces? What is the value of physical space for academic data services? By comparing and contrasting the service models and philosophies of different types of learning spaces, I aim to encourage conversation about how we conceive our own data services, how we communicate about them with our users and administrators, and how we can best meet the needs of our communities.
Working Across Boundaries: Public and Private Domains, Part 2
Flavio Bonifacio (Metis Ricerche srl)
This poster illustrates our efforts to install a Data Service in Turin using Nesstar. We presented the first part of this work in Toronto in 2013, concluding our poster with the information that Metis Ricerche had presented an application project for tender as a member of an Innovative ICT Pole concerning the installation of a data service for the conservation, reuse and dissemination of data. We obtained the funding requested and are now pleased to announce that we have concluded our project. Our poster will show the results of the second part of the work. We summarise the installation of a mixed database built with numerical, text and multimedia data files, such as videos, photos and so on. We named this project Sy.Mul.Story, Multimedial System for storytelling analytics. We are currently presenting the project in Turin to various public and private organisations in order to obtain further funding.
Sustainable Environment Actionable Data (SEAD): A Knowledge Network for Collaboration, Data Curation, and Discovery
Peter Granda (ICPSR - University of Michigan)
SEAD is a project, sponsored by the National Science Foundation in the United States, to create data services for sustainability science research. This research supports the fundamental science and engineering investigations and education needed to understand and overcome the barriers to sustainable human and environmental wellbeing and to forge reasoned pathways to a sustainable future. Sustainability research requires reliable cyberinfrastructure and an enhanced ability to manage, integrate, interpret, share, curate, and preserve data across a broad range of physical and social science disciplines. SEAD offers hosted end-to-end data services that serve this need, helping research teams, whether large or small, to be more productive while reducing the effort required to preserve data for the long term. In particular, it serves researchers who produce and analyze heterogeneous data that is unique and at a fine resolution and granularity, those who must work in a collaborative environment, and investigators who lack access to reliable cyberinfrastructure. This poster will describe current and developing tools available to researchers in this field as they work through their projects, collect rich metadata throughout that whole process, consider ways to share their data, and discover the best location for its preservation. It will also focus on how the tools and services SEAD provides are applicable to a wide range of research projects beyond sustainability investigations. Finally, conference participants will leave with an understanding of how SEAD data services can be an integral part of best practices for data management, curation, and dissemination of data at their home institutions.
The CIC Geospatial Data Discovery Project: A Multi-Institution Project to Create an Open-Source Discovery Portal for Geospatial Data Resources
Mara Blake (University of Michigan)
In July 2015, the Committee on Institutional Cooperation (CIC) began the CIC Geospatial Data Discovery Project, a collaborative pilot project to provide discoverability, to facilitate access, and to connect scholars across the CIC to geospatial data resources. Nine of the fifteen CIC member institutions are participating in the project, including the University of Illinois at Urbana-Champaign, the University of Iowa, the University of Maryland, the University of Michigan, Michigan State University, the University of Minnesota, Pennsylvania State University, Purdue University, and the University of Wisconsin-Madison. The project will support the creation and aggregation of discovery-focused metadata describing geospatial data resources from participating institutions and will make those resources discoverable via an open source portal. The collectively supported project provides the project staffing and technical infrastructure to host and develop the services. The poster will show the organizational formation and structure of this collaborative project, as well as the established processes for collaborative geospatial metadata creation and portal development.
A Collaborative Questionnaire and Study Editor within the German Longitudinal Election Study (GLES)
Claus-Peter Klas (GESIS - Leibniz Institute for the Social Sciences)
Oliver Hopt (GESIS - Leibniz Institute for the Social Sciences)
Manuela Blumenberg (GESIS - Leibniz Institute for the Social Sciences)
Wolfgang Zenk-Moltgen (GESIS - Leibniz Institute for the Social Sciences)
As presented at EDDI 2015, several election surveys will be prepared and run for the German national election in 2017 (see http://www.gesis.org/en/elections-home/gles/). The overall process of preparing the surveys consists of several steps, beginning with preparing and evaluating previous questionnaires within a distributed team, handing the questionnaires over to the polling agency for fielding, and finally handing the results over for documentation and archival processes. All these steps are currently done manually, or in some cases semi-automatically. Word/PDF files are used for discussion, documentation and hand-over, resulting in repeated and error-prone documentation steps. Based on the GESIS DDI-FlatDB infrastructure, we will give an update on the first DDI-L-based prototype of our collaborative questionnaire and study editor. Its functionality will cover creating, editing and structuring questions and study information according to the GLES user needs. All edited information is stored with versioning. In addition, we will integrate a comment system for researchers to interact, and provide a rating system to give certain questions more weight. At any time, the system will be able to export the current status as a DDI, Word or PDF file.
UK Data Service Impact through Narratives and Case Studies
Rebecca Parsons (UK Data Service)
Scott Summers (UK Data Service)
Louise Corti (UK Data Service)
The UK Data Service plays a key role in supporting researchers in their quest to provide impact for their funded research. We use case studies, including video and modern infographics, to highlight various strands of impact - from showcasing the value of a data deposit to demonstrating powerful and policy-relevant reuse of those data. In this poster we will demonstrate our portfolio of 'impact' wares.
Doing the right thing: how do you construct a service that allows researchers to access unconsented data?
Carlotta Greci (UK Data Archive, University of Essex)
Kakia Chatsiou (UK Data Archive, University of Essex)
The Administrative Data Research Network is a partnership among UK universities, government departments, and national statistical authorities that aims to facilitate access to administrative data previously difficult to obtain for research. Administrative data are information collected primarily for operational purposes by government and other organisations when delivering a service. Within the complex UK data-sharing framework, the Network enables secure and lawful access to de-identified, linked administrative data for "trusted" researchers. Administrative data are in some cases reused for secondary analysis without the data subjects' explicit consent, so the Network acts only when there is a legal right of access and appropriate safeguards are in place. Despite the processing that UK and EU legislation enables, public concern about the use of personal data is high. Hence the ADRN has been constructed with a specific focus on providing robust and transparent governance and secure data-handling standards. From setting eligibility criteria for use of the Network and an Approvals Panel screening all proposals, to the use of Trusted Third Parties to handle data matching and secure environments with rigorous access procedures, this poster provides an overview of the ADRN model as a way to enable research using administrative data that benefits the public.
2016-06-03: 1H: Online user support and training
Online Tools and Training For Access and Analysis of Restricted Government Data Files
Warren Brown (Cornell University)
The Cornell Institute for Social and Economic Research (CISER) has developed tools and training modules for online access that enable researchers to work more effectively with official restricted-access statistical files. The research steps covered are: discovery; metadata; practice data sets; summary statistics; and analytical exercises. Cornell's CED2AR repository of metadata in DDI 2.5 enables researchers to discover restricted-access official statistical files, and government statistical agencies to securely manage details of documentation. CED2AR processes the metadata and produces search terms that are accessible by major search engines, thus enabling discovery of data files based on descriptive information about the study, file and variables. The codebooks for files discovered through search may then be browsed and their variables searched. CED2AR enables side-by-side comparison of similar variables across files to aid in the selection of the most appropriate file. Links are provided to zero-observation versions of the restricted-access files in SAS and Stata for program development and proofing. These resources enable researchers to prepare well-informed applications to the statistical agency data custodians based on knowledge of the data file prior to actual access. Once approved for access, CED2AR provides detailed summary statistics, further enabling approved researchers to validate their initial analyses. Finally, links are provided to analytical exercises taking the researcher beyond summary statistics to elementary modeling. At that point the researcher is well prepared to be productive and successful in the investigation of their scientific question. For this presentation the following restricted-access files are used: the American Community Survey (US); the Current Population Survey (US); and the Sample of Integrated Employment Biographies - Scientific Use File (DE).
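Because the repository's metadata is in DDI 2.5, variable-level information of the kind CED2AR exposes can also be inspected locally with a few lines of Python; a minimal sketch (the file name is a placeholder):

    import xml.etree.ElementTree as ET

    NS = {"ddi": "ddi:codebook:2_5"}  # DDI Codebook 2.5 namespace
    root = ET.parse("codebook.xml").getroot()

    # List each variable's name and label from the data description section
    for var in root.findall(".//ddi:dataDscr/ddi:var", NS):
        labl = var.find("ddi:labl", NS)
        print(var.get("name"), "-", labl.text if labl is not None else "(no label)")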
Developing Teaching and Learning Resources for Students and Teachers
Kathryn Simpson (University of Manchester / UK Data Service)
The UK Data Service (UKDS) is a resource funded by the Economic and Social Research Council (ESRC) to support researchers, students, lecturers and policymakers who depend on high-quality social and economic data. This presentation will demonstrate the new resources developed for students and teachers. Firstly, a suite of Student Resources webpages has been developed alongside a Using Survey Data Guide. Within the guide, key issues are related to real data using an example research project, covering themes such as research questions, finding and accessing data, getting started with data analysis, and reporting of results. Secondly, we have developed a UKDS Student Forum on Facebook, which has 170 members participating in discussions. Through the forum we have launched a Student Dissertation Prize; the prize will be awarded to a dissertation that demonstrates flair and originality using quantitative data. The winning dissertation, along with its key findings, will be publicised on the UKDS website and through the UKDS quarterly Newsletter. Thirdly, for teachers, we have developed Teaching and Learning Worksheets to help students learn statistical techniques such as correlation and regression using real data from the UKDS. We have also updated our Teaching with Data webpages and a number of our teaching datasets.
Training across Services: Resources, Convenience and Safe Use of Sensitive Data
James Scott (UK Data Service)
The UK Data Service (UKDS) leads the way in offering remote access to sensitive microdata for UK-based academics via its Secure Lab. UKDS offers a range of support to its growing number of users and has taken a lead role in the development of a new joint training course in using these data, to be administered by a consortium of four UK-based Research Data Centres. UKDS' partners in this venture are The Virtual Microdata Laboratory at the Office for National Statistics, Her Majesty's Revenue and Customs (HMRC) Datalab and the Administrative Data Research Network. Deployment of this training represents an improved course and a better use of resources for RDCs and researchers. Retaining the concept of the "Safes" at its core, the new course is recognised across all services within the consortium, eliminating the need for repeat training when users wish to use more than one service. The consortium is committed to regular review of the course and making improvements if appropriate. UKDS have also developed a standalone course for those (e.g. PhD supervisors) with different needs. The "Safe" security model will be discussed alongside the wider considerations of training researchers to use these highly sensitive data safely.
Developing Survey-specific Online Resources to Enhance Data Use and Confidence in Researchers: Understanding Society, A Case Study
Deborah Wiltshire (UK Data Service)
An important aspect of the UK Data Service's work is to promote statistical literacy and engagement with quantitative survey data. We provide support by creating user guides and webinars as well as through help desk support. There is often a gap between the skills learned using bespoke data and the reality of using survey data for analysis. The queries we receive indicate a need for greater understanding of survey design and data collection, especially with longitudinal surveys, which have many constraints including fieldwork procedures, respondent burden, confidentiality, and software limitations. There is some disparity between researchers' expectations and the analysis of data obtained from a repository. Having previously worked on Understanding Society, I now work in the UK Data Service User Support Team, which gives me an understanding of the challenges faced by researchers and how to address them. I am collaborating with the Understanding Society team to provide training events, like our recent webinar introducing the Ethnic Minority Boost data, and I am developing online materials to support researchers in using Understanding Society. Using the success of this model as a template, we aim to explore further opportunities to collaborate closely with data depositors and ultimately improve the research experience.
Roles and Gaps in Geospatial Information Management in Asia and Pacific Countries
Jungwon Yang (University of Michigan)
At its 47th plenary meeting on July 27, 2011, the Economic and Social Council (ECOSOC) of the United Nations acknowledged the urgent need to strengthen international cooperation in global geospatial information management. The ECOSOC established the United Nations Committee of Experts on Global Geospatial Information Management (UN-GGIM) during that meeting, and asked the UN-GGIM to identify global, regional, and national challenges and provide recommendations for their resolution. In October 2015, after five years of investigation, the Asia and Pacific member states of the UN-GGIM and United Nations agencies, such as the United Nations Economic and Social Commission for Asia and the Pacific (ESCAP), UN-Habitat, the UN Statistics Division, and the United Nations Group on the Information Society (UNGIS), convened to share their global and national projects at the 4th UN-GGIM-AP conference. In this presentation I will present the major global and national projects and challenges of Asia and Pacific countries related to capacity building and disaster risk management. I will also introduce open GIS software that was developed by United Nations agencies and the UN-GGIM-AP member states. Finally, I will discuss how these current global collaborations in Asia and Pacific countries will affect the availability and reliability of geospatial data in the region.
Supporting Campus GIS Needs with Limited Staffing and Budgetary Resources
Erich Purpur (University of Nevada)
Following the model of many American university libraries, the University of Nevada, Reno (UNR) libraries are providing increased GIS and research data support to the campus community. These services are heavily used by both campus users and community members. With minimal staffing and budget allocated to serving growing GIS needs, meeting user demand is challenging. Available GIS services are wide-ranging and serve a breadth of departments on campus, a large cohort of which are in the social sciences. In-person technical support, data gathering, statistical services, data management assistance, and funding issues are all part of the suite of GIS services offered. On top of this, the UNR libraries maintain the state's online public GIS data and remote sensing imagery portal (http://keck.library.unr.edu/), in service to many private, state, and federal agencies in Nevada. This session will examine in further detail how GIS staff navigate these responsibilities among other duties and how they efficiently leverage expertise and technology to support the teaching, learning, and research mission of the university. Lastly, we will discuss the successes we have had in gaining additional resources from administration through thorough documentation and assessment of GIS activities.
Implementing an Infrastructure to Georeference Survey Data at the GESIS Data Archive
Stefan Schweers (GESIS - Leibniz Institute for the Social Sciences)
Stefan Muller (GESIS - Leibniz Institute for the Social Sciences)
Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences)
There is an increasing demand for georeferenced survey data in the social sciences as such data promise to contribute to a better understanding of how the concrete living environment influences individuals' attitudes and behaviors. In consequence, when the two are brought together, spatial data can complement survey data in important ways and can open up new possibilities for research. For example, census data can be added to survey data. However, this opportunity is only rarely realized in the social sciences as researchers face several technical and legal barriers. Currently, there is no infrastructure in Germany that facilitates the merging of spatial data with survey data in an open and transparent way. The project "Georeferencing of survey data" (GeorefUm) explores avenues for creating such a spatial data infrastructure (SDI) for the social sciences. In our presentation we examine the role that a spatial data infrastructure can play in offering services for social scientists, and show the scope and nature of necessary tasks in areas such as archiving, dissemination and user support. As the case of Germany is similar to that of other European countries, we expect our results to be helpful in the creation of SDIs in other countries as well.
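To make the merging step concrete, here is a minimal Python sketch using geopandas (our assumption for illustration; the file names are placeholders) that attaches the attributes of a surrounding administrative area, such as census indicators, to georeferenced survey respondents:

    import geopandas as gpd

    respondents = gpd.read_file("respondents_points.geojson")  # one point per respondent
    districts = gpd.read_file("census_districts.geojson")      # polygons with census attributes

    # Align coordinate reference systems before joining
    respondents = respondents.to_crs(districts.crs)

    # Spatial join: each respondent inherits the attributes of the containing district
    merged = gpd.sjoin(respondents, districts, how="left")
    print(merged.head())

In practice, of course, any such merging must respect the legal and disclosure constraints that the project addresses.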
Andy Rutkowski (University of California, Los Angeles)
Yoh Kawano (University of California, Los Angeles)
Stacy Williams (University of Southern California)
Online web mapping of data has become increasingly ubiquitous in student and faculty research, as well as integral to the ways that programs, institutes, and organizations communicate and share their work or collections. With platforms like CartoDB, MapBox, and Google My Maps, and developer spaces like GitHub, it has become easier and easier to create simple, effective, and beautiful maps that display all types of information. This paper/presentation shares the experience of creating a custom online web mapping template that anyone can use - as long as they have a dataset. The project emerged during a seminar course when students began using an online archive that contained images with geographic information but no map viewer/interface to explore the data. The project was then adapted to work with other materials within our institution's collections. We will share the process of getting data from that website, cleaning and preparing the data for a map viewer, identifying future growth and enrichment of the data, and the different tools, hardware, software, and programming languages necessary to create a simple and functional online map viewer. The emphasis of this paper/presentation will be on the process and on getting students and faculty to understand what is involved in gathering, preparing, and displaying data online. Larger issues that we will address include the sustainability of online mapping projects, best practices for data curation, and creating/enabling user communities around mapping projects.
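As a minimal illustration of how little code such a map viewer can require, here is a sketch using folium, a Python wrapper around the Leaflet mapping library (folium and the CSV layout are our assumptions, not the template described in the talk):

    import folium
    import pandas as pd

    # Hypothetical CSV of georeferenced archive images with lat, lon, title columns
    records = pd.read_csv("archive_images.csv")

    m = folium.Map(location=[records["lat"].mean(), records["lon"].mean()], zoom_start=12)
    for _, row in records.iterrows():
        folium.Marker([row["lat"], row["lon"]], popup=row["title"]).add_to(m)

    m.save("map_viewer.html")  # a static file that can be hosted, e.g., on GitHub Pages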
2016-06-03: 2E: Data management planning in action
High Costs - Little Benefits? The Costs of Research Data Management
Sebastian Netscher (GESIS, Leibniz Institute for the Social Sciences)
Astrid Recher (GESIS, Leibniz Institute for the Social Sciences)
Researchers often consider Research Data Management (RDM) a chore with high costs but little benefit. The argument of little benefit is easily refuted, seeing that RDM enhances the quality and transparency of research. However, we lack evidence on the costs of RDM. These costs are difficult to measure because many RDM measures are an integral part of the research process. Moreover, RDM costs depend on the specific design of a research project, and so far very little data exist to help us identify cost drivers in a project. Our project aims to close this gap of missing reliable calculations by examining the costs of RDM as follows. Firstly, it identifies the areas of RDM that cause costs in research projects. Secondly, it systematically analyzes RDM measures undertaken in different projects: (inter-)national surveys in which quantitative data were processed, harmonized and documented, as well as projects that collected qualitative data. On this basis, the project aims to develop a better idea of the cost factors of RDM, with the objective of creating a tool to assist researchers in calculating the costs of RDM. The presentation will provide an insight into RDM cost drivers and our approach to examining these costs.
Active Data Management Planning and Its Exchange with DDI
Uwe Jensen (GESIS, Leibniz Institute for the Social Sciences)
Sebastian Netscher (GESIS, Leibniz Institute for the Social Sciences)
Research data management (RDM) is an integral part of research. Nowadays, data management planning is becoming highly relevant for project proposals, since data policies and funding guidelines expect replicability, sharing and re-use of publicly funded data. However, implementing systematic RDM is challenging for different reasons at different levels: with respect to project complexity and available resources, RDM in social science research is unambiguously tied to a particular research project; with regard to funders, (inter-)national regulations on funding and data policies vary greatly; and in terms of DDI, no data management plan (DMP) standard exists so far that can be fully integrated into the current DDI flavours. To face these challenges, the DDI working group on Active Data Management Planning (ADMP) aims to incorporate DMPs into current DDI versions. In line with the group's goals, the presentation provides an overview of the results achieved so far with respect to the following issues. We introduce use cases of projects, funders, and archives to consider their specific DMP requirements for integration into DDI specifications. We report on the common, differing, and notably re-usable DMP information sets to be exchanged among these user groups; the specific workflows within the triangle of projects, funders and archives will be considered respectively. Finally, we discuss options for integrating the findings into DDI and the potential usage of and mapping with other standards. With this presentation, we aim to get feedback from the IASSIST community on the various use cases and the usage of DMPs for various purposes. This feedback shall foster the work of the DDI ADMP working group to integrate relevant data management planning assets into DDI.
Early Intervention and Data Management - New Strategy for Increasing Research Data Deposits
Gry Henriksen (Norwegian Social Science Data Services (NSD))
NSD has archival agreements with the Norwegian Research Council and research institutions. Even so, NSD receives only a fraction of the potential number of datasets for archiving. The reasons are multiple and the solutions complex. We believe that closer contact with researchers will increase the number of deposits and benefit both the researchers and the archival services, given that the contact strengthens the researcher's ability to collect, document and manage their own research data. NSD's new strategy in this area is to focus more, and in a more targeted way, on research needs during the whole research process. The core will be early intervention, including the introduction of a new data management plan (DMP) that reflects researchers' needs for good research organization and practice. The new DMP will not just provide the archive with good and usable metadata, but also provide the researcher with tools and resources to collect high-quality research data in compliance with legal and ethical requirements. We will focus on targeted information and training for the research community. This presentation will present and discuss the new strategy and our first experiences.
Formal Data Management Planning: Useful Guidance Or Administrative Burden?
Marieke Heers (FORS, Swiss Centre of Expertise in the Social Sciences)
Brian Kleiner (FORS, Swiss Centre of Expertise in the Social Sciences)
Recent years have seen a growing number of initiatives aiming to promote data sharing, driven in part by the Open Access movement, with increasing awareness among stakeholders and researchers of the importance of making data publicly available. A number of research funding agencies have made data management plans a formal requirement of the research proposal; others are contemplating doing so. This is the case for Switzerland's main science funder, the Swiss National Science Foundation (SNSF). As a representative of the Swiss research community in the social sciences, but also as the main national data archive in the field, it is important for the Swiss Centre of Expertise in the Social Sciences (FORS) to assist in this possible policy development from the SNSF. Reviewing data management plans (DMPs) from several countries, we asked the following questions: a) is there such a thing as the DMP, or is there rather a variety of DMPs serving different goals? b) How useful and relevant are DMPs for researchers? c) How could DMPs be improved? Results of our analyses show that data management plans are often heavily focused on post-project data sharing, with less concern for data management during the projects themselves. We argue that redressing the balance between future usefulness (for sharing) and current usefulness (for project planning) would help researchers better see the value and utility of data management planning, and thus consider DMPs less of an annoying administrative burden.
2016-06-03: 3E: Data management archiving/curation platforms
Data repository platform evaluations
Jennifer Doty (Emory University)
In 2015, Emory University embarked on a process to identify an appropriate long-term data repository solution for locally generated research data for which there are no suitable disciplinary repositories. To do so, we formed an internal task force drawing from across the libraries and IT services and representing a wide range of roles and perspectives in our organization. The group identified several possible implementations for long-term data archiving from available platforms. We then worked together to identify and refine our criteria for evaluating the different platforms and to conduct comprehensive evaluations of each system. This presentation will outline how we collaboratively developed our institutional criteria and how we established common evaluation tasks from depositor, administrator, and end-user perspectives. It will also review the evaluators' experiences conducting assessments in the dynamically developing world of data repository platforms. Finally, I will cover some lessons learned from the experience, including both the advantages and disadvantages of our approach.
More Data, Less Process? The Applicability of MPLP to Research Data
Sophia Lafferty-Hess (University of North Carolina, Chapel Hill)
Thu-Mai Christian (University of North Carolina, Chapel Hill)
In their seminal piece, "More Product, Less Process: Revamping Traditional Archival Processing," Greene and Meissner (2005) ask archivists to reconsider the amount of processing devoted to collections and instead commit to the More Product, Less Process (MPLP) "golden minimum." However, the article does not specifically consider the application of the MPLP approach to digital data. Data repositories often apply standardized workflows and procedures when ingesting data to ensure that the data are discoverable, accessible, and usable over the long term; however, such pipeline processes can be time consuming and costly. In this paper, we will apply the principles and concepts outlined in MPLP to the archiving of digital research data. MPLP provides a useful lens for discussing questions related to data quality, usability, preservation, and access: What is the "golden minimum" for archiving digital data? What unique properties of data affect the ideal level of processing? What level of processing is necessary to serve our patrons most effectively? These queries will contribute to the discussion of how data repositories can develop sustainable service models that support the increasing data management needs of the research community while also ensuring data remain discoverable and usable for the long term.
NORD-i - a novel, DDI4-powered Data Curation Platform for NSD
Ornulf Risnes (Norwegian Centre for Research Data (NSD))
Vigdis Kvalheim (Norwegian Centre for Research Data (NSD))
NORD-i is a new infrastructure project for NSD, funded by the Norwegian Research Council. The goal of the project is to increase the volume and quality of Norwegian research on socio-economic data and other data under NSD's archival mandate, through a strengthening of NSD's data curation platform and its interfaces to actors and stakeholders in the research community. Through RAIRD - another Norwegian infrastructure project - and interaction with CESSDA-related projects and the DDI community, NSD has accumulated sufficient domain experience and technological knowledge to plan and design a more holistic, automated and extensible data curation platform than previously conceivable. The new data curation platform will integrate tools and functionality for documentation, data management and anonymization with machine-readable resources (e.g. classifications, thesauri, question banks, institution registers, etc.) within and outside of NSD. The data curation platform is also inspired by workflows and practices known from software development and source code management, and includes: fine-grained revision control of data and metadata; solutions for collaborative workflows; automated testing and quality assessment; automated solutions for packaging, distribution, dissemination and publishing of data; inventory control; activity reports; and online data management and analysis solutions. This presentation will give an overview of the NORD-i project and NSD's context in the Norwegian and international research infrastructure landscape, and take a closer look at the foundational and functional components of the platform.
2016-06-03: Pecha Kuchas
Datafication: Is That Really a Word?
Susan Noble (UK Data Service)
According to the OECD Data-Driven Innovation report (October 2015), socio-economic activities are increasingly migrating to the Internet in what is termed the "datafication" of society. Thanks to the mushrooming evidence that open data are beneficial, much publicly funded data are now freely accessible. In line with the open access, open education and open data movements, the UK Data Service offers more and more of its data openly and is keen to help UK Data Service users seize the benefits of this datafication of society! During this presentation, we will describe the features we have implemented on the Service's international data delivery platform, UKDS.Stat, which help to make it an invaluable resource for anyone interested in international socio-economic data. These include the opening up of more datasets, the implementation of API access, the integration of research publications via Digital Object Identifier citations to demonstrate impact, and outreach to new discipline areas with a specific section on the Sustainable Development Goals. We will also present our plans for the future, for example investigating linked data and integrating social media.
Zachary Painter (University of Massachusetts, Dartmouth)
The programming language R is one of the most popular statistical software tools, yet many people feel intimidated by the idiosyncrasies and supposed complexity of the language. Other tools for statistical computation typically either require expensive licenses (such as SPSS or Mathematica) or are not as well developed for statistics as R (such as Python or Julia). Overcoming the mental hurdle of learning a new, unfamiliar tool that looks frightening compared to some alternatives can be challenging, yet with a little help from someone who knows the basic concepts, those new to R can become confident that they can learn to use it. Drawing on experts in a variety of fields across the natural/physical and social sciences, this presentation will demonstrate a framework that gives an instructor, who need not be an expert in statistical computation or the R language itself, the tools to deliver a gentle and logical introduction to someone completely unfamiliar with R within a 90-minute workshop. Basic functions, simple data wrangling, and introductory visual analysis will be covered to give learners a broad introduction to the capabilities of the language without giving them too much to absorb at one time. In addition, learners will be given a few brief resources to use pre- and post-workshop so that they can take full advantage of the in-class time to practice hands-on and experiment, without fear of getting lost or slowing the rest of the group down.
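To make that scope concrete, here is a minimal sketch (not the presenter's actual teaching materials) of the ground such a 90-minute session can cover in base R, using the built-in mtcars dataset:

```r
## Basic functions on a built-in example dataset
data(mtcars)
mean(mtcars$mpg)                 # average fuel economy
summary(mtcars$hp)               # five-number summary of horsepower

## Simple data wrangling: filter rows and derive a new column
efficient <- subset(mtcars, mpg > 25)
efficient$kml <- efficient$mpg * 0.425   # miles per gallon to km per litre

## Introductory visual analysis: one scatterplot, clearly labelled
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Heavier cars use more fuel")
```

Each of the three blocks maps onto one of the abstract's topics (basic functions, wrangling, visualization), keeping the total amount of new syntax small enough for a first encounter.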
Stories Are Just Data with a Soul
Chris Coates (University of Essex)
How does a communications specialist look at data science? (Or, what does a car in a fountain have to do with "big" data?) This case study of a TEDx talk at the University of Essex looks at two different kinds of storytelling: journalistic work producing "human interest" stories for a university alumni magazine, and the stories administrative data can tell us. The TEDx talk (https://www.youtube.com/watch?v=KYKV-LwGe4g) used irreverent stories about Essex students to draw in a general audience, and then compared these to, for example, the Index of Multiple Deprivation, which pinpoints the most deprived parts of England, and the Scottish National Health Service's use of data to target diabetes treatment. This presentation will examine how I researched and presented these examples and wove the different stories together into a narrative to engage the audience. It will also touch on how important it is for the Administrative Data Research Network to communicate in unexpected ways, to be open and transparent, and to encourage the public to see data science in a new light and understand its importance for society - all in order to pre-empt possible objections to the use of these data.
Energy Embed
Lisa Neidert (University of Michigan)
I worked for two years on a project as a liaison between the Institute for Social Research and the Energy Institute, an organization populated by engineers of one stripe or another. The University of Michigan Energy Survey is a rider on the monthly Survey of Consumers. The survey is taken quarterly in January, April, July, and October, with October 2013 as the first survey month. Thus, year 1 runs October 2013 - July 2014 and year 2 runs October 2014 - July 2015. Exciting for the project is the movement in the price of gas from $3.70 a gallon down to $2.10, which allows us to see how the consumer unaffordability threshold responds to price changes. This presentation will describe the survey and some findings; the tasks I did; what I had to learn; and all the details about going to the dark side.
A Match Made in Data? - Developing a UK Research Data Discovery Service
Veerle Van den Eynden (UK Data Service, University of Essex)
Research data are everywhere: created by researchers, then lodged in disciplinary data centres, university repositories, journal supplements, and international repositories. How do we keep track of which interesting data are out there to use and where they can be found? A partnership of Jisc, seven UK data centres and nine university repositories that are representative of the current UK research data landscape is busy developing a UK-wide research data discovery service by harvesting metadata into a central discovery service. In theory this sounds easy; in practice it is quite a challenge. For starters, how do we define the scope of which "UK research data" to bring into this portal? Data resulting from publicly funded research? Data created by UK researchers? Data of use to UK researchers? And what do we mean by research data anyway, and how are they represented in a dataset? Can we organise and describe visual arts data with the same metadata profile as collections of interviews, crystal structures or data produced by neutron beam experiments? How does DDI map to the Gemini metadata standard or to the Core Scientific Metadata Model? Long-established data centres have typically developed optimal ways to represent datasets for their discipline. Newly established university repositories have much flexibility in implementing generic data solutions. The advisory groups formed of specialists from across the partnership's data repositories steer us professionally through this maze of weird and wonderful data facts and tales towards a harmonious discovery service.
Feeling Like Indiana Jones: Discovering the Unknown Holy Grail of Administrative Data
Kakia Chatsiou (UK Data Archive, University of Essex)
The Administrative Data Research Network (ADRN) helps researchers access de-identified administrative data to carry out research that can benefit society. Researchers can apply to the Network with a research idea to access administrative data in secure environments. ADRN User Services are tasked to advise on data that have sometimes never been used before, are not well understood, or whose very existence is unclear. This is often due to:
* the nature of administrative data (operational; undocumented; volume; dynamic; mostly unconsented; quality; frequency of release; retention periods; legislation)
* the nature of information about administrative data (inconsistent metadata and schemas; not interoperable; incomplete; quality; validation; reproducibility)
* the diversity of needs of the stakeholders engaged
* the limited resources of government departments
Valuable research can happen by unlocking the potential of administrative data, but at the moment the research community has little motivation to use these sources and a very limited understanding of what is available. The presentation provides an overview of recent work in this area and how we have recently dealt with challenges and worked alongside administrative departments to encourage the use of, and implement, standardised approaches to metadata collection. And why we feel like Indy in search of the Holy Grail - most of the time.
Sex, Drugs, Rock 'n Roll, and Social Science Research Data: New Data Collection Additions to the ICPSR Data Archive
Justin Noble (ICPSR, University of Michigan)
Sex, drugs, and rock 'n roll are not only attention-getting topics, but also areas in which valuable research data have been collected and are being shared through ICPSR and other data repositories. This presentation promotes recently released studies available through ICPSR on sexual behavior and drug abuse, and highlights data available through ICPSR's National Archive of Data on Arts and Culture.
2016-06-03: 1E: Policies and trust
Core Certification: New Common Requirements for Trustworthy Digital Repositories
Ingrid Dillo (DANS)
If we want to be able to share data, we need to store them in a trustworthy digital repository. Data created and used by scientists should be managed, curated, and archived in such a way as to preserve the initial investment in collecting them. Researchers must be certain that data held in archives remain useful and meaningful into the future. Funding authorities increasingly require continued access to data produced by the projects they fund, and have made this an important element in data management plans. Certification standards for data repositories are now available at different levels, from a core level to extended and formal levels. Even at the basic level, certification offers many benefits to a repository and its stakeholders. Core certification involves a minimally intensive process whereby digital repositories supply evidence that they are sustainable and trustworthy. Both the Data Seal of Approval and the ICSU World Data System offer a basic certification standard. Within the framework of the Research Data Alliance, these two communities have now created a harmonized set of Common Requirements for certification of repositories at the basic level, drawing from their respective criteria. This presentation will introduce these new common requirements, which will replace the existing ones in the course of 2016.
Preservation Policy Recommendations - Results from DASISH and CESSDA SaW
Trond Kvamme (Norwegian Social Science Data Service (NSD))
Clear and explicit policies play a vital role in sustaining long-term preservation and accessibility of research data. A well-defined policy framework establishes a platform of trust between stakeholders involved in the funding, creation, preservation and dissemination of research outputs. A transparent set of policies supports internal data curation procedures, ensures accountability, and allows for external quality control. This strengthens the trust between archive service providers, funders and researchers. In the DASISH project (2012-2014), a selection of guidelines and recommendations for the articulation of preservation policies were described and compared, mapping the current scope and content of policies and procedures, particularly in the SSH domain. Building on the work in DASISH, the H2020 INFRADEV project CESSDA SaW will provide high-level guidance on setting up and preparing a preservation policy framework for research data services. The aim is to provide a resource and a template that can assist both well-established and new or developing data archives to prepare, articulate and upgrade their policy frameworks. This presentation will describe the policy template currently being developed in CESSDA SaW and how it relates to the policy recommendations from the DASISH project.
Integrating Data Reusers' Defined Trust Attributes Regarding Data Curation
Ayoung Yoon (Purdue University)
This presentation will discuss how data reusers define trust attributes of data from their reuse experiences and how these trust attributes can be integrated into data curation and management practices. Understanding data reusers' perspectives and expectations is important to enhance data reusability and support current and future use, which are the fundamental purposes of data curation. Trust is a useful concept to apply in order to understand data reusers' thoughts, experiences, and needs, as the concept of trust is woven into the life cycle of data - from the creation, preparation, and management of data to sharing, reuse and preservation - and into the relations with the parties involved in this life cycle. Assessing data for trustworthiness has become important for data reusers with the growth in data creation, because of the lack of standards for ensuring data quality and the potential harm from using poor-quality data. Despite the importance of data reusers' trust in data, trust judgment is not a simple task, and the process of judging trust involves various social, individual, and institutional factors. Exploring the many facets of data reusers' trust in data generated by other researchers, and the trust attributes reusers themselves define, provides insights into how to improve current data curation activities in ways users trust, such as methods that ensure trustworthiness during data curation and the development of user evaluation criteria for the trustworthiness of data. This study identifies a total of 11 trust attributes from qualitative interviews with 38 quantitative social science data reusers in the United States, which reflect current practices of data reuse and suggest various implications for data curation. Trust attributes discussed in the previous trust literature are also adopted or modified in the context of data reuse.
2016-06-03: 2A: Research data management services development
Partnering with Researchers by Supporting Data Management Planning
Anne Sofie Fink (National Archive of Denmark - DDA)
Christian Lindgaard (National Archive of Denmark - DDA)
In Denmark, as in many other countries, data management planning is becoming inevitable for research projects. DM plans are required by both funders and institutions; most significantly, DMP is carried forward by the EU's Horizon 2020 framework. In this presentation we will outline the new research landscape in which DMP will have a significant impact, seen from a national and European perspective. In Denmark the most remarkable events have been the establishment of a National Forum for Data Management and a cross-national, case-based project on data management in practice. As experts in data documentation, we have an important role to play in supporting researchers in producing data management plans. However, researchers and research infrastructures might not be aware of the DMP expertise found in social science data archives. The challenge is to communicate our knowledge about data documentation to researchers both on a conceptual level (the research project) and on a detailed level (data production). We will suggest strategies for communication and cooperation that demonstrate to researchers and institutions how our expertise adds value to data management planning on both levels. Additionally, DDI (Data Documentation Initiative) must be among our offerings for DMP for research projects.
Research Data Management: A Practical Approach to Overcome Challenges to Boost Research
Bhojaraju Gunjal (National Institute of Technology, Rourkela)
Panorea Gaitanou (Ionian University)
The advent of new technologies, along with the development of several Research Data Management (RDM) tools, has led to a great revolution in automation and digitization in libraries, which aim to provide innovative value-added services to their patrons. At the same time, the adoption of various policy frameworks for managing data and workflow systems, along with other knowledge organisation systems such as metadata, taxonomies and ontologies that enable the interoperability of research data and enhance information retrieval, poses challenges to information professionals within the library context. The importance of RDM is increasingly recognized by organizations and institutions around the world, as it plays a crucial role in the documentation, curation and preservation of research data. It is therefore natural to consider libraries a critical stakeholder in the RDM landscape. Their role is highly related to the following: RDM policy development, advocacy and awareness, patron training, advisory services, data repository development, etc. The paper will first present a brief overview of RDM and a detailed literature review of the RDM aspects adopted in libraries globally. It will also describe several trends in the management of repository tools for research data, as well as the challenges in implementing RDM. Properly planned training and skill development for all stakeholders, with mentors to train both staff and users, are among the issues that need to be considered to enhance the RDM process. An attempt will also be made to present suitable policies and workflows, along with the adoption of best practices in RDM, so as to boost the research process in an organisation. This study will showcase the implementation of RDM and the processes adopted at the Central Library at NIT Rourkela, India.
From Evidence to Strategies: Needs for Research Data Use, Management and Sharing in Canada
Susan Mowers (University of Ottawa)
Chuck Humphrey (Canadian Association of Research Libraries (CARL))
Evidence of the RDM needs and practices of researchers from all disciplines at one large Canadian university was gathered in 2013. This paper reports on the survey results from 250 respondents concerning data use, sharing, and management, as well as the researchers' research practices, and goes on to discuss the impact on planning for policies, guidelines, services, and infrastructure. Major themes are collaboration, sensitive data, differences across research methods and disciplines, and incentives for researchers to improve data use, management and sharing in their fields.
2016-06-03: 3B: Disclosure techniques for restricted data
RAIRD - A fully interactive online statistical package for remote analysis of confidential microdata
Ornulf Risnes (Norwegian Social Science Data Service (NSD))
RAIRD (Remote Access Infrastructure for Register Data) is a web-based system for confidential research on full-population event data (spell data) from a set of Norwegian administrative registers. RAIRD is currently under development and testing, and will move into production in 2017. RAIRD takes advantage of technological opportunities and improvements in the DDI standard, and enables advanced statistical functionality through a fully interactive web interface. Researchers interact with data through a privacy preservation layer, and can explore, transform and analyze data almost as if the data were stored locally. The microdata, however, never leave the safe environment on the server side, and all directly or indirectly disclosive information is removed from outputs before they are returned to the researchers. The ambition of RAIRD is to develop a fully interactive, highly performant and scalable platform that can help the research community utilize the register data more easily than is currently possible. Ideas from DDI Moving Forward have enabled RAIRD to develop functional and robust metadata models suitable for event histories and other complex register data. This presentation will give an overview of the solutions in RAIRD, how we preserve privacy, and the impact of the DDI Moving Forward process.
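The abstract does not detail RAIRD's disclosure rules, but a toy sketch in R illustrates the general idea of a privacy preservation layer: server-side output checking that suppresses small frequency-table cells before results are returned to the researcher. This is illustrative only, not RAIRD's actual implementation.

```r
## Suppress any non-zero cell below a minimum count before the
## table is allowed to leave the secure server environment.
suppress_small_cells <- function(tab, threshold = 5) {
  tab[tab > 0 & tab < threshold] <- NA   # NA marks a suppressed cell
  tab
}

## Toy event data standing in for register spell data
events <- data.frame(
  region  = sample(c("North", "South", "East"), 200, replace = TRUE),
  benefit = sample(c("yes", "no"), 200, replace = TRUE, prob = c(0.05, 0.95))
)
suppress_small_cells(table(events$region, events$benefit))
```

Note that this shows primary suppression only; a production system also needs secondary suppression so that suppressed cells cannot be reconstructed from row and column totals.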
sdcMicroGUI - an R-based GUI tool for statistical disclosure control
Archana Bidargaddi (Norwegian Social Science Data Service (NSD))
The responsibilities of public data-owning institutions, such as national statistical institutes and other government departments, include not only archiving data but also disseminating data and enabling their reuse without compromising privacy. Hence, these institutions need to publish and deliver trusted, high-quality microdata aimed at either scientific use or public use. These outputs have to be as detailed as possible, particularly if they are meant for scientific use, to meet the objective of providing rich statistical information. However, this objective conflicts with the obligation the institutions have to protect the confidentiality of the information collected through surveys and/or other administrative activities. Statistical Disclosure Control (SDC) seeks to protect statistical data in such a way that they can be released without giving away confidential information that can be linked to specific individuals or entities. Even though there are ample information and guidelines on performing SDC, there is a lack of tools for performing SDC effectively. Most institutions anonymize the data they deliver manually. This manual anonymisation process is often challenging, cumbersome and time-consuming, requires a high level of expertise, and often offers no repeatability or traceability. This talk will present sdcMicroGUI, a tool for statistical disclosure control. The free, GPL-licensed, R-based sdcMicroGUI allows users to perform common anonymization methods, estimate disclosure risks, and calculate the post-anonymisation utility of micro-datasets through easy-to-use, highly interactive, menu-based operations. sdcMicroGUI was developed by IHSN and is based on the R package sdcMicro, also developed by IHSN. In a project initiated by the UKDA, NSD enhanced the existing sdcMicroGUI to further improve its user interface and layout and to add more UI functionality. Today we present this modified version of sdcMicroGUI.
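For readers who prefer to script rather than click, a minimal sketch of the underlying sdcMicro workflow that the GUI wraps might look like the following (function names follow the sdcMicro documentation; exact signatures and the demo dataset may differ across package versions):

```r
library(sdcMicro)

data(testdata)   # demo microdata shipped with sdcMicro

## Declare the quasi-identifying key variables an attacker could
## plausibly link to external sources
sdc <- createSdcObj(testdata,
                    keyVars = c("urbrur", "roof", "sex"))

## Inspect the estimated disclosure risk for the raw data
print(sdc)

## Apply local suppression until every combination of key variables
## occurs at least k times (k-anonymity), then re-inspect the risk
sdc <- localSuppression(sdc, k = 3)
print(sdc)
```

The GUI's value, as the abstract argues, is exposing exactly this create-assess-anonymize-reassess loop through menus, with repeatability and traceability that manual anonymization lacks.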
The development of synthetic data sets to expand and transform use of disclosive data from the ONS Longitudinal Study
Oliver Duke-Williams (University College London)
Nicola Shelton (University College London)
Adam Dennett (University College London)
The ONS Longitudinal Study is a data set built around a sample of 4 out of 365 birthdates for individuals in England and Wales, using decennial census data from 1971 to 2011 plus linked administrative data, together with census data for other persons in sample members' households. The data are disclosive, and thus access is restricted to approved researchers working on approved projects, with access via a safe setting or via an intermediary. This presentation describes the production of non-disclosive synthetic versions of the data, which offer a number of advantages. Training sessions, especially those for potential users, can be run using these data where access to the real data is impossible. Existing users can test their SPSS, Stata, and other scripts on synthetic data, to debug and test the logic properly, before submitting scripts to be run in the safe setting. If the synthetic data have the characteristic that attribute distributions are reasonably similar to those observed in the real data, then the synthetic data can also be used as an initial exploratory tool to gauge whether planned analyses are feasible. Two separate approaches have been taken: one working from project-specific secure data sets to produce a safe set of equivalent microdata with similar but not identical characteristics, and a second producing an entirely synthetic general-purpose data set. We reflect on both the methodological issues involved in producing synthetic data and the institutional process of arguing that such data are both safe and useful.
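As a hedged illustration of the simplest end of this spectrum (and not the authors' actual method), independently resampling each variable yields a synthetic file whose marginal distributions resemble the original while breaking the link to real individuals - at the cost of destroying relationships between variables, which serious synthesis methods work hard to preserve:

```r
set.seed(42)

## Toy stand-in for a confidential microdata file
real <- data.frame(
  age    = sample(16:90, 500, replace = TRUE),
  sex    = sample(c("F", "M"), 500, replace = TRUE),
  region = sample(paste0("R", 1:10), 500, replace = TRUE)
)

## Resample each column independently, with replacement
synthetic <- as.data.frame(lapply(real, sample,
                                  size = nrow(real), replace = TRUE))

summary(real$age)        # marginals look similar...
summary(synthetic$age)
## ...but any association between columns has been destroyed,
## so only marginal-level exploration is trustworthy here.
```

The gap between this naive sketch and a usable product (preserving joint structure well enough for script debugging and feasibility checks, while remaining provably non-disclosive) is precisely the methodological territory the presentation covers.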
2016-06-03: S4: DDI tools: No tools, no standard
DDI Tools Session: No Tools, No Standard
Marcel Hebing (DIW Berlin)
The acceptance and adoption of a standard like DDI depends heavily on the availability of software tools that use it. The DDI Developers Community is a part of the DDI Alliance where software developers from around the world can meet and swap ideas on working with DDI in various programming environments and languages. In this session we would like to introduce our work and give you an overview of a selection of tools available from the community. Detailed presentations of tools include:
- Dan Smith and Jeremy Iverson: Colectica
- Daniel Katzberg: Using metadata to auto-generate variable
- Knut Wenzig: Packages for Stata and R using DDI on Rails metadata
- Marcel Hebing: DDI on Rails
- Olof Olsen and Jannik Jensen: The healthy portable portal
- Samuel Spencer: Aristotle Metadata Registry