Keynote 2: Administrative Data Research UK: The journey so far
Emma Gordon (ADR UK)
Session A1
Data Skills Pathway: Planning a curriculum of free interactive self-learning materials
Sarah King-Hele (UK Data Service, University of Manchester)
This talk outlines an 18-month project, ending in September 2022, to explore the feasibility of providing a coherent curriculum of self-learning materials that will guide learners through a pathway of pre-analysis and basic data skills for quantitative research in the social sciences. Open access and available on the UK Data Service website, the self-learning materials would cover quantitative social science skills and those needed to access and analyse more novel data types such as social media data and administrative data. We will outline progress to date and discuss the potential to extend the Data Skills Pathway to self-learning materials from other training providers, covering skills that are beyond the remit of UK Data Service training.
Who is counted? The Census of Canada and ethno-racial and Indigenous identities
Kevin Manuel (Ryerson University)
Rosa Orlandini (York University)
Alex Cooper (Queen's University)
Finding data on race, racialized populations, and anti-racism in Canada can be a complex process when conducting research. One source of data is the Census of Canada which has been collecting socio-demographic data since 1871. However, the collection of racial, ethnic, or Indigenous data has changed throughout the years and from Census to Census. This presentation will detail how the Census of Canada has asked questions about ethno-racial and Indigenous identity over time, and some of the challenges with searching for ethno-racial data in Canada. The presenters will also share an online guide they created to help librarians and researchers answer ethno-racial data questions within the Canadian context.
This presentation will share the experience of two Florida student success librarians at different flagship schools in the state. Both were interested in data privacy, concerned with the ethical dilemmas raised by participating in campus and library learning analytics, and with undergraduate students' understanding of data ethics. Together they organized and coordinated a webinar series on this topic. The series took a two-pronged approach: to educate local librarians about these issues and to form a local community of interest and practice on data ethics and privacy literacy. This presentation will focus on the methods of building such a series and community, lessons learned from the experience, and future plans of action.
Developing an undergraduate course in critical data literacy to advance new college learning outcomes
Amelia Kallaher (Cornell University)
In 2019, the College of Agriculture and Life Sciences (CALS) at Cornell University initiated a review of its undergraduate learning outcomes. Librarian membership and advocacy on the review committee facilitated the integration of a data literacy competency into the new college learning outcomes. Two years later, the library assisted with the advancement of these learning outcomes by developing and teaching a new credit-bearing course on data literacy. This new data literacy course aimed to equip students from a variety of non-technical backgrounds with the necessary skills to engage with data, both quantitative and qualitative, in meaningful ways. The course approaches data literacy as part of a broader process of inquiry into the world – not from a math- or statistics-centric point of view. This presentation will outline the transition from mostly “click and show” or “one-shot” data instruction sessions to theoretical, scaffolded, semester-long data literacy instruction that advances the curricular goals of the college. Additionally, the presentation will share strategies for successful advocacy with college leadership, walk attendees through the course development and approval process at our college, outline future plans and assessment for the course, and share our philosophy about the importance of critical data literacy at this particular point in history.
Session A2
(Virtual) - DATABOOK: a standardised framework for dynamic documentation of algorithm design during Data Science projects
Anna Nesvijevskaia (DICEN IDF / QUINTEN)
This paper proposes a standard documentary framework called the Databook for Data Science projects. The proposal is the result of five years of action research on multiple projects in several sectors of activity in France, and of confronting standard theoretical Data Science processes, such as CRISP-DM, with the reality of the field. The minimalist and flexible structure of the Databook prototype, described and illustrated in this paper, has proven operational on more than a hundred projects and has been recognised by various stakeholders as an excellent facilitator of Human Data Mediation, especially for multi-skilled projects. Beyond its proven benefits for project efficiency, this framework, conceived as a boundary object, can be applied more broadly to data project portfolio management and to data value, governance and quality. By going beyond the computational aspect of the models, the Databook is an answer to the issues of interpretability and auditability of algorithms.
(Virtual) - Better ask the right questions: Leveraging fielded surveys for future research.
Thomas Krämer (GESIS Leibniz Institute for the Social Sciences)
Survey documentation, including the questions and variables used in studies, is increasingly available, as tool support and adoption of the DDI metadata standard have improved significantly in the past five years. However, support is still poor for searching large collections of fielded survey questions, comparing questions related to the same variables or concepts, and re-using questions in one's own research. We present an open-source technology stack that integrates search, comparison and re-use of survey questions for social scientists. We will show how institutions can set up a fully automated workflow to aggregate metadata, build a search index, and offer a rich search user interface over the resulting question banks. From a social scientist's point of view, we demonstrate how to find questions in one's area of interest, compare candidates including answer options and language variants, transfer selected questions into a questionnaire designer, and proceed to conducting the survey in an online survey tool. While it is almost standard to version and interlink variables and questions across different waves of longitudinal studies, individual research currently lacks this contextualisation. Using the presented solution, social scientists can keep track of (re-)used questions and variables, thus opening analysis perspectives beyond the current study. Finally, we will explain how the stack relates to the principles of open science and the FAIR principles.
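As a rough, self-contained illustration of the question-bank idea described above (not the stack presented in this talk), the following Python sketch indexes a handful of survey questions with the Whoosh search library and retrieves candidates by keyword; the questions themselves are made up.

```python
# Minimal sketch of a survey-question bank: index fielded question
# texts and search them by keyword. Illustrative only; not the stack
# presented in this talk. Requires: pip install Whoosh
import tempfile

from whoosh import index
from whoosh.fields import ID, TEXT, Schema
from whoosh.qparser import QueryParser

# Hypothetical fielded questions; in practice these would come from an
# aggregated DDI metadata harvest.
QUESTIONS = [
    ("q1", "How satisfied are you with your life as a whole nowadays?"),
    ("q2", "How much do you trust your country's parliament?"),
    ("q3", "How satisfied are you with the present state of the economy?"),
]

schema = Schema(qid=ID(stored=True), text=TEXT(stored=True))
ix = index.create_in(tempfile.mkdtemp(), schema)

writer = ix.writer()
for qid, text in QUESTIONS:
    writer.add_document(qid=qid, text=text)
writer.commit()

# Find candidate questions on a topic of interest.
with ix.searcher() as searcher:
    query = QueryParser("text", ix.schema).parse("satisfied")
    for hit in searcher.search(query):
        print(hit["qid"], hit["text"])
```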
Managing the grant output contribution process from lab to repository
Juliane Schneider (Sage Bionetworks)
The major challenges in creating research outputs that are discoverable, accessible and reusable are the transfer of research outputs from their creators to a repository, and the application of useful, interoperable metadata to data files, software and mouse models. As a Data Liaison for Sage Bionetworks, I coordinate the outputs for over 60 grants in Alzheimer's disease and mental disorders research. The outputs include research data, experimental tools and computational tools, which are uploaded to Sage's Synapse platform along with detailed metadata and study documentation. Sage also provides data availability statements for manuscripts using data in our repository, and manages governance and accessibility levels for the grant outputs. As part of the process, I train the data contributors on creating metadata for their data using metadata templates, and teach them to use our metadata validator tool. Tracking these processes can be complex, as curation tasks are passed between the Data Liaison, Data Curator, and the Governance team while documentation is collected and governance is applied. The challenges we face in our current data contribution process are legacy systems, managing communications with our data contributors, and keeping our data models and metadata current with a fast-moving research environment. I will present the methods I use to track the data contribution process, the tools and templates Sage uses to acquire metadata and train researchers on creating good metadata for their contributions, and how we are addressing the challenges of reconciling legacy systems with new assays, materials and processes used by our contributors.
(Virtual) - Experiences with converting DDI among versions using DDI-FlatDB
Claus-Peter Klas (GESIS - Leibniz Institute for the Social Sciences)
Many service providers and institutions in the social sciences document survey data only at the level of studies, and usually in legacy systems not based on standards and standard metadata. The reasons are manifold, but one prominent reason is missing software or tool support for documenting studies, questions and variables. In addition, such tools are usually self-developed, lack functionality, and do not provide DDI as a metadata standard. Current tools to document studies and questions are Dataverse, Nesstar and the commercial tool suite Colectica. The GESIS DDI-FlatDB is a strong, tested, open-source and DDI-compliant technology stack that helps social science research institutes increase findability, accessibility, interoperability and re-usability at the question and variable level; we will introduce its surrounding tools and services. These tools and services consist of editors, a DDI converter and validator, legacy data converters, statistics generators and search portals, and support the complete workflow from documentation to dissemination. One prominent tool is the GESIS questionnaire editor, an open-source service for the documentation and translation of questionnaires.
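The DDI-FlatDB converter itself is the GESIS tool described above; purely as a generic illustration of XSLT-driven metadata conversion between DDI versions, here is a minimal Python sketch using lxml. The stylesheet and file names are hypothetical placeholders.

```python
# Generic sketch of XSLT-driven DDI version conversion; illustrative
# only, not the DDI-FlatDB converter itself.
# Requires: pip install lxml
from lxml import etree

def convert_ddi(source_path: str, stylesheet_path: str, target_path: str) -> None:
    """Apply an XSLT stylesheet to a DDI document and save the result."""
    source = etree.parse(source_path)
    transform = etree.XSLT(etree.parse(stylesheet_path))
    result = transform(source)
    # Surface any messages emitted during the transformation.
    for entry in transform.error_log:
        print(entry.message)
    result.write(target_path, pretty_print=True,
                 xml_declaration=True, encoding="utf-8")

# File names are hypothetical examples.
convert_ddi("study_ddi25.xml", "ddi25_to_ddi32.xsl", "study_ddi32.xml")
```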
(Virtual) - Designing Inclusive Research Data Management Practices with Intentionality to Reduce Invisibility and Promote Equity across the Research Data Lifecycle for Sexual and Gender Minority (SGM) People
Berenica Vejvoda (University of Windsor)
This proposal begins with the premise that “by design” confers intentionality, and that best practices for RDM across all stages of the research data life cycle must “by design” purposefully centralize inclusion strategies in order to reduce invisibility and stigmatization of SGM people and to promote equity. This proposal will establish that RDM is not a neutral process but, rather, is situated in a wider social, political, and cultural context and in dominant discourses. A critical lens will be applied across the research data life cycle to discuss how RDM strategies can become more inclusive and representative of SGM people. The presentation will address key inclusion strategies as they apply to each stage of the research data life cycle. Strategies to be discussed include: data capture of critical issues and knowledge gaps specific to SGM people; inclusive methodologies, instruments and study design; direct involvement of SGM people in data collection processes; capturing gender, sex and sexual orientation (GSSO) data beyond a single binary attribute; adoption of inclusive metadata and ontologies; inclusive dissemination; and ensuring data are readily available to, and discoverable by, key stakeholders effecting change through policy and research to address inequalities relating to SGM people. Finally, this presentation will discuss how data professionals can utilize a critical lens to advise researchers on more inclusive data management practices for GSSO data. It will be recognized that approaching research data management with a critical lens is challenging. However, it is hoped that this proposal will reveal that researchers can ask critical questions (such as “who is being excluded?”) across all stages of the research data life cycle when an intent to change social realities and address social inequities is made central to managing research data from the outset.
(Virtual) - Mapping by Design: Learning from the Zaria mapping project.
Yusuf Suleiman (Local Development Research Institute, Nigeria)
Open mapping is continuously growing and being popularized worldwide as many organizations, companies, and individuals shift their focus to crowd-sourced spatial data for their operations and services. While open geospatial data are freely available, many organizations are yet to realize their full potential. Involving users in the design of data mapping and collection encourages usage, and removes the barriers that prevent many stakeholders (both public and private) from using open geospatial data in their respective fields. This session will demonstrate the importance of human-centered design, multi-stakeholder involvement, and scalable change as key components of successful urban transformation projects in Zaria.
Domain-Data-Protocols for Empirical Educational Research in Germany
Simon Eckert (GESIS - Leibniz Institute for the Social Sciences)
Researchers are increasingly required by various stakeholders to make research processes as transparent as possible, to enable reproducible research results, and to share their (research) data openly and in line with the FAIR principles. Such requirements can be challenging, as not all researchers are familiar with the concepts of FAIR or Open Data. At the same time, the tools available to date that are intended to promote the creation of FAIR data - such as templates for data management plans - vary widely and rarely provide guidance. The collaborative project Domain-Data-Protocols for Empirical Educational Research (DDP-Bildung), funded by the German Federal Ministry of Education and Research (BMBF; funding code 16QK01), addresses this challenge by designing standardized model protocols intended to support the quality assurance and re-use of research data. For this purpose, the collaborative project draws on the expertise of twelve German research institutions from predominantly educational research-related disciplines in developing the so-called Domain-Data-Protocols (DDPs). Based on a concept from Science Europe, DDPs are open, standardized and referenceable data protocols intended to function as a "model" for a data management plan (DMP) in a specific research area, in our case education research. In general, DDPs are useful for different stakeholders. On the one hand, they help researchers to conduct formally sound data management, prepare project proposals and funding applications, and share research data. On the other hand, they also enable the replication of results by the research community and the reuse of data by third parties in new (research) contexts. Furthermore, DDPs simplify data management review processes by reducing the burden of reviewing funding applications and (regular) data management reports through the introduction of standardized procedures. Lastly, DDPs promote the inclusion of data in a data archive by supporting researchers in creating data according to the FAIR principles.
Expanding Our Perspective: Building a Sustainable Metadata Culture
Diana Magnuson (ISRDI)
Wendy Thomas (ISRDI)
Several years ago, the IPUMS projects began working on a submission for CoreTrustSeal (CTS) certification. As a research organization with focused archival responsibilities, IPUMS wished to assure the contributors, funders, and users of the IPUMS data projects that the data were being managed appropriately for long-term preservation and access. This process forced us to look at our data, metadata, and processes in ways that expanded our perspective and required us to improve our systems and the way they were presented to our users. Although CTS was designed for traditional archives, it is of great value to organizations like IPUMS whose focus is on collecting, augmenting, and disseminating data, yet which have archival responsibilities for the data within their systems. Some lessons learned from this process:
• Developing trust between contributors, the research organization, researchers, and long-term archives
• Viewing our work as a segment of the metadata lifecycle and ensuring we meet the needs of future custodians
• The need to capture metadata on our processes to inform future users
• Increasing transparency for current users on processing and content
• Creating Archival Information Packages (AIPs) that provide project-level processing information as well as dataset-specific content
Measuring the time spent on data curation
Anja Perry (GESIS - Leibniz Institute for the Social Sciences)
Sebastian Netscher (GESIS - Leibniz Institute for the Social Sciences)
Budgeting data curation in research projects is difficult. In this paper, we examine the time spent on cleaning and documenting quantitative data for data sharing. We make use of a pilot study conducted at the GESIS Data Archive for the Social Sciences in Germany between December 2016 and September 2017. During this period, data curators at GESIS systematically documented their working hours and activities while cleaning and documenting datasets from ten quantitative survey studies. We analyse the recorded times to identify patterns in their work, e.g., how the size of the data affects the time spent on data cleaning. Afterwards, we discuss these patterns with the data curators involved in this work to examine important cost factors in data curation and to better understand the factors that increase the hours spent on cleaning and documenting data. Overall, we identify two major drivers: the size of the data and personal information contained in the data. With the number of variables included in a dataset, the time spent on cleaning and documenting variables increases. Likewise, sensitive information in the data, e.g., in open-answer questions, requires additional time to carefully ensure data protection and anonymization. In addition, we find learning effects when data are similar, for example in consecutive waves of the same study. We also identify important interdependencies between individual data curation tasks and in connection with certain data characteristics. In sum, we provide a first insight into costs and cost drivers in data curation. Our findings contribute to a better understanding of the different tasks, the time spent on them, and the data characteristics affecting the time spent. Thereby, we can derive recommendations on some of the cost factors in research data management.
Session B1
Harvest and Transform of Research Data from Repository to Archive: a SIP from the Source
Joakim Philipson (Stockholm University)
Stockholm University (SU) has developed a software script package for harvesting and transforming metadata and data files from data repositories, such as Figshare or Zenodo, that are used by SU researchers. The metadata is collected from these repositories, partly enriched with metadata from other sources, and then transformed to the Swedish National Archives' Common Specification Package standard (FGS METS), to be stored together with the associated data files, which are fetched simultaneously, as SIPs (Submission Information Packages) in the SU local (temporary) archive, awaiting future transformation to AIPs (Archival Information Packages). The end products of this harvest and transformation processing are SIPs whose transformed metadata sip.xml files validate against the FGS Common Specification Package schemas (http://xml.ra.se/e-arkiv/METS/CSPackageMETS.xsd and http://xml.ra.se/e-arkiv/METS/CSPackageExtensionMETS.xsd). The associated data files are not transformed or converted in this first step towards long-term preservation and archiving, but we keep track of the file formats ingested, partly by mapping file extensions to mimetypes in an XSLT script. The software scripts developed for this harvester and transformer are in BASH (Unix shell), XQuery and XSLT. The processing of metadata, data and scripts occurs locally using Git Bash (for Windows), BaseX and the Oxygen XML editor, but is essentially software-tool agnostic. Metadata input sources include OAI-PMH METS feeds from su.figshare.com, the Figshare API (https://api.figshare.com/v2/articles/), orcid.org, and a local list of departments at SU. This presentation will include a live demo of the harvester and transformer at SU, but will also give some ideas about how it might be adapted and developed to fit local and other future needs. As the harvest and transformation processing is currently only semi-automatic, requiring human intervention at some points, it is also desirable to increase the level of automation, perhaps by integrating all parts and scripts into a Jupyter Notebook.
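The SU pipeline is written in BASH, XQuery and XSLT; as a simplified illustration of the harvesting step in Python only, the sketch below issues an OAI-PMH ListRecords request and maps file extensions to mimetypes, as the abstract describes. The endpoint URL and metadataPrefix value are assumptions for illustration.

```python
# Simplified illustration of OAI-PMH harvesting and extension-to-
# mimetype mapping; not the SU scripts themselves.
# Requires: pip install requests lxml
import mimetypes

import requests
from lxml import etree

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Endpoint and metadataPrefix are hypothetical placeholders.
ENDPOINT = "https://su.figshare.com/oai"
PARAMS = {"verb": "ListRecords", "metadataPrefix": "mets"}

resp = requests.get(ENDPOINT, params=PARAMS, timeout=30)
resp.raise_for_status()
root = etree.fromstring(resp.content)

# List the harvested record identifiers.
for header in root.iter(OAI_NS + "header"):
    print(header.findtext(OAI_NS + "identifier"))

# Track ingested file formats by mapping extensions to mimetypes,
# analogous to the XSLT mapping mentioned above.
for name in ["survey.csv", "codebook.pdf", "data.sav"]:
    mime, _ = mimetypes.guess_type(name)
    print(name, "->", mime or "application/octet-stream")
```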
Developing an Integrated and Multi-Faceted Software Platform: Lessons from Transitioning the Roper Center Archive
Gene Wang (Roper Center for Public Opinion Research)
The Roper Center for Public Opinion Research moved to Cornell University in 2015. In the years since, the Center has embarked on a comprehensive project to replace its legacy archive management software systems with tools built on modern technology and platforms. This presentation will cover some of the overall design principles and philosophies used in conceptualizing and building the new archive management system, as well as examples of specific challenges and lessons learned from the transition process. At its core, the primary functions of our system are to aid in the archival process and to facilitate maximum discoverability. However, there are many ancillary functions, ranging from managing lists of organizations and data depositors to usage reporting and account management for our web portal. Our architecture is intended to address both existing and future needs through extensibility and modularization. In addition, we will cover some of the technical aspects of designing the Roper archive's digital infrastructure. The approach adopted by Roper is founded on core concepts and best practices in general software engineering, tailored to the unique needs of a digital archive in terms of data integrity and an extended software lifecycle.
Collection bias in social science data archives: what’s in and what’s out?
Brian Kleiner (FORS)
Alexandra Stam (FORS)
A traditional view of archives is that of neutral repositories of records or documents, where archivists make available records that serve as bridges to the past, impassively and methodically gathering and storing various types of objects (e.g., paper documents, digital or physical objects, images). However, in the 1990s the post-modern turn led to intensive and critical reflection on such assumptions, with the growing realisation that archivists are themselves co-producers of records of events, through the selection of what is to be maintained as well as how records are contextualised, documented, and presented as a reflection of some part of the past. From the perspective of social science data archives, this means that the active processes by which they select and make available certain objects may lead to systematic biases in their holdings. Not all research data make their way into a scientific data archive – what gets in, and in what manner, depends on the policies and the conscious and unconscious choices of archives, as well as of researchers. Such “filtering” on the part of archives or researchers could thus distort the field and influence what may be taken up in the future for re-use or replication. We hypothesize that the research data available in social science archives do contain such biases, and further that they are more likely to represent easier-to-study, stable, and non-vulnerable populations, examined as part of larger-scale and well-resourced research projects. Using the Swiss national social science data archive “FORSbase” as a case study, we analyze study- and dataset-level metadata to assess whether our own collection has any such biases. In addition, we will present results from interviews with researchers who documented their projects but did not in the end share their data in FORSbase, regarding their motivations and reservations.
Building JDCat: Japan Data Catalog for the Humanities and Social Sciences
Yukio Maeda (Japan Society for the Promotion of Science / University of Tokyo)
Makoto Asaoka (National Institute of Informatics)
Masaharu Hayashi (National Institute of Informatics)
This presentation introduces the first DDI-based data catalog constructed in Japan, the “Japan Data Catalog for the Humanities and Social Sciences” (JDCat), which started full operation in November 2021. In 2018, the Japan Society for the Promotion of Science (JSPS) initiated a five-year project to create a federated data catalog enabling data search through a single point of entry across research institutions. The National Institute of Informatics provides technical expertise to JSPS. In building the federated data catalog, we studied the practices of major data archives in North America and Europe and developed a metadata scheme for JDCat based on DDI Codebook. Some of the controlled vocabularies are translated from DDI and CESSDA. Four social science institutions were selected through a competitive tender and were later joined by a major humanities institution. As the first adopters of DDI for a data catalog in Japan, we created a manual and developed an interface for editing metadata. We also created a guide for researchers new to data sharing. Metadata will be provided in Japanese and English for social science datasets. To integrate research data from diverse academic fields, the DDI-based metadata scheme is mapped onto the metadata scheme of the Japan Consortium for Open Access Repositories for crosswalk purposes. JDCat itself is a catalog of metadata: the five institutions are responsible for preserving and disseminating data and may maintain their own catalogs. JDCat harvests metadata through OAI-PMH from the social science institutions, and through ResourceSync from the institution in the humanities. After searching for data on JDCat, users are transferred to the local catalogs for data access. Open data, such as tabular data and historical documents, can be downloaded without registration, but registration is required for survey data retrieval.
Panel 1: CESSDA in ERA - collaborations, partnerships and perspectives
CESSDA in ERA - collaborations, partnerships and perspectives
Martina Drascic Capar (CESSDA ERIC)
This session will tackle CESSDA's involvement in the European Research Area across four broad areas of activity.
European Research Infrastructures advancing the European Open Science Cloud: CESSDA is connected to EOSC via several channels, the main one being coordination of the SSHOC (Social Sciences and Humanities Open Cloud) project, one of five cluster projects connecting thematic communities to EOSC, but also by participating in a number of other EOSC-related projects such as TRIPLE, EOSC Future and EOSC Enhance. (Martina Drascic)
Research communities advancing SSH research: CESSDA participates in or supports a number of projects and initiatives advancing SSH research. Examples of engagement with thematic communities via projects are EurHisFirm (historic economic data), COORDINATE (child and youth wellbeing) and HumMingBird (migration data). Connecting established infrastructures with new ones within the domain is another way of establishing links with thematic communities via the SSHOC project; OPERAS (open publications), RESILIENCE (religious studies) and GGP (Generations and Gender Programme) are welcomed as new and emerging ESFRI infrastructures. (Ivana Ilijasic Versic)
European RIs advancing the Digital Future of Europe: CESSDA is actively engaged in the ERIC Forum Implementation project and the RITrain Plus project, both gathering EU research infrastructures to work on common challenges, in particular the need for educated and properly trained managers.
Cross-domain collaboration: this is gaining momentum via alignment with other cluster projects (e.g., in the SSHOC project), with other organisations and partners from different scientific domains (e.g., in the EOSC Future project), and with the life-sciences domain in particular (e.g., the BY-COVID project). (Vanja Komljenovic)
In this session, we will use examples and active discussion to address some of the challenges, showcase results of collaborations in each of the above-mentioned areas, and demonstrate the best practices used.
Through the sands of the hourglass, these are the data of our lives – FAIRing the DLI Training Repository
Chantal Ripp (University of Ottawa)
Sandra Sawchuk (Mount Saint Vincent University)
Margaret Vail (St. Francis Xavier University)
Alexandra Cooper (Queen's University)
The Data Liberation Initiative (DLI) is a partnership between Canada's national statistics agency and data librarians and specialists representing universities and colleges across the country. The program was established in 1996 in order to increase access to microdata for teaching and research purposes. In the early 2000s, DLI-affiliated librarians created a repository to provide access to training materials created by and for the data community as a means of supporting knowledge transfer and dissemination. The repository has moved twice already in its 20-year lifespan, but it now needs to be moved again. Best practices in metadata for discovery have changed dramatically over the last few decades, something that is readily apparent when searching the collection. A large number of 'maintainers' and a lack of controlled vocabulary meant that digital materials were deposited and described inconsistently over time, leading to challenges for end-users in the navigation and retrieval of relevant items. After consultation with the DLI data community, it was determined that there was a desire not only for improved description, but for curated learning trajectories designed to support independent learning and development of data literacy skills. As a result, a working group was formed to evaluate new platform solutions and identify opportunities to improve descriptive metadata and file formats of the training materials. In this presentation, we will show how we applied FAIR principles to make the training materials easier to find, (re)use, and adapt, not only for the benefit of our data community but for everyone. The DLI Training Repository is not just a complex collection of learning objects; it's a historical archive capturing the evolving trends in the development of the DLI data community in Canada.
Facilitating database search and analysis through the Roper Center’s node network interface
Kathleen Joyce Weldon (The Roper Center for Public Opinion Research)
The new Health Poll Database allows users to search 70,000 health-related survey questions from the Roper Center archive and to build custom over-time trends. To make the questions easily findable for the novice user, the Roper Center developed a visual exploration tool that not only improves findability but also depicts the interconnectedness of health-related concepts. The visual tool therefore serves both to facilitate data search and analysis and to educate users on the interrelatedness of health topics as they explore what the archive offers. In addition to introducing the node network tool, the presentation will provide an overview of the process of working with an advisory committee of subject-area experts and the Roper IT development team to make this new technology a reality, from the initial concept to the final tool. We will discuss the taxonomy decisions, visual design, approaches for tagging questions, and the development of curation tools integrated into question entry to ensure the Health Poll Database can be maintained over time.
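As a toy illustration of what a concept node network looks like (the topics and links below are hypothetical, not the Roper taxonomy), a few lines of Python with networkx:

```python
# Toy concept node network for question discovery; topics and links
# are invented examples, not the Roper Center's taxonomy.
# Requires: pip install networkx
import networkx as nx

G = nx.Graph()
# Each edge links two related health concepts used to tag questions.
G.add_edges_from([
    ("vaccination", "infectious disease"),
    ("vaccination", "health policy"),
    ("health policy", "insurance"),
    ("insurance", "health care costs"),
])

# From a starting concept, surface adjacent topics to guide exploration.
start = "vaccination"
print("Related to", start + ":", sorted(G.neighbors(start)))
```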
(Virtual) - Addressing Diversity in Data through the work of Graduate Specialists
Ryan Womack (Rutgers University)
No less than any other area of society, the world of data has historically lacked attention to diversity, equity, and inclusion. This has manifested itself in data collections and sources that lack sufficient representation by race, gender, and ethnicity. It has also resulted in unexamined bias, algorithmic and otherwise, leading to the implementation of methods in data science that differentially impact underrepresented groups. Data literacy and education programs often gloss over such issues while spending more time teaching narrowly defined skills. To address this lack of attention, “Diversity in Data” positions were created in Fall 2021 as part of the Graduate Specialist Program at Rutgers University Libraries in New Brunswick, so that graduate students could investigate data issues through a diversity lens. A diverse group of graduate students from varying disciplines (Computer Science, Sociology, and Political Science) was hired to investigate topics including the impact of past health data practices on attitudes toward COVID vaccination in the African American community, the role of social media in Black Lives Matter, and the varieties of algorithmic bias. This resulted in a series of presentations and discussions on these topics for the Rutgers University community. While the Diversity in Data Graduate Specialists are a first step in addressing past deficiencies in data education, further implications for data literacy and data outreach are discussed, such as how to blend this work into other data programs and extend its impact.
(Virtual) - Faculty researcher perspectives on RDM and the pedagogical needs of graduate students
Dany Savard (York University)
Minglu Wang (York University)
The intersection of data literacy and research data management (RDM) training continues to be an important topic for academic libraries. Evidence of successful efforts by libraries to improve their research data services and better support the growing RDM training needs of graduate students is reflected in the LIS literature, and librarians have seized the opportunity to lead these training efforts by collaborating with faculty researchers, student groups, information technology support units, administrators and other important stakeholders. However, efforts to embed research data management training into credit-bearing program curricula are still in their infancy for many in the academic library community, and strategies for integrating RDM training into existing research methods courses have not been widely studied or explored. This presentation will explore these areas by reporting on the preliminary results of a small-scale focus group study of teaching faculty perceptions of RDM and the related pedagogical needs of graduate students. By examining the opinions of researchers teaching graduate-level research methods courses in social sciences and health sciences programs, as well as the perceptions of active researchers with research mentorship responsibilities in these programs, this presentation will look at how educational and training opportunities in RDM are understood by teaching faculty concerned with readying their graduate students for lab, field research, or real-world contexts. The presentation will also cover the open coding methodology employed by the researchers of this study and the resulting categorization and theory building from these initial codes. Finally, this presentation will report on researchers' rationales for integrating RDM training into their courses by looking at elements such as timing, data formats, research contexts, institutional contexts, coping strategies for dealing with RDM challenges, and changes in research culture.
Research Governance strategies to safeguard the personal information of children and adolescents in sub-Saharan Africa
Lucas Hertzog (University of Cape Town)
Boladé Banougnin (University of Cape Town)
Elona Toska (University of Cape Town)
Recent data protection regulatory frameworks, such as the General Data Protection Regulation (GDPR) in the European Union and the Protection of Personal Information Act (POPI Act) in South Africa, pose a foremost governance challenge for research involving high-risk and vulnerable groups such as children and adolescents. Our paper's objective is to unpack what constitutes adequate safeguards to protect children's personal information, suggesting strategies to adhere meaningfully to the principal aims of the new regulations. Navigating this within established research projects raises questions about how research can use these regulatory frameworks to build on existing mechanisms already in use by researchers. We will therefore explore a series of best practices in safeguarding the personal information of children, adolescents and young people (0-24 years old), who represent more than half of sub-Saharan Africa's population. We will discuss the actions taken to ensure regulations such as the GDPR and POPIA effectively build on existing data protection mechanisms for research projects at all stages, focusing on promoting regulatory alignment throughout the data lifecycle. Our goal is to stimulate a broader conversation on improving the protection of sensitive personal information of children and adolescents in sub-Saharan Africa. We join this discussion as a research group generating evidence that influences social and health policy and programming for young people in sub-Saharan Africa. Our contribution draws on our work adhering to multiple transnational governance frameworks imposed by national legislation such as data protection regulations, by funders, and by academic institutions. This paper summarises our strategies for assisting research projects involving children and adolescents to align their data governance mechanisms with the most recent data protection regulations.
(Virtual) - Data access before and now: a Nigerian university library experience
Titilayo Comfort Ilesanmi (University of Ibadan)
Data are an essential part of any research, especially empirical studies. Universities are established to carry out research that fosters individual, organizational, national and international development, and data emanate from the various research projects undertaken by their stakeholders. For a long time, access to data and results was on a physical, face-to-face basis. The university library can be described as a place where research for development evolves. The University of Ibadan Library, now the Kenneth Dike Library, was established in 1948, the same year as its parent institution. For over six decades, access to its data was through conventional methods. The presence of information and communication technology, coupled with the recent COVID-19 pandemic, has changed the way data are accessed. This paper proposes to share best practices at the Kenneth Dike Library, University of Ibadan, Nigeria.
(Virtual) - Malawi health research data sharing framework
Chifundo Kanjala (London School of Hygiene and Tropical Medicine - Malawi Epidemiology and Intervention Research Unit)
Clemens Masesa (Liverpool School of Tropical Medicine - MLW)
Marlen Chawani (MLW)
Ben Kumwenda (Kamuzu University of Health Sciences)
Patrick Mapulanga (Mzuzu University)
Maganizo Monawe (Ministry of Health Malawi)
Jon Johnson (CLOSER project)
Providing wide access to research data resources for reuse in sustainable, legal, and ethical ways is a hot topic all over the world. There are obligations placed on researchers in health research to share data, as well as incentives that can be leveraged through sharing. On the other hand, the local contexts in which the research is conducted frequently impose conditions for sharing. Research institutions in Malawi, like those elsewhere, need a framework, customised to Malawian laws and ethical standards, to use in order to fully participate in the global research community while complying with local requirements. To contribute towards contextualising health research data sharing within Malawi, we embarked on a journey of developing a data sharing framework, commissioned by the Malawian ethics bodies governing health research. A series of meetings was held with national and international data experts to gather their perspectives on the essential components of a data sharing framework. This was supplemented by a review of the literature. A draft framework was created using the findings from the meetings and the literature. This framework was presented at an ethics workshop in November 2021, and feedback from workshop participants was gathered. Follow-up meetings are currently taking place in order to further develop the framework. A data sharing strategy, ethical considerations, legal considerations, sustainability considerations, infrastructure, and standards have been identified as key components thus far. This presentation will outline the key considerations for each of the core components, as well as the outstanding issues that are currently being investigated. So far, feedback from stakeholders indicates that the framework is needed to guide the operationalisation of health research data sharing by Malawi's Ministry of Health and research institutions. One of the most difficult challenges is tailoring it to the needs of data producers and users in Malawi and beyond.
(Virtual) - The academic libraries' role in providing Research Data Management services in Uganda
Cissy Akello (Law Development Centre)
A number of academic and research libraries globally provide Research Data Management (RDM) services to support university research activities. Academic and research libraries in Africa are slowly beginning to implement these RDM services amidst many challenges. Uganda has a number of academic and research libraries in both private and public universities. This paper aims to identify the RDM services that Ugandan academic and research libraries provide; how these libraries have integrated the services into their institutional research workflows; any collaborative approaches that have helped the libraries participate in the development of open-source platforms for managing the full research lifecycle in support of RDM; and the challenges they may encounter in further developing the services.
Towards understanding the ecosystem of a sustainable data culture
Emma-Lisa Hansson (Lund University)
Thomas Kieselbach (Umeå University)
Mattias Persson (Örebro University)
Monica Lassi (SNIC)
Goal: Enterprise architecture provides tools to map and understand the capabilities that a university needs for successful and sustainable management of research data in different scientific disciplines. This workshop provides an opportunity for those who give researchers technical, legal or administrative support in managing research data to gain an overview of the relevant capabilities and their connections at a university. That includes everybody who is interested in aligning the management of research data with open science.
Results: This workshop will present results from a comprehensive study at Lund University aimed at investigating and mapping the capabilities that are relevant to promoting good management of research data at the university. It will show that capability maps are useful for visualizing where different capabilities come in at different stages of research data management and how they connect to each other. They provide an intriguing overview of the facilities and support staff that a university and its researchers need during all stages of managing, sharing and archiving research data. In addition, they can show the emotional experience of researchers, their challenges in research data management, and the support that they receive.
Innovation: Capability maps can show the strengths, weaknesses and needs of an ecosystem for research data management at a university, and explain a very complex topic visually.
Conclusion: If you are interested in analyzing the strength and interoperability of services for data management support at your university, capability maps can be a useful tool. They can help you build data services that provide a basis for a sustainable data culture at your university. This workshop will introduce capability mapping and provide you an opportunity to discuss its strengths and weaknesses.
Visions for the European Social Survey Multi-Level Application
Hilde Orten (NSD - Norwegian Centre for Research Data)
Bodil Agasøster (NSD - Norwegian Centre for Research Data)
Knut Kalgraff Skjåk (NSD - Norwegian Centre for Research Data)
Right from the start of the European Social Survey (ESS), the context that respondents live in has been regarded as important. In 2012 the European Social Survey Multi-Level Application tool was launched, facilitating integrated download of regional-level data on demography, economy, etc., together with data from the ESS. With funding from the EOSC Future WP 6.3 Science Project 9, an upgrade of the tool is being planned, with the aim of supporting the integration of climate data with the ESS data and the regional contextual data already supported by the tool. The upcoming work product of the DDI Alliance, the DDI-Cross Domain Integration (DDI-CDI) metadata standard, will be used to bridge between data from different research domains and of different structures. This presentation will focus on the latest ideas for the upgrade of the tool, and will report on the work performed so far.
SWISSUbase – A National Data Archiving Platform Tailored For Specific Disciplines
Stefan Buerli (FORS)
Bojana Tasic (FORS)
In the future, research data will be facing two major challenges at the heart of the FAIR principles: discoverability and reusability. Making data accessible in a repository is not really an issue anymore; however, that alone does not make such solutions completely FAIR-compliant. SWISSUbase tackles these challenges by putting the emphasis on rich discipline-specific metadata and a well-developed service component. SWISSUbase is a national archiving platform for research data in Switzerland. It features a modular metadata structure that caters to the discipline-specific needs and standards of different scientific communities. New metadata schemas, or additional features within a schema, can easily be added or extended to follow evolving research practices and needs. The provided metadata schemas (DDI, META-SHARE, etc.) are compliant with widely accepted standards within the different scientific communities and allow for interoperability with discipline-specific data infrastructures at the international level. Secondly, SWISSUbase is at the centre of a national data service network consisting of a community of data curators located at the different partner institutions. The members of these institutional or disciplinary Data Service Units carry out data curation according to the OAIS standard, ensuring the long-term preservation and accessibility of the deposited data. All data and documentation are carefully checked and validated to increase the reusability of the data and the reproducibility of scientific research. In sum, SWISSUbase provides a solution catering to discipline-specific metadata requirements, as well as a network of data curators who assist the different scientific communities in their efforts to archive, document and share their research data.
(Virtual) - Redesigning a Federated Data Discovery Service
Mark Goodwin (University of British Columbia)
Kelly Stathis (Digital Research Alliance of Canada)
With the rapid proliferation of research data, it is vital to create innovative tools for data discovery and access. Canada’s Federated Research Data Repository (FRDR) Discovery Service addresses this need through a national discovery platform, aggregating metadata for datasets from over 80 repositories. In 2020, FRDR started offering a map search, powered by Geodisy, which provides access to datasets with geospatial information using an interactive map. This introduced a second search interface in parallel with FRDR’s original text-based search interface. Now, the FRDR team is working towards combining these two interfaces into one, allowing users to seamlessly combine keyword and map-based queries in an enhanced search interface. In this presentation, we’ll share our approach to merging two separate search interfaces into a single platform, including our process, challenges, and lessons learned.
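FRDR's internal implementation is not described here; one common pattern for an enhanced search that combines a keyword query with a map-based bounding-box query can be sketched in plain Python over hypothetical dataset records:

```python
# Sketch of combining a keyword filter with a geographic bounding-box
# filter over dataset records; the records are hypothetical and this
# is not FRDR's implementation.
from dataclasses import dataclass

@dataclass
class Dataset:
    title: str
    lat: float
    lon: float

DATASETS = [
    Dataset("Fraser River salmon counts", 49.2, -122.9),
    Dataset("Prairie soil moisture survey", 52.1, -106.6),
    Dataset("Vancouver housing prices", 49.3, -123.1),
]

def search(keyword: str, south: float, west: float,
           north: float, east: float) -> list[Dataset]:
    """Keep records matching the keyword AND falling inside the box."""
    return [
        d for d in DATASETS
        if keyword.lower() in d.title.lower()
        and south <= d.lat <= north and west <= d.lon <= east
    ]

# Keyword plus a box roughly around southwestern British Columbia.
for d in search("river", 48.0, -125.0, 50.0, -121.0):
    print(d.title)
```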
(Virtual) - Digital Transformation by Design at Harvard Business School
Katherine McNeill (Harvard Business School, Harvard University)
For many organizations, the COVID-19 pandemic served as a catalyst for rethinking how work, including academic research, could be done in more efficient and innovative ways through digital transformation by design. As part of a Harvard Business School-wide digital transformation initiative, a cross-departmental collaboration involving the library, research computing, IT, and research administration emerged to transform and optimize the way that researchers use data. New systems and approaches, many of which have already been employed at other institutions, have the potential to transform innumerable realms related to research data, such as governance, acquisition, curation, analysis, interoperable metadata, cloud storage, and the systematic tracking of research outputs. This presentation will discuss the process of envisioning, prioritizing, and building support for these initiatives, leveraging the work already done by the international research infrastructure community, and how the new systems are envisioned to come together to transform research.
Session C2
Providing real-world data experiences: Libraries as data providers and clients in student consulting projects
Sarah Young (Carnegie Mellon University)
Huajin Wang (Carnegie Mellon University)
Teaching faculty and instructors often seek experiential opportunities for students to hone the data science skills they learn in class, using real-world data and working with real-world clients. University libraries are an ideal partner for such projects, overcoming the challenges and barriers associated with establishing partnerships with industry or organizational clients external to the university. Libraries have data from many sources but limited time and expertise to leverage them, and thus such projects can benefit libraries as well. Since the summer of 2020, Carnegie Mellon University Libraries (UL) have partnered with the Department of Statistics & Data Science (SDS) in their project-oriented courses and undergraduate research apprenticeships, providing a wide range of datasets and research questions for data analysis projects. These projects make use of a broad range of data sources, including internal library usage data, bibliographic data on publications, grant data from subscription databases, and open datasets on topics of interest to librarianship and library service development. Student teams use these datasets to answer practical research questions posed by UL faculty and staff, periodically meeting with the UL team to refine the questions and direct project development. This presentation will discuss the partnership with SDS, provide an overview of student projects, and address logistical considerations for this work. We will describe potential data types available through libraries that can be leveraged for mutually beneficial projects, providing students with real-world data science applications and the library with an opportunity to pursue data-driven service and tool development.
(Virtual) - Moving with the times: multilingual thesaurus management in a changing environment
Sharon Bolton (UK Data Service, CESSDA)
The CESSDA European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus with extensive coverage of the social sciences. During 2020/21, the thesaurus moved to new internal thesaurus management and external browser platforms (VocBench and Skosmos). This presentation will cover how international CESSDA Service Provider partners collaborated to both manage the challenges of moving to the new systems and embrace the new and improved opportunities the move has brought, including linked open data, mapping with other high-profile thesauri, increasing the FAIRness of metadata, and increased international visibility for ELSST.
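ELSST is published as SKOS linked open data; as a self-contained illustration of the multilingual SKOS model underlying thesauri like ELSST (the concept and labels below are invented examples, not ELSST content), a short rdflib sketch:

```python
# Self-contained sketch of multilingual SKOS, the data model behind
# thesauri such as ELSST; the concept shown is an invented example.
# Requires: pip install rdflib
from rdflib import Graph
from rdflib.namespace import SKOS

TURTLE = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .

ex:employment a skos:Concept ;
    skos:prefLabel "EMPLOYMENT"@en, "EMPLOI"@fr, "BESCHÄFTIGUNG"@de ;
    skos:broader ex:labour .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")

# List each concept's preferred label per language.
for concept, label in g.subject_objects(SKOS.prefLabel):
    print(concept, f"[{label.language}]", label)
```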
Partnership and Collaboration for Better Research Data Management in EfD Network
Samuel Zewdie (Ethiopian Development Research Institute)
The Environment for Development Initiative (EfD) is an environmental economics program focusing on research, policy interaction, and academic programs in fifteen different countries. With regard to research data, it has the goal of establishing a data management (DM) culture that enhances the quality, availability and effective use of research data within the network, so that EfD research is replicable, verifiable and transparent. The FAIR data principles guide the development of EfD's DM culture, while partnership and collaboration with relevant institutions are the strategies chosen to achieve the DM goals. After analyzing the strengths and weaknesses of certain institutions, the EfD initiative has partnered with the Swedish National Data Service (SND), located at the University of Gothenburg, and has established a Data Access Unit (DAU) at the EfD Ethiopia Center. The DAU provides DM technical assistance to all centers and researchers in the network, while SND provides the platform for data sharing. Under this model, researchers across the different EfD centers are required to prepare a good description of the content of their data and metadata and to make submissions to the SND repository system with the assistance of experts from the DAU. Partnering with DM experts is also beneficial for EfD in increasing the visibility of datasets and in learning from best practices in data management, data sharing and data re-use. This is especially important in the case of local repositories in other countries where EfD centers are located. The main challenges here are researchers' unfavourable attitudes towards data sharing, limited knowledge of better data curation tools, language barriers, and unstandardized data collection procedures in the centers. To tackle these challenges, EfD is introducing a data policy, promoting the use of DM plans for all EfD projects, and attempting to standardize the data collection process throughout the EfD centers.
(Virtual) - University of Alberta Dataverse: A Journey from Standalone Application to Hosted Community Platform
Guanwen Zhang (University of Alberta Library)
John Huck (University of Alberta Library)
James Doiron (University of Alberta Library)
Leah Vanderjagt (University of Alberta Library)
Between 2014 and 2021, the University of Alberta Library (UAL) operated a standalone installation of the Dataverse open-source data repository software, during which time it played an essential role in helping researchers at the University of Alberta store, manage, share, and disseminate their research data. The Dataverse platform is a well-known tool for publishing data that is findable, accessible, interoperable, and reusable (FAIR). Over time, it became more difficult to sustain this essential service, given the high maintenance costs associated with the frequently updated Dataverse application on the one hand, and a limited pool of in-house IT personnel with the specialized knowledge to perform that maintenance on the other. At the same time, discontinuing the Dataverse service was not an option, because of ongoing demand from researchers for the service. With these challenges in mind, when an opportunity presented itself to join a national Dataverse service, UAL made the decision to move the collection and service from the standalone application to Scholars Portal Dataverse, which hosts the institutional Dataverse collections of more than 50 Canadian post-secondary institutions. The collaboration allows participating institutions to leverage shared computing infrastructure and resources, tap into a pool of dedicated expertise, stay at the forefront of cutting-edge technologies, increase the exposure of research data, and, most importantly, overcome the challenge of providing sustainable services cost-efficiently. For UAL, the process of migrating the repository, beginning with initial conversations, then moving through analysis and deliberation, and finally ending with the planning and execution of the migration, took approximately three years and was completed in 2021. Many valuable lessons and insights were learned along the way. This presentation tells the story of that journey.
From me to we - developing a practice of collaborative metadata management using the Aristotle Metadata Registry
Samuel Spencer (Aristotle Metadata)
Lauren Eickhorst (Aristotle Metadata)
Data management is seeing a resurgence in popularity as organisations become overwhelmed by the deluge of data from numerous sources. Organisations are now looking to data and metadata management experts to assist with the stewardship of data that comes from a range of sources and is managed by many staff, who are now often working remotely. The common trend shows data experts shifting from managers to leaders, helping business and IT areas manage their own data using common organisational concepts. During this workshop we look at ways that traditional metadata standards such as ISO 11179 and the W3C DCAT data catalog vocabulary can be used to build a common framework for data management and data governance. We also look at tools and techniques for shifting from a “me” to a “we” mindset by encouraging collaboration and data discovery, to get all users, from traditional data managers and IT staff to policy researchers and executives, engaged in the data management process. This is supported by practical exercises in the Aristotle Metadata Registry, based on real-world examples from government, academic and private sector organisations, showing how ISO 11179 recommendations can be put into practice in a user-friendly way to improve stakeholder buy-in. In this workshop you will learn how to: * Understand the ISO 11179 metadata framework * Understand the DCAT data catalog vocabulary * Explain how metadata management assists with data strategy * Understand how data lineage improves understanding of data * Explain how collaborative tools assist with metadata management * Demonstrate understanding of metadata development using the Aristotle Metadata Registry * Demonstrate understanding of data governance using ISO 11179 registration best practices
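To make the DCAT side concrete: below is a minimal sketch of a DCAT dataset description, built with the Python rdflib library. The dataset, URLs and keywords are invented for illustration and are not taken from the workshop materials.

```python
# Minimal DCAT dataset description using rdflib (pip install rdflib).
# All names and URLs below are illustrative, not real records.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

ds = URIRef("https://example.org/dataset/household-survey-2021")
dist = URIRef("https://example.org/dataset/household-survey-2021/csv")

# The dataset itself: a dcat:Dataset with descriptive metadata.
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Household Survey 2021", lang="en")))
g.add((ds, DCAT.keyword, Literal("households")))
g.add((ds, DCAT.keyword, Literal("income")))
g.add((ds, DCAT.distribution, dist))

# One concrete distribution (a downloadable CSV file).
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("https://example.org/files/hs2021.csv")))

print(g.serialize(format="turtle"))
```

Serialised as Turtle, such a record can be harvested and understood by any DCAT-aware catalogue, which is precisely what makes it useful as a common framework across organisations.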
Panel 2: Research Data Literacy and Socio-economic Development: Sub-Saharan African Perspectives
Research Data Literacy and Socio-economic Development: Sub-Saharan African Perspectives
Robert Buwule (Kyambogo University)
Adetoun Oyelude (University of Ibadan)
Lucas Hertzog (University of Cape Town)
Peter Ngetich (Parliament of Kenya)
The African Union Agenda 2063 is the continent’s blueprint and master plan for transforming Africa into the global economic powerhouse of the future. It is Africa’s strategic framework for attaining inclusive and sustainable development. Additionally, it is a concrete manifestation of the Pan-African drive for unity, self-determination, freedom, progress and collective prosperity. Science, Technology and Innovation (ST&I) is expected to play a pivotal role in facilitating the realisation of many pillars of this development agenda. This calls for greater investment in Research and Development (R&D) to generate valuable knowledge for socio-economic transformation. Indeed, the African Union has urged its members to allocate at least 2% of their gross domestic product to R&D. The knowledge generated through R&D can only be used optimally if it can be mobilised and applied effectively. This requires effective research data literacy among researchers and scholars to ensure that data generated through research is organised, stored, disseminated, and applied to support socio-economic development. Currently, data literacy levels among scholars and researchers in Sub-Saharan Africa are unknown. It is against this background that the IASSIST Africa Chapter proposes this panel discussion to explore the current status of research data literacy among researchers and scholars in Sub-Saharan Africa and how to improve it as a means of strengthening their capacity to contribute to the realisation of the African Union’s Agenda 2063.
Documenting a social network survey: lessons learned and ongoing challenges
Laetitia Bideau (Sciences Po)
Valentin Brunel (Sciences Po)
Social network surveys constitute a growing field of social science research. In these original surveys, the object of research is not the individual, but rather the set of relations he or she has within a social field. Due to their relative complexity, social network surveys are seldom documented and disseminated. Metadata norms such as the Data Documentation Initiative Codebook (DDI-C) do not adapt easily to this data, which can abound in matrices and where relations, more than individuals, account for most of the researchers’ interests. The Center for Socio-political Data (CDSP) has been documenting network surveys for some years. In this presentation, we would like to describe our strategies for documenting these original surveys and thus facilitate the task of other data managers. The CDSP is now documenting the Caen Panel, a very thorough survey that has run for twenty years. As researchers investigated how the social relations of a hundred young adults from the Caen (Normandy) region evolved through time, they created numerous datasets based on interviews. The CDSP’s approach to documenting this survey will be explained in three parts. First, we’ll comment on the survey’s specificities: research that was still ongoing when documentation work started, the longitudinal aspect of social relations, and the fact that research was carried out in multiple places simultaneously, which means a multiplicity of actors taking part in data collection. Then, we’ll focus on the databases we chose to include in the disseminated survey package and the way we connected them: an individual characteristics database, a relations database - which is typical of social network surveys - and finally a “calendar” database situating all events pertaining to social relations in time. To conclude our presentation, we would like to comment on the relative inadequacy of most metadata languages for documenting these surveys.
(Virtual) - The Journal Editors Discussion Interface (JEDI): Building and Fostering Community
Priya Silverstein (Syracuse University)
Colin Elman (Syracuse University)
In March 2021, a number of Data-PASS repositories launched the Journal Editors Discussion Interface (JEDI) [https://dpjedi.org], a new virtual community for social science journal editors to ask and answer questions, share information and expertise, and build a fund of collective knowledge. Given the many demands on editors’ time – and given that editors often face similar challenges – there is great value in their interacting with each other about important issues and pooling their collective experience, sharing lessons, examples, insights, and solutions. The benefits are further multiplied where experts on relevant topics (“Scholarly Knowledge Builders”, e.g. data management personnel, open science advocates) are included in the conversation. JEDI generates that interaction and those benefits. A year after launching, JEDI has ~300 members; over 250 of these are journal editors, and the rest are Scholarly Knowledge Builders. In addition, the JEDI resource page (https://dpjedi.org/resources) lists over 160 resources contributed by more than 30 members. Conversations and resources span topics including (but not limited to) open science (e.g. open data and code policies), ethics, diversifying social science research, and improving the quality of reviews. Because JEDI's core user base of journal editors generally serve in those roles temporarily and have other primary career-related identities (e.g., professor, researcher), there are unique challenges to building an online community and fostering a shared identity among members. We discuss these challenges, potential solutions, and next steps for JEDI.
(Virtual) - Documenting Data to Improve Trust and Support Use Across Disciplines and Vocations
Robert Downs (CIESIN, Columbia University)
Data documentation allows audiences to determine whether and how to use data. Providing detailed descriptions of data products and services enables potential users to decide on the applicability of the data for an intended use. Furthermore, such data documentation also enables users to determine how the data could be used to meet a particular objective. Practically, creating effective data documentation requires detailed knowledge of the data that is ordinarily held by those who have been involved in the collection or production of the data. By utilizing such in-depth knowledge of a data product, rich documentation can be produced that is beneficial to data repository stakeholders. Offering rich data documentation provides opportunities to serve a diversified audience of users and potential users. The audience for data that includes rich documentation could span disciplinary and vocational boundaries, enabling use of data that is not necessarily dependent on the specialized knowledge associated with a specific discipline or sub-discipline. Providing rich documentation along with data also can improve the trust that the designated community has in a data repository by demonstrating the repository’s commitment to enabling use. The challenge in creating rich data documentation is in obtaining the documentation from data producers who have not been developing or routinely preparing such documentation as part of their data collection practices. Likewise, data producers who do not expect to be recognized or rewarded for such contributions may not be sufficiently motivated to produce rich data documentation. Aspects of rich data documentation are described along with the challenges and benefits of providing such documentation with data.
(Virtual) - Modeling costs of computational reproducibility and data verification in political science
Cheryl Thompson (UNC Odum Institute for Research in Social Science)
Thu-Mai Christian (UNC Odum Institute for Research in Social Science)
In 2015, the American Journal of Political Science (AJPS) adopted a data verification policy requiring authors to submit data, code, and materials for pre-publication computational reproducibility checks of analytic results. In collaboration with the UNC Odum Institute for Research in Social Science (Odum Institute), the verification of quantitative analyses includes a curation review and re-execution of computational steps to ensure transparency and accuracy of reported results. The addition of this verification process is labor intensive and increases time to publication, which raises questions about the costs of computational reproducibility and data verification policies. Since the policy's adoption, the Odum Institute has collected data on each manuscript submission, including package characteristics and deficiencies, computation errors, use of restricted data, time in verification, and number of resubmissions. This unique Verification Tracking Database enables the analysis of the costs and challenges in the AJPS verification process. This talk presents preliminary results of time-to-event models along with regression analyses of factors to determine their impact on the verification process. Findings will help identify opportunities to reduce overall costs, further establishing the AJPS verification policy as a feasible model to be adopted by other journals, data archives, and research institutions. To our knowledge, research modeling the costs of pre-publication verification policies has not been conducted. The presentation will contribute a novel approach to modeling the verification process and identify specific changes in research practice that can support reproducibility and reduce verification time and costs. This talk builds on research presented by Thompson and Christian on verification errors and challenges faced by authors in making their research reproducible at IASSIST 2021.
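For readers unfamiliar with time-to-event modelling, a minimal sketch of how such an analysis can be set up with Python's lifelines package is shown below. The variables are synthetic stand-ins loosely inspired by the fields described above; this is not the authors' data or code.

```python
# Illustrative time-to-event model on synthetic data
# (pip install lifelines pandas). Variable names are hypothetical
# stand-ins for verification-tracking fields, and the sample is a
# tiny toy; a real analysis needs far more observations.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "days_to_verified": [12, 45, 30, 90, 21, 60, 75, 15],  # time in verification
    "verified":         [1,  1,  1,  0,  1,  1,  0,  1],   # 1 = completed, 0 = censored
    "n_resubmissions":  [0,  2,  1,  3,  0,  2,  3,  1],
    "restricted_data":  [0,  0,  1,  1,  0,  1,  1,  0],   # 1 = uses restricted data
})

# Cox proportional hazards: how do covariates speed or slow verification?
cph = CoxPHFitter()
cph.fit(df, duration_col="days_to_verified", event_col="verified")
cph.print_summary()  # hazard ratios per covariate
```

A hazard ratio below one for a covariate such as the resubmission count would indicate that packages with that characteristic take longer to clear verification, which is the kind of cost driver the talk sets out to quantify.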
Panel 3: Training as grounds for collaboration across disciplinary boundaries
Training as grounds for collaboration across disciplinary boundaries
Irena Vipavc Brvar (Social Science Data Archives, University of Ljubljana (ADP))
Ellen Leenarts (Data Archiving and Network Service (DANS))
Iryna Kuchma (Electronic Information for Libraries (EIFL))
Maja Dolinar (University of Ljubljana (ADP))
Sonja Bezjak (Social Science Data Archives, University of Ljubljana (ADP))
Shanmugasundaram Venkataraman (OpenAIRE)
Training is one of the main pillars of the Consortium of European Social Science Data Archives (CESSDA). CESSDA’s service providers offer training for various stakeholders (data users, data producers, repositories, data stewards). However, in the evolving landscape of data services and infrastructures, cooperation with and engagement in different training communities is crucial in order to offer different approaches, cover user needs, and provide quality FAIR training materials as well as in-person training. This panel will present social science training communities and other initiatives, emphasizing the role of training in community-building projects that provide general as well as disciplinary/topic-specific infrastructures and services. After the presentations, panelists will discuss the intersection of the developments and challenges they meet. SSH Training Community and Platform for Training Materials (Leenarts, Braukmann): a platform for training materials for the SSH community was built in the SSHOC project; its use among trainers, its sustainability and its role in the further development of the EOSC will be discussed. Community of Practice in Open Science Training (Kuchma, Clare): an interdisciplinary community, with over 100 members from 20 countries, sharing experiences and practice in training as well as working on joint projects, such as FAIRification of training materials. Community of Practice in Governance and Management of RIs (Dolinar, Vipavc): the community gathers current and future managers to discuss relevant issues and shares experiences about management and operational challenges of RIs and CFs. Data Community for Cohort Survey (Bezjak, Vipavc): training for the data community of the COORDINATE project will support community-building and facilitate improved access to longitudinal survey data on child wellbeing. EOSC Knowledge Hub (Venkataraman): the EOSC Future project was established to consolidate the previous work that has built much of EOSC’s infrastructure; a training task will provide training, but also a technical platform for all EOSC-related training elements (access to materials and trainers).
Can we still archive data? A comparative case study of social science data archives under the GDPR
Allison R. B. Tyler (University of Michigan School of Information)
The European Union’s General Data Protection Regulation (GDPR) was intended, among other goals, to standardize data protection practices for personal data in order to streamline the sharing of data about European residents within and between member states while also ensuring that the privacy of data subjects was respected. 25 May 2018 was not just a deadline for for-profit institutions to demonstrate their compliance with the new European data protection requirements. Research institutions, universities, and data archives also had to demonstrate their compliance, both with the GDPR and with any additional requirements from the national GDPR implementation or accompanying laws. How this occurred, when, and to what extent the institutions consider themselves to be compliant still differs across the European Union. In this paper, I present the initial findings of a comparative case study of four European Union-based social science data archives and their experiences in addressing the requirements of the GDPR since its 2016 adoption by the European Parliament. The four archives—GESIS, DANS, the Czech Social Science Data Archive, and the Finnish Social Science Data Archive—approached their GDPR adaptations based on their own histories, services, and the national interpretations of the GDPR. While in many ways there has been data protection standardization across these four archives, in other ways there are still differences. Based on 40 semi-structured interviews with archive staff, legal advisors, and stakeholders, as well as analysis of archive documentation and policies, this presentation will discuss the changes in archive operations and practices and the challenges the archives faced. Finally, I will identify considerations for archives in non-European Union countries that are implementing similarly strong data protection laws inspired by the GDPR.
“It's hard to create something people use”: Reflections on creating and sustaining a data lifecycle management toolkit to support diversity scholarship
Rachel Woodbrook (University of Michigan Library)
Karen Downing (University of Michigan Library)
Emma De Vera (The Friends of the Saint Paul Public Library / University of Michigan)
Elyse Thulin (University of Michigan)
Laura Sanchez-Parkinson (National Center for Institutional Diversity (NCID), University of Michigan)
There are increasing demands on researchers to continue refining their data practices, including requirements for data sharing coming from federal and other funders. However, support to appropriately manage and assess data (e.g., for sharing) is nonstandard and inconsistent, often relying on the resources of specific institutions, mentors, or individual experience. This is likely to have a disproportionate impact on scholars working with data on sensitive populations or data that otherwise requires extra care, as well as on scholars from underrepresented backgrounds, without well-resourced institutional affiliations, or who otherwise face structural barriers in their work. While these issues demand larger shifts in the research infrastructure, this presentation will discuss the creation of an openly-available toolkit of data lifecycle management resources to support diversity scholarship (scholarship that advances understanding of identity, difference, culture, representation, power, oppression, and inequality). There has been heightened focus and more tools created in this area in the past few years, but there are still not enough easily accessible resources that explicitly integrate DEIA concerns into the full spectrum of data management for research. The toolkit creation process included original research conducted by librarians and students in collaboration with a campus organization for diversity scholars to identify needs and gaps in support, as well as an environmental scan of existing resources, which were evaluated and selected against criteria we set. The end result is a curated list of resources relevant to diversity, equity, inclusion, and accessibility considerations for data management in multiple disciplines. The presentation will contextualize the toolkit based on the challenges scholars identified, describe the role of the partnership between the library and the diversity scholar organization in conducting research and connecting the toolkit to the scholars, and outline plans for sustainability and for responding to scholar engagement in future revisions of the toolkit.
(Virtual) - Managing data access in CESSDA I: the Data Catalogue and Beyond
Taina Jääskeläinen (Finnish Social Science Data Archive (FSD))
Darren Bell (UK Data Service)
Sharon Bolton (UK Data Service)
The catalogue records of CESSDA’s national Service Provider organisations are harvested for the CESSDA Data Catalogue. Each organisation has its own data access classification. The classifications differ due to differences in national legislation and the types of data deposit agreements. User feedback from the CESSDA Data Catalogue suggests that users would like to be able to narrow down their searches to find open access data. CESSDA plans to manage this by creating a common access vocabulary to which Service Providers map their own categories. To facilitate this, a vocabulary element for access has been suggested for DDI standards. Current challenges and developments in the wider landscape also provide background and context to the harmonization of data access.
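As a concrete illustration of the mapping idea, harmonising provider-specific access categories can be thought of as a simple lookup. The category labels below are invented for illustration and are not CESSDA's actual vocabulary or any provider's real classification.

```python
# Hypothetical mapping of provider-specific access categories onto a
# common access vocabulary; none of these labels are CESSDA's real terms.
COMMON_ACCESS = {"open", "restricted", "embargoed"}

PROVIDER_MAPPINGS = {
    "ProviderA": {"A - openly available": "open",
                  "B - by permission": "restricted"},
    "ProviderB": {"Safeguarded": "restricted",
                  "Open": "open"},
}

def harmonize(provider: str, local_category: str) -> str:
    """Translate a provider's local access category into the common vocabulary."""
    common = PROVIDER_MAPPINGS[provider][local_category]
    assert common in COMMON_ACCESS  # guard against unmapped or mistyped terms
    return common

print(harmonize("ProviderB", "Safeguarded"))  # -> "restricted"
```

With each provider maintaining its own mapping table, the catalogue can offer a single "open access only" search filter without any provider having to change its national classification.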
(Virtual) - Managing data access in CESSDA II: the Data Access Policy
Sharon Bolton (UK Data Service)
In 2016, CESSDA published the first edition of its Data Access Policy, which aims to provide a set of principles for CESSDA Service Providers (SPs) covering common standards for the provision of data access, based on the CESSDA Statutes. Since then, new developments have arisen in the social science data community concerning the needs and requirements for data access. These include external initiatives that CESSDA needs to comply with, such as the FAIR principles, the General Data Protection Regulation (GDPR) and the EU’s Open Science Policy, as well as CESSDA’s own developments, such as the CESSDA PID Policy. Also, new SPs have joined CESSDA since the last version was published. During 2021/22, the Data Access Policy was revised to take all of these developments into account. This presentation covers the new elements introduced and considers how such a policy can work across a wide range of different national SP organisations.
(Virtual) - The Data Bootcamp as a Platform For Data Literacy Education: Reflections from the University of Colorado-Boulder
Aditya Ranganath (University of Colorado Boulder Libraries)
Jordan Wrigley (University of Colorado Boulder Libraries)
An important pedagogical development in contemporary data literacy education is the use of intensive multi-day “data bootcamps” to teach learners from diverse backgrounds foundational data competencies that they can immediately apply in their work and research. Academic libraries and digital scholarship centers are a part of this trend, experimenting in particular with the bootcamp format as a way to extend traditional short-form library teaching on data-related topics. In the summer of 2021, the Center for Research Data and Digital Scholarship (CRDDS), a formal collaboration between the University Libraries and Research Computing at the University of Colorado-Boulder, held its inaugural Data Bootcamp for graduate students. The Bootcamp was hosted virtually over three days, and attracted over seventy attendees with diverse disciplinary backgrounds and experience levels. The curriculum and lesson plans, which were newly developed for the Bootcamp in collaboration with Bootcamp instructors (most of whom are full-time employees at CRDDS), covered a range of topics across the research lifecycle. These included strategies for data discovery; data analysis and visualization in R and Python; tools for collaboration and reproducible research (such as GitHub, Docker, and institutional repositories); and the critical evaluation of found data and data ethics. After the Bootcamp, participants had the opportunity to certify their knowledge of the material taught in the Bootcamp by earning a micro-credential (i.e. “digital badge”) that was sponsored by the University Libraries, and formally issued by the University’s Office of the Registrar. In this presentation, we describe our experience in conceiving and organizing this Bootcamp, and reflect on its pedagogical and logistical successes and challenges, with a view towards helping other librarians and data professionals organize similar events at their own institutions. Whenever relevant, our discussion is informed by survey data that was collected from Bootcamp participants before, during, and after the event.
Data Literacy Competencies for Librarians: A Scoping Review
Meryl Brodsky (University of Texas - Austin)
Hannah Chapman Tripp (University of Texas - Austin)
Data literacy is a phrase that has been used in the library and information literature for roughly twenty years. Most understand data literacy to mean teaching students or others to be critical users of data. At present, there are at least three major clusters of competencies: those related to information literacy, which chiefly involve finding, evaluating and using secondary data; those related to the research data lifecycle, which span the stages from before data collection or creation through to data storage and licensing; and those related to employment, which have to do with finding, evaluating and manipulating data to make it useful for decision making. Increasingly, liaison librarians, subject specialists, and other librarians are expected to gain awareness of data literacy and to teach elements of it to students and researchers. What do liaison librarians, subject specialists, and other librarians whose focus isn’t data need to know about data literacy? If one were to list data literacy competencies for academic librarians, what would they be? Our research focuses on this question. We will conduct a scoping review of the published and grey literature with the aim of mapping data literacy competencies for librarians. We will present our research to date and ask for feedback/input from attendees. Our presentation will highlight what we’ve found and provide an opportunity to dialogue about the fundamentals.
(Virtual) - Strategies for Inclusion in Data Literacy Programming
Gabriele Hayden (University of Oregon Libraries)
Cameron Mulder (University of Oregon Libraries)
The Data Services Department at the University of Oregon was designed from the beginning to prioritize the needs of under-served students. In this vein, the focus of our statistical consulting service and data literacy workshops has been undergraduates as well as graduate students and faculty. In this talk we discuss some of our strategies for reaching under-served students (students of color, immigrants, and women in male-dominated fields), including: hiring students of color, students from immigrant backgrounds, and non-traditional students to provide peer consulting services and help lead workshops; and encouraging those students to develop workshops directed at their communities of origin. This has meant, for example, offering a workshop sequence teaching R in Chinese (not common at a US university), and hiring a student specifically to liaise with and offer services to a program for underrepresented undergraduate students who intend to pursue careers in research. As time allows, we will also address the limitations of our strategies and invite feedback from audience members about what strategies have worked at their institutions.
Data literacy as a catalyst to improving the publishing trends and patterns of academic staff at Kyambogo University, Uganda
Robert Stalone Buwule (Kyambogo University)
Eliz Nassali State (Kyambogo University)
Edward Mukiibi (Kyambogo University)
A lot of research takes place in universities as part of their mandate of promoting teaching, learning, research and community engagement. Research is mainly conducted by academic staff and students as a contribution to national development. Since the inception of Kyambogo University in March 2003, a lot of research has been carried out. However, it has been difficult to understand the nature of the research produced, and the rate of publication by academic staff since 2003 leaves a lot to be desired. The study employed a mixed methods research approach using a case study research design, which enabled the researchers to collect detailed information, observe lessons and deeply study patterns of the phenomenon. The study adhered to standard ethical procedures, particularly relating to anonymity, confidentiality and privacy. This paper explores initiatives that can be used to improve the literacies of academic staff in dealing with research data and publications. The initiatives explored are: awareness of open data, promoting freedom to use open data, partnering with open educational resources agencies, engaging students in data literacy activities, datafication of single and joint academic staff publications, and a pedagogical reflection on datafication. Identifying the areas where the university can improve academic staff publications through data literacy is key to increasing the research productivity of the university. Integrating data literacy into the publication culture of Kyambogo University will go a long way in addressing the bottlenecks and developing a supportive research infrastructure to deliver quality research and increase its societal impact.
(Virtual) - A recommendation to the SSH community: Take a linguist on board
Jeannine Beeken (University of Essex, UK Data Service)
In this paper we address how Natural Language Processing (NLP) approaches and language technology can contribute to data services in different ways: from providing social science users with new approaches and tools to explore oral and textual data, to enhancing the search, findability and retrieval of data sources. By using linguistic approaches we are able to process data, for example using Automated Speech Recognition (ASR) and named entity recognizers (NER), extract key concepts and terms, and improve search strategies. We provide examples of how computational linguistics contributes to and facilitates the mining and analysis of oral or textual material, for example (transcribed) interviews or oral histories, and show how free open source (OS) tools can be used very easily to gain a quick overview of the key features of a text, which can be further exploited as useful metadata. ASR tools can distinguish spoken natural language from surrounding noise. They also convert spoken language into written language, which can be aligned with the spoken fragments. NERs identify and classify named entities, such as person and organisation names and geospatial terminology. They can simplify anonymisation and assist checks on disclosure and de-identification. Extraction tools detect keywords to assist indexing. For example, the relevant meaning of ‘WAS’ as used in the UKDS data catalogue is ‘Wealth and Assets Survey’, not the verb ‘was’. Concordances (KWiC/Keywords in Context) and correlations in a text corpus can help to detect unexpected patterns, for example between ‘schools’ and ‘knives’. Search, findability and retrieval are optimized by implementing spellcheckers and stopword lists, or by providing autosuggestions from a thesaurus, a list of keywords and their synonyms (e.g. ‘war’, ‘armed conflict’), or a list of abbreviations/acronyms as used in social science research (e.g. CLOSER, GUS). Finally, using a stemmer when searching for ‘tax’ correctly finds studies about ‘tax, taxes, taxation’, but not ‘taxi’.
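As a flavour of how easily some of these techniques can be tried with free OS tools, here is a minimal sketch using spaCy and NLTK (assuming spaCy's small English model and NLTK are installed); it is an illustration, not the UK Data Service's pipeline, and the sample text is invented.

```python
# Minimal NER, KWiC concordance and stemming sketch with free OS tools
# (pip install spacy nltk; python -m spacy download en_core_web_sm).
import spacy
from nltk.stem import PorterStemmer
from nltk.text import Text

sample = ("The Wealth and Assets Survey was run by the Office for National "
          "Statistics. Schools in Essex reported incidents involving knives.")

# Named entity recognition: persons, organisations, places, etc.
nlp = spacy.load("en_core_web_sm")
for ent in nlp(sample).ents:
    print(ent.text, "->", ent.label_)

# Keywords in Context (KWiC): show 'schools' in its surrounding text.
Text(sample.split()).concordance("Schools")

# Stemming: 'tax' and 'taxes' collapse to one stem, 'taxi' stays distinct.
stem = PorterStemmer().stem
print(stem("tax"), stem("taxes"), stem("taxi"))  # -> tax tax taxi
```

Even this small amount of output (entity lists, concordance lines, stems) can already be captured and reused as metadata for search and retrieval.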
Ch-ch-ch-ch-changes… the future of data skills training within the UK Data Service
Vanessa Higgins (UK Data Service)
How might quantitative data skills training within data services change in the future? How can we approach things differently? Who should we be reaching and how do they want to learn? This presentation outlines the UK Data Service's approach to quantitative data skills training, including recent changes and thoughts for the future. We discuss our experiences of new training formats and topics, such as upskilling social scientists in computational social science methods, and how the data and feedback we collect from users (and potential users) of our training is shaping our plans for the future. We present our vision as we seek to equip diverse users with the data skills and knowledge they will need to create relevant research now and in the future.
The Virtual Educational Observatory – making data for research on education and learning in Switzerland visible and usable
Rahel Haymoz (University of Applied Sciences of the Grisons)
VEO, the Virtual Educational Observatory, is a four-year project funded by the Swiss National Science Foundation. It brings together experts from research on education and learning with experts from data science. Jointly they work on making research data more visible, discovering new data sources (such as social media data), and putting existing and new data sources in relation to one another. The project must cover a high number of different dimensions. From a data collection perspective, the sources are surveys, competency tests, administrative data, log files from educational software, social media, etc. The methods and tools developed come from data visualisation, data linkage, data matching, and the harvesting and connecting of metadata. Users of the solution are researchers on education and learning as well as experts on educational monitoring. The scope is thereby not limited to schools: in the spirit of life-long learning, data on education and learning is relevant at every age. Besides technical and methodological challenges, privacy and data security topics play an important role. The overall goal of the project is to provide a platform in the form of a virtual observatory, accessible to researchers throughout Switzerland. For that, VEO positions itself in the field as a meta-database service provider. The talk will introduce the audience to the VEO project and present the current development lines. Besides this overview, a couple of topics will be described and discussed in more detail.
Datenräume for the canton of Grisons (Eastern Switzerland) – Discussing the concept
David Schiller (University of Applied Sciences of the Grisons)
The canton of Grisons in Eastern Switzerland has a population of 200,000. It is the biggest canton with respect to land mass; its highest mountain is Piz Bernina at 4,049 metres, and travelling by train from the capital city Chur to St. Moritz takes about two hours. Besides being the perfect place for your holidays, the canton of Grisons faces the challenges of digitalisation on many different levels, while at the same time digitalisation may also be the solution. To mention only a few examples: citizen participation without traveling for hours, efficient management and marketing of tourism, education for everybody, useful surroundings for start-ups, efficient and modern avalanche warnings, traffic control systems, and emergency rescue systems. The canton of Grisons is a perfect environment in which to develop the concept of Datenräume (data spaces), which may help to solve these challenges. Datenräume are data infrastructures that are customized to serve the public good instead of placing power over data in the hands of some commercial companies. Every person, every company, every administration and every scientific activity produces data. This data needs protection on the one hand but also has the potential to serve the public good. Building a data infrastructure that protects and unfolds data at the same time, where society is responsible for taking care of data, may help to move not only businesses but complete societies into a future where data is really a part of daily life. The challenges are huge and range from technical, business and ethical to legal topics that need to be discussed and solved. This talk will discuss the concept of Datenräume with regard to its potential for serving the common good and helping to move a culture of data into everybody's daily life.
Information Architecture for Secure Trustworthy Digital Repositories: Quality Culture, not Standards Theater
Hervé L'Hours (UK Data Archive)
This presentation covers the emerging expectations of full-lifecycle, interoperable data services; how repositories can guide and align with these expectations; and the critical role of transparency about levels of data curation and preservation to data depositors and users. It also covers different types of data services, their roles and potential interactions; the obligations and performance indicators of CESSDA ERIC membership; and the building of policy frameworks to support best practice, change management and evidence for certification. Repositories and research infrastructures such as CESSDA are critical nodes for serving researchers, as data depositors and data reusers. Repositories have been at the forefront of defining standards, processes and certification schemas for organisations, technologies and digital object management, including Trustworthy Digital Repository requirements such as the CoreTrustSeal. Repositories must manage their internal information to demonstrate compliance with a range of standards, and expose metadata about themselves and the objects in their collections to interoperate with a range of other data services, including object and repository registries. Overall, this presentation will outline the variety of data services that constitute modern data infrastructures and explore issues around their management and what they can learn from each other.
(Virtual) - Data Integrity: A Cornerstone of Rigorous and Reproducible Research
Patricia Condon (University of New Hampshire)
Julie Simpson (University of New Hampshire)
Maria Emanuel (University of New Hampshire)
Data integrity provides a strong foundation for high quality and valuable research outcomes. Intentionally realizing data integrity in the research process is essential due to its critical role in research rigor, reproducibility, replication, and data reuse (the 4 Rs). This is imperative in collaborative interdisciplinary research and collaborative cross-sector research where different norms, procedures, and terminology regarding data exist. Data integrity is closely associated with data management, data quality, and data security. Producing data that are reliable, trustworthy, valid, and secure throughout the research process requires purposefully planning for data integrity and careful consideration of data lifecycle actions like data acquisition, analysis, and preservation. In addition, purposeful planning enables researchers to conduct rigorous research and generate outcomes that are reproducible, replicable, and reusable. To advance this conversation, we developed two conceptual models: a visual representation of the relationship between data management, data quality and data security as components of data integrity, and a schema for implementing these components in practice. We contend that untangling data integrity and its components, developing a standardized way of describing their interplay, and intentionally addressing them in the data lifecycle reduces threats to data integrity. The aim of this project is to unravel the complexity of data integrity in a way that is useful for data producers, providers, users, and educators. Part of that utility is positioning our conceptual models within the larger dialog around research integrity and data literacy and illuminating the role that data integrity and its components play in the 4 Rs. In this paper, we present our conceptual models for use as tools for instruction/training and practical implementation. Using these models, we examine the role of data integrity in rigorous and reproducible research and offer insight into ensuring data integrity throughout the research process.
(Virtual) - What Makes a Trend? Approaches for Presentation of Data over Time
Kathleen Weldon (Cornell University)
The Roper Center for Public Opinion Research’s iPoll database holds over 800,000 questions with complete question wordings and topline/marginal results from polls dating back to 1935. The database, therefore, is an unparalleled resource for researchers looking to track public opinion over time. But tracking trends can be complex, as changes in wording, response categories, survey organizations, or population can affect comparability over time, and tolerance for variation can differ between researchers. The Roper Center has built a new software system that identifies likely trends within the iPoll database, allowing Center staff to build and display semi-customizable results to users. Examples of Roper Trends functionality are currently available in the Health Poll Database and will be available in Roper iPoll in 2022. This presentation will include a demonstration of Trendbuilder, the software powering Roper Trends, and a discussion of the metadata it utilizes, as well as an overview of the development of standards for question inclusion in trends. The presentation will also address how this project resolves the tension between helping users avoid common pitfalls of trend-building and allowing users to make choices about trend criteria.
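Trendbuilder's own logic is not described here; purely as a toy illustration of the underlying problem (and not the Roper Center's actual method), candidate trend questions can be surfaced by scoring the similarity of question wordings, with the threshold encoding one's tolerance for variation:

```python
# Toy illustration: flag pairs of poll questions whose wording is similar
# enough to be trend candidates. Questions and threshold are invented;
# a production system would also compare response categories, populations, etc.
from difflib import SequenceMatcher
from itertools import combinations

questions = [
    "Do you approve or disapprove of the way the president is handling his job?",
    "Do you approve or disapprove of the way the President is handling his job as president?",
    "How worried are you about losing your job in the next year?",
]

THRESHOLD = 0.85  # tolerance for wording variation

for a, b in combinations(questions, 2):
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if ratio >= THRESHOLD:
        print(f"possible trend pair (similarity {ratio:.2f}):\n  {a}\n  {b}")
```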
Thinking critically about sources: a recipe for data literacy
Elizabeth Hill (Western University)
Alexandra Cooper (Queen's University)
Kristi Thompson (Western University)
In our roles as data specialists and librarians within the Canadian academic community, we actively promote data literacy through our library instruction and outreach to courses that have a requirement to locate reputable data sources for analysis. Our collective experience delivering data literacy sessions has led to the development of a shared pedagogical approach to teaching an instructional data session. We used this approach to develop a template for a forthcoming ACRL Cookbook on data literacy. In this presentation, we will introduce our approach to delivering instruction to post-secondary students in the social sciences, health sciences, or related disciplines. Our approach focuses on helping students understand data as an information source by helping them think critically about how and why data are collected and shared. The instructional session focuses on locating, evaluating, and using sources of secondary data as information resources. We encourage students to consider how data are created and used by exploring the following questions: Who collects data? Why do they collect data, and why might they choose to share it or not share it? How is data collected, and how do administrative and survey data differ? By working through these questions, we help students learn why public data may be available on some topics and not others, understand key limitations of official statistics, and think about where information can be found to fill in the gaps. This session not only highlights the value of data literacy as a learning objective but also acknowledges the benefits of early-career practitioners learning from experienced colleagues.
Using, managing and sharing vocabularies: SSH Vocabulary Commons
Mari Kleemola (Tampere University and CESSDA)
Daan Broeder (CLARIN ERIC)
Darren Bell (UKDS)
Matej Ďurčo (OEAW)
Metadata and data interoperability are crucial for searching, finding, using and understanding research data, especially in cross-discipline settings and/or when the volume of data grows. The descriptive information provided therefore needs to be structured so that it can be read and/or acted upon by machines. This is addressed in the FAIR data principles which include three requirements on interoperability, with one stating that (meta)data needs to use vocabularies that follow FAIR principles. This presentation will offer perspectives on both familiar and developing aspects of the role and management of vocabularies, drawing from the experiences in the Social Sciences & Humanities Open Cloud (SSHOC) project and the SSH Vocabulary Commons initiative that is working towards common recommendations for using, managing, operating and sharing vocabularies and vocabulary services. The presentation will explore the challenges, technologies, benefits and developments in the field of SSH vocabularies and offer ideas on how to manage SSH vocabularies and how to incorporate vocabularies in the research data lifecycle, as well as showcase results from the SSHOC project.
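To illustrate what a machine-actionable vocabulary entry can look like, here is a minimal sketch using the SKOS model with the Python rdflib library; the URI and labels are invented, not drawn from an actual SSH vocabulary or SSHOC deliverable.

```python
# Minimal SKOS concept sketch (pip install rdflib); the URI and labels
# are illustrative, not from a real SSH vocabulary.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)

war = EX["war"]
g.add((war, RDF.type, SKOS.Concept))
g.add((war, SKOS.prefLabel, Literal("war", lang="en")))
g.add((war, SKOS.altLabel, Literal("armed conflict", lang="en")))

print(g.serialize(format="turtle"))
```

Publishing terms in this structured form is what lets catalogues resolve synonyms and lets vocabularies themselves be found, versioned and reused in line with the FAIR principles.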
Session E2
(Virtual) - A High Performance Partnership: Data Librarians and Supercomputer Centres
Kelly Schultz (University of Toronto)
Marcel Fortin (University of Toronto)
Kara Handren (University of Toronto)
As datasets have continued to grow exponentially, data libraries are struggling with ways to provide access to them and support their use. The University of Toronto’s Map & Data Library recently explored options for providing access to one such large dataset: Web of Science Raw Data (XML). Researchers across disciplines are interested in using this dataset to explore citation networks and conduct bibliometrics research. University administrators are also very interested in this dataset for their reports and benchmarking. Querying this dataset can result in a subset of millions of records; thus, the challenge is not just in accessing the data, but also in how to work with such a large number of results. To overcome these obstacles, the Map & Data Library has developed a mutually beneficial partnership with our on-campus high-performance computing service, SciNet. This partnership enabled us to develop a new service that provides access to the Web of Science XML through an environment where researchers can effectively query and work with this dataset. This presentation will focus on our experiences with this project: how it came about; how the relationship was developed and navigated; the challenges in building a sustainable service; what our final solution was; what roles the Map & Data Library and SciNet play in this service; and our future plans to continue to expand this fruitful partnership.
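One reason such datasets are hard to work with is that they cannot simply be loaded into memory. Below is a generic sketch of incremental XML processing in Python; the file name, record tag and processing logic are hypothetical placeholders, not the actual Web of Science schema or the Map & Data Library's pipeline.

```python
# Stream a very large XML file record-by-record instead of loading it whole.
# "REC" and the file name are hypothetical placeholders.
import xml.etree.ElementTree as ET

def iter_records(path: str, record_tag: str = "REC"):
    """Yield one record element at a time, freeing memory as we go."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        # endswith() also matches namespaced tags like "{ns}REC"
        if elem.tag.endswith(record_tag):
            yield elem
            elem.clear()  # discard the processed subtree so memory stays flat

count = 0
for rec in iter_records("wos_raw.xml"):
    count += 1  # in practice: extract citations, authors, etc. here
print(count, "records processed")
```

Even with streaming, multi-million-record queries benefit from the parallel storage and compute an HPC centre provides, which is where a partnership like the one described here pays off.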
Raising the metadata bar: technology, culture and resourcing: lessons from documenting UK Longitudinal Studies
Jon Johnson (CLOSER, Social Research Institute, UCL)
Haley Mills (CLOSER, Social Research Institute, UCL)
In 2012 CLOSER was established, and amongst its tasks was to document eight of the UK's longitudinal studies, covering both social science and biomedical domains, to the best available metadata standards. The initial impetus was the variation amongst the studies in the available metadata, the lack of machine-readable provenance information, and the technical, cultural and resource barriers within some of the studies to achieving this. The paper will describe how the provision of technical and logistical solutions, alongside the development of new ways of working, practical demonstration, training and resource allocation, has led to a positive change in the perception of the need for high-quality metadata.
Building and sustaining a curation community: Updates from the Data Curation Network
Sophia Lafferty-Hess (Duke University)
Since its launch in 2016 the Data Curation Network (DCN) has grown into a radically collaborative network of institutions, data repositories, and organizations focused on the ethical sharing of research data. Growing beyond its original scope of shared staffing for data curation, the DCN has become a thriving community and sustainable organization that advocates for data curation and data curators and provides a unique platform for exploration and research. This presentation will highlight efforts underway in the DCN and provide project updates including DCN’s new membership model and plans for ongoing sustainability, special interest groups, research assessing the value of curation, and other initiatives.
(Virtual) - QualidataNet – a federated archiving infrastructure for qualitative research data
Kati Mozygemba (University of Bremen, RDC Qualiservice)
Noemi Betancort-Cabrera (State and University Library Bremen)
Tobias Gebel (German Institute for Economic Research)
Hanna Hedeland (Leibniz Institute for the German language, Archive for Spoken German)
Jan-Ocko Heuer (University of Bremen, RDC Qualiservice)
Dilek İkiz-Akıncı (German Center for Higher Education and Science Studies, RDC for Higher Education and Science Studies)
Susanne Klauke (Leibniz Institute for Research and Information in Education)
Alexia Meyermann (Leibniz Institute for Research and Information in Education)
Silke Reineke (Leibniz Institute for the German language, Archive for Spoken German)
Dirk Weisbrod (Leibniz Institute for Research and Information in Education)
Betina Hollstein (University of Bremen, RDC Qualiservice)
Background: One reason why archiving and re-use of data in qualitative social research is still the exception is the lack of a sustainable and interlinked infrastructure that develops services in close cooperation with the specific research communities. So far, only a few Research Data Centers (RDCs) have established procedures and tools to professionally curate and prepare qualitative data, which often comprise sensitive personal data and are highly context-sensitive. Moreover, the RDC landscape consists of various small, unconnected and highly specialised archives, each usually dealing with the challenges of data curation on its own. At the same time, researchers often struggle to identify an RDC that offers suitable conditions for archiving and disseminating their data, and secondary users have difficulties finding appropriate data for their research purposes. Objective: To overcome this fragmentation and to promote access to and re-use of qualitative data, the Consortium for the Social, Behavioral, Educational and Economic Sciences, as part of the National Research Data Infrastructure in Germany, aims at creating a community-centred federated network for qualitative data: the QualidataNetwork (QualidataNet). We will present how QualidataNet can transform this heterogeneous and fragmented landscape into a coordinated, user- and service-oriented infrastructure by functioning as a single point of entry with a comprehensive overview of the landscape for archiving and sharing research data. This includes supporting researchers in preparing their data and finding a suitable partner for archiving and data sharing, as well as developing an internationally compatible core metadata schema and controlled vocabulary for the various kinds of qualitative data, taking into account the work of the DDI Alliance Qualitative Working Group and further metadata schemata.
Session E3
Responding to the “replication crisis”: requirements and stakes for archives
Marieke Heers (FORS)
Brian Kleiner (FORS)
Alexandra Stam (FORS)
Emilie Morgan de Paula (FORS)
While more and more journals and funders are requiring the sharing of data used in publications, the available infrastructure is lagging. On the one hand, in the absence of providing such services themselves, journals and funders can only point to repositories that fulfill minimum criteria (e.g., FAIR, long-term preservation). On the other hand, researchers may lack satisfactory options for making their replication materials available. While scientific data archives are a logical solution to these problems, it is far from clear that most are ready and able to play host to the materials that would satisfy the requirements of journals and funders. This is largely because data archives have traditionally focused far more on re-use than replication, with accompanying technical, policy, and workflow implications. This paper will address the minimum conditions for archives to support replication, from technical, policy, and workflow perspectives. This includes adaptations that allow for the sharing and discovery of replication materials, such as appropriate metadata, data citations, and persistent identifiers. Archives should also be prepared to handle issues regarding sensitive data used in publications or data with highly restricted access (e.g., administrative data). Once such services are in place, archives need to make this known to journals and funders, and so outreach is key. Archives are well-placed to serve as hosts to replication material during the “reproducibility crisis”, having most of the requisite skills and capacities. In this contribution, we will present a tool and service that FORS has recently developed to allow researchers to share their replication material. We will describe our experience with setting up and implementing this service, as well as the challenges that require further development of the tool.
Working together to ensure the safe use of sensitive and confidential data: exploring a collaborative training approach
Deborah Wiltshire (GESIS Leibniz Institute for the Social Sciences)
Secure data access facilities globally provide research infrastructures that allow the sharing and safe use of confidential data for research. In recent years there has been a shift towards virtual data enclaves or remote desktop systems that offer fewer physical controls. The Secure Data Center at GESIS in Germany is currently developing a new remote desktop system and must consider how best to compensate for the reduction in these physical controls to safeguard the data it makes available. In existing remote desktop systems, these physical controls are often replaced with other safeguards, including researcher training. This training aims to equip researchers with the knowledge required to use sensitive and confidential data safely and ethically. Developing training is resource intensive, so canonical training materials are an economical approach to providing standardised, high-quality materials for researchers. As part of the Social Science and Humanities Open Cloud project, researchers at GESIS have developed canonical training materials, the aim being that professionals at any secure data access facility can use them as a framework on which to build their own training course. As development moves towards remote access connections that allow access across organisational and international borders, commonalities in the training that services offer will allow secure data access facilities across the world to be confident that researchers have received high-quality training regardless of where they trained. Delivering such training courses is also a burden on often tight resources; what scope is there to help secure data facilities manage this through a collaborative approach? Could multiple organisations collaborate to develop and deliver a common training course? Is this feasible? Can organisational or country-level differences be successfully accommodated? This presentation discusses these questions in more detail.
It was the best of times, it was the worst of times: Social Media Data Collection
Michael Beckstrand (University of Minnesota)
As social media research gains more attention and interest across the social and behavioral sciences, there is ever-increasing demand from students and researchers for support in harvesting the plethora of data generated by social media platforms. The corporations behind these platforms continue to shift their platforms’ accessibility to non-business interests, making collecting data from these services an ever-shifting target for social media researchers. This presentation will take stock of current access permissions and limitations across the main social media platforms, while also exploring graphical tools (e.g. Social Feed Manager, TAGS, FacePager) and their R and Python package corollaries for collecting social media data. It draws on experiences supporting both graduate student and faculty research across the social and behavioral sciences, including both qualitative and quantitative modes of inquiry.
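Whatever the platform, programmatic harvesting usually reduces to an authenticated, rate-limited, paginated loop. The sketch below is deliberately generic: the endpoint, parameters and response fields are entirely hypothetical, since real platform APIs differ and change often.

```python
# Generic cursor-paginated REST harvest; every URL, parameter and field
# here is hypothetical, standing in for whatever a real platform offers.
import time
import requests

API_URL = "https://api.example-social.com/v1/posts/search"
TOKEN = "YOUR-ACCESS-TOKEN"  # platforms typically require registered credentials

def harvest(query: str, max_pages: int = 5):
    posts, cursor = [], None
    for _ in range(max_pages):
        params = {"q": query, "limit": 100}
        if cursor:
            params["cursor"] = cursor  # resume where the last page ended
        r = requests.get(API_URL, params=params,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         timeout=30)
        r.raise_for_status()
        page = r.json()
        posts.extend(page.get("data", []))
        cursor = page.get("next_cursor")
        if not cursor:  # no more pages
            break
        time.sleep(1)  # respect rate limits
    return posts

print(len(harvest("data literacy")), "posts collected")
```

The graphical tools named above essentially wrap loops like this one, which is why shifting platform permissions affect the whole tool ecosystem at once.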
Institutional research data policy: feedback on development, strategic and operational issues
Sophie Forcadell (Sciences Po)
Cyril Heude (Sciences Po)
The first part of the presentation will offer feedback on the design of a research data policy at the scale of an institution within a national and international context. How can a bottom-up approach, taking into account the whole research community of an institution, be articulated with the framework driven on the one hand by a national policy and network and on the other hand by funders? What is the concrete and actionable objective of an institutional data policy, and how can its evaluation and evolution be anticipated? What questions does the governance of this type of policy raise, and what are its blind spots? The second part of the presentation deals with the role of a data policy in structuring the transversal coordination of the partners involved in supporting the management and dissemination of research data. A focus will be placed on the visibility of the services offered and the associated workflows at all stages of the data life cycle, and also on the dialogue between the governance structure and the operational body for inter-service and inter-laboratory cooperation. Another aspect introduced will be the organisation of support, from first-level information through to the transfer to experts, in order to have a modular approach to working with research teams.
Panel 4: Economic and Social Research Council UK – DigitalFootprints: data services for the past, present and future
Economic and Social Research Council UK – DigitalFootprints: data services for the past, present and future
Bruce Jackson (Economic and Social Research Council)
People’s interactions with the world are increasingly digital, creating digital-footprints-data (DFD) including internet, geo-spatial, commercial and sensor data. DFD are diverse, powerful, large-scale and complex. Successfully obtaining and leveraging DFD is in the vanguard of modern social science. However, DFD aren’t used to their full potential: researchers are limited by insufficient data access and infrastructure, underdeveloped methodology and opaque ethical procedures. These gaps, coupled with a lack of coordination and leadership, severely curtail the UK’s ability to extract key value. ESRC has years of experience exploring and meeting the challenges of access to and use of largely closed, proprietary data not collected for the purposes of research, challenges which closely align with the conference themes. ESRC intends to build on this experience to deliver a transformational shift in the creation, access and use of DFD. DigitalFootprints will provide the required leadership, skills, coordination and data infrastructure to leverage and magnify the strengths of DFD and the social sciences to address pressing research and policy questions. To test our approach, in 2022 ESRC will launch a Prototype consisting of: • a Coordinating Hub • an Accelerator Programme • a Programme of Data Services, including the Consumer Data Research Centre and Urban Big Data Centre. Following extensive testing, ESRC plans (subject to funding) to ramp up to a major step-change investment, addressing critical gaps in the landscape and embedding itself by: • working with bodies such as ADRUK, HDRUK and ONS to design and deliver a unified UK network of Digital Research Infrastructure • leading cross-cutting and coordinated programmes, building on existing capability, partnerships, and networks. This panel will: • present ESRC’s plans for DigitalFootprints • explore and discuss the challenges of delivering future DFD services. Speakers: • Bruce Jackson, ESRC • Senior Strategic DigitalFootprints Advisor • Dr Amy Orben, Cambridge
Panel 5: The CESSDA Data Archives joint efforts to support journals in data sharing and reproducibility
The CESSDA Data Archives joint efforts to support journals in data sharing and reproducibility
Janez Štebe (ADP/UL - Slovenian Social Science Data Archives, University of Ljubljana)
Sonja Bezjak (ADP/UL - Slovenian Social Science Data Archives, University of Ljubljana)
Serafeim Alvanides (GESIS)
CESSDA ERIC is a landmark European data infrastructure owned by the member states. Its data service provision is distributed among national service providers (SPs). The presentations on the proposed panel will show results of recent cooperation between selected CESSDA members’ data services and scientific journals. The dedicated CESSDA Journals Outreach 2021-22 project started in 2020. The first year of the project resulted in a national and international landscape analysis covering an overview of journals’ requirements and needs, and an assessment of the corresponding CESSDA SPs’ capacities to support specific needs. The project was prolonged in 2021 and 2022 with the journals’ outreach and support activities. The activities in 2022 concentrate on national pilot studies in supporting journals, with the aim of collecting and sharing the experiences among project partners and demonstrating possible future developments to the wider audience. The presentations will cover the whole range of support to journals, starting with the challenge of equipping them to articulate realistic and appropriate data sharing policies. The promotion of data sharing policies goes in parallel with the support offered in implementing them using the dedicated data services. Thus, barriers and hesitation among journal editorial teams can be mitigated through the partnerships offered in providing data sharing facilities and advice, adapted to journals’ needs. Finally, new developments among the CESSDA national SPs in enhancing the reproducibility and replication of published results will be presented. The panel consists of the following presenters of the national journals and data services pilot cooperation: Serafeim Alvanides, Reiner Mauer (GESIS, Germany), Sonja Bezjak, Janez Štebe (ADP/UL, Slovenia), Marijana Glavica, Irena Kranjec (CROSSDA/FFZG, Croatia), Brian Kleiner (FORS, Switzerland), Dimitra Kondyli, Nicolas Klironomos, Apostolos Linardis (EKKE, Greece) and Helena Laaksonen (FSD, Finland).
Data domain specialists and flagship projects: challenges and successes in crafting partner contributions to the Swedish National Data Service consortium
Gustav Nilsonne (Swedish National Data Service)
The Swedish National Data Service (SND) is a national infrastructure for data sharing, operated by a consortium formed by nine of the largest higher education institutions in Sweden. SND is funded by the Swedish Research Council as well as by in-kind contributions from the consortium members. This talk will describe the development, evaluation, and refinement of the in-kind contributions from consortium members in terms of data domain specialists and flagship projects. Data domain specialists were conceptualized as researchers with strong domain-specific expertise as well as expertise in data management and sharing, and were appointed from 2017 onwards. The data domain specialists represent a strong knowledge base and have performed important work in areas including outreach to scientific communities and policy development. However, evaluation showed considerable diversity in how the role of the data domain specialists is perceived, and challenges in a governance model where domain specialists are employed by each consortium member while working towards common aims in the consortium. More recently, flagship projects were introduced as an alternative in-kind contribution. The flagship projects consist of targeted efforts to improve data management and FAIR sharing, with specified timelines and deliverables, and with transferable components that can be disseminated for national and/or international value, also introducing the possibility of joint projects between consortium partners. Currently the flagship projects are in a piloting phase. Experiences and lessons learned will be discussed.
“RDM Compas” – An online platform to foster data management skills for data curators in the social sciences
Tatiana Kvetnaya (Leibniz Institute for Psychology (ZPID))
With the ongoing digital transformation and ‘scientification’ of data infrastructure tasks, the demand for research data management (RDM) skills and scientific data literacy is rising. This is especially relevant for infrastructural service facilities like research data centres (RDCs), which are crucial for curating, archiving and providing access to research data. However, according to a survey conducted among RDCs, many applicants for positions in RDCs are lacking RDC-specific competencies, such as knowledge about ethical and legal aspects of RDM, and data documentation skills (RatSWD, 2018). To meet the increasing demand for training and individual knowledge acquisition, in the context of the German Consortium for the Social, Behavioural, Educational and Economic Sciences, our working group is developing the Research Data Management Competence Base (RDM Compas) – a training and information platform to foster RDM skills among data curators, RDC staff and early career researchers in the social sciences. In this central, nation-wide platform, we intend to collect training materials from RDCs and make them available to provide a collaborative online programme following the data curation lifecycle (Higgins, 2008). In this presentation, we provide insights into the development of the RDM Compas, a beta-version of which is expected to be available in April 2022. We will discuss several challenges to the development of such training platforms that we believe are particularly relevant to ensuring their success: (1) How to follow a collaborative approach in integrating existing training materials from RDCs in order to create synergy effects between data communities, (2) how to integrate materials into a comprehensive competence framework for creating a meaningful and applicable learning experience, and finally, (3) how to develop a platform guided by FAIR (findable, accessible, interoperable and reusable) data principles regarding the showcased training materials, thereby promoting data sustainability by design.
(Virtual) - Creating a national institutional framework for research data, including sensitive data
Roxanne Missingham (Australian National University)
Nicola Burton (Australian Research Data Commons)
Effective research data management is an important consideration for all of Australia’s universities. The evolution to data by design builds on a history of skills assessment, national capability development and technical work funded through the Commonwealth government’s NCRIS program and its predecessors. In 2021, a program was initiated to develop a national Institutional Research Data Management Framework; 24 of the 39 Australian universities are participating. The framework will be developed, tested and validated by participating institutions and will be applicable across all of Australia’s universities. The outcome will address challenges including the growing burden of resourcing, the management of access and sensitivity, and the need to make decisions about longer-term retention and disposal. The project aims to uplift the RDM capability of all universities by collaboratively developing a nationally agreed institutional RDM framework. It will inform the design of policy, procedures, infrastructure and services and improve the coordination of RDM within and between universities. By sharing challenges, experiences, advice, new directions and opportunities for collaboration, and testing these shared outputs locally, the participating universities will produce a resource that will advance RDM for all Australian universities. The Institutional Underpinnings program has worked through the development phase and the establishment of key areas of work, and the editorial panel is currently finalising the guidance documents to be tested in the next tranche of activities. The paper will discuss the approach taken to the co-design of the Framework, the essential elements of RDM identified by the program participants, participant experiences of the process, and the scope of testing in the next stage. It will reflect on the complexities of applying data by design at a national scale, and on ideation that takes into account state, territory and national legislation and funder policies.
Metadata quality and production: building a sustainable metadata ecosystem
Darren Bell (CESSDA)
Metadata is the oil that lubricates the data production and dissemination engine. Within and across CESSDA, there are many moving parts that contribute to delivering high-quality and interoperable metadata. This presentation gives insight into metadata pre-production and quality controls designed and implemented by CESSDA, and will draw on the experience of the CESSDA Metadata Office in describing how clean, high-quality metadata is generated from the outset using DDI Profiles and the CESSDA Metadata Model. This has been not only a technical journey but a pan-European collaboration between end-users and metadata experts who communicate in English but are often still speaking different languages. We describe the context of the CESSDA Metadata Model, its ultimate implementation in the CDC (CESSDA Data Catalogue) and EQB (European Question Bank), and its relationship with DDI Profiles. Lastly, we outline future developments and how the CESSDA Metadata Validator can help repositories QA their published metadata.
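For flavour, the sketch below is a minimal stand-in for profile-based metadata QA, not the CESSDA Metadata Validator itself: it treats a "profile" as a flat list of required element paths and assumes a simplified, un-namespaced DDI-Codebook record. The paths are illustrative only.

    import xml.etree.ElementTree as ET

    # Illustrative subset of a profile: element paths a conforming record
    # must populate (real DDI Profiles are richer XML documents).
    REQUIRED_PATHS = [
        "./stdyDscr/citation/titlStmt/titl",
        "./stdyDscr/citation/distStmt/distrbtr",
        "./stdyDscr/stdyInfo/abstract",
    ]

    def check_record(path):
        """Report which required elements are missing or empty."""
        root = ET.parse(path).getroot()
        problems = []
        for xpath in REQUIRED_PATHS:
            node = root.find(xpath)
            if node is None or not (node.text or "").strip():
                problems.append(xpath)
        return problems

    if __name__ == "__main__":
        for issue in check_record("codebook.xml"):
            print("missing or empty:", issue)

Running such checks before publication, rather than after harvesting, is what keeps downstream catalogues like the CDC clean.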
Session F2
(Virtual) - Developing data management support partnerships and collaborations at the University of Florida
Plato Smith (University of Florida)
The developing data culture at the University of Florida (UF) includes, but is not limited to, current partnerships and collaborations involving UF Research, UF Information Technology Research Computing, UF Clinical Research, UF Clinical and Translational Science Information Technology, and the Libraries. Currently, major stakeholders are developing an institutional data management policy, an effort started in 2019. Researchers continually struggle with (1) developing data management plans (DMPs), (2) finding a discipline-specific repository, and (3) making their research data findable, accessible, interoperable, and reusable (FAIR) in compliance with funding agencies’ data management and sharing requirements. The development of a sustainable data culture requires collaboration among major stakeholders to develop a standardized approach to data management, with an institutional data management policy “to ease compliance and improve management of and access to the university’s intellectual assets” within and across academic units’ communities of practice (i.e., Medicine, Research). “Communities of practice are groups of people who share a concern, a set of problems, or a passion about a practice and who deepen their knowledge and expertise by interacting on an ongoing basis” (Macklin 2007; Wenger, 2000). This presentation will (1) articulate the work leading to the developing institutional data management policy at UF, (2) highlight the partnership with the UF Informatics Institute to conduct the online Fall 2021 Data Management workshop series (six modules; select modules draw on data management surveys at UF and three real data management support use cases), (3) discuss a data management survey for USDA-NIFA-funded SmartPath researchers, (4) discuss the development of a graduate data management course, and (5) highlight internal (i.e., UF Research and UF Information Technology Research Computing) and external collaborations (i.e., Figshare and Project TIER) that led to the successful development of an approved but unfunded United States Department of Health and Human Services (HHS) grant proposal for FY 2021.
(Virtual) - Sharing the load: collaboration across the University of California system
Stephanie Labou (University of California San Diego)
Amy Work (University of California San Diego)
In past years, each of the 10 campuses in the University of California system worked individually to create programming and activities for GIS Day/Geography Awareness Week (November) and Love Data Week (February). Most of the time, the coordination of these events was placed on the shoulders of a single librarian. When COVID-19 forced the University of California system to go remote, individuals from UC campuses created a grassroots effort to develop and offer two system-wide events to reach the ~500,000 students, academics, and staff in the UC system: UC GIS Week and UC Love Data Week. The events were hugely successful, and all involved in the planning agree that we want to keep collaborating regardless of what happens with in-person events in the future. As we plan the third annual offering of these events, the question of sustainability becomes ever more important. Specifically, how can a team of volunteers across 10 campuses, each with full existing workloads, develop and host unique and noteworthy week-long event series on an annual basis? We will discuss the different approaches each of these events took towards collaboration and organization: UC GIS Week, which leverages a formal committee structure with defined roles and a submission process for presentations; and UC Love Data Week, which relies on a looser, crowdsourced “data potluck” approach. We will also lay out our joint plan for next steps and identify the necessary resources and infrastructure to make these events sustainable and an integral part of the suite of resources available to UC affiliates.
A use case for building a researcher community around data sharing: The Swedish COVID-19 Data Portal
Arnold Kochari (SciLifeLab)
Katarina Öjefors Stark (SciLifeLab)
Parul Tewatia (SciLifeLab)
Anna Asklöf (SciLifeLab)
Liane Hughes (SciLifeLab)
Wolmar Nyberg Åkerström (NBIS/SciLifeLab)
Senthilkumar Panneerselvam (SciLifeLab)
Hanna Kultima (SciLifeLab)
In spring 2020, our team was tasked with building the Swedish national COVID-19 Data Portal (https://covid19dataportal.se). Our assignment was to build a platform for researchers that could act as an effective national research data sharing hub. Since its launch, the Portal has been visited over 130,000 times by more than 35,000 people, and we have provided support for data management and sharing to over 200 projects. In this talk, we describe our experiences in designing and operating the Portal, which could aid others in building similar platforms for other research communities. We have identified several infrastructure components (software, tools, people, and ways of working) and features that were key to the success of the Portal. One example is the use of software that made it easy for researchers to contribute specific content. Another is the development of content aimed at attracting the attention of relevant researchers, and offering data management guidelines and support ‘as a bonus’ alongside such content, rather than focusing solely on the latter. We also selectively provided publicity and increased visibility (e.g. by publishing “data highlights”, or creating dedicated sections) for projects following good data practices in order to incentivise other researchers to do the same. Something that we are keen to highlight is how the environment in which the Portal was developed was key to its success. In particular, from the start, we received a clear mandate from two large Swedish research funding bodies to develop and operate the Portal. These bodies explicitly required that any recipients of their funding must communicate with our team on issues related to data. Our team was also part of a large research institute, which facilitated close contact with a number of relevant researchers early on, and subsequently encouraged others from outside the institute to collaborate with us too.
Swedish National Data Service (SND) has a primary function to support the accessibility, preservation, and re-use of research data and related materials. Together with a network of around 40 universities and public research institutes, we strive to create a national infrastructure for open access to research data. As of January 2018, SND is run by a consortium of universities. The consortium consists of the University of Gothenburg, Chalmers University of Technology, Karolinska Institutet, KTH Royal Institute of Technology, Lund University, Stockholm University, the Swedish University of Agricultural Sciences, Umeå University, and Uppsala University. The University of Gothenburg is the host university of SND, and the SND headquarters are located in Gothenburg. The nine consortium universities contribute expertise through so-called domain specialists, who have extensive experience and knowledge from different research fields and research data management. There is also a national network connected to SND, with approximately 30 higher learning institutions and public research institutes. Traditionally, SND has received research data directly from researchers or research groups and made them accessible. Today, SND is in a transition phase, changing from a traditional repository to a network-based collaboration. In order to meet the growing demand for open access, operations are developing into a distributed large-scale model together with the 40 SND network members. In this new business model, the work of making research data accessible will gradually take place in the local research data support functions being established with the network members. This model will be easier to scale up and will also be able to handle sensitive data. SND's new role is to facilitate the national work by training co-workers, developing and suggesting standardized procedures and IT tools, and collaborating with other stakeholders nationally and internationally.
Structuring Metadata Workflows to Support Data Sharing
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
Uwe Jensen (GESIS - Leibniz Institute for the Social Sciences)
Our classical Data Archive focus was on the documentation of already conducted projects within the social science area. For that, we use structured metadata, especially the DDI-Codebook standard, and tools supporting this approach. Over recent years, a DDI-Lifecycle approach was added, for better re-use options in the context of projects which actively collect data or have longitudinal designs. The idea was to apply the “Tornado” approach (from the Generic Longitudinal Business Process Model: many cycles of design, use and re-use) to metadata. Currently, we follow a metadata capture and processing pipeline at GESIS with the ExploreData project, based on the DDI-Lifecycle standard. The pipeline addresses technical challenges like XML transformation, database import/export, and search index creation, including the underlying different technology stacks. Our goal is that all metadata can be published for secondary data use or for further re-use in other data collection activities. Our presentation will describe our experiences with the implementation, show the lessons learned from automated versus human testing of metadata publication in retrieval systems, and will highlight particular challenges of managing study-level and variable-level metadata in the workflow pipeline.
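As a minimal sketch of the kind of transformation step such a pipeline chains together (it is not the ExploreData code): the function below flattens a simplified, un-namespaced DDI-Lifecycle file into a JSON document ready for a search index; element names are illustrative.

    import json
    import xml.etree.ElementTree as ET

    def lifecycle_to_index_record(xml_path):
        """Flatten a (simplified, un-namespaced) DDI-Lifecycle file into a
        JSON document suitable for loading into a search index."""
        root = ET.parse(xml_path).getroot()
        record = {
            "title": root.findtext(".//Citation/Title", default=""),
            "abstract": root.findtext(".//Abstract", default=""),
            # Variable-level metadata is kept as a nested list so that the
            # index can offer variable search alongside study search.
            "variables": [
                {"name": v.findtext("VariableName", default=""),
                 "label": v.findtext("Label", default="")}
                for v in root.iter("Variable")
            ],
        }
        return json.dumps(record, ensure_ascii=False)

Keeping study-level and variable-level metadata together in one record is one way to handle the dual-granularity challenge the presentation describes.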
Goal-oriented data discovery for effective analytics
Sainyam Galhotra (University of Chicago)
Yue Gong (University of Chicago)
Raul Castro Fernandez (University of Chicago)
Data is a central component of a myriad of real-world applications involving business intelligence tools, machine learning and causal-inference-based analytics. Recent technological advancements have resulted in an explosion of data generated by a multitude of sources. The availability of large amounts of data from sources such as open data repositories, data lakes and data marketplaces creates an opportunity to leverage data that could boost the performance of a downstream task. However, current data discovery tools rely on the analyst's domain knowledge to search for datasets among large repositories of potential candidates. The reliance on users' domain knowledge and skill to manually iterate over millions of datasets and identify useful ones is time-consuming and tedious. In this work, we propose a novel data discovery framework that addresses the unique challenges in the path to meaningful and equitable access to data. The proposed data search and discovery techniques minimize the user effort required to specify requirements and relieve users from manually iterating over individual repositories. Our methodology allows users to focus on the downstream task, gives them the power to easily navigate the maze of available datasets, and allows them to leverage datasets that were difficult to find manually. Our proposed techniques are generic enough to adapt to diverse real-world applications and user requirements. With a comprehensive suite of theoretically sound and empirically demonstrated methodologies, we demonstrate the ability of our system to ease access to and usage of datasets.
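The abstract does not spell out the algorithms, but one concrete reading of "goal-oriented" discovery, offered here only as a toy sketch of the general idea and not the authors' system, is to score candidate tables by whether joining them improves a downstream model's cross-validated score. The sketch assumes numeric features and a join key that is unique in each candidate table.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def useful_candidates(base, target, candidates, key):
        """Return names of candidate tables that improve a downstream score.

        base       -- DataFrame with numeric features, a `key` column, `target`
        candidates -- dict of name -> DataFrame joinable on `key` (unique keys)
        Real systems prune with cheaper relevance signals before ever
        training on a candidate; this exhaustive loop is for illustration.
        """
        model = LogisticRegression(max_iter=1000)
        X, y = base.drop(columns=[target, key]), base[target]
        baseline = cross_val_score(model, X, y, cv=5).mean()
        keep = []
        for name, table in candidates.items():
            merged = base.merge(table, on=key, how="left").fillna(0)
            Xc = merged.drop(columns=[target, key])
            if cross_val_score(model, Xc, y, cv=5).mean() > baseline:
                keep.append(name)
        return keep

The point of the framework is precisely to replace this kind of manual, analyst-driven iteration with automated search over repositories.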
Research data infrastructure and community at Stockholm university
Merlijn De Smit (Stockholm university)
The Research Data Management Team at Stockholm university started its activities in 2016 and currently has six full-time employees. Since then, the Team has developed its activities in line with Stockholm university’s explicit strategic support for Open Science. Among the services we provide to researchers are an online tool for writing data management plans (dmp.su.se); storage space at Sunet, 200 GB of which is provided to researchers and PhD students at no cost; space curated by the RDM Team at four different data repositories; as well as various informational and educational events, such as an Open Science course for PhD students. In this presentation, I wish to focus on our efforts to provide a research data infrastructure – both in “hard” terms such as technical solutions and in “soft” terms such as communication channels and networks – to the human sciences departments at Stockholm university. At the moment, these efforts take place mainly through physical and virtual visits to individual departments, usually piggybacking on other events such as personnel meetings. Work to build up a network of research data coordinators at the individual departments is currently at an initial stage. I will also take up the challenges we encounter, which range from difficulties in establishing effective communication channels with researchers, to the great diversity of “data” in the humanities, to the way GDPR and ethical concerns are perceived to be an obstacle in the transition to Open Science and Open Data practices.
A Reproducible Approach to Data FAIRification: A Case Study from the Environmental Health Sciences
Harrison Dekker (University of Rhode Island)
Yana Hrytsenko (University of Rhode Island)
This talk will provide an overview of a reproducible approach to promote data interoperability and FAIR compliance within STEEP, a multi-year, multi-institution research project funded by the National Institute of Environmental Health Sciences. The STEEP project addresses the ubiquitous human health threat of PFAS, a class of chemicals found in a variety of common household and industrial products, and is representative of a growing trend of data-intensive, small-team-based research. A major impediment to the adoption of data practices that promote interoperability and FAIR compliance in projects like STEEP is that, in contrast to the unprecedented growth in small research teams' ability to produce and analyze data, the resources available to prepare and manage data for long-term availability are typically limited. A further challenge, particularly given the breadth of scientific disciplines involved in STEEP research, is the lack of familiarity with the technological requirements and emerging standards and practices for data publication and metadata creation. With these challenges in mind, the authors are developing a reproducible framework for data and metadata management based on the popular Jupyter Notebook platform and a variety of existing and well-supported Python modules.
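As a minimal illustration of the approach (the field names are common dataset-metadata conventions, not the STEEP schema): a notebook cell can derive structural metadata from the data file itself, so that documentation stays reproducible rather than hand-typed.

    import json
    import pandas as pd

    def describe_dataset(csv_path, title, creators, keywords):
        """Generate machine-readable metadata alongside a data file, so the
        description can be regenerated from the data itself."""
        df = pd.read_csv(csv_path)
        metadata = {
            "title": title,
            "creators": creators,
            "keywords": keywords,
            # Structural metadata derived from the file, not typed by hand:
            "rows": len(df),
            "variables": [
                {"name": c, "dtype": str(df[c].dtype)} for c in df.columns
            ],
        }
        with open(csv_path.replace(".csv", ".metadata.json"), "w") as f:
            json.dump(metadata, f, indent=2)
        return metadata

Because the notebook is re-runnable, the metadata can never silently drift out of sync with the data it describes, which is the core of the reproducibility argument.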
(Virtual) - D-Psy-FAIR: The development of a documentation standard enabling a sustainable, high quality documentation of psychological research data
Katarina Blask (Leibniz Institute for Psychology)
Marie-Luise Müller (Leibniz Institute for Psychology)
Marc Latz (Leibniz Institute for Psychology)
Stephanie Kraffert (Leibniz Institute for Psychology)
Despite the potential to accelerate scientific progress and to foster a sustainable scientific practice, sharing data openly remains relatively rare in psychology. Furthermore, there are hardly any common standards for the documentation of psychological research data. In order to support this much-needed cultural change and to provide researchers with an easy-to-use way to create high-quality documentation of their research data, we have started to empirically develop a discipline-specific documentation standard. Specifically, we began by investigating existing standards and their potential for the sustainable documentation of psychological research data. To this end, we explored the nature of the information necessary for optimal reuse and how existing standards meet those requirements. Proceeding from those results, we defined a content specification for the documentation standard D-Psy-FAIR, allowing for FAIR and method-specific documentation of the entire research process. In order to test the standard's specification as well as its usability, three user studies were conducted with samples composed of psychological researchers. The first two user studies revealed valuable insights about the information content as well as the presentation form for optimal data reuse. In the third user study, we then tested whether the documentation standard could increase the reuse potential of psychological research data. In this talk, we will present and discuss the results of these studies, particularly in relation to the standard's potential to increase reproducibility and to help build a sustainable data culture in psychology.
(Virtual) - The Research Data Management Maturity Assessment Model in Canada (MAMIC)
Jane Fry (Carleton University)
Dylanne Dearborn (University of Toronto)
Alison Farrell (Memorial University)
In March 2021, the three federal research funding bodies in Canada (the Tri-Agency) released a research data management (RDM) policy stipulating that each research institution receiving Tri-Agency funding have an institutional strategy outlining expectations of best practices in handling research data, and how the institution will support its researchers in terms of data management. The RDM team from the Digital Research Alliance of Canada created guiding documents to help institutions with this endeavour. One suggested step is for institutions to complete a maturity assessment to determine their readiness to complete such a strategy. Although there are numerous international RDM assessment models available (e.g., the RISE model and the ANDS Data Management Framework Guide), it was found that none really fit the needs of the Canadian landscape or directly map to the requirements outlined in the Tri-Agency RDM policy. To this end, a working group was formed under the National Training Expert Group to develop a tool to assess the state of RDM support readiness at Canadian institutions. Building on the international tools already available, the first version of the RDM MAMIC was developed and rolled out in fall 2021. This presentation will detail expectations surrounding institutional strategy development as outlined in the new federal mandates. We will discuss the national approach to support for strategy development, and the need and purpose for the MAMIC within this context. We will also provide insight into the development of the MAMIC and detail how we learned from the efforts of groups in other countries. We will then provide an overview of the tool and show how it can be used to help research institutions as they develop their institutional strategies and conceptualize RDM supports.
(Virtual) - Data Sustainability by Professionalization of Research Data Management
Kathrin Behrens (GESIS - Leibniz Institute for the Social Sciences)
Research data centres (RDCs) play an essential role when it comes to sustainable archiving and management of research data. The entire research data curation cycle is in their hands, ranging from the selection of suitable data in compliance with legal and ethical standards, to comprehensive archiving services, to ensuring user-friendly data access. These activities require basic and curation-specific research data management skills, especially with regard to the specificity of data from different scientific disciplines. Past experience shows that it is an enormous challenge for social science RDCs to attract sufficiently qualified staff. A working group from the Consortium for the Social, Behavioural, Educational and Economic Sciences therefore dedicated itself to the task of advancing professionalization in research data management, specifically in curation activities in RDCs. To this end, the working group is currently conducting a survey among the RDCs in order to ascertain concrete needs with regard to the required development of competencies. The presentation is intended to provide insights into the competence needs for work in areas of research data management and to give an outlook on the measures planned by the working group. These include the development of a comprehensive research data management platform and the design of a certificate course for work in RDCs. The RDCs will be significantly involved in the design and implementation of the course to ensure a continuous exchange with the target group. From this, we expect a demand-oriented professionalization of research data management and curation activities in RDCs, with the conviction that highly qualified, competent staff is the basis for a sustainable data culture.
A National Strategy for FAIR Data Management in Denmark
Anne Sofie Fink (DeiC - Danish e-Infrastructure Cooperation)
For Denmark, ‘The National Data Management Strategy based on the FAIR principles’ will be an important initiative towards more effective and better research, as well as increased confidence in the research conducted. Additionally, the strategy is a first Danish response to the requirement of the European Open Data Directive (the PSI Directive) that national policies be drawn up for making research data accessible in accordance with the FAIR principles. The strategy must work to ensure that research data generated via public funds, and possibly co-funded by private research-funding foundations, meet the FAIR principles. The target group for this strategy is researchers and management at Danish universities, preservation institutions which support research, and other institutions that conduct research using public grants. The strategy lays out a number of principles and identifies actions and initiatives aimed at advancing the process towards a successful implementation of the strategy, including the development of practices for how research data are handled. The strategy uses the term data to mean digital content of any kind that can be referenced in the form of data sets, files, databases, etc. This may include all types of digital research output that form part of the research. The purpose of the strategy is thus to contribute to taking significant steps towards making research data more FAIR, and towards research data being treated and recognized as a resource shared in national and global data infrastructures in ways that stimulate open research and open innovation. The presentation will outline the principles, actions and initiatives in the strategy and the recent steps towards its implementation, in order to initiate discussion of national responses to FAIR data management and the open data agenda.
Strategy to document and disseminate longitudinal surveys: case study of using DDI-Lifecycle and Colectica
Lucie Marie (Sciences Po, Center for Socio-Political Data)
With the Open Science movement, the patterns of data sharing are evolving. In this context, the French Center for Socio-Political Data (CDSP) undertook an experimental project, UpMet (Upscaling metadata for increasing reuse in the social sciences), that aimed to build a question bank using DDI-Lifecycle and Colectica Designer, Repository and Portal. Aside from compliance with the CESSDA repository standards, so that data and metadata may be harvested into the European Question Bank, this new tool was implemented for two additional purposes: on the one hand, to make data more discoverable and findable at the variable level; on the other, to pave the way for a time-efficient internal documentation protocol relying on a metadata model at the variable level – a reference document where data managers may find data organization paradigms. Firstly, we will outline the main implementation steps of DDI-Lifecycle using the Colectica software packages, including the challenges met in building the question bank. Then, we will demonstrate the benefits in terms of data discoverability and reusability from the users’ perspective. After reviewing internal and external outcomes, we aim to discuss opportunities for implementing Colectica in the overall ecosystem of research data dissemination at the CDSP. More broadly, the presentation will share feedback about the entry cost of upgrading data documentation standards (from DDI-Codebook to DDI-Lifecycle), as well as the process of enhancing metadata documentation quality, especially the harmonization of items and work on variable-level granularity.
Panel 6: Democratising data: A service design approach to the creation of public data panels
Democratising data: A service design approach to the creation of public data panels
Elizabeth Nelson (Administrative Data Research Centre Northern Ireland (ADRC NI))
Frances Burns (Northern Ireland Trusted Research Environment)
Maíra Rahme (Big Motive)
Andrea Thornbury (Belfast City Council)
Overview: Using the pilot of the Northern Ireland Public Data Panel (NIPDP) as a case study, this panel will explore how best practice is being created: identifying gaps in approaches, investigating assumptions, and validating how the NIPDP can serve as a blueprint for multi-stakeholder data engagement infrastructure in other localities. There will be four presentations and a panel discussion, involving the audience in thinking through topics identified during the session.
1. Data for public good: Ecosystems involving data innovation are commonly required to engage the public in matters of consent and social licence; how this is done, however, varies widely and often falls short of desired levels of participation and understanding. To address this multiplicity of approaches, a novel consortium-based approach has informed the pilot of the NIPDP.
2. Citizen Office of Digital Innovation (CODI): Belfast City Council has developed several data ambitions, including a Citizen Office of Digital Innovation with synergies to the NIPDP. This will develop a capacity-building programme to directly involve citizens in data-enabled projects, building skills to support the development of new products and services and fostering transparency and public trust.
3. The Northern Ireland Public Data Panel: Current approaches in data-focused work have identified a novel challenge: how to speak to publics about how their data is used, while reflecting nuance and democratising the process? Addressing this question requires innovative solutions that enable public involvement throughout the data-use cycle.
4. Data by design: Big Motive supported the development of the NIPDP pilot with their expertise in user-centred design approaches, facilitating the exploration of different prototypes that were tested with the aim of engaging the public in data decision-making.
Session G2
(Virtual) - Investigating teaching practices in quantitative and computational Social Sciences: a case study
Rebecca Greer (University of California, Santa Barbara Library)
Renata Curty (University of California, Santa Barbara Library)
Data education is gaining traction in higher education across disciplines and degree levels. Teaching data skills in the Social Sciences in today's data-driven world is vital for preparing the next generation of data-literate and critical social scientists. The ability to identify, assess, analyze, and communicate well and responsibly with data is a skill scholars and professionals need to navigate dynamic and expansive information ecosystems. In response, instructors have adapted their curricula and pedagogy to foster the necessary skills and theoretical knowledge to advance students’ computational and statistical praxis. This paper reports the findings of the local arm of a larger national project with 19 other participating academic institutions. It discusses ways academic libraries, in association with other campus partners, can better support students and teachers in the Social Sciences as they adopt quantitative and computational approaches to deal with pressing contemporary social issues. The study's goals were: 1) Explore pedagogical techniques and support needs in teaching undergraduates with data, and 2) Provide actionable recommendations for stakeholders within and outside the library to inform new services, policies, and practices to advance data instruction in the Social Sciences. Interviews were transcribed and coded in MaxQDA. The results of our local assessment revealed that the core learning goal of interviewees is to develop students' critical thinking skills with data, including: 1. A conceptual understanding of the research methods employed by social scientists; 2. The ability to critically evaluate research methodologies, findings, and data sets; and 3. Prowess in using quantitative and computational tools and technologies to aid them in this process. A recurring theme across interviews was students’ fear of math and technology and the challenges this poses to data-related instruction. Instructors value participation in a community of practice and are eager for more institutional support to advance their own computational skills.
Bringing data literacy into teacher training: challenges and perspectives
Anne Lehmans (Bordeaux University)
Vincent Liquète (University of Bordeaux)
Camille Capelle (University of Bordeaux)
The use of data as objects and tools for knowledge building is still underdeveloped in education. However, it constitutes a strong axis in the construction of a critical digital literacy, preparing pupils and students to become citizens who, on the one hand, understand the logics and challenges of the technologies that influence their activities and decisions and, on the other, can innovate ethically and responsibly in the near and distant future. In France, the implementation of the Digital Competence Framework for Citizens (DigComp) in a platform (PIX) strongly emphasizes this focus on data, as does the Ministry of Education's roadmap on data policy. Curricula have begun to make a (reduced) place for data literacy in recent years, particularly in secondary schools. The risk is that these curricula limit data literacy to the development of technical skills and disciplinary learning in computer science, while the social, political, economic and ethical issues go far beyond this perspective. An open multidisciplinary team, focused on information, library and communication sciences and the training of teachers, has been working for several years on the development of data literacy among teachers. This paper proposal aims to: present the challenges and perspectives of research in this field, explaining its theoretical framework, which calls for a pragmatic and critical approach; present research methodologies centered on the analysis of the institutional framework and of the social representations and practices of data in education, through an ethnomethodological outlook on learning situations; and propose a reflection on the perspectives to be considered for training teachers in a complex, critical and political approach to digital and data literacy.
Collect, archive, publish, reuse. But what about the users?
Michaela Kudrnáčová (CSDA)
Ilona Trtíková (CSDA)
The data archive in general has various functions. Its main purpose is collecting and archiving research data. However, there is much more to it: other responsibilities come with data archiving, such as partnerships and collaborations with other archives and institutions, cooperation on international social surveys, promoting secondary data analysis, and much more. A significant but often underestimated part is communication with, and systematic data-literacy education of, our users, who are students and researchers but also the general, data-interested public. In 2020, we conducted a short survey employing user-centred design methods to define our typical users and gain a better picture of their needs. The results inspired us to redesign our website to provide more of the data-related knowledge users requested. Moreover, we now regularly organize events aimed at newcomers to our archive, as well as themed events (on migration, politics, COVID-19-related research and more) where we introduce the data that can be found in our archive. In our contribution, we will present results from the 2020 survey, what they tell us about our users, and what we are doing to accommodate them and help them improve their skills in searching for and working with research data. We also intend to show results from a follow-up survey, planned for the beginning of 2022, on data-sharing behaviour in the open science environment.
Regulations and recommendations about file formats
Benjamin Yousefi (Swedish National Archives)
Magnus Geber (Swedish National Archives)
The Swedish National Archives has circulated for consultation a draft of regulations on file formats for agencies in the public sector (https://riksarkivet.se/rafs/remiss/). The draft is accompanied by extensive technical and legal comments and totals some 1,600 pages. It was written by Benjamin Yousefi, who spent about eight years on different aspects of the development work. It goes thoroughly through the technical aspects of file formats and fits these into the Swedish legal framework, and it develops a model for evaluating and choosing suitable file formats. Reference implementations are advised as a method to handle the problem that many format specifications are not precise enough, which can lead to different “dialects” of a format and cause problems for long-term digital preservation. The regulations aim to cover all relevant digital formats and include an extensive table in which a large number of file formats are described and evaluated. There are two draft regulations. The first concerns the creation of technical documents that may be public documents anywhere in the public sector. The second concerns the governmental sector and sets requirements for digital public documents. The regulations also expect agencies to take care in choosing formats suitable for their situation. Independently of, and parallel to, the work in Sweden, the international organisation Open Preservation Foundation (OPF, https://openpreservation.org/) has a working group conducting an ‘International Comparison of Recommended File Formats’. It is intended to enable comparisons between institutional practices in the digital preservation community and may serve as a tool for developing best practices with those of similar institutions in mind. That work may also be presented in this context.
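To make the format problem concrete: identification usually starts from a file's signature ("magic bytes") rather than its extension. The sketch below checks a handful of well-known signatures; it is a toy illustration, not the Riksarkivet evaluation model, and a real registry such as PRONOM covers thousands of formats and versions.

    # Well-known leading byte signatures for a few common formats.
    SIGNATURES = {
        b"%PDF-": "PDF",
        b"\x89PNG\r\n\x1a\n": "PNG",
        b"PK\x03\x04": "ZIP container (also DOCX/ODS/...)",
    }

    def identify(path):
        """Guess a format from leading bytes, ignoring the file extension."""
        with open(path, "rb") as f:
            head = f.read(16)
        for magic, name in SIGNATURES.items():
            if head.startswith(magic):
                return name
        return "unknown"

Even this trivial check shows why the draft regulations stress precise specifications: a ZIP signature alone cannot distinguish a DOCX from an ODS, and "dialects" within a format are invisible at this level.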
Marjorie Mitchell (University of British Columbia Okanagan Library)
Mathew Vis-Dunbar (University of British Columbia Okanagan Library)
Nick Rochlin (University of British Columbia Advanced Research Computing)
The graduate research lab: birthplace of so much research data. Many PIs struggle to find the time and resources to wrangle a lab’s worth of data. Many graduate students struggle to identify the best ways to manage, organize, and document their data. Undergraduate assistants struggle to adopt what may, at best, be loosely described RDM protocols. Handouts, workshops, and webinars help. But they leave a gap: moving from principles to practices that work for a specific lab in a specific research domain in a coordinated way. This session presents the development of a customizable graduate RDM Lab Manual. With a general resource as its backbone, it can be tailored to any lab. The materials are designed to be adaptable across disciplines, with initial input from an Advanced Research Computing (ARC) specialist, a Research Data Management (RDM) specialist, and a Data specialist. After an initial meeting with the lab manager and/or principal investigator (PI), the team provides an interactive seminar to workshop the generic resource into practical implementations that work for the lab. Lab members work through, and agree upon, things like file naming conventions, file hierarchy structures, frequency and location of data back-ups, and their roles in these tasks. The end product is a lab-specific protocol for one or more aspects of RDM, supported by a general-purpose, detailed resource guide, built collaboratively by lab members. The team assembled the materials into an Open Educational Resource: slide templates, a manual template, and links to a vast array of resources to provide support as labs individualize the manual to meet their operational needs. By training the next generation of researchers from the time they first join a lab, this Open Educational Resource (OER), alongside customized seminars, will positively impact the quality of data sets, including documentation, and the reproducibility of research.
Chalmers e-Commons: One gateway to multiple services
Jeremy Azzopardi (Chalmers University of Technology)
Chalmers University of Technology in Gothenburg, Sweden conducts research and education in technology and natural sciences at a high international level. Following an in-depth internal analysis of the needs for digital research infrastructure and support, Chalmers e-Infrastructure Commons (e-Commons) was established by the president of Chalmers. E-Commons' mandate is to deliver integrated support to researchers at Chalmers in calculations, simulations, analysis and management of large or complex data. An integral element of e-Commons' mission is to provide data management support throughout the whole data lifecycle. This is offered via e-Commons' Chalmers Data Office, which collects expertise from data librarians, technical staff, and archivists and provides links to the supporting IT services. The initial focus is on a data management plan tool via Data Stewardship Wizard, data publication support via SND's DORIS data publishing tool and web catalog, and constant availability of data management support via its Digital Research Data engineer function. E-Commons aims for researchers to experience continuous flow in the research data lifecycle, integrating modular solutions into existing workflows. Researchers will have one point of access for all data needs, from which metadata can then be stored to and shared with several services: project information is registered in Chalmers' project database (CRIS), from which relevant metadata is used to prefill a DMP, using Data Stewardship Wizard. DMPs can then automatically provide information to allocate storage, processing, and other resources. Relevant material/data can be marked for archival at an appropriate stage. Potential issues can also be flagged early on (e.g., the presence of sensitive data). DMP contents can also be forwarded to SND's data documentation and publication system (DORIS), automatically providing metadata and marking relevant data for publication. Automating workflows between DMPs, local storage and DORIS is the focus of a pilot project between Chalmers, SND, and KTH Royal Institute of Technology.
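A minimal sketch of what the prefill step might look like; the field names on both sides are hypothetical, and the real integration goes through Data Stewardship Wizard's own templates and APIs rather than a hand-rolled mapping like this.

    def prefill_dmp(cris_record):
        """Map a (hypothetical) CRIS project record onto (equally
        hypothetical) DMP fields, so researchers start from a draft
        instead of a blank form."""
        return {
            "project_title": cris_record["title"],
            "principal_investigator": cris_record["pi"],
            "funder": cris_record.get("funder", ""),
            # Downstream services can read these fields to allocate
            # storage or flag sensitive data early.
            "expected_data_volume": cris_record.get("data_volume", "unknown"),
            "contains_personal_data": cris_record.get("personal_data", False),
        }

    draft = prefill_dmp({"title": "Battery ageing study",
                         "pi": "A. Researcher",
                         "funder": "VR"})

The design point is that each system remains the authority for its own fields: the CRIS owns project facts, the DMP owns data management intent, and the mapping is the only glue.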
More than just FAIRly interoperable. A new platform based on DDI-CDI.
Darren Bell (UK Data Archive)
Deirdre Lungley (UK Data Archive)
We will demonstrate a prototype cloud platform developed at the UK Data Archive, bringing a range of existing data sources up to modern standards by forward migrating outdated metadata/schemas so that the platform is made interoperable by design, not just providing XML exchange formats for download as an afterthought. The development of DDI-CDI, the newest metadata standard in the DDI family, allows the storing and dissemination of very different data structures with a single, unified data model. With linked data interfaces and NoSQL databases as the principal infrastructure components, we can support new products and services for the research community. In particular, the development of services for machine-assisted disclosure assessment and more refined, machine-actionable rights metadata can be used to streamline and accelerate the delivery times for data to researchers, and to enable increased data availability in Trusted Research Environments. More granular disclosure metadata will also provide better metrics for decision making on where data should be analyzed in secure, restricted environments as opposed to more lightweight controls. Not only the model but the infrastructure has embraced innovation: for the first time, we have a “serverless”, cloud-based infrastructure, which means full decoupling from specific repository premises, laying the groundwork for future federated services. This prototype implements data subsetting and bespoke data product generation, allowing increased alignment between research infrastructures, and providing the basis for a more generalized solution for bespoke dataset dissemination for researchers. Researchers will have the ability to choose only the variables they need for their research.
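As a toy illustration of the variable-subsetting idea (pandas rather than the prototype's serverless stack, and with an assumed metadata structure rather than actual DDI-CDI):

    import pandas as pd

    def make_subset(data_path, variable_metadata, wanted, out_path):
        """Produce a bespoke data product containing only the requested
        variables, together with the matching slice of metadata."""
        df = pd.read_csv(data_path, usecols=wanted)
        df.to_csv(out_path, index=False)
        # Ship only the metadata describing what the researcher received;
        # a narrower subset also simplifies disclosure assessment.
        return [m for m in variable_metadata if m["name"] in wanted]

    meta = make_subset("survey.csv",
                       [{"name": "age", "label": "Age in years"},
                        {"name": "income", "label": "Gross income"},
                        {"name": "region", "label": "Region code"}],
                       wanted=["age", "region"],
                       out_path="subset.csv")

Keeping data and metadata subsetting in lockstep is what allows the disclosure and rights machinery described above to reason about exactly what a researcher receives.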
EDSC 15 years of economic and financial data support: data stories, lessons learned and the road ahead.
Rob Grim (EUR)
In 2021 the Erasmus Data Service Center (EDSC) celebrates its 15th anniversary of providing financial data support to students and researchers at Erasmus University Rotterdam. Hosted by the university library at the request of the EUR economics, business and management faculties, the EDSC provides a unique and highly valued portfolio of economic data services. In this paper, 15 years of user data – both numerical and textual – are analyzed and captured in catchy data stories that illustrate research trends in economics and finance. In addition, lessons learned will be shared and concrete directions provided for the next generation of operational data services. Economics and finance data support covers a vast range of topics and typically requires large volumes of data. With a rapidly growing number of data products and ever-increasing expectations from patrons and students regarding data accessibility, the EDSC is challenged to meet these expectations. Further challenges are identified for, e.g., big data analytics services, FAIR data support, machine learning, and the skill set of academic data workers. The paper will illustrate how cloud platform data infrastructure services and metadata repositories can be used for the deployment of innovative academic data services.
Steven McEachern (The Australian National University)
Amir Aryani (Swinburne University)
Peter Vats (Research Graph Foundation)
John Scullen (The Australian Access Federation)
The CADRE (Coordinated Access for Data, Research and Environments) Platform project has produced a prototype information graph for the CADRE Information Exchange. Work undertaken to combine multiple information sources reveals the value of collating, synthesising and visualising (1) information about researchers, their projects, and their history of working with sensitive data, (2) technical authorisation and authentication information and (3) sensitive data access request information. This panel session will enable a deep dive into the informatics and analytics associated with sensitive social science data management especially where research workflows work across multiple systems in institutional and national research infrastructure as a technical ecosystem.
Panel 7: Trust Standards, Support and FAIR Enabling Trustworthy Repositories
Trust Standards, Support and FAIR Enabling Trustworthy Repositories
Hervé L'Hours (UK Data Archive)
A session of four presentations followed by Q&A, covering the standards and associated mutual support around data management and archiving: existing and future approaches to providing support, and the emerging expectations for FAIR data that impact FAIR-enabling Trustworthy Repositories. As areas of work professionalise and the demands for interoperability through consolidated lifecycles and research infrastructures (e.g. EOSC) increase, the repository community is in an important position to guide future directions while also needing to respond to rapid external changes. The emergence of the FAIR Data Principles and the related RDA FAIR Data Maturity Indicators presents a set of object-level expectations that interact with the repository-level expectations of Trustworthy Digital Repository (TDR) standards such as the CoreTrustSeal. Emerging work, including that around the European Open Science Cloud (EOSC), is beginning to define how these expectations should be addressed for domain and disciplinary data, including the social sciences and humanities. The focus is not only on certification through processes such as CoreTrustSeal, but covers the breadth of good practice in research data management throughout the lifecycle. Information on the CESSDA approach to Trust Support and its influence on other ongoing repository support programmes will be followed by an overview of current and future collaborations, including the alignment of FAIR-enabling repositories with evaluations of FAIR digital objects. Two approaches to FAIR object assessment and knowledge exchange will then be presented.
(Virtual) - From Cradle to (Grave) Sudden Death and Beyond: A Journey in Research Data Migration
Carla Graebner (Simon Fraser University)
Erin Clary (Research Data Alliance)
This presentation touches on the intersection between decommissioning an existing research data repository and Murphy’s Law. It will also emphasize the importance of, and the benefits of, a good working relationship with partners and stakeholders. Simon Fraser University launched its research data repository, SFU Radar, in 2012 as a proof-of-concept project to support institutional data deposit activities. Using Islandora and Archivematica as a platform, Radar was designed to support data curation activities across disciplines and formats. Fast forward to 2021, and SFU Library begins its planned year-long decommission of Radar and data migration to Canada’s Federated Research Data Repository (FRDR), https://www.frdr-dfdr.ca/repo/. Until. Events. Happened. And a 12-month migration turned into a 10-week race. We’ll discuss the migration process, and our ongoing work to design/redesign FRDR workflows that better support collection managers.
ARIADNE portal: Building a common resource infrastructure for archaeologists worldwide
Pablo Millet (Swedish National Data Service)
Johan Fihn Marberg (Swedish National Data Service)
The ARIADNEplus project is the extension of the previous ARIADNE Integrating Activity, which successfully integrated archaeological data infrastructures in Europe. In the ARIADNE portal, about 2,000,000 datasets are currently indexed from 12 different organizations, with almost 30 additional organizations being integrated in 2022. With no common metadata standards or systems between the organizations, this presentation will demonstrate how the harmonization, aggregation and enrichment of metadata and resources are conducted to ensure that every resource in the portal follows the portal's common metadata profile, with as rich metadata as possible. This is achieved by aggregating all original metadata into an RDF database using conversion recipes and by harmonizing partners' original metadata with commonly used thesauri and tools such as the Art & Architecture Thesaurus, PeriodO and GeoNames.
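For flavour, here is a conversion-recipe step sketched with rdflib; the properties and the thesaurus lookup are illustrative stand-ins, not ARIADNEplus's actual ontology or mapping tables.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    AAT = Namespace("http://vocab.getty.edu/aat/")

    def to_rdf(record, subject_uri, aat_lookup):
        """Convert one partner's metadata record (a plain dict) into RDF,
        harmonizing the free-text subject against a shared thesaurus."""
        g = Graph()
        s = URIRef(subject_uri)
        g.add((s, DCTERMS.title, Literal(record["title"])))
        g.add((s, DCTERMS.issued, Literal(record["year"])))
        # Map the partner's local term to an AAT concept if the lookup
        # (e.g. a curated mapping table) knows it.
        concept = aat_lookup.get(record["subject"].lower())
        if concept:
            g.add((s, DCTERMS.subject, AAT[concept]))
        return g

    g = to_rdf({"title": "Iron Age hillfort survey", "year": "1998",
                "subject": "Hillforts"},
               "https://example.org/dataset/42",
               {"hillforts": "300000000"})  # AAT identifier is a placeholder

One such recipe per partner, all writing into the same RDF store, is what lets a single portal profile sit on top of 12 (soon 40) incompatible source systems.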
Metadata for the Masses – Building an educated community of data users
Samuel Spencer (Aristotle Cloud Services Australia)
Lauren Eickhorst (Aristotle Metadata)
Increasingly, government and academic agencies are becoming aware of data as a valuable organisational asset that requires ongoing development and maintenance. The challenge is that many organisations do not have the institutional knowledge to build data and metadata registries to support the management and discovery of data. As these organisations look to software to fill these gaps, there needs to be appropriate training and guidance on the use of technology to achieve the best value. Modern users expect not only appropriate documentation, but also interactive self-paced training and access to online communities of users to build their knowledge. When looking to improve data and metadata awareness and support new users, data professionals must be aware of these needs in order to provide relevant and specific training. As new users are introduced to the Aristotle Metadata Registry, we have worked with them to improve their metadata awareness and to understand their training requirements. Through this feedback, we have built relevant training supported by an online interactive community of users to improve data communication skills and drive the adoption of good practices across government agencies. In this talk we look at the techniques used when building a metadata community, the training methods used to turn novice users into metadata enthusiasts, and how these techniques can be used by other data professionals looking to improve general understanding of data practices. The talk covers practices such as choosing appropriate communication and vocabulary when introducing users, training and induction techniques for identified audiences, and guidelines for self-directed training. We also explore community development methods, such as setting up communication guidelines, online forum options and guidelines for increasing interaction. Lastly, we use the Aristotle Metadata Community as an example to explore the successes and challenges of these techniques in practice.
(Virtual) - Secure Data Facility Professionals Networks – what’s in it for your TRE?
Beate Lichtwardt (UK Data Service/ UK Data Archive, University of Essex)
James Scott (UK Data Service/ UK Data Archive, University of Essex)
John Sanderson (UK Data Service/ UK Data Archive, University of Essex)
Secure Data Facilities, sometimes also referred to as Trusted Research Environments (TREs), provide secure access to controlled data: data that are too detailed, sensitive or confidential to be made available otherwise. These data cannot be downloaded by the researcher; they have to be accessed via a TRE. Researchers must be approved and trained, projects specified and time-limited, and research outputs must undergo Statistical Disclosure Control (SDC) checks before release (the Five Safes Framework). For the past 10 years, there have been few such facilities in the United Kingdom. With the launch of the UKDS SecureLab in 2011 (formerly the 'Secure Data Service' (SDS)), the UK Data Archive led the way in offering secure remote access to controlled data. Although the fast-changing data landscape is now beginning to see more services offering remote access, the picture is still diverse: nationally and internationally, access to controlled data ranges from on-site access only, through remote execution portals, to remote desktop access such as UKDS SecureLab. In the UK, the 'Safe Data Access Professionals' network (SDAP) exists to share expertise, best practice, and knowledge between organisations engaged in providing secure access to controlled data, and it has proven to be an invaluable resource for all involved. With the growth in recent years of international projects aiming to facilitate transnational access to controlled microdata for research, an international Secure Data Facility Professionals Network has been formed as part of the 'Social Sciences and Humanities Open Cloud' project (SSHOC, WP5.4) to share expertise, knowledge, and ideas. This presentation will introduce the resources and potential of these two existing Secure Data Facility Professionals Networks, one national, one international. What is in it for your TRE? What would you like to be in it for your TRE?
Session H2
CADRE Five Safes Framework
Steven McEachern (The Australian National University)
Heather Leasor (The Australian National University)
Marina McGale (The Australian National University)
Julie McLeod (University of Melbourne)
Kate O'Connor (La Trobe University)
Nicole Davis (University of Melbourne)
Ingrid Mason (The Australian National University)
The CADRE (Coordinated Access for Data, Research and Environments) Platform project has produced a framework to guide the development of an information exchange that will underpin a decision-support system; the framework draws heavily on the Five Safes framework. As a first step towards operationalising the Five Safes to streamline researcher access to sensitive data, this interpretive exercise was broken into three major sections: (1) the Australian context for the framework's development; (2) a conceptualisation of the Five Safes grounded in requirements from social science researchers overseeing access to quantitative and qualitative data, as the basis for capturing relevant information; and (3) the means of operationalising the framework, including information and data models. In November 2021 the first version of the framework was published and circulated to the CADRE international advisory board and to experts in project partner and affiliate organisations for feedback. This presentation will include: lessons learned in developing the framework; the critical feedback from experts in sensitive research data management incorporated in the latest version; and reflections on moving from case-by-case decision-making to more systematic and enhanced means of enabling access to sensitive data for research.
(Virtual) - Deposit Options to Enhance the FAIR Principles
John Marcotte (ICPSR, University of Michigan)
Sarah Rush (ICPSR, University of Michigan)
Kelly Ogden-Schuette (ICPSR, University of Michigan)
Data repositories must adhere to the FAIR Principles as a minimum standard rather than a goal. In addition to making data findable, repositories should strive to also make data discoverable. Discoverable data appear in many kinds of searches and in contexts such as topics, variables, and funders. The keys to discoverability are metadata and search engines, and metadata is also important in making data accessible and interoperable. Discoverability matters as data repositories are tasked with accommodating an increasing variety of data types: no single repository can handle all types of data, yet researchers want to analyze these different data types within the same project. The DSDR project at ICPSR has expanded its deposit options to enhance discoverability, accessibility, and interoperability, developing four types of deposits. (1) Exclusive: data are deposited only with DSDR. With an exclusive deposit, DSDR is in the position to add the most value to the data by enhancing it with online analysis, a bibliography, and restricted-data access options. (2) Mirror: data are available through multiple repositories. DSDR does not typically add extra value; a mirror deposit increases the accessibility of data. (3) Variable-level metadata: DSDR makes data that it does not host discoverable through searches of topics, variables, and funders, while the data remain accessible through another mechanism such as the data collector's website. Variable-level metadata deposits are particularly valuable for ongoing studies. (4) Study-level metadata: this deposit includes a description of the study and the type of data. Study-level metadata deposits enable DSDR to make studies with different types of data, such as genomics data and brain images hosted in other repositories, discoverable through the DSDR catalog. These options enable DSDR to catalog more data than it hosts. A comprehensive catalog is essential for advancing the FAIR principles.
National Population Register (NPR) in Bangladesh perspective
Chandra Shekhar Roy (Bangladesh Bureau of Statistics)
Bangladesh has seen the active development of various register systems which interact poorly with each other and do not constitute a holistic system. There is a need to draw on the experience of advanced European countries in creating a full-fledged National Population Register (NPR), so this paper focuses on the development of the NPR. To fulfil its objectives, baseline data were obtained from the Bangladesh Bureau of Statistics in collaboration with the National Household Database (NHD) project under the Ministry of Planning. In the NHD, all households were recorded in a census-like fashion, and a total of 14 basic types of data are available for creating the NPR. The primary element of the system is the Personal Identification Number (PIN); this unique identifier is the core element of the NPR. Another element in the system is the family tree: the NHD also covered family relationships and dwellings, allowing data on individuals to be linked by family and household head. A 10- or 11-digit PIN will be generated by the NPR authority, in line with standard practice, and the PIN for a child will be generated when the parents register the child's birth. The local Birth & Death Registration office will be responsible for updating NPR data. The secondary element in the system is "one person one file" (OPOF): each personal file in the NPR will hold the history of changes, in terms of the registered date and the action that triggered the change of personal data in the register. Once the OPOF objective is ensured, the second objective, "register once, multiple use", becomes important; this objective ensures data quality and the proper use of NPR data. The novelty of the paper lies in a generalized and comparative analysis of the population register of Bangladesh.
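The paper does not specify the PIN algorithm, so purely as an illustration of common practice in national ID numbering, the sketch below derives a 10-digit PIN from a 9-digit serial by appending a Luhn check digit; the NPR's actual scheme may differ.

```python
# Illustration only: the NPR PIN scheme is not specified in the abstract.
# National ID numbers commonly end in a check digit; Luhn is one example.
import random

def luhn_check_digit(digits: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def generate_pin(serial: int) -> str:
    base = f"{serial:09d}"                 # 9-digit serial number
    return base + luhn_check_digit(base)   # 10-digit PIN with check digit

print(generate_pin(random.randrange(10**9)))
```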
(Virtual) - Shifting into Data Governance roles: Encounters of three data librarians
Heather Coates (IUPUI University Library)
Kristin Briney (California Institute of Technology)
Abigail Goben (University of Illinois - Chicago)
As research data management and sharing have become ubiquitous, the need for data governance (coordinated decision-making around research data across all levels of an institution) has come to the forefront. Data governance is needed to address immediate and changing issues such as emerging funder policies as well as the ongoing challenge of researchers leaving an institution. Data governance often falls under the purview of information technology units. However, this technocentric approach may conflict with the values and real-world aims of university research, resulting in policies and practices that create additional barriers or disincentivize unconventional processes. Due to the traditionally hierarchical nature of research institutions, there is a need for broader engagement and representation in governance structures. Currently, data governance typically reflects the priorities and perspectives of those who are white, able-bodied, and male. While this is evolving, there is a specific need to identify and include the communities who have been previously excluded from decision-making and to ensure their participation in order to anticipate potential governance problems across a range of scenarios. Due to their familiarity with working across disciplines and throughout their organizations, and their expertise in areas like data sharing and preservation, library data professionals should be key partners in data governance processes. At our institutions, each of us has observed common challenges and witnessed the need for more participatory data governance practices. Seeing these issues, as librarians working with data, we have raised our voices and used our established credibility to bring together disparate groups and to ensure library expertise is utilized when policy and practice decisions are being made. This presentation will describe how three data librarians have engaged with data governance and identified opportunities to advance more transparent and collaborative data governance practices.
Session I1
(Virtual) - More than just infrastructure - How collaboration is essential to building an open research data repository community in Canada
Scholars Portal Dataverse (https://dataverse.scholarsportal.info) is a publicly accessible, secure, multi-disciplinary, and bilingual research data repository that has grown over the past ten years into a national repository service. Fifty-nine post-secondary institutions from across Canada now subscribe to the service, with each institution responsible for managing an institutional Dataverse collection and for supporting its researchers in depositing and sharing research data. In this presentation, we describe how the development of the service has been accompanied by community-building initiatives to better support institutional Dataverse collection administrators and Canadian researchers. We outline current efforts to foster a Community of Practice, an undertaking coordinated by Scholars Portal in collaboration with the Dataverse North Expert Group, and with support from the Digital Research Alliance of Canada (formerly NDRIO-Portage), a national organization that funds the development of digital research infrastructure and services. The Community of Practice project seeks to make spaces for the open exchange of knowledge, through which the nascent community might come to recognize itself. In addition to building local capacity, this could lead to more formal collaborations, for instance to develop training and outreach materials. We explore areas in which the Community of Practice could provide specific feedback to the Scholars Portal team to assist in setting priorities for technical or service development, including equitable support services that mitigate regional disparities and the enhancement of specific features of the Dataverse repository software itself. We argue that these types of external collaborations and partnerships are crucial for the success of the research data repository community.
(Virtual) - Many hands make new opportunities: Institutional collaborations in supporting data stewardship
Alicia Hofelich Mohr (University of Minnesota)
Lisa Johnston (University of Minnesota)
Data stewardship requires a confluence of knowledge on how to properly secure, back up, version, automate, share, and otherwise care for one's data. Coordinating resources to support good data stewardship in a research university is especially challenging, as responsibilities for providing support and infrastructure are often distributed across many offices and people whose varying motivations and perspectives on data stewardship may conflict. For example, one office may be tasked with reducing university operating costs while another is focused on scaling up costly, large-scale research; similarly, regulatory offices may focus on securing and protecting human participant data, while others may be helping researchers make more of that data available for reuse and reproducibility. When it comes to the broad scope of data support, collaboration is key. This presentation will describe a collaboration at the University of Minnesota (USA) that spans Libraries, Information Technology, the Office of the Vice President for Research, and various research support offices to inform and shape university cyberinfrastructure policy and service decisions. As an institution-wide "Research Cyberinfrastructure Champions" network bringing together these groups at our university, this initiative can address institutional data stewardship more effectively than any one unit or department. Several accomplishments of this group will be discussed, including: 1) the creation of a storage selection tool, which presents features of over a dozen IT storage offerings in an easy-to-navigate interface, providing information about the cost, capacity, backup, and workflow considerations of each; 2) a storage restructuring initiative to shape the development, promotion, and implementation of campus storage, using language, policies, and functionality that serve researchers; and 3) the update of the University's Research Data Management policy, which first went into effect in 2015. We will discuss these projects, our structure and evolution, as well as the lessons learned and the future outlook of this collaboration.
Keep it Simple, Stupid: Designing a stripped-down DMP Template
Kristi Thompson (Western University)
Elizabeth Hill (Western University)
In Canada, the long-awaited Tri-Agency policy on Research Data Management was finally released in early 2021. This policy is being rolled out in stages and is starting to require that applications for research funding that include data collection incorporate data management plans (DMPs). As DMPs have not previously been required in most Canadian funding calls, researchers and the institutions supporting them are naturally apprehensive about these requirements. At Western University, a library-led working group with representatives from across the institution, including research ethics, research development, IT, faculty, and university administration, formed to consider these issues. The need to support researchers in writing DMPs was recognized as an initial priority. Canada's DMP Assistant, adapted from the Digital Curation Centre's (DCC) DMPonline tool, provides a solution for helping novices write data management plans that meet the new requirements: it walks researchers through a series of questions and provides guidance for answering them. While the group approved of the tool in principle, we felt that many of the questions were redundant or confusing, that the wording of the questions assumed a level of knowledge many faculty members would not have, and that the guidance supplied was too general. Fortunately, the DMP Assistant allows the creation of custom templates with institution-specific guidance and questions, and the group decided to modify the default template to address the issues we saw. This session will discuss the differing perspectives the members of the group brought to this discussion, some of the issues raised with the existing templates, and the choices we made in coming up with a simplified template. We will also share responses from faculty who beta-tested the customized tool.
Collaboration is key: data discovery in a time of crisis
Alle Bloom (UK Data Service, University of Manchester)
It is in the interest of the whole data community that service providers ensure they are equipped to handle the needs of users in times of crisis. This includes ensuring users can find relevant, timely data and adapting quickly to provide training that meets changing demand. One way to approach this is by building strong networks of collaboration to promote data discovery at both a local and an international level. The COVID-19 pandemic in particular has highlighted the requirement for service providers to respond dynamically to users' data needs. This presentation will outline how the UK Data Service training team approached this, exploring how we worked to ensure users were equipped to find and utilise useful data, through the collation of resources and the promotion of data discovery. Particular attention will be paid to the value we found in both local and international collaboration in providing researchers with the data and tools they needed. Drawing on this experience, the presentation will aim to answer the question of how data services can use collaboration to best aid researchers in discovering and using data in times of crisis, now and in the future.
Panel 8: Pan-European Research Data Infrastructure: Challenges of the Changing Environment
Pan-European Research Data Infrastructure: Challenges of the Changing Environment
Michaela Kudrnáčová (European Data Archive)
Georg Lutz (FORS - Swiss Centre of Expertise in the Social Sciences)
Yana Leontiyeva (ČSDA - Czech Social Science Data Archive, Institute of Sociology of the Czech Academy of Sciences)
Nathalie Paton (CNRS)
Ivana Ilijašić Veršić (CESSDA)
Max Petzold (SND)
A key CESSDA objective is to be a pan-European research data infrastructure covering the European Research Area as fully as possible. CESSDA's members are countries: there are currently 23 members, but the CESSDA widening activities target a total of 44 countries. At the same time, European as well as national policies in the field of Open Science are dynamically evolving; new types of data, new methodologies and new technologies are influencing the research environment; and the importance of interdisciplinary collaborations is increasing. Thus, national data services are also undergoing dynamic transformations, and the diversity between countries is increasing. All of this has a great impact on new opportunities and directions for the development of international data services. In addition, the pandemic has changed the research environment and threatens budgets. How is the role of domain research data infrastructures shaped by the changing environment? What is the possible impact on the landscape of European social research data infrastructures? Are the benefits of CESSDA membership affected? How is CESSDA reacting, and how should it react? The panel session will take 90 minutes altogether. Chair and moderator: Jindřich Krejčí (CESSDA Widening and Outreach WG Leader). Presentations (working titles; 45 minutes altogether):
• Georg Lutz (ESFRI SCI-SWG Acting Chair): Clustering national and European infrastructures to facilitate participation in SSH infrastructures
• Nathalie Paton (STERE project): Feedback from CESSDA member countries
• Yana Leontiyeva (CESSDA Monitoring Task): Monitoring new developments in the landscape of the European data archives
The panel discussion (45 minutes) will start with an opening statement including initial question(s). There will be 5-6 discussants: the three speakers (see above), Ivana Ilijašić Veršić (CESSDA Main Office), Max Petzold (director of SND), and a representative from the Swedish Research Council/Swedish ESFRI SCI-SWG delegate.
Session I2
(Virtual) - Factors Affecting Deposits in Data Repositories
Michele Hayslett (UNC at Chapel Hill Libraries)
Matt Jansen (UNC at Chapel Hill Libraries)
What repository characteristics and approaches are associated with the successful recruitment of research data deposits? While quite a few studies outline researchers’ data management needs and how repositories plan to meet those needs, few have assessed the success of various approaches. The presenters conducted a survey of repositories in Fall 2019 to collect information about factors potentially related to high data deposit rates. Participants reported on many topics from budget and infrastructure to promotional methods and services provided. The presentation will discuss the outcomes of eight hypotheses related to staffing, curation services, and promotion methods, among other things. Although survey response was relatively low, several correlations were significant and can offer benchmarking for future studies, as well as immediately actionable information for repositories.
Where is the majority of institutional research data hosted? – how Universities can keep oversight of data and deploy effective RDM practices
Federica Rosetta (Elsevier)
Lorenzo Feri (Elsevier)
Over the past few years, driven by funder mandates, research data management (RDM) has increasingly taken center stage in the management of the research life cycle. There has been rapid growth in the appetite for making research data publicly available, triggered in part by mounting support for open science, concerns over research integrity, and the launch of initiatives such as the FAIR principles. In response, the number of open data repositories has risen sharply, along with open data requirements attached to research funding. Many involved in the research ecosystem, from policy makers and funders to publishers and institutions, have adopted new research data guidelines and practices. While some welcome this greater transparency, for those tasked with their institution's research development, the shift to extend research life cycle management to research data brings a unique set of challenges. For example, most researchers deposit their data outside their institutional data repository: our analysis of RDM practices at 11 institutions* suggests that up to 90 percent of datasets are hosted on one of the many external general-subject or domain-specific repositories. In this presentation, we aim to provide further insights from this analysis, and the team of a leading university will share their experiences with advancing RDM practices by leveraging their Research Information Management System and specialized RDM tools. They will shed light on how this has helped them shape their strategic thinking throughout the research lifecycle. Participants will hear about recent findings on the evolution of the research data landscape, as well as learn how to operationalize an Open Data strategy using their research development tools. * Zudilova-Seinstra, Elena; Zigoni, Alberto; Haak, Wouter (2020), "Analysis of research data for 11 Institutions - Data Monitor", Mendeley Data, V3, doi: 10.17632/k5p45z33kb.3
DataVault - Digital Preservation Meets Big Data
Robin Rice (University of Edinburgh)
The University of Edinburgh's Research Data Service was established from a prior umbrella programme in 2016 to provide the tools and support its researchers require to manage their data well and comply with funder and university requirements, such as data management planning and data sharing. The two cornerstone components of the service, DataStore and DataShare, cover storage during the life of a research project and open access archiving after the project finishes, respectively. However, not all research projects can share (all of) their data outputs and records openly, and not all research projects generate data of a size that can be easily downloaded. For these cases, the low-cost retention solution invented to fill the gap was DataVault. An active research project may require large amounts of disk space, which can be purchased from research grant funds; when the project comes to an end, more datasets and data records may need to be retained than can or should be published. For 'big data' projects, a low-cost solution such as DataVault, which stores two copies on tape in different locations and a third disaster recovery copy in a cloud service, can make storing such data for an interim period of ten years or more affordable, especially when paid up front. DataVault uses basic digital preservation techniques such as metadata capture and fixity checking in addition to storing three copies. The system automatically encrypts deposits to safeguard sensitive data. A DOI is assigned, and a record of each project's 'vault' is publicly linked to other project outputs through the university research information system. Roles determine who can retrieve deposits onto disk and grant permissions to others. A governance framework allows expiring vaults to be reviewed, and ownership is transferred to the university when the principal investigator leaves the institution.
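A minimal sketch of the fixity-checking technique mentioned above, assuming SHA-256 checksums recorded at deposit time; this illustrates the general digital preservation practice, not DataVault's own code.

```python
# Fixity checking sketch: record a checksum at deposit time, recompute and
# compare on audit. Illustrative of the technique, not DataVault's code.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches its deposit-time checksum."""
    return sha256_of(path) == recorded_digest

# Usage: store sha256_of(p) when a deposit is made; periodically call
# verify_fixity(p, stored_digest) on each of the three copies.
```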
NSD DMP – towards a FAIR ecosystem for data management planning
Trond Kvamme (NSD - Norwegian Centre for Research Data)
NSD - Norwegian Centre for Research Data - has recently released a set of integrated tools and services aimed at researchers and institutions that will help them make their data FAIR. The new portfolio consists of three new tools: a Data Management Plan tool, a Data Policy Manager, and a Data Management Plan Registry. The NSD Data Management Plan introduces new machine-actionable functionality with a built-in module that classifies data in terms of confidentiality and level of data security. The classification module provides policy recommendations for collecting, storing and archiving data based on the classification of data into open (green), restricted/internal (yellow), confidential (red), or strictly confidential (black) categories. The DMP tool is integrated with the Data Policy Manager, which allows institutions to design interactive and machine-readable policies that can be linked to internal and external systems as needed. The tool enables institutions to define their own policies for a wide variety of general and institution-specific storage services, data transfer applications and data collection tools, and these institutionally defined policies will be integrated into the data classification module in the DMP tool. In addition, the Data Management Plan Registry will provide institutions with access to, and an overview of, all shared project DMPs registered under their institution. The registry will provide information harvested from the DMPs, including the project description and information about data classification, data archiving, etc. This portfolio of services will hopefully contribute to cultural change with respect to data FAIRness and is a first step towards a broader national FAIR ecosystem for keeping research data as open as possible and as closed as necessary.
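To make the classification module concrete, here is a minimal sketch of the kind of rule it might apply. The four colour categories come from the abstract; the decision inputs and storage recommendations are invented placeholders, not NSD's actual policy logic.

```python
# Sketch only: the green/yellow/red/black categories are from the abstract,
# but the decision rules and recommendations below are invented placeholders.
from enum import Enum

class DataClass(Enum):
    OPEN = "green"
    RESTRICTED = "yellow"
    CONFIDENTIAL = "red"
    STRICTLY_CONFIDENTIAL = "black"

def classify(personal_data: bool, special_categories: bool,
             high_harm_potential: bool) -> DataClass:
    """Map simple confidentiality questions to a data class."""
    if not personal_data:
        return DataClass.OPEN
    if high_harm_potential:
        return DataClass.STRICTLY_CONFIDENTIAL
    if special_categories:
        return DataClass.CONFIDENTIAL
    return DataClass.RESTRICTED

# A machine-actionable policy can then key recommendations off the class:
STORAGE_POLICY = {
    DataClass.OPEN: "any approved storage; open archiving permitted",
    DataClass.RESTRICTED: "institutional storage with access control",
    DataClass.CONFIDENTIAL: "encrypted storage; restricted-access archive",
    DataClass.STRICTLY_CONFIDENTIAL: "isolated secure environment only",
}

print(STORAGE_POLICY[classify(True, True, False)])
```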
Session I3
(Virtual) - Achieving Transparency in Data
Dan Gillman (US Bureau of Labor Statistics)
The US Committee on National Statistics recently issued a report on transparency called "Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies". Consistent with the report, data are transparent when it is easy to perceive or detect where they are, what they mean, and how to use them. The focus of the report is on US federal statistical agencies, but much of it addresses transparency for any data. Some of the more generic subject areas include metadata, standards, and metadata systems. Here, we look at how these areas support the transparency of data. Transparency is a generic concept, and it needs to be defined differently in each situation, partly because different kinds of metadata are needed in each new situation. For example, the kinds of metadata needed to discover a data set are different from those needed to understand its contents. Kinds of metadata are organized in schemas, and this leads to standards. Discussing standards requires that we talk about conformance, because conformance is a way to ensure a schema is properly followed. An example of conformance is validating an XML instance file against an XML Schema: a validated instance conforms to the schema. Conformance is in turn part of metadata quality, and we need quality to make sure the metadata adequately do their job: describe. Finally, we discuss the needs of the systems necessary to deliver all this. We illustrate these ideas through examples.
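The conformance example from the talk is concrete enough to show in code: validating an XML instance file against an XML Schema, here with Python's lxml (the file names are placeholders).

```python
# Validate an XML instance document against an XML Schema; a validated
# instance conforms to the schema. File names are placeholders.
from lxml import etree

schema = etree.XMLSchema(etree.parse("codebook.xsd"))  # the schema
instance = etree.parse("study-metadata.xml")           # the instance file

if schema.validate(instance):
    print("instance conforms to the schema")
else:
    for error in schema.error_log:
        print(error.line, error.message)
```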
(Virtual) - Adopting modern technologies and industry protocols to increase FAIRness of data archives, ESS as a service – a case study.
Archana Bidargaddi (NSD/Sikt)
As part of the Social Sciences and Humanities Open Cloud (SSHOC) project, NSD (from 1 January 2022, Sikt) has in 2020-2022 upgraded the ESS infrastructure for data storage, management and distribution. The new ESS as a service implements the Open Archival Information System (OAIS) model in the cloud, covering the Ingest, Data Management, Preservation and Dissemination processes of digital asset preservation and dissemination. The metadata-steered, API-based storage service has adopted multiple industry-standard protocols for data documentation, storage, communication and infrastructure, giving ESS a state-of-the-art infrastructure for data management and dissemination. With the new search functionality, users can search and find exactly what they are looking for in the 10 ESS rounds, 60 data files, 18,139 variables and 213 question texts. The DOI landing pages present rich metadata in a comprehensible way, along with flexible data download. Through use of the DDI-Lifecycle metadata standard together with new Creative Commons data licenses, increased accessibility and interoperability have been achieved. A new solution for global authentication has opened up data access, given better user experiences and simplified the reuse of data. In parallel, a new data deposit solution and new data processing routines have been developed, all coming together in NSD's cloud-based integrated data management system. The successful implementation demonstrates new ways of using modern technologies and industry protocols to increase the FAIRness of data archives. The presentation will introduce and showcase the upgraded infrastructure and data management solutions.
(Virtual) - Data Science By Design: Building Data Science Services in Research Libraries
Joel Herndon (Duke University)
Elizabeth Wickes (University of Illinois)
Luis Martinez-Uribe (Fundación Juan March)
Over the last decade, data science has transformed academic disciplines and reshaped existing university curricula. As an increasing number of students and faculty adopt data science workflows that embrace programming languages, reproducible research, and other open science practices, how can libraries engage with this dynamic field of research? This panel brings together an international group of data librarians to discuss strategies and opportunities for implementing data science services in research libraries with a particular focus on consultations, instruction, and partnerships. The panel will consider how data science can help both researchers and libraries in their goal to build community and inform the public.
Lightning Talks
Would you manage a vibrant data mine?
Noé Nessel (Ministry of Education (Buenos Aires City))
Cryptocurrencies have revolutionized financial data management, and this forces us to ask: how will we manage all these interactions? How will we preserve digital information? And how will we reduce the environmental footprint of these systems? This new decentralized informational architecture is putting many banking entities in check, since it allows data transfers at a global level without intermediaries, which is why this intercontinental database is gaining more and more followers. For this reason, professionals are required who are capable of managing the frenzied number of algorithms while ensuring the security of the entire informational chain.
Data CuRiosities: A Blog Using Popular Culture to Teach Data Curation Concepts in the Data CuRe Curriculum
Hannah Gunderman (Carnegie Mellon University Libraries)
Limor Peer (Yale University)
Curation is a set of practices that support organized, tidy, and reusable data and code, and the focus of the Data Curation for Reproducibility (Data CuRe) Training Program. While essential for creating and preserving transparent, reproducible, and impactful research, imparting an excitement for these practices can be challenging, especially with audiences outside of a data librarian/data curator community who may not share the same enthusiasm. This could be particularly true when contrasted with other, more “exciting” topics such as creating useful and demonstrative data visualizations, learning new programming languages, and interpreting the results of data analyses. To create more interest in curation practices, it can be useful to approach curation education through the engaging lens of popular culture. From the video games we play, the television shows we watch, the comics we read, to the music we listen to, there are many examples within popular culture that can be used as a gateway for learning about key concepts in the Data CuRe curriculum. In this lightning talk, we will introduce Data CuRiosities, a new blog highlighting the different ways we can learn about curation concepts in the Data CuRe curriculum through a popular culture lens, and offer advice for how data curators and other information professionals can create engaging learning experiences around curation and popular culture in their own settings.
Assigning DOIs to datasets has long been common practice for all of us. With an increasing need to attach persistent identifiers to other types of digital objects, other types of PID are needed. We are investigating how to provide ePIC identifiers as an extended service connected to DOIs. This will enable citation of individual files or digital objects without the need to provide rich metadata for each item, while still having metadata for the whole set in a DOI with relations to the ePIC identifiers. In this presentation we present the problems we face and our solution at SND.
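One way to picture the pattern is DataCite-style metadata in which the set-level DOI carries HasPart relations to per-file Handles (ePIC identifiers are Handle-based). This is our illustration of the idea with made-up identifier values; the SND implementation may differ.

```python
# Illustrative record only: identifier values are invented, and this is a
# sketch of the linking pattern, not SND's actual metadata.
collection_record = {
    "doi": "10.5878/example-123",
    "titles": [{"title": "Example survey collection"}],
    "relatedIdentifiers": [
        {
            "relationType": "HasPart",
            "relatedIdentifierType": "Handle",
            "relatedIdentifier": "21.11998/example-file-001",
        },
        {
            "relationType": "HasPart",
            "relatedIdentifierType": "Handle",
            "relatedIdentifier": "21.11998/example-file-002",
        },
    ],
}

# Rich metadata lives once, at the DOI; each file-level Handle can be cited
# on its own and resolved to the individual object.
for rel in collection_record["relatedIdentifiers"]:
    print(rel["relatedIdentifier"], "is part of", collection_record["doi"])
```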
(Virtual) - Scaling up data services with standard ontology
Olatunbosun Obileye (International Institute of Tropical Agriculture, IITA)
Adetoun Oyelude (University of Ibadan)
Hafeez Adepoju (International Institute of Tropical Agriculture, IITA)
Much work has been done on data: data life cycle management, data FAIRness (Findability, Accessibility, Interoperability, and Reusability), data reproducibility and data stewardship, much of it promoted by data service professionals. In the past, data services worked in isolation from paper-publishing units or organizations. However, with the need to validate research outputs and publications, data and paper publication services have gradually converged, and this convergence has met the immediate need to validate claims in scientific publications. There is still a global struggle to promote reproducible data that meets every FAIR principle, a concern that data services of the future can address by integrating standard ontologies as a building block for open access and open data. Data services have focused on data reproducibility and FAIRness in the past few years, which has yielded positive results in some thematic areas, and data reuse and interoperability, supported by open access publications, can be scaled up further. Data services hold the key to giving researchers the future, today: the development of tools that harmonize ontologies, supported by CC BY licensing, will enable this. The future data service will not just support researchers in data management but will integrate ontology as an integral part of information and data services, serving as the bridge between knowledge management, data management and information management. Researchers will work collaboratively with data services from project conceptualization to the end of the project's life. Standard ontologies will govern protocol design, drawn from a centrally developed database of ontologies (with both APIs (Application Programming Interfaces) and microservices for integration) for different thematic areas. Data service stewards will have access to a 'new terms' approval workflow system to maintain and add new terms to the central database.
Partnerships in Data Literacy
Ashley Peterson (UCLA)
Ibraheem Ali (UCLA)
Leigh Phan (UCLA)
Chris Lopez (UCLA)
Monique Tudon (UCLA)
Madison Juul (UCLA)
Mason Hardy (UCLA)
This panel will showcase several collaborative data literacy projects and encourage audience members to consider how they might initiate data literacy efforts at their own institutions. The presenters, professional and student employees of the UCLA Library, represent a range of departments and skill sets. In our work, we foreground the values of peer learning, learner-centered design, and interdisciplinarity. The projects we will present share two common goals: connecting learners who are new to data-driven research with the wide array of data support services at UCLA, and engaging all learners in a conversation about how the data economy impacts our experience of the world. We will present three efforts toward achieving these goals. First, a STEM Data Librarian and a Humanities Research & Instruction Librarian will discuss their work toward creating a set of Data Literacy Core Competencies, which included forming several strategic partnerships at UCLA and conducting an environmental scan of similar documents at other institutions. In the next presentation, a full-time Library staff member and an undergraduate student staff member will share an Introduction to Data Literacy webcomic tutorial; this presentation will highlight the uniquely collaborative process of asynchronous learning object creation at the UCLA Library, as well as the pedagogical affordances of visual media. In the final presentation, two student Data Literacy Specialists will discuss the UCLA Library's Data Literacy Workshop Series, which explores topics that help learners understand their roles in the data economy, as both consumers of data-driven products such as search engines and social media platforms, and creators of data-driven research within the academy. Following the presentations, the panel moderator will lead an interactive conversation about how attendees might collaboratively build connections at their own institutions between existing data support services and new practitioners.
De-jargon-ifying the data support space
Paula Lackie (Carleton College)
Deborah Wiltshire (GESIS Institute for the Social Sciences)
What is the purpose of jargon? It both includes and excludes people. But this is counterproductive to the drive towards open science! For our users as well as ourselves, the dizzying collection of data-related languages, tools, organizations, services, and models is overwhelming.
Posters
The Association of Religion Data Archives: Democratizing Access to Quality International Religion Data
Andrew Whitehead (The Association of Religion Data Archives)
The Association of Religion Data Archives (theARDA.com) exists to democratize access to quality data. Founded in 1997 and first online in 1998, the ARDA has expanded dramatically over the years in both its target audience and its data collection. Guided by FAIR data management principles, the ARDA now archives over 1,100 datasets. We provide numerous resources ensuring that the data we archive are easily accessible to researchers, as well as to teachers, educators, and non-academic users such as journalists and community leaders. In this demonstration I will highlight various resources on the ARDA and our international data collections, and share how we archive the data and our extensive metadata practices. I will share what we have learned about data management and archiving, and about working alongside librarians and others interested in archival techniques. The demonstration will also cover our various data visualization tools, underscoring the possibilities for quality data management and archiving that is accessible to the broadest possible cross-section of users. As the ARDA builds toward the future, we look forward to opportunities to share what we are doing and to learn from others interested in data design and sustainability.
SWPyS: Scrape the Web. A Python solution. An application to Wellbeing and SDG indexes
Flavio Bonifacio (Metis Ricerche)
We present a solution for scraping public websites: extracting textual or numeric information such as data tables and related documents, archiving them, analysing the collected material and reporting the results, all using automated processes. These operations are automated by the system we call SWPyS (Scrape the Web. A Python Solution). The general schema of the system is: {while a pertinent URL exists} [GETURL -> SCRAPE -> ExtractTransformLoad -> ARCHIVE -> ANALYSE -> REPORT]. SWPyS manages two subprocesses: the scraping robot and the analytic robot. The scraping robot is trained to find the relevant data or textual information behind the website scenes, retrieve it, and archive it in a properly designed database. The analytic robot then accesses the database and analyses the data and documents using machine learning and/or text mining techniques. We will demonstrate SWPyS in operation through two training sessions. The first session will collect data and documentation from the OECD Better Life Index and analyse them using clustering techniques. The second will collect documentation about the SDGs (UN Sustainable Development Goals) from the Eurostat website and analyse it with some elementary text mining techniques (word clouds). All operations may be scheduled for automatic periodic execution. We will present the system in poster and demo sessions, and the presentation will also be described in a report. The system is conceived for in-depth analysis of well-being assessment models.
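The loop can be made concrete in a few dozen lines. The sketch below is ours, not the authors' SWPyS code: the library choices (requests, BeautifulSoup, sqlite3), table layout and seed URL are illustrative assumptions, but it follows the same {while a pertinent URL exists} GETURL -> SCRAPE -> ETL -> ARCHIVE -> ANALYSE -> REPORT schema.

```python
# Illustrative SWPyS-style loop; libraries, schema and seed URL are our own
# assumptions, not the authors' implementation.
import re
import sqlite3
from collections import Counter

import requests
from bs4 import BeautifulSoup

db = sqlite3.connect("swpys.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, text TEXT)")

queue = ["https://example.org/"]   # seed with a pertinent URL
seen = set()

while queue:                                      # while a pertinent URL exists
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    html = requests.get(url, timeout=10).text     # GETURL
    soup = BeautifulSoup(html, "html.parser")     # SCRAPE
    text = soup.get_text(" ", strip=True)         # ETL (minimal cleaning)
    db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, text))
    db.commit()                                   # ARCHIVE
    # (discovery of further pertinent URLs omitted for brevity)

# ANALYSE: elementary text mining, e.g. term frequencies for a word cloud
words = Counter()
for (text,) in db.execute("SELECT text FROM pages"):
    words.update(re.findall(r"[a-z]{4,}", text.lower()))

print(words.most_common(10))                      # REPORT
```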
IASSIST- Africa Chapter: Advocating for Data Literacy in Africa
Winny Nekesa AKULLO (Public Procurement and Disposal of Public Assets Authority)
The IASSIST Africa Chapter was established in 2020 to advocate for responsible data management in Africa; its secretariat is based in Kampala, Uganda. With support from IASSIST and the East African School of Library and Information Science (EASLIS), Makerere University, the chapter was able to conduct a number of activities in 2021. For the first time ever, IASSIST conducted a Regional Workshop, held in Africa in January 2021 and hosted by EASLIS. This was a successful hybrid event despite the Covid-19 situation, with attendees from Africa, Europe and the USA. Other activities included four regional webinars (for the West, East, Central, and Southern Africa regions) focusing on data literacy and research data management from African and global perspectives. These activities publicized IASSIST and its activities and showed attendees how they can benefit from membership of the association. This poster will therefore showcase the activities conducted and offer recommendations for possible collaborations and partnerships.
The principles, practices, and tools for managing data-sharing good governance by the Social Science Japan Data Archive
Nobutada Yokouchi (The University of Tokyo)
Satoshi Miwa (The University of Tokyo)
This poster presentation illustrates how the Social Science Japan Data Archive (SSJDA) has managed data-sharing governance that is both ethical and effective. By ethical, we mean that data are shared with users under the terms and conditions provided by the SSJDA, and thus individual respondents' anonymity is protected. By effective, we mean that the secondary use of data produces research outputs, which provide credit to and become incentives for depositors, forming a positive feedback loop among the SSJDA, depositors, and users. The SSJDA has developed and implemented various principles, practices, and tools to attain such good governance in data sharing. First, the related principles cover the whole data-sharing process. For example, different eligibility criteria for access are set based on the academic status of the users. There are also sharing periods for most of the data, and users may choose whether they need an extension or will delete the data and terminate data sharing. Second, these principles are put into practice by professional staff with data-sharing expertise. For example, the SSJDA's staff monitor whether there are misrepresentations in applications submitted by users, or whether any users have not deleted the data after the sharing period has expired. These staff also respond to user inquiries on virtually any aspect of secondary use. Finally, the SSJDA has developed its own system, called "SSJDA Direct", which serves as a tool for putting principles into practice. SSJDA Direct is a one-stop online service that allows users to search for deposited data and its metadata, apply for secondary use, download the data, and report their research outputs. The SSJDA continues to reinforce this system to further improve its data-sharing governance.
Improving Research output at Busitema University in Uganda
Emmy Medard Muhumuza (Busitema University Library)
Grace Adong (Busitema University)
Research output in scholarly communication is one aspect of the information publication, dissemination, access and retrieval process in many research and academic institutions, and it calls for a modern storage platform for online access. This presentation discusses how research output can be improved in Ugandan institutions, with Busitema University as a case study. The study will use a survey research design administered to seven (7) institutions with open digital institutional repositories in Uganda, their academic staff, and library users. Both qualitative and quantitative research designs will be employed. Data will be collected using online Google Forms questionnaires emailed to the respondents. The study will then provide a discussion of findings and recommendations as a guide for future improvements.
Modernizing Data Management at the US Bureau of Labor Statistics
Dan Gillman (US Bureau of Labor Statistics)
Clayton Waring (US Bureau of Labor Statistics)
The US Bureau of Labor Statistics is undertaking a number of initiatives to improve the way it manages its data and metadata systems. Two examples include planning for the replacement of its public-facing LABSTAT data query system, and efforts within its Office of Productivity and Technology (OPT) to combine multiple production systems within a single cross-divisional database platform. Within these projects, BLS views time series data as a combination of three elemental components, found in all time series: a measure element; a person, place, and thing element; and a time element. The authors turned this basic approach into a more formal conceptual model represented in UML. The UML model describes multi-dimensional data, of which time series are a kind, and is very flexible in that it supports any kind of query into the data. The Office of Productivity and Technology has adopted the model, and it is guiding their approach moving forward. The model was also adopted by the Financial Industry Business Ontology project under the Object Management Group and, more importantly, by the DDI-4 Core development team for inclusion in that specification. There are other similarities between the OPT effort and DDI-4 Core as well; in this way, the OPT project demonstrates the feasibility and usefulness of many of the ideas in DDI-4 Core. In this talk we describe the time series formulation and the UML conceptual model. Then we describe the design of the OPT system and some of its features, relating those that are similar to DDI-4 Core where appropriate. In doing so, we provide a thorough understanding of the structure of time series, and we describe some of the productivity measures BLS/OPT produces as illustrations.
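The three-element decomposition is easy to render concretely. The sketch below is our illustration, not the BLS/DDI-4 UML model itself: each observation carries an explicit measure, a person/place/thing dimension bundle, and a time element, so any query over the series is just a filter on those components.

```python
# Illustrative rendering of the three-element time series view; class and
# field names (and the sample values) are invented for this sketch.
from dataclasses import dataclass

@dataclass
class Observation:
    measure: str    # the measure element, e.g. "output per hour"
    subject: dict   # the person/place/thing element (dimension bundle)
    time: str       # the time element
    value: float

series = [
    Observation("output per hour", {"sector": "nonfarm business"}, "2021-Q3", 101.2),
    Observation("output per hour", {"sector": "nonfarm business"}, "2021-Q4", 101.9),
]

# Because all three components are explicit, any slice is a simple filter:
q4 = [o for o in series if o.time == "2021-Q4"]
print(q4[0].value)
```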
Best Practice May Not Be Enough: Variation in Data Citation Using DOIs
Homeyra Banaeefar (University of Michigan ICPSR)
Elizabeth Moss (University of Michigan ICPSR)
Sarah Burchart (University of Michigan ICPSR)
Eszter Palvolgyi-Polyak (University of Michigan ICPSR)
Citing research data with a persistent identifier, e.g., a digital object identifier (DOI), is typically recognized as part of best practice. However, authors do not always use provided DOIs accurately, or for their intended purpose. This is demonstrated by recent work conducted by the NSF-funded Measuring the Impact of Curatorial Actions (MICA) project at the University of Michigan and by bibliographers for the Inter-university Consortium for Political and Social Research (ICPSR)'s Bibliography of Data-related Literature. MICA conducted an API query of Dimensions Plus (a large multidisciplinary database with over 69 million publications available for full-text search), searching for use of over 11,000 ICPSR study DOIs. The result set of publications citing ICPSR dataset DOIs contained over 2,259 unique hits. The sample discussed in this poster is the large subset of hits that were deemed not collectible by the ICPSR bibliographers, who evaluated the results to determine whether publications met the criteria for inclusion in the Bibliography. This poster describes the methodology used to examine the results and the most common reasons why a publication was not added to the Bibliography despite an ICPSR DOI being cited. It also presents our analysis of the citations in a visual categorization. Based on the results of our analysis, we suggest further exploration of author citation behaviors and more institutional guidance regarding how and when to use data DOIs. Our results also highlight the need for archives to offer other options for citation, so that citation of data analysis can be differentiated from other types of attribution, e.g., brief data mentions, instrument use, or codebook quotations. (This material is based upon work supported by the National Science Foundation under grant 1930645.)
As data practitioners we adhere to key principles of protecting human rights and high ethical standards. What principles, practices and tools have you worked on around data access, especially where there may be added risk in data publishing and use? A presentation will be given on work in SSHOC WP5 and WP8 related to initiating a SSH GDPR Code of Conduct. To facilitate harmonisation across the EU/EEA and across sectors, the European Union Commission has highlighted the creation and use of Codes of Conduct, and a Social Sciences and Humanities GDPR Code of Conduct may lead to such harmonised practice within the SSH environment. In WP5 and WP8 of SSHOC, the presenters have been involved in deliverables intended to initiate the drafting of a SSH GDPR Code of Conduct. The presenters will report on findings in the reports "Draft SSH GDPR Code of Conduct" and "Recommendations for a GDPR Code of Conduct for SSH": what a Code of Conduct is, why it is beneficial, how a SSH GDPR Code of Conduct can be created (addressing the necessary steps to be performed), and what such a Code of Conduct could regulate. The presenters will answer questions, if there are any.
GoTriple Discovery Platform for Social Sciences and Humanities
Ana Inkret (Slovenian Social Science Data Archives)
To support the visibility of research in the area of Social Sciences, Arts and Humanities (SSH), its effectiveness, reuse, and economic and societal impact, the TRIPLE project has been building a multilingual discovery platform. The GoTriple platform will provide a single access point for researchers, policy makers, enterprises and media to discover and reuse open scholarly SSH resources such as research data and publications in nine European languages (Croatian, English, French, German, Greek, Italian, Polish, Portuguese, Spanish), currently scattered across local repositories. The platform will also allow users to find and connect with other researchers and projects across disciplinary and language boundaries, to use innovative tools to support research (e.g. visualisation, annotation, trust building/social networking and a recommender system), and to discover new ways of funding research (e.g. crowdfunding). The first prototype of the platform was released in October 2021, offering access to over 6 million publications, 12,000 datasets and 2.8 million authors. An update is expected in March 2022, with the final release made public in January 2023. The poster will underline the expected contribution of the discovery platform and project work to the objectives of Open Science, interdisciplinary and cross-cultural collaboration, and improved access to SSH resources. The TRIPLE project is financed under the European H2020 programme for a duration of 42 months and involves 19 European partners from thirteen countries.
The poster will define key concepts and the implications of the IASSIST values statement on social science data practice. Feedback will be gathered and applied into the development of a future IASSIST ethics statement.
Workshop 1 - How to set up and configure a Dataverse repository that suits your needs
Geneviève Michaud (CDSP (Sciences Po, CNRS))
Baptiste Rouxel (CDSP (Sciences Po, CNRS))
Alina D. (CDSP (Sciences Po, CNRS))
How to set up and configure a Dataverse repository that suits your needs: a hands-on session. Since its creation, the Center for Socio-Political Data (CDSP) of Sciences Po has been committed to designing and developing services to support the research data lifecycle. Pursuing its mission of data preservation and dissemination over the years, it has also been engaged in creating innovative information systems and tools for data collection. In 2016, the CDSP launched the first Dataverse repository in France, and has built strong experience with Harvard IQSS's open source Dataverse solution. We propose a workshop session to share our expertise. During this workshop, we will guide you through the complete process of launching a test Dataverse repository instance and performing some useful adjustments and customisations that suit your project and environment, beginning with an introduction to the Data Documentation Initiative (DDI).
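As a taste of the kind of adjustment covered, the sketch below creates a sub-collection in a test instance through the standard Dataverse native API. The host, token and payload values are placeholders for your own test instance.

```python
# Illustrative only: creating a sub-collection in a test Dataverse instance
# via the standard Dataverse native API. Host, API token and payload values
# below are placeholders for your own test instance.
import requests

BASE = "http://localhost:8080"        # your test instance
TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"     # an API token with sufficient rights

payload = {
    "name": "Workshop Test Collection",
    "alias": "workshop-test",
    "dataverseContacts": [{"contactEmail": "you@example.org"}],
}

# POST /api/dataverses/{parent} creates a collection under {parent}
r = requests.post(
    f"{BASE}/api/dataverses/root",
    json=payload,
    headers={"X-Dataverse-key": TOKEN},
    timeout=30,
)
print(r.status_code, r.json())
```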
Workshop 2 - Teaching data visualization: Adapting the Visualizing the Future data visualization modules for your institutional context
Alisa B. Rod (McGill University)
Tess Grynoch (University of Massachusetts Chan Medical School)
Negeen Aghassibake (University of Washington Libraries)
David Christensen (The Seattle Public Library)
Angela Zoss (Duke University Libraries)
Foundational concepts in visualization and design are critical to creating and reading data visualizations, but are often not included in traditional library data services instruction. This 2-hour hands-on workshop would provide attendees with the pedagogical tools and content to teach data visualization for a variety of audiences and skill levels. We plan to follow a "train the trainer" model to empower attendees to develop context-specific versions of the materials and implement them at their own institutions. The first hour of the workshop would give attendees a quick overview of the content of two open access data visualization instruction modules, highlighting specific topics and areas where the materials can be customized. The two instruction modules are "Data Visualization 101" and "Ethics in Data Visualization"; they cover topics such as the key parts of a data visualization, accessibility, ethics-centered design, and identifying bias in the data and visualization creation process. In the second hour, attendees would have the opportunity to repurpose the materials to fit their institutional audience and needs. This would consist of creating a sketch of a lesson plan and breaking into smaller groups to workshop the tailored plans with other attendees. We plan to conclude the training session by providing information about how to share ideas and adapted materials using the dedicated GitHub site for these two instruction modules.
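As a small taste of the kind of practice the modules discuss, the matplotlib sketch below illustrates two accessibility techniques, a colour-blind-safe palette and direct labelling in place of a legend. The data and series names are illustrative only, and the sketch is the editor's generic example rather than the modules' own material.

    import matplotlib.pyplot as plt

    # Colour-blind-friendly style shipped with matplotlib.
    plt.style.use("tableau-colorblind10")

    years = [2018, 2019, 2020, 2021]
    series = {"Library A": [12, 15, 19, 24], "Library B": [9, 11, 10, 14]}

    fig, ax = plt.subplots()
    for name, values in series.items():
        ax.plot(years, values, marker="o")
        # Direct labelling: annotate the line end instead of relying on a legend.
        ax.annotate(name, xy=(years[-1], values[-1]), xytext=(5, 0),
                    textcoords="offset points", va="center")

    ax.set_xlabel("Year")
    ax.set_ylabel("Workshops taught")
    ax.set_title("Direct labelling with a colour-blind-safe palette")
    ax.set_xticks(years)  # keep integer year ticks
    plt.show()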
Workshop 3 - Sustainable Survey Question Banks and Data Catalogs using DDI and Colectica
Jeremy Iverson (Colectica)
Dan Smith (Colectica)
DDI Lifecycle is an open standard used by data archives to document survey questions and resulting datasets. Colectica is a fully supported, off-the-shelf software tool that can create and publish DDI content, including survey question banks and data catalogs. Using open standards and existing software allows organizations to focus on publishing their content instead of developing information formats and performing software maintenance. This workshop covers the following topics:
- Introduction to DDI Lifecycle
- Introduction to Colectica
- Creating DDI-backed question banks
- Using questions from a question bank to build a survey
- Documenting datasets using DDI and Colectica
- Publishing question banks and data catalogs on the web
- Harvesting questions and data documentation using the OAI-PMH harvesting protocol
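To give a sense of what the harvesting step looks like in practice, the sketch below uses the Python sickle library to pull records over OAI-PMH. The endpoint URL is a placeholder, and since the DDI-specific metadata prefixes a given repository offers vary, the example uses oai_dc, which every OAI-PMH server must support; it is a generic illustration, not Colectica-specific code.

    from sickle import Sickle   # pip install sickle

    # Placeholder endpoint: replace with the OAI-PMH URL of the portal you harvest.
    sickle = Sickle("https://example.org/oai")

    # oai_dc is mandatory for all OAI-PMH servers; DDI prefixes vary by repository.
    for record in sickle.ListRecords(metadataPrefix="oai_dc"):
        print(record.header.identifier)
        print(record.metadata.get("title"))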
Workshop 4 - The CESSDA Data Archiving Guide: A resource for people who love data
Libby Bishop (GESIS)
Yevhen Voronin (GESIS)
Ilze Lace (Swedish National Data Service (SND), University of Gothenburg)
Dimitra Kondyli (National Centre for Social Research (EKKE))
The CESSDA Data Archiving Guide (https://dag.cessda.eu/) is a new resource developed by CESSDA, designed to give employees at data archives and repositories an understanding of the work a data archive performs. The information in the DAG was collected by experts from CESSDA social science data archives and reflects the procedures and policies at their local archives. While the context of these archives varies (in size, underlying technical architecture, or the specific services provided to researchers), the DAG focuses on common ground and is a useful tool for professionals new to data archiving, or for those who are knowledgeable in one domain and now seek to broaden their expertise. The proposed 120-minute panel is made up of representatives from three of the data archives that built the DAG and will follow the structure below:
● Introduction to the DAG (15 mins)
● Presentation of the content of three sections: FAQs, Policies, and Pre-ingest
● Interactive sessions with participants, as a full group or in breakout groups, depending on numbers (30 mins for each chapter: 15 for presentation and 15 for discussion)
● Next steps: focused discussion with participants on several topics (15 mins)
o Comments on current content
o Suggestions for topics for future sections
o Discussion of how the DAG might be distinctive as a resource for data professionals because it provides a sustainable infrastructure supported by CESSDA
Based on participant feedback from the soft launch of this website, the link will be made available to participants in advance, and questions to solicit feedback will also be provided. Reviewing the website content and feedback questions before the event is recommended, but not required, for participation.
Workshop 5 - Research Code Management
Florio Arguillas (Cornell Center for Social Sciences)
Limor Peer (Yale Institution for Social and Policy Studies)
Thu-Mai Lewis (Odum Institute, University of North Carolina)
Lessons and workshops on Research Data Management are becoming a standard offering at many institutions. Research Code Management (RCM), a very important component of a reproducibility package, is, however, often not given the equal importance it deserves. Code shows the finer details and implementation of the research methodology and is essential for a complete and transparent scholarly record. Research Code Management acknowledges that code is a research object and aims to prepare it for archiving and preservation. In this workshop, we will introduce participants to the proper management of code as it relates to the whole research compendium and discuss potential pitfalls to watch out for. There will be examples and exercises to elucidate the various components of research code management.
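One common RCM practice, shown in the generic sketch below, is a single entry-point script that runs the full pipeline in a fixed order and records the computational environment alongside the outputs. The stage file names are assumptions for illustration; this is not the presenters' own workshop material.

    import json
    import platform
    import subprocess
    import sys

    # Assumed pipeline stages; replace with your project's actual scripts.
    STEPS = ["01_clean_data.py", "02_build_variables.py", "03_run_models.py"]

    # Record the environment so the run can be reproduced later.
    env = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines(),
    }
    with open("environment.json", "w") as f:
        json.dump(env, f, indent=2)

    # Run every step in order, stopping loudly if one fails.
    for step in STEPS:
        subprocess.run([sys.executable, step], check=True)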
Workshop 6 - What can DDI do for you? An Introduction to DDI
Benjamin Beuster (NSD - Norwegian Centre for Research Data)
Hilde Orten (NSD - Norwegian Centre for Research Data)
Are you interested in learning what DDI can do for your organization or institution? DDI is an international suite of standards for describing data from the social, economic and behavioral sciences and, increasingly, from other domains. This tutorial provides an overview of the work products of the DDI Alliance. The conceptual basis of DDI will be described, introducing participants to the main building blocks and items of the main standard products. Practical examples of how DDI can be used beneficially in the business processes of organizations and institutions that manage research data will also be shown. The overall approach of the tutorial is DDI-version agnostic; the examples shown will, however, be based on specific DDI versions (DDI-Codebook, DDI-Lifecycle and the forthcoming DDI-CDI). The main focus will be on the following areas:
• Data description and variable management
• Questionnaire design and implementation
• Question and variable banking
• Cross-domain data integration
• Making your data and metadata FAIR (Findable, Accessible, Interoperable and Reusable) using DDI
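To make the data-description building block concrete, the sketch below generates a skeletal DDI-Codebook record using only Python's standard library. The element names follow DDI-Codebook 2.5 as the editor understands it, so the official schema should be consulted before any production use; the study title and variable are placeholders.

    import xml.etree.ElementTree as ET

    NS = "ddi:codebook:2_5"  # DDI-Codebook 2.5 namespace (assumption: 2.5 in use)
    ET.register_namespace("", NS)

    codebook = ET.Element(f"{{{NS}}}codeBook")

    # Study-level description: a title inside the citation block.
    stdy = ET.SubElement(codebook, f"{{{NS}}}stdyDscr")
    cit = ET.SubElement(stdy, f"{{{NS}}}citation")
    titl_stmt = ET.SubElement(cit, f"{{{NS}}}titlStmt")
    ET.SubElement(titl_stmt, f"{{{NS}}}titl").text = "Example Survey 2022"

    # Variable-level description: one variable with a human-readable label.
    data = ET.SubElement(codebook, f"{{{NS}}}dataDscr")
    var = ET.SubElement(data, f"{{{NS}}}var", name="age")
    ET.SubElement(var, f"{{{NS}}}labl").text = "Age of respondent in years"

    print(ET.tostring(codebook, encoding="unicode"))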
Workshop 7 - Analyzing Survey Data with Tableau
Harrison Dekker (University of Rhode Island)
Tableau data visualization software is primarily thought of as a tool for business analytics. While there are compelling reasons why it may not be an appropriate tool for academic research, the software offers a variety of features that make it an ideal tool for teaching data literacy, as well as for a variety of analytical tasks that many data professionals routinely engage in, such as creating presentations and data dashboards. Tableau has made significant enhancements to its data "wrangling" and management capabilities, to the extent that for many use cases an entire processing pipeline can be built within the environment without having to write any code. These features, in combination with Tableau's powerful visualization tools, are ideally suited for exploratory analysis and presentation of survey data. The workshop is intended for an audience with no prior Tableau experience who are familiar with survey data. In addition to exploratory data analysis, particular attention will be given to Tableau's survey data wrangling capabilities, e.g. variable creation, recoding, application of statistical weights, and pivoting. Emphasis will also be placed on how Tableau can be used in data literacy training.
Workshop 8 - Strengthening Managerial Capacities in Data Policy Management and Team Building
Irena Vipavc Brvar (Social Science Data Archives, University of Ljubljana)
Brian Kleiner (Swiss Centre of Expertise in the Social Sciences)
Francesco Giovanni Paoletti (University of Milano-Bicocca)
Learning from experienced colleagues is a valued and sought-after resource for management training in research infrastructures (RIs) and core facilities (CFs). RIs and CFs that are still at the developmental stage may lack management and leadership expertise appropriate for the operation of their services, or may have acquired it in an incomplete and unstructured manner. The overall objective of the RItrainPlus project is to develop and deliver a training program that fulfills the competency requirements of current and future managers of European RIs and CFs. The proposed workshop aims to tackle two important issues of leadership in RIs and CFs: data policy and management, and team building. The data policy and data management section aims to make participants aware of the skills infrastructure managers need to define data management policies and lead their teams to effective execution. It will cover the requirements for developing and implementing meaningful data-related policies across the research lifecycle; topics include long-term preservation, data security and staff access requirements, data storage and backup, personal data handling, and documentation. Participants will learn what is required to define appropriate workflows, roles, and responsibilities among RI staff. The RI and CF management team culture section will address the issue of building a stronger team. Workplace culture is created from a set of values, beliefs, and behaviors. Teams are constantly under tremendous constraints and pressures, which makes building the right team culture laborious, and it is very easy to rely on old habits and take the easier route if that means completing a task and moving on. In this section, we will use examples and active discussion to address some of the challenges of team building: intercultural teams, getting to know our team, consensus building, and promoting diversity. The workshop will include both presented content and a practical part.