Plenary 1: From Big Data to Dirt Research: Mapping Canadian Energy Transitions in City, Field, and Forest
Dr. Josh MacFadyen (University of Prince Edward Island, Canada)
Josh MacFadyen is an Associate Professor and Canada Research Chair in Geospatial Humanities at the University of Prince Edward Island. His research focuses on energy transitions and traditional energy carriers in Canada, and he leads the GeoREACH lab at UPEI, which supports Geospatial Research in Atlantic Canadian History. His first monograph, "Flax Americana: A History of the Fibre and Oil that Covered a Continent", was published by McGill-Queen's University Press; his most recent monograph, "Time Flies: A History of Prince Edward Island from the Air", published by Island Studies Press, examines land use change on PEI using aerial photographs.
Plenary 2: Community-University Partnered Research: Roles, Power and Practical Stages for Equitably Sharing and Generating Knowledge
Dr. Melanie Zurba (Dalhousie University)
Melanie Zurba is an Associate Professor with the School for Resource and Environmental Studies at Dalhousie University. She mentors students and works with postdocs, visiting scholars, and fellows on many projects in collaboration with community partners through her Community-Engaged CoLab. Such projects have focused on climate change, youth engagement, protected areas, and biodiversity conservation, all with a connection to fostering equity and collaboration in relation to environmental governance. Dr. Zurba also focuses on advancing research methodologies, including advancing participatory methods, working with the arts sector, and co-developing principles for equitable research. Her work also connects regional and international policy, and she holds the Chair in Governance, Equity and Rights within the International Union for the Conservation of Nature Commission for Environmental, Economic and Social Policy.
May 29, 2024: Session A1 Data and Social Justice
Navigating New Frontiers: Transforming Data Services for an Inclusive and Diverse World
Paula Lackie (Carleton College)
Deborah Wiltshire (GESIS)
Tim Dennis (UCLA Library)
Adefunke Olanike (Nike) Alabi (University of Lagos)
Sophia Adeyeye (Lead City University)
The imperative to transform data support services to be inclusive and diverse has never been more crucial. Data support professionals need to be ready not only to deal with the technical and procedural needs of our patrons, but also to address topics such as biases in algorithms, equitable access to resources, and the many layers of security and privacy issues inherent in research data - and all for a dramatically diversifying clientele. As the need for our services expands, it is important to be mindful of factors such as transparency, culture, the changing needs of academic disciplines, accessibility differences, promoting social justice, and breaking down barriers for a more inclusive next generation of data professionals - all while working within our budgets, of course. In this panel discussion, we will share our insights for sustainability in striving to deliver more equitable, accessible, and representative data services. We also seek input on a draft scheme to re-chart data service frameworks and training for greater inclusivity. Join us in navigating the new frontiers of data services, where inclusivity is not just a goal but a fundamental principle for the future of data-driven decision-making.
May 29, 2024: Session A2 Training and Education
Comedy of errors: Comics as a medium to tell data management (horror) stories
Marla Hertz (University of Alabama at Birmingham)
The use of comics in the classroom has proven to be highly effective due to their engaging format that allows readers to set the pace for absorbing information. This creative medium is especially well-suited for research data management (RDM) instruction. Comics can be a cathartic output for relating stories and consequences of poor RDM practices, and audience familiarity with the medium promotes receptivity to these topics. With the aim of encouraging RDM educators to adopt comics as a teaching tool, this presentation will explore the application of comics in RDM instruction, review activities tailored to different learner groups, and present strategies for incorporating them into the classroom.
Building a Community: Crafting Specialized Curriculum for Data Curation
Sophia Lafferty-Hess (Duke University)
Mikala Narlock (University of Minnesota)
Data curation is the process of making data understandable and reusable for publication within a data repository. It can be multifaceted and complex based on the types of data you are curating and the expertise of the curator; and with evolving data sharing practices and standards, ensuring data curators and stewards have access to high-quality instruction on specific data types is essential for supporting the goals of data sharing. Duke University, in collaboration with members of the Data Curation Network (DCN), received an Institute of Museum and Library Services (IMLS) grant [re-252343-ols-22] to support the development of specialized curriculum for data curation training for academic library and archives staff. The project drew together data curators and information professionals to create in-depth training resources for four specialized data types: geospatial data, scientific images, code, and simulations. Building upon the CURATE(D) workflow, the project employed a community-driven cohort approach to develop the new curriculum materials, which were piloted at a 2-day in-person workshop in October 2023. In this presentation we will discuss how this curriculum complements existing DCN training initiatives, the approach to curriculum development, the core content that was developed, plans for dissemination, and reflections on the project.
Strengthening, Not Reinventing, the Training Wheel: Sharing Open Educational Resources for Research Data Management Training
Jennifer Abel (University of Calgary)
Elizabeth Hill (Western University)
Lee Wilson (Digital Research Alliance of Canada)
With the release of a Research Data Management (RDM) Policy by Canada’s Tri-Agency (Government of Canada, 2021), RDM and training to support RDM have become essential. All researchers who apply for grants to fund data-related research must now meet requirements including writing Data Management Plans and preparing data for deposit. In response to both this and the growing movement towards open scholarship and open data, a variety of Open Educational Resources (OER) were developed by several Canadian institutions to respond to local RDM training needs for researchers. At the same time, national collaborative projects were developing and growing in number. Members from these groups met to discuss how to bring together these projects, raise awareness, leverage the work that has been done, and bring it to a wider audience. This led to the idea of bringing together OER creators and contributors in a webinar series that showcases the works themselves, but also touches on how these resources may be deployed locally, re-used, re-mixed, cited, and attributed. This series is hosted jointly by the Digital Research Alliance of Canada and the National Training Expert Group. In this presentation, we will explore the lessons learned to date from this series, discuss challenges and opportunities that have arisen, and speak to future directions in national training initiatives and OER support.
May 29, 2024: Session A3 Data and AI
Metadata Ahoy! Charting a reusable path for machine learning
Stephanie Labou (University of California San Diego)
Abigail Pennington (University of California San Diego)
Ho Jung Yoo (University of California San Diego)
Machine learning (ML) is more popular than ever, but what is needed to best document, curate, and archive ML research outputs? Data curators are largely in uncharted waters as to what extent repositories are able to manage ML objects and components (data, code, parameters, documentation, etc.) in a way that matches researcher needs and uses. But before we can plot a course towards a set of best practices, we must first ask: where are we now? This presentation will provide an overview of a recent research project that assessed how well metadata schema and fields in eight generalist (Figshare, Zenodo, Harvard Dataverse, etc.) and specialist repositories facilitate findability, interoperability, and reusability of ML objects. We will discuss strengths of and opportunities for these repositories, and what generalist repositories can learn from specialist repositories and vice versa. The presentation will also summarize the outputs from this project, all of which are publicly available: a multi-repository metadata field crosswalk, complete metadata exports of nearly 20,000 ML-related items from these repositories, and user interface and code to query repository APIs and standardize and analyze metadata exports. We hope the IASSIST community will dive deep into this bounty of (meta)data!
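As a rough illustration of what a multi-repository metadata field crosswalk does, the sketch below maps repository-specific field names onto a shared set of fields and then measures how often each shared field is filled. It is a minimal sketch only: the repository labels (`repo_a`, `repo_b`) and all field names are hypothetical placeholders, not the project's actual crosswalk or any real repository's API schema.

```python
# Hypothetical crosswalk: common field -> repository-specific field name.
# Repository labels and field names are illustrative assumptions only.
CROSSWALK = {
    "title": {"repo_a": "title", "repo_b": "name"},
    "creators": {"repo_a": "authors", "repo_b": "creator_list"},
    "license": {"repo_a": "license", "repo_b": "rights"},
}


def standardize(record, repository):
    """Map one repository-specific record onto the common field names."""
    out = {}
    for common, by_repo in CROSSWALK.items():
        source_field = by_repo.get(repository)
        # Missing mappings or missing values become None.
        out[common] = record.get(source_field) if source_field else None
    return out


def field_coverage(records):
    """Share of standardized records with a non-empty value, per common field."""
    total = len(records)
    if total == 0:
        return {}
    return {
        field: sum(1 for r in records if r.get(field)) / total
        for field in CROSSWALK
    }


if __name__ == "__main__":
    a = standardize({"title": "Model weights", "authors": ["X"]}, "repo_a")
    b = standardize({"name": "Training data", "rights": "CC0"}, "repo_b")
    print(field_coverage([a, b]))
```

Once records from different repositories share field names, comparing how completely each repository describes its items becomes a simple per-field tally.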
Metadata Augmentation for Social Science Datasets Using Generative AI
Caden Picard (University of Michigan - Flint)
Jared Lyle (ICPSR)
jay winkler (ICPSR)
Murali Mani (University of Michigan - Flint)
Efficiently curating metadata with controlled terminology is a critical yet time-consuming task in social science data management. Data depositors often provide insufficient metadata, compelling data repository staff to extensively enhance the metadata. This process traditionally involves navigating a wide array of controlled terms, a task demanding substantial time and expertise, sometimes necessitating the creation of new terms. Addressing these challenges, we introduce an innovative model employing Generative AI technology (ChatGPT). This tool is engineered to significantly diminish the time required for metadata curation for data repository staff while enhancing the accuracy of term matching. It achieves this by rapidly analyzing text and extracting pertinent keywords from established thesauri, including ICPSR, ELSST, and Library of Congress, along with ChatGPT's intelligent recommendations. This approach not only expedites the curation process but also ensures heightened precision and recall in the results.
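For readers less familiar with the two measures named above, the following minimal sketch shows how precision and recall could be computed for a set of suggested subject terms against the terms a curator assigned. The function name and the example terms are hypothetical; this is not the authors' evaluation code, and exact matching after lowercasing is an assumption made to keep the example small.

```python
def precision_recall(suggested, reference):
    """Compare suggested terms with curator-assigned terms.

    Precision: share of suggested terms that the curator also assigned.
    Recall: share of curator-assigned terms that were suggested.
    Terms are compared case-insensitively after trimming whitespace
    (an assumption made for this illustration).
    """
    suggested_set = {t.strip().lower() for t in suggested}
    reference_set = {t.strip().lower() for t in reference}
    hits = suggested_set & reference_set
    precision = len(hits) / len(suggested_set) if suggested_set else 0.0
    recall = len(hits) / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example terms, not drawn from any real thesaurus record.
    suggested = ["Aging", "health care", "income", "voting"]
    reference = ["aging", "income", "retirement"]
    p, r = precision_recall(suggested, reference)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this example two of the four suggestions match the curator's three terms, so precision is 2/4 and recall is 2/3.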
Unpacking Inter-dependent Considerations Associated with Selecting Data for Reuse as Part of the Discovery Process
A.J. Million (University of Michigan)
Jeremy York (University of Michigan)
Sara Lafia (University of Chicago)
Libby Hemphill (University of Michigan)
Elisabeth Shook (ICPSR)
This presentation reports preliminary findings from an interview study conducted at the Inter-university Consortium for Political and Social Research (ICPSR). We present insights from 20 semi-structured interviews of secondary data users about data search, reuse, and recommendation. We conducted interviews as part of a larger study for a National Science Foundation (NSF) funded project. Academic literature shows that varied data discovery practices among scientists are the norm (e.g., Gregory et al., 2020). Our NSF-sponsored work (Lafia et al., 2023; Million et al., in review) confirms this finding, but much less understood is how and why inter-dependencies among information needs, data types, and user contexts (to name a few factors) create variable data discovery behaviors. We classify interviews using a structured codebook (Saldaña, 2011) to understand why data discovery at ICPSR is varied. Our interviews addressed data discovery behavior from a variety of angles. Our presentation focuses on the search process (i.e., looking for data to meet an information need) and data reuse (i.e., processes after users find relevant data that may lead them to search for additional data or information to complete a study). We show how our interview passages align with search paths we previously identified (Lafia et al., 2023) and that considerations tied to doing research produce variability in data discovery. We conclude that data discovery occurs in an ecosystem extending beyond individual data archives and repositories, which further increases variation in discovery behavior.
Navigating Waves: Using DDI and Colectica to Enhance Interoperability Within and Among Longitudinal Series
Jennifer Zeiger (Inter-university Consortium for Political and Social Research (ICPSR))
In this presentation, I will describe NACDA’s efforts to create information-rich metadata that documents the comparability of variables in longitudinal series on aging, as well as how this metadata is made available to the public. The National Archive of Computerized Data on Aging (NACDA) began working with DDI-Lifecycle (DDI-L) in 2018. Since then, NACDA has documented some of its most established and frequently used longitudinal data collections in DDI-L and displayed them in a Colectica Portal. I will discuss how this portal functions, how our use of the portal has evolved, and our plans for the future. In particular, I will highlight the creation and presentation of cross-wave concordance documentation for a single series, concordance documentation for two separate series, and concordance documentation for three or more series. NACDA is part of the Inter-university Consortium for Political and Social Research (ICPSR) and based at the Institute for Social Research (ISR) at the University of Michigan.
“Wealth from the Sea”-- Finding Treasures in a Database of Social Science Data-Linked Literature
Homeyra Banaeefar (ICPSR)
Sarah Burchart (ICPSR)
Elizabeth Moss (ICPSR)
Eszter Palvogyi Polyak (ICPSR)
Elisabeth Shook (ICPSR)
The Inter-university Consortium for Political and Social Research (ICPSR), a social and behavioral research data archive, maintains the ICPSR Bibliography of Data-related Literature. Initially created with the support of a National Science Foundation grant in 1999, the Bibliography captures and documents the usage of ICPSR data in research and publications. It helps people discover data and gauge its potential utility. Now housing over 110,000 data-linked citations, the Bibliography’s database continues to be expanded by librarians and information professionals who actively fish through research output, both published and unpublished, to find instances of ICPSR data usage. If a publication meets the Bibliography’s collection criteria, the publication is displayed with the research study’s record on ICPSR’s websites. By equipping people with a compass to discover and evaluate the value of the thousands of social science datasets held by ICPSR, the Bibliography provides “wealth from the sea” of data. Attendees will enjoy engaging narratives and visual presentations that include insights about the ICPSR collection as reflected in the data-related literature. Topics covered will include (1) a look at the variety of venues where that literature is published, (2) analyses of changing data citation conventions, (3) comparisons between peer-reviewed and gray literature, including the increasing use of preprints, and (4) tales of datasets that are used in tandem to support interdisciplinary research. Attendees will come away with an appreciation of how a curated collection of data-linked publications can itself act as a dataset, providing data users with insight while navigating the vast waters of social science scholarship.
May 29, 2024: Session A5 Emerging Geospatial Trends
3D Reconstruction with Drones: LiDAR vs. Aerial Photogrammetry
Dan Jakubek (Toronto Metropolitan University Libraries)
Jimmy Tran (Toronto Metropolitan University Libraries)
At Toronto Metropolitan University (TMU) Libraries, we have initiated a Drone Learning Program to assist our research community with the incorporation of drone technology in their research. This program has led to collaborations with researchers across a variety of disciplines that require capturing and modeling the real world in 3 dimensions (3D). To do so, several 3D reconstruction technologies and processes have been applied, varying in cost and ease of use. This presentation will introduce the use of LiDAR technology and the process of aerial photogrammetry as strategies for data collection and 3D reconstruction. Datasets created using both approaches will be showcased to illustrate the pros and cons of incorporating LiDAR technology and aerial photogrammetry with drones to recreate physical objects.
Participatory Mapping and Historical Memory: Reconstructing Destroyed Villages of the Salvadoran Civil War
Zack MacDonald (Western University Libraries)
This presentation examines the use of participatory mapping and historical GIS to reconstruct communities destroyed during the Salvadoran Civil War. These reconstructions serve as both a repository of living memory and an access point to a growing body of archival evidence and testimonials collected by the surviving memory team. This work incorporates precision GPS equipment and Survey123 to record data from survivor testimonials and physical evidence, alongside historical maps and archival imagery. The presentation briefly discusses the data collection workflows and tools, introduces a trauma-informed approach to participatory mapping in an historical memory context, and considers how these approaches can be applied across multiple projects in different contexts. Finally, it shows how the resulting interactive maps become powerful tools for knowledge mobilization and support more in-depth reconstructions of communities lost to scorched-earth tactics.
Geospatial Data for Design: Current and Emerging Library Provision of Expertise and Assistance
Bruce Boucek (Harvard University Graduate School of Design)
This presentation discusses the contemporary provision of geospatial expertise and assistance in the context of a graduate school of design, and it explores the future trajectory of those services. It identifies current challenges and explores how dramatic shifts in computational availability (Artificial Intelligence, virtual reality, high performance computing, and pervasive and always available networks) are influencing the ways in which expertise and assistance are provided. Geographic Information System services have existed in academic libraries for decades. Map and cartography divisions have existed in libraries for centuries. The provision of geospatial data expertise and assistance in libraries is nothing new. How the services are provided, and the degree of expertise and breadth of materials (paper maps, digital data) made available, are constantly evolving. The shifts that we are living through right now, with regard to how designers are educated and how scholars do scholarly work, are not a gradual evolution but are instead punctuated tectonic shifts irrevocably altering the world in which we work. This presentation will weave together the practical experience of multiple dramatically different projects, their technical requirements, their scholarly contributions, and their indications of the future of geospatial work in libraries.
May 29, 2024: Session B1 Policy and Costs
The best things in life are free but data sharing is not: A multi-institutional study of the realities of academic data sharing
Alicia Hofelich Mohr (University of Minnesota)
Jonathan Petters (Virginia Tech)
Jacob Carlson (University at Buffalo)
Jennifer Moore (Washington University)
Joel Herndon (Duke University)
With increasing requirements to make research data publicly available, it is critical to acknowledge and understand the labor and costs required to meet these obligations. This panel will present results from the Realities of Academic Data Sharing (RADS) initiative, a mixed-method study across six US research universities. Through surveys of and interviews with institutional administrators and federally funded researchers, we assessed engagement with data management and sharing (DMS) activities and the costs associated with this work. We found that institutions support a broad range of DMS activities and invest an average of $750,000 annually in this support, with the highest costs falling on IT and Libraries. Researchers also engage with DMS activities, although many do so without institutional support. This panel will consist of separate presentations on the various facets of our results, focusing on four different institutional perspectives. First, we will discuss the Libraries, which support the majority of DMS activities across institutions and bear some of the highest costs for DMS support. Secondly, we will highlight the support and expenses of other administrative offices such as Institutional Technology, Research Offices, and Specialized Centers. These offices are often overlooked as primary DMS service providers, but support a substantial number of DMS activities and bear considerable infrastructural costs. Thirdly, we will present the results for researchers, who spend an average of nearly 6% of their grant award totals on DMS activities, with this value more than doubling for smaller grant awards. Although most report engaging in DMS without institutional support, we found some evidence of cost-savings for those who did engage institutionally. Finally, we will discuss implications for practices and data-informed responses at the individual institution level, as well as invite collaboration in future research to broaden and diversify the institutions represented in this initial study.
Preparing the Researchers of Tomorrow: Data Literacy and Undergraduate Posters
Maggie Marchant (Brigham Young University)
Jeff Belliston (Brigham Young University)
At universities, research involving data is often regarded as the domain of graduate students and faculty. Undergraduate students also work with data within the research process, and it can be a core experience to prepare for future education and careers. Research products from undergraduate students can also demonstrate the extent of their data literacy skills and understanding, which are becoming central to success in graduate work and the business world. Since one main way undergraduate students share research is through posters, this session will examine undergraduate posters in the context of data literacy skills. During the session we will define data literacy and discuss why it is important for undergraduate students to develop. We will review the structure of undergraduate poster competitions and other opportunities for undergraduate research. Finally, we will share results on the strengths and gaps in data literacy, followed by suggestions for supporting and encouraging undergraduate research and data literacy development beyond the traditional area of data analysis.
Fostering Workforce Readiness: Librarians and Data Literacy Education
Wendy Pothier (University of New Hampshire)
Patricia Condon (University of New Hampshire)
Given the rapid growth of data creation, increasing automation and AI, and expanding daily interactions with data, our collective need to become more data literate is vital. While discussions around data literacy in the workforce focus on its critical role in revenue and competitive advantage for companies, it is equally vital for the professional success of students and alumni. The value of data literacy skills extends beyond performing job tasks, impacting various facets of individuals’ professional lives, including job satisfaction, salary, retention, mental well-being, and job placement. However, both companies and employees, and subsequently our students, face challenges in navigating pathways to data literacy. The increasing demand for improving data literacy aligns well with the core values of librarianship, which emphasize lifelong learning and the empowerment of individuals to work effectively with information and data. Librarians in academic settings have a distinct role in teaching data literacy by collaborating with teaching faculty, campus administration, and industry professionals to equip undergraduates with essential data literacy skills before graduation. In this presentation, we will share our seven proposed business data literacy competencies, demonstrate their alignment with the ACRL Framework, and present findings from a research survey exploring the relationship between data literacy competencies taught in the classroom and professional workplace practices. These findings illustrate how the proposed business data literacy competencies move from the undergraduate classroom into the workplace and provide additional context for integrating data literacy instruction alongside information literacy. This presentation will help the audience appreciate ways in which improved data literacy contributes to success in the workplace and strengthens our workforce's capabilities.
Adventures in data literacy: When the gap you were trying to identify turns out to be a chasm.
Meg Miller (University of Manitoba)
Grace O'Hanlon (University of Manitoba)
In an era where post-secondary students are seen as digital natives and data storytelling is becoming an expected part of scholarly discourse, this paper synthesizes insights from multiple surveys. This research was conducted in 2020 and 2022 with participants from programs across the University of Manitoba (UM) to illuminate the landscape of data literacy. This paper will not only delineate the current landscape of data literacy at UM (a Canadian public research university of around 30,000 students) but also explore challenges and potential pathways for improvement. The authors hope this paper can serve as a foundational resource guiding those who work in data services towards a data-literate future.
Coding in public: recognising vulnerability as an added barrier to reproducibility for marginalised groups
J. Kasmire (UK Data Service)
Louise Capener (UK Data Service)
Nadia Kennar (UK Data Service)
Reproducibility is an important topic and one that is increasingly recognised as necessary for building trust in science and research. However, reproducibility entails transparency and openness, which many find challenging because they can make researchers feel vulnerable; being clear about what you have done and why opens the door to criticism, imposter syndrome, unethical or competitive behaviour and even professional disadvantage. Vulnerability is not easy for anyone, but for most the discomfort is generally resolved through experience. However, researchers from marginalised or historically excluded groups will experience the vulnerability in very different ways, which go beyond discomfort through to fear and genuine problems. For these researchers, the vulnerability and fears associated with reproducibility can be a very real barrier that prevents already disadvantaged researchers from participating or that causes problems when they do participate. These researchers cannot assume that they will enjoy the benefits of established reputations, peer solidarity, or supportive management and instead may feel deliberately excluded from and disadvantaged by research practices that demand reproducibility. This presentation explores the barriers faced by researchers in efforts to be more reproducible, how those barriers may be unevenly distributed, and how research institutions and cultures can make these barriers better or worse. The presentation concludes by arguing that those calling for more transparency, openness and/or reproducibility must consider how the vulnerability and barriers entailed may not be equitably distributed, and that they should be advocating for changes that allow everyone to participate in reproducible practices without exposing themselves to unfair risk.
Envisioning Ethical Frameworks for Community-Data
Danica Evering (McMaster University)
Subhanya Sivajothy (McMaster University)
Emerging funder and journal requirements for research data management impact community-engaged research. In Canada, the Tri-Agency’s RDM Policy prompted changes to their Policy Statement on Ethical Conduct for Research Involving Humans, enabling “broad consent” for data sharing. Across the border, the National Institutes of Health and National Science Foundation are enacting new protocols for data management and sharing. While sensitive data management is an active discussion in academic circles, researched communities are not yet included. Community organizations collect and manage data and often act as intermediaries in human-participant research that produces sensitive data, but do so in different ways than university researchers. These groups and their constituents stand to benefit from guidelines to help alleviate over-research, surveillance, and damage-centered narratives. Our IASSIST presentation will discuss insights and strategies gained from the RDM Community Data Toolkits Workshop on March 21-22, 2024. This two-day event will bring together researchers and information specialists alongside social justice organizations and non-profits to develop toolkits to better navigate the critical ethics of community research data. Discussions will ensure:
• Data can be utilized by communities themselves in ways that serve their specific targets and metrics for change as conceived by their own determinants of need and designs for betterment. Community-led data practices empower and support community-led grassroots actions and initiatives.
• Research data from communities is protected and does not make them vulnerable and/or grant them visibility, safeguarding them against data misuse and exploitation.
With the community-engaged interventions we hope to establish a framework to continue working on these toolkits beyond the workshop. By sharing these insights with the IASSIST community, we hope to assist researchers and information specialists who may also be working with community data and/or organizations, as well as find potential collaborators on future projects following this theme.
Indigenous Data Governance and Environmental Sustainability Initiatives in Nigeria
Sophia Adeyeye (Lead City University)
Nurudeen Bakare (Rochelt Business Limited)
Indigenous knowledge has been recognized as a veritable tool in promoting environmental sustainability globally. However, some countries are more successful than others in harnessing the power of indigenous knowledge in the fight against climate change and in the pursuit of environmental sustainability. One of the factors behind this disparity is effective data management practices. While efforts have been made to document indigenous knowledge, a lack of indigenous data sovereignty and indigenous data governance frameworks often means that the right data are not collected, or the collected data are not properly utilized in a way that benefits both the local community and the larger society. This often results in a poor approach to indigenous data collection, development of culturally insensitive policies, and failure to secure the buy-in of local people in environmental sustainability projects. This article, therefore, explores the role of indigenous data management in environmental sustainability efforts in Nigeria. Specifically, the study examines the role of data governance practices in enhancing the preservation of indigenous knowledge relating to environmental sustainability and fostering community-led initiatives for environmental conservation. The article aims to propose an indigenous data governance structure for Nigeria to facilitate indigenous data sovereignty, which ensures that indigenous people in Nigeria are fully involved in making decisions about what data is collected and how such data is used. It is expected that promoting effective indigenous data governance and data sovereignty will lead to the development of policies and initiatives that will enjoy the support of local people and ensure the success of environmental sustainability efforts in Nigeria.
May 29, 2024: Session B4 Use Cases
Meeting new challenges posed by the UK Censuses
Oliver Duke-Williams (UCL (University College London))
Vassilis Routsis (UCL)
This paper builds on and extends a presentation at IASSIST2023 which outlined the challenges faced in disseminating data from the 2021/2022 Censuses in the United Kingdom, as part of the UK Data Service (UKDS). The UKDS is a key part of UK research infrastructure, and provides a wide range of social sciences, humanities, and economic research data with census data being one of the major collections. A range of tools have been developed within UKDS to provide access to data, and these continue to be extended. New workflows exist for aggregate data and for origin-destination data – the latter being addressed in detail in a separate submission to this conference. In this paper we illustrate the full set of current tools, explain how they can be used, and showcase examples of analysis. Census data are disseminated with a variety of license arrangements: open, safeguarded, and secure, and we reflect on shifting balances of risk appetite, and what this means for researchers, including those based outside the UK. We also reflect more broadly on long-standing and new challenges faced in disseminating this data: the size and complexity of the resulting outputs; user expectations; new competitors etc. The 2021 census round in the UK was notable for a number of reasons: operational changes enforced by the COVID-19 pandemic including deferment in Scotland, new questions and legal challenges to question and guidance wording. The wider context is a public consultation over the future of population and migration data collection in the UK, with the possibility that the 2021 census will also turn out to have been the last such census, and thus we also reflect on how we can adapt a census archive/service to a future 'administrative based census' archive/service, with more frequent data about which researchers may be less aware.
Measuring gender diversity in library collections and publications using OCLC metadata
Michelle Alexopoulos (University of Toronto)
Kelly Lyons (University of Toronto)
Mahendhar Kumar Kumar (University of Toronto)
Kaushar Mahdtaji (University of Toronto)
Marcus Emmanuel Barnes (University of Toronto)
Metadata associated with countries’ publications and library collections across time and topics can help explore the evolution of the diversity of contributors within a field of study and library patrons’ ability to access information by diverse groups of authors. The reasons are simple. First, given publications disseminate knowledge within fields of inquiry, the author/creator metadata associated with the works provide rich snapshots of persons working within the field. Second, holdings of different types of libraries within a country (i.e., national libraries, research libraries, public libraries, etc.) over time reflect changes in the acquisition policies, and preferences and demographic characteristics of their users. Gender-diversity in library collections helps expand viewpoints, promote dialogue and advance research. Our paper mines OCLC’s Worldcat metadata to explore how publication and acquisition patterns have evolved for materials created by authors of different genders over time, and across fields and different types of libraries within and across countries. This is accomplished by applying tools to infer genders of authors for materials in our sample, and combining this information with: (1) country of publication, (2) field of inquiry identified using the records’ classification codes, (3) date(s) of publication, (4) language(s) associated with the works and their translations, and (5) the number of libraries by major type in different countries holding the material within their collections. Metrics created from this analysis help answer the following questions. First, how has the gender composition of publications across fields changed over time on a global scale, and at a country level? Second, how well do the different collections of various library-types reflect the diversity of the materials available?
The investigation should provide important insights into the current level of diversity reflected along this dimension by different library types across countries, and identify gaps in coverage.
Improving the Discovery of Restricted Data: Identifying Metadata Commonalities Across Restricted Data Sources
Kevin Read (University of Saskatchewan)
Grant Gibson (Canadian Research Data Centre Network)
Amber Leahey (Scholars Portal)
Lynn Peterson (National Research Council of Canada)
Sarah Rutley (University of Saskatchewan)
Julie Shi (University of Toronto)
Victoria Smith (Digital Research Alliance of Canada)
Kelly Stathis (DataCite Canada)
Background: The challenge of finding and accessing restricted data for research purposes is a known issue; specifically, researchers encounter barriers when identifying prospective data sources, when locating and understanding available data within those sources, and when discerning whether they are eligible to access it. A prior study conducted by the authors of this presentation identified that many restricted data sources do not make use of metadata to ensure their data are findable and accessible. Methods: To assess the readiness of restricted data sources to utilize a metadata standard, this study identified common elements of both dataset descriptions and access requirements/procedures across 48 restricted health data sources. These elements were subsequently mapped to current metadata standards (e.g. DataCite) to determine how closely they matched the elements in these existing standards. Results: Our findings indicate that many restricted data sources already provide dataset information that aligns closely with existing metadata standards, that data sources would benefit from adopting metadata standards to improve the discovery of their data, and that generally, it would be possible for these data sources to adopt an existing common metadata standard to describe their data. Access information provided by these data sources, however, is not adequately supported by existing standards. To ensure that the access requirements/procedures needed to acquire restricted datasets can be discoverable and transparent, metadata standards bodies will need to revise their schemas to include more descriptive access information. This revision would also provide researchers – who collect restricted data and must comply with funder and publisher data sharing policies – with standard guidelines for describing their data access request processes in more detail.
Conclusion: This presentation will discuss our findings in detail, articulate key challenges in assigning metadata to restricted data, and suggest recommendations for improving the discovery of and access to restricted data.
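The mapping step described in the Methods can be pictured with a minimal sketch. It assumes a hand-built crosswalk from element labels to DataCite Metadata Schema property names. The element labels, the choice of matches, and the `coverage` helper are all hypothetical and are not taken from the study's data or results.

```python
# Hypothetical crosswalk: element labels that might appear on a restricted-data
# source's web pages, mapped to DataCite Metadata Schema property names.
# None marks an element with no clear match. These labels and matches are
# illustrative assumptions, not the study's findings.
CROSSWALK = {
    "dataset title": "Title",
    "data custodian": "Publisher",
    "description": "Description",
    "keywords": "Subject",
    "years covered": "Date",
    "eligibility criteria": None,
    "application procedure": None,
    "access fees": None,
}


def coverage(elements, crosswalk):
    """Split elements into mapped and unmapped, and return the mapped share."""
    mapped = [e for e in elements if crosswalk.get(e) is not None]
    unmapped = [e for e in elements if crosswalk.get(e) is None]
    share = len(mapped) / len(elements) if elements else 0.0
    return mapped, unmapped, share


mapped, unmapped, share = coverage(list(CROSSWALK), CROSSWALK)
print(f"mapped {len(mapped)} of {len(CROSSWALK)} elements ({share:.0%})")
print("no match in the standard:", unmapped)
```

In this toy crosswalk the descriptive elements find a match while the access-related ones do not, which is the kind of gap the abstract reports.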
May 29, 2024: Session B5 Archives and Geo Data
Automation of the geodata archiving
Martin Rechtorik (University of West Bohemia/National Archives of the Czech Republic)
In the Czech Republic, spatial data archiving is an official part of the Strategy for the Development of Infrastructure for Spatial Information in the Czech Republic after 2020+. This goes hand in hand with the right to public information and the publication of open datasets. All this places high demands on both archives and data creators, especially in the areas of technical knowledge, software and hardware, data throughput, data management in digital archives, digital skills, and metadata creation in various standards, including the necessary validation. The National Archives sees the automation of at least part of these workflows as the only realistic solution to this very complex issue. We would like to present results from our project, supported by The Technology Agency of the Czech Republic, as one way we are trying to address it, including our own tools and a tool developed within the project.
Bubbling on the institutional radar: Charting the topography of digital data at Queen’s University Belfast
Michael O'Connor (Queen's University Belfast)
Queen’s University Belfast is a Russell Group University, the UK’s Ivy League, and research-active institution, founded in 1845. It is a UK leader in a broad range of disciplines with 99% of research assessed as world leading or internationally excellent (REF 2021). Its Library is part of the network of Research Libraries UK (RLUK), boasting significant unique digital assets, including an Open Access institutional repository (research outputs repository, thesis repository & research data repository), as well as Northern Ireland Official Publications Archive (NIOPA) with de facto legal deposit status. As an institution, we must address a litany of issues and obstacles faced in the governance, maintenance, security and preservation of this unique digital data. This includes preservation formats for content in these repositories, the challenges of structured versus unstructured data, anticipated moves to cloud-based solutions and different repository vendors, funder mandates to preserve and share data, & data access committees and governance for controlled/restricted data. This paper outlines the various attendant problems, the current state of play and necessary contexts (e.g. FAIR, Open Access, funders & legislation) and asks the questions ‘Who is making the call on vital decision making about unique assets?’, ‘Collectively, are we informed enough to do this?’, ‘What is the necessary governance and infrastructure?’ and ‘Are we ready to face an uncertain future where the stability and permanence of digital assets may be hampered by indecision, data loss, confusion, obfuscation and the unpredictability of digital data now and into the future?’ This paper argues that since national/international agreement about some good practices is formative, principle-based or not fully understood or applied (e.g. FAIR data & preservation formats) it is even more imperative that partnership, collaboration and discussion happen at both micro and macro levels to harness the security, access and longevity of our unique digital data collections.
Lowering Barriers to Incorporating Geospatial Data into Social Science Research: The National Neighborhood Data Archive
Megan Chenoweth (ICPSR-University of Michigan)
Lindsay Gypin (University of Michigan Institute for Social Research)
Research in the social sciences and public health has shed light on the many influences of the physical, built, and social environment on health and wellbeing. Increasingly, researchers are turning to geospatial data sources to measure these phenomena. Researchers exploring social science questions about place are experts in their fields, but may be novice users of GIS tools and technologies. They may face many challenges when working with geospatial data, including: lack of access to geospatial datasets, a need for large amounts of storage space and computing power, and a steep learning curve for the tools and programming languages used to work with geospatial data. The National Neighborhood Data Archive (NaNDA) aims to address these challenges for users in the United States by providing geographic information in a simpler, easy-to-use, and publicly available format. NaNDA is an open data repository. It was created to facilitate research on the relationship between neighborhoods and health, especially within the context of large federally funded surveys and cohort studies. NaNDA reduces barriers to incorporating spatial data into social science research in three ways. First, NaNDA makes the data available in familiar tabular formats that can easily be linked to other data sources about individuals and communities. Second, NaNDA data promotes reproducibility and speeds up the research process by making the finished version of measures available instead of multiple users having to create the same measures from scratch. Third, NaNDA data is both publicly available for anyone to download from ICPSR, and increasingly available at the point of use in other enclaves, such as the Michigan Center on the Demography of Aging (MiCDA) and the Michigan Medicine DataDirect portal. For these reasons, NaNDA serves as a model for geographic data services of the future.
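The first point above, linking tabular neighborhood measures to records about individuals, can be sketched in a few lines. This is a minimal illustration that assumes both files share a census tract identifier column. The column names (`tract_id`, `parks_per_1000`, `self_rated_health`) and all values are invented and are not actual NaNDA variables.

```python
import csv
import io

# Illustrative data only: column names and values are invented for this
# sketch and are not real NaNDA variables or measures.
NEIGHBORHOOD_CSV = """tract_id,parks_per_1000
26161400100,0.8
26161400200,1.5
"""

SURVEY_CSV = """respondent_id,tract_id,self_rated_health
r1,26161400100,good
r2,26161400200,fair
r3,26161409999,good
"""


def link_by_tract(survey_text, neighborhood_text, key="tract_id"):
    """Left-join survey rows to neighborhood measures on a shared tract ID."""
    reader = csv.DictReader(io.StringIO(neighborhood_text))
    measure_cols = [c for c in reader.fieldnames if c != key]
    # Identifiers stay as strings so that any leading zeros are preserved.
    measures = {row[key]: row for row in reader}

    linked = []
    for row in csv.DictReader(io.StringIO(survey_text)):
        match = measures.get(row[key])
        merged = dict(row)
        for col in measure_cols:
            # Respondents in tracts without a measure get None, not a dropped row.
            merged[col] = match[col] if match else None
        linked.append(merged)
    return linked


for record in link_by_tract(SURVEY_CSV, NEIGHBORHOOD_CSV):
    print(record)
```

A left join keeps every survey respondent, so tracts with no neighborhood measure show up as missing values instead of silently shrinking the analysis sample.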
Canadian Census Data Discovery Partnership (CCDDP): Census Data Discovery and Access in Canada: Stakeholder Perspectives
Leanne Trimble (University of Toronto)
Julia Barkhouse (Library and Archives Canada)
David Price (Statistics Canada)
The discovery and access of historical and contemporary census data for use in research poses significant challenges in Canada. Because discovery and access portals are distributed across government, archives, libraries, and research projects, collaborative efforts among various stakeholders are necessary. This panel presents a comprehensive overview of the Canadian Census Data Discovery Partnership (CCDDP), a SSHRC-funded partnership project (2020-2023) that brings together census data stakeholders, disseminators, and stewards, and discusses the challenges, innovations, and collaborative initiatives developed to enhance the discoverability of and access to Canadian census data. Experts from key stakeholder groups, including Library and Archives Canada (LAC), the CCDDP project committee, and Statistics Canada, will provide unique insights into their contributions and experiences, underscoring the critical need for collective stewardship of valuable census data in Canada. CCDDP Project Overview: This presentation will delve into the CCDDP's comprehensive census data inventory (1666-2021) and user needs analysis, shedding light on the current state of census data discovery and access. Additionally, it will showcase the prototype of the census data discovery portal, developed as a solution for finding historical and contemporary census data in Canada, emphasizing its role as a distributed discovery tool for users. LAC Census Search: This segment will feature insights from Library and Archives Canada on the challenges and successes encountered in facilitating the discovery of and access to census data. The presentation will highlight the importance of effective data management and search tools, and the role of the LAC Census Search in this context. Statistics Canada: This section will explore the innovative census data dissemination tools and access models introduced by Statistics Canada.
A representative from Statistics Canada will share insights into the latest strategies and technologies employed to enhance data dissemination, facilitating improved access for a broader user base.
Assigning Persistent Identifiers (PIDs) to facilities and instruments allows data users to better understand how and where data is created, facilitating reproducibility and broadening discoverability and attribution for the facilities and instruments themselves. In light of the fragmentation of the open-science ecosystem, a community-based coordinated approach is necessary to ensure PIDs for facilities and instruments are adopted in ways that provide the widest benefits to everyone involved in the research landscape. It is particularly important for the social and geospatial data communities to participate in this effort as social and geospatial researchers increasingly collaborate with colleagues in the natural sciences to generate and use complex datasets produced by capital-intensive facilities and instruments. This presentation will introduce findings from the first year of activities conducted by the FAIR Facilities and Instruments project (funded by the American National Science Foundation’s FAIR Open Science Research Coordination Network) to bring together communities from a range of disciplines and roles and to identify use cases such as allowing facilities and instruments to be connected to other entities like data sets, researchers, organizations, articles, grants, and more. Activities included multiple online focus groups with various communities and an in-person workshop that convened experts from around the USA to discuss motivations for and barriers to PID adoption, as well as realistic near-term actions to pursue. Project findings to date represent an important step on the path to incorporating facility and instrument PIDs into the wider data ecosystem and developing community-based recommendations and best practices for research transparency and reproducibility in the next generation of interdisciplinary geospatial and social scientific research.
Practical Considerations When Archiving Reproducibility Packages
Florio Arguillas (Cornell Center for Social Sciences)
Jonathan Bohan (Cornell Center for Social Sciences)
With reproducibility of results becoming well-established as a gold standard for publication by researchers, and increasing requirements from journals and research funders to share data and code, the necessity of education in the archiving of reproducibility materials and its difference from archiving data alone is growing within the field of data librarianship. Ten years after establishing the Cornell Center for Social Sciences's (formerly CISER) Results Reproduction Service, we share lessons learned in verifying, certifying, and archiving reproducibility packages. We will discuss typical consultation questions, things to watch out for, and staffing and technological infrastructure requirements.
Documenting Reproducibility: The Integral Role of Documentation in Transparent Qualitative Research
Maureen Haaker (UK Data Service)
An international drive for open data has been growing since the early 2000s to promote greater reproducibility within research. This has culminated in Wilkinson et al.’s (2016) guidelines for FAIR data. As these calls for reproducibility and FAIR data are codified in data policies and publishing ethics, they raise difficult questions for qualitative researchers about how transparency can be consistently and ethically demonstrated within their projects. Using observations and findings from a systematic review of over 1000 qualitative collections held at the UK Data Service, this presentation offers an archive’s point of view on how innovative ways of documenting data and new curation tools can provide support for demonstrating process and analytic transparency in qualitative research. I outline key ways that good practices in data management and curation support transparency, as well as enhanced approaches that go further in showing research integrity. As part of this discussion, I introduce exemplary case studies from the UK Data Service’s collections and qualitative-specific tools developed at the UK Data Service which exemplify the core principles of transparency and reproducibility in qualitative research. Reflecting on the recent developments in curation, this presentation offers qualitative researchers new possibilities for constructively responding to calls for reproducibility, as well as providing further guidance on how policies and procedures might take into account the specific needs of qualitative data.
May 29, 2024: Session C3 Providing Access to Restricted Data
Controlled Access Management (CAM) for Research Data Initiative
Victoria Smith (Digital Research Alliance)
Not all research data can be openly published; some require controlled access due to containing sensitive, confidential, proprietary, or embargoed information. The Digital Research Alliance of Canada initiated an open Call for Participation in 2023 to launch the Controlled Access Management (CAM) for Research Data Initiative. The CAM Initiative aims to: (1) assist researchers in managing restricted-access data within repositories; (2) aid institutions whose researchers manage restricted-access data; (3) support research data repositories in stewarding restricted-access data; and (4) foster collaboration within the research community for managing restricted-access data. The CAM Initiative is establishing a collaborative forum involving repositories, institutions, and research organizations to improve the management of sensitive research data. In response to the open Call for Participation, 28 organizations from across Canada have become Partner Organizations in the CAM Initiative. Our Partner Organizations include universities, research hospitals, non-profit research organizations, specialized laboratories, and more. As a pilot project, the CAM Initiative seeks to foster partnerships and collaborations to advance shared objectives in data governance and ethical management of sensitive research data. This presentation will offer an overview of the CAM Initiative, detailing its Partner Organizations, ongoing collaborative efforts, and the outlined plans for future work.
Enhancing Data Governance in Trusted Research Environments: Mitigating Risks Beyond Re-identification
Deborah Wiltshire (GESIS-Leibniz Institute for the Social Sciences)
Trusted Research Environments (TREs) have been pivotal in facilitating secure and ethical access for research to data deemed too sensitive or detailed to be shared outside of their secure enclaves. The increased availability of these data has facilitated vital research that can benefit our society. But these data come with risk, and a TRE’s role is to mitigate this. Previously much of the focus of our data governance has been on preventing the re-identification of data subjects, maintaining an excellent track record in avoiding confidentiality breaches. However, in recent years we have seen the data landscape changing rapidly, with a wide array of new data types and analytical methods. This changing landscape brings new possibilities for harms which extend beyond simple re-identification. It is therefore time to re-evaluate our data governance to ensure we can continue to protect those who trust us with their data. This presentation will focus on work at TREs such as the Secure Data Center in Germany to review, re-evaluate and improve our data governance frameworks. This work utilises the Five Safes framework to steer the development of technical infrastructure, new training and the redefining of projects so that we can ensure safe and ethical data use in a changing world.
Challenges in the relation of informed consent and data archiving
Oliver Watteler (GESIS - Leibniz Institute for the Social Sciences)
When collecting, preparing, and analyzing your research data you need a proper legal basis for processing. This legal basis should also cover the archiving and publishing of research data once you are done. Usually, the data is only released for secondary use after it has been anonymized or de-identified in some way. When talking about personal data, one legal basis is informed consent. According to the European Union’s General Data Protection Regulation (GDPR), consent is one of six possible legal bases for processing personal data. It is frequently used in social science research in Germany and other European countries. But there are several ethical and legal issues concerning the relation between informed consent and data archiving and publishing. Every now and then data repositories or data archives like GESIS are faced with cases of dubious consent caused by, for example, insufficient information. In some cases it is unclear whether the data can be made available for re-use at all. In other cases, researchers relied on a fieldwork company to take care of consent and research participants were only presented with general information about the data collection. In yet other scenarios researchers think that consent would only cover the processing during the project phase. In the latter case ‘anonymizing’ the data is believed to move the data beyond any legal or ethical restriction. Is it necessary to inform research participants in advance about the future use of their data after a project has ended? If no, why not? If yes, why is it not done regularly? In my talk I will give an overview of these challenges and discuss possible solutions.
May 29, 2024: Session C4 Mapping and Mobility
Tactile Map Creation to Support Wayfinding for the Visually Impaired
Noel Damba (Toronto Metropolitan University)
Daniel Jakubek (Toronto Metropolitan University)
Jimmy Tran (Toronto Metropolitan University)
Tactile maps are specialized maps designed for tactile perception, providing spatial information through touch rather than sight. These maps are crucial for visually impaired individuals, offering an accessible means to understand and navigate their surroundings. The significance of tactile maps extends beyond mere navigation; they empower visually impaired people with greater independence and confidence in exploring new environments. This capability is particularly vital in urban settings (such as a campus), where complex layouts can pose significant challenges. At Toronto Metropolitan University (TMU) Libraries, we have initiated and built upon an existing innovative process for creating tactile maps, addressing the unique needs of the visually impaired community. This presentation will outline the process for tactile map creation, highlighting the steps involved from data acquisition to the production of the final tactile map. We will also discuss the software used, the analytical methods employed, and the considerations necessary for creating effective tactile maps. Finally, we will propose a focus group approach to refine this process, ensuring the tactile maps produced are not only accurate but also user-friendly and practical for the intended audience. Through this presentation, we hope to share our insights and methodologies, contributing to the broader efforts in making spatial information accessible to all.
Scrapping the decennial census of England and Wales: the end of the road for tracking commuters?
Jemima Stockton (University College London)
Oliver Duke-Williams (University College London)
Censuses of the general population of England and Wales (E&W) started in 1801 and have taken place almost every 10 years since. A primary purpose of these decennial censuses – to serve as a basis for calculation of resource allocation across areas – has remained, but the data collected has grown in detail and complexity, giving a richer population profile snapshot. The most recent E&W census, in 2021, may be the last if the government deems that alternative, more cost-effective and/or accurate methods of data collection are available (see https://consultations.ons.gov.uk/ons/futureofpopulationandmigrationstatistics/). Ending census data collection would affect the future of E&W’s largest longitudinal nationally representative data resource, the Office for National Statistics Longitudinal Study (LS). The LS comprises linked census and life events data for a 1% sample of the population of E&W from 1971 onwards, with individuals of all ages becoming members if they have one of four undisclosed birthdays. It enables researchers to track members over the life course, discovering the impacts of an individual’s personal, social and environmental circumstances on their outcomes, including changes in their travel to work. How we travel is inextricably linked to human and wider planetary health. Understanding what steers individuals’ travel behaviours in future is central to informing transport policy that will protect human health through, for example, climate change adaptation and pandemic preparedness. In our dual roles as members of the Centre for Longitudinal Study Information and User Support (CeLSIUS) team, and as researchers, we demonstrate the power of the LS, presenting our ongoing investigation of commuting behaviour and associated factors – such as self-rated health, socioeconomic status, length of commute, residential and workplace stability, and neighbourhood walkability – among LS members.
We highlight the implications of the proposed cessation of the census and discuss data sources needed to interpret future commuting habits.
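The link between "four undisclosed birthdays" and a "1% sample" in the abstract above is simple arithmetic, sketched here as a back-of-the-envelope check. It assumes births are spread evenly over the calendar and that none of the four dates is 29 February. Both are simplifying assumptions made for this sketch, not statements about how the LS is actually drawn.

```python
# Back-of-the-envelope check, assuming births are uniform over the calendar
# and none of the selected dates is 29 February (simplifying assumptions).
DAYS_PER_YEAR = 365.25   # average year length, allowing for leap years
SELECTED_BIRTHDAYS = 4   # number of selection dates stated in the abstract


def expected_sample_fraction(n_dates=SELECTED_BIRTHDAYS, days=DAYS_PER_YEAR):
    """Expected share of the population born on one of the selected dates."""
    return n_dates / days


fraction = expected_sample_fraction()
print(f"{fraction:.2%}")  # prints 1.10%
```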
One Place Studies: Multiple Sources, Multi-level Society & Maps
Peter Burnhill (Independent)
This is an update on ‘Surfing Sources From The Sofa’ (IASSIST/CARTO 2018, Montreal) which set out the prospectus for a Before/After Study of a mid-19th Century village. The results are now in, much available as monthly chapters at https://aldershotvillage.net, published as part of the collective efforts of the Society of One Place Studies. My attempt at an ‘inside-out, multi-level’ account of the Before made extensive use of the search facilities available in ‘subscription access’ genealogy websites, having abandoned use of the micro-census databases available to academics. This micro-history uses a mix of demographic sources, cadastral records and maps available online during Covid times, as well as visits to remote archives after Lockdown. Access to private correspondence and newspapers during 1853 exposed personality clashes and anti-Russian sentiment, providing context and insight into the plans made by Prince Albert and Commander-in-Chief Hardinge to build a Camp on the heathland to the north of the village. Palmerston had charge of the Militia and there was clamour for investment in the military. The passing of an Enclosure Act caused upset, however, with division amongst the Commoners whose rights dated back to Anglo-Saxon times: two-thirds agreement was required in terms of rateable value. Spoiler Alert: Britain and France declared war in 1854, and troops were despatched to fight in the Crimea. The Camp built, thousands of townsfolk then arrive and the village begins its transformation into a garrison town – the topic for the After study.
May 29, 2024: Session D1 Preservation and Curation
Love notes to our future selves: Digital preservation and data curation
Erin Clary (Digital Research Alliance of Canada)
Beth Knazook (Digital Repository of Ireland)
Mikala Narlock (University of Minnesota)
As the volume of research data grows in both size and complexity, concerns about maintaining access to files that are difficult (or impossible) to migrate, reliant on software that is not openly available or no longer accessible, or of such poor quality that they cannot be reused are heightened by an awareness of the environmental cost of digital storage. While archivists have a long history of practice guiding preservation decision-making and the deaccessioning or removal of records from archives, it is not clear how widely archival appraisal theory informs the approach to research data (Dorey, Hurley, and Knazook 2022). Long-term preservation of digital research data will prove challenging for repositories and preservationists, requiring substantially more information than is typically collected to support decisions about what to keep, how to maintain accessibility, and for how long. Preservationists need information about when the files were created, by whom, and using what software or tools, along with an understanding of the relevance of the data to the community of practice and its perceived long-term value. Curators play a critical role in communicating the informational value of datasets, and through their work with depositors, are in a unique position to collect information that will inform preservation decisions and reduce duplication of effort as data are (re)appraised over time, but they are often disconnected from preservation decision-making. In this panel discussion, we explore how training data curators in archival appraisal can help ensure long-term access to research data. We will hear an overview of a combined data curation and preservation workflow, and 3 institutions will discuss their experiences testing and refining a checklist developed at the Digital Research Alliance of Canada to record appraisal information about incoming datasets. We will end with a panel discussion and share a public version of the checklist.
Supporting researchers & DMP funder requirements in Canada: An overview of DMPEG activities and a new DMP template!
James Doiron (University of Alberta Library)
The Digital Research Alliance of Canada’s Data Management Planning Expert Group (DMPEG) develops and delivers publicly available DMP-related guidelines, best practices, content, and resources for supporting researchers and research excellence across Canada, including DMP templates, examples, and guidance materials. Additionally, DMPEG supports the ongoing development, maintenance, and sustainability of DMP Assistant, a freely available bilingual web-based tool providing templates, questions, and guidance for supporting researchers with their DMP needs. This session provides an overview of DMPEG activities and resources, notably focusing upon a new DMP template that has been developed specifically to support researchers in meeting DMP requirements at the funding application stage, including those implemented by the Tri-Agency, Canada’s national funders of research across the Social, Health, and Natural Sciences. An overview of the template and its content, including key guidance and questions, and how to access and use it within DMP Assistant, will be provided. Additionally, a description of the template development process, including review and feedback processes with key stakeholder groups such as the DMP Assistant Administrators Group, Tri-Agency colleagues, and the Alliance RDM Network of Experts will be discussed. A DMP assessment rubric corresponding to the new template, and aiming to be released by Spring 2024, will also be discussed, along with additional resources developed by DMPEG, including DMP examples, as well as future work and directions, including with respect to the DMP Assistant platform.
Dominique Roche (Social Sciences and Humanities Research Council of Canada)
This talk will provide an overview of recent policy developments in the area of research data management and sharing in Canada. I will discuss the implementation of the Tri-Agency Research Data Management Policy, which applies to federally funded research; the Office of the Chief Science Advisor’s Roadmap for Open Science, which applies to research by the Government of Canada’s science-based departments and agencies; and initiatives by Canadian journals and research institutions to promote research data management and sharing. The talk will provide background for an unconference-style ‘birds of a feather’ session to discuss the planned implementation of the Tri-Agency Research Data Management Policy’s Data Deposit Requirement.
Institutional RDM Strategies: A Canadian Context
Lucia Costanzo (University of Guelph)
Alexandra Cooper (Queen’s University)
Kelly Cobey (University of Ottawa Heart Institute, Open Science and Metaresearch Program)
Dylanne Dearborn (University of Toronto)
Elizabeth Lartey (The Digital Research Alliance of Canada)
Dominique Roche (Social Sciences and Humanities Research Council of Canada)
Michael Steeleworthy (Wilfrid Laurier University)
Minglu Wang (York University)
In March 2021, the Canadian federal funding agencies (Tri-Agencies) announced a Research Data Management (RDM) Policy requiring research institutions eligible to administer funding to develop and publish an institutional RDM strategy by March 2023. The Institutional RDM Strategy Review Group (composed of representatives from the Tri-Agencies, Digital Research Alliance of Canada’s Research Intelligence Expert Group, and University of Ottawa Researchers) has since collated all published institutional RDM strategies and conducted a quantitative description of institutions’ submission status and a qualitative analysis of the characteristics of the strategies. This presentation will report on the preliminary findings of our study, focusing on the current RDM environment and initiatives at Canadian institutions that are reflected in their RDM strategies, such as the context of RDM strategy development, RDM governance, RDM-related guidelines and policies, and RDM engagement strategies. Our presentation will speak to the readiness of Canadian research institutions to meet the agencies' RDM Policy’s incoming requirements for Data Management Plan (DMP) creation and data deposit. We will also report on how Canadian institutions recognize and discuss Indigenous data sovereignty, disciplinary RDM requirements, and EDI issues related to RDM. The results of our mapping are an important step in the ongoing implementation of Canadian RDM activities and in ensuring Canada’s continued leadership and innovation agenda. Our understanding of institutional strategies obtained from this work will serve to identify important organizational and infrastructural advancements, as well as gaps in national policy, support services, and community infrastructure. We will highlight previous efforts and future needs for national-level coordination and collaboration to foster RDM communities of practice and reduce duplication of effort.
May 29, 2024: Session D3 Data Services of the Future
Exploring the Use of Text as Data in Political Science
Hilary Bussell (Ohio State University)
In political science, as in other social sciences, the rise of text mining “has vastly broadened the scope of questions that can be investigated empirically” (Benoit, p. 461). As a political science librarian supporting a department with an emphasis on computational methods, I have encountered a steady rise in questions from researchers looking to use a variety of textual data sources. These range from whether library books can be digitized for computational analysis, to whether our existing databases allow for automated searching and bulk downloading, to requests for new licensed resources for text mining. Additionally, recent changes to the social media data landscape, such as the monetization of the Twitter API, are shaking up researchers’ approaches for finding and accessing data, causing them to turn to the library for guidance. To better understand the specific needs and challenges of political science researchers using text mining methods, I conducted an analysis of recent dissertations from top political science programs. Specifically, I looked at what types of sources are used for text mining, how these sources are accessed, what challenges researchers encounter in working with large-scale textual data, and how libraries are involved in supporting this work. In this presentation I will highlight findings from this study, discuss how they have informed the way I and others at my institution are supporting text mining in the social sciences, and consider implications for libraries as we navigate the future of text as data. Works Cited: Benoit, K. (2020). Text as data: An overview. In L. Curini & R. Franzese (Eds.), The SAGE Handbook of Research Methods in Political Science and International Relations (pp. 461-497). SAGE Publications Ltd.
Boosting Data Findability: The Role of AI-Enhanced Keywords
Kokila Jamwal (GESIS – Leibniz Institute for the Social Sciences)
In today’s data-driven world, data has become more valuable than ever, yet finding relevant data within a vast expanse of data can be challenging. Researchers have been working on methods to offer relevant data to users easily and swiftly. Focusing on data and its reusability, the FAIR principles emphasize the importance of findability. Finding relevant data makes life easier for data users and at the same time improves the reusability of the data for data producers. For data archives, providing relevant data to data consumers is important. Data archives use keywords to describe a study. These keywords are mostly chosen from the available set of controlled vocabularies (CVs) in the form of thesauri. Sometimes data producers who are unable to find suitable keywords for their study use their own keywords, called user-defined keywords. User-defined keywords solve the data producers’ problem, but they pose a challenge for the data archive in making data findable for future data consumers. One such example is GESIS Search, a web portal for finding surveys and social science research data. Users can query the research data based on metadata fields such as Topic, Author and many more. In this paper, we focus on the Topic field to make the research data more findable. The user-defined keywords in the Topic field act as a hindrance to data findability, so assigning CV terms to user-defined keywords is crucial to solving this challenge. Manually assigning CV terms to user-defined keywords consumes considerable time and resources. We aim to employ Artificial Intelligence (AI) techniques to automate the process of broader-term assignment from CVs, improving the findability of the studies in GESIS Search along the way.
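The idea of mapping a user-defined keyword to a controlled-vocabulary term and its broader term can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the AI technique used, so plain string similarity from Python's standard `difflib` module is swapped in as a stand-in, and the vocabulary entries, the function name `suggest_cv_term`, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical controlled vocabulary: preferred term -> broader term.
# These entries are invented for illustration and are not taken from any real thesaurus.
CONTROLLED_VOCABULARY = {
    "unemployment": "labour market",
    "voting behaviour": "political behaviour",
    "income distribution": "economic conditions",
}


def suggest_cv_term(user_keyword, vocabulary=CONTROLLED_VOCABULARY, threshold=0.8):
    """Return (cv_term, broader_term, score) for the closest CV term, or None.

    String similarity is a simple stand-in for the AI techniques mentioned
    in the abstract; the threshold is an arbitrary illustrative choice.
    """
    normalized = user_keyword.strip().lower()
    best_term, best_score = None, 0.0
    for term in vocabulary:
        score = SequenceMatcher(None, normalized, term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_term is None or best_score < threshold:
        return None  # no sufficiently close term; leave for manual review
    return best_term, vocabulary[best_term], best_score


if __name__ == "__main__":
    for keyword in ["voting behavior", " Unemployment ", "xyz"]:
        print(repr(keyword), "->", suggest_cv_term(keyword))
```

Keywords that fall below the threshold return `None`, which leaves them for manual review rather than forcing a poor match.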
David Schiller (University of Applied Science of the Grisons)
This talk will discuss approaches for linking different data sets with the goal of offering richer data sources for research. Approach one makes research easier by hosting all datasets in one place and creating one big data source. While this approach benefits from easy handling of data in one place, it also comes with issues like data ownership, data protection, and data documentation/services. Approach two does not demand a single storage location. Data sets may be stored in different places, but they need to be structured to be “Linkage-Ready”. Being linkage-ready, or interoperable, demands, besides a useful identifier to link cases, a coordination of research domains, instruments, and universes. The talk will discuss the pros and cons of both approaches based on the current situation regarding the availability of data for research on education and learning in Switzerland.
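The role of a shared identifier in the second approach can be sketched briefly. This is a minimal illustration, not the speaker's method: the field name `person_id`, the function `link_records`, and all record values are invented, and the sketch covers only the identifier match, not the coordination of domains, instruments, and universes that the abstract also calls for.

```python
def link_records(left, right, key="person_id"):
    """Join two lists of dict records on a shared identifier (inner join).

    `key` is a hypothetical identifier field; both sources must use it
    consistently for the linkage to work.
    """
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records; cases without a partner are dropped.
            linked.append({**row, **match})
    return linked


# Invented example records held in two separate places.
school_records = [
    {"person_id": 1, "school_type": "vocational"},
    {"person_id": 2, "school_type": "academic"},
]
survey_records = [
    {"person_id": 2, "reading_score": 512},
    {"person_id": 3, "reading_score": 478},
]

linked = link_records(school_records, survey_records)

if __name__ == "__main__":
    print(linked)
```

Only the case present in both sources survives the join, which is why agreement on universes (who is covered) matters as much as the identifier itself.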
May 29, 2024: Session D4 Map and Data Librarianship
Soil Data, Desert Data, Changes Over Time Data: Geospatial reference questions and how you might answer them.
Sharon Janzen (Brock University)
Using polling software, this presentation will involve attendees by asking them to brainstorm sources for answering geospatial data reference questions. From soil profile data to food deserts, sharing knowledge with other reference facilitators may result in a healthy discussion about how cartographic reference assistance has changed over time and how it might look in the future. Have you ever received an intriguing reference question and wondered how you were going to answer it? Sometimes the solution is clear while other instances might require 'calling a friend'. Join us in a knowledge sharing session about GIS and cartographic reference transactions.
IMGIS and Open Repository for GIS Teaching Materials
Amanda Tickner (Michigan State University Libraries)
Jennie Murack (MIT)
Lena Denis (Johns Hopkins University Library)
Emma Slayton (Carnegie Mellon University)
IMGIS is an open repository for GIS-related teaching materials primarily targeted at library GIS educators. The IMGIS OSF page is an informal hub for sharing materials and resources within the GIS and Map Librarian community. The goal of the collection is to enable librarians or GIS practitioners to easily access pedagogical materials, data resources, and other helpful instruction or practice guides. This collection platform was developed in partnership between the IASSIST geospatial interest group and ROLEGGE. This presentation provides an overview of the repository, describes some of our challenges in developing the repository, and considers the problem of gaining professional credit for and formal evaluation of open educational resources.
Digital Narratives in Flux: Navigating Preservation Challenges for Classic ESRI Story Maps
Melinda Kernik (University of Minnesota Libraries)
For the past decade, Esri's story maps platform has offered a way to combine maps, text, images, and other multimedia, facilitating the creation of engaging narratives with minimal technical barriers. The challenge of preserving work in this format looms large, however, as the retirement date for the "classic" version of the platform approaches. The presentation describes an effort undertaken at the University of Minnesota to contact authors for hundreds of public-facing story maps. It reflects on the difficulty of managing scholarly outputs in a system not primarily designed for that purpose and of representing web-based work within the library record.
May 30, 2024: Session E1 Data Services of the Future
Data Science and emerging technologies: the role of libraries and why partnerships matter
Mara Blake (North Carolina State University Libraries)
Joel Herndon (Duke University Libraries)
Andrzej Rutkowski (University of Southern California Libraries)
The panel will offer viewpoints on the role of academic libraries as campuses continue to place increasing emphasis on data science and associated new technologies. With support from the Alfred P. Sloan Foundation, a cohort of representatives from universities around the United States convened in spring 2023 to discuss models of data science consulting and collaboration in higher education. Participants represented many different administrative locations on their campuses, including academic departments, stand-alone centers, and libraries. The three panelists represented their respective libraries in this project and will share perspectives from their own work on ways that libraries are contributing to the data science ecosystem in higher education. The panel will kick off with some of the key findings and takeaways from that project, and then each panelist will contribute more detailed perspectives. Joel Herndon will share new collaborations among Duke Libraries, the Center for Computational Thinking, and Duke’s Masters in Interdisciplinary Data Science to support data science across campus. Mara Blake will share more about the collaboration between the NC State Libraries and Data Science Academy to provide data science consulting to campus and their successes engaging graduate students in the provision of such services. Andy Rutkowski will talk about the challenges of developing a new data services model at USC and working on a set of grants with an interdisciplinary team of faculty that are focused on providing data consultation and support to a local community organization.
May 30, 2024: Session E2 Policy & Culture
Navigating the future of data sharing: The impact and cost of expanded public access requirements
Gail Steinhart (Invest in Open Infrastructure)
Eric Schares (Iowa State University)
Katherine Skinner (Invest in Open Infrastructure)
Updates to US Federal government policies requiring public access to federally funded research are prompting concerns over the cost and effort required to meet expanded public access requirements. Funders’ policies typically allow researchers to include the costs associated with providing public access in their grant proposal budgets. Invest in Open Infrastructure’s research seeks to shed light on what constitutes “reasonable” cost for providing public access to data and publications. We will present quantitative findings on the characteristics of datasets arising from US federally funded research, and summarize available information on the cost of providing data sharing services along with the price those services charge to researchers or their institutions. Finally, we will share our approach to and preliminary results from qualitative research with stakeholder groups: researchers and their institutions, repositories, and scholarly societies. For each actor in the system, we seek to understand their plans, workflows, decision points, and concerns with respect to the evolving requirements. While this project takes a US focus, the findings are of broad potential interest, as open science and research data sharing are referenced in the UNESCO Recommendation on Open Science and are subject to numerous national research funder policies around the world.
Furthering an open science culture at the Swedish University of Agricultural Sciences
Hanna Lindroos (SLU)
Open access to curated research data contributes to the spread of knowledge and to an increased ability for public and private players, as well as private individuals, to contribute to a more sustainable, democratic society and more efficient information exchange. The routines necessary for open access to research data will not only further “good data hygiene” among the research community, but also clarify in which situations data is in need of special protection. To achieve this degree of data management, certain levels of routines, standardization and infrastructure are required. In addition, a supporting culture and acceptance of the concept are required from researchers. We aim to create the physical, organizational and cultural conditions necessary to make more research data from SLU available according to the FAIR principles. A well-functioning process, from project planning, data acquisition, analysis and storage through to archiving and publication of research results, requires an equally well-functioning collaboration between the functions involved in data management support. It also requires clear routines as well as hardware and software that can connect the different parts of the process. Finally, it depends on researchers and managers gaining an understanding of the value it may generate and a willingness to adapt routines accordingly. To achieve this, a number of sub-projects have been initiated, all on the condition that a front desk – back office system is set up as the single point of entry, a function that can either answer or re-direct all data management related questions from researchers. The other sub-projects focus on i) building capacity, ii) development of routines and processes, iii) identifying and recruiting the required competencies, and iv) a communication and implementation plan.
Reflecting on the history of Research Data Management (RDM) in Canada can offer insights and inspiration for developing new RDM services and infrastructure and for charting new pathways. In the Canadian academic community, RDM is often supported by data specialists and librarians, and an advanced service structure based on academic libraries and regional networks developed before a national RDM service was introduced. The range of support, which includes online tools, guides, training, consultations, research data repositories, and collegial networks, has grown dramatically over the past ten years. Canada is a large, geographically diverse and bilingual country with five distinct regions (Atlantic, Central, Prairie, West Coast, and North) where regional grass-roots support has grown into nationally coordinated services. The Digital Research Alliance of Canada (Alliance) has integrated RDM as an area of focus and is leading and coordinating the Canadian research community through the new era of infrastructure and service development. Looking ahead, the continuing contributions from research institutions, regional groups, and the Alliance will all be necessary to bring Canadian RDM support to a new level, along with more comprehensive consideration and inclusion of data management support to research on our Indigenous populations. The presentation will draw on the content of a chapter in a recent open-access book, "Research Data Management in the Canadian Context: A Guide for Practitioners and Learners", which focuses on the national and regional development of RDM in Canada. Authors from across Canada contributed to the open-access book, which is in use both in RDM training and Library and Information Science classes to educate and inform future and current data professionals. The book is released in both English and French, the official languages of Canada.
May 30, 2024: Session E3 International Perspectives
CESSDA Data Catalogue and ELSST integration
Kristina Strand (Sikt - Norwegian Agency for Shared Services in Education and Research)
Morten Jakobsen (Sikt - Norwegian Agency for Shared Services in Education and Research)
Jeannine Beeken (UK Data Service)
Hilde Orten (Sikt - Norwegian Agency for Shared Services in Education and Research)
The CESSDA Data Catalogue (CDC) contains descriptions of more than 40,000 data collections held by CESSDA’s Service Providers (SPs), originating from over 20 European countries. The descriptions follow the DDI metadata standard and are harvested through OAI-PMH from the SPs. A central way of discovering data collections in the CDC is through the use of keywords. Many SPs use CESSDA’s own thesaurus for the social sciences, the European Language Social Science Thesaurus (ELSST), for keywords. ELSST aims to cover all relevant research subjects within the social sciences and consists of over 3,300 concepts available in 16 languages through an interactive web service. In order to increase data discoverability and fully utilize the structure of ELSST, we wanted to integrate CDC and ELSST. An important part was to maintain the user-friendly UI of CDC while supporting the complexities of ELSST. As a solution, the integration between the two services is done using API lookups of relevant metadata elements, which are presented to the user as links. Whenever a user discovers a data collection in CDC, there is a check to find out whether a given keyword describing the data collection exists as a concept in ELSST. If this is the case, a link to the concept in ELSST is generated. The link takes the user to the concept in ELSST, where the user can read more about the concept and navigate through the thesaurus structure. From ELSST the user can directly search for data collections in the CDC marked with the concepts the user discovers. The integration between the services is both simple and powerful. It enhances the functionality of both with regard to data discovery while avoiding the introduction of unnecessary complexity.
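The keyword check described above can be sketched in a few lines. This is a hedged illustration only: the abstract does not describe the actual ELSST API, so an in-memory dictionary stands in for the API lookup, and the concept identifiers, the base URL, and the function names `concept_link` and `render_keywords` are all hypothetical placeholders.

```python
from urllib.parse import quote

# Hypothetical stand-in for a thesaurus API lookup: label -> concept identifier.
# A real integration would query the thesaurus service instead of this dict.
THESAURUS_CONCEPTS = {
    "unemployment": "concept-0001",
    "migration": "concept-0002",
}

# Placeholder base URL, not a real endpoint.
THESAURUS_BASE_URL = "https://thesaurus.example.org/concept/"


def concept_link(keyword, concepts=THESAURUS_CONCEPTS):
    """Return a link to the thesaurus concept for `keyword`, or None if absent."""
    concept_id = concepts.get(keyword.strip().lower())
    if concept_id is None:
        return None  # keyword is not a thesaurus concept; show it as plain text
    return THESAURUS_BASE_URL + quote(concept_id)


def render_keywords(keywords):
    """Pair each keyword of a data collection with its concept link (or None)."""
    return [(keyword, concept_link(keyword)) for keyword in keywords]


if __name__ == "__main__":
    for keyword, link in render_keywords(["Migration", "housing"]):
        print(keyword, "->", link if link else "(no thesaurus concept)")
```

Keeping the lookup behind one small function mirrors the design point in the abstract: the catalogue UI only needs to know whether a link exists, not how the thesaurus is structured.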
Population and health data governance in the era of digital technology in Africa
Daniel Mtai Mwanga (African Population and Health Research Center)
Background: This paper explores the evolving digital landscape in Africa, focusing on its challenges and opportunities for promoting data governance and sharing. As the region integrates digital technology into health systems and adopts artificial intelligence (AI), the need for collaborative data sharing becomes crucial. The African Union has established cybersecurity policies and data protection measures, but progress on their implementation has been slow, with only a fraction of countries ratifying the Malabo Convention of 2014 as of March 2023. To contribute to these efforts, the African Population and Health Research Center (APHRC) organized a data governance policy dialogue in September 2023 in Naivasha, Kenya, involving diverse stakeholders to discuss the challenges and opportunities in the African data ecosystem. Methods: Participants were drawn from the national data protection office, ministry of health, national statistics bureau, population and health researchers, data producers, demographic surveillance systems, data managers and data scientists. Sessions included technical presentations, round table and panel discussions. Findings and Conclusions: We found that key barriers to data sharing in Africa included a lack of standardized data management and sharing principles, administrative and bureaucratic barriers, mistrust resulting from fear of data misrepresentation, misuse and breach of privacy, and data systems that are not interoperable. To navigate this, academia has a crucial role to play: first, in strengthening capacity and creating awareness about data privacy and protection, and second, in contributing to research on best practices. This includes working with the research community to develop training curricula on data governance and sharing. Further, there was emphasis on the development of comprehensive data governance frameworks at institutional, national and regional levels.
The data governance framework should be tailored to the unique challenges and opportunities in Africa, and leverage the power of AI in its implementation and research.
Refactoring data delivery: The case study of the new tools for census flow data at UKDS
Vassilis Routsis (University College London)
Oliver Duke-Williams (University College London)
This paper presents the API and user interface for disseminating census flow data developed as part of the UK Data Service (UKDS) strategy to modernise its supported software. It explores the development challenges and the diverse technologies employed, contextualised within the broader scope of data services. The new tools were in the final stages of development at the time of writing. We opted to construct these tools from the ground up, a decision influenced by the complex nature of the underlying data, which necessitated high levels of flexibility and adaptability. This paper will critically evaluate the decision-making process, weighing the advantages and disadvantages of developing in-house solutions versus the trend of relying on external, often commercial, platforms. Third-party solutions frequently compromise functionality and the ability to tailor to specific requirements, especially when dealing with highly complex data. The presentation will showcase some functionalities of these new tools, highlighting ongoing enhancements, including integrating AI and machine-learning technologies. We will also discuss the advantages of the design principle to separate the user interface from the backend API. This approach improves user experience and promotes better interoperability. While acknowledging that budget and overall resource constraints are a common hurdle in such initiatives, this case study provides insights into the feasible options available to data services striving to deliver robust and comprehensive data to their users. The insights and experience shared are intended to contribute to the dynamic relationship between data services and information science, especially when dealing with data related to the social sciences. 
By offering practical examples and lessons learned, we aim to inform those enhancing data accessibility, utility, and distribution in a rapidly evolving digital environment, where data services often face challenges in keeping pace with technological advances.
Data Reuse Among Digital Humanities Scholars: a Qualitative Study of Practices, Challenges and Opportunities
Lina Harper (Digital Research Alliance of Canada)
Scholarship is increasingly data-driven, and as digital tools continue to evolve, sound data use practices among scholars are now essential for scientific discovery. Data reuse has become central to an emerging cultural push towards a more open way of doing science. This study investigated the challenges and opportunities in reusing research data among digital humanities (DH) scholars. Its findings may serve as a case study for how disciplinary practices influence the ways in which researchers reuse data. The aim of the study was to enhance current thinking and provide insight for data, information and library professionals who work at the intersection of the humanities and data. Data were collected through semi-structured interviews with 12 DH scholars working at universities, research centres and cultural or heritage organizations around the world. The analysis found that lack of time and resources, inconsistent data practices, technical training gaps, labour intensity and difficulties in finding data were the most challenging aspects of data reuse. Findings also revealed a number of enabling factors in data reuse, chiefly collaboration and autonomous learning as a feature of DH. The results indicate a gap between data reusers and data sharers: low rates of sharing reduce the amount of findable and accessible data available for reuse. Both data reusers and data sharers must begin to see themselves as embedded in the research data lifecycle within the research infrastructure. The recommendations include cultural changes to policy, education, and infrastructure. Other interventions could include boosting data literacy, developing self-paced RDM training, improving data discovery systems, rewarding data sharing, and creating data stewardship networks.
Charting a course towards coordinated research data services
Janet Rothney (University of Manitoba)
Meghan Goodchild (Queen's University and Borealis)
Alexandra Cooper (Queen's University)
Due to the growth of funder and journal policies, researchers across all disciplines are increasingly required to manage data in accordance with requirements related to data management planning and data deposit. Canadian academic institutions that are eligible to administer federal research funding were also required to create an institutional research data management strategy this past year (as described in the Tri-Agency RDM policy). As a result, academic institutions need to develop strategic, coordinated approaches to providing research data services, in order to avoid siloed and inefficient duplication of services across campus and potential gaps in programming and support. This presentation will compare and contrast the efforts of two Canadian university libraries who are participating in the Ithaka S+R initiative “Building Campus Strategies for Coordinated Data Support,” a cohort-based project with 29 universities across Canada and the US. Based on analyses of research data services currently offered and the experiences of researchers in navigating and accessing existing services, the presenters will explore various topics, including the types of services currently offered at these two institutions, the roles various service providers play (e.g., library, research office, IT), and the areas identified as opportunities for future investment. Other topics will include potential mechanisms of coordinating and collaborating to develop and offer effective research data services and the various opportunities for leveraging regional or national services and infrastructure. Given the relevance of research data services and research data management for institutions broadly, the questions and issues discussed will be of interest to IASSIST attendees regardless of geographic location.
Jon Johnson, CLOSER, Social Research Institute (UCL)
Hayley Mills, CLOSER, Social Research Institute (UCL)
Becky Oldroyd, CLOSER, Social Research Institute (UCL)
CLOSER is the interdisciplinary partnership of leading UK social and biomedical longitudinal population studies, the UK Data Service and The British Library. Evolving from a project to a sustainable infrastructure presents a variety of challenges at all levels both within an organisation, and for its collaborators. The presentation will reflect on these challenges from the perspective of CLOSER and more generally within a university environment. Areas covered will include creating structures and processes which support the long term, staff development and training, the development of software and changes in technology to support these organisational changes. The presentation will also draw some lessons that could be more widely useful from this case study.
Uncharted: 'Just because you can, doesn't mean you should' and other revelations from an analyst turned librarian.
Meg Miller (University of Manitoba)
The future of data is many things, and intertwined with it is the future of those of us who act as its shepherd. In this talk, the presenter will discuss their experience over the last four years of taking on a newly created role at a new (to them) university as a single point of service for GIS & Data Visualization support and how they got there. Focus will be placed on development work with ArcGIS Enterprise, research projects conducted but not written up, early career librarianship, and the importance of community.
What A Rig: Starting a Map Library at a Small Institution
Martin Chandler (Cape Breton University)
As a small university on the east coast of Canada, Cape Breton University has seen a number of rapid changes in recent years. With new librarians come future-thinking projects, including the development of new collections as part of future-planning for library needs and use to meet the university's growth. In order to meet anticipated future needs, I have undertaken the proposal, development, and execution of a new print map collection for the university. In this talk I will share how I made a new collection happen, why I did so at a time of shrinking map collections, and what I anticipate the future of this map collection will be in the local, regional, and Canadian national context.
Will-o’-the-wisp, map collection tours, hauntology and deep time spectres.
Larry Laliberte (University of Alberta)
In renewing ways to navigate the uncharted institutional spaces of interred maps, and their containers, recent William C. Wonders map collection tours and “top-of cabinet” displays have been crafted to open up ways to re-read cartographic renderings as apparitions, situating their re-inscription, and retention in the spectres of deep time. By incorporating tactile ambience, aurals, and experiential movement, in-person tours aim to dislocate the map collection into Anthropogenic fragments (shales), and their impressions (fossils) that captured the uncanny energy embodiments of extractive dispossession, and the resulting wrack lines that continue to haunt the landscape.
May 30, 2024: Session F1 Open Source Communities
Evolution of the open-source Dataverse repository and community
Sonia Barbosa, The Dataverse Project, IQSS (Harvard University)
Amber Leahey, Scholars Portal (Ontario Council of University Libraries)
Stefano Iacus, The Dataverse Project, IQSS (Harvard University)
Ceilyn Boyd, The Dataverse Project, IQSS (Harvard University)
Gustavo Durand, The Dataverse Project, IQSS (Harvard University)
Steven McEachern (Australian National University)
In the dynamic realm of data management and sharing, academic research communities are experiencing a significant transformation in how data repository infrastructure evolves and adapts. This presentation aims to explore the dynamic ways in which organizations and communities are embracing and shaping the open-source Dataverse Project repository software to address the escalating needs of data management and sharing. This includes aspects such as storage, metadata, file formats, publication workflows, data discovery, access, retrieval, analysis, and preservation. The discussion will encompass a broad spectrum of Dataverse Software adopters and interest groups (DVN IGs), providing insights into the latest updates and integrations from the global community, notably featuring contributions from the Harvard Dataverse Repository. These developments and adaptations mirror the ongoing innovations that collectively define the open Dataverse data repository infrastructure and its community. Within the community, there will be a spotlight on collaborative initiatives. These initiatives cover diverse areas such as new storage solutions for handling large data, ensuring the security of sensitive data, managing disciplinary-specific data, and fostering community-driven approaches. These efforts aim to nurture the evolution of open and interoperable data repositories. The focus will be on how these initiatives streamline processes for data deposit, sharing, collaboration, and their role in establishing open and standardized data management practices across various research domains and institutions.
May 30, 2024: Session F2 Data Service Partnerships
Charting a Collective Course for Data Repositories at the World Data System
Reyna Jenkyns (World Data System - International Technology Office)
Meredith Goins (World Data System - International Program Office)
David Castle (World Data System - Scientific Committee)
The World Data System (WDS), an affiliate member of the International Science Council, serves a membership of trusted data repositories and related organizations. Governed by a Scientific Committee, the WDS consists of an International Program Office (WDS-IPO) based in Oak Ridge, Tennessee, USA, and an International Technology Office (WDS-ITO) based in Victoria, BC, Canada. The WDS mission is to enhance the capabilities, impact, and sustainability of our member data repositories and data services. In this presentation, we report on progress for the 2022-2024 Action Plan of the World Data System, which has four objectives: 1) Provide services and support to existing and new members, 2) Develop value narratives for WDS members, 3) Provide global leadership and agenda setting, 4) Enhance access, quality and accessibility of data worldwide. Featured activities pertain to important topics like open data for open science, artificial intelligence, data repository attributes, data commons, Indigenous data governance, more inclusive regional and disciplinary scope, and more. We also take a preliminary look at future plans and rationale for the uncharted future of data, as we look ahead to 2025 and beyond.
De Facto Data Librarian? How Business Librarians' Data Skills Support the Future of Library Data Services
Teddy Stocking (University of Nevada, Reno)
Nancy Lovas (UNC Chapel Hill)
As we plot the future of data services in libraries, the interdisciplinary nature of data services means that increasing collaboration between functional data librarians and subject liaison librarians is a key trend. As well, subject liaison librarians are learning new skills and recognizing existing literacies to better serve their user populations. An example of this is the data skills inherent to business librarianship. Business librarians are pragmatic data librarians, often not recognizing the extent of their data skills and abilities to provide data services. Especially in workplaces without specialist or standalone data librarians, business and economics librarians often also function as the de facto data librarian, handling data reference questions without additional support. The diverse, complex nature of business information means that business librarians have a significant ability to support patrons as they navigate siloed data, search for free data sources, and build datasets when no existing source is readily available. This presentation will focus on how business librarians’ data skills complement services provided by functional data librarians, and how business librarians already have many of the skills to meet patrons’ data needs in the absence of specialist support. From their experiences as business librarians and examples from the literature, the presenters will lay out basic strengths and challenges data librarians or service coordinators can anticipate from building partnerships with business librarians. In particular, the session will present the value business librarians can bring to data sourcing, data synthesis, and incorporating data alongside other information sources. Presenters will also identify intervention points where data experts can support business librarians, in areas such as data visualization, alternative data sourcing, and systematic approaches to data literacy.
Jessica Ko (Roper Center for Public Opinion Research)
Kelsie Norek (Roper Center for Public Opinion Research)
In the past several years, the Roper Center for Public Opinion Research, the world’s oldest archive of social science data and the largest specializing in public opinion data, has increasingly cultivated partnerships and collaborations with other institutions. Our primary collaboration has been with Digital Divide Data (DDD), a company based in India, Cambodia, Laos, and Kenya that provides youth development through education and employment programs to underserved communities. DDD provides students with paid positions that give them the opportunity to learn how to interpret and clean survey data, skills which they then use in their educational and career pursuits. Through joint efforts with DDD, Roper has been able to target the conversion of over 600 of our historical ASCII datasets to modern formats as well as processing of over 100,000 questions from toplines incoming daily for entry into our question-level iPoll database. This presentation will provide an overview of our ASCII conversion and question-level curation projects, DDD and their mission, our training and review processes, and potential future projects.
May 30, 2024: Session F3 Preservation
The IPUMS Business Process Model: Instituting a workflow mapping strategy to support archival processes
Diana Magnuson, Institute for Social Research and Data Innovation (University of Minnesota)
IPUMS is the home of the world's largest accessible database of census microdata, comprising over two billion records. The signature activity of IPUMS is harmonizing variable codes and documentation to be fully consistent across datasets. Major data types include U.S. and international censuses, global health data, major U.S. demographic surveys, labor force surveys, and educational surveys. In addition to census and survey microdata, IPUMS integrates and disseminates area-level census data and electronic boundaries describing census geography for the U.S. and internationally. While the activities of harmonizing data and documentation are similar across IPUMS projects, their processes are organized around the partners, data sources, unique content issues, and timelines of each project. The IPUMS Archive is instituting a workflow mapping strategy to further identify IPUMS process and metadata capture points for the data archive. Drawing on two business process models, the Generic Statistical Business Process Model (GSBPM) and the Generic Longitudinal Business Process Model (GLBPM), archival staff customized the GSBPM and the GLBPM to create an IPUMS Business Process Model (IPUMS BPM). The IPUMS BPM reflects the use of secondary data sources and the work of harmonization and integration to create a data infrastructure that supports research across time and space. Internally, the IPUMS BPM provides a clear visualization of our workflow, from external submission of data through the harmonization process and extraction systems to archival preservation of metadata. The challenge for archival staff is furthering the understanding and adoption of the IPUMS BPM within the IPUMS project groups, and identifying metadata production points that require the intervention of the archive for provenance and preservation purposes.
This presentation identifies the value of this mapping approach in gaining a clearer understanding of the role of the archive within project work cycles and points where activities intersect.
Online Resources for Scholarship At Risk of Loss: Doing Data on The Keepers
Peter Burnhill (Independent)
Is there any source of data more important for social science than each nation's published heritage, the content now issued on the web? When IASSIST began, and for many years afterwards, there was an expectation that research libraries kept all kinds of that important printed stuff safe on their shelves, not only scholarly journals but other periodicals such as newspapers, trade magazines and government publications. The early IASSISTers struggled to have datasets regarded as first class objects within academic support services. That battle might be won, but now our research libraries no longer keep the digital equivalent of that other important stuff. Instead they depend upon third parties as keepers of content for scholarly communication journals. The Keepers Registry, hosted at https://keepers.issn.org, has become the global monitor on what is being kept by the likes of CLOCKSS, Portico, PKP-PLN, as well as some national libraries and university consortia. The Internet Archive has also signed up as a Keeper. ISSN is now applied to the websites of governments, news media, commerce, trade and professional bodies and a myriad of other platforms reckoned to have lasting significance. Recent analysis of data from the ISSN Portal indicates progress but also shines a light upon what is reckoned to be at most risk of loss. Not only ‘The Long Tail’ of small publishers, with alarm bells for Open Access journals, but even more so the wider set of published heritage which is issued online. Social media diverts our attention; let's not neglect the web resources more formally published online as digital content as they are critical for our scholarship and for public understanding, now and into the future.
A panorama of self-deposit practices in European and North American archives
Paul Colin (Center for Socio-Political Data (Sciences Po/CNRS))
Alina Danciu (Center for Socio-Political Data (Sciences Po/CNRS))
Guillaume Garcia (Center for Socio-Political Data (Sciences Po/CNRS))
This research paper analyses self-deposit operations in European and North American archives. By examining the obstacles to the introduction of self-deposit services stricto sensu (with minimum involvement of the data stewards), which are still rare, we will look closely at how data stewards support the curation and documentation effort that must be handled, at least partially, by the depositors themselves. Our analysis shows that self-deposit practices are a trend that is particularly marked in emerging archives. Very few studies have tackled this subject head-on: how are curation practices actually carried out? What are the various costs involved? What types of guidelines are provided to depositors? How are the archive's recommendations and instructions accepted and followed by depositors? To what extent is the data re-use potential taken into account when it comes to self-deposit? While some of these questions have been addressed in the literature, albeit in very general terms, there is very little concrete feedback out there. This is the subject our paper tackles. Empirically, our research is based on 20 interviews (an in-depth questionnaire and/or guided interviews conducted remotely) carried out between 2021 and 2022 with CESSDA archives or repositories referenced by the Dataverse network. In this paper, we will discuss several pitfalls in the implementation of open science policies: the difficulties in finding the right balance between the archive's workflows and the flexibility granted to depositors; the gap between the limited resources that data curation teams have and the extent of the tasks that ultimately fall on them to make data FAIR; the uncertainties over the modes of reasoned division of labour that can be put in place between depositors, data stewards and other data professionals to optimise the self-deposit workflow; and the articulation deficit between self-deposit practices and the data re-use potential.
Queering the Numbers: The IASSIST LGBTQ+ Data Guide
Kevin Manuel (Toronto Metropolitan University)
Michele Hayslett (University of North Carolina at Chapel Hill)
Van Bich Tran (Temple University)
As a subgroup of the IASSIST Diversity, Equity, and Inclusion Data Resources Interest Group, the IASSIST LGBTQ+ Data Guide team was formed in September of 2023. Chaired by Kevin Manuel, Data Librarian from Toronto Metropolitan University, it has members from around the world who are contributing their knowledge and expertise with LGBTQ+ data resources. The subgroup grew out of discussions during the development of the IASSIST Anti-Racism Resources Guide, which recognized that there were other underrepresented populations, including LGBTQ+ people. Due to a history and even current context of oppression and exclusion of LGBTQ+ people, data on these communities are often non-existent, based on small samples, or only recently collected. Please join Kevin, Michele and Van to learn more about the efforts of the IASSIST Diversity, Equity, and Inclusion Data Resources Interest Group to produce data guides that can help researchers find the often hard-to-find statistics they are looking for.
“Nothing for us, without us”: Stakeholder involvement in preserving social justice data.
Aileen O'Carroll (Digital Repository of Ireland)
Lorraine Grimes (Maynooth University)
Clair Lanigan (Digital Repository of Ireland)
On 25th May 2018, Irish citizens voted to remove the controversial “Eighth Amendment” to the Irish Constitution, opening the way for the introduction of legislation allowing abortion in some circumstances. Feminist and left groups had been campaigning for a liberalisation of abortion laws in Ireland from the 1980s onwards. In this presentation we will give an overview of the Archiving Reproductive Health project which provides long-term preservation and access to the many at-risk archives generated by grassroots women’s reproductive health movements during the campaign. The project was funded by the Wellcome Trust and co-ordinated by the Digital Repository of Ireland (DRI). DRI is a trusted digital repository, which provides reliable long-term preservation and access to Ireland’s humanities, cultural heritage, and social sciences digital data. The collections preserved include: the records of the stakeholder organisations in the Repeal of the Eighth campaigns, the In Her Shoes: Women of the Eighth Facebook page, the first Facebook dataset to be archived in the Digital Repository of Ireland and research data generated by social science researchers examining the campaign. Not only does the project archive material from the recent repeal campaign (2012 - 2018), but it also archives historical campaigns on reproductive health including interviews with activists who were involved in the women’s movement in the 1970s, 1980s, 1990s and early 2000s. This paper will give an overview of the collections. We will also outline both how the project benefited from the advice and guidance of key stakeholders as well as our less successful attempts to ensure that stakeholders were centred in the archiving process.
May 30, 2024: Session F5 Mapping Social Data
Urban Tapestry: Weaving Geospatial Data with Social Sentiment to Understand Crime Patterns
Nadia Kennar (UK Data Service)
In the pursuit of creating safer, more resilient urban spaces, this research addresses the fundamental question: How does the interplay between urban design and public sentiment shape crime trends and community safety? By merging geospatial analysis of land use with sentiment analysis from social media, this study explores the profound influence of urban landscapes on both crime occurrences and societal perceptions. The analysis begins with a comprehensive set of urban data to delineate crime hotspots and uncover correlations with the urban form, shedding light on the 'when' and 'where' of crime incidents. Complementary to this is the sentiment analysis of social media content, which taps into the community's pulse on crime and safety, offering a deeper understanding of the 'how' and 'why' behind public reactions, dimensions often underrepresented in traditional crime analysis. Despite limitations in geotagged data, innovative natural language processing techniques aim to approximate spatial sentiment distributions. This bifocal approach not only pinpoints areas of concern but also clarifies the public's response to crime, which could steer policy and urban planning towards impactful interventions. The integration of diverse data types is essential for a holistic view, one that balances quantitative crime data with the qualitative nuances of human experience. The presentation undertakes a deep dive into how data-driven insights can empower data and geospatial professionals within research, libraries and archives to craft narratives that blend numbers with nuance, aiding in the design of safer, more resilient urban spaces.
Mapping a Legacy: Ukrainian Immigration and the Settlement of Western Canada
Sandra Sawchuk (Mount Saint Vincent University)
Alexandra Cooper (Queen's University)
Ukrainian Canadians began to settle in western Canada in 1892. Settlement occurred in 'blocks' – farm-sized quarter sections demarcated by the Dominion Lands Survey. Block settlement allowed the new migrants to settle among family and communities and contributed to the growth of a strong Ukrainian Canadian community. This was initially preferred by immigration and land agents, as it made their work easier, but the practice soon fell out of favour. Despite government opposition, Ukrainians continued to settle in these same patterns, establishing the Ukrainian Canadians as one of the main ethnic groups that contributed to the development of the newly established Western Canada and Dominion of Canada. This project will use census data along with archival and genealogical documents to create an interactive map showing the movement of Ukrainians to and across Canada throughout their 130-year period of immigration. Most of the demographic information about the history of Ukrainian block settlement is contained in static thematic maps that were published over 30 years ago. With the addition of census data, these maps can be transformed into dynamic and interactive objects. This project seeks to preserve the historically significant research on Ukrainian immigrant block settlements with the creation of an openly accessible geospatial resource that includes historic and contemporary census data. This interactive mapping project will be a valuable resource for researchers, educators, and the general public interested in exploring the multifaceted narrative of Ukrainian immigration in Canada.
Collecting data about the accessibility of places: what should we be collecting?
Jessica Benner (Carnegie Mellon University)
The first worldwide report on disability identified several broad barriers for people worldwide. One of these barriers is a lack of accessibility. Standard guidelines for the design and construction of accessible environments have been implemented since the 1960s, but they do not cover the built environment that already exists. Consequently, when traveling to an unfamiliar place, people with disabilities (and others) cannot expect the spaces to be accessible to them. In lieu of the total removal of barriers, which may never be achieved, some information tools may help with feelings of independence, and some researchers claim that access to information about the environment is even more important than removing physical barriers. There are existing sources of information about accessibility; however, they often lack clear criteria to describe accessibility, especially for different types of disabilities or ranges of ability and preference within disability groups. Additionally, each person with a disability has unique interactions with the built environment that impact accessibility, including both barriers and facilitators to mobility. This talk will discuss the importance of collecting information about accessibility and share a set of criteria for collecting data related to the accessibility of public places like restaurants, doctors' offices, government offices, etc. The hope is that these criteria can be used both locally by local enthusiasts and globally by international technology companies to collect and share data about the accessibility of everyday places so people traveling to those locations can make more informed decisions about that location’s accessibility.
May 30, 2024: Session G1 Text Analysis
Text Analysis, Data Requests, and the Academic Library
Jen Ferguson (Northeastern University)
David Beales (Case Western Reserve University)
David Lowe (Texas A&M University)
Todd Suomela (Bucknell University)
Amy Kirchhoff (ITHAKA)
The ability to comprehend and communicate with text-based data is essential to future success in academics and employment, as evidenced in a recent survey from Bloomberg Research Services which shows that nearly 97% of survey respondents now use data analytics in their companies and 58% consider data and text mining a business analytics tool (https://www.sas.com/content/dam/SAS/bp_de/doc/studie/ba-st-the-current-state-of-business-analytics-2317022.pdf). This has fueled a substantial growth in the field of text analysis, involving the use of technology to analyze un- and semi-structured text data for valuable insights, trends, and patterns. Text analysis plays a pivotal role in various aspects of daily life, from the auto-suggest feature on your phone to the spam filter in your email and suggestions on streaming services. Recently, there has been significant buzz around text analysis and generative artificial intelligence, emphasizing their potential impact on human society. In our panel session, we will introduce the concept of text analysis and its importance. Librarians from four diverse higher education institutions, ranging from a large STEM focussed institution to a small liberal arts university, will discuss meeting the many text analysis needs arising on their campuses: integrating text analysis pedagogy across campus, running workshops out of the library, working with faculty to integrate it into class materials, and supporting research on campus. The panelists will also speak to some of the challenges inherent in supporting this work, ranging from budgetary concerns to issues of data management, access, and reproducibility. With students and faculty from all disciplines adopting text analysis, academic libraries have the opportunity to become primary supporters of this approach.
AI and the Sporadic Librarian-Coder: Adventures in Muddling Through and Excelling (Occasionally)
Jeremy Darrington (Princeton University Library)
For the occasional librarian-coder--those who know more than the basics, but for whom coding is not a primary/daily task--trying to remember specific syntax or commands, like how to format a visualization in ggplot or how to execute a semi-complicated filter in a pandas dataframe, often creates enough inertia to prevent us from tackling data-related projects where we might add value in the library. AI coding tools, like ChatGPT and GitHub’s Copilot, can lower the bar enough to make such projects worthwhile and successful...at least occasionally. This presentation will showcase how I've used these tools to augment my less-than-professional coding skills in devising solutions to practical work challenges, like exploring and parsing new datasets, collecting data for patrons, making data purchased from vendors more usable, and improving the functionality of crappy database interfaces that I'm forced to tolerate. In addition to highlighting specific project examples, I'll discuss opportunities, challenges, and some things I've learned along the way about what can increase the chances for success as well as when to consider throwing in the towel.
Building an automatic answering machine to panel survey questions
Flavio Bonifacio (Metis Ricerche srl)
Interviewing people by phone is now difficult, even for short and simple interviews, for three main reasons: 1. the inflation of commercial and/or promotional interviews; 2. the decrease in the number of landline phones; 3. the extremely high cost of reaching people by mobile phone, given the absence of public directories of telephone numbers. The SVTP project aims to reduce these difficulties by constructing an answering machine. The machine will learn to answer from previous experience, that is, from surveys already conducted on subjects of interest (tourism, in our case). The project will work on a panel of tourism surveys to create a virtual survey campaign by means of the automated answering machine, which will be able to answer survey questions. We have already carried out 10 surveys on tourism in Piedmont for the local administration in 2020, 2021, 2022 and 2023: three in summer, three in autumn, three in winter and one for testing purposes (Autumn 2023). We have built a baseline automated survey from the previous nine surveys and compared its answers with those of the test survey (Autumn 2023); the results will be presented. Furthermore, several simulation trials using different forecasting techniques, for example logistic regression and decision trees, will be carried out. The model tests will compare each model's results with the baseline model in order to better forecast the number of future tourist visitors to Piedmont, and the results of these tests will be presented. This is similar to what ChatGPT does when it builds artificial questionnaires, to give just one example, except that we will generate the answers instead. The philosophical implications of all this are obvious, and I will give some clues in the presentation. Turin, 12th January 2024
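As a rough illustration of the baseline comparison mentioned in this abstract, the sketch below assumes the simplest possible "answering machine": one that replies to each question with the most frequent answer seen in past surveys, and is then scored against a held-out test survey. The abstract does not say how the SVTP baseline is actually built, so the question names, the answers, and the majority-vote rule are all hypothetical.

```python
from collections import Counter


def build_baseline(past_surveys):
    """Return the most frequent past answer for each question (assumed rule)."""
    counts = {}
    for survey in past_surveys:
        for respondent in survey:
            for question, answer in respondent.items():
                counts.setdefault(question, Counter())[answer] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


def agreement(baseline, test_survey):
    """Share of test-survey answers that the baseline reproduces."""
    hits = total = 0
    for respondent in test_survey:
        for question, answer in respondent.items():
            if question in baseline:
                total += 1
                hits += baseline[question] == answer
    return hits / total if total else 0.0


# Hypothetical toy data: each survey is a list of respondents,
# each respondent a dict mapping question -> answer.
past = [
    [{"stay": "hotel", "transport": "car"},
     {"stay": "hotel", "transport": "train"}],
    [{"stay": "b&b", "transport": "car"}],
]
test = [
    {"stay": "hotel", "transport": "car"},
    {"stay": "b&b", "transport": "car"},
]

baseline = build_baseline(past)
print(baseline)
print(agreement(baseline, test))
```

A fitted model such as logistic regression or a decision tree would replace `build_baseline`, and its agreement score on the same test survey could then be compared with this one.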
AI in the context of generalist data repositories: the Harvard Dataverse case.
Stefano Iacus (Harvard University - IQSS)
Advances in generative AI research and the development of large language models are revolutionizing how we interact with digital devices, applications, and computers at an incredible pace. This overview first revisits basic concepts of generative AI in the context of natural language processing and large language models (LLMs). We then discuss potential applications for generalist data repositories, highlighting both the risks and opportunities of adopting these technologies. Specifically, we explore the implementation of these ideas in the Harvard Dataverse repository, including semantic search, automatic data curation support, interactive data exploration through natural language queries, data augmentation, and knowledge graph construction. We also examine the performance of commercial and open-source models in these tasks, explaining our development approach using open-source models. While our applications are focused on the Harvard Dataverse repository, the techniques and the methods we present are portable to any other data repository. References to open source code will be made available.
May 30, 2024: Session G3 Disseminating Government Data
Developments in Data Governance at the US Department of Labor
Dan Gillman (US Bureau of Labor Statistics)
The US Department of Labor (DOL) contains many agencies which are broken into 3 groups: administration, policy, and enforcement. The Bureau of Labor Statistics sits alone outside this basic framework. These DOL agencies produce data, but this activity is not the main focus of their work. This has consequences, as discoverability, quality, understanding, interoperability, and the ability to blend data are not optimized. The Office of Data Governance (ODG) at DOL was established and resolved to address these issues. The task for the office is to develop a framework for good data governance at DOL and set a path towards building the capabilities to maintain it. All this is to be done with a small staff and very little budget. The focus of this talk will be on the 5 enforcement agencies at DOL: EBSA (benefits), MSHA (mine safety), OFCCP (federal contracts), OSHA (occupational safety), and WHD (wages). Enforcement is somewhat of a misnomer, as the main interest of the agencies is to gain compliance with their laws and regulations. When an inspection occurs and violations are found, the efforts of the agencies are to bring the violating establishments into compliance. The efforts are not to punish, though fines are sometimes necessary. With the exception of mining, inspections occur when a violation is suspected at some establishment. So, the data are not representative of a larger population, and the data are mostly hand entered. As would be expected, problems arise from this situation. In the talk, we will describe the program being developed by ODG, with emphasis on working with the other agencies, the functionality being developed, and the problems being solved, stressing the adaptation of parts of DDI-CDI and the use of machine-actionable metadata.
National Population Register (NPR) in Bangladesh perspective
Chandra Shekhar Roy (Bangladesh Bureau of Statistics)
Shohorab Ahmed Chowdhury (Synesis IT PLC)
Md Shahin Akondo (Nano Information Technology)
Bangladesh is actively developing various register systems, but they interact poorly with each other and do not constitute a holistic system. There is a need to consider the experience of advanced developed countries in creating a full-fledged NPR. Therefore, the paper will focus on the development of the NPR. To fulfill the objectives, baseline data were obtained from the Bangladesh Bureau of Statistics (BBS) in collaboration with the National Household Database (NHD) project under the Ministry of Planning. In the NHD, all households were recorded in a census-like fashion. A total of 14 basic types of data are available for creating the NPR. The primary element of the system is the Personal Identification Number (PIN). This unique identifier is the core element of the NPR. Another element in the system is the family tree. The NHD also covered family relationships and dwellings, allowing data on individuals to be linked by family and household head. A 10- or 11-digit PIN will be generated by the NPR authority, which is considered standard practice. The PIN for a child will be generated when their parents register the child’s birth. The local ‘Birth & Death Registration’ office will be responsible for updating NPR data. The secondary element in the system is “One Person One File” (OPOF). Each personal file in the NPR will hold the history of changes in terms of the registered date and the action that triggered the change of personal data in the register. Once the OPOF objective is ensured, the second objective, “register once, multiple use”, becomes important. The other objective ensures data quality and the reproducibility of NPR data. The novelty of the paper lies in a generalized and comparative analysis of the population register of Bangladesh.
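The “One Person One File” idea described in this abstract, a single file per PIN holding the dated history of changes and the action that triggered each one, can be sketched as a small data structure. This is only an illustration under assumptions: the class names, field names, example actions and the placeholder PIN are hypothetical and are not taken from the NPR design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Change:
    registered_on: date  # date the change was registered
    action: str          # event that triggered the change
    attribute: str       # which personal attribute changed
    new_value: str


@dataclass
class PersonFile:
    pin: str                                     # unique identifier (placeholder format)
    history: list = field(default_factory=list)  # append-only change log

    def record(self, registered_on, action, attribute, new_value):
        """Append a change; earlier entries are never overwritten."""
        self.history.append(Change(registered_on, action, attribute, new_value))

    def current(self):
        """Replay the history in date order; the latest value of each attribute wins."""
        state = {}
        for change in sorted(self.history, key=lambda c: c.registered_on):
            state[change.attribute] = change.new_value
        return state


# Hypothetical example record.
person = PersonFile(pin="0000000001")
person.record(date(2020, 1, 5), "birth registration", "name", "A. Example")
person.record(date(2020, 1, 5), "birth registration", "dwelling", "Dhaka")
person.record(date(2023, 6, 1), "relocation", "dwelling", "Khulna")
print(person.current())
```

Keeping the log append-only means the current state can always be rebuilt from the history, which is one way to support a "register once, multiple use" goal.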
The Nova Scotia Open Data Portal: Findings from User Research and Engagement with its Community
Lori McCay-Peet, Cyber Security and Digital Solutions (Government of Nova Scotia)
Lucy Ye, Cyber Security and Digital Solutions (Government of Nova Scotia)
The principles of transparency and accountability are important to the Government of Nova Scotia. The Nova Scotia Open Data Portal plays a key role in supporting these principles by hosting government datasets made available by various departments. However, to meet the demands and expectations for access to government data, insight into the needs of and the use of data and open data by the public, including, for example, librarians, researchers, students, entrepreneurs, and interested citizens, is vital. User research was conducted in the Fall of 2023 to understand the needs and experiences of Nova Scotia Open Data Portal users and potential users and to find opportunities for future and continuous engagement. This presentation will describe the user research findings from an online survey, focus groups, and interviews as well as subsequent plans and efforts to support the Nova Scotia open data community and engage meaningfully.
The role of FAIR principles in high-quality research data documentation
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
The FAIR principles as a framework for evaluating and improving open science and research data management have gained much attention in recent years. By defining a set of properties that indicate good practice for making data findable, accessible, interoperable, and reusable (FAIR), a quality measurement is created, which can be applied to many research outputs, including research data. There are some software tools available to help with the assessment, with the F-UJI tool being the most prominent of them. It uses a set of metrics which define tests for each of the FAIR components, and it creates an overall assessment score. The FAIR assessment is done by using aggregated metadata for a research dataset, e.g. from the data webpage URL, from a PID provider like DataCite, and from further services such as repository information from re3data. The presentation will examine differences between manually and automatically assessing FAIR principles, show the different results, and use Election Studies and COVID data studies as examples. It will highlight the role of archives in securing a high level of data and metadata quality and a technically sound implementation of the FAIR principles to help researchers get the most out of their valuable research data.
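As a rough illustration of how per-metric test results can be rolled up into per-principle and overall scores, the sketch below aggregates earned and possible points. It is not F-UJI's actual scoring method or API: the function name `fair_summary`, the tuple layout, and the sample results are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fair_summary(results):
    """Aggregate (principle, earned, possible) tuples into percentage scores.

    `results` is a hypothetical structure: one tuple per metric test, where
    `principle` is one of "F", "A", "I", "R".
    """
    earned = defaultdict(float)
    possible = defaultdict(float)
    for principle, got, maximum in results:
        earned[principle] += got
        possible[principle] += maximum

    # Percentage per principle, then one overall percentage across all tests.
    summary = {p: round(100 * earned[p] / possible[p], 1) for p in possible}
    summary["overall"] = round(
        100 * sum(earned.values()) / sum(possible.values()), 1
    )
    return summary


# Invented sample results for a single dataset (not real assessment output).
SAMPLE_RESULTS = [
    ("F", 2, 2),
    ("F", 1, 2),
    ("A", 1, 1),
    ("I", 0, 2),
    ("R", 1, 3),
]

print(fair_summary(SAMPLE_RESULTS))
```

Running the same aggregation over a manual and an automated assessment of one dataset would give two comparable summaries, which is the kind of comparison the abstract describes.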
Assessing data deposits in an institutional repository (U of T Dataverse in Borealis)
Jasmine Lefresne (University of Toronto)
Dylanne Dearborn (University of Toronto)
Ken Lui (University of Toronto)
U of T Dataverse, the University of Toronto's data repository, is one of the largest institutional dataverse collections in Borealis with over 1000 published datasets. It currently follows a self-deposit model that allows U of T researchers to deposit and publish their data without intervention unless requested. Considering increased usage of the repository and new and developing data deposit policies, we conducted an assessment of U of T Dataverse to (1) review the quality of published datasets, and (2) understand who is using the repository. This was accomplished by analyzing the monthly Borealis metrics and conducting a quality assessment of select datasets’ structure and associated metadata. Ultimately, this assessment will be used to help conceptualize curation services and identify resources to develop that would enable high-quality data deposits. In this presentation we will discuss our approach to this assessment, preliminary findings, and how this project will shape our approach to service and resource development. Overall, this project will allow us to better understand disciplinary trends relating to who is (and isn’t) using U of T Dataverse, help develop clear processes and guidelines, and inform training and departmental outreach. Longer-term, it will help us estimate the effort and time required to provide curation services, inform priorities for repository development, and help us anticipate the impact any change in national policy may have in demand for institutional services.
How are we FAIR-ing? Creating a FAIR Self-Assessment Checklist for Data Repositories
Lauren Phegley (University of Pennsylvania)
Lynda Kellam (University of Pennsylvania)
In 2022, the team behind a local grant-funded medical data repository contacted the University of Pennsylvania Libraries’ Research Data & Digital Scholarship unit asking for guidance on evaluating the extent to which their repository was FAIR-enabling. After a consultation with the repository team, our research data experts discovered that many of the current self-assessments of the FAIR guidelines were for data creators rather than data repository managers. In addition, we wanted a self-assessment tool similar to the process and guidance created by CoreTrustSeal but with a focus explicitly on FAIR principles. In answer to their request, the Penn Libraries’ Research Data Engineer conducted a literature review and consolidated current guidance and assessment tools on the principles. After this review of the existing documentation, a small team consisting of the Research Data Engineer, the Head of Research Data Services, the Director of Data and Innovation Services, and the Bioinformatics Librarian developed, through an iterative process, a self-assessment tool for repository managers regarding FAIR principles. In addition to producing several iterations of the tool, we also met with the repository managers for feedback on ways to make the tool more understandable. Our discussions provided insights into the challenges of explaining the FAIR principles to those without information science backgrounds. The discussions we had and the development of this self-assessment tool helped to develop a more transparent and trustworthy repository. This paper will discuss the development of the assessment, the goals for utilizing the tool, and lessons learned. Reporting our findings as they currently stand will prompt the research data management field to reflect on FAIR principle adoption for data repositories. We also intend for this paper to encourage more conversation on the usability of the FAIR principles for professionals without an information science background.
Geospatial, Keyword-based, and Access-Limited Data Discovery with Lunaris
Tristan Kuehn (Digital Research Alliance of Canada)
Shlomi Linoy (McMaster University)
Kevin Read (University of Saskatchewan)
Grant Gibson (Canadian Research Data Centre Network)
Amber Leahy (Ontario Council of University Libraries)
Lynn Peterson (National Research Council of Canada)
Sarah Rutley (University of Saskatchewan)
Julie Shi (University of Toronto)
Victoria Smith (Digital Research Alliance of Canada)
Kelly Stathis (DataCite)
Lunaris is a national research data discovery service operated by the Digital Research Alliance of Canada. Lunaris’ bilingual platform is a single point of search facilitating discovery of Canadian research data held in any of a growing number of data sources, including institutional repositories and government open data sources, among others. Lunaris’ index of Canadian research data is also harvested by other discovery services. Lunaris allows geospatial search of datasets that contain some geospatial information by providing a map-based search interface that limits returned datasets to those within a user-specified bounding box. Lunaris integrates this map-based geospatial search with traditional keyword-based search and filtering, allowing users to directly search for data in a given topic area that is related to a geographical region of interest. As part of ongoing discovery work, the Access-Limited Data Discovery Working Group, part of the Network of Experts, identified 137 data sources that contain datasets that are not immediately accessible or for which access or discovery is limited (https://doi.org/10.31219/osf.io/pa5fx). The group assessed how well those data sources met a set of discoverability and access criteria, and found areas for improvement that the discovery, access, and metadata community might use as an opportunity for growth. This presentation will demonstrate Lunaris’ search capabilities, outline the Access-Limited Data Discovery Working Group’s effort to assess the discoverability of access-limited data, and present plans to work with sources of access-limited data to make that data discoverable with Lunaris. 
Attendees will learn how they can use Lunaris to discover Canadian data that is relevant to their research interests, learn how Lunaris may facilitate data reuse by enhancing data discoverability, and be introduced to the ongoing efforts to integrate data sources with limited discoverability into a leading national data discovery service in Canada.
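The combination of map-based and keyword-based search described in the Lunaris abstract can be sketched roughly as follows. This is a minimal illustration and not Lunaris' implementation: the `BBox` and `Dataset` classes, the `search` function, and the sample records are hypothetical, and the sketch assumes that "within" means a dataset's extent lies entirely inside the user's bounding box.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, other: "BBox") -> bool:
        # True when `other` lies entirely inside this box.
        return (
            self.west <= other.west
            and other.east <= self.east
            and self.south <= other.south
            and other.north <= self.north
        )


@dataclass(frozen=True)
class Dataset:
    title: str
    keywords: Tuple[str, ...]
    bbox: Optional[BBox]  # None when the record has no geospatial extent


def search(datasets, query, box):
    """Return titles of datasets inside `box` whose title or keywords match `query`."""
    needle = query.lower()
    hits = []
    for d in datasets:
        # Geospatial filter: skip records with no extent or outside the box.
        if d.bbox is None or not box.contains(d.bbox):
            continue
        # Keyword filter: simple case-insensitive substring match.
        haystack = " ".join((d.title, *d.keywords)).lower()
        if needle in haystack:
            hits.append(d.title)
    return hits


# Invented sample records and search region, for illustration only.
DATASETS = [
    Dataset("Halifax harbour water quality", ("water", "harbour"),
            BBox(-63.7, 44.5, -63.4, 44.8)),
    Dataset("Vancouver air quality", ("air",),
            BBox(-123.3, 49.1, -122.9, 49.4)),
    Dataset("National water survey", ("water",), None),
]
ATLANTIC = BBox(west=-67.0, south=43.0, east=-59.0, north=48.0)

print(search(DATASETS, "water", ATLANTIC))
```

A production service would more likely use a spatial index and a full-text search engine; the linear scan here only shows how the two filters combine.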
Metadata, reuse and reproducibility in the EOSC Future Science Project 'Climate Neutral and Smart Cities'
Hilde Orten (Sikt - Norwegian Agency for Shared Services in Education and Research)
The objective of the ‘Climate Neutral and Smart Cities’ Science Project of EOSC Future is to demonstrate that relevant environmental data and data on citizens' values, attitudes, behaviors and involvement can be combined in a meaningful way for social, political and scientific analysis. The project rests on three pillars: Indicator production and integration of data from three different research domains, structured metadata for interdisciplinary use, and dissemination of data and research outcomes. The presentation gives an introduction to the project, and describes how metadata standards are used as drivers for data reuse and reproducibility. Main focus is put on DDI-Cross Domain Integration (DDI-CDI) and DDI-Lifecycle, and how they are used together in the project.
Digitizing the Bostock Air Photo Collection: Making Air Photos Discoverable by Features and Subject Classification
Rene Duplain (University of Ottawa)
Pierre Leblanc (University of Ottawa)
This presentation will be on the recent digitization and publication of the 'Bostock air photos special collection' by the University of Ottawa Library. In 1968, H. S. Bostock published “A Catalogue of Selected Airphotographs”, a catalogue of selected air photos of geomorphologic phenomena in Canada, for the Geological Survey of Canada. This catalogue included over 800 air photos, both verticals and obliques, from the National Air Photo Library (NAPL). The photos were selected to highlight various subjects or phenomena found in Canada, with an emphasis placed on glaciological and glacial features. Examples of classifications of features included: landslides, waterfalls, canyons, deltas, estuaries, folded strata, highland glacier systems, shelf ice, glacier dammed lakes, volcanoes, meteorite craters, and many more. These photos covered much of northern Canada and originated from hundreds of flightlines generally taken between the 1940s and ‘60s. The uOttawa Library has had this collection in its holdings in paper format for decades. In fall 2023 the collection was the subject of a digitization project and was published to the uOttawa Library’s Digital Asset Management System (DAMS) for improved accessibility, preservation, and distribution. The associated catalogue from Bostock provided additional metadata for the photos, such as the general region, subject classification, and a description of the features in each image. With this digital collection, users can now search for model photos matching a variety of geomorphological features by keyword to support research and learning.
May 31, 2024: Session H1: Data Collection at Institutions
Data Collection Development: Practices and Perspectives
Kate Barron (Stanford Libraries)
Ron Nakao (Stanford Libraries)
Barbara Esty (Yale University Library)
Bobray Bordelon (Princeton University Library)
Researchers are demanding access to increasingly complex data. Acquiring these data for library collections presents unique challenges that tax the current workflows, staff, resources and expertise in place for the acquisition of traditional library materials. Members of the Data Collection Development Interest Group will share their efforts to implement and enhance the existing acquisitions infrastructure at their institutions, and will share their thoughts on the challenges they have confronted in their quest to add research data to their libraries’ collections. The panelists will cover licensing and curation, data storage and access, identifying vendors and key players within their institutions, and evaluating data quality.
Charting a course for open data literacy education
Cody Hennesy (University of Minnesota, Twin Cities)
Tim Dennis (UCLA)
Zhiyuan Yao (UCLA)
This presentation shares a new model for developing open source data literacy lessons collaboratively. The Library Carpentry Curriculum Advisory Committee (LC-CAC) adopted a new lesson adoption policy in 2023 to encourage creating and broadening the scope of data literacy lessons in Library Carpentry. The policy leverages open-source frameworks for lesson development, such as the Carpentries Incubator (https://carpentries-incubator.org), along with open peer review via the Carpentries Lab (https://carpentries-lab.org), to support a variety of burgeoning data and information-science educational initiatives. While the traditional LC curriculum introduces tidy data concepts and tools such as Git, the Unix shell, and OpenRefine, early work using the new lesson adoption model has begun to integrate lessons on artificial intelligence, computational thinking, and data curation. LC-CAC has also partnered with UCLA to support the inclusion of materials developed under the IMLS-funded Lessons for Librarians in Open Science project (https://ucla-imls-open-sci.info). Six new lessons focused on topics such as data management, open qualitative research, open science hardware, and reproducible workflows are already being developed during year one of the two-year grant.
How Online Training Events are Supporting the Development of Quantitative Data Skills in social science students: a qualitative research project
Vanessa Higgins (UK Data Service)
Jackie Carter (University of Manchester)
Jools Kasmire (UK Data Service)
There is international demand for quantitative data skills – with Governments increasingly concerned with how data skills are acquired for 21st Century jobs. There is evidence from data skills training initiatives that social science students can help to fill this gap (Carter, 2021a, 2021b; Tazzyman et al., 2021) but more data skills training is needed to fulfil this need. The purpose of this paper is to explore how online training events can support social science students to develop data skills. We present results from a qualitative research project using in-depth interviews with social science students who have participated in UK Data Service (UKDS) online training events. We use reflective thematic analysis (Braun & Clarke 2021) to explore how the training enhances their data literacy and helps them acquire skills that they can use in their studies and research careers. We also discuss how the results from the project are feeding back into developments in the UKDS data skills training programme.
Participating in the Data Services Continuing Professional Education (DSCPE) with Libraries at Rensselaer Polytechnic Institute: RDM, Data Science and Analytics, and the Emerging LIS Roles
Ayaba Logan (DSCPE)
Because research data management and data science are hot new areas camouflaged as the same service as before, Elaine Martin and Rong Tang developed a program for upskilling librarians to create or increase data services at their institutions by partnering with libraries that are further along the Data Service Continuum. I was accepted and paired with Rensselaer Polytechnic Institute. The CIO of their library wanted to know which entity was doing what and what activities were duplicated, essentially an environmental scan. The survey provides a sharable information product for other libraries to use to assess and evaluate their own RDM and Data Science and Analytics activities. The objective of this environmental scan was to document and evaluate the data services provided across RPI by asking the following questions: 1. What department is providing what RDM and Data Science and Analytics service? 2. What are the holes or duplications of effort across the institution? 3. What roles are LIS professionals playing in the RDM and Data Science and Analytics arena, broadly and locally? Methods: A systematic review of data services was used to design the survey, which specifically included services across engineering disciplines. Upon completion of the survey, including input from my capstone partner, I plan to use the survey to review RPI’s website methodically and thoroughly. This survey will increase the richness and rigor of the final report to RPI as well as identify the roles LIS professionals are playing in RDM and Data Science and Analytics across the institution. Because this is a content analysis of websites, IRB approval was not sought. The study is currently ongoing. Results and conclusions will be shared at the conference.
May 31, 2024: Session H3 Data and AI
Data Sheets in Practice: An Exploration of Machine Learning Dataset Descriptions
Claudia Engel (Stanford University)
One of the approaches to address bias in Machine Learning algorithms has been a call for transparency and critical evaluation of the training data that feed into the Machine Learning models. Data descriptions are seen as a vehicle that can help to increase transparency and create awareness of potential shortcomings and ethical issues for predictive modeling. Among the most prominent is perhaps the proposal of "Datasheets for Datasets" by Gebru et al. (2021). Data sheets provide information about provenance, use, and limitations and ask questions about the social and ethical implications of data production and use. The use of these templates is voluntary, suggestive, and modular. So how are they actually applied? Based on the analysis of about 62,000 data sets from Hugging Face, as well as a close reading of selected case studies, this project attempts to gain insight into the practices of using Data Sheets for Machine Learning training data. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Why do we need metadata, can we just use ChatGPT? Methods in metadata discovery and documentation using Generative AI
Samuel Spencer (Aristotle Metadata)
Since its launch in November 2022, ChatGPT has become a focal point for data analytics and discovery, with many researchers, students, policy-makers and laypeople all turning to Generative AI to assist with documentation and interpretation of data. In this new world, the question arises: why do we need metadata if AI can fill in the gaps in our own understanding? While data and metadata practitioners understand that the context and nuance of data understanding are dependent on timely, high-quality documentation, there remains a gap between best practice and real-world scenarios. To bridge this gap, data librarians and data governance areas must be able to communicate to stakeholders the value of good metadata, while also integrating new technologies. Given the reliance on emerging cloud or software-as-a-service technologies, there must also be a focus on educating users on how to preserve data privacy when using Generative AI tools to minimise data leakage. In this presentation, we look at how Generative AI tools such as ChatGPT and data analytics tools like Pandas can be used to augment data documentation and reduce the burden on data custodians without reducing data quality by exploring the following topics: * Methods in developing and running local language models * Retraining language models using metadata resources * Matching, summarising and generating data to generate metadata * Privacy-preserving methods in publishing auto-generated content. This is done using Aristotle Activate, an open-source database scanning and metadata linkage tool designed to interact with the Aristotle Metadata Registry to retrain and refine language models, as well as upload and publish linked and generated metadata.
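One small piece of the workflow above, summarising data to produce draft metadata, might look roughly like the sketch below. It is not Aristotle Activate and does not call any language model; it uses only the Python standard library rather than Pandas so that it stays self-contained, and the function names and sample data are hypothetical. Only aggregate facts per column (inferred type, missing count, distinct count) are reported, so no raw values leave the data custodian's environment.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def draft_metadata(csv_text):
    """Build a per-column summary that a custodian could review and extend."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    draft = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        draft[column] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
            "distinct": len({v for v in values if v != ""}),
        }
    return draft


# Invented sample data, for illustration only.
SAMPLE_CSV = (
    "region,households,median_income\n"
    "North,120,41000.5\n"
    "South,,38000\n"
    "North,95,\n"
)

for name, summary in draft_metadata(SAMPLE_CSV).items():
    print(name, summary)
```

A summary like this could serve as structured input to a locally run language model that drafts human-readable descriptions, with a person reviewing the result before publication.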
Exploring the Frontier: Generative AI in Qualitative Data Analysis
Michael Beckstrand (University of Minnesota)
Exploring the intersection of qualitative data analysis and generative artificial intelligence (AI), this presentation considers the evolving landscape in qualitative software. Tracing the historical evolution of autocoding features in tools like NVivo and ATLAS.ti, the discussion expands to these tools’ integration of OpenAI's GPT models. It considers novel applications beyond traditional sentiment and 'topic' identification for faster, hopefully robust, insights into qualitative data. Going beyond technical aspects, the session also addresses critical methodological considerations, examining the epistemological and theoretical challenges qualitative scholars face when considering removing the “close” from close reading in qualitative inference.
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
The DDI Alliance Scientific Board (https://ddialliance.org/leadership) would like to draw your attention to the developments for standardized metadata in the social sciences community and beyond. DDI products encompass the complete range of functions over the research data life cycle, allowing for both human and machine interactions with the documentation. They include DDI-Codebook for entry-level cataloguing; DDI-Lifecycle and Controlled Vocabularies for reusability and more sophisticated descriptions of surveys, questions and instruments; the XKOS extension to the Simple Knowledge Organization System for rich conceptual metadata; and the Structured Data Transformation Language (SDTL) for provenance. In addition, DDI-Cross Domain Integration (DDI-CDI) is a leading-edge product intended to fill the emerging need for integration of data from different disciplinary domains such as health or environmental sciences. The DDI Alliance is an open, welcoming community that includes data archives, research institutions, statistical agencies, software companies, and others. Events like the annual meeting and virtual community meetings help to foster exchange and discussion among members and interested parties. The Scientific Work Plan defines major development goals, and Working Groups (WG) on specific topics are established for implementing detailed steps. Examples are the WGs on Training, Controlled Vocabularies, Cross Domain Integration, or the DDI Developers Community. Working Groups are open to everybody, not only from member organizations, so there are many options to get involved if you are interested.
Urban Dataset Meta-Data Maturity Model
Mark S Fox, Urban Data Centre, School of Cities (University of Toronto)
Bart Gajderowicz, Urban Data Centre, School of Cities (University of Toronto)
Dishu Lyu, Urban Data Centre, School of Cities (University of Toronto)
Introduction: With the surge in urban data availability, the dataset retrieval challenge for urban studies has increased, often due to poor meta-data and data localization. We address these issues by proposing an urban dataset meta-data maturity model and framework that enhances meta-data management vital for data librarians and curators. We aim to establish a meta-data standard that simplifies finding datasets, catering to various stakeholders (governmental, commercial, civil, and philanthropic) and supporting global data sharing among urban data managers and repositories. Methodology: Leveraging existing meta-data standards like DCAT, Schema.org, DQV, and the FAIR principles, the proposed standard introduces a dataset meta-data maturity model organized into seven distinct categories. Following Fox et al. (2024), the model provides a framework for defining and representing the maturity of a dataset’s meta-data, where higher maturity denotes greater detail, focusing on attributes that facilitate searching by topic, spatial and temporal aspects of datasets. Other levels focus on licensing, governance, adherence to FAIR principles, privacy and quality issues. The model integrates linked-data standards and enhances meta-data analysis by transforming meta-data into a knowledge graph. We evaluate the model by statistical analysis of entries in a catalogue of urban datasets. The model comprises six levels of meta-data attributes:
Level 1: Basic information often used in dataset searches, including descriptions and temporal/geospatial data, for easy retrieval and identification.
Level 2: Dataset access information, including location, licensing, and points of contact.
Level 3: Additional documentation and access meta-data, alignment with FAIR principles for improved identification and interoperability.
Level 4: Privacy matters, including individualized (vs aggregate) data and Indigenous data collection standards.
Level 5: FAIR principles, ensuring findability, accessibility, interoperability, and reusability.
Level 6: Statistical and quality meta-data, focusing on completeness and accuracy.
Fox, M., Gajderowicz, B., Lyu, D. (2024), A Maturity Model for Urban Dataset Meta-data. Manuscript under review.
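To make the idea of levelled meta-data maturity concrete, the sketch below assigns a dataset record the highest level for which it, and every lower level, has all required attributes. This is an illustration under stated assumptions, not the authors' model: the abstract does not say that levels are cumulative, and the attribute names in `LEVEL_ATTRIBUTES` are hypothetical stand-ins for the kinds of meta-data each level describes.

```python
# Hypothetical attribute names per level; the real model's attributes may differ.
LEVEL_ATTRIBUTES = {
    1: ("description", "temporal_coverage", "spatial_coverage"),
    2: ("access_url", "license", "contact_point"),
    3: ("documentation",),
    4: ("privacy_statement",),
    5: ("fair_assessment",),
    6: ("completeness", "accuracy"),
}


def maturity_level(metadata):
    """Return the highest consecutive level whose attributes are all present.

    Assumes levels are cumulative: a record cannot reach level N without
    satisfying levels 1..N-1. Returns 0 when level 1 is not met.
    """
    level = 0
    for n in sorted(LEVEL_ATTRIBUTES):
        required = LEVEL_ATTRIBUTES[n]
        if all(metadata.get(attr) not in (None, "") for attr in required):
            level = n
        else:
            break
    return level


# Invented sample record: satisfies levels 1 and 2, and has a stray level 6
# attribute that does not count because levels 3 to 5 are incomplete.
SAMPLE_RECORD = {
    "description": "Hourly transit ridership by stop",
    "temporal_coverage": "2019/2023",
    "spatial_coverage": "city boundary",
    "access_url": "https://example.org/ridership",
    "license": "Open licence",
    "contact_point": "data@example.org",
    "completeness": 0.97,
}

print(maturity_level(SAMPLE_RECORD))
```

Applying a function like this across a catalogue would give a distribution of maturity levels, which is one way the statistical evaluation mentioned above could be approached.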
May 31, 2024: Session H5 Open Data
Citation metrics for Open Government Geospatial Datasets
Amanda Tickner (Michigan State University Libraries)
This presentation describes the development of methods and instructions for government open data providers to find citation metrics for their provided datasets. Some example citation metrics will be presented for GIS datasets found in MI and the BTAA Geoportal, and the portal will be briefly introduced. The BTAA Geoportal (run by the Big Ten Academic Alliance Geospatial Information Network) is hoping to expand into holding some government datasets for longitudinal availability and preservation, as most government agencies replace older datasets with new ones and then remove the older datasets from places of public access. It is hoped that helping government agencies understand, through citations, where and how people are using their data may encourage them to allow the BTAA Geoportal Project to do this. Government agencies are often reluctant or ambivalent about offering GIS datasets to the public, and they may be especially reluctant to allow researchers to access the underlying data for publicly presented web maps. Helping government data creators understand the broader impacts of their offerings could encourage more sharing of data in ways that are accessible for researchers, and metrics might also be useful to justify funding for data provision to elected officials.
Enabling Cloud-Based Open Science using Earth Observation Data
Greg Yetman, CIESIN (Columbia University)
Kytt MacManus, CIESIN (Columbia University)
Linda Pistolesi, CIESIN (Columbia University)
The Socioeconomic Data and Applications Center (SEDAC), one of NASA’s Distributed Active Archive Centers (DAACs), supports the integration of socioeconomic and earth science data and serves as an "Information Gateway" between earth sciences and social sciences. SEDAC is in the process of migrating its geospatial data archive to NASA’s cloud-based Earthdata GIS, where it will be available as standards-based services ready to use in GIS with NASA earth science data from other DAACs. While access to data and services is key to enabling global system research, it is not sufficient to simply build a system and expect successful uptake: domain-based examples of data use are required to demonstrate techniques and methods to integrate local data, Earthdata GIS services, and other cloud-based data in data analysis workflows. As part of the NASA Transform to Open Science (TOPS) mission, the Science Core Heuristics for Open science Outcomes in Learning (SCHOOL) project is developing open modules in the water resources, health and air quality, environmental justice, disasters, wildfires, climate, and agriculture domains. Population and infrastructure data are an overarching theme that will be integrated in all modules. These online, open modules include sample code (Python and R) that demonstrate the data science life cycle, and will be published in both English and Spanish. The presentation will give an overview of the system architectures being implemented, describe the use cases for the domain modules, and show a completed code example.
Open Government Data (OGD) in Bangladesh perspective.
Chandra Shekhar Roy (Bangladesh Bureau of Statistics)
Md Hossain Sinzon (Labcom Technology)
Chira Rani Mondal (Birdem General Hospital)
In implementing Open Government Data (OGD), and in line with the trend towards Artificial Intelligence (AI), the Internet of Things (IoT) and the Smart Bangladesh vision, real-time data and sensor-generated data are of greater interest to users. However, current problems and technologies need to be identified so that lessons learned can inform the next implementation. New policies around OGD will focus on seven indicators: the OGD portal, engagement with the public, data quality, metadata, data utilization, organization data privacy, and data interoperability. Therefore, this paper presents a conceptual analysis of OGD and characterizes conditions under which the intended open data policy’s S.M.A.R.T. goals can be achieved. It is a way forward for more systematic data sharing for the whole of government in the future. The paper also emphasizes that the National Statistical Office (NSO) remains the government institution responsible for aggregating official data and information and for protecting the data of its citizens. Our findings suggest that smart government (e-gov to we-gov) in practice depends on the ways that social, organizational, and institutional strategies cope with technological change and become a bridge to creativity in developing a new ecosystem in the Fifth Industrial Revolution (5IR) and the digital data-driven smart society.
May 31, 2024: Session I1 Practitioners POV
Teaching for-credit data courses: RDM practitioner perspectives
Elizabeth Stregger (Mount Allison University)
Louise Gillis (Dalhousie University)
Erin MacPherson (Dalhousie University)
Data literacy is a critical skill and librarians are well positioned to teach it. Despite this, credit-based university courses are rarely taught by librarians. Part of this is practical: librarians are often full-time practitioners and academic instruction isn’t in their job descriptions. But teaching a credit course provides distinct advantages over more common one-shot instruction: instructors have the flexibility to create cohesive whole curriculums, to delve deeply into data topics, to upskill and reignite interest in data topics, to connect with researchers and students in new ways and new contexts, and to explore a world of freely available open access learning materials shared by the international RDM community. This session relates the experiences of librarians teaching credit-bearing data literacy courses at two Canadian universities. They will discuss how they mapped data literacies to the course, their choice of platforms, and how their professional practice has informed their teaching and vice versa. All instructors used open source, community-developed resources and drew from their real-life experience to create engaging learning experiences to help their students learn and get excited about data! In this panel, we will also share some lessons learned and our future plans to chart a new course for the next generation of data professionals.
May 31, 2024: Session I2 Collaboration and Governance
Venturing Beyond our Silos: Results from a Survey of Canadian Data Repository Administrators
Alisa Beth Rod (McGill University)
Meghan Goodchild (Queen's University / Borealis)
The academic library community has a long history of collaboration demonstrated by the global adoption of common standards in our work. Despite the widespread use of the same standards and platforms, too few examples exist of academic libraries navigating beyond silos to shared infrastructure and services. As the scope of academic libraries grows to support research data as part of the scholarly resources that we steward, is it possible to develop shared research data management (RDM) services and infrastructures collaboratively? Six years have passed since a recommendation was made by [country’s] national research library association to establish a national data repository that would provide a robust, scalable, and affordable shared service. Building a national repository together would harness available but limited and distributed expertise in RDM, and encourage the collective creation and reuse of materials supporting training, user support, and outreach. In 2019, [Data Repository] began formally offering the service nationally. Governed in partnership with regional academic library consortia, it now supports over 70 [Country] institutions, each managing a locally-branded collection and providing local support to researchers. Has this push for greater equity been realized by [Data Repository] and what are the experiences of its institutional administrators? This presentation reports the results of a community-led survey of [Country] [Data Repository]’s administrators, focusing on their individual perspectives on challenges, barriers, and needs of the emerging community. Overall, the results of the survey highlight a unique effort to build equitable data sharing infrastructure that is national in scope and reflective of community needs. Understanding both the infrastructure demands and the needs for support from our community provides insights on how to plan future programs and software development initiatives.
Learning from our experiences and our community can benefit other national and international large-scale repository initiatives in charting the future of data.
International collaboration as a common approach to improve the FAIRness of nonnumeric/qualitative research data
Kati Mozygemba (RDC Qualiservice/ University of Bremen)
Noemi Betancort Cabrera (State and University Library Bremen (SuUB))
The future of sharing and reusing qualitative research materials will be shaped by technological innovations, ethical considerations, privacy and data security issues, collaborative efforts, and an increased focus on improving the rigor and applicability of findings derived from existing non-numeric/qualitative datasets. Due to the sensitivity and comprehensiveness of the data, the integration of (AI-based) technology in data curation, data sharing and data analysis workflows needs to be carefully addressed, as well as issues of interdisciplinary approaches. Furthermore, researchers from different disciplines will be interested in using qualitative data, and vice versa, qualitative researchers will be interested in using data from other methodological backgrounds that lend themselves to the application of qualitative analysis. In the face of these developments and challenges, collaboration between data infrastructures with expertise in the curation and provision of qualitative data is key. Networks such as QualidataNet, a community-centered federated network of RDCs, seek to share and discuss common solutions for harmonization and standardization where possible. Its focus is to provide a single point of access to qualitative datasets from different data providers and to promote the sharing and reuse of qualitative data in line with the FAIR Data Principles. QualidataNet is part of the Consortium for Social, Behavioral, Educational and Economic Sciences (KonsortSWD) at the National Research Data Infrastructure (NFDI) in Germany. It coordinates the international cooperation within the DDI via the DDI-CDI Subgroup "Non-Numeric, Non-Code Datums", which aims at a standardized and comprehensive metadata description of data objects to enable cross domain integration. 
We will present the current work, future steps, and plans of this international effort. In addition, we would like to continue exploring opportunities for international collaboration and exchange, and we invite participation with the goal of shaping FAIR ways of processing, archiving, and reusing qualitative data objects.
My proposed presentation, "Adaptive Governance for Research Data Management", will provide an overview of how to apply adaptive governance to RDM practices in libraries and repositories. Adaptive governance was originally conceptualized in the late 1900s as a strategy for resource land management and is defined as "the evolution of the rules and norms that promote the satisfaction of underlying human needs and preferences given changes in understanding, objectives, and the social, economic and environmental context" by Hatfield-Dodds, Nelson, and Cook in their paper "Adaptive Governance: An Introduction and Implications for Public Policy" (Australian Agricultural and Resource Economics Society, 2007). This concept was later adopted by data governance groups to ensure that rules and policies were adhered to while providing enough flexibility to meet the various data needs and nuances that exist across the data management landscape. In the context of research data management, adaptive governance can be applied by ensuring researchers meet university policies and funder and publisher requirements while providing flexibility to respond to researcher differences and concerns, such as data types and sharing restrictions. This presentation will cover the history and definition of adaptive governance as well as tips and examples for applying this concept to data management activities at each stage of the research data lifecycle. These include data management planning, file naming and organization, data documentation, storage location and repository selection, and preservation considerations.
Collaborations Using Common Concepts - NACDA and CLOSER
Kathryn Lavender (ICPSR)
Hayley Mills (CLOSER UK)
Jon Johnson (CLOSER UK)
The National Archive of Computerized Data on Aging (NACDA) and CLOSER, a UK partnership of longitudinal population studies (LPS), are collaborating with the aim of providing secondary data users with an effective way to view the research potential of age-related data across LPS. CLOSER and NACDA have implemented different metadata organization within their respective portals, CLOSER Discovery (discovery.closer.ac.uk) and the NACDA Colectica portal (harmonize.icpsr.umich.edu). As two independent organizations, we intend to share information and in the future create an age-specific portal to allow comparisons across international borders, making it easier for researchers to find relevant data quickly, while supporting data harmonization, accessibility, interoperability, and reuse. Since the researcher journey broadly begins at a high level, then drills down to find the most appropriate variables, we propose to create a proof of concept at a conceptual group level. The use of metadata standards and collaboration are essential for providing researchers with these infrastructures. Since both the CLOSER Discovery and NACDA portals operate with metadata documented using the DDI metadata standard, we can leverage this in an interoperable way, without duplicating effort. The presentation will set out how we have built a collaboration and will outline the likely approach to cross-country discoverability and best practice.
The ONS Longitudinal Study – opportunities for longitudinal research on the England and Wales population.
Alison Sizer (UCL)
CeLSIUS Team (UCL)
This presentation will showcase the Office for National Statistics Longitudinal Study (ONS-LS) and the opportunities that it offers for longitudinal research on the England and Wales population. The ONS Longitudinal Study follows a 1% sample of the England & Wales population from the decennial census data (1971 – 2011), linked to births, deaths and cancer registration data. Sample members are selected on the basis of four confidential birthdays, with new study members entering the study through birth on one of the four birthdays or immigration (and being born on one of the four birthdays), and leaving the study through death or emigration. The main strength of the ONS-LS is its large sample size (>1 million), making it the largest nationally representative dataset in the UK, and allowing the analysis of small areas and specific population groups. The ONS-LS currently has 46 years of follow-up data (1971 – 2017), and the upcoming linkage of the 2021 Census data to the Study will extend this follow-up to 2021, enabling researchers to examine changes that have taken place in the 2011 – 2021 period, which saw Brexit and the COVID-19 pandemic. The paper will introduce the data available in the ONS-LS and will discuss its size and scope. The support available to researchers interested in using the ONS-LS in their research is also highlighted, as are the arrangements for accessing the data. The presentation summarises some recent examples of research using the ONS-LS and highlights some key areas for future research. It concludes with a brief introduction to the ONS-LS’s sister studies, the Scottish Longitudinal Study and the Northern Ireland Longitudinal Study, and the opportunities that these studies offer for comparative research.
May 31, 2024: Session I4 Creating Partnerships and Collaborations
"Fortress-like Relationships": Trusted partnerships and NIOPA at Queen's University Belfast.
Norma Menabney (The Queen's University Belfast)
The Library at Queen’s University Belfast has created a unique open access digital archive of official publications that has an international audience: this includes material from legislatures, central government and agencies. This is called the Northern Ireland Official Publications Archive (NIOPA). Collection development guidelines and metadata capture are agreed with the British Library under contract to preserve digital works under the Legal Deposit Libraries (Non-Print Works) Regulations 2013. Digital assets are harvested from departmental, agency, Northern Ireland Assembly and other official websites and added to the Archive for long-term preservation using DSpace open-source software. Discoverability is enhanced by agreed metadata standards that are transformed from Dublin Core (DC) to MARC format and cascaded to multiple legal deposit libraries. The records are also discoverable via OCLC. Essential to the working of this Archive are fortress-like relationships with a range of partner organisations (British Library and associated Legal Deposit libraries), and stakeholders (156 official governmental bodies and universities). The relationships are foundational, predicated upon necessary and legal obligation, fundamental to future-proofing the Archive and resulting in synchronous activities and procedures. This has created an impressive cross-institutional and multi-agency model of collaboration which has been the gateway for subsequent projects, activities and investigations. This paper outlines the genesis of these relationships, how they began and fruitfully developed, and will provide a helpful case study for other organisations seeking to commence projects and cement alliances by answering the following questions: How does an organisation take the first steps to establishing collaboration? How does one align working practices? How does one secure trusted partners?
What role do contracts and legal frameworks play in a multi-modal organisational setting that crosses political and legal jurisdictions?
Charting a Course to Collaboration: The LIbrary Data Services (LIDS) Dataset
Chad Kahl, Illinois State University (Milner Library)
Joshua Newport, Illinois State University (Milner Library)
Lindsey Skaggs, Illinois State University (Milner Library)
Research data services (RDS) are expanding across college and university libraries. To better understand the current state of RDS in R1 and R2 university research libraries in the United States, how they have evolved since the onset of the COVID-19 pandemic, and who is providing these services, this research project built an interoperable dataset, the LIbrary Data Services (LIDS) dataset, to inform RDS development and assessment. The dataset records data service area(s) (e.g., Research Data Management), fifteen data service types (e.g., data management/data curation), and personnel and unit information gathered through website content analyses, alongside Carnegie Classification data. How can the data services community build on LIDS? While the focus for this research project is R1 and R2 university research libraries in the United States, similar studies have examined data services in libraries at other levels of American higher education (Radecki & Springer, 2019; Murray, et al., 2019; Yoon & Schultz, 2017), and academic and research libraries across Canada and the United States (Kouper, et al., 2017; Tenopir, et al., 2019), Europe (Tenopir, et al., 2017; Yu, 2017), Spain (Martin-Melon, et al., 2023), the United Kingdom (Cox & Pinfield, 2014), southern Africa (Chiware, 2020; Chiware & Becker, 2018), and globally (Cox, et al., 2019; Liu, et al., 2020; Reilly, 2012; Si, et al., 2019). How can we work together to change LIDS to I-LIDS, the International LIbrary Data Services dataset? Can we better reflect the post-COVID data services environment in higher education globally? This presentation will share the LIDS dataset with the international community to inform data services collaboration, service development, and assessment, and will invite attendees to consider how they might expand the dataset to cover their region of the world.
Collaborating to Create: A Team-Based, Data-Driven Approach to Developing Generative AI Educational Resources
Shelby Hallman, University of California (Los Angeles)
Renee Romero, University of California (Los Angeles)
Ashley Peterson, University of California (Los Angeles)
Hannah Sutherland, University of California (Los Angeles)
This individual presentation will describe the creation, management, and data-driven outcomes of a project team assembled in response to burgeoning interest in and concern over the use of AI tools in higher education. The presenters will discuss the process of guiding what began as an ad-hoc interest group into a large-scale, multi-faceted team to enable student learning about AI, based on gathered data. The team is co-led by two librarians, and consists of full-time library staff, instructional development staff, and student library workers. The team collaborates on a suite of projects that include: gathering and analyzing data about current use of AI on campus via a pop-up survey; an IRB survey; an interactive tutorial on the ethical use of AI in coursework; an AI tools resource guide; an external, campus workshop addressing algorithmic bias and bias in AI; and an internal, library workshop on AI in library instruction. Attendees will leave with an understanding of the project components; lessons learned; and insight on how the project outputs address diverse student perceptions and needs surrounding generative AI.
The Fashion of Collaboration in Geospatial Data Librarianship: Shaping Emerging Research Data Management Services
Madiareni Sulaiman (University College London and BRIN Indonesia)
To the best of my knowledge, there has been limited research in the context of geospatial data librarianship into the growing trends in its data services, particularly in regard to research data management services (RDMS) and collaboration trends. While geospatial data is becoming increasingly important in addressing global concerns and enabling a wide range of applications, there is still a gap in understanding how developing RDMS interact with collaborative efforts in the geospatial data librarianship area. These services include the difficult curation, preservation, and accessibility of geospatial datasets, each with its own set of spatial and temporal dimensions. Collaboration emerges as a critical feature at multiple levels within this dynamic ecology. First, interdisciplinary and multidisciplinary collaborations are becoming more important as geospatial data librarians collaborate with researchers from other professions to build tailored RDMS for specific domains. This will ensure that data management strategies fit the specific needs of each profession. Second, intra-institutional interactions become critical for efficient RDMS, mandating close collaboration between libraries, research centres, IT departments, and cartography units. Such arrangements help to ensure the seamless integration of geospatial data into the academic and research environments, as well as a uniform approach to data stewardship. Finally, cross-sectoral collaborations among academic institutions, government agencies, and industry stakeholders are critical in shaping geospatial data management policies, standards, and practices, fostering advancements in RDMS on regional, national, and international scales. Given the absence of research on this topic, an examination of these trends may reveal the diverse degrees of collaboration inherent in the evolving fashion of geospatial data librarianship as it relates to enabling user access to geospatial data.
This, in turn, will foster progress in librarianship, the geospatial sciences, and their extensive practical applications.
Building Open Science Communities: The Journal Editors Discussion Interface (JEDI)
Sebastian Karcher (Qualitative Data Repository)
Julia Bottesini (Journal Editors Discussion Interface)
The Journal Editors Discussion Interface (JEDI) is a thriving online community of 430+ current, former, and incoming social science journal editors, data professionals, and metascientists. JEDI provides an online forum that has hosted numerous discussion threads on open science and journal editing. Drawing on information and materials provided in these discussions by its members, JEDI has also begun to compile a substantial collection of open science resources directed at journal editors. The community is organized by the Data Preservation Alliance for the Social Sciences (Data-PASS), a voluntary partnership of organizations created to archive, catalog, and preserve social science research data, and has received continued funding from the NSF. In this short talk, we will first present JEDI and how it has evolved from its origins in discussions on editorial processes concerning data, code, and their management to encompass topics as diverse as citation practices, research transparency and accessibility, and peer review, among others. Then, we will share insights from JEDI’s three years of existence that we believe are applicable to other scientific communities as well: how to build and sustain community within academia, what challenges JEDI has encountered and how we have tackled them, and how open science plays a central role both as an objective and a guiding principle for JEDI.
If the map cart won't come to the library: Building a bespoke map cart for the Lloyd Reeds Map Collection
Saman Goudarzi (McMaster University)
Bronwen Glover (McMaster University)
Moving large format items always poses a challenge in collections. At McMaster University, the rare maps collection is held on a different level to the Maps Library, which makes moving maps for teaching, reference, and assessment even more challenging. Although the utilization of a map cart would be the apparent solution, such a tool is not readily accessible for procurement from library/archival supplies vendors. This is the predicament that confronted McMaster’s Map Library when looking to undertake a series of collection management projects. In this presentation, we detail the steps taken and lessons learned in creating a bespoke map cart.
Developing a Library Archive for Secure Data Storage
Elizabeth Hill (Western University)
Kristi Thompson (Western University)
In 2019, Western Libraries was asked about potential storage options for data from a major multinational study that had been stagnating on a departmental server for a number of years. The idea of a secure library data archive was floated, but despite support from Library Administration and the provision of storage space, we became mired in endless conversations about procedure, ethics and data ownership. In 2022 an additional data crisis arose: our Research Ethics Board’s former recommended option for sharing sensitive data was being decommissioned and an unknown number of potentially restricted datasets were about to become homeless. That was the push needed, and in 2023 the Library Secure Data Archive was launched. Come hear a story of data neglect, peculiar storage recommendations and The Form That Took Forever!
Doing data literacy outreach and assessment with a social justice lens
Jess Yao (Reed College)
Having the skills to find informative data when developing and answering research questions is an important aspect of data literacy, and for the social sciences, these skills need to address issues related to data about race, gender, and underrepresented populations. I want to share an effort to bring in a critical data literacy component at a small liberal arts college (Reed College) by exploring instructional opportunities and assessing student and faculty needs. We are also interested in questions about prior interest, subject relevance, and scaffolding in data instruction. Improving and integrating data literacy in curricula is vital to ensure that students with no background in data become capable of finding and interpreting the data they need in their field. Standalone units or workshops can also be effective, which in different libraries have run the gamut from special topics workshops to foundational data literacy 101s. The goal of this project is to map out and understand where among the students and programs at Reed data literacy instruction can live. Reed students are highly motivated and interested in social justice topics, which raises some questions: Would students be interested in the topic of data and social justice? Can using a social justice lens to teach about research data pique more interest? How can discussing the historical and political issues of data collection and discovery enhance students’ data literacy? What form should/could such a session take and what would the learning objectives be? Can it serve as a module for different subject areas or is subject specificity a need? This talk will discuss the use of ad hoc student outreach activities, syllabi review, and faculty surveys, and, possibly, blueprints for a workshop.
Charting a FAIR Direction for the US Government Information Ecosystem
Deborah Yun Caldwell (University of North Texas)
Lynda Kellam (University of Pennsylvania)
James R. Jacobs (Stanford University)
Shari Laster (Arizona State University)
FAIR Principles are most often associated with research data, but they have applications in other ecosystems. With the FAIR Principles as common ground, we explore the current system of government information dissemination within the United States to see how FAIR Principles can be applicable to the complex ecosystem of federal government information. Following the guidance of the U.S. Office of Science and Technology Policy (OSTP), federal agencies increasingly take into consideration FAIR Principles as part of practices to improve public access to federally-supported research data. However, information that has been produced by the government for public access also includes publications, websites, databases, and other forms of content, nearly all of which are now born-digital. Dissemination is subject to the guidelines of the Office of Management and Budget (OMB), but once it is published online, stewardship falls to a myriad of responsible agencies, depending on the specifics of the content. This leaves gaps in what is collected, described, made available, and preserved for future discovery and reuse. This talk will chart some of the boundaries and potential for future collaboration and partnership among practitioners and stakeholders.
An Analysis of RDM Job Postings in Canadian Academic Libraries
Jasmine Lefresne (University of Toronto)
Dylanne Dearborn (University of Toronto)
Lucia Costanzo (University of Guelph)
Minglu Wang (York University)
Marjorie Mitchell (University of British Columbia)
Melissa Cheung (University of Ottawa)
This talk will discuss our analysis of RDM job postings in Canadian academic libraries over the last decade. Specifically, we explored the following research questions: (1) what terminology is used; (2) what are the requirements listed; (3) what are the responsibilities and characteristics of the positions; (4) have there been changes over time; and (5) how do our findings compare to similar studies? This study was born from the desire to understand how institutions have been planning for the future of RDM support in Canada. The RDM landscape in Canada has changed significantly in the past decade. The development of the Tri-Agency RDM Policy, changes in journal/publisher requirements, and an increased emphasis on open science have changed the way researchers are expected to manage their research data and, consequently, the types and volume of support they need and that are provided by institutions. The results of this study will help the Canadian RDM community gain a deeper understanding of the role libraries play in supporting RDM and the skills and experience desired when hiring RDM professionals. The findings could also help guide professional development initiatives and could be compared to Canadian LIS curricula to uncover gaps in training for the next generation of information professionals.
Creating a podcast to engage the data professional community
Briana Wham (Penn State University)
Shannon Sheridan (Pacific Northwest National Laboratory)
Researchers increasingly need to adopt new practices to meet the requirements of funders and publishers as well as to engage in increasingly open and reproducible science. Thus, it is vital that researchers be supported in implementing good research data management practices. But what are the best ways to engage with researchers on these topics? In this lightning talk, we will describe (1) a podcast for data professionals, IDEA (Improving Data Engagement and Advocacy), (2) the authors’ process of starting and maintaining a podcast that highlights adoptable data engagement activities and reviews the literature, and (3) the benefits of developing a podcast as early career librarians.
1. Inter-university Consortium for Political and Social Research (ICPSR) [Sponsor]
Shelly Petrinko (ICPSR)
Sponsor's poster
2. Digital Research Alliance of Canada [Sponsor]
Jolinne Kearns (Digital Research Alliance of Canada)
Sponsor's poster
3. Sage [Sponsor]
Jill Blaemers (Sage)
Sponsor's poster
4. Open Science Framework [Sponsor]
Amanda Staller (Open Science Framework)
Sponsor's poster
5. Social Sciences and Humanities Research Council of Canada (SSHRC) [Sponsor]
Dominique Roche (SSHRC)
Sponsor's poster
6. Update on the Tri-Agency RDM Policy in Canada
Dominique Roche (Social Sciences and Humanities Research Council of Canada)
This poster will provide an update on the implementation of each of the three requirements of Canada’s Tri-Agency Research Data Management Policy: (1) institutional research data management strategies, (2) data management plans, and (3) data deposit. The Policy was adopted in 2021 by Canada’s three federal research funding agencies (SSHRC, CIHR, and NSERC) to support research excellence by promoting sound research data management and stewardship practices and to ensure that research is performed ethically and makes good use of public funds, that experiments and studies are replicable, and that research results are as accessible as possible. An agency representative will be present to engage with delegates and answer questions.
7. DMPTool & DMP Introduction for Subject Librarians
Meryl Brodsky (University of Texas Austin)
Grant Hardaway (University of Texas Austin)
This poster will describe a virtual one-hour hands-on training session for librarians that was designed to introduce them to the DMPTool and give them an opportunity to practice providing feedback on data management plans (DMPs). Using the DMPTool, the instructors created purposely flawed DMPs for NEH, NIH and NSF grants. Before the session, attendees were assigned one of the three DMPs to review and were expected to come to the session with talking points to improve the DMP. The DMP exercise introduced librarians to the concept of critically assessing a DMP, illustrating the value of subject librarian expertise when evaluating the document, and recommending improvements to researchers in their areas. The attendees were able to share observations, and learn together about the kinds of improvements that are often needed.
8. A Transparent and Interoperable Research Curation Workflow at NYU
Talya Cooper (New York University)
NYU IT and Data Services launched a new research repository, UltraViolet (made with InvenioRDM), in May 2022. Prior to opening the service, the curation team spent significant time planning and designing a transparent, interoperable workflow. Influenced by the collaborative curation model of the Data Curation Network, including their CURATE(D) steps, the team decided to use shared file and information management systems, with no crucial pieces of communication or tracking left in an individual’s email or personal documents. The workflow weaves together several tools, all of which were previously available and familiar to system users. Patrons begin by entering metadata about a deposit in a Qualtrics form, which feeds into Springshare’s LibAnswers portal. An available curator responds using a LibAnswers macro and supplies a Globus upload link; the researcher then adds their files. As the curator reviews the files and communicates with the researcher, they can use internal notes and templates in LibAnswers that incorporate DCN CURATE(D) checklists to ensure the curator stays attuned to crucial pieces of the curation workflow and maintains a log our whole team can access. All communication with the researcher (both emails and notes from conversations) also stays in LibAnswers, enabling other curators to pick up or refer back to the conversation. For large data, curators can also use an API to connect Globus directly to UltraViolet. This connection automatically loads data to the appropriate record in UltraViolet, transferring the files to our preservation repository while maintaining a ready-access copy. Our poster will depict this workflow and point to some future refinements and improvements we hope to make, such as connections with other InvenioRDM tools and automations.
9. Adrift in the Data LiteraSea: Are academic libraries an anchor for data literacy?
Terrence Bennett (The College of New Jersey)
Shawn Nicholson (Michigan State University Libraries)
In the Summer / Fall 2004 issue of IASSIST Quarterly, Milo Schield presented a strong case for the role of academic librarians to promote and provide instruction in data literacy, noting its interrelation with both information literacy and statistical literacy. In the ensuing 20 years, we’ve seen a proliferation of data literacy initiatives in higher education (often as a supplement to—and sometimes conflated with—the emergence of data science and data analytics as an academic discipline and an in-demand career path). Harkening back to Schield’s focus on the library, we want to explore the extent to which these burgeoning endeavors are centered in the library and staffed and maintained by librarians. Additionally, we hope to quantify, as of now, at what type of institution the library is most likely to take the lead in promoting data literacy. This poster will present initial findings from a comparison of data literacy initiatives at large research universities in North America (ARL members) with data literacy programs at undergraduate-focused liberal arts colleges in the same region. Other variables considered include the marketing and discoverability of data literacy programs; the involvement of other non-library campus units; the extent of data literacy training and instruction (including a for-credit option); and the focus and content of data literacy programs. Attendees’ reaction to these initial findings will help inform the path and direction of further lines of inquiry related to this topic.
10. Data management plus - organizing data and program files for a university-community partnership
Melanie Brasher (University of Rhode Island)
Skye Leedahl (University of Rhode Island)
Data management refers to all the small practices that make data easier to find, understand, use, and re-use (Briney, 2015). This includes documenting data, organizing files, and backing up data. This presentation will outline how we organized both research data files and accompanying digital program files from a grant-funded service-learning program, where quasi-experimental data was collected from community members and students who participated in the program over multiple years. While information science professionals increasingly have a sense of best practices for data management and archiving, less attention is paid to the organization of non-data files that complement research. A variety of digital files support the program, including written documents pertaining to program development, training materials, contracts, and recruitment of participants. In our particular case, the service-learning program did not start out with a clear plan for organizing the data or the digital files. This presentation will share how we went from ad-hoc file organization to a more streamlined system, not only utilizing practices from research data management but also integrating ideas from personal knowledge management (PKM) and business "self-help" books.
11. Integrating Open Government Data Perspectives within Data Literacy Efforts
Ben Chiewphasa (Columbia University)
This poster will cover three separate sets of learning outcomes and lesson plans or roadmaps for open government data-focused teaching instances previously piloted with Columbia University Libraries and the Minor in Data Science Program at the University of Notre Dame: (1) a semester-long credit-bearing course that combines reading discussions and engagement with data science tools; (2) an assignment related to metadata enrichment; (3) a workshop related to finding and evaluating open government data. The poster will also discuss strategies to co-define and co-evaluate the concept of “open” with students across varying topics: (1) open government policies and practices; (2) technological infrastructures that support open data; (3) ethics, privacy, and social justice. Embedding open government data perspectives in data literacy efforts, whether in a workshop or a credit-bearing course, provides a myriad of opportunities for undergraduate and graduate students to engage with the technical, legal, and ethical implications of working with civic data. In a credit-bearing course context, students can inspect the major laws and policies surrounding open government while also examining the social and technological challenges and advancements that shape the future of open data; for example, grassroots data intermediaries obtaining and “translating” open government data for a public audience. When a course is more computational or methods-oriented, open government data allows students to explore the challenges and opportunities of working with data with varying levels of complexity and “cleanliness,” signifying and clarifying the importance and utilitarian role of metadata within the data lifecycle.
12. Promoting AI and Data Literacy at an Academic Library
Katherine Koziar, University of California (Riverside)
Carrie Cruce, University of California (Riverside)
While machine learning has been used in libraries and higher education in different ways for several years, the rapid and growing availability of AI tools has brought the term AI to the forefront of many conversations. Librarians at the University of California, Riverside (UCR) Library developed a response to promote AI and data literacy both within the library and the UCR community in general. Starting by leading an ongoing internal library discussion group - AI Intersection with Higher Education and the Academic Library - we developed an informal collaborative space to share ideas, concerns, experiences, and self-educate on the impact of AI on our work. This positioned the library within the larger campus to contribute to the response to AI tools, and provided a foundation for AI literacy workshops, which focused on fundamental knowledge of AI with emphasis on consumer level tools. The co-leads were especially mindful to include ethical implications, highlighting possible benefits and harms of AI tools. This poster will describe our experiences and challenges developing these initiatives. We will also highlight feedback from colleagues and learners, and share our future plans.
13. Navigating the AI Landscape: A Comprehensive Dashboard for AI Deployment and Policy Tracking
Trevor Watkins (George Mason University)
In the rapidly evolving domain of Artificial Intelligence (AI), understanding its ubiquity and impact across various sectors is crucial. This poster introduces the "AI Ubiquity Dashboard," a first-of-its-kind tool designed to monitor and analyze the extensive deployment of AI. The dashboard offers a multi-dimensional view of AI integration, focusing on areas such as the diversity of AI applications, data collection methodologies, and the legislative landscape shaping AI usage. Key functionalities of the dashboard include tracking the deployment of AI in different states, categorizing AI applications (e.g., healthcare, finance, education), and examining data collection methods used in AI-driven software. It also keeps track of executive orders and legislation at both the state and federal levels in the United States, offering insights into how governance shapes AI deployment. Additional metrics incorporated into the dashboard address emerging areas of interest in how AI is applied. These include ethical AI usage metrics, Generative AI in education, and research metrics. The dashboard is a feature of the Cosmology of AI project, a project focused on developing and providing a friendly outward-facing visualization of the evolutionary structure of the field of AI.
14. Weaving AI to life: Physically visualizing generative AI responses to data librarian reference questions
Alexandra Wong (York University)
Ada Lovelace, often heralded as the first computer programmer, saw the potential in Charles Babbage’s Analytical Engine by comparing its possibilities beyond number-crunching with that of the mechanical loom, which used mechanical binary punchcards to weave intricately patterned fabric. Inspired by this revelation to conduct a contemporary reimagining, I propose a poster that investigates the potential and pitfalls of generative AI and then visualizes the results through a woven data physicalization. As generative AI rapidly advances, there continues to be a need to stay abreast of its applications and implications. In particular, I plan to investigate how well different generative AI products fare in answering various reference questions that a data librarian may be asked; how deep can a certain generative AI tool be asked to go, before it starts to hallucinate or lead to results not fitting the prompt? How well can it unearth datasets that meet a researcher’s very specific needs, when data can be embedded in a variety of places, some behind paywalls or with low SEO? In homage to the long lineage of technology, the results of my investigation will be visualized by creating a textile-based data physicalization, where an accurate result will be woven with one colour and an inaccurate result in another. Each woven row will represent the results of a different reference question and AI product combination; the surrounding area of the poster will contain details of the question and product combination. Thus, technology again programs the textile’s pattern.
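The row-by-row encoding described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch: the colour names, the function name `weaving_draft`, and the sample questions and products are assumptions made up for the example and do not come from the poster.

```python
from typing import List, Tuple

# Hypothetical yarn colours; the abstract only says that accurate and
# inaccurate results are woven in two different colours.
ACCURATE_COLOUR = "blue"
INACCURATE_COLOUR = "red"


def weaving_draft(results: List[Tuple[str, str, bool]]) -> List[Tuple[str, str]]:
    """Turn (question, product, is_accurate) results into (row label, yarn colour) rows."""
    rows = []
    for question, product, is_accurate in results:
        colour = ACCURATE_COLOUR if is_accurate else INACCURATE_COLOUR
        # One woven row per question and product combination.
        rows.append((f"{question} | {product}", colour))
    return rows


# Made-up example results, for illustration only.
sample = [
    ("Find county-level census data", "Product A", True),
    ("Find county-level census data", "Product B", False),
]
for label, colour in weaving_draft(sample):
    print(f"{colour:>4}  {label}")
```

Each tuple in the returned list corresponds to one woven row, with the label standing in for the details printed in the surrounding area of the poster.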
15. Initiatives of Japan Data Catalog for the Humanities and Social Sciences (JDCat): Approach for Consolidating Metadata Cross Institutions in Japan
Shuai Wang, Institute of Social Science (the University of Tokyo)
Sae Taniguchi, Institute of Social Science (the University of Tokyo)
Satoshi Miwa, Institute of Social Science (the University of Tokyo)
The Japan Data Catalog for the Humanities and Social Sciences (JDCat), which began operation in 2021, serves to improve user experience and data discoverability by enabling batch searches of metadata for data released by five major research institutions in humanities and social sciences in Japan. As part of the Program for Strengthening Data Infrastructure for the Humanities and Social Sciences, which aims to operate and disseminate JDCat, the University of Tokyo, as the core institution of the project, is working toward disseminating JDCat and promoting the usage of its data. This poster will introduce the JDCat project and provide an overview of the initiatives undertaken by the Social Science Japan Data Archive (SSJDA), which is one of the hub institutions.
16. FAIR, Ethical and Community Driven: Setting up the Dutch Thematic Digital Competence Center SSH
Nicole Emmenegger (DANS)
This poster presentation will provide an overview of the process, pitfalls and achievements so far in setting up a Thematic Digital Competence Center in the Netherlands for the Social Science and Humanities (TDCC SSH). The TDCC SSH was established in mid-2023 in the Netherlands. Through our network activity and project investments, our goal is to accomplish a substantial increase of reusable research data and software across the SSH domain(s). We’ll do this by supporting Dutch researchers and support staff to address issues related to the collection and usage of data and software. We’ll provide funding and advocate for knowledge sharing, awareness raising, digital transformation and policy development. We'll support and invest in data and software management practises that critique, challenge and transform the distribution of power in academia and society at large. On our path to achieving data justice, we will consistently engage in self-reflection, maintain transparency, and be accountable for our actions.
17. Modernizing Data-Delivery Technologies - Combining Data-User Feedback with User-Support Expertise for Successful Data Tools
Linda Detterman (ICPSR - University of Michigan)
Navigating the future of data ultimately relies on technology that assists would-be data users to discover, evaluate, and analyze data, and to do so quickly. ICPSR began sharing social sciences data electronically by FTP(!) in 1996 and via the Web in 1998; authentication technology that enabled members to download member-data directly via the web launched in 2001. In tech-time, this is ancient history! Over the last two years, ICPSR has been modernizing its data ingest and data delivery platforms. It’s not done yet, but lessons have already been learned! How does a 60+ year-old archive with a legacy of ingest, curation, dissemination, and specially-developed data technologies united together and interconnected in known and unknown ways reimagine and relaunch itself to meet research-data-sharing needs of today and of the future? The answer seems simple - talk to those "using" the data technologies. But then it gets complicated quickly. Who is a data user? Who else has key information that must be included? How does an organization combine user and user-support feedback, expert knowledge, and regulations to assist tech developers in navigating the future of research-data-sharing technology? Don’t worry - this poster isn’t going to be tech-talk! Rather, it’s going to be data-user-talk including how to identify and talk with users of data delivery technologies and importantly, about the necessity of getting direction from those interacting with or assisting data users. There will be colorful process charts, a summary of lessons-learned, and some peeks at ICPSR’s recently-launched/about-to-launch data delivery technologies! The information will be of interest to those interested in (user-friendly) data service technologies and those who help data users navigate the processes to find or deposit scientific data.
18. The Illinois Data Purchase Program: Journey's End / New Horizons
Carissa Phillips, University of Illinois (Urbana-Champaign)
The Data Purchase Program (DPP) at the University of Illinois, Urbana-Champaign (UIUC) was one of the earliest formalized programs for data collection development based on requests from researchers. Beginning in 2011, the DPP successfully made targeted acquisitions of numerical and geospatial datasets on behalf of UIUC graduate students and faculty. However, each new dataset acquisition presented its own unique challenges, and a variety of approaches for storage, discovery, and access (in accord with vendor requirements) were tested and implemented with mixed success. In addition, over the eleven years of the program, researchers' expectations regarding the DPP's budget, speed, and ongoing support began to exceed the DPP's scope, while data vendors became more sophisticated in their offerings and options. Waves of challenges--in acquiring data our researchers wanted, in maintaining the collection we already had, and in maintaining the personnel needed to support the collection--overwhelmed the DPP, and the program was discontinued in 2022. This poster will discuss the territories the DPP explored, the lessons learned, and how those lessons can inform future expeditions in data collection development, storage, discovery and access at Illinois and beyond.
19. An attempt to make open data FAIR. How SLU made data discoverable and reusable by creating the environmental data catalogue
Mikaela Asplund (Swedish University of Agricultural Sciences)
How we made data discoverable, accessible and reusable with the environmental data catalogue. Open science, including open data, is an important part of the Swedish University of Agricultural Sciences' strategy. The university also has a data management policy that states that data from research and environmental monitoring should be openly available with as few restrictions as possible in accordance with the FAIR principles. SLU's environmental data catalogue was built to enable a common and searchable web-based interface to the university's environmental monitoring data. The university already has a large amount of open research data, but finding data is a challenge as it is available in many different places such as webpages, data portals, and repositories, which of course makes it difficult for those who want to reuse data, for example for research purposes. The catalogue was launched in autumn 2022 and the ambition is to continuously collect descriptive information (metadata) on all data to provide a common entry point for data access via SLU's environmental data catalogue. So far we have received a positive response since the launch and we face the challenge of making the catalogue and the university's open data even more FAIR.
Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
Hilde Orten (SIKT)
Darren Bell (UKDS)
Documentation of research data across the research data lifecycle requires metadata at several stages. The DDI metadata standard provides several products across the data lifecycle to facilitate the capture and use of structured metadata. DDI products allow for both human- and machine-interactions with the documentation. They include DDI-Codebook for entry-level cataloguing; DDI-Lifecycle and Controlled Vocabularies for reusability and more sophisticated descriptions of surveys, questions and instruments; the XKOS extension to the Simple Knowledge Organization System for rich conceptual metadata; and the Structured Data Transformation Language (SDTL) for provenance. In addition, DDI-Cross Domain Integration (DDI-CDI) is a leading-edge product intended to fill the emerging need for integration of data from different disciplinary domains such as health or environmental sciences. The poster will show how the DDI products work together and contribute to researchers’ needs for high quality documentation and data and metadata re-use.
Mark S Fox, Urban Data Centre, School of Cities (University of Toronto)
Bart Gajderowicz, Urban Data Centre, School of Cities (University of Toronto)
Dishu Lyu, Urban Data Centre, School of Cities (University of Toronto)
Introduction: The surge in open data platforms such as CKAN and Dataserve has expanded the urban data landscape, yet data scarcity persists due to inadequate metadata, poorly tailored data presentation, and localization challenges (Ojo et al., 2016). Decentralization of repositories further complicates data discovery, and metadata inconsistencies obstruct dataset identification, comparison, and deduplication. The Canadian Urban Data Catalogue (CUDC) addresses these issues by providing a comprehensive catalogue of both accessible and restricted Canadian urban datasets and web services. It incorporates a dataset metadata maturity model that ranks datasets by metadata completeness, where higher maturity denotes greater detail. Following Fox et al. (2024), the levels assess search-relevant attributes, extending to licensing, governance, and compliance with FAIR and Indigenous data principles, ensuring a structured and mature metadata framework for catalogue entries. Methodology: The development of CUDC involves a user-centric approach, focusing on its users' practical needs and behaviours. The architecture integrates the maturity model with an advanced knowledge graph database for metadata analysis, developed as an open-source CKAN plugin that provides: 1. Cataloguing: a metamodel, extension support, upload capabilities, and API access points, ensuring accessible and transparent data access policies. 2. Search Functionality: a wide range of searchable metadata organized for easy data entry and retrieval. 3. Dataset Usage Quality: encourages comprehensive metadata provision for determining dataset applicability and relevance. 4. Search Behaviour Analysis: offers insights into dataset search models and tools, identifying key metadata across domains.
Ojo, A., Porwol, L., Waqar, M., Stasiewicz, A., Osagie, E., Hogan, M., Harney, O., and Zeleti, F. A. (2016, October). Realizing the innovation potentials from open data: Stakeholders’ perspectives on the desired affordances of open data environment. In Working Conference on Virtual Enterprises (pp. 48-59). Springer, Cham.
Fox, M., Gajderowicz, B., Lyu, D. (2024). A Maturity Model for Urban Dataset Meta-data. Manuscript under review.
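The idea of ranking datasets by metadata completeness, as described in the CUDC abstract, might be sketched as follows. This is a minimal illustration under stated assumptions: the three levels and their field names are hypothetical and are not the levels defined by Fox et al. (2024), and the code is unrelated to the actual CKAN plugin.

```python
from typing import Dict, List

# Hypothetical maturity levels: each level lists the metadata fields it requires.
# These field names are illustrative assumptions, not the CUDC model itself.
MATURITY_LEVELS: List[List[str]] = [
    ["title", "description"],            # level 1: basic search attributes
    ["publisher", "temporal_coverage"],  # level 2: provenance and coverage
    ["license", "access_policy"],        # level 3: licensing and governance
]


def maturity_level(metadata: Dict[str, str]) -> int:
    """Count how many consecutive levels, starting from level 1, are fully filled in."""
    level = 0
    for required in MATURITY_LEVELS:
        if all(metadata.get(field, "").strip() for field in required):
            level += 1
        else:
            break  # a gap at this level caps the dataset's maturity
    return level


def rank_by_maturity(datasets: Dict[str, Dict[str, str]]) -> List[str]:
    """Order dataset names from most to least complete metadata."""
    return sorted(datasets, key=lambda name: maturity_level(datasets[name]), reverse=True)
```

In this sketch a dataset cannot reach a higher level while a lower one is incomplete, which is one simple way to make "higher maturity denotes greater detail" concrete; a real model could equally weight fields or score levels independently.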
22. DDI training materials: A metadata management training resource
Hayley Mills (DDI Alliance Training Working Group)
Alina Danciu (DDI Alliance Training Working Group)
Kathryn Lavender (DDI Alliance Training Working Group)
The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. DDI is a free standard that can document and manage different stages in the research data lifecycle. The DDI training materials provide an introduction to metadata and DDI for those who are new to DDI, from the basics of ‘What is Metadata?’ to more detailed subjects like ‘Variables and the Variable Cascade’. They can be used for your own individual training to gain an understanding of different aspects of DDI, or re-used when developing training activities. The training materials were developed by the DDI Alliance Training Working Group, whose mission involves introducing people to DDI and improving people's competence in working with DDI. The resource has been published in the Zenodo community DDI Training materials (https://zenodo.org/communities/ddi_training_material) in the form of presentations, and includes a guide for how to use them. In addition, effort has been made to translate several presentations into French. The poster describes the DDI Training resource and how to use it for your own or others' learning.
23. Colectica Datasets for Data File Analysis, Conversions, and Archiving
Dan Smith (Colectica)
Jeremy Iverson (Colectica)
This poster introduces Colectica Datasets and a novel data and documentation methodology, DataDoc, which combines the open Parquet file format with embedded RDF metadata and seeks to address the limitations of proprietary statistical file formats, while also offering efficiency, archival, and FAIR benefits. Statistical data tools like SAS, SPSS, and Stata and programming languages such as R and Python give data users powerful capabilities for data analysis. However, these tools have limited metadata capabilities. Colectica Datasets is a new tool that takes a metadata-first approach to dataset creation, embedding DDI metadata into the data files themselves. Colectica Datasets can import SAS, SPSS, Stata, CSV, and Parquet files. Additional metadata about the dataset and its variables can be created by the user. Variable labels and value labels can be specified in multiple languages. The tool can then export the enhanced dataset to SPSS, Stata, CSV, Parquet, or Excel, and create summary statistics, weighted statistics, and codebook documentation. Users can run quality checks on their dataset which show alerts where additional metadata would be useful. In addition to embedding enriched metadata, Colectica Datasets can be used as a transfer tool between various statistical file formats. It automatically handles different types of missing values, format mapping, and data type detection. Colectica Datasets is free for personal use and available on Windows and macOS.
24. Broad Consent for Secondary Use of Data: Students and the Consent Process
Brian Jackson (Mount Royal University)
Participant consent for the secondary use of research data is a crucial element of the data lifecycle, and should be an important consideration in data curation and repository management practices. Canadian federal research ethics guidelines include numerous categories of information that should be provided to potential participants in order to establish that consent for unknown future uses of data, or broad consent, is informed. The volume of information conveyed during the consent process raises questions about participant engagement with and comprehension of the terms of consent and whether alternatives to written formats might be preferred by participants. To examine these questions, an anonymous survey was distributed to students at a Canadian undergraduate university. The survey was designed to explore: how deeply students engage with consent information; whether a brief video describing the procedures, benefits, and risks of open data influences decisions to opt in or out of an open dataset; and whether those decisions are consistent with behaviours toward privacy and information security in other contexts. Preliminary results demonstrate low levels of engagement with the consent process; ineffectiveness of an explanatory video; largely altruistic motivations for participation in anonymous, open research; and attitudes toward personal data protection that are highly context-dependent. Further research in this area could support improvements to the open data consent process to add clarity and increase trust in research data management and sharing processes, and inform the development of institutional and repository policies around ethical data sharing.
25. Reducing climate change through the UK Data Service
Deepti Kulkarni (UK Data Archive)
Andrea Munson (UK Data Archive)
Darren Bell (UKDS)
People across the globe have been increasingly drawn to discussions on climate change and what they can do to lower their contribution to it. The UK Data Service makes international data available which has been used to underpin research in this area. We offer access to the rich seams of International Energy Agency (IEA) data to researchers and students in higher and further education. The IEA data includes breakdowns on different energy use, including electricity, oil, coal, natural gas and renewables, alongside data on greenhouse gas emissions and world energy balances, prices and statistics.
Key data sets – https://ukdataservice.ac.uk/about/key-data-owners-and-collections/international-energy-agency/
Research and impact –
https://www.sciencedirect.com/science/article/pii/S1361920921001188?via%3Dihub
https://www.sciencedirect.com/science/article/pii/S0301479720306332?via%3Dihub
https://www.nature.com/articles/s41560-020-0579-8
https://www.sciencedirect.com/science/article/pii/S1352231020305689?via%3Dihub
https://www.sciencedirect.com/science/article/pii/S1364032118305975?via%3Dihub
26. Evolving Data Ecosystem at an Academic Library
Heather Charlotte Owen (University of Rochester)
Sarah Siddiqui (University of Rochester)
In this world where the lake of available data is becoming an ocean, libraries need to support their researchers as they navigate government mandates and the future of data. At our library, we decided it was crucial to increase our data services to better support researchers as they meet funder requirements, delve into open science, and improve data skills. This poster will depict the data service ecosystem we created, showcasing how pre-existing support, new services, and our goals for the future come together to create a web that supports the entire data lifecycle. In previous years, our data services focused on offering support in bibliometrics, open scholarship, and geospatial data. In the past year, we have expanded by creating a new data management and sharing support service and introducing new workshop offerings. These workshops range in topics from reproducibility, to metadata, AI, ArcGIS and mapping, and funder policies, culminating with a successful series on data visualizations. Finally, we supported our researchers with new software – launching a data and institutional repository and purchasing and supporting an electronic laboratory notebook. Our plan is to continue this momentum by expanding our data team with two inaugural positions, reproducibility librarian and data curator. This will help us offer tailored learning opportunities, design materials that cater to changing funder requirements, and explore additional research/data platforms. We are also building on partnerships with units across campus that work in synergistic areas including groups that specialize in graduate education, research, and open scholarship. This strategy includes bolstering our marketing efforts and increasing the visibility of our services. In addition, AI and machine learning are closely related to data, and we have a working group devising an AI recommendation for libraries. 
This poster will depict our data ecosystem, highlighting our current progress, upcoming initiatives, and future aspirations.
27. Anchoring qualitative research support: two university library case studies
Olivia Given Castello (Temple University)
Lynda Kellam (University of Pennsylvania)
Van Bich Tran (Temple University)
Qualitative research is fundamental to the social sciences but has historically sailed in the shadow of, and received less support than, quantitative research. In her pioneering study of the nascent state of qualitative supports in academic libraries, Swygart-Hobaugh (2016) even dubbed them the 'Jan Brady' of data services. As qualitative researchers, and leaders of research data services and social science support units at our respective libraries, we recognize this and believe our organizations are well positioned to support qualitative researchers throughout the research lifecycle. This poster will showcase our experiences establishing new qualitative research support services at Temple University Libraries and the University of Pennsylvania Libraries. We will discuss how we developed or restarted our services, from initial benchmarking and internal exploration to service launch, team staffing, and evaluation. Our two case studies demonstrate practical applications of suggestions made by past researchers, who have explored the visibility and accessibility of qualitative research support and the relationship between qualitative support, data literacy, and other research data services (Swygart-Hobaugh, 2016; Cain et al., 2019; Hagman & Bussell, 2022). We will describe how we acted upon their findings to enhance the discoverability of our libraries’ qualitative services, their educational impact, and their integration with other programs. This poster will yield practical insights to benefit other library, information, and data professionals launching new or developing existing qualitative research support services. We hope to inspire IASSIST 2024 attendees to consider how their organizations can navigate a research landscape where qualitative data and methods are increasingly valued.
28. The Evolution of Geospatial Data Discovery at MIT
Jennie Murack (MIT)
Daniel Sheehan (MIT)
Paxton LaJoie (MIT)
Nicholas Albaugh (MIT)
The poster will outline the changes in the technology used to house the MIT Libraries geospatial data collection over time and outline the specific criteria we evaluated as we made changes through the 21 years that we have been offering geospatial data discovery systems. We will describe our best practices for collaboration and communication between GIS and IT staff and how that helped to create the best system for our users. The MIT Libraries first created access for users to our geospatial data collection in 2002, with an ESRI geodatabase and a desktop search interface written in Visual Basic. Subsequent generations of the collection and search tool evolved from locally coded search interfaces using open source tools to OpenGeoPortal and eventually to GeoBlacklight, with MIT Libraries engineers contributing to GeoBlacklight development. Throughout all of this, our geospatial collection remained separate from the rest of the MIT Libraries collections. Because of staffing and skill changes, technical limitations, and a desire to integrate all of our collections, MIT Libraries is currently moving from GeoBlacklight to a system directly integrated with other library systems and collections. The initial change will lose spatial search capabilities but will integrate our geospatial collection with other collections and will be powered by MIT Libraries’ TIMDEX search API. We are currently planning the next steps beyond the TIMDEX text search, which will include both spatial search capabilities and the ability to examine geospatial data before downloading it.
29. Using GIS for Historical Data Rescue and Digitalization in Libraries: The 1961 & 1971 Canadian Census Subdivisions & Counties Project
Amber Leahey, Scholars Portal (University of Toronto Libraries)
The massive volumes of print publications being digitized in libraries and archives contain valuable historical government information, such as embedded facts, figures, data, tables, maps, reports, documentation, code, and more. Over the past two decades, various projects have undertaken large-scale ‘digitalization’ of historical, pre-digital Censuses of Canada. For the Census of Canada modern era, there are two main gaps that remain for the 1961 and 1971 Censuses corresponding to local governments (municipalities), counties, and equivalent units in municipally unincorporated areas. These units are called Census Subdivisions (CSDs) and are nested within Census Divisions (CDs). Extraction of census data and boundaries using geographic information systems (GIS) supports data rescue and enables broad discovery and reuse across research disciplines. This poster will outline a project to ‘digitalize’ the CSD boundaries for 1961 and 1971 using the best available reference materials and leveraging previous and newly digitized sources, including digitized print source maps, and then using GIS to draw each CSD polygon and assign it the appropriate ID code and name to match with available data sources.
30. Students and Community Partners Navigating Food Insecurity, Many Data Points at a Time: Overview of Georgia State University Library's Public Interest Data Literacy (PIDLit) Learning Lab Course
Mandy Swygart-Hobaugh (Georgia State University)
Ashley Rockwell (Georgia State University)
Halley Riley (Georgia State University)
This poster will give an overview of the two-semester experiential-learning course designed and taught in Fall 2023 and Spring 2024 by Research Data Services (RDS) faculty from the Georgia State University Library’s Public Interest Data Literacy (PIDLit) grant-funded initiative (https://lib.gsu.edu/pidlit). The "Tackling Food Insecurity" PIDLit Learning Lab connected students with partner organizations to apply data skills to address the real-world problem of food insecurity. The poster will: (1) give a brief overview of the course content and array of assignments, (2) detail the partner-driven data collection, analysis, and reporting activities in which students engaged, (3) highlight the successes, the challenges, and the lessons learned for future course offerings, and (4) facilitate discussion with poster attendees about the benefits of developing and teaching similar applied experiential-learning courses, for others who are considering doing so.
31. Expanding and Refining CAMGDP, the Central Asia and Mongolia Gender Data Portal
Ryan Womack (Rutgers University)
Aizada Arystanbek (Rutgers University)
Launched in 2023, the Central Asia and Mongolia Gender Data Portal (CAMGDP) is a data portal created to assist scholars, academics, activists, and students in finding gender-related data on Central Asia and Mongolia. We compile quantitative and qualitative sources, informational websites, media publications, social media, and organizations related to Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan, and Mongolia. The portal places a particular focus on highlighting local initiatives and grassroots organizations in an effort to give them the same credit and exposure as their international and foreign counterparts often receive in Western scholarship and media. This poster focuses on the ongoing development work on the portal, including research on Uzbekistan, refinements in metadata, and the development of resource guides, background guides, and translations into the languages of the region.
32. Progress, Not Perfection: A Workshop Series on Accessible Data Visualization
Negeen Aghassibake (University of Washington Libraries)
Data visualization can improve access to and understanding of information for a wider audience. In emergencies, charts and infographics can be used to quickly communicate information and help with decision making. Despite these benefits, inaccessible data visualizations can exclude people from getting the key information they need to understand a topic or to make decisions. Accessibility is a core part of social justice work and should continue to be a critical component of the future of data. This poster will cover a workshop series at the University of Washington that teaches attendees how to create more accessible visualizations. The workshops emphasize accessibility at the foundation of visualization creation to be more inclusive of all users, and they provide guidance on topics such as screen readers, color and vision, and more. The next stage of this workshop series is to gather feedback on the content and presentation, then hopefully make the content, including slides and activities, open to anyone who is interested in using the material to work with their own user groups.
33. Building a data-intensive social science center
Ron Borzekowski (Yale University)
Barbara Esty (Yale University)
Limor Peer (Yale University)
The newly formed Data Intensive Social Science Center (DISSC) at Yale University is a university-wide hub for data-based research and programming in the social sciences, sitting at the heart of an initiative focused on multidisciplinary social science. The Center embodies a modern version of basic university infrastructure and was established to implement the recommendation in a report to the provost by social science faculty from across the university for such a facility. DISSC addresses emerging needs common to the university’s social scientists and is a major step toward realizing one of Yale’s main university-wide academic priorities. As researchers across the social sciences continue to expand and capitalize on data use to study and address social problems, their needs for an infrastructure and support network have increased. Research support at Yale is distributed across campus by schools, departments, and centers, growing organically to meet the needs of their constituents. Over time this network has shown some gaps where disciplines or needs overlap and it could be challenging to navigate which service point is available to which researcher. This poster describes some of the activities during the first year of DISSC, including improvements to research infrastructure, data access, management, and discoverability, cross-campus collaboration, and interdisciplinary data use, which fall under several IASSIST topic areas.
34. Building an Integrated Data Training and Support Model for Graduate Students
Nick Rochlin (University of British Columbia)
Mathew Vis-Dunbar (University of British Columbia)
Graduate and undergraduate researchers often lack formal training in data, analysis, and coding, and have limited training in statistical applications. Departments, student groups, libraries, and research computing units occasionally and sporadically address these gaps, offering workshops, online learning materials, etc. While well intentioned, these can lack a connected and scaffolded pedagogical approach, in particular one that connects this work throughout the research life cycle and to broader campus computing supports. The result is often task-oriented, as opposed to conceptually oriented learning opportunities generating transferable skills that traverse specific tools. At the University of British Columbia’s Okanagan campus, the Library and Research Computing have partnered to build an array of supports – workshops, consultations, drop-ins, and online learning materials – designed to meet learners at their point of need and support them through the early stages of their research careers. Principles of RDM, transparency, and reproducibility are woven throughout, and each support is integrated with the next: learn foundational concepts for RDM, data analysis and statistical computing, as well as the tools to interface with HPC infrastructure at a workshop, book a consultation to map these to a specific project, revisit online materials for a refresher, and drop in to ask follow-up questions. This pilot project, funded through internal grants, provides experiential learning opportunities for graduate students, connects early career researchers with research support units, and bridges gaps in the transition from core subject area learning to computational approaches to expand the breadth of how research is conducted and disseminated. Our poster will discuss our experiences building relationships across research support units, how we’ve implemented our pilot, uptake across faculties, and future directions.
35. Librarian teach thyself: Exploring natural language processing tools for de-identification as window into AI for librarians
Christine Nieman Hislop, University of Maryland (Baltimore/NNLM Region 1)
Katie Pierce Farrier (University of North Texas Health Science Center/NNLM Region 3)
This poster will present a project that helped librarians learn and teach artificial intelligence (AI) tools for data de-identification. Librarians continue to play an important role in research data management services, offering crucial outreach and instruction to faculty and researchers about data tools, FAIR data, and open science. As AI tools continue to grow in use and popularity, information professionals are poised to offer guidance on data sharing in the AI landscape. Training librarians to train researchers in AI concepts and AI-based data tools helps increase librarian skills in both AI and data management. Data de-identification is a great example of a data management skill that is well suited to AI-based tools. Therefore, this one-hour webinar was created to introduce librarians and information professionals to AI and natural language processing (NLP), and provide fundamental context about how they work through the example of de-identification. Using a train-the-trainer model, this class explored openly available tools that use NLP to find and redact personally identifying information, and discussed the distinctions between privacy, de-identification, anonymization, and (USA-specific) HIPAA compliance. The removal and de-identification of personal and sensitive information is often a repetitive, time intensive process. These tools can be used to produce HIPAA compliant data that can be more safely shared and help researchers comply with data sharing policies. Teaching librarians about freely available data tools can improve their understanding of AI and expand their institution’s data service offerings. Although the workshop presented was focused on librarians, our planning process may be of interest to other trainers interested in introducing AI basics through practical tools. This presentation will describe curriculum development, highlighted tools, and results from a pilot run.
Workshop 1: Coding Qualitative Data: The Methods (using your Mind) and the Mechanics (using Taguette)
Mandy Swygart-Hobaugh (Georgia State University)
This hands-on workshop introduces participants to both the methodological knowledge and the mechanical skills to collaboratively code qualitative data. The workshop will open with a "mini-methods" talk about the broader methodological and epistemological considerations of qualitative research, including discussion of qualitative research methods, qualitative data types, and analytical coding of qualitative data [approximately 75 minutes total of the 3-hour workshop time]. Then participants will apply the mini-methods knowledge by hands-on coding an interview transcript collaboratively using Taguette, a free and open-source qualitative research tool. NOTE: This workshop is an adaptation of a curriculum unit developed for a Fall 2023 Georgia State University (GSU) experiential learning lab course – "Tackling Food Insecurity: A Public Interest Data Literacy (PIDLit) Learning Lab" – taught by members of the GSU Library’s Research Data Services (RDS) Department. Methods examples and the coding exercise data relate to individuals and communities experiencing food insecurity.
Workshop 2: Fundamentals of MAST & IDEAL Metadata
Samuel Spencer (Aristotle)
SPONSORED BY ARISTOTLE. Fundamentals of MAST & IDEAL Metadata is a short course in the MAST/IDEAL Methodology, a pragmatic framework for building understanding and organisational support for data management through good processes rather than specific tools or standards. The MAST/IDEAL Methodology is a standards-agnostic and tools-independent approach that supports existing frameworks such as the Data Management Body of Knowledge and DDI Data Lifecycle, which talk about what data practitioners must do, by providing a step-by-step guide on how practitioners can develop skills and culture to perform these tasks. This course will be delivered in a practitioner-led group session, including: (1) peer discussions on case studies highlighting existing challenges in change management and education in data governance; (2) examination of sample templates for documenting data and how to adopt these for existing projects; and (3) peer discussions on the use of MAST & IDEAL for the selection of data standards and software. Because this is an emerging framework, participants will be invited after the course to provide feedback on the methods and frameworks presented, contributing data to ongoing research into the framework. Participants will also be invited to complete an online quiz to receive a digital micro-certification to promote their own experience and knowledge.
Workshop 3: Introduction to Data Analysis with Python
Cody Hennesy (University of Minnesota, Twin Cities)
Tim Dennis (UCLA)
This six-hour Python workshop uses hands-on coding to introduce programming for library and information workers with little or no previous programming experience. The lesson utilizes open datasets to model reproducible workflows for data analysis, with a focus on helping learners apply and work with fundamental Python concepts such as data types, functions, loops, and libraries. The open-source Library Carpentry Python lesson that we’ll use to teach this workshop is currently undergoing a major redesign, and will use the JupyterLab environment along with Pandas dataframes to explore and generate descriptive statistics from a quantitative dataset of library usage data. The workshop provides a basic introduction for those working with metadata, citations, and quantitative data, and serves as a great first step for folks hoping to continue to build skills to access, clean, analyze, and visualize data with Python.
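As a rough illustration of the concepts named in this abstract (data types, functions, loops, and libraries applied to descriptive statistics), here is a minimal sketch. It is not taken from the Library Carpentry lesson, which uses Pandas in JupyterLab; this version assumes only the Python standard library so it runs without extra installs. The branch names, column names, and checkout counts are invented for illustration.

```python
import csv
import io
import statistics

# Hypothetical library usage data; every value here is invented for illustration.
SAMPLE_CSV = """branch,month,checkouts
Central,Jan,1200
Central,Feb,1100
Eastside,Jan,300
Eastside,Feb,340
Northgate,Jan,560
Northgate,Feb,610
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting checkouts to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["checkouts"] = int(row["checkouts"])  # CSV fields arrive as strings
        rows.append(row)
    return rows


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def describe_by(rows, key, field):
    """Group rows by `key` and describe the values of `field` in each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[field])
    return {name: describe(vals) for name, vals in groups.items()}


rows = load_rows(SAMPLE_CSV)
overall = describe([r["checkouts"] for r in rows])
by_branch = describe_by(rows, "branch", "checkouts")

print(overall)
for branch, stats in by_branch.items():
    print(branch, stats)
```

With Pandas, the same summary would typically be a call to `describe()` on a dataframe column, optionally after `groupby("branch")`.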
Workshop 4: Let's create data schemas!
Michelle Edwards (University of Guelph)
Carly Huitema (University of Guelph)
A key component of any research project is data, and making that data FAIR-compliant is our ultimate goal. However, many researchers today are creating minimal documentation of their data, maybe just enough to pass the requirements of the data management plan (DMP), and maybe the requirements of a data repository. In many situations, researchers may be relying on data repositories or archives to create the necessary documentation to make their data FAIR or are just not concerned about it. At the University of Guelph, we are encouraging and enabling our researchers to create data schemas at the start of their project (before data collection). The schemas are created using our Semantic Engine and the newly developed Overlays Capture Architecture. Data schemas may be stored, shared, and edited, and are all uniquely identified. Imagine a large five-year research project using the same data schemas and data entry forms across all its students and research associates. The time that can be saved when pooling all the data for analysis is immense. Imagine being able to confirm that the data you just discovered was indeed the data created by the Project Lead 10 years ago and had not been altered in any way. This workshop will walk you through the development of the Semantic Engine tools and provide hands-on experience creating your own data schemas. We will work with a sample dataset and demonstrate how we can build the layers of a schema, how they are uniquely identified, how you can share them, how you can build upon an existing schema, and how we can create a data entry Excel file for the project. We will close the workshop with a discussion about current and future development and possible uses within the broader research ecosystem. Just think how AI could use these schemas.
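One common way to give a schema an identifier that also lets you check it has not been altered is to derive the identifier from the schema's content. The sketch below illustrates that general idea only. It is not the Semantic Engine's or the Overlays Capture Architecture's actual identifier scheme, and the schema layout, field names, and function names are hypothetical.

```python
import hashlib
import json


def schema_identifier(schema):
    """Return a SHA-256 hex digest of the schema's canonical JSON form.

    Sorting keys and fixing separators means two schemas with the same
    content always serialize to the same bytes, regardless of key order.
    """
    canonical = json.dumps(
        schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unaltered(schema, expected_id):
    """Check a schema against an identifier recorded when it was published."""
    return schema_identifier(schema) == expected_id


# Hypothetical schema for a sample dataset (layout invented for illustration).
schema = {
    "name": "soil_samples",
    "attributes": {
        "plot_id": {"type": "text", "label": "Plot identifier"},
        "sample_date": {"type": "date", "label": "Date sampled"},
        "ph": {"type": "numeric", "label": "Soil pH", "unit": "pH"},
    },
}
published_id = schema_identifier(schema)

# Same content with keys in a different order: identifier is unchanged.
reordered = {"attributes": schema["attributes"], "name": schema["name"]}

# A deep copy with one edited unit: identifier no longer matches.
edited = json.loads(json.dumps(schema))
edited["attributes"]["ph"]["unit"] = "mol/L"

print(published_id)
print(is_unaltered(reordered, published_id), is_unaltered(edited, published_id))
```

Because the identifier depends only on content, anyone holding the recorded identifier can verify a copy of the schema years later without contacting the original author.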
Workshop 5: Data Curation Fundamentals
Mikala Narlock (University of Minnesota)
Sophia Lafferty-Hess (Duke University)
Erin Clary (Digital Research Alliance of Canada)
Amber Leahey (University of Toronto)
Meghan Goodchild (Queen’s University and Borealis)
Tamanna Moharana (Digital Research Alliance of Canada)
Robyn Stobbs (Athabasca University)
Data curation is a key component of the data sharing and publication process, during which data professionals review a dataset, code, and related outputs to ensure that data are findable, accessible, interoperable, and reusable (FAIR) and that ethical curation considerations are incorporated. Data curation enables data discovery and access, maintains data quality and authenticity, adds value, and provides for re-use over time through activities including open and controlled access, archiving, metadata creation, digital preservation, and file format transformation. There are many additional activities encompassed by the term data curation, which can be daunting for a novice to understand and apply in a meaningful way. The Data Curation Fundamentals training workshop, proposed for IASSIST & CARTO 2024 and co-hosted by the Data Curation Network (DCN) and the Digital Research Alliance of Canada, will provide attendees with a framework for getting started with data curation, including hands-on practical curation training using various data formats. Using the DCN CURATE(D) workflow (z.umn.edu/curate) and the Canadian bilingual (English and French) CURATION framework, attendees will learn practical curation techniques that can be applied across research disciplines. Provided by members of the DCN Education Committee, the Alliance’s Curation Expert Group (CEG), and invited speakers, this workshop will leverage active learning opportunities using example datasets as well as discussions in an inclusive peer-to-peer learning environment. This established curriculum has been used for both in-person and virtual learning opportunities, with overwhelming success. This workshop has been taught in the United States, adapted and extended for use in Canada, and we are eager to bring our curriculum to the international community.
Workshop 6: Open Data and QGIS for Strengthening Data and Geographic Competencies and Humanitarian Interventions
Ann James (George Washington University)
Key concepts and technical skills in data and geographic literacies play a vital role in the effective communication of research and analysis of humanitarian importance conducted by individuals employed in academia, nonprofits, and government teams in North America and elsewhere. Such literacies include an individual’s ability to select, clean, analyze, visualize, critique, and interpret relevant open geospatial datasets. They also include an ability to understand and critically apply the fundamental geographic principles that underlie geospatial technologies. This workshop engages open data and open source GIS software to provide participants an introduction to key concepts and technical skills in data and geographic literacies. During the workshop, participants will become familiar with key geographic concepts, especially the map, geographic place, projections, and coordinate reference systems. They will gain hands-on experience accessing, visualizing, and interpreting a geospatial dataset representing sensitive subjects of relevance to public health within the US-Mexico international border region. Through their participation, attendees may also enhance their awareness and knowledge of open educational resources, international migration as a public health issue, and opportunities for humanitarian interventions in borderlands, such as improvements in the ability to access information, clean water, food, and shelter.
Workshop 7: Research Reproducibility and Data Management with R
Briana Wham (Penn State University)
In the rapidly evolving research landscape, characterized by the increasing prevalence of open science practices, data intensive methodologies, and computational approaches, ensuring the reproducibility and transparency of research outcomes is vital. The aim of this hands-on workshop is to empower participants with the knowledge and practical skills required to implement data management practices and cultivate reproducible research practices within the RStudio environment. Leveraging the capabilities of R and RStudio, the workshop will delve into various aspects of research reproducibility and data management, emphasizing literate programming, project workflows, and an array of R packages to support data management and reproducibility efforts. Participants can expect to gain an increased understanding of how R and RStudio can serve as instrumental tools in fostering transparent, reproducible, and reusable research.
Workshop 8: Understanding Data Anonymization
Kristi Thompson (Western University (The University of Western Ontario))
Data curators should have a basic understanding of data anonymization so they can support safe sharing of sensitive data and avoid sharing data that accidentally violates confidentiality. This workshop will consist of a lecture followed by a session using R. The first half will cover the mathematical and theoretical underpinnings of guaranteed data anonymization. Topics covered include an overview of identifiers and quasi-identifiers, an introduction to k-anonymity, a look at some cases where k-anonymity breaks down, and a discussion of various enhancements of k-anonymity. The second half will walk participants through some steps to assess the disclosure risk of a dataset and anonymize it using R and the R package sdcMicro. Much of the academic material looking at data anonymization is quite abstract and aimed at computer scientists, while material aimed at data curators does not always consider recent developments. This session is intended to help bridge the gap.
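To make the k-anonymity idea concrete: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below is an illustration only. It is written in Python rather than the R and sdcMicro used in the workshop, it is not the workshop's material, and the records, field names, and helper functions are invented. It also shows one way k-anonymity can break down (a group whose members all share the same sensitive value), which is the concern that enhancements such as l-diversity address.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing quasi-identifier values."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0


def min_distinct_sensitive(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values in any group."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(v) for v in groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Replace an exact age with a range such as '30-39' (a simple recoding)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical records; all values are invented for illustration.
records = [
    {"age": 34, "postal": "N6A", "diagnosis": "asthma"},
    {"age": 36, "postal": "N6A", "diagnosis": "diabetes"},
    {"age": 38, "postal": "N6A", "diagnosis": "asthma"},
    {"age": 52, "postal": "N6G", "diagnosis": "flu"},
    {"age": 57, "postal": "N6G", "diagnosis": "flu"},
]
qi = ["age", "postal"]

# Every exact age is unique, so each record forms its own group.
k_raw = k_anonymity(records, qi)

# After recoding age into ten-year bands, records fall into two groups.
generalized = [generalize_age(r) for r in records]
k_generalized = k_anonymity(generalized, qi)

# The 50-59/N6G group is 2-anonymous, but both members share one diagnosis,
# so knowing someone is in that group reveals the diagnosis anyway.
diversity = min_distinct_sensitive(generalized, qi, "diagnosis")

print(k_raw, k_generalized, diversity)
```

In this toy data, generalizing age raises k from 1 to 2, yet the smallest number of distinct diagnoses within a group is still 1, which is why k-anonymity alone is not always sufficient.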