IASSIST 2025: IASSIST at 50! Bridging oceans, harbouring data & anchoring the future


Let’s chat about data! A study of data discovery with Large Language Models

The search for reusable data for scientific work has been the subject of intensive research for some time. A fundamental insight from this work is that searching for data differs from searching for literature in several respects. Research data, unlike literature, appear in a very wide variety of file formats and differ in form and content, depending on the field of research from which they originate and the methods and instruments used to generate or collect them. Published data cannot be meaningfully indexed in their entirety by any search engine and therefore always require a textual description (metadata or documentation).

Studies show that researchers learn about reusable research data indirectly: either from literature in which the data is cited or from exchanges with other researchers, for example in their research group or at conferences. For researchers, searching for data on the web primarily works for data they already know (known-item search). They tend to learn about new data through more intensive engagement with a research topic, either by reading articles or in conversations with other researchers ("data talk").

Large Language Models (LLMs) could help mitigate problems with web searches for new data by giving researchers the opportunity to pose clarifying questions or ask for explanations. In our work, we observe data search behavior and use concurrent think-aloud to capture the thoughts and strategies of participants while they perform search tasks using an LLM. During the second of two data search tasks, we provide each participant with a prompt that instructs the LLM to act as a fellow researcher talking with the participant about their data search.

In our contribution, we will present first results from our study and derive implications for how data repositories can prepare for increasing data search via LLMs.

Anja Perry
GESIS - Leibniz Institute for the Social Sciences
Germany

Christin Kreutz
TH Mittelhessen - University of Applied Sciences
Germany

Tanja Friedrich
German Aerospace Center
Germany