Metadata Augmentation for Social Science Datasets Using Generative AI
Curating metadata with controlled terminology is a critical yet time-consuming task in social science data management. Data depositors often provide insufficient metadata, so data repository staff must enhance it extensively. Traditionally, this means navigating a wide array of controlled terms, which demands substantial time and expertise and sometimes requires creating new terms.
To address these challenges, we introduce a tool built on generative AI technology (ChatGPT). It is designed to significantly reduce the time data repository staff spend on metadata curation while improving the accuracy of term matching. It does so by rapidly analyzing text and identifying pertinent keywords from established thesauri, including those of ICPSR, ELSST, and the Library of Congress, supplemented by ChatGPT's own recommendations. This approach both speeds up curation and improves the precision and recall of the resulting term matches.
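The term-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes candidate keywords have already been produced (for example, by a language model), and it maps them to controlled terms with fuzzy string matching from Python's standard library. The vocabulary, the function names `match_terms` and `precision_recall`, and the 0.8 similarity cutoff are all hypothetical choices made for illustration.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary: illustrative terms only, not taken from
# the actual ICPSR, ELSST, or Library of Congress thesauri.
THESAURUS = ["unemployment", "income inequality", "voting behavior", "public health"]


def match_terms(candidates, thesaurus, cutoff=0.8):
    """Map free-text candidate keywords to their closest controlled terms.

    Returns (matched, unmatched): a dict from candidate to controlled term,
    and a list of candidates with no sufficiently similar term. Unmatched
    candidates are the ones a curator might review as possible new terms.
    """
    lookup = {term.lower(): term for term in thesaurus}
    matched, unmatched = {}, []
    for candidate in candidates:
        hits = get_close_matches(
            candidate.lower().strip(), list(lookup), n=1, cutoff=cutoff
        )
        if hits:
            matched[candidate] = lookup[hits[0]]
        else:
            unmatched.append(candidate)
    return matched, unmatched


def precision_recall(suggested, relevant):
    """Compute precision and recall of suggested terms against curator-approved terms."""
    suggested, relevant = set(suggested), set(relevant)
    true_positives = len(suggested & relevant)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed input: keywords suggested for a dataset description.
candidates = ["Unemployment", "voting behaviour", "climate anxiety"]
matched, unmatched = match_terms(candidates, THESAURUS)
print(matched)
print(unmatched)
```

Comparing the matched terms with a curator's own selection via `precision_recall` gives one simple way to quantify the precision and recall mentioned above; a production system would likely need synonym handling and semantic matching beyond string similarity.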