Metadata Ahoy! Charting a reusable path for machine learning

Machine learning (ML) is more popular than ever, but what is needed to best document, curate, and archive ML research outputs? Data curators are largely in uncharted waters when it comes to how well repositories can manage ML objects and components (data, code, parameters, documentation, etc.) in a way that matches researcher needs and uses. But before we can plot a course towards a set of best practices, we must first ask: where are we now?

This presentation will provide an overview of a recent research project that assessed how well metadata schemas and fields in eight generalist (Figshare, Zenodo, Harvard Dataverse, etc.) and specialist repositories facilitate findability, interoperability, and reusability of ML objects. We will discuss strengths of and opportunities for these repositories, and what generalist repositories can learn from specialist repositories and vice versa. The presentation will also summarize the outputs from this project, all of which are publicly available: a multi-repository metadata field crosswalk, complete metadata exports of nearly 20,000 ML-related items from these repositories, and a user interface and code to query repository APIs and to standardize and analyze metadata exports. We hope the IASSIST community will dive deep into this bounty of (meta)data!

Stephanie Labou
University of California San Diego
United States

Abigail Pennington
University of California San Diego
United States

Ho Jung Yoo
University of California San Diego
United States