Data discovery made easy: enhancing access to the Great Britain Historical GIS via a Large Language Model
The "Data Discovery Made Easy" project (DDME) is funded by the UK Economic and Social Research Council as part of their "Future Data Services (Pilots)" programme. It is adding a prototype natural language search interface to the existing web site A Vision of Britain through Time. This is a public interface to the Great Britain Historical GIS (GBHGIS), a large Postgres/PostGIS database holding data from every British census 1801-2021, diverse other statistics including vital registrations and the farming census, and digital boundaries for most of the ever-changing reporting geographies.
We argue that existing data services have become too focused on the needs of data scientists, who can invest substantial time in learning to navigate download systems. We focus instead on mainstream social scientists and on others such as journalists and policy analysts, who are often seeking just one local time series or even a single data value. The GBHGIS holds all statistics in a single central data store, but the diversity of content and the enormous complexity of Britain's statistical geographies make data discovery challenging.
Our poster will provide an overview of the DDME project, including:
<> Our survey of user needs, focusing on social scientists whose main concerns are their own surveys or theoretical work, but who access secondary data to provide context.
<> An external review of our data model and of its compatibility with current data standards, including DDI and SDMX.
<> Our metadata editors, which allow our unique data structure to be extended more easily, without intimate knowledge of the model.
<> The new natural language search interface, which acts as a bridge between the non-specialist user and the data repository. It currently depends on Large Language Models (LLMs) from OpenAI, but is designed so that these can be replaced by a future locally hosted LLM to reduce costs.
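
One way to picture the replaceable-LLM design in the last item is a search layer that depends only on a narrow text-completion interface, so that a hosted model and a locally hosted one are interchangeable. The sketch below is illustrative only and is not the project's code: the names LanguageModel, EchoModel and SearchInterface, the prompt wording and the 'place|topic' reply format are all assumptions, and EchoModel is a stand-in that returns a canned reply rather than calling any real model.

```python
from dataclasses import dataclass
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend for illustration; returns a fixed reply.

    A real backend would call a hosted API or a locally hosted model here.
    """

    canned_reply: str

    def complete(self, prompt: str) -> str:
        return self.canned_reply


@dataclass
class SearchInterface:
    """Hypothetical search front end that knows only the LanguageModel protocol."""

    model: LanguageModel

    def interpret(self, question: str) -> dict[str, str]:
        # Assumed prompt and reply format, chosen only for this sketch.
        prompt = (
            "Extract the place and topic from this question as 'place|topic': "
            + question
        )
        reply = self.model.complete(prompt)
        place, _, topic = reply.partition("|")
        return {"place": place.strip(), "topic": topic.strip()}


# Swapping the backend does not change the search layer.
hosted = SearchInterface(EchoModel("Portsmouth | population"))
local = SearchInterface(EchoModel("Portsmouth | population"))
```

Because SearchInterface never names a vendor, replacing one backend with another means supplying a different object with a complete method; the rest of the interface is untouched.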