Optimising the UK Longitudinal Linkage Collaboration researcher journey through the development of interoperable data discoverability, data documentation and data access systems
The UK Longitudinal Linkage Collaboration (LLC) is the national Trusted Research Environment (TRE) for the UK’s longitudinal research community. LLC integrates data from many UK Longitudinal Population Studies (LPS) and systematically links participants’ health, environmental and non-health socio-economic records into a centralised TRE.
Co-locating datasets from many LPS alongside linked routine records yields a highly diverse UK-wide sample, increases overall statistical power to investigate ‘rare’ exposures and outcomes, and includes seldom-heard population sub-groups. However, this breadth of data creates a substantial data discovery, selection and inference challenge. To enable LLC to support its users effectively, we have developed a multi-layered FAIR (findable, accessible, interoperable, reusable) system that optimises the researcher journey and helps users identify and understand the data they need for their research.
LLC collates its internal metadata, draws further metadata from data owners and related metadata infrastructures via Application Programming Interfaces (APIs), and surfaces the combined metadata within the components of the system. First, LLC Explore (https://explore.ukllc.ac.uk/), a web-based data discoverability tool, provides search functionality with advanced filtering, now being integrated with a large language model to enhance performance, so that researchers can identify the data items most suited to their research question and build a data request. Second, LLC Guidebook (https://guidebook.ukllc.ac.uk/) contains the documentation and metrics needed to understand the provenance of the data and how these data have been affected by the linkage and de-identification processes. Third, integration of LLC Explore’s data request with the bespoke LLC application management system facilitates rapid application review with automated project-level data provision. Finally, our GitHub repositories and associated processes support users to deposit and document reusable research resources (e.g. syntax and code lists) in a community archive.
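The collate, search and request steps described above can be illustrated with a minimal Python sketch. It is not LLC's implementation: the `MetadataRecord` fields, the function names and the study and dataset identifiers are all hypothetical, and the metadata feeds are plain in-memory lists standing in for whatever the real APIs return.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """One dataset-level metadata entry (hypothetical fields, for illustration)."""
    source: str       # contributing study or linked-record provider
    dataset: str      # dataset identifier within that source
    description: str  # free-text description used for keyword search
    topics: frozenset = field(default_factory=frozenset)


def collate(*feeds):
    """Merge several metadata feeds into one catalogue keyed by (source, dataset).

    If two feeds describe the same dataset, the later feed wins.
    """
    catalogue = {}
    for feed in feeds:
        for record in feed:
            catalogue[(record.source, record.dataset)] = record
    return catalogue


def search(catalogue, keyword, topic=None):
    """Case-insensitive keyword match on the description, with an optional topic filter."""
    keyword = keyword.lower()
    hits = [
        record for record in catalogue.values()
        if keyword in record.description.lower()
        and (topic is None or topic in record.topics)
    ]
    return sorted(hits, key=lambda r: (r.source, r.dataset))


def build_request(records):
    """Turn the selected records into a simple list of requested datasets."""
    return [f"{r.source}/{r.dataset}" for r in records]


# Hypothetical feeds: one internal, one drawn from external metadata sources.
internal_feed = [
    MetadataRecord("STUDY_A", "wave1_survey",
                   "Baseline questionnaire on smoking and diet",
                   frozenset({"lifestyle"})),
]
external_feed = [
    MetadataRecord("LINKED_HEALTH", "admissions",
                   "Hospital admissions records",
                   frozenset({"health"})),
    MetadataRecord("STUDY_B", "smoking_history",
                   "Self-reported smoking history",
                   frozenset({"lifestyle", "health"})),
]

catalogue = collate(internal_feed, external_feed)
all_smoking = build_request(search(catalogue, "smoking"))
health_smoking = build_request(search(catalogue, "smoking", topic="health"))
```

Here `all_smoking` lists both smoking-related datasets, while `health_smoking` narrows the request to the one also tagged with the `health` topic. A real discovery tool would add ranking, pagination and richer filters; the sketch only shows how collated metadata can drive a filtered data request.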
This multi-layered system is designed to provide a high-quality user experience, whilst maximising the value of existing LPS infrastructure initiatives.