This study presents and evaluates ArcMAP, an ML-assisted medical concept mapping tool that integrates the BioLORD biomedical representation model into a human-in-the-loop workflow with a continuous learning pipeline. The evaluation covered the effect of continuous fine-tuning, the simulated onboarding of a new hospital, and a two-month real-world deployment using medication and laboratory test data from five UK-based NHS hospitals, with manual workflows as the comparison. Key outcomes were mapping efficiency and top-1 accuracy for laboratory test names.
The main findings indicate that, with domain-specific fine-tuning, top-1 accuracy for laboratory test names increased from 37.0% to 91.6%. When the onboarding of a new hospital was simulated, the weighted average top-1 accuracy was 73.5%. Mapping efficiency also increased compared to manual workflows, with considerable variation across individual data-mapping sessions, though effect sizes, absolute numbers, and statistical significance were not reported. These results suggest potential benefits for standardizing data and streamlining mapping workflows within the NHS.
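To make the two accuracy figures above concrete, here is a minimal sketch of how top-1 accuracy and a weighted average across sites could be computed. The abstract does not state which weights were used, so weighting each site by its number of mapped items is an assumption, and the site names and numbers below are hypothetical illustrations, not data from the study.

```python
from collections.abc import Sequence


def top1_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of items whose top-ranked suggestion equals the expert-validated code."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def weighted_average_accuracy(per_site: dict[str, tuple[float, int]]) -> float:
    """Average per-site accuracies, weighting each site by its item count.

    ASSUMPTION: weights are item counts; the source does not specify the weighting.
    """
    total_items = sum(count for _, count in per_site.values())
    if total_items == 0:
        raise ValueError("no items to average over")
    return sum(acc * count for acc, count in per_site.values()) / total_items


# Hypothetical example: (top-1 accuracy, number of mapped items) per held-out site.
sites = {"hospital_a": (0.80, 100), "hospital_b": (0.60, 300)}
overall = weighted_average_accuracy(sites)
print(f"weighted top-1 accuracy: {overall:.3f}")
```

Weighting by item count means a large site with low accuracy pulls the overall figure down more than a small site would, which is one reason a weighted average can sit well below the best single-site result.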
However, the authors point to factors that may limit generalizability: substantial heterogeneity in data collection practices and local coding schemes across healthcare providers, and substantial variability across NHS hospital systems, reflected in the lower accuracy for a newly onboarded hospital. These factors may affect the tool's applicability and performance in other contexts. The work's practical relevance is framed as potentially accelerating NHS data standardization, but cautious interpretation is warranted given the observational nature of the real-world evaluation and the lack of detailed statistical data.
Original Abstract
The increasing use of electronic health records (EHRs) for real-world evidence (RWE) studies is hindered by substantial heterogeneity in data collection practices and local coding schemes across healthcare providers. Data standardization—particularly the mapping of locally defined medical concepts to standardized vocabularies—is therefore a critical but labour-intensive step, traditionally relying on extensive manual review by clinical experts. While a range of machine-learning (ML) approaches have been proposed to support medical concept mapping, their integration into practical, end-to-end workflows and their performance under real-world conditions remain insufficiently studied. In this work, we present ArcMAP, an end-to-end application that integrates a state-of-the-art biomedical representation model (BioLORD) into a human-in-the-loop workflow designed to streamline and accelerate medical concept mapping. ArcMAP provides a graphical user interface that enables clinical experts to efficiently review, validate, and correct automated mapping suggestions. A core component of the system is a continuous learning pipeline, in which expert feedback is systematically captured and used to update the underlying model, allowing ArcMAP to adapt to evolving coding practices and newly onboarded data sources. We conduct a comprehensive evaluation of ArcMAP across multiple deployment scenarios, including the impact of continuous fine-tuning, the onboarding of a new hospital, and a longitudinal real-world evaluation conducted over a two-month period using medication and laboratory test data from five UK-based NHS hospitals. Our results demonstrate the importance of domain-specific fine-tuning, with top-1 accuracy for laboratory test names increasing from 37.0% to 91.6%. However, when simulating the onboarding of a new hospital, the system achieves a weighted average top-1 accuracy of only 73.5%, indicating substantial variability across NHS hospital systems. 
In real-world use, ArcMAP shows increased mapping efficiency compared to manual workflows, while also revealing considerable variation across individual data-mapping sessions.
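The abstract does not describe how mapping suggestions are derived from the representation model. One common approach with embedding models is nearest-neighbour retrieval by cosine similarity, sketched below under that assumption. The vectors are small hand-written stand-ins rather than real BioLORD embeddings, and `rank_candidates` and the concept names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(
    query_vec: list[float], vocabulary: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k standard concepts most similar to the local term's vector."""
    scored = [(concept, cosine(query_vec, vec)) for concept, vec in vocabulary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-in embeddings for standardized concepts (ASSUMPTION: illustrative only).
vocabulary = {
    "Haemoglobin": [0.9, 0.1, 0.0],
    "Serum sodium": [0.1, 0.9, 0.1],
    "Serum potassium": [0.0, 0.8, 0.6],
}

# Stand-in embedding for a locally defined label such as "HGB".
local_term_vec = [0.85, 0.2, 0.05]

# The top-ranked entry is the "top-1" suggestion an expert would review first.
suggestions = rank_candidates(local_term_vec, vocabulary, k=2)
for concept, score in suggestions:
    print(f"{concept}: {score:.3f}")
```

In a human-in-the-loop setting, a ranked list like this would be presented for an expert to accept or correct, and the accepted pairs could then serve as training data for later fine-tuning.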