Systematic review evaluates NLP models for triage accuracy against human triage in outpatient specialist referrals.


Frontiers in Medicine
Published April 16, 2026 · Medically reviewed April 25, 2026
Study authors: Angela Lim Fung, Mohamed Abdalla, Yu Blanche Cheng, Manar Elsayed, Dagmara Chojecki, Joseph Ross Mit…
By Dr. Lars van Dijk, PhD · Surgical, Procedural & Diagnostic

Key Takeaway

Treat NLP models for referral triage with caution: 7 of the 10 included studies reported high accuracy against human triage, but the findings still need prospective validation.

This systematic literature review and narrative synthesis examined natural language processing (NLP) models for triaging outpatient referrals. Reviewers screened 4,225 titles and abstracts, assessed 26 articles in full text, and included 10 studies for data extraction and synthesis. The population consisted of outpatient referrals to a medical or surgical specialist. The included studies spanned surgery, medical specialties, and radiology.
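The screening numbers above can be read as a funnel. The following is a minimal sketch that computes what share of records survived each stage; the function name `screening_funnel` and the stage labels are illustrative and do not come from the review itself.

```python
def screening_funnel(stages):
    """Return (name, count, share_of_previous_stage) for each screening stage."""
    rows = []
    previous = None
    for name, count in stages:
        # The first stage has no predecessor, so its share is left as None.
        share = None if previous is None else count / previous
        rows.append((name, count, share))
        previous = count
    return rows


# Counts reported in the review; the stage labels are illustrative.
stages = [
    ("titles and abstracts screened", 4225),
    ("full-text reviews", 26),
    ("studies included", 10),
]

funnel = screening_funnel(stages)
for name, count, share in funnel:
    note = "" if share is None else f" ({share:.2%} of previous stage)"
    print(f"{name}: {count}{note}")
```

Read this way, fewer than 1 in 100 screened records reached full-text review, and well under half of those were included.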

The intervention was NLP-based models for triage-related tasks: urgency prioritization, referral classification, and justification review. Human triage served as the comparator. The primary outcome was accuracy compared with manual workflows. Secondary outcomes included dataset preprocessing and augmentation, triage model performance, feasibility, and clinical applicability.
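To make "accuracy compared with human triage" concrete, here is a minimal sketch of how a model's triage labels could be scored against a human triager's labels. It is not the method of any included study. The toy labels are invented for illustration, and Cohen's kappa is shown only as one common chance-corrected agreement statistic; the review does not state which metrics each study used.

```python
from collections import Counter


def accuracy(model_labels, human_labels):
    """Fraction of referrals where the model label matches the human label."""
    if len(model_labels) != len(human_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def cohens_kappa(model_labels, human_labels):
    """Chance-corrected agreement between model and human triage labels."""
    n = len(human_labels)
    observed = accuracy(model_labels, human_labels)
    model_counts = Counter(model_labels)
    human_counts = Counter(human_labels)
    categories = set(model_counts) | set(human_counts)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(model_counts[c] * human_counts[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: eight referrals, human triage treated as the reference.
human = ["urgent", "routine", "routine", "urgent",
         "routine", "routine", "routine", "urgent"]
model = ["urgent", "routine", "routine", "routine",
         "routine", "routine", "urgent", "urgent"]

print(f"accuracy: {accuracy(model, human):.2f}")
print(f"kappa:    {cohens_kappa(model, human):.2f}")
```

On imbalanced referral data, raw accuracy can look high even when the model simply predicts the majority class, which is one reason a chance-corrected measure is often reported alongside it.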

Seven studies reported high levels of accuracy compared with manual workflows. The available evidence does not give effect sizes, absolute numbers, p-values, or confidence intervals for these accuracy metrics. Outcome measures varied across studies, which complicates direct comparison. The authors call for standardized reporting and prospective validation to confirm these findings.

Safety data were not reported in the available evidence: adverse events, serious adverse events, discontinuations, and tolerability were not described. The safety profile of these NLP-based models therefore remains undefined in this synthesis. Key limitations are the heterogeneity of outcome measures and the lack of prospective validation.

In practice, NLP shows promise in augmenting human triage of outpatient referrals to specialty care. However, clinicians should interpret these findings cautiously because of the observational nature of the included studies and the lack of safety data. Further research is required before widespread implementation can be recommended on this evidence alone.

Study Details

Study type: Cohort

Evidence: Level 3

Published: Apr 2026

Original Abstract

Background: Natural Language Processing (NLP) models show promise in enhancing interpretation and triage of outpatient referrals across diverse specialties.

Objective: To conduct a systematic literature review and narrative synthesis of recent studies that utilized NLP-based models for triage-related tasks such as urgency prioritization, referral classification, and justification review.

Methods: Medline, Embase, Web of Science, and CINAHL databases were searched for articles published up to February 17, 2024, limiting searches to the last 5 years prior to the search. All citations were imported into Covidence for duplicate removal and screening. We included studies that utilized NLP techniques to triage outpatient referrals to a specialist (medical or surgical), and included comparison to human triage. Abstracts and full texts were each screened independently by two reviewers. Data from each study were extracted independently by two reviewers using a standardized extraction form, including fields such as study design, dataset size, specialty, models tested, and outcomes reported. Results were synthesized narratively, organized by key themes focused on data, model and clinical applicability. Quality and risk of bias assessment was performed using the PROBAST-AI and Technology Readiness scales.

Results: A total of 4,225 titles and abstracts were reviewed resulting in 26 full-text reviews. A total of 10 studies were used for data extraction and synthesis. These studies spanned a wide range of medical specialties including surgery, medical specialties, and radiology. Tasks included predicting condition and priority level. Most domains were assessed as low or uncertain risk of bias. Outcome measures varied across studies, but overall, 7 studies reported high levels of accuracy compared to manual workflows. We summarized key differences in dataset preprocessing and augmentation, triage model, and feasibility and clinical applicability.

Conclusion: NLP shows promise in augmenting human triage of outpatient referrals to specialty care. To realize the full potential of NLP for triage, future work should prioritize standardized reporting and prospective validation to support safe and effective integration into healthcare systems.
