Systematic review compares the accuracy of NLP-based triage models with human triage of outpatient specialist referrals.
This systematic literature review and narrative synthesis examined the utility of natural language processing (NLP) for triaging outpatient referrals. Of 4,225 titles and abstracts screened, 26 underwent full-text review and 10 studies were included in data extraction and synthesis. The population consisted of outpatient referrals to a specialist in medical or surgical contexts. The settings spanned diverse specialties, including surgery, medical specialties, and radiology.
The intervention was NLP-based models for triage-related tasks, specifically urgency prioritization, referral classification, and justification review. The comparator was human triage. The primary outcome was accuracy relative to manual workflows. Secondary outcomes included dataset preprocessing and augmentation, triage model performance, feasibility, and clinical applicability.
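As an illustration of the primary outcome, accuracy against human triage can be read as the share of referrals on which the model's label matches the human label. The sketch below is a minimal Python example under that assumption. The function name `triage_accuracy` and the urgency labels are hypothetical and are not drawn from the included studies, whose outcome measures varied.

```python
def triage_accuracy(model_labels, human_labels):
    """Fraction of referrals where the model label equals the human triage label.

    Assumption: human triage is treated as the reference standard.
    """
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("at least one referral is required")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical urgency labels for five referrals (illustrative only, not study data).
human = ["urgent", "routine", "routine", "urgent", "routine"]
model = ["urgent", "routine", "urgent", "urgent", "routine"]

accuracy = triage_accuracy(model, human)
print(f"Accuracy vs. human triage: {accuracy:.2f}")
```

Treating human triage as the reference standard mirrors the comparator described above, but it also means any errors in human triage are counted against the model.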
Regarding main results, 7 of the 10 included studies reported high levels of accuracy. The available evidence does not provide specific effect sizes, absolute numbers, p-values, or confidence intervals for these accuracy metrics. Outcome measures varied across studies, which complicates direct comparison. The review notes a need for standardized reporting and prospective validation to confirm these findings.
Safety data were not reported: the available evidence contains no information on adverse events, serious adverse events, discontinuations, or tolerability. Consequently, the safety profile of these NLP-based models remains undefined in this synthesis. The stated limitations are the heterogeneity of outcome measures and the lack of prospective validation.
In terms of practice relevance, NLP shows promise for augmenting human triage of outpatient referrals to specialty care. However, clinicians should interpret these findings cautiously because the included studies were observational and safety data are lacking. Further research is required before widespread implementation can be recommended on this evidence alone.