Validation study of an open-source LLM-enabled genetic testing recommendation pipeline in patients with rare genetic aortopathies.
This validation study assessed an open-source large language model (LLM) pipeline designed to support genetic testing recommendations for rare genetic aortopathies. The system used retrieval-augmented generation (RAG) over curated corpora and was tested in a cohort of 499 patients drawn from the Penn Medicine BioBank, comprising 250 cases and 250 controls. No comparator was reported for this evaluation.
Reported performance was consistent across metrics, with every value falling between 0.83 and 0.84. The pipeline achieved a patient-level recommendation accuracy of 0.834, with a precision of 0.835, sensitivity of 0.831, specificity of 0.836, an F1-score of 0.833, and an F3-score of 0.832. The system successfully categorized 425 of 499 patients. Neither statistical significance nor confidence intervals were reported for these outcomes.
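For readers less familiar with the F-scores quoted above, the following is a minimal sketch of the standard F-beta formula, which weights sensitivity beta times as heavily as precision (so F3 favors sensitivity over precision). It is not taken from the study's pipeline; the function name `f_beta` and its parameters are illustrative assumptions, and the only inputs used are the rounded precision and sensitivity reported here.

```python
def f_beta(precision: float, sensitivity: float, beta: float = 1.0) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and sensitivity.

    Hypothetical helper for illustration; not part of the study's code.
    """
    if precision < 0 or sensitivity < 0 or beta <= 0:
        raise ValueError("precision and sensitivity must be >= 0, beta must be > 0")
    b2 = beta * beta
    denominator = b2 * precision + sensitivity
    if denominator == 0:
        # Both inputs are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * sensitivity / denominator


if __name__ == "__main__":
    # Rounded values as reported in this summary.
    precision, sensitivity = 0.835, 0.831
    print(f"F1 = {f_beta(precision, sensitivity, beta=1.0):.3f}")
    print(f"F3 = {f_beta(precision, sensitivity, beta=3.0):.3f}")
```

Recomputing from the rounded inputs gives an F1 of about 0.833, matching the reported value, and an F3 of about 0.831, one thousandth below the reported 0.832. That small gap is presumably because the published precision and sensitivity are themselves rounded, though the summary does not confirm this.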
Safety and tolerability data were not reported in this study. One case required further clinician evaluation because of incomplete information, which points to a limitation in either the dataset or the system's handling of such scenarios. Because this was a validation study rather than a clinical trial, it cannot establish clinical efficacy or safety in routine practice.
The relevance to practice lies in a potential decision-support tool that could help clinicians recognize rare genetic disease risk earlier. However, because this was a validation study without a comparator arm or randomized design, the findings should be interpreted as preliminary evidence of technical feasibility rather than proof of clinical benefit. Clinicians should exercise caution when considering integration into standard care workflows.