AI tool predicts acute leukemia subtypes with AUROC 0.94 for AML, 0.98 for promyelocytic, 0.84 for ALL in international cohort
This retrospective study assembled an international cohort of 6206 leukemia patients from 20 centers to test and refine an artificial intelligence (AI) tool that supports leukemia diagnosis using standard laboratory results. The goal was to address health disparities by improving access to diagnosis. The pretrained algorithm was first run on this diverse cohort without modification. With a confidence cutoff applied to its predictions, the 2000-fold bootstrapped area under the receiver operating characteristic curve (AUROC) was 0.94 for acute myeloid leukemia (AML), 0.98 for the promyelocytic subtype, and 0.84 for acute lymphoblastic leukemia (ALL). However, the confidence cutoff left a large share of patients without a prediction: between 70.8% and 92.5% were excluded. To make the tool more useful, the researchers improved its accuracy and robustness while maintaining generalizability, implementing an ensemble method that combines Isolation Forest and Local Outlier Factor, two outlier detection techniques. For patients who fell below the initial confidence threshold, this refinement raised the AUROC for AML from 0.72 to 0.84 on a hold-out test set. The improved model excluded only 12.1% of patients from predictions, far fewer than the 70.8% to 92.5% excluded under the confidence cutoff. According to the abstract, the algorithm was also retrained specifically for pediatric patients. Overall, the study demonstrates international testing and iterative refinement of an AI diagnostic support tool, and shows that modifications can substantially reduce the rate of excluded patients while improving performance for a subset of cases.
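To make the trade-off between a confidence cutoff and the exclusion rate concrete, the following is a minimal sketch of how a bootstrapped AUROC might be computed on the predictions that survive a cutoff. It is not the study's code. The function names, the definition of confidence as max(p, 1 - p), the percentile interval, and the synthetic data are all assumptions made for illustration; the summary does not say how the authors defined confidence or ran the bootstrap.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a random positive case
    scores higher than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def apply_confidence_cutoff(labels, scores, cutoff):
    """Keep predictions whose confidence is at least `cutoff`.

    ASSUMPTION: confidence is max(p, 1 - p) for a predicted probability p.
    Returns (kept_labels, kept_scores, excluded_fraction).
    """
    kept = [(y, s) for y, s in zip(labels, scores) if max(s, 1 - s) >= cutoff]
    excluded = 1 - len(kept) / len(labels)
    return [y for y, _ in kept], [s for _, s in kept], excluded


def bootstrap_auroc(labels, scores, n_boot=2000, seed=0):
    """Mean AUROC and a 95% percentile interval over `n_boot` resamples."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(0.025 * n_boot)]
    hi = stats[max(int(0.975 * n_boot) - 1, 0)]
    return sum(stats) / n_boot, lo, hi


if __name__ == "__main__":
    # Synthetic, purely illustrative data: 1 = disease subtype present.
    rng = random.Random(42)
    labels = [rng.randint(0, 1) for _ in range(200)]
    scores = [min(max(0.5 + (0.2 if y else -0.2) + rng.gauss(0, 0.25), 0.0), 1.0)
              for y in labels]
    ys, ss, excluded = apply_confidence_cutoff(labels, scores, cutoff=0.8)
    mean, lo, hi = bootstrap_auroc(ys, ss, n_boot=200)
    print(f"excluded {excluded:.1%}; AUROC {mean:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Raising the cutoff in a sketch like this typically raises the AUROC of the retained predictions while excluding more patients, which is the tension the study's refinement was meant to ease.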