AI-Based Adjudication of MACE Shows High Agreement with Human CEC in MI Trial
This study evaluated Auto-MACE, an artificial intelligence-based adjudication system for major adverse cardiovascular events (MACE), in a global randomized trial of 5,661 patients with myocardial infarction (MI) complicated by systolic dysfunction or pulmonary congestion. The underlying trial compared sacubitril/valsartan with ramipril; the primary outcome of this analysis was agreement between Auto-MACE and physician clinical events committee (CEC) adjudication. Auto-MACE uses an OpenAI o1-mini language model and a Clinical Longformer model to classify events.
For the primary analysis, Auto-MACE achieved confident adjudication for 315 of 455 deaths (69%), 301 of 659 potential MIs (46%), and 136 of 167 potential strokes (81%). Among these confident cases, agreement with CEC adjudication was high: 97% for deaths, 89% for potential MIs, and 88% for potential strokes. When considering all events (including those where the model was not confident), agreement was lower: 86% for deaths, 76% for potential MIs, and 84% for potential strokes.
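The coverage percentages above follow directly from the reported counts. As a minimal sketch, the Python below recomputes them and derives how many events would be left for human review. The pooled coverage across the three event types is a figure derived here for illustration, not a statistic reported by the study, and the names `reported` and `coverage` are hypothetical.

```python
# (confident, total) counts per event type, as reported in the text.
reported = {
    "death": (315, 455),
    "potential MI": (301, 659),
    "potential stroke": (136, 167),
}


def coverage(confident: int, total: int) -> float:
    """Fraction of events the model adjudicated confidently."""
    return confident / total


for event, (confident, total) in reported.items():
    remaining = total - confident  # events that would still need CEC review
    print(f"{event}: {coverage(confident, total):.0%} confident, "
          f"{remaining} left for human review")

# Pooled across event types (derived here for illustration, not reported).
total_confident = sum(c for c, _ in reported.values())
total_events = sum(t for _, t in reported.values())
overall_coverage = coverage(total_confident, total_events)
print(f"overall: {total_confident}/{total_events} = {overall_coverage:.0%}")
```

Potential MIs account for most of the unresolved events, which is consistent with MI being the category with the lowest confident coverage.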
Secondary outcomes included the estimated treatment effect of sacubitril/valsartan versus ramipril on the composite MACE endpoint. Using Auto-MACE, the hazard ratio was 0.91 (95% CI: 0.78-1.07), while using CEC adjudication, the hazard ratio was 0.90 (95% CI: 0.77-1.05). The point estimates differ by 0.01 and the confidence intervals overlap almost entirely, with both including 1, suggesting that AI-based adjudication could yield comparable treatment effect estimates.
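One way to sanity-check how close the two estimates are is to work on the log scale, where hazard ratio confidence intervals are typically symmetric. The sketch below assumes the reported intervals are Wald-type 95% intervals (an assumption; the text does not state how they were computed) and recovers the implied standard error of log(HR) and the log-scale midpoint of each interval. The helper names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def implied_log_se(lower: float, upper: float) -> float:
    """Standard error of log(HR) implied by a Wald-type 95% CI (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def geometric_midpoint(lower: float, upper: float) -> float:
    """Centre of the interval on the log scale, back-transformed."""
    return math.sqrt(lower * upper)


# Hazard ratios and 95% CIs as reported in the text.
estimates = {
    "Auto-MACE": (0.91, 0.78, 1.07),
    "CEC": (0.90, 0.77, 1.05),
}

for name, (hr, lower, upper) in estimates.items():
    print(f"{name}: HR {hr:.2f}, "
          f"log-scale midpoint {geometric_midpoint(lower, upper):.3f}, "
          f"implied SE(log HR) {implied_log_se(lower, upper):.3f}")
```

Because both analyses use the same patients, the two estimates are correlated, so this is a descriptive check rather than a formal test of equivalence.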
This analysis did not report safety or tolerability data, including adverse events, serious adverse events, or discontinuations. It focused solely on the performance of the AI adjudication system compared to the standard CEC process.
This is one of the first large-scale evaluations of AI-based adjudication in a cardiovascular outcomes trial. Previous studies have used traditional CEC adjudication as the gold standard, and this study attempts to validate an automated alternative against it. The high agreement for confidently adjudicated events (88% to 97%) is encouraging, but the lower agreement across all events (76% to 86%) highlights the need for human oversight.
Key methodological limitations include the absence of detail on the training data for the AI models, potential selection bias in which events were deemed confident, and the lack of prospective validation in a separate trial. The study also did not report the time or cost savings associated with Auto-MACE, which would matter for practical implementation.
Clinically, these results suggest that AI-based adjudication could reduce the workload of CECs by handling a subset of events with high confidence, while uncertain events still require human review. However, the lower agreement across all events (especially potential MIs, at 76%) means that full replacement of human adjudication is not yet supported.
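The division of labour described here can be sketched as a simple routing step. This is an illustrative sketch only: the `ModelResult` type, its `confident` flag, and the `triage` function are hypothetical, and the text does not describe how Auto-MACE determines confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical record of one model adjudication."""
    event_id: str
    label: str        # e.g. "MI" or "not MI"
    confident: bool   # assumed flag; how confidence is set is not specified


def triage(results: list[ModelResult]) -> tuple[list[ModelResult], list[ModelResult]]:
    """Split results into auto-accepted and human-review queues."""
    auto_accepted = [r for r in results if r.confident]
    needs_review = [r for r in results if not r.confident]
    return auto_accepted, needs_review


# Made-up example events, for illustration only.
batch = [
    ModelResult("E001", "MI", confident=True),
    ModelResult("E002", "not MI", confident=False),
    ModelResult("E003", "stroke", confident=True),
]
auto_accepted, needs_review = triage(batch)
print([r.event_id for r in auto_accepted])
print([r.event_id for r in needs_review])
```

Keeping the two queues separate makes it straightforward to audit a sample of auto-accepted events against CEC review, should a trial want ongoing checks on the confident subset.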
Unanswered questions include the generalizability of Auto-MACE to other trial populations, the impact on trial timelines and costs, and the optimal threshold for confident adjudication. Further research is needed to validate these findings in prospective settings and to assess the clinical and operational implications.