Researchers conducted an evaluation study to see whether artificial intelligence (AI) language models could help assess psychosis risk. They used 11 different open-source AI models to analyze transcripts from 678 clinical interviews with 373 people, most of whom were considered at clinical high risk for psychosis. The goal was to determine whether the AI could accurately classify risk status and score symptom severity and frequency, comparing its outputs to ratings made by human researchers.
The study found that larger AI models performed best. One model, Llama-3.3-70B, correctly classified risk status 80% of the time, and the symptom scores generated by the AI showed good agreement with those given by human researchers. The AI summaries were mostly faithful to the interview content, fabricating clinically important details at a rate of only 3%. However, the AI's errors were mostly errors of 'over-pathologization': it tended to label normal human experiences as potential symptoms.
This research is an early step in exploring how AI might assist in mental health screening. The study did not report safety concerns because it analyzed existing transcripts rather than running a live clinical trial. The main reason for caution is that this technology is not yet proven for real-world use, and performance also varied somewhat across research sites. Readers should understand that this is a promising but preliminary finding: it suggests AI could one day help clinicians by quickly reviewing interview notes, but it is not a replacement for professional assessment and is far from being a standard tool.