This research focused on developing machine learning models for real-time suicide risk detection using text from clients in asynchronous text therapy within a digital mental health setting. The specific study design and sample size were not reported in the available data. The intervention involved training models to classify risk levels as 'no risk,' 'moderate,' or 'severe,' comparing new iterations against a previously published model.
The primary outcome measured model performance using a weighted F1 score. The final multiclass model, designated as version 3.0, achieved a weighted F1 score of 0.85. This result represented an improvement over the previous model, though the previous model's score, p-values, and confidence intervals were not reported. No secondary outcomes were detailed in the findings.
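For readers unfamiliar with the metric: a weighted F1 score averages the per-class F1 scores, weighting each class by its number of true examples (its support), so performance on rarer classes such as "severe" still counts proportionally. A minimal sketch of the computation, using the paper's three risk tiers but invented toy labels:

```python
from collections import Counter

def weighted_f1(y_true, y_pred, labels):
    """Weighted F1: per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * support[label] / len(y_true)
    return total

tiers = ["no risk", "moderate", "severe"]
y_true = ["no risk", "no risk", "moderate", "severe"]   # toy data, not study data
y_pred = ["no risk", "moderate", "moderate", "severe"]
print(round(weighted_f1(y_true, y_pred, tiers), 2))  # → 0.75
```

This matches what libraries such as scikit-learn compute with `f1_score(..., average="weighted")`; the study does not specify which implementation was used.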
Safety and tolerability data were not reported, as adverse events, serious adverse events, discontinuations, and general tolerability metrics were absent from the study results. Additionally, no follow-up data regarding long-term model stability or clinical outcomes were provided. The study did not report specific limitations, funding sources, or conflicts of interest.
The practice relevance suggests these models could enhance clinical utility by helping providers prioritize urgent cases for more accurate and timely intervention. However, because this is a model development study without clinical validation or reported impact on patient outcomes, the findings should be interpreted as technical performance metrics rather than evidence of improved patient care or safety.
Original Abstract
The goal of this work was to leverage a large corpus of text-based psychotherapy data to create novel machine learning algorithms that can identify suicide risk in asynchronous text therapy. Advances in the field of natural language processing and machine learning have allowed us to include novel data sources as well as use encoding models that can represent context.
Our models utilize advanced natural language processing techniques, including fine-tuned transformer models such as RoBERTa, to classify risk. Subsequent model versions incorporated non-text data, such as demographic features and census-derived social determinants of health, to improve equitable and culturally responsive risk assessment, and added multiclass models that identify tiered levels of risk.
All new models demonstrated significant improvements over our previous model. Our final version, a multiclass model, provides a tiered system that classifies risk as "no risk," "moderate," or "severe" (weighted F1 of 0.85). This tiered approach enhances clinical utility by allowing providers to quickly prioritize the most urgent cases, ensuring a more accurate and timely intervention for clients in need.
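One way the tiered output could support prioritization is by sorting incoming messages by predicted tier and model confidence. The sketch below is purely illustrative: the tier names come from the abstract, but the function names, probability values, and message IDs are invented, and the study does not describe its actual triage logic.

```python
# Hypothetical triage sketch built on the paper's three risk tiers.
TIERS = ("no risk", "moderate", "severe")
PRIORITY = {"severe": 0, "moderate": 1, "no risk": 2}  # lower sorts first

def predicted_tier(probs):
    """Pick the tier with the highest predicted probability."""
    return max(TIERS, key=lambda t: probs[t])

def triage_queue(predictions):
    """Order (message_id, probs) pairs: severe first, then moderate, then
    no risk; within a tier, higher predicted-class confidence comes first."""
    return sorted(
        predictions,
        key=lambda item: (PRIORITY[predicted_tier(item[1])],
                          -item[1][predicted_tier(item[1])]),
    )

# Invented example messages with model probabilities per tier.
queue = triage_queue([
    ("msg_a", {"no risk": 0.90, "moderate": 0.05, "severe": 0.05}),
    ("msg_b", {"no risk": 0.10, "moderate": 0.20, "severe": 0.70}),
    ("msg_c", {"no risk": 0.20, "moderate": 0.70, "severe": 0.10}),
])
print([msg_id for msg_id, _ in queue])  # → ['msg_b', 'msg_c', 'msg_a']
```

In practice a deployment would likely add confidence thresholds and human review rather than rely on argmax alone; the abstract does not report such details.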
Author Summary

Suicide is a major public health concern, and traditional methods for assessing risk in clinical settings have serious limitations, often failing to capture risk in real time. To address this, we developed a series of new machine learning models to automatically and accurately detect suicide risk from the text of therapy messages. By training these models on a large, unique dataset of de-identified clinical transcripts, we were able to move beyond simple keyword spotting to a more contextual interpretation of a client's language. The resulting models showed vast improvements over our previously published model. This is critical both for catching as much risk as possible and for reducing "alert fatigue" for our clinicians by reducing the number of false alarms raised by the model. Furthermore, our final model, v3.0, introduced a tiered system that classifies text as "no risk," "moderate," or "severe." This allows clinicians to quickly prioritize the most urgent cases, ensuring a more accurate, equitable, and timely intervention for the clients who need it most.