
Methodological inconsistencies and data imbalances hinder the clinical application of machine learning for voice disorder classification

Key Takeaway
Inconsistent data labeling and testing methods currently limit the clinical reliability of machine learning for voice disorder classification.

This scoping review screened 10,401 articles on voice disorder classification and synthesized evidence from the 80 studies that applied machine learning (ML) to multi-class classification. It examined the disorder classes, databases, input data, vocal tasks, diagnostic labels, and ML techniques used across these studies.

The authors identified substantial methodological variation across studies, including inconsistent choice of databases, differing diagnostic labels, and different amounts and types of input data, such as voice tasks and demographic questionnaires. Together with class imbalance and differing testing methods (test sets of different types and sizes, or cross-validation instead of a held-out test set), these inconsistencies prevent robust comparisons between models and hinder the identification of state-of-the-art solutions.
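The review does not prescribe a particular performance metric, but a small hypothetical example shows why class imbalance complicates comparisons. The sketch assumes a made-up three-class test set with 70, 20, and 10 samples (the labels `class_a`, `class_b`, and `class_c` are placeholders, not classes from the review) and a trivial model that always predicts the most frequent class. Plain accuracy rewards that model, while balanced accuracy (the mean of per-class recall, used here as one common alternative) exposes it.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (macro-averaged recall)."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 70 / 20 / 10 samples across three placeholder classes.
y_true = ["class_a"] * 70 + ["class_b"] * 20 + ["class_c"] * 10

# A trivial "model" that always predicts the most frequent class in the data.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(f"accuracy          = {accuracy(y_true, y_pred):.3f}")           # 0.700
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")  # 0.333
```

With this split, the majority-class baseline reaches 0.700 accuracy but only 0.333 balanced accuracy, which is one reason results reported with different metrics on differently balanced datasets are hard to compare directly.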

Furthermore, the lack of consensus across the automated classification pipeline, from deciding which disorders to classify to constructing test sets and measuring performance, limits how well these models generalize across clinical settings. The authors conclude that these barriers must be addressed before voice can be used reliably as a biomarker of other systemic diseases; for now, clinical application remains limited by these methodological gaps.
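Test set size affects how much a reported score can be trusted. As a hedged illustration, assuming hypothetical results of 40 correct out of 50 test samples and 400 correct out of 500 (both 80% accuracy), the sketch computes a 95% Wilson score confidence interval for each. The Wilson interval is a standard choice for a proportion and is used here as an assumption; the review does not specify how studies should quantify uncertainty, and `wilson_interval` is a hypothetical helper written for this example.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion such as accuracy (z=1.96 gives ~95%)."""
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical: the same 80% accuracy measured on a small and a larger test set.
for correct, n in ((40, 50), (400, 500)):
    lo, hi = wilson_interval(correct, n)
    print(f"n={n:4d}: accuracy {correct / n:.2f}, 95% CI [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

The interval from 50 samples (about 0.67 to 0.89) is roughly three times as wide as the one from 500 (about 0.76 to 0.83), so two studies reporting the same accuracy on very different test set sizes are not equally informative.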

Study Details

Study type: Guideline
Evidence: Level 5
Published: Jun 2026
Original Abstract
This review aims to identify the key barriers to the clinical application of Machine Learning (ML) in multi-class voice disorder classification. A comprehensive scoping review of research published between 2013 and May 2025 in seven clinical and engineering databases was conducted. Articles that applied ML techniques to classify voice disorders were examined, excluding publications limited to binary classification (e.g., healthy vs. pathological). Data were extracted from the included articles to analyze patterns in the specific voice disorder classification classes, database selection, input data attributes, vocal tasks, diagnostic labelling, and the applied ML classification techniques.

In total, 10,401 articles that addressed voice disorder classification were screened, of which 80 used ML techniques for multi-class classification. Results revealed considerable variation in the selection of databases, voice disorder diagnostic labels, the amount and type of input data (e.g., voice tasks and demographics questionnaire), and classification techniques. These inconsistencies prevent robust comparisons and therefore the identification of state-of-the-art solutions, which would typically mature into clinical applications. Variations in classification tasks make it difficult to compare results across studies. The inconsistency found in class imbalance, sample size, and the total number of classes investigated means there is no baseline for comparing and exploring various classification techniques. Finally, variations in testing methods, such as using different test set types and sizes or using cross-validation, limit comparisons across articles.

This review identified considerable variations in the diagnostic labels associated with voice disorder classification, data availability per selected label, and testing methodology. Such variation limits comparability and undermines the generalization of ML models. The lack of consensus across the automated classification pipeline, from selecting which disorders should be classified using ML systems to constructing test sets and measuring performance, is likely to be among the critical barriers to clinical application. These barriers must be addressed to realise the potential for using voice as a biomarker of other systemic diseases.