Methodological inconsistencies and data imbalances hinder the clinical application of machine learning for voice disorder classification
This scoping review synthesized evidence from 80 studies to evaluate the current state of machine learning (ML) techniques for multi-class voice disorder classification, including how well different ML models identify and categorize types of voice disorders.
The authors identified significant methodological variations, including inconsistent selection of databases, varying diagnostic labels, and diverse types of input data such as demographic questionnaires and specific voice tasks. These inconsistencies, along with issues like class imbalance and inadequate test set sizes, prevent robust comparisons between models and hinder the identification of state-of-the-art solutions.
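To illustrate why class imbalance and small test sets make reported results hard to compare, here is a minimal sketch in Python. The class names, counts, and predictions are hypothetical and are not taken from the reviewed studies. The two metrics shown, plain accuracy and balanced accuracy (the mean of per-class recall), are standard choices used here only for illustration, not the metrics the review recommends.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / n for c, n in support.items()) / len(support)

# Hypothetical imbalanced test set (assumed labels and counts).
y_true = ["healthy"] * 90 + ["nodules"] * 6 + ["paralysis"] * 4

# A trivial classifier that always predicts the majority class.
y_pred = ["healthy"] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.3f} balanced_accuracy={bal:.3f}")
```

In this made-up example the trivial classifier scores 0.900 on accuracy but only 0.333 on balanced accuracy, so two studies reporting different metrics on differently balanced data are not directly comparable. The smallest class also has only four test samples, which means a single misclassified recording shifts that class's recall by 25 percentage points.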
Furthermore, the lack of consensus on automated classification pipelines, specifically on how disorders are selected and how performance is measured, limits the ability of these models to generalize across clinical settings. These barriers must be addressed before ML-based voice analysis can reliably serve as a biomarker for systemic diseases. Until then, clinical application remains limited by these technical and methodological gaps.