Abstract
Objectives To detect unilateral vocal fold paralysis (UVFP) from voice recordings using explainable machine learning.
Study Design Retrospective case series with a control group.
Methods Patients with UVFP confirmed by endoscopic examination (N=77) and age- and sex-matched controls with normal voices (N=77) were included. Two tasks were used to elicit voice samples: reading the Rainbow Passage and sustaining phonation of the vowel /a/. The 88 features of the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) were extracted as inputs for four machine learning models of differing complexity. Training and testing were performed using bootstrapped cross-validation. SHapley Additive exPlanations (SHAP) were used to identify important features.
Results The median Area Under the Receiver Operating Characteristic Curve (ROC AUC) score ranged from 0.79 to 0.87 depending on model and task. After removing redundant features for explainability, the highest median ROC AUC score was 0.84 using only 13 features for the vowel task and 0.87 using 39 features for the reading task. Depending on model and task, the most important features included intensity measures, mean MFCC1, mean F1 amplitude and frequency, and shimmer variability.
Conclusion Using the largest dataset studying UVFP to date, we achieve high performance from just a few seconds of voice recordings while discovering which acoustic features are important across models. Notably, we demonstrate that the models use different combinations of features to achieve similar effect sizes. Overall, the categories of features related to vocal fold physiology were conserved across the models. Machine learning thus provides a mechanism to detect UVFP and to contextualize the accuracy relative to both model architecture and pathophysiology.
Level of Evidence Type 3
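To make the reported metric concrete, the following is a minimal sketch of how a median ROC AUC over repeated random train/test splits could be computed. It is not the authors' pipeline (their released code uses pydra-ml). The synthetic one-feature data, the toy "model" that only orients the feature by the training class means, the split count, and the names roc_auc and median_auc_over_splits are all assumptions made for illustration.

```python
import random
from statistics import mean, median


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc_over_splits(features, labels, n_splits=50, test_fraction=0.2, seed=0):
    """Median test-set ROC AUC over repeated stratified random splits.

    Hypothetical stand-in for bootstrapped cross-validation: the "model" is
    a single feature whose sign is chosen from the training class means.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(pos_idx)
        rng.shuffle(neg_idx)
        n_pos_test = max(1, int(len(pos_idx) * test_fraction))
        n_neg_test = max(1, int(len(neg_idx) * test_fraction))
        test = pos_idx[:n_pos_test] + neg_idx[:n_neg_test]
        train_pos = pos_idx[n_pos_test:]
        train_neg = neg_idx[n_neg_test:]
        # "Fit" on the training split only: decide which direction means UVFP.
        mean_pos = mean(features[i] for i in train_pos)
        mean_neg = mean(features[i] for i in train_neg)
        direction = 1.0 if mean_pos >= mean_neg else -1.0
        # Score the held-out split and record its ROC AUC.
        test_scores = [direction * features[i] for i in test]
        test_labels = [labels[i] for i in test]
        aucs.append(roc_auc(test_labels, test_scores))
    return median(aucs)


# Synthetic data (an assumption): 77 "patients" and 77 "controls", one feature.
data_rng = random.Random(42)
features = [data_rng.gauss(1.5, 1.0) for _ in range(77)]
features += [data_rng.gauss(0.0, 1.0) for _ in range(77)]
labels = [1] * 77 + [0] * 77

print("median ROC AUC over splits:", round(median_auc_over_splits(features, labels), 2))
```

Reporting the median over many splits, rather than a single split, gives a score that is less sensitive to which participants happen to land in the test set.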
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
DML was supported by a National Institutes of Health (NIH) training grant (NIDCD 5T32DC000038). The work was supported by a gift to the McGovern Institute for Brain Research at MIT. SSG was partially supported by NIH grant R01 EB020740 (development of pydra-ml) and P41 EB019936 (reproducible practices).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the Institutional Review Board at Massachusetts Eye and Ear Infirmary and Partners Healthcare (IRB 2019002711).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data and code have been publicly released: https://github.com/danielmlow/vfp (doi.org/10.5281/zenodo.4287654)