PT - JOURNAL ARTICLE
AU - Favaro, Anna
AU - Tsai, Yi-Ting
AU - Butala, Ankur
AU - Thebaud, Thomas
AU - Villalba, Jesús
AU - Dehak, Najim
AU - Moro-Velázquez, Laureano
TI - Interpretable Speech Features vs. DNN Embeddings: What to Use in the Automatic Assessment of Parkinson’s Disease in Multi-lingual Scenarios
AID - 10.1101/2023.05.29.23290697
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.05.29.23290697
4099 - http://medrxiv.org/content/early/2023/06/03/2023.05.29.23290697.short
4100 - http://medrxiv.org/content/early/2023/06/03/2023.05.29.23290697.full
AB - Individuals with Parkinson’s disease (PD) develop speech impairments that deteriorate their communication capabilities. Speech-based approaches for PD assessment rely on feature extraction for automatic classification or detection. It is desirable for these features to be interpretable to facilitate their development as diagnostic tools in clinical environments. However, many studies propose detection techniques based on non-interpretable embeddings from Deep Neural Networks since these provide high detection accuracy, and do not compare them with the performance of interpretable features for the same task. The goal of this work was twofold: providing a systematic comparison between the predictive capabilities of models based on interpretable and non-interpretable features and exploring the language robustness of the features themselves. As interpretable features, prosodic, linguistic, and cognitive descriptors were employed. As non-interpretable features, x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used. To the best of our knowledge, this is the first study applying TRILLsson and HuBERT to PD detection. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted on six data sets. These contain speech recordings from different languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech.
For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, the corresponding means were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Models based on non-interpretable features thus outperformed interpretable ones, especially in cross-lingual experiments. Among the non-interpretable features used, TRILLsson provided the most stable and accurate results across tasks and data sets. At the same time, both types of features showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the evolution and the possible deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.

Highlights: Both interpretable and non-interpretable features displayed robust behaviors. Models based on non-interpretable features outperformed interpretable ones. Interpretable feature-based models provide insights into speech and language deterioration. Non-interpretable feature-based models can be used to achieve higher detection accuracy.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: This work was funded in part by the Richman Family Precision Medicine Center of Excellence Venture Discovery Fund and Consolidated Anti Aging Foundation.

Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The authors of this study collected a data set called NeuroLogical Signals (NLS) at Johns Hopkins Medicine (JHM).
The participants were categorized as either having a neurological disorder or being healthy controls and received treatment and diagnosis from expert neurologists at JHM. All participants provided informed consent, and the data collection was approved by the Johns Hopkins Medical Institutional Review Board.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.

NeuroLogical Signals was collected by the authors of this study. Data are protected by privacy and security law. The authors may share this data set with the public once the data collection is completed. This is allowed by the license and the agreement signed by participants. Neurovoz, GITA, GermanPD, and CzechPD are not publicly accessible. The authors of these data sets may or may not want to share their data with the public. ItalianPVS can be found at the link below.
https://ieee-dataport.org/open-access/italian-parkinsons-voice-and-speech

PD: Parkinson’s Disease; HC: Healthy Control; DL: Deep Learning; ML: Machine Learning; CNN: Convolutional Neural Network; MFCC: Mel-Frequency Cepstral Coefficient; ASR: Automatic Speech Recognition; POS: Part of Speech; IU: Informational Unit; SVM: Support Vector Machines; KNN: K-Nearest Neighbors; RF: Random Forest; XGBoost: Extreme Gradient Boosting; BG: Bagging; PCA: Principal Components Analysis; PLDA: Probabilistic Linear Discriminant Analysis; EER: Equal Error Rate; NCV: Nested Cross-Validation; ROC: Receiver Operating Characteristic; AUC: Area under the Receiver Operating Characteristic curve; SS: Spontaneous Speech; RP: Reading Passage; TDU: Text-Dependent Utterances; ACC: Accuracy; SEN: Sensitivity; SPE: Specificity; IM: Interpretable Feature-based Model; NIFM: Non-Interpretable Feature-based Model; Mono: Mono-lingual; Multi: Multi-lingual; Cross: Cross-lingual.