Abstract
Background and Hypothesis Identifying schizophrenia spectrum disorders (SSD) from spontaneous speech features is a key focus in computational psychiatry today.
Study Design We present a task-voting procedure using different speech-elicitation tasks to predict SSD in Spanish, followed by ablation studies highlighting the roles of specific tasks and feature domains. Speech from five tasks was recorded from 92 subjects (49 with SSD and 41 controls). A total of 319 features were automatically extracted, of which 24 were pre-selected based on between-feature correlations and ANOVA F-values, covering acoustic-prosodic, morphosyntactic, and semantic similarity metrics.
Study Results ExtraTrees-based classification using these features yielded an accuracy of 0.840 on hold-out data. Ablating picture descriptions impaired performance most, followed by story reading, retelling, and free speech. Removing morphosyntactic measures impaired performance most, followed by acoustic and semantic measures. Mixed-effects models suggested significant group differences on all 24 features. In SSD, speech was slower and more variable temporally, while variations in pitch, amplitude, and sound intensity decreased. Semantic similarity between speech and prompts decreased, while minimal distances from embedding centroids to each word increased, and word-to-word similarity arrays became more predictable, all replicating patterns documented in other languages. Morphosyntactically, SSD patients used more first-person pronouns and fewer third-person pronouns, as well as more punctuation marks and negations. Semantic metrics correlated with a range of positive symptoms, and multiple acoustic-prosodic features correlated with negative symptoms.
Conclusions This study highlights the importance of combining different speech tasks and features for SSD detection, and validates previously found patterns in psychosis for Spanish.
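The task-voting and task-ablation idea summarized above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that each task yields a per-subject probability of SSD, that votes are combined by averaging those probabilities (soft voting), and that 0.5 is the decision threshold. The function name `soft_vote`, the task labels, and all numbers are hypothetical toy values, not study data or the authors' implementation.

```python
from statistics import mean

# Hypothetical per-task probabilities of SSD for three subjects (toy values only).
task_probs = {
    "picture_description": [0.9, 0.2, 0.9],
    "story_reading":       [0.7, 0.4, 0.4],
    "story_retelling":     [0.8, 0.1, 0.5],
    "free_speech":         [0.6, 0.3, 0.3],
}
true_labels = [1, 0, 1]  # 1 = SSD, 0 = control (toy labels)


def soft_vote(probs_by_task, threshold=0.5, exclude=()):
    """Average per-task probabilities for each subject and threshold the mean.

    `exclude` names tasks to leave out, which mimics a task-ablation run.
    Soft voting is an assumption; the abstract does not specify the voting rule.
    """
    tasks = [t for t in probs_by_task if t not in exclude]
    if not tasks:
        raise ValueError("at least one task must remain after exclusion")
    n_subjects = len(probs_by_task[tasks[0]])
    averaged = [mean(probs_by_task[t][i] for t in tasks) for i in range(n_subjects)]
    return [int(p >= threshold) for p in averaged]


def accuracy(predicted, expected):
    """Fraction of subjects whose predicted label matches the expected label."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


full = soft_vote(task_probs)
ablated = soft_vote(task_probs, exclude=("picture_description",))
print("all tasks:", full, accuracy(full, true_labels))
print("without picture description:", ablated, accuracy(ablated, true_labels))
```

Comparing the accuracy of the full vote with the accuracy after excluding one task gives a rough measure of that task's contribution, which is the logic behind the ablation ranking reported in the results.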
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the grant TRUSTING, HORIZON-HLTH-2022-STAYHLTH-01, grant nr. 101080251-2 (to WH), China Scholarship Council (grant 202108390062 to RH), the Department of Science and Technology of Guangdong Province (grant 112175605105 to WH and RH), a Miguel Servet contract from the Carlos III Health Institute (grant CP18/00003 to RAA), and a Consolidator Grant from the Ministerio de Ciencia e Innovación (grant CNS2022-136110 to RAA).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of Valdecilla Biomedical Research Institute gave ethical approval for this work (CEIm internal code 2021.119).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.