Abstract
Mental health research faces the challenge of developing machine learning models for clinical decision support. Concerns about the generalizability of such models to real-world populations due to sampling effects and disparities in available data sources are rising. We examined whether harmonized, structured collection of clinical data and stringent measures against overfitting can facilitate the generalization of machine learning models for predicting depressive symptoms across diverse real-world inpatient and outpatient samples. Despite systematic differences between samples, a sparse machine learning model trained on clinical information exhibited strong generalization across diverse real-world samples. These findings highlight the crucial role of standardized routine data collection, grounded in unified ontologies, in the development of generalizable machine learning models in mental health.
One-Sentence Summary Generalization of sparse machine learning models trained on clinical data is possible for depressive symptom prediction.
Competing Interest Statement
FP is a member of the European Scientific Advisory Board of Brainsway Inc., Jerusalem, Israel, and the International Scientific Advisory Board of Sooma, Helsinki, Finland. He has received speaker's honoraria from Mag&More GmbH and the neuroCare Group. His lab has received support with equipment from neuroConn GmbH, Ilmenau, Germany, and Mag&More GmbH and Brainsway Inc., Jerusalem, Israel. MAR has received financial research support from the EU (H2020 No. 754740 ) and served as PI in clinical trials from Abide Therapeutics, Boehringer-Ingelheim, Emalex Biosciences, Lundbeck GmbH, Nuvelution TS Pharma Inc., Oryzon, Otsuka Pharmaceuticals and Therapix Biosciences. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding Statement
Interdisciplinary Center for Clinical Research (IZKF) of the medical faculty of Muenster grant SEED 11/18 (NO), ), Dan3/022/22 (UD) German Research Foundation grants RE4458/1-1 (RR), KI 588/14-1 (TK), KI 588/14-2 (TK), KI 588/15-1 (TK), KI 588/17-1 (TK), DA 1151/5-1 (UD), DA 1151/ 5-2 (UD), DA 1151/6-1 (UD), DA1151/9-1 (UD), DA1151/10-1 (UD), DA1151/11-1 (UD), KR 3822/5-1 (AK), KR 3822/7-2 (AK), NE 2254/1-2 (IN), NE 2254/2-1 (IN), NE2254/3-1 (IN), NE2254/4-1 (IN), HA 7070/2-2 (TH), HA7070/3 (TH), HA7070/4 (TH), KO-121806 (KD), and JO22022/1-1 Collaborative Project funded by the European Union (EU) under the 7th Framework Programme grant 601252 German Federal Ministry of Education and Research grants 01EE2305C (RR), 01EE230A (NO), 01EE2303A, 01ER1301A/B/C, 01ER1511D, 01ER1801A/B/C/D, the Federal States of Germany and the Helmholtz Association, the participating universities and the institutes of the Leibniz Association FoeFoLePLUS program of the Faculty of Medicine of the Ludwig-Maximilians-University, Munich, Germany, grant #003, MCSP (MAR)
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee of the medical association Westphalia-Lippe and the University of Muenster, officially entitled "Ethikkommission der Aerztekammer Westfalen-Lippe und der Westfaelischen Wilhelms-Universitaet Muenster", gave ethical approval for the entire study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All samples and their data are findable and requestable through the Meta-Data Study Repository of the German Centre for Mental Health (DZPG). The machine learning model will be published in the PHOTONAI model repository.