PT - JOURNAL ARTICLE
AU - Hansen, Lasse
AU - Bernstorff, Martin
AU - Enevoldsen, Kenneth
AU - Kolding, Sara
AU - Damgaard, Jakob Grøhn
AU - Perfalk, Erik
AU - Nielbo, Kristoffer L.
AU - Danielsen, Andreas A.
AU - Østergaard, Søren D.
TI - Predicting diagnostic progression to schizophrenia or bipolar disorder via machine learning applied to electronic health record data
AID - 10.1101/2024.07.02.24309828
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.07.02.24309828
4099 - http://medrxiv.org/content/early/2024/07/03/2024.07.02.24309828.short
4100 - http://medrxiv.org/content/early/2024/07/03/2024.07.02.24309828.full
AB - Importance: The diagnosis of schizophrenia and bipolar disorder is often delayed by several years despite illness typically emerging in late adolescence or early adulthood, which impedes initiation of targeted treatment. Objective: To investigate whether machine learning models trained on routine clinical data from electronic health records (EHRs) can predict diagnostic progression to schizophrenia or bipolar disorder among patients undergoing treatment in psychiatric services for other mental illness. Design: Cohort study based on data from EHRs. Setting: The psychiatric services of the Central Denmark Region. Participants: All patients aged ≥15 and <60 years with at least one contact with the psychiatric services of the Central Denmark Region between 2011 and 2021.
Patients with only a single contact were removed, leaving a total of 24,449 eligible patients with 398,922 outpatient contacts with the psychiatric services. Exposures: Predictors based on EHR data, including medications, diagnoses, and clinical notes. Main Outcomes and Measures: Diagnostic transition to schizophrenia or bipolar disorder within 5 years, predicted one day before outpatient contacts by means of regularized logistic regression and Extreme Gradient Boosting (XGBoost) models. Results: Transition to the first occurrence of either schizophrenia or bipolar disorder was predicted by the XGBoost model with an area under the receiver operating characteristic curve (AUROC) of 0.70 on the training set, and 0.64 on the test set, which consisted of two held-out hospital sites. At a predicted positive rate of 4%, the XGBoost model had a sensitivity of 9.3%, a specificity of 96.3%, and a positive predictive value of 13.0%. Predicting schizophrenia and bipolar disorder separately yielded AUROCs of 0.80 and 0.62, respectively, on the test set. The clinical notes proved particularly informative for prediction. Conclusions and Relevance: It is possible to predict diagnostic transition to schizophrenia and bipolar disorder from routine clinical data extracted from EHRs, with schizophrenia being notably easier to predict than bipolar disorder. Question: Can diagnostic progression to schizophrenia or bipolar disorder be accurately predicted from routine clinical data extracted from electronic health records? Findings: In this study, which included all patients aged ≥15 and <60 years with contacts with the psychiatric services of the Central Denmark Region between 2011 and 2021, progression to schizophrenia was predicted with high accuracy, with bipolar disorder proving a more difficult target. Meaning: Detecting progression to schizophrenia through machine learning based on routine clinical data is feasible.
This may reduce diagnostic delay and duration of untreated illness. Competing Interest Statement: AAD has received a speaker honorarium from Otsuka Pharmaceuticals. SDO received the 2020 Lundbeck Foundation Young Investigator Prize. Furthermore, SDO owns/has owned units of mutual funds with stock tickers DKIGI, SPIC20CAPK, IAIMWC and WEKAFKI, and owns/has owned units of exchange traded funds with stock tickers BATE, IS4S, IQQJ, OM3X, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76 and EUNL. The remaining authors declare no conflicts of interest. Funding Statement: The study is supported by grants from the Lundbeck Foundation (grant number: R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20), and the Danish Agency for Digitisation Investment Fund for New Technologies (grant number 2020-6720) to SDO. Outside this study, SDO reports further funding from the Lundbeck Foundation (grant number: R358-2020-2341), the Novo Nordisk Foundation (grant number: NNF20SA0062874), and Independent Research Fund Denmark (grant numbers: 7016-00048B and 2096-00055A). The funders played no role in study design, collection, analysis or interpretation of data, the writing of the report or the decision to submit the paper for publication. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The use of EHRs from the Central Denmark Region for this study was approved by the Legal Office of the Central Denmark Region in accordance with the Danish Health Care Act §46, Section 2. According to the Danish Committee Act, ethical review board approval is not required for studies based solely on data from EHRs (waiver for this project: 1-10-72-1-22).
Data were processed and stored in accordance with the European Union General Data Protection Regulation and the project is registered on the internal list of research projects having the Central Denmark Region as data steward. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. Due to the sensitive nature of the data, they cannot be made available. All code is freely available online: https://github.com/Aarhus-Psychiatry-Research/psycop-common