RT Journal Article
SR Electronic
T1 Sparse Deep Neural Networks on Imaging Genetics for Schizophrenia Case-Control Classification
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.06.11.20128975
DO 10.1101/2020.06.11.20128975
A1 Chen, Jiayu
A1 Li, Xiang
A1 Calhoun, Vince D.
A1 Turner, Jessica A.
A1 van Erp, Theo G. M.
A1 Wang, Lei
A1 Andreassen, Ole A.
A1 Agartz, Ingrid
A1 Westlye, Lars T.
A1 Jönsson, Erik
A1 Ford, Judith M.
A1 Mathalon, Daniel H.
A1 Macciardi, Fabio
A1 O’Leary, Daniel S.
A1 Liu, Jingyu
A1 Ji, Shihao
YR 2020
UL http://medrxiv.org/content/early/2020/06/12/2020.06.11.20128975.abstract
AB Machine learning approaches hold potential for deconstructing complex psychiatric traits and yielding biomarkers with substantial potential for clinical application. In particular, advances in deep learning have made these methods highly promising tools for this purpose, owing to their capability to handle high-dimensional data and automatically extract high-level latent features. However, currently proposed approaches for psychiatric classification or prediction using biological data do not allow direct interpretation of the original features, which hinders insight into the biological underpinnings and the development of biomarkers. In the present study, we introduce a sparse deep neural network (DNN) approach to identify sparse and interpretable features for schizophrenia (SZ) case-control classification. L0-norm regularization is applied to the input layer of the network for sparse feature selection, and the selected features can then be interpreted through their importance weights. We applied the proposed approach to a large multi-study cohort (N = 1,684) with brain structural MRI (gray matter volume (GMV)) and genetic (single nucleotide polymorphism (SNP)) data for discrimination of patients with SZ vs. controls.
A total of 634 individuals served as training samples, and the resulting classification model was evaluated for generalizability on three independent data sets collected at different sites with different scanning protocols (n = 635, 255 and 160, respectively). We examined the classification power of GMV features alone, as well as combined GMV and SNP features. The performance of the proposed approach was compared with that of an independent component analysis + support vector machine (ICA+SVM) framework. Empirical experiments demonstrated that the sparse DNN slightly outperformed ICA+SVM and more effectively fused GMV and SNP features for SZ discrimination. With combined GMV and SNP features, the sparse DNN yielded an average classification error rate of 28.98% on external data. The importance weights suggested that the DNN model prioritized the frontal and superior temporal gyri for SZ classification when high sparsity was enforced, and that parietal regions were additionally included at a lower sparsity setting, which strongly echoes previous literature. This is the first attempt to apply an interpretable sparse DNN model to imaging and genetic features for SZ classification with generalizability assessed in a large, multi-study cohort. The results validate the application of the proposed approach to SZ classification and promise extended utility for other data modalities (e.g., functional and diffusion images) and traits (e.g., continuous scores), which may ultimately result in clinically useful tools. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This project was funded by the National Institutes of Health (P20GM103472, P30GM122734, R01EB005846, 1R01EB006841, R01MH106655, 5R01MH094524, U24 RR021992, U24 RR025736-01, U01 MH097435, R01 MH084803, R01 EB020062), National Science Foundation (1539067, 1636893, 1734853), Research Council of Norway (RCN#223273), K. G.
Jebsen Stiftelsen and South-East Norway Health Authority. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The data used in the current work were aggregated from multiple studies, including MCIC, COBRE, FBIRN, NU, BSNIP, TOP and HUBIN. The institutional review board at each site approved the study and all participants provided written informed consent. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The MCIC and COBRE data are available through COINS (https://coins.mrn.org). The NU imaging data can be accessed through SchizConnect (http://schizconnect.org/) and the BSNIP imaging data through the NIMH Data Archive (https://nda.nih.gov/). Requests for access to other data should be addressed to the individual principal investigators.