Abstract
Schizophrenia (SCZ) is a chronic and severely disabling neurodevelopmental disorder that affects people worldwide. RNA-seq is a powerful method for detecting differentially expressed genes and non-coding RNAs in patients; however, because of overfitting, differentially expressed targets (DETs) cannot be used reliably as biomarkers. In this study, dorsolateral prefrontal cortex (DLPFC) RNA-seq data from 254 individuals were obtained from the CommonMind Consortium and analyzed with machine learning methods, including random forest, forward feature selection (FFS), and factor analysis, to reduce the number of gene and non-coding RNA features, thereby mitigating overfitting, and to explore the functional clusters involved. In 2-fold shuffle testing, the average predictive accuracy for SCZ patients was 67% based on coding genes and 96% based on long non-coding RNAs (lncRNAs). Coding genes were further clustered into 14 factors and lncRNAs into 45 factors to represent the underlying features. The factor with the largest contribution for coding genes contains a number of genes that are critical in neurodevelopment and have previously been reported in relation to various brain disorders. The genomic loci of the lncRNAs were more insightful: they were enriched for genes critical in synapse function (p=7.3E-3), cell junction (p=0.017), neuron differentiation (p=8.3E-3), and phosphorylation (p=8.2E-4), and for genes involved in the Wnt signaling pathway (p=0.029). Taken together, machine learning is a powerful approach for reducing the set of functional biomarkers in SCZ patients. The lncRNAs capture the characteristics of SCZ tissue more accurately than mRNAs, as the former regulate every level of gene expression, not only mRNA levels.
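The workflow summarized above (random forest, forward feature selection, repeated shuffled testing, then factor analysis) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the CommonMind expression matrix, interprets "2-fold shuffle testing" as repeated random 50/50 train/test splits, and picks all sizes (30 input features, 10 after ranking, 4 after forward selection, 2 factors) purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a samples-by-transcripts expression matrix (assumption).
X, y = make_classification(n_samples=100, n_features=30, n_informative=5,
                           random_state=0)

pipeline = Pipeline([
    # Step 1: keep the 10 features with the highest random forest importance.
    ("rank", SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0),
        max_features=10, threshold=-np.inf)),
    # Step 2: forward feature selection down to 4 features.
    ("ffs", SequentialFeatureSelector(
        RandomForestClassifier(n_estimators=20, random_state=0),
        n_features_to_select=4, direction="forward", cv=2)),
    # Step 3: final classifier trained on the reduced feature set.
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# Selection runs inside each training split, so the held-out half never
# influences which features are chosen.
splitter = ShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=splitter, scoring="accuracy")
mean_accuracy = scores.mean()

# Refit on all samples to recover the selected column indices.
pipeline.fit(X, y)
ranked = np.flatnonzero(pipeline.named_steps["rank"].get_support())
selected = ranked[pipeline.named_steps["ffs"].get_support()]

# Factor analysis groups the selected features into latent factors; each
# feature is assigned to the factor with its largest absolute loading.
fa = FactorAnalysis(n_components=2, random_state=0).fit(X[:, selected])
loadings = fa.components_            # shape: (n_factors, n_selected_features)
factor_of_feature = np.abs(loadings).argmax(axis=0)
```

Keeping both selection steps inside the pipeline is a design choice of this sketch: selecting features on the full dataset before splitting would leak test information and inflate the reported accuracy, which is the overfitting problem the abstract sets out to avoid.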
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
DLPFC data were generated as part of the CommonMind Consortium, supported by funding from Takeda Pharmaceuticals Company Limited, F. Hoffmann-La Roche Ltd, and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881, R37MH057881S1, HHSN271201300031C, AG02219, AG05138, and MH06692.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the Children's Hospital of Philadelphia. All patients who participated in this project provided consent and agreed to publication of the results.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes