PT - JOURNAL ARTICLE
AU - Barnett, Eric
AU - Zhang-James, Yanli
AU - Faraone, Stephen V
TI - Improving Machine Learning Prediction of ADHD Using Gene Set Polygenic Risk Scores and Risk Scores from Genetically Correlated Phenotypes
AID - 10.1101/2022.01.11.22269027
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.01.11.22269027
4099 - http://medrxiv.org/content/early/2022/01/12/2022.01.11.22269027.short
4100 - http://medrxiv.org/content/early/2022/01/12/2022.01.11.22269027.full
AB - Background: Polygenic risk scores (PRSs), which sum the effects of SNPs throughout the genome to measure the risk conferred by common genetic variants, have improved our ability to estimate disorder risk for Attention-Deficit/Hyperactivity Disorder (ADHD), but the accuracy of risk prediction is rarely investigated.
Methods: With the goal of improving risk prediction, we performed gene set analysis of GWAS data to select gene sets associated with ADHD within a training subset. For each selected gene set, we generated gene set polygenic risk scores (gsPRSs), which sum the effects of SNPs for each selected gene set. We created gsPRS for ADHD and for phenotypes having a high genetic correlation with ADHD. These gsPRS were added to the standard PRS as input to machine learning models predicting ADHD. We used feature importance scores to select gsPRS for a final model and to generate a ranking of the most consistently predictive gsPRS.
Results: For a test subset that had not been used for training or validation, a random forest (RF) model using PRSs for ADHD and genetically correlated phenotypes, plus an optimized group of 20 gsPRS, had an area under the receiver operating characteristic curve (AUC) of 0.72 (95% CI: 0.70 – 0.74).
This AUC was a statistically significant improvement over logistic regression models and over RF models using only PRSs from ADHD and genetically correlated phenotypes.
Conclusions: Summing risk at the gene set level and incorporating genetic risk from disorders with high genetic correlations with ADHD improved the accuracy of predicting ADHD. Learning curves suggest that additional improvements would be expected with larger study sizes. Our study suggests that better accounting for genetic risk and for the genetic context of allelic differences results in more predictive models.
Competing Interest Statement: In the past year, Dr. Faraone received income, potential income, travel expenses, continuing education support and/or research support from Aardvark, Akili, Genomind, Ironshore, KemPharm/Corium, Noven, Ondosis, Otsuka, Rhodes, Supernus, Takeda, Tris and Vallon. With his institution, he holds US patent US20130217707 A1 for the use of sodium-hydrogen exchange inhibitors in the treatment of ADHD. In previous years, he received support from: Alcobra, Arbor, Aveksham, CogCubed, Eli Lilly, Enzymotec, Impact, Janssen, Lundbeck/Takeda, McNeil, NeuroLifeSciences, Neurovance, Novartis, Pfizer, Shire, and Sunovion. He also receives royalties from books published by Guilford Press: Straight Talk about Your Child's Mental Health; Oxford University Press: Schizophrenia: The Facts; and Elsevier: ADHD: Non-Pharmacologic Interventions. He is also Program Director of www.adhdinadults.com. Dr. Faraone is supported by NIMH grants U01MH109536-01, U01AR076092-01A1, R0MH116037 and 5R01AG06495502, and by Oregon Health and Science University, Otsuka Pharmaceuticals and Supernus Pharmaceutical Company. Dr. Yanli Zhang-James is supported by the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No 602805 and by the European Union's Horizon 2020 research and innovation programme under grant agreement No 667302.
Eric Barnett has no financial disclosures.
Funding Statement: This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 667302 and No 965381.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The SUNY Upstate IRB determined this project did not meet the definition of "human subject" research under the purview of the IRB according to federal regulations.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Supplementary Data contains all information on data availability for summary statistics, with links. Individual-level genotype data are available upon request to the Psychiatric Genomics Consortium (PGC).
https://atlas.ctglab.nl/ukb2_sumstats/f.2139.0.0_res.EUR.sumstats.MACfilt.txt.gz
http://cnsgenomics.com/data/wu_et_al_2019_nc/23_medication-taking_GWAS_summary_statistics.tar.gz
http://ssgac.org/documents/SSGAC_Rietveld2013.zip
http://ssgac.org/documents/CHIC_Summary_Benyamin2014.txt.gz
ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/Riveros-McKayF_30677029_GCST007241/SCOOP_UKHLS_ldcorrected.gz
https://www.med.unc.edu/pgc/results-and-downloads/downloads
https://atlas.ctglab.nl/ukb2_sumstats/f.1070.0.0_res.EUR.sumstats.MACfilt.txt.gz
https://www.med.unc.edu/pgc/results-and-downloads
http://enigma.ini.usc.edu/research/download-enigma-gwas-results/
https://atlas.ctglab.nl/ukb2_sumstats/f.3581.0.0_res.EUR.sumstats.MACfilt.txt.gz
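The abstract defines a PRS as a sum of SNP effects across the genome and a gsPRS as the same sum restricted to the SNPs of one gene set, with models compared by AUC. The Python snippet below is a minimal sketch of those definitions, not the authors' pipeline. It assumes per-SNP effect sizes (e.g. log odds ratios from GWAS summary statistics) and allele dosages coded 0, 1 or 2. The SNP IDs, gene-set names and helper functions (`prs`, `gene_set_prs`, `feature_vector`, `auc`) are hypothetical. Steps a real PRS pipeline would normally include, such as LD clumping and p-value thresholding, are omitted.

```python
from typing import List, Mapping, Sequence, Set

# Hypothetical inputs: per-SNP effect sizes (e.g. log odds ratios from GWAS
# summary statistics) and one individual's allele dosages (0, 1 or 2 copies
# of the effect allele). LD clumping and p-value thresholding are omitted.


def prs(dosages: Mapping[str, float], betas: Mapping[str, float]) -> float:
    """Genome-wide PRS: sum of beta * dosage over SNPs present in both inputs."""
    return sum(betas[snp] * d for snp, d in dosages.items() if snp in betas)


def gene_set_prs(
    dosages: Mapping[str, float],
    betas: Mapping[str, float],
    gene_set_snps: Set[str],
) -> float:
    """gsPRS: the same weighted sum, restricted to SNPs mapped to one gene set."""
    return sum(
        betas[snp] * dosages[snp]
        for snp in gene_set_snps
        if snp in betas and snp in dosages
    )


def feature_vector(
    dosages: Mapping[str, float],
    betas: Mapping[str, float],
    gene_sets: Mapping[str, Set[str]],
) -> List[float]:
    """Model input: the standard PRS, then one gsPRS per gene set (sorted by name)."""
    return [prs(dosages, betas)] + [
        gene_set_prs(dosages, betas, gene_sets[name]) for name in sorted(gene_sets)
    ]


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC AUC as P(random case scores above random control); ties count 0.5."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    betas = {"rs1": 0.1, "rs2": -0.05, "rs3": 0.2}
    dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
    gene_sets = {"set_A": {"rs1", "rs3"}, "set_B": {"rs2"}}
    print(feature_vector(dosages, betas, gene_sets))  # approximately [0.15, 0.2, -0.05]
    print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

The study fed PRS and gsPRS features into random forest and logistic regression classifiers. That model-fitting step is not sketched here. The pairwise `auc` is quadratic in sample size and only illustrates the definition. A rank-based implementation would be used on real cohorts.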