PT - JOURNAL ARTICLE AU - Kaufman, Hadasa AU - Hochenberg, Yarden AU - Linial, Michal AU - Rappoport, Nadav TI - DROP-DEEP: Dimensionality Reduction for Polygenic Risk Score Using Deep Learning Approach AID - 10.1101/2024.05.01.24306609 DP - 2024 Jan 01 TA - medRxiv PG - 2024.05.01.24306609 4099 - http://medrxiv.org/content/early/2024/05/02/2024.05.01.24306609.short 4100 - http://medrxiv.org/content/early/2024/05/02/2024.05.01.24306609.full AB - Motivation Advances in sequencing technologies have enabled the early detection of genetic diseases and the development of personalized medicine. However, the variance explained by genetic variations is typically small compared to the heritability estimates. Consequently, there is a pressing need to develop enhanced polygenic risk score (PRS) prediction models. We seek an approach that transcends the limitations of the routinely used additive model for PRS.Results Here we present DROP-DEEP, a novel method for calculating PRS that enhances the explanation of the heritability variance of complex traits by incorporating high-dimensional genetic interactions. The first stage of DROP-DEEP employs an unsupervised approach to reduce dimensionality, while the second stage involves training a prediction model using a supervised machine-learning algorithm. Notably, the first stage of training is phenotype-agnostic. Thus, while it is computationally intensive, it is performed only once. Its output can serve as input for predicting any chosen trait or disease. We evaluated the efficacy of the DROP-DEEP dimensionality reduction models using principal component analysis (PCA) and deep neural networks (DNN). All models were trained using the UK Biobank (UKB) dataset with over 340,000 subjects and a set of approximately 460,000 single nucleotide variants (SNVs) across the genome. The results of DROP-DEEP, which was established for patients diagnosed with hypertension, outperformed other approaches. We extended the analysis to include an additional five binary and continuous phenotypes, each repeated five times for reproducibility assessment. For each phenotype, DROP-DEEP results were compared to commonly used PRS methodologies, and the performance of all models was discussed.Conclusion Our approach overcomes the need for variable selection while maintaining computational feasibility. We conclude that the DROP-DEEP approach exhibits significant advantages compared to commonly used PRS methods and can be used efficiently for hundreds of genetic traits.Availability and Implementation All the codes and the trained dimensionality reduction models are available at: https://github.com/HadasaK1/DROP-DEEP.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study did not receive any fundingAuthor DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:The University Committee for the Use of Human Subjects in Research of the Hebrew University of JerusalemI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesAll data in the study is based on the UKB