PT - JOURNAL ARTICLE
AU - Singhal, Pankhuri
AU - Veturi, Yogasudha
AU - Dudek, Scott M.
AU - Lucas, Anastasia
AU - Frase, Alex
AU - Schrodi, Steven J.
AU - Fasel, David
AU - Weng, Chunhua
AU - Pendergrass, Rion
AU - Schaid, Daniel J.
AU - Kullo, Iftikhar J.
AU - Dikilitas, Ozan
AU - Sleiman, Patrick M.A.
AU - Hakonarson, Hakon
AU - Moore, Jason H.
AU - Williams, Scott M.
AU - Ritchie, Marylyn D.
AU - Verma, Shefali S.
TI - Evidence of epistasis in regions of long-range linkage disequilibrium across five complex diseases in the UK Biobank and eMERGE datasets
AID - 10.1101/2022.10.19.22280888
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.10.19.22280888
4099 - http://medrxiv.org/content/early/2022/10/21/2022.10.19.22280888.short
4100 - http://medrxiv.org/content/early/2022/10/21/2022.10.19.22280888.full
AB - Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWAS). Standard GWAS are well-powered to interrogate additive models; however, new approaches are required to investigate other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected due to lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWAS excludes detection of sites in LD that may underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta’s D statistics) in long-range LD (> 0.25 cM). We identified five significant associations across five disease phenotypes that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE).
The genes that were most likely involved in the replicated associations were 1) members of highly conserved gene families with complex roles in multiple pathways, 2) essential genes, and/or 3) associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and may especially be driving factors in conditions with a wide range of phenotypic outcomes.
Significance: Current knowledge of genotype-phenotype relationships is largely contingent on traditional univariate approaches to genomic analysis. Yet substantial evidence supports non-additive modes of inheritance and regulation, such as epistasis, as being abundant across the genome. In this genome-wide study, we probe the biomolecular mechanisms underlying complex human diseases by testing the association of pairwise genetic interactions with disease occurrence in large-scale biobank data. Specifically, we tested intrachromosomal and interchromosomal long-range interactions between regions of the genome in high linkage disequilibrium; these regions are typically excluded from genomic analyses. The results from this study suggest that essential genes, members of highly conserved gene families, and phenotypes with variable expressivity are particularly enriched with epistatic and pleiotropic activity.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: MDR is supported by R01 HG010067 and U01 AG066833. JHM is supported by R01 LM010098 and U01 AG066833. PS is supported by F31 AG069441-01.
eMERGE Network (Phase III): This phase of the eMERGE Network was initiated and funded by the NHGRI through the following grants: U01HG8657 (Group Health Cooperative/University of Washington); U01HG8685 (Brigham and Women's Hospital); U01HG8672 (Vanderbilt University Medical Center); U01HG8666 (Cincinnati Children's Hospital Medical Center); U01HG6379 (Mayo Clinic); U01HG8679 (Geisinger Clinic); U01HG8680 (Columbia University Health Sciences); U01HG8684 (Children's Hospital of Philadelphia); U01HG8673 (Northwestern University); U01HG8701 (Vanderbilt University Medical Center serving as the Coordinating Center); U01HG8676 (Partners Healthcare/Broad Institute); and U01HG8664 (Baylor College of Medicine).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Institutional Review Board of the University of Pennsylvania gave ethical approval for this work through IRB Protocol #850838. The project qualified as exempt.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All data produced are available online. eMERGE Network Phase III data can be accessed through dbGaP (study ID phs001584.v2.p2). UKBB data were accessed through protocol number 32133.