PT - JOURNAL ARTICLE AU - Li, Binglan AU - Veturi, Yogasudha AU - Lucas, Anastasia AU - Bradford, Yuki AU - Verma, Shefali S. AU - Verma, Anurag AU - Park, Joseph AU - Wei, Wei-Qi AU - Feng, Qiping AU - Namjou, Bahram AU - Kiryluk, Krzysztof AU - Kullo, Iftikhar AU - Luo, Yuan AU - Pividori, Milton AU - Im, Hae Kyung AU - Greene, Casey S. AU - Ritchie, Marylyn D. TI - Multi-ancestry gene-trait connection landscape using electronic health record (EHR) linked biobank data AID - 10.1101/2021.10.21.21265225 DP - 2021 Jan 01 TA - medRxiv PG - 2021.10.21.21265225 4099 - http://medrxiv.org/content/early/2021/10/26/2021.10.21.21265225.short 4100 - http://medrxiv.org/content/early/2021/10/26/2021.10.21.21265225.full AB - Understanding genetic factors of complex traits across ancestry groups holds a key to improve the overall health care quality for diverse populations in the United States. In recent years, multiple electronic health record-linked (EHR-linked) biobanks have recruited participants of diverse ancestry backgrounds; these biobanks make it possible to obtain phenome-wide association study (PheWAS) summary statistics on a genome-wide scale for different ancestry groups. Moreover, advancement in bioinformatics methods provide novel means to accelerate the translation of basic discoveries to clinical utility by integrating GWAS summary statistics and expression quantitative trait locus (eQTL) data to identify complex trait-related genes, such as transcriptome-wide association study (TWAS) and colocalization analyses. Here, we combined the advantages of multi-ancestry biobanks and data integrative approaches to investigate the multi-ancestry, gene-disease connection landscape. We first performed a phenome-wide TWAS on Electronic Medical Records and Genomics (eMERGE) III network participants of European ancestry (N = 68,813) and participants of African ancestry (N = 12,658) populations, separately. For each ancestry group, the phenome-wide TWAS tested gene-disease associations between 22,535 genes and 309 curated disease phenotypes in 49 primary human tissues, as well as cross-tissue associations. Next, we identified gene-disease associations that were shared across the two ancestry groups by combining the ancestry-specific results via meta-analyses. We further applied a Bayesian colocalization method, fastENLOC, to prioritize likely functional gene-disease associations with supportive colocalized eQTL and GWAS signals. We replicated the phenome-wide gene-disease analysis in the analogous Penn Medicine BioBank (PMBB) cohorts and sought additional validations in the PhenomeXcan UK Biobank (UKBB) database, PheWAS catalog, and systematic literature review. Phenome-wide TWAS identified many proof-of-concept gene-disease associations, e.g. FTO-obesity association (p = 7.29e-15), and numerous novel disease-associated genes, e.g. association between GATA6-AS1 with pulmonary heart disease (p = 4.60e-10). In short, the multi-ancestry, gene-disease connection landscape provides rich resources for future multi-ancestry complex disease research. We also highlight the importance of expanding the size of non-European ancestry datasets and the potential of exploring ancestry-specific genetic analyses as these will be critical to improve our understanding of the genetic architecture of complex disease.Competing Interest StatementMDR is on the scientific advisory board for Cipherome.Funding StatementeMERGE Network (Phase III). This phase of the eMERGE Network was initiated and funded by the NHGRI through the following grants: U01HG8657 (Group Health Cooperative/University of Washington); U01HG8685 (Brigham and Womens Hospital); U01HG8672 (Vanderbilt University Medical Center); U01HG8666 (Cincinnati Childrens Hospital Medical Center); U01HG6379 (Mayo Clinic); U01HG8679 (Geisinger Clinic); U01HG8680 (Columbia University Health Sciences); U01HG8684 (Childrens Hospital of Philadelphia); U01HG8673 (Northwestern University); U01HG8701 (Vanderbilt University Medical Center serving as the Coordinating Center); U01HG8676 (Partners Healthcare/Broad Institute); and U01HG8664 (Baylor College of Medicine). Penn Medicine BioBank (PMBB). The PMBB is funded by the Perelman School of Medicine at the University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA Award Number UL1TR001878. We thank D. Birtwell, H. Williams, P. Baumann and M. Risman for informatics support regarding the PMBB. We thank the staff of the Regeneron Genetics Center for whole-exome sequencing of DNA from PMBB participants.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:IRB of the University of Pennsylvania gave ethnical approval for this work.I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.YesAll data produced in the present work are contained in the manuscript