PT - JOURNAL ARTICLE
AU - García-González, Judit
AU - Garcia-Gonzalez, Saul
AU - Liou, Lathan
AU - O’Reilly, Paul F.
TI - The Gene Expression Landscape of Disease Genes
AID - 10.1101/2024.06.20.24309121
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.06.20.24309121
4099 - http://medrxiv.org/content/early/2024/06/21/2024.06.20.24309121.short
4100 - http://medrxiv.org/content/early/2024/06/21/2024.06.20.24309121.full
AB - Fine-mapping and gene-prioritisation techniques applied to the latest Genome-Wide Association Study (GWAS) results have prioritised hundreds of genes as causally associated with disease. Here we leverage these recently compiled lists of high-confidence causal genes to interrogate where in the body disease genes operate. Specifically, we combine GWAS summary statistics, gene prioritisation results and RNA-seq gene expression data from 46 tissues and 204 cell types in relation to 16 major diseases (including 8 cancers). In tissues and cell types with well-established relevance to the disease, the prioritised genes typically have higher absolute and relative (i.e. tissue- or cell-type-specific) expression than non-prioritised ‘control’ genes. Examples include brain tissues in psychiatric disorders (P-value < 1×10⁻⁷), microglia in Alzheimer’s disease (P-value = 9.8×10⁻³) and colon mucosa in colorectal cancer (P-value < 1×10⁻³). We also observe significantly higher expression of disease genes in multiple tissues and cell types with no established link to the corresponding disease. While some of these results may be explained by cell types that span multiple tissues, such as macrophages in brain, blood, lung and spleen in relation to Alzheimer’s disease (P-values < 1×10⁻³), the cause of others is unclear and motivates further investigation that may provide novel insights into disease etiology.
Examples include mammary tissue in Type 2 Diabetes (P-value < 1×10⁻⁷); reproductive tissues such as breast, uterus, vagina, and prostate in Coronary Artery Disease (P-value < 1×10⁻⁴); and motor neurons in psychiatric disorders (P-value < 3×10⁻⁴). In the GTEx dataset, tissue type is the major predictor of gene expression, but the contribution of each predictor (tissue, sample, subject, batch) varies widely among disease-associated genes. Finally, we highlight genes with the highest levels of expression in relevant tissues to guide functional follow-up studies. Our results could offer novel insights into the tissues and cells involved in disease initiation, inform drug target and delivery strategies, highlight potential off-target effects, and exemplify the relative performance of different statistical tests for linking disease genes with tissue and cell type gene expression.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by a grant from the National Institutes of Health (R01MH122866) to PFO, and by a 2022 NARSAD Young Investigator Grant (Number 30749) from the Brain & Behavior Research Foundation to JGG. Additionally, this work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Data access to the UK Biobank Resource was approved under application number 18177 to Paul O'Reilly.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
All data produced are available online and in the supplemental materials. The scripts used in the current study are available at https://gitlab.com/JuditGG/gene_expr_landscape, with an accompanying Shiny app at https://juditgg.shinyapps.io/diseasegenes/