Abstract
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). We recently showed that protein-coding VNTRs underlie some of the strongest known genetic associations with diverse phenotypes. Here, we assessed the phenotypic impact of VNTRs genome-wide, 99% of which lie in non-coding regions. We applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants. Association and statistical fine-mapping analyses identified 107 VNTR-phenotype associations (involving 58 VNTRs) that were assigned a high probability of VNTR causality (PIP≥0.5). Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold risk range across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by the US National Institutes of Health (NIH), MIT, Burroughs Wellcome Fund, the Broad Institute, and the Sloan Foundation.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Office of Research Subject Protection at the Broad Institute of Harvard and MIT determined that this work was not human subjects research and is exempt from IRB.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
** These authors co-supervised this work.
Data Availability
Individual-level VNTR genotypes imputed into UKB will be returned to the UK Biobank Resource. The VNTR+SNP reference panel in SSC will be returned to SFARI Base. Summary statistics for VNTR-phenotype association tests are available at https://data.broadinstitute.org/lohlab/UKB_genomewideVNTR_sumstats/. Access to the following data resources used in this study is available to all approved researchers upon application: UK Biobank (http://www.ukbiobank.ac.uk/), Simons Simplex Collection (SSC Whole-genome 2, https://base.sfari.org), TCGA (via dbGaP, https://www.ncbi.nlm.nih.gov/gap/, accession phs000178.v11.p8), GTEx (via dbGaP, https://www.ncbi.nlm.nih.gov/gap/, accession phs000424.v8.p2; the GTEx Portal http://www.gtexportal.org). The following data resources used in this study are publicly available: 1000 Genomes Project (including HGSVC2 long-read assemblies, http://www.internationalgenome.org/), NHGRI-EBI GWAS Catalog (http://ebi.ac.uk/gwas/, accessions GCST009413, GCST90011767, and GCST012879).