Abstract
Human mitochondria contain a high-copy-number, maternally transmitted genome (mtDNA) that encodes 13 proteins required for oxidative phosphorylation. Heteroplasmy arises when multiple mtDNA variants co-exist in an individual and can exhibit complex dynamics in disease and in aging. As all proteins involved in mtDNA replication and maintenance are nuclear-encoded, heteroplasmy levels can, in principle, be under nuclear genetic control; however, this has never been shown in humans. Here, we develop algorithms to quantify mtDNA copy number (mtCN) and heteroplasmy levels using blood-derived whole genome sequences from 274,832 individuals of diverse ancestry and perform GWAS to identify nuclear loci controlling these traits. After careful correction for blood cell composition, we observe that mtCN declines linearly with age and is associated with 92 independent nuclear genetic loci. We find that nearly every individual carries heteroplasmic variants that obey two key patterns: (1) heteroplasmic single nucleotide variants are somatic mutations that accumulate sharply after age 70, while (2) heteroplasmic indels are maternally transmitted as mtDNA mixtures with resulting levels influenced by 42 independent nuclear loci involved in mtDNA replication, maintenance, and novel pathways. These nuclear loci do not appear to act by mtDNA mutagenesis, but rather likely act by conferring a replicative advantage to specific mtDNA molecules. As an illustrative example, the most common heteroplasmy we identify is a length variant carried by >50% of humans at position m.302 within a G-quadruplex known to serve as a replication switch. We find that this heteroplasmic variant exerts cis-acting genetic control over mtDNA abundance and is itself under trans-acting genetic control of nuclear loci encoding protein components of this regulatory switch.
Our study showcases how nuclear haplotype can privilege the replication of specific mtDNA molecules to shape mtCN and heteroplasmy dynamics in the human population.
Competing Interest Statement
VKM is a paid advisor to 5am Ventures and Janssen Pharmaceuticals. BMN is a member of the scientific advisory board at Deep Genomics and Neumora, consultant of the scientific advisory board for Camp4 Therapeutics and consultant for Merck. KJK is a consultant for Vor Biopharma.
Funding Statement
This project was supported in part by grants 5R35GM122455 (VKM) and 5F30AG074507 (RG) from the National Institutes of Health. VKM is an Investigator of the Howard Hughes Medical Institute. PFC is a Wellcome Trust Principal Research Fellow (212219/Z/18/Z) and a UK NIHR Senior Investigator, who receives support from the Medical Research Council Mitochondrial Biology Unit (MC_UU_00028/7), the Medical Research Council (MRC) International Centre for Genomic Medicine in Neuromuscular Disease (MR/S005021/1), the Leverhulme Trust (RPG-2018-408), an MRC research grant (MR/S035699/1), and an Alzheimer's Society Project Grant (AS-PG-18b-022). This research was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This research has been conducted using the UK Biobank Resource under Application Number 31063. The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB of Massachusetts General Hospital gave ethical approval for this work under protocol #2016P001517.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
In terms of data processed or generated as part of this study, we provide genetic association statistics for LD-independent lead SNPs and fine-mapped variants in UKB, in addition to colocalization results (Supplementary tables 2-4). Full GWAS summary statistics from UKB and AoU will be made available in Zenodo upon peer review. All GWAS sample sizes for each genetic ancestry group, meta-analysis, and phenotype can be found in Supplementary table 1. AoU policy does not currently permit public release of individual-level data due to important ethical and privacy considerations (https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2020/05/AoU_Policy_Data_and_Statistics_Dissemination_508.pdf). In terms of external data used in this study, we leveraged GWAS summary statistics, ancestry-specific LD matrices, and a curated list of 29 common, high-quality disease phenotypes generated as part of the Pan UKBB project (Pan UKBB Initiative, 2022), with more information available online (https://pan.ukbb.broadinstitute.org). UKB phenotype and whole genome sequencing data can be accessed via the UKB Research Analysis Platform after completing a UKB access application: https://ukbiobank.dnanexus.com/landing. AoU phenotype and genotype data can be accessed through the Controlled Tier v6 on the AoU Researcher Workbench: workbench.researchallofus.org. Published mtscATACseq data used for the chrM:302 analysis can be obtained via approval from dbGaP. Gene sets for enrichment analyses can be obtained using COMPARTMENTS (https://compartments.jensenlab.org) and MitoCarta 2.0 (https://www.broadinstitute.org/files/shared/metabolism/mitocarta/human.mitocarta2.0.html) as described previously (Gupta et al., 2021). The GRCh37 and GRCh38 reference genomes, as well as other standard reference data, are available via the GATK resource bundle: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle.
Annotations for the baseline v1.1 and BaselineLD v2.2 models for S-LDSC, as well as certain other relevant reference data, including the HapMap3 SNP list, can be obtained from https://alkesgroup.broadinstitute.org/LDSCORE/. BLASTn was used as available from the NCBI: https://blast.ncbi.nlm.nih.gov/Blast.cgi. Known reference and polymorphic NUMTs were obtained from supplemental data as provided in published work (Calabrese et al., 2012; Dayama et al., 2014; Li et al., 2012; Wei et al., 2022).