Mitochondrial genome copy number in human blood-derived DNA is strongly associated with insulin levels and related metabolic traits and primarily reflects cell-type composition differences
============================================================================================================================================================================================

* Liron Ganel
* Lei Chen
* Ryan Christ
* Jagadish Vangipurapu
* Erica Young
* Indraniel Das
* Krishna Kanchi
* David Larson
* Allison Regier
* Haley Abel
* Chul Joo Kang
* Aki Havulinna
* Charleston W. K. Chiang
* Susan Service
* Nelson Freimer
* Aarno Palotie
* Samuli Ripatti
* Johanna Kuusisto
* Michael Boehnke
* Markku Laakso
* Adam Locke
* Nathan O. Stitziel
* Ira M. Hall

## Abstract

Mitochondrial copy number is known to vary among humans and across tissues, and a population-based study of mitochondrial genome copy number (MT-CN) in blood – as estimated from genome sequencing data – observed strong (∼54%) heritability. However, the genetic causes and phenotypic consequences of MT-CN variation in humans are not well-studied. Here, we studied MT-CN variation in blood-derived DNA from 19,184 Finnish individuals with deep cardiometabolic trait measurements using a combination of whole genome (N = 4,163) and exome sequencing (N = 19,034) data as well as imputed array genotypes (N = 17,718). We confirmed that MT-CN in blood is highly heritable (31% by GREML). We identified two loci in the nuclear genome that are significantly associated with MT-CN variation: a common variant at the *MYB-HBS1L* locus (P = 1.6×10−8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the *TMB1M1* gene (P = 3.0×10−8), which has been reported to protect against non-alcoholic fatty liver disease in model organisms. We also found that MT-CN is strongly associated with insulin levels (P = 2.0×10−21), fat mass (P = 4.5×10−16), and other related traits. Using a Mendelian randomization framework, we constructed a genetic instrument for MT-CN using penalized regression with adjustment for potentially confounding covariates and found a significant association with insulin levels, which suggests that our MT-CN measurement in blood may be causally related to metabolic syndrome. Finally, we computed our genetic instrument in UK Biobank participants and tested it against a set of cell count and cardiometabolic traits. We found significant associations between MT-CN and both neutrophil and platelet counts (P = 1.8×10−8 and P = 1.2×10−3). While the association between MT-CN and metabolic syndrome traits was replicated in the UK Biobank, adjusting for cell counts largely eliminated these signals, suggesting that MT-CN is actually a proxy measurement for neutrophil and platelet counts in its effect on metabolic syndrome. Taken together, these results suggest that measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and that these differences may be causally linked to insulin and related traits.

**Author summary** The number of mitochondria per cell is variable between tissues and between individuals, and prior studies have shown that mitochondrial genome copy number in blood (MT-CN) – as estimated indirectly from sequencing of blood-derived DNA – is a genetically determined trait that varies among humans. We studied genetic data from approximately 19,000 Finnish individuals and showed that MT-CN is significantly associated with insulin and related metabolic traits, providing evidence that MT-CN levels play a role in determining these traits. Consistent with a previous study, we showed that genetics play a significant role in determining blood-derived MT-CN. We also found new evidence linking several regions of the genome to MT-CN, including a gene known to be associated with non-alcoholic fatty liver disease. Finally, we found that in the link between blood-derived MT-CN and metabolic syndrome, MT-CN likely represents the relative quantities of circulating immune cells, providing further evidence for the role of inflammation in metabolic syndrome.

## Introduction

There are many reported links between mitochondrial content and cardiometabolic phenotypes in various tissues, including adipose [1–3], liver [1,4,5], skeletal muscle [1,6–10], and blood [11–17]. Traits associated with mitochondrial (MT) content include coronary heart disease (CHD), type 2 diabetes, and metabolic syndrome traits such as insulin sensitivity/resistance, obesity, and blood triglycerides. However, these studies have generally been limited by small sample sizes and low statistical power. This, in addition to the use of heterogeneous mitochondrial quantification methods, has led to inconsistencies in the literature about the strength and directions of effect between mitochondrial content and metabolic syndrome traits. In the only large WGS study of mitochondrial genome copy number (MT-CN) known to us, Ding *et al*. used data from 2,077 Sardinians to estimate the heritability of MT-CN at 54% and detect significant associations between MT-CN and both waist circumference and waist-hip ratio, but not body mass index (BMI) [11].

Although variations in MT-CN measured from whole blood can in principle be attributed to either variability of MT copy number within cells or the cell type composition of the blood (given that different cell types have varying MT content [18–20]), the literature on this subject is inconclusive. Using CpG methylation data, a large (N = 11,443), low-coverage (1.7x autosomal; 102x mitochondrial) sequencing study of the link between MT-CN and major depressive disorder using buccal DNA from Chinese women concluded that variability of MT-CN from buccal swabs was not due to differences in cell type composition [21]. However, this study did not do a similar experiment in blood. Two small (N = 756 and N = 400) studies identified an association between MT content and CHD that they attributed to variable MT-CN within leukocytes, but they did not directly investigate the possibility of cell type composition being the true driver of the association [12,17]. For brevity, we will use the term “MT-CN” to refer to the underlying phenotype reflected by measuring this quantity for the remainder of this work, with these caveats.

While several studies have found that peripheral blood MT content is heritable, only a small number of MT-CN associated loci have been identified [22,23]. In one of these studies, Curran *et al*. used linkage analysis in Mexican Americans to find an MT-CN associated locus near a marker previously associated with triglyceride levels [23–25], providing further indirect evidence for the link between MT-CN and metabolic syndrome.

Here, we take advantage of large-scale genome, exome, and array genotype data to investigate the causes and effects of MT-CN in a large, deeply phenotyped Finnish cohort. Our results reveal novel links with metabolic syndrome and provide evidence supporting a causal role for MT-CN.

## Results

### Association of MT-CN with metabolic traits

We estimated MT-CN in 4,163 individuals from the METSIM and FINRISK studies based on deep (>20x coverage) WGS data. We did so by measuring the mean coverage depth of reads mapped to the mitochondrial genome in each sample, and normalizing it to the mean autosomal coverage (see Methods). We performed batch normalization separately for METSIM and for two FINRISK batches separated by survey years (see Methods). Each measurement was adjusted for age, age2, and sex, then inverse rank normalized separately before combining across batches. We tested the resulting MT-CN estimates for association with 137 quantitative traits that were collected and normalized according to the procedures described previously [26]. MT-CN was strongly associated with fat mass (P = 4.48 × 10−16) and fasting serum insulin (P = 2.02 × 10−21), as well as numerous additional quantitative traits, many related to metabolic syndrome **(**Fig 1a, S1 Table). Notably, BMI was significantly associated with MT-CN, although Ding *et al*. did not find evidence of this association [11]. Since population structure was a potential confounder in this analysis considering the presence of mtDNA polymorphisms that might adversely affect short-read alignment, we included SNP-inferred mitochondrial haplogroup as a covariate and reran the tests (Fig 1b). The association signals retained significance even after this adjustment.

![Fig 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/27/2020.10.23.20218586/F1.medium.gif)

[Fig 1.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/F1)

Fig 1. Cardiometabolic trait associations with MT-CN in WGS data.
(a) Phenome-wide association study of normalized MT-CN against 137 cardiometabolic traits in the 4,163 sample data set. Traits are grouped into 17 categories, represented by the color of each bar. The top three most significant traits are, in order: fasting serum insulin, C-reactive protein, and fat mass. Exact P values and effect estimates, as calculated by EMMAX, are listed in S1 Table. (b) Association tests between normalized MT-CN and both fat mass and fasting serum insulin using WGS data (N = 4,163). Results are shown for the EMMAX test and a permutation test in which mitochondrial haplogroups were adjusted for.

To understand the connection between MT-CN and more clinically relevant phenotypes, we tested our MT-CN estimate against Matsuda ISI and disposition index (Table 1), which measure insulin sensitivity and secretion, respectively, and were not included in the initial screen. MT-CN was strongly associated with both insulin phenotypes. Notably, the Matsuda ISI signals survived adjustment for fat mass percentage after excluding diabetic individuals, which indicates that the association of peripheral blood MT-CN with insulin sensitivity was independent of fat mass.

View this table:
[Table 1.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T1)

Table 1. 
Associations of normalized MT-CN with disposition index and Matsuda ISI in METSIM.

To test for this association signal in a larger cohort, we developed a method to estimate mitochondrial genome copy number using 19,034 samples with whole exome sequencing (WES) data from the METSIM and FINRISK studies that included most of the WGS samples [26] (see Methods). Consistent with the WGS-based analysis, WES-estimated MT-CN was significantly associated with both fat mass and fasting serum insulin levels, even after removing the samples with WGS data, with identical directions of effect (S2 Table).

Anecdotally, it is interesting to note that these MT association signals can also be detected using read-depth analysis of the nuclear genome (S1 Fig**;** manuscript in prep.), where reads derived from mtDNA align erroneously to several nuclear loci based on homology between the MT genome and ancient nuclear mitochondrial insertions. This result provides additional evidence for the reported trait associations using an independent MT-CN estimation method, and indicates that these homology-based signals need to be taken into account in future CNV association studies.

### Heritability analysis

To assess the extent to which MT-CN is genetically determined, we estimated the heritability of mitochondrial genome copy number using GREML (Table 2). We explored two different approaches available: (1) analysis of the 4,149 samples with WGS data that passed quality control measures, where both nuclear genotypes and MT-CN are measured directly from the WGS data, and (2) analysis of the set of 17,718 samples with imputed genotype array data, where MT-CN is estimated from WES data. Of these, (1) benefited from more accurate measurement of genotype and phenotype, whereas (2) had noisier measurements but benefited from larger sample size. We focused primarily on the METSIM cohort, both because of the homogeneity of this cohort (see Methods) and because the number of FINRISK samples with WGS data was small.

View this table:
[Table 2.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T2)

Table 2. 
GREML heritability estimates in each cohort separately and in joint analysis.

In the WGS analysis, the GREML-estimated heritability of MT-CN in METSIM was 31% - somewhat less than the 54% value reported in the only prior large-scale study of peripheral blood MT-CN heritability, which was based on low-coverage WGS [11]. For comparison, we used this same approach to estimate heritability of LDL in METSIM WGS data, which yielded an estimate of 34% with a standard error of 7.9% (Table 3). This is broadly consistent with prior work [27,28], including analysis of the same Finnish sample set using distinct methods [26] (20.2% heritability). These results show that mitochondrial genome copy number is a genetically determined trait with significant heritability, comparable to that of LDL and other quantitative cardiometabolic traits [26].

View this table:
[Table 3.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T3)

Table 3. 
GREML and GREML-LDMS heritability estimates for normalized MT-CN and low-density lipoprotein (LDL).

The analysis of imputed METSIM genotypes using WES-estimated MT-CN yielded an estimated heritability of 11%, which is much lower than the WGS-based estimate (Table 2). To understand this discrepancy, we repeated the GREML analysis with the other two combinations of phenotype source (WGS vs. WES estimation) and genotype source (WGS vs. imputed array). When using the WGS-measured phenotype, the estimated heritability decreased only slightly (31% to 27%) when switching from the WGS to imputed genotypes. This suggests that the difference in genotyping method was not the main driver of the observed heritability disparity between the WGS and imputed array datasets. Conversely, when analyzing the imputed METSIM genotypes, switching from WGS-measured to WES-measured MT-CN resulted in a large drop (27% to 11%) in estimated heritability. This suggests that the extra noise inherent in WES-based MT-CN estimates was responsible for the reduction in the GREML-estimated heritability despite the increased sample size of the imputed array dataset.

### Identification of genetic factors associated with MT-CN

Previous studies have identified three autosomal quantitative trait loci (QTL) for MT-CN in other populations [22,23]. We conducted single variant GWAS for MT-CN (see Methods). Analysis of WGS (N = 4,149) and WES (N = 19,034) genotypes yielded no variants exceeding the respective significance thresholds of 5×10−8 and 5×10−7 (S2 Fig). However, despite the increased noise in the WES-measured phenotype, GWAS of imputed array genotypes from METSIM (N = 9,791) yielded two loci with genome-wide significant associations, identified by lead markers rs2288464 and rs9389268 (Fig 2, Table 4). Of the previously reported QTLs, we observed a suggestive signal at rs445 (P = 7.95×10−4), but no signal at rs11006126 (P = 0.206) or within the linkage peak reported on chromosome 10 (S3 Fig) [22,23].

View this table:
[Table 4.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T4)

Table 4. 
Single marker association results for rs2288464 and rs9389268

![Fig 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/27/2020.10.23.20218586/F2.medium.gif)

[Fig 2.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/F2)

Fig 2. Single-marker genetic associations with MT-CN in imputed array data.
(a) Manhattan plot for a genome-wide association test of normalized, WES-measured MT-CN using imputed array genotype data from METSIM (N = 9,791). Two loci markers reached the genome-wide significance of 5×10−8, identified by lead markers rs2288464 and rs9389268. (b) Quantile-quantile (QQ) plot for the association test shown in (a). This plot is separated by minor allele frequency bin, as indicated by the colors and shapes of the points. (c) Boxplot showing the distributions of normalized WES-measured MT-CN in METSIM separated by the number of rs9389268 alternate alleles as detected by imputed array genotyping (N = 9,791). The EMMAX P value for this variant was 1.62×10−8 in imputed data.

rs9389268 was the only marker that was strongly associated with MT-CN in the METSIM analyses of both WGS and imputed array data (P = 7.87×10−7 and P = 1.62×10−8, respectively). Although this variant was not significantly associated with MT-CN in FINRISK or in a random-effect meta-analysis of both cohorts (P = 0.115), the lack of signal in FINRISK is likely the product of lower-quality MT-CN measurements in FINRISK, which displayed heterogeneity across survey years (S4 Fig). This variant is located in an intergenic region between the *MYB* and *HBS1L* genes, is common across many populations, and is slightly more frequent in Finns compared to non-Finnish Europeans (gnomAD v3 MAF 34.4% vs. 26.0%). *MYB* and *HBS1L* are hematopoietic regulators [29,30], and the region between them is known to be associated with many hematological parameters including fetal hemoglobin levels, hematocrit, and erythrocyte, platelet, and monocyte counts [31–34]. It has been suggested that these intergenic variants function by disrupting *MYB* transcription factor binding and disrupting enhancer-promoter looping [35]. Conditioning the METSIM-only imputed array GWAS on rs9399137 – a tag SNP shown to be associated with many of these hematological parameters [33] – resulted in elimination of the rs9389268 signal entirely (P = 0.408), suggesting that the haplotype responsible for the association of rs9389268 with MT-CN in our data is the same one previously known to be associated with numerous hematological phenotypes.

This result is not surprising considering our approach for normalizing MT-CN. Because our MT-CN estimate was based on the ratio of mtDNA coverage to nuclear DNA coverage, changes in the cell type composition of blood could result in changes in our normalized measurement if the underlying cell types have different average numbers of mitochondria. This is especially true of platelets, which can contain mitochondria but not nuclei, and whose counts are known to be associated with rs9399137.

rs2288464 seemed to be a good candidate due to its location in the 3’ untranslated region of *MRPL34*, which codes for a large subunit protein of the mitochondrial ribosome. While the association signal at this marker was not observed in the WGS data (P = 0.0655), based on the observed effect size of this variant in WES and imputed data as well as the number of WGS datasets available, there was insufficient power (∼0.5% at α = 5×10−7) to robustly detect this association in the WGS data alone [36].

We next performed rare variant association (RVAS) analyses using a mixed-model version of SKAT-O [37] to test for genes in which the presence of high-impact rare variants might be associated with MT-CN levels (see Methods; Fig 3, Table 5). Using WES data, the only gene passing the Bonferroni-adjusted P value threshold of 2.16×10−6 was *TMBIM1* (P = 2.96×10−8), a member of a gene family thought to regulate cell death pathways [38]. *TMBIM1* has been shown to be protective against non-alcoholic fatty liver disease (NAFLD), progression to non-alcoholic steatohepatitis, and insulin resistance in mice and macaques [39]. Interestingly, in our analysis, in which a burden test was determined to be optimal by SKAT-O, putative loss-of-function variants in *TMBIM1* were associated with a higher MT-CN (Fig 3c), which was in turn associated with less severe metabolic syndrome, suggesting that *TMBIM1* is actually a risk gene, not a protective one. Thus, the published function of *TMBIM1* makes it a strong candidate, although the direction of effect in our data disagreed with the direction suggested by prior work in model organisms [39].

View this table:
[Table 5.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T5)

Table 5. 
Gene-based rare variant association results for *TMBIM1*

![Fig 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/27/2020.10.23.20218586/F3.medium.gif)

[Fig 3.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/F3)

Fig 3. Gene-based associations with MT-CN in WES data.
(a) Manhattan plot and quantile-quantile (QQ) plot for a gene-based rare variant association test of normalized, WES-measured MT-CN using WES data from both METSIM and FINRISK (N = 19,034). The red line represents a Bonferroni significance level of 2.164×10−6, as 23,105 genes were included in this test. *TMBIM1* is the only gene to reach significance at this level. (b) QQ plot for the test shown in (a). This plot is separated by minor allele frequency bin, as indicated by the colors and shapes of the points. (c) Boxplot showing the distributions of normalized WES-measured MT-CN, separated by the number of WES-detected alternate alleles in *TMBIM1* with MAF < 0.01 and CADD score > 20 (N = 19,034).

### Inference of causality in the association between MT-CN and insulin

To further understand the association between MT-CN and fasting serum insulin, we employed a Mendelian randomization (MR) approach with MT-CN as the exposure and insulin as the outcome. Using penalized regression, we leveraged our extensive phenotype data to build a genetic instrument from a large number of genetic variants and adjust for possible confounders via a novel approach (see Methods; Fig 4). We believe this approach to be more robust to violations of key MR assumptions than other methods in situations where limited data are available and few robust genotype-exposure associations are known. We restricted our analysis to METSIM samples due to batch effects and inconsistencies in available quantitative trait data observed across FINRISK survey years (S4 Fig). The effect sizes of the instrument in the causality test for insulin levels are shown in Fig 4d. We calculated our instrument using either L1 or L2 regularization. In both cases, the MT-CN instrument was not a significant predictor (α = 0.05) of insulin when we constructed our instrument from WGS variants, but was significant when the instrument was constructed from imputed array variants. This was likely due to the larger sample size of the imputed array data set. However, the effect estimates were remarkably similar across all four cases. As a result, inverse-variance weighted meta-analysis across datasets yielded highly significant P values for both penalties. In summary, our analysis provided evidence for a significant causal role for MT-CN in determining fasting serum insulin levels that was robust to the choice of regression penalty when building the genetic instrument. We note that this evidence for causality comes with some caveats (see Methods).

![Fig 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/27/2020.10.23.20218586/F4.medium.gif)

[Fig 4.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/F4)

Fig 4. Mendelian randomization approach and results.
(a) Formulation of the Mendelian Randomization causality test. G represents genotypes, Z is a genetic instrument value constructed from G, X represents ln(MT-CN), Y represents ln(Insulin), and U represents any confounders of the association between X and Y. The arrow from X to Y is dashed to indicate that although an association is known, the relationship is not known to be causal. In this formulation, a significant association between Z and Y would provide evidence that X is casual for Y. (b) Strategy for choosing variables to adjust for when building Z in order to enforce MR assumptions. A represents those columns of covariate matrix W that are associated with Y (represented by the solid line between A and Y) and B represents those columns of matrix W that are associated with X conditional on A (represented by the solid line between B and X). Dashed lines represent possible, but unproven associations. The penalized regression of X on G used to build Z is adjusted for A and B (with no penalty) in an attempt to prevent any associations between Z and either A or B (represented by the blue X’s). While an association between B and Y is unlikely (represented by the dashed line between B and Y) because B is not contained in A, B is still adjusted for in the penalized regression to be as conservative as possible. (c) Strategy for choosing covariates to adjust for in the causality test of Y against Z in another attempt to reduce the impact of any remaining associations between Z and assumption-violating variables. Covariate sets I, II, III, and IV are defined by the presence of known first-order associations (represented by black lines) with Z and Y (see Methods). Yellow lines represent relationships where a first-order association is not known, but a higher-order association is possible. Covariate sets II, III, and IV (colored blue) are adjusted for in the causality test because there is at least one first order association linking them to Z and Y, so they risk violating MR assumptions 2 or 3. (d) Results of Mendelian randomization test for causality of MT-CN on fasting serum insulin.

### Replication and biological interpretation

In principle, changes in MT-CN can be caused by changes in the number of mitochondrial genome copies within cells or by changes in the blood cell type composition. Based on the association with rs9389268 and the nuances of the normalization procedure described above, we sought to test the hypothesis that our MT-CN measurement primarily reflects the cell type composition of the blood rather than the number of mitochondria per cell. We used imputed array genotype and phenotype data from the UK Biobank (N = 357,656) for this purpose [40].

We first tested cell counts from the UK Biobank (UKBB) against a polygenic risk score (PRS) for MT-CN built using the genetic instrument from the Finnish data. Leukocyte, neutrophil, and platelet counts were all significantly associated with MT-CN PRS conditional on age, age2, and sex (see Methods, Table 6). However, adjusting for neutrophil counts in the leukocyte regression eliminated the signal (PRS regression coefficient P = 0.839), suggesting that the leukocyte count signal was driven by the effect of neutrophil count. We removed any high leverage, large residual samples and repeated the neutrophil and platelet count regressions to ensure that this result was robust to outliers and found no appreciable change in significance (Table 6). As a result, we concluded that our MT-CN measurement was significantly associated with neutrophil and platelet counts. Subsequent analyses were performed both with and without adjustment for these variables, as described below.

View this table:
[Table 6.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T6)

Table 6. 
Association results between blood cell count traits and MT-CN polygenic risk score in 357,656 UK Biobank samples.

We note that the effect directions of the associations of platelet counts with metS and MT-CN PRS seem inconsistent at first glance, as platelet counts were positively correlated with MT-CN PRS (Table 6) and metS (S3 Table) while MT-CN and insulin (a proxy for metS) were negatively correlated (Fig 1b). However, the FinMetSeq regression model in Fig 1b was not conditional on any other covariates (although age, age2, and sex were regressed out of the MT-CN measurement prior to this analysis), while the UKBB models that gave rise to Table 6 and S3 Table adjusted for many additional covariates, including 20 PCs and age-sex interaction terms. As a result, the effect directions for the analyses in the two datasets are not directly comparable.

We next tested for associations between MT-CN PRS and several cardiometabolic phenotypes from the test in Fig 1a (see Methods). With the exception of C-reactive protein, which showed no significant association, all tested phenotypes showed nominal association with MT-CN PRS at α = 0.05, with total triglycerides and HDL being the only traits surviving Bonferroni correction (Table 7). We interpret this as replication of the link between mitochondrial genome copy number and metabolic syndrome in a large, independent data set.

View this table:
[Table 7.](http://medrxiv.org/content/early/2020/10/27/2020.10.23.20218586/T7)

Table 7. 
Association results between metabolic syndrome traits and MT-CN polygenic risk score in 357,656 UK Biobank samples.

To determine whether there was any association between MT-CN and metabolic syndrome not mediated through cell counts, we repeated the tests of cardiometabolic trait association with MT-CN PRS with adjustment for platelet and neutrophil counts. HDL was the only trait with a nominal (α = 0.05) association with PRS under this adjustment, but this signal was not strong enough to survive Bonferroni correction (Table 7). This suggests that the associations we observed between MT-CN and metabolic traits arose simply because MT-CN is a proxy for platelet and neutrophil count. This was supported by the fact that direct testing of platelets and neutrophils against triglycerides, fat mass, and HDL yielded remarkably significant associations, which survived *post-hoc* removal of high-leverage, high-residual outlier samples (S3 Table). This evidence for MT-CN as a proxy for platelet and neutrophil counts strongly suggests that the causal relationship observed in the Mendelian randomization experiment (see above) in fact represents a causative role for neutrophils and platelet counts in setting serum insulin levels.

Given the strong observed associations between blood cell count phenotypes and MT-CN PRS, we used these blood phenotypes to seek replication of the genetic associations detected in Finnish data. Using imputed UKBB genotype data, we tested the expected alternate allele dosage of both rs2288464 and rs9389268 against the same blood cell traits mentioned above, using linear regression (S4 Table; expected alternate allele dosage was calculated from genotype call probabilities as *DS* = *P* (0*/*1) + 2*P* (1*/*1)). As expected given its known associations with multiple hematological parameters (see above), rs9389268 showed strong associations with all tested blood cell phenotypes. rs2288464 was not significantly associated with any of the five phenotypes after correction for multiple testing, although a nominal association was detected with total leukocyte count. This further strengthens our belief that rs9389268 is truly associated with MT-CN through blood cell composition. We also tested *TMBIM1* against the same blood cell traits using SKAT-O [37], and found no significant associations (S4 Table). This may mean that *TMBIM1* affects MT-CN through a mechanism other than altering blood cell type composition.

We further sought to generate hypotheses for other phenotypic associations with MT-CN. To this end, we performed a phenome-wide screen of MT-CN PRS against all of the UKBB phenotypes available to us. To curate and transform these phenotypes, we used a modified version of PHESANT [41,42], which outputs all continuous variables in both raw and inverse rank-normalized form. We chose to interpret the results from the normalized continuous variables (S5 Table) to be conservative and robust to outliers, although the results of the raw continuous variable analyses were similar (S6 Table). No metabolic syndrome traits appeared among the tested traits with q < 0.05. However, the tests for HDL cholesterol, self-reported heart attack, and doctor-diagnosed heart attack did yield somewhat suggestive results (q = 0.123, 0.176, and 0.176, respectively). We also repeated this screen with adjustment for neutrophil and platelet counts (S7 Table and S8 Table), resulting again in no metabolic syndrome phenotypes achieving *q* < 0.05. The addition of neutrophil and platelet counts as covariates attenuated the suggestive signals for HDL cholesterol, self-reported heart attack, and doctor-diagnosed heart attack (q = 0.284, 0.391, and 0.402, respectively).

## Discussion

We have described one of the most well-powered studies to date of the genetic relationship between MT-CN measurements in blood and cardiometabolic phenotypes. Our study is one of very few of which we are aware to utilize sequencing data for this purpose, and our samples are more deeply phenotyped than those used in prior work. Our data show highly significant associations between blood-derived MT-CN measurements and several cardiometabolic traits, particularly insulin and fat mass. We observed strong heritability of MT-CN (31%), on par with other widely studied cardiometabolic traits such as LDL, and identified one single marker association on a haplotype previously associated with several hematological parameters [31–34]. We also report one gene with a rare-variant association with MT-CN, *TMBIM1*, that has a known link to non-alcoholic fatty liver disease in model organisms [39]. Interestingly, rare variants in this gene were not associated (in aggregate) with blood cell counts in UK Biobank, suggesting that this gene might modulate NAFLD risk via MT-CN, but through a mechanism not involving blood composition. More work must be done to confirm this tentative genetic association and mechanistic hypothesis.

Using a novel multiple-variant instrument-building method, we report evidence from Mendelian randomization supporting a causal role for MT-CN in metabolic syndrome. Further, we used UK Biobank data to show that not only does the link between MT-CN and metabolic syndrome replicate in an independent data set using a polygenic risk score approach, but also that this association is mediated by neutrophil and platelet counts, contrary to previous claims that variability in the number of mitochondria per cell is responsible for CHD risk [12]. Taken together, these results suggest that neutrophil and platelet counts in peripheral blood are causally related to fasting insulin and related traits.

There is prior evidence to support the role of inflammation – specifically via innate immune cells such as neutrophils – in the etiology of type 2 diabetes (T2D) and insulin resistance [43–45], which suggests a plausible model by which peripheral blood neutrophil count could influence metabolic syndrome. Nutrient excess and high fat diets are known to recruit neutrophils into tissues, which then cause insulin resistance both by releasing TNF-α and IL-6 and by upregulating cyclooxygenase [43]. This leads to increased LTB4 and subsequent upregulation of NF-*κ*B, a central regulator of inflammation. Moreover, free fatty acids also cause neutrophils to stay in tissues longer, resulting in persistent inflammation and leading to insulin resistance [43]. While it is known that inflammation, and particularly neutrophils, play a role in metabolic syndrome, our results strongly suggest that peripheral blood neutrophil count causally contributes to this process and is associated with heritable genetic variation in the human population. Overall, our work provides further insight into the role that inflammation plays in metabolic syndrome and supports the idea that targeting inflammation may be a fruitful avenue of investigation in developing future therapeutics.

## Methods

### Genotype and phenotype data

Whole genome sequencing (WGS) was performed on a cohort of 4,163 samples comprising 3,074 male samples from the METSIM study [46] and 1,089 male and female samples from the FINRISK study [47]. Genomic DNA was fragmented on the Covaris LE220 instrument targeting 375 bp inserts. Automated Illumina libraries were constructed with the TruSeq PCR-free (Illumina) or KAPA Hyper PCR-free library prep kit (KAPA Biosystems/Roche) on the SciClone NGS platform (Perkin Elmer). The fragmented genomic DNA was size-selected on the SciClone instrument with AMPure XP beads to tighten the distribution of fragmented DNA to ensure the average insert of the libraries was 350-375 bp. We followed the manufacturer’s protocol as provided by Perkin Elmer, with the following exception: post ligation, the libraries were purified twice with a 0.7x AMPure bead/sample ratio to eliminate any residual adaptors present. An aliquot of the final libraries was diluted 1:20 and quantitated on the Caliper GX instrument (Perkin Elmer). The concentration of each library was accurately determined through qPCR utilizing the KAPA library Quantification Kit according to the manufacturer’s protocol (KAPA Biosystems/Roche) to produce cluster counts appropriate for the Illumina HiSeqX instrument. Libraries were pooled and run over a few lanes of the HiSeq X to ensure the libraries within the pool were equally balanced. The final pool of balanced libraries was loaded over the remaining number of HiSeq X lanes to achieve the desired coverage for this project. 2×150 paired end sequence data were demultiplexed using a single index, which was a restriction on the HiSeqX instrument at this time. A minimum of 19.5x coverage was achieved per sample.

The quality of the aligned sequence data was assessed using metrics generated by Picard [48] v2.4.1, Samtools v1.3.1 ([http://www.htslib.org/](http://www.htslib.org/)) and VerifyBamID v1.1.3 ([https://genome.sph.umich.edu/wiki/VerifyBamID](https://genome.sph.umich.edu/wiki/VerifyBamID)). Based on the output files from Picard, the following alignment statistics were collected for review: PF\_MISMATCH\_RATE, PF\_READS, PF\_ALIGNED\_BASES, PCT\_ADAPTER, PCT\_CHIMERAS, PCT\_PF\_READS\_ALIGNED, PCT\_READS\_ALIGNED\_IN\_PAIRS, PF\_HQ\_ALIGNED\_BASES, PF\_HQ\_ALIGNED\_Q20\_BASES, PF\_HQ\_ALIGNED\_READS, MEAN\_INSERT\_SIZE, STANDARD\_DEVIATION, MEDIAN\_INSERT\_SIZE, TOTAL\_READS, PCT\_10x, and PCT\_20x. Alignment rate was calculated as PF\_READS\_ALIGNED/TOTAL_READS. The formula for haploid coverage was as follows: ![Graphic][1]</img>. From the Samtools output, inter-chromosomal rate was calculated as: ![Graphic][2]</img> and discordant rate was calculated as: *reads*_*mapped*_*percentage* − *reads*_*mapped*_*in*_*proper*_*pairs*_*percentage*.

Properly paired percentage (reads\_mapped\_in\_proper\_pairs\_percentage) and singleton percentage (reads_mapped_as_singleton_percentage) were also reviewed. From VerifyBamID, the Freemix value was reviewed.

The metrics for judgement of passing data quality were: FIRST\_OF\_PAIR\_MISMATCH\_RATE < .05, SECOND\_OF\_PAIR\_MISMATCH\_RATE < 0.05, haploid coverage ≥ 19.5, interchromosomal rate < .05, and discordant rate < 5. All of the above metrics must have been met in order for the sample to be assigned as QC pass. If a sample did not meet the passing criteria, the following failure analysis was performed: a) If the Freemix score was at least 0.05, the sample or the library was considered contaminated, and both the library and the sample were abandoned; b) if the discordant rate was over 5 and/or the inter-chromosomal rate was over 0.05, the quality of DNA was considered poor and the sample was removed from the sequencing pipeline; and c) in the case of a) and b), the collaborator was contacted to determine if selection of a replacement sample from the same cohort was desired or feasible.

Separately, whole exome sequencing (WES) data (N = 19,034), genotyping array data (N = 17,718) imputed using the Haplotype Reference Consortium panel [49] v1.1, and transformed, normalized quantitative cardiometabolic trait data were obtained from an earlier study [26]. FINRISK array data came in nine genotyping batches, two of which were excluded from the present study due to small sample size. The traits, normalization and transformation procedures, and sample sizes are described in a previous publication [26]. The WES and imputed sample sets contained 4,013 and 3,929 of the 4,163 WGS samples included in the present study.

All participants in both the METSIM and FINRISK studies provided informed consent, and study protocols were approved by the Ethics Committees at participating institutions (National Public Health Institute of Finland; Hospital District of Helsinki and Uusimaa; Hospital District of Northern Savo). All relevant ethics committees approved this study.

### WGS callset generation and quality control

Single nucleotide polymorphisms (SNPs) and small insertions and deletions were called from the full set of 4,163 samples using GATK [50] v3.5. GVCFs containing SNVs and Indels from GATK HaplotypeCaller (-ERC GVCF -GQB 5 -GQB 20 -GQB 60 -variant\_index\_type LINEAR -variant\_index\_parameter 128000) were first processed to ensure no GVCF blocks crossed boundaries every 1 Mb (CombineGVCFs; --breakBandsAtMultiplesOf 1000000). The resulting GVCFs were then processed in 10 Mb shards across each chromosome. Each shard was combined (CombineGVCFs), genotyped (GenotypeGVCFs; -stand\_call_conf 30 -stand_emit_conf 0), hard filtered to remove alternate alleles uncalled in any individual removed (SelectVariants; --removeUnusedAlternates), and hard filtered to remove lines solely reporting symbolic deletions in parallel. All shards were jointly recalibrated (VariantRecalibrator) and then individually filtered (ApplyRecalibration) based on the recalibration results. All of the above methods were performed using GATK v3.5. SNP variant recalibration was performed using the following options to VariantRecalibrator and all resources were drawn from the GATK hg38 resource bundle (v0):

 -mode SNP -resource:hapmap,known=false,training=true,truth=true,prior=15.0 -resource:omni,known=false,training=true,truth=true,prior=12.0 -resource:1000G,known=false,training=true,truth=false,prior=10.0 -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 -an QD -an DP -an FS -an MQRankSum -an ReadPosRankSum -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 
Indel variant recalibration was performed using the following options to VariantRecalibrator (with the same resource bundle as with SNPs):

 -mode INDEL -resource:mills,known=true,training=true,truth=true,prior=12.0 -an DP -an FS -an MQRankSum -an ReadPosRankSum --maxGaussians 4 -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 
When applying the variant recalibration the following options were used:

For SNPs: --ts\_filter\_level 99.0

For Indels: --ts\_filter\_level 99.0

Following SNP and INDEL variant recalibration, multiallelic variants were decomposed and normalized with vt [51] v0.5. Duplicate variants and variants with symbolic alleles were subsequently removed. The bottom tranche of variants identified by GATK’s Variant Quality Score Recalibration tool and variants with missingness greater than 2% were removed as well, although variants with allele balance between 0.3 and 0.7 were rescued. Variants with Hardy-Weinberg equilibrium (on a second-degree unrelated subset of 3,969 individuals, as determined by KING [52]) P value less than 10−6 and those with allele balance less than 0.3 or greater than 0.7 were also removed.

Sample-level quality control was also undertaken on this dataset; 13 samples were identified for exclusion because of singleton counts that were at least eight median absolute deviations above the median. Separately, 12 sex-discordant samples were flagged using plink --check-sex, and after examining chromosome Y missingness and F coefficient values for these samples, only the one that clearly differed from its reported sex was marked for exclusion. No samples were excluded based on missingness fraction or the first five principal components. In total, 14 samples were excluded from the heritability, GWAS, and Mendelian randomization analyses; the other analyses were performed without exclusion of these samples. As a result, the former analyses were performed with N = 4,149 while the latter had N = 4,163.

### Mitochondrial genome copy number estimation

We estimated mitochondrial genome copy number (MT-CN) from both WGS and WES data. In WGS data, we used BEDTools [53] to calculate per-base coverage on the mitochondrial genome from the latest available 4,163 WGS CRAM files. MT-CN was then calculated by normalizing the mean coverage of the mitochondrial genome to the “haploid coverage” of the autosomes as calculated by Picard [48]. The result was then doubled to account for the diploidy of the autosomal genome. This normalization is summarized by the following equation: ![Graphic][3]</img>.

The output from the above equation served as the raw measurement of per sample MT-CN. To reduce batch effects, we separated the 4,163 samples into three groups: METSIM, FINRISK collected in 1992 or 1997, and FINRISK collected in 2002 or 2007 (the FINRISK batching decisions were made based on the means shown in S4 Fig). Within each cohort, the raw estimates were regressed on age, age2 and sex (FINRISK only) and the residuals were inverse-normal transformed. We combined the three batches of normalized MT-CN values and inverse-normal transformed the combined values for downstream analysis.

We used a similar procedure to estimate MT-CN from WES data, with mean autosomal coverage estimates taken from XHMM [54]. However, as mitochondrial genomic coverage was nonuniform due to the use of hybrid capture probes, mean mtDNA coverage was not an obvious choice of metric for MT-CN estimation (S5 Fig). To summarize this nonuniform mitochondrial genomic coverage into a single number, we tried taking the mean and the maximum depth of reads that aligned to the mitochondrial chromosome; the resulting values were then processed in the same way as the WGS-estimated values. We evaluated the approaches by measuring the R2 between WGS-estimated and WES-estimated MT-CN in the 4,013 samples for which both data types were available (S6 Fig). While R2 was fairly high using both approaches, the maximum coverage method was ultimately selected for use as it yielded a higher R2 (0.445 vs 0.380). As a result, the WES MT-CN estimate was calculated as follows: ![Graphic][4]</img>.

### Mitochondrial haplogroup estimation

We assigned mitochondrial haplogroups using HaploGrep [55] v1.0. Mitochondrial SNP/indel variants were genotyped using GATK GenotypeGVCFs, and a customized filter based on allele balance was applied to the combined callset. HaploGrep was then used to call mitochondrial haplotypes for each individual. We adjusted for major haplogroups in the same linear regressions of metabolic traits onto MT-CN (see Results) and calculated the summary statistics from a permutation test as implemented in lmPerm ([https://github.com/mtorchiano/lmPerm](https://github.com/mtorchiano/lmPerm)).

### Heritability analysis

To estimate heritability of MT-CN, a genomic relatedness-based restricted maximum-likelihood (GREML) method was used as implemented in GCTA [56]. The original GREML [57] method was used first, followed by GREML-LDMS [58] to account for biases arising from differences in minor allele frequency (MAF) spectrum or linkage disequilibrium (LD) properties between the genotyped variants and the true causal variants [59]. For both analyses, MT-CN values were normalized and residualized for sex, age, and age2 as described above. Heritability estimation was performed jointly and separately for METSIM and FINRISK samples using WGS and imputed array genotypes. In all cases, a minimum MAF threshold of 1% was applied. Beyond those covariates already adjusted for in the normalization process, sensitivity analyses were performed on imputed array data to determine whether heritability estimates were sensitive to inclusion of covariates. In these experiments, either cohort or FINRISK genotyping array batch were included as fixed-effect covariates in joint analyses of imputed array data; in neither case was the final heritability estimate significantly affected (h2 = 0.09, SE = 0.02 in both cases). In GREML-LDMS, genotypes were split into four SNP-based LD score quartiles and two MAF bins (1% > MAF > 5% and MAF > 5%), and genetic relatedness matrices (GRMs) were estimated separately for each of the eight combinations. The GREML algorithm was then run on all eight GRMs simultaneously using the first ten principal components (PCs) of the genotype matrix (as calculated by smartPCA v13050; [https://github.com/DReichLab/EIG](https://github.com/DReichLab/EIG)) as fixed covariates [58]. The use of GREML-LDMS over GREML also did not affect estimated heritability values (Table 3), suggesting that the properties of the causal variants for this trait do not lead to significant biases when using the standard GREML approach.

We observed that WGS heritability estimates decrease when analyzing FINRISK and METSIM data together compared to analysis of METSIM alone (Table 2) (note that FINRISK-only heritability estimates are not reliable as they have large standard errors resulting from the small number of FINRISK samples sequenced). One potential explanation for this is that there exists substantial heterogeneity across FINRISK survey years (S4 Fig), and between the FINRISK and METSIM cohorts, with respect to the reliability with which mtDNA was captured (likely due to different DNA preparation protocols).

### Genome-wide association analyses

Genome-wide association studies (GWAS) were performed using the same normalized phenotype used in heritability analyses. Single-variant GWAS were conducted using EMMAX as implemented in EPACTS ([https://genome.sph.umich.edu/wiki/EPACTS](https://genome.sph.umich.edu/wiki/EPACTS)). Kinship matrices required by EMMAX were generated by EPACTS; kinship matrices for WGS GWAS were generated from WGS data, while those for WES and imputed array based GWAS were generated from WES data. A P value threshold of 5×10−8 was used for the WGS and imputed array GWAS while 5×10−7 was used for significance in the WES GWAS. Single-variant association analyses of WGS and WES data did not include any covariates in the EMMAX model, although all association analyses were performed using MT-CN values that adjusted for age, age2, sex, and cohort (see “Mitochondrial Genome Copy Number Estimation”).

Gene-based variant aggregation studies (RVAS) were done using a mixed-model version of SKAT-O [37] as implemented in EPACTS. Variants with CADD [60] score greater than 20 and minor allele frequency less than 1% were grouped into genes as annotated by VEP [61] (which annotates a variant with a gene name if the gene falls within 5k of that gene by default). For gene-based RVAS, genome-wide significance thresholds varied slightly due to differing the number of genes with at least two variants meeting the above criteria in each test, but were approximately 2×10−6 in all cases.

### Mendelian randomization

To assess the evidence for a causal relationship between mitochondrial genome copy number and fasting serum insulin levels, the METSIM cohort alone was used due to its homogeneity of sex, collection procedures, and location. A penalized regression based, multiple variant Mendelian randomization (MR) approach was employed to enforce the necessary assumptions of MR methods. While some MR studies have tested one or more assumptions *post hoc*, to our knowledge, there is no published method that tries to enforce these assumptions during the process of building the genetic instrument in the absence of a large set of known genotype-exposure associations. In our formulation (Fig 4a), X, the natural log of MT-CN (adjusted for nuclear genomic coverage but not for age, age2, or sex), and a genotype matrix G were used to build a genetic instrument Z, which was then tested against Y, the natural log of fasting serum insulin. The goal of the MR approach was to use a large number of common variants to build a genetic instrument Z that satisfied the three assumptions of MR [62]:

1.  Association of Z with X

2.  Independence of Z from any variables U confounding the relationship between X and Y

3.  Independence of Z and Y given X and U

To attempt to build a genetic instrument satisfying assumptions 2 and 3, the deep METSIM phenotype data were leveraged. A matrix W was constructed using the 75 measured traits and first 20 PCs of the genotype matrix (including a third-degree polynomial basis for PC 1). From these variables, covariates that could violate one of these two assumptions were chosen by selecting columns of W associated with X or Y (Fig 4b). These columns were selected using two successive LASSO feature selection procedures. First, a set A of covariates associated with Y was chosen by using LASSO to regress Y onto W. In this regression, age and the third-degree polynomial basis for PC 1 were left unpenalized to ensure that A contains these covariates. The shrinkage parameter was chosen by tenfold cross-validation as the largest value that gives a mean squared error (MSE) within one standard error of the minimum observed MSE. Next, the columns of W associated with X conditional on A were chosen using a similar LASSO procedure in the regression of X onto W. In this step, however, the variables in set A were left unpenalized in order to only capture associations that are conditionally independent of A. The selected variables from this regression were designated set B.

The instrument was built using a penalized regression (using either an L1 or L2 penalty, as implemented in glmnet [63]) of the form *X* ∼ *G* + *W* *A* + *W**B*, where WA and WB are the columns of W representing sets A and B, respectively, and G is a genotype matrix containing the alternate allele dosage (missing alleles are replaced with the MAF, similarly to PLINK [64]) of all variants with MAF greater than 1% and marginal GWAS P value below 0.01. As X was the target vector for this regression, assumption 1 of MR was trivial. In the penalized regression, WA and WB were unpenalized in an effort to orthogonalize the regression coefficients of the genotypes to these covariates in an effort to enforce assumptions 2 and 3. glmnet was run with a convergence threshold of 1×10−10 and maximum number of iterations of 200,000. To avoid the overfitting that would result from calculating instrument values on the same samples on which regression coefficients are learned [65], the penalized regression model was fit on independent subsets of the data as follows. Five models were fit, each by holding out a different 20% of samples, such that the instrument value computed for each sample was calculated using the regression coefficient vector learned without that sample. The vector of possible shrinkage parameters λ for all five models was supplied as (103,102,…,10−13,10−14), and the λ value which minimized the joint residual sum of squares of all five models was chosen for instrument calculation.

Formally, we randomly partitioned the set of samples S with nonmissing insulin measurements into five nonoverlapping sets Sj for *j* = {1, …, 5}. We denote set complements as ![Graphic][5]</img>, such that each SjC contained 80% of the training samples. The instrument vector Zj for each Sj was computed as follows: ![Graphic][6]</img>, where Zj is the instrument vector for Sj, Gj is the genotype matrix of Sj, and βG(-j) is the vector of genotype regression coefficients from the model described above, trained on SjC. The instrument values within each Sj were inverse rank-normalized using a Blom transformation [66,67] before being concatenated across the values of j to give the final instrument vector Z. Because samples with missing insulin values could not be included in the causality test anyway, these samples were excluded from S but safely included in the training sets of all five models. The instrument values of these samples were never calculated or used in downstream analyses.

Often, the inclusion of unpenalized covariate sets A and B in the instrument-building regression was not sufficient to completely orthogonalize Z to these covariates (see below). As a result, the test for association between Z and Y was performed conditional on a set of potentially assumption-violating covariates chosen using the newly constructed instrument Z in another attempt to account for possible violations of MR assumptions in the causality test (Fig 4c). To choose this set of covariates C, a final feature selection step was performed using LASSO regression of Z on W with covariate set A excluded from the penalty. As in the previous feature selection steps, the shrinkage parameter was chosen via tenfold cross-validation as the largest value with MSE within one standard error of the minimum observed MSE. Once this set, D, of covariates associated with Z was chosen, the covariates in W were partitioned into sets I, II, III, and IV based on their membership in A and D (see Fig 4). Formally, this partitioning was done as follows: *I* = *W* \ (*A* ∪ *D*), *II* = *D* ∩ *A**C*, *III* = *A* ∩ *D**C*, and *IV* = *A* ∩ *D*, where *A**C* = *W* \ *A* and *D**C* = *W* \ *D*. Then, the test for causality came from the regression coefficient of Z in the multiple regression *Y* ∼ *Z* + *C*, where C is the union of sets II, III, and IV (colored blue in Fig 4c).

To account for missing data in W, missing values were multiply imputed using regression trees as implemented in the R package mice [68] v3.4.0 (maxit=25). This imputation was repeated 1000 times in parallel, with each set of imputed values being carried through the entire procedure described above. The resulting 1000 computed instrument effect sizes and standard errors were combined using Rubin’s method as implemented in the R package Amelia [69] v1.7.5. The combined effect size and standard error were then tested for significance using a t-test with 998 degrees of freedom.

The above procedure was performed separately for METSIM samples with WGS data (N = 3,034) and METSIM samples with only imputed array data (N = 6,774) using an L1 penalty in the instrument-building regression, and again using an L2 penalty. Both sample sets were limited to those for which relevant quantitative traits were available. An inverse-variance weighted meta-analysis was performed across data sets for L1 and L2-penalized regression separately. The resulting effect size and standard error were tested for significance using a Z test.

To ensure that our results were not driven by outlier samples, we removed outliers in two stages. Before the MR analyses, we used principal components analysis (PCA), Mahalanobis distance, and multi-trait extreme outlier identification to remove 5 WGS samples and 15 imputed array samples based on quantitative trait data. We also removed high leverage, high residual outliers from the causality test regression (see below) *post hoc* and recomputed the instrument effect sizes to ensure that there was no significant change in the results. In each of the 1,000 multiple imputation runs, among the samples with standardized residual greater than 1, the top 10 samples by leverage were recorded. Any sample that was recorded in this way in at least one run was then excluded from the re-analysis as a *post hoc* outlier. The results of this additional analysis showed only very small differences in effect estimates, and their interpretation remained the same (S9 Table). Thus, we concluded that our causal inference results were not driven by outlier samples.

One caveat of this method is that, as mentioned above, exclusion of sets A and B from the regression penalty did not perfectly orthogonalize the resulting instrument from these variables in practice (S7 Fig). Reasons for this may include relatively low levels of shrinkage in the instrument-building regression or higher order associations between MT-CN and the confounding variables. However, our method still represents an improvement over the current standard, which is not to adjust for these covariates at all. Another caveat is that it is impossible to determine the perfect set of covariates for which adjustment is appropriate. Lack of adjustment for truly confounding variables can result in an instrument which does not satisfy MR assumptions 2 and/or 3, yielding a biased effect estimate. Conversely, unnecessary adjustment for certain variables can also result in biases. For example, adjusting for an intermediate phenotype that truly lies along the path from Z to X to Y can cause a false negative signal, making the causality test overly conservative. Alternatively, adjusting for some variables can result in collider biases [70]. That is, if both Z and Y are causal for a confounder U, then adjusting for U can induce a dependency between Z and Y (S8 Fig) that did not previously exist.

We note that a known source of bias in MR studies is the selection of samples based on case-control status for a related disease [71]. While METSIM is a population-based study, samples were selected for WGS based on cardiovascular disease case-control status so as to enrich the sequenced samples for cases. This has the potential to bias a MR experiment if both the exposure and the outcome are associated with the disease, which is certainly possible. However, in our design, all of the METSIM samples not chosen for WGS were tested in the imputed array experiment. The consistency of effect estimates between the WGS and imputed array samples both in the L1 and L2 penalty cases (Fig 4d) suggests that there is little to no bias arising from sample selection in this experiment.

### Calculation and testing of polygenic risk score in the UK Biobank

To search for associations between MT-CN and other phenotypes, the genetic instrument calculated in Finnish imputed array data was computed and treated as a polygenic risk score (PRS) in a relatively homogenous subset of 357,656 UK Biobank samples identified by a previous study [42]. We calculated ![Graphic][7]</img>, the average of the five values of βG(-j) across all 1000 multiple imputation runs using an L2 penalty and imputed array data – the L2 penalty was chosen because it performed better than the L1 on both METSIM data types, and the imputed array data set was chosen due to its larger sample size than the WGS set (Fig 4d). Next, to keep the procedure as consistent as possible with the imputation protocol used for METSIM – which used haploid dosage values to call imputed genotypes [26] – we called imputed genotypes using the expected alternate allele dosage from the UK Biobank by setting thresholds of 0.5 and 1.5. Using the resulting imputed variant calls, we calculated our PRS as ![Graphic][8]</img>, where ![Graphic][9]</img> is the UK Biobank genotype dosage matrix constructed in the same way as G in METSIM.

To test for associations with MT-CN PRS in the UK Biobank, we employed two approaches: a hypothesis-driven analysis targeted to the phenotypes associated with MT-CN in the Finnish data as well as a hypothesis-free screen of all the phenotypes available to us.

In the targeted analysis, we used our genetic instrument from the MR experiment as a PRS for MT-CN in our chosen subset of the UK Biobank and tested for associations with several blood cell count and metabolic syndrome traits. Given the association of MT-CN with rs9389268 (see Results), we selected as cell count traits total leukocyte count as well as lymphocyte, neutrophil, monocyte, and platelet counts for testing (because lymphocyte count was not readily available, it was calculated as the product of leukocyte count and lymphocyte percentage). We did not include basophils and eosinophils in this analysis considering that they comprise a small minority of white blood cells and are unlikely to affect MT-CN measured from whole blood. All cell count traits were log-transformed and standardized separately by sex.

We took several steps to eliminate outlier samples in the dataset. Through three iterations of PCA on the cell count matrix and subsequent outlier removal, we removed 1,637 outlier samples. We then fit null linear models of the form *cell count* ∼ *age* + *age*2 + *sex* + *age* : *sex* + *age*2 : *sex* + *P Cs* (the first 20 PCs were included) for each cell count trait and subsequently removed samples with either large residuals or high leverage and moderate residuals in at least one model (following the example of [42]). Through two iterations of null model fitting and outlier removal, we removed 7 additional samples based on null model fit. Tests of association between cell counts and MT-CN PRS were based on the PRS regression coefficient in linear models of the form *cell count* ∼ *PRS* + *age* + *age*2 + *sex* + *age* : *sex* + *age*2 : *sex* + *P Cs*.

We repeated this process for those cardiometabolic traits found to be suggestively associated with MT-CN in the Finnish dataset (P < 10−6) that were also readily available in the UK Biobank (Fig 1a, S1 Table); these phenotypes were body mass index (BMI), fat mass, C-reactive protein, high-density lipoprotein, total triglycerides, and weight. We also chose to include T2D status because of the lack of insulin measurement in the UK Biobank. Except for T2D, a binary trait, all traits were log-transformed before further analysis (after removing 817 samples with negative values for T2D, representing missing information). The above outlier removal steps were repeated for the cardiometabolic traits after excluding the outliers already identified from the cell count data, with the only major modification being the use of logistic regression for the T2D models. This process resulted in the removal of 42 and 53 samples from PCA and null model fitting, respectively.

SKAT-O tests of association between *TMBIM1* and the cell count traits identified above were also performed. Similarly to the RVAS in Finnish data, variants within 5kb of *TMBIM1* with MAF < 1% and CADD v1.6 score > 20 were selected for inclusion in this analysis. Rather than the mixed-model version of SKAT-O used in the Finnish data, standard SKAT-O was used due to the lower expected level of cryptic relatedness in the UK Biobank population.

We also performed a hypothesis-free, phenome-wide screen of UK Biobank traits to which we had access (S10 Table), to search for other associations with MT-CN PRS. The statistical models used in this screen were of the same form as those described above, both with and without adjustment for neutrophil and platelet counts. To curate and transform phenotypes, we used an adapted version of PHESANT [41,42]. A few further modifications were made to the pipeline, the most significant being the direct use of logistic regression for testing categorical unordered variables, the inclusion of cancer phenotypes, and the exclusion of sex-specific (or nearly sex-specific) categorical traits. The PHESANT pipeline we used [42] outputs continuous variables both in their raw form and after applying an inverse rank normal transformation. For the sake of being conservative and robust to outliers, we chose to interpret the results from the normalized continuous variables. To control false discovery rate, we performed a Benjamini-Hochberg procedure with Storey correction as implemented in the R package qvalue [72] v2.18.0 on the categorical and normalized continuous variables together. As a secondary analysis, this same correction was applied to the categorical and raw continuous variables together.

## Supporting information

Supplementary Tables [[supplements/218586_file02.xlsx]](pending:yes)

Supplementary Figures [[supplements/218586_file03.pdf]](pending:yes)

## Data Availability

METSIM WGS, METSIM WES, and FINRISK WES sequence data are available through dbGaP (accessions phs001579, phs000752, and phs000756). METSIM variant data as well as MT-CN phenotype values will soon be available through AnVIL (accessions TBD). Imputed array GWAS summary statistics from METSIM and WES SKAT-O summary statistics from the joint dataset are freely available at https://wustl.box.com/s/7xfbmxq2r4kg8p8bfc7vpqlrmqvhm0lx. Genomic and phenotypic data for the FINRISK cohort are or will soon be obtainable through THL Biobank, the Finnish Institute for Health and Welfare, Finland (https://thl.fi/en/web/thl-biobank).

## Author contributions

LG led trait association, heritability, and MR analyses. LC discovered the original links between NumtS, MT-CN, and metabolic traits, and developed the MT-CN estimation method. RC and LG conceived the MR covariate correction approach, and RC provided critical guidance on statistical methods. HJA, DL, ID, AR, and LG led variant callset creation and QC. JV and ML performed the analysis of insulin sensitivity and resistance. AP, SR, ML, JK, and AH contributed samples and phenotypic data. All authors edited the manuscript and/or provided intellectual contributions. IH and NOS conceived the study and supervised the overall project. LG and IH wrote the paper, with contributions from LC.

## Supporting information

**S1 Fig. Insulin association signal at chromosome 1 NumtS**. Manhattan plot from an analysis of copy number variation at nuclear mitochondrial insertion sites (NumtS) using CNVnator read-depth measurements in 1 kb genomic windows, using an initial set of 2,049 samples from an earlier data freeze analyzed for an unrelated CNV association study (manuscript in prep.). Shown at the bottom are two tracks from the UCSC Genome Browser [73]. The black track represents an assembly gap upstream of the association signal, and the blue track shows the location of the NumtS region where the association peak is located.

**S2 Fig. Joint (METSIM and FINRISK) single-marker association tests with MT-CN in WGS and WES data**. (a) Manhattan plot and quantile-quantile (QQ) plot for an exome-wide association test of normalized, WES-measured MT-CN using WES genotype data (N = 19,034). The red line represents an exome-wide significance level of 5×10−7; no tested markers achieved this level of significance. The QQ plot is separated by minor allele frequency bin, as indicated by the colors and shapes of the points. (b) Manhattan plot and quantile-quantile (QQ) plot for a genome-wide association test of normalized, WGS-measured MT-CN using WGS genotype data (N = 4,149). The red line represents a genome-wide significance level of 5×10−8; no tested markers achieved this level of significance. The QQ plot is separated by minor allele frequency bin, as indicated by the colors and shapes of the points.

**S3 Fig. Single-marker association test results at chromosome 10 QTL identified by Curran *et al***. Chromosome 10 Manhattan plot for association with WES-estimated MT-CN in Finnish imputed array data from METSIM and FINRISK. The blue highlighted region represents the approximate location of the linkage peak reported for MT-CN in Curran *et al*., 2007 [23]. This region is larger than the reported region in the Curran study (∼30 Mb vs. ∼24 Mb) because the exact coordinates of the 1-LOD support interval were not reported explicitly, so the genetic coordinates were approximated from a figure and converted to physical coordinates using a genetic map from HapMap Phase 2 [74]. Some error may have been introduced due to differing populations between the HapMap data and the Mexican-American samples used by Curran.

**S4 Fig. Raw WGS-based MT-CN estimate distributions for METSIM and all four FINRISK surveys**. Facet labels “1992”, “1997”, “2002”, and “2007” refer to individual FINRISK survey years.

**S5 Fig. Nonuniform WES coverage across the mitochondrial genome for four representative samples**.

**S6 Fig. Comparison of two methods of summarizing nonuniform mitochondrial coverage in WES into a single measurement**. Both panels show the correlation with WGS mean coverage for the 4,013 samples for which both WGS and WES data were available. (a) shows the results of summarizing WES mitochondrial coverage using the maximum coverage value across the mitochondrial chromosome while (b) shows the results of using the mean mitochondrial coverage instead. R2 values are shown on each panel.

**S7 Fig. Associations between Z and traits in set A in imputed data**. P values computed from linear regression using the output from a randomly chosen run of multiple imputation of phenotype data. (a) and (c) show the marginal P values of the traits in set A from regressions of Z onto each trait separately, while (b) and (d) show the P values from multiple regression of Z onto all traits in set A. The instrument in (a) and (b) was computed using an L1 penalty, while that in (c) and (d) was computed using an L2 penalty.

**S8 Fig. Example of collider bias**. If Z and Y are independently causal for a variable U, then adjusting for U can induce an association between Z and Y that did not previously exist.

**S1 Table. Cardiometabolic trait associations with MT-CN in WGS data**. Phenome-wide EMMAX association results of normalized MT-CN against 137 cardiometabolic traits (N = 4,163). These are the data underlying Fig 1a. OGTT: Oral glucose tolerance test, NMR: nuclear magnetic resonance.

**S2 Table. MT-CN associations with insulin and fat mass in WES data**. Results of EMMAX test of association between normalized MT-CN and both fat mass and fasting serum insulin using WES data. Association tests were performed in all samples and also separately among samples with and without WGS data.

**S3 Table. Direct associations between platelet/neutrophil counts and metabolic syndrome phenotypes in the UK Biobank**. Results of direct testing of platelet and neutrophil counts against metabolic syndrome phenotypes chosen based on marginally significant associations for MT-CN without adjustment for cell counts. Testing was done in the UK Biobank (N = 357,656) using linear regression of traits onto cell counts conditional on the same covariates as the MT-CN analyses (see Methods). The bottom two rows show the results after removal of high-residual, high-leverage outliers as determined by Cook’s distance.

**S4 Table. Associations in the UK Biobank between cell counts and loci mapped to MT-CN in Finnish data**. Association tests in the UK Biobank (N = 357,656) of cell counts against common-variant and gene-based rare-variant associations found in Finnish GWAS.

**S5 Table. MT-CN phenome-wide screen using normalized continuous variables with no cell count adjustment**. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and normalized continuous variables. These tests were performed without adjustment for platelet and neutrophil counts. A dark line separates the results at q = 0.05.

**S6 Table. MT-CN phenome-wide screen using raw continuous variables with no cell count adjustment**. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and raw continuous variables. These tests were performed without adjustment for platelet and neutrophil counts. A dark line separates the results at q = 0.05.

**S7 Table. MT-CN phenome-wide screen using normalized continuous variables with adjustment for platelet and neutrophil counts**. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and normalized continuous variables. These tests were performed with adjustment for platelet and neutrophil counts. A dark line separates the results at q = 0.05.

**S8 Table. MT-CN phenome-wide screen using raw continuous variables with adjustment for platelet and neutrophil counts**. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and raw continuous variables. These tests were performed with adjustment for platelet and neutrophil counts. A dark line separates the results at q = 0.05.

**S9 Table. Mendelian randomization results after removal of high leverage *post hoc* outliers**. Results of Mendelian randomization test for causality of MT-CN on fasting serum insulin, after removal of high leverage *post hoc* outliers (see “Inference of causality in the association between MT-CN and insulin”). These results are nearly identical to those computed before outlier removal (Fig 4d), indicating that the detected signals were not driven by outlier samples.

**S10 Table. Phenotypes in the UK Biobank included in the phenome-wide screen after filtering and transformation by PHESANT**. Variable type is reported by PHESANT, where “Categorical unordered” corresponds to a binary variable representing a single possible answer. Samples are generally only excluded as a result of sex imbalances, although there were four phenotypes for which a model was unable to be fit when adjusting for cell counts.

## Acknowledgements

We thank H. Jung for her help in identifying outlier individuals as well as the WashU data production team, in particular R. Fulton, L. Fulton, C. Fronick, A. Wollam and S.K. Dutcher. This research has been conducted using the UK Biobank Resource under Application Number 56546. The FINRISK samples used for the research were obtained from the FINRISK Study and from THL Biobank. We thank all study participants for their generous participation at THL Biobank and the FINRISK Study.

*   Received October 23, 2020.
*   Revision received October 23, 2020.
*   Accepted October 27, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  1.Koliaki C, Roden M. Alterations of Mitochondrial Function and Insulin Sensitivity in Human Obesity and Diabetes Mellitus | Annual Review of Nutrition. Annu Rev Nutr. 2016;36: 337–367.
    
    
2.  2.Weisberg SP, McCann D, Desai M, Rosenbaum M, Leibel RL, Ferrante AW. Obesity is associated with macrophage accumulation in adipose tissue. J Clin Invest. 2003;112: 1796–1808.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1172/JCI200319246&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14679176&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000187348300007&link_type=ISI) 

3.  3.Kim J-A, Wei Y, Sowers JR. Role of Mitochondrial Dysfunction in Insulin Resistance. Circ Res. 2008;102: 401–414.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImNpcmNyZXNhaGEiO3M6NToicmVzaWQiO3M6OToiMTAyLzQvNDAxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMjcvMjAyMC4xMC4yMy4yMDIxODU4Ni5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

4.  4.Burgueño AL, Cabrerizo R, Gonzales Mansilla N, Sookoian S, Pirola CJ. Maternal high-fat intake during pregnancy programs metabolic-syndrome-related phenotypes through liver mitochondrial DNA copy number and transcriptional activity of liver PPARGC1A. J Nutr Biochem. 2013;24: 6–13.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jnutbio.2011.12.008&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22658649&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

5.  5.Sookoian S, Rosselli MS, Gemma C, Burgueño AL, Gianotti TF, Castaño GO, et al. Epigenetic regulation of insulin resistance in nonalcoholic fatty liver disease: Impact of liver methylation of the peroxisome proliferator–activated receptor γ coactivator 1α promoter. Hepatology. 2010;52: 1992–2000.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/hep.23927&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20890895&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000285023700015&link_type=ISI) 

6.  6.Begriche K, Igoudjil A, Pessayre D, Fromenty B. Mitochondrial dysfunction in NASH: Causes, consequences and possible means to prevent it. Mitochondrion. 2006;6: 1–28.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=101016/jmito200510004&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16406828&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000237273100001&link_type=ISI) 

7.  7.Zhou X, Li R, Liu X, Wang L, Hui P, Chan L, et al. ROCK1 reduces mitochondrial content and irisin production in muscle suppressing adipocyte browning and impairing insulin sensitivity. Sci Rep. 2016;6: 29669.
    
    
8.  8.Ren J, Pulakat L, Whaley-Connell A, Sowers JR. Mitochondrial biogenesis in the metabolic syndrome and cardiovascular disease. J Mol Med. 2010;88: 993–1001.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00109-010-0663-9&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20725711&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

9.  9.Stephenson EJ, Hawley JA. Mitochondrial function in metabolic health: A genetic and environmental tug of war. Biochimica et Biophysica Acta (BBA) - General Subjects. 2014;1840: 1285–1294.
    
    
10. 10.Szendroedi J, Phielix E, Roden M. The role of mitochondria in insulin resistance and type 2 diabetes mellitus. Nat Rev Endocrinol. 2012;8: 92–103.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrendo.2011.138&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21912398&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

11. 11.Ding J, Sidore C, Butler TJ, Wing MK, Qian Y, Meirelles O, et al. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ∼2,000 Sardinians Using Tailored Sequencing Analysis Tools. PLoS Genet. 2015;11: e1005306.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1005306&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26172475&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

12. 12.Chen S, Xie X, Wang Y, Gao Y, Xie X, Yang J, et al. Association between leukocyte mitochondrial DNA content and risk of coronary heart disease: A case-control study. Atherosclerosis. 2014. pp. 220–226. doi: 10.1016/j.atherosclerosis.2014.08.051
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.atherosclerosis.2014.08.051&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25244506&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

13. 13.Lee HK, Song JH, Shin CS, Park DJ, Park KS, Lee KU, et al. Decreased mitochondrial DNA content in peripheral blood precedes the development of non-insulin-dependent diabetes mellitus. Diabetes Res Clin Pract. 1998;42: 161–167.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0168-8227(98)00110-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9925346&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000077850800004&link_type=ISI) 

14. 14.Shoar Z, Goldenthal MJ, De Luca F, Suarez E. Mitochondrial DNA content and function, childhood obesity, and insulin resistance. Endocr Res. 2016;41: 49–56.
    
    
15. 15.Song J, Oh JY, Sung Y-A, Pak YK, Park KS, Lee HK. Peripheral Blood Mitochondrial DNA Content Is Related to Insulin Sensitivity in Offspring of Type 2 Diabetic Patients. Diabetes Care. 2001;24: 865–869.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czo4OiIyNC81Lzg2NSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzEwLzI3LzIwMjAuMTAuMjMuMjAyMTg1ODYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

16. 16.Weng S-W, Lin T-K, Liou C-W, Chen S-D, Wei Y-H, Lee H-C, et al. Peripheral blood mitochondrial DNA content and dysregulation of glucose metabolism. Diabetes Res Clin Pract. 2009;83: 94–99.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.diabres.2008.10.002&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19019479&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

17. 17.Liu L-P, Cheng K, Ning M-A, Li H-H, Wang H-C, Li F, et al. Association between peripheral blood cells mitochondrial DNA content and severity of coronary heart disease. Atherosclerosis. 2017;261: 105–110.
    
    
18. 18.Robin ED, Wong R. Mitochondrial DNA molecules and virtual number of mitochondria per cell in mammalian cells. J Cell Physiol. 1988;136: 507–513.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jcp.1041360316&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=3170646&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1988Q306400015&link_type=ISI) 

19. 19.Maianski NA, Geissler J, Srinivasula SM, Alnemri ES, Roos D, Kuijpers TW. Functional characterization of mitochondria in neutrophils: a role restricted to apoptosis. Cell Death Differ. 2004;11: 143–153.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/sj.cdd.4401320&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14576767&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000188126000003&link_type=ISI) 

20. 20.Zharikov S, Shiva S. Platelet mitochondrial function: from regulation of thrombosis to biomarker of disease. Biochem Soc Trans. 2013;41: 118–123.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoicHBiaW9zdCI7czo1OiJyZXNpZCI7czo4OiI0MS8xLzExOCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzEwLzI3LzIwMjAuMTAuMjMuMjAyMTg1ODYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

21. 21.Cai N, Chang S, Li Y, Li Q, Hu J, Liang J, et al. Molecular signatures of major depression. Curr Biol. 2015;25: 1146–1156.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cub.2015.03.008&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25913401&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

22. 22.Cai N, Li Y, Chang S, Liang J, Lin C, Zhang X, et al. Genetic Control over mtDNA and Its Relationship to Major Depressive Disorder. Curr Biol. 2015;25: 3170–3177.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cub.2015.10.065&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00036723&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

23. 23.Curran JE, Johnson MP, Dyer TD, Göring HHH, Kent JW, Charlesworth JC, et al. Genetic determinants of mitochondrial content. Hum Mol Genet. 2007;16: 1504–1514.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddm101&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17468176&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000248084900012&link_type=ISI) 

24. 24.Pajukanta P, Terwilliger JD, Perola M, Hiekkalinna T, Nuotio I, Ellonen P, et al. Genomewide scan for familial combined hyperlipidemia genes in finnish families, suggesting multiple susceptibility loci influencing triglyceride, cholesterol, and apolipoprotein B levels. Am J Hum Genet. 1999;64: 1453–1463.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/302365&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10205279&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000080093000026&link_type=ISI) 

25. 25.Pajukanta P, Allayee H, Krass KL, Kuraishy A, Soro A, Lilja HE, et al. Combined analysis of genome scans of dutch and finnish families reveals a susceptibility locus for high-density lipoprotein cholesterol on chromosome 16q. Am J Hum Genet. 2003;72: 903–917.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/374177&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12638083&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000181972600011&link_type=ISI) 

26. 26.Locke AE, Steinberg KM, Chiang CWK, Service SK, Havulinna AS, Stell L, et al. Exome sequencing of Finnish isolates enhances rare-variant association power. Nature. 2019;572: 323–328.
    
    
27. 27.Kaess B, Fischer M, Baessler A, Stark K, Huber F, Kremer W, et al. The lipoprotein subfraction profile: heritability and identification of quantitative trait loci. J Lipid Res. 2008;49: 715–723.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamxyIjtzOjU6InJlc2lkIjtzOjg6IjQ5LzQvNzE1IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMjcvMjAyMC4xMC4yMy4yMDIxODU4Ni5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

28. 28.Weiss LA, Pan L, Abney M, Ober C. The sex-specific genetic architecture of quantitative traits in humans. Nat Genet. 2006;38: 218–222.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng1726&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16429159&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000234953200018&link_type=ISI) 

29. 29.Wang X, Angelis N, Thein SL. MYB - A regulatory factor in hematopoiesis. Gene. 2018;665: 6–17.
    
    
30. 30.Pandit RA, Svasti S, Sripichai O, Munkongdee T, Triwitayakorn K, Winichagoon P, et al. Association of SNP in exon 1 of HBS1L with hemoglobin F level in beta0-thalassemia/hemoglobin E. Int J Hematol. 2008;88: 357–361.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s12185-008-0167-3&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18839276&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

31. 31.Thein SL, Menzel S, Peng X, Best S, Jiang J, Close J, et al. Intergenic variants of HBS1L-MYB are responsible for a major quantitative trait locus on chromosome 6q23 influencing fetal hemoglobin levels in adults. Proceedings of the National Academy of Sciences. 2007;104: 11346–11351.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTA0LzI3LzExMzQ2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMjcvMjAyMC4xMC4yMy4yMDIxODU4Ni5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

32. 32.Ganesh SK, Zakai NA, van Rooij FJA, Soranzo N, Smith AV, Nalls MA, et al. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat Genet. 2009;41: 1191–1198.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.466&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19862010&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000271247600011&link_type=ISI) 

33. 33.Menzel S, Jiang J, Silver N, Gallagher J, Cunningham J, Surdulescu G, et al. The HBS1L-MYB intergenic region on chromosome 6q23.3 influences erythrocyte, platelet, and monocyte counts in humans. Blood. 2007;110: 3624–3626.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTI6ImJsb29kam91cm5hbCI7czo1OiJyZXNpZCI7czoxMToiMTEwLzEwLzM2MjQiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8xMC8yNy8yMDIwLjEwLjIzLjIwMjE4NTg2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

34. 34.Lin BD, Carnero-Montoro E, Bell JT, Boomsma DI, de Geus EJ, Jansen R, et al. 2SNP heritability and effects of genetic variants for neutrophil-to-lymphocyte and platelet-to-lymphocyte ratio. J Hum Genet. 2017;62: 979–988.
    
    
35. 35.Stadhouders R, Aktuna S, Thongjuea S, Aghajanirefah A, Pourfarzad F, van Ijcken W, et al. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. J Clin Invest. 2014;124: 1699–1710.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1172/JCI71520&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24614105&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

36. 36.Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19: 149–150.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/19.1.149&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12499305&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000180463900021&link_type=ISI) 

37. 37.Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91: 224–237.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2012.06.007&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22863193&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

38. 38.Zhou J, Zhu T, Hu C, Li H, Chen G, Xu G, et al. Comparative genomics and function analysis on BI1 family. Comput Biol Chem. 2008;32: 159–162.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.compbiolchem.2008.01.002&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18440869&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

39. 39.Zhao G-N, Zhang P, Gong J, Zhang X-J, Wang P-X, Yin M, et al. Tmbim1 is a multivesicular body regulator that protects against non-alcoholic fatty liver disease in mice and monkeys by targeting the lysosomal degradation of Tlr4. Nat Med. 2017;23: 742–752.
    
    
40. 40.Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12: e1001779.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1001779&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25826379&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

41. 41.Millard LAC, Davies NM, Gaunt TR, Davey Smith G, Tilling K. Software Application Profile: PHESANT: a tool for performing automated phenome scans in UK Biobank. Int J Epidemiol. 2018;47: 29–35.
    
    
42. 42.Neale B. Neale Lab UK Biobank Analysis. Available: [http://www.nealelab.is/uk-biobank/](http://www.nealelab.is/uk-biobank/)
    
    
43. 43.Welty FK, Alfaddagh A, Elajami TK. Targeting inflammation in metabolic syndrome. Transl Res. 2016;167: 257–280.
    
    
44. 44.Creely SJ, McTernan PG, Kusminski CM, Fisher ff M, Da Silva NF, Khanolkar M, et al. Lipopolysaccharide activates an innate immune system response in human adipose tissue in obesity and type 2 diabetes. Am J Physiol Endocrinol Metab. 2007;292: E740–7.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1152/ajpendo.00302.2006&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17090751&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000244722000012&link_type=ISI) 

45. 45.Hotamisligil GS. Inflammation, metaflammation and immunometabolic disorders. Nature. 2017;542: 177–185.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature21363&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28179656&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

46. 46.Laakso M, Kuusisto J, Stančáková A, Kuulasmaa T, Pajukanta P, Lusis AJ, et al. The Metabolic Syndrome in Men study: a resource for studies of metabolic and cardiovascular diseases. J Lipid Res. 2017;58: 481–493.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamxyIjtzOjU6InJlc2lkIjtzOjg6IjU4LzMvNDgxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMjcvMjAyMC4xMC4yMy4yMDIxODU4Ni5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

47. 47.Borodulin K, Tolonen H, Jousilahti P, Jula A, Juolevi A, Koskinen S, et al. Cohort Profile: The National FINRISK Study. Int J Epidemiol. 2018;47: 696–696i.
    
    
48. 48. Picard Toolkit. Broad Institute, GitHub repository; 2019. Available: [http://broadinstitute.github.io/picard/](http://broadinstitute.github.io/picard/)
    
    
49. 49.McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48: 1279–1283.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3643&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27548312&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

50. 50.DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43: 491–498.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.806&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21478889&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000289972600023&link_type=ISI) 

51. 51.Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants. Bioinformatics. 2015;31: 2202–2204.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btv112&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25701572&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

52. 52.Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26: 2867–2873.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq559&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20926424&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000283919800010&link_type=ISI) 

53. 53.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–842.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq033&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20110278&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275243500019&link_type=ISI) 

54. 54.Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012;91: 597–607.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2012.08.005&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23040492&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

55. 55.Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner H, Binna R, Specht G, et al. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat. 2011;32: 25–32.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/humu.21382&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20960467&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

56. 56.Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88: 76–82.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2010.11.011&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21167468&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

57. 57.Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88: 294–305.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2011.02.002&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21376301&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000288589000007&link_type=ISI) 

58. 58.Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat Genet. 2015;47: 1114–1120.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3390&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26323059&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

59. 59.Evans LM, Tahmasbi R, Vrieze SI, Abecasis GR, Das S, Gazal S, et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat Genet. 2018;50: 737–745.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0108-x&link_type=DOI) 

60. 60.Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46: 310–315.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2892&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24487276&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

61. 61.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17: 122.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-016-0974-4&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27268795&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

62. 62.Palmer TM, Lawlor DA, Harbord RM, Sheehan NA, Tobias JH, Timpson NJ, et al. Using multiple genetic variants as instrumental variables for modifiable risk factors. Stat Methods Med Res. 2012;21: 223–242.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0962280210394459&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21216802&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

63. 63.Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33: 1–22.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1467-9868.2005.00503.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20808728&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275203200001&link_type=ISI) 

64. 64.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4: 7.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13742-015-0047-8&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25722852&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 

65. 65.Burgess S, Thompson SG. Use of allele scores as instrumental variables for Mendelian randomization. Int J Epidemiol. 2013;42: 1134–1144.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyt093&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24062299&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000325167800030&link_type=ISI) 

66. 66.Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav Genet. 2009;39: 580–595.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s10519-009-9281-0&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19526352&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000268995200013&link_type=ISI) 

67. 67.Blom G. Statistical estimates and transformed beta-variables. Almqvist & Wiksell. 1958. Available: [http://www.diva-portal.org/smash/record.jsf?pid=diva2:516729](http://www.diva-portal.org/smash/record.jsf?pid=diva2:516729)
    
    
68. 68.van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, Articles. 2011;45: 1–67.
    
    
69. 69.Honaker J, King G, Blackwell M. Amelia II: A Program for Missing Data. Journal of Statistical Software, Articles. 2011;45: 1–47.
    
    
70. 70.Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39: 417–420.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyp334&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19926667&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000276303800019&link_type=ISI) 

71. 71.VanderWeele TJ, Tchetgen Tchetgen EJ, Cornelis M, Kraft P. Methodological challenges in mendelian randomization. Epidemiology. 2014;25: 427–435.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/EDE.0000000000000081&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24681576&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000337313300013&link_type=ISI) 

72. 72.Storey JD, Bass AJ, Dabney A, Robinson D. qvalue: Q-value estimation for false discovery rate control. 2019. Available: [http://github.com/jdstorey/qvalue](http://github.com/jdstorey/qvalue)
    
    
73. 73.Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12: 996–1006.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjEyLzYvOTk2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMjcvMjAyMC4xMC4yMy4yMDIxODU4Ni5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

74. 74. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449: 851–861.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature06258&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17943122&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F27%2F2020.10.23.20218586.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000250230600036&link_type=ISI)

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif
 [4]: /embed/inline-graphic-4.gif
 [5]: /embed/inline-graphic-5.gif
 [6]: /embed/inline-graphic-6.gif
 [7]: /embed/inline-graphic-7.gif
 [8]: /embed/inline-graphic-8.gif
 [9]: /embed/inline-graphic-9.gif