Machine learning enables new insights into clinical significance of and genetic contributions to liver fat accumulation ======================================================================================================================= * Mary E. Haas * James P. Pirruccello * Samuel N. Friedman * Connor A. Emdin * Veeral H. Ajmera * Tracey G. Simon * Julian R. Homburger * Xiuqing Guo * Matthew Budoff * Kathleen E. Corey * Alicia Y. Zhou * Anthony Philippakis * Patrick T. Ellinor * Rohit Loomba * Puneet Batra * Amit V. Khera ## Abstract Excess accumulation of liver fat – termed hepatic steatosis when fat accounts for > 5.5% of liver content – is a leading risk factor for end-stage liver disease and is strongly associated with important cardiometabolic disorders. Using a truth dataset of 4,511 UK Biobank participants with liver fat previously quantified via abdominal MRI imaging, we developed a machine learning algorithm to quantify liver fat with correlation coefficients of 0.97 and 0.99 in hold-out testing datasets and applied this algorithm to raw imaging data from an additional 32,192 participants. Among all 36,703 individuals with abdominal MRI imaging, median liver fat was 2.2%, with 6,250 (17%) meeting criteria for hepatic steatosis. Although individuals afflicted with hepatic steatosis were more likely to have been diagnosed with conditions such as obesity or diabetes, a prediction model based on clinical data alone without imaging could not reliably estimate liver fat content. To identify genetic drivers of variation in liver fat, we first conducted a common variant association study of 9.8 million variants, confirming three known associations for variants in the *TM6SF2, APOE*, and *PNPLA3* genes and identifying five new variants associated with increased hepatic fat in or near the *MARC1, ADH1B, TRIB1, GPAM* and *MAST3* genes. A polygenic score that integrated information from each of these eight variants was strongly associated with future clinical diagnosis of liver diseases. Next, we performed a rare variant association study in a subset of 11,021 participants with gene sequencing data available, identifying an association between inactivating variants in the *APOB* gene and substantially lower LDL cholesterol, but more than 10-fold increased risk of steatosis. Taken together, these results provide proof of principle for the use of machine learning algorithms on raw imaging data to enable epidemiologic studies and genetic discovery. ## Introduction Hepatic steatosis – a condition defined by liver fat content > 5.5% – is a leading risk factor for chronic liver disease and is strongly associated with a range of cardiometabolic conditions (Allen et al., 2018; Caussy et al., 2018; Lorbeer et al., 2017; Speliotes et al., 2010). Recent studies have suggested a prevalence of up to 25% across global populations, with rates rapidly increasing in step with the global epidemics of obesity and diabetes (Loomba and Sanyal, 2013; Younossi et al., 2019). Although the condition is frequently undiagnosed within clinical practice, an existing evidence base indicates that avoidance of excessive alcohol intake, weight loss strategies including bariatric surgery, and emerging pharmacologic therapies can each reduce liver fat and prevent progression to more advanced liver disease (Chalasani et al., 2018). Previous studies related to hepatic steatosis suggest that systematic quantification in large cohorts may prove important in identifying new biologic insights or enabling enhanced clinical care, but suffer from important limitations. First, the traditional approach dichotomizes individuals with hepatic steatosis into ‘nonalcoholic fatty liver disease (NAFLD)’ or alcoholic fatty liver disease according to a largely arbitrary threshold of > 21 drinks per week in men and > 14 drinks per week in women (Chalasani et al., 2018; Sanyal et al., 2011). Second, studies on the prevalence or clinical significance of hepatic steatosis have often been based on non-quantitative ultrasound assessments or physician diagnosis codes, both of which are known to introduce imprecision into downstream analyses (Alexander et al., 2018; Sanyal, 2018; Zhang et al., 2018). Third, common variant association studies (CVAS), which enable systematic comparison of liver fat for individuals stratified by many common variants scattered across the genome, have been hampered by the time-consuming nature of quantifying liver fat from abdominal CT or MRI images and thus have analyzed only up to 14,440 individuals (Speliotes et al., 2011; Kozlitina et al., 2014; Parisinos et al., 2020). By comparison, a recent CVAS of body-mass index – a quantitative trait more easily measured in clinical practice – analyzed up to 681,275 individuals (Yengo et al., 2018). Based on these prior results, three key areas of uncertainty remain. First, the extent to which a machine learning algorithm can be trained to accurately quantify liver fat in a large group of individuals warrants additional study. Second, the association of clinical risk factors with rates of hepatic steatosis, as well as the ability to predict liver fat content without direct imaging, have not been fully characterized. Third, whether an expanded set of individuals with precise liver fat quantification can enable new genetic discoveries using CVAS or rare variant association studies (RVAS) is largely unknown. Here we address each of these areas of uncertainty by studying 36,703 middle-aged UK Biobank participants with extensive linked imaging, genetic, and clinical data, the largest such study to date (Figure 1). We develop a machine learning algorithm that precisely quantifies liver fat using raw abdominal MRI images, achieving correlation coefficients of 0.97 and 0.99 in holdout testing datasets. Using this data, we quantify significantly increased rates of hepatic steatosis among key participant subgroups, such as those with obesity or afflicted by diabetes. Genetic analysis identified 8 common DNA variants at genome-wide levels of statistical prevalence or clinical significance of hepatic steatosis have often been based on non-significance, 5 of which were previously unreported, and rare variants in the gene encoding apolipoprotein B (*APOB*) that associate with significantly increased liver fat and rates of steatosis. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/03/2020.09.03.20187195/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/F1) Figure 1. Machine learning enables liver fat quantification, clinical, and genetic analyses. Starting with 4,511 individuals with previously-quantified liver fat, a machine learning model was created to estimate liver fat in an additional 32,192 individuals. Of the 36,703 individuals with resulting liver fat quantification, 17% met criteria for hepatic steatosis. 1.6% had liver fat > 20% (not shown in density plot). A common variant association study (CVAS) identified eight loci, of which five are newly-identified relative to previous CVAS (top Manhattan plot). By contrast, none of these newly-identified variants were identified in a CVAS of those with liver fat previously quantified (bottom Manhattan plot). A rare variant association study identified one gene, *APOB*, in which inactivating variants were significantly associated with liver fat and rates of steatosis. ## Results ### Machine learning model enables near-perfect quantification of hepatic fat based on raw abdominal MRI imaging To study liver fat in 36,073 UK Biobank participants, we first developed a machine learning algorithm that allowed for precise quantification based on raw abdominal MRI imaging data. We downloaded available imaging data and processed them within a cloud-based computational environment, leveraging a truth dataset of 4,511 participants with liver fat previously quantified by Perspectum Diagnostics (Wilman et al., 2017). Using a two-stage method with deep convolutional neural networks (see methods for details), we trained a machine learning algorithm to quantify liver fat that achieved near-perfect quantification (correlation coefficients 0.97 and 0.99 in hold-out testing datasets in the two stages, Supp. Figure 1). Having trained and validated this algorithm, we next applied this model to quantify liver fat in the remaining 32,192 UK Biobank participants with raw images only. ### Liver fat is strongly associated with cardiometabolic diseases, but cannot be reliably predicted within clinical practice without direct imaging Across all 36,703 participants studied, median liver fat was 2.2% and 6,250 (17.0%) had liver fat > 5.5% consistent with hepatic steatosis. Mean age at time of imaging was 64 years (range 45 to 82) and 52% were female (Supp. Table 1). Liver fat was significantly increased in male participants versus female (median 2.7 versus 2.0%), those who reported alcohol consumption in excess of current clinical guidelines (median 2.6 versus 2.2%), and those with diagnosed diabetes (median 4.9 versus 2.2%); p-values for each comparison < 0.0001. As expected, median liver fat was significantly higher among 93 individuals with a diagnosis of nonalcoholic fatty liver disease (NAFLD) in the electronic health record as compared to the remainder of the population, median 8.6 versus 2.2% respectively (Figure 2). Consistent with this observation, 56 of 93 (60%) of those diagnosed with NAFLD met imaging-based criteria for hepatic steatosis versus 6,194 of 36,610 (17%) in the remainder of the population, corresponding to an adjusted odds ratio of 7.67 (95%CI 5.03 to 11.70; p-value < 0.0001). ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/03/2020.09.03.20187195/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/F2) Figure 2. Associations of clinical parameters with liver fat percentage and hepatic steatosis identified by imaging in 36,703 individuals. The distribution of liver fat and prevalence of hepatic steatosis in the presence of A) NAFLD physician diagnosis, B) high waist-to-hip ratio and C) obesity was assessed. Hepatic steatosis was defined as liver fat > 5.5%. High waist-to-hip ratio was defined as greater than 0.9 if male and greater than 0.85 if female at time of imaging. Weight categories were defined using body-mass-index (BMI) at time of imaging: underweight, BMI< 18.5 kg/m2; normal, 18.5≤BMI< 25 kg/m2; overweight, 25≤BMI< 30 kg/m2; obese, 30≤BMI< 40 kg/m2; severely obese, BMI≥40 kg/m2. By stratifying individuals according to presence of hepatic steatosis, we observe significant enrichment of cardiometabolic risk factors in those with high liver fat (Table 1). For example, 13.8% of those with steatosis had been diagnosed with diabetes as compared to 3.6% of those in the remainder (adjusted odds ratio 4.21; 95%CI 3.83 to 4.63; p-value < 0.0001), and 45.1% of those with steatosis had been diagnosed with hypertension as compared to 27.1% of those in the remainder (adjusted odds ratio 2.24; 95%CI 2.11 to 2.37; p-value < 0.0001). With respect to biomarkers, circulating triglycerides, liver-associated aminotransferases and glycemic indices were each significantly increased in those with evidence of steatosis. View this table: [Table 1:](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/T1) Table 1: Baseline characteristics of UK Biobank participants with quantified liver fat stratified by presence of hepatic steatosis Despite the correlation of liver fat with cardiometabolic risk factors, practicing clinicians would not be able to reliably estimate liver fat without direct imaging assessment. As an example, the Pearson correlation coefficient between body mass index – the most commonly used surrogate for adiposity – and liver fat was 0.48, but a broad range of observed values were observed across categorizations used in clinical practice (Figure 2). For those with severe obesity (body-mass index ≥ 40 kg/m2), median liver fat was 9.8% and 107 of 361 (70%) met clinical criteria for steatosis, but measured liver fat ranged widely from 0.5 to 31.5%. Even for those with normal weight in whom median liver fat was 1.6%, 470 of 14,307 (3%) nonetheless had imaging evidence of hepatic steatosis. Similarly, only 4,854 of 17,730 (27%) with a high waist-to-hip ratio, a measure of central adiposity, had liver fat consistent with hepatic steatosis. To further assess the ability to predict liver fat using variables available in clinical practice, we built a beta regression model using 25 variables, including measures of adiposity and blood lipids and biomarkers of liver injury and assessed its predictive accuracy in held out participants, noting a Pearson correlation coefficient of only 0.45 (Supp. Figure 2). ### A common variant association study identifies 8 genetic variants – including 5 newly-identified – associated with increased liver fat We first confirmed prior studies noting a significant inherited component to liver fat (Speliotes et al., 2011; Palmer et al., 2013; Loomba et al., 2015), estimating that up to 33% of the observed variance is explained by measured genetic variants when considered in aggregate using the BOLT-REML method (Loh et al., 2015a2015a). To identify the specific variants most strongly contributing to this heritability, we performed a common variant association study (CVAS), assessing the relationship of each of 9.8 million common (minor allele frequency ≥ 1%) DNA variants and liver fat percentage. Given that 97% of individuals with liver fat quantified were of European ancestry (Supp. Table 1) and the potential for small numbers of individuals of distinct ancestries to introduce confounding by population stratification, we restricted our CVAS to 32,976 individuals of European ancestries using a combination of self-reported ethnicity and genetic principal component analysis (Bycroft et al., 2018, see Methods). Moreover, because standard CVAS algorithms include an assumption that the residuals of the phenotype are normally distributed, we applied an inverse-normal transformation to liver fat residuals prior to CVAS resulting in a Gaussian distribution with mean of zero and standard deviation of one. The CVAS identified 8 loci in which common genetic variants were associated with increased liver fat, including 5 not previously identified at genome-wide levels of statistical significance (Figure 3, Table 2). We confirm and extend prior observations for missense variants in the *TM6SF2, PNPLA3*, and *APOE* genes. Within units of the inverse-normal transformed liver fat variable, the impact of these variants ranged from +0.12 to +0.29 standard deviations, corresponding to impact on raw liver fat percentage ranging from 0.51 to 1.37% and odds ratio of hepatic steatosis per allele of 1.40 to 1.90. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/03/2020.09.03.20187195/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/F3) Figure 3. Genome-wide association study of liver fat in UK Biobank identifies 8 loci. The lead variant at each of 8 genome-wide significant loci are indicated by orange points. Gray line indicates genome-wide significance threshold (p< 5×10−8). View this table: [Table 2:](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/T2) Table 2: 8 common DNA variants associated with liver fat indices Genetic variants in the five newly-identified loci included: the p.T165A missense variant in the gene encoding Mitochondrial Amidoxime Reducing Component 1 (*MARC1*), which we and others recently identified as associated with end-stage liver disease (Emdin et al., 2020; Innes et al., 2020; Luukkonen et al., 2020); the p.H48R missense variant in the gene encoding Alcohol Dehydrogenase 1B (Class I), Beta Polypeptide (*ADH1B*), which plays a key role in ethanol oxidation to acetaldehyde, is thought to contribute to aversion to alcohol intake and was previously associated with alcohol consumption (Bierut et al., 2012; Walters et al., 2018); an intergenic variant near the gene encoding Tribbles-Like Protein 1 (*TRIB1*), which is known to associate with circulating triglycerides and play a role in hepatic lipogenesis (Kathiresan et al., 2008; Burkhardt et al., 2010; Bauer et al., 2015); an intronic variant in the gene encoding glycerol-3-phosphate acyltransferase, mitochondrial (*GPAM*), previously linked to liver triglyceride content in murine overexpression and knockout experiments (Hammond et al., 2002; Lindén et al., 2006); and an intronic variant in the gene encoding microtubule associated serine/threonine kinase 3 (*MAST3*), previously linked to inflammatory bowel disease (Labbé et al., 2008) but with as-yet unknown role in liver fat metabolism. Among these five newly-identified variants, impact on liver fat percentage ranged from 0.18 to 0.51% and odds ratio for hepatic steatosis per allele ranged from 1.09 to 1.37 (Table 2). The expansion of individuals with liver fat quantification from 4,038 to 32,976 individuals using machine learning played a key role in enabling the CVAS. Taking the most strongly associated variant – the p.I148M missense variant in *PNPLA3* – as an example, the p-value for association decreased from 2×10−20 when performing a CVAS in only 4,038 individuals to 5.6×10−95 when using 32,976 participants. Moreover, although each of the five newly-identified variants had directionally-consistent evidence of association in a CVAS of 4,038 individuals (pvalues ranging from 0.16 to 3.6×10−4) none met the standard threshold for statistical significance of p = 5×10−8. Beyond variants meeting genome-wide levels of statistical significance, we next sought to replicate additional variants previously reported to affect liver fat or risk of NAFLD (Supp. Table 3). A missense variant in the gene encoding the glucokinase receptor (*GCKR*) (Speliotes et al., 2011; Palmer et al., 2013; Parisinos et al., 2020) was strongly associated with liver fat with subthreshold level of statistical significance (p = 4.1×10−7), as was a variant near the gene encoding Membrane Bound O-Acyltransferase Domain Containing 7 (*MBOAT7*; p = 8.8×10−6; Buch et al., 2015; Mancina et al., 2016). Consistent with prior reports suggesting that an inactivating variant in the gene encoding hydroxysteroid 17-beta dehydrogenase 13 (*HSD17B13*) relates more strongly to progression from steatosis to more advanced forms of liver disease, we did not observe an association with liver fat in our study population (p = 0.40, Abul-Husn et al., 2018; Ma et al., 2019; Gellert-Kristensen et al., 2020). Given a known important role of alcohol intake on liver fat, we next performed two sensitivity analyses: first, we repeated the CVAS after exclusion of 3,013 individuals who reported alcohol consumption in excess of current clinical recommendations or who reported having stopped drinking alcohol; and second, we repeated the CVAS including self-reported number of alcoholic drinks consumed per week as a covariate. In both cases, results for the 8 variants identified were highly consistent, suggesting that these variants have a largely consistent impact on liver fat independent of alcohol consumption (Supp. Table 4). To replicate the CVAS associations in additional independent cohorts, we analyzed liver fat as assessed by an alternate imaging modality (computed tomography) in 3,284 participants of the Framingham Heart Study and 4,195 participants of the Multi-Ethnic Study of Atherosclerosis (MESA) study. In the Framingham Heart Study, the average age at time of imaging was 52 and 48% were female; in MESA, the average age was 61 and 51% were female. Although the computed tomography measures of hepatic fat based on liver attenuation cannot be directly converted to liver fat percentage, all 8 variants’ associations were directionally consistent and 5 were nominally significant (p< 0.05, Supp. Table 5). Beyond association with liver fat indices, we assessed for additional validation of the variants identified by CVAS using liver biomarkers and clinical diagnoses entered into the medical record. In UK Biobank, we analyzed up to 362,910 UK Biobank participants, excluding those included in the abdominal MRI substudy. We first determined associations with the blood biomarkers alanine aminotransferase (ALT) and aspartate aminotransferase (AST). All eight variants were robustly associated with increased ALT, and 7 of the 8 variants were associated with increased AST at nominal levels of statistical significance (p < 0.05, Supp. Table 6). We next examined association of the CVAS variants with a recorded clinical diagnosis of NAFLD or nonalcoholic steatohepatitis (NASH), a more advanced form of fatty liver disease that additionally includes significant liver inflammation, in both the UK Biobank and the Mass General Brigham Biobank, a hospital-based biorepository (Karlson et al., 2016). 2,225 of 362,910 participants in the UK Biobank, and 4,129 of 30,573 participants of the Mass General Brigham Biobank had been diagnosed with NAFLD or NASH. In a meta-analysis of these two studies, 7 of the 8 variants were strongly associated with increased risk, with odds ratio ranging from 1.08 to 1.43 (p < 0.004 for each; Supp. Table 7). The remaining variant, rs56252442 near *MAST3*, was directionally consistent but did not achieve statistical significance (odds ratio 1.02; 95% CI 0.98 to 1.07; p = 0.32), and we thus consider this association a preliminary result warranting further replication. ### A polygenic score comprised of 8 common genetic variants is strongly associated with chronic liver diseases Recognizing that each of the 8 common variants have modest individual impact on liver fat percentage or risk of steatosis, we next integrated information from each of them into a weighted polygenic score. Within the discovery study population of 32,976 UK Biobank individuals, this polygenic score explained 3.5% of the observed variance. To determine the relationship of this polygenic score to chronic liver diseases, we calculated it in 362,096 UK Biobank participants who were *not* included in the liver fat imaging substudy and had not been diagnosed with liver disease at time of enrollment. The polygenic score was strongly associated with a new diagnosis code of NAFLD entered into the medical record during follow-up, with a hazard ratio per standard deviation score increment (HR/SD) of 1.33 (95%CI 1.27 to 1.39, Figure 4). Individuals who carried a diagnosis of NAFLD had a median polygenic score in the 61st percentile of the distribution as compared to the 50th percentile for those in the remainder of the population. Beyond NAFLD, the polygenic score was additionally associated with an increased risk of more advanced forms of liver disease: nonalcoholic steatohepatitis (HR/SD 1.66), cirrhosis (HR/SD 1.40), and hepatocellular carcinoma (HR/SD 1.69). Based on prior observations of an association between liver disease risk-increasing alleles of variants in the *PNPLA3* and *TM6SF2* genes and decreased cholesterol (Liu et al., 2017), we determined the relationship of the polygenic score to circulating LDL cholesterol. Each SD increment in the score was associated with a 1.9 mg/dl (95%CI 1.8 to 2.0; p = 1.5×10−244) decrease in measured LDL cholesterol concentrations, illustrating a tradeoff rooted in rates of hepatic cholesterol secretion with potentially important implications for drug development. ![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/03/2020.09.03.20187195/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/F4) Figure 4: Association between a polygenic score comprised of 8 common DNA variants and risk of liver diseases. The 8-variant polygenic score was used to predict new liver-related disease diagnoses in 362,096 UK Biobank participants who were not included in the discovery CVAS of individuals with imaging data. A) Hazard ratios (HR) of incident disease per 1 standard deviation (SD) increase in the polygenic score; error bars represent 95% confidence intervals (CI). B) Rates of incident disease in each decile of the polygenic score; error bars represent 95% binomial distribution confidence intervals. ### Rare variant association study identifies inactivating variants in *APOB* and *MTTP* that increase liver fat For the subset of 11,021 UK Biobank participants with both liver fat quantified and gene sequencing available, we next investigated whether rare inactivating DNA variants might similarly impact liver fat or risk of steatosis. Because such inactivating mutations do not occur with adequate frequency to detect individual variant-phenotype relationships, we performed a ‘collapsing burden’ rare variant association study (RVAS). In this approach, the observed liver fat for carriers of any inactivating variant for a given gene is compared to individuals without such an inactivating variant. This analysis was restricted to 3,384 genes with at least 10 carriers of inactivating mutations observed, resulting in a Bonferroni-corrected p-value of statistical significance of 1.48×10−5 (p = 0.05/3,384). Inactivating variants in the gene encoding apolipoprotein B (*APOB*) – known to play a key role in lipid homeostasis – were associated with substantially increased liver fat. Using units of inverse-normal transformed hepatic fat, values were 1.58 standard deviations higher (95%CI 0.91 to 2.24; p = 3.25 × 10−6) among 11 carriers of inactivating mutations versus 11,010 individuals without such a variant. This corresponded to a median liver fat of 9.7% versus 2.1% for carriers and noncarriers respectively, and an odds ratio for hepatic steatosis among carriers of 10.8 (95%CI 2.8 to 41.4; p = 5.3×10−4) (Figure 5). ![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/03/2020.09.03.20187195/F5.medium.gif) [Figure 5.](http://medrxiv.org/content/early/2020/09/03/2020.09.03.20187195/F5) Figure 5. Rare variant association study of liver fat in 11,120 individuals. A) QQ plot with significance threshold indicated (0.05/3384 genes tested = 1.48×10−5). B) Liver fat distribution in carriers and noncarriers of inactivating variants in the *APOB* or *MTTP* genes. C) Prevalence of hepatic steatosis (liver fat > 5.5%) in carriers and noncarriers of inactivating variants in *APOB* or *MTTP*. Significant prior human genetic and pharmacologic data implicate APOB in hepatic fat accumulation. Individuals with two inactivating mutations in *APOB* (‘human knockouts’) suffer from the Mendelian condition familial hypobetalipoproteinemia, characterized by near absent levels of circulating apolipoprotein B and LDL cholesterol but increased rates of hepatic steatosis (Lee and Hegele, 2013). Similarly, pharmacologic knockdown of the *APOB* gene using the antisense oligonucleotide mipomersen is approved for the treatment of severe hypercholesterolemia, but is infrequently used within clinical practice owing to high rates of hepatic steatosis observed in clinical trials. To further characterize the phenotypic consequences of inactivating variants in *APOB*, we analyzed an expanded set of 42,260 UK Biobank participants with gene sequencing data available (regardless of availability of abdominal MRI imaging data). Of these 42,260 individuals, 64 (0.15%) harbored an inactivating variant in *APOB*. Biomarkers of liver injury were increased in these individuals: 31% higher alanine aminotransferase and 14% higher aspartate aminotransferase, p-values 1.7×10−5 and 0.006 respectively (Supp. Table 8). In contrast to higher values of aminotransferases, carriers of inactivating *APOB* variants had markedly lower levels of circulating lipoproteins: 40% lower apolipoprotein B, 44% lower LDL cholesterol, and B C 40% lower triglycerides (p < 0.0001 for each). This reduction in circulating lipids was associated with a 75% reduction in risk of coronary artery disease, which although not reaching statistical significance (p = 0.15), is consistent with our recent report of a 72% lower risk of coronary artery disease (p = 0.002) in a much larger dataset (Peloso et al., 2019). Across all 3,348 genes, inactivating variants in the gene encoding microsomal triglyceride transfer protein (*MTTP*) demonstrated the second strongest strength of association. Using units of inverse-normal transformed hepatic fat, values were 1.17 standard deviations higher (95%CI 0.62 to 1.72; p = 3.14 × 10−5) among 16 carriers of inactivating mutations versus 11,005 individuals without such a variant. This corresponded to a median liver fat percentage of 6.7% versus 2.1% for carriers and noncarriers respectively, and an odds ratio for hepatic steatosis of 8.5 (95%CI 2.9 to 24.8; p = 8.5×10−5) for carriers (Figure 5). Although the association of inactivating variants in *MTTP* with liver fat did not reach our prespecified levels of statistical significance, this observation is highly consistent with known biology. MTTP plays a central role in secretion of apolipoprotein B-containing lipoproteins from the liver. Individuals with two inactivating *MTTP* variants suffer from the Mendelian disorder abetalipoproteinemia, a condition characterized by absence of circulating apolipoprotein B and increased rates of hepatic steatosis (Berriot-Varoqueaux et al., 2000; Sharp et al., 1993). As was noted for APOB inhibition, a pharmacologic inhibitor of MTTP activity is approved for treatment of severe hypercholesterolemia, but clinical use is limited by increased hepatic fat observed after administration (Cuchel et al., 2007). Consistent with prior data suggesting that inactivating *MTTP* variants impact circulating lipids only when both copies are affected via a recessive inheritance pattern (Lee and Hegele, 2013), the individuals we identified with a single inactivating variant did not have significantly reduced lipid concentrations (Supp. Table 8). ## Discussion Our analysis describing quantification of liver fat in 36,703 middle-aged individuals using a machine learning algorithm trained on a small subset with previously quantified values has several implications for both biologic discovery and clinical medicine. First, the near-perfect estimation of liver fat enabled by a high-throughput machine learning algorithm confirms and extends prior such efforts and is likely to be broadly generalizable across a diverse spectrum of important phenotypes. Within hold-out testing datasets, our model-based liver fat assessment was highly correlated with liver fat previously quantified by a commercial vendor, with correlation coefficients of 0.97 and 0.99. Previous efforts have similarly documented feasibility of using a convolutional neural net framework to automate liver fat quantification using either CT or MRI images obtained in clinical practice (Wang et al., 2019). Beyond liver phenotypes, we recently validated a machine learning model to quantify the diameter of the aorta – the major vessel supplying oxygen-rich blood to the body – using cardiac MRI imaging data, enabling discovery of 93 associated genetic variants (Pirruccello et al., 2020). Taken together, these and other studies (Meyer et al., 2020) suggest that machine learning approaches to rapidly quantify phenotypes of interest in rich imaging datasets are likely to yield important new scientific insights, particularly if extended to complex features derived from dynamic tissues such as a beating heart, or latent phenotypes not currently measured within clinical practice. Second, we demonstrate that, although correlated with many cardiometabolic traits, liver fat cannot be readily predicted using information readily available in clinical practice. Our large-scale study confirmed significantly increased liver fat in important clinical subgroups such as those with diabetes or those with severe obesity. These observations suggest that future research might validate a clinical prediction tool – potentially including a polygenic score – that identifies a subgroup of individuals in whom screening for hepatic steatosis is warranted, or those with known steatosis who are most likely to progress to cirrhosis (Ajmera et al., 2018). Even if not performed within the context of focused screening, abdominal imaging is very common across a wide range of clinical indications. Application of a machine learning algorithm to alert ordering clinicians of an incidental finding of hepatic steatosis may enable measures that prevent progression to more advanced liver disease (Chalasani et al., 2018). This approach has proven useful in identifying individuals with subclinical atherosclerosis on chest CT imaging, and reporting as an incidental finding is now recommended by current practice guidelines (Hecht et al., 2017; Pakdaman et al., 2017). Third, a common variant association study enabled by our machine learning algorithm more than doubled the number of significantly associated variants in our study from 3 to 8. None of the five newly-identified variants were identified using the subset of 4,038 individuals with liver fat quantified without machine learning. We note compelling biology underlying most of the observed variants, and provide proof-of-concept that combining several most strongly associated variants into a polygenic score was associated with risk of liver-related diseases. By providing the full set of summary association statistics for our CVAS at the time of publication, we hope to enable additional downstream research. Fourth, a rare variant association study, despite a relatively small sample size of 11,021 individuals with both liver fat and gene sequencing data available, identified relationships between inactivating variants in *APOB* and *MTTP* with liver fat. These observations recapitulate the results observed in pharmacologic studies focused on APOB or MTTP inhibition as a treatment for hypercholesterolemia: those with inactivating variants in *APOB* had strikingly lower lipid concentrations, but this came at the expense of increased aminotransferase concentrations and a more than 10.8-fold increase in rates of hepatic steatosis. Given that liver biomarker elevations or increased hepatic fat are commonly observed adverse reactions to novel drug candidates – in many cases leading to termination of drug development programs – our approach to using genetics to predict hepatotoxicity may prove valuable. Moreover, our results suggest that a subset of candidate treatment strategies for hepatic steatosis may have adverse effects on increasing circulating lipids. Seen in this light, prioritization of some drug targets, such as MARC1, where genetic studies suggest inhibition will protect against liver disease without adversely increasing cholesterol concentrations or risk of cardiovascular disease (Emdin et al., 2020), may be warranted. Our results should be interpreted within the context of several potential limitations. First, participants of the UK Biobank imaging study tend to be healthier than the general population and the majority are of European ancestries; additional research is needed to confirm generalizability and transethnic portability. Second, diagnostic codes entered into the electronic health record were used to study the relationship between a clinical diagnosis of NAFLD and liver fat parameters based on imaging; such codes are known to be imprecise within the context of clinical care. Future studies involving manual chart review or using biopsy-confirmed cases of liver disease are warranted. Third, because imaging occurred very recently, we were not able to explore the relationship between liver fat assessment and future risk of cardiometabolic disease endpoints. In conclusion, we applied a machine learning algorithm to quantify liver fat in 36,703 participants of the UK Biobank, identifying 17% of the population with evidence of hepatic steatosis despite lack of a recorded clinical diagnosis of fatty liver disease, and enabling new genetic discoveries with potential important implications for identifying new mechanistic pathways underlying risk for liver disease. ## Data Availability Summary statistics for the liver fat CVAS, as well as the machine learning model architectures and learned weights will be available at the Cardiovascular Disease Knowledge Portal ([http://broadcvdi.org/home/portalHome](http://broadcvdi.org/home/portalHome)) and the ML4CVD modeling framework will be available via GitHub repository at time of publication. ## Author Contributions M.E.H., J.P.P. and S.N.F. conceived of the study. M.E.H., J.P.P., S.N.F. and C.A.E. conducted analyses. M.E.H., J.P.P., S.N.F., and A.V.K. wrote the paper. All other authors contributed to the analysis plan or provided critical revisions. ## Declaration of Interests J.P.P. has served as a consultant for Maze Therapeutics. R.L. serves as a consultant or advisory board member for Arrowhead Pharmaceuticals, AstraZeneca, Boehringer-Ingelheim, Bristol-Myer Squibb, Celgene, Cirius, CohBar, Galmed, Gemphire, Gilead, Glympse bio, Intercept, Ionis, Inipharma, Merck, Metacrine, Inc., NGM Biopharmaceuticals, Novo Nordisk, Pfizer, and Viking Therapeutics. In addition, his institution has received grant support from Allergan, Boehringer-Ingelheim, Bristol-Myers Squibb, Eli Lilly and Company, Galmed Pharmaceuticals, Genfit, Gilead, Intercept, Janssen, Madrigal Pharmaceuticals, NGM Biopharmaceuticals, Novartis, Pfizer, pH Pharma, and Siemens. He is also co-founder of Liponexus, Inc. J.R.H and A.Y.Z. are employees of Color Genomics. K.E.C. serves on the advisory boards of Novo Nordisk and BMS, has consulted for Gilead and has received grant funding from BMS, Boehringer-Ingelheim and Novartis. T.G.S. has served as a consultant for Aetion. A.P. is employed as a Venture Partner at GV, a venture capital group within Alphabet; he is also supported by a grant from Bayer AG to the Broad Institute focused on machine learning for clinical trial design. S.N.F and P.B. are supported by grants from Bayer AG and IBM applying machine learning in cardiovascular disease. P.B. has served as a consultant to Novartis. P.T.E. is supported by a grant from Bayer AG to the Broad Institute focused on the genetics and therapeutics of cardiovascular diseases. P.T.E. has also served on advisory boards or consulted for Bayer AG, Quest Diagnostics, MyoKardia and Novartis. A.V.K. has served as a consultant to Sanofi, Medicines Company, Maze Pharmaceuticals, Navitor Pharmaceuticals, Verve Therapeutics, Amgen, and Color; received speaking fees from Illumina, MedGenome, and the Novartis Institute for Biomedical Research; received a sponsored research agreement from the Novartis Institute for Biomedical Research, and reports a pending patent related to a genetic risk predictor (20190017119). ## Methods ### Study cohorts #### UK Biobank The UK Biobank is a prospective cohort study that enrolled 502,617 individuals aged 40–69 years of age from across the United Kingdom (Sudlow et al., 2015). As part of the study protocol, a subset of individuals underwent detailed imaging between years 2014 and 2019 including abdominal MRI (Littlejohns et al., 2020). We first analyzed 36,703 UK Biobank participants with abdominal MRI imaging. The UK Biobank abdominal imaging protocol was first performed with gradient echo imaging; 4,511 participants had liver fat originally quantified by Perspectum Diagnostics as previously described (Wilman et al., 2017). Beginning in 2018, imaging was switched to the “iterative decomposition of water and fat with echo asymmetry and least-squares estimation” (IDEAL) protocol. A subset of participants underwent both imaging protocols. We next performed a common variant association study (CVAS) of liver fat on a subset of 32,976 participants. We excluded samples that had no imputed genetic data, a genotyping call rate < .98, a mismatch between submitted and inferred sex, sex chromosome aneuploidy (neither XX nor XY), exclusion from kinship inference, excessive third-degree relatives, or that were outliers in heterozygosity or genotype missingness rates, all of which were previously defined centrally by the UK Biobank (Bycroft et al., 2018). Due to the small percentage of samples of non-European ancestries (Supp. Table 1), to avoid artifacts from population stratification we restricted our GWAS to samples of European ancestries (determined as previously reported (Haas et al., 2018) via self-reported ancestry of British, Irish, or other white and outlier detection using the R package *aberrant*). We did not remove related individuals from this analysis as we used a linear mixed model able to account for cryptic relatedness in common variant association studies (Loh et al., 2015b). To further validate the common variants associated with liver fat in the CVAS, we studied association of single variants as well as a composite 8-variant polygenic risk score (PRS) with liver disease and/or blood biomarkers alanine aminotransferase (ALT) and aspartate aminotransferase (AST) in 362,910 individuals in the UK Biobank who did not undergo imaging and therefore were not part of the discovery cohort. Sample quality control was performed by excluding samples that had no imputed genetic data, a genotyping call rate < 0.95, a mismatch between submitted and inferred sex, sex chromosome aneuploidy, exclusion from kinship inference, excessive third-degree relatives, or that were outliers in heterozygosity or genotype missingness rates, and restricting to samples of European ancestries. We additionally removed one of each pair of related individuals (2nd degree or closer, KING coefficient > 0.0884), and those which were part of the liver fat CVAS to avoid sample overlap, resulting in 362,910 individuals. To assess the relationship of rare inactivating variants with liver fat and related traits, we studied the subset of UK Biobank participants with exome sequencing data available. Sample quality control was performed by excluding samples that had no imputed genetic data, a genotyping call rate < 0.95, a mismatch between submitted and inferred sex, sex chromosome aneuploidy, exclusion from kinship inference, excessive third-degree relatives, or that were outliers in heterozygosity or genotype missingness rates, and restricting to samples of European ancestries as well as removing one of each pair of related individuals (2nd degree or closer, KING coefficient > 0.0884). We first analyzed the relationship between rare inactivating variants and liver fat in 11,021 individuals with both whole exome sequencing and abdominal MRI imaging data available. Next, to understand the relationship between inactivating variants in two genes, *APOB* and *MTTP*, and related biomarkers and disease states, we analyzed the full set of up to 42,260 participants with exome sequencing data available. #### Framingham Heart Study The Framingham Heart Study is a multigenerational prospective study that enrolled individuals free of cardiovascular disease beginning in 1948. 3,284 individuals in the Offspring and Third Generation cohorts (enrollment beginning in 1971 and 2002, respectively; Kannel et al., 1979; Splansky et al., 2007) with genotype data available underwent multidetector abdominal CT for liver fat quantification (Speliotes et al., 2008) and were included in analyses described below. #### Multi-Ethnic Study of Atherosclerosis (MESA) The Multi-Ethnic Study of Atherosclerosis (MESA) study is a prospective cohort that enrolled individuals free of cardiovascular disease between 2000 and 2002 (Bild et al., 2002). 4,195 individuals with genotype data available underwent multidetector CT for liver fat quantification (Zeb et al., 2012) and had genetic data available and were used in analyses described below. #### Mass General Brigham Biobank Mass General Brigham Biobank is a hospital-based biorepository with genetic data linked to clinical records (Karlson et al., 2016). Patients were defined as having NAFLD or NASH according to diagnosis codes in the electronic health care record (Supp. Table 2) and were compared to controls without such diagnoses as described below. #### Informed Consent and Study Approval The UK Biobank study was approved by the Research Ethics Committee (reference 16/NW/0274) and informed consent was obtained from all participants. Analysis of UK Biobank data was conducted under application 7089 and was approved by the Mass General Brigham institutional review board (protocol 2013P001840). Framingham Heart Study and MESA genotype and phenotype data were retrieved for analysis from NCBI dbGAP under procedures approved by the Mass General Brigham institutional review board (protocol 2016P002395). Mass General Brigham Biobank participants each provided written informed consent and analysis was approved by the Mass General Brigham institutional review board. ### Liver fat quantification in UK Biobank participants using a new machine learning algorithm To determine liver fat percentage from abdominal magnetic resonance images, we used 2D Convolutional Neural Networks (CNNs) to estimate liver fat percentage from abdominal MRI in 38,706 individuals. The imaging protocol in UK Biobank was switched from gradient echo to IDEAL mid-study, and liver fat was previously quantified by Perspectum Diagnostics only in individuals imaged using the gradient echo protocol (Wilman et al., 2017). To be able to infer liver fat from both protocols, we therefore used a two-model approach with “teacher-student” models. The “teacher” model was a 2D CNN trained on 3,198 of the 4,412 individuals who underwent the gradient echo imaging protocol and had the full set of 10 standard images. The truth data for this model were liver fat values previously quantified by Perspectum Diagnostics from gradient echo imaging protocols which were made available to UK Biobank researchers. Validation was assessed in a held-out set of 1,214 participants with truth data. Liver fat values for the remaining 5,496 participants with gradient echo imaging were estimated using this model. To estimate liver fat in participants imaged using the IDEAL protocol, we also trained a “student” model in the participants who had undergone both the gradient echo and IDEAL imaging protocols. This model was also a 2D CNN, trained on 1,054 of the 1,437 overlapping samples. The truth data for this model was liver fat in the gradient echo protocol, which was inferred from the “teacher” model. The remaining 383 overlapping samples that were not used for training were used to validate the student model. Liver fat values for the remaining 28,595 participants with IDEAL imaging were inferred using this model. In total, we estimated liver fat for 34,281 participants with these models. Performance on the held-out test set is shown for each model in Supp. Figure 1. The 2D CNNs were optimized with backpropagation and Adaptive Moment stochastic gradient descent (ADAM). The models were implemented in tensorflow version 2.1 using the ML4CVD modeling framework (Sarma et al., 2020). The python package hyperopt was used for Bayesian hyperparameter optimization of the model architecture to select the width, depth, activation function, and the size of each residual block in the CNN. The final architecture consisted of two layers of convolution followed by three residual blocks of 2 convolutions in parallel whose outputs are concatenated and max-pooled reducing the size of the representation by a factor of 4 after each block. The output of the final convolutional block is flattened and processed by two fully-connected layers and finally fed to the output regression neuron. All non-linear activations functions in the model are rectified linear units. To combine the previously-quantified liver fat and results of the two models, we first used the previously-quantified liver fat estimates provided by the UK Biobank where available, including the n = 205 individuals who had a nonstandard number of gradient echo images and were not used in the ‘teacher’ model generation. When previously-quantified liver fat was unavailable, we preferentially used the liver fat estimates from the teacher model. When teacher model liver fat estimates were unavailable, we used the liver fat estimates from the student model. For subsequent analyses of liver fat, we filtered to 36,703 individuals in UK Biobank with genetic data, BMI measurements and liver imaging available. Final sources of liver fat were: n = 4,511 previously-quantified, n = 4,971 estimated from gradient echo protocol, n = 27,221 estimated from IDEAL protocol. ### Association of liver fat with patient baseline and clinical characteristics Baseline characteristics of the 36,703 UK Biobank participants are shown in Supp. Table 1. Number of weekly drinks was defined in American standard drinks (1 drink = 14 g ethanol) according to the conversions: red or white wine, 0.84 drinks/glass; beer, 1.29 drinks/pint; liquor, 0.68 drinks/measure; fortified wine, 0.7 drinks/glass; other alcohol, 1 drink/glass. For participants who reported consuming alcohol monthly rather than weekly, monthly alcohol consumption was converted to weekly by multiplying by 0.23 (7×(365/12)). Excessive alcohol consumption was defined according to AASLD guidelines (Chalasani et al., 2012): greater than 14 weekly drinks if female or greater than 21 weekly drinks if male). Physician diagnosis of NAFLD and NAFLD/NASH were defined using ICD10 codes K76.0 or K76.0+K75.8, respectively. Other diseases were defined according to a combination of self-report, procedure codes and ICD codes (Supp. Table 2). Hepatic steatosis was defined as liver fat > 5.5%, as determined previously for UK Biobank using the original previously quantified liver fat values (Wilman et al., 2017). High waist-to-hip ratio was defined as greater than 0.9 if male and greater than 0.85 if female. Weight categories were defined using BMI: underweight, BMI< 18.5 kg/m2; normal, 18.5≤BMI< 25 kg/m2; overweight, 25≤BMI< 30 kg/m2; obese, 30≤BMI< 40 kg/m2; severely obese, BMI≥40 kg/m2. To determine the clinical/anthropometric influences (sex, excessive alcohol consumption, physician diagnosis of NAFLD, physician diagnosis of diabetes) on median liver fat, or the effects of hepatic steatosis on median triglycerides, we performed median regression. Similarly, we used logistic regression to evaluate the effects of physician diagnosis of NAFLD on hepatic steatosis, and hepatic steatosis on diabetes or hypertension diagnosis. In both median and logistic regression, we included sex, birth year, age at imaging, age at imaging2 and MRI machine serial number as covariates. We calculated the Pearson correlation coefficient to assess the relationship between liver fat and BMI at time of imaging. We also wanted to assess how imaging-based quantification of liver fat compared to predicting liver fat using clinical and anthropometric measurements. Liver fat percent is well-modeled by a beta distribution, as it is a series of proportions in the interval (0,1) (Ferrari and Cribari-Neto, 2004). We therefore constructed a beta regression model of liver fat using these measurements in the derivation and testing sets used to construct the ML model. We selected available anthropometrics, biomarkers associated with metabolic function and liver function or injury, as well as measurements of total body or abdominal fat available in UK Biobank. Lipid measures were adjusted for lipid-lowering medication use and blood pressure was adjusted for anti-hypertensive medication use, as previously described (Ehret et al., 2016; Patel et al., 2020). Missing values were imputed using *aregImpute* in the R package *Hmisc*. Only traits which were nominally (p< 0.05) associated with truth liver fat in univariable analysis were included in the beta regression model (Supp. Figure 2). Variables which were not associated with liver fat and were therefore excluded from the beta regression model were total bilirubin, direct bilirubin and indirect bilirubin. We constructed a variable beta regression model using 3,176 of the 4,412 individuals who underwent the gradient echo imaging protocol and had the full set of standard images. The truth data for this model were liver fat values previously quantified by Perspectum Diagnostics. The variable beta regression model was constructed using the *betareg* package in R, optimizing the mean and precision link functions to logit and log, respectively, using AIC & BIC comparisons. Performance of the model was evaluated by the Pearson correlation between truth liver fat and predicted liver fat in a held-out testing dataset of 1,196 individuals who also had liver fat previously quantified by Perspectum Diagnostics (Supp. Figure 2). ### Genotyping and quality control UK Biobank samples were genotyped on either the UK BiLEVE or UK Biobank Axiom arrays, then imputed into the Haplotype Reference Consortium and UK10K + 1000 Genomes panels. We excluded genotyped variants with call rate < 0.95, imputed variants with INFO score < 0.3, and imputed or genotyped variants with minor allele frequency less than 0.001 in the UK Biobank population. Variant positions were denoted in GRCh37/hg19 coordinates. ### Phenotype transformation Because liver fat is not normally distributed and nor are its residuals with respect to clinical covariates, we transformed the input to a rank-based output. This approach is commonly used in GWAS of quantitative traits with skewed distributions (Yang et al., 2012). First, we took the residuals of liver fat in a linear model that included sex, year of birth, age at time of MRI, age at time of MRI squared, genotyping array, MRI device serial number, and the first ten principal components of ancestry. Then, we performed the inverse normal transform on the residuals from this model, yielding a standardized output with mean 0 and standard deviation of 1. ### Common variant association study We performed a CVAS of the inverse normal transformed liver fat residuals in 32,976 individuals, applying linear mixed models with BOLT-LMM (version 2.3.4) to account for ancestry, cryptic population structure, and sample relatedness (Loh et al., 2015b). The default European linkage disequilibrium panel provided with BOLT was used. Variants with BOLT-LMM P < 5 × 10−8 were considered to be genome-wide significant. Loci were defined by 2 MB windows (1 MB distance from the most-significant variant in either direction). The most strongly associated variant at each locus is referred to as the lead variant. We determined the effects of each of the eight lead variants on liver fat % and presence of hepatic steatosis (liver fat > 5.5%) using linear and logistic regression, respectively, in the same 32,976 individuals in the CVAS, adjusting for sex, year of birth, age at time of MRI, age at time of MRI squared, genotyping array, MRI device serial number, and the first ten principal components of ancestry. We repeated this CVAS in the subset of 4,038 individuals with previously-quantified liver fat who passed the CVAS sample quality control. We replicated the CVAS findings in the Framingham Heart Study and the Multi-Ethnic Study of Atherosclerosis (MESA). In the Framingham cohort (Offspring Cohort and Third Generation Cohort), we examined whether the 8 variants associate with hepatic steatosis on CT imaging. Genotyping was imputed to the HapRef consortium using the Michigan Imputation Server. After imputation, variants with allele frequency less than 0.01% and those with an imputation score of less than 0.3 were excluded from analysis. 3284 individuals in Framingham with genotype data available underwent multidetector abdominal CT (Speliotes et al., 2008). Hepatic steatosis was measured by computing the liver-to-phantom ratio of the average Hounsfield units of three liver measurements to average Hounsfield units of three phantom measurements (to correct for inter-individual differences in penetration), as previously described (Speliotes et al., 2008). We tested the association of all 8 variants with liver-to-phantom ratio with adjustment for age, sex and ten principal components of ancestry using a linear mixed model (BOLT-LMM) to control for relatedness among individuals. Liver fat measurements were inverse normal rank transformed prior to analysis. In the Multi-Ethnic Study of Atherosclerosis cohort (MESA), genotypes were imputed to the HapRef consortium using the Michigan Imputation Server. After imputation, variants with allele frequency less than 0.01% and those with an imputation score of less than 0.3 were excluded from analysis. Hepatic steatosis was measured as the mean of three attenuation measurements, two in the right lobe of the liver and one in the left lobe (Zeb et al., 2012), without use of phantom measurement normalization. We therefore tested the association of the top CVAS variants with mean liver attenuation with adjustment for age, sex and ten principal components of ancestry. Mean liver attenuation was subject to inverse normal transformation prior to analysis. Liver fat measurements were inverse normal rank transformed prior to analysis. Linear regression with adjustment for age, sex and five principal components of ancestry was used to test for the association of genetic variants with liver fat. PLINK was used for analysis. Individuals with higher liver fat have lower liver-to-phantom ratios and liver attenuation measurements. For interpretability, we therefore reported replication estimates in units of standard deviation increases in liver fat, with a one standard deviation increase in liver fat corresponding to a one standard deviation decrease in the liver-to-phantom ratio or mean liver attenuation. We examined the association of the top CVAS variants with blood biomarkers alanine aminotransferase (ALT) and aspartate aminotransferase (AST) in UK Biobank using linear regression of each biomarker (in U/L) adjusting for sex, year of birth, age at enrollment and age at enrollment squared, genotyping array and the first ten principal components of ancestry. We also examined the association of the top CVAS variants with physician diagnosis of NAFLD or NASH in UK Biobank and Mass General Brigham Biobank. Disease definitions are provided in Supp. Table 2. In UK Biobank, association of each top CVAS variant was assessed using logistic regression of disease status with sex, year of birth, age at enrollment and age at enrollment squared, genotyping array and the first ten principal components of ancestry as covariates. In the Mass General Brigham Biobank, genotyping was performed using an Illumina MEGA array. Variants were imputed to the HapRef consortium using the Michigan Imputation Server. Variants with multinucleotide alleles and those with call rate less than 90% were excluded prior to imputation. After imputation, variants with allele frequency less than 0.01% and those with an imputation score of less than 0.3 were excluded from analysis. Association of each top CVAS variant was assessed using logistic regression of disease status with age, sex and five principal components of ancestry as covariates. ### Polygenic score analysis The 8 lead variants were scored in additive fashion based on number of liver-fat increasing variants identified weighted by their CVAS effect size estimates, producing a single score for each individual. We tested for association between the score and incident disease occurrence after UK Biobank enrollment using a Cox model in the same set of individuals used to test associations between single CVAS variants and NAFLD/NASH. We excluded individuals who had any of the four diseases investigated, resulting in 362,096 participants in the analysis. We focused on the association of the score with liver diseases; given previously reported association of liver fat variants with circulating lipids (Liu et al., 2017), we also examined association of the score with circulating LDL cholesterol using linear regression. LDL cholesterol was adjusted for lipid-lowering medication use as previously described (Patel et al., 2020); liver disease definitions are listed in Supp. Table 2. All PRS analyses were adjusted for age at enrollment, sex, the first five principal components of ancestry, and genotyping array. We also quantified proportion of individuals who developed each disease during study follow-up stratified by PRS decile. ### Rare variant association study In the subset of individuals with whole exome sequencing available, we identified rare (minor allele frequency < 0.001) inactivating variants in each gene. Sequencing data from the “Functionally Equivalent” gene sequencing dataset ([https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=170](https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?id=170)) was annotated using the Ensembl Variant Effect Predictor (VEP) software (version 96.0) LOFTEE. LOFTEE applies a set of filters to identify high-confidence inactivating variants based on predicted impact on the resulting transcript. High-confidence inactivating variants include those predicted to cause premature truncation of a protein (nonsense), insertions or deletions (indels) of DNA that scramble protein translation beyond the variant site (frameshift), or point mutations at sites of pre-messenger ribonucleic acid splicing that alter the splicing process (splice-site). We aggregated the inactivating variants identified within each gene into a rare variant burden analysis: individuals were considered as an inactivating variant carrier for a particular gene if they had one or more loss-of-function mutations in the gene, and a non-carrier otherwise. We tested the association of inactivating variant carrier status for each gene with inverse normal transformed liver fat as previously described (see Phenotype Transformation) using linear regression with the first ten principal components of ancestry as covariates. We removed genes with fewer than 10 inactivating variant carriers to increase the likelihood of having sufficient statistical power to detect an effect, leaving 3,384 genes in the analysis. To determine the effects of *APOB* or *MTTP* inactivating variants on blood biomarkers or disease outcomes, we used linear or logistic regression, respectively, adjusting for sex, year of birth, age at enrollment and age at enrollment squared, genotyping array and the first ten principal components of ancestry. Lipid measures were adjusted for lipid-lowering medication use as previously described (Patel et al., 2020). ### Data availability Summary statistics for the liver fat CVAS, as well as the machine learning model architectures and learned weights will be made available at the Cardiovascular Disease Knowledge Portal ([http://broadcvdi.org/home/portalHome](http://broadcvdi.org/home/portalHome)) and the ML4CVD modeling framework will be made available via GitHub repository at time of publication. ## Acknowledgements This research has been conducted using the UK Biobank resource, application 7089. Funding support was provided by NIH grants 1K08HG010155 (to A.V.K.) from the National Human Genome Research Institute, 1R01HL092577, R01HL128914, K24HL105780 (to P.T.E), R01HL071739 (to M.B.) from the National Heart, Lung and Blood Institute, 5P42ES010337 (to R.L.) from the National Institute of Environmental Health Sciences, 5UL1TR001442 (to R.L.) from the National Center for Advancing Translational Sciences, R01DK106419, P30DK120515 (to R.L.), K23 DK122104 to (to T.G.S.) from the National Institute of Diabetes and Digestive and Kidney Diseases, CA170674P2 (to R.L.) from the Department of Defense’s Peer Reviewed Cancer Research Program, a Hassenfeld Scholar Award from Massachusetts General Hospital (to A.V.K.), a Merkin Institute Fellowship from the Broad Institute of MIT and Harvard (to A.V.K.), a John S LaDue Memorial Fellowship (to J.P.P.) a sponsored research agreement from IBM Research (to A.P., A.V.K.), American Association for the Study of Liver Diseases Foundation Clinical and Translational Research Awards (to V.A. and T.G.S.). MESA and the MESA SHARe projects are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, and supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at [http://www.mesa-nhlbi.org](http://www.mesa-nhlbi.org). * Received September 3, 2020. * Revision received September 3, 2020. * Accepted September 3, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## References 1. Abul-Husn, N.S., Cheng, X., Li, A.H., Xin, Y., Schurmann, C., Stevis, P., Liu, Y., Kozlitina, J., Stender, S., Wood, G.C., et al. (2018). A Protein-Truncating HSD17B13Variant and Protection from Chronic Liver Disease. N. Engl. J. Med. 378, 1096–1106. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1712191&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29562163&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 2. Ajmera, V., Park, C.C., Caussy, C., Singh, S., Hernandez, C., Bettencourt, R., Hooker, J., Sy, E., Behling, C., Xu, R., et al. (2018). Magnetic Resonance Imaging Proton Density Fat Fraction Associates With Progression of Fibrosis in Patients With Nonalcoholic Fatty Liver Disease. Gastroenterology 155, 307–310.e2. 3. Alexander, M., Loomis, A.K., Fairburn-Beech, J., van der Lei, J., Duarte-Salles, T., Prieto-Alhambra, D., Ansell, D., Pasqua, A., Lapi, F., Rijnbeek, P., et al. (2018). Real-world data reveal a diagnostic gap in non-alcoholic fatty liver disease. BMC Med. 16, 130. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12916-018-1103-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30099968&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 4. Allen, A.M., Therneau, T.M., Larson, J.J., Coward, A., Somers, V.K., and Kamath, P.S. (2018). Nonalcoholic fatty liver disease incidence and impact on metabolic burden and death: A 20 year-community study. Hepatol. Baltim. Md 67, 1726–1736. 5. Bauer, R.C., Sasaki, M., Cohen, D.M., Cui, J., Smith, M.A., Yenilmez, B.O., Steger, D.J., and Rader, D.J. (2015). Tribbles-1 regulates hepatic lipogenesis through posttranscriptional regulation of C/EBPα. J. Clin. Invest. 125, 3809–3818. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1172/JCI77095&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26348894&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 6. Berriot-Varoqueaux, N., Aggerbeck, L.P., Samson-Bouma, M., and Wetterau, J.R. (2000). The role of the microsomal triglygeride transfer protein in abetalipoproteinemia. Annu. Rev. Nutr. 20, 663–697. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1146/annurev.nutr.20.1.663&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10940349&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000089332200027&link_type=ISI) 7. Bierut, L.J., Goate, A.M., Breslau, N., Johnson, E.O., Bertelsen, S., Fox, L., Agrawal, A., Bucholz, K.K., Grucza, R., Hesselbrock, V., et al. (2012). ADH1B is associated with alcohol dependence and alcohol consumption in populations of European and African ancestry. Mol. Psychiatry 17, 445–450. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/mp.2011.124&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21968928&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000302173900011&link_type=ISI) 8. Bild, D.E., Bluemke, D.A., Burke, G.L., Detrano, R., Diez Roux, A.V., Folsom, A.R., Greenland, P., Jacob, D.R., Kronmal, R., Liu, K., et al. (2002). Multi-Ethnic Study of Atherosclerosis: objectives and design. Am. J. Epidemiol. 156, 871–881. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwf113&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12397006&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000179035100012&link_type=ISI) 9. Buch, S., Stickel, F., Trépo, E., Way, M., Herrmann, A., Nischalke, H.D., Brosch, M., Rosendahl, J., Berg, T., Ridinger, M., et al. (2015). A genome-wide association study confirms PNPLA3 and identifies TM6SF2 and MBOAT7 as risk loci for alcohol-related cirrhosis. Nat. Genet. 47, 1443–1448. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3417&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26482880&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 10. Burkhardt, R., Toh, S.-A., Lagor, W.R., Birkeland, A., Levin, M., Li, X., Robblee, M., Fedorov, V.D., Yamamoto, M., Satoh, T., et al. (2010). Trib1 is a lipid- and myocardial infarction– associated gene that regulates hepatic lipogenesis and VLDL production in mice. J. Clin. Invest. 120, 4410–4414. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1172/JCI44213&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21084752&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000284971400026&link_type=ISI) 11. Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L.T., Sharp, K., Motyer, A., Vukcevic, D., Delaneau, O., O’Connell, J., et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 12. Caussy, C., Reeder, S.B., Sirlin, C.B., and Loomba, R. (2018). Noninvasive, Quantitative Assessment of Liver Fat by MRI-PDFF as an Endpoint in NASH Trials. Hepatol. Baltim. Md 68, 763–772. 13. Chalasani, N., Younossi, Z., Lavine, J.E., Diehl, A.M., Brunt, E.M., Cusi, K., Charlton, M., and Sanyal, A.J. (2012). The diagnosis and management of non-alcoholic fatty liver disease: Practice Guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Hepatol. Baltim. Md 55, 2005–2023. 14. Chalasani, N., Younossi, Z., Lavine, J.E., Charlton, M., Cusi, K., Rinella, M., Harrison, S.A., Brunt, E.M., and Sanyal, A.J. (2018). The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatol. Baltim. Md 67, 328–357. 15. Cuchel, M., Bloedon, L.T., Szapary, P.O., Kolansky, D.M., Wolfe, M.L., Sarkis, A., Millar, J.S., Ikewaki, K., Siegelman, E.S., Gregg, R.E., et al. (2007). Inhibition of microsomal triglyceride transfer protein in familial hypercholesterolemia. N. Engl. J. Med. 356, 148–156. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa061189&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17215532&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000243373700007&link_type=ISI) 16. Ehret, G.B., Ferreira, T., Chasman, D.I., Jackson, A.U., Schmidt, E.M., Johnson, T., Thorleifsson, G., Luan, J., Donnelly, L.A., Kanoni, S., et al. (2016). The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nat. Genet. 48, 1171–1184. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3667&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27618452&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 17. Emdin, C.A., Haas, M.E., Khera, A.V., Aragam, K., Chaffin, M., Klarin, D., Hindy, G., Jiang, L., Wei, W.-Q., Feng, Q., et al. (2020). A missense variant in Mitochondrial Amidoxime Reducing Component 1 gene and protection against liver disease. PLoS Genet. 16, e1008629. 18. Feitosa, M.F., Wojczynski, M.K., North, K.E., Zhang, Q., Province, M.A., Carr, J.J., and Borecki, I.B. (2013). The ERLIN1-CHUK-CWF19L1 gene cluster influences liver fat deposition and hepatic inflammation in the NHLBI Family Heart Study. Atherosclerosis 228, 175–180. 19. Ferrari, S., and Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. J. Appl. Stat. 31, 799–815. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/0266476042000214501&link_type=DOI) 20. Gellert-Kristensen, H., Nordestgaard, B.G., Tybjærg Hansen, A., and Stender, S. (2020). High Risk of Fatty Liver Disease Amplifies the Alanine Transaminase–Lowering Effect of a HSD17B13 Variant. Hepatology 71, 56–66. 21. Haas, M.E., Aragam, K.G., Emdin, C.A., Bick, A.G., Hemani, G., Davey Smith, G., Kathiresan, S., and Pressure, I.C. for B. (2018). Genetic Association of Albuminuria with Cardiometabolic Disease and Blood Pressure. Am. J. Hum. Genet. 103, 461–473. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2018.08.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30220432&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 22. Hammond, L.E., Gallagher, P.A., Wang, S., Hiller, S., Kluckman, K.D., Posey-Marcos, E.L., Maeda, N., and Coleman, R.A. (2002). Mitochondrial glycerol-3-phosphate acyltransferase deficient mice have reduced weight and liver triacylglycerol content and altered glycerolipid fatty acid composition. Mol. Cell. Biol. 22, 8204–8214. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoibWNiIjtzOjU6InJlc2lkIjtzOjEwOiIyMi8yMy84MjA0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDkvMDMvMjAyMC4wOS4wMy4yMDE4NzE5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 23. Hecht, H.S., Cronin, P., Blaha, M.J., Budoff, M.J., Kazerooni, E.A., Narula, J., Yankelevitz, D., and Abbara, S. (2017). 2016 SCCT/STR guidelines for coronary artery calcium scoring of noncontrast noncardiac chest CT scans: A report of the Society of Cardiovascular Computed Tomography and Society of Thoracic Radiology. J. Thorac. Imaging 32, W54–W66. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/RTI.0000000000000287&link_type=DOI) 24. Innes, H., Buch, S., Hutchinson, S., Guha, I.N., Morling, J.R., Barnes, E., Irving, W., Forrest, E., Pedergnan, V., Goldberg, D., et al. (2020). Genome-wide Association Study for Alcohol-related Cirrhosis Identifies Risk Loci in MARC1 and HNRNPUL1. Gastroenterology . 25. Kannel, W.B., Feinleib, M., McNamara, P.M., Garrison, R.J., and Castelli, W.P. (1979). An investigation of coronary heart disease in families. The Framingham offspring study. Am. J. Epidemiol. 110, 281–290. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/oxfordjournals.aje.a112813&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=474565&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1979HL66700007&link_type=ISI) 26. Karlson, E.W., Boutin, N.T., Hoffnagle, A.G., and Allen, N.L. (2016). Building the Partners HealthCare Biobank at Partners Personalized Medicine: Informed Consent, Return of Research Results, Recruitment Lessons and Operational Considerations. J. Pers. Med. 6. 27. Kathiresan, S., Melander, O., Guiducci, C., Surti, A., Burtt, N.P., Rieder, M.J., Cooper, G.M., Roos, C., Voight, B.F., Havulinna, A.S., et al. (2008). Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat. Genet. 40, 189–197. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.75&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18193044&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 28. Kozlitina, J., Smagris, E., Stender, S., Nordestgaard, B.G., Zhou, H.H., Tybjaerg-Hansen, A., Vogt, T.F., Hobbs, H.H., and Cohen, J.C. (2014). Exome-wide association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver disease. Nat. Genet. 46, 352–356. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2901&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24531328&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 29. Labbé, C., Goyette, P., Lefebvre, C., Stevens, C., Green, T., Tello-Ruiz, M.K., Cao, Z., Landry, A.L., Stempak, J., Annese, V., et al. (2008). MAST3: a novel IBD risk factor that modulates TLR4 signaling. Genes Immun. 9, 602–612. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/gene.2008.57&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18650832&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000260228500004&link_type=ISI) 30. Lee, J., and Hegele, R.A. (2013). Abetalipoproteinemia and homozygous hypobetalipoproteinemia: a framework for diagnosis and management. J. Inherit. Metab. Dis. 37, 333–339. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24288038&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 31. Lindén, D., William-Olsson, L., Ahnmark, A., Ekroos, K., Hallberg, C., Sjögren, H.P., Becker, B., Svensson, L., Clapham, J.C., Oscarsson, J., et al. (2006). Liver-directed overexpression of mitochondrial glycerol-3-phosphate acyltransferase results in hepatic steatosis, increased triacylglycerol secretion and reduced fatty acid oxidation. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 20, 434–443. 32. Littlejohns, T.J., Holliday, J., Gibson, L.M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J.D., Boultwood, C., Collins, R., Conroy, M.C., et al. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat. Commun. 11, 2624. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-020-15948-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32457287&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 33. Liu, D.J., Peloso, G.M., Yu, H., Butterworth, A.S., Wang, X., Mahajan, A., Saleheen, D., Emdin, C., Alam, D., Alves, A.C., et al. (2017). Exome-wide association study of plasma lipids in >300,000 individuals. Nat. Genet. 49, 1758–1766. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3977&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29083408&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 34. Loh, P.-R., Bhatia, G., Gusev, A., Finucane, H.K., Bulik-Sullivan, B.K., Pollack, S.J., de Candia, T.R., Lee, S.H., Wray, N.R., Kendler, K.S., et al. (2015a). Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance components analysis. Nat. Genet. 47, 1385–1392. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3431&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26523775&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 35. Loh, P.-R., Tucker, G., Bulik-Sullivan, B.K., Vilhjálmsson, B.J., Finucane, H.K., Salem, R.M., Chasman, D.I., Ridker, P.M., Neale, B.M., Berger, B., et al. (2015b). Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3190&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25642633&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 36. Loomba, R., and Sanyal, A.J. (2013). The global NAFLD epidemic. Nat. Rev. Gastroenterol. Hepatol. 10, 686–690. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrgastro.2013.171&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24042449&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 37. Loomba, R., Schork, N., Chen, C.-H., Bettencourt, R., Bhatt, A., Ang, B., Nguyen, P., Hernandez, C., Richards, L., Salotti, J., et al. (2015). Heritability of Hepatic Fibrosis and Steatosis Based on a Prospective Twin Study. Gastroenterology 149, 1784–1793. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.gastro.2015.08.011&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26299412&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 38. Lorbeer, R., Bayerl, C., Auweter, S., Rospleszcz, S., Lieb, W., Meisinger, C., Heier, M., Peters, A., Bamberg, F., and Hetterich, H. (2017). Association between MRI-derived hepatic fat fraction and blood pressure in participants without history of cardiovascular disease. J. Hypertens. 35, 737–744. 39. Luukkonen, P.K., Juuti, A., Sammalkorpi, H., Penttilä, A.K., Orešič, M., Hyötyläinen, T., Arola, J., Orho-Melander, M., and Yki-Järvinen, H. (2020). MARC1 variant rs2642438 increases hepatic phosphatidylcholines and decreases severity of non-alcoholic fatty liver disease in humans. J. Hepatol. 1–2. 40. Ma, Y., Belyaeva, O.V., Brown, P.M., Fujita, K., Valles, K., Karki, S., de Boer, Y.S., Koh, C., Chen, Y., Du, X., et al. (2019). 17-Beta Hydroxysteroid Dehydrogenase 13 Is a Hepatic Retinol Dehydrogenase Associated With Histological Features of Nonalcoholic Fatty Liver Disease. Hepatol. Baltim. Md 69, 1504–1519. 41. Mancina, R.M., Dongiovanni, P., Petta, S., Pingitore, P., Meroni, M., Rametta, R., Borén, J., Montalcini, T., Pujia, A., Wiklund, O., et al. (2016). The MBOAT7-TMC4 Variant rs641738 Increases Risk of Nonalcoholic Fatty Liver Disease in Individuals of European Descent. Gastroenterology 150, 1219–1230.e6. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 42. Meyer, H.V., Dawes, T.J.W., Serrani, M., Bai, W., Tokarczuk, P., Cai, J., de Marvao, A., Henry, A., Lumbers, R.T., Gierten, J., et al. (2020). Genetic and functional insights into the fractal structure of the heart. Nature 584, 589–594. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 43. Pakdaman, M.N., Rozanski, A., and Berman, D.S. (2017). Incidental coronary calcifications on routine chest CT: Clinical implications. Trends Cardiovasc. Med. 27, 475–480. 44. Palmer, n.d., Musani, S.K., Yerges-Armstrong, L.M., Feitosa, M.F., Bielak, L.F., Hernaez, R., Kahali, B., Carr, J.J., Harris, T.B., Jhun, M.A., et al. (2013). Characterization of European ancestry nonalcoholic fatty liver disease-associated variants in individuals of African and Hispanic descent. Hepatol. Baltim. Md 58, 966–975. 45. Parisinos, C.A., Wilman, H.R., Thomas, E.L., Kelly, M., Nicholls, R.C., McGonigle, J., Neubauer, S., Hingorani, A.D., Patel, R.S., Hemingway, H., et al. (2020). Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis. J. Hepatol. 1–41. 46. Patel, A.P., Wang, M., Fahed, A.C., Mason-Suares, H., Brockman, D., Pelletier, R., Amr, S., Machini, K., Hawley, M., Witkowski, L., et al. (2020). Association of Rare Pathogenic DNA Variants for Familial Hypercholesterolemia, Hereditary Breast and Ovarian Cancer Syndrome, and Lynch Syndrome With Disease Risk in Adults According to Family History. JAMA Netw. Open 3, e203959. 47. Peloso, G.M., Nomura, A., Khera, A.V., Chaffin, M., Won, H.-H., Ardissino, D., Danesh, J., Schunkert, H., Wilson, J.G., Samani, N., et al. (2019). Rare Protein-Truncating Variants in APOB, Lower Low-Density Lipoprotein Cholesterol, and Protection Against Coronary Heart Disease. Circ. Genomic Precis. Med. 12, e002376. 48. Pirruccello, J.P., Chaffin, M.D., Fleming, S.J., Arduini, A., Lin, H., Khurshid, S., Chou, E.L., Friedman, S.N., Bick, A.G., Weng, L.-C., et al. (2020). Deep learning enables genetic analysis of the human thoracic aorta. BioRxiv 2020.05.12.091934. 49. Sanyal, A.J. (2018). Putting non-alcoholic fatty liver disease on the radar for primary care physicians: how well are we doing? BMC Med. 16, 148. 50. Sanyal, A.J., Brunt, E.M., Kleiner, D.E., Kowdley, K.V., Chalasani, N., Lavine, J.E., Ratziu, V., and McCullough, A. (2011). Endpoints and clinical trial design for nonalcoholic steatohepatitis. Hepatol. Baltim. Md 54, 344–353. 51. Sarma, G.P., Reinertsen, E., Aguirre, A., Anderson, C., Batra, P., Choi, S.-H., Di Achille, P., Diamant, N., Ellinor, P., Emdin, C., et al. (2020). Physiology as a Lingua Franca for Clinical Machine Learning. Patterns 1, 100017. 52. Sharp, D., Blinderman, L., Combs, K.A., Kienzle, B., Ricci, B., Wager-Smith, K., Gil, C.M., Turck, C.W., Bouma, M.E., and Rader, D.J. (1993). Cloning and gene defects in microsomal triglyceride transfer protein associated with abetalipoproteinaemia. Nature 365, 65–69. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/365065a0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8361539&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 53. Speliotes, E.K., Massaro, J.M., Hoffmann, U., Foster, M.C., Sahani, D.V., Hirschhorn, J.N., O’Donnell, C.J., and Fox, C.S. (2008). Liver fat is reproducibly measured using computed tomography in the Framingham Heart Study. J. Gastroenterol. Hepatol. 23, 894–899. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1440-1746.2008.05420.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18565021&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 54. Speliotes, E.K., Massaro, J.M., Hoffmann, U., Vasan, R.S., Meigs, J.B., Sahani, D.V., Hirschhorn, J.N., O’Donnell, C.J., and Fox, C.S. (2010). Fatty liver is associated with dyslipidemia and dysglycemia independent of visceral fat: the Framingham Heart Study. Hepatol. Baltim. Md 51, 1979–1987. 55. Speliotes, E.K., Yerges-Armstrong, L.M., Wu, J., Hernaez, R., Kim, L.J., Palmer, C.D., Gudnason, V., Eiriksdottir, G., Garcia, M.E., Launer, L.J., et al. (2011). Genome-Wide Association Analysis Identifies Variants Associated with Nonalcoholic Fatty Liver Disease That Have Distinct Effects on Metabolic Traits. PLoS Genet. 7, e1001324–14. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1001324&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21423719&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 56. Splansky, G.L., Corey, D., Yang, Q., Atwood, L.D., Cupples, L.A., Benjamin, E.J., D’Agostino, R.B., Fox, C.S., Larson, M.G., Murabito, J.M., et al. (2007). The Third Generation Cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am. J. Epidemiol. 165, 1328–1335. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwm021&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17372189&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000247005500015&link_type=ISI) 57. Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al. (2015). UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Med. 12, e1001779–10. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1001779&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25826379&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 58. Walters, R.K., Polimanti, R., Johnson, E.C., McClintick, J.N., Adams, M.J., Adkins, A.E., Aliev, F., Bacanu, S.-A., Batzler, A., Bertelsen, S., et al. (2018). Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat. Neurosci. 21, 1656–1669. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41593-018-0275-1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30482948&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 59. Wang, K., Mamidipalli, A., Retson, T., Bahrami, N., Hasenstab, K., Blansit, K., Bass, E., Delgado, T., Cunha, G., Middleton, M.S., et al. (2019). Automated CT and MRI Liver Segmentation and Biometry Using a Generalized Convolutional Neural Network. Radiol. Artif. Intell. 1. 60. Wilman, H.R., Kelly, M., Garratt, S., Matthews, P.M., Milanesi, M., Herlihy, A., Gyngell, M., Neubauer, S., Bell, J.D., Banerjee, R., et al. (2017). Characterisation of liver fat in the UK Biobank cohort. PloS One 12, e0172921–14. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0172921&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28241076&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 61. Yang, J., Loos, R.J.F., Powell, J.E., Medland, S.E., Speliotes, E.K., Chasman, D.I., Rose, L.M., Thorleifsson, G., Steinthorsdottir, V., Mägi, R., et al. (2012). FTO genotype is associated with phenotypic variability of body mass index. Nature 490, 267–272. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature11401&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22982992&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000309733300051&link_type=ISI) 62. Yengo, L., Sidorenko, J., Kemper, K.E., Zheng, Z., Wood, A.R., Weedon, M.N., Frayling, T.M., Hirschhorn, J., Yang, J., Visscher, P.M., et al. (2018). Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddy271&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30124842&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 63. Younossi, Z., Tacke, F., Arrese, M., Chander Sharma, B., Mostafa, I., Bugianesi, E., Wai-Sun Wong, V., Yilmaz, Y., George, J., Fan, J., et al. (2019). Global Perspectives on Nonalcoholic Fatty Liver Disease and Nonalcoholic Steatohepatitis. Hepatol. Baltim. Md 69, 2672–2682. 64. Zeb, I., Li, D., Nasir, K., Katz, R., Larijani, V.N., and Budoff, M.J. (2012). Computed tomography scans in the evaluation of fatty liver disease in a population based study: the multi-ethnic study of atherosclerosis. Acad. Radiol. 19, 811–818. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.acra.2012.02.022&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22521729&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F03%2F2020.09.03.20187195.atom) 65. Zhang, Y.N., Fowler, K.J., Hamilton, G., Cui, J.Y., Sy, E.Z., Balanay, M., Hooker, J.C., Szeverenyi, N., and Sirlin, C.B. (2018). Liver fat imaging—a clinical overview of ultrasound, CT, and MR imaging. Br. J. Radiol. 91.