Comprehensive genetic analysis of the human lipidome identifies novel loci controlling lipid homeostasis with links to coronary artery disease
==============================================================================================================================================

* Gemma Cadby
* Corey Giles
* Phillip E Melton
* Kevin Huynh
* Natalie A Mellett
* Thy Duong
* Anh Nguyen
* Michelle Cinel
* Alex Smith
* Gavriel Olshansky
* Tingting Wang
* Marta Brozynska
* Mike Inouye
* Nina S McCarthy
* Amir Ariff
* Joseph Hung
* Jennie Hui
* John Beilby
* Marie-Pierre Dubé
* Gerald F Watts
* Sonia Shah
* Naomi R Wray
* Wei Ling Florence Lim
* Pratishtha Chatterjee
* Ian Martins
* Simon M Laws
* Tenielle Porter
* Michael Vacher
* Ashley I Bush
* Christopher C Rowe
* Victor L Villemagne
* David Ames
* Colin L Masters
* Kevin Taddei
* Matthias Arnold
* Gabi Kastenmüller
* Kwangsik Nho
* Andrew J Saykin
* Xianlin Han
* Rima Kaddurah-Daouk
* Ralph N Martins
* John Blangero
* Peter J Meikle
* Eric K Moses

## Abstract

We integrated lipidomics and genomics to unravel the genetic architecture of lipid metabolism and to identify genetic variants associated with lipid species that are putatively in the mechanistic pathway to coronary artery disease (CAD). We quantified 596 lipid species in serum from 4,492 phenotyped individuals from the Busselton Health Study. In our discovery GWAS, we identified associations between these lipid species and 667 independent loci (479 novel), which we followed with meta-analysis and validation in two independent cohorts. The 134 lipid endophenotypes identified for CAD were associated with variation at 186 genomic loci. Associations between independent lipid loci and coronary atherosclerosis were assessed in ∼456,000 individuals from the UK Biobank. Of the 53 lipid loci that showed evidence of association (P<1×10⁻³), 43 were associated with at least one of the 134 lipid endophenotypes.
The findings of this study illustrate the value of integrative biology for investigating the roles of genetics and lipid metabolism in the aetiology of atherosclerosis and CAD, with implications for other complex diseases.

## Introduction

Lipids comprise thousands of individual species, spanning many classes and subclasses. Genome-wide association studies (GWAS) of lipid species can provide novel insights into human physiology, inborn errors of metabolism and mechanisms for complex traits and diseases. Dyslipidaemia, a broad term for disordered lipid and lipoprotein metabolism, is a major risk factor for atherosclerotic cardiovascular disease and a therapeutic target for the primary and secondary prevention of coronary artery disease (CAD)1,2. Dyslipidaemia is typically defined by elevated low-density lipoprotein cholesterol (LDL-C) and triglycerides with decreased high-density lipoprotein cholesterol (HDL-C); however, these ‘clinical lipid’ measures provide only a partial view of the complex lipoprotein structures and their metabolism. Lipidomic technologies can now measure hundreds of individual molecular lipid species that make up the human lipidome, providing a more complete snapshot of the lipid metabolism occurring within an individual.

GWAS have uncovered thousands of genetic variants linked to traditional clinical lipids (LDL-cholesterol, HDL-cholesterol, triglycerides)3,4. Genes implicated at these loci show functional links between lipid levels and CAD5. The human lipidome is heritable and predictive of CAD, furthering our understanding of the biology of CAD6. The individual lipid species that make up the lipidome are biologically simpler measures that may reside closer to the causal action of genes, making them valuable endophenotypes for gene identification. Genetic interrogation of the human lipidome may therefore reveal further genetic variants that play a role in lipid metabolism and CAD.
Compared with other complex traits, relatively few genomic loci have been associated with lipid species in GWAS of the human serum/plasma lipidome7-17, although these studies have generally interrogated a restricted subset of lipid species. The serum lipidome is complex and consists of many isobaric and isomeric species that share elemental composition but are structurally distinct. Existing lipidomic studies often employ techniques that poorly resolve these species, limiting their biological interpretation. We have recently expanded our lipidomic platform to better characterise isomeric lipid species, now measuring 596 lipids from 33 classes18. Our methodology focuses on the precise measurement of a broad range of lipid and lipid-like compounds, utilising extensive chromatographic separation. Here, we report a GWAS of 596 targeted lipid species (across 33 lipid classes) in an Australian population-based cohort of 4,492 individuals, validation of significant loci in two independent cohorts, and meta-analysis of all results. Using robust procedures, we disentangle genetic effects on lipid species from those on lipoproteins. Integration of multiple datasets, including expression quantitative trait loci (eQTL), methylation QTL (meQTL) and protein QTL (pQTL), with in-depth analysis of significant loci highlights putative susceptibility genes for CAD. We demonstrate robust associations between lipid species and CAD using genetic correlations, polygenic risk scores and phenotypic associations. Many lipid-associated loci show pleiotropy with CAD in colocalization analysis. Assessment of these loci for association with coronary atherosclerosis in 456,486 UK Biobank participants reveals novel associations independent of clinical lipid measures.
## Results

### Lipidomic profiling

We measured 596 individual lipid species within 33 lipid classes, covering the major glycerophospholipid, sphingolipid, glycerolipid, sterol and fatty acyl classes, in serum and plasma samples from three independent cohorts (Supplementary Tables 1-3). Assay performance was monitored using pooled plasma quality control (QC) samples, enabling determination of coefficient of variation (%CV) values for each lipid class and species. In the Busselton Health Study (BHS) discovery cohort, the median %CV was 8.6%, with 570 (95.6%) lipid species showing a %CV below 20%. All lipids were measured in every individual, with the exception of three values that were below the limit of detection. The lipidomic analyses of the Australian Imaging, Biomarker, and Lifestyle (AIBL) and Alzheimer’s Disease Neuroimaging Initiative (ADNI) validation cohorts showed similar assay performance19.

### GWAS of the human serum lipidome

We performed a GWAS of the human serum lipidome (Figure 1) in the BHS discovery cohort of 4,492 individuals of European ancestry (Supplementary Tables 4-7 and Figure 2), and a meta-analysis of the two validation cohorts, consisting of 670 and 895 individuals of European ancestry (Supplementary Table 8). We further performed a discovery meta-analysis of all three studies (Supplementary Table 9). All summary-level statistics are available at our data portal ([https://metabolomics.baker.edu.au/](https://metabolomics.baker.edu.au/)).

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F1.medium.gif)

Figure 1 Study design for the genetic analysis of the human lipidome. Representation of genome-wide association studies of the lipidome in the BHS discovery sample, validation and meta-analysis of ADNI and AIBL studies, and downstream analyses.

![Fig. 2](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F2.medium.gif)

Fig. 2 Circular presentation of loci associated with circulating lipid species identified in our discovery GWAS. The −log10(*P*) values for genetic associations with lipid species are arranged by chromosomal position, indicated by alternating blue and green points. *P*-values smaller than 1×10⁻⁶⁰ are truncated. Genome-wide significance (*P*<5×10⁻⁸) is indicated by the red line. For details about significant associations, see Supplementary Tables 3 and 4. Genes identified in our candidate gene analysis are highlighted in blue; otherwise, the closest gene is indicated in black. The purple band indicates lipid loci that colocalize with coronary artery disease (CAD) or show association with CAD after adjusting for clinical lipids. The inner circle shows a Fuji plot of SNP-lipid associations, colored by broad lipid category. Color keys representing broad lipid categories are indicated in the plot center. Chromosomes are indicated by numbered panels 1–22.

The discovery GWAS identified 2,279 independent SNP-lipid species associations and 132 independent SNP-lipid class associations at genome-wide significance (P<5.0×10⁻⁸; r²<0.1; Figure 2; Supplementary Table 8). All lipid classes and 543 (of 596; 91.1%) lipid species had at least one significant association. All significantly associated SNPs were in Hardy-Weinberg equilibrium (HWE; all P≥1.53×10⁻⁴) and were relatively common (4% had a minor allele frequency (MAF)<0.01 and 91% had MAF>0.05; Supplementary Table 6). Overall, 667 independent SNPs were significantly associated across lipid outcomes (Supplementary Table 10). Each SNP was associated with between 1 and 222 lipids (Extended data Fig. 1).
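The HWE and MAF summaries above can be illustrated with a short sketch. The text does not state which HWE test was applied; the function below uses a Pearson chi-square goodness-of-fit test with 1 degree of freedom, a simpler substitute for the exact test offered by tools such as PLINK (`--hardy`). Function names and genotype counts are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1.0 - p)


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected cells (a monomorphic SNP gives a statistic of 0)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2.0))


# Usage with made-up genotype counts
print(hwe_chisq_pvalue(25, 50, 25))   # genotype counts exactly at HWE proportions
print(hwe_chisq_pvalue(50, 0, 50))    # no heterozygotes: strong departure from HWE
print(minor_allele_frequency(81, 18, 1))
```

The exact test is generally preferred for rare variants, where expected genotype counts are small and the chi-square approximation is poor.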
SNPs associated with a large number of lipids were in regions known to be involved in lipid regulation, including *FADS1/FADS2/FADS3*, *APOE* and *LIPC*. The most significant associations were observed between PC(18:0_20:4) and rs174564 (*FADS2*; P=4.63×10⁻²²⁰) and between Cer(d19:1/22:0) and the intergenic SNP rs364585 (flanking *SPTLC3*; P=7.81×10⁻¹⁸⁵). Indeed, the 26 most significant SNP-lipid species associations all involved SNPs in these two regions. The median genomic inflation factors (λ) were 1.01 (range: 0.99-1.03) and 1.02 (range: 1.00-1.03) for the lipid species and class analyses, respectively. Across lipid species and classes, SNP-based heritability estimates were moderately correlated with λ (r=0.45; Extended data Fig. 2a), as expected20.

### SNP-lipid species associations are largely independent of clinical lipid measures

We performed additional analyses adjusting for clinical lipids (total cholesterol, HDL-cholesterol, triglycerides) to identify SNP-lipid species associations independent of clinical lipid traits. The median genomic inflation factors were 1.01 (range: 0.99-1.03) and 1.01 (range: 1.00-1.03) for lipid species and classes, respectively, and heritability estimates were again moderately correlated with λ (r=0.51; Extended data Fig. 2b). Adjustment for clinical lipids identified 2,424 independent SNP-lipid species associations and 124 independent SNP-lipid class associations (Supplementary Table 9). There were 1,545 SNP-lipid species and 72 SNP-lipid class associations that were significant in both the unadjusted and the adjusted analyses, with an r² between beta coefficients of 0.93 (Figure 3; Supplementary Tables 4 and 5). Adjustment for clinical lipids identified an additional 879 significant SNP-lipid species associations, for 387 lipid species.
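The genomic inflation factor λ reported for both the unadjusted and adjusted analyses is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.4549). A minimal sketch, assuming two-sided P-values as input; the function name is hypothetical and the authors' software is not specified here.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(pvalues):
    """lambda_GC: median observed 1-df chi-square divided by its null median."""
    nd = NormalDist()
    # Two-sided P -> |z| -> chi-square = z**2. Using inv_cdf(p / 2) instead of
    # inv_cdf(1 - p / 2) avoids rounding 1 - p/2 to exactly 1.0 for tiny P-values
    # such as those reported for the FADS2 and SPTLC3 regions.
    chisq = [nd.inv_cdf(max(p, 1e-300) / 2.0) ** 2 for p in pvalues]
    null_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), ~0.4549
    return median(chisq) / null_median


# Usage: evenly spaced null P-values give lambda close to 1
null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(round(genomic_inflation_factor(null_p), 6))
```

Because λ is a median, it is robust to the small number of very strong loci; under polygenic inheritance it is expected to rise with heritability, consistent with the correlation between λ and SNP-based heritability noted above.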
Conversely, 726 SNP-lipid species associations that were significant in the unadjusted analysis fell below our significance threshold after adjustment. Approximately 24% of these were lipid species in the cholesteryl ester (n=93) and phosphatidylcholine (n=81) classes (Supplementary Table 9). We also identified an additional 52 significant SNP-lipid class associations, particularly for the trihexosylceramide (6 associations) and hexosylceramide (6 associations) classes. However, 60 SNP-lipid class associations fell below our significance threshold, and the diacylglycerol, GM3 ganglioside, lysophosphatidylcholine, lysoalkenylphosphatidylethanolamine, phosphatidylcholine, alkylphosphatidylethanolamine, alkenylphosphatidylethanolamine, phosphatidylserine, sphingomyelin and triacylglycerol classes were no longer associated (P<5.0×10⁻⁸) with any genetic variant.

![Fig. 3](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F3.medium.gif)

Fig. 3 Comparison of estimated lipidomic effect sizes between clinical lipid adjusted and unadjusted models. **a**, Beta coefficients for independent unadjusted SNP-lipid associations (*x* axis) are plotted against clinical lipid adjusted SNP-lipid associations (*y* axis). **b**, Z-scores for unadjusted SNP-lipid associations (*x* axis) are plotted against clinical lipid adjusted SNP-lipid associations (*y* axis). Z-scores are shown for SNP associations reaching genome-wide significance (P<5×10⁻⁸) in either the clinical lipid adjusted or the unadjusted model. Effect signs are aligned so that adjusted associations are positive. Variants showing stronger (positive) associations in the clinical lipid adjusted analysis are shown in red, and variants showing weaker associations are shown in blue. Circle diameter is proportional to the −log10(P) of a t-test of effect differences.
To minimise the risk of pleiotropy/collider bias introduced by adjusting for heritable covariates, we also performed multi-trait conditional and joint (mtCOJO; Supplementary Tables 4 and 5) analyses using UK Biobank GWAS results for the clinical lipid traits (total cholesterol, HDL-cholesterol, triglycerides). These results were largely consistent with those of the clinical lipid adjusted analysis (r² of beta coefficients=0.91; Extended data Fig. 3). Comparison of the clinical lipid adjusted and mtCOJO Z-scores identified three regions (*APOE*, *FADS1*/*FADS2*/*FADS3*, *TMEM229B*/*PLEKHH1*) with substantial differences (P<1.0×10⁻⁴), indicating possible bias in the adjusted effect estimates for these regions. Overall, results were highly consistent between the mtCOJO and clinical lipid adjusted analyses. Conditional analysis (sequentially conditioning on the lead SNP) identified 386 secondary signals (across both the unadjusted and clinical lipid adjusted analyses), associated with 163 lipid species/classes (Supplementary Table 7). Two regions, *LIPC* and *ATP10D*, each contained five independent signals (conditional P<5.0×10⁻⁸). The *LIPC* region was strongly associated with phosphatidylethanolamine species and class, while *ATP10D* was associated with hexosylceramide species and class. The *SPTLC3* region harboured four independent signals, strongly associated with sphingolipids containing a d19:1 sphingoid base.

### Associations validated in independent cohorts

For each lipid, significantly associated SNPs were linkage disequilibrium (LD)-clumped to remove variants in LD (r²>0.1). We assessed whether the 2,411 independent lipid species/class associations identified in the BHS discovery cohort (unadjusted analysis) were validated in a meta-analysis of the combined ADNI and AIBL validation cohorts.
There were 273 SNP-lipid associations that could not be tested in the validation meta-analysis, because the lipid was not available in the ADNI and AIBL cohorts, the SNP (and its proxies) was missing from the imputation panel, or the SNP was monomorphic or of very low frequency in ADNI/AIBL. We therefore attempted to validate the remaining 2,137 significant SNP-lipid associations (Supplementary Table 8). We considered a SNP-lipid association to be validated if i) the SNP was significantly associated (P<5×10⁻⁸) in the unadjusted BHS discovery GWAS; ii) the direction of effect was concordant between the validation meta-analysis and the BHS discovery analysis; and iii) the association was nominally significant (P<0.05; less conservative) or reached the Bonferroni significance threshold (P<2.34×10⁻⁵) in the validation meta-analysis. We identified 1,474 (69.2%) SNP-lipid associations that reached nominal significance (P<0.05), and 644 (30.1%) that reached Bonferroni-corrected significance. Almost all associations (>99%) had the same direction of effect, with a strong correlation between validation meta-analysis and significant (P<5×10⁻⁸) discovery effect sizes (r²=0.53 overall, and r²=0.80 for SNPs with MAF>0.05 in the BHS; Extended data Fig. 4).

### Discovery meta-analysis

At a stringent significance threshold of P<3.47×10⁻¹⁰ (5×10⁻⁸/144 effective lipid dimensions), the meta-analysis of all three studies identified 65,563 significant SNP-lipid associations (Supplementary Table 9), involving 499 lipid species/classes and 7,600 SNPs. We identified 5,658 new associations not observed in the BHS discovery GWAS alone, involving 352 lipids and 2,914 SNPs. The majority of these (n=5,543; 98%) showed some evidence of association in the BHS discovery GWAS (5×10⁻⁸<P<5×10⁻⁴). However, 89 associations were not nominally significant (P>0.05) in the BHS discovery GWAS, indicating that for these associations the meta-analysis signal was largely driven by the AIBL and ADNI samples.
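The validation criteria i)-iii) and the two multiple-testing thresholds quoted above can be written out directly. The constants come from the text (0.05/2,137 tested associations and 5×10⁻⁸/144 effective lipid dimensions); the data class and function are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8
N_VALIDATION_TESTS = 2137   # SNP-lipid associations testable in ADNI/AIBL
N_EFFECTIVE_LIPIDS = 144    # effective number of independent lipid traits

BONFERRONI_P = 0.05 / N_VALIDATION_TESTS               # about 2.34e-5
META_ANALYSIS_P = GENOME_WIDE_P / N_EFFECTIVE_LIPIDS   # about 3.47e-10


@dataclass
class SnpLipidResult:
    beta_discovery: float
    p_discovery: float
    beta_validation: float
    p_validation: float


def validation_status(r: SnpLipidResult) -> str:
    """Classify one SNP-lipid association using criteria i)-iii)."""
    if r.p_discovery >= GENOME_WIDE_P:                       # i) discovery significance
        return "not validated"
    if (r.beta_discovery > 0) != (r.beta_validation > 0):   # ii) concordant direction
        return "not validated"
    if r.p_validation < BONFERRONI_P:                        # iii) stringent threshold
        return "bonferroni"
    if r.p_validation < 0.05:                                # iii) nominal threshold
        return "nominal"
    return "not validated"


# Usage with made-up summary statistics
print(f"{BONFERRONI_P:.3g} {META_ANALYSIS_P:.3g}")
print(validation_status(SnpLipidResult(0.20, 1e-9, 0.15, 1e-6)))
```

An association classed as "bonferroni" also meets the nominal criterion, so counting nominal validations means counting both labels.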
### Defining independent loci and genes controlling lipid homeostasis

For each lipid, significantly associated SNPs were LD-clumped to remove variants in LD (r²>0.1). Lead variants from the individual analyses (clinical lipid adjusted and unadjusted), including conditional analyses, were clumped if the index SNPs were in LD (r²>0.1). We identified 3,361 independent locus-lipid associations, involving 610 lipid species/classes, each associated with between 1 and 30 independent SNPs. To identify genomic regions associated with lipid metabolism, a single dataset was produced by taking the smallest P-value for each SNP across all lipids and analyses. LD-clumping of this dataset resulted in 667 independent genomic regions (Supplementary Table 10). This procedure was repeated including SNP-lipid associations passing our discovery meta-analysis significance threshold (P<3.47×10⁻¹⁰), resulting in 682 independent genomic regions, 612 of which overlapped with those identified in the BHS alone (737 regions in total). The variants within a genomic region and the lipids associated with those variants are collectively termed a genetically influenced lipotype.

### Identification of candidate genes within loci

Using the **Pr**ioritization **o**f candidate causal **Ge**nes at **M**olecular QTLs (ProGeM) framework21 to prioritize candidate causal genes, we identified biologically plausible genes in 573 of the 737 genomic regions (Supplementary Tables 10-12), with an overlap of 498 genomic regions between the genetic-based (bottom-up) and biological knowledge-based (top-down) approaches. A total of 2,321 SNP-gene pairs were identified in which the gene has previously been implicated in the regulation of metabolism or a molecular phenotype (Figure 4a). Of these genes, 970 (41.8%) are present in lipid-metabolism-specific databases.

![Fig. 4](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F4.medium.gif)

Fig. 4 Identification of putative causal genes using genetic prioritization and knowledge-based approaches. Assignment of putative causal genes was performed using the ProGeM framework, incorporating genetic-based prioritization (bottom-up) and biological knowledge-based approaches (top-down). **a**, Venn diagram showing the number of loci with annotations for causal genes using the distinct approaches and their overlap. Top-down annotations were divided into lipid-specific databases and generic databases. **b**, Venn diagram of distinct genes identified in genetic-based prioritization analysis. **c**, summary of putative causal genes with overlapping annotations for closest gene, protein consequences, eQTL and meQTL (left), and summary of putative causal SNP-gene pairs for which pQTL evidence was identified (right).

A total of 62 SNPs were annotated as missense (n=59), stop gain (n=2), structural interaction (n=1), start loss (n=1) or splice donor (n=1) variants. Of these, three were annotated as having a putative ‘high’ impact and the remainder a ‘moderate’ impact. These SNPs are linked to 55 protein products (Figure 4b). Comparing our lead SNPs and proxies against previously published eQTL associations identified 2,058 SNP-gene pairs (Figure 4b). Published meQTL associations revealed 879 SNP-gene pairs, 587 (66.8%) of which overlapped eQTL associations. In contrast to eQTL and meQTL, overlap with published pQTL associations was much less evident, with only 16 SNP-gene pairs identified (Figure 4c). In total, 18 SNP-gene pairs were supported by all four lines of evidence: closest gene, protein consequences, eQTL and meQTL. The overlap of top-down and bottom-up candidates supported the annotation of 1,031 SNP-gene pairs.
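The region-definition step described under "Defining independent loci and genes controlling lipid homeostasis" (keep the smallest P-value per SNP across all lipids and analyses, then LD-clump at r²>0.1) can be sketched as a greedy clumping routine. The LD lookup is a hypothetical callable standing in for a reference panel; tools such as PLINK's `--clump` implement the same idea with an additional physical-distance window, which is omitted here.

```python
def min_p_per_snp(results):
    """Collapse (snp, lipid, analysis, p) records to the smallest P-value per SNP."""
    best = {}
    for snp, _lipid, _analysis, p in results:
        if p < best.get(snp, float("inf")):
            best[snp] = p
    return best


def ld_clump(snp_pvalues, r2, r2_threshold=0.1, p_threshold=5e-8):
    """Greedy LD-clumping.

    snp_pvalues: dict mapping SNP id -> P-value.
    r2: callable (snp_a, snp_b) -> LD r^2 (hypothetical reference-panel lookup).
    Returns index SNPs, most significant first.
    """
    candidates = sorted((p, snp) for snp, p in snp_pvalues.items() if p < p_threshold)
    index_snps, clumped = [], set()
    for _, snp in candidates:
        if snp in clumped:
            continue
        index_snps.append(snp)  # most significant remaining SNP becomes an index SNP
        for _, other in candidates:
            if other != snp and other not in clumped and r2(snp, other) > r2_threshold:
                clumped.add(other)  # absorbed into this index SNP's clump
    return index_snps


# Usage with a toy LD table (hypothetical SNP ids and values)
ld_pairs = {frozenset(("rs1", "rs2")): 0.5}


def toy_r2(a, b):
    return ld_pairs.get(frozenset((a, b)), 0.0)


records = [
    ("rs1", "lipid_A", "unadjusted", 1e-20),
    ("rs1", "lipid_B", "adjusted", 1e-30),
    ("rs2", "lipid_A", "unadjusted", 1e-10),
    ("rs3", "lipid_C", "unadjusted", 1e-9),
    ("rs4", "lipid_C", "unadjusted", 1e-3),
]
regions = ld_clump(min_p_per_snp(records), toy_r2)
print(regions)
```

Here rs2 is absorbed by the more significant rs1 (r²=0.5), rs4 does not reach genome-wide significance, and two regions remain.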
### Most SNP-lipid species associations were novel

Of the 737 lead variants (and their proxies), 228 (31%) had been reported in at least one of 35 previous metabolomic/lipidomic studies (Supplementary Note 1), resulting in 509 putatively novel genetically influenced lipotypes (Supplementary Table 13).

### Genetically influenced lipotypes overlap with coronary artery disease and cardiovascular disease related loci

We assessed overlap between the lead SNP (or proxy) from each of the 737 regions and 10 hard cardiovascular disease (CVD) endpoints in the GWAS Catalog, identifying a total of 23 lead SNPs, or their proxies, associated (P<5×10⁻⁸) with these endpoints (Supplementary Table 14). The hard CVD endpoints that most frequently overlapped were CAD (n=14 SNPs), CVD (n=10 SNPs), coronary artery calcification (n=8 SNPs) and myocardial infarction (n=8 SNPs). Three additional lead SNPs were associated with CAD in the CARDIoGRAMplusC4D and UK Biobank meta-analysis. Eighty-four lead SNPs were associated with 101 CVD-related traits, including chronic kidney disease (n=18), C-reactive protein (n=14), metabolic syndrome (n=12), body mass index (n=8) and systolic blood pressure (n=4). As expected, lead SNPs frequently overlapped with lipid-related traits, with 99 lead SNPs or proxies associated with 186 lipid-related traits in the GWAS Catalog.

### Serum lipid species/classes are phenotypically and genetically associated with coronary artery disease

Using nominal significance (P<0.05), we identified 240 lipid species/classes phenotypically associated with incident CAD in the BHS (Figure 5a; Supplementary Table 15), with 11% in the positive direction. The strongest association was between TG(50:2)[NL-18:2] and incident CAD (0.311±0.046, P=1.74×10⁻¹¹, FDR q=1.09×10⁻⁸). Overall, the most strongly associated lipid species were those in the triacylglycerol, diacylglycerol, phosphatidylethanolamine and cholesteryl ester classes.

![Fig. 5](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F5.medium.gif)

Fig. 5 Genetic and phenotypic associations of the lipidome with coronary artery disease. Forest plots of lipid-coronary artery disease effect sizes and standard errors. **a**, phenotypic associations between lipid species and incident coronary artery disease in the BHS cohort (551 cases and 3,703 controls), adjusted for age, sex and the first 10 genomic principal components. **b**, association of lipid species with polygenic risk for coronary artery disease. Individuals in the discovery cohort (n=4,492) were assessed for risk using the metaGRS polygenic score, consisting of approximately 1.7 million genetic variants. Linear regressions were performed to test the association between an individual’s polygenic score and lipid species concentrations, adjusting for age, sex and the first 10 principal components. **c**, genetic correlations of lipid species with coronary artery disease (meta-analysis of CARDIoGRAMplusC4D and UK Biobank; 122,733 cases and 424,528 controls), estimated with Linkage Disequilibrium Score Regression (LDSC; v1.0.1). The 10 most significant lipid species are highlighted in blue, red or green.

![Fig. 6](https://www.medrxiv.org/content/medrxiv/early/2021/08/25/2021.08.20.21261814/F6.medium.gif)

Fig. 6 Colocalization of lipid-loci with coronary artery disease. Summary of lipid classes which contain at least one lipid species that colocalizes with coronary artery disease. Colors indicate broad lipid categories. Indicated variants were identified as the most likely causal variant in each colocalization analysis. Genetic variants are ordered according to the number of colocalizations across lipid classes.
Evidence of colocalization was defined as H3+H4>0.8 and H4/H3>10. Variants were annotated to the closest gene.

We identified 265 lipid species/classes that showed a nominally significant (P<0.05) association with the CAD polygenic risk score22 in the BHS (Figure 5b; Supplementary Table 15). All of these associations were positive, except for lipids in the alkenyl-phosphatidylcholine and alkenyl-phosphatidylethanolamine classes. The strongest association was observed for LPE(18:0) [sn2] (0.075±0.014, P=8.9×10⁻⁸, FDR q=5.59×10⁻⁵). Next, we estimated the genetic correlation between lipid species/classes and CAD. Using LD score regression, we identified nominally significant genetic correlations (P<0.05) between 199 lipid species/classes and CAD, 50 of which were negative (Figure 5c; Supplementary Table 14). The strongest genetic correlation was between TG(51:2) [NL-16:0] and CAD (0.275±0.058, P=2.22×10⁻⁶, FDR q=8.94×10⁻⁴). Overall, using a significance threshold of P<0.05, we identified 134 lipid species/classes that were significantly associated in each of the three analyses: association with incident CAD (phenotypic), association with CAD polygenic risk (PRS), and genetic correlation. Importantly, these lipid species/classes showed concordant directions of effect in all three analyses, and we therefore define them as lipid endophenotypes for CAD.

### Colocalization analysis identified shared causal variants for coronary artery disease

We performed pairwise colocalization analysis, within each QTL, between lipid species and CAD to assess whether they share common causal variants (Supplementary Table 16). We identified evidence of 43 shared causal variants for CAD and any lipid species (Table 1; Supplementary Note 2). The strongest evidence was between CE(18:1) and CAD at the *APOE* rs7412 locus (H3+H4=1.00; H4/H3=1.17×10¹¹).
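Two decision rules in this section can be written as small predicates: the definition of a lipid endophenotype for CAD (P<0.05 in the phenotypic, PRS and genetic-correlation analyses, with concordant effect directions) and the colocalization criterion from the Fig. 6 caption (H3+H4>0.8 and H4/H3>10). This is a minimal sketch with hypothetical names; it assumes H3 and H4 are the posterior probabilities of distinct and shared causal variants from a coloc-style Bayesian analysis.

```python
from dataclasses import dataclass


@dataclass
class LipidCadEvidence:
    beta_incident: float  # phenotypic association with incident CAD
    p_incident: float
    beta_prs: float       # association with the CAD polygenic score
    p_prs: float
    rg: float             # genetic correlation with CAD
    p_rg: float


def is_cad_endophenotype(e: LipidCadEvidence, alpha: float = 0.05) -> bool:
    """Significant in all three analyses with the same direction of effect."""
    significant = e.p_incident < alpha and e.p_prs < alpha and e.p_rg < alpha
    directions = {e.beta_incident > 0, e.beta_prs > 0, e.rg > 0}
    return significant and len(directions) == 1


def colocalizes(pp_h3: float, pp_h4: float) -> bool:
    """H3+H4 > 0.8 (a causal variant for both traits) and H4/H3 > 10 (likely shared)."""
    if pp_h3 + pp_h4 <= 0.8:
        return False
    return pp_h3 == 0 or pp_h4 / pp_h3 > 10


# Usage: posterior probabilities consistent with the reported APOE rs7412 result
h3 = 1.0 / (1.0 + 1.17e11)
print(colocalizes(h3, 1.0 - h3))
print(is_cad_endophenotype(LipidCadEvidence(0.3, 1e-4, 0.05, 1e-3, 0.2, 1e-2)))  # made-up values
```

Requiring both conditions separates loci where the two traits share a causal variant (high H4) from loci with distinct causal variants in LD (high H3).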
There was strong evidence for sharing of the rs7412 causal variant between CAD and 184 lipid species from 23 lipid classes (with and without clinical lipid adjustment). There was also strong evidence for rs603424, near a likely candidate gene, *SCD* (stearoyl-CoA desaturase), and 24 lipid species/classes (0.936 […] 0.8; *P*<5×10⁻⁸). Variant correlations were obtained from 10,000 unrelated individuals from the UK Biobank. **c**, plot of genetic instrument effect sizes for Total PE against those for coronary artery disease. Variants were selected based on their association with Total PE within the *LIPC* region. Eight approximately independent variants remained after clumping (R²>0.05; *P*<5×10⁻⁸). Generalised summary-data-based Mendelian randomisation (GSMR) was used to estimate the effect of Total PE on coronary artery disease, accounting for the variant correlations and uncertainty in both bzx and bzy. **d**, forest plot of single-variant tests and the GSMR estimate. **e**, diagram of mediated pleiotropy, showing effect sizes estimated across multiple datasets. Effect sizes of the exposure-modifying variants were estimated in the BHS cohort, as was the odds ratio of phosphatidylethanolamine lipid species for incident cardiovascular disease. The total effect represents the sum of genetic effects on coronary artery disease, whether or not mediated through phosphatidylethanolamine. The coronary artery disease effect size was obtained from van der Harst & Verweij 2018.

Angiopoietin-like 3 (ANGPTL3) has been implicated in CAD risk, with ANGPTL3 deficiency associated with cardioprotective effects31-33. ANGPTL3 acts as an inhibitor of two other lipases, lipoprotein lipase (LPL) and endothelial lipase (LIPG); loss-of-function mutations in *ANGPTL3* have been linked to hypolipidemia33.
We recently identified a rare frameshift deletion (rs398122988) associated with decreased ANGPTL3 protein levels in extended Mexican American families34; the variant was also associated with a ∼1.3 standard deviation decrease in phosphatidylinositol species. In this study, we validate this observation: SNPs in the *ANGPTL3* region were associated with a decrease in phosphatidylinositol species, and again these associations persisted after adjustment for clinical lipids (total cholesterol, HDL-C, triglycerides). Interestingly, we also observe associations of phosphatidylinositol species with SNPs in the *LIPG* region. Phosphatidylinositol species have commonly been studied for their intracellular messaging roles following phosphorylation of the inositol ring by kinases, including PI-3-kinase, leading to downstream cardio-metabolic effects35. However, the role of phosphatidylinositol species in CVD risk is still largely unknown; we have previously identified the change in the ratio of phosphatidylinositol to phosphatidylcholine species as a predictor of CVD risk reduction from statin treatment36. Further work is now required to unravel the role of phosphatidylinositol in mediating the effect of these genes on CVD risk.

In summary, using our expanded lipidomic profiling platform, we have investigated the largest number of targeted lipid species in a GWAS to date, and have reported significant genetic associations with lipid species that have not previously been reported in any genetic association study. Our strategy of using lipid species as endophenotypes in the search for CVD genes represents only the ‘tip of the iceberg’. We have previously reported phenotypic associations of lipid species with other complex traits, including diabetes37, Alzheimer’s disease19 and atrial fibrillation38; we believe the same integrative genomics approach may now be used to elucidate the mechanistic underpinnings of lipid metabolism in these and other complex diseases.
These data represent a valuable resource for future genetic analyses of the lipidome aimed at identifying lipid metabolic pathways and regulatory genes associated with complex disease, as well as new therapeutic targets. To this end, we provide all summary statistics and an online searchable resource of association plots of lipid species and classes with genetic variants, and regional association plots for individual lipid species and classes ([https://metabolomics.baker.edu.au/](https://metabolomics.baker.edu.au/)).

## Methods

### Study populations

The discovery cohort (n=4,492) comprised all participants in the 1994/95 survey of the long-running epidemiological BHS for whom genome-wide SNP data, extensive longitudinal phenotype data and blood serum were available. The BHS is a community-based study in Western Australia that includes both related and unrelated individuals (predominantly of European ancestry), and has been described in more detail elsewhere39-41. Informed consent was obtained from all participants, and the 1994/95 health survey was approved by the University of Western Australia Human Research Ethics Committee (UWA HREC). The current study was also approved by UWA HREC (RA/4/1/7894) and the Western Australian Department of Health HREC (RGS03656).

The two validation cohorts used in this study were the AIBL study42 and the ADNI study43, both of which were established to identify biomarkers and health and lifestyle factors relevant to the development, early detection and tracking of Alzheimer’s disease. AIBL is a longitudinal study that recruited 1,112 individuals aged over 60 years within Australia, with blood and data collected every 18 months from baseline. For each individual, lipidomic data obtained from the earliest blood collection were used. At baseline, 768 individuals were characterized as cognitively normal, 133 as having mild cognitive impairment and 211 as having Alzheimer’s disease.
The ADNI study is a longitudinal study that started in 2004 and recruited 800 individuals at baseline from sites across the United States of America and Canada. Serum samples obtained at baseline were analysed. Study data analysed here were obtained from the ADNI database, which is available online ([http://adni.loni.usc.edu/](http://adni.loni.usc.edu/)). For the lipidomics analysis, the AIBL study was deemed low risk (The Alfred Ethics Committee; Project 183/19), and the ADNI study was deemed ‘research not involving human subjects’ (Duke Institutional Review Board; ID: Pro00053208). ### Lipidomic profiling Targeted lipidomic profiling was performed using liquid chromatography coupled to electrospray ionization tandem mass spectrometry to quantify 596 lipid species from 33 lipid classes in non-fasting blood serum (BHS discovery) and non-fasting blood plasma (ADNI and AIBL validation). Lipidomic profiling of each cohort was performed using the methodology described previously by Huynh *et al*.18,44. Briefly, 10μL of serum was spiked with an internal standard mix (Supplementary Table 2) and lipid species were isolated using a single-phase butanol:methanol (1:1; BuOH:MeOH) extraction45. Serum extracts were analysed on an Agilent 6490 QqQ mass spectrometer with an Agilent 1290 series HPLC, as previously described. Mass spectrometry settings and transitions for each lipid class are shown in Supplementary Table 2. A total of 497 transitions, representing 596 lipid species, were measured using dynamic multiple reaction monitoring (dMRM), in which data were collected during a retention time window specific to each lipid species. Raw mass spectrometry data were analysed using MassHunter Quant B08 (Agilent Technologies). ### Data integration and cleaning Lipid concentrations were calculated by relating the area under the chromatographic peak for each lipid species to that of the corresponding internal standard.
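As an illustration of the internal-standard quantitation just described, the Python sketch below computes a concentration as the ratio of a species’ peak area to its internal standard’s peak area, scaled by the known internal-standard concentration. It is a hedged sketch rather than the authors’ pipeline: it assumes one assigned internal standard per species and a multiplicative response-factor correction (defaulting to 1 when no correction is known); the function name, dictionary layout and the labels `PC_a`, `PC_b` and `IS_PC` are hypothetical.

```python
from typing import Dict, Mapping, Optional


def quantify_lipids(
    species_areas: Mapping[str, float],
    is_areas: Mapping[str, float],
    species_to_is: Mapping[str, str],
    is_concentrations: Mapping[str, float],
    correction_factors: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Single-point internal-standard quantitation (hypothetical helper).

    concentration = (species area / IS area) * IS concentration * correction
    """
    correction_factors = correction_factors or {}
    concentrations = {}
    for species, area in species_areas.items():
        standard = species_to_is[species]  # internal standard assigned to this species
        ratio = area / is_areas[standard]
        # Assumed multiplicative response-factor correction; 1.0 when not known.
        factor = correction_factors.get(species, 1.0)
        concentrations[species] = ratio * is_concentrations[standard] * factor
    return concentrations


conc = quantify_lipids(
    {"PC_a": 200.0, "PC_b": 50.0},
    {"IS_PC": 100.0},
    {"PC_a": "IS_PC", "PC_b": "IS_PC"},
    {"IS_PC": 10.0},
    {"PC_b": 0.5},
)
```

The output units follow whatever units the internal-standard concentrations are expressed in.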
Correction factors were applied to adjust for differences in response factors, where these were known18. In-house pipelines were used for quality control and filtering of lipid concentrations. Across the entire dataset, only three missing values were present. Lipids below the limit of detection (missing values) were imputed as half the minimum observed value. To remove technical batch variation, the lipid data in each analytical batch (approximately 486 samples per batch; 11 batches in total) were aligned to the median value in pooled plasma quality control samples included in each analytical run. Unwanted variation was identified using a modified remove unwanted variation-2 (RUV-2) approach46. In brief, lipid data were residualized in a linear mixed model with age, sex, body mass index (BMI) and clinical lipids as fixed effects and the genetic relatedness matrix (described below) as the random effect. Principal component analysis was performed on the residualized data. The first two components showed clear trends with sample collection order; therefore, variation associated with these two principal components was removed from the original data set. Lipid class totals were generated by summing the concentrations of the individual species within each class. Validation cohorts were processed in a similar manner. ### Phenotypic variables Details of the BHS data collection have been published previously47. Serum cholesterol and triglycerides were measured by standard enzymatic methods on a Hitachi 747 (Roche Diagnostics, Sydney, Australia) in fasting blood collected in 1994/95. HDL-C was measured in serum supernatant after polyethylene glycol precipitation using an enzymatic cholesterol assay, and LDL-C was estimated using the Friedewald formula48. Height and weight (used to calculate BMI) were collected from participants at interview (1994/95). Use of lipid-lowering medication was recorded at interview (1994/95).
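The data-cleaning steps for the lipid measurements (half-minimum imputation, batch alignment to pooled QC medians, removal of the first two principal components of covariate-residualized data, and class totals) can be sketched in NumPy as follows. This is an illustrative simplification, not the authors’ in-house pipeline. It assumes a samples × lipids matrix, multiplicative alignment of each batch to the overall pooled-QC median, and log-scale input for the component-removal step. It also uses ordinary least squares in place of the linear mixed model with a genetic relatedness matrix described in the text. All function names are hypothetical.

```python
import numpy as np


def impute_half_min(x):
    """Replace NaN (below detection) with half the minimum observed value of that lipid."""
    x = np.array(x, dtype=float)
    half_min = np.nanmin(x, axis=0) / 2.0
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = half_min[cols]
    return x


def align_batches(x, batch, is_qc):
    """Rescale each batch so its pooled-QC median equals the overall pooled-QC median (assumed)."""
    x = np.array(x, dtype=float)
    batch, is_qc = np.asarray(batch), np.asarray(is_qc, dtype=bool)
    reference = np.median(x[is_qc], axis=0)
    for b in np.unique(batch):
        in_batch = batch == b
        x[in_batch] *= reference / np.median(x[in_batch & is_qc], axis=0)
    return x


def remove_unwanted_pcs(log_x, covariates, n_pcs=2):
    """Simplified RUV-2-style step: unwanted factors from covariate residuals are regressed out."""
    design = np.column_stack([np.ones(len(log_x)), covariates])
    coef, *_ = np.linalg.lstsq(design, log_x, rcond=None)
    resid = log_x - design @ coef  # OLS here; the text uses an LMM with the GRM
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    factors = u[:, :n_pcs]  # per-sample scores of the leading components
    centred = log_x - log_x.mean(axis=0)
    alpha, *_ = np.linalg.lstsq(factors, centred, rcond=None)
    return log_x - factors @ alpha  # column means are preserved


def class_totals(x, lipid_classes):
    """Sum species concentrations within each lipid class."""
    classes = np.asarray(lipid_classes)
    names = sorted(set(lipid_classes))
    return names, np.column_stack([x[:, classes == c].sum(axis=1) for c in names])
```

Because the factors are estimated from residuals that are orthogonal to the covariates, removing them leaves covariate-related biological signal in place while stripping the dominant technical trend.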
Incident CAD was defined as either hospitalisation or death due to CAD (ICD9: 410-414; ICD10: I20-I25) after the blood collection date (and until June 2015). Hospitalisations and deaths were identified from the Western Australian Department of Health Hospital Morbidity Data Collection and Death Registrations. ### Medication usage adjustment For individuals taking lipid-lowering medication (BHS, n=108; AIBL, n=366; ADNI, n=382), lipid species and clinical lipid concentrations were adjusted using previously identified effects of lipid-lowering medication. Changes in lipid species and clinical lipids following one year of statin use were calculated from a placebo-controlled randomised trial (LIPID study; n=4991)36. To calculate correction factors, lipid measures were centred and scaled by the mean and standard deviation of baseline measures (prior to statin usage); the change in lipid abundance was then calculated and regressed on age, sex, BMI and statin usage. The statin usage beta coefficients (the effect of lipid-lowering medication) were added to the standardised lipid species concentrations of individuals taking lipid-lowering medication in the current study. For lipid species present in both this study and the LIPID study (overlap of 314 lipid species), species-specific correction factors were calculated. For lipid species not measured in the LIPID study (n=282), class-specific corrections were calculated. ### Genotyping and Imputation For the BHS discovery cohort, genotyping was performed on the Illumina Human 610K Quad-Bead Chip (Illumina Inc., San Diego, CA, USA) at the Centre National de Genotypage in Paris, France (n=1468), and on the Illumina 660W Quad Array Bead Chip (Illumina Inc., San Diego, CA, USA) at PathWest Laboratory Medicine WA (Nedlands, WA, Australia; n=3428). Complete linkage clustering based on pairwise identity-by-state distance in PLINK49 showed no batch effects; therefore, the batches were merged.
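The medication-usage adjustment described in this section can be sketched for a single lipid as follows. This is a hedged illustration with hypothetical function names and inputs, not the LIPID-study analysis itself. Trial values are standardised using the baseline mean and SD. The change is regressed on covariates plus a statin indicator, and the statin coefficient is then used to correct treated individuals. The text says the coefficient is *added*. Here we assume the change is defined as follow-up minus baseline, so reversing a treatment-induced decrease means subtracting that coefficient, which is the same as adding its sign-flipped value. The real sign convention is an assumption.

```python
import numpy as np


def estimate_statin_effect(baseline, follow_up, covariates, on_statin):
    """Statin effect on one lipid, in baseline-SD units, from trial data (hypothetical inputs)."""
    mean, sd = baseline.mean(), baseline.std(ddof=1)
    z_baseline = (baseline - mean) / sd
    z_follow_up = (follow_up - mean) / sd  # scaled by the baseline mean and SD
    change = z_follow_up - z_baseline      # assumed convention: follow-up minus baseline
    design = np.column_stack([np.ones(len(change)), covariates, on_statin])
    coef, *_ = np.linalg.lstsq(design, change, rcond=None)
    return coef[-1]  # coefficient on the statin indicator


def correct_for_statins(z_lipid, on_statin, statin_effect):
    """Reverse the estimated statin-induced change for treated individuals."""
    corrected = np.array(z_lipid, dtype=float)
    # Equivalent to adding the sign-flipped statin coefficient.
    corrected[np.asarray(on_statin, dtype=bool)] -= statin_effect
    return corrected
```

In practice the covariates would be age, sex and BMI, with one coefficient per lipid species (or per class where species-level trial data are unavailable).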
Standard genotype quality control was performed as described previously41. Briefly, individuals were excluded if >3% of SNP data were missing (*n* = 11), reported sex did not match genotyped sex (*n* = 48), they were duplicates (*n* = 123), phenotype data were missing (*n* = 11), or heterozygosity was >5 standard deviations above or below the mean (*n*=28). Individuals with non-European ancestry (*n*=4) were also excluded. To prepare genotype data for imputation, SNPs were excluded if they had a call rate < 95%, a minor allele count < 10, deviation from HWE (P<5.0×10−4), or no matching SNP in the Haplotype Reference Consortium (HRC) reference panel; palindromic (A/T, G/C) SNPs with a MAF greater than 0.4 in the HRC (*n*=5) and SNPs with a >0.2 MAF difference compared with the HRC (*n*=150) were also excluded. After quality control, data were available for 513,634 SNPs. Imputation to the HRC reference panel was performed using the Michigan Imputation Server50. Following imputation, 39,117,105 SNPs were available. We excluded variants with a minor allele count <5 or an imputation quality (r²) <0.3, leaving 13,887,524 variants for analysis. Genotyping in ADNI was performed on the Human 610-Quad BeadChip (Illumina, Inc., San Diego, CA). Following standard quality control procedures performed in Plink49 (minimum SNP and individual call rate > 95%, MAF>0.05, HWE test P>1×10−6), the sample was imputed to the 1000 Genomes Phase 3 reference panel using Impute251, with pre-phasing using ShapeIT52. Genotyping in AIBL was performed on the Infinium OmniExpressExome array (Illumina, Inc., San Diego, CA)53. Quality control procedures were performed in Plink49. After removing individuals with ambiguous sex, Plink was used to remove individuals with a call rate <0.90; SNPs were removed if they had a call rate <0.95, HWE test P<1.0×10−4, or MAF<0.05. SNPs were flipped to the positive strand before imputation to the 1000 Genomes Phase 3 reference panel using the Michigan Imputation Server50 (using Minimac 4).
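The BHS SNP-level filters above can be expressed as simple predicates. The sketch below encodes the quoted pre-imputation thresholds (call rate, minor allele count, HWE, HRC match, palindromic SNPs, MAF difference) and the post-imputation filter (minor allele copies ≥5, r² ≥0.3). It is a hypothetical illustration, not the PLINK/HRC-check tooling actually used. It assumes per-SNP summaries are already computed and that the palindromic MAF cut-off refers to the HRC frequency; the `SnpSummary` record and its field names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Strand-ambiguous allele pairs (A/T and C/G).
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


@dataclass
class SnpSummary:
    snp_id: str
    allele1: str
    allele2: str
    call_rate: float          # fraction of samples with a called genotype
    minor_allele_count: int
    hwe_p: float              # Hardy-Weinberg equilibrium test P-value
    maf: float                # minor allele frequency in the study
    hrc_maf: Optional[float]  # MAF in the HRC panel; None if the SNP is absent


def passes_preimputation_qc(s: SnpSummary) -> bool:
    """Pre-imputation SNP filters using the thresholds quoted in the text."""
    if s.call_rate < 0.95 or s.minor_allele_count < 10 or s.hwe_p < 5e-4:
        return False
    if s.hrc_maf is None:  # no matching HRC reference SNP
        return False
    if frozenset((s.allele1, s.allele2)) in PALINDROMIC and s.hrc_maf > 0.4:
        return False
    return abs(s.maf - s.hrc_maf) <= 0.2  # drop SNPs with >0.2 MAF difference vs HRC


def passes_postimputation_qc(minor_allele_copies: float, imputation_r2: float) -> bool:
    """Post-imputation filter: keep variants with >=5 minor-allele copies and r2 >= 0.3."""
    return minor_allele_copies >= 5 and imputation_r2 >= 0.3
```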
Both the AIBL and ADNI validation cohorts were restricted to individuals of non-Hispanic European ancestry, based on projection onto the 1000 Genomes reference panel. ### Genetic relatedness matrix The BHS discovery sample consisted of related and unrelated individuals; therefore, all analyses included a genetic relatedness matrix. First, a hard-call set of imputed SNPs was created in Plink (i.e. SNP genotypes were called if SNP imputation quality r²>0.8 and genotype probability >0.9). The *HLA* region on chromosome 6 was excluded. SNPs were then pruned in Plink using ‘indep-pairwise 500 50 0.3’ (window size of 500, moving 50 SNPs at each step, removing variants with r²>0.3) to create a set of 486,553 independent SNPs. Twenty-two genetic relatedness matrices, each omitting one chromosome, were then created in GEMMA54 using the option ‘-gk 1’, which specifies a centred relatedness matrix. ### Statistical analysis Genome-wide association analyses of the 596 lipid species and 33 lipid classes in the discovery cohort were performed using imputed genotype dosages in linear mixed models, as implemented in GEMMA54. To avoid proximal contamination, analyses used a leave-one-chromosome-out scheme, in which SNPs on each chromosome were tested using the relatedness matrix that omitted that chromosome. Analyses were performed on rank-based inverse normal transformed residuals after adjustment for age, sex, age², age*sex, age²*sex and the first 10 principal components (generated using Eigenstrat)55,56. The validation cohorts, ADNI and AIBL, were analysed using an additive linear model, as implemented in Plink49. Analyses were performed on rank-based inverse normal transformed residuals after adjustment for age, sex, age², age*sex, age²*sex, study-specific covariates and a number of principal components deemed sufficient to capture population structure.
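Two ingredients of the association models above, the rank-based inverse normal transformation of covariate-adjusted residuals and the leave-one-chromosome-out centred relatedness matrices, can be sketched with NumPy and the standard library. This is a hedged illustration, not GEMMA or Plink internals. The Blom-type offset c = 3/8 is an assumption, ties are broken by position rather than averaged, residualisation uses ordinary least squares, and the centred matrix is computed as X_c X_cᵀ / p from column-centred genotypes. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def inverse_normal_transform(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform with a Blom-type offset (c = 3/8 is an assumption)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n; ties broken by position
    probs = (ranks - c) / (n - 2 * c + 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])


def covariate_matrix(age, sex, pcs):
    """Intercept, age, sex, age^2, age*sex, age^2*sex and genetic principal components."""
    age2 = age ** 2
    return np.column_stack([np.ones_like(age), age, sex, age2, age * sex, age2 * sex, pcs])


def int_residuals(y, covariates):
    """Residualise a lipid on covariates by OLS, then inverse-normal transform the residuals."""
    coef, *_ = np.linalg.lstsq(covariates, y, rcond=None)
    return inverse_normal_transform(y - covariates @ coef)


def centred_grm(genotypes):
    """Centred relatedness matrix X_c X_c^T / p for an (n samples x p SNPs) genotype matrix."""
    xc = genotypes - genotypes.mean(axis=0)
    return xc @ xc.T / genotypes.shape[1]


def loco_grms(genotypes, chromosome):
    """One GRM per chromosome, each built from SNPs on all *other* chromosomes."""
    chromosome = np.asarray(chromosome)
    return {c: centred_grm(genotypes[:, chromosome != c]) for c in np.unique(chromosome)}
```

When testing SNPs on chromosome k, the matrix keyed by k is used, so the tested SNP never contributes to the random-effect covariance (the proximal-contamination issue noted in the text).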
Meta-analysis of all three studies was performed using an inverse-variance weighted fixed-effects model, as implemented in METAL57. Because lipid species are correlated, the effective number of tests was calculated as the number of principal components required to explain at least 95% of the variance of the lipidome (144 components). Statistical significance was defined as P<5×10−8 (standard genome-wide significance) in the BHS discovery analysis, P<0.05 in the AIBL/ADNI validation, and P<3.47×10−10 in the three-study meta-analysis (5×10−8/144 lipid dimensions; Bonferroni correction using the effective number of tests). The more stringent threshold was used for the meta-analysis because no further validation samples were available. For each lipid, significantly associated SNPs were LD-clumped (r²>0.1) using LD estimates obtained from 10,000 unrelated individuals from the UK Biobank, the 1000 Genomes, or the BHS. A single dataset was created by retaining the smallest P-value for each SNP across all analyses. This dataset was LD-clumped (r²>0.1) to determine the number of independent genomic regions. For each locus, a regional association plot was produced using LocusZoom58. ### Detection of distinct association signals Conditional analysis was performed in GEMMA to detect independent association signals at each genome-wide significant locus. For each lipid, we iteratively clumped regions within a 2Mb window centred on the lead SNP until no genome-wide significant associations remained. Regions with overlapping windows were merged. Conditional analysis was then performed iteratively, including the lead variant as a covariate, until no further conditionally independent signals (P<5×10−8) remained.
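The meta-analysis and multiple-testing logic described at the start of this section can be sketched as follows. The inverse-variance weighted fixed-effects estimate is β = Σwᵢβᵢ / Σwᵢ with wᵢ = 1/seᵢ² and standard error 1/√Σwᵢ, the standard formulation (METAL’s `SCHEME STDERR` mode is the usual way to request it). The effective number of tests counts principal components until 95% of lipidome variance is explained. This is a hedged sketch: standardising each lipid before PCA is our assumption, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of per-study effect estimates."""
    b = np.asarray(betas, dtype=float)
    se = np.asarray(ses, dtype=float)
    w = 1.0 / se ** 2
    beta = float(np.sum(w * b) / np.sum(w))
    se_meta = float(np.sqrt(1.0 / np.sum(w)))
    p = 2.0 * NormalDist().cdf(-abs(beta / se_meta))  # two-sided normal P-value
    return beta, se_meta, p


def effective_number_of_tests(lipids, variance_explained=0.95):
    """Number of principal components needed to explain a given share of lipidome variance."""
    x = np.asarray(lipids, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardising lipids is an assumption
    eigvals = np.linalg.eigvalsh(np.cov(z, rowvar=False))[::-1]  # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.argmax(cumulative >= variance_explained) + 1)


# Meta-analysis threshold from the text: 5e-8 / 144 effective tests (about 3.47e-10).
META_ANALYSIS_ALPHA = 5e-8 / 144
```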
### Assessment of effects of clinical lipid trait adjustment Within the discovery cohort, to determine whether SNP-lipid associations were independent of clinical lipid traits (total cholesterol, HDL-C, triglycerides), all SNPs were tested with and without adjustment for clinical lipid traits. We compared locus effect sizes between the unadjusted and clinical lipid-adjusted analyses using a pooled standard deviation t-test (Supplementary Note 3). A Bonferroni-corrected threshold (0.05/number of loci) was used to identify loci whose effects differed substantially following adjustment. As adjusting for heritable covariates can introduce collider bias59, we further validated these results using multi-trait conditional and joint analysis (mtCOJO)60, conditioning on GWAS summary-level data for clinical lipids obtained from the UK Biobank61. ### Annotation Proxies for lead SNPs were identified as variants in high LD (r²>0.8) with the lead SNP within the BHS dataset, in an unrelated subset of white British individuals from the UK Biobank62, or in the 1000 Genomes. Lead SNPs and their proxies were annotated using SnpEff63. The SNiPA database v3.364 was used to retrieve combined annotation dependent depletion (CADD) scores. Expression QTL associations (cis-eQTL) were obtained from GTEx65 (release v8) and eQTLGen66 (release 2019-12-20). SNiPA metabolite QTL (mQTL) associations were supplemented with mQTL associations reported in PhenoScanner67,68 and recently published lipidomic GWAS7,17. SNiPA protein QTL (pQTL) associations were supplemented with cis-pQTL associations from Emilsson *et al*. 201869. Methylation QTL (meQTL) associations were obtained from Huan *et al*. 201970. A locus was defined as novel if neither the lead SNP nor its proxies had previously been reported as an mQTL or lipid-related trait locus. Putative causal genes for each locus were identified using a slightly modified version of a previously described approach (ProGeM)21.
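The comparison of effect sizes with and without clinical lipid adjustment, followed by a Bonferroni threshold of 0.05/number of loci, can be illustrated as below. The exact pooled standard deviation t-test is given in Supplementary Note 3 and is not reproduced here. As a stand-in, this hypothetical sketch uses a z-test on the difference of two estimates, treating them as independent. Because both estimates come from the same individuals and are positively correlated, this version is likely to be conservative. The function names and the `results` layout are assumptions.

```python
from statistics import NormalDist


def effect_difference_p(b_unadj, se_unadj, b_adj, se_adj):
    """Two-sided P-value for a change in effect size, treating the two estimates as independent."""
    z = (b_unadj - b_adj) / (se_unadj ** 2 + se_adj ** 2) ** 0.5
    return 2.0 * NormalDist().cdf(-abs(z))


def loci_changed_by_adjustment(results, alpha=0.05):
    """results: {locus: (b_unadj, se_unadj, b_adj, se_adj)}; Bonferroni over the number of loci."""
    threshold = alpha / len(results)
    return [locus for locus, r in results.items() if effect_difference_p(*r) < threshold]


changed = loci_changed_by_adjustment({
    "locus_A": (0.50, 0.02, 0.50, 0.02),  # unchanged by adjustment
    "locus_B": (0.50, 0.02, 0.10, 0.02),  # attenuated by adjustment
})
```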
For the bottom-up approach, the three closest protein-coding genes (within a 1Mb window) were identified for each lead SNP. Genes were also noted if a lead SNP or its proxies were annotated by SnpEff as missense, start loss or stop gain, or with a High annotation impact. As in ProGeM, the top-down analysis reports genes within 500kb of the lead SNP that are present in a curated database of known metabolism-related genes. A list of primary candidates was generated from the overlap of top-down and bottom-up genes. ### Overlap of lead variants with cardiovascular disease-related loci To assess whether our lead SNPs had previously been associated with CVD-related traits, we performed a look-up in the GWAS Catalog v1.02 (release 2020-08-26)71 of 10 hard CVD endpoints, 72 CVD-related traits, and 141 lipid-related traits. We also performed a look-up in a meta-analysis of CAD from CARDIoGRAMplusC4D and UK Biobank72. ### Associations of lipid species with coronary artery disease and coronary artery disease polygenic risk Within the discovery cohort, the association of lipid species with incident CAD was assessed using logistic regression, adjusting for age, sex, and the first 10 genomic principal components. Prevalent CAD cases, defined as individuals hospitalised with CAD between the start of the Hospital Morbidity Data Collection (1970) and their serum collection date, were removed prior to analysis. Incident CAD events (CAD hospitalisations or deaths) were included up to the end of follow-up (July 2015). Results are displayed as log-odds ratios. Polygenic risk for CAD was calculated for each individual in the discovery cohort using the metaGRS polygenic score, comprising approximately 1.7 million genetic variants22. Linear regression in R was performed to test the association between each individual’s polygenic score and lipid species concentrations, adjusting for age, sex and the first 10 principal components.
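The bottom-up/top-down candidate gene logic described in this section can be sketched as set operations over a gene annotation table. This is a simplified, hypothetical illustration and not the ProGeM software. We assume distances are measured to the gene body (zero if the SNP lies inside the gene). We also assume the 1Mb bottom-up window is centred on the lead SNP (±500kb) and that coding/high-impact annotation hits are added to the bottom-up set. The `Gene` record, the function names and the gene labels are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    protein_coding: bool


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Base-pair distance from a SNP to a gene body (0 if the SNP lies inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def bottom_up(chrom, pos, genes, coding_hits: Iterable[str] = (), n_closest=3,
              half_window=500_000) -> Set[str]:
    """Closest protein-coding genes in the window, plus genes hit by coding/high-impact variants."""
    nearby = [g for g in genes if g.chrom == chrom and g.protein_coding
              and distance_to_gene(pos, g) <= half_window]
    nearby.sort(key=lambda g: distance_to_gene(pos, g))
    return {g.name for g in nearby[:n_closest]} | set(coding_hits)


def top_down(chrom, pos, genes, curated: Set[str], max_distance=500_000) -> Set[str]:
    """Genes within max_distance of the lead SNP that appear in a curated metabolic gene list."""
    return {g.name for g in genes if g.chrom == chrom and g.name in curated
            and distance_to_gene(pos, g) <= max_distance}


def primary_candidates(chrom, pos, genes, curated, coding_hits=()):
    """Primary candidates: genes nominated by both the bottom-up and top-down approaches."""
    return bottom_up(chrom, pos, genes, coding_hits) & top_down(chrom, pos, genes, curated)


genes = [
    Gene("G1", "1", 1_000, 2_000, True),
    Gene("G2", "1", 5_000, 6_000, True),
    Gene("G3", "1", 100_000, 110_000, True),
    Gene("G4", "1", 300_000, 301_000, True),
    Gene("G5", "1", 1_200, 1_800, False),  # non-coding, ignored by bottom-up
]
```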
### Genetic correlations Genetic correlations of lipid species with CAD were assessed using Linkage Disequilibrium Score Regression (v1.0.1)73. Regression weights and LD scores were obtained from 1000 Genomes European data, as previously described74. Summary statistics from all datasets were restricted to SNPs in the HapMap 3 panel with a 1000 Genomes European MAF greater than 5%. Where available, SNPs were filtered to an imputation quality r² > 0.9. SNPs were also removed if the reported MAF deviated from the 1000 Genomes European MAF by more than 0.1. Summary statistics for CAD were obtained from the meta-analysis of CARDIoGRAMplusC4D and UK Biobank by van der Harst and Verweij72. As there were no overlapping samples between the BHS and the other summary statistics, the genetic covariance intercept was constrained to 0. ### Colocalization analysis Colocalization between genome-wide significant lipid species loci and CAD was assessed using the R package COLOC75. For each locus, all variants within a 400kb window centred on the lead SNP were selected. Priors were kept at default settings. Evidence for a shared causal variant required a high posterior probability that both traits have causal variants in the region (H3+H4>0.8) and a greater probability of a shared causal variant than of distinct causal variants (H4/H3>10). Sensitivity analyses for regions with causal variants are shown in Supplementary Note 2. ### Association of loci with coronary atherosclerosis in the UK Biobank Lead SNPs (or proxies) were tested for association with coronary atherosclerosis in the UK Biobank. In a subset of white British individuals (n=456,486), electronic health records (updated 14th December 2020) were converted into PheCodes76,77 using the R package PheWAS78. Coronary atherosclerosis (phecode 411.4) was exported for genome-wide association analysis.
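The colocalization windowing and decision rule described above are simple enough to state in code. The posterior probabilities themselves come from the R package COLOC (for example the `PP.H3.abf` and `PP.H4.abf` values in the `coloc.abf` summary) and are not recomputed here. This hypothetical Python sketch only selects variants in a 400kb window centred on the lead SNP and applies H3+H4>0.8 and H4/H3>10. The ratio is written as a multiplication to avoid dividing by a zero H3.

```python
from typing import List, Sequence


def variants_in_window(positions: Sequence[int], lead_pos: int, width: int = 400_000) -> List[int]:
    """Positions within a window of the given total width centred on the lead SNP."""
    half = width // 2
    return [p for p in positions if abs(p - lead_pos) <= half]


def supports_shared_causal_variant(pp_h3: float, pp_h4: float,
                                   min_causal: float = 0.8, min_ratio: float = 10.0) -> bool:
    """Decision rule from the text: H3 + H4 > 0.8 and H4 / H3 > 10 (written without division)."""
    return (pp_h3 + pp_h4) > min_causal and pp_h4 > min_ratio * pp_h3
```

For example, posteriors of H3 = 0.01 and H4 = 0.95 pass both criteria, whereas H3 = 0.5 and H4 = 0.45 indicate causal variants in the region that are more likely to be distinct than shared.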
FastGWA79 was used to assess the association of lipid loci with this phenotype, adjusting for age, sex, age², age*sex, age²*sex and the first 20 principal components provided by the UK Biobank, with the genetic relatedness matrix as the random effect. The analysis was repeated with additional adjustment for clinical lipids (total cholesterol, HDL-cholesterol, triglycerides; measurements obtained from the first available blood collection). Individuals with missing values were excluded from the analysis. As clinical lipids are heritable, mtCOJO analysis was also performed using the clinical lipid GWAS summary statistics described above. ## Supporting information Supplementary Tables (supplements/261814_file03.xlsx) Supplementary Information (supplements/261814_file04.pdf) ## Data Availability Complete summary statistics of all lipid species and classes will be available via the NHGRI-EBI GWAS catalog ([https://www.ebi.ac.uk/gwas](https://www.ebi.ac.uk/gwas)), GCP ID: GCP000197; study accession nos. GCST90023981-GCST90025848. In addition, summary-level statistics are available at our data portal ([https://metabolomics.baker.edu.au/](https://metabolomics.baker.edu.au/)). Individual-level data for the BHS are accessible through applications to the Busselton Population Medical Research Institute ([http://bpmri.org.au/research/database-access.html](http://bpmri.org.au/research/database-access.html)). Individual-level data for the ADNI and AIBL studies are available through applications to the LONI Image and Data Archive ([http://adni.loni.usc.edu/data-samples/access-data/](http://adni.loni.usc.edu/data-samples/access-data/)). Individual-level data for AIBL are also available through applications to the AIBL management committee ([https://aibl.csiro.au/research/support/](https://aibl.csiro.au/research/support/)).
Publicly available datasets used within the study are available via UK Biobank ([http://www.ukbiobank.ac.uk/register-apply/](http://www.ukbiobank.ac.uk/register-apply/)), HRC ([http://www.haplotype-reference-consortium.org/home](http://www.haplotype-reference-consortium.org/home)), 1000 Genomes ([https://www.internationalgenome.org/](https://www.internationalgenome.org/)), SNiPA ([https://snipa.helmholtz-muenchen.de/snipa3/](https://snipa.helmholtz-muenchen.de/snipa3/)), GTEx ([https://gtexportal.org/home/](https://gtexportal.org/home/)), and eQTLGen ([https://www.eqtlgen.org/](https://www.eqtlgen.org/)). ## Code availability All software and bioinformatic tools used in the present study are publicly available. ## Author contributions Design of study and interpretation of results: GC, CG, PEM, KH, MI, NSM, JHung, JBeilby, MPD, GFW, SS, NRW, JBlangero, PJM, EKM. Statistical and bioinformatic analyses: GC, CG, PEM, MB, AA. Lipidomic analysis: KH, NAM, TD, AN, MC, AS, GO, TW. Cohort oversight, phenotyping or genotyping: JHung, JHui, JBeilby, WLFL, PC, IM, SML, TP, MV, AIB, CRC, VLV, DA, CLM, KT, MA, GK, KN, AJS, XH, RKD, RNM, PJM, EKM. Drafted the manuscript: GC, CG, PEM, KH, PM, EKM, PJM. All authors read, edited and approved the final version of the manuscript. ## Competing Interests The authors declare no competing interests. ## EXTENDED DATA Extended data Fig. 1 Distribution of genome-wide significant associations for independent SNPs and lipid species. **a**, the number of lipid species associated with independent SNPs in the BHS discovery cohort.
**b**, the number of independent SNPs associated with each lipid species in the BHS discovery cohort. **c**, the number of lipid species associated with independent SNPs in the BHS discovery cohort following adjustment for clinical lipid traits. **d**, the number of independent SNPs associated with each lipid species in the BHS discovery cohort following adjustment for clinical lipid traits. Extended data Fig. 2 Scatterplot of lipid heritabilities (h²) vs GWAS genomic inflation factors (λ) for lipid species and classes. **a**, lipid heritability and genomic inflation factors for genome-wide association analysis in the BHS cohort. **b**, lipid heritability and genomic inflation factors for genome-wide association analysis, adjusting for clinical lipids, in the BHS cohort. Red diamonds indicate lipid classes and black circles indicate lipid species. The correlation between heritabilities and genomic inflation factors is also shown, with a line of best fit. The right and top axes show histograms of the distributions of the genomic inflation factors from each GWAS and of the heritability estimates, respectively. Heritability estimates were calculated in GCTA using the genetic relatedness matrix (GRM), adjusting for age, sex, age², age*sex and age²*sex. Extended data Fig. 3 Comparison of estimated lipidomic effect sizes between clinical lipid adjusted and mtCOJO adjusted models.
**a**, Beta coefficients for clinical lipid adjusted SNP-lipid associations (*x* axis) are plotted against mtCOJO adjusted SNP-lipid associations (*y* axis). **b**, Z-scores (beta coefficient divided by standard error) for clinical lipid adjusted SNP-lipid associations (*x* axis) are plotted against mtCOJO adjusted SNP-lipid associations (*y* axis). Variant effect signs are aligned so that mtCOJO adjusted associations are positive. Variants showing greater (positive) associations in the mtCOJO adjusted analysis are shown in red, and variants showing reduced associations are shown in blue. Circle diameter is proportional to the −log10(*P*) of the t-test of effect differences. Extended data Fig. 4 Comparison of estimated lipidomic effect sizes between the discovery BHS GWAS and the validation meta-analysis (ADNI and AIBL). **a**, Beta coefficients estimated from linear regression models for lipid species in the Busselton Health Study discovery GWAS (*x*-axis) and the ADNI and AIBL validation meta-analysis (*y*-axis). **b**, Beta coefficients for common SNPs only (MAF≥0.05) in the Busselton Health Study discovery GWAS (*x*-axis) and the ADNI and AIBL validation meta-analysis (*y*-axis). Only SNPs significantly associated (P<5×10−8) in the Busselton Health Study discovery GWAS are shown. ## Acknowledgements Support was provided by the National Health and Medical Research Council of Australia (#1101320 and 1157607). K.H. was supported by a Dementia Australia Research Foundation Scholarship. This work was also supported in part by the Victorian Government’s Operational Infrastructure Support Program, and the Royal Perth Hospital Research Foundation.
The BHS acknowledges the generous support for the 1994/95 Busselton follow-up studies from HealthWay, the Department of Health, PathWest Laboratory Medicine of WA, The Great Wine Estates of the Margaret River region of Western Australia, the Busselton community volunteers who assisted with data collection, and the study participants from the Shire of Busselton. Statistical analyses performed in this work were supported by resources provided by The Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. The authors wish to thank the staff at the Western Australian Data Linkage Branch and Death Registrations and Hospital Morbidity Data Collection for the provision of linked health data. Funding for the AIBL study was provided in part by the study partners [Commonwealth Scientific and Industrial Research Organisation (CSIRO), Edith Cowan University (ECU), Mental Health Research Institute (MHRI), National Ageing Research Institute (NARI), Austin Health, CogState Ltd]. The AIBL study has also received support from the National Health and Medical Research Council (NHMRC) and the Dementia Collaborative Research Centres program (DCRC2), as well as funding from the Science and Industry Endowment Fund (SIEF) and the Cooperative Research Centre (CRC) for Mental Health, funded through the CRC Program (Grant ID:20100104), an Australian Government Initiative. Support for AIBL genetic data acquisition and analysis was provided by a grant from the NHMRC (APP1161706) awarded to S.M.L and through the CRC for Mental Health (Grant ID:20100104). T.P. is supported by ECU strategic research funding.
Support for the metabolomics sample processing, assays and analytics reported here was provided by grants from the National Institute on Aging (NIA); NIA supported the Alzheimer’s Disease Metabolomics Consortium which is a part of NIA’s national initiatives AMP-AD and M2OVE-AD (R01 AG046171, RF1 AG051550, RF1 AG057452 and 3U01 AG024904-09S4). Additional NIH support from the NIA, NLM and NCI for analysis includes P30 AG10133, R01 AG19771, R01 LM012535, R03 AG054936, R01 AG061788, K01 AG049050 and R01 CA129769. M.A. is supported by National Institute on Aging grants RF1 AG057452, RF1 AG058942, RF1 AG059093, 1U19AG063744 and U01 AG061359. K.N. is supported by NLM R01 LM012535 and NIA R03AG054936. Data collection and sharing for the ADNI was supported by National Institutes of Health Grant U01 AG024904. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health ([www.fnih.org](http://www.fnih.org)). 
The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. This study was only possible with the help of the AIBL research group. The authors who made direct contributions to this study have been listed as authors in this article. Members of the AIBL group who did not participate in the analysis or writing of this report are listed here: [https://aibl.csiro.au/about/aibl-research-team/](https://aibl.csiro.au/about/aibl-research-team/). Part of the data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The authors who made direct contributions to this study have been listed as authors in this article. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: [http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf](http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf). Part of the data used in preparation of this article were generated by the Alzheimer’s Disease Metabolomics Consortium (ADMC). The authors who made direct contributions to this study have been listed as authors in this article. Investigators within the ADMC who provided data but did not participate in analysis or writing of this report are listed at [https://sites.duke.edu/adnimetab/team/](https://sites.duke.edu/adnimetab/team/). Metabolomics data and results from the ADNI study have been made accessible through the AMP-AD Knowledge Portal ([https://ampadportal.org](https://ampadportal.org)).
The AMP-AD Knowledge Portal is the distribution site for data, analysis results, analytical methodology and research tools generated by the AMP-AD Target Discovery and Preclinical Validation Consortium and multiple consortia and research programs supported by the National Institute on Aging.

## Footnotes

* \* Joint first authors.
* Received August 20, 2021.
* Revision received August 20, 2021.
* Accepted August 25, 2021.
* © 2021, Posted by Cold Spring Harbor Laboratory.

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/).

## REFERENCES

1. Mach, F. et al. Adverse effects of statin therapy: perception vs. the evidence - focus on glucose homeostasis, cognitive, renal and hepatic function, haemorrhagic stroke and cataract. Eur Heart J 39, 2526–2539 (2018).
2. Grundy, S.M. et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA Guideline on the Management of Blood Cholesterol. Journal of the American College of Cardiology 73, e285–e350 (2019).
3. Willer, C.J. et al. Discovery and refinement of loci associated with lipid levels. Nature Genetics 45, 1274–1283 (2013).
4. Sinnott-Armstrong, N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nature Genetics 53, 185–194 (2021).
5. Ference, B.A. et al. Effect of long-term exposure to lower low-density lipoprotein cholesterol beginning early in life on the risk of coronary heart disease: a Mendelian randomization analysis. J Am Coll Cardiol 60, 2631–2639 (2012).
6. Cadby, G. et al. Heritability of 596 lipid species and genetic correlation with cardiovascular traits in the Busselton Family Heart Study. J Lipid Res 61, 537–545 (2020).
7. Tabassum, R. et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. Nat Commun 10, 4329 (2019).
8. Demirkan, A. et al. Genome-Wide Association Study Identifies Novel Loci Associated with Circulating Phospho- and Sphingolipid Concentrations. PLOS Genetics 8, e1002490 (2012).
9. Suhre, K. et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature 477, 54–60 (2011).
10. Lotta, L.A. et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nature Genetics 53, 54–64 (2021).
11. Shin, S.Y. et al. An atlas of genetic influences on human blood metabolites. Nat Genet 46, 543–550 (2014).
12. Hicks, A.A. et al. Genetic Determinants of Circulating Sphingolipid Concentrations in European Populations. PLOS Genetics 5, e1000672 (2009).
13. Illig, T. et al. A genome-wide perspective of genetic variation in human metabolism. Nature Genetics 42, 137–141 (2010).
14. Draisma, H.H.M. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nature Communications 6, 7208 (2015).
15. Long, T. et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nature Genetics 49, 568–578 (2017).
16. Yousri, N.A. et al. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population. Nat Commun 9, 333 (2018).
17. Chai, J.F. et al. Associations with metabolites in Chinese suggest new metabolic roles in Alzheimer’s and Parkinson’s diseases. Hum Mol Genet 29, 189–201 (2020).
18. Huynh, K. et al. High-Throughput Plasma Lipidomics: Detailed Mapping of the Associations with Cardiometabolic Risk Factors. Cell Chem Biol 26, 71–84.e4 (2019).
19. Huynh, K. et al. Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. Nature Communications 11, 5698 (2020).
20. Yang, J. et al. Genomic inflation factors under polygenic inheritance. European Journal of Human Genetics 19, 807–812 (2011).
21. Stacey, D. et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Research 47, e3 (2018).
22. Inouye, M. et al. Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention. J Am Coll Cardiol 72, 1883–1893 (2018).
23. Harshfield, E.L. et al. Genome-wide analysis of blood lipid metabolites in over 5,000 South Asians reveals biological insights at cardiometabolic disease loci. medRxiv, 2020.10.16.20213520 (2020).
24. Karsai, G. et al. FADS3 is a Δ14Z sphingoid base desaturase that contributes to gender differences in the human plasma sphingolipidome. J Biol Chem 295, 1889–1897 (2020).
25. Jojima, K., Edagawa, M., Sawai, M., Ohno, Y. & Kihara, A. Biosynthesis of the anti-lipid-microdomain sphingoid base 4,14-sphingadiene by the ceramide desaturase FADS3. FASEB J 34, 3318–3335 (2020).
26. Lone, M.A. et al. Subunit composition of the mammalian serine-palmitoyltransferase defines the spectrum of straight and methyl-branched long-chain bases. Proceedings of the National Academy of Sciences 117, 15591 (2020).
27. Hornemann, T. et al. The SPTLC3 subunit of serine palmitoyltransferase generates short chain sphingoid bases. The Journal of Biological Chemistry 284, 26322–26330 (2009).
28. Quehenberger, O. et al. Lipidomics reveals a remarkable diversity of lipids in human plasma. J Lipid Res 51, 3299–3305 (2010).
29. Jansen, H., Verhoeven, A.J.M. & Sijbrands, E.J.G. Hepatic lipase. Journal of Lipid Research 43, 1352–1362 (2002).
30. Santamarina-Fojo, S., González-Navarro, H., Freeman, L., Wagner, E. & Nong, Z. Hepatic Lipase, Lipoprotein Metabolism, and Atherogenesis. Arteriosclerosis, Thrombosis, and Vascular Biology 24, 1750–1754 (2004).
31. Fernández-Ruiz, I. ANGPTL3 deficiency protects from CAD. Nature Reviews Cardiology 14, 316 (2017).
32. Stitziel, N.O. et al. ANGPTL3 Deficiency and Protection Against Coronary Artery Disease. J Am Coll Cardiol 69, 2054–2063 (2017).
33. Musunuru, K. et al. Exome Sequencing, ANGPTL3 Mutations, and Familial Combined Hypolipidemia. New England Journal of Medicine 363, 2220–2227 (2010).
34. Blackburn, N.B. et al. Identifying the Lipidomic Effects of a Rare Loss-of-Function Deletion in ANGPTL3. Circ Genom Precis Med (2021).
35. Oudit, G.Y. et al. The role of phosphoinositide-3 kinase and PTEN in cardiovascular physiology and disease.
Journal of Molecular and Cellular Cardiology 37, 449–471 (2004).
36. Jayawardana, K.S. et al. Changes in plasma lipids predict pravastatin efficacy in secondary prevention. JCI Insight 4 (2019).
37. Meikle, P.J. et al. Plasma lipid profiling shows similar associations with prediabetes and type 2 diabetes. PLoS One 8, e74341 (2013).
38. Tham, Y.K. et al. Novel Lipid Species for Detecting and Predicting Atrial Fibrillation in Patients With Type 2 Diabetes. Diabetes 70, 255 (2021).
39. James, A.L. et al. Changes in the prevalence of asthma in adults since 1966: the Busselton health study. European Respiratory Journal 35, 273–278 (2010).
40. Gregory, A.T., Armstrong, R.M., Grassi, T.D., Gaut, B. & Van Der Weyden, M.B. On our selection: Australian longitudinal research studies. Medical Journal of Australia 189, 650–657 (2008).
41. Cadby, G. et al. Pleiotropy of cardiometabolic syndrome with obesity-related anthropometric traits determined using empirically derived kinships from the Busselton Health Study. Human Genetics 137, 45–53 (2018).
42. Ellis, K.A. et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int Psychogeriatr 21, 672–687 (2009).
43. Mueller, S.G. et al. Ways toward an early diagnosis in Alzheimer’s disease: the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimers Dement 1, 55–66 (2005).
44. Weir, J.M. et al.
Plasma lipid profiling in a large population-based cohort. J Lipid Res 54, 2898–2908 (2013).
45. Alshehry, Z.H. et al. An Efficient Single Phase Method for the Extraction of Plasma Lipids. Metabolites 5, 389–403 (2015).
46. Gagnon-Bartsch, J.A. & Speed, T.P. Using control genes to correct for unwanted variation in microarray data. Biostatistics 13, 539–552 (2012).
47. Knuiman, M.W., Hung, J., Divitini, M.L., Davis, T.M. & Beilby, J.P. Utility of the metabolic syndrome and its components in the prediction of incident cardiovascular disease: a prospective cohort study. Eur J Cardiovasc Prev Rehabil 16, 235–241 (2009).
48. Friedewald, W.T., Levy, R.I. & Fredrickson, D.S. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem 18, 499–502 (1972).
49. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575 (2007).
50. Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).
51. Howie, B.N., Donnelly, P. & Marchini, J. A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies. PLOS Genetics 5, e1000529 (2009).
52. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nature Methods 9, 179–181 (2012).
53. Fowler, C. et al. Fifteen Years of the Australian Imaging, Biomarkers and Lifestyle (AIBL) Study: Progress and Observations from 2,359 Older Adults Spanning the Spectrum from Cognitive Normality to Alzheimer’s Disease. Journal of Alzheimer’s Disease Reports 5, 443–468 (2021).
54. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44, 821–824 (2012).
55. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909 (2006).
56. Aulchenko, Y.S., Ripke, S., Isaacs, A. & van Duijn, C.M. GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296 (2007).
57. Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
58. Pruim, R.J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq419&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20634204&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F08%2F25%2F2021.08.20.21261814.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000281714100054&link_type=ISI) 59. 59.Aschard, H., Vilhjálmsson Bjarni J., Joshi Amit D., Price Alkes L. & Kraft, P. Adjusting for Heritable Covariates Can Bias Effect Estimates in Genome-Wide Association Studies. The American Journal of Human Genetics 96, 329–339 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2014.12.021&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25640676&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F08%2F25%2F2021.08.20.21261814.atom) 60. 60.Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nature Communications 9, 224 (2018). 61. 61.Neale, B. UK Biobank GWAS results - [http://www.nealelab.is/uk-biobank](http://www.nealelab.is/uk-biobank). (2021). 62. 62.Ollier, W., Sprosen, T. & Peakman, T. UK Biobank: from concept to reality. Pharmacogenomics 6, 639–46 (2005). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2217/14622416.6.6.639&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16143003&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F08%2F25%2F2021.08.20.21261814.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000231958300013&link_type=ISI) 63. 63.Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1101/2021.03.09.21252822&link_type=DOI) 64. 64.Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmuller, G. 
SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics 31, 1334–1336 (2015).
65. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–585 (2013).
66. Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. bioRxiv, 447367 (2018).
67. Kamat, M.A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851–4853 (2019).
68. Staley, J.R. et al. PhenoScanner: a database of human genotype–phenotype associations. Bioinformatics 32, 3207–3209 (2016).
69. Emilsson, V. et al. Co-regulatory networks of human serum proteins link genetics to disease. Science 361, 769–773 (2018).
70. Huan, T. et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun 10, 4267 (2019).
71. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005–D1012 (2019).
72. van der Harst, P. & Verweij, N. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ Res 122, 433–443 (2018).
73. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–1241 (2015).
74. Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015).
75. Wallace, C. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet 16, e1008720 (2020).
76. Wu, P. et al. Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation. JMIR Med Inform 7, e14325 (2019).
77. Wei, W.Q. et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS One 12, e0175508 (2017).
78. Carroll, R.J., Bastarache, L. & Denny, J.C.
R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics 30, 2375–2376 (2014).
79. Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 51, 1749–1755 (2019).