Abstract
Background and aims 1-H nuclear magnetic resonance (1H-NMR) metabolomic measures in plasma have yielded significant insight into the pathophysiology of cardiometabolic disease, but their interrelated nature complicates causal inference and clinical interpretation. This study aimed to investigate the associations of unrelated 1H-NMR metabolomic profiles with coronary artery disease (CAD), type 2 diabetes (T2D) and ischemic stroke (ISTR).
Methods Principal component (PC) analysis was performed on 168 1H-NMR metabolomic measures in 56,712 unrelated European participants from UK Biobank to retrieve uncorrelated PCs, which were used in multivariable-adjusted Cox proportional hazards models and genome-wide association analyses for Mendelian randomization (MR). Two-sample MR analyses were conducted in three non-overlapping databases which were subsequently meta-analysed, resulting in combined sample sizes of 755,481 (128,728 cases), 1,017,097 (121,977 cases), and 1,002,264 (56,067 cases) for CAD, T2D, and ISTR, respectively.
Results We identified six PCs which collectively explained 88% of the total variance. For CAD in particular, results from both multivariable-adjusted and MR analyses were generally directionally consistent. The pooled odds ratios (ORs) [95% CI] of per one-SD increase in genetically-influenced PC1 and PC3 (both characterized by distinct ApoB-associated lipoprotein profiles) were 1.04 [1.03, 1.05] and 0.94 [0.93, 0.96], respectively. In addition, the pooled OR for CAD of PC4, characterized by simultaneously decreased small HDL and increased large HDL, and independent of ApoB, was 1.05 [1.03, 1.07]. For the other outcomes, PC5 (characterized by increased amino acids) was associated with a higher risk of T2D and ISTR.
Conclusions This study highlights the existence of an ApoB-independent lipoprotein profile driving CAD. Interestingly, this profile is characterized by a distinctive HDL sub-particle distribution, providing evidence for a role of HDL in the development of CAD.
Introduction
The effectiveness of lowering plasma low-density lipoprotein-cholesterol (LDL-C) to reduce coronary artery disease (CAD) risk is beyond doubt (1). Lowering of plasma triglyceride (TG) levels may also reduce CAD risk on top of LDL-C lowering (2, 3). These observational findings have been attributed to a reduction in apolipoprotein B (ApoB), which has been suggested as the primary marker for cardiovascular disease risk independent of lipid content (cholesterol or TG) and type of ApoB-containing lipoprotein (cholesterol-rich LDL or TG-rich VLDL) (4-7). Although an inverse association between high-density lipoprotein-cholesterol (HDL-C) and cardiovascular disease risk has long been established in prospective studies (8, 9), Mendelian randomization (MR) studies (10, 11) and clinical trials (12, 13) have so far failed to convincingly support a causal role for HDL-C in cardiovascular disease risk.
Importantly, lipoprotein metabolism is a highly dynamic system in which lipids and specific apolipoproteins are passively and actively exchanged between the different lipoprotein classes in the course of their transport and metabolism within the circulation. Considering lipoprotein classes such as LDL or HDL in isolation disregards the intricate interdependence of plasma lipoproteins. It would therefore be more appropriate to consider individual lipoprotein profiles as a whole, characterized by specific distributions of lipids and apolipoproteins over the different lipoprotein classes. Metabolomic platforms based on 1-H nuclear magnetic resonance (1H-NMR) spectroscopy of plasma samples provide such individual profiles by generating detailed measures on the composition, size, number and distribution of the different lipoprotein classes in a sample (14). In addition, other metabolomic measures, such as some amino acids, are also reported. Analysing metabolomic measures as intermediates between exposures and clinical outcomes is a powerful approach to dissect complex etiologic mechanisms linking metabolic processes to disease (14-16).
The interrelated nature of the lipoproteins also makes it difficult to identify specific genetic instruments for a single lipid or lipoprotein species without pleiotropic effects on the other lipoprotein subclasses. When performing MR studies, this may lead to biased estimations of health effects (17). Here, we tested the hypothesis that individual 1H-NMR metabolomics profiles can be grouped into different overall patterns that are more or less independent from each other and that may have differential associations with cardiometabolic diseases. To address this hypothesis, we performed principal component analysis (PCA) on 1H-NMR metabolomics data from participants in the United Kingdom Biobank (UKB) who were free from disease at the time of sampling (18). The principal components (PCs) can be regarded as independent traits, characterized by a specific overall metabolomic profile, which were exploited to determine the associations with CAD, type 2 diabetes mellitus (T2D), and ischemic stroke (ISTR) and to triangulate findings from observational research through multivariable-adjusted regression and large-scale multicohort MR analyses (19).
Methods
Project design
In the present study, we performed PCA on 1H-NMR metabolomic measures and conducted prospective multivariable-adjusted regression analyses to investigate the associations between selected PCs and examined cardiometabolic diseases in UKB participants. In line with the principles of triangulation, by using selected PCs from UKB participants as exposure, we further conducted genome-wide association studies (GWAS) and subsequent MR studies to assess potential causal associations of selected independent PCs with examined cardiometabolic diseases.
Study population
Prospective multivariable-adjusted regression analyses and genome-wide association analyses were performed in the UKB, which recruited 502,628 participants aged 40-69 years across the entire United Kingdom between 2006 and 2010. The UKB cohort study was approved by the North-West Multicentre Research Ethics Committee (MREC), and the access for information to invite participants was approved by the Patient Information Advisory Group (PIAG) from England and Wales. All participants provided electronic written informed consent for the study. A detailed description of the UKB cohort has been presented elsewhere (18).
Plasma metabolic biomarkers were measured from 121,726 randomly selected UKB participants, 110,002 of whom had complete metabolomic measures and genetics data. To minimize ancestry and population stratification bias, we restricted the study population to 71,736 unrelated individuals of European ancestry, based on the estimated kinship coefficients for all pairs and the self-reported ancestral background (20). A subset of 57,846 participants free from CAD, T2D, ISTR and without taking cholesterol-lowering medication prior to the baseline survey were then selected for further studies. Finally, a total of 56,712 participants with complete data on covariables, including age, sex, the Townsend deprivation index, smoking status, alcohol consumption, body mass index (BMI), blood pressure lowering medication, and fasting time, were eligible for this study.
Profiling of metabolomic measures
The measurement of metabolomic data took place between June 2019 and April 2020 using a high throughput and validated 1H-NMR-metabolomics platform (Nightingale Health, Helsinki, Finland). Technical details of the platform and epidemiological applications have been reviewed previously (14, 21).
The 168 direct metabolomic measures, comprising apolipoproteins (n = 2), lipoprotein particle sizes and concentration (n = 7), lipoprotein (sub)classes (n = 98), cholesterol (n = 15), triglycerides (n = 4), phospholipids and other lipids (n = 8), total lipids (n = 4), glycolysis related metabolites (n = 4), inflammation (n = 1), fluid balance (n = 2), fatty acids (n = 9), amino acids (n = 10), and ketone bodies (n = 4), were included in this study. A full list of the measured 168 metabolites and their concentration characteristics in our study population is presented in Table S1.
Statistical analysis
Principal component analysis
PCA is a method used for dimension reduction by projecting each data point onto a new orthogonal coordinate system while capturing as much of the variation as possible (22), and can thus be used to identify uncorrelated patterns of a large number of interrelated risk factors, also known as PCs. All 168 metabolomic measures from 56,712 participants were first transformed to approximate a normal distribution by inverse rank-based normal transformation and standardized with standard deviation one and mean zero, and then subjected to PCA. The correlations between metabolomic measures and PCs could be represented by loadings (22). For each participant, each PC score was calculated by summing the standardized measures weighted by the corresponding eigenvector values (22).
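The preprocessing and scoring steps described above can be sketched as follows. This is a minimal illustration on simulated data, not the study's pipeline (which used the R function 'prcomp'): the rank offset of 0.5 in the inverse normal transformation, the toy data, and the helper names `inverse_rank_normal` and `pca_scores` are assumptions introduced here.

```python
from statistics import NormalDist

import numpy as np


def inverse_rank_normal(x, offset=0.5):
    """Rank-based inverse normal transform of a 1-D array.

    The offset of 0.5 is an assumed convention; ties are broken by order.
    """
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    quantiles = (ranks - offset) / n           # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


def pca_scores(data):
    """Return eigenvalues, eigenvectors and PC scores for an (n, p) matrix."""
    z = np.column_stack([inverse_rank_normal(col) for col in data.T])
    z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)   # mean 0, SD 1
    corr = np.cov(z, rowvar=False)             # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                       # eigenvector-weighted sums
    return eigvals, eigvecs, scores


# Hypothetical skewed "metabolite" data: 200 participants, 5 measures.
rng = np.random.default_rng(0)
toy = rng.lognormal(size=(200, 5))
toy[:, 1] += toy[:, 0]                         # induce some correlation
eigvals, eigvecs, scores = pca_scores(toy)
```

Because the input is standardized, the eigenvalues sum to the number of measures, and the resulting score columns are mutually uncorrelated, which is the property the PCs are used for in the later analyses.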
Prospective analyses
The prospective multivariable-adjusted analyses were performed in the 56,712 UKB participants. Outcome diagnoses were coded according to the International Classification of Diseases edition 10 (ICD-10) and were based on the date of the first occurrence. CAD is defined as angina pectoris (I20), myocardial infarction (MI) (I21 and I22), and acute and chronic ischemic heart disease (IHD) (I24 and I25); ISTR is defined as cerebral infarction (I63); T2D is based on “non-insulin-dependent diabetes mellitus (E11)”. The sources of these variables are from hospital admissions (through linkage with the medical records from the National Health Service), primary care, death register, and through self-report. Based on the date of first appearance derived from the different information sources and the date of enrollment, we defined whether a case was prevalent (before enrollment) or incident (after enrollment). Outcomes in this prospective analysis were incident diseases during the time period from recruitment to January 1st, 2021. Follow-up time is computed from the baseline visit to the diagnosis of incident disease, loss-to-follow-up or death, or the end of the study period, whichever came first.
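The follow-up rule described above can be illustrated with a short sketch. The function name `follow_up`, its arguments, and the choice to treat a diagnosis dated on the enrollment day as prevalent are assumptions made for illustration; only the study end date (January 1st, 2021) is taken from the text.

```python
from datetime import date

STUDY_END = date(2021, 1, 1)  # end of the study period stated in the text


def follow_up(baseline, diagnosis=None, death=None, lost=None,
              study_end=STUDY_END):
    """Return (years of follow-up, incident-event flag) for one participant.

    Returns None for prevalent cases. Treating a diagnosis dated on the
    baseline day as prevalent is an assumption of this sketch.
    """
    if diagnosis is not None and diagnosis <= baseline:
        return None
    # Follow-up ends at whichever of these dates comes first.
    candidates = [d for d in (diagnosis, death, lost, study_end)
                  if d is not None]
    end = min(candidates)
    event = diagnosis is not None and diagnosis == end
    years = (end - baseline).days / 365.25
    return years, event


incident = follow_up(date(2008, 1, 1), diagnosis=date(2012, 1, 1))
censored = follow_up(date(2008, 1, 1))
prevalent = follow_up(date(2008, 1, 1), diagnosis=date(2007, 6, 1))
```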
Three multivariable-adjusted Cox proportional hazard models were fitted to estimate hazard ratios (HRs) and corresponding 95% confidence intervals (95% CI) for the association between PCs and incident CAD, T2D, and ISTR: Model 1 was adjusted for age, sex, and the Townsend deprivation index; Model 2 was additionally adjusted for smoking status, alcohol consumption frequency, BMI, and blood pressure lowering medication; Model 3 was additionally adjusted for fasting time.
Mendelian randomization
MR uses genetic variants, typically single-nucleotide polymorphisms (SNPs), as instrumental variables (23, 24). MR studies depend on three main assumptions, notably: 1) the genetic variant must be associated with the exposure; 2) the genetic variant should not be associated with confounders; 3) the genetic variant affects the outcome only through the exposure. This study used the two-sample MR method, which requires that groups of participants in the gene-exposure association analysis and gene-outcome association analysis do not overlap (25-27).
Genotyping and genetic imputations
UK Biobank genotyping was conducted by Affymetrix using a bespoke BiLEVE Axiom array for approximately 50,000 participants, and using the Affymetrix UK Biobank Axiom array for the remaining participants. All genetic data were quality controlled centrally by UK Biobank resources. More information on the genotyping processes can be found online (https://www.ukbiobank.ac.uk). Based on the genotyped SNPs, UK Biobank resources performed centralized imputations on the autosomal SNPs using the UK10K haplotype (28), 1000 Genomes Phase 3 (29), and Haplotype Reference Consortium reference panels (30). Autosomal SNPs were pre-phased using SHAPEIT3 and imputed using IMPUTE4. In total, ∼96 million SNPs were imputed.
Associations of genetic variants with exposure
GWAS were performed on each selected PC for the included 56,712 UKB individuals, using the software program GEM (version 1.4.2) (31), adjusted for age, sex, first 10 genetic PCs, and fasting time. SNPs with a minor allele frequency below 0.001 were removed. For each PC, genome-wide significant SNPs (P < 5 × 10−8) were selected, and then pruned to obtain independent instrumental variables with the TwoSampleMR and ieugwasr packages, which use the PLINK clumping method with a clumping window of 10 Mb and linkage disequilibrium r2 < 0.001 (32). To avoid potential residual pleiotropic effects, only SNPs without overlap among PCs were selected and subsequently used to extract the gene-outcome associations. The proportion of exposure variation explained by the genetic variants was quantified by the R2 statistic (33), and potential weak instrument bias was examined by the F-statistic, for which a value greater than 10 is conventionally considered sufficient for MR analysis (27).
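As a rough illustration of the instrument-strength check, the sketch below computes an R2 and F-statistic for a single SNP. The formulas are common approximations and are assumed here, not taken from the study: R2 ≈ 2·EAF·(1 − EAF)·β² for an exposure standardized to unit variance, and F = R2·(N − k − 1) / [k·(1 − R2)] for k instruments. The exact expressions used in the study are those of references (27, 33). The effect size and allele frequency are hypothetical; only the sample size is taken from the text.

```python
def variance_explained(beta, eaf):
    """Approximate R2 of one SNP for an exposure with unit variance.

    beta: per-allele effect on the standardized exposure (hypothetical value)
    eaf:  effect allele frequency
    """
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


def f_statistic(r2, n, k=1):
    """F-statistic for k instruments jointly explaining r2 in n participants."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


N_EXPOSURE = 56712                       # exposure GWAS sample size from the text
r2 = variance_explained(beta=0.05, eaf=0.30)
f = f_statistic(r2, N_EXPOSURE)
is_strong = f > 10                       # conventional threshold
```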
Associations of genetic variants with outcome
Summary association statistics of the identified exposure-related SNPs with each outcome were estimated or extracted from three sources per outcome: a disease-specific consortium, namely CARDIoGRAMplusC4D (34) for CAD, the DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) consortium (35) for T2D, and MEGASTROKE (36) for ISTR, together with UKB and the FinnGen study for all three outcomes.
Participants from UKB were restricted to unrelated European-ancestry individuals with the full released imputed genomics data, and were not included in the variant-exposure association analysis. Outcomes in the MR analyses were prevalent or incident diseases, and details of all outcome diagnoses are described in the ‘prospective analysis’ subsection. In total, 34,299 cases and 227,723 controls for CAD, 22,659 cases and 239,363 controls for T2D, and 4,993 cases and 257,029 controls for ISTR were identified. Using the software program GEM (version 1.4.2) (31), we performed GWAS to assess the associations of genetic variants with T2D, CAD and ISTR, adjusting for age, sex, and the top 10 genetic principal components, and extracted the summary association statistics of identified exposure-related SNPs.
FinnGen is a public-private partnership research project launched in 2017, which covered the whole of Finland and combined genotype data from Finnish biobanks and digital health record data from Finnish health registries (37). In the FinnGen project and based on the ICD-10 (https://r7.risteys.finngen.fi/), major coronary heart disease (CHD), was defined as angina pectoris (I20), myocardial infarction (I21 to I23), ischemic heart diseases (I24 and I25), cardiac arrest (I46), and other cause unknown death or unattended death (R96 and R98); T2D was defined as non-insulin-dependent diabetes mellitus (E11); ISTR was defined as cerebral infarction (I63) and not specified as haemorrhage or infarction stroke (I64). Summary association statistics between identified exposure-related SNPs and the three outcomes were extracted from published FinnGen data freeze 7, which were based on 33,628 cases and 275,526 controls for major CHD, 44,313 cases and 255,449 controls for T2D, and 16,857 cases and 283,057 controls for ISTR.
The CARDIoGRAMplusC4D consortium assembled 60,801 cases and 123,504 controls from 48 studies for a GWAS meta-analysis of CAD, which was identified as an inclusive diagnosis of myocardial infarction, acute coronary syndrome, chronic stable angina, or coronary stenosis >50%. 77% of the participants were of European ancestry, 13% of South Asian ancestry, 6% of East Asian ancestry, and smaller samples were Hispanic and African Americans (34). The DIAGRAM consortium focuses on performing large-scale studies to characterise the genetic features of T2D. We selected the genome-wide association of T2D based on aggregated GWAS results from 31 studies for European-ancestry individuals with 55,005 cases and 400,308 controls, not including UKB samples (35). The large-scale MEGASTROKE consortium, launched by the International Stroke Genetics Consortium, releases the summary statistics from the 2018 meta-analysis of Genome-wide Association data in stroke and stroke subtypes with 34,217 ISTR cases and 406,111 controls (36).
We extracted summary association statistics of identified exposure-related SNPs with CAD, T2D, and ISTR from CARDIoGRAMplusC4D, DIAGRAM, and MEGASTROKE databases, respectively.
Associations of exposure with outcome
The associations of PCs with each study outcome from each database were estimated by the inverse-variance weighted (IVW) method, which combines the Wald ratio estimates (estimated association of genetic variants with outcome divided by estimated association of genetic variants with exposure) for individual genetic variants by a fixed-effect meta-analysis with inverse-variance weights (26, 27). These estimates were expressed as odds ratios (ORs) for the risk of CAD, T2D and ISTR per one-standard-deviation (one-SD) increase in each PC. We subsequently conducted meta-analyses for each PC to pool the estimates from the three outcome databases. The heterogeneity of the estimated ORs across the three databases for each PC was quantified by I2 and tested with the Cochran Q test (38).
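The estimation and pooling steps can be sketched as below. This is a simplified illustration rather than the TwoSampleMR implementation: it uses first-order weights (ignoring uncertainty in the SNP-exposure estimates), and all function names and input values are hypothetical.

```python
import math


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate (log OR per SD) from per-SNP Wald ratios."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    # First-order weight of each ratio: 1 / (se_out / beta_exp)^2
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def pool_fixed(estimates, ses):
    """Fixed-effect pooling across databases, with Cochran Q and I2 (%)."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


def to_or(log_or, se):
    """Convert a log OR and its SE to (OR, lower 95% limit, upper 95% limit)."""
    return tuple(math.exp(v) for v in (log_or, log_or - 1.96 * se, log_or + 1.96 * se))


# Hypothetical summary statistics for three SNPs in one outcome database.
est, se = ivw(beta_exp=[0.10, 0.20, 0.15],
              beta_out=[0.005, 0.010, 0.0075],
              se_out=[0.004, 0.005, 0.004])
# Hypothetical IVW estimates (log OR) from three outcome databases.
pooled, pooled_se, q, i2 = pool_fixed([0.04, 0.05, 0.03], [0.010, 0.012, 0.015])
```

Q is compared against a chi-squared distribution with (number of databases − 1) degrees of freedom; I2 expresses the share of variability in excess of what chance alone would produce.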
Given that the IVW method assumes all genetic instruments are valid (e.g., no horizontal pleiotropy), we conducted sensitivity analyses using the weighted-median estimator and the MR-Egger method to assess whether IVW analyses were biased due to horizontal pleiotropy (39-41). Rather than taking a weighted mean of the ratio estimates as in the IVW method, the weighted-median estimator could still provide a consistent estimate of the causal effect even when up to 50% of the identified genetic variants are invalid IVs (39). In contrast to the IVW method, the MR-Egger method does not require a zero horizontal pleiotropy effect, and could detect pleiotropy by the intercept term (under the InSIDE assumption), which when different from zero indicates a bias in the IVW estimation (40, 41).
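To make the two sensitivity estimators concrete, the sketch below gives simplified point estimates only (no standard errors or bootstrap). It assumes first-order weights and the interpolated weighted median; the function names and toy inputs are hypothetical and do not reproduce the TwoSampleMR code used in the study.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """Weighted regression of SNP-outcome on SNP-exposure effects.

    Returns (intercept, slope); a non-zero intercept suggests directional
    pleiotropy, and the slope is the pleiotropy-adjusted estimate.
    """
    # Orient every SNP so that its exposure effect is positive.
    x = [abs(be) for be in beta_exp]
    y = [bo if be >= 0 else -bo for be, bo in zip(beta_exp, beta_out)]
    w = [1.0 / so ** 2 for so in se_out]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def weighted_median(beta_exp, beta_out, se_out):
    """Interpolated weighted median of the per-SNP Wald ratios."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    pairs = sorted(zip(ratios, weights))
    total = sum(wt for _, wt in pairs)
    cumulative, points = 0.0, []
    for ratio, wt in pairs:
        cumulative += wt / total
        points.append((cumulative - 0.5 * wt / total, ratio))
    # Interpolate between the two ratios that bracket the 50% weight mark.
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        if p0 <= 0.5 <= p1:
            return r0 + (r1 - r0) * (0.5 - p0) / (p1 - p0)
    return points[0][1]  # single-SNP case


# Toy data: true effect 0.05 with a constant pleiotropic offset of 0.002.
b_exp = [0.10, -0.20, 0.15, 0.12]
b_out = [0.007, -0.012, 0.0095, 0.008]
intercept, slope = mr_egger(b_exp, b_out, [0.01] * 4)

# Toy data: three valid SNPs (ratio 0.05) and one invalid outlier (ratio 0.5).
median = weighted_median([0.10, 0.20, 0.15, 0.12],
                         [0.005, 0.010, 0.0075, 0.06], [0.01] * 4)
```

In the second toy example the outlying SNP carries less than half of the total weight, so the weighted median stays at the ratio shared by the valid instruments, whereas an IVW mean would be pulled towards the outlier.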
Except for the GWAS, all statistical analyses described above were performed in R (version 4.0.2), using the ‘prcomp’ function for PCA and the ‘survival’ and ‘TwoSampleMR’ packages for Cox regression analyses and MR analyses, respectively.
Results
Principal component analysis
Metabolomic measures from 56,712 unrelated European-ancestry participants (57% women) with no history of CAD, T2D, stroke, and no cholesterol-lowering therapy at baseline were eligible for analyses in this study. PCA resulted in twelve PCs with eigenvalues above 1, which explained 93.8% of the variance of the original metabolomic data (Table S2). The loadings of the last six PCs (PC7 to PC12, Figure S1) were small (most absolute values < 0.3). Considering the combination of the eigenvalues-greater-than-one rule (42), explained variance and loading interpretability, the top six PCs with a cumulative explained variance of 87.95% were selected for further analyses (Figure 2).
CAD: coronary artery disease; CHD: coronary heart disease; T2D: type 2 diabetes; ISTR: ischemic stroke; PCA: principal component analysis; GWAS: genome-wide association study; MR: Mendelian Randomization.
From the outside to the inside, each circle represents the correlation between the respective principal component (PC) and the 168 metabolomic measures. All metabolomic measures were divided into 30 groups and shown in clockwise order. Red and blue indicate an increase and a decrease, respectively, of metabolomic measures in the PCs.
PC1 (46.7% variance) is mainly characterized by higher levels of ApoB, ApoB-containing lipoproteins and fatty acids. PC2 (22.5% variance) is mainly characterized by higher levels of apolipoprotein A1 (ApoA1), HDL particles and lower levels of VLDL particles. PC3 (9.4% variance) is characterized by lower levels of most ApoB-containing lipoproteins, but higher levels of HDL particles. PC4 (5.0% variance) is characterized by lower levels of small HDL particles and higher levels of very large HDL particles, independent of ApoB. PC5 (2.7% variance) is characterized by higher levels of amino acids, and PC6 (1.6% variance) is characterized by higher levels of ketone bodies.
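The reported percentages can be checked against the selection criteria with simple arithmetic. The sketch below assumes, as is standard for PCA on standardized variables, that the eigenvalues sum to the number of measures (168), so that each eigenvalue equals the explained fraction times 168; the variable names are illustrative.

```python
N_MEASURES = 168  # number of standardized metabolomic measures

# Explained variance (%) per PC as reported in the text.
explained_pct = {"PC1": 46.7, "PC2": 22.5, "PC3": 9.4,
                 "PC4": 5.0, "PC5": 2.7, "PC6": 1.6}

# Implied eigenvalues under the assumption stated above.
eigenvalues = {pc: pct / 100 * N_MEASURES for pc, pct in explained_pct.items()}

cumulative_pct = sum(explained_pct.values())
all_above_one = all(ev > 1 for ev in eigenvalues.values())
```

The rounded percentages add up to 87.9%, in agreement with the reported cumulative value of 87.95%, and even PC6 has an implied eigenvalue of about 2.7, well above the eigenvalues-greater-than-one threshold.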
Prospective multivariable-adjusted regression analyses
The estimated multivariable-adjusted associations of each of the six PCs with the examined incident cardiometabolic diseases for UKB participants are presented in Figure 3. For the risk of CAD, the HRs [95% CI] for per one-SD increase in PC1, PC2, PC3 and PC4 were 1.02 [1.02, 1.03], 0.98 [0.97, 0.99], 0.98 [0.97, 0.99], and 1.02 [1.01, 1.03], respectively. For the risk of T2D, the HRs [95% CI] for per one-SD increase in PC1, PC2, PC3 and PC5 were 1.03 [1.02, 1.03], 0.93 [0.92, 0.94], 1.05 [1.04, 1.06], and 1.05 [1.03, 1.08], respectively. For the risk of ISTR, the HRs [95% CI] for per one-SD increase in PC4 and PC6 were 1.04 [1.01, 1.07] and 1.08 [1.03, 1.34], respectively.
CAD: coronary artery disease; ISTR: ischemic stroke; T2D: type 2 diabetes. Model 1 was adjusted for sex, age and Townsend index; Model 2 was model 1 additionally adjusted for smoking status, alcohol consumption frequency, BMI and blood pressure lowering medication; Model 3 was model 2 additionally adjusted for fasting time. The green, orange and blue lines indicate results based on models 1, 2 and 3, respectively.
Mendelian randomization
A total of 150 independent SNPs were found to be significantly associated with the PCs, of which 41 SNPs, 37 SNPs, 31 SNPs, 22 SNPs, 11 SNPs, and 8 SNPs explained 7.3%, 9.0%, 7.5%, 9.0%, 0.9%, and 1.1% of the variation in PC1, PC2, PC3, PC4, PC5 and PC6, respectively. All F statistics were larger than 10. Supplementary tables provide the details of the independent and non-overlapping genetic variants, including their position, gene-exposure associations and corresponding R2 statistics and F statistics, and gene-outcome association (Table S3-8).
For the association of each PC with each outcome, Cochran Q statistics detected no heterogeneity (P values > 0.05) in the estimated ORs across the three outcome databases (Table S9). Figure 4 shows the estimated associations between each PC and each outcome from each database, and their pooled estimates across the three databases. For the risk of CAD, the pooled estimated ORs [95% CI] per one-SD increase in PC1, PC3 and PC4 were 1.04 [1.03, 1.05], 0.94 [0.93,0.96] and 1.05 [1.03, 1.07], respectively. For the risk of T2D, the pooled estimated ORs [95% CI] per one-SD increase in PC2 and PC5 were 0.98 [0.97,0.99] and 1.09 [1.02, 1.16], respectively. For the risk of ISTR, the pooled estimated ORs [95% CI] per one-SD increase in PC3 and PC5 were 0.97 [0.96,0.99] and 1.12 [1.07, 1.18], respectively.
CAD: coronary artery disease; CHD: coronary heart disease; ISTR: ischemic stroke; T2D: type 2 diabetes. The right y-axis is labelled with the names of the three data sources of the summary association statistics between genetic variants and outcome, with ‘summary’ indicating the fixed-effect meta-analysis. Red lines, unlike black lines, indicate that the estimated associations between principal components (PCs) and outcomes were statistically significant.
The estimated ORs based on the weighted-median estimator analyses were similar to those from IVW analyses (Table S9). No horizontal pleiotropic effect was detected according to the intercepts from MR-Egger, except for effects of PC5 on CHD from the FinnGen database (Table S10) and of PC1 and PC2 on T2D across three databases.
The loadings (Figure 2) and associations of PC1 and PC3 with CAD are in line with previous observations of ApoB as a major driver of CAD risk. Surprisingly, although the contribution of ApoB to PC4 is negligible, PC4 clearly associates with CAD. To assess the contribution of ApoB to the association of PC4 with CAD, we repeated the MR analyses for the residual PC4, which was derived by regressing PC4 on ApoB. The MR result (Figure S2) using the IVW method showed that for the risk of CAD, the estimated pooled OR [95% CI] was 1.07 [1.03, 1.11] per one-SD increase in residual-PC4. This indicates that the association of PC4 with CAD is independent of ApoB.
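The residualization step can be illustrated with a minimal sketch, assuming a simple linear regression of PC4 scores on ApoB without further covariates (the text does not state whether any were included). The function name and the toy values are hypothetical.

```python
def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical ApoB concentrations and PC4 scores for six participants.
apob = [0.8, 1.0, 1.2, 0.9, 1.1, 1.3]
pc4 = [0.1, -0.3, 0.5, 0.2, -0.1, 0.4]
residual_pc4 = residualize(pc4, apob)

# By construction the residuals are uncorrelated with ApoB.
mean_apob = sum(apob) / len(apob)
covariance = sum((a - mean_apob) * r for a, r in zip(apob, residual_pc4))
```

The residual trait, which carries no linear ApoB signal, would then replace PC4 as the exposure in the GWAS and MR steps.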
Discussion
We applied PCA on 168 1H NMR-based metabolomic measures in 56,712 UKB participants and identified 6 main PCs representing independent metabolomic profiles, with a cumulative explained variance of 88%. We subsequently used multivariable-adjusted Cox regression and large-scale multicohort MR analyses to examine the cardiometabolic risks associated with these PCs. We found that PC1 (characterized by higher levels of ApoB and ApoB-containing lipoproteins) was associated with higher risk of CAD, and PC3 (characterized by lower levels of most ApoB-containing lipoproteins and higher levels of HDL particles) was associated with lower risk of CAD. Notably, PC4 (characterized by lower levels of small HDL particles and higher levels of very large HDL particles) was also associated with higher risk of CAD. PC5 (characterized by higher levels of amino acids) was associated with the risk of T2D and ISTR. Our findings with PC1 and PC3, which mainly captured higher and lower ApoB-associated lipoproteins, respectively, in relation to CAD are in line with previous findings that ApoB-containing lipoproteins drive atherogenic cardiovascular disease (1, 2, 43, 44). PC4 was also associated with CAD, but this association seemed independent from ApoB. Based on the loading values (Figure 2), the higher PC4-related risk of CAD is associated with a very specific HDL size distribution characterized by lower levels of small HDL particles and higher levels of large HDL particles. These analyses provide evidence for a potential (causal) association of HDL particles/composition with CAD, independent of ApoB.
It has previously been suggested that clinically measured HDL-C may not capture the protective effects of HDL on CAD (45). Hypotheses on HDL function have been proposed, suggesting that the protective properties of HDL, such as antioxidant effects, removal of cellular cholesterol and production of nitric oxide, may depend on specific HDL sub-particle characteristics and cannot reliably be estimated via the simple measurement of HDL-C (46, 47). Accordingly, in line with our observations, small HDL was found to have atheroprotective effects on macrophages and endothelial cells (47). Reduced hepatic scavenger receptor class BI (SR-BI) function was found to be associated with impaired reverse cholesterol transport (RCT), and participants with SR-BI deficiency had an increased risk of coronary heart disease despite increased HDL-C levels (48). Presumably, this increased HDL-C is caused by the accumulation of cholesterol loaded large HDL particles that cannot be cleared via SR-BI by the liver. In line with this interpretation, the loadings of PC4 indicated higher levels of large HDL particles associated with increased CAD risk. These and our data thus provide evidence for the hypothesis that for HDL-targeted therapy to be effective in prevention of CAD, higher levels of small HDL particles and lower levels of large HDL particles are warranted. Treatment with CETP inhibitors results in the opposite effect on the HDL profile (49-51), which may therefore at least partly explain the clinical failure of these drugs thus far.
In addition, our study found that genetically-influenced PC3 (characterized by lower levels of most ApoB-containing lipoproteins and higher levels of HDL particles) was also associated with a lower risk of ISTR. Although not all studies support an increased risk of ISTR by elevated LDL-C (52, 53), LDL-C levels were found to increase the risk of the large artery stroke subtype in recent MR studies (54, 55). Moreover, statins remain one of the main strategies to prevent ISTR (56). ApoB has also been suggested as the predominant risk factor for the risk of ISTR (57). In addition, genetically predicted HDL-C was also found to possibly decrease the risk of ISTR, particularly small vessel stroke (55, 58). Nevertheless, although no association with ISTR was demonstrated for PC1, our study provides evidence for a role of dyslipidaemia in the development of ISTR.
While cardiovascular diseases and T2D share some underlying risk factors, their relationships with lipoprotein metabolism were found to be different (59). A lipoprotein profile consisting of higher levels of large VLDL particles and small LDL particles, lower levels of large HDL particles, smaller LDL and HDL particle size, and larger VLDL particle size was found to be associated with incident diabetes (60-63). In the present study, the opposite of this risk lipoprotein profile was seen in PC2. Therefore, as expected, our study found that PC2 was associated with a lower risk of T2D, although the effect in the MR analyses was weak.
We found that PC5, mainly characterised by higher levels of amino acids, is a risk factor for T2D and ISTR. Multiple prospective analyses have observed associations of specific amino acids with increased risk (branched chain amino acids [BCAAs], alanine, phenylalanine) or decreased risk (glutamine and glycine) of T2D (64-67). The metabolism of amino acids has been shown to be crucial for the development and progression of ISTR (68). For example, higher levels of BCAAs and glutamate have consistently been reported to contribute to an increased risk of ISTR (69, 70).
There are three main strengths of the present study. First, the large number of 1H-NMR-based metabolomic measures in a very large number of disease-free individuals, especially various lipids and lipoprotein fractions, enabled a thorough description of the interrelationship among metabolomic measures and the identification of specific profiles. Second, for MR analyses, a large-scale multicohort design was used, which provided ample power. Third, the two-sample MR study design reduced bias by using non-overlapping samples for the gene-exposure and gene-outcome association analyses. In addition, pleiotropic effects were limited by excluding SNPs that overlapped among PCs.
There are also several limitations to be considered. The metabolomic measures from UKB are from non-fasting samples. Non-fasting may result in measurements of lipids and lipoproteins that are not representative for average daily levels, especially TG and LDL-C (71). However, recent studies suggested that fasting is not routinely required for risk analysis of lipid profiles, and that the measurement of ApoB is stable with or without fasting (71-74). Additionally, the exposures were assessed based on PCs that represented standardised composite traits, so the estimates cannot be interpreted as effects of specific metabolomic biomarkers per unit change, which may decrease the direct clinical value and requires further research.
In conclusion, the present study, based on independent profiles of metabolomic measures, not only confirmed the effect of ApoB-containing lipoproteins on CAD, but also revealed the existence of an alternative ApoB-independent metabolomic profile associated with CAD risk, providing evidence for the potential role of HDL in the development of CAD. Furthermore, our findings support the notion that lipids, lipoproteins and amino acids are important risk factors for the development of T2D and ISTR. More research focusing on specific metabolomic measures is needed to investigate their specific causal role in the development of cardiometabolic disease.
Funding
Ms. Ao is supported by the China Scholarship Council (CSC; no. 202106240064). Prof. dr. Jukema and Prof. dr. Rensen are supported by the Netherlands Cardiovascular Research Initiative: an initiative with support of the Dutch Heart Foundation (CVON-GENIUS-2). Dr. Noordam is supported by an innovation grant from the Dutch Heart Foundation (grant number 2019T103).
Conflict of Interest
None declared.
Data availability statement
Data used in the multivariable-adjusted analyses will be made available upon request in adherence with transparency conventions in medical research and through reasonable requests to the corresponding author. Data used in the MR analyses are all publicly available provided in the article or via the corresponding consortium.
Acknowledgements
The present study has been conducted using the UK Biobank Resource (Application Number 56340) that is available to researchers. The authors acknowledge the participants and investigators of all consortia that contributed summary statistics data, including the UK Biobank, CARDIoGRAMplusC4D, HERMES consortium, and the FinnGen study.