Abstract
Background Exposure to individual metals (and metalloids; hereafter ‘metals’) is associated with adverse cardiometabolic outcomes. Specifying analytic models to assess relationships among metal mixtures and cardiometabolic outcomes requires evidence-based models of the (assumed) causal structures; however, such models have not been previously published.
Methods We conducted a systematic literature review to develop an evidence-based directed acyclic graph (DAG) identifying relationships among metals, cardiometabolic health indicators, and potential confounders. To evaluate the consistency of the DAG with data from 1797 participants in the San Luis Valley Diabetes Study (SLVDS; mean age=54 years, 53% women, 48% Hispanic), we tested conditional independence statements suggested by the DAG and by 100 DAGs with the same structure but randomly permuted nodes using linear (continuous outcomes), logistic (dichotomous outcomes), or Bayesian kernel machine regression (BKMR; statements with metal coexposures) models. Based on minimally sufficient adjustment sets identified by the DAG, we specified BKMR models assessing associations between urinary metal mixtures and cardiometabolic outcomes in the SLVDS population.
Results Twenty-nine articles met the inclusion criteria for the systematic review. From these articles, we developed an evidence-based DAG with 382 testable conditional independence statements (71% supported by SLVDS data). Only 3% of the DAGs with randomly permuted nodes indicated more agreement with the data than our evidence-based DAG. Applying the evidence-based DAG in a pilot analysis, we did not observe evidence of an association between metal mixtures and cardiometabolic outcomes.
Conclusions We developed, tested, and applied an evidence-based approach to analyze associations between metal mixtures and cardiometabolic health.
What this study adds We conducted a systematic literature review to develop an evidence-based directed acyclic graph (DAG) of the presumed causal relationship between exposure to metal mixtures and cardiometabolic outcomes. Using real data, we evaluated the testable conditional independence statements. The evidence-based DAG outperformed 97% of DAGs with randomly permuted nodes. We applied the evidence-based DAG to select covariates for a pilot analysis. Environmental Epidemiology readers can (1) plan future research based on our systematic literature review, (2) use our process to evaluate other evidence-based DAGs, and (3) apply the evidence-based covariate sets to further explore relationships between metal mixtures and cardiometabolic health.
Introduction
Extensive epidemiologic and toxicologic evidence indicates that metals and metalloids [e.g., cadmium (Cd), inorganic arsenic (iAs), manganese (Mn), and tungsten (W); hereafter simplified as ‘metals’] are associated with cardiometabolic outcomes.1–11 Whereas most of the published studies considered exposure to only one metal, numerous calls exist to examine associations between metal mixtures and cardiometabolic outcomes.2, 12–14 Examining the health outcomes associated with metal mixtures would more realistically reflect environmental exposure conditions.15, 16 Yet even analyses that include multiple metals typically only adjust for concentrations of non-target metals (not accounting for interactions among metal mixtures) or use stratified analyses. These stratified analyses, often in the form of associations with low exposure to metal A/low exposure to metal B versus high exposure to metal A/high exposure to metal B, do not capture complex non-linear relationships.2, 17–20
Development of predictive analytic models probing the complex relationships among metal mixtures and cardiometabolic outcomes requires an understanding of the putative underlying causal structure. An evidence-based directed acyclic graph (DAG) is one way to represent such a causal structure. DAGs clarify causal contrasts and explicitly show assumptions about common causes of exposures and outcomes (e.g., dietary sources of metal (co)exposure that also affect cardiometabolic health) that we need to account for in our study design and/or analysis.21–24 DAGs are also useful for identifying minimally sufficient adjustment sets of variables; when DAGs are used in this way, articles should report the assumed DAG.25 Few evidence-based DAGs exist in environmental epidemiology, in part because developing them requires conducting systematic literature reviews and empirically testing the applicability of the DAG to the study context (as one example, see Corlin et al.26). No evidence-based DAGs have been previously published describing the structure underlying potential metal mixture-cardiometabolic outcome relationships. Such a DAG could help researchers assess how specific environmentally relevant metal mixtures may mechanistically affect the development of cardiometabolic outcomes. Therefore, our primary objective was to conduct a systematic literature review to support the development of an evidence-based DAG diagramming the relationships among exposure to metal mixtures, the development of cardiometabolic outcomes, and potential common causes of exposures and outcomes. Our secondary objective was to evaluate this DAG and apply it to a real environmental health context using data from a cohort of adults residing in the rural San Luis Valley of Colorado.
Methods
Literature review and directed acyclic graph development
We conducted a systematic search using PubMed, ProQuest, and Embase. The search strategy is detailed in Figure 1. The searches for each database were as follows:
PubMed search: (((((((((inorganic arsenic[MeSH Terms]) OR manganese[MeSH Terms]) OR cadmium[MeSH Terms]) OR uranium[MeSH Terms]) OR tungsten[MeSH Terms])) OR alloy[MeSH Terms])) AND ((cardiovascular disease[MeSH Terms]) OR type 2 diabetes mellitus[MeSH Terms])) AND ((((("meta analysis"[Publication Type]) OR "review"[Publication Type]) OR "systematic review"[Publication Type]) OR "consensus development conference"[Publication Type]) OR "randomized controlled trial"[Publication Type])
ProQuest search: (ti(cardiovascular disease) OR ti(type 2 diabetes)) AND (ti(inorganic arsenic) OR ti(manganese) OR ti(cadmium) OR ti(uranium) OR ti(tungsten))
Embase search: ('cardiovascular disease'/exp OR 'non insulin dependent diabetes mellitus'/exp) AND ('cadmium'/exp OR 'inorganic arsenic'/exp OR 'manganese'/exp OR 'uranium'/exp OR 'tungsten'/exp OR 'alloy'/exp) AND ([systematic review]/lim OR [meta analysis]/lim) AND [2013-2019]/py
In PubMed, we restricted our search to meta-analyses, reviews, systematic reviews, consensus development conferences, and randomized controlled trials. In ProQuest, we further narrowed the search to reviews, which encompassed literature reviews, systematic reviews, and meta-analyses. Finally, in Embase, we searched for systematic reviews and meta-analyses. All searches were restricted to articles published in English within the seven years prior to the review (after September 2013). Articles had to assess one of the five primary metals of interest (As, Cd, Mn, uranium [U], and W) as the exposure and a cardiometabolic condition as the outcome. Articles were excluded if they were already synthesized in a meta-analysis or review article included in our literature review. We also searched the reference sections of each included article for the terms “mixture” or “alloy” and included articles that met the search criteria but did not appear in the database searches (n = 1).
From each article included in the review, we collected the following information: authors, year, journal, study population, sample size, location, exposure characteristics (e.g., concentration), outcome, covariates, reported effect estimates, proposed causal pathway/biological mechanism, and reported limitations. Based on the extracted data, we created DAGs using the software DAGitty.27 Each arrow from an exposure to an outcome represented a relationship mentioned in at least one article from the literature review (Figure 2).1, 3, 5, 6, 9, 10, 12–14, 28–47 Each arrow linking a covariate to either the exposure or the outcome was verified using either an article in the literature review or an alternate source found by searching PubMed (Appendix A). The primary DAG is shown in Figure 2, and the metal-specific DAGs are presented in the supplement (Supplemental Figures 1-5).
Evaluation and application data
To evaluate and apply the evidence-based DAG, we used data from the San Luis Valley Diabetes Study (SLVDS), a prospective cohort study assessing the risk factors for chronic disease among Hispanic and non-Hispanic white adults in rural Colorado. Data collection methods have been detailed elsewhere.48 Briefly, people with diabetes residing in Alamosa or Conejos counties, Colorado were recruited through medical records reviews and local advertisements. People without diabetes were recruited using a stratified random sampling scheme based on residential location in these counties. All participants (with or without diabetes) met three additional eligibility criteria: (1) aged 20-74 years, (2) able to provide informed consent, and (3) proficient in English or Spanish. Baseline data collection occurred between 1984 and 1988, and follow-up data collection occurred between 1988 and 1998.
Urinary metal exposures used in this analysis were assessed at baseline. Samples (approximately 120 mL) were stored in tubes in a freezer at -80°C until laboratory analysis in 2008. Metal concentrations were measured by inductively coupled argon plasma mass spectrometry with a detection limit of 1 part in 10. Values below the limit of detection were assigned the limit of detection divided by the square root of 2. All laboratory methods met the standards of the Clinical Laboratory Improvement Amendments and the Environmental Protection Agency.49 All analyses were adjusted for urinary creatinine concentrations (g/L). Urinary creatinine was quantified using a colorimetric assay based on the Jaffe reaction.
Outcomes were assessed at baseline and follow-up study visits. We created a dichotomous variable for having an adverse cardiometabolic outcome, defined as any of the following incident conditions: (1) coronary bypass surgery, (2) peripheral vascular surgery, (3) myocardial infarction, (4) stroke, (5) low high-density lipoprotein cholesterol concentration (fasting values <40 mg/dL in men or <50 mg/dL in women), or (6) high triglyceride concentration (fasting values ≥150 mg/dL).50 At baseline, participants self-reported age, sex, ethnicity (Hispanic/non-Hispanic), smoker status (never smoker [<100 lifetime cigarettes]/former smoker [≥100 lifetime cigarettes but not currently smoking]/current smoker [≥100 lifetime cigarettes and currently smoking]), annual gross household income ($0-$7,499, $7,500-$19,999, $20,000-$34,999, ≥$35,000), educational attainment (<12 years/12 years/>12 years), marital status (married or living together/not married or in a current domestic partnership), employment status (in the labor force and working/not working, retired, or disabled), occupation (agriculture workforce/non-agriculture), physical activity, and diet. Participants’ physical activity at work was classified as sedentary, moderate, or vigorous. Participants’ overall physical activity level, accounting for activity during work and non-work time, was categorized as sedentary, somewhat active, moderately active, very active, or most active.51 Participants completed a food frequency questionnaire.52, 53 Intake for each food category (e.g., plant-based foods and proteins) was measured in g/day, and alcohol intake was measured in g/week. Drinking water intake was measured as the number of eight-ounce glasses consumed per day.
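The composite outcome combines four clinical events with two laboratory cutoffs. The sketch below shows one way that rule could be coded for a single participant's values; the event labels and argument names are hypothetical, and the sketch does not itself check incidence (new onset after baseline), which would require comparing baseline and follow-up visits. Missing laboratory values are not handled.

```python
# Hypothetical event labels and argument names; cutoffs follow the outcome definition above.
CLINICAL_EVENTS = frozenset({
    "coronary_bypass_surgery",
    "peripheral_vascular_surgery",
    "myocardial_infarction",
    "stroke",
})

# Sex-specific low-HDL cutoffs in mg/dL (fasting values).
HDL_CUTOFF = {"male": 40.0, "female": 50.0}
TG_CUTOFF = 150.0  # high triglycerides: fasting value >= 150 mg/dL


def adverse_cardiometabolic_outcome(events, sex, hdl_mg_dl, tg_mg_dl):
    """Return True if any of the six components of the composite outcome is present."""
    if sex not in HDL_CUTOFF:
        raise ValueError(f"unexpected sex code: {sex!r}")
    clinical = bool(CLINICAL_EVENTS & set(events))  # any of components (1)-(4)
    low_hdl = hdl_mg_dl < HDL_CUTOFF[sex]           # component (5)
    high_tg = tg_mg_dl >= TG_CUTOFF                 # component (6)
    return clinical or low_hdl or high_tg


print(adverse_cardiometabolic_outcome([], "female", 45, 120))        # True: HDL < 50 in women
print(adverse_cardiometabolic_outcome([], "male", 45, 120))          # False
print(adverse_cardiometabolic_outcome(["stroke"], "male", 60, 90))   # True
```

Because the six components are combined with a logical OR, a single qualifying component is enough to classify a participant as having the outcome.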
Because the diet variables (kcals, vitamin C, zinc, selenium, vitamin A, beta carotene, folic acid, protein, total fats, saturated fats, monounsaturated fats, polyunsaturated fats, cholesterol, carbohydrates, total sugar, plant-based foods, insoluble, soluble, and total fiber, legumes, and omega-3) were highly correlated, we summarized them using principal component analysis. We normalized the diet variables by centering them and scaling each to unit variance. We then applied singular value decomposition, implemented with NumPy's linear algebra module (numpy.linalg), to the normalized dietary data to identify the top two eigenvectors, which together accounted for 53% of the variance. Finally, we projected each participant's dietary data onto the first two principal components by matrix multiplication with these two eigenvectors.
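The dimension-reduction steps above (standardize, decompose, project) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function name `diet_pca` is hypothetical, the input is assumed to be a complete (no missing values) participants-by-variables array, and the example uses simulated data rather than SLVDS intakes.

```python
import numpy as np


def diet_pca(diet, n_components=2):
    """Project standardized dietary intakes onto the leading principal components.

    diet: (n_participants, n_variables) array with no missing values.
    Returns (scores, explained) where `explained` holds the variance share of every component.
    """
    # Center each variable and scale it to unit variance.
    z = (diet - diet.mean(axis=0)) / diet.std(axis=0)
    # Thin SVD: z = U @ diag(s) @ vt; the rows of vt are eigenvectors of z.T @ z.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # Each component's share of total variance is proportional to its squared singular value.
    explained = s**2 / np.sum(s**2)
    # Scores: matrix product of the standardized data with the top eigenvectors.
    scores = z @ vt[:n_components].T
    return scores, explained


# Simulated example: 500 participants, 21 correlated intake variables driven by 2 latent patterns.
gen = np.random.default_rng(0)
latent = gen.normal(size=(500, 2))
loadings = gen.normal(size=(2, 21))
diet = latent @ loadings + 0.5 * gen.normal(size=(500, 21))
scores, explained = diet_pca(diet)
print(scores.shape)  # (500, 2)
print(explained[:2].sum())  # share of variance captured by the first two components
```

The SVD route avoids forming the covariance matrix explicitly. The sign of each eigenvector is arbitrary, so component scores may be flipped relative to another implementation without changing their interpretation.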
Additionally, anthropometric and clinical measurements were taken at baseline. Researchers measured participants’ height and weight, which were used to calculate body mass index (BMI; obesity defined as BMI >30 kg/m²). Researchers also measured blood pressure three times; the average of the second and third diastolic blood pressure measurements was used in the analysis. We classified participants as having high blood pressure if they self-reported hypertension, self-reported prior or current use of hypertension medication, or were clinically diagnosed with hypertension at the clinic visit. Participants who self-reported current hypertension medication use brought the medication to the clinic visit for confirmation. For consistency with cutoffs in use at the time of SLVDS data collection, diastolic blood pressure was categorized into four levels: (1) normal (<90 mmHg), (2) mild hypertension (90 to <105 mmHg), (3) moderate hypertension (105 to <115 mmHg), or (4) severe hypertension (≥115 mmHg).54
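The blood pressure and BMI derivations above can be sketched in a few lines. This is a minimal illustration, assuming the diastolic cut points are contiguous at 90, 105, and 115 mmHg (the moderate range ending where the severe range begins at 115 mmHg) and that readings are supplied in measurement order; all function names are hypothetical.

```python
def mean_diastolic(readings):
    """Average of the second and third of three diastolic readings (mmHg)."""
    if len(readings) != 3:
        raise ValueError("expected three diastolic readings")
    return (readings[1] + readings[2]) / 2.0


def diastolic_category(dbp_mm_hg):
    """Four-level diastolic classification with contiguous cut points at 90, 105, and 115 mmHg."""
    if dbp_mm_hg < 90:
        return "normal"
    if dbp_mm_hg < 105:
        return "mild"
    if dbp_mm_hg < 115:
        return "moderate"
    return "severe"


def is_obese(weight_kg, height_m):
    """Obesity defined as BMI (kg/m^2) strictly greater than 30."""
    return weight_kg / height_m**2 > 30.0


print(diastolic_category(mean_diastolic([98, 92, 100])))  # mild: mean of 92 and 100 is 96
print(is_obese(100, 1.75))  # True: BMI is about 32.7
```

The first reading is deliberately discarded, matching the rule of averaging only the second and third measurements.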
Statistical analysis
Our analysis was split into two parts: (1) evaluation of the DAG, and (2) application of the DAG in a pilot analysis. For both parts, we mapped each node in the evidence-based DAG to the corresponding variables in the SLVDS data. There were three exceptions: (1) ambient air quality and soil exposure were unmeasured in the SLVDS; (2) we excluded kidney damage due to the potential for reverse causation;55, 56 and (3) we excluded seafood consumption from analysis based on dietary patterns in this population.
In part one of our analysis, we first tested each conditional independence statement implied by the evidence-based DAG using linear (continuous outcomes), logistic (dichotomous outcomes), ordinal logistic (ordinal outcomes), or Bayesian kernel machine regression (BKMR; statements with metal mixtures as an exposure node) models. We considered a conditional independence statement to hold if the regression coefficient was not statistically significant by a Wald test (for linear models) or the equivalent statistic (for other models). For the BKMR models, we compared the predicted mean and standard deviation of the outcome when the metals were held at their 25th versus 75th percentile values using a z-test. The metals included in this analysis were antimony (Sb), As, barium (Ba), Cd, cesium (Cs), chromium (Cr), cobalt (Co), copper (Cu), lead (Pb), Mn, molybdenum (Mo), platinum (Pt), selenium (Se), thallium (Tl), U, W, and zinc (Zn). If the resulting p value was ≥0.05, we did not reject the null hypothesis that the conditional independence statement held. We assessed the proportion of all testable conditional independence statements that were supported by the data. We then repeated this process for 100 DAGs with the same structure but randomly permuted nodes. If the evidence-based DAG captures real structure in the data, a lower proportion of the conditional independence statements implied by the DAGs with randomly permuted nodes should be supported by the data than of those implied by the evidence-based DAG. We assessed the fraction of the 100 DAGs with randomly permuted nodes for which this expectation held.
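The evaluation logic above (derive implied independencies, test each one, then compare against DAGs with permuted node labels) can be sketched as follows. This is a simplified illustration under several assumptions, not the study's code: the implied statements are generated from the local Markov basis (each node is independent of its non-descendants given its parents), which may differ from the basis DAGitty reports; only the continuous-outcome case is shown, using an ordinary least squares fit and a large-sample Wald z-test; `contrast_pvalue` assumes the BKMR comparison is a z-test on the posterior mean and standard deviation of the 75th-versus-25th percentile difference; and all function names and the simulated chain are hypothetical.

```python
import math
import random

import numpy as np

# DAGs are dicts mapping each node to the set of its parents.


def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    children = {v: set() for v in dag}
    for child, parents in dag.items():
        for p in parents:
            children[p].add(child)
    seen, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def local_markov_statements(dag):
    """Statements (v, u, parents(v)) for every u that is not v, a parent, or a descendant of v."""
    statements = []
    for v in sorted(dag):
        parents = dag[v]
        excluded = descendants(dag, v) | parents | {v}
        for u in sorted(set(dag) - excluded):
            statements.append((v, u, tuple(sorted(parents))))
    return statements


def wald_pvalue(data, y, x, given):
    """Two-sided large-sample Wald p-value for x in an OLS fit of y on x plus `given`."""
    columns = [np.ones(len(data[y])), data[x]] + [data[g] for g in given]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, data[y], rcond=None)
    resid = data[y] - design @ beta
    n, k = design.shape
    cov = (resid @ resid / (n - k)) * np.linalg.inv(design.T @ design)
    z = beta[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(z) / math.sqrt(2))


def contrast_pvalue(mean_diff, sd_diff):
    """Two-sided z-test for a posterior contrast, e.g. outcome at 75th minus 25th percentile."""
    return math.erfc(abs(mean_diff / sd_diff) / math.sqrt(2))


def proportion_supported(dag, data, alpha=0.05):
    """Share of implied statements that are not rejected (p >= alpha)."""
    statements = local_markov_statements(dag)
    if not statements:
        return float("nan")
    supported = sum(wald_pvalue(data, v, u, given) >= alpha for v, u, given in statements)
    return supported / len(statements)


def permute_nodes(dag, rng):
    """Same edge structure with node labels randomly reassigned."""
    nodes = sorted(dag)
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))
    return {relabel[v]: {relabel[p] for p in parents} for v, parents in dag.items()}


def share_beating_original(dag, data, n_perm=100, seed=0):
    """Fraction of permuted DAGs with a higher share of supported statements than `dag`."""
    rng = random.Random(seed)
    observed = proportion_supported(dag, data)
    better = sum(proportion_supported(permute_nodes(dag, rng), data) > observed
                 for _ in range(n_perm))
    return better / n_perm


# Toy example: simulated chain A -> B -> C (not SLVDS data).
gen = np.random.default_rng(1)
a = gen.normal(size=2000)
b = a + gen.normal(size=2000)
c = b + gen.normal(size=2000)
data = {"A": a, "B": b, "C": c}
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
print(local_markov_statements(chain))  # [('C', 'A', ('B',))]
print(share_beating_original(chain, data, n_perm=20))
```

Permuting labels while keeping the edge set fixed preserves the number and shape of implied statements, so any difference in support reflects which variables are placed where rather than how dense the graph is.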
In part two of our analysis, we used DAGitty to identify the three minimally sufficient adjustment sets from the evidence-based DAG. The sets were: (1) age, ethnicity, income, obesity, hypertension, alcohol, drinking water, meat intake, smoking, and plant-based food intake; (2) age, ethnicity, income, obesity, hypertension, alcohol, drinking water, meat intake, smoking, and diet; and (3) age, ethnicity, income, obesity, hypertension, alcohol, meat intake, smoking, diet, sex, and education. We estimated BKMR models for the associations between urinary metal concentrations (total As, Cd, Mn, U, and W; comparing the 75th to the 25th percentile) and the likelihood of having a cardiometabolic outcome, fitting separate models that adjusted for each minimally sufficient adjustment set plus urinary creatinine. We conducted two sensitivity analyses with the same three sets of covariates: (1) stratifying the models by smoker status (never versus current/former); and (2) addressing potential collinearity, identified using Spearman correlations, by removing drinking water intake and plant-based food intake from the first two models. All analyses were conducted in R (R Core Team, Vienna, Austria) or Stata v16 (StataCorp, College Station, Texas). In R, we used the bkmr package.19, 57 Figures were developed using DAGitty and ggplot2 in R.27, 58
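DAGitty enumerates minimally sufficient adjustment sets automatically. To illustrate what such a set must satisfy, the sketch below checks whether a candidate set meets Pearl's back-door criterion, testing d-separation through the moralized ancestral graph. It only verifies a given candidate and does not search for minimal sets as DAGitty does; the toy DAG and all function names are hypothetical and are not the evidence-based DAG from this study.

```python
def ancestors(dag, nodes):
    """`nodes` plus every node with a directed path into them (dag maps node -> set of parents)."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def d_separated(dag, xs, ys, zs):
    """d-separation test via the moralized ancestral graph."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    moral = {v: set() for v in keep}
    for v in keep:
        parents = sorted(dag[v])  # parents of a kept node are themselves ancestors
        for i, p in enumerate(parents):
            moral[v].add(p)
            moral[p].add(v)
            for q in parents[i + 1:]:  # "marry" parents that share a child
                moral[p].add(q)
                moral[q].add(p)
    # Delete the conditioning set, then search for any undirected path from xs to ys.
    blocked = set(zs)
    reached = set(xs) - blocked
    stack = list(reached)
    while stack:
        for w in moral[stack.pop()]:
            if w not in blocked and w not in reached:
                reached.add(w)
                stack.append(w)
    return not (reached & set(ys))


def satisfies_backdoor(dag, exposure, outcome, adjust):
    """Back-door criterion check for a candidate adjustment set."""
    adjust = set(adjust)
    # (1) No adjustment variable may be the exposure or one of its descendants.
    if any(exposure in ancestors(dag, {z}) for z in adjust):
        return False
    # (2) After deleting arrows out of the exposure, `adjust` must d-separate exposure and outcome.
    without_out = {v: set(parents) - {exposure} for v, parents in dag.items()}
    return d_separated(without_out, {exposure}, {outcome}, adjust)


# Hypothetical toy DAG (not the evidence-based DAG from this study).
toy = {
    "age": set(),
    "smoking": {"age"},
    "metals": {"smoking"},
    "outcome": {"metals", "smoking", "age"},
}
for candidate in [set(), {"age"}, {"smoking"}, {"age", "smoking"}]:
    print(sorted(candidate), satisfies_backdoor(toy, "metals", "outcome", candidate))
# [] False / ['age'] False / ['smoking'] True / ['age', 'smoking'] True
```

In this toy graph, {"smoking"} is sufficient and minimal (its only proper subset, the empty set, fails), whereas {"age", "smoking"} is sufficient but not minimal. Checking minimality in general means confirming that no proper subset also passes.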
Results
DAG development
We identified 29 articles that met the criteria for inclusion in the systematic literature review (Figure 1). These articles included eight meta-analyses, 16 literature or systematic reviews, three multi-centric cross-sectional studies, one multi-centric cohort study, and one multi-centric case-cohort study. The most commonly included metals were As (n = 13) and Cd (n = 13; Table 1). The evidence-based DAG illustrating the putative causal structure relating metal mixture exposures to cardiometabolic outcomes is presented in Figure 2. Secondary DAGs illustrating the putative causal structure relating individual metals and cardiometabolic outcomes are presented in Supplemental Figures 1-5.
SLVDS sample description
In the SLVDS, there were 1797 participants; however, the sample size for each analysis varied by the availability of data for the variables included in the specific model. Of all participants with metals measurements (n = 1609), 53% were female and 48% were Hispanic (Table 2). The mean age was 54 years (standard deviation = 12 years), and 67% had one of the six cardiometabolic outcomes at first follow-up. As shown in Table 3, the urinary metal concentrations in the SLVDS participants were higher than those reported in the 1988-1994 and 2015-2016 National Health and Nutrition Examination Surveys (NHANES).
DAG assessment
In total, the evidence-based DAG implied 664 conditional independence statements. Based on data availability, 382 of these statements were testable, and 71% of the testable statements were supported by the SLVDS data (p≥0.05). The percentage of statements supported by the data depended on whether metal mixtures were a node in the statement: 67% of the 291 statements without metal mixtures and 85% of the 91 statements with metal mixtures were supported. Of the 100 DAGs with randomly permuted nodes, only three had >71% of conditional independence statements supported by the data (Supplemental Figure 6).
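As an approximate consistency check on these rounded percentages, the overall share of supported statements is the weighted average of the two subgroup shares:

```latex
\frac{0.67 \times 291 + 0.85 \times 91}{382}
  = \frac{194.97 + 77.35}{382}
  = \frac{272.32}{382}
  \approx 0.713
```

This agrees with the reported 71%, and 291 + 91 = 382 matches the total number of testable statements.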
Statements involving some nodes were almost always supported by the data. For example, only one conditional independence statement involving cardiometabolic outcomes was rejected (p<0.05): the statement that cardiometabolic outcomes are conditionally independent of employment status given age, education, income, physical activity, ethnicity, obesity, marital status, sex, and diet. All other statements involving cardiometabolic outcomes had p≥0.05 and were therefore consistent with the data. Similarly, 20 of the 25 conditional independence statements involving alcohol intake had p≥0.05, indicating that these statements were supported by the SLVDS data. Other nodes were less well represented by the data: of the 18 conditional independence statements involving employment status, over half (56%) were rejected (p<0.05). A detailed list of all conditional independence statements and the corresponding p-values can be found in Appendix B.
DAG application – pilot analysis
Using any of the three minimally sufficient adjustment sets of covariates identified by the evidence-based DAG, we did not observe strong evidence of an association between metal mixture exposures and the likelihood of having an adverse cardiometabolic outcome (Figure 3a-c). In each of the three models, the strongest positive association between any individual metal within the mixture and having a cardiometabolic outcome was observed for Cd. There was no indication of interaction among any metals within the mixture, and the results did not depend on the set of covariates (Supplemental Figure 7a-c). In sensitivity analyses stratified by smoker status, results remained relatively unchanged; however, the model using the first set of covariates for never smokers was less stable than all other models (Supplemental Figures 8a-c and 9a-c). In sensitivity analyses excluding drinking water intake and plant-based food intake from the relevant sets of covariates, results also remained largely unchanged (Supplemental Figure 10a-b).
Discussion
We developed, tested, and applied an evidence-based DAG illustrating the putative causal structure underlying associations between exposure to metal mixtures and cardiometabolic health. Our approach to developing the DAG using a systematic literature review was consistent with many of the tenets of DAG development discussed elsewhere.26, 59 Our quantitative assessment of the DAG structure suggested that the evidence-based DAG was a reasonable representation of relationships among variables in a real data set. Furthermore, as demonstrated through our pilot analysis, environmental epidemiologists can apply our evidence-based DAG to transparently, reproducibly, and efficiently investigate longitudinal associations between complex environmental exposures and cardiometabolic health outcomes, adjusting for one of three identified sets of covariates.
Through the systematic literature review process, we identified several critical gaps in the literature that could be more effectively and efficiently addressed using our evidence-based DAG and analytic approach (e.g., assessing metal mixtures in relation to incident cardiometabolic health outcomes). Evidence-based DAGs can guide the specification and interpretation of longitudinal models – even in complex metal mixture exposure scenarios.2, 12, 14, 31 Insights into potential biological mechanisms can be derived from evidence-based DAGs, and these DAGs can be used to develop quantitative assessments of mechanistic hypotheses within observational studies.5, 30, 31, 37, 45, 60 Additionally, future work could investigate dose-response relationships between metal mixtures and health outcomes using the BKMR analytic approach we applied.6, 9, 10
Beyond these challenges of multipollutant epidemiology that can be partially addressed through the development and application of evidence-based DAGs, we also identified several issues through our literature review that are unlikely to be handled by DAGs. DAGs can inform study design, exposure assessment priorities, and analytic model specification, but DAGs alone cannot fix fundamental problems with data collection (e.g., measurement error) or challenges of modeling environmental exposures over the life course. For example, several papers in our literature review described the use of urinary metal concentrations as exposure measures as a limitation (notably one that is also present in our pilot analysis) because urinary concentrations do not necessarily reflect total lifetime exposure, or even average exposure over an extended period, depending on the metal of interest and renal function.29, 39, 40
Our systematic literature review suggested several mechanisms through which exposure to individual metals could potentially affect cardiometabolic health outcomes; however, future work is needed to understand how metals may interact physically and/or chemically to affect health. For individual metals, much of the literature focuses on how arsenic may be associated with cardiometabolic health through mechanisms mediated by epigenetic changes,29 and through associations with immune function, endothelial dysfunction, and oxidative stress.33, 61 Similarly, there is extensive literature relating cadmium exposure to inflammatory biomarkers, kidney toxicity, endothelial dysfunction, and oxidative stress.38, 42 These types of hypotheses could be further explored using the secondary DAGs we developed for the putative relationships between individual metals and cardiometabolic health outcomes (Supplemental Figures 1-5).
Applying the evidence-based DAG we developed to epidemiological analyses will give environmental epidemiologists a defensible, reproducible method to identify and adjust for confounding. To the extent that new research is published challenging or adding to the DAG we show here, we encourage researchers to incorporate the new knowledge and update their covariate sets; indeed, a prime advantage of using DAGs is that they support a scientific conversation about explicit modeling assumptions.25 For example, others may wish to include hypertension as a cardiometabolic outcome (rather than as a potential confounder). Nevertheless, the remarkably similar results observed in our pilot analysis using each of the three minimally sufficient adjustment sets, together with the DAG assessment results using the DAGs with randomly permuted nodes, suggest that our DAG development process was robust and that the DAG is compatible with real-world data.62
The null results of the pilot analysis may be attributable to several factors. First, the null overall trend may reflect an averaging of positive and negative effects of components within the mixture. For example, As has been positively associated with hypertension, whereas Mn has been negatively associated with hypertension.63–65 Second, we used total arsenic rather than speciated arsenic, and certain inorganic arsenic species have been found to be more toxic than other inorganic and organic arsenic species.33 Third, we did not have data on several nodes in the DAG (i.e., soil and ambient air quality). However, soil is more of a risk to children due to higher absorption rates and smaller body sizes,66, 67 and exposure through ambient air is less common in rural areas with limited industrial activity.68 Fourth, we may have over-adjusted or introduced co-exposure bias amplification when using BKMR; however, our sensitivity analyses suggest that the impact of these issues was minimal. Fifth, our decision to combine several types of cardiometabolic health outcomes (including subclinical and clinical outcomes) may have limited our ability to detect true associations with more specific outcomes. Additionally, there are several mechanisms through which individual metals can be associated with specific cardiometabolic outcomes, and these could not be assessed in this pilot analysis. Beyond these potential issues, our pilot analysis has further limitations, including high urinary metal concentrations in the study population compared with the U.S. population, the lack of diversity in our cohort beyond non-Hispanic and Hispanic White individuals, and the limited transportability of our results to non-rural populations.
Although the pilot analysis results are not transportable outside of populations similar to our target population, the results from our extensive literature review and the evidence-based DAG could be applicable to other cohorts.
Through our systematic literature review incorporating evidence from multiple search engines, we were able to construct an evidence-based DAG to inform covariate selection and/or study design for future longitudinal analyses of the cardiometabolic health effects of exposure to metal mixtures. We also demonstrated an approach to explicitly state and test assumptions through our application of the evidence-based DAG to the rich SLVDS data set. We encourage other environmental epidemiologists to develop and use such tools to increase the scientific rigor, transparency, and reproducibility of their work.
Data Availability
The San Luis Valley Diabetes Study (SLVDS) summary data can be shared with the scientific community in accordance with the SLVDS data sharing plan (which includes Data Use Agreements and anonymizing of data to protect subject confidentiality). Analytic code is available upon reasonable request from the corresponding author.
Funding
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD) grant number K12HD092535 (Corlin), Tufts University Department of Public Health and Community Medicine (Corlin and Riseberg), Tufts Institute of the Environment (Riseberg), R00ES027853 (Alderete)
Author disclosures
The authors report no conflicts of interest.
Acknowledgements
We would like to thank the participants and staff of the San Luis Valley Diabetes Study.
All the intellectual property and data generated were administered according to policies from the University of Colorado and the NIH, including the NIH Data Sharing Policy and Implementation Guidance of March 5, 2003.
Abbreviations
- As: arsenic
- BKMR: Bayesian kernel machine regression
- BMI: body mass index
- CAD: coronary artery disease
- Cd: cadmium
- CHD: coronary heart disease
- CVD: cardiovascular disease
- DAG: directed acyclic graph
- iAs: inorganic arsenic
- Mn: manganese
- NHANES: National Health and Nutrition Examination Surveys
- PAD: peripheral artery disease
- SLVDS: San Luis Valley Diabetes Study
- T2DM: type 2 diabetes mellitus
- U: uranium
- W: tungsten