Abstract
Studies suggest that adiposity in childhood may reduce the risk of breast cancer in later life. The biological mechanism underlying this effect is unclear but is likely to be independent of body size in adulthood. Using a Mendelian randomization framework, we investigated 18 hypothesised mediators of the protective effect of childhood adiposity on later-life breast cancer, including hormonal, reproductive, physical, and glycaemic traits.
Our results indicate that, while most of the hypothesised mediators are affected by childhood body size, only IGF-1, testosterone, age at menarche and age at menopause influenced breast cancer risk. However, accounting for those traits in multivariable Mendelian randomization showed that the protective effect of childhood body size still remained. This suggests either a direct effect of childhood body size on breast cancer risk or mediation via other pathways not considered.
Our work presents a framework for the systematic exploration of potential biological mediators of disease in Mendelian randomization analysis.
Introduction
Breast cancer is the leading cause of cancer-related deaths among women, with 1 in 8 at risk of developing the disease in high-income countries [1], [2]. Although breast cancer mortality has declined over recent decades, mostly due to improved treatments and personalised diagnoses [3], the incidence of the disease has been steadily increasing by 3.1% annually since the 1980s [4]. Reducing the incidence rates will depend on better understanding and communicating modifiable risk factors both population-wide and targeted at women with an increased risk [5].
The second-largest preventable risk to all cancers after smoking is obesity [6], which has been extensively studied in relation to breast cancer. In observational studies, there is consistent evidence for a positive association of increased body mass index (BMI) with post-menopausal breast cancer, but an inverse association with pre-menopausal breast cancer [5], [6]. This may be explained by varying levels of exposure to oestrogen in overweight compared with normal-weight women. Pre-menopause, overweight women have longer anovulatory cycles, thereby decreasing their exposure to ovarian hormones which has been suggested to reduce their breast cancer risk [9]. Post-menopause, adipose tissue is the main source of oestrogen biosynthesis [10]. This increases exposure to oestrogen in overweight compared to normal-weight women, which may explain their higher risk of breast cancer after menopause.
While previous conclusions have largely been drawn from observational epidemiological studies [11]–[13], several Mendelian randomization (MR) analyses have shown a contrary result, where genetically-instrumented BMI is inversely associated with the risk of both pre- and post-menopausal breast cancer [14], [15]. Since the genetic variants used to instrument BMI are set at birth, they should not be affected by environmental factors in later life. Thus, Guo et al [14] hypothesised that the positive association between high BMI and post-menopausal breast cancer risk seen observationally may reflect adiposity and weight gain later in life. This is supported by findings of an inverse association between early-life (childhood and adolescence) BMI with breast cancer risk [11], [12]. A recent MR study [16] using genetic variants related to childhood body size showed the same protective effect on breast cancer risk. Furthermore, using a multivariable MR analysis that accounted for adult body size, this study indicated that the protective effect of childhood body size influences breast cancer risk directly, independently of adult body size. However, the mechanism by which larger body size during childhood may reduce future breast cancer risk is not understood. Deciphering the mediating pathway between early-life adiposity and breast cancer would be of great interest for identifying targets of intervention, since advocating weight gain in childhood is not recommended.
Many established breast cancer risk factors are plausible candidate mediators of the protective effect of early-life body size, since they have been found to be influenced or associated with childhood adiposity (Figure 1). For example, body fatness during childhood may protect against breast cancer risk through hormonal pathways (e.g. IGF-1, oestradiol, testosterone, SHBG [17], [18]). It has also been shown that incidence of breast cancer, particularly oestrogen receptor-positive (ER+) breast cancer, is substantially driven by changes in reproductive patterns, including parity and age at menarche, first birth and menopause, [19], [20]), events that are also influenced by increased adiposity in childhood [21], [22]. Physical traits such as breast mammographic density (MD), which is an established risk factor of breast cancer [23], are also affected by body fatness in early life [24], [25]. Finally, glycaemic traits have been extensively studied in relation to BMI and, while they have been inconsistently associated with breast cancer [26]–[28], are also of interest as potential intermediates.
In this work, we aimed to decipher the link between increased childhood body size and breast cancer risk by assessing a variety of potential mediators of this effect within a Mendelian randomization framework [29], [30]. We characterised the effects of four groups of mediators that may be influenced by childhood body size to affect breast cancer risk: sex hormones, reproductive traits, glycaemic traits, and physical traits. This was done using several extensions of the basic MR principle, including two-sample MR [31], two-step MR [32], and multivariable MR [28], with results then integrated within a mediation framework.
Results
Study overview
Using genome-wide association study (GWAS) summary statistics for childhood body size [16], breast cancer [51] and 18 reproductive [36], [38], [39], glycaemic [40]–[45], physical [15], [48]–[50] and hormonal traits, we set up an MR framework to investigate the role of these traits as plausible mediators of the childhood body size effect on breast cancer risk in later life. Firstly, we performed two-step MR to identify traits influenced by childhood body size that have a causal effect on breast cancer (Figure 2). Subsequently, traits were assessed in a multivariable MR analysis to assess the magnitude of their direct effect on breast cancer risk, followed by a mediation analysis to quantify the indirect effect. The main exposure and outcome datasets used in the analysis comprised 246K female participants from UK Biobank (childhood body size) and 228K participants from Breast Cancer Association Consortium, BCAC, (breast cancer cases and controls) respectively. An overview of the GWAS datasets used as mediators is presented in Table 1.
Two-step Mendelian randomization
For each mediator, we conducted two-step MR, where each step is an independent univariable two-sample MR analysis. In step 1, childhood body size was used as the exposure and a mediator trait as the outcome; in step 2, the mediator was used as the exposure and breast cancer as the outcome (Figure 3B). This analysis provided insights into the presence of a causal effect on/from the mediators and was used as a mediator prioritisation step. The results obtained using the IVW (inverse-variance weighted) method are presented in Figure 4.
Among the hormones investigated, evidence of an effect in both MR steps was observed for IGF-1 and free and bioavailable testosterone (Figure 4). Increased childhood body size was associated with a reduction in IGF-1 (effect size per standard deviation, -0.24, 95% confidence interval: [-0.33: -0.15]), while IGF-1 had a positive effect on breast cancer risk (odds ratio per standard deviation, OR, 1.08 [1.03: 1.15]). Bioavailable and free testosterone estimates were similarly affected by childhood body size (0.09 [0.02: 0.16] and 0.12 [0.05: 0.18], respectively), and also had a similar positive effect on breast cancer risk (OR: 1.12 [1.05: 1.2] and OR: 1.14 [1.06: 1.22], respectively). Total testosterone had a positive effect on breast cancer (OR: 1.15 [1.06: 1.24], however, there was little evidence to suggest it was affected by childhood body size (−0.01 [-0.07: 0.04]). Oestradiol (−0.08 [-0.15: -0.02]) and SHBG (−0.19 [-0.29: -0.08]) were both inversely affected by increased childhood body size, but there was limited evidence of them affecting the risk of breast cancer (OR: 0.96 [0.68: 1.34] and OR: 0.97 [0.91: 1.04], respectively).
Analysis of reproductive traits showed the greatest effect of high childhood body size on age at menarche (−0.79 [-0.95: -0.64], effect size per standard deviation, 95% CIs), as well as an effect on age at first birth (−0.09 [-0.16: -0.03]). There was little evidence that age at menopause and number of births were affected by increased childhood body size (0.02 [-0.34: 0.38] and -0.01 [-0.07: 0.05], respectively). Age at menopause was found to have a positive effect on breast cancer (OR: 1.05 [1.03: 1.07]). The OR point estimates of other reproductive traits were inverse, but overall, there was little evidence of their effect on breast cancer (age at menarche – OR: 0.98 [0.92: 1.05], age at first birth – OR: 0.92 [0.79: 1.07], number of births – OR: 0.7 [0.44: 1.11]). The estimates for age at menarche and age at menopause from the other data source were in agreement with the results in Figure 4 (Supplementary Tables S2 and S3).
The tested glycaemic traits generally had strong evidence of being positively affected by increased childhood body size, although there were some inconsistent results using different data sources for the same traits (e.g. fasting insulin, see Supplementary Table S2). The estimated effects were – fasting insulin: 0.16 [0.08: 0.24], fasting glucose: 0.05 [-0.01: 0.12], Hba1c: 0.07 [0.04: 0.11], HOMA-B: 0.1 [0.05: 0.15]. However, none of the glycaemic traits had a substantial effect on breast cancer risk (fasting insulin OR: 1.00 [0.58:1.72], fasting glucose OR: 1.03 [0.85: 1.25], Hba1c OR: 1.02 [0.74: 1.4], HOMA-B OR: 1.06 [0.78: 1.45].
For the physical traits, we were only able to perform the second step of the MR analysis (i.e. the effect of the trait on breast cancer) since we did not have access to the full GWAS summary data. There was limited evidence for an effect from breast size on breast cancer risk (OR: 1.11 [0.72: 1.71]), as previously shown by Nick Sern Ooi et al. (2019) using the same data. Among the mammographic density (MD) phenotypes, the dense area of the breast had a positive effect (OR: 1.39 [1.11: 1.73]) and non-dense area had a negative effect (OR: 0.65 [0.46: 0.92]) on breast cancer risk. There was also a positive effect of per cent MD, although with wider confidence intervals (OR: 1.43 [0.97: 2.12]).
We also repeated the second step MR analyses using breast cancer data stratified into oestrogen receptor-positive (ER+) and negative (ER-) groups [51] (Supplementary Tables S9 and S10). Overall, similar findings were identified for the ER+ cases across all mediators, with stronger effects for MD phenotypes. None of the investigated hormones that had an effect on overall breast cancer risk showed evidence of an effect on ER-cases. Among the reproductive traits, the direction of effect switched from inverse to positive for age at menarche, whereas the estimate for age at menopause shifted closer to the null for ER-cases. No strong effects of the glycaemic traits were found on the ER-cases, consistent with ER+ and total breast cancer sample.
Multivariable Mendelian randomization
We next performed MVMR analyses with childhood body size and each mediator in turn, in relation to breast cancer risk. This allowed us to establish the direct effect of childhood body size on breast cancer risk after accounting for each mediator (Figure 3C). MVMR was performed only on those mediators that showed the evidence of effect in at least one of the two-step MR steps and where the sensitivity analyses of both steps showed consistent results (see Sensitivity analysis and prioritisation logic in Supplementary Materials, Section A).
Direct effects of childhood body size
The direct effect of childhood body size on breast cancer after accounting for each mediator estimated using the IVW-MVMR method are presented in Figure 5 (values in Supplementary Table S4). Compared with the total effect of childhood body size on breast cancer from univariable MR, which had an of OR 0.66 [0.57, 0.76]), the direct effects (ORs) of childhood body size varied between 0.59 (age at menarche) and 0.67 (total testosterone). This indicates no substantial change in the effect of childhood body size on breast cancer after accounting for each of the potential mediators, with effects consistent between overall, ER+ and ER-breast cancer (Supplementary Tables S11-S12).
We also performed analyses with adult body size and childhood height as third exposures in MVMR (in addition to childhood body size and the mediator) in two separate analyses. Again, no substantial change in the direct effect of childhood body size was not observed in the presence of the additional exposures (Supplementary Tables S13 and S14).
Direct effects of mediators
Overall, the direct effects of mediators on breast cancer risk were similar to their total effects once accounted for childhood body size (Supplementary Figure S2, values in Supplementary Table S4). Among the hormones investigated, the direction and size of effect remained the same (+/-0.01 OR) for all measures (IGF-1, SHBG, and the three measures of testosterone). Among the reproductive traits, MVMR indicated a protective direct effect of increasing age at menarche once accounted for childhood body size (OR: 0.92 [0.86: 0.99]), and the effect was even stronger using the alternative age at menarche GWAS source (OR: 0.82 [0.74: 0.91], Supplementary Table S4). The positive effect of age at menopause remained consistent with univariable MR results (OR: 1.05 [1.03: 1.07]). The point estimates of Hba1c shifted from the null, but wide confidence intervals still indicate little evidence of the effect (OR: 1.06 [0.83:1.35]). Overall, similar findings to overall breast cancer were identified for the ER+ cases across all mediators. However, in the ER-cases, similar to trends observed in step 2 in two-step MR, there was limited evidence for an effect of the mediators, with the exception of a protective effect of age at first birth in relation to ER-cases (OR: 0.72 [0.56: 0.94] compared with ER+ OR: 0.96 [0.79: 1.17]) (Supplementary Tables S11 and S12).
In the MVMR analyses also accounting for adult body size effect, the mediator estimates were not significantly affected. However, accounting for childhood height reduced the direct estimates for most mediators and attenuated IGF-1 and age at menarche effects to overlap the null (Figure S3 in Supplementary Materials Section C and Tables S13 and S14).
Sensitivity analysis
To investigate the potential violations of the MR assumptions and validate the robustness of the two-sample MR results from the IVW approach, we performed additional MR analyses using MR-Egger [52] and weighted median [53] approaches, both of which are more robust to pleiotropy. The Egger intercept was used to formally test for the presence of horizontal pleiotropy, and Cochran’s Q statistic was calculated to quantify the extent of heterogeneity among SNPs. The sensitivity analyses of MVMR included tests for instrument strength and horizontal pleiotropy. Performance in the sensitivity tests was used as a selection tool for mediator inclusion in the downstream analyses (Supplementary Materials, Section A).
The estimated effects for hormones and reproductive traits were consistent across sensitivity analyses. The results obtained for some of the glycaemic traits, however, were variable and should be interpreted with caution. For all pairs of childhood body size and a mediator which were prioritised for MVMR analysis, conditional F-statistics were > 10, indicating that weak instrument bias is unlikely to be present. The presence of directional pleiotropy was assessed by estimating QA statistics, which consistently indicated excess heterogeneity and so the potential for pleiotropy. The sensitivity analysis details for all mediator groups are available in Supplementary Materials (Section B), and Tables S6-S8.
Lastly, the violation of two-sample MR requirement of having two non-overlapping datasets for exposure and outcome traits, i.e. “winner’s curse”, which was present for hormone traits (step 1 of two-step MR), was addressed with the ‘split-sample’ approach (see Methods). Results of this analysis were consistent with the full sample analysis (Supplementary Materials Section D and Tables S15 and S16).
Mediation analysis
Next, we carried out mediation analysis to estimate the indirect effect of childhood body size on breast cancer risk via selected mediators, using the effect estimates from two-step MR, MVMR, and the total effect (Figure 3A, Supplementary Table S1). This analysis was restricted to mediators that showed evidence of an effect in MVMR and that had substantial instrument strength (Supplementary Materials, Section A).
Using a simulation analysis to compare available mediation methods (Supplementary Materials, Section E), the Product method was chosen with reasonably high confidence as the main mediation analysis approach, with the Delta method as the corresponding SE/CI estimation technique. The indirect effect results are displayed in Figure 6 and the estimates are available in Supplementary Table S17.
The indirect effects were modest, although there was some evidence for an inverse indirect effect via IGF-1 (−0.016 [-0.032: -0.003]), suggesting potential mediation via this trait. The estimated proportion of the mediated effect via IGF-1 was 0.039, indicating that any potential mediation via IGF-1 would only account for 3.9% of the total effect. Conversely, the alternative mediation approach (Difference method) estimated the IGF-1 indirect effect to be positive (described in Supplementary Materials, Section E). Thus, the observed negative effect of IGF-1 requires further investigation. Mediation analysis also showed a positive indirect effect of childhood body size via age at menarche (0.065 [0.007: 0.126]) and via free and bioavailable testosterone (0.015 [0.005: 0.03] and 0.011 [0.002: 0.023] respectively, in contrast to the negative total effect of childhood body size on breast cancer.
Discussion
Previous observational and MR studies indicate that early life body size has a protective effect on risk of breast cancer [11], [12], [16]. In this study, we investigated whether a number of breast cancer risk factors served as potential mediators of this protective effect using large genome-wide association datasets and a series of Mendelian randomization methods. Using two-step MR, we identified IGF-1, SHBG, testosterone, Hb1ac, age at menarche and age at menopause as plausible mediators based on the effect of childhood body size on these traits, and their effect on breast cancer risk. However, when applied in a multivariable MR framework, none of these traits appeared to substantially mediate the protective effect of early life body size on breast cancer risk.
The results of our MR analysis of selected hormones on breast cancer are supported by recent MR studies, with similar effects observed for IGF-1 in Murphy et al (2020) [54] and SHBG and testosterone in Ruth et al (2020) [55]. Conversely, the negative association of SHBG with breast cancer risk adjusted for BMI observed in Dimou et al (2019) [56] had limited evidence of effect in our study, likely due to sex-specific analysis and differences in sample size. The effects observed for IGF-1 in two-step MR is in agreement with observational studies [17], [57]. IGF-1 is known to play an important role in breast tissue differentiation and mammary gland function [58], and during normal development, levels of IGF-1 gradually rise from birth to puberty, followed by a decrease in response to growth hormones [59]. The finding of lower adult IGF-1 in response to larger body size during childhood may be a part of the mechanism through which early life adiposity influences breast cancer risk. Our mediation analysis showed that childhood body size may have a negative effect on breast cancer indirectly via IGF-1. However, this result requires further investigation as the estimated indirect effect was relatively small (accounting for 3.9% of the total effect), and inconsistent across mediation methods. While there was some evidence for indirect effects of childhood body size via age at menarche, as previously reported in [16], [60], and testosterone levels, this was in the positive direction in opposition to the total inverse effect of childhood body size on breast cancer. Lastly, it was not feasible to confidently assess the effect of oestradiol on breast cancer in MR analyses due to the limited number of genetic instruments in the available data, but as oestradiol has been observationally associated with breast cancer [18], its mediating role remains of interest.
We also reviewed the effects in ER- and ER+ cancer samples, which can proxy for younger/older or pre-/post-menopausal women, respectively. The observed causal effects for hormones were maintained in the ER+ samples, but no effect was observed in the ER-samples. Some differences were also observed for reproductive traits, for example, age at first birth had a direct effect only on the ER-cases (younger women). Additionally, there was a null effect of age at menopause and increased number of births among ER-breast cancer cases, which is reasonable since those exposures are less prevalent in younger women who typically present with ER-breast tumours.
Although we used the largest GWAS available for each mediator trait of interest, many of these data sources possess specific limitations that may have prevented us from identifying the mediated effect. When investigating hormones as outcomes, we used data from UK Biobank, which is the same sample as our main exposure (childhood body size). To assess whether sample overlap could have potentially led to “winner’s curse” bias in step 1 of MR, a split-sample analysis was performed. Results of non-overlapping samples analysis were similar to the full sample analysis, suggesting that using the same data source for both exposure and outcome had little impact on our findings. Another limitation related to the hormones measures is that these were quantified in a predominantly post-menopausal sample of women (average age in UK Biobank is 56 years), where sex hormone levels are considerably different to those before menopause [61].
For the reproductive traits, we prioritised non-UK Biobank datasets in the main analysis to minimise the problem of sample overlap. While these datasets typically were from smaller sample sizes (and therefore fewer instruments), the directions of observed effects were consistent with the analyses performed on UK Biobank data (Supplementary Table S2).
The most inconclusive results were observed for glycaemic traits, likely due to smaller samples sizes and mixed-sex samples within these data sources. For traits with multiple available data sources, we prioritised those containing female-only participants [62], which reduced the power of the analyses but showed more relevant effects than in the mixed-sex analyses (Supplementary Tables S2 and S3).
The unavailability of full summary data for the physical traits of interest is a major constraint in this study. Since only the top GWAS hits from mammographic density studies by Sieh et al (2020) [48], Brand et al (2018) [50] and Lindström et al (2014) [49] were available, we were unable to estimate the effect of childhood body size on these traits and the extent to which they could mediate the relationship with breast cancer. High MD is a major breast cancer risk factor and, importantly, is therapeutically modifiable [63]. Moreover, higher adiposity in childhood and adolescence has been associated with lower MD throughout adulthood [24]. In light of several recent studies [64], [65] suggesting a plausible role of MD in the mediation of the protective effect of childhood body size on breast cancer risk, applying our MR framework to these datasets is an important aim for the future. We were also unable to perform the full analysis on breast size data. However, a previous study using MVMR showed that breast size is an unlikely to be a mediator of BMI effect on breast cancer risk Nick Sern Ooi et al (2019) [15].
While most of our analysis focused on mediators measured in adulthood, assessing mediators measured earlier in life would be useful for exploring the life-course effects of childhood body size. Investigating how childhood body size score influences plausible mediators over time, including an assessment of its effects on other anthropometric measures such as growth, changes in body composition and fat distribution [66], would provide another critical step to improve mechanistic understanding of its protective effect on breast cancer risk.
Another important point to raise is the gene-environment equivalence assumption in MR, i.e. that if childhood body size is influenced genetically or environmentally this will have the same effect on the outcome [67]. It is necessary to consider whether childhood adiposity produced by environmental/lifestyle factors can reduce the risk of breast cancer in the same way as has been estimated using genetic variants that affect body size in early life.
It is also important to mention that current MR methods for meditation analysis assume linear associations. However, it is possible that the effects of childhood body size and mediators are non-linear, which could lead to an apparent lack of mediation despite the presence of the true meditating effect. Additionally, two-step MR and MVMR assume no interaction between the exposure and the mediator on the outcome. When assumptions of linearity and no interaction are not satisfied, the magnitude of the estimated effect may be affected [68].
In summary, here we systematically reviewed a set of potential mediators for the observed protective effect of increased childhood body size on breast cancer risk. Individually, none of the tested traits were found to strongly mediate this effect. However, it is plausible that several related traits may collectively contribute to the mediated effect, which could be explored in multi-mediator MVMR analyses in future studies. It would also be interesting to explore mediation effects on breast cancer experienced pre- and post-menopause, and ultimately by molecular subtypes of the disease. Mediation may also occur via a pathway that we have not considered in our study, or via mammographic density, which could not be fully explored in this study due to the lack of available data. Finally, future work could explore proximal molecular mediators (e.g. breast tissue gene expression and methylation) to determine if early-life and adult adiposity have different effects on breast biology, which would be a critical step in deciphering the protective effect investigated in this work.
Our systematic investigation of mediators was designed as a prioritisation workflow, in which the initial MR analysis results (two-step MR) were used to select mediators for more advanced analyses (MVMR, mediation) based on the presence of causal effects and adequate performance in sensitivity analyses. Although we failed to identify a sole plausible mediator, we systematically report the MR results for the majority of obvious candidates from the largest available GWAS datasets to our knowledge. As new GWAS data becomes available, a similar approach can be applied, or used for investigating other biological questions (e.g. as shown in a recent study [69], aiming to identify the mediators of height effect on coronary artery disease). While we adopted a hypothesis-driven approach to investigate potential mediators, in future work, data mining platforms such as EpiGraphDB (epigraphdb.org) [70] may be used to facilitate the identification of novel mediator traits/biomarkers, or candidates for multi-mediator MVMR analyses.
Methods
Data Sources
UK Biobank [71] was the sources for the main exposure GWAS (childhood body size) and the Breast Cancer Association Consortium (BCAC) [51] was the main source for outcome GWAS (breast cancer) used in this work. The sources of the mediator trait GWAS are summarised in Table 1.
UK Biobank is a population-based health research resource consisting of approximately 500,000 people, aged between 38 years and 73 years, who were recruited between the years 2006 and 2010 from across the UK. A full description of the study design, participants and quality control (QC) methods have been described in detail previously [71]. The GWAS of childhood body size and adult body size used in this study were performed by Richardson et al (2020) [16] on UK Biobank data (N=246,511; female only).
BCAC breast cancer GWAS includes 228,951 samples (122,977 cases and 105,974 controls) of European ancestry. The cases include both oestrogen receptor-positive (N=69,501) and oestrogen receptor-negative (N=21,468) participants [51]. The details of cohort design and genotyping protocol are described elsewhere (bcac.ccge.medschl.cam.ac.uk, ccge.medschl.cam.ac.uk/research/consortia/icogs/). The BCAC data was accessed through OpenGWAS [36] (gwas.mrcieu.ac.uk) under the following IDs: ‘ieu-a-1126’ (full sample), ‘ieu-a-1127’ (ER+), ‘ieu-a-1128’ (ER-).
Description of selected traits
Childhood body size is a categorical trait describing body size at age 10, with three categories (‘thinner than average’, ‘about average’, ‘plumper than average’), from a questionnaire completed by adult participants of UK Biobank. Adult body size measure was converted from continuous adult BMI in UK biobank into three groups based on the proportions of childhood body size data to ensure that the GWAS results of both measures are comparable [16]. The genetic scores for childhood and adult body size were independently validated in two separate cohorts (HUNT Study (Norway) [72] and Young Finns Study [73]), which confirmed that the genetic instruments extracted by Richardson et al (2020) [16] can reliably separate childhood and adult body size as distinct exposures.
Four groups of mediators were assessed: sex hormones, female reproductive traits, glycaemic traits, and physical traits.
Hormones: IGF-1 (Insulin-like growth factor 1), SHBG (sex hormone-binding globulin), oestradiol, testosterone (free, bioavailable, total). The different measures of testosterone were estimated as described in [55].
Reproductive traits: age at first birth, age at menarche (x2), number of births, age at menopause (x2).
Glycaemic traits: fasting insulin (x3), fasting glucose (x3), HbA1c (glycated haemoglobin A1c) (x2), HOMA-B (Homeostatic Model Assessment of β-cell function). HOMA-IR (insulin resistance) was considered in the analysis too, however, no robust instruments were identified.
Physical traits: breast size (x3), mammographic density (MD) (percent, dense/non-dense area) (x3). Full summary data was not available for breast size and MD, so only a small number of top GWAS hits were used in the analysis.
Several traits were available from multiple data sources (marked with xN). In the early stages of the analysis, all of them were evaluated, but only one version is presented in the final set of results. To prioritise a particular dataset over the rest we used the following criteria: 1) female-only sample, 2) non-UK Biobank data, 3) sample size. Table 1 shows the full set of tested datasets, highlighting the final selection with an asterisk.
IEU GWAS pipeline
The GWAS of hormone mediators from UK Biobank were performed using the MRC IEU GWAS pipeline [35] which is based on BOLT-LMM (v2.3) [74] linear mixed model and an additive genetic model adjusted for sex, genotyping array, and 10 genetic principal components. The data were inverse rank normalised prior to the analysis; the results are quantified as standard deviation change.
Mendelian randomization
Mendelian randomization (MR) is a type of instrumental variable (IV) analysis where genetic variants are used as proxies to uncover the causal relationship between a modifiable health exposure and a disease outcome [29]. There are three core assumptions that genetic variants need to satisfy to qualify as valid instruments for the causal inference: (1) variants have to be reliably associated with exposure of interest, (2) there cannot be any association with confounders affecting the exposure-outcome relationship, and (3) variants cannot be independently associated with the outcome, via pathway other than through the exposure (i.e. horizontal pleiotropy) [75].
The analyses in this work were performed using the two-sample (univariable) MR approach (Figure 3A), which relies on using GWAS summary statistics of two non-overlapping samples for exposure (sample 1) and outcome (sample 2) [76]. Two-sample MR is the basis for the more advanced analysis set-up described in the next sections.
Two-sample MR analyses were performed using the inverse-variance weighted (IVW) method [77], which is presented in the Results. Alongside IVW, other complementary MR methods were applied to assess the robustness of the causal estimates and to overcome any potential violations of the MR assumptions (e.g. horizontal pleiotropy) (see Sensitivity analysis for further details).
Two-step MR
Two-step MR (also known as network MR) is a sequence of two (or more) MR analyses connected by a shared variable. Two-step analysis setup is used to assess whether an intermediate trait acts as a causal mediator between the main exposure and the outcome of interest [32], [78]. As shown in Figure 3B, in step 1, genetics variants (IVs) for the exposure are used to estimate the causal effect of the exposure variable on the potential mediator, then, in step 2, the mediator IVs are used to assess the causal effect of the mediator on the outcome [78]. The evidence of a causal effect in both steps suggests that the association between exposure and outcome is mediated by the intermediate variable to some extent (further details in Mediation analysis).
MVMR
Multivariable Mendelian randomization (MVMR) is an extension to the standard univariable MR, which allows genetic variants to be associated with more than one exposure, and can estimate the direct causal effects of each exposure in a single analysis [33]. In this way, an exposure trait and a potential mediator can be analysed together to quantify the direct effect of both independently on the outcome (Figure 3C). The genetic variants included in the analysis have to be reliably associated with one or both exposures but not completely overlap (i.e. no perfect collinearity), and have to satisfy the MVMR-extended second and third assumptions of the standard MR analysis [34] [79]. Diagnostic methods and sensitivity tests for the robustness of MVMR estimates [79] [80] are described under Sensitivity analysis.
MR analysis tools
All analyses were conducted using R (version 4.0.0). Univariate MR analyses and sensitivity tests were performed using the TwoSampleMR R package (version 0.5.4) [81], which was also used for accessing GWAS summary data deposited in OpenGWAS [36] (gwas.mrcieu.ac.uk). Multivariable MR was carried out by adapting TwoSampleMR’s functionality to be used on mixed data sources (see Code availability). Sensitivity analyses for multivariable MR were performed using MVMR R package (version 0.2) [34].
For all exposure and mediator datasets, the instruments were extracted from the full summary data by selecting SNPs under P-value < 5×10−8 threshed and clumping the resulting set of variants with r2 < 0.001. When extracting instruments from the outcome (breast cancer) GWAS summary statistics, unavailable SNPs were substituted by proxies with a minimum linkage disequilibrium r2 = 0.8. The rest of the settings were kept to defaults as per package version number.
Sensitivity analysis
To further investigate the causal estimates found in the standard (IVW) MR analyses and to evaluate the validity of the analysed genetic instruments, MR-Egger [52] and weighted median MR [53] methods were used to overcome and accommodate for potential violations of the core MR assumptions. These complementary methods help to support the causal effects found with IVW, as a single method cannot account for all biological and statistical properties that may impact MR estimates. A variety of specialised sensitivity tests were applied, as recommended in [81].
To assess overall horizontal pleiotropy (violation of assumption 3), the intercept in the MR-Egger regression [52] was evaluated, and the heterogeneity among the genetic variants was quantified using the Cochran’s Q-statistic [82]. The intercept term in MR-Egger regression is a useful indication of whether directional horizontal pleiotropy is driving the results of an MR analysis. When the Egger intercept is close to zero (e.g. < 0.002) and the P-value is large, this can be interpreted as no evidence of a substantial directional (horizontal) pleiotropic effect. When the Q-statistic for heterogeneity (difference in individual ratio estimates) is high and the corresponding p-value is small, this suggests evidence for heterogeneity and possibly horizontal pleiotropy. However, this heterogeneity does not translate into a directional pleiotropic effect if the Egger intercept is (close to) zero (under the InSIDE assumption). A high Q-statistic can be also used as an indicator of one or more variant outliers in the analysis, which may also be violating the MR assumptions.
Additionally, scatter plots of SNP effects from exposure and outcome fitted by all tested MR methods were evaluated for any deviations which would also be indicative of heterogeneity and violations of MR assumptions. Single SNP forest plots were used summarise the effect of the exposure on the outcome due to each SNP separately, which is a helpful approach for visualising SNPs heterogeneity. Next, funnel plots were used to visually evaluate the direction of pleiotropy, which, if present, would be characterised by asymmetry in the plot. Finally, the sensitivity of causal inference to any individual genetic variant was tested by leave-one-out analysis, which is used to identify outliers.
In MVMR analyses, conditional F-statistics were used to evaluate the instrument strength [79], with F > 10 indicating suitable strength for the analysis. However, as in univariable MR, heterogeneity may be indicative of horizontal pleiotropy that does not act through one of the exposures. In MVMR, heterogeneity is quantified by QA-statistic (also a further modification of Cochran’s Q), and small QA indicates a lack of directional pleiotropic effect [79].
To calculate both statistic measures, a phenotypic correlation matrix for each MVMR test had to be constructed. This was done by applying the method in for phenotypic matrix construction from GWAS summary data, available from metaCCA R package [83]. This is an alternative approach to using individual-level data for matrix construction. This was only applied in cases when both exposures in MVMR were from the same sample (i.e. UK Biobank). When data samples were different, the default settings for F and QA statistics were used (i.e. gencov parameter set to zero) [79].
In the cases where MVMR sensitivity tests indicated the presence of weak instruments and potential pleiotropy via heterogeneity, Q-minimisation approach (Q-het) for estimating causal association was used to supplement the estimated of MVMR-IVW approach [79].
Split-sample analysis
One of the important requirements in two-sample MR is that the exposure and outcome GWAS are two non-overlapping datasets, which provides an advantage over the limitations in one-sample MR analysis of winner’s curse and anti-conservative weak instrument bias [84] [85]. In the two-sample MR analyses of childhood body size on hormones and some of the reproductive traits, this requirement was not satisfied, since the mediator GWAS were performed in the same cohort as the main exposure (childhood body size), i.e. UK Biobank.
To overcome the bias and evaluate the extent to which the results are affected, the MR analyses were repeated using a split-sample approach. This was possible due to the large sample size of the UK Biobank: the data were randomly split into two parts, and one part was used to carry out a new GWAS for the exposure (childhood body size), and the other part for the outcome (hormones/reproductive traits). With newly extracted instruments, step 1 of two-step MR and MVMR were repeated, with the resulting estimates are available in Supplementary Tables S15 and S16 and Supplementary Materials Section D. Finally, the requirement of having two separate samples does not apply to exposures used in MVMR, i.e. it is acceptable to use UK Biobank traits for both exposure and mediator used in the analysis [34].
Mediation analysis
Mediation analysis is a method for quantifying the effects of an exposure on an outcome, which act directly, or indirectly via an intermediate variable (i.e. mediator) [68]. This analysis can identify which factors mediate the relationship between the exposure and the outcome enabling intervention on those mediators to mitigate or strengthen the effects of the exposure [86]. The “total” effect of exposure on outcome includes both “direct” effect and any “indirect” effects via one or more mediators.
In terms of MR, the total effect is captured by a standard univariable MR analysis (Figure 3A). To decompose direct and indirect effects, we use the results from two-step MR and MVMR (Figure 3B and 3C) in two mediation analysis methods: Difference method and Product method (Supplementary material Section E, Figure S5). For mediation analysis, it is important to have strong evidence of the total effect (Supplementary Table S1), effect in two-step MR and MVMR, and strong instruments in MVMR (measured by F-statistics) with no evidence of pleiotropy.
Lastly, although it is sometimes advised against calculating an indirect effect if the outcome is a binary variable (i.e. disease status) due to non-collapsibility of odds ratios [60], it has been shown that if the outcome effects have been quantified as log-odds ratios, it is acceptable to use betas of such outcome in the mediation analysis [68].
The main mediation method (Product method) and the indirect effect SE/CI estimation approach (Delta method) were chosen with the help of simulation analysis described in Supplementary Materials Section E.
Data Availability
The availability of all data analysed in this study has been referenced throughout the manuscript and supplementary materials.
Code and data availability
MR analysis code is available at https://github.com/mvab/mendelian-randomization-breast-cancer
Simulation analysis for selecting mediation method is available at https://github.com/mvab/simulation_for_MR_mediation
Hormones GWAS data generated with the IEU GWAS pipeline will be uploaded to OpenGWAS.
Funding
BLL – University of Bristol (Vice-Chancellor’s fellowship) and the Academy of Medical Sciences.
RCR – de Pass Vice Chancellor’s research fellowship at the University of Bristol
MV – University of Bristol Alumni Fund: Professor Sir Eric Thomas Scholarship
MV, GDS, ES, TGR, RR work in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol supported by Medical Research Council (MC_UU_00011/1, MC_UU_00011/2 and MC_UU_00011/4).
This work was also supported by a Cancer Research UK programme grant (the Integrative Cancer Epidemiology Programme) (C18281/A19169).
Competing interests
TGR is employed part-time by Novo Nordisk outside of this work. All other authors declare no competing interests.
Acknowledgements
This research has been conducted using the UK Biobank Resource under application number 15825.
References
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].
- [8].
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].
- [42].
- [43].
- [44].
- [45].↵
- [46].
- [47].
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵
- [63].↵
- [64].↵
- [65].↵
- [66].↵
- [67].↵
- [68].↵
- [69].↵
- [70].↵
- [71].↵
- [72].↵
- [73].↵
- [74].↵
- [75].↵
- [76].↵
- [77].↵
- [78].↵
- [79].↵
- [80].↵
- [81].↵
- [82].↵
- [83].↵
- [84].↵
- [85].↵
- [86].↵