Abstract
Body mass index (BMI) changes throughout life with age-varying genetic contributions. We aimed to investigate the genetic contribution to BMI across early life using repeated measures from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. Random regression modelling was used to estimate the genetic covariance matrix (Kg) of BMI trajectories from ages one to 18 years with 65,930 repeated BMI measurements from 6,291 genotyped ALSPAC participants. The Kg matrix was used to estimate SNP-based heritability at yearly intervals from 1-18 years and genetic correlations across early life. We also performed an eigenvalue decomposition of Kg to identify age-varying genetic patterns of BMI. Finally, we investigated the impact of a polygenic score derived from adult BMI on the estimated genetic components across early life. The was relatively constant across early life, between 23-30%. The genetic contribution to BMI in early childhood is different to that in later childhood, indicated by the diminishing strength of genetic correlation across different ages. The eigenvalue decomposition revealed that the primary axis of variation (explaining 89% of the genetic variance in Kg) increases with age from zero and reaches a plateau in adolescence, while the second eigenfunction (explaining around 9% of Kg) represents factors with opposing effects on BMI between early and later ages. Adjusting for the adult BMI polygenic score attenuated the from late childhood; for example, is 29.8% (SE=6.5%) at 18 years of age and attenuates to 14.5% (SE=6.3%) after adjusting for the adult BMI polygenic score. Although common genetic variation explains around 23-30% of BMI variability across early life, our findings indicate that there is a different genetic profile operating during infancy compared to later childhood and adolescence.
Introduction
Body mass index (BMI), defined as body weight (in kilograms) divided by the square of height (in meters), is a commonly used measure to estimate total body fat. Obesity, defined in adults by BMI exceeding 30kg/m2, poses a significant risk for the development of many diseases, particularly cardio-metabolic diseases 1. One of the strongest predictors of obesity in adulthood is high BMI during childhood 2. In the general population, BMI across childhood involves three distinct phases 3. First, there’s a rapid increase in BMI from birth to around nine months, when children reach their adiposity peak (AP). Second, there is a rapid decline in BMI until around five or six years of age where children reach their adiposity rebound (AR), which is due to factors such as changes in body composition and increased physical activity. Finally, BMI gradually increases until early adulthood, reflecting ongoing body development through puberty. Increasing our understanding of the mechanisms influencing BMI across early life could lead to strategies to enable early prevention of obesity.
Considerable strides have been made in understanding the genetic underpinnings of BMI throughout childhood and in adults. The largest and most powerful studies have been conducted in adults including 700K individuals and identifying nearly a thousand independent loci through genome-wide association studies (GWAS) 4. The total amount of variation in adult BMI explained by all common genetic variants either on or tagged by GWAS arrays, also known as single nucleotide polymorphism (SNP) based heritability , was estimated by Yengo et al. (2018) as 22.4% (SE = 3.7%) 4. This contrasts to family-based designs which estimate the heritability of BMI in adults to be about 44% (SE = 4%) 5, and twin-based estimates which are around 60-80% 6. The differences between the experimental designs relate to genetic variation not tagged by common SNP (e.g. rare variants), and potential overestimation of twin-based estimates through common environmental or other effects 5. Some studies have also suggested a genotype-by-age interaction for adult BMI for older age groups, implying that the genetic correlation (rg) for BMI between different ages is less than 1 5,7.
Similar to studies in adults, the estimates of heritability of BMI in childhood differ not only on study design but also age. For example, twin studies estimate a heritability of around 40% at age four, presumably where there is a large shared environmental component to BMI 8. Estimated heritability from twin-studies increases to around 80% from the age of ten to 19 years old 8-10, which is similar to twin-based estimates of heritability in adults (60-80%) 6. The estimated SNP-based heritability is 20-40% between one and ten years of age 9,11,12, again consistent with the estimate in adults of 22%4. BMI in childhood also exhibits genotype-by-age interactions, often to a greater magnitude than is seen in adults. Silventoinen et al. pool data from 25 twin cohorts to show that the genetic correlation between BMI at age 4 and 18 years is 0.5 (95% CI 0.38 – 0.61) in males and 0.38 (95% CI 0.24 – 0.50) in females 13. Low genetic correlations are also reported using SNP-based estimation where Helgeland et al. 12 estimated the genetic correlation between BMI at 5 years and in adults to be 0.45 (SE = 0.041) and with Couto Alves et al. who estimate the genetic correlation between BMI at the adiposity rebound and adult BMI to be 0.64 (SE = 0.08) 14. This means that although BMI is moderately to highly heritable in children and adults, the genetic factors contributing to the heritability are not consistent throughout time and the estimates of heritability in twin-based studies are prone to environmental or other confounders. SNP-based heritability estimates capture only about half the total additive genetic variance but are not as susceptible to confounders.
Helgeland et al. identify 46 genomic loci that are associated with BMI in at least one of twelve time points from birth to 8 years of age. Around half of these loci influence BMI during infancy but have no effect after the AR, including no effect on adult BMI. One locus following this pattern of association is LEP/LEPR, which has been associated with BMI across early life and at the adiposity peak in numerous studies 11,12,14. Additionally, the genetic correlation between the BMI at the AP and adult BMI is lower than the correlation later in childhood (for example, rg= 0.26, SE = 0.07 between BMI at AP and adult BMI14 and rg = 0.63, SE = 0.06 between BMI at 8 years and adult BMI12). The age-varying genetic effects at individual loci, along with the genetic correlations estimated to be less than one, indicate that there might be a unique genetic profile influencing BMI during the first few years of life that differs from the genetic profile affecting BMI in adulthood. However, all of these studies use cross-sectional data, which overlook individual-level changes and neglect to utilize the repeated measures data from longitudinal cohorts to explore the evolving genetic profile of BMI trajectories throughout early life.
One method for investigating genetic effects on traits over time using repeated measures data is the random regression model (RRM) 15,16. The RRM offers flexibility in modelling the correlation structure of repeated measurements through the inclusion of random effects for each subject and accommodates missing or incomplete data. The method models population, genetic and individual effects as a continuous function of time (e.g. age), thus estimating the population average trajectory as well as random coefficients to describe each individual’s trajectory. Random effects can be partitioned into additive genetic and individual-specific effects, where the additive genetic effect can be estimated using a variance-covariance structure defined by a genetic relationship matrix constructed with common SNPs 17. In contrast to previous studies using cross-sectional data, the random regression model increases power to identify patterns of genetic variation by considering the effect of all SNPs simultaneously and it does not attempt to identify individual SNPs associated with a particular time-point or feature of the trajectory. The RRM also reduces the number of parameters estimated from the data compared to previous twin studies which use a series of pairwise comparisons across ages to estimate genetic correlations 13. Finally, it is flexible in that genetic correlations can be estimated at any given pair of ages (within the age range of the data).
The aim of the current study is to use repeated measures data from a large birth cohort, the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort, to investigate the genetic profile of BMI across early life and provide insights into whether it differs from BMI in adulthood. We estimate the SNP-based heritability and genetic correlations within the age range of one to 18 years, identify patterns of genetic variation in BMI and explore the influence of an adult BMI polygenic score on the genetic variance of childhood BMI.
Subjects and methods
Study Sample
ALSPAC is a population-based, prospective birth cohort conducted in region of Avon in the United Kingdom 18,19. Pregnant women resident in Avon with expected dates of delivery between 1st April 1991 and 31st December 1992 were invited to take part in the study. The initial number of pregnancies enrolled was 14,541, with 13,988 children who were alive at 1 year of age. When the oldest children were approximately seven years of age, an attempt was made to bolster the initial sample with eligible families, resulting in an additional 913 children joining the ALSPAC cohort. The total sample size for analyses using any data collected after the age of seven is therefore 15,447 pregnancies, resulting in 15,658 fetuses. Of these 14,901 children were alive at 1 year of age. ALSPAC collected extensive health-related information from the mothers, fathers and their children at regular intervals from birth to early adulthood and blood samples from 9,115 children were collected at various follow-ups for genotyping. Using information supplied by ALSPAC, we restricted our analysis to singleton births and individuals of European ancestry (see details below) 18,19. The study website http://www.bristol.ac.uk/alspac/researchers/our-data/) contains details of all the data that is available through a fully searchable data dictionary and variable search tool 20.
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants or their mothers following the recommendations of the ALSPAC Ethics and Law Committee at the time. Data access of the ALSPAC cohort can be applied for by submitting a request to the study’s Data Access Committee. Requirements for data access are described at http://www.bristol.ac.uk/alspac/. This project received ethical approval from the Institutional Human Research Ethics Committee, University of Queensland (Approval Number 2019002705).
Genetic Data
Genotyping, quality control (QC) and imputation procedures were performed centrally by ALSPAC and described in detail elsewhere 18,19. In brief, the children were genotyped using the Illumina HumanHap550 quad genome-wide microarray and imputed to the 1000 Genomes reference panel (Version 1, Phase 3, Dec 2013 Release, using haplotypes from all populations) 21 using IMPUTE2 (v2.2.2) 22,23. Quality control measures applied centrally by ALSPAC included checks for gender mismatches, heterozygosity levels, missingness rates, and Hardy-Weinberg equilibrium. Families who withdrew from the study were removed.
Population stratification was evaluated using multidimensional scaling analysis, which was then compared to HapMap II (release 22) reference populations 24, including individuals of European descent (CEU), Han Chinese, Japanese, and Yoruba. The study removed all participants with non-European ancestry centrally. After QC and retaining only individuals of European ancestry from singleton births, there were 8,635 genotyped children with 26,048,419 SNPs. More details of centrally-performed QC are provided at the following webpage: https://proposals.epi.bristol.ac.uk/alspac_omics_data_catalogue.html#.
We applied further QC steps to remove SNPs that displayed more than 5% missingness, a Hardy-Weinberg equilibrium P-value of less than 10−6, imputation quality INFO score less than 0.8, and minor allele frequency of less than 1%. A total of 6,380,782 SNPs were retained. Using the cleaned genotype data, we generated a genomic relationship matrix (GRM) using GCTA (version 1.94.1) 17. Individuals with a relatedness coefficient of greater than 0.05 were identified and one member of every related pair randomly removed using the ‘--grm-cutoff 0.05’ option in GCTA. This resulted in a set of 7,791 unrelated individuals of European ancestry for further analyses.
BMI Measurements
The data collected in ALSPAC included height and weight at up to 32 follow-ups before children reached adulthood, including parent and child completed questionnaires, nurse reports from routine health care visits and clinic attendance.
We used the growthcleanr package25 in R26 to compare each height and weight measurement with a weighted moving average of the individual’s other measurements to identify biologically implausible values in height and weight. Further details about the package are described elsewhere (https://carriedaymont.github.io/growthcleanr/articles/output.html) 25. This flagged 5,480 measurements (weight or height [2.1% of the total number of measurements]), from 3,667 individuals, as outliers and we removed them from subsequent analyses. Then, we only retained individuals of European ancestry with genotype information (see Genetic Data). We only used data collected between 1 and 18 years. This avoids the adiposity peak of around 9 months in most individuals meaning that individual trajectories should approximately follow a 3rd order (cubic) polynomial27. We excluded measurements after 18 years because the cubic polynomial model indicates a downward curve during early adulthood, where it is expected to remain stable or slightly increase. Finally, we only included individuals with at least four measurements, ensuring there is adequate data per individual to fit the cubic polynomial. The final dataset consisted of 65,930 BMI measurements from 6,291 unrelated individuals (Table 1, an average of 10.5 (SD=3.8) measures per individual). All BMI phenotypes were natural log-transformed for analyses due to skewed distribution of BMI. All data cleaning procedures were performed in R (version 4.2.1).
Statistical Model
Our random regression models the change in BMI with age as a polynomial in age in which the curve for each person is the sum of an overall age effect, common to everybody, and an individual departure from this curve described by a random genetic effect and a random effect specific to that person. We used Legendre polynomials of standardized age in the RMM 28. Legendre polynomials are a system of orthogonal polynomials over the interval [-1, 1] that reduce the correlation among the random regression coefficients, making them computationally efficient for longitudinal modelling. We therefore standardized age (m) using the following equation: where ageij is the age of individual i at the measurement j, min(age) is the youngest age (i.e 1 year) and max(age) is the oldest age (i.e. 18 years) in the data. This rescales the age range from -1 to 1 (rather than 1 to 18) to facilitate the use of the Legendre polynomials.
We used the standardized age to generate a matrix of Legendre polynomials evaluated at specified ages, Φ: where Φis a matrix of dimension t × k, where t is the standardized age and k is the degree of the polynomial plus one (in our case, k = 4; for example, coefficients for the overall mean [intercept], linear, quadratic and cubic polynomials of standardized age). M is a matrix of order t × k containing the standardized age values in their mk form (i.e. m0, m1, m2, m3). Λ is a matrix of Legendre polynomial coefficients of order k × k, taking the form 29:
Note we used Φin two different contexts below. First, rows from Φwere used in the design matrices of the RRM model when M contains the observed ages where BMI was collected. Second, we used regularly spaced ages along the interval [-1, 1] for M when transforming variance components from the RRM back onto the observed age-scale.
The following RRM was fitted to the observed data: where y is a n × 1 column vector of log-transformed BMI observations (n is the total number of observations). X is a n × (kf + c) design matrix for the fixed effects, where kf is the degree of the polynomial plus one in the fixed effects, and c is the number of covariates. The fixed effects include overall mean (or intercept), linear, quadratic and cubic Legendre polynomials, and covariates: sex, interaction terms between sex and the Legendre polynomials, and measurement source (whether the BMI measure was collected via questionnaire or at the clinic [including both the nurse report and ALSPAC clinic visits]). b is a (kf + c) × 1 vector of estimated fixed regression coefficients. Zg and Zi are n × (kg × N) and n × (ki × N) design matrices, respectively, for the random effects, where N is the number of individuals and kg and ki is the degree of the polynomial plus one in the random effects (kg and ki are less than or equal to kf). Zg and Zi contain rows of Φ, for the relevant Legendre polynomials at the observed age for an observation. We have specified two unique design matrices, Zg and Zi, to allow the degree of the polynomial (and therefore the value of kg and ki) to differ between the additive genetic and unique individual effects. g is a (kg × N) × 1 vector of random additive genetic polynomial effects, and i is a (ki × N) × 1 vector of random individual-specific effects (i.e., individual-specific effects on BMI not captured by SNP genotypes such as, maternal effects, environmental effects, and genetic factors that are not tagged by common SNPs). Finally, e is a n × 1 vector of residuals, with the assumption that the variance of the residual term remains constant over time.
The distribution of random effects, g, i and e, follow a normal distribution with mean zero and variance given by: where A is the genomic relationship matrix constructed with SNP genotypes, as described above, ⊗ denotes the Kronecker product, Kg and Ki are the kg × kg and ki × ki variance-covariance matrices for the additive genetic and unique individual effects, respectively; and IN and In are identity matrices with appropriate order for the respective random effects, is the variance of residuals. The model was fit using ASReml-SA version 4.2.1 (https://vsni.co.uk/software/asreml), with the ASReml model specification file (.as file) provided in Supplementary Note 1.
Model fitting
We began by fitting cubic Legendre polynomials as fixed effects, and for both the additive genetic and unique individual effects. We examined model fit visually in a subset of randomly selected individuals (Supplementary Figures 1 and 2) and inspected the estimated variance components with standard errors. We used the Akaike Information Criterion (AIC) and a likelihood-ratio test (LRT) to formally determine the appropriate degree of polynomial function for the random effects, ensuring an accurate fit to the data (Supplementary Table 1).
Transformation of variance components from the polynomials to the observed scale of one to 18 years
The genetic variance estimated by the RRM, Kg, is a kg × kg variance-covariance matrix where the elements relate to the genetic variance of the intercept (or mean), slope and quadratic terms (when kg = 3). To aid in interpretation, these polynomial terms can be transformed back onto the original scale of age using Φ, a matrix of Legendre polynomial coefficients (equation 2). As noted above, we can use any arbitrary number of values along the interval [-1, 1] for M when calculating Φ during this transformation as the variance components from the RRM are defined on a continuous scale. Thus, the estimates of the additive genetic and unique individual variance-covariance matrices on the observed scale are: where and are of order t × t, and t is the number of ages across the BMI trajectory of interest for evaluation. For simplicity, we chose 18 equally spaced values along [-1,1] for M corresponding to ages of 1 to 18 years, and thus t = 18. The estimates for the phenotypic variance , SNP-based heritability , and genetic and phenotypic correlation between time points t1 and and , respectively] were calculated as: where is the (t1, t2) element from the estimated additive genetic variance-covariance matrix and is the (t1, t2) element from the estimated phenotypic variance-covariance matrix . Standard errors for the variance components, SNP-based heritability and genetic correlations were calculated using a Taylor series expansion following Fischer et al. 30.
Patterns of genetic variation
We use eigenvalue decomposition on Kg to examine the patterns of genetic variation over the age space 28,31. These patterns could be observed in but the eigenvalue decomposition on Kg has the advantage of being independent of the choice of the ages to evaluate in Φ. Then, where the matrix Kg (Equation 5) is partitioned into its eigenvalues D and eigenvectors E, using eigen() function in R. The eigenvectors are transformed into eigenfunctions (of age) as ΛE, where Λ is a matrix of Legendre polynomials (given in Equation 3), and each column of ΛE represents the coefficients for each eigenfunction (ψ). Eigenfunctions can also be evaluated at a specific set of ages as ΦE.
Since each eigenvector has an associated eigenvalue, we can test whether each eigenfunction explains more than zero genetic variance by using their eigenvalues 28. The coefficient matrix (E) is restricted by setting q testing eigenvalues (e.g. eigenvalue 3) in D to zero, resulting in D* and . The approximation can then be tested using a χ2 test with q(q+1)/2 degrees of freedom as: where V-1 is the inverse of the sampling variance-covariance matrix for Kg from the fitted model (and can be obtained from the .vvp file from ASReml). We constructed 95% confidence intervals for the eigenvalues using numerical simulation (see Supplementary Note 2).
Evaluating BMI trajectories using the eigenfunctions
To further assess the eigenfunctions, we calculated a polygenic score (PGS) for each eigenfunction and evaluated the BMI trajectory for low (<1 SD from the mean), average (within 1 SD of mean) and high (>1 SD from the mean) eigenfunction PGS. In this context, the PGS evaluates the sum of additive genetic effects tagged by common SNP for the primary or secondary axes of genetic variation (i.e. eigenfunction). The PGS for eigenfunctions on the original (age) scale as: where Ĝ is a N × kg matrix of the polygenic scores of the N individuals for eigenfunction i (i = 1 or 2), is a N × kg matrix of estimated random regression coefficients for the Legendre polynomials for the additive genetic effects and E is the matrix of eigenvectors (Equation 12).
All analyses and plotting were performed in R (version 4.2.1). P values of fixed effects and random variances were calculated using the the chi-squared distribution function with one degree of freedom (Supplementary Table 2 and 3).
Adjusting for adult BMI PGS
Next, we explored whether the SNP-based heritability across childhood was explained by the genetic variants known to be associated with adult BMI. We downloaded the SNP effect size estimates on adult BMI for approximately 7 million common SNPs generated from a SBayesRC32 analysis of GWAS data of adult BMI in unrelated individuals of European ancestry (N= 347,800, age ≥ 40 years) from the UK Biobank33,34 (https://sbayes.pctgplots.cloud.edu.au/data/SBayesRC/share/v1.0/PGS/). We constructed the adult BMI PGS in our cohort using the quality-controlled genetic data (see Genetic Data section) and the --score sum option in Plink v 1.9 35, which creates a sum of SNPs weighted by the adult BMI effect size. Subsequently, we integrated the adult BMI PGS and its interaction with age (i.e. interaction terms between PGS and the linear, quadratic, and cubic Legendre polynomials) as covariates in the fixed effects part of our RRM. The ASReml model specification file (.as file) is provided in Supplementary Note 3.
Secondary analyses
We conducted secondary analyses to ensure our interpretation of the SNP-based heritability, genetic correlations and genetic profiles estimated using the RRM were consistent with other approaches.
First, we estimated SNP-based heritability and genetic correlations in GCTA (version 1.94.1) 17 using selected measurements of BMI at cross-sectional follow-ups (i.e. average ages are 0.8 (closest cross-sectional time point to one year available), 1.7, 7.6, 10.7, 13.9, 15.5, and 17.5 years). Each analysis included only one measurement per individual, and included sex and age at measurement as covariates. We mostly used the BMI measurements of the same individuals included in the RRM analyses but included some additional records from the Child health database 2 (mean age =0.8 year, N=4799) and Teen Focus 4 (mean age =17 years, N=3373), which were excluded from the RRM as they were outside 1-18-year age range.
Second, our RRM (Equation 4) assumes that the variance of the residual term is constant over ages. However, the phenotypic variance of BMI increases with age, and therefore the variance of the residual term may also vary with age. This may introduce bias into the estimation of both additive genetic and unique individual variances. In order to address this concern, we conducted a secondary analysis in which we assigned different residual terms for each year in our model allowing them to be estimated (i.e. fitting 17 residual terms in the random regression model). The ASReml model specification file (.as file) is provided in Supplementary Note 4.
Third, we performed eigenvalue decomposition using the genetic correlation matrix of BMI obtained from the previously published COADTwins project 13, which was described in Supplementary Note 5.
Results
In the RRM analysis, model comparison indicated that the model with a quadratic slope in additive genetic component (kg = 3) was the superior fit to the data, over fitting a cubic or linear slope (Supplementary Table 1). The fixed effects, which are all associated with log(BMI) (P <0.05), are presented in Supplementary Table 2, and the estimated covariance matrices for random effects (Kg and Ki) and residual variance are presented in Supplementary Table 3. The examination of estimated trajectories and actual measurements of randomly selected individuals (Supplementary Figures 1 and 2) indicates model is a good fit in general. Since the cubic polynomial term for the additive genetic effects were not significant, we used AIC and LRT to formally compare models with (up to) cubic or quadratic degree of the polynomials for the additive genetic effects. The final model retained quadratic polynomials in additive genetic component (kg = 3), cubic polynomials to model random individual-specific effects (ki = 4), and cubic polynomials for each sex to model the overall population BMI trajectory.
Genetic variance and SNP-based heritability
Additive genetic variation was observed for the intercept (Kg 1,1 =0.0073, SE=0.0013, Supplementary Table 3), linear slope (Kg 2,2=0.0017, SE=0.0003), and quadratic slope (Kg 3,3 0.0004, SE=0.0001), indicating that genetics influence on average BMI as well as the shape of BMI trajectory over time. We detected a positive genetic correlation between the intercept and the linear slope (0.682, SE=0.072), and a negative genetic correlation between quadratic slope and both the intercept (−0.678, SE=0.132) and the linear slope (−0.473, SE= 0.174). The estimated SNP-based heritability of the intercept (28.4%, SE=4.8%), linear slope (23.8%, SE=4.2%) and quadratic slope (9.8%, SE=3.1%) were all significantly different from zero (P < 0.05).
We estimated phenotypic, additive genetic and unique individual variance over time between one and 18 years of age on the observed age-scale (Figure 1). We observed that variance components tended to increase with age. The SNP-based heritability was significantly greater than zero and ranged between 23-30% across all ages, which is consistent with the SNP-based heritability estimate for the intercept (i.e. average age 9.5 years, Figure 1 and Supplementary Table 4).
Phenotypic and genetic correlations
The genetic correlation (rg) between BMI at two different ages decreases as the difference between the ages increases (Figure 2, Supplementary Figure 3, Supplementary Table 5). For example, the genetic correlation between one and two years of age was 0.948 (SE = 0.015), whereas the genetic correlation between one and 10 years was not significantly different from zero (rg = -0.009, SE = 0.142). Additionally, the genetic contribution to BMI in early childhood (up until approximately 6 years of age) appears to be independent of genetic influences in later childhood and adolescence. This can be seen by the 95% confidence intervals around the estimates of genetic correlation of BMI between age one and from age seven crossing zero. In contrast, the phenotypic correlation of BMI between age one and the subsequent ages decays quicker but remains non-zero (Supplementary Figure 4, Supplementary Table 6). For example, the phenotypic correlation between one and two years of age was 0.67 (SE=0.01), whereas the phenotypic correlation between one and 10 years was 0.19 (SE=0.01). Interestingly, the genetic correlations at subsequent ages are higher than the phenotypic correlations, indicated by the relatively smooth curves in the genetic correlation figures (Figure 2 and Supplementary Figure 3) but the sharp peaks in the phenotypic correlation figures (Supplementary Figure 4).
Patterns of genetic variation
We conducted an eigenvalue decomposition on Kg to obtain the eigenvalues 1-3 (i.e.,) and their associated eigenfunctions (,,), where the eigenfunctions represent independent (uncorrelated) axes of genetic variation. The eigenvalues showed that the 1st eigenfunction, ψ1, accounts for the majority of the variance in Kg (approximately 89%). The 2nd eigenfunction, ψ2, explains approximately 9% of the variance in Kg, while the 3rd third eigenfunction, ψ3, explains around 2% of the variance in Kg. The chi-squared test showed that the 3rd eigenvalue was not significantly different from zero (, P = 0.10), but that the 2nd eigenfunction did explain a significant proportion of the variance in Kg (, P = 0.02). Numerical simulation of the 95% CI for the eigenvalues supports these findings (Supplementary Figure 5). The first and second eigenfunctions were given by: where t is the standardized age. From the coefficients above, we can see a relatively large weight of the intercept and quadratic slope in ψ1 in comparison to the corresponding weights for ψ2. While ψ2 is dominated by a relatively large negative weight on the linear slope term.
Evaluation of the eigenfunction across the 1-18 age range shows that ψ1 monotonically increases from zero over time until it reaches an approximate plateau at around 10 years of age (Figure 3). It should be noted that the eigenfunctions indicate an axis of genetic variation, and the key feature of the eigenfunction is its relative change and its relationship to zero. Thus, ψ1 is always above the x-axis, indicating that it represents positive genetic covariance in BMI across all ages, increasing in strength through childhood and then plateaus at adolescence. In contrast, ψ2 decreases approximately linearly and crosses zero at approximately 11.5 years. This indicates ψ2 represents genetic variation with strong positive genetic covariance in infancy, but negative genetic covariance between infancy and later life.
To investigate the properties of the eigenfunctions further, we generated PGS for eigenfunctions 1 and 2 and used them to categorise individuals into three clusters: those within one standard deviation of the mean PGS, and the remaining individuals with either high (greater than one standard deviation above the mean) or low (greater than one standard deviation below the mean) PGS scores. Figure 4 illustrates the mean BMI at each age and the distribution of individual trajectories for these three clusters using the PGS of eigenfunction 1. Individuals with high PGS for eigenfunction 1 have a higher mean BMI, a steeper slope and a lack of adiposity rebound (usually around 6 years in non-obese children) in comparison to those with average PGS. In contrast, individuals with low PGS for eigenfunction 1 seem to follow more closely to a typical trajectory for BMI, but with a lower mean BMI and a slower increase in BMI after adiposity rebound.
A similar clustering analysis using PGS for eigenfunction 2 (Supplementary Figure 6) shows individuals with high PGS have a higher mean BMI at one year, but a flatter trajectory across childhood resulting in a lower mean BMI at 18 years compared to the average PGS and low PGS groups. This reflects the positive genetic effect on BMI represented by eigenfunction 2 in early developmental stages but the opposing effect in later stages.
Adjustment of childhood BMI for adult BMI polygenic score (PGS)
In the RRM analyses adjusting for an adult BMI PGS, we investigated whether different genetic factors influence BMI in early life compared to mid-to-late adulthood. (Supplementary Tables 7 and 8, Supplementary Figure 7). By age 18, the SNP-based heritability roughly halved after adjusting for the adult BMI PGS, reducing from 0.298 (SE=0.065) at 18 in the unadjusted analysis to 0.145 (SE=0.063) (Figure 5 and Supplementary Table 9). The 95% confidence intervals at all ages from one to 18 years differed from zero but decreased with increasing age.
Secondary analyses
Cross-sectional analysis
The estimated SNP-based heritability from the RRM were consistent with those from cross-sectional genetic analyses in GCTA, which ranged from 0.28 (SE=0.07) at 0.8 years to 0.37 (SE=0.10) at 17.5 years; however, the standard errors were larger in the cross-sectional analysis than in the RRM model, as expected (Supplementary Table 10). Similarly, the genetic correlations estimated using the cross-sectional data in GCTA were similar to the decreasing age-to-age genetic correlations pattern seen in the RRM analyses (Supplementary Table 11). For instance, the estimated genetic correlation between Child health database 3 (mean age = 1.7 years) and Focus@7 (mean age = 7.6 years) was estimated at 0.68 (SE=0.15), which was not different to the RRM estimated genetic correlation between 2 and 8 years (rg = 0.4, SE=0.12).
Heterogeneous error variance
Estimates of the unique individual and additive genetic variance using the RMM model assuming heterogeneous error variance were similar to those obtained from primary analysis (i.e. assuming homogenous error variance), that is the estimates fall within each other’s 95% confidence intervals (Supplementary Figure 8, Supplementary Tables 12-14). This indicates that while there may be some variation in the residuals over time, it does not appear to significantly impact our overall conclusions.
Validation of growth patterns
In the eigenvalue decomposition of the genetic covariance matrix of BMI from the COADTwins project, the top eigenfunctions exhibit a similar pattern from one to 18 years old to our findings in ALSPAC in terms of variance explained in BMI (for example, eigenfunction 1 explained ∼89% of the variance in both the ALSPAC and COADTwins) and the shape of the eigenfunctions (eigenfunction 1 was monotonically increasing and eigenfunction 2 decreases over time and crosses zero during adolescence) (Supplementary Figure 9).
Discussion
In the current study, we used a RRM to characterise the genetic profile of BMI from infancy to early adulthood using the ALSPAC cohort. We found significant additive genetic variation affecting the shape of the BMI trajectory from one to 18 years. In other words, there was genetic variation in the mean (or intercept, i.e intercept = 28.4%, SE=4.8%) of the BMI trajectory as well as the parameters describing its shape (i.e. linear slope = 23.8%, SE= 4.2%; and quadratic slope = 9.8%, SE= 3.1%). Since the interpretation of these shape parameters can be tricky, we transformed our estimates back onto the observed age-scale. There was a simultaneous increase in both additive genetic variance and unique individual variance over time, resulting in relatively constant SNP-based heritability across early life, ranging from 23 – 30%, and not significantly different from the heritability estimated for the intercept. Genetic correlations found that BMI in early childhood is genetically different to BMI in later childhood. Additionally, the SNP-based heritability in early childhood was relatively unaffected by adjusting for an adult BMI PGS, whereas the SNP-based heritability in later childhood attenuated, indicating further that there is a unique genetic profile operating in early life. Finally, the eigenvalue decomposition of the genetic variance-covariance matrix, Kg, indicated that the main axis of the genetic variation in BMI throughout childhood is strongly influenced by the mean (or intercept) BMI and progressively amplified over time.
Our study demonstrates genetic variation contributing to childhood BMI trajectories. In clinical settings, pediatricians are primarily concerned with individuals who are underweight (often defined as being less than the 5th percentile on the Centers for Disease Control and Prevention (CDC) percentile charts36) or obese (often defined as being greater than the 95th percentile 36) The CDC BMI-for-age percentile chart shows that the higher percentiles also have a greater rate of growth from two to 18 years of age. Here, we ascribe some of the variability in growth curve percentiles to genetic factors. We also note a relatively strong genetic correlation between the mean and linear rate of change (i.e. the genetic correlation between the intercept and slope in our RRM is 0.682 [SE=0.072)), which suggests that the correlation between BMI at mean age (i.e. 9.5 years) and rate of change in BMI across childhood can be partly explained by genetics.
Our observed SNP-based heritability estimates were consistent with the SNP-based heritability for adult BMI 4 and also the Norwegian Mother, Father and Child Cohort Study (MoBa) in infancy and early childhood 11,12 (h2SNP ∼ 30 – 40%). In the CODATwins study 8, they observed an increase in the heritability from 4 years (42%, 95% CI 37 – 47% in boys; and 41%, 95% CI 35 – 46% in girls) to 19 years of age (75%, 95% CI 67 – 80% in boys; and 75%, 95% CI 67 – 82% in girls). Although we expect the heritability estimates from twin designs to be 3-4 times the SNP-based estimates5, our data does not support an increase in heritability throughout this period. These differences could arise from a relative increase in the importance of rare genetic variants influencing BMI, i.e. variants not captured by common SNPs, or other factors associated with twin designs 5. We observed that the additive genetic variance captured by common SNPs increased throughout childhood from 0.0020 (SE=0.00047) at year one to 0.0074 (SE= 0.0016) at 18 years. However, there was also a simultaneous increase in unique individual variance and, thus, no significant change in the SNP-based heritability over time.
Our results show decreasing age-to-age genetic correlations as the difference between ages increased from one to 18 years, which is consistent with the CODATwins study 13. These results suggested independent genetic influences affecting BMI in infancy and early childhood compared to later childhood and adolescence. Our findings align with the transient genetic effects of SNPs detected in early life but not later life, as found in previous studies (e.g., LEPR 12,14), which suggests that larger GWAS are needed to finely map the genetic profile of BMI during early stages.
While genetic correlation focuses specifically on shared genetic influences between ages, phenotypic correlation captures overall associations between ages, regardless of their underlying causes. Previous studies investigating phenotypic risks for childhood obesity, such as Geserick et al. 2018 37, have highlighted weight gain between two and six years as a key predictor of obesity in adolescence. Our study found that the phenotypic correlation between BMI at 18 years of age and all earlier ages progressively increased from rp = 0.16 (SE=0.021) at age one to rp = 0.88 (SE=0.003) at age 17, indicating that high BMI between ages two and six years might not be the only key time point for predicting obesity in adolescence. Interestingly, the phenotypic correlations were weaker than the (SNP-based) genetic correlations among nearby ages (rp=0.67 [SE=0.01], rg=0.95 [SE=0.02] between one to two years) but demonstrated greater strength in ages further apart (rp=0.16 [SE=0.02], rg=-0.14 [SE=0.17] between one to 18 years). The genetic correlations are higher than phenotypic correlations between ages that are close together but lower for ages that are far apart. As the age difference narrows, the model predicts that the genetic correlation approaches 1.0 whereas the phenotypic correlation does not because all measurements are subject to a residual effect and these are all independent. The lower genetic correlation than phenotypic correlation when the ages are far apart implies that genetic effects on BMI are more age specific than environmental effects on BMI.
The eigenvalue decomposition of the genetic variance-covariance matrix allows us to explore the proposed model by Couto Alves et al. of genetic influences on childhood BMI14. This model, based on GWAS of key BMI trajectory features such as BMI at the adiposity peak and rebound, proposes two distinct biological factors underlying childhood BMI. The first acts primarily in infancy until roughly 8 years of age, and the second acts from birth with increasing strength until 18 years. Our results confirm that the majority of genetic variance in childhood BMI (∼89%) is associated with a factor strongly influenced by the BMI at the mean age (i.e. age 9.5), and which is progressively amplified throughout childhood and adolescence. This factor does not have a strong influence on BMI during infancy. Our results also indicate a second (orthogonal) axis of variation that acts primarily during infancy with decreasing importance during childhood, and a negative covariance during adolescence. Therefore, by using eigenvalue decomposition on our RRM, we were able to provide stronger evidence for the proposed model of Couto Alves et al. (2019) by defining two statistically independent axes of variation influencing childhood BMI and determining the degree of genetic variance associated with each axis.
Finally, we performed an analysis adjusting for an adult BMI PGS to (partially) account for genetic factors influencing adult BMI in our analysis. Zheng et al. found the PGS explained 16% of the variance in adult BMI 32. We observed a similar magnitude of attenuation in the estimated SNP-based heritability at 18 years old (by about 15%) after adjustment of the PGS, which is consistent with the adult PGS being an imperfect predictor of the genetic variance tagged by common SNP in adults. These results were also in line with the genetic correlation and eigenvalue decomposition results, whereby adjustment for the adult BMI PGS did not influence the variance components or SNP-based heritability during infancy (< 3 years). This indicates that the genetic contribution to BMI during childhood to adolescence is shared with that of mid-to-late adulthood and highlights the unique genetic underpinnings of BMI during the infancy period.
The strengths of the current study include the utilisation of ALSPAC, a comprehensive long-term birth cohort that provides valuable insights into BMI trajectories across childhood development. The use of the RRM allowed us to estimate the genetic variance components at any age on the trajectory. It also leveraged the large number of repeated measurements per individual (an average of 8 BMI measures per individual), improving the precision of our estimates over a traditional cross-sectional approach. Additionally, by applying eigenvalue decomposition of the RRM, we revealed patterns of genetic variation in BMI that change over time and for the first time validated the proposed model of childhood BMI by Couto-Alves and colleagues. However, there are several limitations in our study also. Firstly, our estimates of the variance components and SNP-based heritability might be imprecise due to sample size, in comparison to those by the larger MoBa12 and CODATwins cohorts8, and we were therefore unable to identify fluctuations in SNP-based heritability across childhood. However, RRM estimates have narrower confidence intervals for genetic variation than cross-sectional studies with the same cohort size and can estimate genetic variation at any time point, even without direct data collection. Secondly, the non-significant low or negative genetic correlation in BMI between ages further apart should be interpreted with caution, as genetic correlations are estimated with lower accuracy compared to phenotypic correlations, especially when the age intervals are large. Thirdly, although the overall fitting of the Legendre polynomial function is satisfactory, improvements in model fitting may be possible by exploring other functions for age within the random regression framework (e.g. splines).
In summary, the current study shows that there is a strong genetic drive regulating BMI during childhood and adolescence. Investigating the genetics of BMI during infancy is likely to identify genetic variants that differ from loci associated with adult BMI. Our findings provide justification for exploring the genetics of childhood BMI (e.g. through GWAS), as it is likely to discover genetic variants distinct from those loci associated with adult BMI. It also highlights the presence of age-varying genetic effects on BMI, emphasising the need to consider these factors when studying the genetics of childhood BMI, its relationship to adult BMI and obesity risk.
Data Availability
The Avon Longitudinal Study of Parents and Children cohort can be applied for by submitting a request to the Data Access Committee. Requirements for data access are described at http://www.bristol.ac.uk/alspac/.
Declaration of interests
The authors declare no competing interests.
Funding
G.W. and N.M.W. are funded by a National Health and Medical Research Council (Australia) Investigator grant (APP2008723). K.E.K. was supported by the Australian Research Council (grant FL180100072). The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. GWAS data in ALSPAC was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. A comprehensive list of funding provided to ALSPC from grants is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). This research was specifically funded by Wellcome Trust and MRC (core), 076467/Z/05/Z.
Data and code availability
The dataset supporting the current study have not been deposited in a public repository because ALSPAC access policy but are available upon application via https://www.bristol.ac.uk/alspac/researchers/access/. The all code generated or analysed during this study are included in the published article or available from the corresponding author on request.
Acknowledgement
This research has been conducted using the ALSPAC (Reference B4194). We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We would like to express our sincere gratitude to Dominic Waters (The University of New England, Armidale, Australia) for provision of code from Waters et al. (2022)38 for calculating standard errors for the parameter estimates. This publication is the work of the authors, and N.M.W will serve as the guarantor for the contents of this paper.
Appendices
See supplementary materials.
Footnotes
↵# These authors jointly supervised this work