Genomic summary statistics and meta-analysis for set-based gene-environment interaction tests in large-scale sequencing studies

Xinyu Wang; Duy T. Pham; Kenneth E. Westerman; Cong Pan; Alisa K. Manning; Han Chen

doi:10.1101/2022.05.08.22274819

Abstract

We propose an efficient method to generate the summary statistics for set-based gene-environment interaction tests, as well as a meta-analysis approach that aggregates the summary statistics across different studies, which can be applied to large biobank-scale sequencing studies with related samples. Simulations showed that meta-analysis is numerically concordant with the equivalent pooled analysis using individual-level data. Moreover, meta-analysis accommodates heterogeneity between studies and enhances power in multi-ethnic studies. We applied the meta-analysis approach to the whole-exome sequencing data from the UK Biobank and successfully identified gene regions associated with waist-hip ratio, as well as those with sex-specific genetic effects.

Background

Complex diseases are influenced by the synergy of genes and the environment. The study of gene-environment interactions (GEI) may shed light on disease etiology and help identify environmental risk factors that modify the effects of disease-susceptible genes, as well as genetic variants that modify the effects of environmental risk factors for complex diseases (1,2). Following rapid advances in next-generation sequencing (NGS) technologies over the past years, a growing body of research focuses on investigating GEI effects in rare variants (with minor allele frequency (MAF) < 5%), which provides insights into additional disease risk and trait variability undiscovered in common variants from genome-wide association studies (GWAS) (3,4). As in genetic main effect tests, single-variant tests are underpowered in rare variant GEI tests (1,2). A wide variety of set-based GEI tests were developed to improve the statistical power, which are commonly extensions of the burden test and sequence kernel association test (SKAT) (5-7), or a combination of them (8-10). Additionally, the joint tests of both genetic main effects and interaction effects for rare variants were developed to identify variants associated with complex traits, accounting for heterogeneous genetic effects in different environmental exposures (8,10). It is especially useful to perform joint tests when the main effects are small and GEI effects are difficult to detect through interaction only tests (11,12).

Detecting rare variant GEI effects requires extremely large sample sizes given the low power as only a few individuals carried a rare variant while exposed to the environment factor (2,12). Meta-analysis is a well-established way to achieve the scale of sample size needed by rare variant GEI tests. By aggregating summary statistics from different studies, meta-analysis not only increases the sample size, but also avoids sharing individual-level data, which is usually restricted by study policies and confidentiality laws. On the other hand, the file sizes of individual-level genotype data are often extremely large, making it difficult to transfer across platforms with limited disk space and computational resources. The meta-analysis approach can circumvent this problem easily as it does not require sharing individual-level data but rather the genomic summary statistics which are more resource-efficient.

Meta-analysis tools have been developed for genetic main effect tests (13-21). In GWAS, the widely-used inverse variance weighted meta-analysis for common variants usually combines the single-variant effect estimates and their standard errors (21). Meta-analysis for rare variants cannot use the same approach, as models estimating multiple genetic effect sizes with large variances for sparse data can fail to converge, making the algorithm numerically unstable. For this reason, meta-analysis tools for rare variants usually combine score statistics and covariance matrices of individual variants to recover the set-based tests, which are better suited for low-frequency and rare variants. Among these tools, MetaSKAT (13), MASS (17), and Meta-Qtest (19) were developed for unrelated samples. For related samples, RAREMETAL (15-17) can retrieve either the burden test or SKAT, whereas SMMAT (20) can recover burden, SKAT, and a unified test that combines both.

For set-based GEI tests and joint tests, however, very few meta-analysis methods have been developed. Among them, ofGEM (22) introduces filtering statistics based on meta-analysis, but it is only applicable to unrelated samples and no joint tests for genetic main effects and GEI effects were proposed. A recent study proposed extending the rareGE framework (8) to meta-analysis, which can also be conducted for unrelated samples only (23). It is not computationally efficient to generate summary statistics using this method, as for every variant set, a separate statistical model accounting for genetic main effects needs to be fitted. Moreover, methods have been proposed for meta-analysis of joint tests for genetic main effects and GEI effects (24, 25), but they are only applicable to single variant tests. It remains a technical gap in the field to efficiently generate and utilize genomic summary statistics for set-based GEI and joint tests. Recently, we developed a computationally efficient method, Mixed-model Association test for Gene-Environment interactions (MAGEE), for rare variant GEI and joint tests (10). The goal of this study is to develop a general framework for genomic summary statistics and meta-analysis for set-based GEI and joint tests, which are applicable to both unrelated and related samples as well as both quantitative and binary traits. Since MAGEE does not require fitting a separate model adjusting for the genetic main effects in each testing region, the genomic summary statistics for set-based GEI and joint tests can be generated efficiently even for large samples in a whole-genome analysis.

Results

P value benchmark

The simulated individual-level genotype data included 100,000 samples and 100,000 genetic variants in 1,000 groups (100 variants per group). In the meta-analysis, a great amount of time can be saved when the sample size is large as summary statistic-based calculations do not depend on the sample size. For example, we directly computed the p values from individual-level data of 100,000 samples, which took 1,030 s on a single thread for each simulation replicate with 100,000 variants. In contrast, when using the summary statistics, computing the p values of the same tests took only 9.7 s using a single thread on the same computing server, which saved over 99% CPU time.

For both quantitative and binary traits with different sample sizes, we compared the p values from the meta-analysis (assuming homogeneous genetic effects across studies) with those from pooled individual-level data analysis. Each panel in Figure 1 and Figure 2 displays 10,000 p values from quantitative trait analyses with related samples. In the MAGEE framework for individual-level data, we developed two GEI tests, interaction variance component (IV) test and interaction hybrid test using Fisher’s method (IF), along with three joint tests, joint variance component (JV) test, joint hybrid test using Fisher’s method (JF), and joint hybrid test using double Fisher’s procedures (JD) (10). We conducted extensive simulations in MAGEE (10), which showed that each test was well-calibrated for type I error rates. As illustrating examples, here we compared results from IF test as the GEI test, and JD test as the joint test, for meta-analysis and MAGEE pooled analysis. The meta-analysis assuming homogeneous genetic effects shared highly consistent empirical GEI and joint test p values with MAGEE pooled analysis (Figures 1 and 2), regardless of the simulation scenario of homogeneous or heterogeneous genetic effects, and the precision of estimates increased with the sample size. Most importantly, for small p values around the commonly used significance thresholds (5.0×10⁻⁸ to 2.5×10⁻⁶), meta-analysis did not lose any accuracy compared with the MAGEE pooled analysis while being more resource-efficient in saving the computational time. We also found similar results for binary traits (Supplemental Figure S1 and S2).

Figure 1.

GEI test p value benchmark for meta-analysis assuming homogeneous genetic effects with MAGEE in 20,000, 50,000, and 100,000 unrelated samples. (A) Scenario 1 (homogeneous scenario) in quantitative traits. (B) Scenario 2 (heterogeneous scenario) in quantitative traits.

Figure 2.

Joint test p value benchmark for meta-analysis assuming homogeneous genetic effects with MAGEE in 20,000, 50,000, and 100,000 unrelated samples. (A) Scenario 1 (homogeneous scenario) in quantitative traits. (B) Scenario 2 (heterogeneous scenario) in quantitative traits.

Power comparison

Figure 3 shows the empirical power of the meta-analysis GEI and joint tests for analyzing quantitative traits from scenario 1, where the two studies had homogeneous covariate effects, genetic effects, and GEI effects. The power was calculated at the significance level of 2.5×10⁻⁶ with 20,000, 50,000, and 100,000 related samples, respectively. The top panel displays results when the ratio of positive and negative causal variants was 1:1, while the bottom panel displays results when the ratio of positive and negative causal variants was a 4:1. Overall, the meta-analysis tests assuming homogeneous effects are more powerful for both GEI and joint tests, with the IF test being more powerful than the IV test and the JF test being slightly more powerful than the other 2 joint tests.

Figure 3.

Empirical power of meta-analysis tests for scenario 1 (homogeneous scenario) on quantitative traits in 20,000, 50,000, and 100,000 related samples. (A) GEI tests with 80% null variants, 10% causal variants with positive effects, and 10% causal variants with negative effects. (B) Joint tests with 80% null variants, 10% causal variants with positive effects, and 10% causal variants with negative effects. (C) GEI tests with 80% null variants, 16% causal variants with positive effects, and 4% causal variants with negative effects. (D) Joint tests with 80% null variants, 16% causal variants with positive effects, and 4% causal variants with negative effects.

Figure 4 illustrated the empirical power of the meta-analysis GEI and joint tests for analyzing quantitative traits from scenario 2, where the two studies had heterogeneous covariates effects, genetic effects, and GEI effects. Same as in Figure 3, the ratio of positive to negative causal variants in Figure 4 was 1:1 on the top panel and 4:1 on the bottom panel. In this scenario, as heterogeneity was introduced to these studies, the overall performance of the meta-analysis tests assuming heterogeneous effects outperformed the meta-analysis assuming homogeneous effects for both GEI and joint tests.

Figure 4.

Empirical power of meta-analysis tests for scenario 2 (heterogeneous scenario) on quantitative traits in 20,000, 50,000, and 100,000 related samples. (A) GEI tests with 80% null variants, 10% causal variants with positive effects, and 10% causal variants with negative effects. (B) Joint tests with 80% null variants, 10% causal variants with positive effects, and 10% causal variants with negative effects. (C) GEI tests with 80% null variants, 16% causal variants with positive effects, and 4% causal variants with negative effects. (D) Joint tests with 80% null variants, 16% causal variants with positive effects, and 4% causal variants with negative effects.

The same simulations were conducted for binary traits, and we summarized those results in Supplemental Figures S3 and S4.

Meta-analysis of waist-hip ratio in the UK Biobank

The original individual-level genotype data from whole-exome sequencing (WES) provided by the UK Biobank for 200,632 samples was 830 GB for 22 autosomes in the PLINK BED format, while the gene-based summary statistics were about 11 GB in total, including 5.4 GB intermediate files from the first batch and 5.4 GB intermediate files from the second batch. By sharing only the summary statistics files for meta-analysis, we saved about 98.7% disk space compared to directly sharing individual-level data for set-based GEI tests. In addition, when we directly computed p values from individual-level data, it took 25.48 h (91,732 s) CPU time with 10 threads to conduct all the GEI tests and joint tests in the MAGEE framework. In contrast, we spent only 23.8 min (1,428 s) CPU time using a single thread on the same computing server to compute the p values of the same GEI and joint tests when we had the summary statistics, saving about 98.4% of computational time. One additional advantage for meta-analyzing the UK Biobank WES data is that when new samples are released in future tranches, we do not have to spend extra computational cost in rerunning the pooled analysis including previously released samples. Instead, by combining the summary statistics from different releases in a meta-analysis, we can get similar results as the pooled analysis while saving the computational resources.

The two batches of WES data from the UK Biobank were meta-analyzed in 18,668 protein-coding regions to investigate gene-sex interactions effects on waist-hip ratio (WHR). The significance level was adjusted by Bonferroni correction for multiple testing (26) at 0.05/18,668 = 2.7×10⁻⁶. As shown in Figure 5, the GEI test found three significant regions: COBLL1 (p value = 5.8×10⁻¹⁰) on chromosome 2, HMGA1 (p value = 2.8×10⁻⁷) on chromosome 6, and VEGFB (p value = 1.3×10⁻⁶) on chromosome 11. However, none of them remained significant after conditioning on significant variants in the region, defined as those with a single variant joint test p value < 5.0×10⁻⁸, MAF > 1% and genotype missing rate < 5% (Table 1). A total of 13 significant protein-coding regions were found from the joint tests, including TBX15, ACVR1C, COBLL1, NISCH, PLXND1, CYTL1, HLA-B, HMGA1, KIAA0408, MLXIPL, VEGFB, SBNO1, and PAM16, after excluding less significant protein-coding regions in 1 million base pair flanking regions. Of note, COBLL1 (p value = 1.3×10⁻⁸ after adjusting for 7 significant single variants) and KIAA0408 (p value = 9.3×10⁻⁹ after adjusting for 1 significant single variant) had significant joint test p values in the conditional analysis (Table 1).

View this table:

Table 1.

Meta-analysis results for gene-sex interaction effects on WHR from the UK Biobank WES data analysis.

Figure 5.

Manhattan plots of UK Biobank WES data analysis using meta-analysis for gene-sex interaction effects on WHR. (A) GEI test. (B) Joint test.

Previous GWAS have identified many loci in these genes for WHR, some of which have sex-dimorphic effects. For example, there is some evidence that certain loci at COBLL1, PLXND1, HMGA1, and VEGFB have more impact on BMI-adjusted WHR among women (27, 28). In our meta-analysis GEI tests, PLXND1 did not show significance, however, we did identify gene-sex interactions in the other three regions. Additionally, Justice et al. (29) reported gene-sex interactions in COBLL1, PLXND1, and KIAA0408. There were no significant gene-sex interaction signals in PLXND1 (p value = 4.6×10⁻⁵) or KIAA0408 (p value = 4.5×10⁻⁶), possibly because of the limited sample size in our analysis, and more weights were added to rare variants using the beta function, which had potentially diminished the signals from common variants.

We observed most signals driving our joint analyses were contributed by genetic main effects, as only COBLL1, HMGA1 and VEGFB were significant in the GEI tests (they were significant in the main effect tests as well). Several loci in the genes identified in the joint tests were reported to be associated with WHR in previous GWAS analysis (27-32) except for HLA-B. HLA-B is a member of the human leukocyte antigen (HLA) complex gene family, which helps the immune system distinguish between proteins created by the body and proteins made by outside invaders like viruses and bacteria (33). Further research is needed to investigate if there are any sex-specific genetic impacts on WHR in HLA-B. The underlying mechanisms for most of these genetic and sex differences have yet to be discovered.

We pooled individual-level data and ran MAGEE GEI and joint tests to validate the findings. Supplemental Figure S5 shows that p values from the meta-analysis and the MAGEE pooled analysis were highly concordant.

Number of significant variants in the region, defined as those with a single variant joint test p value < 5.0×10⁻⁸, MAF > 1% and genotype missing rate < 5%. Conditional analysis p values were computed by adjusting for both the genetic main effect and gene-sex interaction terms of significant single variants.

Discussion

We present a framework for meta-analysis of rare variant GEI and joint tests for continuous and binary traits. The meta-analysis is based on a generalized linear mixed effect model (GLMM), which can account for relatedness in the samples. Single-variant scores and covariance matrices of individual studies can be efficiently generated from MAGEE. The summary statistics for the meta-analysis can be easily aggregated, which allows combining the evidence from millions of samples collected from multiple large-scale sequencing studies to improve statistical power. Additionally, the proposed meta-analysis framework provides flexible set-based joint tests for genetic main effects and GEI effects, including a SKAT-type variance component test, and two unified tests combining burden and SKAT-type tests. In population-based studies, heterogeneity is possible because different ethnic groups might have different MAFs or effect sizes for the same variants, as well as different levels of environmental exposure. Different ways of collecting and measuring data, genotyping methods, quality-control criterion, and imputation methods make studies heterogeneous (34). The meta-analysis can easily account for the heterogeneity across studies, which is difficult in pooled analyses using individual-level data.

We evaluated the performance of the proposed GEI and joint tests through computer simulations. We first showed that meta-analysis assuming homogeneous genetic effects is numerically concordant with the equivalent pooled analysis using individual-level data. Additionally, the proposed methodology can account for heterogeneity across studies in terms of both covariate and genetic effects. GEI and joint tests assuming heterogeneous genetic effects were generally more powerful than homogeneous-type meta-analyses in the presence of simulated heterogeneity. Overall, IF tests were more powerful than IV tests, and JF tests were the most efficient of the three joint tests, in accordance with what we determined from the single-study numerical simulation of MAGEE.

We assessed gene-sex interaction effects impacting WHR, using GEI and joint tests within the proposed meta-analysis framework on WES data from the UK Biobank. GEI tests identified COBLL1, HMGA1 and VEGFB that are already known to have loci with sex-dimorphic effects on WHR. The meta-analysis joint tests identified 13 protein-coding regions, of which 10 had been previously linked to WHR in GWAS (27-32), including COBLL1, HMGA1, VEGFB, PLXND1 and KIAA0408 that were reported to have sex-specific genetic loci (27-29). The proposed meta-analysis GEI test did not reach significance for PLXND1 and KIAA0408, possibly due to the limited sample size. To identify rare variants that have effects modified by the environment, extremely large sample sizes are required. Furthermore, we only examined genetic variants in the exome, but noncoding regions may play a substantial role and may have integrated effects with protein-coding genes in complex diseases and traits (27, 35). Once more WES data from the UK Biobank and more whole-genome sequencing (WGS) data become available from other large-scale consortia in the future, we expect that the meta-analysis will be used to identify more novel genetic pathways contributing to a wide range of complex diseases.

Our summary statistics and meta-analysis framework have a few limitations. All participating studies in the proposed meta-analysis must use a unified group definition file to ensure that the summary statistics are generated consistently across studies. Although all component variants need not be present for all studies, if different studies use different gene-level tests or group definition files, the summary statistics would be incompatible (3). In addition, in certain case-control studies, case/control ratios could be extremely unbalanced, leading to strong heterogeneity across studies (13). Future research could explore these extremely heterogeneous cases in further depth.

Conclusions

We developed efficient and powerful meta-analysis approaches to overcome the challenges associated with set-based GEI and joint tests for rare variants while allowing for sample relatedness. The meta-analysis leverages study-specific genomic summary statistics, eliminating the need to share individual-level data, and is applicable to both quantitative and binary traits. In addition, the meta-analysis can accommodate heterogeneity among studies. These features allow the analysis of GEI effects across multiple large-scale sequencing studies and can improve the power in identification of novel risk factors for complex traits.

Methods

Modeling summary statistics

The meta-analysis of rare-variant GEI and joint tests involves two major steps. In the first step, we perform a single-study analysis to calculate the summary statistics of the single-variant score vector and covariance matrix for each variant. The second step consists of combining the summary statistics and constructing the test statistic for the meta-analysis. The first step can be performed by using the MAGEE software package. Below is a brief illustration of how to generate the summary statistics.

Suppose we have m = 1, 2, ⋯, M studies, first we need to generate the summary statistics for each individual study. The full generalized linear mixed model (GLMM) for a single study m is in the following form: where g(·) is the link function of μ_mi, which is the conditional mean of the phenotype of individual i in study m. Typically, for a quantitative trait, g(·) is an identity function, while for a binary trait, g(·) is a logit function. X_mi is a vector of p covariates including the intercept, and G_mi is a vector of q variants, and K_mi is a vector of dq pairwise GEI terms for d environmental factors and q variants. Note that the environmental factors are a subset of the covariates in X_mi. Accordingly, α_m is a p × 1 vector for the covariate effects, β_m is a q × 1 vector for the genetic main effects, and γ_m is the dq × 1 vector for GEI effects. The vector for the random intercept , where N is the sample size for study m, λ_l are the variance component parameters of L random effects, and ψ_l are N × N known relatedness matrices.

Through MAGEE, we made statistical inference that the GEI effects test statistic in model (1) can be approximated by fitting a global null model without adjusting for the genetic main effects. Specifically, the global null models are fitted to each individual study as follows:

The score vector for genetic main effects and GEI effects are in the forms of and GEI effects where is a vector of fitted values and is the dispersion parameter estimated from model (2) in study m. Assuming the main effect of genetic variants β_m are small in a null model with genetic main effects, but without GEI effects g(μ_mi) = X_miα_m + G_miβ_m + r_mi (3), we can approximate the score vector for GEI effects accounting for the genetic main effects by where is a N × q matrix of genetic variants, is a N × p matrix of covariates, is an N × N projection matrix from the global null model, where for quantitative traits and diag for binary traits, which we estimate from model (2).

In the meta-analysis context, we save the single-variant scores of genetic main effects and GEI effects, , as well as the covariance matrix for the score vectors for each participant study m, where is a q × 1 vector, is a dq × 1 vector, matrix, is a dq × dq matrix, and is a q × dq matrix. As Ω_m is a large and symmetric matrix, we can save the lower triangle elements in a binary format data to save space. Compared to saving the summary statistics as the double-precision floating numbers that each takes 8 bytes, we used 4 bytes to save each floating-point number to further save the space by 50%, without losing accuracy in the downstream meta-analysis.

The test statistics are recovered using two combination strategies based on the assumption that genetic effects are homogeneous or heterogeneous across studies.

Combine summary statistics assuming homogeneous effects

We combine the test statistics by summing the scores and their covariance matrices if we assume the genetic effects are homogeneous across all the studies. For example, for m = 1, 2 … M studies, we sum the score statistics by and covariance matrix . Assuming all participating studies have q variants, S_G is a q × 1 vector, S_K is a dq × 1 vector, V_G is a q × q matrix, V_K is a dq × dq matrix, and C is a q × dq matrix. In practice, each variant does not need to be included in all the studies. If a variant is absent in study m, we put a placeholder of 0 in place of this variant.

Combine summary statistics assuming heterogeneous effects

Suppose B ethnic groups are studied, some variants can have different allele frequencies depending on the ancestry. Also, lifestyles (e.g., eating habits) and environmental exposures might modify the gene differently, thus their effects on GEI might differ among ethnicities. By correctly accounting for heterogeneity across studies, the power of the meta-analysis should be improved (4). Assuming homogeneous within-ancestry but heterogeneous between-ancestry genetic and GEI effects, we sum the score statistics by and the covariance matrices by over the studies, where , and are the genetic score vector, GEI score vector, and covariance matrix of study m in ethnic group b, respectively. M_b is the number of studies in ethnic group b and can be partitioned to diag, and the variance matrices for genetic effects, GEI effects, and the covariance matrix of genetic and GEI effects are , and , respectively. The lengths are then Bq for S_G and Bdq for S_K. V_G, V_K, and C are block-diagonal matrices with B blocks of q × q, dq × dq, and q × dq sub-matrices respectively for each ancestry.

GEI tests

In the MAGEE framework (10), we developed two GEI tests: interaction variance component (IV) test and interaction hybrid test using Fisher’s method (IF). The same GEI tests can be reconstructed for the meta-analysis from the combined score vectors and the covariance matrices. IV test assumes the overall GEI effects γ ∼ N(0, τW_K²), for which the test statistic is where and W_K is a predefined diagonal weight matrix for interaction. T_γ follows a distribution of , where are independent chi-square distributions with 1 degree of freedom (df), and ξ_γ,j are the eigenvalues of W_KΛW_K, where .

W_K can be flexibly determined using annotation information or MAFs in the combined studies without having to access the individual-level data (36-38). One of the most popular ways to add weight to variants based on MAF is to use a beta density function with parameters 1 and 25 for the MAF (6). The dimension of W_K is Bdq × Bdq, in which B = 1 if we combine the summary statistics assuming homogeneous genetic effects.

IF test assumes the overall GEI effects, where 1_Bdq is a vector of 1’s with length Bdq. To test for H₀: γ₀ = τ = 0, we need two test statistics, for the burden effects γ₀ and T_τ for the variance component effects τ. The burden effects test statistic follows the distribution of under the null hypothesis H₀: γ₀ = 0, where . The variance component effects test statistic is where . T_τ follows the distribution of under the null hypothesis H₀: τ = 0, where ξ_τ,j are eigenvalues for . Since and T _τ are asymptotically independent (proved by Wang et al. (10)), a Fisher’s combination (39) can be used to find the p value for the IF test: , where and p_τ are p values from the burden-type test H₀: γ₀ = 0 (under the assumption τ = 0) and the adjusted variance component test H₀: τ = 0, respectively.

Joint tests

In the joint test, which evaluates both genetic effects and GEI effects, β and γ, the test statistics for β must also be reconstructed after summing the summary statistics across all studies. A meta-analysis framework for genetic main effects tests has been developed by SMMAT, which supports both SKAT test and unified burden and SKAT test. Wang et al. (10) has shown that those test statistics for the genetic main effects and the GEI test are mutually independent. Assume that the p value from the meta-analysis SKAT is p_MV and the p value from the meta-analysis IV test is p_IV, we can combine them through Fisher’s method as for the joint variance component (JV) test. Similarly, assuming and p_σ are the p values from the hybrid genetic main effects meta-analysis, we can combine them with their GEI test counterparts and p_τ through , which is the joint hybrid test using Fisher’s method (JF). Alternatively, we can combine the p values for genetic effects separately by , and then combine the main effects and GEI p values by , which is the joint hybrid test using double Fisher’s procedures (JD).

Numerical Simulations

Using related individuals from the same population and different populations, we conducted simulation studies with two purposes: (1) benchmarking p values from the meta-analysis to the pooled analysis of individual-level data from MAGEE; and (2) evaluating the performance of each test under the conditions of heterogeneous genetic influence across populations.

We simulated the genotype replicates in two different populations using MS (40) and prepared two scenarios based on the two populations. In scenario 1, we simulated 100,000 individuals from the same population, which included 50,000 parents and 50,000 children in a family relatedness pattern with a kinship matrix of . The 100,000 samples were split into two studies of 50,000 individuals (12,500 families) each, and the covariates were the same in the two studies. In scenario 2, we simulated 2 populations with a migration rate of 10. Each population had 50,000 individuals, including 25,000 parents and 25,000 children who were related to each other in the same way as scenario 1. The covariates in these two studies were different. The simulation scenarios and covariates setup for each study are summarized in Table 2. Each scenario had 1,000 genotype replicates of 100 variants in a group. Both qualitative and binary traits were simulated, and the following section describes the details for each simulation.

View this table:

Table 2.

Summary of simulation scenarios.

Comparison of p values and power

We simulated 10 phenotype replicates per genotype replicate. In each study, the quantitative traits of individual j in family i were simulated from where Age_ij ∼ N(50, 5) for parents and Age_ij ∼ N(50, 5) for children, Sex_ij ∼ Bernoulli(0.5) for both parents and children, BMI for family i follows a distribution with heritability (41) of 0.75 (44) BMI_i ∼ , the random effects for family i , and the random error ε_ij ∼ N(0, 1). G_ijt was the t-th genetic variant for individual j in family i and was the mean of the t-th variant, and was the GEI term for the t-th variant with BMI for individual j from family i and was the mean of the t-th interaction term in that study. The binary traits were simulated using a logistic regression model from where Age, Sex, BMI, and the random effects for family i r_i followed the same distribution as the quantitative traits, and , and were the population means in each study, α₀ was set to representing a prevalence rate of 0.4. and the calculation of K_ijt and were the same as the quantitative traits.

For scenario 1, the parameters for age, sex, and BMI were α₁ = 0.1, α₂ = 0.2, and α₃ = 0.1, the same for study 1 and study 2. The effect sizes β_t for G and γ_t for GEI were also the same across in both studies. In scenario 2, the parameters for the covariates in study 1 were the same as those for scenario 1, while the parameters in study 2 were α₁ = 0, α₂ = 0.4, and α₃ = 0.12, so that age was excluded from the covariates. In addition, the effect sizes for G and GEI effects were different between the 2 studies in scenario 2. Specifically, the effect sizes of a variant β_t and γ_t were proportional to its MAF as clog₁₀(MAF_t), where c represents a constant. Detailed information of the constants c for β_t and γ_t in each simulation can be found in supplemental Table S1 for the p value benchmark, supplemental Table S2 for the quantitative trait power comparison, and supplemental Table S3 for the binary trait power comparison.

For the p value benchmark, we used different parameters for three different sample sizes. In total, 20% of the variants were randomly selected as causal, which included 10% positive effects and 10% negative effects. For power comparisons, the same parameters were used with different sample sizes, but different parameters were used for GEI and joint tests. We used two approaches to indicate the proportion of causal variants: (1) we randomly chose 20% of causes, among which 10% were positive and 10% were negative; (2) we randomly selected 20% of causes, among which 16% were positive and 4% were negative. The gene-BMI interaction was tested both for binary and quantitative traits. According to Wu et al. (6), we used a beta density weight function with parameters 1 and 25 on the MAF, so that rare variants had a higher weight than common variants.

Application to UK Biobank whole exome sequencing data

We used the UK Biobank (42) whole exome sequencing (WES) data as an example of our proposed meta-analysis. The UK Biobank has released the first tranche of WES data containing 49,959 samples in March 2019. The second tranche of WES data was released in October 2020 with 200,632 samples (including the samples in the first tranche). We divided the dataset in the second tranche by the first and second batches to meta-analyze the two datasets because there may be measurement errors or batch-specific heterogeneities due to differences in quality control procedures. After excluding the subjects with missing sex information, missing age, BMI or WHR, non-white British, and those who have withdrawn from the study, a total of 43,190 subjects were eligible for batch 1 (23,372 women and 19,818 men), and 132,058 subjects were eligible for batch 2 (72,961 women and 59,097 men). The age distribution was mean = 56.95 (sd = 7.89) in batch 1 and mean = 56.75 (sd = 8.04) in batch 2. The BMI distribution was mean = 27.40 (sd = 4.77) in batch 1 and mean = 27.34 (sd = 4.70) in batch 2.We tested for gene-by-sex interaction effects on WHR at the gene-level for the two batches. We first fitted a linear mixed model (43) adjusting for sex, age, age² and the top ten ancestry principal components (PCs) for each batch using glmmkin function from the GMMAT package,

The relatedness matrix was constructed using the kinship coefficients computed by KING software (44) for third-degree and closer relatives, provided by the UK Biobank (see UK Biobank Resource 531).

We used the genotype data generated by the Functionally Equivalent (FE) pipeline (45). The variant groups were defined as protein-coding regions for the WES data and a total of 19,449 protein-coding regions with 17,549,650 variants were available for analysis. MAGEE was used to generate the score statistics and the covariance matrices for each batch. In the next step, we combined and meta-analyzed the scores and covariance matrices for the two batches, assuming homogeneous genetic effects. As IF test was more powerful than the IV test, and JF test was slightly more powerful than JV and JD tests in our power simulations, in the UK Biobank WES data analysis we reported IF and JF test p values as the GEI and joint test results, respectively. The meta-analysis excluded variants with a minor allele count (MAC) less than 5, or with a genotype missing rate greater than 5% in the combined dataset, and 18,668 protein-coding regions with 2,328,550 variants passed these filters. We applied a beta density weight function with parameters 1 and 25 on the MAF. In addition, we compared the meta-analysis results to the results of a joint analysis pooling all the individual-level data using MAGEE. For protein-coding regions that reached the Bonferroni-corrected significance level of 0.05/18,668 = 2.7×10⁻⁶, we also performed conditional analysis by adjusting for variants in the region that had a single variant joint test p value less than 5.0×10⁻⁸, MAF greater than 1% and genotype missing rate less than 5%. Both the genetic main effect and gene-sex interaction terms of significant single variants were included in the conditional analysis.

Data Availability

The individual-level data that support the findings of this study are available upon application to the UK Biobank (https:/www.ukbiobank.ac.uk/register-apply/).

https:/www.ukbiobank.ac.uk/register-apply/

Declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board (IRB) of The University of Texas Health Science Center at Houston (UTHealth) with project number HSC-SPH-18-0466. Waiver of informed consent was granted.

Availability of data and materials

The individual-level data that support the findings of this study are available upon application to the UK Biobank (https://www.ukbiobank.ac.uk/register-apply/).

The method is implemented in R package MAGEE, available at https://github.com/large-scale-gxe-methods/MAGEE.

Competing interests

The authors declare that they have no competing interests.

Funding

This research was supported by National Institutes of Health (NIH) grant R01 HL145025.

Acknowledgements

This research has been conducted using the UK Biobank Resource under Application Number 42646. This work was supported by National Institutes of Health (NIH) grant R01 HL145025.

References

(1).↵
Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet 2005;6(4):287–298.
OpenUrl CrossRef PubMed Web of Science
(2).↵
Thomas D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet 2010;11(4):259–272.
OpenUrl CrossRef PubMed Web of Science
(3).↵
Hu Y, Berndt SI, Gustafsson S, Ganna A, Hirschhorn J, North KE, et al. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. Am J Hum Genet 2013;93(2):236–48.
OpenUrl CrossRef PubMed
(4).↵
Lee S, Abecasis GR, Boehnke M, Lin X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. Am J Hum Genet 2014;95(1):5–23.
OpenUrl CrossRef PubMed
(5).↵
Li B, Leal SM. Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. The American Journal of Human Genetics 2008;83(3):311–321.
OpenUrl CrossRef PubMed Web of Science
(6).↵
Wu M, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. The American Journal of Human Genetics 2011;89(1):82–93.
OpenUrl CrossRef PubMed
(7).↵
Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei L, et al. Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. Am J Hum Genet 2010;86(6):832–838.
OpenUrl CrossRef PubMed Web of Science
(8).↵
Chen H, Meigs JB, Dupuis J. Incorporating Gene-Environment Interaction in Testing for Association with Rare Genetic Variants. Hum Hered 2014;78(2):81–90.
OpenUrl CrossRef PubMed
(9).
Su Y, Di C, Hsu L. A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics (Oxford, England) 2017;18(1):119.
OpenUrl CrossRef PubMed
(10).↵
Wang X, Lim E, Liu C, Sung YJ, Rao DC, Morrison AC, et al. Efficient gene–environment interaction tests for large biobank-scale sequencing studies. Genet Epidemiol 2020;44(8):908–923.
OpenUrl
(11).↵
Kraft P, Yen Y, Stram DO, Morrison J, Gauderman WJ. Exploiting Gene-Environment Interaction to Detect Genetic Associations. Hum Hered 2007;63(2):111–119.
OpenUrl CrossRef PubMed Web of Science
(12).↵
Sung YJ, Winkler TW, Manning AK, Aschard H, Gudnason V, Harris TB, et al. An Empirical Comparison of Joint and Stratified Frameworks for Studying G × E Interactions: Systolic Blood Pressure and Smoking in the CHARGE Gene-Lifestyle Interactions Working Group. Genet Epidemiol 2016;40(5):404–415.
OpenUrl
(13).↵
Lee S, Teslovich TM, Boehnke M, Lin X. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies. Am J Hum Genet 2013;93(1):42–53.
OpenUrl CrossRef PubMed
(14).
Liu DJ, Peloso GM, Zhan X, Holmen OL, Zawistowski M, Feng S, et al. Meta-analysis of gene-level tests for rare variant association. Nat Genet 2014;46(2):200–204.
OpenUrl CrossRef PubMed
(15).↵
Feng S, Liu D, Zhan X, Wing MK, Abecasis GR. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 2014;30(19):2828–2829.
OpenUrl CrossRef PubMed
(16).
Yang J, Chen S, Abecasis G. Improved score statistics for meta-analysis in single-variant and gene-level association studies. Genet Epidemiol 2018;42(4):333–343.
OpenUrl
(17).↵
Tang Z, Lin D. MASS: meta-analysis of score statistics for sequencing studies. Bioinformatics 2013;29(14):1803–1805.
OpenUrl CrossRef PubMed Web of Science
(18).
He Z, Xu B, Lee S, Ionita-Laza I. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data. Am J Hum Genet 2017;101(3):340–352.
OpenUrl CrossRef
(19).↵
Ka J, Lee J, Kim Y, Oh B, Park T. Meta-Qtest: meta-analysis of quadratic test for rare variants. BMC medical genomics; BMC Med Genomics 2019;12:102–102.
OpenUrl
(20).↵
Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, et al. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. The American Journal of Human Genetics 2019;104(2):260–274.
OpenUrl CrossRef
(21).↵
Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26(17):2190–2191.
OpenUrl CrossRef PubMed Web of Science
(22).↵
Wang J, Liu Q, Pierce BL, Huo D, Olopade OI, Ahsan H, et al. A meta-analysis approach with filtering for identifying gene-level gene–environment interactions. Genet Epidemiol 2018;42(5):434–446.
OpenUrl
(23).↵
Jin X, Shi G. Variance-Component-Based Meta-Analysis of Gene-Environment Interactions for Rare Variants. G3 (Bethesda, Md.) 2021.
(24).↵
Aschard H, Hancock DB, London SJ, Kraft P. Genome-Wide Meta-Analysis of Joint Tests for Genetic and Gene-Environment Interaction Effects. Hum Hered 2010;70(4):292–300.
OpenUrl CrossRef PubMed
(25).↵
Manning AK, LaValley M, Liu C, Rice K, An P, Liu Y, et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP × environment regression coefficients. Genet Epidemiol 2011;35(1):11–18.
OpenUrl CrossRef PubMed
(26).↵
Bland JM, Altman DG. Multiple Significance Tests: The Bonferroni Method. BMJ: British Medical Journal 1995;310(6973):170–170.
OpenUrl CrossRef
(27).↵
Lumish HS, O’Reilly M, Reilly MP. Sex Differences in Genomic Drivers of Adipose Distribution and Related Cardiometabolic Disorders: Opportunities for Precision Medicine. Arterioscler Thromb Vasc Biol 2020;40(1):45–60.
OpenUrl CrossRef
(28).↵
Shungin D, Winkler TW, Croteau-Chonka D, Ferreira T, Locke AE, Mägi R, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 2015;518(7538):187–U378.
OpenUrl CrossRef PubMed Web of Science
(29).↵
Justice AE, Karaderi T, Highland HM, Young KL, Graff M, Lu Y, et al. Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution. Nat Genet 2019;51(3):452–469.
OpenUrl CrossRef
(30).
Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 2010;42(11):949–U160.
OpenUrl CrossRef PubMed Web of Science
(31).
Lotta LA, Wittemans L, Zuber V, Stewart ID, Sharp SJ, Luan J, et al. Association of genetic variants related to gluteofemoral vs abdominal fat distribution With type 2 diabetes, coronary disease, and cardiovascular risk factors. 2018.
(32).↵
Zhu Z, Guo Y, Shi H, Liu C, Panganiban RA, Chung W, et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J Allergy Clin Immunol 2020;145(2):537–549.
OpenUrl
(33).↵
Hill AVS, Allsopp CEM, Kwiatkowski D, Anstey NM, Twumasi P, Rowe PA, et al. Common West African HLA antigens are associated with protection from severe malaria. Nature 1991;352(6336):595–600.
OpenUrl CrossRef PubMed Web of Science
(34).↵
Tang Z, Lin D. Meta-Analysis of Sequencing Studies With Heterogeneous Genetic Associations. Genet Epidemiol 2014;38(5):389–401.
OpenUrl CrossRef PubMed
(35).↵
Lee YH. Meta-analysis of genetic association studies. Annals of laboratory medicine 2015;35(3):283.
OpenUrl
(36).↵
Kircher M, Witten DM, Jain P, O’Roak B,J., Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46(3):310–315.
OpenUrl CrossRef PubMed
(37).
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 2019;47:D886–D894.
OpenUrl CrossRef PubMed
(38).↵
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics 2018;34(3):511–513.
OpenUrl CrossRef PubMed
(39).↵
Fisher RA. Statistical Methods for Research Workers. Second Edition, Revised and enlarged. Edinburgh: Oliver and Boyd. Journal of Comparative Pathology and Therapeutics 1928;41:261–262.
OpenUrl
(40).↵
Hudson RR. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 2002;18(2):337–338.
OpenUrl CrossRef PubMed Web of Science
(41).
Elks CE, Den Hoed M, Zhao JH, Sharp SJ, Wareham NJ, Loos RJF, et al. Variability in the heritability of body mass index: a systematic review and meta-regression. Frontiers in endocrinology 2012;3:29.
OpenUrl
(42).↵
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562(7726):203–209.
OpenUrl CrossRef PubMed
(43).↵
Conomos MP, Laurie CA, Stilp AM, Gogarten SM, Mchugh CP, Nelson SC, et al. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 2016;98(1):165.
OpenUrl CrossRef PubMed
(44).↵
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W. Robust relationship inference in genome-wide association studies. Bioinformatics 2010;26(22):2867–2873.
OpenUrl CrossRef PubMed Web of Science
(45).↵
Regier AA, Farjoun Y, Larson DE, Krasheninina O, Kang HM, Howrigan DP, et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nature communications 2018;9(1):4038.
OpenUrl

View the discussion thread.

Posted May 10, 2022.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (410)
Allergy and Immunology (723)
Anesthesia (214)
Cardiovascular Medicine (3075)
Dentistry and Oral Medicine (347)
Dermatology (260)
Emergency Medicine (461)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1093)
Epidemiology (13006)
Forensic Medicine (13)
Gastroenterology (858)
Genetic and Genomic Medicine (4819)
Geriatric Medicine (444)
Health Economics (749)
Health Informatics (3047)
Health Policy (1100)
Health Systems and Quality Improvement (1123)
Hematology (406)
HIV/AIDS (955)
Infectious Diseases (except HIV/AIDS) (14310)
Intensive Care and Critical Care Medicine (878)
Medical Education (453)
Medical Ethics (119)
Nephrology (498)
Neurology (4593)
Nursing (244)
Nutrition (680)
Obstetrics and Gynecology (842)
Occupational and Environmental Health (764)
Oncology (2379)
Ophthalmology (672)
Orthopedics (268)
Otolaryngology (333)
Pain Medicine (300)
Palliative Medicine (87)
Pathology (514)
Pediatrics (1236)
Pharmacology and Therapeutics (517)
Primary Care Research (518)
Psychiatry and Clinical Psychology (3944)
Public and Global Health (7174)
Radiology and Imaging (1596)
Rehabilitation Medicine and Physical Therapy (951)
Respiratory Medicine (942)
Rheumatology (458)
Sexual and Reproductive Health (474)
Sports Medicine (402)
Surgery (508)
Toxicology (65)
Transplantation (219)
Urology (188)

[1] (1).↵
Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet 2005;6(4):287–298.
OpenUrl CrossRef PubMed Web of Science

[2] (2).↵
Thomas D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet 2010;11(4):259–272.
OpenUrl CrossRef PubMed Web of Science

[3] (3).↵
Hu Y, Berndt SI, Gustafsson S, Ganna A, Hirschhorn J, North KE, et al. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. Am J Hum Genet 2013;93(2):236–48.
OpenUrl CrossRef PubMed

[4] (4).↵
Lee S, Abecasis GR, Boehnke M, Lin X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. Am J Hum Genet 2014;95(1):5–23.
OpenUrl CrossRef PubMed

[5] (5).↵
Li B, Leal SM. Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. The American Journal of Human Genetics 2008;83(3):311–321.
OpenUrl CrossRef PubMed Web of Science

[6] (6).↵
Wu M, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. The American Journal of Human Genetics 2011;89(1):82–93.
OpenUrl CrossRef PubMed

[7] (7).↵
Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei L, et al. Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. Am J Hum Genet 2010;86(6):832–838.
OpenUrl CrossRef PubMed Web of Science

[8] (8).↵
Chen H, Meigs JB, Dupuis J. Incorporating Gene-Environment Interaction in Testing for Association with Rare Genetic Variants. Hum Hered 2014;78(2):81–90.
OpenUrl CrossRef PubMed

[9] (9).
Su Y, Di C, Hsu L. A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics (Oxford, England) 2017;18(1):119.
OpenUrl CrossRef PubMed

[10] (10).↵
Wang X, Lim E, Liu C, Sung YJ, Rao DC, Morrison AC, et al. Efficient gene–environment interaction tests for large biobank-scale sequencing studies. Genet Epidemiol 2020;44(8):908–923.
OpenUrl

[11] (11).↵
Kraft P, Yen Y, Stram DO, Morrison J, Gauderman WJ. Exploiting Gene-Environment Interaction to Detect Genetic Associations. Hum Hered 2007;63(2):111–119.
OpenUrl CrossRef PubMed Web of Science

[12] (12).↵
Sung YJ, Winkler TW, Manning AK, Aschard H, Gudnason V, Harris TB, et al. An Empirical Comparison of Joint and Stratified Frameworks for Studying G × E Interactions: Systolic Blood Pressure and Smoking in the CHARGE Gene-Lifestyle Interactions Working Group. Genet Epidemiol 2016;40(5):404–415.
OpenUrl

[13] (13).↵
Lee S, Teslovich TM, Boehnke M, Lin X. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies. Am J Hum Genet 2013;93(1):42–53.
OpenUrl CrossRef PubMed

[14] (14).
Liu DJ, Peloso GM, Zhan X, Holmen OL, Zawistowski M, Feng S, et al. Meta-analysis of gene-level tests for rare variant association. Nat Genet 2014;46(2):200–204.
OpenUrl CrossRef PubMed

[15] (15).↵
Feng S, Liu D, Zhan X, Wing MK, Abecasis GR. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 2014;30(19):2828–2829.
OpenUrl CrossRef PubMed

[16] (16).
Yang J, Chen S, Abecasis G. Improved score statistics for meta-analysis in single-variant and gene-level association studies. Genet Epidemiol 2018;42(4):333–343.
OpenUrl

[17] (17).↵
Tang Z, Lin D. MASS: meta-analysis of score statistics for sequencing studies. Bioinformatics 2013;29(14):1803–1805.
OpenUrl CrossRef PubMed Web of Science

[18] (18).
He Z, Xu B, Lee S, Ionita-Laza I. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data. Am J Hum Genet 2017;101(3):340–352.
OpenUrl CrossRef

[19] (19).↵
Ka J, Lee J, Kim Y, Oh B, Park T. Meta-Qtest: meta-analysis of quadratic test for rare variants. BMC medical genomics; BMC Med Genomics 2019;12:102–102.
OpenUrl

[20] (20).↵
Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, et al. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. The American Journal of Human Genetics 2019;104(2):260–274.
OpenUrl CrossRef

[21] (21).↵
Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26(17):2190–2191.
OpenUrl CrossRef PubMed Web of Science

[22] (22).↵
Wang J, Liu Q, Pierce BL, Huo D, Olopade OI, Ahsan H, et al. A meta-analysis approach with filtering for identifying gene-level gene–environment interactions. Genet Epidemiol 2018;42(5):434–446.
OpenUrl

[23] (23).↵
Jin X, Shi G. Variance-Component-Based Meta-Analysis of Gene-Environment Interactions for Rare Variants. G3 (Bethesda, Md.) 2021.

[24] (24).↵
Aschard H, Hancock DB, London SJ, Kraft P. Genome-Wide Meta-Analysis of Joint Tests for Genetic and Gene-Environment Interaction Effects. Hum Hered 2010;70(4):292–300.
OpenUrl CrossRef PubMed

[25] (25).↵
Manning AK, LaValley M, Liu C, Rice K, An P, Liu Y, et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP × environment regression coefficients. Genet Epidemiol 2011;35(1):11–18.
OpenUrl CrossRef PubMed

[26] (26).↵
Bland JM, Altman DG. Multiple Significance Tests: The Bonferroni Method. BMJ: British Medical Journal 1995;310(6973):170–170.
OpenUrl CrossRef

[27] (27).↵
Lumish HS, O’Reilly M, Reilly MP. Sex Differences in Genomic Drivers of Adipose Distribution and Related Cardiometabolic Disorders: Opportunities for Precision Medicine. Arterioscler Thromb Vasc Biol 2020;40(1):45–60.
OpenUrl CrossRef

[28] (28).↵
Shungin D, Winkler TW, Croteau-Chonka D, Ferreira T, Locke AE, Mägi R, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 2015;518(7538):187–U378.
OpenUrl CrossRef PubMed Web of Science

[29] (29).↵
Justice AE, Karaderi T, Highland HM, Young KL, Graff M, Lu Y, et al. Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution. Nat Genet 2019;51(3):452–469.
OpenUrl CrossRef

[30] (30).
Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 2010;42(11):949–U160.
OpenUrl CrossRef PubMed Web of Science

[31] (31).
Lotta LA, Wittemans L, Zuber V, Stewart ID, Sharp SJ, Luan J, et al. Association of genetic variants related to gluteofemoral vs abdominal fat distribution With type 2 diabetes, coronary disease, and cardiovascular risk factors. 2018.

[32] (32).↵
Zhu Z, Guo Y, Shi H, Liu C, Panganiban RA, Chung W, et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J Allergy Clin Immunol 2020;145(2):537–549.
OpenUrl

[33] (33).↵
Hill AVS, Allsopp CEM, Kwiatkowski D, Anstey NM, Twumasi P, Rowe PA, et al. Common West African HLA antigens are associated with protection from severe malaria. Nature 1991;352(6336):595–600.
OpenUrl CrossRef PubMed Web of Science

[34] (34).↵
Tang Z, Lin D. Meta-Analysis of Sequencing Studies With Heterogeneous Genetic Associations. Genet Epidemiol 2014;38(5):389–401.
OpenUrl CrossRef PubMed

[35] (35).↵
Lee YH. Meta-analysis of genetic association studies. Annals of laboratory medicine 2015;35(3):283.
OpenUrl

[36] (36).↵
Kircher M, Witten DM, Jain P, O’Roak B,J., Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46(3):310–315.
OpenUrl CrossRef PubMed

[37] (37).
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 2019;47:D886–D894.
OpenUrl CrossRef PubMed

[38] (38).↵
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics 2018;34(3):511–513.
OpenUrl CrossRef PubMed

[39] (39).↵
Fisher RA. Statistical Methods for Research Workers. Second Edition, Revised and enlarged. Edinburgh: Oliver and Boyd. Journal of Comparative Pathology and Therapeutics 1928;41:261–262.
OpenUrl

[40] (40).↵
Hudson RR. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 2002;18(2):337–338.
OpenUrl CrossRef PubMed Web of Science

[41] (41).
Elks CE, Den Hoed M, Zhao JH, Sharp SJ, Wareham NJ, Loos RJF, et al. Variability in the heritability of body mass index: a systematic review and meta-regression. Frontiers in endocrinology 2012;3:29.
OpenUrl

[42] (42).↵
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562(7726):203–209.
OpenUrl CrossRef PubMed

[43] (43).↵
Conomos MP, Laurie CA, Stilp AM, Gogarten SM, Mchugh CP, Nelson SC, et al. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 2016;98(1):165.
OpenUrl CrossRef PubMed

[44] (44).↵
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W. Robust relationship inference in genome-wide association studies. Bioinformatics 2010;26(22):2867–2873.
OpenUrl CrossRef PubMed Web of Science

[45] (45).↵
Regier AA, Farjoun Y, Larson DE, Krasheninina O, Kang HM, Howrigan DP, et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nature communications 2018;9(1):4038.
OpenUrl