ABSTRACT
As the master regulator in utero, the placenta is core to the Developmental Origins of Health and Disease (DOHaD) hypothesis but is historically understudied. To identify placental gene-trait associations (GTAs) across the life course, we performed distal mediator-enriched transcriptome-wide association studies (TWAS) for 40 traits, integrating placental multi-omics from the Extremely Low Gestational Age Newborn Study. At P < 2.5 × 10⁻⁶, we detected 248 GTAs, mostly for neonatal and metabolic traits, across 176 genes enriched for cell growth and immunological pathways. In aggregate, genetic effects mediated by placental expression significantly explained 4 early-life traits but no later-in-life traits. Eighty-nine GTAs showed significant mediation through distal genetic variants, yielding hypotheses for distal regulation of GTAs. Investigation of one hypothesis in human placenta-derived choriocarcinoma cells showed that knockdown of the mediator gene EPS15 upregulated its predicted targets SPATA13 and FAM214A, both associated with waist-hip ratio in TWAS, as well as multiple genes involved in metabolic pathways. These results suggest profound health impacts of placental genomic regulation in developmental programming across the life course.
INTRODUCTION
The placenta serves as the master regulator of the intrauterine environment via nutrient transfer, metabolism, gas exchange, neuroendocrine signaling, growth hormone production, and immunologic surveillance1–3. Due to strong influences on postnatal health, the placenta is central to the Developmental Origins of Health and Disease (DOHaD) hypothesis – that the in utero experience has lifelong impacts on child health by altering developmental programming and influencing risk of common, noncommunicable health conditions4. For example, physiological characteristics of the placenta have been linked to neuropsychiatric, developmental, and metabolic diseases or health traits (collectively referred to as traits) that manifest throughout the life course, either early- or later-in-life (Figure 1)1,5–8. Despite its long-lasting influences on health, the placenta is understudied in large consortia studies of multi-tissue gene regulation9,10. Studying regulatory mechanisms in the placenta underlying biological processes in developmental programming could provide novel insight into health and disease etiology.
The complex interplay between genetics and placental transcriptomics and epigenomics has strong effects on gene expression that may explain variation in gene-trait associations (GTAs). Quantitative trait loci (QTL) analyses have identified a strong influence of cis-genetic variants on both placental gene expression and DNA methylation11. Furthermore, there is growing evidence that the placental epigenome influences gene regulation, often distally (more than 1-3 megabases away in the genome)12, and that placental DNA methylation and microRNA (miRNA) expression are associated with health traits in children13. Dysfunction of transcription factor regulation in the placenta has also been shown to have profound effects on childhood traits14. Although combining genetics, transcriptomics, and epigenomics lends insight into the influence of placental genomics on complex traits15, genome-wide screens for GTAs that integrate different molecular profiles and generate functional hypotheses require more sophisticated computational methods.
To this end, advances in transcriptome-wide association studies (TWAS) have allowed for integration of genome-wide association studies (GWAS) and eQTL datasets to boost power in identifying GTAs specific to a relevant tissue16,17. However, traditional TWAS methods largely overlook genetic variants distal to genes of interest, whose effects are ostensibly mediated through regulatory biomarkers like transcription factors, miRNAs, or DNA methylation sites18. Not only may these distal biomarkers explain a significant portion of both gene expression heritability and trait heritability at the tissue-specific expression level19,20, they may also influence tissue-specific trait associations for individual genes. Given the strong interplay of regulatory elements in placental gene regulation, we sought to systematically characterize the portions of gene expression that are influenced by these distal regulatory elements.
Here, we set out to identify the following: (1) which genes show associations between their placental genetically-regulated expression (GReX) and various traits across the life course, (2) which traits along the life course can be explained by placental GReX, in aggregate, and (3) which transcription factors, miRNAs, or CpG sites potentially regulate trait-associated genes in the placenta (Figure 1). We leveraged multi-omic data from fetal-side placenta tissue from the Extremely Low Gestational Age Newborn (ELGAN) Cohort Study21 to train predictive models of gene expression enriched for distal SNPs using MOSTWAS, a recent TWAS extension that integrates multi-omic data22. Using 40 GWAS of European-ancestry subjects from large consortia23–27, we performed a series of TWAS for non-communicable health traits and disorders that may be influenced by the placenta to identify GTAs and functional hypotheses for regulation (Figure 2). To our knowledge, this is the first distal mediator-enriched TWAS of health traits that integrates multi-omic data from the placenta.
RESULTS
Overview of analytic framework
We conduct a series of distal mediator-enriched transcriptome-wide association studies (TWAS) for a variety of complex traits by integrating GWAS data with placental eQTL data from ELGAN. First, we use a recent methodology, MOSTWAS22, to train predictive models of gene expression using both local- and distal-SNPs to genes (Figure 2A). Next, we employ these models to conduct TWAS for these traits using GWAS summary statistics to identify genes with placental genetically-regulated expression (GReX) associated with different traits across the life course (Figure 2B)17. We then estimate the extent to which placental genetically-regulated expression across all trait-associated genes can explain the variability in a trait and correlations between traits (Figure 2C)17,28. Next, to provide more biological context, for genes estimated to have placental GTAs, we run multiple follow-up analyses (Figure 2C): gene ontology enrichment analyses29, probabilistic fine-mapping of overlapping loci30, phenome-wide analyses for select genes, and prioritization of functional hypotheses for upstream distal regulation22. Lastly, for one particular functional hypothesis with strong computational support, we conduct an in-vitro assay in human placenta-derived cell lines to validate the predicted mediator-TWAS gene relationship and the transcriptomic consequences of this mediator (Figure 2D).
Complex traits are genetically heritable and correlated
We curated GWAS summary statistics from subjects of European ancestry for 40 non-communicable traits and disorders across five health categories to identify potential links to genetically-regulated placental expression (traits and cohorts for each GWAS are summarized in Supplemental Table S1; sample sizes are provided in Supplemental Table S2). These five categories of traits (autoimmune/autoreactive disorders, metabolic traits, cardiovascular disorders, early childhood outcomes, and neuropsychiatric traits) have previously been linked to placental and fetal biology and morphology1–8. These 40 traits, derived from 5 different consortia (Supplemental Table S1), comprise 3 autoimmune/autoreactive disorders, 8 body size/metabolic traits, 4 cardiovascular disorders, 14 neonatal/early childhood traits, and 11 neuropsychiatric traits/disorders23–27. The 26 traits not categorized as neonatal/early childhood traits are measured exclusively in adults. In addition, these 40 GWAS are not derived from the same samples of participants.
To quantify the total genetic contribution to each trait and the genetic associations shared between traits, we estimated the SNP heritability (h²) and genetic correlation (Rg) of these traits, respectively, using linkage disequilibrium (LD) score regression with LD scores generated for individuals of European ancestry from the 1000 Genomes Project31,32 (Supplemental Figures S1 and S2). Of the 40 traits, 37 showed significantly positive SNP heritability and 18 had ĥ² > 0.10 (Supplemental Figure S1, Supplemental Table S1), with the largest heritability for childhood BMI (ĥ² = 0.69, SE = 0.064). As expected, we observed strong, statistically significant genetic correlations between traits of similar categories (e.g., between neuropsychiatric traits or between metabolic traits) (Supplemental Figure S2; Supplemental Table S3). At Benjamini-Hochberg FDR-adjusted P < 0.05, we also observed significant correlations between traits from different categories, for example: diabetes and angina (FDR-adjusted P = 6.53 × 10⁻³³), Tanner scale (in children) and BMI (FDR-adjusted P = 1.06 × 10⁻³), and BMI and obsessive compulsive disorder (FDR-adjusted P = 1.79 × 10⁻⁹). Given strong and potentially shared genetic influences across these traits, we examined whether genetic associations with these traits are mediated by the placental transcriptome.
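LD score regression rests on the relationship E[χ²_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score of SNP j, N the GWAS sample size, M the number of SNPs, and a a confounding term; the regression slope scaled by M/N therefore estimates h² (genetic correlation uses the analogous regression of z₁z₂ on ℓ_j). The Python sketch below illustrates only this core idea on hypothetical, noise-free data. It is not the LDSC software, which additionally applies regression weights and computes block-jackknife standard errors; the function name `ldsc_h2` is our own.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Estimate SNP heritability from the slope of chi^2 on LD score.

    Simplified sketch: unweighted least squares with a free intercept.
    The LDSC software also uses regression weights and a block
    jackknife for standard errors; both are omitted here.
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    # E[chi2_j] = N * h2 * l_j / M + N * a + 1, so h2 = slope * M / N
    return slope * n_snps / n_samples, intercept


# Hypothetical, noise-free synthetic data (not study data)
rng = np.random.default_rng(0)
M, N, true_h2 = 10_000, 50_000, 0.3
ell = rng.uniform(1, 200, size=M)       # per-SNP LD scores
chi2 = 1.0 + N * true_h2 * ell / M      # expected chi^2 statistics
h2_hat, intercept_hat = ldsc_h2(chi2, ell, N, M)
```

With no confounding, the fitted intercept is about 1 and the heritability estimate recovers the simulated value of 0.3.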
Multiple placental gene-trait associations detected across the life course
In the first step of our TWAS (Figure 2A), we leveraged MOSTWAS22, a recent TWAS extension that includes distal variants in transcriptomic prediction, to train predictive models of placental expression. As large proportions of gene expression heritability are explained by distal-eQTLs local to regulatory hotspots18,20, MOSTWAS uses data-driven approaches to identify mediating regulatory biomarkers, or distal-eQTLs acting through local regulatory biomarkers, to increase predictive power for gene expression and power to detect GTAs (Supplemental Figure S3)22. In this analysis, these regulatory biomarkers include potential regulatory protein (RP)-encoding genes (as curated by TFcheckpoint33), miRNAs, and CpG methylation sites from the ELGAN Study. We assume that these RP genes, miRNAs, and genes and other regulatory features local to these CpG methylation sites have distal effects on the transcription of genes of interest and thus potentially mediate distal-eQTLs of the gene of interest (Methods).
Using genotypes from umbilical cord blood34 and mRNA expression, CpG methylation, and miRNA expression data from fetal-side placenta15 from the ELGAN Study21 for 272 infants born pre-term, we built genetic models to predict RNA expression levels for genes in the fetal placenta (demographic summary in Supplemental Table S4). Of 12,020 genes expressed across all samples in ELGAN, we successfully built significant models for 2,994 genes, defined by positive SNP-based expression heritability (nominal P < 0.05) and five-fold cross-validation (CV) McNemar-adjusted R² ≥ 0.01 (Figure 3A [Step 1]; Methods). Only these 2,994 models are used in subsequent TWAS steps. Mean SNP heritability for these genes was 0.39 (25th percentile = 0.253, 75th percentile = 0.511), and mean CV R² was 0.031 (25th and 75th percentiles: 0.014, 0.034). For out-of-sample validation, we imputed expression into individual-level genotypes from the Rhode Island Child Health Study (RICHS; N = 149)35,36, showing strong portability across studies: of 2,005 genes with RNA-seq expression in RICHS, 1,131 genes met adjusted R² ≥ 0.01, with mean R² = 0.011 (25th and 75th percentiles: 7.71 × 10⁻⁴, 0.016) (Figure 3B; Supplemental Table S5). Summary statistics of demographic and clinical variables show similar distributions of race in RICHS and ELGAN, though RICHS excluded all pre-term babies, a clear difference between the two cohorts (Supplemental Table S4).
We integrated GWAS summary statistics for 40 traits from European-ancestry subjects with placental gene expression using our predictive models. Using the weighted burden test with the 1000 Genomes European-ancestry LD matrix as a reference17, we detected 932 GTAs (spanning 686 unique genes) at P < 2.5 × 10⁻⁶, a transcriptome-wide significance threshold consistent with previous TWAS17,28 (Figure 3A [Step 2]). As many of these loci carry significant signal because of strong SNP-trait associations alone, we employed Gusev et al.’s permutation test to assess how much signal the SNP-expression weights add beyond the GWAS associations at the locus, and thus whether integrating expression data significantly refines the association with the trait17. At FDR-adjusted P < 0.05, we detected 248 such GTAs spanning 176 unique genes: 11 autoimmune/autoreactive, 136 body size/metabolic, 32 cardiovascular, 39 neonatal/childhood, and 30 neuropsychiatric GTAs (Figure 3A [Step 3], Supplemental Tables S2 and S6; Miami plots of TWAS Z-scores in Supplemental Figures S4-S9).
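The weighted burden statistic and the permutation test can be illustrated together. Assuming FUSION-style inputs (a vector of SNP-expression weights w, GWAS Z-scores z, and a reference LD matrix Σ over the same SNPs), the TWAS statistic is Z = wᵀz / √(wᵀΣw), and the permutation test shuffles w across SNPs to ask whether the observed |Z| exceeds what arbitrary weightings of the same local GWAS signal would produce. The function names and the toy locus are hypothetical; this is a minimal sketch, not the FUSION implementation.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Weighted burden statistic: Z = w'z / sqrt(w' LD w)."""
    return float(weights @ gwas_z / np.sqrt(weights @ ld @ weights))


def permutation_p(weights, gwas_z, ld, n_perm=10_000, seed=1):
    """Shuffle the expression weights across SNPs and count how often a
    random weighting of the same GWAS signal reaches the observed |Z|."""
    rng = np.random.default_rng(seed)
    z_obs = abs(twas_z(weights, gwas_z, ld))
    hits = 0
    for _ in range(n_perm):
        z_null = abs(twas_z(rng.permutation(weights), gwas_z, ld))
        hits += z_null >= z_obs
    # add-one correction keeps the estimated P strictly positive
    return (hits + 1) / (n_perm + 1)


# Hypothetical locus: five independent SNPs, one strong GWAS hit that
# also carries the only nonzero expression weight.
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
z = np.array([5.0, 0.0, 0.0, 0.0, 0.0])
ld = np.eye(5)
z_twas = twas_z(w, z, ld)          # 5.0
p_perm = permutation_p(w, z, ld)   # close to 0.2
```

In this toy locus a shuffle places the single weight on the strong SNP about one time in five, so the permutation P is near 0.2 despite |Z| = 5: the GWAS hit, not the expression weights, drives the association.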
The 39 GTAs detected for adult BMI included LARS2 (Z = 11.4) and CAST (Z = −4.61); both have been detected using cis-only TWAS in other tissues17,28. In addition, one of the 30 genes identified in association with waist-hip ratio (in adults), NDUFS1 (Z = −5.38), was prioritized in other tissues by TWAS28. We cross-referenced susceptibility genes with a recent cis-only TWAS of fetal birthweight, childhood obesity, and childhood BMI by Peng et al. using placental expression data from RICHS8. Of the 19 birthweight-associated genes they identified, we could train significant expression models for only two in ELGAN, PLEKHA1 and PSG8, and detected a significant association only between PSG8 and fetal birthweight (Z = −7.77). Similarly, of the 6 childhood BMI-associated genes identified by Peng et al., only 1 had a significant model in ELGAN, and it showed no association with the trait; there were no overlaps with childhood obesity-associated genes8. We hypothesize that the minimal overlap with susceptibility genes identified by Peng et al. is due to differing phenotypes and eQTL architectures in the datasets and different inclusion criteria for significant gene expression models.
Next, we tested for horizontal pleiotropic effects of the SNPs used in the models for TWAS-prioritized genes; if SNPs affect the outcome through a pathway independent of expression of the gene, the TWAS association may be biased37,38. Using PMR-Summary-Egger38, we tested the null hypothesis of no horizontal pleiotropy for each of the 248 TWAS-prioritized GTAs. At FDR-adjusted P < 0.05, only three GTAs showed significant horizontal pleiotropic effects: MOV10, SLC35G2, and HLA-A, all associated with adult waist-hip ratio (Supplemental Table S6). These three genes may have upwardly biased TWAS associations, as the SNPs used to construct their GReX may influence the outcome through a different molecular pathway.
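The FDR-adjusted thresholds used here and elsewhere are, as stated for the genetic correlation analysis, Benjamini-Hochberg adjustments. Assuming the same procedure was applied across the 248 pleiotropy tests, the adjustment can be sketched as follows; statsmodels’ `multipletests(..., method="fdr_bh")` should give the same values. The P-values in the example are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted P-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the i-th smallest P-value
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity: running minimum from the largest P downward
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical raw P-values for four tests
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])   # [0.02, 0.04, 0.04, 0.02]
significant = adj < 0.05
```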
As these GTAs indicate trait association and do not establish causality, we used FOCUS30, a Bayesian fine-mapping approach, to prioritize likely causal genes at loci with multiple TWAS associations. For TWAS-significant genes with overlapping genetic loci, FOCUS estimates posterior inclusion probabilities (PIPs) in a credible set of genes that explains the association signal at the locus. We found 8 such overlaps and estimated a 90% credible set of genes explaining the signal for each locus (Supplemental Table S9). For example, we identified 3 genes associated with triglycerides in adults at chromosomal region 12q24.13 (ERP29, RPL6, BRAP), with ERP29 defining the region’s 90% credible set with approximately 95% PIP. Similarly, we detected 3 genes associated with adult BMI at 10q22.2 (AP3M1, SAMD8, MRPS16), with AP3M1 defining the region’s 90% credible set with approximately 99% PIP.
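A ρ-level credible gene set can be sketched by ranking genes by PIP and adding them until the cumulative PIP reaches ρ. This greedy construction is our assumption about how such sets are typically built; FOCUS’s own implementation also models a null (no causal gene) configuration and may differ in detail. Gene names and PIPs below are hypothetical.

```python
def credible_set(pips, level=0.90):
    """Rank genes by posterior inclusion probability (PIP) and add them
    until the cumulative PIP reaches `level`."""
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for gene, pip in ranked:
        chosen.append(gene)
        cumulative += pip
        if cumulative >= level:
            break
    return chosen


# Hypothetical PIPs for two three-gene loci (not study results)
one_gene = credible_set({"GENE_A": 0.95, "GENE_B": 0.03, "GENE_C": 0.02})
two_genes = credible_set({"GENE_A": 0.60, "GENE_B": 0.35, "GENE_C": 0.05})
# one_gene == ['GENE_A'], two_genes == ['GENE_A', 'GENE_B']
```

When a single gene carries about 95% PIP, as reported for ERP29 at 12q24.13, the 90% set contains that gene alone.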
We conducted over-representation analysis for biological process, molecular function, and PANTHER pathway ontologies for TWAS-detected susceptibility genes (Figure 3D, Supplemental Table S7)29. Considering all 176 TWAS-identified genes, we observed enrichments for nucleic acid binding and immune or cell growth signaling pathways (e.g., B-cell/T-cell activation and EGF receptor, interleukin, PDGF, and Ras signaling pathways). By trait, we found related pathways (sphingolipid biosynthesis, cell motility, etc.) for TWAS genes for metabolic and morphological traits (e.g., BMI and childhood BMI); for most traits, we were underpowered to detect ontology enrichments. We also assessed the overlap of TWAS genes with GWAS signals: 112 TWAS genes did not overlap with any genome-wide significant GWAS locus (P < 5 × 10⁻⁸) within a 500-kilobase interval around any SNP (local or distal) included in their predictive models (Table 1).
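The GWAS-overlap criterion can be expressed as a simple interval check. The sketch below assumes that the 500-kilobase interval means ±500 kb around each model SNP (the `window` parameter can be halved if a 500 kb total width was intended) and that GWAS hits have already been filtered to P < 5 × 10⁻⁸. The function name and positions are hypothetical.

```python
from bisect import bisect_left


def overlaps_gwas(model_snps, gwas_hits, window=500_000):
    """Return True if any genome-wide significant GWAS SNP lies within
    `window` base pairs of any SNP (local or distal) in a gene's model.

    model_snps, gwas_hits: iterables of (chromosome, position) tuples;
    gwas_hits should already be filtered to P < 5e-8.
    """
    by_chrom = {}
    for chrom, pos in gwas_hits:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
    for chrom, pos in model_snps:
        hits = by_chrom.get(chrom, [])
        i = bisect_left(hits, pos - window)  # first hit at or after pos - window
        if i < len(hits) and hits[i] <= pos + window:
            return True
    return False


# Hypothetical positions (not study SNPs)
model = [("12", 1_000_000), ("3", 5_000_000)]
near = overlaps_gwas(model, [("12", 1_400_000)])   # True
far = overlaps_gwas(model, [("12", 1_600_000)])    # False
```

A gene like those counted in Table 1 would be one for which this check returns False for every trait-specific set of GWAS hits.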
Genetically-regulated placental expression mediates trait heritability and genetic correlations
To assess how much trait variance genetically-regulated placental expression explains, we estimated trait heritability at the level of placental expression, using all examined genes and all TWAS-prioritized susceptibility genes, with an LD score regression approach17,31. Overall, 4 of 14 neonatal traits (childhood BMI, head circumference, total puberty growth, and pubertal growth start) showed significantly positive placental GReX-mediated heritability (FDR-adjusted P < 0.05 for the jackknife test of significance)28; none of the 26 traits outside the neonatal category were appreciably explained by placental GReX (Supplemental Figure S10). Figure 4A shows that mean placental GReX-mediated heritability is higher for neonatal traits than for other categories. In fact, placental expression-mediated heritability explains a larger proportion of total SNP heritability for neonatal traits than for traits from other categories (Figure 4B). A comparison of the number of GWAS-significant SNPs and TWAS-significant genes also shows that neonatal traits are enriched for placental TWAS associations, even though significant genome-wide GWAS architecture cannot be inferred for these traits (Supplemental Figure S11). Together, these observations suggest that placental GReX affects neonatal traits more profoundly, as a significantly larger proportion of neonatal traits than later-in-life traits showed significant heritability at the placental GReX level.
Using RHOGE28, we assessed genetic correlations (RGE) between traits at the level of placental GReX (Supplemental Figure S12). We found several known correlations: between cholesterol and triglycerides (both in adults) and between childhood BMI and adult BMI. Interestingly, we also found correlations between traits across categories (Figure 4C): IQ and diastolic blood pressure (both in adults), and age of asthma diagnosis and adult glucose levels. These traits have been linked in morphological analyses of the placenta, but our results suggest possible genomic contributions39. Overall, these correlations suggest shared genetic pathways for these pairs of traits or for etiologic antecedents of these traits; these shared pathways could act either at the susceptibility genes or through shared distal loci mediated by RPs, miRNAs, or CpG methylation sites.
Genes with multiple GTAs have phenome-wide associations in early- and later-life traits
We noticed that multiple genes were identified in GTAs with multiple traits, leading us to examine potentially pleiotropic genes. Of the 176 TWAS-prioritized genes, we identified 50 genes associated with multiple traits, many of which are genetically correlated (Table 2). Nine genes showed 3 or more GTAs across different categories. For example, IDI1, a gene involved in cholesterol biosynthesis40, showed associations with 3 metabolic and 2 neuropsychiatric traits: body fat percentage (Z = 15.57), HDL (Z = 26.48), triglycerides (Z = −7.53), fluid intelligence score (Z = 6.37), and schizophrenia (Z = −5.56), with all five traits measured in adults. A link between cholesterol-related genes and schizophrenia has been detected previously, potentially due to coregulation of myelin-related genes41. Mediated by CpG site cg01687878 (found within PITPNM2), predicted expression of IDI1 was also computed using distal SNPs within chromosome 12q24.31, a known GWAS risk locus for hypercholesterolemia42; the inclusion of this locus may have contributed to the large TWAS associations. Similarly, SAMD4A shows associations with 4 adult body size/metabolic traits (body fat percentage, Z = 6.70; cholesterol, Z = −6.76; HDL, Z = −6.78; triglycerides, Z = −5.30) and 1 adult cardiovascular trait (diastolic blood pressure, Z = −5.29). These associations also capture variants in chromosome 12q24.31 local to CpG sites cg05747134 (within MMS19) and cg04523690 (within SETD1B). Another gene with multiple trait associations is CMTM4, an angiogenesis regulator43, showing associations with body fat percentage (Z = 6.17), hypertension (Z = 5.24), and fetal birthweight (Z = 8.11). CMTM4 has been implicated in risk of intrauterine growth restriction through its involvement in endothelial vascularization44, suggesting that CMTM4 may act more directly in utero, which could mediate its associations with body fat percentage and hypertension.
We further studied the 9 genes with 3 or more distinct GTAs across different categories (Figure 5A). Using UK Biobank23 GWAS summary statistics, we conducted TWAS for a variety of traits measured in adults across 8 groups, defined generally around ICD code blocks (Figure 5A, Supplemental Figure S13); here, we grouped metabolic and cardiovascular traits into one category for ease of analysis. At FDR-adjusted P < 0.05, ATPAF2, RPL6, and SEC11A showed GTA enrichments for immune-related traits, ATPAF2 for neonatal traits, IDI1 for mental disorders, and RPS25 for musculoskeletal traits. Across these 8 trait groups, RPL6 showed multiple strong associations with circulatory, respiratory, immune-related, and neonatal traits (Figure 5A). Examining specific GTAs for ATPAF2, IDI1, RPS25, and SEC11A reveals associations with multiple biomarker traits (Supplemental Figure S13). For example, at P < 2.5 × 10⁻⁶, the immune-related GTAs of ATPAF2 and IDI1 include associations with eosinophil, monocyte, and lymphocyte counts and IGF-1 concentration. ATPAF2 and RPS25 show multiple associations with platelet volume and distribution and hematocrit percentage. In addition, IDI1 was associated with multiple mental disorders (obsessive compulsive disorder, anorexia nervosa, bipolar disorder, and general mood disorders), consistent with its TWAS associations with fluid intelligence and schizophrenia (Supplemental Figure S13). As placental GReX of these genes correlates with biomarkers, these results may not necessarily signify shared genetic associations across multiple traits; rather, they may point to more fundamental effects of these TWAS-identified genes that manifest in complex traits later in life.
We next examined whether placental GReX of these 9 genes correlates with fundamental traits at birth. We imputed expression into individual-level ELGAN genotypes (N = 729). Controlling for race, sex, gestational duration, inflammation of the chorion, and maternal age, as described in Methods and Materials, we tested for associations with 6 representative traits measured at birth or at 24 months: neonatal chronic lung disease, birth head circumference Z-score, fetal growth restriction, birth weight Z-score, necrotizing enterocolitis, and Bayley II Mental Development Index (MDI) at 24 months15. As shown in Figure 5B and Supplemental Table S10, at FDR-adjusted P < 0.05, we detected negative associations between SEC11A GReX and birthweight Z-score (effect size: −0.248, 95% adjusted CI: [−0.434, −0.063]) and between ATPAF2 GReX and head circumference Z-score (−0.173, [−0.282, −0.064]). Furthermore, we detected negative associations between MDI and GReX of RPL6 (−2.636, [−4.251, −1.02]) and ERP29 (−3.332, [−4.987, −1.677]). As many of these genes encode proteins involved in core processes (e.g., RPL6 is involved in trans-activation of transcription and translation, and SEC11A has roles in cell migration and invasion)45,46, understanding how the placental GReX of these genes affects neonatal traits may elucidate the potential long-lasting impacts of placental dysregulation.
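The association tests in this paragraph are covariate-adjusted regressions of each trait on imputed GReX. A minimal sketch for a continuous outcome (such as a Z-score trait) using ordinary least squares is shown below. It reports an unadjusted Wald 95% CI rather than the FDR-adjusted intervals above, assumes categorical covariates such as race are dummy-coded beforehand, and runs on simulated data; the function name is hypothetical. Binary outcomes (e.g., necrotizing enterocolitis) would typically call for logistic regression instead.

```python
import numpy as np


def grex_association(trait, grex, covariates, z_crit=1.96):
    """Linear regression of a trait on GReX, adjusting for covariates.

    Returns the GReX coefficient and an unadjusted Wald 95% CI.
    Categorical covariates (e.g., race) must be dummy-coded beforehand.
    """
    X = np.column_stack([np.ones(len(trait)), grex, covariates])
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], (beta[1] - z_crit * se, beta[1] + z_crit * se)


# Hypothetical simulated data (not ELGAN data)
rng = np.random.default_rng(42)
n = 2_000
grex = rng.normal(size=n)
covs = rng.normal(size=(n, 3))  # stand-ins for, e.g., sex, gestational age, maternal age
trait = -0.25 * grex + covs @ np.array([0.1, -0.2, 0.05]) + rng.normal(size=n)
effect, ci = grex_association(trait, grex, covs)
```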
Body size and metabolic placental GTAs show trait associations in mice
To further study functional consequences of selected TWAS-identified genes, we evaluated the 109 metabolic trait-associated genes in the Hybrid Mouse Diversity Panel (HMDP) for correlations with obesity-related traits47. This panel includes 100 inbred mouse strains with an extensive collection of obesity-related phenotypes and expression data for over 12,000 genes measured in a variety of adult tissues. Of the 109 genes, 73 were present in the panel, and 36 showed significant cis-GReX associations with at least one obesity-related trait at FDR-adjusted P < 0.10 (Supplemental Table S11). For example, EPB41L1 (Epb4.1l1 in mice), a gene that mediates interactions in the erythrocyte plasma membrane, was associated with cholesterol and triglycerides in TWAS and showed 22 GReX associations with cholesterol, triglycerides, and HDL in mouse liver, adipose, and heart, with R² ranging from 0.09 to 0.31. Similarly, UBC (Ubc in mice), a gene that maintains cellular ubiquitin levels, was associated with waist-hip ratio in the placental TWAS and showed 27 GReX associations with glucose, insulin, and cholesterol in mouse aorta, liver, and adipose tissues in HMDP, with R² ranging from 0.08 to 0.14. Though generalizing these functional results from non-placental mouse tissue to humans is tenuous, we believe these 36 individually significant genes in the HMDP are fruitful targets for follow-up studies.
MOSTWAS reveals functional hypotheses for distal placental regulation of GTAs
An advantage of MOSTWAS’s methodology is functional hypothesis generation: it identifies potential mediators that affect TWAS-identified genes. Using the distal-SNPs added-last test from MOSTWAS22, we tested whether distal loci incorporated into expression models are associated with the trait beyond the association at the local locus. For 89 of 248 associations, predicted expression from distal SNPs showed significant associations at FDR-adjusted P < 0.05 (Figure 3A [Step 4], Supplemental Table S6). For each significant distal association, we identified a set of biomarkers that potentially affect transcription of the TWAS gene: a total of 9 regulatory protein-encoding genes (RPs) and 159 CpG sites across all 89 distal associations. In particular, we detected two RPs that are highly expressed in placenta48,49: DAB2 (distal mediator for the association between PAPPA and diastolic blood pressure; distal Z = −3.98) and EPS15. Mediated through EPS15, distally predicted expression of SPATA13 and FAM214A was associated with waist-hip ratio (overall distal Z = 7.11 and 6.33, respectively). EPS15 itself showed a TWAS association with waist-hip ratio (Supplemental Table S6), in the direction opposite to the SPATA13 and FAM214A GTAs. Furthermore, RORA, a gene encoding a transcription factor involved in inflammatory signaling50, showed a negative association with transcription of UBA3, a TWAS gene for fetal birthweight. Low placental RORA expression was previously shown to be associated with lower birthweight51. Aside from functions related to transcriptional regulation, the 9 RPs (CUL5, DAB2, ELL, EPS15, RORA, SLC2A4RG, SMARCC1, NFKBIA, ZC3H15) detected by MOSTWAS were enriched for several ontologies (Supplemental Table S12), namely catabolic and metabolic processes, response to lipids, and multiple nucleic acid-binding processes29.
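The intuition behind an added-last test can be sketched with summary statistics: compute burden Z-scores for the local and distal components of a model separately, then ask whether the distal Z remains nonzero after conditioning on the local Z. The sketch below assumes the two Z-scores are jointly normal with correlation ρ induced by LD between the weighted SNPs, giving Z_d|l = (Z_d − ρZ_l) / √(1 − ρ²). This is a simplified stand-in, not the MOSTWAS implementation; function names, weights, and Z-scores are hypothetical, and the statistic is undefined when |ρ| = 1.

```python
import numpy as np


def burden_z(w, gwas_z, ld):
    """Weighted burden Z-score for one set of SNP weights."""
    return (w @ gwas_z) / np.sqrt(w @ ld @ w)


def added_last_z(w_local, w_distal, gwas_z, ld):
    """Z-score for the distal component conditional on the local one,
    treating (Z_local, Z_distal) as bivariate normal with correlation
    rho induced by LD between the two sets of weighted SNPs."""
    z_l = burden_z(w_local, gwas_z, ld)
    z_d = burden_z(w_distal, gwas_z, ld)
    rho = (w_local @ ld @ w_distal) / np.sqrt(
        (w_local @ ld @ w_local) * (w_distal @ ld @ w_distal))
    return float((z_d - rho * z_l) / np.sqrt(1.0 - rho ** 2))


# Hypothetical example 1: unlinked local and distal SNPs (rho = 0)
z_unlinked = added_last_z(np.array([1.0, 0, 0, 0]), np.array([0, 0, 1.0, 1.0]),
                          np.array([3.0, 0, 2.0, 2.0]), np.eye(4))
# Hypothetical example 2: distal signal fully explained by LD with the local SNP
ld2 = np.array([[1.0, 0.5], [0.5, 1.0]])
z_explained = added_last_z(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                           np.array([2.0, 1.0]), ld2)
```

In the first example the conditional Z equals the marginal distal Z (4/√2 ≈ 2.83); in the second it is 0, because the distal signal is exactly what LD with the local signal predicts.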
As we observed strong correlations between expression levels of RP-TWAS gene pairs in ELGAN (Supplemental Figure S14), we then examined the associations between TWAS-identified genes and the GReX of their predicted mediating RPs in an external dataset. Using RICHS, we conducted a gene-based trans-eQTL scan with Liu et al.’s Gene-Based Association Testing (GBAT) method52 to computationally validate RP-TWAS gene associations. We predicted GReX of the RPs from cis-variants through leave-one-out cross-validation and scanned for associations with the respective TWAS genes (Figure 4C, Supplemental Table S13). We found a significant association between predicted EPS15 expression and FAM214A expression (effect size −0.24, FDR-adjusted P = 0.019), as well as between predicted NFKBIA expression and HNRNPU expression (effect size −0.26, FDR-adjusted P = 1.9 × 10⁻⁴). We also applied an Egger regression-based Mendelian randomization (MR) framework53 in RICHS to estimate the causal effects of RPs on the associated TWAS genes (Methods and Materials), using as instrumental variables cis-SNPs that are correlated with the RP and uncorrelated with the TWAS genes. We estimated significant causal effects for two RP-TWAS gene pairs (Figure 5C, Supplemental Table S14): EPS15 on FAM214A (causal effect estimate −0.58; 95% CI [−0.94, −0.21]) and RORA on UBA3 (0.58; [0.20, 0.96]). The GBAT and MR estimates between EPS15 and FAM214A are in the opposite direction of the simple correlations presented in Supplemental Figure S14. However, as discussed in previous TWAS and MR studies17,53, correlations between GReX and a phenotype are not equivalent to correlations between total expression and the phenotype, as total expression is subject to multiple post-transcriptional processes, while GReX is not.
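In an Egger regression-based MR framework, per-SNP effects on the outcome (here, the TWAS gene) are regressed on per-SNP effects on the exposure (the RP) with a free intercept: the slope estimates the causal effect, and a nonzero intercept indicates directional pleiotropy. The sketch below is the standard summary-statistic MR-Egger with inverse-variance weights, assumed here for illustration; the study’s framework in RICHS (Methods and Materials) may differ, and the effect sizes are hypothetical, constructed to fit the model exactly.

```python
import numpy as np


def mr_egger(beta_exp, beta_out, se_out):
    """MR-Egger: weighted regression of SNP-outcome effects on SNP-exposure
    effects with a free intercept (weights 1 / se_out^2).

    Returns (intercept, slope); the slope is the causal-effect estimate and
    a nonzero intercept suggests directional pleiotropy.
    """
    bx = np.asarray(beta_exp, dtype=float)
    by = np.asarray(beta_out, dtype=float)
    # orient each instrument so that its effect on the exposure is positive
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sqrt_w = 1.0 / np.asarray(se_out, dtype=float)
    X = np.column_stack([np.ones_like(bx), bx]) * sqrt_w[:, None]
    (intercept, slope), *_ = np.linalg.lstsq(X, by * sqrt_w, rcond=None)
    return intercept, slope


# Hypothetical cis-SNP summary statistics, constructed so that the
# oriented data satisfy beta_out = 0.02 - 0.5 * beta_exp exactly
beta_exp = [0.1, -0.2, 0.3, 0.4]
beta_out = [-0.03, 0.08, -0.13, -0.18]
se_out = [0.01, 0.02, 0.01, 0.02]
intercept, slope = mr_egger(beta_exp, beta_out, se_out)
```

Orienting instruments before fitting matters for Egger regression: the intercept is only interpretable as directional pleiotropy once every SNP increases the exposure.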
We also examined the CpG methylation sites that MOSTWAS marked as potential mediators for expression of TWAS genes for overlap with cis-regulatory elements in the placenta from the ENCODE Project Phase II10, identifying 34 CpG sites (mediating 29 distinct TWAS genes) that fall in cis-regulatory regions (Supplemental Table S15). Interestingly, one CpG site mediating FAM214A (cg15733049, chromosome 1:2334974) lies in low-DNase activity sites in placenta samples taken at various timepoints; additionally, cg15733049 is local to EPS15, the RP predicted to mediate genetic regulation of FAM214A. Furthermore, expression of LARS2, a TWAS gene for adult BMI, is mediated by cg04097236 (found within ELOVL2), a CpG site found in low-DNase or high-H3K27 activity regions; LARS2 harbors multiple GWAS risk SNPs for type 2 diabetes54 and has shown adult BMI TWAS associations in other tissues17,28. Results from these external datasets add evidence that these mediators play a role in the regulation of these TWAS-identified genes, which should be investigated experimentally in future studies.
In-vitro assays reveal widespread transcriptomic consequences of EPS15 knockdown
Based on our computational results, we experimentally studied whether the inverse relationship between the RP EPS15 and its two prioritized target TWAS genes, SPATA13 and FAM214A, is supported in vitro. We used a FANA oligonucleotide targeting EPS15 to knock down EPS15 expression in human placenta-derived JEG-3 choriocarcinoma cells and assessed expression of the targets in no-addition controls, scramble oligo controls, and knockdown cells via qRT-PCR. JEG-3 cells were selected based on their known first trimester-like phenotypes, including the synthesis and secretion of hCG, human placental lactogen, progesterone, estrone, and estradiol55,56. Addition of FANA-EPS15 to JEG-3 cells decreased EPS15 expression by 50%, while increasing the expression of SPATA13 and FAM214A by 795% and 377%, respectively. At FDR-adjusted P < 0.10, changes in expression of EPS15 and its downstream targets were statistically significant between the scramble control and the knockdown oligo. Similarly, changes in RP and target mRNA expression relative to the control mRNA were statistically significant (Figure 6A).
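The percent changes above correspond to fold changes of 0.5 (EPS15), 8.95 (SPATA13), and 4.77 (FAM214A), since percent change = (fold change − 1) × 100. This section does not state how the qRT-PCR data were quantified; assuming the common 2^−ΔΔCt (Livak) method, the calculation can be sketched as follows, with hypothetical function names and Ct values.

```python
def fold_change_ddct(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (knockdown vs. control) by the 2^-ddCt method.

    Each target Ct is first normalized to a reference (housekeeping) gene.
    """
    d_kd = ct_target_kd - ct_ref_kd
    d_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_kd - d_ctrl)


def percent_change(fold):
    """Convert a fold change to a percent change (0.5 -> -50%, 8.95 -> +795%)."""
    return (fold - 1.0) * 100.0


# Hypothetical Ct values: the target needs one extra cycle after knockdown
fold = fold_change_ddct(26.0, 18.0, 25.0, 18.0)   # 0.5
pct = percent_change(fold)                        # -50.0
```

Because each extra PCR cycle halves the inferred starting template, a one-cycle shift after normalization corresponds to the 50% decrease reported for EPS15.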
To further investigate the transcriptomic consequences of EPS15 knockdown in vitro, we measured transcriptome-wide gene expression in the choriocarcinoma cells via RNA-seq and conducted differential expression analysis between the knockdown cells and scramble oligo controls57–59. Due to small sample sizes, we defined a differentially expressed gene as one with absolute log2-fold change greater than 0.5 at P < 1.32 × 10⁻⁶, a Bonferroni correction across all assayed genes (Methods). We detected 650 down-regulated and 838 up-regulated genes in the EPS15 knockdown cells, corroborating the negative correlations between EPS15 and SPATA13 and FAM214A observed in qRT-PCR (Figure 6B, Supplemental Tables S16-S17). In particular, down-regulated genes were enriched for cell cycle, cell proliferation, or replication ontologies, while up-regulated genes were enriched for multiple different pathways, including lipid-related processes, cell movement, and extracellular organization (Figure 5C, Supplemental Tables S18-S19). Enrichments for cellular, molecular, and disease pathway ontologies support these findings (Supplemental Figure S15, Supplemental Tables S18-S19). Though we could not study the effects of these three genes on body size-related traits, cis-GReX correlation analysis from the HMDP did reveal a negative cis-GReX correlation (r = −0.31, FDR-adjusted P = 0.07) between Eps15 (the mouse ortholog of human EPS15) and free fatty acids in mouse liver (Supplemental Table S11). These results prioritize EPS15 for further study in larger cell line or animal studies as a potential regulator of multiple downstream genes, perhaps genes affecting cell proliferation and replication in the placenta, like SPATA1360.
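The differential expression rule above combines an effect-size filter with a Bonferroni threshold. Assuming α = 0.05 (not stated in this section), the threshold of 1.32 × 10⁻⁶ implies roughly 0.05 / (1.32 × 10⁻⁶) ≈ 37,900 tested genes. The sketch below applies the stated rule to hypothetical results with hypothetical function names; it does not reproduce the study’s RNA-seq pipeline.

```python
def implied_n_tests(alpha, threshold):
    """Number of tests implied by a Bonferroni threshold alpha / n."""
    return alpha / threshold


def call_de(results, p_threshold, min_abs_lfc=0.5):
    """Split (gene, log2FC, P) tuples into up- and down-regulated lists."""
    up, down = [], []
    for gene, lfc, p in results:
        if p < p_threshold and abs(lfc) > min_abs_lfc:
            (up if lfc > 0 else down).append(gene)
    return up, down


n_tests = implied_n_tests(0.05, 1.32e-6)   # about 37,879 if alpha = 0.05

# Hypothetical results, not the study's differential expression output
toy = [("GENE_A", 1.2, 1e-8), ("GENE_B", -0.8, 1e-7),
       ("GENE_C", 0.3, 1e-9), ("GENE_D", -2.0, 1e-3)]
up, down = call_de(toy, p_threshold=0.05 / n_tests)
# up == ['GENE_A'], down == ['GENE_B']
```

GENE_C passes the P-value threshold but fails the fold-change filter, and GENE_D fails the P-value threshold, illustrating why both criteria are applied together with small sample sizes.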
DISCUSSION
The placenta has been understudied in large multi-tissue consortia efforts that study tissue-specific regulatory mechanisms9,10 relevant to complex trait etiology. To address this gap, we systematically categorized placental gene-trait associations relevant to the DOHaD hypothesis using MOSTWAS, a method for enriching TWAS with distal genetic variants22. We detected 176 genes (enriched for cell growth and immune pathways) with transcriptome-wide significant associations, with the majority of GTAs linked to metabolic and neonatal/childhood traits. Furthermore, we could only estimate significantly positive placental GReX-mediated heritability for 4 neonatal traits and not for any later-in-life traits. Many of these TWAS-identified genes, especially those with neonatal GTAs, showed multiple GTAs across trait categories (9 genes with 3 or more GTAs). We examined phenome-wide GTAs for these 9 genes in UKBB and found enrichments for traits affecting the immune and circulatory systems (e.g., immune cell, erythrocyte, and platelet counts). We followed up with selected early-life traits in ELGAN and found associations with neonatal body size and infant cognitive development. These results suggest that placental expression, mediated by fetal genetics, is most likely to have large effects on early-life traits, but these effects may persist later in life as etiologic antecedents for complex traits.
MOSTWAS also generates hypotheses for the regulation of TWAS-detected genes through distal mediating biomarkers, like transcription factors, miRNAs, or products downstream of CpG methylation islands22. Our computational results prioritized 89 GTAs with strong distal associations. We interrogated one such functional hypothesis: that EPS15, a predicted RP-encoding gene in the EGFR pathway, regulates two TWAS genes positively associated with waist-hip ratio (FAM214A, a gene of unknown function, and SPATA13, a gene that regulates cell migration and adhesion60–62). Notably, EPS15 itself showed a negative TWAS association with waist-hip ratio. EPS15, mainly involved in endocytosis, is a maternally imprinted gene and is predicted to promote offspring health49,63–65. Ample literature implicates the protein product of EPS15 as a direct or indirect transcriptional regulator. The protein Eps15 is an adaptor protein that regulates intracellular trafficking and has been detected in the nucleus of mammalian cells66. Once in the nucleus, Eps15 has been shown to positively modulate transcription in a GAL4 transactivation assay67. Furthermore, Eps15 and its binding partner intersectin activate the Elk-1 transcription factor, pointing to a role for Eps15 in regulating gene expression in the nucleus68. Specific to the placenta, it has been proposed on the basis of mouse models that Eps15's interactions with multiple proteins implicate it in adhesion of trophoblasts to endothelial cells through the biogenesis of exosomes and extracellular vesicles, a critical part of placental and fetal development69–71.
In placenta-derived choriocarcinoma epithelial cells, knockdown of EPS15 increased expression of both FAM214A and SPATA13, as well as of multiple genes involved in metabolic and hormone-related pathways. Though this does not establish a direct causal effect, the inverse association of EPS15 with SPATA13 and FAM214A could provide more context for its full influence in placental developmental programming, perhaps by affecting cell proliferation or adhesion pathways. In vivo animal experiments, albeit limited in scope and generalizability, could further investigate these GTAs, building on HMDP results showing cis-GReX correlations between the mouse ortholog of EPS15 and fatty acid levels. Although these cis-GReX correlations from the HMDP cannot be generalized from mice to humans, our in vitro assay provides valuable evidence for EPS15 genomic regulation in the placenta. Our results also support the potential of MOSTWAS to build mechanistic hypotheses for upstream regulation of TWAS genes that withstand experimental testing.
We conclude with limitations of this study and future directions. First, our analysis considers only placental tissue. Though many of our GTAs leverage distal-eQTL architecture, which tends to be tissue-specific, the QTLs we leverage in TWAS may not be placenta-specific. A similar analysis across developmental and adult tissues could reveal more widespread genetic signals associated with these traits. Second, the ELGAN Study gathered molecular data from infants born extremely preterm. If unmeasured confounders affect both prematurity and a trait of interest, GTAs could be subject to collider bias72. However, significant TWAS genes did not show associations with gestational duration, suggesting minimal bias from this collider effect. An extensive comparison of genome-wide eQTL architecture between ELGAN and RICHS, highlighting differences in genetic effects on gene expression by preterm status, could be of particular scientific importance. A future extension could include negative control variables to account for unmeasured confounders in predictive models, improving their generalizability73,74. Third, though we did scan neonatal traits in ELGAN using individual-level genotypes, the sample size is small; larger GWAS of longitudinal traits could allow for rigorous Mendelian randomization studies that investigate relationships between traits across the life course in the context of placental regulation. Fourth, we curated a list of regulatory proteins to include as potential mediators but used RNA expression of the genes that code for these proteins as a proxy for protein abundance. We acknowledge that RNA abundance is a noisy estimate of protein abundance. An interesting extension of this analysis could consider a proteome-wide association study, using the MOSTWAS framework to identify disease-related protein interactions.
Lastly, due to small sample sizes of other ancestry groups in ELGAN, we could only credibly impute expression into samples of European ancestry, and our TWAS only considers GWAS in populations of European ancestry75. We emphasize acquisition of larger genetic and genomic datasets from understudied and underserved populations, especially related to early-in-life traits.
Our findings reveal functional evidence for the fundamental influence of placental genetic and genomic regulation on developmental programming of early- and later-in-life traits, identifying placental gene-trait associations and testable functional hypotheses for upstream placental regulation of these genes. Future large-scale tissue-wide studies should emphasize the placenta as a core tissue for learning about the developmental origins of health and disease.
ONLINE METHODS
Data acquisition and quality control
Genotype data
Genomic DNA was isolated from umbilical cord blood, and genotyping was performed using Illumina 1 Million Quad and HumanOmniExpress-12 v1.0 arrays34,76. Prior to imputation, from the original set of 731,442 markers, we removed SNPs with call rate < 90% and those with MAF < 1%. We only consider genetic variants on autosomes. We did not use deviation from Hardy-Weinberg equilibrium as an exclusion criterion since ELGAN is an admixed population. This resulted in 700,845 SNPs. We removed 4 of 733 individuals with sample-level missingness > 10% using PLINK77. We performed strand-flipping according to the TOPMed Freeze 5 reference panel and used Eagle and Minimac4 for phasing and imputation78–80. Genotypes were coded as dosages, representing 0, 1, and 2 copies of the minor allele. The minor allele was coded in accordance with the NCBI Database of Genetic Variation81. Overall, after QC and normalization, we considered a total of 6,567,190 SNPs. We obtained processed genetic data from the Rhode Island Children’s Health Study, as described before36.
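As an illustration of the variant-level filters described above, the following Python sketch applies the call-rate and MAF thresholds to dosage-coded genotypes. It assumes hard calls coded 0/1/2 with None for missing; the function names are hypothetical, and the study itself used PLINK rather than code like this.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 copies of the minor allele; None if missing


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF estimated from allele counts among called samples."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes: Sequence[Genotype],
                  min_call_rate: float = 0.90,
                  min_maf: float = 0.01) -> bool:
    """Keep a SNP only if both thresholds are met (no HWE filter is applied)."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def sample_missingness(genotype_matrix: Sequence[Sequence[Genotype]],
                       sample_index: int) -> float:
    """Fraction of SNPs missing for one sample (rows = SNPs, columns = samples)."""
    column = [snp[sample_index] for snp in genotype_matrix]
    return sum(g is None for g in column) / len(column)
```

A sample would be dropped when sample_missingness exceeds 0.10, mirroring the 10% threshold above.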
Expression data
mRNA expression was determined using the Illumina QuantSeq 3’ mRNA-Seq Library Prep Kit, a method with high strand specificity82. mRNA-sequencing libraries were pooled and sequenced (single-end 50 bp) on one lane of the Illumina HiSeq 2500. mRNA were quantified through pseudo-alignment with salmon57 mapped to the GENCODE Release 31 (GRCh37) reference transcriptome. miRNA expression profiles were assessed using the HTG EdgeSeq miRNA Whole Transcriptome Assay (HTG Molecular Diagnostics, Tucson, AZ). miRNA were aligned to probe sequences and quantified using the HTG EdgeSeq System83.
Genes and miRNAs with fewer than 5 counts in every sample were removed, resulting in 12,020 genes and 2,047 miRNAs for downstream analysis. We only consider autosomal genes and miRNAs. Distributional differences between lanes were first corrected by upper-quartile normalization84,85. Unwanted technical and biological variation (e.g., tissue heterogeneity) was then estimated using RUVSeq86, where we empirically defined transcripts not associated with outcomes of interest as negative control housekeeping probes87. One dimension of unwanted variation was removed from the variance-stabilized transformation of the gene expression data using the limma package59,86–88. We obtained pre-processed RNA expression data from the Rhode Island Children’s Health Study, as described before36. Pre-processing steps for RNA expression data from the RICHS differ from those employed here in the ELGAN study.
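Upper-quartile normalization rescales each sample by the 75th percentile of its non-zero counts. The sketch below is a simplified illustration, not the RUVSeq/EDASeq implementation; the choice of the mean upper quartile as the common target and the inclusive quantile method are assumptions.

```python
import statistics
from typing import List, Sequence


def upper_quartile(counts: Sequence[float]) -> float:
    """75th percentile of the non-zero counts in one sample (lane)."""
    nonzero = sorted(c for c in counts if c > 0)
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalize(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each sample so that all samples share the same upper quartile.

    The common target is the mean upper quartile across samples, which keeps
    normalized values on roughly the original count scale.
    """
    uqs = [upper_quartile(s) for s in samples]
    target = statistics.mean(uqs)
    return [[c * target / uq for c in s] for s, uq in zip(samples, uqs)]
```

Two samples that differ only by sequencing depth (one a constant multiple of the other) become identical after this rescaling.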
Methylation data
Extracted DNA was bisulfite-converted using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA), followed by quantification using the Infinium MethylationEPIC BeadChip (Illumina, San Diego, CA), which measures CpG loci at single-nucleotide resolution, as previously described89–92. Quality control and normalization were performed, resulting in 856,832 CpG probes for downstream analysis, with methylation represented as the average methylation level at a single CpG site (β-value)90,93–96. DNA methylation data were imported into R for pre-processing using the minfi package94,95. Quality control was performed at the sample level, excluding samples that failed and technical duplicates; 411 samples were retained for subsequent analyses.
Functional normalization was performed with a preliminary step of normal-exponential out-of-band (noob) correction97 for background subtraction and dye normalization, followed by the typical functional normalization method with the top two principal components of the control matrix94,95. Quality control was performed on individual probes by computing a detection P-value; we excluded 806 (0.09%) probes with non-significant detection (P > 0.01) in 5% or more of the samples. A total of 856,832 CpG sites were included in the final analyses. Lastly, the ComBat function from the sva package was used to adjust for batch effects from sample plate98. In addition, to account for cell-type heterogeneity, 5 surrogate variables were estimated and removed from the data using the sva package, as previously described15,90,98. The data were visualized using density distributions at all processing steps. Each probe measured the average methylation level at a single CpG site. Methylation levels were calculated and expressed as β-values,
$$\beta = \frac{M}{M + U},$$
where M is the intensity of the methylated allele and U is the intensity of the unmethylated allele. β-values were logit-transformed to M-values, $\text{M-value} = \log_2\{\beta/(1-\beta)\}$, for statistical analyses99. Overall, after QC and normalization, we considered 846,233 CpG sites, only on autosomes.
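The β-value and M-value definitions above can be written out directly. This sketch also encodes the probe-level detection filter (P > 0.01 in at least 5% of samples); the optional offset parameter is an assumption (set to 0 to match the formula), and minfi performs these steps on whole matrices rather than per probe.

```python
import math
from typing import Sequence


def beta_value(meth: float, unmeth: float, offset: float = 0.0) -> float:
    """beta = M / (M + U + offset); offset = 0 matches the formula above."""
    return meth / (meth + unmeth + offset)


def m_value(beta: float) -> float:
    """Base-2 logit of a beta-value: log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))


def fails_detection(detection_p: Sequence[float],
                    p_cut: float = 0.01,
                    max_fail_fraction: float = 0.05) -> bool:
    """True if the probe has detection P > p_cut in at least 5% of samples."""
    n_failed = sum(p > p_cut for p in detection_p)
    return n_failed / len(detection_p) >= max_fail_fraction
```

With offset 0, the M-value reduces to log2(M/U), which is why it is preferred for linear modeling: it is unbounded, unlike β.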
GWAS summary statistics
Summary statistics were downloaded from the following consortia: the UK Biobank23, Early Growth Genetics Consortium24, Genetic Investigation of Anthropometric Traits25, Psychiatric Genomics Consortium26, and the Complex Trait Genetics Lab27 (Supplemental Table 1). Genomic coordinates were transformed to the hg38 reference genome using liftOver100,101. SNP heritability for each trait and genetic correlations for all pairwise combinations of traits were estimated using LD score regression with the European ancestry sample from the 1000 Genomes Project as a reference for LD scores31,32.
QTL mapping
The first step in the MOSTWAS pipeline is to scan for associations between SNPs and genes (genome-wide eQTL analysis) and between mediators and genes. We conducted genome-wide eQTL mapping between all genotypes and all genes in the transcriptome using standard linear regression in MatrixeQTL102. Here, we fit an additive model with gene expression as the outcome and SNP dosage as the primary predictor of interest, adjusting for 20 genotype PCs (for population stratification), sex, gestational duration, maternal age, maternal smoking status, and 10 expression PEER factors103. Mediators here are defined as RNA expression of genes that code for regulatory proteins (curated in TFcheckpoint33), miRNAs, and monomorphic CpG methylation sites. Collectively, we refer to the expression or methylation of a mediator as its intensity. We also conducted genome-wide mediator-QTL mapping with mediator intensity as the outcome and the same predictors as in the eQTL mapping. Lastly, we assessed associations between mediators and gene expression using the same linear models, with mediator intensity as the main predictor. All intensities were scaled to zero mean and unit variance.
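For a single SNP-gene pair, the additive model above is an ordinary least-squares fit with covariates. The following pure-Python sketch returns the dosage effect and its t-statistic; it is a didactic stand-in for MatrixeQTL, which performs the same test at scale, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def eqtl_test(expression: Sequence[float], dosage: Sequence[float],
              covariates: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Fit expression ~ 1 + dosage + covariates; return (dosage beta, t-statistic)."""
    n = len(expression)
    x = [[1.0, dosage[k]] + [cov[k] for cov in covariates] for k in range(n)]
    p = len(x[0])
    xtx = [[sum(x[k][i] * x[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(x[k][i] * expression[k] for k in range(n)) for i in range(p)]
    coef = _solve(xtx, xty)
    resid = [expression[k] - sum(coef[j] * x[k][j] for j in range(p)) for k in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)
    # The dosage diagonal entry of (X'X)^{-1} is obtained by solving X'X u = e_1.
    unit = [1.0 if i == 1 else 0.0 for i in range(p)]
    se = math.sqrt(sigma2 * _solve(xtx, unit)[1])
    return coef[1], coef[1] / se
```

In the study's setting, the covariate list would hold the genotype PCs, clinical covariates, and PEER factors listed above.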
Estimation of SNP heritability of gene expression
An important step in a TWAS pipeline is estimation of the SNP heritability of expression, as SNP heritability is a strong determinant of TWAS power17,104. Heritability of expression from genotypes within 1 megabase of the gene of interest and any prioritized distal loci was estimated using the GREML-LDMS method, which corrects for LD-related bias in estimated SNP-based heritability105. Analysis was conducted using GCTA v1.93.1106. Briefly, Yang et al showed that estimates of heritability are often biased if causal variants have different minor allele frequency (MAF) spectra or LD structures from the variants used in analysis. They proposed an LD- and MAF-stratified GREML analysis, in which variants are stratified into groups by MAF and LD, and genetic relationship matrices (GRMs) computed from the variants in each group are jointly fit in a multi-component GREML analysis.
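The stratification step of GREML-LDMS can be sketched as grouping variants by MAF bin and LD-score quartile, with one GRM then built per group. The MAF bin edges below are hypothetical illustrative values, LD scores are taken as given inputs, and GCTA computes both the LD scores and the GRMs itself.

```python
import bisect
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical MAF bin edges (lower bound of each bin); not taken from this study.
MAF_EDGES = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5]


def maf_bin(maf: float) -> int:
    """Index of the MAF bin; MAF = 0.5 is folded into the last bin."""
    return min(bisect.bisect_right(MAF_EDGES, maf) - 1, len(MAF_EDGES) - 2)


def stratify_snps(snps: Sequence[Tuple[str, float, float]]) -> Dict[Tuple[int, int], List[str]]:
    """Group (snp_id, maf, ld_score) records by MAF bin and LD-score quartile.

    Each resulting group would feed one GRM in a multi-component GREML fit.
    """
    cuts = statistics.quantiles([ld for _, _, ld in snps], n=4)
    groups: Dict[Tuple[int, int], List[str]] = {}
    for snp_id, maf, ld in snps:
        key = (maf_bin(maf), bisect.bisect_right(cuts, ld))
        groups.setdefault(key, []).append(snp_id)
    return groups
```

Fitting the groups jointly lets causal variants with unusual MAF or LD profiles be captured by their own variance component instead of biasing a single pooled estimate.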
Gene expression models
We used MOSTWAS to train predictive models of gene expression from germline genetics, including distal variants that were either close to associated mediators (transcription factors, miRNAs, CpG sites) or had large indirect effects on gene expression22 (Supplemental Figure S1). We assume that distal-eQTLs of a gene that lie near transcription factor-encoding genes, miRNAs, or regulatory features near CpG methylation sites may act through cis-QTL effects on these local features. This assumption has been used in multiple previous studies to identify trans-eQTLs across tissues107–110. For CpG methylation sites, we used the maxprobes R package to filter out cross-reactive or polymorphic probes, which may induce bias111–113. MOSTWAS contains two methods for predicting expression: (1) mediator-enriched TWAS (MeTWAS) and (2) distal-eQTL prioritization via mediation analysis (DePMA). For MeTWAS, we first identified mediators strongly associated with genes through correlation analyses between all genes of interest and a set of distal mediators (FDR-adjusted P < 0.05). We then trained local predictive models (using SNPs within 1 Mb) of each mediator using either elastic net or a linear mixed model, used these models to impute the mediator in the training sample, and included the imputed mediator values as fixed effects in a regularized regression of the gene of interest. For DePMA, we first conducted distal-eQTL analysis to identify all distal-eQTLs at P < 10−6 and then local mediator-QTL analysis to identify all mediator-QTLs for these distal-eQTLs at FDR-adjusted P < 0.05. We tested each distal-eQTL for its absolute total mediation effect on the gene of interest through a permutation test and included eQTLs with significantly large effects in the final expression model. Full mathematical details are provided in Bhattacharya et al22.
We considered only genes with significantly positive heritability at nominal P < 0.05 using a likelihood ratio test and five-fold cross-validated McNemar's adjusted R2 ≥ 0.01, a cutoff used by many previous TWAS analyses16,17,28,36,75,114,115. McNemar's adjustment to the traditional R2 is computed as
$$R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - v - 1},$$
where n is the sample size and v is the number of predictors in the linear model. Since this R2 is computed only between the observed and predicted expression values, v = 1.
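Mediator selection for MeTWAS uses an FDR cutoff on gene-mediator associations. A minimal Benjamini-Hochberg sketch, assuming nominal P-values are already available for each candidate mediator (the mediator names in the usage example are hypothetical):

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_mediators(mediator_pvalues: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """Keep mediators whose gene-mediator association passes the FDR cutoff."""
    names = list(mediator_pvalues)
    adjusted = benjamini_hochberg([mediator_pvalues[m] for m in names])
    return [m for m, q in zip(names, adjusted) if q < fdr]


# Usage: select_mediators({"MED_A": 0.001, "MED_B": 0.5}) keeps only "MED_A".
```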
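The adjusted R2 filter can be computed from cross-validated predictions as below. This sketch assumes the five folds' held-out predictions have already been pooled into one vector, and uses the adjusted-R2 form given above with v = 1.

```python
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjusted_r2(r2: float, n: int, v: int = 1) -> float:
    """R2_adj = 1 - (1 - R2)(n - 1)/(n - v - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - v - 1)


def passes_cv_filter(observed: Sequence[float], predicted: Sequence[float],
                     cutoff: float = 0.01) -> bool:
    """Apply the R2_adj >= 0.01 filter to pooled observed vs predicted expression."""
    r2 = pearson_r(observed, predicted) ** 2
    return adjusted_r2(r2, len(observed)) >= cutoff
```

Because the adjustment penalizes small n, a weak correlation in a small sample can yield a negative adjusted R2 and fail the filter.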
TWAS tests of association
Overall TWAS test
In an external GWAS panel, if individual-level genotypes are available, model weights from either MeTWAS or DePMA can be multiplied by the corresponding SNP dosages and summed to construct the Genetically Regulated eXpression (GReX) of a given gene. This value represents the portion of expression (in the given tissue) that is directly predicted or regulated by germline genetics. We then test GReX for association with the phenotype, for example with a linear model, as the TWAS test of association.
If individual-level genotypes are not available, the weighted burden Z-test proposed by Gusev et al can be employed using GWAS summary statistics17. Briefly, we compute
$$Z_{\text{TWAS}} = \frac{w_G^\top Z}{\sqrt{w_G^\top \Sigma_{s,s}\, w_G}}.$$
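Constructing GReX is a weighted sum of dosages over the model SNPs. The sketch below assumes weights and dosages are keyed by SNP identifier (the rsA/rsB identifiers are placeholders) and pairs GReX with a simple least-squares slope as one possible association test.

```python
from typing import Dict, List, Sequence


def compute_grex(weights: Dict[str, float],
                 dosages: Sequence[Dict[str, float]]) -> List[float]:
    """GReX for each sample: sum over model SNPs of weight * dosage.

    A KeyError is raised if a model SNP is missing from a sample, rather than
    silently treating it as dosage 0.
    """
    return [sum(w * sample[snp] for snp, w in weights.items()) for sample in dosages]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (one possible GReX-trait association test)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Usage with placeholder SNP identifiers:
weights = {"rsA": 0.5, "rsB": -0.2}
samples = [{"rsA": 2, "rsB": 1}, {"rsA": 0, "rsB": 2}, {"rsA": 1, "rsB": 0}]
grex = compute_grex(weights, samples)  # approximately [0.8, -0.4, 0.5]
```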
Here, Z is the vector of Z-scores of SNP-trait associations for SNPs used in predicting expression. The vector wG represents the vector of SNP-gene effects from MeTWAS or DePMA and Σs,s is the LD matrix (correlation matrix between genotypes) between the SNPs represented in wG. The test statistic can be compared to the standard Normal distribution for inference.
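The weighted burden statistic, w_G^T Z / sqrt(w_G^T Σs,s w_G), and its two-sided Normal P-value can be computed as follows. This is a small pure-Python sketch assuming the LD matrix is supplied as a nested list aligned with the weight vector; it is not the FUSION implementation.

```python
import math
from typing import Sequence


def weighted_burden_z(w: Sequence[float], z: Sequence[float],
                      ld: Sequence[Sequence[float]]) -> float:
    """w'Z / sqrt(w' Sigma w) for one gene's model SNPs."""
    k = len(w)
    numerator = sum(w[i] * z[i] for i in range(k))
    variance = sum(w[i] * ld[i][j] * w[j] for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


def two_sided_p(z_score: float) -> float:
    """Two-sided P-value against the standard Normal distribution."""
    return math.erfc(abs(z_score) / math.sqrt(2.0))
```

The denominator is the standard deviation of w'Z under the null that Z ~ N(0, Σs,s), which is why the ratio is referred to a standard Normal.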
Permutation test
We implement a permutation test, conditional on the GWAS effect sizes, to assess whether the same distribution of SNP-gene effect sizes could yield a significant association by chance17. We permute wG 1,000 times without replacement and recompute the weighted burden test to generate a null distribution for the weighted burden test statistic. This permutation test is only conducted for overall associations at P < 2.5 × 10−6.
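A sketch of the permutation procedure: SNP-gene weights are shuffled across SNPs while GWAS Z-scores and LD stay fixed. The (count + 1)/(B + 1) P-value convention and the fixed seed are assumptions, not details from the study.

```python
import math
import random
from typing import Sequence


def weighted_burden_z(w: Sequence[float], z: Sequence[float],
                      ld: Sequence[Sequence[float]]) -> float:
    """w'Z / sqrt(w' Sigma w)."""
    k = len(w)
    num = sum(w[i] * z[i] for i in range(k))
    var = sum(w[i] * ld[i][j] * w[j] for i in range(k) for j in range(k))
    return num / math.sqrt(var)


def permutation_p(w: Sequence[float], z: Sequence[float],
                  ld: Sequence[Sequence[float]],
                  n_perm: int = 1000, seed: int = 2021) -> float:
    """Shuffle SNP-gene weights across SNPs, holding GWAS Z-scores and LD fixed."""
    observed = abs(weighted_burden_z(w, z, ld))
    rng = random.Random(seed)
    permuted = list(w)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if abs(weighted_burden_z(permuted, z, ld)) >= observed:
            n_extreme += 1
    # (count + 1) / (B + 1) avoids reporting a P-value of exactly 0.
    return (n_extreme + 1) / (n_perm + 1)
```

A small permutation P-value indicates that the association depends on which SNPs carry the large weights, not merely on the overall distribution of weights.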
Distal-SNPs added-last test
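Lastly, we also implement a test to assess the information added by distal-eSNPs in the weighted burden test beyond what we find from local SNPs. This test is analogous to a group added-last test in regression analysis, applied here to GWAS summary statistics. Let Zl and Zd be the vectors of GWAS Z-scores for the local and distal SNPs identified by a MOSTWAS model, and let Wl and Wd be the corresponding local and distal SNP effects from the MOSTWAS model. Let Σl,l, Σd,d, and Σd,l denote the LD matrices among local SNPs, among distal SNPs, and between distal and local SNPs, respectively. Formally, we test whether the weighted Z-score from distal SNPs differs significantly from what is expected given the observed weighted Z-score from local SNPs. We assume that $(W_l^\top Z_l, W_d^\top Z_d)$ follows a bivariate Normal distribution. Namely, we conduct a two-sided Wald-type test of the null hypothesis that distal SNPs add no signal beyond that induced by LD with the local SNPs, under which
$$W_d^\top Z_d \mid W_l^\top Z_l \sim \mathcal{N}\left(\frac{W_d^\top \Sigma_{d,l} W_l}{W_l^\top \Sigma_{l,l} W_l}\, W_l^\top Z_l,\; W_d^\top \Sigma_{d,d} W_d - \frac{\left(W_d^\top \Sigma_{d,l} W_l\right)^2}{W_l^\top \Sigma_{l,l} W_l}\right).$$
This null distribution follows from the conditional distribution of a bivariate Normal; see Bhattacharya et al22.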
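Under the bivariate Normal assumption, the distal-SNPs added-last statistic can be sketched as the standardized distance of the distal weighted Z-score from its conditional mean given the local one. This sketch assumes LD blocks are passed as nested lists (sigma_dl with distal SNPs as rows and local SNPs as columns); MOSTWAS's own implementation may differ in detail.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _quad(a: Sequence[float], m: Sequence[Sequence[float]], b: Sequence[float]) -> float:
    """a' M b for a (possibly rectangular) LD block M."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def distal_added_last_z(w_l, z_l, w_d, z_d, sigma_ll, sigma_dd, sigma_dl) -> float:
    """Standardized distance of W_d'Z_d from its conditional mean given W_l'Z_l."""
    x_l = _dot(w_l, z_l)
    x_d = _dot(w_d, z_d)
    var_l = _quad(w_l, sigma_ll, w_l)
    var_d = _quad(w_d, sigma_dd, w_d)
    cov_dl = _quad(w_d, sigma_dl, w_l)
    cond_mean = cov_dl / var_l * x_l
    cond_var = var_d - cov_dl ** 2 / var_l
    return (x_d - cond_mean) / math.sqrt(cond_var)
```

When local and distal SNPs are in linkage equilibrium (sigma_dl = 0), the statistic reduces to the ordinary weighted burden Z-score of the distal SNPs.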
Genetic heritability and correlation estimation
At the genome-wide genetic level, we estimated the heritability of and genetic correlation between traits from summary statistics using LD score regression31. At the predicted expression level, we adopted approaches from Gusev et al and Mancuso et al to quantify the heritability of traits (h²GE) and the genetic correlations between traits (ρGE) attributable to predicted placental expression17,28. We assume that the expected χ2 statistic for a complex trait is a linear function of the LD score31, with the effect of the LD score on the χ2 proportional to h²GE:
$$\mathbb{E}\left[\chi^2\right] = \frac{N_T\, h^2_{GE}}{M}\, l + N_T a + 1,$$
where NT is the GWAS sample size, M is the number of genes, l is the LD score of a gene, and a is the effect of population structure. We estimated the LD score of each gene by predicting expression in European samples of 1000 Genomes and computing sample correlations between genes' predicted expression, and inferred h²GE using ordinary least squares. We employed RHOGE to estimate and test for significant genetic correlations between traits at the predicted expression level28.
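The OLS step can be sketched as a simple regression of per-gene χ2 statistics on gene LD scores, with h²GE recovered from the slope as slope × M / NT and the free intercept absorbing NT a + 1. This is an unweighted illustration; it omits any regression weights or jackknife standard errors the actual method may use.

```python
from typing import Sequence, Tuple


def estimate_h2_ge(chi2: Sequence[float], ld_scores: Sequence[float],
                   n_gwas: int, n_genes: int) -> Tuple[float, float]:
    """OLS fit of chi2 = slope * ld + intercept; returns (h2_GE, intercept).

    h2_GE = slope * M / N_T, and the free intercept absorbs N_T * a + 1.
    """
    k = len(chi2)
    mean_ld = sum(ld_scores) / k
    mean_chi2 = sum(chi2) / k
    sxy = sum((ld - mean_ld) * (c - mean_chi2) for ld, c in zip(ld_scores, chi2))
    sxx = sum((ld - mean_ld) ** 2 for ld in ld_scores)
    slope = sxy / sxx
    intercept = mean_chi2 - slope * mean_ld
    return slope * n_genes / n_gwas, intercept
```

For example, with NT = 1,000 and M = 100, χ2 values lying exactly on 1 + l give a slope of 1 and therefore h²GE = 0.1.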
Multi-trait scans in UKBB and ELGAN
For 9 genes with 3 or more associations across traits of different categories, we conducted multi-trait TWAS scans in UK Biobank. Here, we used the weighted burden test in UKBB GWAS summary statistics from samples of European ancestry for 296 traits grouped by ICD code blocks (circulatory, congenital malformations, immune, mental disorders, musculoskeletal, neonatal, neurological, and respiratory). We also imputed expression for these genes in ELGAN using 729 samples with individual genotypes and conducted a multi-trait scan for 6 neonatal traits: neonatal chronic lung disease, head circumference Z-score, fetal growth restriction, birth weight Z-score, necrotizing enterocolitis, and Bayley II Mental Development Index (MDI) at 24 months. For continuous traits (head circumference Z-score, birth weight Z-score, and mental development index), we used a simple linear regression with the GReX of the gene as the main predictor, adjusting for race, sex, gestational duration (in days), inflammation of the chorion, and maternal age. For binary traits, we used a logistic regression with the same predictors and covariates. These covariates have been previously used in placental genomic studies of neonatal traits because of their strong correlations with the outcomes and with placental transcriptomics and methylomics15,90,116.
Validation analyses in RICHS
Using genotype and RNA-seq expression data from RICHS36, we attempted to validate RP-TWAS gene associations prioritized by the distal-SNPs added-last test in MOSTWAS. We first ran GBAT, a trans-eQTL mapping method from Liu et al52, to assess associations between loci around RPs and the expression of TWAS genes in RICHS. GBAT tests the association between the predicted expression of an RP and the expression of a TWAS gene, improving the power of trans-eQTL mapping117. We also conducted directional Egger regression-based Mendelian randomization to estimate and test the causal effect of RP expression on TWAS gene expression118.
Hybrid Mouse Diversity Panel
To provide functional evidence of gene associations with metabolic traits, we evaluated the 109 metabolic trait-associated genes from our human placental TWAS in the Hybrid Mouse Diversity Panel (HMDP) for correlations with obesity-related traits in mice47. This panel includes 100 inbred mouse strains with an extensive collection of obesity-related phenotypes (e.g., cholesterol, body fat percentage, insulin) and expression of over 12,000 genes measured in a variety of adult tissues (liver, adipose, aorta). We note that the HMDP only considers adult tissues and does not include placental gene expression. In the HMDP, we consider both trait correlations with tissue-specific gene expression and with cis-GReX (genetically regulated expression controlled by cis-eQTLs).
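The core of Egger regression-based Mendelian randomization is an inverse-variance-weighted regression of SNP-outcome effects (here, on TWAS gene expression) on SNP-exposure effects (on RP expression), with a free intercept. The sketch below orients instruments so exposure effects are positive and omits standard errors and the directionality step; it is illustrative only.

```python
from typing import Sequence, Tuple


def mr_egger(beta_x: Sequence[float], beta_y: Sequence[float],
             se_y: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance-weighted MR-Egger fit: beta_y = b0 + b1 * beta_x.

    Returns (intercept, causal_slope). Standard errors are omitted.
    """
    bx, by = [], []
    for x, y in zip(beta_x, beta_y):
        # Orient each instrument so its effect on the exposure is positive.
        if x < 0:
            x, y = -x, -y
        bx.append(x)
        by.append(y)
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, bx)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, by)) / sw
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, bx, by))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, bx))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

The slope estimates the causal effect of RP expression on TWAS gene expression, and a non-zero intercept would suggest directional pleiotropy among the instruments.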
In-vitro functional assays
Cell culture and treatment
The JEG-3 choriocarcinoma cells were purchased from the American Type Culture Collection (Manassas, VA). Cells were grown in Gibco RPMI 1640 supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin at 37°C in 5% CO2. Cells were plated at 2.1 × 106 cells per 75 cm2 flask and incubated under standard conditions until reaching roughly 90% confluence. To investigate the effects of gene silencing, we used AUMsilence FANA oligonucleotides for mRNA knockdown of EPS15 (AUM Bio Tech, Philadelphia, PA) and subsequent analysis of predicted downstream target genes SPATA13 and FAM214A. On the day of treatment, cells were seeded in a 24-well culture plate at 0.05 × 106 cells per well. Cells were plated in biological duplicate. FANA oligos were dissolved in nuclease-free water to a concentration of 500 µM, added to cell culture medium to a final concentration of 20 µM, and incubated for 24 hours at 37°C in 5% CO2.
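The FANA working concentration follows from C1V1 = C2V2. As a worked example, assuming a hypothetical final volume of 500 µL per well (the text does not state it), reaching 20 µM from a 500 µM stock requires 20 µL of stock, a 25-fold dilution:

```python
def stock_volume_ul(stock_um: float, final_um: float, final_volume_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add (in µL)."""
    return final_um * final_volume_ul / stock_um


# Hypothetical 500 µL final volume per well; 500 µM stock, 20 µM final.
volume = stock_volume_ul(stock_um=500.0, final_um=20.0, final_volume_ul=500.0)
dilution_factor = 500.0 / 20.0
```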
mRNA expression by quantitative Real-Time Polymerase Chain Reaction and RNA Sequencing
Treated and untreated JEG-3 cells were harvested in 350 µL of Buffer RLT Plus. RNA was then extracted using the AllPrep DNA/RNA/miRNA Universal Kit according to the manufacturer’s protocol. RNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA) and then reverse-transcribed to cDNA. Next, mRNA expression of EPS15, SPATA13, and FAM214A was measured by real-time qRT-PCR using previously validated primers. Real-time qRT-PCR Ct values were normalized against the housekeeping gene β-actin (ACTB), and fold changes in expression were calculated using the ΔΔCT method119. Each sample was prepared in biological duplicate and run in technical duplicate; these samples were pooled together for sequencing to yield data representing four samples per exposure group. Fold changes using the ΔΔCT method were calculated for each sample individually:
$$\Delta C_T = C_{T,\text{target}} - C_{T,\text{ACTB}}, \qquad \Delta\Delta C_T = \Delta C_{T,\text{sample}} - \Delta C_{T,\text{control}}, \qquad \text{fold change} = 2^{-\Delta\Delta C_T}.$$
Treated and untreated JEG-3 RNA samples previously extracted using the AllPrep DNA/RNA/miRNA Universal Kit were submitted to the High Throughput Sequencing Facility at UNC Chapel Hill for RNA sequencing on the HS4000 HO platform. Samples were sequenced in duplicate, and libraries were prepared with the Kapa Stranded mRNA-Seq Kit for Illumina platforms. After all samples passed QA/QC, sequencing was performed with 2 × 75 bp paired-end reads.
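The ΔΔCT arithmetic can be sketched as below, assuming technical replicate Ct values are averaged before computing ΔCT and that percent change is reported as (fold change − 1) × 100; both are assumptions about presentation rather than details from the text.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean target Ct minus mean ACTB Ct across technical replicates."""
    return mean(target_ct) - mean(reference_ct)


def ddct_fold_change(sample_dct: float, control_dct: float) -> float:
    """2^(-ddCt), with ddCt = dCt(sample) - dCt(control)."""
    return 2.0 ** (-(sample_dct - control_dct))


def percent_change(fold_change: float) -> float:
    """Express a fold change as a percent change (e.g., 2.0 -> +100%)."""
    return (fold_change - 1.0) * 100.0
```

A target that reaches threshold one cycle earlier in treated cells (at equal ACTB Ct) has roughly twice the transcript abundance, hence a fold change of 2.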
Statistical analysis
Statistical analysis was performed using a one-way ANOVA (with nominal significance level α = 0.05). Post-hoc pairwise t-tests (3 degrees of freedom for biological and technical duplicate) were used for direct comparisons between sample groups.
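The one-way ANOVA F-statistic used above can be computed directly; a P-value then comes from the F distribution with (k − 1, N − k) degrees of freedom, for example via scipy.stats.f.sf. The sketch below is a stdlib-only illustration with hypothetical group values.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Here the groups would correspond to the no-addition control, scramble oligo control, and knockdown conditions.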
Differential expression analysis
RNA-seq quantifications (transcripts per million) were imported using tximeta58 and summarized to the gene level. Differential expression analysis between EPS15 knockdown samples and scramble oligo controls was conducted using DESeq259. Although false positive rates are well controlled even at low sample sizes120, true positive rates at such low sample sizes are low for smaller log fold-change thresholds. Thus, guided by Schurch et al's analysis and given the very limited sample size, we considered a gene to be differentially expressed if its absolute log2-fold change was greater than 1 and P < 0.05/37,788 = 1.32 × 10−6. This P-value threshold is a strict Bonferroni threshold across 37,788 quantified genes.
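The differential expression call combines a fold-change cutoff with the Bonferroni threshold. This sketch assumes the inputs are the log2FoldChange and pvalue columns of a DESeq2 results table; the default fold-change cutoff follows this section and is exposed as a parameter.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level controlling the family-wise error rate."""
    return alpha / n_tests


def is_differentially_expressed(log2_fold_change: float, pvalue: float,
                                n_genes: int = 37_788,
                                min_abs_lfc: float = 1.0,
                                alpha: float = 0.05) -> bool:
    """Require both |log2FC| above the cutoff and P below the Bonferroni threshold."""
    return (abs(log2_fold_change) > min_abs_lfc
            and pvalue < bonferroni_threshold(alpha, n_genes))
```

With 37,788 genes, 0.05/37,788 is about 1.32 × 10−6, matching the threshold stated above.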
CODE AVAILABILITY
Sample scripts for analysis are provided at https://github.com/bhattacharya-a-bt/dohad_twas. The MOSTWAS software is accessible at https://bhattacharya-a-bt.github.io/MOSTWAS/articles/MOSTWAS_vignette.html.
DATA AVAILABILITY
ELGAN mRNA, miRNA, and CpG methylation data can be accessed from the NCBI Gene Expression Omnibus GSE154829 and GSE167885. ELGAN genotype data are protected, as subjects are still enrolled in the study; any inquiries or data requests must be made to RCF and HPS. GWAS summary statistics can be accessed at the following links: UK Biobank (http://www.nealelab.is/uk-biobank), GIANT consortium (https://portals.broadinstitute.org/collaboration/giant/index.php/Main_Page), PGC (https://www.med.unc.edu/pgc/download-results/), EGG consortium (https://egg-consortium.org/), and CTG Lab (https://ctg.cncr.nl/software/). The RICHS eQTL dataset can be accessed via dbGaP accession number phs001586.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001586.v1.p1). Placental epigenomic annotations from the ENCODE Project are available from https://www.encodeproject.org/, with specific accession numbers in Supplemental Table S13. All models and full TWAS results can be accessed at (https://doi.org/10.5281/zenodo.4618036)121. RNA-seq data generated in placental JEG-3 cells are available at GSE185071.
AUTHOR CONTRIBUTIONS
Conceptualization: AB, TMO, RCF, HPS; Data curation: AB, ANF, VA, WL, YL, CP, CJM, TMO, RCF, HPS; Formal Analysis: AB, ANF, WL, HPS; Funding Acquisition: AJL, YL, RMJ, LS, KCKK, CJM, TMO, RCF, HPS; Investigation: AB, ANP, HJH, RCF, HPS; Methodology: AB, YL, HPS; Project administration: AB, RCF, HPS; Resources: TMO, RCF, HPS; Software: AB, WL, YL; Supervision: AB, YL, RCF, HPS; Validation: AB, CJM; Visualization: AB, ANF, RH; Writing – original draft: AB, RCF, HPS; Writing – review & editing: AB, AJL, ANF, VA, RH, WL, YL, RMJ, LS, HJH, KCKK, CJM, TMO, RCF, HPS
FUNDING
This study was supported by grants from the National Institutes of Health (NIH), specifically the National Institute of Neurological Disorders and Stroke (U01NS040069; R01NS040069), the Office of the NIH Director (UG3OD023348), the National Institute of Environmental Health Sciences (T32-ES007018; P30ES019776; R24ES028597), the National Heart, Lung and Blood Institute (R01HL47883, R01HL148577), the National Institute of Nursing Research (K23NR017898; R01NR019245), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD092374; R03HD101413; P50HD103573).
COMPETING INTERESTS
The authors declare that they have no competing interests.
SUPPLEMENTAL FIGURE LEGENDS
Figure S1: SNP heritability of 40 traits. Estimates of SNP heritability with 95% confidence interval (X-axis), grouped and colored by trait category (Y-axis).
Figure S2: SNP-based genetic correlation between 40 traits. Heatmap of estimates of SNP-based genetic correlation between traits, grouped and colored by trait category. Correlations marked with an asterisk are significantly non-zero with FDR-adjusted P < 0.05.
Figure S3: Example of a biological mechanism MOSTWAS leverages in its predictive models. Here, assume a SNP (in green) within a regulatory element affects the transcription of gene X (A) or the hyper- or hypomethylation of a CpG island upstream of gene X (B) that codes for a transcription factor or a microRNA hairpin. Transcription factor or microRNA X then binds to a distal regulatory region and affects the transcription of gene G. The association between the expression of gene X and gene G is leveraged in the first step of MeTWAS. A distal-eQTL association is also conferred between this distal-SNP and the eGene G, which is leveraged in the DePMA training process.
Figure S4: TWAS Miami plots for autoimmune/autoreactive disorders. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
Figure S5: TWAS Miami plots for cardiovascular disorders. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
Figure S6: TWAS Miami plots for neonatal/childhood outcomes. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
Figure S7: TWAS Miami plots for neuropsychiatric outcomes. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
Figure S8: TWAS Miami plots for BMI and BMI-adjusted waist-hip ratio. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
Figure S9: TWAS Miami plots for body size/metabolic traits, excluding BMI and BMI-adjusted waist-hip ratio. Weighted Z-scores for TWAS associations (Y-axis) plotted against the genomic location of genes (X-axis). Red lines mark the Z-score thresholds corresponding to P = 2.5 × 10−6. Labelled genes have P < 2.5 × 10−6 and nominal permutation P < 0.05; genes in green have Benjamini-Hochberg FDR-adjusted P < 0.05 for the distal-SNPs added-last test.
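The red threshold lines and the FDR adjustment referenced in Figures S4 to S9 can be sketched with the Python standard library. This assumes two-sided tests under a standard normal null for the TWAS Z-scores; the helper names are hypothetical, and the Benjamini-Hochberg function is a generic textbook implementation rather than the code used in the analysis.

```python
from statistics import NormalDist


def z_threshold(alpha):
    """Two-sided |Z| cutoff matching a P-value threshold under a N(0, 1) null."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


cutoff = z_threshold(2.5e-6)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])  # hypothetical P-values
print(round(cutoff, 2), [round(a, 4) for a in adjusted])
```

Under these assumptions, P = 2.5 × 10−6 corresponds to |Z| of roughly 4.71.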
Figure S10: Placental expression-mediated genetic heritability of traits. Caterpillar plot of placental expression-mediated genetic heritability of traits, colored by trait category. Wald-type 95% confidence intervals are provided for reference. A trait is labelled if its confidence interval does not include the null value of 0.
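The labelling rule for Figure S10 reduces to a Wald-type interval check. A minimal sketch, assuming a normal sampling distribution for each heritability estimate; the function names and example numbers are illustrative and are not values from Table S1.

```python
def wald_ci(estimate, se, z=1.959964):
    """Wald-type 95% confidence interval: estimate +/- z * SE (z is the N(0, 1) 97.5th percentile)."""
    return estimate - z * se, estimate + z * se


def excludes_null(estimate, se, null=0.0):
    """True if the Wald interval does not contain the null value."""
    lo, hi = wald_ci(estimate, se)
    return not (lo <= null <= hi)


# Hypothetical heritability estimates and standard errors.
print(excludes_null(0.05, 0.02), excludes_null(0.03, 0.02))
```

For example, an estimate of 0.05 with standard error 0.02 gives an interval of approximately (0.011, 0.089), so that trait would be labelled.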
Figure S11: Comparison of GWAS and TWAS results across all 40 traits. Scatterplot of number of TWAS-significant genes (Y-axis) and number of GWAS-significant SNPs (X-axis) across all 40 traits, colored by category of the trait. The size of the point shows the log10 sample size of the GWAS. The red line and gray band provide a regression line and 95% confidence band for the fitted values. Points are labelled if the point falls outside the confidence band.
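One way to reproduce the fitted line, confidence band, and labelling rule of Figure S11 is ordinary least squares with a pointwise confidence band for the fitted mean. This sketch assumes an unweighted fit and uses a normal quantile in place of Student's t, which slightly narrows the band for 40 points; whether the figure used exactly this construction is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fit_with_band(x, y, level=0.95):
    """OLS fit of y on x plus a pointwise confidence band for the fitted mean.

    Uses a normal quantile instead of Student's t (an approximation).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual standard deviation
    q = NormalDist().inv_cdf(0.5 + level / 2)

    def band(x0):
        """Lower and upper band limits at x0."""
        fit = intercept + slope * x0
        half = q * s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
        return fit - half, fit + half

    return slope, intercept, band


def outside_band(x, y, band):
    """Indices of points lying outside the confidence band (candidates for labels)."""
    out = []
    for i, (a, b) in enumerate(zip(x, y)):
        lo, hi = band(a)
        if b < lo or b > hi:
            out.append(i)
    return out


x = [0, 1, 2, 3, 4]            # hypothetical GWAS hit counts
y = [1.1, 2.9, 5.0, 7.1, 8.9]  # hypothetical TWAS hit counts
slope, intercept, band = fit_with_band(x, y)
print(round(slope, 2), round(intercept, 2))
```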
Figure S12: Heatmap of genetic correlations on the heritable gene expression level between 40 traits considered in TWAS analysis. Genetic correlations between traits at the level of the predicted expression of heritable genes. Correlations at FDR-adjusted P < 0.05 are marked with an asterisk. Autoimmune/autoreactive traits are colored in yellow, body size/metabolic in purple, cardiovascular in green, neonatal/childhood outcomes in blue, and neuropsychiatric in red.
Figure S13: Miami plot of representative phenome-wide scans of GTAs in UKBB. Weighted burden Z-score (Y-axis) of GTA across all traits (X-axis), grouped and colored by ICD code block.
Figure S14: Heatmap of correlations between selected transcription factors and TWAS-identified genes in RICHS. Correlations between the RICHS expression of RPs (Y-axis) and associated TWAS genes identified by MOSTWAS in ELGAN (X-axis).
Figure S15: Over-representation enrichments of differentially expressed genes in EPS15 knockdown. Enrichment plot of over-representation of biological process, cellular component, and molecular function ontologies (Y-axis) with -log10 FDR-adjusted P-value (X-axis). The size of the point gives the relative enrichment ratio for the given pathway.
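The enrichment ratio and P-value behind an over-representation analysis, as in Figure S15 and Tables S7, S18, and S19, are commonly computed with a one-sided hypergeometric test. The sketch below assumes that formulation; the function name and gene counts are hypothetical, and the tool and background gene set used in the study are not specified here. The resulting P-values would then be FDR-adjusted across all tested ontologies.

```python
from math import comb


def ora(k, n, K, N):
    """Over-representation of one gene set within a gene list.

    k: genes in the list that belong to the set
    n: size of the gene list (e.g., differentially expressed genes)
    K: size of the gene set (ontology term or pathway)
    N: size of the background gene universe
    Returns (enrichment ratio, one-sided hypergeometric P-value).
    """
    ratio = (k / n) / (K / N)  # observed fraction over expected fraction
    # P(X >= k) for X ~ Hypergeometric(N, K, n); comb() returns 0 when n - i > N - K.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return ratio, upper / comb(N, n)


ratio, p = ora(k=8, n=100, K=200, N=20000)  # hypothetical counts
print(round(ratio, 2), p)
```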
SUPPLEMENTAL TABLE LEGENDS
Table S1: Overview of the 40 traits and GWAS considered in the analysis. The consortium, trait category, trait, URL for summary statistics, sample size, number of cases (if binary trait), SNP heritability estimate and standard error, Lambda GC, mean χ² statistic, reference DOI for GWAS, and expression-mediated heritability estimates and standard errors (using all genes and using all TWAS-significant genes) are provided, in order.
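The Lambda GC column in Table S1 is conventionally the genomic inflation factor: the median of the observed 1-df χ² association statistics divided by the median of the χ² distribution with 1 degree of freedom (about 0.4549). A minimal sketch, assuming per-SNP Z-scores from the GWAS summary statistics as input; the function names and example Z-scores are hypothetical.

```python
from statistics import NormalDist, mean, median


def chi2_from_z(z_scores):
    """Convert GWAS Z-scores to 1-df chi-square statistics."""
    return [z * z for z in z_scores]


def lambda_gc(chi2_stats):
    """Genomic inflation factor: observed median chi-square over its null median."""
    # The median of chi-square(1) is the square of the N(0, 1) 75th percentile.
    null_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549
    return median(chi2_stats) / null_median


def mean_chi2(chi2_stats):
    """Mean chi-square statistic across SNPs."""
    return mean(chi2_stats)


z = [0.2, -1.1, 0.6745, 2.5, -0.4]  # hypothetical Z-scores
chi2 = chi2_from_z(z)
print(round(lambda_gc(chi2), 3), round(mean_chi2(chi2), 3))
```

Values of Lambda GC near 1 are consistent with little genome-wide inflation; the mean χ² is reported alongside it because it also reflects polygenic signal.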
Table S2: Comparison of GWAS and TWAS associations. The category, trait, GWAS sample size, number of cases, number of significant GWAS SNPs (P < 5 × 10−8), and number of significant total and GWAS-overlapping TWAS associations (P < 2.5 × 10−6) are provided, in order.
Table S3: Genetic correlations between traits at the SNP and placental expression-mediated levels. Genetic correlations, standard errors, Z-test statistics, P-values, FDR-adjusted P-values, and genetic covariances and their standard errors are provided for all pairs of traits.
Table S4: Demographic and clinical covariates summary statistics of ELGAN and RICHS samples.
Table S5: Summary of in- and out-of-sample predictive performance of MOSTWAS placental expression models. Mean, standard deviation, 25% quantile, median, and 75% quantile of gene expression heritability, in-sample cross-validation R² in ELGAN, and out-of-sample R² in RICHS.
Table S6: Summary of 248 significant TWAS gene-trait associations. For each gene and trait, the trait category, chromosomal position of the gene, expression heritability and associated likelihood ratio test P-value, cross-validation predictive performance of the gene model, TWAS Z-score and P-value, permutation P-value, top SNP and its GWAS P-value among SNPs used in the gene model, distal Z-score and P-value, and identified mediators are provided, in order.
Table S7: Over-representation analysis of TWAS genes. Biological process, molecular function, and PANTHER pathway ontologies enriched for TWAS-identified genes associated with each trait at FDR-adjusted P < 0.05.
Table S8: Genetic correlations between traits at the placental expression-mediated level. For each pair of traits, the genetic correlation, standard error, t-statistic with its degrees of freedom, and P-value are provided.
Table S9: Results of fine-mapping of overlapped TWAS genes using FOCUS. Overlapping genes are provided, with the associated trait, chromosomal positions, TWAS Z-scores, P-values, top GWAS SNP information, posterior inclusion probability, and whether they are included in the credible set for the region. The distal Z-score is also provided.
Table S10: Results of ELGAN phenome-wide scan of neonatal outcomes. For each gene and ELGAN phenotype, the effect size, standard error, adjusted 95% confidence interval, Z-score, P-value, and FDR-adjusted P-value are provided.
Table S11: Cis-GReX correlations of TWAS-identified genes with metabolic traits in the Hybrid Mouse Diversity Panel. For each correlation at FDR-adjusted P < 0.10, the dataset, gene (mouse analog), trait, correlation, and P-value are provided.
Table S12: Over-representation analysis of transcription factors identified as mediators. For the transcription factor-encoding genes identified as mediators, the functional categories, ontologies, FDR-adjusted P-values of enrichment, number of overlapping genes in each ontology, and total number of genes in each ontology are given.
Table S13: Trans-eQTL scan using GBAT in RICHS between genetic loci local to MOSTWAS-identified transcription factors and the expression of the target TWAS gene. The effect size, P-value, and FDR-adjusted P-value are provided.
Table S14: Results from MR-Egger to assess causal effects of transcription factors on targeted TWAS genes. For each RP-TWAS pair, the causal estimate, confidence interval, P-value, residual standard error, heterogeneity statistic, and heterogeneity P-value are provided.
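MR-Egger regresses the SNP-outcome effects (here, SNP effects on TWAS-gene expression) on the SNP-exposure effects (SNP effects on transcription-factor expression) with an unconstrained intercept, weighting each variant by the inverse variance of its outcome effect; the slope is the causal estimate and the intercept captures directional pleiotropy. The following is a minimal standard-library sketch of the quantities listed for Table S14, under assumptions: variants are oriented so that SNP-exposure effects are positive, the slope standard error is inflated by the residual standard error when that exceeds 1 (a common multiplicative random-effects convention), and normal rather than t quantiles are used. It is not the software used in the study, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def mr_egger(bx, by, se_y):
    """Inverse-variance weighted regression of by on bx with an intercept (needs >= 3 SNPs)."""
    # Orient variants so that every SNP-exposure effect is positive.
    pairs = [(abs(x), y if x >= 0 else -y, s) for x, y, s in zip(bx, by, se_y)]
    w = [1 / (s * s) for _, _, s in pairs]
    sw = sum(w)
    xw = sum(wi * x for wi, (x, _, _) in zip(w, pairs)) / sw  # weighted means
    yw = sum(wi * y for wi, (_, y, _) in zip(w, pairs)) / sw
    sxx = sum(wi * (x - xw) ** 2 for wi, (x, _, _) in zip(w, pairs))
    sxy = sum(wi * (x - xw) * (y - yw) for wi, (x, y, _) in zip(w, pairs))
    slope = sxy / sxx
    intercept = yw - slope * xw
    # Rucker's Q: weighted residual sum of squares (heterogeneity statistic).
    q = sum(wi * (y - intercept - slope * x) ** 2 for wi, (x, y, _) in zip(w, pairs))
    rse = math.sqrt(q / (len(pairs) - 2))  # residual standard error
    se_slope = math.sqrt(1 / sxx) * max(1.0, rse)
    z = NormalDist().inv_cdf(0.975)
    p = 2 * (1 - NormalDist().cdf(abs(slope / se_slope)))
    return {"estimate": slope, "intercept": intercept,
            "ci": (slope - z * se_slope, slope + z * se_slope),
            "p": p, "rse": rse, "Q": q}


bx = [0.2, -0.3, 0.4, 0.5]    # hypothetical SNP effects on transcription-factor expression
by = [0.2, -0.25, 0.3, 0.35]  # hypothetical SNP effects on TWAS-gene expression
se_y = [0.1, 0.1, 0.1, 0.1]
res = mr_egger(bx, by, se_y)
print(res["estimate"], res["intercept"])
```

A heterogeneity P-value would compare Q with a χ² distribution on (number of SNPs − 2) degrees of freedom; it is omitted here because the Python standard library has no χ² CDF.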
Table S15: MOSTWAS-identified CpG site mediators found within ENCODE-identified placenta cis-regulatory sites. For each CpG site mediator that overlaps with a placental cis-regulatory site, the chromosomal location of the regulatory site, the classification of the regulatory site, tissue, gestational time, sex, and accession number are provided.
Table S16: Summary statistics of down-regulated differentially expressed genes in EPS15 knockdown cells. For each gene with FDR-adjusted P < 0.01, we provide the gene name, log2 fold change, standard error, and P-values.
Table S17: Summary statistics of up-regulated differentially expressed genes in EPS15 knockdown cells. For each gene with FDR-adjusted P < 0.01, we provide the gene name, log2 fold change, standard error, and P-values.
Table S18: Over-representation analysis of down-regulated genes. Biological process, molecular function, and PANTHER and KEGG pathway ontologies enriched for down-regulated genes in EPS15 knockdown cells associated with each trait at FDR-adjusted P < 0.05.
Table S19: Over-representation analysis of up-regulated genes. Biological process, molecular function, and PANTHER and KEGG pathway ontologies enriched for up-regulated genes in EPS15 knockdown cells associated with each trait at FDR-adjusted P < 0.05.
ACKNOWLEDGEMENTS
We thank Michael Love, Kanishka Patel, Michael Gandal, Chloe Yap, Bogdan Pasaniuc, and Jon Huang for their thoughts about the research. We also thank the following consortia and research groups for their publicly available GWAS summary statistics, eQTL datasets, and/or epigenomic annotations: the UK Biobank and the Neale Lab, the Genetic Investigation of Anthropometric Traits Consortium, the Psychiatric Genomics Consortium, the Early Growth Genetics Consortium, the Complex Trait Genetics Lab, the Rhode Island Child Health Study, and the ENCODE Project.
Footnotes
Additional biological context and figure splits
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
81.
82.
83.
84.
85.
86.
87.
88.
89.
90.
91.
92.
93.
94.
95.
96.
97.
98.
99.
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
110.
111.
112.
113.
114.
115.
116.
117.
118.
119.
120.
121.