ABSTRACT
As the master regulator of the intrauterine environment, the placenta is central to the Developmental Origins of Health and Disease (DOHaD) hypothesis but is understudied in relation to tissue-specific gene and trait regulation. We performed distal mediator-enriched transcriptome-wide association studies (TWAS) for 40 health traits across 5 physiological categories, using gene expression models trained with multi-omic data from the Extremely Low Gestational Age Newborn Study (N = 272). At P < 2.5 × 10⁻⁶ and FDR-adjusted permutation P < 0.05, we detected 248 gene-trait associations (GTAs) across 176 genes, mostly for metabolic and neonatal traits and enriched for cell growth and immunological pathways. Of these GTAs, 89 showed significant mediation through genetic variants distal to the gene, identifying potential targets for functional validation. Knockdown of a predicted mediator gene (EPS15) in human placenta-derived JEG-3 trophoblasts increased expression of its predicted targets, SPATA13 and FAM214A, both associated with waist-hip ratio in TWAS. These results illustrate the profound health impacts of placental genetic and genomic regulation in developmental programming across the life course.
MAIN
The placenta serves as the master regulator of the intrauterine environment via nutrient transfer, metabolism, gas exchange, neuroendocrine signaling, growth hormone production, and immunologic control1–5. Due to its strong influences on postnatal health, the placenta is central to the Developmental Origins of Health and Disease (DOHaD) hypothesis, which posits that the in utero experience has lifelong impacts on child health by altering developmental programming and influencing risk of common, noncommunicable health conditions6. For example, placental biology has been linked to neuropsychiatric, developmental, and metabolic diseases or health traits (collectively referred to as traits) that manifest throughout the life course, either early or later in life (Figure 1)7–10. Despite its long-lasting influences on health, the placenta has not been well studied in large consortium studies of multi-tissue gene regulation11,12. Studying regulatory mechanisms in the placenta underlying biological processes in developmental programming will provide novel insight into health and disease etiology.
The complex interplay between genetics and placental transcriptomics and epigenomics has strong effects on gene expression that may explain variation in gene-trait associations (GTAs). Quantitative trait loci (QTL) analyses have identified a strong influence of cis-genetic variants on both placental gene expression and DNA methylation13,14. Furthermore, there is growing evidence that the placental epigenome influences gene regulation, often distally (more than 1-3 megabases away in the genome)15, and that placental DNA methylation and microRNA (miRNA) expression are associated with health traits in children16–18. Dysfunction of transcription factor regulation in the placenta has also been shown to have profound effects on childhood traits19–22. Although combining genetics, transcriptomics, and epigenomics lends insight into the influence of placental genomics on complex traits23, genome-wide screens for GTAs that integrate different molecular profiles and generate functional hypotheses require more sophisticated computational methods.
To this end, advances in transcriptome-wide association studies (TWAS) have allowed for integration of genome-wide association studies (GWAS) and eQTL datasets to boost power in identifying GTAs in a relevant tissue24–26. However, traditional methods for TWAS largely overlook genetic variants distal to genes of interest, whose effects are thought to be mediated through regulatory biomarkers (e.g., transcription factors, miRNAs, and DNA methylation sites)27,28. These distal biomarkers may explain a significant portion of both gene expression heritability and trait heritability at the tissue-specific expression level29–32, and they may also influence tissue-specific trait associations for individual genes. Due to the strong interplay of regulatory elements in placental gene regulation, we sought to systematically characterize the portions of gene expression that are influenced by these distal regulatory elements.
Here, we investigate three broad questions: (1) which genes show associations between their placental genetically-regulated expression (GReX) and various traits across the life course, (2) which traits along the life course can be explained by placental GReX, in aggregate, and (3) which transcription factors, miRNAs, or CpG sites potentially regulate trait-associated genes in the placenta (Figure 1). We leveraged gene expression, CpG methylation, and miRNA expression data from fetal-side placenta tissue from the Extremely Low Gestational Age Newborn (ELGAN) Cohort Study33. We trained predictive models of gene expression, enriched for distal SNPs using MOSTWAS, a recent TWAS extension that integrates multi-omic data34. Re-analyzing 40 GWAS of European-ancestry subjects from large consortia35–39, we performed a series of TWAS for non-communicable health traits and disorders that may be influenced by the placenta to identify GTAs and functional hypotheses for regulation (Figure 2). To our knowledge, this is the first distal mediator-enriched TWAS of health traits that integrates placental multi-omics. Results from our analysis can be explored at our Shiny R app, the ELGAN DOHaD Atlas: https://elgan-twas.shinyapps.io/dohad/.
RESULTS
Genetic heritability and correlations across traits
From large consortia35–39, we curated GWAS summary statistics from subjects of European ancestry for 40 complex, non-communicable traits and disorders across five health categories to systematically identify potential links to genetically-regulated placental expression (Table 1, Supplemental Table 1; Methods). These 40 traits comprise 3 autoimmune/autoreactive disorders, 8 body size/metabolic traits, 4 cardiovascular disorders, 14 neonatal/early childhood traits, and 11 neuropsychiatric traits/disorders (Supplemental Table 1). These five categories of traits have been linked previously to placental biology and morphology7–10.
To assess the percent variance explained by genetics in each trait and the genetic associations shared between traits, we estimated the SNP heritability (h²) and genetic correlation (rg) of these traits, respectively (Supplemental Figure S1 and S2). Of the 40 traits, 37 showed significantly positive SNP heritability and 18 showed ĥ² > 0.10 (Supplemental Figure S1, Supplemental Table S1), with the largest heritability for childhood BMI (ĥ² = 0.69, SE = 0.064). As expected, we observed strong, statistically significant genetic correlations between traits of similar categories (e.g., between neuropsychiatric traits or between metabolic traits) (Supplemental Figure S2; Supplemental Table S2). At Benjamini-Hochberg FDR-adjusted P < 0.05, we also observed several significant correlations between traits from different categories, for example between diabetes and angina (FDR-adjusted P = 6.53 × 10⁻³³), Tanner scale and BMI (FDR-adjusted P = 1.06 × 10⁻³), and BMI and obsessive compulsive disorder (FDR-adjusted P = 1.79 × 10⁻⁹). Given strong and potentially shared genetic influences across these traits, we examined whether genetic associations with these traits are mediated by the placental transcriptome.
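Benjamini-Hochberg FDR adjustment recurs throughout these analyses, including the trait correlations, permutation tests, and distal tests. For readers who want to check thresholds by hand, here is a minimal sketch of the standard step-up adjustment in plain Python. The function name `benjamini_hochberg` is illustrative and is not taken from the study's pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg FDR-adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest rank down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(a, 4) for a in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```

Each adjusted value is the smallest m·p(i)/i over its own rank and all higher ranks, capped at 1. As a result, adjusted P-values never decrease as raw P-values increase.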
Gene expression prediction models
To train predictive models of placental expression, the first step of our TWAS (Figure 2A), we leveraged MOSTWAS34, a recent extension that includes distal variants in transcriptomic prediction. As large proportions of total heritable gene expression are explained by distal-eQTLs local to regulatory hotspots27,28,30, MOSTWAS uses data-driven approaches to either identify mediating regulatory biomarkers or distal-eQTLs mediated through local regulatory biomarkers to increase predictive power for gene expression and power to detect GTAs (Supplemental Figure S3)34. In this analysis, these regulatory biomarkers include transcription-factor encoding genes, miRNAs, and CpG methylation sites from the ELGAN Study (Methods).
Using genotypes (from umbilical cord blood)40 and mRNA expression, CpG methylation, and miRNA expression data (from fetal-side placenta)23 from the ELGAN Study33 for 272 infants born pre-term, we built genetic models to predict RNA expression levels for genes in the fetal placenta (demographic summary in Supplemental Table S3). Of 12,020 genes expressed across all samples in ELGAN, we built significant models for 2,994 genes. We defined a model as significant if SNP-based expression heritability was significantly positive (nominal P < 0.05) and five-fold cross-validation (CV) adjusted R² was ≥ 0.01 (Figure 3A [Step 1]); only these 2,994 models are used in subsequent TWAS steps. Mean SNP heritability for these genes was 0.39 (25% quantile = 0.253, 75% quantile = 0.511), and mean CV R² was 0.031 (quantiles: 0.014, 0.034). For out-of-sample validation, we imputed expression into individual-level genotypes from the Rhode Island Child Health Study (RICHS; N = 149)41,42, supporting portability across studies: of 2,005 genes with RNA-seq expression in RICHS, 1,131 genes met adjusted R² ≥ 0.01, with mean R² = 0.011 (quantiles: 7.71 × 10⁻⁴, 0.016) (Figure 3B; Supplemental Table S4).
Placental transcriptome-wide association studies
Overall associations and permutation tests
We integrated GWAS summary statistics for 40 traits from European-ancestry subjects with placental gene expression using our predictive models. Using the weighted burden test25,43, we detected 932 GTAs (spanning 686 unique genes) at P < 2.5 × 10⁻⁶ (corresponding to |Z| > 4.56), a transcriptome-wide significance threshold consistent with previous TWAS25,31 (Figure 3A [Step 2], Supplemental Data). Many of these loci carry significant signal because of strong trait-associated GWAS architecture. We therefore employed Gusev et al.’s permutation test25 to assess how much signal the SNP-expression weights add beyond the GWAS signal at the locus, and thus whether integrating expression data significantly refines the association with the trait. At FDR-adjusted permutation P < 0.05, we detected 248 such GTAs spanning 176 unique genes: 11 in autoimmune/autoreactive disorders, 136 in body size/metabolic traits, 32 in cardiovascular disorders, 39 in neonatal/childhood traits, and 30 in neuropsychiatric traits (Figure 3C [Step 3], Table 1, Supplemental Table S5; Miami plots of TWAS Z-scores in Supplemental Figures S4-S9).
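For readers unfamiliar with the two TWAS tests above, here is a minimal plain-Python sketch. It assumes the common weighted burden formulation, Z = wᵀz / √(wᵀΣw), where w holds the SNP-expression weights, z the GWAS Z-scores, and Σ the LD matrix. It also assumes a permutation test in which the weights are shuffled across SNPs while z and Σ stay fixed. The function names are hypothetical, and this illustrates the idea rather than the MOSTWAS or FUSION implementation.

```python
import math
import random
from statistics import NormalDist


def burden_z(weights, gwas_z, ld):
    """Weighted burden statistic: Z = w'z / sqrt(w' LD w)."""
    m = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided normal P-value, computed from the lower tail for precision."""
    return 2 * NormalDist().cdf(-abs(z))


def permutation_p(weights, gwas_z, ld, n_perm=1000, seed=1):
    """Shuffle weights across SNPs (GWAS Z-scores and LD held fixed) and
    count how often the permuted |Z| reaches the observed |Z|."""
    rng = random.Random(seed)
    observed = abs(burden_z(weights, gwas_z, ld))
    shuffled = list(weights)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(burden_z(shuffled, gwas_z, ld)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy example with three SNPs in no LD (hypothetical values)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(burden_z([1.0, 0.0, 0.0], [3.0, 0.5, -0.2], identity))  # 3.0
```

The permutation P-value is large when the GWAS signal alone would produce a similar Z under random weights. A small value suggests the expression weights themselves carry information.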
For example, the 39 GTAs detected with BMI included LARS2 (Leucyl-tRNA Synthetase 2, OMIM: 604544) (Z = 11.4) and CAST (Calpastatin, OMIM: 114090) (Z = −4.61). These two GTAs have been detected using cis-only TWAS in different tissues25,31. In addition, one of the 30 genes identified in association with waist-hip ratio, NDUFS1 (NADH:Ubiquinone Oxidoreductase Core Subunit S1, OMIM: 157655) (Z = −5.38), was also prioritized by TWAS in other tissues31. We cross-referenced susceptibility genes with a recent cis-only TWAS of fetal birthweight, childhood obesity, and childhood BMI by Peng et al. using placental expression data from RICHS10. Of the 19 birthweight-associated genes they identified, we could train significant expression models for only two in ELGAN: PLEKHA1 (Pleckstrin Homology Domain Containing A1, OMIM: 607772) and PSG8 (Pregnancy Specific Beta-1-Glycoprotein 8, OMIM: 176397). Of these, only PSG8 was significantly associated with fetal birthweight (Z = −7.77). Similarly, of the 6 childhood BMI-associated genes identified by Peng et al., only 1 had a significant model in ELGAN, and it showed no association with the trait; there were no overlaps with childhood obesity-associated genes from Peng et al.10 We hypothesize that the minimal overlap with susceptibility genes identified by Peng et al. is due to differing eQTL architectures in the datasets and different inclusion criteria for significant gene expression models10,34,44,45.
We conducted over-representation analysis for biological process, molecular function, and PANTHER gene pathway ontologies for TWAS-detected susceptibility genes (Supplemental Figure S10, Supplemental Table S6)46. Overall, considering all 176 TWAS-identified genes, we observed enrichments for nucleic acid binding and immune or cell growth signaling pathways (e.g., B-cell/T-cell activation and EGF receptor, interleukin, PDGF, and Ras signaling pathways). By trait, we found related pathways (sphingolipid biosynthesis, cell motility, etc.) for TWAS genes for metabolic and morphological traits (e.g., BMI and childhood BMI); for most traits, we were underpowered to detect ontology enrichments (Supplemental Table S6). We also assessed the overlap of TWAS genes with GWAS signals. A total of 112 TWAS genes did not overlap with GWAS loci (P < 5 × 10⁻⁸) within a 500 kilobase interval around any SNPs (local and distal) included in predictive models (Table 2).
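Over-representation analysis is typically a one-sided hypergeometric (Fisher's exact) test, and the GWAS-overlap check is a windowed distance comparison. The sketch below shows both in plain Python under two stated assumptions. The hypergeometric form is a generic stand-in for whatever test the ontology tool (ref. 46) applies. The `window` parameter treats the 500 kilobase interval as ±500 kb around each model SNP; halve it if the interval is meant to be 500 kb wide in total. Function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n_hits, n_term, n_universe):
    """P(X >= k) for X ~ Hypergeometric: n_hits genes drawn from a universe of
    n_universe genes, of which n_term are annotated to the ontology term."""
    upper = min(n_hits, n_term)
    total = comb(n_universe, n_hits)
    tail = sum(comb(n_term, i) * comb(n_universe - n_term, n_hits - i)
               for i in range(k, upper + 1))
    return tail / total


def overlaps_gwas(model_snps, gwas_hits, window=500_000):
    """True if any genome-wide significant GWAS SNP lies within `window` bp of
    any SNP in the gene's expression model. SNPs are (chrom, position) tuples."""
    return any(c1 == c2 and abs(p1 - p2) <= window
               for c1, p1 in model_snps
               for c2, p2 in gwas_hits)


# Toy enrichment: all 5 drawn genes fall in a 5-gene term out of 10 genes
print(hypergeom_sf(5, n_hits=5, n_term=5, n_universe=10))  # 1/252
```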
Genetic heritability and correlations of traits by predicted expression
To assess how much trait variance is explained by genetically regulated placental expression as a whole, we used RHOGE, a linkage disequilibrium (LD) score regression approach25,47. With it, we computed trait heritability at the placental expression level, using both all examined genes and all TWAS-prioritized susceptibility genes. Overall, we found 3/14 neonatal traits (childhood BMI, total puberty growth, and pubertal growth start) with significant GReX-level heritability (FDR-adjusted P < 0.05 for jack-knife test of significance)31; none of the 26 traits outside the neonatal category were appreciably explained by placental GReX. Figure 3D shows that mean GReX-level heritability is higher in neonatal traits than in other groups. A comparison of the number of GWAS-significant SNPs and TWAS-significant genes also shows that neonatal/childhood traits are enriched for placental TWAS associations, even though significant genome-wide GWAS architecture cannot be inferred for these traits (Supplemental Figure S11). These results highlight the power advantage of properly aligned tissue-specific TWAS compared to GWAS. They also suggest that placental GReX affects neonatal traits more profoundly, as a significantly larger proportion of neonatal traits than later-in-life traits showed significant heritability at the placental GReX level.
Similarly, using RHOGE31, we assessed genetic correlations (rGE) between traits at the level of placental GReX (Supplemental Figure S12). We found several known correlations, such as between cholesterol and triglycerides and between childhood BMI and adult BMI. Interestingly, we also found correlations between traits across categories, such as between IQ and diastolic blood pressure and between age of asthma diagnosis and glucose levels. These traits have been linked in morphological analyses of the placenta, but our results suggest possible gene regulatory contributions48. Overall, these correlations may suggest shared genetic pathways for these pairs of traits or for etiologic antecedents of these traits. Such shared pathways could act either at the susceptibility genes themselves or through shared distal loci mediated by transcription factors, miRNAs, or CpG methylation sites.
Pleiotropy across overlapped genes and multiple traits
GTAs across the 40 traits showed several overlaps in signal. TWAS associations do not imply causality, and nearby TWAS genes can share signal through LD. We therefore used FOCUS49, a Bayesian fine-mapping approach, to prioritize genes at loci with multiple TWAS-significant genes. For each such locus, FOCUS estimates posterior inclusion probabilities (PIP) for a credible set of genes that explains the association signal. We found 8 such overlaps and estimated a 90% credible set of genes explaining the signal for each locus (Supplemental Table S8). For example, we identified 3 genes associated with triglycerides at the 12q24.13 chromosomal region (ERP29, RPL6, BRAP), with ERP29 (Endoplasmic Reticulum Protein 29, OMIM: 602287) defining the region’s 90% credible set with approximately 95% PIP. Similarly, we detected 3 genes associated with BMI at 10q22.2 (AP3M1, SAMD8, MRPS16), with AP3M1 (Adaptor Related Protein Complex 3 Subunit Mu 1, OMIM: 610366) defining the region’s 90% credible set with approximately 99% PIP.
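FOCUS computes PIPs from a Bayesian model over causal gene configurations, which is beyond a short example. Once PIPs are available, however, a credible set can be read off with a simple rule. The sketch below uses a simplified greedy rule, assumed here for illustration: add genes in decreasing PIP order until the cumulative PIP reaches 90%. The gene labels and PIP values are hypothetical.

```python
def credible_set(pips, coverage=0.90):
    """Greedy credible set: add genes in decreasing PIP order until their
    cumulative PIP reaches `coverage`. `pips` maps gene -> PIP."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for gene, pip in ranked:
        chosen.append(gene)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical PIPs for three genes at one locus (not the study's values)
example = {"GENE_A": 0.95, "GENE_B": 0.03, "GENE_C": 0.02}
print(credible_set(example))  # ['GENE_A']
```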
We also noticed that ERP29 and RPL6 (Ribosomal Protein L6, OMIM: 603703) were identified in GTAs with multiple traits, leading us to examine potential horizontally pleiotropic genes. Of the 176 TWAS-prioritized genes, we identified 50 genes associated with multiple, often genetically correlated, traits (Table 3). Nine genes showed 3 or more GTAs across different categories. For example, IDI1 (Isopentenyl-Diphosphate Delta Isomerase 1, OMIM: 604055), a gene involved in cholesterol biosynthesis50, showed associations with 3 metabolic and 2 neuropsychiatric traits: body fat percentage (Z = 15.57), HDL (Z = 26.48), triglycerides (Z = −7.53), fluid intelligence score (Z = 6.37), and schizophrenia (Z = −5.56). A link between cholesterol-related genes and schizophrenia has been detected previously, potentially due to coregulation of myelin-related genes51,52. Predicted expression of IDI1, mediated by CpG site cg01687878 (found within PITPNM2), was also computed using distal SNPs within Chromosome 12q24.31, a known GWAS risk locus for hypercholesterolemia53; the inclusion of this locus may have contributed to the large TWAS associations. Similarly, SAMD4A (Sterile Alpha Motif Domain Containing 4A, OMIM: 610747) shows associations with 4 body size/metabolic traits (body fat percentage, Z = 6.70; cholesterol, Z = −6.76; HDL, Z = −6.78; triglycerides, Z = −5.30) and 1 cardiovascular trait (diastolic blood pressure, Z = −5.29). These associations also pick up on variants in Chromosome 12q24.31 local to CpG sites cg05747134 (within MMS19) and cg04523690 (within SETD1B). Another gene with multiple trait associations is CMTM4 (Chemokine-Like Factor Superfamily Member 4, OMIM: 607887), an angiogenesis regulator54, showing associations with body fat percentage (Z = 6.17), hypertension (Z = 5.24), and fetal birthweight (Z = 8.11). CMTM4 has been implicated in intrauterine growth restriction through its involvement in endothelial vascularization55. This suggests that CMTM4 may act more directly in utero, with effects that could mediate its associations with body fat percentage and hypertension.
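A short grouping step makes explicit the criterion used to single out broadly pleiotropic genes: at least 3 distinct GTAs spanning more than one trait category. In the sketch below, the IDI1 associations are taken from the text, while `GENE_X` and its traits are hypothetical. The requirement of at least two categories is our reading of "across different categories".

```python
from collections import defaultdict


def multi_category_genes(gtas, min_traits=3, min_categories=2):
    """Return genes with at least `min_traits` distinct associated traits
    spanning at least `min_categories` trait categories.
    `gtas` is an iterable of (gene, trait, category) tuples."""
    traits = defaultdict(set)
    categories = defaultdict(set)
    for gene, trait, category in gtas:
        traits[gene].add(trait)
        categories[gene].add(category)
    return sorted(gene for gene in traits
                  if len(traits[gene]) >= min_traits
                  and len(categories[gene]) >= min_categories)


example_gtas = [
    ("IDI1", "body fat percentage", "metabolic"),
    ("IDI1", "HDL", "metabolic"),
    ("IDI1", "triglycerides", "metabolic"),
    ("IDI1", "fluid intelligence score", "neuropsychiatric"),
    ("IDI1", "schizophrenia", "neuropsychiatric"),
    ("GENE_X", "BMI", "metabolic"),            # hypothetical gene
    ("GENE_X", "HDL", "metabolic"),
    ("GENE_X", "triglycerides", "metabolic"),
]
print(multi_category_genes(example_gtas))  # ['IDI1']
```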
We further studied the 9 genes with 3 or more distinct GTAs across different categories (Figure 4A). Using UK Biobank35 GWAS summary statistics, we conducted TWAS for a variety of traits across 8 groups, defined broadly by ICD code blocks (Figure 4A, Supplemental Figure S13); here, we grouped metabolic and cardiovascular traits into one category for ease of analysis. At FDR-adjusted P < 0.05, the following genes showed GTA enrichments:
- ATPAF2 (ATP Synthase Mitochondrial F1 Complex Assembly Factor 2, OMIM: 608918), RPL6, and SEC11A (SEC11 Homolog A, Signal Peptidase Complex Subunit, OMIM: 618258) for immune-related traits.
- ATPAF2 for neonatal traits.
- IDI1 for mental disorders.
- RPS25 (Ribosomal Protein S25, OMIM: 180465) for musculoskeletal traits.

Across these 8 trait groups, RPL6 showed multiple strong associations with circulatory, respiratory, immune-related, and neonatal traits (Figure 4A). Examining specific GTAs for ATPAF2, IDI1, RPS25, and SEC11A reveals associations with multiple biomarker traits (Supplemental Figure S13). For example, at P < 2.5 × 10⁻⁶, the immune-related GTAs of ATPAF2 and IDI1 include associations with eosinophil, monocyte, and lymphocyte count and IGF-1 concentration. ATPAF2 and RPS25 show multiple associations with platelet volume and distribution and hematocrit percentage. In addition, IDI1 was associated with multiple mental disorders (obsessive compulsive disorder, anorexia nervosa, bipolar disorder, and general mood disorders), consistent with its TWAS associations with fluid intelligence and schizophrenia (Supplemental Figure S13). Because placental GReX of these genes correlates with biomarkers, these results may not signify shared genetic associations across multiple traits. Rather, they may point to more fundamental effects of these TWAS-identified genes that manifest in complex traits later in life.
We next examined whether placental GReX of these 9 genes correlates with fundamental traits at birth. We imputed expression into individual-level ELGAN genotypes (N = 729) (Online Methods). Controlling for race, sex, gestational duration, inflammation of the chorion, and maternal age, as described before23 and in Online Methods, we tested for associations with 6 representative traits measured at birth or at 24 months: neonatal chronic lung disease, head circumference Z-score, fetal growth restriction, birth weight Z-score, necrotizing enterocolitis, and Bayley II Mental Development Index (MDI) at 24 months56. As shown in Figure 4B and Supplemental Table S9, at FDR-adjusted P < 0.05, we detected negative associations between SEC11A GReX and birthweight Z-score (effect size: −0.248, 95% adjusted57 CI: [−0.434, −0.063]) and between ATPAF2 GReX and head circumference Z-score (−0.173, [−0.282, −0.064]). Furthermore, we detected negative associations between MDI and GReX of RPL6 (−2.636, [−4.251, −1.02]) and ERP29 (−3.332, [−4.987, −1.677]). Many of these genes encode proteins involved in core processes (e.g., RPL6 is involved in trans-activation of transcription and translation58, and SEC11A has roles in cell migration and invasion59). Understanding how the placental GReX of these genes affects neonatal traits may therefore elucidate the potential long-lasting impacts of placental dysregulation.
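The covariate-adjusted association tests above amount to regressing each trait on imputed GReX plus covariates. Below is a minimal plain-Python sketch of the point estimate for a continuous outcome such as a Z-score. It solves the normal equations by Gauss-Jordan elimination. It does not reproduce the adjusted confidence intervals (ref. 57), nor whatever models were used for binary outcomes such as necrotizing enterocolitis. The data and the function name `ols` are hypothetical.

```python
def ols(y, predictors):
    """Ordinary least squares with an intercept.
    `predictors` is a list of columns (GReX first, then covariates).
    Returns [intercept, coef_1, coef_2, ...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data: trait = 1 + 2 * GReX + 3 * covariate, without noise
grex = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
covariate = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
trait = [1 + 2 * g + 3 * c for g, c in zip(grex, covariate)]
print([round(b, 6) for b in ols(trait, [grex, covariate])])  # [1.0, 2.0, 3.0]
```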
Investigating mediators of distal SNP to gene relationships
An advantage of MOSTWAS is that it supports functional hypothesis generation by identifying potential mediators that affect TWAS-identified genes. Using the distal-SNPs added-last test from MOSTWAS34, we tested whether distal loci incorporated into expression models are associated with the trait beyond the association at the local locus. For 89 of 248 associations, predicted expression from distal SNPs showed significant associations at FDR-adjusted P < 0.05 (Figure 3A [Step 4], Supplemental Table S5). For each significant distal association, we identified a set of biomarkers that potentially affects transcription of the TWAS gene: a total of 9 transcription factor-encoding genes (TFs) and 163 CpG sites across all 89 distal associations. In particular, we detected two TFs, DAB2 and EPS15, both highly expressed in placenta60,61. DAB2 (DAB Adaptor Protein 2, OMIM: 601236) is a distal mediator for PAPPA and diastolic blood pressure (distal Z = −3.98). Through EPS15 (Epidermal Growth Factor Receptor Substrate 15, OMIM: 600051), distally predicted expression of SPATA13 (Spermatogenesis Associated 13, OMIM: 613324) and FAM214A (Family With Sequence Similarity 214 Member A) showed associations with waist-hip ratio (overall distal Z = 7.11 and 6.33, respectively). Interestingly, EPS15 itself showed a TWAS association with waist-hip ratio (Supplemental Table S5), in the opposite direction to those of SPATA13 and FAM214A. Furthermore, RORA (RAR Related Orphan Receptor A, OMIM: 600825), a gene encoding a TF involved in inflammatory signaling62, showed a negative association with transcription of UBA3 (Ubiquitin Like Modifier Activating Enzyme 3, OMIM: 603172), a TWAS gene for fetal birthweight. Low placental RORA expression was previously shown to be associated with lower birthweight63.
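One way to picture an "added-last" test is as the distal component's Z-score after conditioning on the local component. The sketch below uses generic Gaussian conditioning. It assumes the local and distal TWAS Z-scores are jointly normal with correlation ρ, computed from the two weight vectors and the LD matrix. It illustrates the idea and is not necessarily the exact statistic MOSTWAS computes; the function names are hypothetical.

```python
import math


def model_correlation(w_local, w_distal, ld):
    """Correlation of local-only and distal-only predicted expression:
    rho = w_l' LD w_d / sqrt((w_l' LD w_l) * (w_d' LD w_d)).
    Both weight vectors are indexed over the same SNP set (zeros elsewhere)."""
    def quad(a, b):
        return sum(a[i] * ld[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return quad(w_local, w_distal) / math.sqrt(
        quad(w_local, w_local) * quad(w_distal, w_distal))


def conditional_z(z_distal, z_local, rho):
    """Distal Z-score conditioned on the local Z-score, assuming the pair is
    bivariate normal with unit variances and correlation rho."""
    return (z_distal - rho * z_local) / math.sqrt(1 - rho ** 2)


print(round(conditional_z(3.0, 2.0, 0.5), 4))  # 2.3094
```

When ρ = 0, the two components are uncorrelated and the conditional Z equals the distal Z.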
Aside from functions related to transcription regulation, the 9 TFs (CUL5, DAB2, ELL, EPS15, RORA, SLC2A4RG, SMARCC1, NFKBIA, ZC3H15) detected by MOSTWAS were enriched for several ontologies (Supplemental Table S10), namely catabolic and metabolic processes, response to lipids, and multiple nucleic acid-binding processes46.
As we observed strong correlations between expression of TF-TWAS gene pairs in ELGAN (Supplemental Figure S14), we then examined the associations between TWAS-identified genes and the locus around any predicted mediating TFs in an external dataset42. Using RICHS, we conducted a gene-based trans-eQTL scan using Liu et al.’s GBAT method64 to computationally validate TF-TWAS gene associations. We predicted GReX of the TFs using cis-variants through leave-one-out cross-validation65 and scanned for associations with the respective TWAS genes (Figure 4C, Supplemental Table S11). We found a significant association between predicted EPS15 expression and FAM214A expression (effect size −0.24, FDR-adjusted P = 0.019). In addition, we detected a significant association between predicted NFKBIA (NF-Kappa-B Inhibitor Alpha, OMIM: 164008) expression and HNRNPU (Heterogeneous Nuclear Ribonucleoprotein U, OMIM: 602869) expression (effect size −0.26, FDR-adjusted P = 1.9 × 10⁻⁴). We also used an Egger regression-based Mendelian randomization framework66 in RICHS to estimate the causal effects of TFs on the associated TWAS genes (Methods). As instrumental variables, we used cis-SNPs correlated with the TF and uncorrelated with the TWAS genes. We estimated significant causal effects for two TF-TWAS gene pairs (Figure 4C, Supplemental Table S12): EPS15 on FAM214A (causal effect estimate −0.58; 95% CI [−0.94, −0.21]) and RORA on UBA3 (0.58; [0.20, 0.96]).
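MR-Egger regresses SNP-outcome effects on SNP-exposure effects with an intercept, which captures average directional pleiotropy. Each instrument is weighted by the inverse variance of its outcome effect. Below is a minimal plain-Python sketch of the point estimates, with the TF as the exposure and the TWAS gene's expression as the outcome. It omits standard errors and confidence intervals, and the effect sizes are hypothetical. The study's framework (ref. 66) may differ in details such as instrument selection.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """MR-Egger point estimates: weighted regression (weights 1/se^2) of
    SNP-outcome effects on SNP-exposure effects with an intercept, after
    orienting each instrument so its exposure effect is non-negative.
    Returns (intercept, slope); the slope is the causal effect estimate."""
    bx, by = [], []
    for x, y in zip(beta_exposure, beta_outcome):
        if x < 0:  # flip allele coding for this instrument
            x, y = -x, -y
        bx.append(x)
        by.append(y)
    w = [1.0 / s ** 2 for s in se_outcome]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, bx)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, by)) / sw
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, bx, by))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, bx))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical instruments lying exactly on outcome = 0.05 + 0.5 * exposure
bx = [0.1, 0.2, 0.3, -0.4]
by = [0.1, 0.15, 0.2, -0.25]
se = [0.1, 0.2, 0.1, 0.3]
intercept, slope = mr_egger(bx, by, se)
print(round(intercept, 6), round(slope, 6))  # 0.05 0.5
```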
We also examined the CpG methylation sites that MOSTWAS marked as potential mediators for expression of TWAS genes for overlap with cis-regulatory elements in the placenta from the ENCODE Project Phase II12. We identified 34 CpG sites (mediating 29 distinct TWAS genes) that fall in cis-regulatory regions (Supplemental Table S13). Interestingly, one CpG site mediating FAM214A (cg15733049, Chromosome 1:2334974) is found in low-DNase activity sites in placenta samples taken at various timepoints; additionally, cg15733049 is local to EPS15, the transcription factor predicted to mediate genetic regulation of FAM214A. Furthermore, expression of LARS2, a TWAS gene for BMI, is mediated by cg04097236 (found within ELOVL2), a CpG site found in low-DNase or high-H3K27 activity regions; LARS2 houses multiple GWAS risk SNPs for type 2 diabetes67 and has shown BMI TWAS associations in other tissues25,31. Results from these external datasets add further evidence that these mediators play a role in gene regulation of these TWAS-identified genes.
In-vitro assays of transcription factor activity
Based on our computational results, we experimentally investigated whether the inverse relationship between TF EPS15 and its two prioritized target TWAS genes, SPATA13 and FAM214A, is supported in vitro. We used gene silencing to knock down EPS15 expression in human placenta-derived JEG-3 trophoblast cells and assessed the gene expression of the targets via qRT-PCR. We used a FANA oligonucleotide targeting EPS15 for specific silencing and compared the knockdown condition with scramble oligo and no-addition controls. Addition of FANA-EPS15 to JEG-3 cells decreased EPS15 gene expression while increasing the expression of SPATA13 and FAM214A (Figure 5A). EPS15 expression decreased by 50%, while SPATA13 and FAM214A expression increased by 795% and 377%, respectively. At FDR-adjusted P < 0.10, changes in expression of EPS15 and its downstream targets were statistically significant for the knockdown oligo relative to the scramble oligo, though not relative to the no-addition control. Similarly, changes in gene expression between the control mRNA and the transcription factor and target mRNAs were statistically significant (Figure 5B). These results support our computational finding that increased EPS15 expression may decrease expression of SPATA13 and FAM214A. We could not study the effects of these three genes on body size-related traits. Nonetheless, this study prioritizes EPS15 as a potential regulator of multiple downstream genes, perhaps genes affecting cell adhesion and growth in the placenta, like SPATA1368–70.
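The text does not state how relative expression was computed from the qRT-PCR data. Assuming the widely used 2^(−ΔΔCt) method, with each target normalized to a reference gene, a fold change and the corresponding percent change could be computed as below. The Ct values are hypothetical, and reporting conventions for percent change vary.

```python
def ddct_fold_change(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene in the knockdown vs. a comparison
    condition by the 2^(-ddCt) method, normalizing each to a reference gene."""
    dct_kd = ct_target_kd - ct_ref_kd
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(dct_kd - dct_ctrl)


def percent_change(fold_change):
    """Express a fold change as a percent change (fold 1.0 -> 0%)."""
    return (fold_change - 1.0) * 100.0


# Hypothetical Ct values (not the study's measurements)
fc = ddct_fold_change(25.0, 18.0, 24.0, 18.0)
print(fc, percent_change(fc))  # 0.5 -50.0
```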
DISCUSSION
The placenta has historically been understudied in large multi-tissue consortium efforts that study tissue-specific regulatory mechanisms11,12. To address this gap, we systematically categorized placental gene-trait associations relevant to the DOHaD hypothesis using distal mediator-enriched TWAS and deployed these results at the ELGAN DOHaD Atlas (https://elgan-twas.shinyapps.io/dohad/). By integrating multi-omic data from the ELGAN Study33 with 40 GWAS, we detected 176 unique genes (enriched for cell growth and immune pathways) with transcriptome-wide significant associations, with the majority of GTAs linked to metabolic and neonatal/childhood traits. Many of these TWAS-identified genes, especially those with neonatal GTAs, showed multiple GTAs across trait categories (9 genes with 3 or more GTAs). We examined phenome-wide GTAs for these 9 genes in UKBB and found enrichments for traits related to the immune and circulatory systems (e.g., immune cell, erythrocyte, and platelet counts). We followed up with selected at-birth traits in ELGAN and found associations with neonatal body size and infant cognitive development. Furthermore, we could estimate significantly positive placental GReX-mediated heritability only for neonatal traits, not for later-in-life traits. These results suggest that placental expression, mediated by fetal genetics, is most likely to have large effects on early-life traits, but these effects may carry over later in life or act as etiologic antecedents for complex traits.
MOSTWAS also allows for hypothesis generation for regulation of TWAS-detected genes through distal mediating biomarkers, like transcription factors, miRNAs, or CpG methylation sites34. Our computational results prioritized 89 GTAs with strong distal associations. We interrogated one such functional hypothesis: that EPS15, a transcription factor-encoding gene in the EGFR pathway1, regulates two TWAS genes associated with waist-hip ratio. These are FAM214A, a gene of unknown function, and SPATA13, a gene that regulates cell migration and adhesion68–70. In placenta-derived trophoblasts, knockdown of EPS15 led to increased expression of both FAM214A and SPATA13. FAM214A and SPATA13 both showed positive associations with adult waist-hip ratio. In particular, EPS15, mainly involved in endocytosis, is a maternally imprinted gene and is thus predicted to promote offspring health61,71–73. Its inverse association with SPATA13 and FAM214A could provide more context on its full influence in placental developmental programming, perhaps by affecting cell proliferation or adhesion pathways. In vivo animal experiments, albeit limited in scope and generalizability, could be employed to further investigate these GTAs. Nonetheless, this in vitro assay shows the potential of MOSTWAS to generate mechanistic hypotheses for upstream regulation of TWAS genes that hold up to experimental testing.
We conclude with limitations of this study and future directions. First, TWAS is unlikely to be subject to reverse causality, since the trait cannot affect expression independent of genetics. However, we did not examine instances of horizontal SNP pleiotropy, where SNPs influence the trait and expression independently. Second, the ELGAN Study gathered molecular data from infants born extremely pre-term. If unmeasured confounders affect both prematurity and a trait of interest, GTAs could be subject to collider bias74,75. However, significant TWAS genes did not show associations with gestational duration, suggesting minimal bias from this collider effect. An interesting future endeavor could include negative controls to account for unmeasured confounders in predictive models76, making the models more generalizable. Third, although we scanned neonatal traits in ELGAN using individual-level genotypes, the sample size is small. Larger GWAS with longitudinal traits could allow for rigorous Mendelian randomization studies77 that investigate relationships between traits across the life course in the context of placental regulation. Lastly, due to small sample sizes of other ancestry groups in ELGAN, we could only credibly impute expression into samples of European ancestry, and our TWAS only considers GWAS in populations of European ancestry. We emphasize the need to acquire larger genetic and genomic datasets from understudied and underserved populations, especially for early-in-life traits.
Our findings reveal functional evidence for the fundamental influence of placental genetic and genomic regulation on developmental programming of early- and later-in-life traits, identifying placental gene-trait associations and testable functional hypotheses for upstream placental regulation of these genes. Future large-scale multi-tissue studies should consider the placenta as a core tissue for learning about the developmental origins of trait and disease etiology.
ONLINE METHODS
Data acquisition and quality control
Genotypes and multi-omic (mRNA, miRNA, and CpG methylation) data were collected from umbilical cord blood and the fetal side of the placenta of subjects enrolled in the ELGAN33 Study, as described in previous work23,40,78 and in detail in Supplemental Methods. Genotypes were assayed on Illumina 1 Million Quad and HumanOmniExpress-12 v1.0 arrays79. We removed SNPs with call rate < 90% or MAF < 1% and samples with missing rate > 10%80. We imputed genotypes to the TOPMed Freeze 5 reference panel81, using Eagle for phasing and Minimac4 for imputation82,83, and considered only autosomal variants, yielding 6,567,190 variants in total. Quality control for CpG methylation was performed at the sample level by excluding failed samples and technical duplicates. We used the ComBat function from the sva package to adjust for batch effects from sample plate and cell-type heterogeneity84. mRNA and miRNA were aligned to the GENCODE Release 31 reference transcriptome and quantified using Salmon85 and the HTG EdgeSeq System86. We applied upper-quartile normalization to remove distributional differences between lanes87 and used RUVSeq and limma to estimate and remove unwanted variation88,89. Overall, we considered 846,233 CpG sites, 12,020 genes, and 1,898 miRNAs. For validation of gene expression models, we downloaded quality-controlled genotypes and obtained normalized RNA-seq data from the Rhode Island Child Health Study (RICHS)42. Summary statistics were downloaded from the following consortia: the UK Biobank35, Early Growth Genetics Consortium36, Genetic Investigation of Anthropometric Traits37, Psychiatric Genomics Consortium38, and the Complex Trait Genetics Lab39 (Supplemental Table 1). Genomic coordinates were transformed to the hg38 reference genome using liftOver90,91.
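As an illustration of the genotype filters above, the following is a minimal plain-Python sketch. The function name, the in-memory layout (None marks a missing call), and the order of filtering (SNP filters before the sample filter) are assumptions made for illustration; in practice such filters are usually applied with dedicated tools such as PLINK (for example its `--geno`, `--maf`, and `--mind` options).

```python
def qc_filter(genos, snp_ids, sample_ids,
              min_call_rate=0.90, min_maf=0.01, max_sample_missing=0.10):
    """Return (kept_snp_ids, kept_sample_ids).

    genos[i][j] is the 0/1/2 dosage of SNP i in sample j, or None if missing.
    SNP filters are applied first (assumed order), then sample missingness
    is computed over the SNPs that survived.
    """
    kept_snps = []
    for i, row in enumerate(genos):
        called = [g for g in row if g is not None]
        if not called or len(called) / len(row) < min_call_rate:
            continue  # fails the call-rate filter
        alt_freq = sum(called) / (2 * len(called))
        if min(alt_freq, 1 - alt_freq) < min_maf:
            continue  # fails the minor allele frequency filter
        kept_snps.append(i)

    if not kept_snps:
        # No SNPs left against which to assess sample missingness
        return [], list(sample_ids)

    kept_samples = []
    for j, sample in enumerate(sample_ids):
        n_missing = sum(genos[i][j] is None for i in kept_snps)
        if n_missing / len(kept_snps) <= max_sample_missing:
            kept_samples.append(sample)
    return [snp_ids[i] for i in kept_snps], kept_samples
```

With the default thresholds, a SNP called in exactly 90% of samples is retained, matching the strict "< 90%" removal rule in the text.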
Estimation of SNP heritability of gene expression
We estimated heritability from genotypes within 1 megabase of the gene of interest and any prioritized distal loci using the GREML-LDMS method, which corrects for LD-related bias in estimated SNP-based heritability92. Analysis was conducted using GCTA v1.93.193. Briefly, Yang et al. showed that estimates of heritability are often biased if causal variants have a different minor allele frequency (MAF) spectrum or LD structure from the variants used in the analysis. They proposed an LD- and MAF-stratified GREML analysis in which variants are stratified into groups by MAF and LD, and genetic relationship matrices (GRMs) computed from the variants in each group are jointly fit in a multi-component GREML analysis.
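The stratification step of GREML-LDMS can be sketched as follows. This is a simplified illustration rather than GCTA's implementation: the MAF bin edges, the use of LD-score quartiles, and the function names are assumptions, and the GRM applies the basic off-diagonal formula to every entry (GCTA computes diagonal elements differently). The multi-component REML fit itself would be left to GCTA, which can fit several GRMs jointly (for example via `--mgrm` together with `--reml`).

```python
import bisect
from statistics import quantiles

# Illustrative MAF bin edges (assumption, not necessarily those used in this study)
MAF_EDGES = (0.01, 0.1, 0.2, 0.3, 0.4, 0.5)


def stratify_snps(snps, maf_edges=MAF_EDGES):
    """Group SNPs by (MAF bin, LD-score quartile).

    snps: list of (snp_id, maf, ld_score) tuples; at least two SNPs are needed.
    Returns {(maf_bin, ld_quartile): [snp_id, ...]}.
    """
    ld_cuts = quantiles([s[2] for s in snps], n=4)  # three quartile cut points
    groups = {}
    for snp_id, maf, ld in snps:
        maf_bin = bisect.bisect_right(maf_edges, maf) - 1
        if maf_bin < 0:
            continue  # below the lowest modeled MAF
        maf_bin = min(maf_bin, len(maf_edges) - 2)  # MAF == 0.5 joins the top bin
        ld_quartile = sum(ld > c for c in ld_cuts)  # 0 (low LD) .. 3 (high LD)
        groups.setdefault((maf_bin, ld_quartile), []).append(snp_id)
    return groups


def grm(dosages):
    """Simplified GRM from rows of 0/1/2 dosages (one row per SNP, one column per person)."""
    n = len(dosages[0])
    A = [[0.0] * n for _ in range(n)]
    m = 0
    for row in dosages:
        p = sum(row) / (2 * n)
        if p in (0.0, 1.0):
            continue  # monomorphic SNPs carry no information
        scale = 2 * p * (1 - p)
        centered = [x - 2 * p for x in row]
        for j in range(n):
            for k in range(n):
                A[j][k] += centered[j] * centered[k] / scale
        m += 1
    return [[a / m for a in r] for r in A] if m else A
```

One GRM would be built per (MAF bin, LD quartile) group, and all of them fit jointly, so that variance attributed to low-MAF or low-LD variants is not absorbed by better-tagged strata.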
Gene expression models
We used MOSTWAS to train predictive models of gene expression from germline genetics, including distal variants that were either close to associated mediators (transcription factors, miRNAs, CpG sites) or had large indirect effects on gene expression34 (Supplemental Figure S1, Supplemental Methods). Briefly, MOSTWAS contains two methods for predicting expression: (1) mediator-enriched TWAS (MeTWAS) and (2) distal-eQTL prioritization via mediation analysis (DePMA). For MeTWAS, we first identified mediators strongly associated with each gene through correlation analyses between all genes of interest and a set of distal mediators (FDR-adjusted P < 0.05). We then trained local predictive models (using SNPs within 1 Mb) of each mediator using either elastic net or a linear mixed model, used these models to impute the mediator in the training sample, and included the imputed mediator values as fixed effects in a regularized regression of the gene of interest. For DePMA, we first conducted distal-eQTL analysis to identify all distal-eQTLs at P < 10⁻⁶ and then local mediator-QTL analysis to identify all mediator-QTLs for these distal-eQTLs at FDR-adjusted P < 0.05. We tested each distal-eQTL for its absolute total mediation effect on the gene of interest through a permutation test and included eQTLs with significantly large effects in the final expression model. Full mathematical details are provided in Supplemental Methods. We considered only genes with significantly positive heritability (nominal P < 0.05, likelihood ratio test) and five-fold cross-validation R² ≥ 0.01.
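The two-stage structure of MeTWAS can be illustrated with a toy NumPy sketch. Closed-form ridge regression stands in here for the elastic net or linear mixed model used by MOSTWAS, the FDR-based mediator screen is omitted, and all function and argument names are hypothetical. The point it shows is how imputed mediators enter the gene model as unpenalized fixed effects while local SNP effects are shrunk toward zero.

```python
import numpy as np


def _center(a):
    # Column-center so that no intercept term is needed
    return a - a.mean(axis=0)


def penalized_ls(X, y, penalty):
    """Solve min ||y - Xb||^2 + sum_j penalty[j] * b_j^2 in closed form."""
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)


def metwas_fit(local_snps, mediator_snps, mediators, expr, lam=1.0):
    """Hypothetical MeTWAS-style fit (ridge in place of elastic net).

    local_snps:    n x p genotypes within 1 Mb of the gene
    mediator_snps: list of n x p_k genotypes local to each mediator k
    mediators:     n x K observed mediator values
    expr:          length-n expression of the gene of interest
    Returns (local SNP weights, mediator coefficients, fitted values).
    """
    G = _center(local_snps)
    y = expr - expr.mean()

    # Stage 1: impute each mediator from its own local SNPs
    imputed = []
    for k, Gk in enumerate(mediator_snps):
        Gk = _center(Gk)
        mk = mediators[:, k] - mediators[:, k].mean()
        wk = penalized_ls(Gk, mk, np.full(Gk.shape[1], lam))
        imputed.append(Gk @ wk)
    M = np.column_stack(imputed)

    # Stage 2: imputed mediators are unpenalized fixed effects;
    # only the local SNP effects receive the ridge penalty
    X = np.hstack([G, M])
    penalty = np.concatenate([np.full(G.shape[1], lam), np.zeros(M.shape[1])])
    b = penalized_ls(X, y, penalty)
    return b[:G.shape[1]], b[G.shape[1]:], X @ b
```

Because each imputed mediator is itself a linear combination of its local SNPs, the mediator coefficients can in principle be folded back into SNP-level weights for distal loci, which is what allows the final model to be applied to GWAS summary statistics.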
TWAS tests of association
The association between predicted expression and traits was assessed in GWAS summary statistics using the weighted burden test, with the 1000 Genomes Project CEU population as an LD reference25,34,43,94 and a Bonferroni-corrected significance threshold of P < 2.5 × 10⁻⁶. We considered only GWAS of subjects of European ancestry, as ELGAN does not have a large enough sample of non-European subjects to accurately map distal-eQTLs. In individual-level data from ELGAN, we multiplied the genotype matrix by the SNP-gene weights to construct imputed expression; for samples used in model training, we used the cross-validated predicted expression. We tested the significance of expression-trait associations conditional on SNP-trait effects at a locus using the permutation test from Gusev et al25. We also tested the trait association at distal variants using the added-last test from MOSTWAS34. Briefly, we computed the weighted Z-score at the distal loci, conditional on the weighted Z-score at the local locus, and tested it using the same null distribution assumptions as in the weighted burden test from Gusev et al25. These tests are explained in detail in Supplemental Methods.
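The weighted burden statistic combines SNP-gene weights w with GWAS Z-scores z as Z = w'z / sqrt(w'Σw), where Σ is the SNP correlation (LD) matrix from the reference panel. The plain-Python sketch below computes this statistic, a weight-shuffling permutation P-value in the spirit of the Gusev et al. test, and an added-last statistic. The added-last function assumes that conditioning the distal Z-score on the local one reduces to conditioning one standard normal on another with correlation ρ; this is an assumption for illustration, and the exact MOSTWAS implementation may differ. All function names are hypothetical.

```python
import math
import random


def bilinear(u, S, v, idx_u, idx_v):
    """Compute u' S[idx_u, idx_v] v, where S is an LD (SNP correlation) matrix."""
    return sum(u[a] * S[i][j] * v[b]
               for a, i in enumerate(idx_u)
               for b, j in enumerate(idx_v))


def burden_z(w, z, S, idx):
    """Weighted burden statistic w'z / sqrt(w' S w) over the SNPs in idx."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    return num / math.sqrt(bilinear(w, S, w, idx, idx))


def two_sided_p(zscore):
    """Two-sided P-value under a standard normal null."""
    return math.erfc(abs(zscore) / math.sqrt(2))


def permutation_p(w, z, S, idx, n_perm=1000, seed=1):
    """Shuffle weights across SNPs, keeping the GWAS signal at the locus fixed."""
    rng = random.Random(seed)
    observed = abs(burden_z(w, z, S, idx))
    w_perm, hits = list(w), 0
    for _ in range(n_perm):
        rng.shuffle(w_perm)
        if abs(burden_z(w_perm, z, S, idx)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def added_last_z(w_loc, z_loc, idx_loc, w_dist, z_dist, idx_dist, S):
    """Distal burden Z conditional on the local burden Z (bivariate normal conditioning)."""
    v_loc = bilinear(w_loc, S, w_loc, idx_loc, idx_loc)
    v_dist = bilinear(w_dist, S, w_dist, idx_dist, idx_dist)
    rho = bilinear(w_loc, S, w_dist, idx_loc, idx_dist) / math.sqrt(v_loc * v_dist)
    z_l = burden_z(w_loc, z_loc, S, idx_loc)
    z_d = burden_z(w_dist, z_dist, S, idx_dist)
    return (z_d - rho * z_l) / math.sqrt(1 - rho ** 2)
```

When distal loci lie on other chromosomes, the LD between local and distal SNPs is essentially zero, ρ vanishes, and the added-last statistic reduces to the distal burden Z-score.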
Genetic heritability and correlation estimation
At the genome-wide level, we estimated genetic heritability and genetic correlations between traits with LD score regression47,95 using GWAS summary statistics. We adopted approaches from Gusev et al and Mancuso et al to quantify the heritability of, and genetic correlations (ρGE) between, traits at the level of predicted placental expression. Briefly, we assumed that the expected χ2 statistic for a complex trait is a linear function of the LD score, with the effect size of the LD score on the χ2 proportional to the trait heritability explained by predicted expression. We estimated the LD score of each gene by predicting expression in European samples from the 1000 Genomes Project and computing sample correlations, and inferred the effect size using ordinary least squares. We employed RHOGE31 to estimate and test for significant genetic correlations between traits at the predicted expression level (details in Supplemental Methods).
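In standard LD score regression, the expected statistic for unit j is modeled as E[χ²_j] = (N h² / M) ℓ_j + intercept, where N is the GWAS sample size, M the number of units contributing LD scores, and ℓ_j the LD score; the ordinary least squares slope multiplied by M/N therefore estimates h². The sketch below assumes exactly this form and treats genes (with expression-based LD scores) as the units; it omits the regression weights and block-jackknife standard errors used by the ldsc software, and the function name is hypothetical.

```python
def ld_score_regression(chi2, ld_scores, n_samples, n_units):
    """Unweighted OLS of chi-square statistics on LD scores, with an intercept.

    Assumes E[chi2_j] = n_samples * h2 / n_units * ld_j + intercept,
    so h2 = slope * n_units / n_samples. Returns (h2, intercept).
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_units / n_samples, intercept
```

For example, with N = 1,000, M = 100, and statistics lying exactly on χ² = 2ℓ + 1, the slope of 2 corresponds to h² = 2 × 100 / 1,000 = 0.2, and the intercept of 1 matches the null expectation of a χ² statistic.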
Multi-trait scans in UKBB and ELGAN
For 9 genes with 3 or more associations across traits of different categories, we conducted multi-trait TWAS scans in the UK Biobank (UKBB). Here, we applied the weighted burden test to UKBB GWAS summary statistics from samples of European ancestry for 296 traits grouped by ICD code blocks (circulatory, congenital malformations, immune, mental disorders, musculoskeletal, neonatal, neurological, and respiratory). We also imputed expression for these genes in ELGAN using 729 samples with individual-level genotypes and conducted a multi-trait scan for 6 neonatal traits: neonatal chronic lung disease, head circumference Z-score, fetal growth restriction, birth weight Z-score, necrotizing enterocolitis, and Bayley II Mental Development Index (MDI) at 24 months. For continuous traits (head circumference Z-score, birth weight Z-score, and MDI), we used linear regression with the genetically regulated expression (GReX) of the gene as the main predictor, adjusting for race, sex, gestational duration (in days), inflammation of the chorion, and maternal age. For binary traits, we used logistic regression with the same predictor and covariates. These covariates have been used previously in placental genomic studies of neonatal traits23,78,96 because of their strong correlations with the outcomes and with placental transcriptomics and methylomics.
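The individual-level analysis for a continuous trait can be sketched in NumPy: GReX is the dosage matrix multiplied by the SNP-gene weights, and the trait is regressed on an intercept, GReX, and covariates. The function names are hypothetical, and covariates are assumed to be already numerically encoded (for example, race and sex as indicator columns). For binary traits the OLS step would be replaced by a logistic regression (for instance statsmodels' `Logit`), which is not shown here.

```python
import numpy as np


def impute_grex(dosages, weights):
    """Genetically regulated expression: n x p dosage matrix times p SNP weights."""
    return dosages @ weights


def grex_linear_test(grex, covariates, trait):
    """OLS of a continuous trait on [intercept, GReX, covariates].

    Returns (beta, se, t) for the GReX coefficient.
    """
    n = trait.shape[0]
    X = np.column_stack([np.ones(n), grex, covariates])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ trait)
    resid = trait - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance estimate
    se = np.sqrt(sigma2 * XtX_inv[1, 1])       # column 1 holds GReX
    return beta[1], se, beta[1] / se
```

For samples that were also used to train the expression models, the text specifies cross-validated predictions rather than `impute_grex` applied to in-sample weights, which avoids overfitting the GReX to the training data.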
Validation analyses in RICHS
Using genotype and RNA-seq expression data from RICHS42, we attempted to validate TF-TWAS gene associations prioritized by the distal-SNP added-last test in MOSTWAS. We first ran GBAT, a trans-eQTL mapping method from Liu et al64, to assess associations between the loci around TFs and the expression of TWAS genes in RICHS. GBAT tests the association between the predicted expression of a TF and the expression of a TWAS gene, improving the power of trans-eQTL mapping65. We also conducted Egger regression-based Mendelian randomization, which is robust to directional pleiotropy, to estimate and test the causal effect of TF expression on TWAS gene expression97.
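The point estimates of Egger regression-based Mendelian randomization can be sketched as an inverse-variance-weighted regression of SNP-outcome effects on SNP-exposure effects with an unconstrained intercept, after orienting each SNP so that its exposure effect is positive. Here the exposure would be TF expression and the outcome TWAS gene expression. The slope estimates the causal effect and the intercept the average directional pleiotropy; standard errors and tests are omitted, and the function name is hypothetical (established implementations exist, for example in the MendelianRandomization R package).

```python
def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger point estimates: returns (causal_slope, pleiotropy_intercept).

    beta_x: SNP effects on the exposure (e.g. TF expression)
    beta_y: SNP effects on the outcome (e.g. TWAS gene expression)
    se_y:   standard errors of beta_y, used for inverse-variance weights
    """
    # Orient each SNP so its exposure effect is positive (flip both signs)
    bx, by = [], []
    for x, y in zip(beta_x, beta_y):
        sign = -1.0 if x < 0 else 1.0
        bx.append(sign * x)
        by.append(sign * y)

    w = [1.0 / se ** 2 for se in se_y]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw
    my = sum(wi * y for wi, y in zip(w, by)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    slope = sxy / sxx
    return slope, my - slope * mx
```

The orientation step matters because the intercept is only interpretable as directional pleiotropy once every SNP's exposure effect points the same way.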
In-vitro functional assays
Cell culture and treatment
The JEG-3 immortalized trophoblast cell line was purchased from the American Type Culture Collection (Manassas, VA). Cells were grown in Gibco RPMI 1640 supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin at 37°C in 5% CO2. Cells were plated at 2.1 × 10⁶ cells per 75 cm² flask and incubated under standard conditions until reaching roughly 90% confluence. To investigate the effects of gene silencing, we used AUMsilence FANA oligonucleotides (AUM Bio Tech, Philadelphia, PA) for mRNA knockdown of EPS15 and subsequently analyzed the predicted downstream target genes SPATA13 and FAM214A. On the day of treatment, cells were seeded in a 24-well culture plate at 0.05 × 10⁶ cells per well, in biological duplicate. FANA oligonucleotides were dissolved in nuclease-free water to a concentration of 500 µM, added to the cell culture medium to a final concentration of 20 µM, and incubated for 24 hours at 37°C in 5% CO2.
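The dilution from the 500 µM stock to the 20 µM working concentration follows C₁V₁ = C₂V₂, a 25-fold dilution. As a worked example, assuming a hypothetical final volume of 500 µL of medium per well (the per-well volume is not stated above):

```latex
C_{\text{stock}}\,V_{\text{stock}} = C_{\text{final}}\,V_{\text{final}}
\quad\Longrightarrow\quad
V_{\text{stock}} = \frac{20\ \mu\text{M} \times 500\ \mu\text{L}}{500\ \mu\text{M}} = 20\ \mu\text{L}
```

Under that assumption, 20 µL of stock would be combined with 480 µL of medium in each well.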
Assessment of mRNA expression by quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
Treated and untreated JEG-3 cells were harvested in 350 µL of Buffer RLT Plus. RNA was then extracted using the AllPrep DNA/RNA/miRNA Universal Kit according to the manufacturer’s protocol, quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA), and reverse-transcribed into cDNA. mRNA expression of EPS15, SPATA13, and FAM214A was measured by qRT-PCR using previously validated primers, with samples run in technical duplicate. Ct values were normalized against the housekeeping gene β-actin (ACTB), and fold changes in expression were calculated using the ΔΔCt method98.
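The ΔΔCt calculation can be sketched in a few lines of Python: each target Ct is normalized to the reference gene (here ACTB) within a condition, the treated ΔCt is compared with the control ΔCt, and the fold change is 2^(−ΔΔCt). The function name is hypothetical, replicate Ct values are simply averaged, and the method assumes roughly 100% amplification efficiency for both target and reference genes.

```python
from statistics import mean


def ddct_fold_change(target_treated, ref_treated, target_control, ref_control):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of Ct values (e.g. technical replicates),
    which are averaged before normalization.
    """
    dct_treated = mean(target_treated) - mean(ref_treated)   # normalize to ACTB
    dct_control = mean(target_control) - mean(ref_control)
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)
```

For instance, a target whose Ct drops by one cycle relative to ACTB after knockdown of the upstream gene yields ΔΔCt = −1 and a two-fold increase in expression.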
Statistical analysis
Statistical analysis was performed using a one-way ANOVA (nominal significance level α = 0.05). Post-hoc pairwise t-tests (3 degrees of freedom, given biological and technical duplicates) were used for direct comparisons between sample groups.
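For reference, the one-way ANOVA F statistic is the ratio of between-group to within-group mean squares. The plain-Python sketch below (hypothetical function name) returns the statistic and its degrees of freedom; a P-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_between, df_within)`, or the whole test run with `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With two groups, the ANOVA F statistic equals the square of the pooled two-sample t statistic, so the omnibus test and a single pairwise comparison agree.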
Data Availability
ELGAN mRNA, miRNA, and CpG methylation data can be accessed from the NCBI Gene Expression Omnibus GSE154829 and GSE167885. ELGAN genotype data is protected, as subjects are still enrolled in the study; any inquiries or data requests must be made to RCF and HPS. All models and full TWAS results are found at https://doi.org/10.5281/zenodo.461803699.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE154829
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE167885
http://www.nealelab.is/uk-biobank
https://portals.broadinstitute.org/collaboration/giant/index.php/Main_Page
https://www.med.unc.edu/pgc/download-results/
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001586.v1.p1
ETHICS DECLARATIONS
The study was approved by the Institutional Review Board of the University of North Carolina at Chapel Hill. All participants consented to the study as per IRB protocol.
FUNDING
This study was supported by grants from the National Institutes of Health (NIH), specifically the National Institute of Neurological Disorders and Stroke (U01NS040069; R01NS040069), the Office of the NIH Director (UG3OD023348), the National Institute of Environmental Health Sciences (T32-ES007018; P30ES019776; R24ES028597), the National Institute of Nursing Research (K23NR017898; R01NR019245), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD092374; R03HD101413; P50HD103573). The funders had no role in study design or analysis.
WEB RESOURCES
UK Biobank summary statistics: http://www.nealelab.is/uk-biobank
GIANT Consortium summary statistics: https://portals.broadinstitute.org/collaboration/giant/index.php/Main_Page
PGC summary statistics: https://www.med.unc.edu/pgc/download-results/
EGG summary statistics: https://egg-consortium.org/
CTG Lab summary statistics: https://ctg.cncr.nl/software/
RICHS eQTL dataset: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001586.v1.p1
ENCODE placental epigenomic annotations: https://www.encodeproject.org/ (accession numbers in Supplemental Table S13)
MOSTWAS software: https://bhattacharya-a-bt.github.io/MOSTWAS/articles/MOSTWAS_vignette.html
ELGAN DOHaD Atlas: https://elgan-twas.shinyapps.io/dohad/
AUTHOR CONTRIBUTIONS
AB, RCF, and HPS designed this study. AB, VA, WL, and YL supervised and conducted computational analyses. ANF, HJH, RCF, and HPS supervised and conducted experiments. RMJ, RH, LS, KCKK, TMO, CJM, RCF, and HPS collected the data. AB, TMO, RH, RCF, and HPS interpreted results. AB, RCF, and HPS wrote the paper. All authors read and edited the paper.
ACKNOWLEDGEMENTS
We thank Michael Love, Kanishka Patel, Michael Gandal, Chloe Yap, Bogdan Pasaniuc, and Jon Huang for engaging conversation during the research process. We also thank the following consortia and research groups for their publicly available GWAS summary statistics, eQTL datasets, and/or epigenomic annotations: the UK Biobank and the Neale Lab, the Genetic Investigation of Anthropometric Traits Consortium, the Psychiatric Genomics Consortium, the Early Growth Genetics Consortium, the Complex Trait Genetics Lab, the Rhode Island Child Health Study, and the ENCODE Project.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.