Abstract
Polycystic ovary syndrome (PCOS) is one of the most common metabolism and endocrine disorder affecting women of reproductive age. However, the genetic mechanisms of PCOS remain largely elusive. A transcriptome-wide association study(TWAS) was conducted to identify the genes associated with PCOS using gene expression references of the whole blood and ovary. Then mRNA expression profiling analysis was performed to identify common genes with TWAS. Genes detected by TWAS were subjected to Gene Ontology(GO), pathway enrichment and protein-protein analysis. TWAS identified 422 genes for the whole blood and 177 genes for ovary, such as EML6 (TWAS.P = 6.33E-05), GRAP (TWAS.P = 0.000211), MAFTRR(TWAS.P = 1.21E-05). Functional enrichment analysis detected biological processes including ncRNA metabolic process, microtubule-related pathways. The present study identified a number of potential PCOS genes and biological processes. These results may provide insight into the genetic basis of PCOS.
Introduction
Polycystic ovary syndrome (PCOS), characterized by hyperandrogenism, irregular anovulation, and polycystic ovarian changes, is one of the most common reproductive metabolic disorders affecting reproductive-aged women. It is estimated that PCOS affect 6–20% of women1, and approximately 80% of those PCOS are infertile2. Although studies have suggested that obesity, age at menopause, insulin resistance/hyperinsulinemia, sex hormone-binding globulin (SHBG) levels, and depression may be risk factors for PCOS, its potential etiologies remain uncertain3.
Genetic factors are thought to be important in the pathogenesis of PCOS. For example, researchers have found that PCOS is twice as common in monozygotic twins as in dizygotic twins and conclude that genetic factors account for about 79% of the liability of PCOS4. The prevalence of PCOS is also higher among the mothers and sisters of affected women, suggesting a phenomenon of familial aggregation5. Several genome-wide association studies(GWAS) have identified susceptibility loci for PCOS6–10. However, the biological significance of their findings remains largely unclear as close to 90% of GWAS-identified SNPs are in the noncoding region. Expression quantitative trait loci (eQTL) are genomic loci that are associated with gene transcript levels. Gene expression is a critical intermediate phenotype between genes and diseases. Such integration of GWAS and eQTL is effective in identifying candidate genes associated with diseases.
The availability and cost of specimens, as well as intrinsic factors, small effects, make large-scale expression-trait associations challenging. To address these issues, transcriptome-wide association studies (TWAS) were developed, which combine gene expression data with large-scale genome-wide association studies. First, datasets (e.g. GTEx) with matched expression data and genotype were used for building a prediction model. Then GWAS data are combined with these models to assess the association between predicted gene expression and disease risk. A growing number of researchers are using TWASs to identify disease-associated genetic loci11–13.
In the present study, we conduct TWAS and mRNA expression profiling analysis to identify genes that are associated with PCOS. Initially, TWAS was performed to identify genes associated with PCOS risk. Then mRNA expression profiling analysis was carried out to identify common genes between TWAS. For the genes identified by TWAS, we conducted an analysis of gene ontology and pathway enrichment to explore their biological functions and related pathways.
Method
GWAS summary data for PCOS
We obtained PCOS GWAS summary data from a GWAS meta-analysis of seven cohort studies that included 10074 patients with PCOS and 103164 controls of European ancestry7. The cohort studies include Rotterdam(cases: 1184, controls:5799), UK (cases: 670,controls: 1379), EGCUT(cases:157,controls 2807), deCODE(cases: 658,controls: 6774), Chicago (cases: 984, controls: 2963), Boston(cases: 485,controls: 407), 23andMe(cases: 5184, controls: 82759). Diagnosis of PCOS was made based on National Institutes of Health criteria14, Rotterdam criteria15, or self-report.
TWAS analysis of PCOS
TWAS analysis was conducted using FUSION software. Pre-computed gene expression weights combined with PCOS summary statistics were used to calculate the association between single gene and PCOS. We derived the gene expression weights for whole blood and the ovary from the FUSION website (http://gusevlab.org/projects/fusion). In FUSION, expression weights were calculated using five linear models, including the best linear unbiased prediction model (BLUP), Bayesian sparse linear mixed model (BSLMM), least absolute shrinkage and selection operator (LASSO), Elastic Net, and top single nucleotidepolymorphisms (SNPs). In the present study, a TWAS P-value was calculated for each gene within the whole blood and ovary for European populations.
mRNA Expression profiles of PCOS
A genome-wide mRNA expression profile of PCOS was used to identify differentially expressed genes (DEGs). The PCOS expression data were downloaded from the Gene Expression Omnibus (GEO) Datasets GSE34526 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34526)16. Gonadotropin-stimulated granulosa cells were obtained from 7 patients with PCOS and 3 healthy individuals. GEO2R was used to screen for differentially expressed genes17. Genes with a folding change (FC) >2 or <0.5 (|log2FC |> 1) and p-value < 0.01 were considered to be differentially expressed genes.
Functional Enrichment
Metascape (https://metascape.org/gp/index.html) was used to perform gene ontology and pathway enrichment analyses of the genes identified by TWAS. Metascape analysis is based on databases, including Gene Ontology, KEGG, Reactome Gene Sets, and WikiPathways. A hypergeometric test is used to identify enhanced ontology terms, and the p-value is corrected for false discovery rates. Molecular Complex Detection (MCODE) algorithm was used to extract densely connected regions18.
Results
TWAS results of PCOS
A total of genes were identified by TWAS in blood and ovary tissue.
The TWAS identified 422 genes with a p value < 0.05 for the whole blood, such as EML6 (TWAS.P = 6.33E-05), GRAP (TWAS.P = 0.000211), PTGER2(TWAS.P= 0.000403)(Supplementary Table1 and Table2). For the ovary tissue, TWAS identified 177 genes with a p value < 0.05, such as MAFTRR(TWAS.P = 1.21E-05), NEIL2 (TWAS.P = 3.22E-05), ZNF555(TWAS.P= 0.000364). 52 genes with p value < 0.05 were detected in both whole blood and ovary TWAS analysis (Supplementary Table3). Table 1 and Figure1 show the top 20 genes associated with PCOS as determined by TWAS.
Top 20 genes identified by TWAS analysis
Manhattan plot of TWAS-identified top20 genes associated with PCOS
The x-axis represents the physical location of each gene, while the y-axis represents the Z score of association between that gene and PCOS.
Genes shared between TWAS and mRNA Expression Profiling
The TWAS analysis and the mRNA expression analysis detected 18 common genes, such as CAMK1D, COLEC12 and PLCB2.(Table 2). A Venn diagram of TWAS versus common genes identified by mRNA expression profiling are shown in Figure2.
Common gene identified by TWAS and mRNA expression profiling
Venn diagram of gene identified by TWAS and mRNA expression profiling
Gene Enrichment Analysis of genes identified by TWAS
In order to obtain an overall impression of the characteristics of these genes identified by TWAS, a total 547 genes were submitted to Metascape performing GO and pathway enrichment, and protein-protein interaction analysis.
GO and pathway enrichment analysis showed that TWAS-identified genes were enriched in ncRNA metabolic process, rRNA metabolic process, transport along microtubule and so on(Figure 3). MCODE enrichment analysis based on PPI enrichment for TWAS-identified genes showed 11 PPI modules(Figure 4). The top enrichment term of MCODE1 included 13 proteins distrusted in three major clusters: “Metabolism of RNA”, “Major pathway of rRNA processing in the nucleolus and cytosol”, “rRNA processing in the nucleus and cytosol”. MCODE2, which consists of 7 proteins was mostly enriched in “Interleukin-20 family signaling”, “JAK-STAT signaling pathway”, “Regulatory circuits of the STAT3 signaling pathway”.
GO enrichment analysis of the genes identified by TWAS
Protein-protein interaction network of genes identified by TWAS
Discussion
The TWAS method is a powerful tool that utilizes genetic variation and gene expression to identify genes whose cis-regulated expression is associated with complex traits. In present study, we performed TWAS to detect candidate genes closely associated with PCOS. Further integrative analysis of TWAS and mRNA expression profiling identified 18 common genes. To the best of our knowledge, this is the first TWAS integrated with mRNA expression profiling analysis for PCOS. The results of our study provide new insight into the genetic mechanism behind PCOS. Our TWAS identified several genes that have been previously reported in GWAS and post-GWAS, such as RPS2619, NEIL28, and ZNF55619 as well as novel genes that have never been reported.
NEIL2 is is the gene encoding a member of the Fpg / Nei DNA glycosylase family. The encoded enzyme is involved in DNA repair mainly by cleaving oxidatively damaged bases and introducing DNA strand breaks through its base cleavage enzyme activity.
The NEIL2 gene encodes NEI endonuclease VIII-like 2, one of the DNA glycosylases involved in DNA repair20. DNA damage has been demonstrated to play a role in the pathogenesis of PCOS21. Further experiments are needed to determine the relationship between NEIL2 and PCOS.
EML6 is the only member of the EML family that is preferentially expressed by oocytes in the mouse ovary24. There is a colocalization of EML6 with meiotic spindles, and this specialized localization plays a critical role in maintaining spindle integrity and homologous chromosome segregation.
RPS26, located on 12q13.2, is a gene encoding a ribosomal protein, which is a component of the 40S subunit and belongs to the ribosomal protein S26E family22. The knockdown of Rps26 in mouse oocytes has previously been shown to result in delayed oocyte development and even the cessation of chromatin structural transition in oocytes, resulting in premature ovarian failure (POF)23.
GO and pathway enrichment indicated that ncRNA metabolic and microtubule-related pathways were enriched. The non-coding RNAs (ncRNAs) are functional RNAs that are not coded for protein production but serve as important regulators of a wide range of biological events25. Several studies have indicated that ncRNAs play a role in the occurrence and development of PCOS. The expression of sncRNAs was significantly different in serum, granulosa cells (GCs), follicular fluid (FF), and other tissues between PCOS patients and the general population26,27. Accordingly, our results further confirm that ncRNA may play a crucial role in the development of PCOS.
Microtubules, which are polymers containing α- and β-tubulin heterodimers, make up the oocyte spindle28. An abnormal spindle can affect the development, differentiation, and growth of a germ cell directly. Recent research has identified several mutations of spindle-related proteins that lead to infertility. An example of this is the TUBB8 gene, which encodes an isoform of beta-tubulin that is specific to humans and primates. TUBB8 mutations in women lead to a range of spindle defects during oocyte meiosis29. The enriched microtubule-related pathways may suggest that microtubules contribute to infertility in women with PCOS.
Conclusion
In summary, this TWAS study identified novel and susceptible genes and potential pathways for PCOS. Further studies are needed to explore the potential mechanism.
Acknowledgements
We would like to thank the participants and the investigators for generating the publicly available dataset. The scientific calculations in this paper have been done on the HPC Cloud Platform of Shandong University.