Abstract
Background Patients at high risk of severe forms of COVID-19 frequently suffer from chronic diseases, but other risk factors may also play a role. Environmental stressors, such as endocrine disrupting chemicals (EDCs), can contribute to certain chronic diseases and might aggravate the course of COVID-19.
Objectives To explore putative links between EDCs and COVID-19 severity, an integrative systems biology approach was constructed and applied.
Methods As a first step, relevant data sets were compiled from major data sources. Biological associations of major EDCs to proteins were extracted from the CompTox database. Associations between proteins and diseases known as important COVID-19 comorbidities were obtained from the GeneCards and DisGeNET databases. Based on these data, we developed a tripartite network (EDCs-proteins-diseases) and used it to identify proteins overlapping between the EDCs and the diseases. Signaling pathways for common proteins were then investigated by over-representation analysis.
Results We found several statistically significant pathways that may be dysregulated by EDCs and that may also be involved in COVID-19 severity. The Th17 and the AGE/RAGE signaling pathways were particularly promising.
Conclusions Pathways were identified as possible targets of EDCs and as contributors to COVID-19 severity, thereby highlighting possible links between exposure to environmental chemicals and disease development. This study also documents the application of computational systems biology methods as a relevant approach to increase the understanding of molecular mechanisms linking EDCs and human diseases, thereby contributing to toxicology prediction.
Introduction
The COVID-19 pandemic started in the fall of 2019 and spread to a large part of the world during the winter and spring of 2020. By late June 2020, it had led to more than 500,000 deaths, of which one-fourth in the US and about 175,000 in the EU (https://coronavirus.jhu.edu/map.html, https://covid19.who.int/). Despite considerable research activities, there are still many unknowns concerning this infectious disease, especially with regard to the substantial variability of the disease severity. Following an initial infectious phase, a “cytokine storm”, leading to pneumonia is observed in severe cases which may require intensive care. It is still unclear why infections lead to severe cases in some patients and not in others, but both endogenous and exogenous factors can likely influence the outcome of the disease.
In addition to older age and male sex, several comorbidities are associated with severe COVID-19 and increased mortality risk. Disorders such as cardiovascular disease, type II diabetes (T2D), obesity, chronic respiratory disease or hypertension are strongly linked to severe COVID-19 cases (Petrilli et al. 2020)(Zhou et al. 2020)(Stefan et al. 2020). As has recently been proposed, underlying metabolic and endocrine dysfunctions may be mechanistically linked to the exacerbation of the coronavirus infection (Bornstein et al. 2020), and these observations may inspire new insight into the pathogenesis of this disease, including biological interpretation of the mechanisms involved. Environmental stressors have already been suggested to contribute to the severity of the disease (Bashir et al. 2020; Fattorini and Regoli 2020; Zhu et al. 2020), but little mechanistic support for this association is available. A relevant approach would be to compare the biological pathways triggered by environmental stressors with those involved in the COVID-19 severity. If similar pathways are found, this would increase the likelihood that such stressors may contribute to critical stages of this disease.
Given the suspected hormonal mode of vulnerability (Drucker 2020) endocrine disrupting chemicals (EDCs) could represent important triggers of aggravated infection, e.g., in the form of phthalates, bisphenols, organochlorine pesticides, and perfluorinated alkane substances (PFASs) (Trasande et al. 2016; Vandenberg et al. 2016). Exposure to these substances may affect the immune defense, thus potentially increasing the susceptibility to develop COVID-19 (Tsatsakis et al. 2020), as supported by experimental studies(Cipelli et al. 2014; Couleau et al. 2015). For example, epidemiological evidence on children exposed to PFASs show decreased immune responses to routine vaccines (Grandjean et al. 2012) and a greater risk of developing infectious disease(Dalsager et al. 2016; Granum et al. 2013).
As promising tools to gain better insight into the possible risk factors and mechanisms, toxicological and chemical data sources have expanded substantially, thereby enabling network science and computational systems biology methods to become feasible (Audouze et al. 2013, 2018; Taboureau and Audouze 2017; Vermeulen et al. 2020; Wu et al. 2020). We have therefore conducted an integrative systems biology exploration to identify overlapping proteins that are both dysregulated by EDCs and involved in comorbidities associated with aggravated COVID-19. Based on this tripartite network, integrating protein-EDC associations and protein-disease annotations, we then performed biological enrichments of pathways to detect the most plausible relationships between EDC exposure and COVID-19 severity.
Methods
We employed a computational systems biology approach to explore putative linkages between EDCs and COVID-19 as presented in Figure 1. First, a tripartite network was created based on known associations between proteins and either COVID-19 comorbidities or EDCs, as compiled from existing databases (CompTox, DisGeNET, GeneCards) (A). Then, biological enrichment was performed with the jointly identified proteins (i.e., those retrieved in both association studies) (B) by over-representation analysis (ORA) to identify the pathways that were the highly linked to both the diseases and the EDCs (C). As a final step, the biological pathways were explored with available knowledge regarding COVID-19 mechanisms (from the literature and the AOP-Wiki database), thereby allowing consideration of hypothetical linkages between EDCs and COVID-19 (D).
Endocrine-disrupting chemical dataset
A list of 34 commonly used substances known or suspected to act as EDCs was established, based on knowledge from three data sources: the endocrine disruptor assessment list from ECHA (https://echa.europa.eu/fr/ed-assessment, as of April 24, 2020), the one from NIEHS (https://www.niehs.nih.gov/health/topics/agents/endocrine/index.cfm as of April 28, 2020), and the TEDX database (https://endocrinedisruption.org/interactive-tools/tedx-list-of-potential-endocrine-disruptors/search-the-tedx-list, as of April 24,2020).
To explore as much as possible the chemical diversities, EDCs for this study were further selected to represent different chemical classes (Table 1). The CAS numbers were used for data integration.
Disease dataset
Comorbidities known to be associated with obesity or otherwise leading to severe COVID-19 were extracted from a recent study (Stefan et al. 2020), and resulted in a total of 13 disorders for exploration in the integrative systems toxicology (Table2).
Endocrine-disrupting chemical-protein associations
Human proteins known to be associated with each of the 34 EDCs were extracted from the U.S. Environmental Protection Agency web-based CompTox Chemistry dashboard, which contains a wide range of data related to chemical toxicity, including in vitro bioassays data (as of April 30, 2020) (Williams et al. 2017). Each linked protein was matched to a gene symbol and classified using the Panther (protein analysis through evolutionary relationships) classification system (version 15, released February 14, 2020) (Mi et al. 2013), a curated biological database of gene/protein families, and their functionally related subfamilies that can be used to classify and identify the function of gene products.
Disease-protein associations
From two human protein-disease databases, proteins known to be linked to the 13 studied diseases were listed (as of April 29, 2020 for both data sources). The DisGenNet database is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases(Piñero et al. 2015). The GeneCards database contains manually curated information for substances and their associations to genes and proteins, that are scored (Safran et al. 2010). For the present study, only associations were kept only for those between human diseases and proteins categorized as coding proteins, and all non-human information, including gene clusters, genetic locus, pseudogenes, RNA genes and those uncategorized were disregarded. All listed proteins were matched to their gene symbol to facilitate further analysis. Each identified protein from both databases, was categorized into the protein class using the Panther classification (version 15).
Pathways enrichment analysis
To decipher biological pathways potentially linked to the selected EDCs and explore if they might overlap with the ones known for COVID-19, an ORA was done. Four major sources of protein-pathway information were independently integrated, i.e., using the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Reactome, the Wiki-pathways and the Panther databases(Fabregat et al. 2018; Kanehisa et al. 2019; Mi et al. 2013; Slenter et al. 2018). To assess the statistical significance of the protein-pathway relationships, a hypergeometric test was used for each of the four sources, followed by a multiple testing correction of the p-values with the Benjamini-Hochberg method. The ORA was performed on the common proteins identified to identify the most strongly linked proteins that are affected by the EDCs and also associated with at least one the 13 comorbidities. As a last step, manual curation allowed us to consider relevant outcomes for interpretation. The four data sources provided complementary information, with some overlapping findings.
COVID-19 and biological mechanism of action
Linkage between COVID-19 and potential biological targets and affected pathways were extracted from the literature (as of May 22, 2020) and the AOP-Wiki database (as of May 22, 2020).
Results
Endocrine-disrupting chemical-protein associations
From the CompTox database, information on the links between chemicals and human proteins were compiled. Data for 30 of the 34 chemicals could be retrieved, and a total of 208 unique human proteins were involved via 1632 associations. No information was retrieved for hexachlorobenzene, nonylphenol ethoxylate, perchlorate and tributyltin. Perfluorooctane sulfonic acid (PFOS) targeted the highest number of proteins (113), and 2,2’,4,4’,5,5’-hexachlorobiphenyl (PCB 153) was associated with only one biological target (the progesterone receptor). The most frequently affected proteins included the androgen receptor (AR) and the estrogen receptor-alpha (ESR1), which were each linked to 23 EDCs, whereas 61 individual proteins were associated with only one EDC.
To identify the biological targets that are most often affected by EDCs, proteins were grouped in clusters according to their families, as based on the Panther classification system (Figure 2). The majority of the 208 proteins related to EDCs belonged to 12 classes among the 23 present in Panther, while the remaining proteins were classified as ‘uncategorized protein class’. Each protein was assigned to only one category, although only one of them, HLA-DRA, (HLA class II histocompatibility antigen, DR alpha chain) belonged to the defense/immunity group. Other immunity-related proteins. such as interleukin 6 (IL-6) or interleukin 1 alpha (IL-1A), were not associated with any class in the Panther classification. We therefore manually added all immune system-related proteins to the “uncategorized class”. Given that Bisphenol A (BPA) increases the release of these proteins(Ben-Jonathan et al. 2009), and because antibodies to the IL-6 receptor (such as tocilizumab) or to the IL-1 receptor (such as Anakinra) are currently tested for the treatment of COVID-19 patients (Zhou et al. 2020), we also explored if the proteins selected could be mapped to defense and/or immunity biological categories. For this purpose, we used the Gene Ontology (GO) classification (as of May 26, 2020), and among the 208 proteins dysregulated by EDCs, 58 were associated with inflammatory response, 75 with defense response, and 66 with regulation of immune system process.
Disease-protein associations
Regarding diseases associated with human proteins, two databases were screened. From the DisGeNET database, we were able to retrieve information for 8 of the 13 diseases, which were connected to 3262 unique proteins via 7195 links (as of April 29, 2020). The proteins were categorized in 22 protein classes using the Panther classification (version 15) (Figure S1). Proteins that did not belong in any class were again grouped into the uncategorized class. Obesity and diabetes were linked to proteins belonging to each of the 22 categories, whereas insulin resistance and dyslipidemia were linked to only half of the categories.
From the GeneCards database, all 13 predisposing diseases were retrieved (as of 29 April 2020), and a total of 115,289 associations were identified between the diseases and 29,094 unique human proteins were extracted. Among them, only protein-coding information according to HGNC, Ensembl or Entrez Gene were kept (proteins data related to biological regions, gene clusters, genetic loci, pseudogenes, non-coding RNA genes and uncategorized elements were not considered), thereby reducing the total number of unique protein to 18,931, representing 97,855 disease-protein links. As a next step, grouping of the proteins using the Panther classification system allowed identification of 23 clusters correspond to the 23 different protein classes (Figure S2). Each protein was assigned to only one category, except for ameloblastin (AMBN), which was associated with both ‘extracellular matrix protein’ and ‘structural protein’. Proteins not associated with Panther classes, were again grouped into the uncategorized class. Excluding the viral or transposable element protein class, all diseases (except dyslipidemia) were associated with all the other Panther classes.
In order to keep the most relevant protein-disease associations obtained from the GeneCards database, data were filtered based on their scores. The GeneCards scores are calculated based on publications mentioning a protein and a disease, using a Boolean model. The higher the score, the more relevant the protein-disease association is. Among the 97,855 links between the 13 diseases and 18,931 proteins, the score values ranged between 0.13 (representing very low association) to 228 (very high evidence for a protein-disease connection). After evaluation of the extracted data (number of proteins by GeneCards scores), we selected associations with a score ≥ 20 (see Figure S3). Within this threshold, a total of 5732 associations were retained that link the 12 diseases with 2079 unique human proteins (no information was retained for ‘dyslipidemias’ from the GeneCards database).
Generating a tripartite network of protein-EDC-disease associations
A human bipartite associative network of proteins and the 13 diseases was created. Among the 3262 unique proteins from the DisGeNET, and the 2079 proteins from the GeneCards databases, 1157 were overlapping proteins and only 922 and 2105 proteins were uniquely associated with GeneCards or DisGeNET, respectively. All 4184 unique proteins were again grouped into 23 clusters using the Panther classification (the class ‘viral or transposable element protein’ was not kept after the cleaning step. Among the groupings, we retrieved a cluster of proteins linked to the ‘defense/immunity’ category. These results were merged with the bipartite protein-EDCs network to develop a tripartite network (Figure 2).
Translation into pathways
To identify biological pathways that may be involved in the predisposing diseases while also being dysregulated by the EDCs, we first analyzed the overlaps between the two sets of proteins. Among the proteins identified from the three data sources, 98 were common (Figure S4), and all of them were mapped to unique Entrez GeneID, and could therefore be used for biological enrichment analyses, which were performed independently using four data sources (KEGG, Reactome, Wiki-pathways and Panther). The ORA analysis revealed several statistically significant pathways linked to interleukins/cytokines signaling, intracellular signaling pathways and, regulation of metabolic pathways (Table 3). Interestingly, the different data sources showed very significant associations with common pathways, such as interleukins (IL) related pathways: IL-4 and IL-13, IL-10 signaling pathways (padj < E-16, and padj of 2.85E-09 respectively, Reactome), IL-17 signaling pathway (padj of 1.05E-10, KEGG), IL-3, IL-5 and IL-18 signaling pathways (padj of 1.09E-09, 2.52E-09, 1.49E-08 respectively, Wiki-pathways), the IL-signaling pathways (padj of 1.10E-05, Panther); or the Toll-like receptor signaling pathway (padj 3.91E-09 for KEGG, padj of 0.99 for Panther and padj 1.93E-08 for Wiki-pathways).
Among the most significant pathways, several were retrieved from each of the data sources with relation to the AGE/RAGE pathway (i.e. Advanced Glycation End products and its receptor), which is known to cause cellular stress and inflammation. The AGE are formed non-enzymatically, by Maillard reaction products (carbohydrates with proteins and/or lipids) and bind to the RAGE. Formation of AGE has been associated with chronic diseases such as type 2 diabetes (Cai et al. 2012; Menini et al. 2018). Similarly, the stress or inflammatory pathways (e.g shear stress, defined as the tangential force exerted by the blood flow on the vascular endothelium, TNF-alpha) are highlighted by our analysis; the shear stress activates the AhR signaling pathway, which is also involved in the regulation of IL-17 production by the Th17 lymphocytes; interleukin 17 has been suspected to be involved in the pathogenesis of COVID19(Gutiérrez-Vázquez and Quintana 2018; Han et al. 2008; Pacha et al. 2020). Interestingly, inflammation is suspected to influence insulin resistance.
Exploration of EDCs linkage to COVID-19
To explore putative links between COVID-19 and exposure to EDCs, we first screened the AOP-Wiki database, and then further examined the pathways identified using literature references.
In the AOP-Wiki database, only one AOP was related to COVID-19, and it involves several key events, such as ‘increased pro-inflammatory mediators’ (KE 1496), ‘increased inflammatory immune responses’ (KE 1750), which leads to the adverse outcome ‘increased mortality’ (AO 351). Such knowledge-based linear chain of events highlights the importance of the link between COVID-19 and inflammatory processes.
Discussion
In order to investigate possible links between exposure to EDCs and the severity of COVID-19, we explored a computational systems biology approach. The tripartite network model first linked EDCs to targeted proteins and then proteins related to diseases that predispose to more serious COVID-19 development, thereby allowing us to identify common signaling pathways. The identification of such joint pathways and their role as possible targets of EDCs highlights the potential links between exposure to environmental chemicals and COVID-19 severity.
This integrative approach can be easily applied as a new approach methodology (NAM) (Bopp et al. 2019), which may offer support to methods alternative to animal testing or to identify biological pathways that require more focused laboratory study. Previous studies have demonstrated that systems chemical toxicology models combined with computational network biology may help in understanding chemical toxicity in humans (Hartung et al. 2017; Nie et al. 2015; Taboureau and Audouze 2017). Our tripartite network supports the notion that exposure to EDCs may contribute to aggravation of COVID-19. Although major links were identified at extremely low p values, the approach relies on existing information available in within the very substantive data sources, but some causal associations may have been overlooked or disregarded because of missing or incomplete information.
To assess the validity of our approach, a more focused expert analysis was attempted, where we selected the Th17 and the AGE/RAGE signaling pathways because of their pathophysiological relevance in the context of COVID-19. The interleukin-17 (IL-17) signaling pathway plays several important roles, and IL-17 is produced by a pro-inflammatory subtype of T helper lymphocytes named Th17 cells, located at mucosal barriers where they contribute to pathogen clearance. The IL-17 produced stimulates the synthesis of cytokines (IL1ß, TNF-alpha…) and chemokines (MCP-1…) by other cell types, thereby favoring the recruitment of monocytes and neutrophils at inflammatory sites. However, an over-activation of Th17 cells can lead to a hyper-inflammatory state which is deleterious (Pacha et al. 2020).
The highly variable symptomatology associated with the infection by SARS-CoV-2 depends on the levels of IL-17 and of other cytokines including IL-1ß, IL-6, IL-15, TNF-alpha and IFNγ. The most deleterious effect of SARS-CoV-2 in humans is an acute lung injury leading to a severe acute respiratory syndrome (SARS) that is partly due to IL-17-related excessive recruitment of pro-inflammatory cells and production of pro-inflammatory cytokines. Therefore, an increased basal level of IL-17 (in the absence of infection, for example due to obesity or to induction by a chemical) might represent a lung injury risk associated with SARS-CoV-2 infection. Our finding of EDC linkage to this pathway is therefore of high pathogenetic relevance.
Obesity promotes a high basal level of inflammation which contributes to insulin resistance and type 2 diabetes(Goldberg 2009). This phenomenon is due to an infiltration of the adipose tissue (AT) by macrophages and T cells and their production of various pro-inflammatory cytokines, including IL-1ß, TNF-alpha, IL-17 and IL-6. Several EDCs are suspected to be obesogenic (and are subsequently named obesogens). This has been demonstrated for several substances (e.g. tributyltin) and linked to the stimulation of pro-adipogenic signaling pathway (e.g. through PPARγ)(Egusquiza and Blumberg 2020). Similarly, the aryl hydrocarbon receptor (AhR) is highly expressed in Th17 cells and is an essential contributor to the production of IL-17(Veldhoen et al. 2008). The AhR, known as the receptor of dioxins and dioxin-like PCBs, is also activated by shear stress (SS), another pathway highlighted in our computational analysis. Indeed, several studies have shown using various endothelial models that laminar SS leads to the activation of two target genes of the AhR, namely CYP1A1 and CYP1B1(Conway et al. 2009). Two recent studies suggest an indirect link between SARS-CoV-2 and SS by showing that the expression of ACE2 (angiotensin-converting enzyme 2), the receptor of the virus, is increased by SS (Song et al. 2020).
These observations support a dual impact of EDCs on IL-17 production and inflammatory state; this impact could be indirect due to the effect of these chemicals on obesity or through a direct stimulation of several signaling pathways, such as AhR or PPARγ, leading to an overproduction of systemic IL-17; the shear stress pathway represents an additional link between AhR activation and the EDC/disease connection. The implication of shear stress also suggests a possible contribution of increased expression of ACE2, the receptor of the SARS-CoV-2. While the role of these pathways at the nexus between exposure to EDCs and COVID-19 severity appears to be relevant, their actual contribution remains to be demonstrated and their putative role as therapeutic targets remains to be further substantiated.
Our integrative systems biology study also indicates a strong statistical association between the AGE/RAGE signaling pathway, chronic diseases and EDC effects. This is likely due to the well-known links between this pathway and type 2 diabetes(Ravichandran et al. 2019). Indeed, hyperglycemia leads to increased amounts of glycation products and their metabolites which results in the activation of the RAGE receptors. The latter are highly expressed in endothelial cells, and their activation leads to increased oxidative stress and inflammation and ultimately to endothelial damage, thrombotic disorders and vascular diseases (Egaña-Gorroño et al. 2020). Other endogenous ligands can also activate RAGE, among them HMGB1 (high-mobility group box 1), an extra-cellular protein also linked to a variety of inflammatory responses(Andersson et al. 2020). Interestingly, the AGE/RAGE signaling pathway is highly expressed in the lung vasculature and has been implicated in several pulmonary diseases(Oczypok et al. 2017). All these observations support the implication of the AGE/RAGE signaling pathway in vascular, thrombotic and lung diseases which are the hallmarks of COVID-19 severity. Interestingly, there are also complex connections between HMGB1 and ACE2 which is the receptor for SARS-Cov2 and other coronaviruses(Luft 2016). These results are in accordance with recent proposals in published commentaries of environmental chemical impacts on COVID-19 progress(Andersson et al. 2020; Rojas et al. 2020).
The three-way approach did not attempt to identify direct immunotoxic effects due to environmental chemicals otherwise considered to be EDCs. However, some of the EDCs selected, i.e., PCB-153, PFOA and PFOS, are known to have immunotoxic properties (Heilmann et al.), and the same is true for some common air pollutants (Tsatsakis et al. 2020). Accordingly, the impact of environmental chemicals on COVID-19 severity demands attention.
Conclusions
The results of this computational study appear as a promising initial step toward systematically linking a major group of environmental chemicals to the severity of COVID-19, although the findings need to be further supported by high-throughput screening tests, clinical and experimental data. Nevertheless, these observations bridge environmental stressors and infectious diseases and support an integrated exposome approach. Preliminary focus on the AGE/RAGE and IL-17 pathways illustrates the potential connection between exposure to EDCs and diseases predisposing to COVID-19 severity.
Data Availability
The authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.
Acknowledgments
The authors would like to acknowledge OBERON (https://oberon-4eu.com/), a project funded by the European Union’s Horizon 2020 research and innovation program under grant agreement no. 825712. This work was also supported by the University of Paris and INSERM. PG is supported by the National Institute of Environmental Health Sciences, NIH (P42ES027706)