ABSTRACT
Systemic sclerosis (SSc) is a rare autoimmune disease characterized by vasculopathy and progressive fibrosis of the skin and internal organs. Individuals with SSc often suffer from chronic acid reflux and dysphagia due to loss of esophageal motility. However, the pathogenesis of esophageal dysmotility in SSc is poorly understood. To determine whether distinct changes in esophageal epithelial cells contribute to impaired motility in SSc, we investigated the stratified squamous esophageal epithelium using single-cell RNA sequencing (n=306,372 cells) in individuals with SSc compared those with gastroesophageal reflux disease (GERD) as well as healthy controls. The proportion of epithelial cells in the outermost, superficial compartment of the esophageal epithelium was significantly reduced in SSc (9.4% vs 21.6% in HCs). Differential gene expression in SSc was primarily limited to the superficial compartment (3,572 genes vs. 232 in all other compartments), with significant upregulation of extracellular matrix and keratinization genes. These cellular and molecular changes in SSc were highly correlated with those seen in GERD, indicating they were secondary to reflux; however, their magnitudes were more pronounced in the proximal esophagus, suggesting that esophageal dysmotility leads to greater proximal acid exposure, which may contribute to aspiration. SSc-specific gene dysregulation implicated immunoregulatory pathways likely pertinent to pathogenic mechanisms. By offering a comprehensive view of transcriptional dysregulation at single-cell resolution in human esophageal epithelial cells in SSc compared to GERD and healthy tissue, this work clarifies the state of epithelial cells in SSc-induced esophageal dysfunction.
INTRODUCTION
Systemic sclerosis (SSc), also known as scleroderma, is an immune-mediated rheumatic disease of unknown etiology that causes progressive fibrosis of the skin and internal organs. Although uncommon (18-26 per 100,000 people)1–4, SSc is one of the deadliest autoimmune disorders5, with a 10-year survival rate of 45-68% following diagnosis4,6,7. Over 90% of individuals with SSc report gastrointestinal (GI) dysfunction8, with esophageal dysmotility being the most common GI manifestation9. Individuals with SSc and esophageal involvement typically suffer from chronic acid reflux and dysphagia due to loss of esophageal motility and are at much greater risk of esophageal stricture and Barrett’s esophagus8,10,11. The rate of esophageal involvement was nearly twice as high in individuals with SSc who died within 5 years of diagnosis12.
There are multiple, interrelated mechanisms that may cause esophageal dysmotility in SSc. Vascular damage and neurogenic impairment lead to smooth muscle cell fibrosis13, but esophageal muscle atrophy has also been observed in SSc without vasculopathy or fibrosis14. Inflammatory signatures15, absence of anti-centromere antibodies16, and presence of anti-topoisomerase I antibodies are associated with esophageal complications, as well8. Complicating research efforts is significant clinical and molecular heterogeneity observed between SSc patients17–19 and difficulty distinguishing between effects from autoimmune processes and secondary factors such as severe reflux. Consequently, the pathogenesis of esophageal disease in SSc remains poorly understood.
Molecular investigations of SSc esophageal involvement have primarily focused on canonical pathways in submucosal and muscle layers20. However, the mucosa is no less essential for normal esophageal transport21,22, and recent studies have suggested that esophageal epithelial cells (EECs) may play a more central role in the pathogenesis of SSc esophageal involvement than previously thought20. Mouse models with epithelial-cell-specific knockout of Fli1, an Ets transcription factor implicated in SSc pathogenesis23, spontaneously develop dermal and esophageal fibrosis and interstitial lung disease24. Importantly, the resultant lung disease is mediated by T-cell autoimmunity, but fibrosis of the skin and esophagus persist in mice additionally lacking mature T and B cells, suggesting that the fibrosis is principally driven by the epithelia20,24.
Most studies implicating epithelial cells in SSc esophageal dysmotility have been performed in mice or in vitro, and molecular analyses in humans have so far been limited to bulk tissues15. Furthermore, single-cell gene expression studies performed in other diseases of the esophagus, including squamous cell carcinoma25,26 and allergic eosinophilic esophagitis27,28, have identified substantial cellular and molecular changes in EECs compared to healthy tissue, such as significant expansion of non-proliferative suprabasal cells27,28. Therefore, to comprehensively investigate the effects of SSc on the esophageal epithelium, we performed single-cell RNA sequencing (scRNA-seq) of esophageal mucosal biopsies from SSc patients with clinically significant esophageal involvement and compared EEC distributions and gene expression signatures to individuals with gastroesophageal reflux disease (GERD) and healthy controls (HCs). By examining gene expression profiles from human tissue at single-cell resolution, this work sheds essential light on the cellular roots of esophageal dysfunction in SSc by clarifying the pathogenic role of the squamous epithelium, one of the most integral tissues supporting healthy esophageal function.
METHODS
Subjects
The SSc study population were recruited from adult patients aged 18 to 89 diagnosed with SSc according to the 2013 ACR/EULAR classification criteria, who visited the Northwestern Medicine Esophageal Center for esophageal symptom evaluation and motility testing, as described previously29, between 2020 and 2021. Patients with GERD were recruited at the primary visit following positive Bravo pH testing28. Patients with technically inadequate panometry or manometry studies, previous foregut surgeries (including prior pneumatic dilation), or mechanical obstructions in the esophagus such as esophageal stricture, eosinophilic esophagitis, severe reflux esophagitis (Los Angeles classification C or D), or hiatal hernias larger than 5 cm were excluded. HCs met asymptomatic criteria including lack of esophageal symptoms, no history of alcohol dependency or tobacco use, body mass index less than 30kg/m2, and no previous antacid or proton pump inhibitor treatment28. The study protocol received approval from the Northwestern University Institutional Review Board.
Symptom evaluation and motility testing
Esophageal function testing included functional lumen imaging probe (FLIP) panometry during sedated endoscopy, high-resolution manometry (HRM), and in some subjects, 24-hour pH-impedance or timed-barium esophagram, which were completed and interpreted as previously described30–33. HRM classifications were based on the application of the Chicago Classification v4.0 to 10 supine and 5 upright test swallows34. FLIP panometry classifications were based on previous evaluation of asymptomatic volunteers and patients35–37. During endoscopy (same encounter as HRM/FLIP), esophageal mucosal biopsies were obtained at approximately 5-cm (“distal esophagus”) and 15-cm (“proximal esophagus”) proximal to the squamocolumnar junction. Acid exposure was measured as the percentage of time with esophageal pH<4 over a monitoring period of 24 hours. Most patients also completed validated symptom-questionnaires (patient-reported outcomes) on the day of esophageal motility testing with FLIP or HRM, including the Brief Esophageal Dysphagia Questionnaire (BEDQ), the Gastroesophageal Reflux Disease Questionnaire (GerdQ), and the Northwestern esophageal quality of life (NEQOL) survey.38–40.
Sample processing and sequencing
Esophageal mucosal biopsies from proximal and distal esophagus were processed for scRNA-seq as described in Clevenger et al28. Briefly, following tissue digestion and filtering, cell suspensions met 85% minimum viability via Cellometer Auto2000 (Nexcelom Bioscience). Cells were loaded into a Chromium iX controller (10X Genomics) on a Chromium Next GEM Chip G (10X Genomics) to capture approximately 10,000 cells per sample and were processed for encapsulation following the manufacturer’s protocol. Cell barcoding and library construction were performed using the 10X Genomics Chromium Next GEM Single Cell 3’ Reagents Kits v3.1 and Dual Index Kit TT Set A according to the manufacturer’s protocol. Sequencing was performed by the NUseq Core using the Illumina Novaseq 6000. Paired-end reads consisting of a 28-base-pair read for cell barcodes and unique molecular identifiers and a 90-base-pair read for transcripts were aligned to the GRCh38 reference transcriptome using Cell Ranger v6.1.2.
Sample and cell quality control
Samples with high ambient RNA (<70% of sequencing reads mapped to cells) were removed from consideration. Gene expression counts generated by Cell Ranger were analyzed using Seurat v4.3.041. Cells with low gene diversity (<200 unique genes), low unique molecular identifier counts (<500), or excessive counts (>100,000) were removed from consideration. Cells with high proportions of mitochondrial DNA reads, indicative of apoptosis or lysis, were removed using the bivariate regression models implemented in the miQC package42. Cells were filtered if their proportion of reads mapped to mitochondrial DNA exceeded 0.05 and their posterior probability of belonging to a compromised cell distribution was greater than 0.75 according to miQC. Doublets, in which more than one cell is clumped together in a single droplet, were predicted and removed using scDblFinder43 with an expected, additive doublet rate of 0.01 per thousand cells with standard deviation of 0.01.
Sample integration and cell type annotation
Individual samples were normalized and integrated on 3,000 variable features using Seurat’s SCTransform procedure44 with v2 regularization45 and canonical correlation analysis dimension reduction. Integration anchors were determined using a reference consisting of three pairs of proximal and distal samples from HCs. Principal component (PC) analysis was performed on the transformed and integrated expression counts. Uniform manifold approximation and projection (UMAP)46 embeddings were calculated on the top 40 PCs with n=200 neighbors. Cell clustering was performed using Seurat’s modularity-based clustering on a shared nearest neighbor (SNN) graph with k=25 on the top 40 PCs. Clusters were annotated according to the expression of established cell-specific gene markers (Supplemental Table 1, Supplemental Figure 1). For annotation of smaller cell clusters, we first filtered out the epithelial, myeloid, lymphoid, and endothelial cell clusters and then recomputed PCs and re-clustered the subset (PCs = 25, k=15).
Epithelial cell characterization
After isolating the subset of epithelial cell clusters, we further filtered cells with disproportionately low read counts (≤3,500; Supplemental Figure 2, A-C). The samples were then re-integrated using the procedure described above. UMAP embeddings were calculated on the top 35 PCs with n=100 neighbors. Epithelial compartments (basal, suprabasal, and superficial) were identified according to expression of established markers (Supplemental Figure 4D)27,28,47–50. For each compartment, we computed composite gene signature scores using Seurat’s AddModuleScore() function (Supplemental Figure 2E). Cells were initially clustered on an SNN graph with k=50 with resolution=0.25 (Supplemental Figure 2F), but because the modularity optimization clustering implemented in Seurat grouped proliferating cells across different epithelial compartments and inconsistently distinguished suprabasal and superficial cells, we applied K-means clustering on the expression of the compartment-specific markers to delineate epithelial compartments. We performed separate K=2 K-means clustering on 1) the Seurat clusters containing basal and early suprabasal cells (clusters 1-5) to distinguish basal from suprabasal compartments and 2) the Seurat clusters containing late suprabasal and superficial cells (clusters 6-8) to demarcate suprabasal and superficial compartments (Supplemental Figure 2, G-H). Proliferating epithelial cells were identified using Seurat’s CellCycleScoring() function. Cells predicted as being in S or G2/M phases based on the relative expression of corresponding markers51 were labeled as proliferating cells (Supplemental Figure 2I). We tested for statistical differences in proportions of epithelial cell populations between conditions using MASC52. To derive a continuous epithelial differentiation score for each cell (Supplemental Figure 3), we combined the compartment-specific module scores as follows : Where EDSC is the epithelial differentiation score for a given cell. This EDSC was then scaled between 0 and 1:
Gene expression analysis
To identify dysregulated pathways in SSc, we first calculated differential gene expression between conditions within each epithelial compartment, considering each cell as a sample. Differential gene expression was calculated using the FindMarkers() function in Seurat with the Wilcoxon rank-sum test on log-normalized RNA counts for genes expressed in at least 1% of cells by compartment. We adjusted for multiple testing using a Bonferroni correction accounting for the number of genes tested. Genes with an adjusted P<0.05 and absolute log2 fold change >0.1 were considered significantly differentially expressed. 53,54 Genes were then ranked by the magnitude and statistical certainty of their expression differences: Gene set enrichment was calculated using 3,795 canonical pathway gene sets from the Human Molecular Signatures Database55 using gene set enrichment analysis (GSEA)53,54. Statistical gene set enrichment was calculated using one-way positive enrichment with 100,000 permutations on pathways with at least 10 genes. Mitochondrial and ribosomal genes were excluded from the GSEA.
To more robustly characterize gene expression differences between conditions56, we performed pseudo-bulk differential expression analysis by aggregating RNA counts per sample within epithelial compartments. Genes with at least one count per million detected in at least 75% of samples in at least one condition per epithelial compartment were analyzed. To mitigate outlier-driven signals, genes with log-transformed expression greater than 3 standard deviations from the mean in exactly one individual were excluded. Differential gene expression was evaluated using edgeR v3.42.457–59. Gene expression was modeled using a quasi-likelihood negative binomial generalized linear model, with global negative-binomial trended dispersion estimated across all samples and gene-specific quasi-likelihood dispersions estimated within each epithelial compartment. Differences in gene expression across conditions were assessed using quasi-likelihood F-tests with false-discovery rate (FDR) adjustment. We conducted tests combining both proximal and distal esophageal regions, as well as separate tests for each esophageal region.
Transcription factor enrichment analyses were conducted on the pseudo-bulk results using the enricher() function from clusterProfiler v4.8.360. We used consensus results from ENCODE61 and ChEA62, as curated by the Ma’ayan Lab63 and available at http://amp.pharm.mssm.edu/Enrichr/64to test for the enrichment of transcription factor target genes from among genes that were differentially expressed between SSc, GERD, and HCs with Bonferroni-adjusted p<0.05. Background gene sets were comprised of the genes included in the pseudo-bulk differential expression analyses for each epithelial compartment. Transcription factors absent from the background gene set were removed as candidate transcription factors. Separate enrichment tests were performed for all differentially expressed genes (DEGs) and those that were upregulated or downregulated in SSc. Enrichment results were adjusted using a Bonferroni correction accounting for the number of transcription factors with more than 2 genes in the significant DEG list .
Pseudo-bulk sample permutation
Because of the sample size imbalance by disease state, which inherently yields different statistical power for different tests across conditions, we permuted differential expression testing with equivalent sample sizes to better evaluate the relative extent of dysregulation by EEC compartment and biopsy location between SSc and GERD. We permuted all possible combinations of SSc samples with sample size n=4 for both the proximal and distal regions and used the median statistics in the SSc vs. HCs tests to more directly compare the results with those observed in GERD vs. HCs.
Gene expression trajectory analysis
To model gene expression as a continuous function along the axis of epithelial cell differentiation, we utilized the framework implemented in Lamian65, with the epithelial differentiation score serving as the underlying pseudotime metric. Lamian fits a polynomial B-spline curve with the number of knots determined by minimization of the Bayesian Information Criterion. With few cells near the lower and upper limits of our epithelial differentiation score and the interpolation range bound by the minimum and maximum pseudotime values, the B-spline models produced erratic fits near the interpolation boundaries. Therefore, we instead modeled natural cubic splines, which are constrained to linearity at the boundaries (second derivative is zero), and we excluded the lowest and highest 1% of cells from the interpolation interval. Using these modeled gene expression trajectories, we performed hierarchical clustering of differential gene trajectories for all genes that were significantly differentially expressed between SSc and HC in single-cell-level differential expression testing.
Clinical phenotype associations
To test for associations with categorical measures of esophageal function (FLIP, HRM, NM), we performed ordinal logistic regression that modeled likelihoods of increasing clinical severity. We excluded individuals with achalasia or esophagogastric junction outflow obstruction (EGJOO) from ordinal regression modeling motility. For FLIP we combined the “borderline/diminished” and “impaired/disordered” phenotypes. For NM we combined the Stage I and Stage II ineffective motility phenotypes. The resultant ordering for the ordinal regression was “normal”< “ineffective” < “absent” for HRM; “normal” < “borderline/diminished & impaired/disordered” < “absent” for FLIP; and “normal” < “stages I & II – ineffective” < “stage III – absent” for NM.
For association testing with quantitative clinical measures in SSc, we first performed a principal components analysis (PCA) on a set of nine quantitative clinical traits: HRM mean distal contractile interval (DCI), HRM basal esophagogastric junction (EGJ) pressure, HRM EGJ contractile index, HRM median integrated relaxation pressure (IRP), FLIP intra-balloon pressure at 60ml, FLIP EGJ distensibility index, the GerdQ impact score, the BEDQ score, and the NEQOL score. Missing values were imputed with the mean. We then modeled associations with the first two PCs using linear regression. Gene expression was represented using log-transformed counts.
RESULTS
Whole tissue single-cell RNA sequencing (scRNA-seq) was performed on paired proximal and distal mucosal biopsies from ten patients with systemic sclerosis (SSc), four with gastroesophageal reflux disease (GERD), and six healthy controls (HCs). All participants but one in each group were female (Table 1). SSc patients were significantly older than HCs (54.9 vs 27.0 years; P<0.001). Among SSc patients, six had the limited cutaneous subtype, three had the diffuse cutaneous subtype, and one had sine scleroderma with no cutaneous symptoms (Supplemental Table 1). Mean SSc disease duration at the time of biopsy was 132 months. All SSc patients had positive serum antinuclear antibody and impaired esophageal motility. Following sample-level quality control (Supplemental Table 2), 39 samples were retained for analysis, including 19 from the proximal esophagus and 20 from the distal esophagus.
Characterization of esophageal mucosal cell populations
A total of 306,372 esophageal cells (7,856 mean cells per sample) were retained for cell clustering and gene expression analysis (Figure 1). Each cell type was identified according to the expression of established cell-specific gene markers including KRT6A, KRT13, and KRT15 for epithelial cells (Supplemental Table 3, Supplemental Figure 1). Epithelial cells comprised the bulk of the sample (n=264,858; 86.45%), followed by lymphoid cells (n=25,129; 8.20%), myeloid cells (n=10,019; 3.27%), endothelial cells (n=4,027; 1.31%), and all other cell types (n=2,339; 0.76%).
Significant loss of superficial EECs in SSc
Following isolation of the epithelial cell cluster and additional quality control (Supplemental Figure 2, A-C), we characterized the remaining 230,720 esophageal epithelial cells (EECs) according to their differentiation and cell cycling states (Figure 2A). We classified the EECs into five primary compartments: basal (n=55,818; 24%), proliferating basal (n=36,439; 16%), proliferating suprabasal (n=25,314; 11%), suprabasal (n=82,283; 36%), and superficial (n=30,866; 13%) based on relative expression of canonical epithelial genes and cell cycle markers (Figure 2, B-C; Supplemental Figure 2, D-E). Roughly 40% of basal cells and 24% of suprabasal cells were proliferating, and the fractions of cells that were proliferating did not differ by disease (Figure 2D; Supplemental Figure 2, F-I). Proportions of basal and suprabasal cells were not significantly different across conditions, but there were significantly fewer superficial cells in SSc compared to HCs (μSSc=0.10, μHC=0.21, P=0.003, Figure 2E). The reduction of superficial cells in SSc was further apparent when examining EEC differentiation score distributions (Supplemental Figure 3) and projecting UMAP embeddings separately by disease state (Figure 3).
EEC gene dysregulation highly correlated in SSc and GERD, limited to superficial cells
Overall, the gene expression correlation between biopsy locations was extremely high, ranging from r=0.98 to r=0.99 for the 2,000 most variable genes (Supplemental Figure 4A) aggregated by condition and epithelial compartment (Supplemental Figure 4B; Supplemental Table 4). Inter-sample expression correlations indicated that the greatest gene expression heterogeneity was in the superficial compartment (mean=0.94, std. dev.=0.027), followed by the basal compartment (mean=0.94, std. dev.=0.018; Figure 4, A-B). Differential gene expression between conditions was most predominant in the superficial compartment (Figure 4, C-E). At the single-cell level, about 6.6% and 4.1% of expressed genes were significantly differentially expressed between SSc and HCs in the superficial compartment in the proximal and distal regions, respectively, compared to just 1.3-1.4% and 1.3-3.0% in non-proliferating basal and suprabasal cells (Supplemental Tables 5-6). Most differentially expressed genes (DEGs) were only differentially expressed in one epithelial compartment (Figure 4, C-D). We observed more suprabasal differential expression in the distal esophagus than in the proximal esophagus for both SSc and GERD (Figure 4, C-E). Many of the same genes were significantly dysregulated in both SSc and GERD compared to HCs (Supplemental Table 5). Depending on the compartment and biopsy location, 22-44% of significantly dysregulated genes in SSc or GERD were differentially expressed in both conditions (Figure 4E). Consequently, the DEGs in SSc and GERD were significantly enriched for some of the same pathways, particularly those related to the extracellular matrix and keratinization (Figure 4F). Clustered trajectories of relative gene expression (SSc vs. HCs) by epithelial differentiation score for all single-cell-level DEGs revealed distinct patterns of differential expression (Figure 4, G-H), which were principally distinguished by the direction of gene expression change in superficial cells. Gene expression changes were more pronounced in the most terminally differentiated superficial cells in the proximal esophagus (Figure 4G), whereas in the distal esophagus the differences appeared more in late-stage suprabasal and early-stage superficial cells (Figure 4H).
Six pathways were significantly enriched among DEGs between SSc and HCs: matrisome (suprabasal, distal PFDR=2.3×10-4, Figure 4H; superficial, distal PFDR=2.6×10-2) and matrisome-associated genes (suprabasal, distal PFDR=4.4×10-4); cornified envelope formation (suprabasal, distal PFDR=2.1×10-3; superficial, distal PFDR=4.7×10-2), and keratinization genes (suprabasal, distal PFDR=2.1×10-3; superficial, distal PFDR=5.0×10-2); developmental biology-related genes (suprabasal, distal PFDR=4.1×10-4; superficial, distal PFDR=9.2×10-3); and innate immune system genes (PFDR=2.8×10-2, Figure 4G). No pathways were significantly enriched among genes differentially expressed between SSc and GERD in any epithelial compartment. The leading edges of significantly enriched pathways (Supplemental Table 7) contained recurring genes from shared protein families, including serine protease inhibitors (serpins, Supplemental Figure 4C), keratins (Supplemental Figure 4D), S100 proteins (Supplemental Figure 4E), and small proline-rich proteins (SPRRs, Supplemental Figure 4F). The most frequent gene in leading edges of significantly enriched pathways was PI3 (Supplemental Figure 4G), including all pathways enriched in SSc. Among the other most enriched pathways specific to SSc were those related to the regulation of metal and immune homeostasis (Figure 4F, Supplemental Table 7).
Upon aggregating single-cell expression counts by sample for each EEC compartment, we found that gene expression patterns clustered by EEC compartment (Supplemental Figure 5A) and that differential expression between conditions was nearly exclusively limited to superficial cells (Figure 5, A-B; Supplemental Figure 5B). There were 3,572 genes differentially expressed between SSc and HCs with FDR q<0.05 in the superficial compartment, compared to just 232 total in all other compartments, including only 172 that were not also differentially expressed in the superficial compartment (Supplemental Table 8). The differential expression observed in SSc was again highly correlated with the changes seen in GERD (Figure 5, C-D). The correlations in log2 fold change between SSc and GERD vs. HCs were consistently between 0.40-0.53 in all compartments except for superficial cells, where we observed much stronger correlation in the proximal region (r=0.82) than in the distal region (r=0.36; Supplemental Figure 5C).
We detected 2,103 genes that were significantly differentially expressed in SSc but not GERD in the proximal esophagus and 282 in the distal esophagus, of which 170 were differentially expressed only in the distal region (Figure 5, C-D; Supplemental Table 9). Among the most upregulated SSc-specific genes (Figure 5, C-E; Supplemental Figure 5, C-D) were genes related to inflammation (PTGES, MFGE8)66,67, innate immune response (FCGBP, BST2, CD44, APOBEC3A)68–71, immune cell migration (C10orf99, LTB4R, ACKR3)72–74, antigen presentation (HLA-B, CD74, TAP1, PSMB8, PSMB9)75, natural killer cell activation (CLEC2B)76, and fibroproliferation (SGK1, HBEGF)77,78. Four of the top five and eight of the top 20 most downregulated SSc-specific genes by fold change in superficial EECs in the proximal esophagus were metallothioneins (MT1A, MT1E, MT1F, MT1G, MT1H, MT1M, MT1X, MT2A). The metallothioneins comprised the bulk of three pathways uniquely enriched in SSc with P<0.001: zinc homeostasis, copper homeostasis, and cellular responses to stimuli (Figure 4F). The most down-regulated SSc-specific gene across both the proximal and distal esophagus was the H19 lncRNA. Interestingly, the most down-regulated gene by fold change in GERD, MUC22, was not differentially expressed in SSc. One gene, SLC8A1-AS1, was significantly differentially expressed in both GERD and SSc, but in opposite directions (Figure 5, C-E; Supplemental Table 9).
More genes were significantly differentially expressed in SSc than in GERD, but in terms of relative fold change vs. HCs, we observed greater expression changes in GERD (Supplemental Figure 5C). We therefore performed all possible sample permutations with equal numbers of SSc and GERD samples (n=4) and repeated differential gene expression testing to more directly compare the relative number of DEGs in both conditions (Figure 5F). In superficial cells from the proximal esophagus, there were more DEGs in SSc in 60% of permutations, but the relative FC differences were greater in GERD in 61% of permutations (Supplemental Table 10). In the distal region, there were more DEGs in GERD in 62% of permutations, and the relative FC differences were greater in GERD in all permutations. The correlations between SSc and GERD in terms of relative gene expression changes were consistently much higher in the proximal region permutations than in the distal region (Figure 5, G-H). The permutation results confirmed that the differences in gene expression observed in SSc relative to GERD were not simply due to differences in sample size.
In summary, the gene expression changes measured in SSc and GERD were highly correlated. In the distal esophagus, differential expression was more pronounced in GERD than in SSc, whereas in the proximal esophagus there were more DEGs in SSc. In both conditions, differential gene expression was most prevalent in the superficial compartment, but the expression changes appeared comparatively earlier in the differentiation of distal EECs relative to proximal EECs.
Transcription factor enrichment analysis
We next examined whether DEGs in SSc superficial cells were significantly enriched for targets of specific transcription factors. Among genes differentially expressed in SSc compared to GERD, the target genes of IRF1 were significantly enriched (Table 2). IRF1 targets were significantly enriched among all proximal superficial DEGs (padj=0.021) and among only those that were downregulated (padj=0.0024). However, IRF1 was not itself differentially expressed in superficial cells in SSc compared to GERD. Compared to HCs, three transcription factors were significantly enriched in SSc in the proximal esophagus, including MYC (padj=2.1ξ10-7) and E2F4 (padj=5.5ξ10-5) for upregulated DEGs and NFE2L2 for downregulated DEGs (padj=7.6ξ10-4). MYC and E2F4 were also themselves differentially expressed in SSc compared to HCs in superficial cells (padj=7.8ξ10-5, padj=9.8ξ10-4, respectively). MYC was the most enriched transcription factor in genes upregulated in GERD compared to HCs, but the enrichment was not significant after adjusting for multiple testing (padj=0.11). No transcription factors were significantly enriched in any pairwise comparison in the distal esophagus.
Expression mapping in superficial EECs
Examining the expression patterns within superficial cells from the proximal esophagus more closely, we identified five distinct clusters (Figure 6A) distinguished by unique expression marker patterns (Figure 6B, Supplemental Figure 6, Supplemental Table 11) and varying degrees of differentiation (Figure 6C). Clusters 1 and 3 represent the outermost, terminally differentiated epithelial layers, as evidenced by their FLG expression79–81. They differed by the expression of metallothioneins, which were predominantly expressed in cluster 3 (Figure 6, E-F). Cluster 3 was less abundant in SSc compared to HC with P<0.05 (Figure 6D), but metallothionein expression was lower in other superficial clusters, as well (Figure 6F). Within the superficial EECs, the relative expression of metallothioneins was significantly correlated with the differentiation score (r=0.06, P=2.3ξ10-23). Cluster 2 was distinguished by its relatively high CTSV expression. Clusters 4 and 5 were the least differentiated of the superficial cells and were distinguished by their relative expression of GJB6 and FGFBP1, respectively, although neither gene was differentially expressed between conditions.
For the expression of the significantly enriched transcription factors (IRF1, MYC, E2F4, and NFE2L2; Table 2) and their targets, we observed two consistent patterns (Figure 7). First, the greatest magnitude differences in median target expression between SSc and HCs were seen in cluster 5. Second, in each of those instances, the median target expression in GERD was between that of SSc and HCs. The expression correlations between the enriched transcription factors and their targets were positive for MYC (r=0.42) and E2F4 (r=0.24) and negative for IRF1 (r=-0.20) and NFE2L2 (r=-0.18).
FLI1 expression in human EECs
Deletion of the Fli1 gene in epithelial cells in mice has been shown to recapitulate histological and molecular features of esophageal involvement in SSc24. Therefore, we investigated the expression of FLI1 in our human esophageal samples. The expression of FLI1 in human EECs was negligible and did not meet the minimum expression thresholds for inclusion in our differential gene expression analysis. We did observe low FLI1 expression levels in endothelial cells (Supplemental Figure 7A), but the expression differences between SSc, HCs, and GERD was not significant (Supplemental Figure 7B). There were many FLI1 downstream targets among the proximal, superficial DEGs between SSc and HCs and seven between SSc and GERD (ARPC2, RAB24, MTPN, PPIF upregulated; CEBPZ, TRMT10C, TBP downregulated); however, the proportion of these genes among all DEGs was not different than what would be expected by chance (P>0.05), and the average expression of FLI1 targets was higher in GERD and lower in HCs compared to the expression in SSc (Supplemental Figure 7C).
Correlations with clinical phenotypes
We next evaluated whether the cellular and molecular changes we observed between SSc, GERD, and HCs were correlated with specific clinical measures of esophageal dysfunction (Table 1). To test for associations with esophageal motility phenotypes, we performed ordinal regression against increasing phenotype severity. To more efficiently model the covariance structure among quantitative, functional esophageal measures in SSc (Figure 8A), we performed PCA on a set of nine clinical metrics and tested for correlations with the first two PCs (Figure 8B). The first PC explained 45.5% of variance and was most correlated with FLIP intra-balloon pressure at 60ml (r=0.94), The second PC explained 22.6% of variance and was most correlated with HRM basal EGJ pressure (r=0.74).
We did not observe any significant correlations between epithelial cell compartment proportions in SSc and esophageal motility phenotypes. It appeared that the low proportion of superficial cells in the esophageal epithelium was a universal feature in SSc, regardless of phenotype (Supplemental Figure 8A). The proportion of superficial cells decreased with disease duration, but the trend was not statistically significant (P=0.1; Supplemental Figure 8B). We also did not observe any significant correlations between EEC compartment proportions and quantitative trait PCs (Figure 8C).
We then tested for correlations with gene expression in superficial EECs for genes that were significantly differentially expressed in SSc compared to HCs (Padj<0.05) and nominally differentially expressed in SSc compared to GERD (P<0.05). There were 433 genes that met these criteria in the proximal esophagus and 99 in the distal esophagus. Although no associations with esophageal motility phenotypes were statistically significant after adjusting for multiple testing, the strongest phenotypic correlation was with TRIM11 (PFLIP=0.01, PHRM=0.007, Supplemental Figure 8C), a gene recently found to attenuate Treg cell differentiation in CD4+ T cells in mice82. No associations with the top two clinical PCs were significant after adjusting for multiple testing. We also tested for clinical PC correlations with aggregate metallothionein gene expression and transcription factor target expression for IRF1, MYC, E2F4, and NFE2L2. We observed one nominal association between the mean metallothionein module score and PC1 (P=0.02, Figure 8D). Notably, within superficial cells we also observed a strong correlation between the proportion of metallothionein-expressing cluster 3 cells and PC1 in the distal esophagus (Figure 8E). This indicates that not only is this population of cells reduced in SSc, but its relative decrease is further correlated with more severe esophageal involvement (Figure 8F). Therefore, our findings suggest that increased esophageal dysmotility in SSc is associated with a decline in metallothionein-expressing superficial EECs, which coincides with lower overall metallothionein expression in superficial cells.
DISCUSSION
Esophageal dysfunction is extremely common in SSc and is significantly associated with increased mortality and lower quality of life83,84. The causal mechanisms by which SSc affects the esophagus, however, have not been conclusively determined. Here, we sought to clarify the role of epithelial cells in SSc esophageal dysfunction by using scRNA-seq to quantify cellular and transcriptional changes relative to HCs and individuals with GERD. While our findings indicate that epithelial changes in SSc result primarily from chronic acid exposure, they also highlight immunoregulatory pathways uniquely altered in SSc that may be linked to pathogenic aberrations.
With epithelial layers of the skin and internal organs being primary sites of injury in SSc, the pathogenic role of epithelial cells in SSc has long been an ongoing area of research. When damaged, epithelial cells release signals that help induce fibroblast activation to promote wound healing85. In SSc, epidermal keratinocyte characteristics resemble a pro-fibrotic, activated state86. SSc epidermal cells were also found to stimulate fibroblasts in culture87, and some signs related to epithelial-mesenchymal transition (EMT) have been observed in the SSc epidermis88. The FLI1 (friend leukemia integration 1) transcription factor, in particular, has received much attention for its potential role in epithelial-cell-mediated SSc pathogenesis20. Lower FLI1 expression was observed in the epidermis of diffuse cutaneous SSc, and inactivation of FLI1 in human keratinocytes in vitro induced gene expression changes characteristic of SSc24. Furthermore, conditional deletion of Fli1 in epithelial cells produced an SSc-like phenotype in mice24. Importantly, the epithelial-cell-specific Fli1 knockout further recapitulated the esophageal involvement of SSc, with increased collagen deposition in the lamina propria combined with atrophy of the circular muscle layer, and the esophageal changes were not mediated by T-cell autoimmunity24. In contrast to these previous studies in mice and epidermal keratinocytes, we did not observe FLI1 expression in EECs in any condition. FLI1 was expressed at low levels in endothelial cells, but the expression differences between SSc, HCs, and GERD were not significant. We did observe significantly increased expression in SSc and GERD of the PI3 gene, which encodes trappin-2/elafin, previously found to be induced by Fli1 silencing in human dermal microvascular endothelial cells89. However, this association is unlikely to be pathogenic, as the increase in gene expression was greater in GERD, and a growing body of evidence indicates that trappin-2/elafin is expressed to promote tissue repair in response to gastrointestinal tissue inflammation90.
While there have been many studies of epithelial cells in skin in SSc, molecular study of the human esophagus in SSc had heretofore been limited to one array-based gene expression study in bulk tissue conducted by Taroni and colleagues15, in which they identified distinct expression signatures among variable genes between 15 individuals with SSc that were similar to those seen in skin91. These signatures were independent of clinically defined SSc subsets15, suggestive of inter-individual heterogeneity in SSc. They also found that gene expression patterns between upper and lower esophageal biopsies within individuals were nearly identical. They did not observe statistical associations between inflammatory gene expression signatures and GERD, but the study was underpowered for such an analysis, and GERD was defined based on histological evidence for basal cell hyperplasia and intraepithelial lymphocyte counts.
The gene expression differences we observed in SSc epithelial cells were highly correlated with those seen in GERD, and nearly all differential gene expression was limited to the superficial layers of the epithelium. These observations suggest that the primary driver of differential gene expression in EECs in SSc is chronic acid exposure. The genes with the strongest mutual upregulation in SSc and GERD were primarily related to keratinization/cornification, including small proline-rich proteins (SPRRs), serine protease inhibitors (serpins), S100 proteins, late cornified envelope (LCE) genes, and keratins associated with terminal epithelial differentiation, namely KRTDAP and KRT. KRT1 helps maintain epithelial barrier function in gastrointestinal epithelial cells, and its overexpression was shown to attenuate IL-1β-induced epithelial permeability92. The mutual upregulation of these genes in the outer layers of the epithelium therefore likely represent a protective response to chronic acid exposure.
While the overall gene expression differences were highly correlated between SSc and GERD, there were sets of genes disproportionately upregulated in SSc that hint at disease-related immune response aberrations, including genes that have previously been implicated in SSc pathogenesis. For example, PTGES, which encodes prostaglandin E synthase, was previously found to be significantly upregulated in SSc fibroblasts93 and inflammatory non-classical monocytes94, and Ptges-null mice were resistant to bleomycin-induced fibrosis93. CD44 and CD74 form the receptor complex for the macrophage inhibitory factor (MIF), an inflammatory cytokine that promotes fibroblast migration and has been implicated in SSc pathogenesis95. LTB4R (Leukotriene B4 Receptor, or BLT1) has been found to activate AKT/mTOR signaling and knockdown of LTB4R attenuated fibrosis in bleomycin-induced SSc in murine models96. Serum HBEGF (heparin-binding epidermal growth factor) levels and fibroblast HBEGF expression were both significantly elevated in SSc97. Copy number variation of APOBEC3A was significantly associated with SSc in a Han Chinese population98. Finally, the SSc-specific upregulation of genes involved in antigen processing and presentation (HLA-B, CD74, TAP1, PSMB8, PSMB9) are suggestive of increased human leukocyte antigen (HLA) class I activity in superficial EECs in SSc. Notably, TAP1, PSMB8, and PSMB9 are located immediately adjacent to one another in the class II region of the HLA locus, which is the genomic region with the strongest genetic associations with SSc99, and SSc-associated HLA-B alleles have also been identified100,101.
The most striking set of genes that were uniquely downregulated in SSc were metallothioneins. The expression of detected metallothioneins (MT1A, MT1E, MT1F, MT1G, MT1H, MT1M, MT1X, MT2A) was strongly induced in the superficial compartment of EECs, but significantly less so in SSc and, interestingly, only in the proximal esophagus. We further observed a nominal association between relative metallothionein expression in the proximal esophagus and the first PC of esophageal dysmotility among SSc patients, with lower expression of metallothioneins correlated with greater EGJ distensibility and weaker contractility. The proportion of the primary metallothionein-expressing cluster within superficial EECs was likewise significantly associated with clinical PC1. The metallothioneins are a family of well-conserved, metal-binding proteins that regulate zinc and copper homeostasis, prevent heavy metal poisoning, and combat oxidative stress102,103. Their study in a wide array of physiological conditions and immune-mediated diseases has revealed the proteins play central roles in innate and adaptive immunity, including autoimmunity, but their immunoregulatory behavior is highly complex and context-specific102,103. Given their import in immune regulation, their associations with SSc and esophageal motility reported in this study, and the connection between SSc and heavy metal exposure104, the metallothioneins are compelling candidates that warrant further study in SSc.
Interestingly, there was greater relative gene dysregulation in the proximal esophagus compared to the distal esophagus in SSc, but the opposite was true for GERD. This discrepancy suggests that esophageal dysmotility in SSc leads to greater acid exposure in the proximal esophagus, since refluxed gastric acid typically affects the distal esophagus significantly more than the proximal esophagus105 and abnormal peristalsis prolongs acid clearance106. This finding coincides with a bulk RNA-seq study of esophageal mucosa of achalasia patients107, which likewise identified more differential expression in the proximal esophagus than in the distal esophagus, including significant enrichment of matrisome-associated genes and lower expression of genes associated with reactive oxygen species metabolism107. Due to differences in innervation depth108, the epithelium of the proximal esophagus has more nociceptive sensitivity than the distal esophagus, which likely contributes to the association between gastrointestinal symptoms and lower quality of life in SSc109. Proximal acid exposure increases the risk of aspiration110 and is significantly more prevalent in SSc patients with idiopathic lung fibrosis111.
Targets for several transcription factors, including IRF1, MYC, E2F4, and NFE2L2 (or NRF2), were significantly enriched among DEGs in superficial EECs from the proximal esophagus. IRF1 is involved innate immune responses, but while IRF1 targets were enriched in DEGs between SSc and GERD, the enrichment appeared to be driven by upregulation in GERD. IRF1 targets were not enriched in DEGs between SSc and HCs, nor was IRF1 significantly differentially expressed in SSc. MYC, E2F4, and NFE2L2 all participate in different stages of the cell cycle112–114, and expression of each was highest in either the proliferating basal or suprabasal compartments, but these proteins may also play a role in terminal differentiation of EECs, particularly under conditions of stress. In the least differentiated superficial cluster (cluster 5), where the expression differences relative to HCs for each of these transcription factors and their targets was most pronounced, the expression level changes in GERD were correlated with those observed in SSc. Furthermore, Myc ablation in Krt14-expressing tissues in mice led to abnormal terminal differentiation of epithelial cells115, and a study of the esophageal epithelium in Nrf2 knockout mice indicated that Nrf2 regulates cornification of keratinized epithelial cells under conditions of stress116.
There are important limitations to consider when interpreting this study. First, this study only focused on squamous epithelial cells. There may be relevant changes in other cell types within the esophageal mucosa that correlate with SSc and dysmotility, but such analyses are ongoing and were outside the scope of this investigation. Second, the relatively small patient sample size of this study combined with the heterogeneity of SSc subtypes and dysmotility phenotypes limited our power to detect clinical associations between cell type proportions or gene expression changes and specific clinical phenotypes. Heterogeneous molecular profiles in SSc esophagus samples have been described previously15, but we grouped all SSc patients together in our analyses for comparisons against GERD and HCs. We applied ordinal logistic regression on motility classifications and performed principal components analysis on reflux- and motility-related quantitative traits to increase our power to detect clinical associations but were nonetheless statistically limited by sample size. Third, we did not examine imaging data in the SSc esophageal mucosa. Therefore, we cannot make conclusions on the morphological states that accompany the transcriptional and cellular proportion changes that we observed. Future studies utilizing larger, more homogeneous cohorts with imaging data may be able to detect more esophageal epithelial changes associated with dysmotility in SSc.
In summary, esophageal complications are common in individuals with SSc and can greatly impact quality of life and lifespan. In this study, we sought to clarify the role of the epithelium in SSc esophageal involvement. Through a thorough, single-cell-level transcriptomic investigation, we identified the unique cellular and transcriptional differences present in the SSc esophageal epithelium. There were significantly fewer terminally differentiated epithelial cells in the outermost, superficial layers in both the proximal and distal esophagus, but otherwise epithelial compartment proportions were similar to healthy controls, including proliferating cell proportions. Significant differences in gene expression were likewise almost exclusively found in the superficial compartment, and these changes were highly correlated with those seen in individuals with GERD, indicating that the primary driver of differential gene expression in the SSc esophageal epithelium is chronic acid exposure. Transcriptional differences were also more prominent in the proximal esophagus of SSc patients, suggesting that esophageal dysmotility leads to greater proximal exposure. Genes that were most disproportionately dysregulated in SSc compared to GERD belonged to immune-related pathways, including innate and adaptive response, antigen presentation, and metal homeostasis, and may therefore point to pathogenic immune dysfunction. Future studies investigating these pathways in non-epithelial cell populations in more homogeneous SSc cohorts may further resolve the pathogenic mechanisms driving esophageal involvement in SSc. By serving as an atlas for the human esophageal epithelium in SSc, this study can guide future efforts to address remaining gaps in our understanding of the SSc esophagus, to ultimately uncover pathogenic mechanisms and identify actionable targets.
Data Availability
Raw and processed sequencing data with corresponding metadata will be deposited in a genomic data repository upon study publication in a peer-reviewed journal.
SUPPLEMENTAL FIGURES
Footnotes
↵* These authors jointly supervised the work
REFERENCES
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.
- 32.
- 33.↵
- 34.↵
- 35.↵
- 36.
- 37.↵
- 38.↵
- 39.
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.
- 49.
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.
- 70.
- 71.↵
- 72.↵
- 73.
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.↵
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.
- 114.↵
- 115.↵
- 116.↵
- 117.