Abstract
Crohn’s disease is an inflammatory bowel disease (IBD) which most often presents with patchy lesions in the terminal ileum and colon and requires complex clinical care. Recent advances in the targeting of cytokines and leukocyte migration have greatly advanced treatment options, but most patients still relapse and inevitably progress. Although single-cell approaches are transforming our ability to understand the barrier tissue biology of inflammatory disease, comprehensive single-cell RNA-sequencing (scRNA-seq) atlases of IBD to date have largely sampled pre-treated patients with established disease. This has limited our understanding of which cell types, subsets, and states at diagnosis are predictive of disease severity and response to treatment. Here, through a combined clinical, flow cytometric, and scRNA-seq study, we profile diagnostic human biopsies from the terminal ileum of treatment-naïve pediatric patients with Crohn’s disease (pediCD; n=14) and from non-inflamed pediatric controls with functional gastrointestinal disorders (FGID; n=13). To fully resolve and annotate epithelial, stromal, and immune cell states among the 201,883 single-cell transcriptomes, we develop and deploy a principled and unbiased tiered clustering approach, ARBOL, yielding 138 FGID and 305 pediCD end cell clusters. Notably, through both flow cytometry and scRNA-seq, we observe that at the level of broad cell types, treatment-naïve pediCD is not readily distinguishable from FGID in cellular composition. However, by integrating high-resolution scRNA-seq analysis, we identify significant differences in cell states that arise during pediCD relative to FGID. Furthermore, by closely linking our scRNA-seq analysis with clinical meta-data, we resolve a vector of lymphoid, myeloid, and epithelial cell states in treatment-naïve samples which can distinguish patients with less severe disease (those not on anti-TNF therapies (NOA)), from those with more severe disease at presentation who require anti-TNF therapies. Moreover, this vector was also able to distinguish those patients that achieve a full response (FR) to anti-TNF blockade from those more treatment-resistant patients who only achieve a partial response (PR). Our study jointly leverages a treatment-naïve cohort, high-resolution principled scRNA-seq data analysis, and clinical outcomes to understand which baseline cell states may predict inflammatory disease trajectory.
Introduction
Inflammatory bowel diseases (IBDs) arise when homeostatic mechanisms regulating gastrointestinal (GI) tract tissue integrity, nutrient absorption, and protective immunity are replaced by pathogenic inflammation (Baumgart and Sandborn, 2012; Chang, 2020; Corridoni et al., 2020a; Friedrich et al., 2019; Graham and Xavier, 2020; Selin et al., 2021). The initiating triggers are not fully known, but host genetics and the microbiome are being increasingly appreciated to play important, and in some cases causal roles in the IBDs (Chang, 2020; Cohen et al., 2019; Franzosa et al., 2019; Jain et al., 2021; Limon et al., 2019). Though symptoms are widespread and overlapping, the endoscopic inflammation in the IBDs is often anatomically restricted: ulcerative colitis (UC) manifests primarily as an superficial inflammatory response restricted to the colon, while Crohn’s disease (CD) presents predominantly in the terminal ileum and the proximal colon, though lesions may develop anywhere along the gastrointestinal tract (Baumgart and Sandborn, 2012; Chang, 2020; Kobayashi et al., 2020; Roda et al., 2020). Among the IBDs, pediatric-onset Crohn’s disease (pediCD) is particularly common (25% of all IBD cases, 60-70 % of pediatric IBD) and is a debilitating form due to its early presentation, impact on the terminal ileum and proximal colon, and the lack of disease-specific therapies developed with children in mind (Hyams et al., 1991; Ruemmele et al., 2014; Sýkora et al., 2018; Turner et al., 2012; Ye et al., 2020). In contrast to pediCD, a group of pediatric disorders termed functional gastrointestinal disorders (FGIDs) include GI symptoms but lack laboratory markers, endoscopic findings, and histologic evidence associated with inflammation (Black et al., 2020; Hyams et al., 2016; McOmber and Shulman, 2008; Santucci et al., 2020). FGID thus represents a critical non- inflamed control cohort with which to contextualize the inflammation observed in pediCD.
The current standard of care for pediCD (as with adult CD) is tailored to the patient’s disease location, clinical behavior and severity, though use of prednisone, immunomodulators, as well as biologics including anti-TNF-α monoclonal antibodies, are common (Hyams et al., 1991; Ruemmele et al., 2014; Turner et al., 2012). While targeting TNF is common across many autoimmune and inflammatory conditions, it is not successful in all patients, and many go on to develop anti-TNF-refractory disease. It is of tremendous importance for the field to precisely understand and characterize for which patients anti-TNF therapy is not necessary, in which patients it may succeed in controlling disease, and which patients will be refractory to treatment. Several ideas based on individual immunogenicity and pharmacokinetics have been proposed to explain TNF-refractory disease, including gender (M>F), low albumin levels, high BMI, and high baseline C-Reactive Protein (CRP) (Atreya et al., 2020; Digby-Bell et al., 2020). However, no single identifiable clinical or biochemical biomarker reliably predicts disease response versus resistance to anti-TNF antibodies suggesting a more complex etiology (Stevens et al., 2018).
The primary cellular lineages sampled from intestinal biopsies of CD patients (from the terminal ileum or colon) represent both the epithelium and lamina propria, and include epithelial cells, stromal cells, hematopoietic cells and neuronal processes whose cell bodies are present outside of these regions (Buisine et al., 2001; Leeb et al., 2003; Leonard et al., 1995; Lilja et al., 2000; Müller et al., 1998; Souza et al., 1999; Stappenbeck and McGovern, 2017; Takayama et al., 2010). Alterations to all cellular lineages have been implicated in CD (Furey et al., 2019; Martin et al., 2019). Histologically, CD is characterized by a granulomatous inflammation, and noted alterations in almost every leukocyte cell type studied including an increase of cytotoxic lymphocytes, potential but equivocal alterations in γδ T cells, increases in mast cells and their production of TNF, activation and shifts in antibody isotypes towards IgM and IgG from B cells and plasma cells, and cytokine production by macrophages (Catalan-Serra et al., 2017; Lilja et al., 2000; Meijer et al., 1979; Mitsialis et al., 2020; Müller et al., 1998; Sieber et al., 1984; Takayama et al., 2010). In the stromal compartment, there is evidence for enhanced vascularization and increased expression of ICAM-1 and MAdCAM-1 by vascular beds, substantial remodeling of collecting lymphatics, and altered migratory potential of fibroblasts (Leeb et al., 2003; Souza et al., 1999). Epithelial barrier dysfunction has also been noted, including alterations to mucus production, microvilli, and Paneth cell dysfunction (Buisine et al., 2001; Stappenbeck and McGovern, 2017). Collectively previous studies have identified that all cell types may be meaningfully altered during CD, and highlight the important need to comprehensively understand the concerted cellular changes that define CD. Importantly, which changes are predictive of more severe disease or treatment resistance in pediCD remain largely unknown.
Massively parallel single-cell RNA-sequencing (scRNA-seq) is enhancing our ability to comprehensively map and resolve the cell types, subsets, and states present during health and disease. This has been particularly evident in the elucidation of novel human cell subsets and states within epithelial, stromal, immune, and neuronal cell lineages. Recent work has generated cellular atlases of previously treated (pre-treated) adult CD and UC, though a comprehensive single-cell atlas for untreated pediatric disease with follow-up of patient outcomes has yet to be reported for IBD (Corridoni et al., 2020a, 2020b; Drokhlyansky et al., 2019; Elmentaite et al., 2020; Huang et al., 2019; Kinchen et al., 2018; Martin et al., 2019; Parikh et al., 2019; Smillie et al., 2019). The potential impact of scRNA-seq on our understanding of IBD is evidenced by studies of adult UC, which have identified potential functional roles for poorly understood colonic cell subsets, such as the BEST4+ enterocyte, and identified pathological alterations in UC pinch biopsies compared to healthy controls, including an expansion of microfold-like cells, IL13RA2+IL11+ inflammatory fibroblasts, CD4+CD8+IL17A+ T cells and CD8+GZMK+ T cells (Smillie et al., 2019). A single-cell study of pre-treated adult CD patients comparing non-inflamed and inflamed tissue from surgically-resected bowel found IgG+ plasma cells, inflammatory mononuclear phagocytes, activated T cells and stromal cells comprising the “GIMATS” pathogenic cellular module (Martin et al., 2019). This module was used to derive a gene signature associated with resistance to anti-TNF therapy in a distinct cohort profiled by bulk RNA-seq. Very recent work has profiled how fetal transcription factors are reactivated in Crohn’s disease epithelium (Elmentaite et al., 2020). Together, these and other studies have demonstrated the power of scRNA-seq to nominate individual and collective cell states that are associated with disease, and have also underscored the unmet need to apply these techniques to untreated disease and associate them with disease severity in order to more specifically identify pathognomonic and prognostic cell states.
Most comprehensive scRNA-seq atlases of inflammatory disease conditions consist of patients being treated with a variety of agents, and for which the biopsies included often reflect a partial treatment-refractory state to combinations of antibiotics, corticosteroids, immunomodulators, and biologics including anti-TNF monoclonal antibodies. A treatment-naïve single-cell atlas in an inflammatory disease condition linking observed baseline cell clusters with disease trajectory and treatment outcomes has yet to be reported. In order to address this unmet need in pediCD, we created the prospective PREDICT study (Clinicaltrials.gov #NCT03369353) to help identify, profile, and understand pediatric IBD and FGID controls. Here, we present detailed diagnostic data from the first cohort of 27 patients enrolled on PREDICT, including 14 pediCD and 13 FGID patients, together with flow cytometric and scRNA-seq studies of the cellular composition of the terminal ileum (Figure 1). Furthermore, through thorough, prospective annotation of clinical metadata and detailed longitudinal follow-up, we stratify our pediCD cohort by clinically-guided therapeutic decisions separating patients treated with anti-TNF mAbs versus those with biopsy- proven pediCD, but for whom clinical symptoms were sufficiently mild that the treating physician did not prescribe anti-TNF agents (this cohort is termed “Not On Anti-TNF” or “NOA”). Importantly, we were also able to separate the cohort of patients treated with anti-TNF agents into a sub-cohort of those who achieved a full response (FR) to this therapy, versus those who achieved only a partial response (PR). Critically, because PREDICT enrolled patients prior to their diagnostic endoscopy, we were able to relate these clinical outcomes to the patients’ cell states at diagnosis. We contextualize our findings in pediCD relative to a cohort of 13 FGID patients, which provides an age-matched comparator cohort with clinical GI symptoms, but non- inflammatory disease proven by endoscopy and histologic examination.
Several analytical approaches have been developed to enable the generation and interrogation of clusters during the curation of single-cell transcriptomic atlases (Hie et al., 2020). One such method, sub-clustering of broad clusters, has proven to be a powerful tool for isolating highly specific axes of variation that are obscured by analyses whose principal axes of variation are broad cell types (La Manno et al., 2021; Tasic et al., 2018; Zeisel et al., 2018). However, while sub-clustering analysis is a powerful tool allowing access to the hierarchy of cell states, this method is manually intensive and there is little consensus, control, or standard in clustering parameters or annotation methods. To address this issue, we developed a principled, modular, automated sub-clustering routine made possible by application of parameter scanning methods (Rousseeuw, 1987; Shekhar et al., 2016). We developed this tool, ARBOL, of which iterative tiered clustering (ITC) is a key component, in R, integrating with Seurat functions, to make it accessible and easily incorporated into common workflows and have curated a GitHub repository with illustrative vignettes. Here, we use ARBOL to standardize fine-grained cell state discovery by the creation and cultivation of a tree of cell states, followed by the generation of automated cell names to aid in the annotation of end clusters by unique and descriptive genes.
Together we present two cellular atlases for pediatric GI disease, consisting of 94,451 cells for FGID and 107,432 for pediCD. We provide key gene-list resources for further studies, identify correspondence between disease states, and nominate a vector of lymphoid, myeloid and epithelial cell states which predicts disease severity and treatment outcomes. This cellular vector correlates strongly with both the clinical presentation of pediCD severity, and to the distinction between anti-TNF full or partial response. The significant changes in cell composition associated with disease severity were increases of proliferating T cells, cytotoxic NK cells, specific monocytes/macrophages, and plasmacytoid dendritic cells (pDCs) accompanied by decreases of metabolically-specialized epithelial cell subsets. We further validate this vector in two bulk RNA- seq treatment-naïve IBD cohorts.
Results
Study cohort outcomes
The PREDICT study prospectively enrolled treatment-naïve, previously undiagnosed pediatric patients with GI complaints necessitating diagnostic endoscopy. The current analysis focuses on patients enrolled in the first year of the study, during which time 14 patients with pediCD and 13 patients with FGID were enrolled and had adequate ileal samples for single cell analysis (Figure 1; Supplemental Figure 1). Following their initial diagnosis, patients with pediCD were followed clinically for up to 3 years. Patients with FGID were followed up as needed in subspecialty/GI clinic. The median time from diagnosis for the pediCD and FGID cohorts as of December 1, 2020 (time of database lock) was 32.5 and 31 months, respectively. Of the pediCD and FGID patients analyzed, the median age at diagnosis was 12.5 years and 16 years respectively (p = 0.095), with no significant differences in gender (Figure 1b; Table 1). Patient weight, height, and BMI z-scores were not significantly different between pediCD and FGID (Figure 1b; Table 1); however, in addition to the diagnostic differences on histologic analysis, several key clinical laboratory values, including C-reactive protein (CRP), Erythrocyte Sedimentation Rate (ESR), hemoglobin concentration, albumin concentration, and platelet count were significantly different between pediCD and FGID (Figure 1c,d; Table 1).
Treatment with anti-TNF agents and response to therapy
Patients with pediCD were initially divided into two cohorts. Those with milder disease characteristics (n = 4) as determined by their treating physician, were not put on anti-TNF therapies, and are noted as NOA. For patients with more severe disease (n = 10), anti-TNF therapy (with either infliximab or adalimumab, Table 2) was initiated within 90 days of diagnostic endoscopy. All pediCD patients were followed prospectively and categorized as FR (n = 5) or PR (n = 5) to anti-TNF therapy based on the following criteria: FR was defined as clinical symptom control and biochemical response (measuring CRP, ESR, albumin, and complete blood counts (CBC)), and with a weighted Pediatric Crohn’s Disease Activity Index (PCDAI) score of <12.5 on maintenance anti-TNF therapy with no dose adjustments required (Cappello and Morreale, 2016; Hyams et al., 1991; Sandborn, 2014; Turner et al., 2012, 2017). PR to anti-TNF therapy was defined as a lack of full clinical symptom control as determined by the treating physician or lack of full biochemical response, with documented escalation of anti-TNF therapy or addition of other agents (Figure 1e; NB: patients in our cohort were dose escalated because of clinical symptoms). Medication timelines and clinical laboratory data through 2 years of follow-up for all pediCD patients is shown in Supplemental Figure 1. The designation of FR or PR was made at 2 years of follow-up for all pediCD patients.
Flow cytometry of the terminal ileum reveals minimal changes in leukocyte subsets in FGID vs. pediCD, and no significant differences across the pediCD spectrum
We collected terminal ileum biopsies from 14 pediCD patients and from 13 uninflamed FGID patients, and prepared single-cell suspensions for flow cytometry and scRNA-seq. Biopsies from pediCD were from actively-inflamed areas adjacent to ulcerations. Biopsies from FGID were from non-inflamed terminal ileum. The epithelium was first separated from the lamina propria before enzymatic dissociation, and flow cytometric analysis was performed on the viable single-cell fraction, which recovered predominantly hematopoietic cells with some remnant epithelial cells (<20% of all cells), likely representing those in deeper crypt regions (Figure 2; Supplemental Figure 2). We utilized two flow cytometry panels, allowing us to resolve the principal lymphoid (CD4 or CD8 T cells, NK cells, B cells, innate lymphoid cells, γδ T cells, CD8αα+ IELs, pDCs) and myeloid (monocytes, granulocytes, HLA-DR+ mononuclear phagocyte) cell subsets (Supplemental Figure 2, Supplemental Table 1). From these panels, which generated 32 gates identifying cell lineages, types and subsets, only HLA-DR+ macrophages/DCs and pDCs were significantly increased in pediCD relative to FGID (Figure 2d). We also analyzed within pediCD, comparing the baseline samples of 4 NOA, 5 FR and 5 PR patients, and noted no significant differences between NOA and patients on anti-TNF, or between FRs and PRs to anti-TNF (Figure 2e). Together, this suggests that despite the substantial endoscopic, histologic, and clinical parameters that distinguish FGID and pediCD (Figure 1), the broad single-cell type composition of the terminal ileum appears minimally altered in pediCD save for an increased frequency of pDCs and HLA-DR+ mononuclear phagocytes.
Traditional joint scRNA-seq clustering of FGID and pediCD patients
In addition to flow cytometry, we performed droplet-based scRNA-seq on cell suspensions from the 14 pediCD/13 FGID patient cohort using the 10X Genomics V2 3’ platform (Figure 1). The analyzed cell suspensions were derived from lamina propria preparations, which our flow cytometry data suggested would be composed primarily of CD45+ leukocytes, alongside a small fraction of epithelial cells and stromal/vascular cells. Deconstructing these tissues into their component cells provided us with the ability to identify some of the corresponding cell types (e.g. T or B cell) and subsets (CD8αα+ IEL or CD4+ T cell) to those we identified by flow cytometry data but importantly also enabled us to: 1. characterize these major cell types and subsets without needing to pre-select markers, and 2. gain substantially enhanced resolution into the cell states (i.e. gene expression programs) within these types and subsets.
Following library preparation and sequencing, we derived a unified cells-by-genes expression matrix from the 27 samples, containing digital gene expression values for all cells passing quality thresholds (n=254,911 cells; Supplemental Figure 3; Supplemental Tables 2 and 3; Methods). We then performed dimensionality reduction and graph-based clustering, noting that despite no computational integration methods being used, FGID and pediCD were highly similar to each other when visualized on a uniform manifold approximation and projection (UMAP) plot (Supplemental Figure 4a-c). We recovered the following cell types from both patient groups: epithelial cells, T cells, B cells, plasma cells, glial cells, endothelial cells, myeloid cells, mast cells, fibroblasts, and a proliferating cell cluster. We noted that the fractional composition amongst all cells of T cells, B cells, and myeloid cells was not significantly different between FGID and pediCD, similar to the flow cytometric data, and this was also the case for endothelial, fibroblasts, glial, mast and plasma cells, which were not measured through flow cytometry (Supplemental Figure 4d). This provided validation and extension of our flow cytometry data documenting that the broad cell type composition of FGID and pediCD is not significantly altered, despite highly distinct endoscopic and histologic diseases. Based on this joint clustering and annotation of top-level cell types, we then performed differential expression testing identifying significant up- and down- regulated genes across all cell types (Supplemental Figure 4e; Supplemental Table 4). Within myeloid cells we identified some of the most significantly upregulated genes in pediCD versus FGID to be CXCL9 and CXCL10, canonical IFNγ-stimulated genes, and S100A8 and S100A9 which form the biomarker fecal calprotectin (Supplemental Figure 4f)(Leach et al., 2007; Ziegler et al., 2021). Within epithelial cells we identified that APOA1 and APOA4 were amongst the most significantly downregulated genes, and correspondingly REG1B, SPINK4 and REG4A were amongst the most significantly upregulated indicating tradeoffs between lipid metabolism and host defense (Supplemental Figure 4f) (Haberman et al., 2014). In T cells, we noted that the cytotoxic genes GNLY and GZMA were amongst the most significantly upregulated, with almost no genes downregulated (Supplemental Figure 4f).
We then systematically re-clustered each broad cell type, identifying increasing cellular heterogeneity. Given that we detected changes in the frequency of HLA-DR+ macrophages/dendritic cells and pDCs between pediCD and FGID by flow cytometry, we initially focused on the myeloid cell type sub-clustering, containing dendritic cells, macrophages, monocytes, and pDCs (Supplemental Figure 4g). Working within this analysis paradigm revealed that a traditional clustering approach had difficulty identifying the boundaries of clusters, and whether a cluster composed primarily of pediCD rather than FGID cells represented a unique cell subset, or a cell state overlaid onto a core cell subset gene expression program (Supplemental Figure 4g, Methods). These distinctions would influence whether a comparison would primarily focus on differential expression testing or differential composition testing between the two clinical cohorts. It also raised the possibility that this joint clustering approach, informed by the inclusion of both FGID and pediCD cell types, subsets and states could muddle some of the unique biology of FGID and pediCD. For instance, this analytical approach could lead to hybrid clusters, informed by cells from both FGID and pediCD and correspondingly critical gene- reference lists for each cluster, that may not accurately represent disease-specific cell types, subsets, or states.
In order to approach this challenge from a more principled direction, we made five key changes to our analytical workflow, which we jointly refer to as ARBOL (https://github.com/jo-m-lab/ARBOL): 1. We proceeded to analyze FGID and pediCD samples separately to define corresponding cell type, subset, and state clusters and markers, 2.We implemented an automated ITC approach to optimize the silhouette score at each tier of iterative sub-clustering and stop when a specific granularity was reached (Supplemental Figure 5a,b; Methods), 3. We systematically generated descriptive names for cell types and subsets together with differentiating marker genes, 4. We accounted for the number and diversity of patients which compose each cluster using Simpson’s Index of Diversity, and 5. We generated and optimized a Random Forest classifier to identify correspondence between the resultant FGID and pediCD atlases (Supplemental Figure 5c; Methods) (Simpson, 1949). Using this approach, each tier of analysis is typically under-clustered relative to traditional empirical analyses, but the automation proceeds through several more tiers (typically 6 to 7) until stop conditions (e.g. cell numbers and differentially expressed genes; Methods) are met. We then inspected all outputs (182 FGID and 425 pediCD clusters) and provided descriptive cell cluster names independently for FGID and pediCD (Supplemental Figures 5b and 6). We also focused at this stage on flagging putative doublet clusters or clusters where the majority of differentially expressed genes which triggered further clustering consist of known technical confounders in scRNA-seq data (e.g. mitochondrial, ribosomal, and spillover genes from cells with high secretory capacity) yielding a final number of 118 FGID and 305 pediCD clusters (Supplemental Figure 5b). We note this clustering method represents a data-driven approach, though it may not always reflect a cellular program or transcriptional module of known biological significance.
We then hierarchically clustered all end cell state clusters to generate the final dendrograms for FGID and pediCD, and performed 1 vs. rest within-Tier 1 clusters (i.e. broad cell types) differential expression to provide systematic names for cells based on their cell type classification and two genes (Figures 3 and 4; Methods). As several cell types contained readily identifiable and meaningful cell subsets, we utilized curation of literature-based markers to provide further guidance within each cell type (Blériot et al., 2020; Cherrier et al., 2018; Dutertre et al., 2019; Guilliams et al., 2018; Robinette and Colonna, 2016). For example, within Tier 1 T cells, we could identify T cells, NK cells and ILCs; within Tier 1 myeloid cells, monocytes, cDC1, cDC2, macrophages and pDCs; within Tier 1 B cells, germinal center, germinal center dark zone and light zone cells; within Tier 1 endothelial cells, arterioles, capillaries, lymphatics, mural cells and venules; and so forth for other cell types. To illustrate this process for one cluster, upon automated hierarchical tiered clustering of T cells, we identified a cluster that was Tier 0: pediCD, Tier 1: T cells, Tier 2: cytotoxic, Tier 3: IEL_FCER1G_NKG7_TYROBP_CD160_AREG. Upon inspection of CD3 genes (CD247, CD3D, etc.), TCR genes (TRAC, TRBC1, etc.), and NK cell genes (NCAM1, NCR1), it became readily apparent these cells were NK cells (Supplemental Figure 6).
We provide complete gene lists for cell types (1 vs. rest across all cells), subsets (1 v. rest across all cells), and states (1 vs. rest within-Tier 1 cell-type) in Supplemental Tables 5-10. To select marker genes for naming in a data driven manner we used 1 vs. rest within-cell-type differential expression (Supplemental Tables 7 and 10; Wilcoxon, Bonferroni adjusted p<0.05). To account for genes that might be highly expressed in just a few cells we ranked the marker genes by a score combining their significance, the fold change in expression, and fold change of percent gene positive cells in the subset versus the percent of gene positive cells outside the subset. The collected metrics were multiplied together to provide a single score by which the genes were ranked: (-log(sig+1) * avg_logFC * (pct.in / pct.out)). For most subsets we selected the top 2 of these marker genes. For T/NK/ILC cells and myeloid cells we occasionally chose a slightly lower ranking gene from the top 10 if it was well supported and recognized by the literature. Using this ranking system, we identified CCL3 and CD160 as two genes significantly enriched in one NK cluster (adj. p-value = 0, expression within-cluster >40% cells positive and in other Tier 1 T cells <6%). This resulted in a final name for this cluster of CD.NK.CCL3.CD160. We repeated this process for all FGID (183 end clusters) and pediCD (426 end clusters) within Tier 1 B cell, endothelial, epithelial, fibroblast, plasma cell, myeloid cell, mast cell, and T cell identified clusters, and provide systematically generated names for all (Supplemental Tables 11 and 12).
Using this analytical workflow, we present comprehensive cellular atlases of FGID (Figure 3) and pediCD (Figure 4), nominate cell states associated with disease severity and treatment outcomes in pediCD (Figure 5), and identify correspondence between the two (Figure 6). We focus on pediCD, and those cell states which distinguish between disease severity (NOA vs. PRs/FRs) and baseline gene expression differences in anti-TNF treatment response (FRs vs. PRs).
Comprehensive atlas of non-inflammatory FGID
From 13 FGID patients, we recovered 12 Tier 1 clusters which we display on a t-stochastic neighbor embedding (t-SNE) plot colored by cluster identity containing 99,488 cells (Figure 3a; Supplemental Figure 5b). These Tier 1 clusters represent the main cell types found in the lamina propria and remnant epithelium of an ileal biopsy. Inspecting each individual patient’s contribution to the t-SNE, we noted that all patients contributed to all Tier 1 clusters, though note that p044 was overrepresented with more terminally differentiated epithelial cells, likely from incomplete EDTA separation, and thus omit the p044 unique cell clusters from further analyses of composition (Figure 3b; Supplemental Figure 4d; Supplemental Table 13). We then proceeded to generate preliminary descriptive names based on inspection of each cluster within each tier, calculated a hierarchically-clustered dendrogram, and produced systematic names for each end cell state within each cell Tier 1 cell type (Figures 3c,d; Supplemental Figure 5; Supplemental Table 11; Methods). We present top marker genes for each main Tier 1 cluster/cell type, and note that we also provide complete gene lists calculated through Wilcoxon with Bonferroni adjusted p<0.05 for Tier 1 clusters/cell types, subsets, and end cell states (Figure 3e, Supplemental Tables 5-7). As patient identity did not factor into ITC stop conditions, we then calculated Simpson’s Index of Diversity for each of the clusters (Figure 3d; Supplemental Figure 5; Simpson’s Index >0.1). Low diversity clusters may still reflect important biology for individual patients, but we comment more extensively on clusters with high patient diversity.
Within B cells, we identified a strong division between non-cycling and cycling B cells, with those found in the cycling compartment readily identifiable by germinal center markers and further dark zone (AICDA) and light zone (CD83) genes resulting in FG.B/DZ.AICDA.IGKC and FG.B/LZ.CD74.CD83 clusters (Figure 3d) (Victora et al., 2010).
Within myeloid cells, we identified, and confirmed using extensive inspection of literature curated markers, cell subsets corresponding to monocytes (CD14, FCGR3A, FCN1, S100A8, S100A9, etc.), macrophages (CSF1R, MERTK, MAF, C1QA, etc.), cDC1 (CLEC9A, XCR1, BATF3), cDC2 (FCER1A, CLEC10A, CD1C, IRF4 etc.), and pDCs (IL3RA, LILRA4, IRF7) (Figure 3d;Supplemental Table 6) (Blériot et al., 2020; Dutertre et al., 2019; Guilliams et al., 2018). We highlight selected cell states including a migratory dendritic cell state (FG.DC.CCR7.FSCN1), extensive cDC2 heterogeneity relative to cDC1 heterogeneity, and a main distinction between macrophage clusters expressing C1Q*, MMP*, APOE, CD68 and PTGDS, (FG.Mac.C1QB.SEPP1, FG.Mac.APOE.PTGDS) and a series of clusters expressing various chemokines including CCL3, CXCL3,and CXCL8 (FG.Mac.CCL3.HES1, FG.Mac.CXCL3.CXCL8, FG.Mac.CXCL8.IL1B).
Within T cells, we followed a similar approach as utilized for Myeloid cells and identified principal cell subsets of T cells (joint expression of CD247, CD3D, CD3E, CD3G with TRAC, TRBC1, TRBC2, or TRGC1, TRGC2 and TRDC), and a combined cluster of cytotoxic cells (FG.T/NK/ILC.GNLY.TYROBP) likely including T cells, NK cells (lower expression of TCR- complex genes with NCAM1, NCR1 and TYROBP), and some ILCs (KIT, NCR2, RORC and low expression of CD3-complex genes) (Figure 3d, Supplemental Table 6) (Cherrier et al., 2018; Robinette and Colonna, 2016). We note that the numerical majority of CD4 T cells (FG.T/NK/ILC.MAF.RPS26) and CD8 T cells (FG.T/NK/ILC.CCR7.SELL) expressed SELL and CCR7 thus identifying them as naïve T cells. However, regardless of clusters expressing CD4 (FG.T.GZMK.GZMA) or CD8A/CD8B (FG.T.GZMK.IFNG, FG.T.GZMK.CRTAM, etc.), mostactivated T cells were characterized by expression of granzymes (Sallusto et al., 1999).
Within epithelial cells, most cells expressed high levels of OLFM4, identifying them as crypt- localized cells (Moor et al., 2018). We readily identified subsets of stem cells (LGR5), proliferating cells (TOP2A), goblet cells (SPINK4, ZG16, various MUCs), enteroendocrine cells (SCG3, ISL1), Paneth cells (ITLN2, PRSS2, LYZ), tuft cells (GNG13, SH2D6, TRPM5) and enterocytes (APOC3, APOA1, FABP6, etc.) (Figure 3d, Supplemental Table 6) (Barker et al., 2007; van der Flier and Clevers, 2009).
Within endothelial cells, we readily identified vascular and lymphatic endothelial cells (LYVE1, PROX1), with the vascular cells able to be further identified as capillaries (CA4) or venular endothelial cells (ACKR1, MADCAM1) (Brulois et al., 2020). We also identified a subset of cells (FG.Endth/Peri.FRZB.NOTCH3) expressing high levels of FRZB and NOTCH3, which, rather than being arterioles, likely represent arteriole-associated pericytes or smooth muscle cells given the absence of EFNB2, SOX17, BMX, and HEY1, and the presence of ACTA2 and MYL9, as cluster- defining genes (Figure 3d, Supplemental Table 6) (Travaglini et al., 2020; Whitsett et al., 2019).
We highlight that the FG.Endth/Ven.ACKR1.MADCAM1 cluster is characterized by expression of markers for postcapillary venules specialized in leukocyte recruitment (Thiriot et al., 2017).
Within fibroblasts, we identified principal subsets characterized by their structural roles (COL3A1, ADAMDEC1, FBLN1, LUM, etc.), myofibroblasts (MYH11, ACTA2, ACTG2, etc.), and organization of lymphoid cells (CCL19, CCL21 etc.) (Figure 3d, Supplemental Table 6) (Buechler et al., 2021; Davidson et al., 2021). Within the lymphoid-organizing fibroblasts, we draw attention to the FG.Fibro.C3.FDCSP, FG.Fibro.CCL19.C3, and FG.Fibro,CCL21.CCL19 subsets, which appear to have some characteristics of follicular dendritic cells and variable expression of CCL19/CCL21 (T-cell or migratory dendritic cell chemoattractants) and CXCL13 (B-cell chemoattractant) (Das et al., 2017; Heesters et al., 2013). We also identified a separate Tier 1 cluster of glial cells characterized by CRYAB and CLU. Intriguingly within the glial cell Tier 1 cluster, we then recovered a cell subset expressing FDCSP, CXCL13, and CR2, a key complement receptor which allows for complement-bound antigens to be recycled and presented by follicular dendritic cells (Das et al., 2017; Heesters et al., 2013). This highlights the power of iterative tiered clustering to recover discrete cell states that may, through the process of traditional clustering, not be fully resolved. Furthermore, the presence of these discrete cell clusters within larger parent cell clusters will alter the gene expression signatures of the higher-level cell types. For example, the FG.Glial/fDC.FDCSP.CXCL13 in the hierarchical cluster tree then assorts within the lymphoid-organizing stromal cells.
The mast cells recovered did not further sub-cluster in an automated fashion, and were largely marked by TPSB2 and TPSAB1 (>97%), with minimal CMA1 (<20%) expressing cells, suggesting they are largely classical MC-T cells in FGID intestine (Dwyer et al., 2021).
We identified four Tier 1 clusters for plasma cells, which are characterized by their strong expression of IGH* immunoglobulin heavy-chain genes together with either an IGK* (kappa light chain) or IGL* (lambda light chain) genes (Cyster and Allen, 2019; James et al., 2020). This resolved IgA IgK plasma cells, IgA IgL plasma cells, IgM plasma cells, and IgG plasma cells. Iterative tiered clustering identified further heterogeneity within all clusters of IgA and IgG plasma cells, though given the 3’-bias of this dataset, we note that a principled investigation of these clusters would ideally use 5’ sequencing with targeted VDJ amplification.
Together, our treatment-naïve cell atlas from 13 FGID patients captures 138 cell clusters from a non-inflammatory state of pediatric ileum which we annotated and named in a principled fashion.
Comprehensive atlas of pediCD
From the 124,054 cells profiled from 14 pediCD patients, we recovered 12 Tier 1 clusters which here we display on a t-SNE plot colored by cluster identity (Figure 4a). Distinct from FGID, Paneth cells clustered separately at Tier 1, while glial cells were now found within the fibroblast Tier 1 cluster. Inspecting each individual patient’s contribution to the t-SNE, we noted that all patients contributed to all Tier 1 clusters (Figure 4b; Supplemental Figure 4c, Supplemental Table 13). We then proceeded to generate preliminary descriptive names, independently from the FGID atlas, based on inspection of each cluster within each tier, calculate a hierarchically-clustered dendrogram, and provide systematic names for each end cell state within each cell type and subset (Figures 4c,d; Supplemental Figure 6; Supplemental Table 12; Methods). We present top marker genes for each main Tier 1 cluster/cell type, and note the complete gene lists calculated through Wilcoxon with Bonferroni adjusted p<0.05 available for Tier 1 clusters/cell types, subsets, and end cell states (Supplemental Tables 8-10). We then calculated Simpson’s Index of Diversity to denote the patient diversity present within each end cluster, identifying that most clusters in pediCD are conserved across multiple patients, and 16/305 clusters are single- patient clusters (Figure 4d; Supplemental Figure 5; Simpson’s Index >0.1). We note that in pediCD, relative to FGID, a higher fraction of cell clusters exhibited lower patient diversity.
Within B cells, we also identified a strong division between non-cycling and cycling B cells, with those found in the cycling compartment readily identifiable by germinal center markers and further dark zone (AICDA) and light zone (CD83) genes, as in FGID (Victora et al., 2010). Within cells expressing germinal centers markers, a highly-proliferative branch including clusters such as CD.B/LZ.CCL22.NPW, CD.B/GC.MKI67.RRM2, and CD.B/DZ.HIST1H1B.MKI67 emerged (Figure 4d). The CD.B/LZ.CCL22.NPW was characterized by high levels of MYC, which has been shown to allow for further rounds of germinal center affinity maturation (Dominguez-Sola et al., 2012). More numerous B cell clusters included ones characterized by expression of GPR183, such as CD.B.CD69.GPR183 (also expressing IGHG1) and CD.B.RPS29.RPS21. GPR183 has been shown to regulate the positioning of B cells in lymphoid tissues (Pereira et al., 2009).
Within myeloid cells, we identified, and confirmed using the same extensive inspection of literature curated markers as in FGID, cell subsets corresponding to monocytes (CD14, FCGR3A, FCN1, S100A8, S100A9, etc.), macrophages (CSF1R, MERTK, MAF, C1QA, etc.), cDC1 (CLEC9A, XCR1, BATF3), cDC2 (FCER1A, CLEC10A, CD1C, IRF4 etc.), and pDCs (IL3RA, LILRA4, IRF7) (Figure 4d; Supplemental Figure 6, Supplemental Table 9) (Blériot et al., 2020; Dutertre et al., 2019; Guilliams et al., 2018). We highlight selected cell states including a migratory dendritic cell state (CD.DC.CCR7.FSCN1), extensive cDC2 heterogeneity relative to cDC1 heterogeneity, and a main distinction between macrophages expressing C1Q*, MMP*, APOE, CD68 and PTGDS, (CD.Mac.APOE.PTGDS) and a series of clusters expressing various chemokines including CXCL2, CXCL3, and CXCL8 (CD.Mac.SEPP1.CXCL3, CD.Mono.CXCL3.FCN1, CD.Mono.CXCL10.TNF). Several of the end cell clusters initially clustering with macrophages also expressed monocyte markers (S100A8, S100A9), and expressed detectable, but lower levels of MERTK or AXL relative to bona fide macrophages, potentially indicative of the early stages of the trajectory of monocyte-to-macrophage differentiation (Blériot et al., 2020; Dutertre et al., 2019; Guilliams et al., 2018). We also noted a substantial expansion of clusters characterized by expression of CXCL9, CXCL10, and STAT1, canonical interferon-stimulated genes, observed in clusters such as CD.Mono/Mac.CXCL10.FCN1 (Ziegler et al., 2020, 2021). Moreover, we identified a cluster of inflammatory monocytes, CD.Mono.S100A8.S100A9, characterized by both CD14 and FCGR3A expression.
Within T cells, we followed a similar approach as utilized for FGID T cells and identified cell subsets of T cells (joint expression of CD247, CD3D, CD3E, CD3G with TRAC, TRBC1, TRBC2, or TRGC1, TRGC2 and TRDC), but in pediCD also identified several discrete clusters of NK cells (lower expression of TCR-complex genes with FCGR3A or NCAM1, NCR1 and TYROBP), and ILCs (KIT, NCR2, RORC and low expression of CD3-complex genes) (Figure 4d, Supplemental Figure 6, Supplemental Table 9) (Cherrier et al., 2018; Robinette and Colonna, 2016). We note that T cells and NK cells with a shared expression of GNLY, GZMB and other cytotoxic effector genes cluster almost indistinguishably from each other through iterative tiered clustering and visualization of the hierarchical tree, but that careful inspection of literature-curated markers helped resolve NK cells (CD.NK.CCL3.CD160; CD.NK.GNLY.GZMB) from CD8A/CD8B T cells (CD.T.GNLY.GZMH; CD.T.GNLY.CTSW) (Figure 4d, Supplemental Figure 6, Supplemental Table 9) (Cherrier et al., 2018; Robinette and Colonna, 2016). One of the specific challenges in distinguishing between T cells and NK cells in scRNA-seq data is that NK cells can express several CD3-complex genes, particularly CD247, as well as detectable aligned reads for TRDC or TRBC1 and TRBC2, and thus lower-resolution clustering approaches or datasets with lower cell numbers may miss these important distinctions (Björklund et al., 2016; Renoux et al., 2015). NK cell clusters also expressed the highest levels of TYROBP, which encodes DAP12 and mediates signaling downstream from many NK receptors (French et al., 2006; Lanier, 2001; Lanier et al., 1998). ILC clusters such as CD.ILC.LST1.AREG or CD.ILC.IL22.KIT were characterized by an apparent ILC3 phenotype, with expression of KIT, RORC and IL22, though they also expressed detectable transcripts of GATA3 in the same clusters (Cherrier et al., 2018;
Robinette and Colonna, 2016). We detected several clusters expressing CD4 and lacking CD8A/CD8B, including regulatory T cells (CD.T.TNFRSF18.FOXP3), and MAF- and CCR6- expressing helper T cells (CD.T.MAF.CTLA4). Perhaps most strikingly, we resolved multiple subsets of proliferating lymphocytes, including regulatory T cells (CD.T.MKI67.FOXP3), IFNG- expressing T cells (CD.T.MKI67.IFNG), and NK cells (CD.NK.MKI67.GZMA).
Within epithelial cells we identified substantial heterogeneity in CD. Most cells expressed high levels of OLFM4, identifying them as crypt-localized cells (Moor et al., 2018). We readily identified subsets of stem cells (LGR5), proliferating cells (TOP2A), goblet cells (SPINK4, ZG16, various MUCs), enteroendocrine cells (SCG3, ISL1), Paneth cells (ITLN2, PRSS2, LYZ), tuft cells (GNG13, SH2D6, TRPM5) and enterocytes (APOC3, APOA1, FABP6, etc.) (Figure 4d, Supplemental Table 9) (Barker et al., 2007; van der Flier and Clevers, 2009). Amongst several clusters characterized by CCL25 and OLFM4 expression, we identified a subset marked by LGR5 expression, characteristic of intestinal stem cells (CD.EpithStem.LINC00176.RPS4YA1) (Barker et al., 2007). We identified several subsets expressing CD24, indicative of crypt localization, with expression of REG1B (CD.Secretory.GSTA1.REG1B; CD.Secretory.REG1B.REG1A) (Moor et al., 2018). We also identified early enterocyte cluster CD.EC.ANPEP.DUOX2, characterized by FABP4 and ALDOB and expressing DUOX2 and MUC1. We resolved several clusters of enteroendocrine cells, including CD.Enteroendocrine.TFPI2.TPH1 and CD.Enteroendocrine.NEUROG3.MLN. We also found two clusters we labeled as M cells based on expression of SPIB (CD.Mcell.CCL23.SPIB; CD.MCell.CSRP2.SPIB) (Beumer et al., 2020; Mabbott et al., 2013). Paneth cells did not further sub-cluster despite forming an independent Tier 1 cluster (CD.Epith.Paneth). Most strikingly, we identified a diversity of goblet cells recovered across multiple patients including CD.Goblet.HES6.COLCA2 expressing REG4 and LGALS9, and CD.Goblet.TFF1.TPSG1 expressing TFF1 and ITLN1, amongst others. We also identified a cluster of Tuft cells: CD.EC.GNAT3.TRPM5.
Within endothelial cells, we also readily identified vascular and lymphatic endothelial cells (LYVE1, PROX1), with the vascular cells able to be further identified as capillaries (CA4) or venular endothelial cells (ACKR1, MADCAM1) (Figure 4d, Supplemental Table 9). We also identified a subset of cells (CD.Endth/Mural.HIGD1B.NDUFA4L2) expressing high levels of FRZB and NOTCH3, which, rather than being arterioles, likely represent arteriole-associated pericytes or smooth muscle cells given the absence of EFNB2, SOX17, BMX, and HEY1, and the presence of ACTA2 and MYL9, as cluster-defining genes. In pediCD, we also identified a cluster of arteriolar endothelial cells, CD.Endth/Art.SEMA3G.SSUH2, identified by expression of HEY1, EFNB2, and SOX17. We also highlight that the endothelial venules characterized by expression of markers for postcapillary venules specialized in leukocyte recruitment, such as CD.Endth/Ven.ADGRG6.ACKR1 and CD.Endth/Ven.POSTN.ACKR1, exhibited greater diversity than in FGID with multiple end cell clusters identified (Thiriot et al., 2017).
Within fibroblasts, we identified principal subsets characterized by their structural roles (COL3A1, ADAMDEC1, FBLN1, LUM, etc.), myofibroblasts (MYH11, ACTA2, ACTG2, etc.), and organization of lymphoid cells (CCL19, CCL21 etc.) (Figure 4d) (Buechler et al., 2021; Davidson et al., 2021). The principal hierarchy in fibroblasts in pediCD was between FRZB-, EDRNB- and F3-expressing subsets such as CD.Fibro.LY6H.PAPPA2 and CD.Fibro.AGT.F3, which were also enriched for CTGF and MMP1 expression, and ADAMDEC1-expressing fibroblasts, which were enriched for several chemokines such as CXCL12, and in some specific clusters CXCL6, CXCL1, CCL11, and other chemokines. Amongst three fibroblast subsets marked by C3 expression, we identified follicular dendritic cells (CD.Fibro/fDC.FCSP.CXCL13), along with fibroblasts expressing CCL21, CCL19, and the interferon-stimulated chemokines CXCL9 and CXCL10 (CD.Fibro.CCL21.CCL19; CD.Fibro.TNFSF11.CD24) (Das et al., 2017; Heesters et al., 2013). Distinct from the FGID atlas, within the pediCD atlas, glial cells clustered within fibroblasts, but were also marked by S100B, PLP1 and SPP1 expression.
The mast cells recovered in pediCD did further sub-cluster in an automated fashion, were largely marked by TPSB2 (>90%), with minimal CMA1 (<16%) expressing cells, suggesting they are largely classical MC-T cells in pediCD intestine (Figure 4d) (Dwyer et al., 2021). Intriguingly, some subsets (CD.Mstcl.AREG.ADCYAP1) were enriched for IL13-expression. We also detected a small cluster of proliferating mast cells from several patients (CD.Mstcl.CDK1.KIAA0101).
We also identified four Tier 1 clusters for plasma cells, which are characterized by their strong expression of IGH* immunoglobulin heavy-chain genes together with either a IGK* (kappa light chain) or IGL* (lambda light chain) genes. This resolved IgA IgK plasma cells, IgA IgL plasma cells, IgM plasma cells, and IgG plasma cells. Iterative tiered clustering identified further heterogeneity within all clusters of IgA plasma cells, though given the 3’-bias of this dataset, we note that a principled investigation of these clusters would ideally use 5’ sequencing with targeted VDJ amplification.
Together, our treatment-naïve cell atlas from 14 pediCD patients captures 305 cell clusters from an inflammatory state of the pediatric ileum suggesting an increase in the number and diversity of cell states present in the intestine during overt inflammatory disease.
Clinical variables and cellular variance that associates with pediCD severity
As this pediCD atlas was curated from treatment-naïve diagnostic samples, we were able to interrogate the data to test if overall shifts in cellular composition, specific cell states, and/or gene expression signatures underlie clinically-appreciated disease severity and treatment decisions (NOA vs. FR/PR), and those that are further associated with response to anti-TNF therapies (either FRs or PRs). Here, we leveraged the detailed clinical trajectories collected from all patients as the ultimate functional test: resolving how cellular composition and cell states predict disease and treatment outcomes.
In order to capture the overall principal axes of variation explaining changes in cellular composition, we calculated the fractional composition of all 305 end cell clusters in pediCD within its parent cell type (“per cell type”), or within all cells (“per total cells”), and performed a principal component analysis (PCA) over both of these sample x cell cluster frequency tables (Supplemental Table 14) (Mathew et al., 2020). We then used the PC1 (13.4% variation “per cell type” and 13.5% variation “per total cells”) and PC2 (12.7% variation “per cell type” and 11.8% variation “per total cells”) as numerical variables which we correlated with clinical metadata including categorical variables (patient ID, ethnicity, gender, etc.), ordinal variables (Terminal Ileum (TI)-macroscopic endoscopic evidence, TI-microscopic histopathology, Anti-TNF treatment within 90 days of diagnosis, and treatment decision/response coded as anti-TNF_NOA_FR_PR, etc.) and numerical variables (Height, BMI, CRP, ESR, PLT, PCDAI, wPCDAI, etc.) (Figure 5a, r by Spearman-rank). Amongst the clinical variables, we noted strong correlation between Initial wPCDAI and CRP (r=0.83), and moderate correlation between Initial wPCDAI and anti-TNF within 30 days (r=0.65) and anti-TNF_NOA_FR_PR (r=0.49). For PC1-“per total cells”, we identified strong correlations with anti-TNF treatment within 90 days (r=-0.76), and moderate correlation anti-TNF_NOA_FR_PR (r=-0.58; Methods). For PC1-“per cell type”, we identified strong correlation with anti-TNF within 90 days (r=-0.72), and moderate correlation with anti- TNF_NOA_FR_PR status (r=-0.63). PC1-“per cell type” was also strongly correlated with BMI and PC1-“per total cells” (r>-0.7). PC1-“per cell type” was weakly correlated with patient ID and gender (r<0.3).
In order to understand if any cell types were predominantly driving associations with clinical disease severity at initial presentation, we then further deconvoluted the overall PCA on 305 end clusters and performed PCA over each cell type’s fractional composition of end clusters individually (B cells: 33 clusters, Endothelial: 18 clusters, Epithelial: 68 clusters, Fibroblast 45 clusters, Myeloid: 54 clusters, T/NK/ILCs: 57 clusters), and correlated the first two PC’s (all PC1’s and PC2’s each accounted for >13% variance) with all of the clinical variables (Supplemental Table 14). The PCs derived from T/NK/ILC cells, myeloid cells, and epithelial cells were all moderately correlated with anti-TNF_NOA_FR_PR status (r>0.49) individually and had higher values than the other cell types; therefore, we asked if a PCA-based metric considering all three cell types would synergistically capture both disease severity and treatment response. When we calculated the PCA accounting for frequencies within each cell type of T/NK/ILC cells, myeloid cells, and epithelial cells, we found strong correlation for PC2 with both anti-TNF within 90 days (r=-0.83) and anti-TNF-NOA_FR_PR status (r=-0.87) (Figure 5a). This represented the two strongest correlations of any variable we tested with anti-TNF treatment and response status, outperforming wPCDAI.
Discrete cell cluster changes across the pediCD clinical severity and response spectrum
We next focused on further deconstructing the disease severity vector: identifying which cell clusters accounted for the most significant changes in abundance based on the relative frequency of an end cell cluster within its parent cell type. We focus on this form of analysis for scRNA-seq, similar to what is typically reported for flow cytometry, and further discuss approaches to enumerate total cell numbers which would be critical to identify changes in overall cellularity in the different pediCD treatment and response categories (Discussion) (Gomariz et al., 2018). We first performed a Fisher’s exact test between NOA vs. FR; NOA vs. PR; or FR vs. PR, and then performed a Mann-Whitney U test to highlight specific clusters and discuss results from clusters with high Simpson’s index of diversity (i.e. recovered from multiple patients) as shown for T/NK/ILCs and Myeloid Cell Types (Fig. 5b; Methods).
When comparing FR/PRs to NOAs, two subsets with significantly increased frequency in FR/PR patients amongst T cells, NK cells, and ILCs were identified. These were CD.NK.MKI67.GZMA and CD.T.MKI67.IL22 (Figure 5b,c; Supplemental Figure 7a; Supplemental Table 15). Beyond the strong proliferation signature, CD.NK.MKI67.GZMA were enriched for genes such as GNLY, CCL3, KLRD1, IL2RB and EOMES, and CD.T.MKI67.IL22 were enriched for IFNG, CCL20, IL22, IL26, CD40LG and ITGAE. This indicates that with increasing pediCD clinical severity, there is increasing local proliferation of cytotoxic NK cells, and proliferation of tissue-resident T cells with the capacity to express anti-microbial and tissue-reparative cytokines, and molecules to interface with antigen-presenting cells and B cells. Alongside this increase, there was a significant decrease amongst fibroblasts of CD.Fibro.CCL19.IRF7, and amongst epithelial cells of CD.EC.SLC28A2.GSTA2 clusters in the FR/PR patients compared to NOA (Supplemental Figure 7a). The CD.Fibro.CCL19.IRF7 were enriched for CCL19, CCL11, CXCL1, CCL2, and very specifically for OAS1 and IRF7. The CD.EC.SLC28A2.GSTA2 cluster was characterized by its two namesake markers, involved in purine transport and glutathione metabolism (Moor et al., 2018).
We also detected significant decreases in FRs relative to NOAs in certain cell types, particularly within Epithelial cells including CD.EpithStem.LINC00176.RPS4Y1, CD.MCell.CSRP2.SPIB, CD.EC.FABP6.PLCG2, and CD.EC.FABP1.ADIRF (Supplemental Figure 7b; Supplemental Table 15). We note that the relative decrease in M cells is in stark contrast to the “ectopic” M-like cells that were detected in adult ulcerative colitis (Smillie et al., 2019).
We next focused on those cell subsets that were significantly changed only between PRs and NOAs (Figure 5c; Supplemental Figure 7c; Supplemental Table 12). Here we note several more distinct clusters within the lymphocyte cell type, including increases in the PR patients compared to NOA patients of CD.T.MKI67.IFNG, CD.T.MKI67.FOXP3, CD.T.GNLY.CSF2, and CD.NK.GNLY.FCER1G. The two MKI67 clusters again highlighted an increase in proliferative cells, specifically cells enriched for IFNG, GNLY, HOPX, ITGAE and IL26 (CD.T.MKI67.IFNG), and IL2RA, BATF, CTLA4, TNFRSF1B, CXCR3, and FOXP3 (CD.T.MKI67.FOXP3), the latter of which may be indicative of proliferating regulatory T cells. The two GNLY clusters emphasized cytotoxicity, specifically cell clusters were both enriched for GNLY, GZMB, GZMA, PRF1 and more specifically for IFNG, CXCR6, and CSF2 (CD.T.GNLY.CSF2), or AREG, TYROBP, and KLRF1 (CD.NK.GNLY.FCER1G). Amongst myeloid cells, there was an increase in CD.Mac.CXCL3.APOC1, CD.Mono/Mac.CXCL10.FCN1, and CD.Mono.FCN1.S100A4 in PR versus NOA. The CD.Mac.CXCL3.APOC1 cluster was enriched for a variety of chemokines including CCL3, CCL4, CXCL3, CXCL2, CXCL1, CCL20, and CCL8. It was also enriched for TNF and IL1B. The CD.Mono/Mac.CXCL10.FCN1 cluster was enriched for CXCL9, CXCL10, CXCL11, GBP1, GBP2, GBP4, GBP5, suggestive of activation by IFN, and more specifically Type II IFNγ, based on the GBP gene cluster (Ziegler et al., 2020). CD.Mono.FCN1.S100A4 was characterized by S100A4, S100A6, and FCN1 expression. These two immune clusters were paralleled by increases in certain clusters within endothelial cells in PR versus NOA patients (CD.Endth/Ven.LAMP3.LIPG) and epithelial cells (CD.Goblet.TFF1.TPSG1).
Several clusters of cells were decreased in PR versus NOA, including CD.T.LAG3.BATF, CD.T.IFI44L.PTGER4, and CD.T.IFI6.IRF7 amongst lymphocytes (Supplemental Figure 7c) (Roncarolo et al., 2018). Amongst myeloid cells, CD.cDC2.CLEC10A.FCGR2B were decreased, and amongst fibroblasts CD.Fibro.IFI6.IFI44L were decreased. In epithelial cells, CD.Tuft.GNAT3.TRPM5 cells were decreased. Alongside the decrease in Tuft cells amongst epithelial cells, two more clusters closely related to the aforementioned CD.EC.GSTA2.SLC28A3 cluster, also marked by GSTA2 expression, were significantly decreased (CD.EC.GSTA2.CES3, and CD.EC.GSTA2.TMPRSS15).
We assessed the compositional differences between FRs and PRs and only identified one cell cluster which was significantly increased in PRs: CD.B/DZ.HIST1H1B.MKI67, which are proliferating dark zone B cells. CD.T.EGR1.TNF T cells were significantly decreased in PR versus FR (Supplemental Figure 7d; Supplemental Table 15). These data suggest that at the earlier stages of pediCD, there are a series of changes in multiple cell types that encapsulate the distinctions between NOA and FR or PR patients.
Collective cell vectors delineating pediCD clinical severity and response spectrum
Together, the significant changes in cell composition between the clinically-defined patient groups were particularly notable within proliferating T cells, cytotoxic NK cells, monocytes/macrophages, and epithelial cells that could be combined to calculate a numerical variable for “PC2- T/NK/ILC/Myeloid/Epithelial” which correlated strongly with both the clinical presentation leading to a decision to treat or not with anti-TNF therapies and to the distinction between anti- TNF_NOA_FR_PR status (Figure 5a). Some of the top negative loadings for PC2, enriched in PRs and FRs compared to NOAs, included both helper and cytotoxic T cell clusters (CD.T.MAF.CTLA4; CD.T.CCL20.RORA; CD.T.GNLY.CSF2), NK cell clusters (CD.NK.GNLY.FCER1G; CD.NK.GNLY.IFNG; CD.NK.GNLY.GZMB); proliferating T cells and NK cells (CD.T.MKI67.FOXP3; CD.T.MKI67.IFNG; CD.T.MKI67.IL22; CD.NK.MKI67.GZMA), and monocytes, macrophages, DCs and pDCs (CD.cDC2.CD1C.AREG; CD.Mac.C1QB.CD14; CD.Mono.CXCL3.FCN1; CD.pDC.IRF7.IL3RA; CD.Mono/Mac.CXCL10.FCN1) (Figure 5d; Supplemental Table 16). The top positive loadings for PC2 encompassing the NOA-enriched clusters included several epithelial cell subsets such as Tuft cells (CD.EC.GNAT3.TRPM5) and those with specialized metabolic features including retinol-binding, bile binding and export, fatty- acid and cholesterol metabolism, fructose and glucose metabolism, starch metabolism glutathione metabolism, sulfation, and the terminal degradation of peptides (CD.EC.RBP2.CYP3A4; CD.EC.FABP6.PLCG2; CD.EC.FABP1.ADIRF; CD.EC.GSTA2.TMPRSS15) (Figure 5d; Supplemental Table 16) (Lampen et al., 2000; Mårtensson et al., 1990; Martínez-Augustin and de Medina, 2008; Sullivan et al., 2021; Wen and Rawls, 2020). Furthermore, clusters also enriched in NOA PC2 such as CD.EC.ADH1C.RPS4Y1 and CD.EC.ADH1C.GSTA1, clustered in a separate branch together and expressed several enzymes responsible for steroid hormone and dopamine biosynthesis (Figure 4d, 5d) (Cima et al., 2004; Magro et al., 2002). Importantly for the regenerative potential of the epithelium, CD.EpithStem.LINC00176.RPS4Y1 were also defining of the PC2-positive NOA direction. This suggests that multiple collective changes in the composition and/or state of T/NK/ILC cells, myeloid cells, and epithelial cells at diagnosis may help stratify pediCD patients not only by clinically appreciated disease severity but also may influence anti-TNF responsiveness.
To determine whether the disease severity gene signature that we discovered in the PREDICT study can be found in other cohorts, we selected the top 92 markers of the 25 cell states associated with disease severity and treatment outcomes (Supplemental Table 17) and performed a gene-set enrichment analysis (GSEA) (Supplemental Figure 7e). In the two independent treatment-naïve cohorts that we analyzed (the pediatric RISK cohort, n = 69, and the adult E−MTAB−7604 cohort, n = 43) our gene signature was significantly enriched in illeal, but not colonic, mucosal biopsies from patients who did not respond to anti-TNF therapy compared to those who responded (Kugathasan et al., 2017; Verstockt et al., 2019). Thus, these genes (TNFAIP6, GZMB, S100A8, CSF2, CLEC4E, S100A9, IL1RN, FCGR1A, CLIC3, CD14, PLA2G7, FAM26F, IL3RA, NKG7, IL32, CCL3, OLR1, LILRA4, APOC1, MYBL2) informed by the PC2 cellular vector, and showing best ranks in both cohorts, could potentially serve as predictive markers of anti-TNF therapy outcome in newly diagnosed patients.
Random forest classifier applied to cellular taxonomies allows for identification of correspondence between FGID and pediCD
As we had generated independent cellular atlases for FGID and pediCD to mitigate “discovery” of hybrid cell clusters that may not represent bona fide biological cell states, we next sought to match and identify correspondence between pediCD and FGID cell subsets. As our study progressed, several analytical methods to integrate scRNA-seq emerged which utilize distinct principles to either predict cell type names given reference gene lists or directly integrate two datasets that were collected from distinct perturbations, tissues, or even species (Hao et al., 2021; Hie et al., 2019; Korsunsky et al., 2019; Pliner et al., 2019). However, many of these methods are benchmarked on broad cell type or subset integration, and thus their applicability for fine cell states, as in the end clusters we identify here, remains unknown. Thus, we employed a random forest (RF) classifier-based approach, which has recently also been applied successfully in work to identify correspondence in fine sub-clusters in the mammalian retina (Peng et al., 2019; Shekhar et al., 2016). Specifically, we employed paired RF models (one trained on FGID the other trained on pediCD) to obtain cross dataset predictions per cell. We trained these models in scikit- learn with 5-fold cross validation and params: min_samples_leaf=1, oob_score=True, criterion=“gini”, max_depth=200, n_estimators=700, max_features=“sqrt” (Pedregosa et al., 2011). The training set (but not the test set) was sampled with replacement such that all classes contained as many samples as the maximum proportioned class. This up-sampling procedure provided the largest gain to our test accuracy, sensitivity, and specificity scores, increasing accuracy ∼10-15% across each cell type. With our final model, we attained cross-dataset predictions (pediCD to FGID & FGID to pediCD) for each cell, giving a probability score of a cell belonging to a subset in the other disease condition (Supplemental Figure 5c; Methods). We applied these models to each cell type individually, and here focus our discussion on Myeloid cells and T/NK/ILC cells as two cell types prominently associated with pediCD disease severity (Figure 6, Supplemental Figure 8). As newer methods are developed, more refined integration is likely to be possible.
Comparing across myeloid cells between pediCD and FGID, we could identify strong correspondence of specific cell subsets such as cDC1s or pDCs (Figure 6a). We also identified strong correspondence between several cDC2 clusters. We identified a gradient of monocyte and macrophage correspondence of 31 clusters in pediCD to 2 FGID clusters, likely reflective of inflammatory monocyte to macrophage differentiation in pediCD (Blériot et al., 2020; Dutertre et al., 2019; Guilliams et al., 2018). Some clusters characterized by STAT1 activation did not demonstrate significant correspondence to any FGID cluster. We also generally noted substantially increased cluster diversity in pediCD end clusters relative to their correspondence in FGID. This emerged from more patient-specific clusters found in pediCD, and an overall decrease in Simpson’s index of diversity considering the patient composition of each end clusters (Figure 6b).
For T/NK/ILC cells, we identified more discrete patterns relative to Myeloid cells based on comparison of the RF result. Within the two FGID cytotoxic T cell clusters, we identified correspondence by 18 pediCD clusters, representing ILC3s, and cytotoxic NK cells and T cells (Supplemental Figure 8). The cluster of naïve T cells in FGID had correspondence with the majority of pediCD non-cytotoxic T cell clusters, illustrating a substantial activation and specialization to several discrete T cell states that were specific for pediCD.
Importantly, when we jointly clustered macrophages from FGID and pediCD together, we identified that several of the original end clusters identified through ARBOL in pediCD were divided across the UMAP: being split into distinct clusters of cells (Figure 6c-f). This reinforces the need to quantitatively approach the choice of clustering parameters and number of iterations used. We also employed the STACAS package to integrate T cells between FGID, confirming the higher percentage of proliferating T cells in pediCD patients compared to FGID (Supplemental Figure 8b) (Andreatta and Carmona, 2021). However, as compared to ITC, the integration approach on T cells resulted in lower heterogeneity, thus masking important differences revealed by our newly developed ARBOL approach. Our detailed analysis of the correspondence between the FGID and pediCD atlases highlights the challenge of multiple cell type, subset, and state vectors which are simultaneously accounted for by clustering over a set of highly variable genes jointly derived from multiple cells and disease conditions. In an atlas composed of multiple clinical entities, highly homogenous cell clusters may appear dispersed across a space based on the other cells that they are being compared against and the parameters used for clustering. This underscores the power in the disease-specific clustering heuristic employed with this dataset, which revealed principled end clusters in pediCD which could then be related back to a non- inflammatory reference atlas while still maintaining fine granularity.
The phenotypic space of macrophages and T cells is significantly different across FGID and NOA/FR/PR pediCD
Based on their over-representation within clusters showing more significant differences within pediCD, we next focused on performing an analysis over a shared gene expression space of FGID and pediCD of the monocytes/macrophages (Figure 7) and T/NK/ILCs (Figure 8). We utilized a list of genes that were cell-type defining genes in either FGID or pediCD, but removed genes that were differentially expressed between FGID and pediCD, to allow for cell type/subset to drive placement on the UMAP (Methods) (Ordovas-Montanes et al., 2018). This allowed us to place our fine-grained clusters within a joint gene-expression space related to underlying cell types in FGID and pediCD, and we also contextualize our findings with an orthogonal integration approach applied to the T/NK/ILCs (Supplemental Figure 8) (Andreatta and Carmona, 2021).
Within FGID monocytes/macrophages, we identified that the majority of clusters occupied the periphery of the UMAP space, including chemokine-expressing clusters (FG.Mac.CCL3.HES1; FG.Mac.CXCL8.IL1B) and metabolic clusters (FG.Mac.APOE.PTGDS) (Figure 7a,c). This was in stark contrast to the pediCD monocytes/macrophages, where we identified that now many of the clusters occupied the central region of the UMAP (Figure 7a,c). We highlight several of the clusters that are significantly different in frequency between the pediCD groups which were found in this central region (Figure 7b). Notably, NOA, FR and PR pediCD clusters had significantly different distributions within this space (Figure 7c-e). There was a progressive increase in the Hellinger Distance (Figure legend 7e: a measure of the distance between two distributions) of the distributions from FGID to NOA, NOA to FR, and NOA to PR. PRs had the most significantly different distribution relative to FGID. This was in large part driven by cells from both FRs and PRs that inhabit the central region, together with a loss of density in the chemokine-expressing macrophages from FGID through to PRs. Of note, the frequency of TNF+ macrophages and its expression level of TNF was significantly increased in FRs relative to all other groups (Figure 7f; permutation test shuffling anti-TNF response variable, FR had significantly more TNF+ cells than expected by chance with p approximating 0). Despite the fine-grained tiered clustering approach used, the majority of clusters had high Simpson’s Diversity indices representing cell states found in several patients (Figure 7g).
Within T/NK/ILCs, we identified that FGID cells were more uniformly mixed with the pediCD cells relative to monocytes/macrophages (Figure 8a). FGID cells occupied naïve and quiescent states, showed some signatures of activation, and also specialization towards helper and cytotoxic states (Figure 8a,c) (Sallusto et al., 1999). The most notable changes in the Hellinger Distance distribution occurred between FGID and FR, rather than between FGID and PR as may have been expected (Figure 8d,e). Similar to the monocytes/macrophages, the main area which gained density with increased disease severity was the central region: characterized by proliferation of several clusters increased in frequency within FRs and PRs to anti-TNF including T cells such as CD.T.MKI67.FOXP3 and CD.T.MKI67.IL22 (Figure 8b-d). Proliferation- associated gene signatures were also seen in extreme corners driven by CD.NK.MKI67.GZMA, and significantly increased from FGID through to PR (Figure 8b,f). Intriguingly the only T cell clusters in pediCD with gini coefficient <0.1 are from pediCD patients (Figure 8g). Taken together with several cell clusters associated with pediCD proliferation overlapping with existing areas found in FGID, this indicates activation and more extreme diversification from existing T cell states driving the T cell clustering that defined pediCD. This is distinct from the recruitment and failed differentiation towards homeostatic cell states between FGID and pediCD that we discovered in monocytes/macrophage clusters. Both joint protections confirm and extend our RF predictions.
In order to provide tissue-scale context and understand the impact on other cell types for these anti-TNF response associated lymphocyte states, we assessed the relationship of these proliferating T and NK cell clusters with epithelial and myeloid cells (Supplemental Figure 8c). We found that CD.T.MKI67.FOXP3 were strongly positively associated with CD.Goblet.RETNLB.ITLN1 and CD.EC.NUPR1.LCN2 secretory cell states. Conversely, we found that CD.NK.MKI67.GZMA were significantly negatively correlated with CD.EC.ADH1C.EDN1, CD.Mcell.CSRP2.SPIB, CD.EC.ADH1C.RPS4Y1, CD.EC.GSTA2.TMPRSS15, and CD.EpithStem.LINC00176.RPS4Y1. This indicates the potential for cytotoxic activity of the proliferating NK cells towards more homeostatic cell states of epithelial cells, and critically of intestinal stem cells, with increased disease severity.
Discussion
This study addresses a critical unmet need in the fields of IBD and systems immunology: the creation of an atlas of newly-diagnosed and untreated diseased tissue, coupled with detailed clinical follow-up to link diagnostic cell types and states with disease trajectory. This is especially true for GI disease and others like it which afflict tissues that are not easily accessible without operative or endoscopic intervention, and where tissue-specific immune pathology dictates disease severity and trajectory. Likewise, cross-sectional studies, as have been the norm for most previous scRNA-seq studies of IBD, are not able to overlay disease trajectory and treatment response onto the topography of a complex multi-cellular atlas, thus limiting the mechanistic and predictive inferences that can be drawn from the generated atlas (Corridoni et al., 2020a, 2020b; Drokhlyansky et al., 2019; Elmentaite et al., 2020; Huang et al., 2019; Kinchen et al., 2018; Martin et al., 2019; Parikh et al., 2019; Smillie et al., 2019). Furthermore, mouse models of CD, and of IBD more broadly, may not be the most appropriate models for understanding treatment resistance in pediCD (Neurath, 2019). To surmount these limitations, we created a prospective clinical study, and enrolled patients requiring a diagnostic biopsy for possible IBD, prior to diagnosis. This allowed us to capture a tremendously valuable control group: those patients with FGID, who experience GI symptoms without evidence of GI inflammation or autoimmunity. These uninflamed controls served as a critical comparator to contextualize the evidence of immune pathology that we observed in patients with pediCD. With these detailed clinical phenotypes as our foundation, we developed an automated ITC algorithm for scRNA-seq data, ARBOL, which defines a vector of T cells, myeloid cells and epithelial cells that cleanly stratifies both Crohn’s disease severity and response to treatment.
The availability of comprehensive clinical, flow cytometric and scRNA-seq data from patients with pediCD and from uninflamed FGID controls created an unprecedented opportunity for comparative atlas creation. We took the opportunity to develop a methodical, unbiased, approach to cell state discovery, ARBOL, released alongside our manuscript (https://github.com/jo-m-lab/ARBOL). ARBOL iteratively explores axes of variation in scRNA-seq data by clustering and subclustering until variation between cells becomes noise. The philosophy of ARBOL is that every axis of variation could be biologically meaningful so each should be explored, and that axes of variation are relative to the comparative outgroup, meaning that similar cell states may arise at distinct tiers. Once every possibility is explored, curation and a statistical interrogation of resolution are used to collapse clusters into the elemental transcriptomes of the dataset. ARBOL inherently builds a tree of subclustering events. As data is separated by major axes of variation in each subset, later rounds capture less pronounced variables. This comes with some caveats: variation shared by all cell types (for example, cell cycle stage) can make up one of the major axes of variation in the first round of clustering. Cell types can split up at the beginning, so the same splitting of B and T cells, for example, may happen further down in separate branches. The resulting tree of clustering events (Supplemental Figure 5a) is therefore neither indicative of true distances between end clusters nor a tree of unique groupings. We address this problem by calculation of a binary tree of manually and computationally curated end clusters. Using a standardized method of end cluster naming, which we describe in ARBOL’s tutorial (https://jo-m-lab.github.io/ARBOL/ARBOLtutorial.html), we found the resulting binary tree assorted end clusters into appreciated cell types and subsets (Figures 3 and 4), and also reveals further previously unappreciated granularity that will serve as the foundation for future work into the cellular composition of the gastrointestinal tract.
One of the primary remaining challenges going forward will be to identify which clusters are truly patient-unique, or are simply patient-unique at the cohort size to which we are currently limited to. We calculate a diversity metric for each end cluster to highlight those which are largely conserved between patients, and provide complete cluster-defining gene lists for both FGID and pediCD at three levels of clustering. We also provide links to our data visualization portal to enable cross- atlas comparisons: https://singlecell.broadinstitute.org/single_cell/study/SCP1422/predict-2021-paper-fgid and https://singlecell.broadinstitute.org/single_cell/study/SCP1423/predict-2021-paper-cd.
One of the chief advantages of enrolling pediCD patients at diagnosis, and prior to any therapeutic intervention, was that we were able to relate their diagnostic immune landscape with disease trajectory. In the pediCD group, we identified 3 clinical subgroups. The first distinction was made by treating physicians, and classified patients with milder versus more severe clinical disease characteristics at diagnosis. The milder patients were not placed on anti-TNF agents (NOA), while the more severe patients were treated with monoclonal antibodies that neutralize TNF including infliximab and adalimumab. The second distinction between patient groups could not be made at diagnosis, but rather, was based on clinical and biochemical response to anti-TNF agents. Thus, of those patients treated with anti-TNF therapeutics, some were FRs, and some were only PRs, with PRs requiring anti-TNF dose modifications and the addition of other agents, and with ongoing, uncontrolled disease signs and symptoms. While differences in ant-TNF pharmacokinetics have been partially implicated in the need to dose-escalate anti-TNF agents in some pediCD patients, our study identifies foundational differences in the immune state at diagnosis in PR patients compared to the NOA and FR subgroups (D’Haens and Deventer, 2021; Ordás et al., 2012; Yarur et al., 2016). While standard flow cytometry was not able to distinguish the immune phenotype of NOA versus treated patients, scRNA-seq identified significant differences. The contextualization of our scRNA-seq derived predictive cellular vector with two other treatment-naïve bulk RNA-seq studies of Crohn’s disease, underscores the broader applicability of our findings (Kugathasan et al., 2017; Verstockt et al., 2019).
We noted significant cell state changes at diagnosis underlying clinically-appreciated disease severity that impacted the clinical decision to treat or not to treat with anti-TNF agents. These occurred within multiple clusters of T, NK, fibroblast, epithelial, monocyte, macrophage, and dendritic cells. For anti-TNF response, very few clusters exhibited significantly differential composition between FR and PR individuals. This suggests that multiple collective changes in several cell types may conspire to lead to differences in treatment outcomes. Indeed, when we jointly considered a cellular principal component vector comprising epithelial cells, T/NK/ILCs, and myeloid cells, we identified several clusters that together could delineate the full spectrum of NOA, FR, and PR. This cellular vector indicated that multiple T cell subsets, NK cells, monocytes, macrophages, and epithelial cells were altered in disease. Intriguingly, by finely clustering each cell type, we found that proliferating T and NK cells do not represent a uniform population, but rather reflect functional specialization capturing FOXP3, IFNG, IL22, and GZMA as cluster- defining genes. Enriched in NOA individuals were epithelial cells involved in chemosensation (Tuft.GNAT3.TRPM5) and absorption of metabolites (EC.GSTA3.TMPRSS15), as well as stem cells (Banerjee et al., 2020; von Moltke et al., 2016; Sido et al., 1998). That pediCD severity is not uniquely predicted by a singular cell subset or gene is reflective of the complex genetics and environmental factors that have been implicated, along with the rich literature that has found significant changes by histology, flow cytometry, or mass cytometry in CD relative to control tissue(Buisine et al., 2001; Leeb et al., 2003; Leonard et al., 1995; Lilja et al., 2000; Mitsialis et al., 2020; Müller et al., 1998; Souza et al., 1999; Stappenbeck and McGovern, 2017; Takayama et al., 2010). However, with the PREDICT study, we have discovered precisely which changes in CD cellular composition come together to form a predictive vector for both disease severity and treatment response. Intriguingly, the quantification and visualization of this response vector predicted a later escalation of one of our patients (p022; who appeared as an outlier FR in Figure 5d) from FR to PR, which occurred after our database lock in December of 2020.
When considering the relationships between T cells and NK cells along with epithelial cells, we captured that proliferating cytotoxic NK cell subsets like CD.NK.MKI67.GZMA were significantly negatively correlated with critical metabolic and progenitor epithelial cell subsets in pediCD. Conversely, proliferating regulatory CD.T.MKI67.FOXP3 were positively associated with secretory epithelial cells in pediCD, but did not appear related to the decrease in metabolic or progenitor cells. How T cell-derived cytokines impact intestinal regeneration and differentiation has recently been the focus of several studies, but the relationship of these fine-grained T cell subsets with specific epithelial cell states observed in the human intestine remained unknown (Biton et al., 2018; Lindemans et al., 2015). Our work suggests that in the context of the ileum impacted by CD that there is further complexity to understand, particularly as it pertains to cytotoxic NK cells and T cells and their impact on epithelial cell homeostasis and regeneration.
The mapping of these disease severity-associated cell networks identifies a host of new potential therapeutic targets for pediCD, for many of which there are clinical-stage therapeutics that could be investigated. These include CD40L-blocking antibodies, IL-22 agonists, and targeted anti- proliferation agents (Betts et al., 2017; Lindemans et al., 2015; Miura et al., 2021; Ramanujam et al., 2020; Sootome et al., 2020).. A case can also be built for targeting inflammatory cytokines such as IL-1, and for interrogating agents aimed at mucosal healing including new anti-GM-CSF antibodies, given that several prominent cell subsets marked by CSF2 were enriched in the PR patients (Ai et al., 2021; Aschenbrenner et al., 2021; Castro-Dopico et al., 2020; Mehta et al., 2020; Mitsialis et al., 2020; Muro and Mrowiec, 2015). This atlas therefore provides a rigorous evidence-based rationale for proposing new therapeutic interventions, as well as a mechanism for interrogating the impact of new agents on the longitudinal immune landscape of pediCD patients.
Recent work on COVID-19 has also highlighted the challenges faced by systems approaches to capture baseline cell states that predict disease trajectory (Kaczorowski et al., 2017; Lucas et al., 2020; Mathew et al., 2020; Schulte-Schrepping et al., 2020; Su et al., 2020). In a disease of known infectious etiology with SARS-CoV-2, monocytes, macrophages, granulocytes, T cells, B cells, antibodies, and interferon state have all independently been associated with disease outcomes.
Few studies have considered how multiple collective changes at baseline may influence outcome, yet are likely more reflective of the disease. With the complex and protracted presentation of a multifactorial disease like Crohn’s disease, we posit that multiple concerted effects are required to dictate both the severity (NOA vs FR/PR) and the treatment-response (FR vs PR). Future work will aim to consider which cell subsets are recovered during mucosal healing, and how closely the treated state reflects each individual patient’s baseline presentation.
Limitations of the Study
Our study has several limitations that are important to consider. Here we are limited by the cohort size to the strict inclusion/exclusion criteria used, follow-up period, and the ability to capture untreated patients early during the diagnosis process. As a result, we are underpowered to directly correlate in our cohort genetic and environmental influences such as the enteric microbiome, infections, and dietary factors (Dovrolis et al., 2020; Rajca et al., 2014; Yilmaz et al., 2019). We also are limited in our ability to capture spatial information and context from our cohort. It will also be critical to understand how disease progresses and the reason for persistent NOA, FR, or PR patient states in some individuals which will require retention in the trial and repeat possibility of for-cause biopsy procurement. Furthermore, our cohort is representative of the demographics of FGID and IBD in Seattle, WA. It will be essential to understand how both FGID and IBD may present differently across the world, and multiple cohorts will be required to understand location-unique and generalizable findings. We also recognized the number of pediCD patients within this cohort limits our ability to subdivide CD phenotype into location and behavior of disease (via the Paris/Montreal classification systems) (Levine et al., 2011). Based on this study, we will seek to expand our efforts to validate and extend our findings.
Data Availability
Data and Code Availability Single-cell RNA-seq data can be visualized via the Single Cell Portal links for each atlas of FGID https://singlecell.broadinstitute.org/single_cell/study/SCP1422/predict-2021-paper-fgid and pediCD https://singlecell.broadinstitute.org/single_cell/study/SCP1423/predict-2021-paper-cd. The cell-by-gene matrices will be available with the peer-reviewed version of this manuscript. The raw human FASTQ files will be available from the Broad controlled access repository DUOS with the peer-reviewed version of this manuscript. Please contact the authors All original code has been assembled as the ARBOL package and deposited at GitHub and is publicly available together with this manuscript at: https://github.com/jo-m-lab/ARBOL and https://jo-m-lab.github.io/ARBOL/ARBOLtutorial.html.
https://github.com/jo-m-lab/ARBOL
https://singlecell.broadinstitute.org/single_cell/study/SCP1422/predict-2021-paper-fgid
https://singlecell.broadinstitute.org/single_cell/study/SCP1423/predict-2021-paper-cd
AUTHOR CONTRIBUTIONS
Conceptualization: G.W., A.Y., A.K.S., L.S.K., H.B.Z., J.O.-M
Methodology: B.A.D., K.K., A.Y., L.A., G.W., D.L.S., D.L., V.N., A.K.S., L.S.K., V.T., H.B.Z., J.O.- M
Formal analysis: B.A.D., K.K., A.Y., V.N., X.D., V.T., H.B.Z., M.F., M.S., J.S.C., J.O.-M
Investigation: B.A.D., K.K., L.A., K.C., R.F., P.K., A.A., L.C., C.M., A.Y., G.W., D.L.S., D.L., M.F.,
V.N., L.S.K., X.D., G.D., B.B., K.B., V.T., L.V.C., V.M., H.B.Z., S.B.S., J.O.-M Resources: B.A.D., K.K., A.K.S. L.S.K., H.B.Z., J.O.-M
Data Curation: B.A.D., K.K., A.Y., P.K., M.F., V.N., V.T., H.B.Z., J.O.-M
Writing – Original Draft: B.A.D., K.K., V.N., A.K.S., L.S.K., H.B.Z., M.S., J.S.C., J.O.-M
Writing – Review & Editing: B.A.D., K.K., L.A., G.W., D.L.S., A.Y., A.A., P.K., K.C., R.F., B.B.,
K.B., L.C., C.M., D.L., M.F., V.N., G.D., A.K.S., L.S.K., V.M., V.T., L.V.C., H.B.Z., M.S., J.S.C., F.T., S.B.S., J.O.-M
Visualization: B.A.D., K.K., V.N., A.K.S. H.B.Z., M.S., J.O.-M
Supervision: A.K.S., L.S.K., J.O.-M
Project Administration: A.K.S., L.S.K., B.B., H.B.Z., F.T., J.O.-M Funding Acquisition: A.K.S., L.S.K., J.O.-M
DECLARATION OF INTERESTS
J.O.-M. reports compensation for consulting services with Cellarity and Hovione. A.K.S. reports compensation for consulting and/or SAB membership from Merck, Honeycomb Biotechnologies, Cellarity, Repertoire Immune Medicines, Hovione, Third Rock Ventures, Ochre Bio, FL82, and Dahlia Biosciences unrelated to this work. A.K.S. has received research support from Merck, Novartis, Leo Pharma, Janssen, the Bill and Melinda Gates Foundation, the Moore Foundation, the Pew-Stewart Trust, Foundation MIT, the Chan Zuckerberg Initiative, Novo Nordisk and the FDA unrelated to this work. Dr. Kean is on the scientific advisory board for HiFiBio and Mammoth Biosciences. She reports research funding from Kymab Limited, Magenta Therapeutics, BlueBird Bio, and Regeneron Pharmaceuticals. She reports consulting fees from Equillium, FortySeven Inc, Novartis Inc, EMD Serono, Gilead Sciences, Vertex Pharmaceuticals, and Takeda Pharmaceuticals. Dr. Kean reports grants and personal fees from Bristol Myers Squibb that are managed under an agreement with Harvard Medical School. L.A. is a consultant for Takeda Pharmaceuticals. G.W. reports Research funding from Abbvie, Jansen, Takeda, Allakos; SAB for Abbvie, Bristol Myers Squibb; DSMB for Abbvie. D.L.S., is the co-founder, CM, president of NiMBAL Health. S.B.S. declares the following interests: Scientific advisory board participation for Pfizer, Lilly, IFM therapeutics, Merck, Pandion, and Takeda Inc., and grant support from Pfizer, Novartis, Amgen, Takeda Consulting for Hoffman La Roche and Amgen. A.K.S., L.S.K., J.O.-M., H.B.Z., and B.A.D., are co-inventors on a provisional patent application relating to methods of stratifying and treating IBD.
Supplemental Tables
Supplemental Table 1: Flow cytometry panels
Supplemental Table 2: scRNA-seq sample filtering thresholds
Supplemental Table 3: scRNA-seq quality control metrics for all single-cells
Supplemental Table 4: Traditional joint clustering for broad cell types and differential expression testing by Wilcoxon to determine CD or FGID-enriched genes.
Supplemental Table 5: FGID cell type markers
Supplemental Table 6: FGID cell subset markers
Supplemental Table 7: FGID cell state (i.e. end cell cluster) markers Supplemental Table 8: CD cell type markers
Supplemental Table 9: CD cell subset markers
Supplemental Table 10: CD cell state (i.e. end cell cluster) markers
Supplemental Table 11: FGID end cell cluster descriptive names and short curated names look up table
Supplemental Table 12: CD end cell cluster descriptive names and short curated names look up table Supplemental Table 13: CD cell type markers
Supplemental Table 13: Number of cells per patient per end cell cluster Supplemental Table 14: PCA Loadings for joint Epithelial, Myeloid, T/NK/ILC vectors
Supplemental Table 15: Differential composition testing for NOA vs FR, NOA vs PR and PR vs FR categories of anti-TNF response with CD patients
Supplemental Table 16: PC2 top positive and negative cell clusters
Supplemental Table 17: 92 markers derived from PC2 used for bulk RNA seq extension
METHODS
RESOURCE AVAILABILITY
Corresponding Authors
Further information and requests for resources and reagents should be directed to and will be fulfilled by Leslie S. Kean, Jose Ordovas-Montanes, and Alex K. Shalek.
Data and Code Availability
Single-cell RNA-seq data can be visualized via the Single Cell Portal links for each atlas of FGID https://singlecell.broadinstitute.org/single_cell/study/SCP1422/predict-2021-paper-fgid and pediCD https://singlecell.broadinstitute.org/single_cell/study/SCP1423/predict-2021-paper-cd. The cell-by-gene matrices will be available with the peer-reviewed version of this manuscript. The raw human FASTQ files will be available from the Broad controlled access repository DUOS with the peer-reviewed version of this manuscript. Please contact the authors
All original code has been assembled as the ARBOL package and deposited at GitHub and is publicly available together with this manuscript at: https://github.com/jo-m-lab/ARBOL and https://jo-m-lab.github.io/ARBOL/ARBOLtutorial.html.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Study Population and Clinical Parameters
Pediatric patients less than 20 years of age with suspected inflammatory bowel disease were enrolled on the PREDICT Study (ClinicalTrials.gov# NCT03369353). Enrollment took place between November 9, 2017 to December 21, 2018 in accordance with the Fred Hutch Institutional Review Board (Protocol #9730, ethical approval given) with written informed consent and assent when applicable. Patients diagnosed with Crohn’s Disease (CD) were included and patients without gut inflammation on endoscopy and histology, and who were diagnosed with Functional GI Disease (FGID), served as a comparative cohort for this study. Terminal ileum and blood samples were taken during the diagnostic endoscopy procedures prior to initiation of therapy. Patients diagnosed with other inflammatory or infectious etiologies on endoscopy and biopsy were excluded from the analysis.
Clinical course and variables were monitored at the time of enrollment and for 3 years after initial endoscopy, with median follow up for CD being 32.5 months and FGID being 31 months at the time of clinical database lock (December 1, 2020). Medical management was dictated by clinicians. Clinical variables obtained included sex, race, age at diagnosis, weight z-score, height z-score, BMI z-score, clinical disease severity using the Pediatric Crohn’s Disease Activity Index (PCDAI), and disease location and phenotype using the Montreal Criteria (Hyams et al., 1991; Silverberg et al., 2005). Laboratory evaluation included C-reactive protein, ESR, hemoglobin, albumin, white blood cell count, and platelet count.
Response to Anti-TNF therapy
Early anti-TNF or immunomodulator therapy was defined as initiation of immunosuppression within 90 days of diagnostic endoscopy. Anti-TNF monoclonal antibody was started in 10 patients with CD. All patients were followed prospectively and categorized as full responders (FR), partial responders (PR), or not on anti-TNF (NOA). Full response to anti-TNF is defined as clinical symptom control and biochemical response with wPCDAI score of <12.5 on maintenance anti- TNF therapy and partial response defined as lack of clinical symptom control and biochemical response with documented escalation of anti-TNF therapy.
Clinical Statistical Analysis
Clinical variables are expressed as median (lower and upper confidence interval; range) and compared using the Mann-Whitney U test. Categorical variables were described as frequencies and percentages and compared using the chi-square test. Clinical laboratory values are represented by mean and standard error of the mean (range) and compared with the Mann- Whitney U test. Significance is indicated by a P value of <0.05. Clinical statistical analysis was performed using GraphPad Prism version 8.3.0.
METHOD DETAILS
Tissue Dissociation into Single-Cell Suspensions
Human Ileum
Single-cell suspensions were collected from intestinal biopsies using a modified version of a previously published protocol (Persson et al., 2013) as described in (Smillie et al., 2019). One biopsy from the terminal ileum was received directly in hand and processed with an average time from patient to loading on the 10X Chromium platform of 2.5 total hours, and never exceeding 3.5 hours. While intact, biopsy bites were handled using a P1000 pipette applying gentle suction, and all centrifugation steps done in a temperature controlled 4°C centrifuge. Biopsy bites were first rinsed in 30 mL of ice-cold PBS (ThermoFisher 10010-049) and allowed to settle. Each individual bite was then transferred to 10 mL epithelial cell solution (HBSS Ca/Mg-Free [ThermoFisher 14175-103], 10 mM EDTA [ThermoFisher AM9261], 100 U/ml penicillin [ThermoFisher 15140- 122], 100 μg/mL streptomycin [ThermoFisher 15140-122], 10 mM HEPES [ThermoFisher 15630- 080], and 2% FCS [ThermoFisher 10082-147]) freshly supplemented with 200 μL of 0.5M EDTA. Separation of the epithelial layer from the underlying lamina propria was performed for 15 minutes at 37°C with rotation at 120RPM. The tube was then removed and placed on ice immediately for 10 minutes before shaking vigorously 15 times. Visual macroscopic inspection of the tube at this point yielded visible epithelial sheets, and microscopic examination confirmed the presence of single-layer sheets and crypt-like structures.
The remnant tissue bite was carefully removed and placed into a large volume of ice-cold PBS to rinse before transferring to 5mL of enzymatic digestion mix (Base: RPMI1640, 100 U/ml penicillin [ThermoFisher 15140-122], 100 μg/mL streptomycin [ThermoFisher 15140-122], 10 mM HEPES [ThermoFisher 15630-080], 2% FCS [ThermoFisher 10082-147], & 50 μg/mL gentamicin [ThermoFisher 15750-060]), freshly supplement immediately before with 100 μg/mL of Liberase TM [Roche 5401127001] and 100 μg/mL of DNase I [Roche 10104159001]), at 37°C with 120 rpm rotation for 30 minutes. During this 30-minute lamina propria (LP) digestion, the epithelial (EPI) fraction was spun down at 400g for 7 minutes and resuspended in 1 mL of epithelial cell solution before transferring to a 1.5mL Eppendorf tube in order to minimize time spent centrifuging and provide a more concentrated cell pellet. Cells were spun down at 800g for 2 minutes and resuspended in TrypLE express enzyme [ThermoFisher 12604013] for 5 minutes in a 37°C bath followed by gentle trituration with a P1000 pipette. Cells were spun down at 800g for 2 minutes and resuspended in ACK lysis buffer [ThermoFisher A1049201] for 3 minutes on ice to remove red blood cells, even if no RBC contamination was visibly observed in order to maintain consistency across samples. Cells were spun down at 800g for 2 minutes and resuspended in 1 mL of epithelial cell solution and placed on ice for 3 minutes before triturating with a P1000 pipette and filtering into a new Eppendorf tube through a 40 μM cell strainer [Falcon/VWR 21008-949]. Cells were spun down at 800g for 2 minutes and then resuspended in 200 μL of epithelial cell solution and placed on ice while final steps of LP dissociation occurred. After 30 minutes, the LP enzymatic dissociation was quenched by addition of 1ml of 100% FCS [ThermoFisher 10082-147] and 80 μL of 0.5M EDTA and placing on ice for five minutes. Samples were typically fully dissociated at this step and after gentle trituration with a P1000 pipette filtered through a 40μM cell strainer into a new 50 mL conical tube and rinsed with PBS to 30 mL total volume. This tube was spun down at 400g for 10 minutes and resuspended in 1 mL of ACK and placed on ice for 3 minutes. LP cells were spun down at 800g for 2 minutes and resuspended in 1 mL of epithelial cell solution and spun down at 800g for 2 minutes and resuspended in 200 μL of epithelial cell solution and placed on ice. Following centrifugation, the cells from both EPI and LP fractions were counted and prepared as a single-cell suspension for scRNA-seq. Since the full EPI isolation was not performed on all patients limiting sample sizes, here we focus our analysis on LP fractions.
Flow Cytometry
Multicolor flow cytometry was performed on tissue samples to examine the immune composition for enrolled patients. Flowjo software was used to phenotypically define cell populations that will be analyzed and compared in patients using two-way ANOVAs (or non-parametric equivalent). Antibodies used include: CD3 APC, SP34-2 (BD Biosciences); CD3 BUV661, UCHT1 (BD Biosciences); CD3 BV711, OKT3, (Biolegend); CD3 PE, SP34 (BD Biosciences); CD4 BV785, OKT4 (Biolegend); CD8a BUV395, RPA-T8 (BD Biosciences); CD8b FITC, REA715 (Miltenyi Biotec); CD11b APC-Cy7, ICRF44 (BD Biosciences); CD11c APC-eFlour 780, BU15 (Fisher Scientific); CD11c BUV661, B-ly6 (BD Biosciences); CD14 APC-eFluor 780, 61D3 (Fisher Scientific); CD14 BUV737, M5E2 (BD Biosciences); CD20 APC-eFluor 780, 2H7 (Fisher Scientific); CD20 PE-Cy7, L27 (BD Biosciences); CD38 APC, HIT2 (BD Biosciences); CD45 PerCP/Cy5.5, HI30 (Biolegend); CD45RA BV605, HI100 (Biolegend); CD56 (NCAM) FITC, TULY56 (Fisher); CD94 APC-Vio770, REA113 (Miltenyi Biotec); CD117 (c-kit) BV421, 104D2 (Biolegend); CD123 BV711, 9F5 (BD Biosciences); CD127 Biotin, HIL-7R-M21 (BD Biosciences); CD161 BV711, DX12 (BD Biosciences); CD197 (CCR7) BV421, GO43H7 (Biolegend); CD294 (CRTH2) BV605, BM16 (Biolegend); CD326 (Epcam) APC, HEA-125 (Miltenyi Biotec); HLA-DR APC-H7, L243 (G46-6) (BD Biosciences); TCR PAN γδ PE-Cy7, IMMU510 (Beckman Coulter); α4-β7 integrin (Act-1), (NIH AIDS Reagent Program); Streptavidin BUV737 (Fisher); Live/dead Fix Aqua (Fisher); R-PE Antibody Labeling Kit (300 mcg) (Abcam).
Methods to Generate Single-Cell RNA-seq Libraries and Sequencing
10X v2 3’
Single cells were loaded onto 3’ library chips as per the manufacturers protocol for Chromium Single Cell 3’ Library (v2) (10X Genomics). The LP fraction was captured in its own channel of the 10X Chromium Single Cell Platform, in order to recover sufficient numbers of cells for downstream analyses. An input of 10,000 single cells was added to each channel with a recovery rate of 9,514 cells per sample based on median across samples. Briefly, single cells were portioned into Gel Beads in Emulsion (GEMs) in the Chromium controller with cell lysis and barcoded reverse transcription of RNA, followed by cDNA amplification, enzymatic fragmentation and 5’ adaptor and sample index attachment. Libraries were sequenced on a HiSeq or NovaSeq flow cell. The read structure was paired end with length of read 1 26bp, length of read 2 91bp, and the length if index 1 (i7 primer) 8bp. Quality-filtered base calls were converted to demultiplexed FASTQ files.
Alignment and Filtering
FASTQ files were aligned to GRCh38 using Cellranger v2.2 pipeline on the Cumulus/Terra cloud pipeline https://portal.firecloud.org/?return=firecloud#methods/cumulus/cellranger_workflow/10 generating 27 cell-by-gene matrices (13 FGID, 14 CD), one for each patient. We used default parameters of the 10th snapshot version of the pipeline, aside from requiring that it use cellranger v2.2.0.
Every sample was first filtered excluding genes measured in fewer than 3 cells and cells with fewer than 200 unique genes. To control for doublets and low-quality cells we then further filtered individually, attempting to match the approximate 10,000 cells loaded onto the sample lane and balancing the thresholds to not cut out dense regions of a Ncounts by Nfeatures scatter plot. Pre- filtering, we looked for outlier samples, based on proportion of percent mitochondrial genes, number of counts, and number of features, none fell beyond the 1.5 times the IQR threshold.
Exact thresholds used for each sample:
Post filtering, we merged sample matrices using an outer join to create an FGID dataset (115,569 cells) and a pediCD dataset (139,342 cells).
QUANTIFICATION AND STATISTICAL ANALYSIS
Preprocessing & Clustering of scRNA-seq Data
1st Approach – Classical Methods on Combined Dataset
We began initial analysis following traditional clustering and annotation techniques; however, these methods using manual and at times subjective metrics scaled poorly to the size and scope of our dataset and moreover did not give clear distinction between disease specific cell states and compositional shifts within cell states across disease.
For our first pass at analysis, we grouped the FGID and pediCD datasets together (254,911 cells) and proceeded with the standard Seurat v3.1.5 pipeline (Stuart et al., 2019). We used manual heuristics of gene marker specificity to choose cluster resolution and isolate 9 major cell types (T, B, plasma, epithelial, endothelial, fibroblast, myeloid, mast cell, and glial) and 1 aggregate cluster of T, B, myeloid, and epithelial cells with a strong proliferation signature. We then subclustered the proliferating group and manually merged the proliferating cells with their corresponding cell type based on marker gene expression, and separately re-preprocessed and clustered each cell type annotating based on one vs. rest differential expression (Wilcoxon, fdr < 0.05) within the cell type.
We found several disadvantages to this approach. First, we found it difficult to determine for each cluster whether we should be looking for changes in compositional frequency or gene expression. Particularly within the myeloid major cell group we would find extremely disease biased sub- clusters, as much as a 9:1 ratio between pediCD to FGID. It was unclear whether there was massive compositional shift within a conserved cell state or if instead a base cell state was split into multiple clusters based on phenotypic differences in disease and we should perform a differential expression test between it and neighboring FGID biased clusters. Second, after two rounds of manual processing we were still unsure if we had reached a base level with each end cluster corresponding to a unique and biologically homogeneous cell state. Third, at that point, having partitioned over 100 distinct clusters, individually supervising each subsets processing and sub-clustering was infeasible. We needed a more systematic method to address these challenges.
2nd approach – ARBOL: Iterative Tiered Clustering (ITC) on Separated Disease Conditions
It is common to organize cell identity ontologies in a tree structure. With major groups such as immune, stromal, and epithelial at the top and branching down a level, you might set more nuanced identities like T, B, endothelial and goblet cell types as a second tier, and even more nuanced identities like CD4+ vs CD8+ T cells as a third. In ideal circumstances, this mental model conforms well to RNA-seq data where we can layer gene modules with more and more specific variation together to describe highly particular cell identities and states. And, by clustering at a high level with genes that vary across the entire dataset, then sub-clustering with genes that vary only within a particular parent cluster we are able reveal this hierarchy of cell identity. Many additional factors to cell identity contribute to the variation in gene expression within actual datasets, particularly as we found with disease condition during our first approach.
To be able to choose the appropriate future analyses and comparisons, we need a highly accurate representation of cell identity and state. The underlying issue in our first pass at clustering was that in combining the disease conditions together, the variable genes selected at each stage represented a combination of differences between cell identity & disease. This combination could have been manageable if either disease or cell identity were consistently more variable. We could isolate one factor at a specific tier in the hierarchy before sub-clustering to isolate the other. In our case, disease and cell identity both had many overlapping scales of variation. To address this problem, we isolated cell identity by separating our dataset by disease and clustering for cell identity within each disease set (FGID 115,569 cells, pediCD 139,342 cells). This approach did then require us to perform an additional stage of analysis to find corresponding clusters between the two datasets, but allowed us to far more effectively distinguish type, scale, and specificity of disease differences.
Within each disease set we still needed a method to ensure we were reaching the bottom level of biological heterogeneity, and preferably an automated method as our first pass had shown the potential for isolating hundreds of cell states. To efficiently cluster and isolate these cell states we wrote a cloud-based pipeline to systematically optimize parameter selection and stop when biological heterogeneity is exhausted. Homogenous cell subsets were isolated by recursively normalizing, selecting variable genes, and clustering based on silhouette score. We stopped recursing into sub-clusters once we reached one of four end conditions defined as:
Having a group of less than 100 cells (though we did partition many clusters smaller than 100 cells after clustering groups just larger than that cutoff).
Isolating an optimized clustering of only one cluster.
Finding two clusters that have fewer than 25 genes (fdr< 0.05 & |log fold change| > 1.5 & percent expression >= 20% in at least one cluster) differentially upregulated between each cluster using a bimodal test developed in (Shekhar2016 10.1016/j.cell.2016.07.054). For this last condition, if reached, we reject the clustering and return back the cells as a single end cluster.
Having reached a max tier limit. Setting this value to 10, we never triggered this condition with either FGID or pediCD datasets, but included it to prevent runaway recursion.
Code for generating this tree of cell clusters is currently available here: (https://jo-m-lab.github.io/ARBOL/ARBOLtutorial.html. Within each recursion, the established steps were processed using Seurat version 3.1.5 (https://github.com/satija.lab/seurat). Normalization and variable gene selection were processed with SCTransform (https://github.com/ChristophH/sctransform) (Hafemeister and Satija, 2019). Clustering for major cell types was performed using Louvain clustering on dimensionally reduced principal components.
Parameters depend on the size of the dataset, and thus must be adjusted based on how many cells are being partitioned for each recursive step. When calculating nearest neighbor graphs, and clustering we set the K parameter to ‘ceiling(0.5*sqrt(N))’. We chose the number of principal components based on the top 15 percentile of calculated improvement of variance explained. For subsets less than 500 cells we used Jackstraw to calculate significant principal components. If neither method succeeded, we chose the first two principal components. This only occurred rarely at late stages where Jackstraw was unable to find significance. We set clustering resolution via a grid search optimizing for maximum average silhouette score (silhouette measures the ratio of intra-cluster distance to inter-cluster distance, where a high score means highly distinct clusters). For stages where we were clustering more than 500 cells a randomized subsample of N cells / 10 was used to calculate the average silhouette score.
Additionally, at each recursive step we output quality metrics and basic plots, such as 1:rest differential expression from the optimal partitioning at each stage and UMAP representations painted by sample metadata (sample ID, cluster number). Our pipeline, saved output as a directory structure matching the tree discovered by this recursive clustering. This tree represents the lower levels of variance of discovered at each tier. At any tier level we are able to extract the cell’s partitioning. Due to the intermixing of patient and cell identity effects at multiple levels of the tree (a fraction of a single patient’s cells might separate out at a high level, but then continue to separate into identifiable cell types, or vice versa), we found the most meaningful levels at the top and bottom of the tree. The clustering tree is useful for understanding the levels of variance in the dataset, but we found it contains too much noise to be easily interpretable. Thus, we later generated a hierarchical clustering of the bottom level clusters based on pairwise differential expression, which is displayed in figures (Figure 3: FGID atlas, Figure 4: pediCD atlas). See the hierarchical clustering of cell subsets section for details.
Cell Type and Subset Annotation from Tiered Clustering
After running the hierarchical tiered clustering pipeline we manually curated the generated tree of clusters. Specifically, tree generation was reinitiated for the B Cells within the FGID dataset as it had stopped at the first tier on two clusters with < 50 genes differentially expressed, however we could see in this case that there was additional biological stratification based on strong differential expression of CXCR4 (Wilcoxon; logFC=1.22860917, Bonferroni.p=1.0E-300) CD69 (Wilcoxon; logFC=1.27527652, Bonferroni.p=2.99E-151), HMGN1 (Wilcoxon; logFC=-1.1688612, Bonferroni.p=1.62E-227) and HMGA1 (Wilcoxon; logFC=-1.28838294, Bonferroni.p= 1.06E-209) among others. This formed a clear divide between non-proliferating and proliferating B Cells, further validated by a clear separation within the UMAP based on PCA reduced variable genes within the B cells. We further examined each branching point of the tree to determine its splitting cause, noting splits based on spillover, doublet, and singular patient effects. Splits at higher tiers based on doublets often split again allowing us to recover cells that did not have the dual expression profile. Splits that only had patient splits below (measured by having only clusters of single patients) were manually marked as end clusters, thereby merging all clusters below that split. With these manual steps made, we performed pairwise differential expression to ensure each partitioned subset is distinct from its neighbors.
We annotated these final clusters with four methods attempting to balance descriptiveness, ease of understanding, and ease of name generation: The first method, is generated during the hierarchical tiered clustering by following the path from the end cluster up to the original tier. An example annotation is T0C0.T1C3.T2C3.T3C5 marking an end cluster that split at tier 1 into cluster 0 and at tier 2 into cluster 3. These annotations do not provide any biological information to the reader, but do provide a unique ID for the end cluster. Our second method is far more descriptive, where we manually annotate the main reason for each particular split. This still follows the original ranking of variation as found by the hierarchical tiered clustering, while also providing biological interpretation, as an example: CD.Mloid.macrophage_chemokine.S100A8_S100A9_CXCL9_CXCL10_TNF_inflamonocyte.
This method of annotation was particularly useful during analysis as we were immediately able to see how early or late two clusters had split from each other, as well as seeing a number of the subset defining genes. Unfortunately, as is apparent this method also produces extremely long names that are difficult to display and to which to refer. It is also a highly manual process, and difficult to reproduce precisely. To better present our findings and aid others in reproducing our results, our third method automates this annotation. This method is performed by taking each major cell type, which in our case matched the tier one splits, and performing 1:rest differential expression testing (Wilcoxon; adj.p<0.05, only.pos=True) within each major cell type. We then ranked the genes based on the product of ‘-log(Bonferroni.p)’, ‘avg_logFC’, and ‘pct.exp.1/pct.exp.rest’ and took the top 5, forming a name like CD.Mloid_CCL3_CCL4_CCL3L3_TNF_TNFAIP6. This scheme, again was useful, but did not quite meet our demands of recognizability and brevity. Thus for T and Myeloid cells we adjusted these names to a finer degree of specificity by visualizing the expression profiles of each subset with a dotplot of canonical marker genes based off of current literature, and limiting to the top 2 genes based off our method 3 rankings and the dotplot of canonical markers, thereby producing the fourth and final annotations in the form: CD.Mono.CXCL10.TNF. Due to the limited nature of current characterization of stromal and epithelial cells we were unable to match the same degree of specificity as the T and Myeloid cells, however we did where possible adjust from the major cell type, to the most specific that we could be confident of. For instance, adjusting “Epith” to “Goblet” based on marker expression of TFF3 and MUC13.
Hierarchical Clustering of Subsets on Unified Gene Space and Removal of Doublets
At this point we had generated a hierarchical representation of the datasets from the top down showing the splits of highest variation at every level. By necessity that means that each level is controlled by and represents different selections of genes, which may have no relation to the genes selected in another branch. To understand the relations of cell subsets and compare across cell type we needed a unified set of genes. For each dataset (FGID and pediCD) we performed pairwise differential expression (Wilcoxon; Bonferroni.p<0.01, max.cells.per.ident=500) and selected the top 50 most significant genes from each test. Gene lists were merged as a union, finding 4445 unique genes for FGID and 1760 unique genes for pediCD that best differentiate the subsets. Subset centers were calculated from these selected genes as the median expression of cells grouped by subset. The resulting table was then hierarchically clustered using correlation distance and complete linkage. Clustering was performed in R using the pvclust package (https://github.com/cran/pvclust)
The resulting tree shows from the bottom up the relationships between cell subsets, and allows cell subsets that were potentially misclassified at a high split in hierarchical tiered clustering to find their biological neighboring subsets. As previously mentioned within our description of hierarchical tiered clustering we did not find any end cluster subsets that met our thresholds for merging. This does not mean that we did not observe shuffling from the initial tiered splits. While overall there was good agreement between the two methods, we noted subsets jumping between major cell types as defined by the first splits of our tiered clustering. We identified the majority of these jumping subsets as doublet clusters by exploring their differential gene results at multiple levels of the tiered clustering tree. We removed these doublet subsets and others based on flipping expression programs at different tiers. For instance, looking like T cells expressing TRAC, IL7R within an epithelial cluster, then at the next tier expressing KRT18 and PIGR. After removing doublets, we recalculated subset distances and dimensional reductions, as presented in the main figures.
Finding Corresponding Cell Subsets Between Diseases
Separating the data on disease condition into two datasets was important as it allowed us to isolate the axis of cell identity within each disease and be confident in the homogeneity of each subset. But does present the problem of how to make comparisons across the disease condition, which is our main research question.
1st Approach – KNN classifier
Our first attempt to find corresponding clusters followed the methods of Tasic et al. 2018. We used the best differentiating genes sets created for our unified gene space clustering to as the mapping space for a nearest-neighbor classifier. For each cell within the disease condition, we could map it to the nearest cell subset within the other disease condition. As a trial run, we created this gene space for each major cell type of the FGID disease condition and performed 5-fold cross-validation. Unfortunately, we could only achieve accuracies of up to 55%.
Rethinking approach
From this trial run we realized that we needed a more automated system to choose genes as the most significantly differentially expressed genes did not create enough separation between cluster centers to effectively classify new cells. We chose to use an RF classifier as it allowed us to train for the optimal selection of genes, required little to no preparation of data, and provided probabilities of each cell being predicted to each class. These probabilities for each class proved particularly useful do to our second realization. Because the number of subsets differs between disease conditions, we cannot make the assumption that there is a one-to-one relationship between conditions. We also cannot make the assumption that the many-to-one relationships are unidirectional with one base subset splitting into many states only from FGID toward CD. A single classifier would not allow us to distinguish between these many types of relationships. However, we realized that by creating a classifier for both directions (FGID to pediCD & pediCD to FGID) we could take advantage of the difference in confidence between the two classifications to discover the direction and type of relationship. In a perfect world of 1-to-1 relationships, we would expect all cells of subset A in condition X to match with 100% confidence to subset B in condition Y. In that particular case the summed probability equal 2 and there would be zero difference in confidence of one classifier to its matching classifier. In our imperfect world we might instead see 90% of cells of subset A in condition X to matching with > 85% confidence to subset B in condition Y, and only 30% of subset B in condition Y matching with > 85% confidence. From this discrepancy we can infer that subset A may be a cell state in condition X that is layered on top of a base state B in condition Y. Low confidence in both directions indicates subsets unique to a particular condition.
2nd Approach – Training Random Forest Model
After these realizations we trained random forest classifiers for each cell type in each disease condition using SciKit-Learn v0.22.2, with the intent to classify each cell to the subset in the opposed dataset the cell is most similar to (Pedregosa et al., 2011). For each cell type we optimized a classifier for accuracy using grid-based search tuning number of trees, depth, number of features, criterion, and min samples per leaf with 5-fold cross validation for each set of tuning parameters. We never observed full overfitting where the accuracy on test folds began to drop with increased size of model, but we did quickly find diminishing returns as we increased model size. For simplicity and because optimal tuning parameters were robust to overfitting, we chose to use the same largest model parameters for all models (number of trees = 500, depth = 200, number of features = sqrt, criterion = gini, min samples per leaf = 1). Our initial training rounds found accuracies in the mid 60%. A definite increase from the NN classifier, but not high enough for us to be confident in the results. The main issue we eventually determined to be the uneven class distributions (far more cells in subset A than subset B). This caused the smaller subsets to be under trained. To compensate we up-sampled with replacement each subset within the training fold to contain at least the 75th quantile number of cells. This single change improved accuracy on the unmodified test fold the most, varying from 5-15% improvement of accuracy, precision and recall across each cell type, and provided accuracies ranging from high 70 to low 90 percent per major cell type.
Applying Random Forest Model
Satisfied with the validation results within each disease condition, we were finally ready to fully run the random forest model across the disease conditions. We trained each random forest model with optimized parameters on all folds of its dataset, then proceeded to get probability predictions for each cell from the disease condition to the trained disease condition. With these class probabilities per cell we could aggregate for each disease condition by taking the mean class probabilities for each group, leaving us with 2 n by m table where n equals the number of subset groups and m equals the number of subset classes in the opposing disease condition. Using the mean probabilities for the group allowed more information from the cell level to rise to the aggregated levels than using the individual class prediction alone (computed as the class with max confidence of cell membership). These tables also provide confidences to all classes which is important for understanding the transverse confidence in both directions.
It is especially important to understand the many-to-one relationships between disease conditions and find where a base cell state becomes layered in additional expression profiles, as these are the exact cases where we can infer the underlying signaling patterns that diversify or concentrate cell state profiles. In diverse splitting of a subset across disease we can start to understand the heterogeneity of patient response to treatment as it becomes clear which particular cell profiles are correlated with strong and poor response. To gain insight to these changes, we care about where there is strong confidence in both directions and where there is strong confidence in only one direction. The simplest method to calculate these is to separately take the sum of the pairwise prediction confidences and the difference. We call the sum of confidences the correspondence of a subset, and the difference the bias.
Visualizing Correspondence and Bias
We plot these metrics on a dot plot where each possible connection is laid out on a grid. For each dot we set the size to match the correspondence, and color the dot based on the bias, such that a perfect match would appear as a large white circle. A more unidirectional match would be tinted darker in the color matching the disease condition with more confidence. Matches with more bias tend to indicate a subset matching a base cell state but also expressing some additional gene modules. To aid the human eye on picking up the major patterns we filter to only show the top 10% highest correspondences. This parameter was chosen after looking at the distribution of correspondence scores and selecting the majority of the right tail of the distribution. It keeps the strongest matches in both ways and keeps the strongest in highly biased matches. To also aid the human eye we perform a hierarchical clustering using cosine distance and complete linkage on the prediction confidences and compute an optimal ordering based on the cosine distances using the “cba” package in R: https://cran.r-project.org/web/packages/cba/index.html. This allows us to sort subsets on the rows and columns such that subsets that get predicted similarly are next to each other. From this visualization we are able to easily discern which are the subsets FGID that split into many phenotypes within pediCD from high correspondence and bias, which subsets don’t change phenotype much at all based on high correspondence and low bias, and which are the subsets are potentially unique to a disease condition based on very low correspondence and bias.
Association of Cell Subsets to anti-TNF Response
Compositional differences are an important metric for understanding the baseline differences that prognose a patent’s response to treatment. We measure these differences with proportional enrichment of particular cell subsets within each patient, and finding the significantly reproducible enrichments across disease. As an extreme toy example we might find that subset A cells comprise as much 80% of cells sampled in one condition whereas they might only comprise 30% in a different condition. This type of compositional analysis is highly affected by the number and choice of subsets included, and the sampling depth per patient (how may cells are collected). The first factor is controlled by the confidence in our clustering and using computationally optimized parameters. We further control this factor by limiting analysis of compositional shifts of cell states to within major cell types. This isolates the chance of error from affecting the entire analysis and allows us to gain a more direct biological insight of the rise and fall of particular cell states in the context of similar subsets. We control the second factor of sampling depth differences by computing a normalized cell count score per patient of the form (ncells in subset / ncells in patient’s major cell type) * 1e6. This score provides us with the number of cells expected per million.
Mann-Whitney tests
We input our cells per million score into a two-sample Wilcoxon test in base R, which is equivalent to the Mann-Whitney rank score test. We set a significance threshold of p_value < 0.05. We made 5 different pairwise comparisons (FGID vs FR, FGID vs PR, NOA vs FR, NOA vs PR, FR vs PR). Comparisons between FGID and pediCD groups were determined by finding maximum correspondence between the disease conditions for each subset.
Fisher’s Exact tests
A similar compositional analysis to that done with the Mann-Whitney was performed with a Fisher’s Exact test. Do the difference of the tests we input for each subset the number of cells for that subset against the number of cells not of that subset within the major cell type split on rows by pairwise comparisons (NOA vs FR, NOA vs PR, FR vs PR). We computed FDR correction of p_values at major cell type and entire dataset levels and found significance subsets at both levels. But, most interestingly in comparing the two tests we found that the Mann-Whitney discovered as significant (pval < 0.05) the portion of cell subsets with largest effect sizes. Understanding our limited patient number at these within pediCD comparisons and wanting to only report results most likely to be reproducible biology, we determined to only follow those subsets reported as significant within both Mann-Whitney and Fisher’s exact tests.
Principal component analysis of cell frequencies and correlation to clinical metadata
Cell frequencies were calculated per patient for cell subsets (i.e. end clusters) within parent cell types and cell subsets (i.e. end clusters) within all cells as CPM = ((count/sum(count)) * 1e6.
Principal component analysis (PCA) was performed on the resulting patient x CPM matrices using the R package stats::prcomp(., scale=TRUE). Variance explained per PC was calculated as std^2/sum(std^2). PCA loadings per patient and per cell subset were extracted from the prcomp() result. PCA1 and PCA2 from the total PCA x patient and from each celltype’s PCA x patient matrix were correlated with clinical metadata using Spearman rank correlation as calculated by the R package stats::cor.test(., method=’spearman’). P values were recorded from the cor.test() call, and FDR was calculated using R’s fdrtool::fdrtool(p.values, statistic=”pvalue”). For combined celltype PCA’s, patient x CPM tables were concatenated before PCA.
Gene set enrichment analysis
Fold changes between patients responding or not responding to anti-TNF therapy from RISK and E-MTAB-7604 cohorts were calculated with Seurat (v4.0.3) (Haberman et al., 2014; Hao et al., 2021) and DESeq2 (v1.30.1) (Love et al., 2014) packages, respectively. GSEA analysis was performed using the fgsea R package (v1.16.0) (Korotkevich et al., 2021). Genes with similar fold changes were preranked in a random order. The code for this analysis can be found in the GitHub repository https://jo-m-lab.github.io/3p-PREDICT-Paper/4_GSEA/PREDICT_GSEA_final.html.
Pseudotime analysis of expression landscape
The micrograin structure found through hierarchical tiered clustering is vital for being able to directly compare like cells across disease conditions, and find significant changes in phenotype and composition within individual subsets. It is also vital to understand how those like subsets relate to each other within a disease condition and how the larger macrograin structure differs across conditions. This macrograin structure can be explored through the gradients of gene expression among cells of a major type. Pseudotime and RNA-velocity are both excellent tools for exploring these gradients. For both tools, the choice of genes directly determines the structure found within the dimensional reduction, and thus what genes are chosen as significantly location specific within the resulting landscape of cells. for our purposes, as we knew we would be exploring a single cell lineage, and exploring the relationships of cell states within that space, we required for our dimensional reduction the genes common to that space. We selected genes by performing differential expression between the major cell type and all other cell types within that disease. We took the outer union of those genes. Then removed genes from the list found to be differentially expressed between disease conditions at the major cell type level. From these genes we performed PCA to 50 principal components and then computed a UMAP reduction to 2 components. This selection process allows the dimensional reduction to find smooth gradients between cells and provided a common space for cells of multiple disease conditions to exist.
From this common expression landscape we utilized Monocle3 https://cole-trapnell-lab.github.io/monocle3/ (Cao et al., 2019) to extract a best estimate linear path through the space, which we calculated a diffusion pseudotime on allowing use to numerically estimate the distribution of cells within the expression landscape. To compute the significance of changes in that distribution we used a permutation test of Hellinger distance between distributions. At each of 10,000 permutations we shuffled the group ordering within the comparison pair. We performed this test five times for comparisons between FGID vs FR, FGID vs PR, NOA vs FR, NOA vs PR and FR vs PR. Our threshold was set as Bonferroni corrected p_value < 0.05.
FGID and pediCD integration using STACAS
Integration of T cells from the FGID and pediCD datasets (n = 29640 and 38031, respectively) was performed using the STACAS package (v1.1.0) (Andreatta and Carmona, 2021) Sankey plot was created using RAWGraphs 2.0 beta (https://github.com/rawgraphs) (Mauri et al., 2017).
Differential expression testing
To calculate differential expression between FR and PR groups, for each subset with a least 50 cells in each condition we used a Wilcoxon test thresholded to 0.05 Bonferroni corrected p-value and down sampled using the “max.cells.per.ident” argument within Seurat’s ‘FindMarkers’ function to a maximum of 10000 cells. The limits on minimum and maximum number of cells were chosen to mitigate issues with comparisons between disproportionate populations and computational efficiency. There does still exist 2 orders of magnitude between the minimum and maximum; however, the subsets most of interest and reported through the manuscript are within the same order of magnitude
There are noted spillover effects within the expression tests. We observe ubiquitous contamination of genes such as IGHA1, IGHG1 and DEFA5 across all cell types and subsets. These genes are routinely found as enriched within more severe inflammation, beyond even this dataset. This is a real effect, but less than useful for understanding driving factors within individual cell subsets. So, we focus on significant differentially expressed genes that also have a high pct.cells.expressing.in/ pct.cells.expressing.out ratio.
General Statistical Testing
Parameters such as sample size, number of replicates, number of independent experiments, measures of center, dispersion, and precision (mean +/- SEM) and statistical significances are reported in Figures and Figure Legends. A p-value less than 0.05 was considered significant. Where appropriate, a Bonferroni or FDR correction was used to account for multiple tests, as noted in the figure legends or Methods. All statistical tests corresponding to differential gene expression are described above and completed using R language for Statistical Computing.
Supplemental Figure Titles and Legends
ACKNOWLEDGMENTS
We thank the patients and their families for helping and contributing to our study. We thank Kathy McConville and Sarah Mbonde for identifying and recruiting patients for PREDICT. We thank Carly G.K. Ziegler for discussions and advice related to iterative tiered clustering, and all members of the Kean, Ordovas-Montanes and Shalek labs for thoughtful discussions. J.O.-M is a New York Stem Cell Foundation – Robertson Investigator. J.O.-M was supported by the Richard and Susan Smith Family Foundation, the AGA Research Foundation’s AGA-Takeda Pharmaceuticals Research Scholar Award in IBD – AGA2020-13-01, the HDDC Pilot and Feasibility P30 DK034854, the Food Allergy Science Initiative, the Leona M. and Harry B. Helmsley Charitable Trust, The Pew Charitable Trusts Biomedical Scholars and The New York Stem Cell Foundation.
L.S.K is supported by NIH P01 1P01HL158504, R01 5R01HL095791, U19 U19AI051731, and by the Helmsley Charitable Trust. V.N. was supported by the International mobility of research, technical and administrative staff of research organizations (CZ.02.2.69/0.0/0.0/18_053/0016981). A.K.S. was supported, in part, by the Searle Scholars Program, the Beckman Young Investigator Program, a Sloan Fellowship in Chemistry, and the NIH (5U24AI118672, 2R01HL095791). V.M. reports research support from Novartis. V.T. Is supported by ASTCT New Investigator Award and CIBMTR/Be the Match Foundation Amy Strelzer Manasevit Research Program Award. S.B.S. is supported by NIH grants P30 DK034854 and RC2 DK122532, and the Helmsley Charitable Trust.