ABSTRACT
Immune-modulating systemic therapies are often used to treat advanced cancer such as metastatic clear cell renal cell carcinoma (ccRCC). Used alone, sequence-based biomarkers neither accurately capture patient dynamics nor the tumor immune microenvironment. To better understand the tumor ecology of this immune microenvironment, we quantified tumor infiltration across two distinct ccRCC patient tumor cohorts using complementarity determining region-3 (CDR3) sequence recovery counts in tumor-infiltrating lymphocytes and a generalized diversity index (GDI) for CDR3 sequence distributions. GDI can be understood as a curve over a continuum of diversity scales which allows sensitive characterization of distributions to capture sample richness, evenness, and subsampling uncertainty, along with other important metrics that characterize tumor heterogeneity. For example, richness quantified the total unique sequence count, while evenness quantified similarities across sequence frequencies. Significant differences in receptor sequence diversity across gender and race revealed that patients with larger and more clinically aggressive tumors had increased richness of recovered tumoral CDR3 sequences, specifically in those from T-cell receptor alpha and B-cell immunoglobulin lambda light chain. The GDI inflection point (IP) allowed for a novel and robust measure of distribution evenness. High IP values associated with improved overall survival, suggesting that normal-like sequence distributions lead to better outcomes. These results propose a new quantitative tool that can be used to better characterize patient-specific differences related to immune cell infiltration, and to identify unique characteristics of tumor-infiltrating lymphocyte heterogeneity in ccRCC and other malignancies.
INTRODUCTION
Renal cell carcinoma ranks seventh and tenth among the most diagnosed cancers among men and women in the US, respectively, accounting for 3.8% of all cancer cases and 2.5% of all cancer deaths (1). The most common type of renal cell carcinoma is clear cell renal cell carcinoma (ccRCC). Historically, metastatic ccRCC has been one of the first malignancies successfully treated with immune-modulating systemic therapy, using interleukin-2 and interferon-α (2). Immune checkpoint inhibitors (ICI) such as nivolumab, ipilimumab, pembrolizumab, and avelumab, have emerged as the first line therapy for metastatic ccRCC, typically administered in combination with each other or with a targeted therapeutic agent (3). The arrival of immune checkpoint inhibitors has precipitated a tremendous research effort aiming to accurately characterize the tumor-immune microenvironment and explore potential biomarkers to predict ICI response, for which robust markers have been largely elusive. Most investigations have focused on tumor-centric variables including somatic mutations and gene expression. Fewer studies have been focused on host factors that contribute to the microenvironment or focused on how differences among these factors may affect clinical or therapeutic outcomes.
Across tumor types, response to ICI has been correlated with higher frequencies of somatic mutations that are believed to give rise to tumor-specific neoantigens, and to stimulate a robust antitumor immune response (4–6). In contrast, analyses of renal cell carcinomas have demonstrated a relatively low frequency of somatic mutations, yet very high levels of immune cell infiltration. These findings suggest that a high mutational burden is not solely responsible for inducing immune infiltration in ccRCC (7–11). Additionally, recent work has demonstrated that CD8+ T cell infiltration alone does not predict response to ICI. Refinements of characterizing immune cell populations are needed to understand the microenvironment and the biology underlying ICI response (12).
To initiate an anti-tumor immune response, tumor-specific neoantigens first require recognition by a T- or B-cell receptor (TCR, BCR) on a tumor infiltrating lymphocyte. The tumor infiltrating lymphocyte, complementarity determining region-3 (CDR3) is a highly variable region in the TCR/BCR that provides a complementary binding surface for antigens and largely determines the antigen specificity of the receptor. Investigations have shown the promise of CDR3-features as prognostic biomarkers for several malignancies (13–16), using sequencing and bioinformatic pipelines to recover single reads that represent the CDR3 amino acid sequence. These reads can be quantified as total recovered reads, potentially as a primary metric of immune infiltration (14–16). However, total count-based measures of CDR3 variability are unlikely to reflect the underlying complex biology of host adaptive immune response with the same accuracy as other measures of receptor diversity (17–19).
We hypothesized that immune cell receptor sequence diversity recapitulates important features of tumor biology such as origin, environment-driven evolution, and progression risk. We leverage properties of a generalized diversity index (GDI) (20–22), a measure applied in ecology and evolution, to quantify CDR3 diversity, and assess whether this diversity is associated with important clinicopathologic outcomes in ccRCC. GDI is evaluated as a continuous function along a range of order of diversity (q) values (20–22). At low values of q (low-q GDI), the index is a measure of distribution richness, i.e., the count of distinct types, sequences or clones, while the value at q=1 is closely related to Shannon’s diversity index (23). At high values of q (high-q GDI), the index approaches a measure of evenness or dominance, i.e., focusing on the dominant clone or sequence and its frequency. Here, we applied these diversity metrics to ccRCC tumor samples, and assessed the properties of GDI and their utility to serve as possible prognostic markers. We assessed tumor infiltrating lymphocyte TCR and BCR CDR3 diversity across the entire range of q, and for isolated values that have direct statistical interpretations. We analyzed two independent cohorts of ccRCC patients with bulk RNA-sequencing samples; the Moffitt Total Cancer Care (TCC) cohort (24), and the Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3) cohort (25).
MATERIALS AND METHODS
Clinical samples
Following institutional review board approval (H. Lee Moffitt Cancer Center’s Total Cancer Care protocol MCC# 14690; approved by the institutional review board; Advarra IRB Pro00014441), we retrospectively obtained clinicopathological and bulk RNA-sequencing patient data from electronic medical records, where all patients had provided written consent under the institutional Total Cancer Care (TCC) Protocol. RNA was prepared using the Qiagen (Venlo, NL) RNAeasy plus mini kit for RNA (frozen tissue) or the Qiagen All prep FFPE DNA/RNA kit (FFPE tissue). RNAseq sequencing libraries were prepared using the standard Illumina TruSeq RNA Access kit (now called TruSeq RNA Exome), according to manufacturer protocols. RNAseq libraries were sequenced on an Illumina HiSeq 4000 according to manufacturer protocols. RNAseq reads were aligned to the human reference genome (hs37d5) in an intron-aware manner with STAR (26). Table 1 shows a summary of the clinical information obtained from individuals in the Moffitt TCC Cohort. Relevant clinical and pathological outcomes available from the Moffitt TCC Cohort, including ranges of percentage of tumor with EGFR spice variant alpha are recorded in Table 1. Summaries of numbers of reads per samples in each of the cohorts is available in Supplemental Figure S1.
To further investigate the trends identified in point-estimates of diversity from T-cell receptor (TCR) and B-cell immunoglobulin recoveries the Moffitt TCC patient bulk RNA-sequencing, we validated trends identified in the Moffitt TCC Cohort analysis with complementary analysis with the RNA-sequencing from the under CPTAC-3 Cohort (written consent had been obtained under CPTAC guidelines). Table 1 shows a summary of the clinical information obtained from individuals in the CPTAC-3 cohort. TCGA-KIRC RNAseq based TRA and TRB CRD3s were obtained from Thorsson et al (27, 28) based on the dbGAaP approved protocol number 6300. Relevant clinical and pathological outcomes, aligning with outcomes available in the Moffitt TCC Cohort, that are available in CPTAC-3 and TCGA-KIRC Cohorts are reported in Table 1.
Recovery of immune receptor V(D)J recombination reads from bulk RNA-sequencing
Recovery of immune receptor V(D)J recombination reads was performed in two steps. First, RNAseq binary alignment map (BAM) files were searched, as a straight string search, for 10-mer nucleotide sequences representing the 3’ ends of every human V-gene and 5’ end of every human J-gene, for all seven immune receptors. Next, the resulting reads were aligned to reference V and J regions obtained from the International Immunogenetics Information System (IMGT). The quantitative parameters for the pairwise alignment were: (i) nucleotide match, + 5, (ii) mismatch, − 10, (iii) opening gap, − 10, and (iv) extending gap, − 10. The threshold for a V or J gene segment match was a score of ≥ 65. To ensure V and J read fidelity, only reads with a 20 nucleotide or greater match length for both V and J regions, and within the 20 nucleotides, a > 90% nucleotide match fidelity for both V and J regions were considered as matches. Additionally, and a productive CDR3 domain, defined as an in-frame junction without stop codons, was required for recombination read identification. Code for the method described above can be obtained at: https://github.com/bchobrut-USF/vdj under “Code Package A”. See also https://hub.docker.com/r/bchobrut/vdj for a container version of the code with a readme file.
Generalized diversity index for patient quantifying CDR3 receptor diversity
The generalized diversity index (GDI) can be viewed as a continuous, non-increasing function over a range of values described by the parameter q, called order of diversity. This parameter allows a consideration of multiple scales of diversity simultaneously or in combination. GDI is often used in ecology (20) and was more recently introduced to quantify intratumor heterogeneity and evolution (29–31). Formally, GDI is calculated as: where D(q) is called the diversity index at the given order of diversity q, n is the number of unique CDR3 sequences recovered across the entire cohort, and p! is the relative proportion of i-th CDR3 sequence. We typically evaluated the diversity score, D(q), for q between 0.01 and 100 numerically for each patient, for each of the receptor and immunoglobulin types individually, as well as in biologically meaningful groups (TRA+TRB together, TRG+TRD together, and IGH+IGK+IGL together). Varying the value of q represents interpolating between richness and evenness: richness is weighted more at low values of q, and evenness is weighted more at higher values of q. High-q GDI scales inversely with dominance or clonality.
Point-estimates derived from patient’s GDI, D(q), can be used for easy of comparison of sequence distributions across patients and cohorts. These point-estimates of interest include (i) low-q diversity (D(0.01)), which describes the epitope richness of the patients, (ii) high-q diversity (D(100)), which describes dominance of the “main driving” epitope, and (iii) ΔD, which measures the difference between low-q and high-q diversity (D(0.01) − D(100)). Furthermore, when visualizing the continuum of diversity measures D(q) with q in log-scaling, the continuum of diversity measures appears to have an inflection point, corresponding to a scale of diversity where small changes in the key parameter q can have large impact: the higher this value, the more even we expect a distribution to be, as the inflection point tends to infinity for perfectly even distributions (corresponding to n sequences all at frequency 1/n). Thus, two additional point-estimates of interest that we used are (iv) the value q at which an inflection point occurs (IP), and (v) the slope at the inflection point (denoted as IP slope). All code for calculating CDR3 diversity and its summary metrics has been implemented in Julia (version 1.4.0) and documented in the publicly available package OncoDiversity.jl.
To determine the impact of all five point-estimates of diversity, we ran a correlation analysis and determined that we could reduce our five point-estimates of diversity down to three metrics for comparison across receptor groups and patients. The Spearman correlation coefficients were calculated between point-estimate metrics and comparisons between low-q diversity, ΔD diversity, and the inflection point slope all had significant and very strong Spearman correlation coefficients of 0.98 or greater, so moving forward, we just focused on one of those metrics as a measure of species richness diversity (Supplemental Fig. S2). There was not a strong correlation between high-q diversity and the inflection point q metrics and either metric compared to any of the three species richness diversity metrics (low-q diversity, ΔD diversity, and inflection point slope), so we continue to look at the high-q diversity and inflection point q separately and in additional to the single species richness measure.
Assessment of clinical and survival associations with CDR3 features
Clinical associations were evaluated for recoveries in TRA, TRB, TRG, TRD, IGH, IGK, and IGL separately as well as in combinations of TRA+TRB, TRG+TRD, and IGH+IGK+IGL. After point estimates of diversity were calculated for each patient and each receptor subtype/combination, the clinical parameter values were assessed to identify if CDR3 receptor diversity could discriminated ccRCC patients based on relevant clinical and pathological outcomes, as well as the percentage of tumor with EGFR spice variant alpha. Largest diameter size and age were the only two continuous variables, which were evaluated by dividing the cohort into above and below the median of the diversity and compared with unpaired t-tests. All other categorial data types were divided by categories and the diversity metric was compared across the categories using unpaired t-test for 2 categories and ANOVA for 3 or more categories.
Survival correlations for the above combinations were performed by separating the cohort into above and below the median based on point-estimates of the generalized diversity index. In addition, the maximally selected rank statistical analysis was performed to estimate an optimal cut point in the quantitative point estimates as a binary classification rule regarding overall survival time (32). The Kaplan-Meier (KM) curve method was used to calculate survival probability and log-rank test was used to compare the above (high diversity) and below (low diversity) groups. Graph-Pad Prism software (version 8) and R version 3.6.1 were used for computing statistical comparisons and outputting figures.
xCell Scores
From bulk RNA-sequencing for each patient, xCell scores were calculated for various T-cell and B-cell subtypes as well as the Immune Score, Stroma Score, and Microenvironment Score. Then for each xCell score calculated a Spearman correlation was calculated for the total and unique recoveries identified for TRA receptor, IGL receptor, and aggregate combination (TRs+IGs).
Patients in the Moffitt TCC cohort had previously undergone bulk RNA sequencing of macro-dissected tumor samples using the TruSeq RNA Exome kit (Illumina) for 50 million 100– base pair paired-end reads. RNA sequence reads were aligned to the human reference genome in a splice-aware fashion using Spliced Transcripts Alignment to a Reference (STAR) (26), allowing for accurate alignments of sequences across introns. Aligned sequences were assigned to exons using the HTseq package (33) to generate initial counts by region. Normalization, expression modeling, and difference testing were performed using DESeq2 (34). For the CPTAC cohorts, detailed methodology regarding RNA sequencing can be found at its source web page(25).
RNA sequencing data was analyzed for cell-type enrichment using the xCell bioinformatic pipeline (25). xCell uses a compendium of validated gene expression signatures for 64 individual cell-types derived from thousands of expression profiles. Single-sample gene set enrichment analysis scores were adjusted for spillover compensation to generate an adjusted enrichment score for each cell type within the specimen, which is referred to as the xCell score. xCell scores were generated for each of the 64 cell-types for each ccRCC tumor specimen.
RESULTS
General diversity index (GDI) quantifies tumor infiltrating lymphocyte receptor subtype diversity in the TCC cohort
For each patient, we measured individual receptor CDR3 diversities across the 7 human adaptive immune receptor genes (TRA, TRB, TRG, TRD, IGH, IGK, IGL), as well as common combinations of these receptor subtypes (TRA+TRB, TRG+TRD, IGH+IGK+IGL, along with all 7 together, denoted at TRs+IGs). In the TCC cohort (n = 105), CDR3 sequences were recovered from bulk RNA-sequencing of patient tumor tissue. GDI was then calculated for each subtype and group of subtypes (the landscape of recoveries across common groups are shown in Fig. 1A-B and Supplemental Fig. S3 and distribution of individual recoveries in Supplemental Fig. S4). The Moffitt TCC cohort of ccRCC patients represented a cohort of clinically high-risk and advanced patients. Over two-thirds of the cohort contains patients with pathologic stages 3 or 4, including 6% of patients with highly aggressive sarcomatoid histology (Table 1). To quantify the GDI, we generated a continuum of diversity measures D(q) for each patient across values of the order of diversity, q. Then, we were able to compare clinical variables at specific point estimates of the continuum of diversity measures, as shown in Fig. 1C. We compared immune receptor subtype diversity across patients, and found that the point estimates ΔD diversity, high-q diversity, and inflection point (IP) of the GDI curve (see Methods) were unique summary metrics. The value of ΔD summarizes richness (total number of unique sequences) of receptor subtypes in a patient sample. High-q diversity focuses on the dominance (frequency of largest sequence) of a receptor subtype and de-emphasizes a rare receptor subtype. IP is a measure of receptor subtype evenness; with high IP values indicating an overall more level distribution, largely independent of receptor subtype richness (29–31).
Immune receptor subtype richness is associated with important pathological features in the TCC cohort
Across individual receptor subtypes, TRA and IGL receptor diversity consistently showed increased richness (in Fig. 2 exemplified with ΔD diversity comparisons) in tumors with larger diameters, higher grade, sarcomatoid status, and tumors from the left side. TRA receptor diversity split the Moffitt TCC cohort at the median of ΔD diversity. Of these, patients with ΔD values below the cohort median had a mean largest diameter size of 6.1 cm, compared to those with above the median who had a mean largest diameter size of 7.6 cm (Fig. 2A i, p-value: 0.0287). This same trend was reflected in IGL receptor diversity with the high ΔD diversity group (Fig. 2B i, p-value: 0.0195).
TRA receptor ΔD diversity in CDR3 amino acid sequences recovered from tumors with left laterality had an average diversity score 2.3-fold higher compared to those with right laterality tumors (Fig. 2A ii, p-value: 0.0097), which was also reflected in IGL receptor ΔD-diversity (Fig. 2B ii, p-value: 0.0445). In addition, patients with high tumor grade had increased TRA receptor ΔD diversity (Fig. 2A iii, p-value: 0.0227), which was also demonstrated in IGL receptor ΔD diversity (Fig. 2B iii, p-value: 0.0459).
Overall, Moffitt TCC cohort tumors that were identified with sarcomatoid histology had increased overall lymphocyte receptor diversity compared to those individuals who did not have sarcomatoid histology (demonstrated in Fig. 2A & B iv; p-value: 0.0430 in TRA and p-value: 0.0152 in IGL). This trend for increased diversity in sarcomatoid carcinoma tumors was statistically significant in all combinations except for TRG receptor diversity, which was one of the rarest CDR3 receptor subtypes recovered.
Our observations of increased lymphocyte receptor richness in larger diameter tumors, higher grade tumors, left laterality tumors, and sarcomatoid carcinomas were also discovered in other receptor subtypes, as well as in the combinations (Supplemental Fig. S5 shows size, laterality, grade, and sarcomatoid status across all combinations of receptors, Supplemental Fig. S6 shows the Shannon index of TRA and IGL across size, laterality, grade, and sarcomatoid status and Supplemental File 1 contains statistics for all comparison combinations across all clinical features).
Independent validation of GDI metrics
To validate the findings of increased diversity with poor pathological features we first calculated xCell scores (35) for patients in the Moffitt TCC cohort to confirm that the detected recoveries came from tumor-infiltrating lymphocytes. TRA receptor total and unique recoveries had the highest Spearman correlation scores with T-cell subtype xCell score and were less correlated with the B-cell subtype xCell score (Fig. 3A and full correlation analysis of all xCell scores with total and unique recoveries of TRA is shown in Supplemental Fig. S7, with total and unique recoveries of IGL in shown in Supplemental Fig. S8, and with total and unique recoveries of all CDR3s recovered is shown in Supplemental Fig. S9). The strongest correlation associated with TRA receptor was with CD8+ Tcm (r=0.819 with total recoveries; r=0.714 with unique recoveries) and CD8+ T-cells (r=0.751 with total recoveries; r=0.695 with unique recoveries), while the weakest correlation was with CD4+ Tcm (r=0.177 with total recoveries; r=0.185 with unique recoveries) and CD4+ naïve T-cells (r=0.243 with total recoveries; r=0.097 with unique recoveries). These correlations also held true for IGL receptor recoveries and total (TRs+IGs) receptor recoveries, but the correlations were moderate in strength (most Spearman correlation coefficients between 0.2-0.5) compared to the Spearman correlation coefficient associated with TRA (most correlation coefficients between 0.3-0.7). Furthermore, the Immune Score strongly correlated positively with total and unique recoveries, compared to the Microenvironment Score and Stroma Score, which both showed weaker Spearman correlation coefficients.
Once we confirmed that the diversity scores, we measured were attributable to the immune cell infiltration in the tumor with the xCell scores, we sought to independently validate our findings with a replicative study, using the ccRCC CPTAC-3 cohort (n=110). The CDR3 recovery landscape of CPTAC-3 and Moffitt TCC differed slightly. CPTAC-3 had more recoveries from T-cells; B-cells accounted for an average of only 78.96%. However, we did not identify any significant trend of differences between the proportion of T-cell and B-cell recovered based on stage (CPTAC-3 recovery landscape is shown in Supplemental Fig. S10 and fraction of B-cell and T-cell receptors recovered per patient grouped by stage is detailed in Supplemental Fig. S11). Laterality and sarcomatoid status could not be evaluated, as these are not available in CPTAC-3.
We confirmed the observation that higher grade and larger size of tumors are associated with increases in TRA and IGL receptor ΔD diversity in the CPTAC-3 cohort (Fig. 3B compared to Supplemental Fig. S12, and Supplemental File 2 contains statistics for all comparisons across all clinical features)—which independently validated that rich, but rather uneven receptor subtype distribution is associated with larger and high-grade tumors. TRA receptor ΔD diversity was significantly different between low and high grade (Fig. 3B i, p-value: 0.0212) and pT stages (Fig. 3B iii, p-value: 0.0368). In both TRA and IGL receptor distributions, the ΔD diversity score showed that large tumors (described pathologically as tumors with the largest diameter 7 cm or greater) have increased receptor subtype diversity compared to smaller tumors (those with largest diameters below 7 cm). Of note, CPTAC-3 has significantly different distribution of tumor sizes, with many small tumors (largest diameter < 7cm), compared to the Moffitt TCC cohort (Fig. 3B ii and v, Supplemental Fig. S13 compares the Moffitt TCC and CPTAC-3 Cohorts based on distribution of tumor sizes).
Furthermore, the differences in grade and stage (size data not available) could be replicated in the TRA recoveries from previously obtained TCR CDR3s(27, 28) from The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) Cohort. As demonstrated in the Moffitt TCC Cohort (Fig. 2A iii and Supplemental Fig. S12A iii) and CPTAC-3 Cohort (Fig. 3B i, iii), TCGA KIRC Cohort patients with higher grade (Fig. 3C i, p-value: 0.0008) and higher pT stage (Fig. 3C ii, p-value: 0.0027) had significantly higher TRA receptor ΔD diversity.
In addition to tumor sample differences, CPTAC-3 Cohort had a subset of 75 patients with matched normal tissue samples (matched normal tissue recovery landscape described in Supplemental Fig. S14). Lymphocyte receptor richness was increased in tumor samples compared to normal tissue, which was observed across all receptor subtypes and combinations (Fig. 4A, Supplemental Fig. S15, and Supplemental File 2). In the TRs+IGs combination, tumor tissues had on average, at least 2.6-fold increase in richness of CDR3 sequences recovered, compared to the matched patient’s normal tissue (mean score of 144.0 in normal tissue compared to a mean score of 377.1 in matched tumor). Furthermore, sequence dominance (measured by low values of high-q diversity) was decreased in all receptor subtypes except for the IGL receptor and IGH+IGK+IGL receptor combination (Fig. 4B, Supplemental Fig. S15, and Supplemental File 2). In the TRs+IGs combination, the tumor tissue had on average, at least 1.3-fold increase in high-q diversity compared to normal tissue, which indicates that the most abundance CDR3 sequence in the sample was about 2% lower, thus less dominant, in the tumor (mean score of 14.27 in the normal tissue compared to a mean score of 19.17 in the tumor tissue). Analyzing the inflection point q-metric of evenness in TRs+IGs combinations in the normal-tumor matched patients showed that normal samples have an almost 10% mean increase evenness compared to their matched tumor samples (Fig. 4C; p-value 0.0857). This data supported the hypothesis that normal tissues are expected to show very even CDR3 sequence distributions, and that better outcomes are to be expected in tumors that appear more normal in this context.
Immune receptor subtype evenness, measured by IP, is associated with survival
We first analyzed associations between immune receptors and overall survival in the Moffitt TCC Cohort. Each of our chosen diversity metrics (Methods) was compared for each of the 7 immune receptor types. We used the maximally selected rank statistics (maxstat) approach (Methods), with a cut-point that yielded a maximal survival difference, together with a multivariate Cox regression analysis. We found larger TRA sequence distribution IP, which measures distribution evenness, was significantly associated with longer overall survival (Fig. 4D). The IP value of the cut-point in the cohort that yielded a maximal survival difference was 0.826. Using this optimal cut-point resulted in a hazard ratio of 0.526 (Log-rank p-value: 0.049, Cox p-value: 0.036) with the low IP group (IP <0.826, n=15) having a median overall survival of 80 months, and the high diversity group (IP >0.826, n=88) having a median overall survival of 115 months. This trend could not be confirmed with the CPTAC-3 cohort due to the lack of survival information, i.e., overall survival data information was censored for 85 of the 98 CPTAC-3 cases (Supplemental Fig. S16). This trend was supported (although not statistically significantly) with the TCGA-KIRC cohort with individual with IP above the median produced a low IP group (IP<8.26, n=192) having a median overall survival of 92.1 months, and the high IP group (IP>8.26, n=197) having an undefined median overall survival due to survival fraction not falling below 50% in this group, which is likely due to this cohort also being comprised of lower grade and stage tumors compared to the Moffitt TCC cohort (Fig. 4E; log-rank p-value: 0.6104).
To further illustrate the utility of our diversity metrics for receptor subtype heterogeneity estimation, we show in Fig. 4F-H four Moffitt TCC cohort examples of characteristic differences in richness versus evenness space. In both examples of low evenness, dominance (low high-q diversity) is high (Fig. 4G). Interestingly, the survival difference unveiled by IP distribution evenness comparison does not necessarily coincide with tumor size, as size rather correlated with richness (ΔD, Fig. 4H). Taken together, these findings highlight the ability of these metrics to assess immune sequence distribution heterogeneity via GDI-derived point estimates.
Demographic and aberrant splicing differences in tumor-infiltrating lymphocyte receptor diversity
In addition to capturing differences in tumor pathology and survival, the GDI was also able to discriminate patients based on demographic differences. B-cell receptor high-q diversity is an estimate of dominance by the most abundant sequence: the lower this value, the more dominant the most abundant sequence. High-q diversity demonstrated differences in immune infiltration of white versus non-white ccRCC patients in the TCC cohort (Supplemental Fig. S17). IGH and IGL receptor high-q diversity alone, as well as the IGH+IGK+IGL and total recovery (TRs+IGs) showed at least a 2.1-fold increase in non-white patients compared to white patients (Supplemental Fig. S17A-B; p-values < 0.01). This trend was also observed in the IGK receptor high-q diversity, but only with a 1.6-fold increase in non-white individuals (Supplemental Fig. S17C; p-value 0.0561).
Furthermore, diversity metrics from T-cell receptors TRG and TRD discriminate patients based on gender differences in the TCC cohort. TRG and TRD receptor inflection point q was at least 1.7-fold higher in female patients compared to male patients (TRG: score of 2.503 in females vs. a score of 1.442 in males, p-value: 0.0033; TRD: score of 2.883 in females vs. a score of 1.454 in males, p-value: 0.0003; Supplemental Fig. S18). Moreover, when high-q diversity (dominance of the most abundant sequence) was compared between female and male patients with respect to TRG and TRD diversity, in both the Moffitt TCC and CPTAC-3 cohorts, female patients had increased high-q diversity compared to male patients (Supplemental Fig. S19).
We evaluated association of GDI with a recently described aberrant EGFR splice variant in ccRCC (36). The Moffitt TCC Cohort of patients were profiled for the presence of this EGFR variant (reported as % EGFR variant), and we found a positive correlation between the percentage of tumors expressing the EGFR variant and the evenness (inflection point q) scores from B-cell receptors (Supplemental Fig. S20). This correlation was most significant in IGL receptor diversity, with a Spearman correlation coefficient of r=0.3430 (p-value: 0.004, B-cell receptor IP vs. EGFR variant correlation analysis is demonstrated in Supplemental Fig. S20 and T-cell receptor IP vs. EGFR variant correlation analysis is demonstrated in Supplemental Fig. S21), indicating more even (high IP) distributions have higher proportion of cells with that EGFR variant. This positive correlation between IP and percentage of EGFR variant was not as strong nor significant in the T-cell receptors (Supplemental Fig. S21).
To determine if pre-existing host inflammatory environment contributed to differences in patient CDR3 diversity, we identified patients in the Moffitt TCC cohort who were diagnosed with diabetes and investigate if there were any associations with the diversity metrics and diabetes status. We found that none of the 11 receptor combinations explored had a significant difference in any of the metrics of CDR3 diversity (individual comparisons are reported in Supplemental File 1 and TRA, IGL, and TRs+IGs ΔD and IP comparisons are demonstrated in Supplemental Fig. S22).
Finally, we evaluated associations of GDI across the mutational landscape across all three cohorts. For each of the cohorts, we had mutational status on a subset of patients for common driver mutations in ccRCC including: BAP1, SETD2, KDM5C, MTOR, PBRM1, PTEN, TP53, and VHL. Across all three cohorts, the number of mutations in this shortlist of driver mutations was negatively correlated with richness (Fig. 5A). Interestingly, the Moffitt TCC cohort, generally considered the more aggressive cohort, had the fewest number of associations between diversity metrics and mutation status, while the CPTAC-3 and TCGA-KIRC cohorts had more significant associations (detailed for CPTAC-3 in Supplemental File 2 and TCGA-KIRC in Supplemental File 3). Furthermore, B-cell receptors (IGH, IGK, IGL) had more significant associations with mutational status than T-cell receptors (TRA, TRB, TRG, TRD; compared between the Moffitt TCC cohort described in Supplemental File 1 and CPTAC-3 cohort in Supplemental File 2). Specifically, with IGL recoveries in the CPTAC-3 cohort in patients with mutations in KDM5C, PBRM1, and VHL all had reduced richness (Fig. 5B i-iii) and mutation in PTEN was associated with increased evenness (Fig. 5B iv). These trends were confirmed in the total (TRs+IGs) recoveries of the CPTAC-3 cohort (Supplemental Fig. S23A) and the trends supported in by the Moffitt TCC cohort in direction, however they were not statistically significant (Fig. 5C for IGL trends and Supplemental Fig. S23B for TRs+IGs trends). In T-cell recoveries, both the CPTAC-3 and TCGA-KIRC cohorts showed reduced richness was associated with a mutation in PBRM1 (Fig. 5D), however this trend was not observed in the Moffitt TCC cohort.
DISCUSSION
Here, we demonstrate how a generalized diversity index (GDI), often used in ecology and evolution (31,37,38), can be applied to quantitatively characterize tumor infiltrating lymphocyte receptor subtype diversity in ccRCC. We identified point estimates of this index that are associated with important differences in patient demographics, tumor pathology, and survival. These metrics can help objectively characterize host differences in immune receptor subtypes in ccRCC patients. These novel objective metrics can provide insight into underlying tumor and host immune relationships by defining differences within and across patients. We used bulk sequencing data from ccRCC tumors to better understand these differences in patient immune receptor subtypes and these metrics can be replicated in other similar cohorts with available sequencing data. These host diversity metrics could be especially helpful in elucidating the ideal tumor microenvironment for response to immunotherapy agents in patients with metastatic ccRCC (39).
In contrast to more numerous recombination read recovery, the recovery of adaptive immune receptor recombination reads from RNAseq files is obtained via PCR amplification of adaptive immune receptors. This procedure is also called the immune repertoire approach (40). Our work here strongly indicates the specific value of the TRA and IGL GDI, discussed in more detail below. Thus, it has made sense to apply the immune repertoire approach to increase the number of recombination reads for all adaptive immune receptors. We expect that other immune receptor genes may have prognostic value, with more recombination reads to evaluate. Ιmmune repertoire approaches, particularly when applied to cancer samples, often result in a majority of reads that represent relatively few clonotypes. In addition, human aging substantially reduces clonotype diversity (41, 42), particularly Figs. 2C and 2D therein. Thus “sampling of the repertoire”, by mining genomics files over large patient databases, can generate conclusions regarding clonotype associations with clinical features. More recent preparations of RNAseq files, including those we use here, have become much more robust over the last several years, both in terms of read quantity and lengths (28). Recovery of adaptive immune receptors from those files also has become more robust. In summary, our results from adaptive immune receptor read recoveries from the RNAseq files are informative. Further studies using the immune repertoire approach, representing a more comprehensive clonotype collection, should be applied in the future.
Our findings suggest that individuals with more advanced disease have increased richness in tumor recovered CDR3 sequences. In the Moffitt TCC Cohort, we detected statistically significant differences in TRA and IGL diversity with increased richness in tumors with larger diameter and higher grade. Furthermore, we identified tumors with sarcomatoid carcinoma pathology that represent a rare and aggressive histology and showed significant increases in different diversity metrics and receptor subtypes, in particular increased CDR3 sequence richness. The immune receptor profiles of these tumors are particularly interesting since sarcomatoid histology has also been associated with very favorable response to checkpoint inhibitors. We postulate that a similar immune receptor profile in other patient tumors may portend a favorable response to checkpoint inhibition. Also, we found a significant increase in richness in left sided tumors. This difference may explain some of the host related factors associated with left-sided tumors that have a poorer clinical outcome than right-sided tumors (43). Many of these associations were able to be validated in similar comparisons with the CPTAC-3 cohort studies.
In the TCC cohort, B-cell receptors with IGL had the most recoveries and TRA had the most T-cell CDR3 sequences recovered (Supplemental Fig. S3). It should be noted that point-estimates of diversity or heterogeneity are only comparable within the subtype of interest, within a cohort. However, trends can be compared between point-estimates and patient cohorts. Furthermore, different cohorts based on their clinical context may show differences in immune cell infiltrates, even when considering simple cell marker differences between T-cells and B-cells. Interestingly, B-cell CDR3 recoveries dominated in the TCC cohort, accounting for an average of 93.41% (ranging from at least 53.24% to 99.95%). This contrasted with the CPTAC-3 renal cell carcinoma cohort, which contains generally less aggressive tumors, with only 40.9% of the cohort respectively comprised of stage 3 or 4 tumors.
Our results demonstrate that increased richness is indicative of larger and more advanced ccRCC tumors, which may be related to differences in underlying tumor biology. Evenness, as measured by the inflection point q, segregated patients based on survival. In a cross-validation cut-point analysis, patients with higher TRA evenness had a significantly improved overall survival compared to individuals with lower TRA evenness. These results indicate that patients’ TRA evenness, not richness, may be a possible prognostic biomarker and could have direct therapeutic consequences for response to systemic agents that elicit their effect in the tumor microenvironment (39). Furthermore, this evenness metric could be extended to other solid tumor types, as a quantitative metric of host contributions to immune infiltration that is a result of tumor evolution. These characterizations might become especially interesting in the context of terminally exhausted CD8+ T cells, which were recently shown to be enriched in advanced renal cell carcinoma, interacting with M2-like tumor-associated macrophages leading to immune dysfunction and poorer prognosis (44).
We identified demographic differences based on the CDR3 diversity of B-cell receptors, both individually and in combination, showing increased high-q diversity. This amounted to decreased dominance of the most abundant sequence, in non-white individuals compared to white individuals in the Moffitt TCC cohort. However, in this cohort, 88.5% of the patients were white (Table 1) and we were unable to confirm these results in the CPTAC-3 cohort due to missing data. Nevertheless, previously found race-related differences in BCR pathway activation in African Americans compared to European Americans lends support to our finding in differences in BCR dominance/clonality diversity (45). Furthermore, high-q diversity/sequence dominance may reveal gender-based differences in the ccRCC microenvironment. We found that females had higher clonality compared to male patients, which persists in the CPTAC-3 cohort. These demographic differences need to be further investigated, but our results suggest underlying race and gender differences in the heterogeneity of ccRCC microenvironments as reflected in tumor immune infiltration differences.
Biodiversity has historically been summarized into: alpha-diversity, which measures a single community’s diversity; beta-diversity, which quantifies the relative change of species between communities; and gamma diversity, which measures the total diversity in ecology (46). Diversity at the individual receptor subtype-level relates closest to alpha-diversity, which is then compared across the patients in a cohort. In a beta-diversity context, we see that many of the trends hold true between the Moffitt TCC and CPTAC-3 cohorts. The most commonly used diversity measures applied to cancer systems have been Shannon and Simpson indices (47, 48), which are special cases of the GDI at intermediate values of the parameter q (49, 50). Our analysis found that it is at the extremes of the continuum of diversity measures (low-q and high-q values) that we can stratify patients in clinically meaningful ways (for example, Fig. 2 vs. Supplemental Fig. S6). In this sense, novel properties of the GDI that are discussed here may allow a more nuanced, and thus more clinically comprehensive characterization of sequence heterogeneity. These novel objective diversity scales could have important applications for other systems in which tumor heterogeneity with its ecological and evolutionary impact is quantified.
Different point-estimates based on generalized diversity give unique information about the tumor. Increased richness in TRA and IGL diversity informs the size and aggressiveness of a tumor. Dominance of the most abundant sequence segregates patients based on prognosis. We identified a novel measure of evenness among immune receptor subtypes that could accurately classify patients’ overall survival. We also found important differences in receptor subtype contributions based on patient demographics such as race and gender. Using these diversity metrics, we identify a new statistical approach to stratify ccRCC patients-based differences in immune infiltration diversity and further guide precision oncology.
Data Availability
Code for VDJ epitope recoveries from patient sequencing data (BAM files) is publicly available on Docker at https://hub.docker.com/r/bchobrut/vdj and GitHub at https://github.com/bchobrut/vdj_recovery. Code for calculating patient CDR3 diversity from CDR3 recoveries is publicly available at: https://github.com/mcfefa/OncoDiversity.jl. The code and documentation describing how to run our pipeline and reproduce our results are open-source and publicly available through the OncoDiversity.jl GitHub repository. A virtual machine producing the full diversity environment is available on Code Ocean: https://codeocean.com/capsule/9959428/tree/v1
ABBREVIATIONS
- CDR3
- complementarity determining region-3
- GDI
- generalized diversity index
- IP
- inflection point
- KM
- Kaplan-Meier survival curve
- OS
- overall survival
- RFS
- recurrence free survival
- TCR
- T-cell receptor
- TRA
- T-cell receptor alpha
- TRB
- T-cell receptor beta
- TRG
- T-cell receptor gamma
- TRD
- T-cell receptor delta
- BCR
- B-cell receptor
- IGH
- B-cell immunoglobulin heavy chain
- IGK
- B-cell immunoglobulin kappa light chain
- IGL
- B-cell immunoglobulin lambda light chain
DATA AND CODE AVAILABILITY
Code for VDJ epitope recoveries from patient sequencing data (BAM files) is publicly available on Docker at https://hub.docker.com/r/bchobrut/vdj and GitHub at https://github.com/bchobrut/vdj_recovery.
Code for calculating patient CDR3 diversity from CDR3 recoveries is publicly available at: https://github.com/mcfefa/OncoDiversity.jl. The code and documentation describing how to run our pipeline and reproduce our results are open-source and publicly available through the OncoDiversity.jl GitHub repository. A virtual machine producing the full diversity environment is available on Code Ocean (DOI: 10.24433/CO.1660327.v1).
FUNDING
United States Army Medical Research Acquisition Activity Department of Defense (KC180139 to B.J.M). P.M.A. was supported by an American Cancer Society Moffitt IRG award, the Richard O. Jacobson Foundation (Evolutionary Therapy Center of Excellence at Moffitt Cancer Center), and the William G. ‘Bill’ Bankhead Jr and David Coley Cancer Research Program (20B06).
AUTHOR CONTRIBUTIONS
MCF: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing.
NC: data curation, critical review.
BIC: data curation, adaptive immune receptor recombination read extractions from the Moffitt TCC cohort RNAseq files.
YK: statistical analysis, validation, critical review.
JKT: logistic support, data curation, statistical analysis, critical review.
AB: logistic support, data curation, statistical analysis, critical review.
JJM: investigation, critical review.
MF: logistic support, data curation.
ES: logistic support, data curation.
JD: critical review.
SSAF: data curation, adaptive immune receptor recombination read extractions from the Moffitt TCC cohort RNAseq files.
JFA: data curation, adaptive immune receptor recombination read extractions from the Moffitt TCC cohort RNAseq files.
ENK: logistic support, data curation.
GB: adaptive immune receptor recombination read extractions from the Moffitt TCC cohort RNAseq files, critical review, study supervision.
BJM: conceptualization, investigation, methodology, validation, critical review, writing, study supervision.
PMA: conceptualization, investigation, methodology, software, validation, critical review, writing, study supervision.
POTENTIAL CONFLICTS OF INTEREST
No potential conflicts of interest to report: NC, BIC, YK, JKT, AB, MF, ES, ENK, GB. BJM has served as expert opinion for Merck. JM has ownership interest in Fulgent Genetics, Inc., Aleta Biotherapeutics, Inc., Cold Genesys, Inc., Myst Pharma, Inc., and Tailored Therapeutics, Inc., and is a consultant/advisory board member for ONCoPEP, Inc., Cold Genesys, Inc., Morphogenesis, Inc., Mersana Therapeutics, Inc., GammaDelta Therapeutics, Ltd., Myst Pharma, Inc., Tailored Therapeutics, Inc., Verseau Therapeutics, Inc., Iovance Biotherapeutics, Inc., Vault Pharma, Inc., Noble Life Sciences Partners, Fulgent Genetics, Inc., UbiVac, LLC, Vycellix, Inc., and Aleta Biotherapeutics, Inc. PMA is a consultant for CRISPR Therapeutics and received research funding from KITE pharma (Gilead). MCF reports other support from the UF Foundation during the conduct of this study.
ACKNOWLEDGEMENTS
This work was supported in part by the Biostatistics and Bioinformatics Shared Resource at the H. Lee Moffitt Cancer Center & Research Institute, a National Cancer Institute–designated Comprehensive Cancer Center (P30-CA076292). Total Cancer Care (TCC) Protocol at Moffitt Cancer Center, which was enabled in part by the generous support of the DeBartolo Family. Some data used in this publication were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC), available to which author via dbGaP project approval number 6757. Editorial assistance was provided by the Moffitt Cancer Center’s Scientific Editing Department by Dr. Paul Fletcher and Daley Drucker (no compensation was given beyond their regular salaries). We also thank Gregory J. Kimmel, PhD for valuable comments.
Footnotes
mferrall.fairbanks{at}bme.ufl.edu, nicholas.chakiryan{at}moffitt.org, bchobrut{at}usf.edu, youngchul.kim{at}moffitt.org, jamie.teer{at}moffitt.org, anders.berglund{at}moffitt.org, James.Mule{at}moffitt.org, Michelle.Fournier{at}moffitt.org, Erin.Siegel{at}moffitt.org, jasreman.dhillon{at}moffitt.org, Falasiri{at}usf.edu, Jarturo{at}usf.edu, Esther.Katende{at}moffitt.org, gblanck{at}usf.edu, brandon.manley{at}moffitt.org, altrock{at}evolbio.mpg.de;
updated manuscript after anonymous peer review