Abstract
BACKGROUND Clonal hematopoiesis of indeterminate potential (CHIP) is a condition in which healthy individuals harbor clonal mutations in myeloid (M-CHIP) and/or lymphoid (L-CHIP) cells at a variant allele fraction (VAF) ≥0.02. While CHIP is associated with an increased risk of hematologic malignancy and cardiovascular disease, its association with exposure to airborne carcinogens is largely unknown.
OBJECTIVES Here, we studied M/L-CHIP in responders to the 9/11 terrorist attacks on the World Trade Center (WTC), who were exposed to a complex mix of airborne carcinogens. Then we explored the association of CHIP mutations with phenotypes such as age, ancestry, exposure, HLA zygosity, and other clinical, laboratory, mental and cognitive data. Finally, we compared CHIP prevalence in WTC responders to 293 unexposed controls.
METHODS Using banked peripheral blood and ultra-deep whole-exome sequencing at 250X, we characterized CHIP mutations and their interaction with clinical, mental and cognitive characteristics, exposure, peripheral blood counts, and HLA zygosity in 350 WTC responders. We used Fisher’s exact test for categorical variables; Wilcoxon rank sum test for continuous variables; and logistic regression for multivariate analysis.
RESULTS Among WTC participants, M-CHIP prevalence was 16.2% and L-CHIP prevalence was 21.4%. M-CHIP prevalence increased with age (p=0.02), was elevated in former smokers (p=0.01), and was associated with lower platelet counts (p=0.03). The most frequently mutated genes were DNMT3A, TET2 and PPM1D for M-CHIP, and EEF1A1, DDX11 and KMT2D for L-CHIP. Notably, harboring a DDX11 mutation was associated with a lower Montreal Cognitive Assessment score (p=6.57e-03). Overall, M/L-CHIP was more prevalent in WTC responders than in controls.
DISCUSSION Study results will inform the development of personalized risk-adapted CHIP and cancer screening programs in individuals exposed to airborne carcinogens.
Introduction
In the aftermath of the 9/11 terrorist attacks on the World Trade Center (WTC), over 91,000 individuals were involved in the rescue and recovery efforts, clean-up of debris and restoration of essential services (Rescue & Recovery Workers - 9/11 Health). These included first responders such as firefighters, police officers, and paramedics, in addition to operating engineers, steel workers, railway tunnel workers, telecommunications workers, sanitation workers, medical examiners, and volunteers. Many had no prior training in civil disaster response (Herbert et al. 2006). Because of their unprecedented exposure to a complex mix of known or suspected airborne carcinogens, there were justified concerns regarding their elevated cancer risk. WTC responders were exposed to benzene, formaldehyde, asbestos, silica, cement dust, glass fibers, heavy metals, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, polychlorinated dibenzofurans, and dioxins (Lioy and Georgopoulos 2006). Exposures to these agents may increase cancer risk. However, the association between exposure to this complex mix and the development of hematologic malignancies is not well characterized.
Recently, it has been discovered that a condition called clonal hematopoiesis (CH) elevates risk for hematological neoplasm, cytopenia, cardiovascular disease (CVD), infection and all-cause mortality (Dawoud et al. 2021; Jaiswal et al. 2014, 2017; Jaiswal and Ebert 2019; Niroula et al. 2021; Sperling et al. 2017; Zekavat et al. 2021). Individuals with CH harbor clonal genomic mutations in their blood cells associated with certain hematologic malignancies, yet with no detectable hematologic disorders nor unexplained persistent cytopenia (Heuser et al. 2016). A subset of CH, termed clonal hematopoiesis of indeterminate potential (CHIP), involves having a clonal population of blood cells that carry a point mutation or short insertion/deletion with a variant allele fraction (VAF) ≥ 2% in a gene recurrently mutated in blood cancers. CHIP onset is strongly associated with age, reflecting the gradual accumulation of somatic mutations and quiescent clones which accompany cellular senescence (Genovese et al. 2014). CHIP has been shown to occur in the myeloid lineage (M-CHIP) (Abelson et al. 2018; Desai et al. 2018), and studies have mostly focused on genes known to be recurrently mutated in myeloid malignancies (Bick et al. 2020b; Jaiswal et al. 2017). However, CHIP can also occur in the lymphoid lineage (L-CHIP), and contribute to the risk of lymphoid malignancies, an area that is less documented (Agathangelidis et al. 2018; Condoluci and Rossi 2018; Niroula et al. 2021; Singh et al. 2020; von Beck et al. 2023; Weigert et al. 2012).
Here, we studied the prevalence of M- and L-CHIP (M/L-CHIP) in 350 WTC responders using ultra-deep (250X) whole-exome sequencing (WES). Our study has several unique aspects. First, we studied participants from the WTC Health Program General Responder Cohort (GRC), which is a relatively diverse group of workers with respect to their occupational titles. Second, we leveraged the wealth of additional data we collected on these responders, including demographics (gender, race/ethnicity), WTC exposure, blood counts (including various immune cell subsets), smoking history, body mass index (BMI), mental, cognitive and general health characteristics, as well as HLA alleles derived from the WES datasets, to study their association with M/L-CHIP. Third, utilizing WES enabled us to study the full spectrum of M/L-CHIP mutations without being restricted a priori to a particular gene set. This is critical because the literature and the definition of what constitutes CHIP genes and mutations continue to evolve; our raw data files will therefore continue to provide a rich resource for future investigations. Overall, our results demonstrate that deep sequencing of M/L-CHIP mutations, integrated with other metrics, enables a more personalized risk-adapted assessment for CHIP-associated etiologies in persons exposed to airborne carcinogens. We anticipate that the study findings will inform the development of CHIP-related diagnostic measures which can be incorporated into screening programs for cancer and other inflammation-related conditions.
Methods
The World Trade Center Health Program (WTCHP) General Responder Cohort (GRC)
includes participants in the rescue, recovery, and cleanup efforts at the WTC site after 9/11/2001 on the basis of eligibility criteria, which included type of tasks, site location, and dates and hours worked/volunteered (Dasaro et al. 2017; Solan et al. 2013). A subset of participants located in Long Island were referred to the WTC Clinical Center located at Stony Brook University (SBU, B. Luft, PI) (referred to here as the WTC-SBU). This Center was established in 2002 to enlist, monitor and treat WTC-related conditions in individuals with documented WTC-response experience. Monitoring protocol included self-administered physical and mental health questionnaires followed by a physical examination, laboratory tests, spirometry, and a chest radiograph. This population has been characterized very carefully and included in previous studies of various health conditions associated with response to the WTC disaster, including post-traumatic stress disorder (PTSD), prostate cancer, cognitive impairment, and COVID-19 (Clouston et al. 2016, 2019, 2022; Morozova et al. 2021). Routine monitoring visits were performed every 12–18 months. Compared with the whole WTCHP GRC, the WTC-SBU cohort includes relatively more law enforcement personnel and men and fewer individuals with a low level of education (i.e. without a high-school degree) (Clouston et al. 2016). The study was approved annually under IRB #604113 by the Committees on Research Involving Human Subjects at SBU. More than 95% provided written informed consent for their data to be used for research.
Sample Collection
Consented participants provided whole blood samples during their annual clinical checkup visits between 2016 and January 2019. Samples were collected into Vacutainer Plastic K2EDTA (containing ethylenediaminetetraacetic acid) Tubes (BD, 367527) and stored at −80 °C until analysis.
Demographics
From the WTC-SBU cohort, we randomly selected 350 samples for WES from 345 unique participants. Detailed phenotypic characteristics are in Table 1A. Over 90% were male and white; 7.5% were female, and 9% were non-white. Less than 8% of participants were ≤ 55 years old, 49.6% were 56-60 years old, 24.3% were 61-65 years old, 9.3% were 66-70 years old and 9% were older than 70. Of these, 4.3% were current smokers, 44.6% were former smokers, and 51% were never-smokers.
Clinical data
include WTC exposure history, body mass index (BMI), and mental, cognitive and general health characteristics (including cholesterol and triglyceride levels) assessed at each monitoring visit. WTC exposure history was assessed during the enrollment interview at the WTC Health Program and has been described in detail (Wisnivesky et al. 2011). Briefly, an exposure variable was created using total time spent working at Ground Zero or on the debris pile (Wisnivesky et al. 2011). Post-traumatic stress disorder (PTSD) symptoms were assessed using the PTSD Checklist (PCL), a 20-item self-report measure modified to assess symptoms over the past month, which has excellent psychometric properties, convergent validity and internal consistency (Wilkins et al. 2011). Mild cognitive impairment (MCI) was measured using the Montreal Cognitive Assessment (MoCA), a widely used objective multidomain test (Nasreddine et al. 2005).
Blood and lipid markers
were measured concurrently with blood collection for CHIP analysis in 294 (out of 345) participants. Blood markers include platelet count, basophil count, lymphocyte count, neutrophil count, segmented neutrophil count, eosinophil count, monocyte count, lymphocyte-to-monocyte ratio (LMR), white blood cell count (WBC), red blood cell count (RBC), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), MCH concentration (MCHC), and red cell distribution width (RDW). Lipid markers include total cholesterol, triglycerides, HDL cholesterol, LDL cholesterol, and very-low-density lipoprotein (VLDL) cholesterol.
Sequencing Analysis
Deep WES for 350 WTC-SBU samples was performed at Azenta Inc (Burlington, MA, USA) on the HiSeq 2500 system (Illumina, San Diego, CA, USA) using standard protocols. Sample quality control (QC), library construction, and sequencing were performed in line with industry quality of process standards. To allow for ample sensitivity and accuracy in CHIP mutation calling even at low clonal frequencies, a median 250X coverage was used. Sequencing generated high-quality 150-bp paired-end read data in standard fastq format.
Data Pre-Processing for Variant Discovery
To pre-process and QC raw sequence reads, we used fastp (Chen et al. 2018). Specifically, we trimmed the adapters and filtered out bad reads (low quality, too short or too many unknown bases). Next, we adhered to the Genome Analysis Toolkit (GATK, https://software.broadinstitute.org/gatk) best practices for data pre-processing to generate analysis-ready bam files from the fastq files. Briefly, we aligned the sequence reads to the Genome Reference Consortium Human Build GRCh38 using BWA-MEM (Li 2013), followed by duplicate marking using Picard (http://broadinstitute.github.io/picard) and base quality score recalibration using GATK. We used these bam files for calling germline variants and somatic mutations.
Germline Variant Calling and Kinship Analysis
We called germline variants using HaplotypeCaller in GATK in GVCF mode, jointly genotyped individual gVCF files for all autosomes, filtered variants by variant quality score recalibration in GATK and finally removed sites with ≥ 20% missing data. Next, to exclude any genetic duplicates among WTC participants, we performed kinship analysis with germline variants using KING software (Manichaikul et al. 2010) and identified three duplicate pairs (kinship coefficient > 0.354). One pair was from the same individual at different time points. From these 6 samples, we included the most recent sample collected from the participant with a duplicate pair.
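The duplicate-screening step can be illustrated with a minimal sketch. It assumes the pairwise kinship coefficients have already been exported from KING into simple tuples; the sample IDs, collection dates, tuple layout, and function name are hypothetical, and only the 0.354 threshold and the rule of keeping the most recent sample come from the text.

```python
from datetime import date

DUPLICATE_KINSHIP = 0.354  # duplicate threshold quoted in the text


def samples_to_drop(pairs, collected):
    """Return the older sample of every pair flagged as a genetic duplicate.

    pairs: iterable of (sample_a, sample_b, kinship) tuples (hypothetical layout).
    collected: dict mapping sample ID to its collection date.
    """
    drop = set()
    for a, b, kinship in pairs:
        if kinship > DUPLICATE_KINSHIP:
            # Keep the most recent sample and drop the other one.
            older = a if collected[a] < collected[b] else b
            drop.add(older)
    return drop


# Hypothetical example: S1/S2 are duplicates, S3/S4 are not.
pairs = [("S1", "S2", 0.49), ("S3", "S4", 0.12)]
collected = {
    "S1": date(2016, 5, 1), "S2": date(2018, 3, 9),
    "S3": date(2017, 1, 1), "S4": date(2017, 6, 1),
}
print(samples_to_drop(pairs, collected))
```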
Somatic Variant Calling
After kinship analysis, we performed somatic variant calling on the bam files of the remaining 345 samples using the GATK Mutect2 pipeline in tumor-only mode. To exclude likely germline calls and sequencing artifacts, we provided Mutect2 with an external reference of germline variants from gnomAD and a panel of normals (PON). To create the PON, we used WES data from the publicly available Genotype-Tissue Expression (GTEx) cohort (phs000424), from which we selected 70 young individuals (aged ≤ 40 years), as they would be less likely to harbor CHIP. To identify somatic mutations with high confidence, we applied the orientation bias and then PASS filters of Mutect2.
Somatic Variant Filtering to identify M/L-CHIP carriers
We considered 76 somatic driver genes for M-CHIP (Bick et al. 2020b; Kar et al. 2022), and 235 genes for L-CHIP (Niroula et al. 2021) (Supplementary Table S1), and filtered for predefined CHIP variants at a Mutect2 VAF >2%. For L-CHIP somatic variants, we specifically filtered for pathogenic variants (curated from cBioPortal) or putative variants (those that alter the canonical protein sequence) (Niroula et al. 2021).
To remove likely artifacts, we applied additional QC filters. Briefly, we filtered for somatic variants at a minimum depth of 20 reads, with a minimum of 3 reads supporting the mutant allele and at least one read in both forward and reverse directions (supporting the reference and mutant alleles). In addition, we excluded somatic variants observed in gnomAD with allele frequency ≥ 0.1% and with an observed frequency >1% in the cohort (unless previously reported to be involved in hematologic malignancies). Next, we annotated the identified variants for pathogenicity using the Combined Annotation Dependent Depletion (CADD) score and excluded variants with a scaled CADD score < 10. Finally, we implemented additional filters for putative L-CHIP mutations (Supplementary Table S1), including a minimum of 5 alternate-allele reads, a maximum VAF of 0.2 and at least two reads in both forward and reverse directions supporting the alternate allele.
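The QC filters above can be summarized as a single predicate. The sketch below is illustrative only: the `Call` record and its field names are hypothetical, VAF is approximated as alternate reads over depth rather than taken from Mutect2, and the two population-frequency criteria are read as independent exclusions with the hematologic-malignancy exception applied to the cohort-frequency filter. Those readings are assumptions, since the text leaves them open.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical per-variant record; field names are illustrative."""
    depth: int
    alt_fwd: int
    alt_rev: int
    ref_fwd: int
    ref_rev: int
    gnomad_af: float
    cohort_freq: float
    cadd: float
    known_heme: bool = False      # previously reported in hematologic malignancy
    putative_lchip: bool = False  # putative (not curated) L-CHIP variant


def passes_qc(c: Call) -> bool:
    alt = c.alt_fwd + c.alt_rev
    # Minimum depth of 20 reads and at least 3 reads supporting the mutant allele.
    if c.depth < 20 or alt < 3:
        return False
    # At least one read on each strand for both reference and mutant alleles.
    if min(c.alt_fwd, c.alt_rev, c.ref_fwd, c.ref_rev) < 1:
        return False
    # Population-frequency exclusions (assumed to act independently).
    if c.gnomad_af >= 0.001:
        return False
    if c.cohort_freq > 0.01 and not c.known_heme:
        return False
    # Scaled CADD score below 10 is excluded.
    if c.cadd < 10:
        return False
    # Stricter rules for putative L-CHIP variants.
    if c.putative_lchip:
        vaf = alt / c.depth  # approximation of the caller-reported VAF
        if alt < 5 or vaf > 0.2 or min(c.alt_fwd, c.alt_rev) < 2:
            return False
    return True


good = Call(depth=250, alt_fwd=4, alt_rev=3, ref_fwd=120, ref_rev=123,
            gnomad_af=0.0, cohort_freq=0.003, cadd=24.0)
one_strand = Call(depth=250, alt_fwd=7, alt_rev=0, ref_fwd=120, ref_rev=123,
                  gnomad_af=0.0, cohort_freq=0.003, cadd=24.0)
print(passes_qc(good), passes_qc(one_strand))
```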
We included two filtering exceptions. First, it has been reported that M-CHIP variants in the U2AF1 gene cannot be reliably identified in the human GRCh38 reference genome because of an erroneous duplication of the U2AF1 locus on chromosome 21 (Miller et al. 2022). Therefore, we realigned U2AF1 using build GRCh37 and repeated somatic variant calling. Second, it was reported that the variant ASXL1-G646Wfs*12 with a VAF ≥ 10% is a true CHIP variant (Vlasschaert et al. 2023) and not a sequencing artifact. However, we did not identify any WTC responders with CHIP mutations in the U2AF1 gene or at ASXL1-G646Wfs*12. After filtering, we considered all responders who harbored at least one known M/L-CHIP somatic variant as having CHIP. We list the identified M/L-CHIP mutations in Supplementary Table S2.
CHIP Prevalence in WTC Responders and Unexposed Controls
We next compared M/L-CHIP prevalence in the WTC cohort to unexposed controls. For controls, we used M/L-CHIP calls from 293 healthy controls recruited within the Mount Sinai Crohn’s and Colitis Registry (MSCCR) cohort (STUDY-11-01669), which was processed using the same analytical pipeline as described above (Nathan et al. 2023). We provide control phenotypic characteristics in Table 1B. To make the median coverage of the WTC cohort comparable to that of the controls, we downsampled the WTC data using Picard’s DownsampleSam tool, keeping 41.5% of the total reads so that the WTC and MSCCR cohorts were comparable in the total number of aligned bases (PF_ALIGNED_BASES).
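One plausible way to arrive at such a retention fraction is to take the ratio of the cohorts' median aligned-base counts. The sketch below is an assumption about the calculation, not the authors' documented procedure, and the per-sample values are hypothetical numbers chosen only so that the ratio lands near the 41.5% quoted in the text.

```python
from statistics import median


def keep_fraction(target_bases, source_bases):
    """Fraction of reads to retain so that the source cohort's median
    aligned-base count matches the target cohort's median."""
    frac = median(target_bases) / median(source_bases)
    return min(1.0, frac)  # never "upsample"


# Hypothetical PF_ALIGNED_BASES values per sample (not study data).
control_bases = [4.00e9, 4.15e9, 4.30e9]
wtc_bases = [9.8e9, 10.0e9, 10.4e9]

fraction = keep_fraction(control_bases, wtc_bases)
print(f"keep {fraction:.1%} of reads")
```

The resulting fraction would then be supplied to the downsampling tool as the probability of retaining each read.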
HLA zygosity
To test whether the zygosity of the HLA alleles was associated with the prevalence of CHIP, we performed HLA typing using HLA-HD (Kawaguchi et al. 2017), and determined the HLA class I and class II alleles of each participant at up to 6-digit resolution.
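Zygosity at a locus follows directly from the two typed alleles. The sketch below shows the idea under the assumption that alleles are compared at 6-digit (three-field) resolution; the text does not state the resolution at which homozygosity was defined, and the function names and example alleles are hypothetical.

```python
def truncate(allele: str, fields: int = 3) -> str:
    """Trim an HLA allele name such as 'HLA-DMB*01:01:01' to a given
    number of colon-separated fields (3 fields = 6-digit resolution)."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def is_homozygous(allele_1: str, allele_2: str, fields: int = 3) -> bool:
    """A locus is called homozygous when both alleles match at the
    chosen resolution."""
    return truncate(allele_1, fields) == truncate(allele_2, fields)


print(is_homozygous("HLA-DMB*01:01:01", "HLA-DMB*01:01:01"))    # same allele
print(is_homozygous("HLA-DQA1*01:02:01", "HLA-DQA1*05:01:01"))  # different alleles
```

Comparing at a coarser resolution (for example `fields=2`) would classify more participants as homozygous, so the chosen resolution affects the reported proportions.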
Statistical Analysis
To understand the association of CHIP with all available factors, we used standard statistics. Briefly, we summarized categorical variables as counts and percentages and used Fisher’s exact tests for analyses. We summarized continuous variables using median and median absolute deviation, and used Wilcoxon rank sum tests for analyses. For association analyses, as the majority of the responders were White, we collapsed the race variable into two categories: White and non-White/ unknown. We collapsed WTC exposure as a variable into three categories, including very low/low, intermediate and high/very high. We excluded the missing values from the association analysis.
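For readers unfamiliar with the categorical test named above, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the authors' code, the function name is hypothetical, and the example table is a textbook-style illustration rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Illustrative table (not study data): rows = carrier / non-carrier,
# columns = exposed / unexposed.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```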
We considered several WTC outcome subgroups, including a) CHIP positive (at least one M- or L-CHIP mutation, N=118), b) M-CHIP positive (at least one M-CHIP mutation, N=56), c) L-CHIP positive (at least one L-CHIP mutation, N=74), d) DNMT3A mutation (at least one mutation in the DNMT3A gene, N=22), e) TET2 mutation (at least one mutation in the TET2 gene, N=15), f) PPM1D mutation (at least one mutation in the PPM1D gene, N=11), g) EEF1A1 mutation (at least one mutation in the EEF1A1 gene, N=18), and h) DDX11 mutation (at least one mutation in the DDX11 gene, N=13). For all these cases, we defined participants who did not harbor any M/L-CHIP mutation as CHIP-negative (N=227).
For association analysis, we first performed univariate analysis using the individual factors as covariates. We provide false discovery rate adjusted p-values in Supplementary Table S3.
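The text does not name the false discovery rate procedure used. Assuming the common Benjamini-Hochberg step-up method, the adjustment can be sketched as follows; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    This is an assumed choice of FDR procedure; the text only says
    "false discovery rate adjusted".
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical univariate p-values (not study data).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(q, 3) for q in adjusted])
```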
Then, among the characteristics that were significant from marginal associations, we fitted a multivariate logistic regression using presence of CHIP as outcome variable and the significant characteristics as covariates. We considered a p-value < 0.05 as statistically significant.
Results
To study the prevalence of CHIP mutations in WTC responders, we performed deep WES at 250X on 350 banked blood samples from 345 participants (Figure 1A). The participants were aged 48-90 years (median, 59 years), with no previous hematologic malignancy at enrollment, with demographics detailed in Table 1A. We applied rigorous QC metrics and filters (Figure 1B, Supplementary Table S1), to reveal M/L-CHIP prevalence, which we then associated with age, ancestry, exposure, HLA zygosity, and other clinical, laboratory, mental and cognitive data. Finally, we compared CHIP prevalence in the WTC cohort to 293 unexposed controls (demographics in Table 1B) from the New York City area.
Prevalence and characteristics of L/M-CHIP mutations
After sample QC, 16.2% (56 participants) of the 345 WTC participants harbored 71 M-CHIP mutations in 13 genes, while 21.4% (74 participants) harbored 85 L-CHIP mutations in 43 genes (Supplementary Table S2). Overall, 34.2% (118/345) of the WTC participants harbored at least one M- or L-CHIP mutation. We observed greater clonal complexity, identified by the presence of more than one mutation, in 7.0% (24/345) of WTC participants; 3.5% (12 participants) carried both mutation types.
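The reported counts are internally consistent, which can be checked with the inclusion-exclusion principle: carriers of either mutation type equal M-CHIP carriers plus L-CHIP carriers minus those carrying both. The short sketch below uses only numbers stated in the text; the helper name is hypothetical.

```python
TOTAL = 345                      # WTC participants after QC
M_CHIP, L_CHIP, BOTH = 56, 74, 12


def pct(count: int, total: int = TOTAL) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


any_chip = M_CHIP + L_CHIP - BOTH   # inclusion-exclusion
chip_negative = TOTAL - any_chip

print(any_chip, pct(any_chip))      # carriers of at least one mutation type
print(pct(M_CHIP), pct(L_CHIP), pct(BOTH))
print(chip_negative)                # participants without any M/L-CHIP mutation
```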
The majority (>80%) of the participants who harbored M/L-CHIP mutations carried only a single mutation (Figure 2A, D). Of the M-CHIP mutations, 39% were non-synonymous, 30% were stop-gain, 23% were frameshift deletions and the rest were frameshift insertions and splicing mutations (Figure 2B). For L-CHIP, 87% were non-synonymous, 9% were stop-gain and the rest were frameshift indels (Figure 2E). The top M-CHIP genes were DNMT3A, TET2, PPM1D and ASXL1 (Figure 2C). The top L-CHIP genes were EEF1A1, DDX11, KMT2D, ATM and FAT2 (Figure 2F). Figure 3A shows the VAF distribution of the M/L-CHIP mutations. The highest VAF was in TET2 mutations (Figure 3C).
Additional factors on CHIP prevalence
We studied the associations between M- and/or L-CHIP mutations and i) clinical data - age, gender, race, smoking status, cardiovascular diseases (CVD), stroke, body mass index (BMI), Montreal Cognitive Assessment (MoCA), PTSD Checklist (PCL); ii) laboratory data (blood and lipid counts) and iii) HLA zygosity (class I and class II alleles) (Supplementary Table S3, Figure 4). Further, we studied the associations driven by the top mutated M-CHIP (DNMT3A, TET2, PPM1D) and L-CHIP (EEF1A1, DDX11) genes. Figure 4 summarizes the factors with at least one significant association in the univariate analysis. We did not observe associations in L-CHIP positive WTC cases with the factors considered in this study. In Supplementary Table S3 we list all associations with the participant characteristics we considered.
Clinical data
Several studies (Bick et al. 2020b; Kar et al. 2022) have observed that older age, smoking and other clinical characteristics are associated with CHIP. We report the distribution of M/L-CHIP in the WTC cohort with respect to age in Table 2. Consistent with the literature on CHIP and aging (Bick et al. 2020b; Niroula et al. 2021), CHIP mutations, specifically M-CHIP mutations and mutations in the top M-CHIP genes DNMT3A and TET2, were associated with older age compared to CHIP-negative cases (individuals without M/L-CHIP mutations). Age also remained a significant factor in the multivariate logistic regression model. The median ages of participants with CHIP, M-CHIP, DNMT3A and TET2 mutations were 61, 62, 66.5 and 64 years, respectively, whereas the median age of CHIP-negative participants was 59 years. Former smokers were 44.6% of the WTC cohort, and a smoking history was associated with M-CHIP in both uni- and multivariate regression models. M-CHIP mutations, and particularly DNMT3A mutations, were also associated with lower BMI compared to CHIP-negative participants. Finally, with regard to the mental health and cognitive characteristics, DDX11 mutations (a top L-CHIP gene) were associated with higher PCL scores and lower MoCA scores. MoCA scores remained significant in the multivariate logistic regression model. The median MoCA score for participants with a DDX11 mutation was 22, which was within the range indicative of mild cognitive impairment.
Laboratory data
Previous studies have revealed associations of M-CHIP with myeloid cell parameters (e.g. platelet, red blood cell, neutrophil and monocyte counts), and L-CHIP with elevated lymphocyte counts (Niroula et al. 2021). In the WTC cohort, M-CHIP positive cases had lower platelet counts (uni- and multivariate regression models) as compared to CHIP negative cases. Participants with the most frequently observed M-CHIP mutations, DNMT3A and TET2, were associated with lower absolute lymphocyte counts. TET2 mutation carriers were additionally associated with lower RBC counts, and higher segmented neutrophils. At the same time, PPM1D mutation carriers were associated with lower platelet counts and higher mean corpuscular hemoglobin (MCH). Further, DDX11 mutation carriers were associated with higher absolute lymphocytes, mean corpuscular volume (MCV), lymphocyte monocyte ratio (LMR), and lower segmented neutrophils.
HLA zygosity
We tested potential major histocompatibility complex (MHC) determinants that might be associated with CHIP positivity, and observed an association with a higher proportion of HLA-DMB homozygosity, which remained significant in the multivariate logistic regression model. At the same time, PPM1D mutation status was associated with a higher proportion of DQA1 homozygosity. Finally, participants with the top L-CHIP mutation gene, EEF1A1, were associated with a lower proportion of DPA1 homozygosity.
M/L-CHIP prevalence was higher in WTC cohort versus unexposed controls
To understand the impact of WTC debris exposure on CHIP prevalence, we compared the WTC cohort to 293 healthy unexposed controls from the New York area (Nathan et al. 2023). After downsampling, the prevalence of detectable M-CHIP mutations in the WTC cohort decreased to 7.5% (26/345) and that of L-CHIP to 9.9% (34/345). In comparison, for unexposed controls, M-CHIP prevalence was 3.1% (9/293) and L-CHIP prevalence was 2.0% (6/293). When participants were grouped into age strata, both CHIP mutation types were still generally more prevalent in the WTC cohort (Figure 5A and B, Supplementary Table S4). L-CHIP mutations were statistically significantly more prevalent in WTC participants ≤ 55 years old in comparison to unexposed controls (p=0.04).
TET2 M-CHIP and EEF1A1 and DDX11 L-CHIP mutations were more prevalent in WTC cohort versus unexposed controls
In terms of frequency of the top mutated genes (Figure 5C and D), we observed a significant difference between the number of M-CHIP TET2 mutations in the WTC cohort versus controls (p=0.02). Furthermore, we observed a significantly higher number of L-CHIP EEF1A1 (p< 0.0001) and DDX11 (p=0.04) mutations in the WTC cohort vs controls.
Discussion
CHIP is an early marker of risk for numerous maladies, including cardiovascular diseases (CVD, both ischemic heart disease and stroke), hematologic malignancies, severe COVID-19 outcomes, several solid cancers, infection, and all-cause mortality (Bhattacharya et al. 2022; Bick et al. 2020a; Bolton et al. 2021; Jaiswal et al. 2014, 2017; Jaiswal and Ebert 2019). There is also increasing appreciation that some CHIP mutations are tied to inflammatory consequences such as anemia (Girelli and Busti 2019). While CHIP presence is not a prerequisite for progression to malignancy, it significantly increases risk. The transformation rate from M-CHIP to blood cancer is estimated at around 0.5-1.0% per year, roughly 13 times greater than the incidence in the general population (Genovese et al. 2014; Heuser et al. 2016). L-CHIP has also been associated with a higher incidence of lymphoid malignancy (Hazard Ratio ∼4) (Niroula et al. 2021). Here, we report M- and/or L-CHIP mutation prevalence and patterns in a population of WTC responders. This is arguably the first study to document L-CHIP mutations within a WTC cohort and within a cohort of occupationally exposed individuals in general. In addition, we uniquely leveraged a rich dataset of clinical, laboratory, mental, cognitive and HLA zygosity information to study their associations with CHIP.
Despite the severe toxic inhalation exposures of WTC responders, to date, only one study has assessed their CHIP prevalence. Focused on FDNY first responders, this study used targeted sequencing of 237 genes known to be recurrently mutated in hematologic malignancies, and reported elevated CHIP in comparison to unexposed controls (Jasra et al. 2022). However, increased cancer mortality has also been reported for WTC responders who were not part of the FDNY (Li et al. 2023). Indeed, the combined WTC population (pooled from FDNY, WTC Health Registry and GRC) has a documented elevated risk for melanoma, as well as cancers of the prostate, thyroid and tonsil compared to the general population (Li et al. 2022), and a modest elevation of hematologic malignancies has been observed in the WTC GRC cohort (Shapiro et al. 2019). Here, our deep WES of the WTC GRC, which includes a more diverse group of first responders than the FDNY, revealed a high prevalence of both M-CHIP (16.2%) and L-CHIP (21.4%) mutations, with an overall prevalence of 34.2%. While the elevation in M-CHIP with respect to WTC debris exposure in the WTC GRC was consistent with the previous report on targeted sequencing of samples from WTC FDNY participants, the genes, mutations and filtering criteria differed between the studies, thus preventing direct comparisons.
Currently, there are no standard definitions of CHIP, and the continual identification of specific driver mutations and their subsequent inclusion in (and exclusion from) CHIP gene lists changes the definition of the condition. For this reason, the 2022 World Health Organization definition of CHIP is not gene or variant specific, only referring to somatic mutations of myeloid malignancy-associated genes detected in the blood or bone marrow at a VAF of ≥ 2% (≥ 4% for X-linked gene mutations in males) in individuals without a diagnosed hematologic disorder or unexplained cytopenia (Khoury et al. 2022; Steensma et al. 2015). As a result of this evolving landscape, different research studies report results on different variants and genes, without an international consensus on how CHIP should be defined (Bick et al. 2020b; Niroula et al. 2021). Furthermore, there are no established ‘gold standard’ analysis pipelines devoted to CHIP analysis. For example, while a VAF of ≥ 2% is recommended, there are no guidelines on the corresponding read depth, even though more mutations with a true VAF of 2% will be missed at a read depth of 30X than at 250X. In addition, the filters to remove technical artifacts vary between studies. Hence, both the biological and technical definitions of CHIP vary between studies, making comparisons between studies and with population-level controls challenging.
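The depth argument can be made concrete with a simple binomial model. The sketch below assumes a fixed depth at the site, independent reads, and a detection rule of at least 3 reads supporting the mutant allele (the minimum used in our QC filters); it ignores sequencing error, strand requirements, and caller behavior, so it is an idealized illustration rather than an estimate of pipeline sensitivity. The function name is hypothetical.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` mutant reads) when each of `depth`
    reads independently carries the mutation with probability `vaf`."""
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


for depth in (30, 250):
    print(depth, round(detection_probability(depth, 0.02), 3))
```

Under these assumptions, a clone at a true VAF of 2% is expected to contribute fewer than one mutant read at 30X but about five at 250X, which is why the detection probability differs so sharply between the two depths.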
Because of the differences described above, instead of directly comparing CHIP prevalence in our study with population-level cohorts (i.e., UK Biobank and TOPMed), we analyzed an unexposed control cohort using the same analytical pipeline (Figure 1B). Furthermore, we downsampled the WTC GRC sequencing data to match the sequencing depth of the control cohort. We observed that both M- and L-CHIP were generally more prevalent in WTC responders than in unexposed controls. This suggests that WTC exposure or other characteristics of WTC responders, such as chronic stress following the WTC experience (Rogers et al. 2022), are associated with elevated M/L-CHIP, especially for those 55 and younger.
Our findings on the age-associated increase in M-CHIP prevalence in the WTC cohort were consistent with its well-documented association with age in other population-level cohorts (Bick et al. 2020b; Niroula et al. 2021). These studies have documented that lifestyle factors such as smoking history and BMI were also positively associated with M-CHIP prevalence (Komic et al. 2023). While we observed a similar association for smokers in the WTC cohort, a larger cohort size is needed to understand the association of BMI with M-CHIP in this population.
Genes with mutations most commonly associated with M-CHIP in the literature are involved in epigenetic regulation (TET2, ASXL1, DNMT3A, and IDH2), RNA splicing (SF3B1, SRSF2, U2AF1), cell signaling (JAK2 and NRAS), and DNA repair and cell cycle regulation (TP53) (Heuser et al. 2016; Sperling et al. 2017; Steensma et al. 2015). Similar to these trends, we observed M-CHIP mutations in most of these genes (see Figure 2C). For L-CHIP mutations, prior studies have reported an even distribution across a larger number of genes (Niroula et al. 2021). While we did observe some of the previously reported L-CHIP gene mutations, 21% of L-CHIP mutations were in EEF1A1 and 15% in DDX11 (see Figure 2F). This signal was driven by three putative mutations, DDX11:P368S, EEF1A1:E293K and EEF1A1:V315L, which have known associations with hematologic malignancies (Ma et al. 2022; Papaemmanuil et al. 2013, 2016; Tyner et al. 2018). L-CHIP mutations in EEF1A1 and DDX11 also remained significantly more prevalent in the WTC cohort in comparison to unexposed controls. Note that EEF1A1 and DDX11 were not among the genes considered in the previous targeted sequencing WTC-FDNY study (Jasra et al. 2022).
As the WTC responder population ages, multiple studies are reporting an increased risk for neurocognitive and motor dysfunction that resembles neurodegenerative diseases, in addition to the presence of cortical atrophy and cognitive impairment at midlife within this population. These risks have been associated not only with physical exposures at the WTC site, but also with chronic post-traumatic stress disorder (PTSD) (Clouston et al. 2022). We therefore investigated the association of the available mental and cognitive data on the WTC cohort with CHIP, and observed an association for L-CHIP DDX11 gene with higher PCL scores (indicative of PTSD severity) and lower MoCA scores (indicative of mild cognitive impairment). DDX11 is a helicase that participates in cellular processes that alter RNA secondary structure, and helicase dysfunction has been implicated in multiple syndromes (Uchiumi et al. 2015), including neurodegeneration (Lovell et al. 2000).
As CHIP can be associated with inflammatory diseases, we leveraged our WES data to perform Human Leukocyte Antigen (HLA) typing (both classes I and II) and to test the diversity in the HLA genes in association with M/L-CHIP, in the context of the heterozygote advantage hypothesis. This hypothesis posits that individuals who harbor heterozygous genotypes at the HLA genes are able to display a greater variety of antigenic peptides than those with homozygous genotypes at the HLA genes, leading to an immune response to a broader range of antigens (Doherty and Zinkernagel 1975; Pagliuca et al. 2022; Penn et al. 2002). When we tested the association of homozygosity in Class I and Class II HLA genes with CHIP, overall CHIP prevalence was statistically significantly associated with a higher proportion of HLA-DMB homozygosity. We also observed weaker signals: PPM1D mutations were associated with a higher proportion of HLA-DQA1 homozygosity, and EEF1A1 mutations with a lower proportion of HLA-DPA1 homozygosity. These are HLA Class II genes expressed in antigen-presenting cells, including lymphocytes, and future studies are needed to understand the interplay between homozygosity in these genes, immune deficiencies and cancer (Ogobuiro et al. 2023; Pagliuca et al. 2022; Planelles et al. 2006).
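As an illustration of how a homozygosity-by-CHIP comparison of this kind can be tested, the following minimal sketch computes a two-sided Fisher's exact test on a 2x2 table using only the Python standard library. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and this is not the analysis code used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: CHIP-positive / CHIP-negative (assumed layout).
    Columns: homozygous / heterozygous at one HLA gene (assumed layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only (not study data).
p_value = fisher_exact_two_sided(12, 45, 30, 260)
print(f"two-sided p = {p_value:.4f}")
```

An exact test is a reasonable choice here because some cells (for example, homozygous carriers of a specific mutated gene) are likely to contain few individuals.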
In exploratory analyses, we compared the association of blood count data in WTC responders with the UK Biobank population. Specifically, we investigated M/L-CHIP with myeloid parameters (e.g. platelet, red blood cell, neutrophil and monocyte counts), and lymphocyte counts (Niroula et al. 2021). In the WTC cohort, consistent with the population-level UK Biobank cohort, TET2 mutations were negatively associated with lymphocyte counts (Kar et al. 2022) and PPM1D mutations were negatively associated with platelet counts (Kamphuis et al. 2023). However, most associations we observed between M-CHIP mutations and blood counts were not statistically significant in a multivariate regression analysis, when age and other factors were taken into account. We also observed opposite trends from the UK Biobank cohort in the association of M-CHIP with platelet counts, and TET2 mutations with segmented neutrophil counts (though UK Biobank uses neutrophil counts). To understand whether this discrepancy is due to WTC-debris exposure, further investigation is warranted.
Recent studies on the UK Biobank report that in individuals without detectable CHIP at the time of lab measurements, abnormal complete blood count (CBC) labs can still be predictive of risk for future CHIP (Gu et al. 2023; Weeks et al. 2023), and that those who harbor abnormal myeloid blood cell parameters and CHIP are at the highest risk for developing myeloid malignancies. Similarly, those with elevated lymphocyte counts were reported to be at elevated risk for lymphoid malignancies, and those who additionally carried L-CHIP mutations were at the highest risk.
These reports suggest that the WTC participants with elevated myeloid and/or lymphoid blood counts, together with CHIP mutations, may be at highest risk for future malignancies. Currently, there is interest in treating CHIP patients who have activated inflammatory pathways with IL-1β and NLRP3 inhibitors, in addition to IRAK1 inhibitors (Kanagal-Shamanna et al. 2024; Wang et al. 2021). Thus, it may be appropriate to investigate such interventions in WTC responders at the highest risk to prevent or delay overt myeloid neoplasm, CVD, major bleeding, and infection.
Overall, our study focused on the impact of exposure from the 9/11 attacks on WTC first responders, yet it has implications beyond this population. While the WTC exposure included a complex mix of carcinogenic compounds specific to the site, exposure itself was a main determinant of risk for CHIP. This warrants future risk studies on populations exposed to debris (both from building fires and collapses) from modern warfare in populated cities.
Study findings will inform future research on risk stratification for those at highest risk, who may need to undergo more frequent screening and monitoring to prevent or delay cancer development. Despite a higher relative risk of malignancies in CHIP-positive individuals, the absolute incidence in this group remains low (Heuser et al. 2016). Yet, to identify individuals at highest risk for cancers, genetic information on CHIP can additionally be combined with demographic, clinical and laboratory data to build risk models. Efforts towards this end are already starting to gain traction (Weeks et al. 2023). We anticipate that with the establishment of CHIP clinics at many academic centers, longitudinal monitoring of individuals with CHIP will inform improved risk models. Those at highest risk can then be followed, undergo risk assessment and eventually participate in clinical trials in CHIP clinics.
This study should be considered in the context of its limitations. Here, we compared the prevalence of CHIP in WTC rescue and recovery workers to unexposed controls. While the control participants were from the New York area, they were not first responders. Previous studies on CHIP have additionally considered mosaic chromosomal alterations (mCAs) in the peripheral blood of individuals. We were unable to investigate the prevalence of mCAs, as WES typically does not include enough heterozygous sites to allow their detection at low cell fractions (Loh et al. 2020). The study size is another weakness; the limited number of study participants makes it harder to detect modest effects. Thus, we cannot distinguish between no evidence for association due to limited power and an actual lack of association (e.g. between age and L-CHIP). This is especially important in understanding the differences in the association trends for M-CHIP positive cases with clinical and lab data in the WTC cohort versus the UK Biobank. Finally, the multiplicity of analyses we conducted may lead to results that are statistically significant at the 5% level yet are due to chance (false positives); in this respect, our results need confirmation in independent populations.
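To make the multiplicity concern concrete, the sketch below applies a Benjamini-Hochberg false discovery rate adjustment to a list of p-values. This procedure is named here only as one common way to account for multiple comparisons; it is an assumption for illustration, not a method reported in this study, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from several association tests (not study results).
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, BH-adjusted = {q:.3f}")
```

An association whose adjusted value stays below the chosen threshold is less likely to be a chance finding, although replication in an independent cohort remains the stronger check.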
Data availability
We will deposit the whole exome sequencing data of the WTC responders in the database of Genotypes and Phenotypes (dbGaP).
Code Availability
All analyses were performed utilizing standard publicly available software. Any specific analysis code details are available from the authors upon request.
Authors’ contributions
P.B. and Z.H.G. conceived and designed the study. M.E.S., P.K., R.J.K., and Z.H.G. wrote the manuscript. B.J.L. and X.Y. recruited participants and handled sample and data collection. Z.H.G. led sample sequencing at Azenta Inc. M.E.S. performed sequence analyses and P.K. performed statistical analyses. All authors were involved in the interpretation of the results. B.J.L., J.M. and P.B. edited the manuscript. Z.H.G. supervised the study. All authors approved the final manuscript.
Supplementary Tables
Supplementary Table S1. List of myeloid associated (M-CHIP) mutations and lymphoid associated (L-CHIP) mutations considered in this study.
Supplementary Table S2. List of M/L-CHIP mutations identified among the 345 WTC samples.
Supplementary Table S3. CHIP-Phenotype associations.
Acknowledgements
This work was supported by grants to P.B., J.M. and Z.H.G. from the Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health (award # 1U01OH012187-01); to Z.H.G. from Cancer Moonshot R33 award # CA263705-01; to B.J.L. and X.Y. from the Centers for Disease Control and Prevention (CDC/NIOSH 75D301-22-C-15522); and in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.
Footnotes
Conflict of Interest. The authors declare no potential conflicts of interest.