Abstract
Background High resolution computed tomography (HRCT) of the chest is increasingly used in clinical practice for sarcoidosis. Visual assessment of chest HRCTs in patients with sarcoidosis has high inter- and intra-rater variation. Radiomics offers a reproducible quantitative assessment of HRCT lung parenchyma and could be useful as an additional summary measure of disease. We develop radiomic profiles on HRCT and map them to radiologic, clinical, and patient reported outcomes.
Research Question Can radiomic analysis of chest HRCT cluster patients into groups that are related to radiologic, clinical, and patient reported outcomes?
Study Design and Methods Three-dimensional radiomic features were calculated on chest HRCT for both lungs from sarcoidosis cases enrolled in the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study (N=320). Robust and sparse K-means was used to cluster sarcoidosis cases using their radiomic profiles. Differences in patterns on visual assessment (VAS) by cluster were identified using chi-squared tests. Linear regression investigated how pulmonary function tests and patient reported outcomes differed between clusters with and without adjustment for other radiologic quantification.
Results Radiomic-based clustering identified four clusters associated with both Scadding stage and Oberstein score (P<0.001). One of the clusters had markedly few abnormalities. Another cluster had consistently more abnormalities along with more Scadding stage IV. Average pulmonary function testing (PFT) differed between clusters, even after accounting for Scadding stage and Oberstein score (P <0.001), with one cluster having more obstructive disease. The most discriminative radiomic measures explained 10-15% of the variation in PFT beyond demographic variables. Shortness of breath, fatigue, and physical health differed by cluster (P <0.014).
Interpretation Radiomic quantification of sarcoidosis identifies new subtypes representative of existing radiologic assessment and more predictive of pulmonary function. These findings provide evidence that radiomics may be useful for identifying new imaging-based disease phenotypes.
Sarcoidosis is a granulomatous interstitial lung disease which affects ∼ 110 thousand individuals in the United States1. Typical diagnosis is between 30-50 years of life, resulting in decreases in quality of life (QOL) and productivity2. Pulmonary disease occurs in over 90% of those with sarcoidosis3 with significant morbidity and mortality. Currently, visual assessment of chest radiography (CXR) is used to quantify lung abnormalities standardized via the Scadding staging4. Substantial variation exists in CT-based radiographic patterns within each Scadding stage limiting its utility in predicting prognosis even in the extreme stages5.
Chest high resolution computed tomography (HRCT) is increasingly used in clinical practice to monitor disease as it offers more detailed visualization of parenchymal abnormalities (PA) compared to CXR4,6–8. As with CXR, visual assessment of chest HRCT is used to evaluate abnormalities although there are limited standardized scoring metrics4,9,10. This is due in part to the diverse and heterogeneous patterns present on chest HRCT in sarcoidosis, often with multiple patterns noted. These complexities result in high inter- and intra-rater variation11. Recently, a Delphi study was undertaken to define phenotypes based on visual assessment, yet without assessment of clinical utility12. As not all patients have access to expert visual interpretation, more automated systems that quantify sarcoidosis chest HRCT could decrease variation and make HRCT more usable in routine clinical care.
Radiomics is when large numbers of quantitative features are extracted from medical images13. A radiomics panel computes summary measures of the distribution of the Hounsfield units (HU) along with summary measures of the spatial relationships of neighboring voxels14. The result is a characterization of image texture.
Radiomics have proved useful for quantifying HRCT in emphysema7, idiopathic pulmonary fibrosis15,16, interstitial lung disease17,18, diffuse lung disease19 and cancer20. Ryan et al.21 showed the potential utility of radiomics in sarcoidosis, comparing radiomic measures between sarcoidosis patients and controls. It remains unclear whether radiomics also has the potential to differentiate varied phenotypes within sarcoidosis patients and how radiomics relates to visual assessment (VAS). Radiomic profiles within patients with sarcoidosis that correlate with VAS, pulmonary function testing (PFT), and patient reported outcomes (PRO) would indicate that radiomics may serve as a useful refined measure to track change in the lung parenchyma over time than is possible with visual assessment.
The goal of this study was to develop a radiomic profile of chest HRCT in sarcoidosis using a phenotypically-diverse population of sarcoidosis participants from the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study22. We employ statistical clustering techniques and investigate the clinical utility of the clusters by quantifying their association with VAS, PFT and PRO23,24.
Study Design and Methods
Study design and participants
The sarcoidosis population was recruited in the multicenter NHLBI-funded GRADS study22 as cross-sectional observational cohort (N=368). This ancillary study has GRADS approval and all participants provided informed consent (IRB approval HS-2779 and HS-2780). More details of this cohort can be found in e-Appendix 1.
To comply with image biomarker standardization initiative (IBSI) recommendations25 details of the HRCT acquisition, processing and segmentation can be found in e-Appendix 1 and e-Tables 1 and 2. GRADS’ visual assessment score (VAS; Table 3, e-Appendix 1, e-Tables 3 and 4) was used to quantify overall Oberstein score5,22 along with additional information regarding presence of lymphadenopathy (LN), airway and vasculature distortion (AD), and PA (Table 3). Scadding stage was evaluated at the site using CXR (e-Appendix1).
PFT included pre-bronchodilator (pre-BD) forced expiratory volume at one second (FEV1), forced vital capacity (FVC), the ratio of FEV1 to FVC, and post-BD diffusing capacity of the lungs for carbon monoxide (DLCO). The PROs included the gastroesophageal reflux disease questionnaire (GERDQ)26, the University of California San Diego Shortness of Breath Questionnaire (SOBQ)27, two measures of fatigue (the Fatigue Assessment Scale [FAS]28 and Patient-Reported Outcomes Measurement Information System fatigue profile [PROMIS]29), the Cognitive Failure Questionnaire (CFQ)30, and the SF-1231,32 physical and mental subscales. The final analysis dataset included N=320 patients who had an analyzable HRCT and clinical data (e-Figure 1).
Radiomic Analysis
We computed 44 first-order and 239 gray-level co-occurrence matrix (GLCM) radiomic features on each lung (566 features) using the lungct and RIA packages in R33–36.
These are open-source packages with published documentation and permit perfect reproducibility with transparent implementation, aligning with IBSI goals. Furthermore, this allowed a fully R based analysis pipeline and provided a more comprehensive radiomic panel. Radiomic features beyond those typically used in neuroscience and cancer and previously standardized by IBSI may be important in diffuse pulmonary disease. For consistency with package documentation, we use the nomenclature of the RIA package.
To calculate the gray level co-occurrence matrix (GLCM) features, the Hounsfield units from each HRCT were discretized into 16 bins with equal relative frequencies; then, the features were calculated in 13 directions, assuming a voxel distance of one; these features were summarized using the mean statistic 34,35.
The R ez.combat function harmonized radiomic measures across scanners37–39, while preserving the biological variability in age, height, BMI, sex, race-ethnicity and GRADS phenotype (e-Table 2).
Radiomic features were high-dimensional and repetitive (Figure 1). We used a decorrelation filter40 (e-Appendix 1) that prioritized first-order over second-order features to identify a representative feature subset for analysis. This reduced the features to 99. We used robust and sparse k-means41 to cluster participants (R package, RSKC). For outlier robustness, trimming was set at 0.1. The optimal bound on feature weights (9.5) and number of clusters (4) were simultaneously selected using a permutation approach and BCS-based Gap statistic41 after standardization42. Standardization was performed with respect to the permutation reference to permit comparability across bounds and cluster numbers. The top five discriminative features were selected for further investigation.
Statistical Analysis and Validation
Descriptive statistics on VAS were computed by cluster and simulated Fisher’s exact tests used to assess cluster differences. Linear regression quantified associations between cluster, Scadding stage, and Oberstein Score, and each of the outcomes. We fitted univariable models with cluster, Scadding stage, or Oberstein score and then modeled them together to determine if cluster association remained significant accounting for other radiologic findings. Each outcome was modelled separately using a complete case analysis. Linear regression quantified associations between discriminative features and outcomes. All models were adjusted for age, sex, race/ethnicity (e-Appendix 1), height, and BMI. Analyses were conducted in R36.
This work was an unsupervised problem, which made traditional training and test validation approaches difficult given unknown true groupings. Instead, we conducted internal validation by applying our analysis pipeline under various conditions and with bootstrapped samples (e-Appendix 1).
Results
Tables 1-2 shows the characteristics of the study population (N=320), 146 (46%) self-identified as male, 220 (68.8%) self-identified as non-Hispanic white, 76 (24%) as black, 15 (4.7%) as Hispanic, and 9 (2.8%) as one of Asian, American Indian, Alaska Native, not identifying a single primary race or missing (N=2). The average age was 53 (SD=10) years, average height 67.0 in (SD=4.0) and average BMI 30.6 kg/m2 (SD=6.5). By design participants spread across Scadding stages, with 43 (14%) stage 0, 63 (20%) stage 1, 92 (29%) stage 2, 44 (14%) stage 3 and 75 (24%) stage 4. The average FEV1 was 2.62 L (SD=0.89), FVC 3.57 L (SD=1.07), and DLCO 80.31 (SD=23.97). The population predominantly demonstrated non-obstructive FEV1/FVC ratio (N=235; 73%).
Radiomic Based Clustering
We identified four clusters (Table 1). Ordered from highest FVC to lowest, 56 (17.5%) patients were cluster 1, 110 (34.3%) cluster 2, 54 (16.9%) cluster 3, and 100 (31.3%) cluster 4. Cluster was associated with Scadding stage (P<0.001), but not a direct reflection of Scadding stage (Table 1). Descriptively, cluster 3 had the highest percentage of Scadding stage IV (48%). Clusters 1 and 2 each had approximately 60% of the patients with a Scadding stage of I or II, a higher proportion of stage 0, and limited Scadding stage IV present. Cluster 3 had the second highest observed Scadding stage IV (40%) with a similar percentage of Scadding stage II compared to clusters 1 and 2 (∼30%).
Radiomic clustering and VAS
When descriptively investigating VAS patterns, the clusters reflected increased presence of VAS abnormalities with cluster 1 having low presence of nearly all airway and vascular distortions (AD) and PA and much lower average Oberstein scores than clusters 2-4 (Table 3; P <0.001). Cluster 2 reflected some increase in AD and PA with clusters 3 and 4 having a higher presence of AD and PA. For example, cluster 4 had at least double the presence of AD and PA compared to cluster 2. After adjustment for demographics, average Oberstein score for cluster 3 was 1.98 (SE=0.48) units and cluster 4 was 3.15 (SE=0.39) units higher than cluster 2 (p<0.001).
Associations with PFT and PRO
After adjustment for demographics, average PFT differed between clusters (Figure 2; P<0.001). Clusters 2 and 3 had ∼0.3 L lower average FVC compared to clusters 1 and 4 had ∼0.9 L lower average FVC compared to cluster 1. Cluster 2 had ∼0.2 L lower average FVC compared to cluster 1 and cluster 3 had a larger decline with an average FEV1 ∼0.5 L lower than cluster 1. Cluster 4 had the lowest average FEV1 and ∼0.8 L lower than cluster 1. DLCO had a similar pattern to FVC. Cluster 3 had the most FEV1/FVC based obstruction (54%) with clusters 1 and 2 having the same amount of obstruction (16%) and cluster 4 falling in the middle (32%).
After adjustment for demographics, PROs also differed between clusters but not consistently (Figure 3). Fatigue and Average shortness of breath (SOBQ) differed between clusters (P<0.014). Clusters 2 and 4 had a fatigue score that was more than 3 units lower on average compared to cluster 1 (p<0.016). Cluster 4 also had an average SOBQ ∼12 units higher than cluster 1 (p=0.003).
The clusters remained significantly associated with PFT after adjusting for Scadding stage (Figure 2; P<0.001) or Oberstein score (Figure 2; P<0.001) and after adjusting for both simultaneously (Figure 2; P<0.001). Oberstein score was not significantly associated with FVC after adjustment for cluster (P=0.59) and became even less significant after further adjustment for Scadding stage (P=0.88). However, Scadding stage was significantly associated with FVC after adjustment for cluster and Oberstein score (P=0.019). The findings for Oberstein score with DLCO were like those of FVC. For FEV1, Oberstein score was still significantly associated after adjustment for cluster (P=0.011) but became insignificant after additional adjustment for Scadding stage (P=0.21). Scadding stage remained significant in all models (P<0.001). For FEV1/FVC, all three assessments (radiomics, Scadding, and Oberstein score) were significant (P<0.016).
Cluster also remained associated with fatigue and SOBQ score after adjustment for Scadding and Oberstein (P<0.038; Figure 3).
The five most discriminatory radiomic features included kurtosis, which is a measure of shape of the distribution of the HU from an image, as well as four summary measures from the gray level co-occurrence matrix (GLCM). The GLCM measure spatial correlation and similarity of the HU in image voxels near each other. Image visualization of cases with different values of kurtosis and two GLCM measures are shown in Figure 4 with distribution of HU in e-Figure 2 for maximum, median and minimum kurtosis levels in our population. The images visibly show the increase in observable PA (Figure 4) with lower kurtosis. In addition, Figure 5 shows how Oberstein Score separates based on kurtosis and GLCM Min. GLCM Min appears to be most related to severity with higher Oberstein score mapping with higher GLCM Min and Scadding stage is, in general, not associated with either measure.
The discriminatory radiomic measures were jointly associated with FVC, FEV1, FEV1/FVC and DLCO (P<0.001; Table 4 and e-Table 5 for PRO). The radiomic measures explained between 10 to 15% more variation in PFT than adjustment for demographics (age, race, sex, height and BMI) only. For comparison, Scadding stage explained between 4% and 12% more variation in PFT and Oberstein score explained between 2% and 8% more variation in PFT.
Validation of Clustering
Detailed results can be found in e-Appendix 2, e-Figure 3. The validation analysis suggested only mild sensitivity of clustering to the analysis pipeline. There is more sensitivity to bootstrapped samples in how the sample is divided into clusters; however, cluster consistently remained a significant predictor of FVC across all sources of random perturbations in the pipeline. In all validation cases the proportions of P-values less than 0.01 were 100%.
Discussion
Radiographic manifestations in sarcoidosis are protean. As a result, traditionally VAS is used for image characterization and not standardized. The utility of VAS in sarcoidosis is limited by the intra-observer and inter-observer variability. Radiomics is a more reproducible and computationally efficient approach to characterize HRCT. We used radiomics to characterize images from a large, phenotypically diverse cohort of sarcoidosis subjects and demonstrated that radiomics are associated with VAS, clinical and patient reported outcomes of disease. Using a common unsupervised learning approach41, we identified four radiomic based clusters. These clusters differed significantly by PFT, fatigue, and shortness of breath, two PROs. Notably, each cluster included a range of Scadding stages and cluster remained significantly associated with PFT after accounting for Scadding stage and Oberstein score. These data suggest that radiomics represents radiographic abnormalities that differ from Scadding stage and may be a better representation of image characterization to explain PFT and PRO.
A holistic assessment of the clusters showed two lower severity and two higher severity clusters (Table 5). One high severity cluster (4) had significant fibrosis, substantial presence of AD and PA on VAS, and decreased PFT, which resulted in more shortness of breath and fatigue. Cluster 3 represented those with obstructive spirometry and more presence of AD and PA than clusters 1 and 2, moderate fibrosis, and the lowest FEV1/FVC ratio. These interpretations have similarities to another recent attempt to subtype using clinical characteristics43. Unlike our work, that work had no radiomics or VAS. We both several low severity clusters with minimal PA. We both found an obstructive cluster and a severe fibrotic cluster. More recently, HRCT VAS phenotypes were proposed via a Delphi consensus study12. That work defined two major groups, those with fibrotic versus non-fibrotic findings, similar to our clusters with fibrotic (3 and 4) versus more non-fibrotic (1 and 2) findings. However, the VAS of their various subtypes of non-fibrotic and fibrotic CT findings were included in a number of our clusters. This may reflect the power in our sample size to define subtypes or possibly that these phenotypes or VAS overlap as noted in our study.
Demographic characteristics of the individual explained a sizable amount of the variation observed in PFT (R2=50-60%). Radiomics also explained a significant amount of additional variation in PFT (10-50%). This additional amount of variation explained is consistent with other quantitative imaging approaches such as CALIPER17 and those used for investigations of other lung conditions such as systemic sclerosis 44 and diffuse interstitial lung diseases 45.
Except in lung cancer research, much of the quantitative CT image analysis has focused on HU density18,44. We included summarization of both density and spatial characterization. More measures of GLCM appeared discriminative in the cluster analysis than densitometry measures. This implies that texture is perhaps more useful for differentiating PA in sarcoidosis.
Although it remains speculative as to why decreased kurtosis is associated with decreased FVC and DLCO, it is plausible that as kurtosis decreases, more severe PA are present that impact lung function. As kurtosis decreases, more severe PA are visual present that may impact the function of the lung (Figure 4). Severe PA in sarcoidosis are observed as higher HU values. In e-Figure we observed the presence of a higher frequency of higher HU values dampening the peak in the HU distribution and leading to a lower kurtosis. In addition, GLCM Min appears likely to measure disease severity based on the visual patterns with Oberstein in Figure 5.
The associations with cluster and PROs were less consistent, although important patterns emerged. Cluster 4 demonstrated decreased QOL (SF-12 physical) and worse shortness of breath (SOBQ), findings that are consistent with worse lung function and VAS scores in this cluster. Cluster explained substantially less variation in PROs (0 to 8%) compared to PFTs. While the tools we used to measure PROs are validated and used widely in sarcoidosis, they are not generally correlated with objective measures of lung function46. As an example, fatigue is a known multi-factorial PRO in sarcoidosis and may impact a number of other PRO46,47. Our finding that cluster is associated with decreased physical function and increased shortness of breath provides encouragement that more detailed characterization of lung abnormalities, like we developed here, could contribute to a better understanding of the current disconnect between measures of lung function and patient experiences.
This study had several strengths. To our knowledge, this was the largest cluster of sarcoidosis patients with research grade HRCT available allowing for a full 3-D based quantitative analysis. Radiomic and other quantitative imaging approaches have a major strength in that they can be computed in a time efficient and reliable, reproducible automated procedure. The radiomics package in R took only 3 minutes to compute all the 3-D radiomics measures. This means large quantities of scans can have radiomic profiles computed for aiding in decisions on what scans the radiologist might prioritize for further consideration for visual or clinician evaluation.
In addition to the high-quality and robust clinical data, the GRADS study also provides a comprehensive and consistently collected set of patient reported outcomes, which reflect important aspects of treatment decision-making processes48. The results internally validated suggest our approach is not over fitting these data.
This study is not without limitations. The GRADS study relied on enrollment from academic centers and could be skewed to a population that was referred for worse disease or was near one of the centers, which were primarily localized to the Eastern US. The demographic is predominantly white and of higher SES as a result and thus may not provide full representation of the spectrum of disease. In addition, many subjects had disease for a decade or longer and were treated. While the same protocol was used across study sites for obtaining the CT images we studied, the scanners themselves differed. Harmonization was performed to mitigate systematic differences due to scanner type. In addition, the distribution of PFT measures is not dependent on scanner type. As such, we expect any differences in the radiomic measures due to scanner type to be non-differential as they relate to PFTs. Finally, the number of clusters we identified may not directly translate to other populations of sarcoidosis or be the unique optimal solution. There are multiple ways to choose the optimal number of clusters. We used an alternative radiomics package with a broader set of measures. However, not all these measures were part of IBSI work. We used open-source software accessible to any reader for verification of our findings and to conduct comparisons with other software.
Interpretation
In summary, this work provides evidence that in sarcoidosis radiomic quantification is useful for classifying abnormality of the lung along with pulmonary function and to a lesser degree patient reported outcomes. Future work should evaluate the potential of radiomics to capture small changes over time not easily detected by other assessment to further evaluate the potential of radiomics serving as a clinically useful quantitative measure of sarcoidosis presentation and progression.
Supplementary Material Statement
This article has an online supplement, which is accessible from this issue’s table of contents online.
Summary of COI
NEC, WLL, MM, TEF, SL, BB, TEF have received grants from the National Institutes of Health (R01 HL114587; R01 HL142049; U01 HL112695; U01 HL112707; U01 HL112707, U01 HL112694, U01 HL112695, U01 HL112696, U01 HL112702, U01 HL112708, U01 HL112711, U01 HL112712)
WLL is additionally support on National Institutes of Health 5T32HL007085
LAM has received grants from the National Institute of Health (R01HL140357, R01HL142049, R01 HL114587; U01 HL112695; U01 HL112707; U01 HL112707, U01 HL112694, U01 HL112695, U01 HL112696, U01 HL112702, U01 HL112708, U01 HL112711, U01 HL112712, and R01HL136681), Ann Theodore Foundation, the FSR, Mallinckrodt Pharmaceuticals, and the University of Cincinnati (Mallinckrodt Pharmaceuticals Foundation Grant) and serves on the Scientific Advisory Board for FSR, and Boeringer Ingelheim.
Funding
This work is supported by NIH grants: R01 HL114587; R01 HL142049; U01 HL112695; U01 HL112707; U01 HL112707, U01 HL112694, U01 HL112695, U01 HL112696, U01 HL112702, U01 HL112708, U01 HL112711, U01 HL112712 and 5T32HL007085
Data Availability
Data is available via the process laid out by the GRADS consortium.
Take-Home Points
Study Question: Is radiomics a useful quantitative approach for defining radiographic subtypes in sarcoidosis?
Results: Radiomics find four radiographic subtypes in sarcoidosis that are related to visual assessment and more predictive of clinical measures such as pulmonary function that visual assessment.
Interpretation: Radiomics analysis suggests four subtypes of sarcoidosis ranging from low severity with limited radiographic abnormality and normal pulmonary function to severe with significant abnormality on chest CT and either obstruction based on pulmonary function or reduced pulmonary function with fibrotic presentation on image.
Acknowledgements and Author contributions
NEC is responsible for all content in the manuscript.
N.E.C. Substantial contributions to the conception or design of the work, lead biostatistical analysis and interpretation of the data and wrote manuscript. W.L. developed the filtering method for this analysis, performed the sensitivity and validation analyses and edited the manuscript. S.M.R. performed image processing, conducted statistical analysis, prepared figures and table, interpreted the data, and contributed to the manuscript. M.M. coordinated the study, cleaned data, and contributed to the manuscript. B.B. supported data acquisition, data cleaning, project management, and contributed to the manuscript. S.L. contributed to the scientific rationale and interpretation of the data and contributed to the manuscript. L.A.M. Substantial contributions to the conception or design of the work, interpretation of the data and wrote the manuscript. T.E.F. Substantial contributions to the conception or design of the work, interpretation of the data and wrote the manuscript.
This work was supported by the National Institutes of Health (R01 HL114587; R01 HL142049; U01 HL112695; T32HL007085). Data from the GRADS study was used, which was funded by the NIH grant U01 HL112707 entitled “Sarcoidosis and A1AT Genomics and Informatics Centre”, as well as others (U01 HL112707, U01 HL112694, U01 HL112695, U01 HL112696, U01 HL112702, U01 HL112708, U01 HL112711, U01 HL112712). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
Footnotes
Refine analysis approach, assess sensitivity of analysis outcomes, and incorporate additional measures of disease in analyses.
Abbreviations
- AD
- Airway and Vascular Distortion
- BMI
- body mass index
- BVB
- bronchovascular bundle
- HRCT
- High resolution chest computed tomography
- CXR
- chest X-ray
- DLCO
- diffusing capacity for carbon monoxide
- FEV1
- forced expiratory volume in one second
- FEV1/FVC
- ratio of FEV1 to FVC
- FVC
- forced vital capacity
- GIC
- Genetics and Informatics Core
- GLCM
- Gray level co-occurrence matrix
- GRADS
- Genomic Research in Alpha-1 Anti-trypsin Deficiency and Sarcoidosis
- IRB
- institutional review board
- LN
- lymphadenopathy
- PA
- parenchymal abnormalities
- PFT
- Pulmonary Function Test
- QOL
- Quality of life
- VAS
- visual assessment score