Quantitative CT of Normal Lung Parenchyma and Small Airways Disease Topologies are Associated With COPD Severity and Progression
================================================================================================================================

* Alexander J. Bell
* Ravi Pal
* Wassim W. Labaki
* Benjamin A. Hoff
* Jennifer M. Wang
* Susan Murray
* Ella A. Kazerooni
* Stefanie Galban
* David A. Lynch
* Stephen M. Humphries
* Fernando J. Martinez
* Charles R. Hatt
* MeiLan K. Han
* Sundaresh Ram
* Craig J. Galban

## Abstract

**Objectives** Small airways disease (SAD) is a major cause of airflow obstruction in COPD patients, and has been identified as a precursor to emphysema. Although the amount of SAD in the lungs can be quantified using our Parametric Response Mapping (PRM) approach, the full breadth of this readout as a measure of emphysema and COPD progression has yet to be explored. We evaluated topological features of PRM-derived normal parenchyma and SAD as surrogates of emphysema and predictors of spirometric decline.

**Materials and Methods** PRM metrics of normal lung (PRMNorm) and functional SAD (PRMfSAD) were generated from CT scans collected as part of the COPDGene study (n=8956). Volume density (V) and Euler-Poincaré Characteristic (χ) image maps, measures of the extent and coalescence of pocket formations (i.e., topologies), respectively, were determined for both PRMNorm and PRMfSAD. Association with COPD severity, emphysema, and spirometric measures were assessed via multivariable regression models. Readouts were evaluated as inputs for predicting FEV1 decline using a machine learning model.

**Results** Multivariable cross-sectional analysis of COPD subjects showed that V and χ measures for PRMfSAD and PRMNorm were independently associated with the amount of emphysema. Readouts χfSAD (β of 0.106, p<0.001) and VfSAD (β of 0.065, p=0.004) were also independently associated with FEV1% predicted. The machine learning model using PRM topologies as inputs predicted FEV1 decline over five years with an AUC of 0.69.

**Conclusions** We demonstrated that V and χ of fSAD and Norm have independent value when associated with lung function and emphysema. In addition, we demonstrated that these readouts are predictive of spirometric decline when used as inputs in a ML model. Our topological PRM approach using PRMfSAD and PRMNorm may show promise as an early indicator of emphysema onset and COPD progression.

Keywords
*   chronic obstructive pulmonary disease
*   small airways disease
*   parametric response mapping
*   computed tomography of the chest
*   machine learning
*   emphysema

## Introduction

Chronic obstructive pulmonary disease (COPD) is a leading cause of death and healthcare burden in the United States and worldwide. Accounting for over 3 million deaths globally in 20151, this disease is expected to rise in prevalence as the world population ages2. COPD is understood to be a complex heterogeneous disease presenting clinically diverse phenotypes3, 4. Major causes of airflow obstruction are attributed to chronic bronchiolar obstruction, a.k.a small airways disease (SAD), and emphysema. Although SAD and emphysema are treated as separate COPD subtypes, studies have shown strong quantitative evidence that SAD exists as an intermediate state between healthy lung tissue and emphysema—i.e., irreversible lung damage—in COPD pathogenesis5–7. At present, little has been done to better quantify the onset of SAD from healthy lung parenchyma.

The Parametric Response Map (PRM) is a CT-based voxel-wise computational technique that can identify and quantify functional small airways disease (fSAD; an indirect measure of SAD) even in the presence of emphysema8. The percent volume of PRM-derived fSAD (PRMfSAD), i.e., the amount of fSAD in the lungs, has improved COPD phenotyping and the prediction of spirometric decline in subjects at risk of COPD9. To determine the value of spatial features from each PRM classification, we developed topological PRM (tPRM) as an extension of the PRM algorithm10. These radiographic tPRM readouts were shown to improve upon commonly used whole-lung PRM measures with respect to COPD characterization, and correlate to structural changes in lung tissue samples from lung transplant recipients diagnosed with bronchiolitis obliterans11.

In this study, we evaluated the PRM topologies volume density (V), a measure of extent, and Euler-Poincaré Characteristic (χ), a measure of pocket formation, of normal lung and fSAD as independent readouts of COPD severity, pulmonary function, and extent of emphysema using the Phase 1 COPDGene cohort12. We also investigated the potential of these topologic readouts as predictors of spirometric decline using a machine-learning model. This study demonstrates how tPRM readouts may be used as possible measures of early emphysema and COPD progression.

## Materials and Methods

### Study Sample

Our study was a secondary analysis of data from COPDGene (ClinicalTrials.gov: [NCT00608764](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT00608764&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom)), a large Health Insurance Portability and Accountability Act-compliant prospective multi-center observational study. In Phase 1 (2007-2012) and Phase 2 (2013-2017), 5-year follow-up, written and informed consent was obtained from all participants and the study was approved by local institutional review boards of all 21 centers. Ever-smokers with greater than or equal to 10 pack-year smoking history, with and without airflow obstruction, were enrolled between January 2008 and June 2011. Participants were non-Hispanic white or African American. Participants underwent volumetric inspiratory and expiratory CT using standardized protocol; images were transferred to a central lab for protocol verification and quality control (QC)12. Exclusion criteria included a history of other lung disease (except asthma), prior surgical excision involving a lung lobe or greater, present cancer, metal in the chest, or history of chest radiation therapy. Participants were excluded from the present study due to inadequate CT for computing tPRM, such as missing an inspiration/expiration scan, or failing QC implemented specifically for the present study. Our QC protocol is described in **Appendix 1**. Data for participants evaluated here have been utilized in numerous previous studies and a list of COPDGene publications can be found at [http://www.copdgene.org/publications](http://www.copdgene.org/publications). Our study is the first to report tPRM analysis across the whole Phase 1 cohort and predict spirometric decline over 5 years in the Phase 2 subset of COPDGene participants.

Spirometry was performed in the COPDGene study before and after the administration of a bronchodilator, specifically 180 mcg of albuterol (Easy-One spirometer; NDD, Andover, MA). Post-bronchodilator values were used in our analyses. COPD was defined by a post-bronchodilator FEV1/FVC of less than 0.7 at the baseline visit, as specified in the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines13. GOLD grades 1-4 were used to define disease severity. GOLD 0 classification, i.e., “at-risk,” was defined by a post-bronchodilator FEV1/FVC ≥ 0.7 at the baseline visit, alongside FEV1% predicted ≥ 80%. Participants with FEV1/FVC ≥ 0.7 with FEV1% predicted < 80% were classified as having preserved ratio impaired spirometry (PRISm)14. Demographic and spirometric measures used in this study included age, sex, race, smoking history, scanner manufacturer, body mass index (BMI), FEV1% predicted, FEV1/FVC and forced mid-expiratory flow (FEF25-75).

### Computed Tomography and Topological PRM Analysis

All computed tomography (CT) data were obtained from multiple sites associated with the COPDGene project at Phase 1. Whole-lung volumetric multidetector CT acquisition was performed at full inspiration and normal expiration at functional residual capacity using a standardized previously published protocol12. Data reconstructed with the standard reconstruction kernel was used for quantitative analysis. All CT data were presented in Hounsfield units (HU), where stability of CT measurement for each scanner was monitored monthly using a custom COPDGene phantom12. For reference, air and water attenuation values are −1,000 and 0 HU, respectively.

PRM were determined from paired CT scans using Lung Density Analysis (LDA) software (Imbio, LLC, Minneapolis, MN). LDA segmented the lungs from the thoracic cavity with airways removed. Inspiratory CT scans were spatially aligned to the expiratory geometric frame using deformable image registration. Lung voxels were classified using pre-determined HU thresholds as: normal (PRMNorm, -950 < inspiration HU ≤ -810, and expiration HU ≥ -856), functional small airways disease (PRMfSAD, -950 < inspiration HU ≤ -810, expiration HU < -856), emphysema (PRMEmph, inspiration HU < -950, expiration HU < -856), or parenchymal disease (PRMPD, inspiration HU > -810)15. Only voxels between -1,000 HU and -250 HU at both inspiration and expiration were used for PRM classification. Each PRM classification was quantified as the percent volume, which is defined as the sum of a PRM classification normalized to the total lung volume at expiration multiplied by 100.

Topological analysis of PRM was performed using methods previously described10. tPRM metrics were defined through application of Minkowski measures on 3D binary voxel distributions: volume density (V) and Euler-Poincaré Characteristic (χ)16. Maps of V and χ were computed for each PRM class (Norm, fSAD, Emph, and PD) using a 3D moving window of size 21 x 21 x 21 voxels evaluated on a grid of every 5th voxel. V was normalized by the Minkowski estimate of the mask within the same local window volume (rather than a direct calculation of the mask volume in the window as previously described) and χ by the masked window voxel count. Linear interpolation was applied to determine V and χ values for all segmented voxels.

To indicate the PRM class associated with a Minkowski measure, the class is presented as a superscript (e.g., VfSAD is the volume density of PRMfSAD). tPRM analysis was performed using open-source and in-house software developed in MATLAB R2019a (MATLAB, The MathWorks Inc., Natick, MA). A detailed overview and diagram, of computing tPRM from raw imaging data, was made by Hoff et al.10. Because the focus of this study is the relationship between normal parenchyma and SAD, and its association with emphysema, all analyses were performed using V and χ for PRM classifications Norm and fSAD. For completeness, V and χ for PRM classifications Emph and PD are provided.

### Phase 1 Data and Statistical Analysis

Data in this study are presented as mean and standard deviation unless stated otherwise. Correlation between V and χ for PRMNorm and PRMfSAD were calculated using Spearman rank-order correlation coefficients (*ρ*). The total Phase 1 cohort was separated into two subsets based on spirometry confirmed COPD: non-COPD (FEV1/FVC ≥ 0.7) and COPD (FEV1/FVC < 0.7). Cross-sectional multivariable regression analysis was performed on both subsets using a stepwise approach with V and χ for PRM classifications Norm and fSAD as independent variables and selected pulmonary function testing and clinical features as outcome variables, controlling for age, gender, race, BMI, smoking (pack years) and CT vendor. These control variables were included as compulsory independent variables in all regression models. Statistical work was conducted using IBM SPSS Statistics v27 (SPSS Software Products). In all tests, significance was defined by p < 0.05.

### Predict Spirometric Decline

We evaluated baseline V and χ for PRM classifications Norm and fSAD as predictors of FEV1 decline over 5 years using a machine learning (ML) model. A total of 4483 cases from the Phase 2 cohort of the COPDGene longitudinal trial, a subset of Phase 1, had FEV1 measurements at baseline and 5-year follow up. Our ML model is a sparse dictionary learning algorithm17–20 that classifies image patch features as “normal” or “abnormal”. In our method, we used the tPRM maps VNorm, VfSAD, χNorm, and χfSAD of each case as inputs for training and testing the algorithm. For training our ML model, individual cases were stratified based on the change in FEV1 over 5 years [= (FEV1 at yr 5 – FEV1 at yr 0)/5 years] as fast (ΔFEV1/yr ≤ - 60ml/yr; N=1516) and slow progressors (ΔFEV1/yr > -60ml/yr; N=2967). We used 35% of the data for training and 65% for testing the model. Training was performed on a randomly selected subset of 1569 cases, with N = 531 fast progressors and N = 1038 slow progressors. The remaining 2914 cases, consisting of N = 985 fast progressors and N = 1929 slow progressors, were used for testing the algorithm. In brief, our ML model is designed to associate unique features from the input image patches with fast and slow progressors. This is achieved by randomly selecting image patches from within the lung and extracting the information from the inputs (tPRM maps VNorm, VfSAD, χNorm, and χfSAD given as input to the ML algorithm) at these image patch locations and comparing their underlying patch features with the compiled class dictionaries of features, which are determined during training. It is important to note that no previous knowledge about the case and lung tissue features, such as emphysema, are provided for the algorithm to delineate “normal” from “abnormal” lung tissue. Details on model design and methods for training and testing are provided in **Figure 1** and **Appendix 2**. To determine the contribution of each feature to the model selection, we used the minimum redundancy maximum relevance feature selection algorithm21 to rank the tPRM inputs used in the dictionary learning algorithm. The algorithm quantifies the redundancy and relevance using mutual information of variables.22, 23 We also investigated the selection bias for each input in the ML model and obtained the prediction accuracy for 10 different choices of training image patches, considering each input separately in the model. The prediction accuracy for each training run is fit to a Gaussian probability density function24. All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (MathWorks, Natick, MA). To determine the contribution of our ML model to account for spatial features in predicting FEV1 decline, we determined if whole lung mean values of VNorm, VfSAD, χNorm, and χfSAD were as predictive of FEV1 decline using a logistic regression classifier.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F1)

Figure 1: 
Flowchart describing the various steps involved in the proposed dictionary learning algorithm. Step 1: 2D image patches are extracted from each of the prior maps (VNorm, VfSAD, χNorm, and χfSAD) in the labeled training data and a class specific dictionary is trained for the entire class. Step 2: all 2D patches from each 3D prior image map are classified and a threshold value is selected to classify the entire case as belonging to one of the classes. Step 3: The learned class specific dictionaries and the threshold for the entire case are used to classify the test images.

### Case Study: Spatial Analysis

To better understand the relationship between PRMfSAD and PRMEmph, we evaluated the spatial dependance of V and χ for these PRM classifications from a single subject. The case is a female subject, 45-50 years of age, diagnosed with GOLD 4 COPD. On a single axial slice, profiles of V and χ for PRMfSAD and PRMEmph were generated by selecting points from high emphysema (VEmph > 0.6) and low emphysema (VEmph < 0.2). A line plot (**Figure 2**) was produced for V and χ vs distance along each point of the profile. The distance, in units of centimeter, along the image profile was determined using the voxel dimensions of the CT scan. All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (MathWorks, Natick, MA).

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F2)

Figure 2: 
Case study demonstrating the spatial relationship between the topologies of PRMfSAD and PRMEmph. The case is a female subject, 45-50 years of age, diagnosed with GOLD 4 COPD. Single axial slice from (A) spatially aligned CT scan acquired at full inflation with corresponding (B) slices from V and χ of PRMfSAD and PRMEmph. (C) Topology values were plotted along the dashed line on the CT slice, starting from circle to star. Lines on plot were color coded to match PRM classification (red signifies PRMEmph and yellow signifies PRMfSAD). Solid and dashed lines indicate V (left y-axis) and χ (right y-axis).

## Results

### Population Characteristics

The original COPDGene Phase 1 cohort consisted of 10,300 individuals. We excluded 1,344 participants for: inadequate CT data, such as missing an expiration or inspiration scan, to conduct tPRM analysis (n = 1,125); missing clinical data (n = 16); or failing to pass our CT-based QC testing (n = 203). Further details of CT QC are provided in **Appendix 1**. The resulting complete subset used for analyses thus consisted of 8,956 participants. Baseline demographics and lung function for all Phase 1 participants, grouped based on FEV1% predicted and FEV1/FVC—that is, by GOLD grade or PRISm as described in the Materials and Methods—are reported in **Table 1**. Due to the COPDGene recruitment strategy, the proportion of GOLD 0 (FEV1/FVC ≥ 0.7, FEV1% predicted ≥ 80%) participants12 account for almost half of the study population (43%; 3,867 of 8,956 participants). Increasing percent volume of PRM-derived fSAD (PRMfSAD) and PRM-derived emphysema (PRMEmph), with decreasing PRMNorm, was observed with higher GOLD grades. This is consistent with previously published work. PRM-derived parenchymal disease (PRMPD) was found to be elevated in PRISm and GOLD 0 participants (35.8 ± 16.4% and 26.3 ± 12.8% of the total lung volume, respectively) as compared to the COPD subset.

View this table:
[Table 1:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/T1)

Table 1: Clinical Characterization of the Study Population Notes
: Participant characteristics of the entire study population separated in subsets of those with (FEV1/FVC < 0.7) and without (FEV1/FVC ≥ 0.7) COPD. Values are displayed as mean (standard deviation). GOLD, Global Initiative for Chronic Obstructive Lung Disease; PRISm, preserved ratio impaired spirometry; GOLD 0, at-risk smokers with normal spirometry; BMI, body mass index; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; FEF25-75, forced mid-expiratory flow; PRM, parametric response map; Norm, Normal; fSAD, functional small airways disease; Emph, emphysema; PD, parenchymal disease.

### Topological Readouts of PRM

Presented in **Figure 3** is a case with elevated fSAD (PRMfSAD = 40%). Representative coronal slices of the expiration CT scan and PRMfSAD, overlaid on CT scan, are provided. To illustrate the dependence of V and χ on the arrangement of PRMfSAD, we have included VfSAD and χfSAD maps indicating regions with low (cyan box) and high (magenta box) VfSAD. As expected, VfSAD (**Figure 3C**) is dependent on the amount of fSAD (yellow voxels in **Figure 3B**). Averaged over the lungs, VfSAD is proportional to the percent volume of PRMfSAD by a factor of 100. However, χfSAD > 0 (cyan box in **Figure 3D**) corresponds to the formation of fSAD pockets (cyan box **Figure 3B**), whereas χfSAD < 0 (cyan box in **Figure 3D**) is the consolidation of these pockets into a mesh with holes (magenta box in **Figure 3B**).

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F3)

Figure 3: 
Illustration of Volume Density (V) and Euler-Poincaré Characteristic (χ) for PRMfSAD. Presented are representative coronal slices for the (A) expiratory CT scan with associated (B) PRMfSAD overlay (yellow). Included are the (C) volume density and (D) Euler-Poincaré Characteristic of PRMfSAD. Magenta box indicates a lung region with elevated VfSAD and negative χfSAD. Cyan box indicates a lung region with minimal VfSAD and positive χfSAD. The subject is a GOLD 3 female, 50-55 years of age, with FEV1% predicted of 32% and percent volume of PRMfSAD of 40%.

The volume density of PRMNorm and PRMfSAD demonstrated an inverse relationship with increasing COPD severity (**Figure 4A**), consistent with previous work. A similar inverse relationship was observed for χ of both normal lung and fSAD (χNorm and χfSAD). Values of χNorm and χfSAD were found to flip about zero (e.g., χfSAD changes from positive to negative values) from GOLD 2 to GOLD 4 (**Figure 4B**). As provided in **Table 2**, χNorm and χfSAD had means (standard deviations) of -0.0084 (0.0071) and 0.0047 (0.0074), respectively, for cases diagnosed as GOLD 2. For those with severe COPD, i.e., GOLD 4, χNorm and χfSAD were 0.0039 (0.0055) and -0.0036 (0.0048), respectively. Mean values of χEmph and χPD were found to be positive and similar across GOLD (**Table 2**).

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F4)

Figure 4: 
Boxplots for Volume Density (V) and Euler-Poincaré Characteristic (χ) of PRMNorm and PRMfSAD across all GOLD stages, “at-risk” (GOLD 0) and PRISm. Plots of (A) V and (B) χ are provided for PRMNorm (green) and PRMfSAD (yellow). Box plots were computed following standard protocol for box and whiskers. Box lines determined by lower quartile (Q1), middle quartile / median (Q2), and upper quartile (Q3). Whiskers are drawn out to Q1 - 1.5 x IQR and Q3 + 1.5 x IQR for lower and upper limits respectively. IQR = Q3-Q1. Outliers are defined as points beyond the given upper and lower limits and illustrated as black points with a random bounded horizontal perturbation beyond box whiskers.

View this table:
[Table 2:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/T2)

Table 2: Topology Readouts for all PRM Classifications Notes:
Data are presented as the mean (standard deviation). The entire study population was separated into subsets of those with (FEV1/FVC < 0.7) and without (FEV1/FVC ≥ 0.7) COPD. GOLD, Global Initiative for Chronic Obstructive Lung Disease; PRISm, preserved ratio impaired spirometry; GOLD 0, at-risk smokers with normal spirometry; PRM, Parametric Response Map; Norm, normal lung parenchyma; fSAD, functional small airways disease; Emph, emphysema; and PD, parenchymal disease.

We further evaluated the relationship of PRMNorm and PRMfSAD with respect to V (**Figure 5A**) and χ (**Figure 5B**). Both V and χ demonstrated strong correlations between Norm and fSAD (ρ = -0.666, p < 0.001 and ρ = -0.745, p < 0.001, respectively) over the Phase 1 cohort. Here the GOLD stages are coded by color and the relative amount of emphysema, quantified by VEmph, by size of the marker. As observed in **Figure 5A**, VNorm versus VfSAD had more scatter in the data compared to χNorm versus χfSAD (**Figure 5B**). As expected, GOLD 4 cases with elevated emphysema (VEmph) demonstrated a drop in VNorm and VfSAD values. In contrast, χNorm consisted of primarily positive values, whereas positive and negative values were observed for χfSAD (**Figure 5B**). Although VfSAD was found to be strongly correlated to VEmph (r = 0.845, p < 0.001), only a weak correlation was observed between χfSAD and χEmph (ρ = -0.155, p < 0.001).

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F5)

Figure 5: 
Scatter plots of all study sample participants for (A) VNorm versus VfSAD and (B) χNorm versus χfSAD. Individual points are color coded based on COPD classifications. The size of the points indicates the amount of emphysema as measured by the volume density of PRMEmph (VEmph).

### Multivariable Regression Analysis

Presented in **Table 3** are results from multivariable regression analyses that demonstrate the contribution of V and χ to PRMNorm and PRMfSAD when modeling spirometric measures and the volume density of emphysema, controlling for age, sex, race, BMI, pack-years, and CT vendor. Among those with spirometrically confirmed COPD, VNorm was found to be significantly associated with multiple clinical measures including FEV1% predicted, FEV1/FVC, FEF25-75 and VEmph (see **Table 3**). VfSAD and χfSAD were found to independently and significantly contribute to FEV1% predicted (β = 0.065, p=0.004 and β = 0.106, p<0.001). Only the Norm topological measures were found to contribute to FEV1/FVC (β = 0.668, p<0.001 for VNorm and β = - 0.120, p<0.001 for χNorm), whereas V and χ for both Norm and fSAD were found to be significant parameters for FEF25-75. With respect to VEmph, extent of emphysema, V and χ for Norm and fSAD were highly significant but demonstrated similar trends irrespective of PRM classification. For completeness, the same analyses were performed on the non-COPD cohort (**Table 4**). As compared to the COPD cohort, statistical models generated from the non-COPD cohort demonstrated significant parameters but with weaker correlations (i.e., adjusted R2).

View this table:
[Table 3:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/T3)

Table 3: Multivariable Regression for COPD Subset Notes:
Multivariable regression modelling using volume density (V) and Euler-Poincaré Characteristic (χ) for PRM-derived Normal and fSAD (introduced stepwise) to model pulmonary function testing measures in the COPD subset. Each column presents results for a different regression model. FEV1, forced expiratory volume in one second; FVC, forced vital capacity; FEF25-75, forced mid-expiratory flow; Emph, emphysema; SE, standard error of the estimate; BMI, body mass index; Norm, Normal; fSAD, functional small airways disease. Model performance is reported as adjusted R2 and standard error of the estimate. Feature association is reported as standardized beta coefficients (β); cells for stepwise variables removed from final model. All regression models were controlled for age, sex, race, BMI, pack years and CT vendor. P values ≥ 0.01, < 0.01 and ≥ 0.001, and < 0.001 are presented as values in parentheses, *, and **, respectively.

View this table:
[Table 4:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/T4)

Table 4: Multivariable Regression for non-COPD Subset Notes:
Multivariable regression modelling using volume density (V) and Euler-Poincaré Characteristic (χ) for PRM-derived Normal and fSAD (introduced stepwise) to model pulmonary function test measures in the COPD subset. Each column presents results for a different regression model. FEV1pp, forced expiratory volume in one second percent predicted; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; FEF25-75, forced mid-expiratory flow; Emph, emphysema; SE, standard error of the estimate; BMI, body mass index; Norm, normal; and fSAD, functional small airways disease. Model performance is reported as adjusted R2 and standard error of the estimate. Feature association is reported as standardized beta coefficients (β); cells for stepwise variables removed from final model. All regression models were controlled for age, sex, race, BMI, pack years and CT vendor. P values ≥ 0.01, < 0.01 and ≥ 0.001, and < 0.001 are presented as values in parentheses, *, and **, respectively.

### Prediction Model of Spirometric Decline

Representative axial slices of expiration CT scan, PRM, VfSAD, χfSAD and corresponding patch probability maps from a fast progressor (with ΔFEV1/yr of -249 ml/yr) are provided in **Figure 6**. Our ML model correctly classified this subject as a fast progressor. This case is a male, 60- 65 years of age, diagnosed at baseline with GOLD 2 COPD. Using V and χ from PRMfSAD and PRMNorm as inputs, the ML model was able to determine regions of emphysema, discernible from existing fSAD, observed in the right upper lung as “abnormal” (blue patches in the probability maps). In contrast, the dorsal lung regions were classified as “normal” (red patches in the probability maps) due to the absence of fSAD and emphysema. For completeness we have provided in **Figure 7** representative axial slices of expiration CT scan, PRM, VfSAD, χfSAD and corresponding patch probability maps from a slow progressor (with ΔFEV1/yr of 101 ml/yr).

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F6)

Figure 6: 
The dictionary learning results on a 60- to 65-year-old male diagnosed at baseline with GOLD 2 COPD and a fast progressor with ΔFEV1/yr of -249 ml/yr. Representative axial slice of an expiratory CT scan acquired at baseline, its associated PRM map, the tPRM maps VfSAD and χfSAD of PRMfSAD, and their image patch probability maps from the dictionary learning model.

![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F7.medium.gif)

[Figure 7:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F7)

Figure 7: 
The dictionary learning results for a 70- to 75-year-old male diagnosed at baseline with GOLD 1 COPD and declared a slow progressor with ΔFEV1/yr of 101 ml/yr. This case was correctly identified by our ML algorithm as a slow progressor. Representative axial slice of an expiratory CT scan, its associated PRM map, the tPRM maps VfSAD and χfSAD of PRMfSAD, and their image patch probability maps from the dictionary learning model.

As seen in **Figure 8A** and **B**, our ML model had an overall classification accuracy of 70.6% and Area Under the Curve (AUC) of 0.69 of the receiver operating characteristic (ROC) curve. We compared our ML model with a simple logistic regression model using whole lung mean values of VNorm, VfSAD, χNorm, and χfSAD. **Figure 8B** shows that the logistic regression model only achieved an AUC of 0.55. The contribution of each of the inputs to the model (VNorm, VfSAD, χNorm, and χfSAD) are shown in **Figure 8C** and **D**. V and χ of PRMfSAD are dominant inputs, followed by V and χ of PRMNorm (**Figure 8C**). Using a feature rank analysis performed on our test set, we observed that V and χ of PRMfSAD are important to achieve higher prediction accuracy. In fact, χfSAD was found to have the smallest spread/variance (**Figure 8D**), indicating highly desirable robustness to the choice of training image patches and its usefulness as an input in the ML model. As reported in **Table 5**, “normal” patches, on average, consisted primarily of PRMNorm, elevated VNorm (abundant) and negative χNorm (consolidated), with negligible PRMfSAD, low VfSAD (depleted) and positive χfSAD (sparse pockets). In “abnormal” patches, similar values of V and χ for PRMNorm and PRMfSAD were observed (**Table 5**). Positive and negative values in χfSAD were found for “normal” and “abnormal” patches, respectively. This is consistent with the inverse relationship seen with increasing COPD severity shown in **Figure 4**.

![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F8.medium.gif)

[Figure 8:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F8)

Figure 8: 
Results and relevance of the different features (tPRM metrics as inputs) used in the dictionary learning method. (A) Confusion Matrix showing the sensitivity and specificity of the ML model classifications for both the fast progressor (N = 985) and the slow progressor (N = 1929) classes in the test set. Green colored and red colored fields in the matrix represent agreement and disagreement, respectively, of the ML model with the actual decision. (B) Receiver Operating Characteristic (ROC) curve for our ML model and the logistic regression classifier with the corresponding Area Under the Curve (AUC) statistics. (C) Bar plot showing the feature importance score and feature ranking using the minimum redundancy maximum relevance method. (D) Plot showing the distribution of the features and their prediction accuracy over ten different training runs.

View this table:
[Table 5:](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/T5)

Table 5: Image Patch topological PRM metrics in the ML model Notes:
Data are presented as the mean (standard deviation).

### Dependence Between Topologies of PRMfSAD and PRMEmph

As the topologies of PRM were determined as averages over the whole lungs, we provide a case study illustrating the relationship between V and χ of PRMfSAD and PRMEmph at the local level. Presented in **Figure 2** are the profiles of V and χ of PRMfSAD and PRMEmph from a region of the right lung with elevated and reduced VEmph (orange circle and star, respectively, in **Figure 2A** and **C**). The case is a female subject, 45-50 years of age, diagnosed with GOLD 4 COPD. The subject was found to have on average high levels of VfSAD (0.37) with relatively elevated VEmph (0.1). Mean values for the whole lungs of χ were 0.008 and -0.009 for PRMEmph and PRMfSAD, respectively. As seen in **Figure 2C**, VfSAD increased while VEmph decreased further from lung with the highest level of VEmph (∼0.6 at orange circle in **Figure 2A** and **C**). At approximately 1.8 cm, volume densities between PRMfSAD and PRMEmph transitioned. In addition, χfSAD was found to increase with decreasing χEmph with transition occurring at ∼1.2cm.

## Discussion

The topological parametric response map is an extension of the well-established PRM method, a quantitative imaging marker of SAD8. In this study, we have demonstrated that inclusion of topological features, in this case the Euler-Poincaré Characteristic (χ), improved characterization and interpretation of fSAD in COPD as a complimentary readout of volume density (V), which is equivalent to traditional percent volume of PRM classifications10. This study also evaluated the role of PRM-defined normal parenchyma (PRMNorm) and fSAD (PRMfSAD) as lone indicators of COPD severity. We observed distinct patterns in topological metrics with respect to GOLD grades and identified a complete inversion in topology, characterized by Euler-Poincaré Characteristic χ, between normal lung and fSAD, in mid-to-late stages of COPD. We also found V and χ of PRMNorm and PRMfSAD to have statistically significant correlation with spirometric measures and emphysema and to be predictive of spirometric decline.

Our study builds on previous work by Hoff et al.10 on tPRM characterization in COPD. This study used a much smaller population (n = 88) to demonstrate the trends of all four topological features (volume density, surface area, mean curvature and Euler-Poincaré Characteristic) with increasing COPD severity10. Limited in statistical power, it instead focused on the surface area of fSAD. Access to a notably larger population (n = 8,956) in the current study allowed us to evaluate the volume density (V) and Euler-Poincaré Characteristic (χ) of PRMNorm and PRMfSAD and relate our findings to the field’s current understanding of COPD progression, i.e., normal parenchyma transitions to emphysema through SAD.

A key finding of our study is the ability to quantify parenchymal lung health, based not only on the extent but also on the arrangement of local lung abnormalities, i.e., fSAD. This is rooted in the concept that the lungs are healthy (i.e., PRMNorm) and COPD progresses through SAD (i.e., PRMfSAD), an intermediate between normal and emphysematous lung tissue, to emphysema. The nature of this transition suggests χ may be capturing a fundamental mechanism in the emergence of fSAD. Based on our observation, fSAD appears to develop as distinct pockets, which are represented as positive values in χfSAD within healthy lung tissue, as depicted in the cyan box in **Figure 3B**. With increasing COPD severity, fSAD pockets coalesce to a mesh, which is represented by negative values in χfSAD (magenta box in **Figure 3B**). On a whole lung level, this transition occurs on average from GOLD stages 2 to 4. By quantifying the amount and arrangement of normal and fSAD parenchyma, one can assess the severity of COPD. As fSAD is an intermediate between healthy lung and emphysema, increasing levels of emphysema have a direct effect on V and χ of fSAD. This is observed in **Figure 5** and **Figure 2**, where increasing values of VEmph resulted in a drop in VfSAD and increase in χfSAD. These trends were reflected in our multivariable model for VEmph as well (**Table 3**).

In a seminal study, McDonough and colleagues7 provided pathological evidence demonstrating the role of SAD in COPD progression. Using high resolution (∼10 μm) microCT to analyze frozen lung samples from lung transplant recipients with end-stage COPD, they found that widespread narrowing and destruction of the smaller airways (i.e., SAD) occurred before emphysematous lesions became large enough to be visible on standard CT imaging. They concluded that SAD might serve as an emphysema precursor. Based on their observation, we postulated that the transition observed between χNorm and χfSAD (**Figures 4** and **5**) should be observed for χfSAD and χEmph. Using mean values of χ over the lungs, χEmph was found to be relatively stable, generating positive values across GOLD (**Table 4**), as well as demonstrating a weak correlation to χfSAD (ρ = -0.155, p < 0.001). Nevertheless, evaluating χfSAD and χEmph at the local level, we observe a strong association between these two readouts (**Figure 2**), which may be linked to the structural changes in the terminal airways observed using microCT of lung explants.

In a recent study, Bhatt and colleagues evaluated a CT readout, referred to as the mean Jacobian determinant of normal voxels, at varying distances from emphysematous tissue25. When measured at 2mm from CT voxels designated emphysema (i.e., voxel HU <-950HU), this CT-based readout was found to be predictive of spirometric decline. Our spatial analysis of a single case clearly demonstrates a transition in topologies of PRMfSAD and PRMEmph, 1.8 cm and 1.2cm for V and χ, respectively (**Figure 2**). It is the association of topologies between PRMfSAD and PRMEmph at the local level that allows our machine learning model to predict spirometric decline, with an accuracy of 70%, in the absence of any emphysema readout as an input (**Figure 6**). Although the readouts reported by Bhatt and colleagues lacked quantification of SAD, there is clear agreement that lung tissue along the periphery of emphysematous tissue provides potential insight into COPD progression. Using only topologies of PRMNorm and PRMfSAD, our patch-based ML model outperformed the whole-lung logistic regression model (**Figure 8B**). This result highlights the importance of the spatial relationship of χfSAD to χEmph to predict spirometric decline (**Figures 6** and **8**).

We acknowledge several notable limitations. COPDGene comprises over 20 study sites, making scanner variation and reconstruction kernel inconsistency inevitable. Sensitivity of PRM to scanner variability was addressed previously26 and although effort was made to apply PRM only to soft kernels, variability in scanner type was unavoidable. However, we included scanner vendor in our multivariable regressions and found that it did not significantly confound models. Another limitation is variation in levels of inspiration and expiration during CT acquisition. Earlier work demonstrated that even small perturbations from functional residual capacity (FRC) have an observable effect on threshold-based techniques such as PRM26. To limit this, we implemented QC that excluded participants based on erroneous volume changes or strong discordance with correlation between PRMNorm and FEV1% predicted. In summary, we have demonstrated that topological features, V and χ, are able to enhance the sensitivity of PRM classifications, notably Norm and fSAD, to extent of emphysema and COPD severity. These data support the concept that as pockets of small airways disease coalesce, surrounding normal tissue is lost. Pockets of fSAD are seen to correlate with increasing presence of emphysema, independent of the amount of fSAD present. We further demonstrated that local levels of χfSAD and χEmph correlate, which may be explained by bronchiolitis along the periphery of emphysematous tissue observed by McDonough and colleagues using microCT. In addition, we demonstrated that local values of V and χ for PRMNorm and PRMfSAD provide sufficient information to predict spirometric decline, even in the absence of any prior knowledge of emphysema. Our study provides a unique strategy to detect subtle changes in lung parenchyma that may progress to emphysema. This approach to monitoring extent and arrangement of Norm and fSAD offers insight into COPD phenotypes and provides improved prognostic information that has relevance in clinical care and future clinical trials.

## Data Availability

The datasets presented in this study are not readily available because they are part of an NIH sponsored clinical trial and require a data use agreement to be signed. For access to COPDGene data visit https://www.copdgene.org/phase-1-study-documents.htm for instructions.

## Conflicts of Interest and Sources of Funding

### Conflicts of Interest

Wassim W. Labaki reports personal fees from Continuing Education Alliance. Benjamin A. Hoff and Craig J. Galban are co-inventors and patent holders of tPRM, which the University of Michigan has licensed to Imbio, LLC. Craig J. Galban is co-inventor and patent holder of PRM, which the University of Michigan has licensed to Imbio, LLC. Benjamin A. Hoff and Craig J. Galban have financial interest in Imbio, LLC. Charles R. Hatt is employed by Imbio, LLC. David A. Lynch reports funds paid to the institution from NIH and personal payments from Boehringer Ingelheim. MeiLan K. Han reports personal fees from GlaxoSmithKline, AstraZeneca, Boehringer Ingelheim, Cipla, Chiesi, Novartis, Pulmonx, Teva, Verona, Merck, Mylan, Sanofi, DevPro, Aerogen, Polarian, Regeneron, Amgen, UpToDate, Altesa Biopharma, Medscape, NACE, MDBriefcase and Integrity. She has received either in kind research support or funds paid to the institution from the NIH, Novartis, Sunovion, Nuvaira, Sanofi, AstraZeneca, Boehringer Ingelheim, Gala Therapeutics, Biodesix, the COPD Foundation and the American Lung Association. She has participated in Data Safety Monitoring Boards for Novartis and Medtronic with funds paid to the institution. She has received stock options from Meissa Vaccines and Altesa Biopharma. For the remaining authors none were declared.

### Funding

This work was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health Grants R01 HL139690 (to Craig J. Galban), R01 HL150023 (to MeiLan K. Han, Craig J. Galban, and Charles R. Hatt) and T32 HL 007749 (to Jennifer M. Wang) and by the National Heart, Lung, and Blood Institute of the National Institutes of Health Grants U01 HL089897 and U01 HL089856, and NIH contract 75N92023D00011, which support the COPDGene study. The COPDGene study ([NCT00608764](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT00608764&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom)) is also supported by the COPD Foundation through contributions made to an Industry Advisory Committee that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer and Sunovion.

## Acknowledgements

We acknowledge the COPDGene investigators for their role in the study providing data for this project:

**Administrative Center**: James D. Crapo, MD (PI); Edwin K. Silverman, MD, PhD (PI); Barry J. Make, MD; Elizabeth A. Regan, MD, PhD

**Genetic Analysis Center**: Terri Beaty, PhD; Ferdouse Begum, PhD; Peter J. Castaldi, MD, MSc; Michael Cho, MD; Dawn L. DeMeo, MD, MPH; Adel R. Boueiz, MD; Marilyn G. Foreman, MD, MS; Eitan Halper-Stromberg; Lystra P. Hayden, MD, MMSc; Craig P. Hersh, MD, MPH; Jacqueline Hetmanski, MS, MPH; Brian D. Hobbs, MD; John E. Hokanson, MPH, PhD; Nan Laird, PhD; Christoph Lange, PhD; Sharon M. Lutz, PhD; Merry-Lynn McDonald, PhD; Margaret M. Parker, PhD; Dmitry Prokopenko, Ph.D; Dandi Qiao, PhD; Elizabeth A. Regan, MD, PhD; Phuwanat Sakornsakolpat, MD; Edwin K. Silverman, MD, PhD; Emily S. Wan, MD; Sungho Won, PhD

**Imaging Center**: Juan Pablo Centeno; Jean-Paul Charbonnier, PhD; Harvey O. Coxson, PhD; Craig J. Galban, PhD; MeiLan K. Han, MD, MS; Eric A. Hoffman, Stephen Humphries, PhD; Francine L. Jacobson, MD, MPH; Philip F. Judy, PhD; Ella A. Kazerooni, MD; Alex Kluiber; David A. Lynch, MB; Pietro Nardelli, PhD; John D. Newell, Jr., MD; Aleena Notary; Andrea Oh, MD; Elizabeth A. Regan, MD, PhD; James C. Ross, PhD; Raul San Jose Estepar, PhD; Joyce Schroeder, MD; Jered Sieren; Berend C. Stoel, PhD; Juerg Tschirren, PhD; Edwin Van Beek, MD, PhD; Bram van Ginneken, PhD; Eva van Rikxoort, PhD; Gonzalo Vegas Sanchez-Ferrero, PhD; Lucas Veitel; George R. Washko, MD; Carla G. Wilson, MS; PFT QA Center, Salt Lake City, UT: Robert Jensen, PhD

**Data Coordinating Center and Biostatistics, National Jewish Health, Denver, CO**: Douglas Everett, PhD; Jim Crooks, PhD; Katherine Pratte, PhD; Matt Strand, PhD; Carla G. Wilson, MS

**Epidemiology Core, University of Colorado Anschutz Medical Campus, Aurora, CO**: John E. Hokanson, MPH, PhD; Gregory Kinney, MPH, PhD; Sharon M. Lutz, PhD; Kendra A. Young, PhD

**Mortality Adjudication Core**: Surya P. Bhatt, MD; Jessica Bon, MD; Alejandro A. Diaz, MD, MPH; MeiLan K. Han, MD, MS; Barry Make, MD; Susan Murray, ScD; Elizabeth Regan, MD; Xavier Soler, MD; Carla G. Wilson, MS

**Biomarker Core**: Russell P. Bowler, MD, PhD; Katerina Kechris, PhD; Farnoush Banaei-Kashani, Ph.D We would also like to acknowledge our copy editor Lee Olsen for her assistance in preparing this manuscript.

## Appendices

## Appendix 1: Quality Control (QC) Protocol

QC was performed in two steps (e.g., see 3 and 4 in exclusion diagram below), using GOLD grade, segmented lung volume change (ΔV = inspiration volume – expiration volume, as a function of segmented voxels), and a correlation test metric (Q) defined as the absolute value of the difference between standard scores of FEV1% predicted and %PRMNorm, reported to be highly correlated in COPD studies.8

![Figure9](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/20/2023.05.26.23290532/F9.medium.gif)

[Figure9](http://medrxiv.org/content/early/2023/11/20/2023.05.26.23290532/F9)

Specifically, imaging QC tests for exclusion were applied consecutively as follows:

### 1. ΔV < -0.5 L OR Q ≥ 3

A large negative volume change, here defined as greater than 0.5 L, often indicates transposition of intended respiratory stages (expiration/inspiration), due to faulty maneuver or data handling error. In addition, we test here if a deviation of equal to or greater than 3 standard deviations from the expected positive correlation between FEV1% predicted and %PRMNorm has occurred.

### 2. ΔV ≤ 0 L and GOLD < 3

This second step goes on to test if a non-severe COPD participant (GOLD < 3) has zero or negative volume change. N.B. here and in step 1 we have considered that there may be participants with severe disease that have some abnormal volume changes (close to 0 due to very limited lung function).

## Appendix 2: Dictionary Learning Algorithm

Image sparsity has emerged as a significant property of images and sparsity-based regularization has been used for various image processing applications.18–20, 24, 27–30 Sparse image representations are at the heart of many modern approaches to medical image classification and include 31–33. The sparse model assumes that each patch within an image can be accurately represented using a few elements of a basis set called a dictionary. For image classification problems a separate class-specific dictionary is learnt from patches belonging to each class of images. In this work, we have developed a multiview task-driven dictionary learning algorithm – a novel approach that aims to learn discriminative dictionaries for each class from multiple views of the data in a joint fashion by imposing group sparsity constraints34.

### Dictionary Learning

Our proposed method utilizes an overcomplete dictionary *D* constructed from the CT images, which is an *n* × *K* matrix whose columns represent *K* “atoms” of size *n*, where an “atom” is a sparse coefficient vector (i.e., a vector of weights/coefficients in the sparse basis). We train a separate dictionary for each class. Each dictionary *D**i* represents the image patches from class *i* reasonably well, but at the same time represents the image patches from the other classes quite poorly. There are several ways to train/learn a dictionary.35 In this work, we have adopted the task-driven dictionary learning algorithm proposed by Mairal et al.36 To train them, we solve a combinatorial optimization problem, where an approximate solution is obtained by alternating between a greedy sparse coding step using the current dictionary estimate, and a dictionary update step.

### Sparse Coding

We assume that any image patch *x* in a CT image can be represented as a sparse linear combination of the atoms of the dictionary *D* as: *x* ≈*Dα*, where *D* is the sparse coefficient vector. Given a dictionary, *D*, the goal in sparse coding is to find a sparse coefficient vector *D*. This requires solving a second optimization problem, the optimal solution to which is found using a greedy approach such as an orthogonal matching pursuit algorithm.37

### Classification

In sparse representation-based classification, an image patch *x* is classified according to how well the patch is represented by the class-specific dictionaries. Once a dictionary *Di* has been trained for each class *i*, classification of a new image patch *x*new is performed by evaluating the reconstruction/representation errors for different classes. From the class representation errors a pseudo-probability measure *Pi* is computed and the image patch is assigned to the class that has the maximum probability value

### Training

The dictionary learning model was trained on a desktop workstation running a 64-bit Windows operating system (Windows 10) with an Intel Xeon W-2123 CPU at 3.6GHz with 128GB DDR4 RAM. The x-, y-, and z-dimensions of each image in our dataset were x = 512, y = 512, and z ∼ 1250. CT lung scans from N = 4483 cases from the COPDGene phase 1 dataset who had follow up examination were considered. A representative 2D slice of a donor lung CT image is shown in **Figure 3A** of the manuscript. The lungs within these CT images were then automatically segmented using in-house software developed using MATLAB R2020a (MathWorks, Natick, MA). Our dataset for this study consists of a total N = 4483 lung CT images belonging to two categories: i) N = 1516 CT lungs that were described as fast progressors with a change of FEV1 ≥ -60ml/yr, referred to as class 1, and ii) N = 2967 CT lungs that were described as slow progressors with a change of < -60ml/yr in their FEV1, referred to as class 2. We used 35% of the data for training and the remaining 65% for testing. A total of 8,000,000 2D image patches from the three (axial, coronal, and sagittal) views from each of the prior maps (tPRM maps VNorm, VfSAD, χNorm, and χfSAD) were extracted from the training data for each class to train the dictionaries. The proposed dictionary learning algorithm was developed using MATLAB R2020a software (MathWorks, Natick, MA). We used the sparse modeling software (SPAMS) toolbox38 for the orthogonal matching pursuit optimization algorithm to efficiently optimize the dictionary elements. The hyper parameters of the dictionary learning algorithm include the image patch size *l*, the number of dictionary bases *K* for each dictionary, the sparsity controlling parameter *λ*, and the positive regularization parameter *ρ* in sparse coding. The optimal values for these parameters were automatically selected on a validation set (randomly chosen from within the training data) using the receiver operating characteristic (ROC) curves, by varying one parameter at a time while keeping the others fixed and choosing that value of the parameter that maximizes the area under the curve (AUC) of the ROC curve. The parameters of the dictionary learning algorithm were set to *l* = 25, *K* = 512, *λ* = 0.0015, and *ρ* = 0.006.

## Footnotes

*   # co-last author

*   All supplemental material has been incorporated into the main text. This change provides a more accessible and comprehensive picture of our dictionary learning model; specifically, figures 6 and 7 are discussed in sequence to demonstrate how the model can identify COPD patients as fast or slow progressors. We have also added an author, Dr. Ravi Pal.

*   Received May 26, 2023.
*   Revision received November 20, 2023.
*   Accepted November 20, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  1.Global, regional, and national deaths, prevalence, disability-adjusted life years, and years lived with disability for chronic obstructive pulmonary disease and asthma, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. The Lancet Respiratory Medicine. 2017-9 2017;5(9):691–706. doi:10.1016/S2213-2600(17)30293-X
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S2213-2600(17)30293-X&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28822787&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

2.  2.Mannino DM, Buist AS. Global burden of COPD: risk factors, prevalence, and future trends. The Lancet. September 1, 2007 2007;370(9589):765–773. doi:10.1016/S0140-6736(07)61380-4
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(07)61380-4&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17765526&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000249209300031&link_type=ISI) 

3.  3.Han MK, Agusti A, Calverley PM, et al. Chronic Obstructive Pulmonary Disease Phenotypes. Am J Respir Crit Care Med. September 1, 2010 2010;182(5):598–604. doi:10.1164/rccm.200912-1843CC
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.200912-1843CC&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20522794&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000281492000004&link_type=ISI) 

4.  4.Barker BL, Brightling CE. Phenotyping the heterogeneity of chronic obstructive pulmonary disease. Clin Sci (Lond*)*. Mar 2013;124(6):371–87. doi:10.1042/CS20120340
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6OToicHBjbGluc2NpIjtzOjU6InJlc2lkIjtzOjk6IjEyNC82LzM3MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIzLzExLzIwLzIwMjMuMDUuMjYuMjMyOTA1MzIuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

5.  5.Boes JL, Hoff BA, Bule M, et al. Parametric Response Mapping Monitors Temporal Changes on Lung CT Scans in the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS). Acad Radiol. February 1, 2015 2015;22(2):186–194. doi:10.1016/j.acra.2014.08.015
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.acra.2014.08.015&link_type=DOI) 

6.  6.Labaki WW, Gu T, Murray S, et al. Voxel-Wise Longitudinal Parametric Response Mapping Analysis of Chest Computed Tomography in Smokers. Acad Radiol. Feb 2019;26(2):217–223. doi:10.1016/j.acra.2018.05.024
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.acra.2018.05.024&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

7.  7.McDonough JE, Yuan R, Suzuki M, et al. Small-Airway Obstruction and Emphysema in Chronic Obstructive Pulmonary Disease. N Engl J Med. October 27, 2011 2011;365(17):1567–1575. doi:10.1056/NEJMoa1106955
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1106955&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22029978&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296181800001&link_type=ISI) 

8.  8.Galbán CJ, Han MK, Boes JL, et al. Computed tomography–based biomarker provides unique signature for diagnosis of COPD phenotypes and disease progression. Nat Med. 11/2012 2012;18(11):1711–1715. doi:10.1038/nm.2971
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nm.2971&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23042237&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

9.  9.Bhatt SP, Soler X, Wang X, et al. Association between Functional Small Airway Disease and FEV1 Decline in Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med. 2016-7-15 2016;194(2):178–184. doi:10.1164/rccm.201511-2219OC
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201511-2219OC&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26808615&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

10. 10.Hoff BA, Pompe E, Galbán S, et al. CT-Based Local Distribution Metric Improves Characterization of COPD. Sci Rep. 2017-06-07 2017;7(1):2999. doi:10.1038/s41598-017-02871-1
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-017-02871-1&link_type=DOI) 

11. 11.Ram S, Verleden SE, Bell AJ, et al. Quantitative CT Correlates with Local Inflammation in Lung of Patients with Subtypes of Chronic Lung Allograft Dysfunction. Cells. 2022;11(4):699.
    
    
12. 12.Regan EA, Hokanson JE, Murphy JR, et al. Genetic Epidemiology of COPD (COPDGene) Study Design. COPD: Journal of Chronic Obstructive Pulmonary Disease. February 1, 2011 2011;7(1):32–43. doi:10.3109/15412550903499522
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3109/15412550903499522&link_type=DOI) 

13. 13.Rabe KF, Hurd S, Anzueto A, et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med. September 15, 2007 2007;176(6):532–555. doi:10.1164/rccm.200703-456SO
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.200703-456SO&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17507545&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000249646100004&link_type=ISI) 

14. 14.Wan ES, Castaldi PJ, Cho MH, et al. Epidemiology, genetics, and subtyping of preserved ratio impaired spirometry (PRISm) in COPDGene. Respir Res. Aug 6 2014;15:89. doi:10.1186/s12931-014-0089-y
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12931-014-0089-y&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

15. 15.Belloli EA, Degtiar I, Wang X, et al. Parametric Response Mapping as an Imaging Biomarker in Lung Transplant Recipients. Am J Respir Crit Care Med. October 25, 2016 2016;195(7):942–952. doi:10.1164/rccm.201604-0732OC
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201604-0732OC&link_type=DOI) 

16. 16.Legland D, Kiêu K, Devaux M-F. COMPUTATION OF MINKOWSKI MEASURES ON 2D AND 3D BINARY IMAGES. Image Analysis & Stereology. 2007 2007;26(2):83–92. doi:10.5566/ias.v26.p83-92
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5566/ias.v26.p83-92&link_type=DOI) 

17. 17.Ram S, Pal R, Kumar M, et al. Dictionary Learning to Predict COPD Progression Using Topological PRM. *C29 MORE THAN MEETS THE EYE: ADVANCED LUNG IMAGING*. American Thoracic Society; 2023:A4717-A4717. American Thoracic Society International Conference Abstracts.
    
    
18. 18.Ram S, Rodríguez JJ. Single image super-resolution using dictionary-based local regression. 2014:121–124.
    
    
19. 19.Ram S, Rodriguez JJ. Image super-resolution using graph regularized block sparse representation. 2016:69–72.
    
    
20. 20.Ram S. Sparse Representations and Nonlinear Image Processing for Inverse Imaging Solutions. The University of Arizona; 2017. Accessed 2017-11-29t01:30:16z. [http://hdl.handle.net/10150/626164](http://hdl.handle.net/10150/626164)
    
    
21. 21.Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol. Apr 2005;3(2):185–205. doi:10.1142/s0219720005001004
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1142/S0219720005001004&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15852500&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

22. 22.Long F, Peng H, Ding C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis & Machine Intelligence. 2005;27(08):1226–1238. doi:10.1109/TPAMI.2005.159
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TPAMI.2005.159&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16119262&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000229700900004&link_type=ISI) 

23. 23.Darbellay GA, Vajda I. Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory. 1999;45(4):1315–1321. doi:10.1109/18.761290
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/18.761290&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000080037500032&link_type=ISI) 

24. 24.Ram S, Tang W, Bell AJ, et al. Lung cancer lesion detection in histopathology images using graph-based sparse PCA network. Neoplasia. Aug 2023;42:100911. doi:10.1016/j.neo.2023.100911
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neo.2023.100911&link_type=DOI) 

25. 25.Bhatt SP, Bodduluri S, Hoffman EA, et al. Computed Tomography Measure of Lung at Risk and Lung Function Decline in Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med. 2017;196(5):569–576. doi:10.1164/rccm.201701-0050OC
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201701-0050OC&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

26. 26.Boes JL, Bule M, Hoff BA, et al. The Impact of Sources of Variability on Parametric Response Mapping of Lung CT Scans. Tomography. Sep 2015;1(1):69–77. doi:10.18383/j.tom.2015.00148
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.18383/j.tom.2015.00148&link_type=DOI) 

27. 27.Ding D, Ram S, Rodriguez JJ. Image Inpainting Using Nonlocal Texture Matching and Nonlinear Filtering. IEEE Trans Image Process. Apr 2019;28(4):1705–1719. doi:10.1109/TIP.2018.2880681
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TIP.2018.2880681&link_type=DOI) 

28. 28.Malladi SRSP, Ram S, Rodríguez JJ. Image Denoising Using Superpixel-Based PCA. IEEE Transactions on Multimedia. 2021;23:2297–2309. doi:10.1109/TMM.2020.3009502
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TMM.2020.3009502&link_type=DOI) 

29. 29.Ram S, Hoff BA, Bell AJ, et al. Improved detection of air trapping on expiratory computed tomography using deep learning. PLoS One. 2021;16(3):e0248902. doi:10.1371/journal.pone.0248902
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0248902&link_type=DOI) 

30. 30.Ding D. Image Inpainting Based on Exemplars and Sparse Representation. The University of Arizona; 2017. Accessed 2017-10-16t22:37:36z. [http://hdl.handle.net/10150/625897](http://hdl.handle.net/10150/625897)
    
    
31. 31.Sarkar R, Acton ST. SDL: Saliency-Based Dictionary Learning Framework for Image Similarity. IEEE Trans Image Process. Feb 2018;27(2):749–763. doi:10.1109/TIP.2017.2763829
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TIP.2017.2763829&link_type=DOI) 

32. 32.Zhang S, Zhan Y, Metaxas DN. Deformable segmentation via sparse representation and dictionary learning. Med Image Anal. Oct 2012;16(7):1385–96. doi:10.1016/j.media.2012.07.007
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.media.2012.07.007&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22959839&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

33. 33.Diamant I, Klang E, Amitai M, Konen E, Goldberger J, Greenspan H. Task-Driven Dictionary Learning Based on Mutual Information for Medical Image Classification. IEEE Trans Biomed Eng. Jun 2017;64(6):1380–1392. doi:10.1109/TBME.2016.2605627
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TBME.2016.2605627&link_type=DOI) 

34. 34.Ram S, Verleden SE, Kumar M, et al. CT-based Machine Learning for Donor Lung Screening Prior to Transplantation. medRxiv. Mar 29 2023;doi:10.1101/2023.03.28.23287705
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMy4wMy4yOC4yMzI4NzcwNXYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMTEvMjAvMjAyMy4wNS4yNi4yMzI5MDUzMi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

35. 35.Rubinstein R, Bruckstein AM, Elad M. Dictionaries for Sparse Representation Modeling. Proceedings of the IEEE. 2010;98(6):1045–1057. doi:10.1109/JPROC.2010.2040551
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/JPROC.2010.2040551&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277884900015&link_type=ISI) 

36. 36.Mairal J, Bach F, Ponce J. Task-Driven Dictionary Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012;34(4):791–804. doi:10.1109/TPAMI.2011.156
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TPAMI.2011.156&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21808090&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F20%2F2023.05.26.23290532.atom) 

37. 37.Tropp JA, Gilbert AC. Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit. IEEE Transactions on Information Theory. 2007;53(12):4655–4666. doi:10.1109/TIT.2007.909108
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TIT.2007.909108&link_type=DOI) 

38. 38.Mairal J, Bach F, Ponce J. Sparse modeling for image and vision processing. arXiv preprint arXiv:14113230. 2014;