Machine learning analysis of the UNOS database fails to predict lung transplant outcomes

Lucy Luo; Marcin Możejko; Nikolay S. Markov; Alec Peltekian; Suror Moshin; Mary Carns; Phillip Cooper; Jeffrey Lysne; Anthony Joudi; Alan Betensley; Bradford C. Bemiss; Catherine Myers; Ankit Bharat; Rade Tomic; Ambalavanan Arunachalam; Ewa Szczurek; GR Scott Budinger; Alexander V. Misharin; Mrinalini Venkata Subramani

doi:10.1101/2024.10.19.24315817

Abstract

Background Lung transplantation is the only life-saving therapy for end-stage lung disease. However, lung transplantation has the worst survival among all solid organ transplants.¹ We applied machine learning to a large standardized electronic health record (EHR) dataset from the United Network for Organ Sharing (UNOS) to test whether pre-transplant and peri-transplant donor and recipient features can predict one-, three- and five-year survival, or favorable long-term outcomes in lung transplant.

Methods We used data from 43,869 first time lung transplant recipients >18 years old from 1987 to November 2022 for whom one-, three-, and five-year survival outcomes were available. We applied XGBoost or a tabular BERT model called EHRFormer to the UNOS EHR dataset.

Results Using pre-transplant features XGBoost predicted one year mortality with a test AUC = 0.6 [0.57, 0.64] 95% CI. Addition of peri-transplant features only modestly improved AUC for one-year mortality prediction (test AUC = 0.63 [0.60, 0.67] 95% CI and 0.64 [0.63, 0.66] 95% CI for XGBoost and EHRFormer, respectively). Top predictive features of one year mortality using peri-transplant features from each model were length of index stay, transplant type, recipient age, ventilation status during the index stay, and creatinine at the time of transplant. Both XGBoost and EHRFormer performed better when predicting lung function at one-year post-transplant (XGBoost test AUC = 0.74; EHRFormer test AUC = 0.76). Both models identified and used features previously associated with transplant outcomes to inform predictions.

Conclusions Despite machine learning approaches identifying known risk factors for transplant outcomes, EHR data collected by UNOS poorly predict one-, three-, and five-year mortality outcomes of lung transplantation. These results suggest caution when using pre-transplant EHR features to predict lung transplant outcomes.

Introduction

Lung transplantation is the only viable treatment option that confers improved survival and quality of life for patients with advanced lung disease and respiratory failure. Despite improvements in surgical techniques, immunosuppressive strategies, perioperative management, supportive strategies, and approaches for donor lung allocation over the years, lung transplantation is persistently associated with poor survival relative to other solid organ transplants, with a median survival of 5.8 years (1990-2014)² and a mean survival of 9.28 years.¹ Approximately 10-15% of all deaths after lung transplant occur in the first year.³ Indeed, in patients who survived one year after transplant, median survival is 10.2 years.⁴ As a result, one-year mortality is an important, trajectory-defining event and is the focus of public reporting of transplant outcomes. Recently, investigators have suggested that three- or five-year outcomes provide additional data with respect to center-specific transplant outcomes, leading some to suggest these outcomes be publicly reported⁵. We reasoned that early prediction of one-, three- and five-year survival based on factors available early in the transplant course might identify patients who would benefit from targeted interventions. Further, we reasoned that waitlist and peri-transplant factors in the donor and recipient that predict outcomes might include modifiable factors to improve outcomes.

Conventional approaches for analyzing risk factors and predictors for lung transplant outcomes rely on univariate or multivariate statistical approaches applied to selected variables. As such, available predictors of lung transplant outcomes vary by center and perform poorly.^4,6–15 Machine learning is a powerful approach to identify predictors of outcomes from clinical data, leading to its growing use in clinical research and care. Despite this promise, the application of historic machine learning techniques to EHR data collected by the United Network for Organ Sharing has been disappointing.^16,17 As machine learning approaches have dramatically improved since the publication of those studies, we sought to test whether modern machine learning approaches could predict one-, three-, and five-year lung transplant outcomes after training on data extracted from the UNOS database. We found that even the best performing models, including XGBoost¹⁸ and a Bidirectional Encoder Representations from Transformers (BERT)¹⁹-based model, EHRFormer, performed poorly in predicting one-, three-, and five-year mortality and only modestly better when predicting lung function at one year. The predictive performance of the models was further reduced when data collected during the index stay were excluded. The predictions of the models were driven by factors previously reported to be associated with poor transplant outcomes. Our findings suggest that even with state-of-the-art machine learning models, data collected by UNOS poorly predict lung transplant outcomes.

Results

The UNOS dataset reveals changes in lung transplant practices and outcomes in the US over time

The UNOS dataset is a standardized, national database that includes clinical and demographic information for candidates listed for lung transplantation.²⁰ For the purposes of this study, we focused on first-time lung transplant recipients >18 years old. Because the UNOS dataset is cumulative, it reflects overall changes in national lung transplant practice over time. Our exploratory analysis of these patients suggested these changes in practice explain a significant amount of variance in the dataset (Figure 1A, 1B). For example, the proportion of transplants performed for restrictive lung disease has increased relative to those performed for obstructive lung disease. Additionally, ischemic time, age, and FEV1% at transplant have increased over time, reflecting changes in organ storage and allocation, recipient characteristics, and indication type.

Figure 1:

An exploratory analysis of the UNOS dataset reveals distinct indication-driven groupings and time-associated trends. A. Histogram of the number of lung transplants by year since 1987 split by restrictive indication (red). The timeline schematic below the x axis indicates how many observations were included for modeling for each model across various time frames. B. Hexbin plots showing from top to bottom the relationship of ischemic time (hours), recipient age, and FEV1% at the time of transplant with sorted transplant date. Black dots on the ischemic time plot indicate the yearly median of ischemic time. P values and R-squared values were derived from linear regression on all three variables vs transplant year. C. Principal components contour plots split by single and bilateral transplants, indicated in red. D. Principal components contour plots split by indication (red). E. Principal components scatterplot colored by transplant year with reference contour plot on the right. Early to most recent transplant years are illustrated from dark to light respectively.

An initial exploration of all relevant waitlist, peri-transplant, and follow-up features (at one, three, and five years) yielded 780 features with at least 1 observation (Table S1, Figure S1A). There were distinct patterns of feature presence and missingness in the UNOS dataset (Fig. S1B), coinciding with the introduction of the Lung Allocation Score (LAS) in May 2005²¹ and changes in data collection introduced in 2015 (Fig. 1A, S1B). Accordingly, we performed modeling separately for each of these three time periods using all features available within a given period. For the entire time period, we used only shared features. To inform feature selection for machine learning applications, we first performed principal components analysis (PCA) to identify influential features and trends present in the dataset. We included all first-time lung transplant recipients >18 years old (47,864 observations) and all waitlist, peri-transplant, and follow-up features at one, three, and five years for which ≤10% of data was missing (357 features, Fig. S1C, Table S1). Lung transplant type, indication, lung function, and transplant year explain substantial variation in the dataset (Figure 1C-E).

XGBoost and EHRFormer fail to predict one year mortality

We ran 4 separate models based on waitlist and peri-transplant features present in various time frames illustrated in Fig. 1A and S1B. Briefly, features that were at least 90% complete within each specified time frame were used for the model. For the model encompassing all observations across all years, only features that were at least 90% complete across all years were used. The number of features used in each time period is shown in Fig. 2A. A data dictionary of these features is shown in Table S1. Due to class imbalance, downsampling of the majority class was also applied such that there was a 1:1 ratio of either outcome. Additionally, those who died within 90 days of the index stay were excluded from the initial mortality prediction analysis and analyzed separately (see below) to prevent length of index stay from serving as a model shortcut. A bar graph visualizing the class imbalance of outcomes is shown in Fig 2B. Final test AUCs across all models were uniformly poor, ranging from 0.58 to 0.64 (0.64 for data from all years) for EHRFormer and from 0.62 to 0.63 (0.63 for data from all years) for XGBoost (Figure 2C, S2A, and Table 1). Confusion matrices of both models on the test set are shown in Figure S2B. Figure 2D shows test AUROC, accuracy, F1, precision, recall, and specificity across all models.
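The two preprocessing steps described above (retaining features that are at least 90% complete and downsampling the majority class to a 1:1 ratio) can be illustrated with a minimal sketch. It assumes records are held as a list of dictionaries with `None` marking missing values; the column names (`age`, `creatinine`, `died_1yr`) and the helper functions are hypothetical and are not taken from the study code.

```python
import random

def complete_features(rows, exclude=(), min_complete=0.90):
    """Return feature names whose non-missing fraction is at least min_complete."""
    n = len(rows)
    keep = []
    for name in rows[0].keys():
        if name in exclude:
            continue
        present = sum(1 for r in rows if r[name] is not None)
        if present / n >= min_complete:
            keep.append(name)
    return keep

def downsample_majority(rows, label_key, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Toy data (hypothetical): creatinine is missing in 25% of rows,
# and 20% of rows carry the positive outcome label.
rows = [
    {"age": 60 + i % 10,
     "creatinine": None if i % 4 == 0 else 1.0,
     "died_1yr": int(i % 5 == 0)}
    for i in range(100)
]
features = complete_features(rows, exclude=("died_1yr",))
balanced = downsample_majority(rows, "died_1yr")
```

In this toy example `creatinine` is only 75% complete and is therefore dropped, while the balanced set retains all minority-class rows and an equal-sized random sample of the majority class.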

Figure 2:

EHRFormer and XGBoost predict one-, three-, and five-year mortality with modest performance. A. The number of features used in each of the 4 different models. A data dictionary of these features is available in Table S1. B. The number of patients in each model split by class outcome of 1 year mortality. Teal indicates those who were excluded for modeling if they died within 90 days of their index stay. C. Test set AUROCs for EHRFormer (red) and XGBoost (blue) prediction of 1 year mortality across the 4 models. Error bars indicate 95% CIs based on 50 bootstraps of the test set. D. Heatmap of all test set metrics normalized within each metric (row-wise) including AUROC, accuracy, F1 score, precision, recall, and specificity for EHRFormer and XGBoost across the 4 models. E. Test set AUROCs for EHRFormer (red) and XGBoost (blue) prediction of 1-, 3-, and 5-year mortality. Error bars indicate 95% CIs based on 50 bootstraps of the test set. F. The number of patients split by class outcome of 1-, 3-, and 5-year mortality. Teal indicates those who were excluded for modeling if they died within 90 days of their index stay.

Table 1:

Test AUROCs for EHRFormer and XGBoost predicting one year mortality across key time periods in the UNOS dataset.

XGBoost and EHRFormer fail to predict one-, three-, and five-year mortality

XGBoost and EHRFormer predict one-, three-, and five-year mortality with poor performance, with test AUROCs ranging from 0.61 to 0.65 across all tasks and models (Figure 2E). There was significant imbalance in the number of patients belonging to the positive and negative classes for the one-, three-, and five-year tasks that was corrected by downsampling (Figure 2F). Model performance did not change between one-, three-, and five-year mortality prediction (Figure S3).
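The test AUROCs reported here carry 95% CIs based on 50 bootstraps of the test set. The sketch below shows one way such an interval could be obtained. It assumes a percentile interval and a pairwise-comparison AUROC; neither is confirmed as the exact procedure used in this study, and the data are simulated.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=50, alpha=0.05, seed=0):
    """Percentile CI from resampling the test set with replacement (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip degenerate resamples that contain a single class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * (n_boot - 1))]
    high = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return low, high

# Simulated, weakly informative scores (hypothetical data).
rng = random.Random(1)
labels = [i % 2 for i in range(200)]
scores = [rng.gauss(0.3 * y, 1.0) for y in labels]
point = auroc(labels, scores)
low, high = bootstrap_auroc_ci(labels, scores)
```

With only 50 resamples the interval endpoints are themselves noisy, which is worth keeping in mind when comparing intervals that overlap only slightly.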

XGBoost and EHRFormer modestly predict patients with poor lung function at one year

We used all waitlist and peri-transplant features (≤10% missing) included in our mortality prediction models to predict one year lung function after downsampling for class imbalance, resulting in 4 separate models (Figure 2A, Table S1, and Figure 3A). XGBoost and EHRFormer predicted one year lung function, with final test AUCs across all models ranging from 0.72 to 0.74 (0.74 for data from all years) for EHRFormer and from 0.74 to 0.79 (0.76 for data from all years) for XGBoost (Figure 3B, 3C, S3A, S3B, and Table 3).

Figure 3:

EHRFormer and XGBoost predict one year lung function with strong performance. A. The number of patients in each model split by class outcome of FEV1 <70% predicted at 1 year. B. Test set AUROCs for EHRFormer (red) and XGBoost (blue) prediction of FEV1p <70% predicted across the 4 models. Error bars indicate 95% CIs based on 50 bootstraps of the test set. C. Heatmap of all test set metrics normalized within each metric (row-wise) including AUROC, accuracy, F1 score, precision, recall, and specificity for EHRFormer and XGBoost across the 4 models.

Table 2:

Test AUROCs for EHRFormer and XGBoost predicting mortality at one, three, and five years post-transplant.

Table 3:

Test AUROCs for EHRFormer and XGBoost predicting one-year lung function across key time periods in the UNOS dataset.

Index stay features have high importance for mortality prediction

To identify features associated with one-year mortality in our models, we developed two XGBoost models. The first model was trained on features from all years that were available pre-transplant (Figure 4A, Table S1). The second model was trained on features that were available pre-transplant and during the index hospitalization (Figure 4B, Table S1). We then obtained SHapley Additive exPlanations (SHAP)²² values from these two models. The pre-transplant features with high importance for mortality prediction included transplant type, donor ethnicity, PCO2 at the time of transplant, and FVC at the time of transplant (Figure 4A). When index hospitalization features were added to the pre-transplant features, those with high importance for mortality prediction included length of stay and whether the recipient experienced acute rejection during the index stay (Figure 4B).
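For readers unfamiliar with the attribution method, the idea underlying SHAP can be illustrated with exact Shapley values on a small toy model. This is a conceptual sketch only: it enumerates every feature ordering against a single all-zero baseline, whereas SHAP implementations for tree models use far more efficient algorithms and a background dataset. The risk function and feature names below are hypothetical.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            # Switch this feature from its baseline value to the observed value.
            current[name] = x[name]
            new = model(current)
            totals[name] += new - prev
            prev = new
    return {name: total / len(orders) for name, total in totals.items()}

def toy_risk(f):
    # Hypothetical risk score with an interaction term; not the study's model.
    return (0.1 + 0.3 * f["long_stay"] + 0.1 * f["single_lung"]
            + 0.2 * f["long_stay"] * f["dialysis"])

x = {"long_stay": 1, "single_lung": 1, "dialysis": 1}
baseline = {"long_stay": 0, "single_lung": 0, "dialysis": 0}
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features that participate in it.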

Figure 4:

Index stay features strongly influence 1-year outcomes. A. SHAP values from the XGBoost model predicting 1 year mortality using pre-transplant features. B. SHAP values from the XGBoost model predicting 1 year mortality using pre-transplant and peri-transplant features. C. Test AUROCs from XGBoost models predicting 1 year mortality using all features (pre- and peri-transplant), pre-transplant features only, and lung allocation score. Error bars indicate 95% CIs based on 50 bootstraps of the test set. Statistical comparisons were made using Student’s t-test with correction for FDR <0.05. D. SHAP values from the XGBoost model predicting FEV1p <70% predicted at 1 year using pre- and peri-transplant features. E. Test AUROCs from XGBoost models predicting 1 year mortality by transplant type - all transplants, single, and double lung transplants. Light and dark blue indicate models trained on all features (pre- and peri-transplant features) vs. pre-transplant features only. Highlighted significant statistical comparisons of interest are indicated on the graph. F. SHAP values from the XGBoost model predicting FEV1p <70% predicted at 1 year within single lung transplant recipients using pre-transplant features only.

Removing index hospitalization features and further subsetting on the Lung Allocation Score (LAS) further reduces model performance

Because index hospitalization features are unavailable when clinicians make decisions to list patients for lung transplantation, we trained another XGBoost model in which we removed the index stay features specified in Table S1. Model performance for mortality prediction at one year was lower when XGBoost was trained on features only available immediately preceding transplant (Figure 4C and Table 4). Similarly, model performance for mortality prediction at one year was significantly decreased when XGBoost was trained on the initial LAS on the waitlist, the end LAS on the waitlist, the calculated LAS, and the match LAS (Figure 4C and Table 4).

Table 4:

Test AUROCs for XGBoost for predicting 1 year mortality using all features, pre-transplant features only, and the LAS.

Index stay features are highly influential for prediction of lung function

Features important to lung function prediction included whether the transplant performed was single or bilateral, ischemic time, PCO2 at registration, creatinine at registration, recipient age, donor age, days on the waiting list, and O2 requirement at rest at the time of transplant (Figure 4D and Table 5). Features unique to lung function were primarily related to lung function at the time of transplant (FEV1 and FVC at registration), indication type (COPD and cystic fibrosis were associated with better lung function), and recipient BMI (Figure 4D and Table 5).

Table 5:

Test AUROCs for XGBoost for predicting 1 year mortality using all features or pre-transplant features only stratified by transplant type.

Stratification by transplant type results in a small increase in model performance

After length of stay, transplant type was the most important feature for 1 year mortality. Therefore, we trained separate XGBoost models on all single lung transplant recipients and all double lung transplant recipients. Utilizing all features including those collected from the index stay yielded a test AUROC of 0.67 [0.61, 0.71] 95% CI within single lung transplant recipients vs test AUROCs of 0.63 [0.60, 0.67] 95% CI and 0.57 [0.52, 0.62] 95% CI for all patients and double lung transplant recipients, respectively (Figure 4E, Table 5). When modeled on pre-transplant features, subsetting on single lung transplant recipients did not increase model performance compared to modeling on both single and bilateral lung transplants. SHAP analysis within the model trained on single lung transplant recipients using only pre-transplant features revealed the importance of donor and recipient age as well as hemodynamic parameters such as pulmonary arterial pressures. Interestingly, listing center code and center code also emerged as influential features in this model (Figure 4F). Train and test AUROC curves as well as SHAP values for XGBoost models trained on the single and bilateral subsets are shown in Figure S4.

EHRFormer permits querying the effect of multiple features simultaneously

SHAP values for EHRFormer models are not currently accessible. Instead, we can query feature importance by “perturbing” or changing the value of a specified feature or even a set of multiple features and seeing what effect doing so has on the model’s output. For example, the probability distribution of mortality by one year does not change after perturbing transplantation region (Figure 5A) but changes substantially after perturbing long index stay (Figure 5B) or dialysis during the index stay (Figure 5C). One can also perform multiple in silico perturbations simultaneously. For example, we set positive flags to the highest quartiles for complicated index stay features, including ECMO at 72 hours, inhaled NO at 72 hours, intubation status at 72 hours, and ventilation duration post-transplant, which shifted the mortality prediction towards death at one year (Figure 5D).
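The perturbation approach can be sketched independently of any particular model: fix one or more features to chosen values for every observation, re-run the model, and compare the predictions with the unperturbed baseline. In the illustration below, `toy_model` is a hypothetical logistic stand-in with made-up weights, not EHRFormer, and the feature names are placeholders.

```python
import math

def toy_model(row):
    """Hypothetical stand-in for a trained classifier: returns P(death at 1 year)."""
    z = (-1.0 + 1.2 * row["long_index_stay"] + 0.8 * row["dialysis"]
         + 0.0 * row["region_2"])
    return 1.0 / (1.0 + math.exp(-z))

def perturb(rows, overrides):
    """Copy every row and overwrite the chosen features with fixed values."""
    return [{**row, **overrides} for row in rows]

def mean_prediction(model, rows):
    """Average predicted probability across all observations."""
    return sum(model(r) for r in rows) / len(rows)

# Toy cohort with hypothetical binary flags.
rows = [
    {"long_index_stay": i % 2,
     "dialysis": int(i % 5 == 0),
     "region_2": int(i % 3 == 0)}
    for i in range(30)
]
baseline = mean_prediction(toy_model, rows)

# Single-feature perturbations.
shift_region = mean_prediction(toy_model, perturb(rows, {"region_2": 1})) - baseline
shift_stay = mean_prediction(toy_model, perturb(rows, {"long_index_stay": 1})) - baseline

# Simultaneous perturbation of two features.
shift_both = mean_prediction(
    toy_model, perturb(rows, {"long_index_stay": 1, "dialysis": 1})
) - baseline
```

The sketch summarizes each perturbation by the shift in mean predicted probability; comparing full probability distributions, as in Figure 5, would follow the same pattern with the per-observation predictions retained.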

Figure 5:

EHRFormer perturbations allow users to query the effect of one or multiple features simultaneously and reveal the paradoxical relationship of ischemic time on mortality. A, B, and C demonstrate probability distribution curves for the model’s prediction of 1 year mortality and how they change when an individual feature is toggled. A. transplantation in UNOS region 2; B. whether patient experienced long index stay; C. whether patient experienced dialysis during index stay. D. Changes in probability distribution curves for 1 year mortality when index stay features reflective of complications are toggled. E. Changes in the probability distribution for 1 year mortality when all observations are set to their maximum quartile for ischemic time are shown on the left. Changes in the probability distribution for 1 year mortality when all observations are set to their maximum quartile for ischemic time and set to their lowest quartile for transplant year are shown on the right.

Perturbing multiple features simultaneously allows EHRFormer to explain the unexpected influence of long ischemic time on one year mortality in the XGBoost model

Prolonged ischemic time has been historically associated with worse 1-year outcomes²³. Unexpectedly, longer ischemic times were associated with improved outcomes in our XGBoost models (Figure 4A). We hypothesized that the historic association between prolonged ischemic time and poor outcomes was reversed by improvements in organ handling and storage, including the use of ex-vivo lung perfusion (EVLP). Accordingly, we leveraged the ability of EHRFormer to query multiple features at once to investigate perturbation of a feature conditioned on the value of another feature - in this case what might happen if we prolong ischemic time when transplant year is set to its lowest quartile (earliest) (Figure 5E). When setting ischemic time to its highest value alone, we see a paradoxical shift in mortality prediction that is consistent with the direction of the SHAP values seen in XGBoost (Figure 4A and B). However, when setting ischemic time to its highest value while also setting transplant year to its lowest quartile, the probability distribution shifts in the opposite direction. Additionally, when we subsetted on those whose lungs underwent EVLP prior to transplant, the proportions of those experiencing poor outcomes such as 1 year mortality, ECMO at 72 hours after transplant, and death during the index stay were significantly reduced (Figure S6).

Features associated with frailty predicted death during the index hospitalization

Those who died during the index hospitalization were excluded from our initial mortality prediction models. To investigate features associated with mortality in this subset of patients, we performed hierarchical clustering of all features (Figure 6A). Features associated with mortality during the index stay included transplant indication of idiopathic pulmonary fibrosis or restrictive lung disease, recipients of donors of black or African American ethnicity, and life support features such as ECMO, ventilator, and ICU status. A distinct group of features associated with recipient frailty such as functional status at the time of transplant, infection requiring IV drug therapy prior to transplant, and hospitalization status prior to transplant were associated with higher rates of index hospitalization mortality (Figure 6A). When we investigated features associated with a complicated index hospitalization, those who died during the index stay showed much higher oxygen requirements (FiO2), rates of ECMO, rates of inhaled NO, rates of intubation, and rates of reintubation at 72 hours after transplant (Figure 6B). Similarly, we examined additional frailty features associated with higher rates of index hospitalization mortality. Those who died during the index hospitalization had higher O2 requirements at rest, lower six-minute walk scores, as well as higher rates of chronic steroid use, pan-resistant bacterial infection, infection requiring IV drug therapy, and ventilator status at the time of transplant (Figure 6C).

Figure 6: Those who died during their index stay have a higher proportion of flags for life support features, features associated with frailty, and index stay complications.

A. Heatmap visualization of the hierarchically clustered features used for the prediction tasks. The annotation legend on the left indicates those who died during their index stay (yellow) vs. those who survived the index stay (green). The heatmap also highlights groupings or clusters of features with a higher proportion of those who died during their index stay. Blue box - those whose indication for transplant was idiopathic pulmonary fibrosis; red box - a group of patients highlighted by black donor ethnicity; green box - a set of features associated with life support after transplant; grey boxes - 2 groupings belonging to separate hierarchically clustered “clades” associated with recipient frailty. B. Life support features that differed significantly between patients who survived the index stay and those who died during the index stay, where stacked proportional bar plots were used to represent categorical features (green indicates “yes” and blue indicates “no” for the feature value) and violin plots were used to show continuous features. C. Frailty features that differed significantly between patients who survived the index stay and those who died during the index stay. For graphs in B and C a chi-square (categorical) or Wilcoxon rank-sum (continuous) test was applied with FDR correction <0.05 for multiple comparisons.

Discussion

Lung transplant is a lifesaving treatment for patients with end stage lung disease. Over the years, lung transplant allocation systems have used prediction models to guide patient eligibility for transplant, organ allocation, and outcome reporting. The role of these models is two-fold: 1) to prioritize organ allocation to patients who have the greatest chance of death due to their underlying lung disease and 2) to direct scarce resources to patients who would achieve maximum survival benefit from lung transplantation. Achieving these goals requires robust prediction models for post-transplant outcomes. Traditionally, these predictive models have been informed by expert-guided supervised selection of features incorporated into traditional statistical methods such as multilinear regression. Modern machine learning approaches such as language models or gradient boosted decision trees can accommodate non-linear relationships between variables and have the potential to account for the multiplicative risks of comorbidities on lung transplant outcomes. We used two robust machine learning models, XGBoost and EHRFormer, which have performed well in other clinical prediction tasks, to predict one-, three- and five-year mortality and lung function after lung transplantation. Our models were trained on data within the UNOS database up until November 2022. Even after optimization, these models performed poorly as predictors of mortality or lung function after transplant, particularly when data from the post-transplant index stay were excluded.

Despite their attention to known risk factors associated with transplant outcomes, the performance of our machine learning methods in predicting outcomes was poor. There are several possible reasons that might explain this poor performance. First, the data features collected by UNOS might not be sufficiently predictive of transplant outcomes, in which case collection of additional predictive features such as diffusing capacity of the lungs for carbon monoxide²⁴ is needed. The application of machine learning algorithms to the larger body of EHR data available at individual centers or consortia of centers might identify informative features outside of the UNOS database. Importantly, such models could incorporate molecular features, imaging features, and other modalities outside of EHR data²⁵. Second, batch effects in the UNOS data related to differences in data curation, collection, reporting between centers, and changes in practice over time might confound the models. Arguing against this, both models identified features previously associated with transplant outcomes as important for their predictions. Applying these models to EHR data that have been validated by clinician review is a strategy to address this concern at the level of individual centers or consortia. Finally, drivers of transplant outcomes might be largely independent of clinical features present before the procedure. The limited performance of these models suggests caution in using these data for predicting lung transplant outcomes.

Our study highlights the promise of machine learning approaches in identifying risk factors that drive lung transplant outcomes. We observed a high level of concordance between predictors of poor outcomes identified by our models and those selected by experts for use in supervised models. These included donor and recipient age, length of index stay, and renal dysfunction or failure during the index stay as important predictors of 1 year mortality. Investigation of factors driving our machine learning prediction also revealed biases in the data that would likely be missed in a supervised analysis. For example, the increased risk associated with African American donors likely suggests important risks related to social determinants of health or other factors that would be unlikely to be included in supervised analyses. The failure to include these factors in current models might paradoxically perpetuate these biases in the form of reporting of higher program quality scores for centers in different geographic regions.

XGBoost provides direct measures of the relative weights of features that drive the model. BERT-based models like EHRFormer do not provide explicit information about the factors that drive their predictions. To address this concern, we developed perturbation tools in EHRFormer that allow users to investigate the hypothetical effect of changing variables. We used these to investigate the paradoxical association between prolonged ischemic time and improved mortality after transplantation in the XGBoost models. When we fixed transplant year to the earliest quartile, before procedures such as EVLP were available, increased ischemic time was associated with increased mortality. We show how this tool can be used to perform multiple “perturbations” within EHRFormer simultaneously to query multivariate hypotheses and dependencies. These tools might be helpful to generate hypotheses that can be tested with causal interventions or for individual programs to assess the relative benefits of interventions in the peri-transplant period that can improve outcomes.

Conclusion

In summary, despite their ability to identify and use clinical features known to be associated with lung transplant mortality in their predictions, modern interpretable machine learning approaches applied to the UNOS database performed poorly in predicting one-, three- and five-year survival and lung function after lung transplantation. We developed perturbation tools within EHRFormer to simultaneously explore features in the UNOS database that might reflect changes in lung transplant practice and outcomes over time and unexpected biases in the data. Our data suggest caution when using historic UNOS data to inform clinical practice decisions by multidisciplinary lung transplant teams and outcome reporting. The relative ease with which these models can be applied to more comprehensive clinical and laboratory data, and their demonstrated ability to identify features associated with transplant outcomes, suggest them as powerful approaches to address these limitations.

Disclosure and data availability statement

All de-identified patient data were obtained from the United Network for Organ Sharing (UNOS) Standard Transplant Analysis and Research (STAR) File, which is based on the Organ Procurement and Transplantation Network data as of 11/8/2023. All research was approved by the Northwestern Institutional Review Board (STU00221316).

Competing interest statement

The authors have no conflicts of interest to disclose.

Methods

Data preparation, cleaning, and feature encoding

All observations with a transplant date earlier than November 8, 2022 were included. All waitlist (THORACIC_WL_DATA), peri-transplant (THORACIC_DATA), and follow-up features (THORACIC_FOLLOWUP_DATA) with at least 1 observation were initially considered, yielding 786 features from the STAR File Data Dictionary. All features were reviewed for outliers and unexpected values with consultation from a committee of transplant pulmonologists. Outlier data were treated as missing. Binary features were numerically binarized as 0 or 1. Categorical features were ordinalized if they represented a scale or one-hot encoded. Waitlist and follow-up features were collected at various points in time for each patient. For waitlist data only the first and most recent values were collected. Follow-up observations closest to and within +/-3 months of one-, three-, and five-years post-transplant were used.
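The follow-up selection rule (the observation closest to, and within +/-3 months of, each post-transplant anniversary) can be sketched as follows. The 91-day window used to approximate three months, the record layout, and the field name `fev1_pct` are assumptions made for illustration.

```python
from datetime import date

def closest_followup(transplant_date, followups, target_years, window_days=91):
    """Pick the follow-up record nearest the anniversary, if within the window."""
    target_days = round(target_years * 365.25)
    best = None
    for visit_date, record in followups:
        # Distance in days between this visit and the target anniversary.
        offset = abs((visit_date - transplant_date).days - target_days)
        if offset <= window_days and (best is None or offset < best[0]):
            best = (offset, visit_date, record)
    return None if best is None else (best[1], best[2])

# Hypothetical follow-up history for one recipient.
transplant = date(2015, 1, 10)
followups = [
    (date(2015, 7, 1), {"fev1_pct": 60}),
    (date(2015, 12, 20), {"fev1_pct": 72}),
    (date(2016, 2, 15), {"fev1_pct": 75}),
    (date(2018, 1, 5), {"fev1_pct": 70}),
]
one_year = closest_followup(transplant, followups, 1)
three_year = closest_followup(transplant, followups, 3)
five_year = closest_followup(transplant, followups, 5)
```

Here the December 2015 visit is selected for the one-year time point because it lies closer to the anniversary than the February 2016 visit, and the five-year time point is left missing because no visit falls inside the window.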

Principal components analysis

The features obtained from the data preparation and feature encoding step were further filtered to those missing <10% of values, resulting in 363 features. Numerical features were mean imputed while categorical features were mode imputed. Principal components analysis was then applied to this dataframe using scikit-learn²⁶ in Python, with the top 50 principal components chosen for initial exploration. Matplotlib²⁷ and seaborn²⁸ were used for data exploration and visualization.

Statistical tests

To determine whether continuous variables changed over time (transplant date), logistic regression was performed with statsmodels²⁹ to determine R-squared coefficients and p-values. To analyze differences in categorical variables between groups, the chi-squared test (scipy.stats)³⁰ was used with a Benjamini-Hochberg adjustment for multiple comparisons (FDR < 0.05) (statsmodels)²⁹. To analyze differences in continuous variables between groups, the Wilcoxon rank-sum test (scipy.stats)³⁰ was used with the same Benjamini-Hochberg adjustment. To compare bootstrapped AUROCs between models (50 bootstraps), a Student’s t-test was performed after assessing normality with the Shapiro-Wilk test (scipy.stats)³⁰, again with a Benjamini-Hochberg adjustment for multiple comparisons (FDR < 0.05) (statsmodels)²⁹.
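For readers unfamiliar with the Benjamini-Hochberg adjustment, the following dependency-free sketch shows the step-up procedure on a handful of made-up p-values. It is an illustration only. In practice a library routine such as `multipletests(..., method="fdr_bh")` from statsmodels would normally be used, and which routine the analysis called is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values from five hypothetical feature comparisons.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
```

Note that the third and fourth comparisons have raw p-values below 0.05 but do not pass the FDR < 0.05 threshold after adjustment.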

Data preparation for modeling

Feature selection and preparation for modeling

Subsets of data were chosen for modeling based on observations from the following time frames (inclusive): 1987-2004, 2005-2014, and 2015-Present. These time-related subsets were chosen due to changes in data collection illustrated in Figure S1. All waitlist and peri-transplant data were included for all modeling tasks. Features were filtered for those that were missing <10% of observations within each time period, resulting in 140, 145, 152 and 140 features for data from 1987-2004, 2005-2014, 2015-Present, and all years, respectively. Numerical features were mean imputed while categorical features were mode imputed.
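The missingness filter and imputation step can be sketched as follows. This is a simplified, standard-library illustration. The column names and values are invented, and the study's own implementation may differ in detail.

```python
from collections import Counter
from statistics import mean


def filter_and_impute(columns, categorical, max_missing=0.10):
    """columns maps a feature name to a list of values (None = missing).
    Features missing max_missing or more of their values are dropped;
    the rest are mean imputed (numerical) or mode imputed (categorical)."""
    cleaned = {}
    for name, values in columns.items():
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) >= max_missing:
            continue  # keep only features missing <10% of observations
        observed = [v for v in values if v is not None]
        if name in categorical:
            fill = Counter(observed).most_common(1)[0][0]  # mode
        else:
            fill = mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Toy data with 20 observations per feature (all values are made up).
columns = {
    "age": [40.0] * 9 + [60.0] * 9 + [50.0] + [None],  # 5% missing
    "abo": ["A"] * 8 + ["O"] * 11 + [None],            # 5% missing
    "rare_lab": [1.0] * 15 + [None] * 5,               # 25% missing
}
cleaned = filter_and_impute(columns, categorical={"abo"})
```

Here `rare_lab` is dropped, the missing `age` is filled with the column mean, and the missing `abo` is filled with the most frequent category.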

EHRFormer feature representation and pretraining

One model was pretrained on each of the 4 dataframes described in the previous section, starting from a tabular BERT initialized with random weights, available via Hugging Face in the transformers package³¹. For the base architecture of this model (EHRFormer), we used a standard Transformer encoder with three transformer layers, a layer size of 64, an intermediate layer size of 128, and 4 attention heads. For input representation of the tabular data, all numerical values were quantile-binned to represent data from a single patient. Binary features were set to either the lowest or highest quartile bin. Following the standard BERT procedure, we added a 64-dimensional learnable CLS embedding at the beginning of the sequence. An additional bin was included to represent the CLS token for each patient, as well as an additional bin for missing data. Each bin and each EHR feature was then assigned a learnable 64-dimensional embedding vector. We then represented a single EHR entry as a sequence whose length equals the number of features, where each element of the sequence was a 64-dimensional vector obtained by summing the feature embedding vector and its assigned bin embedding vector. In line with the BERT pretraining procedure, we pretrained EHRFormer using a masked language model objective. The observations were divided into an 80/20 train/test split for pretraining. During this phase, we randomly masked 15% of the values in each entry of the 80% training split by replacing the true bin embedding with a learnable MASK embedding. EHRFormer was then trained to predict the actual bin values of the masked entries from the unmasked EHR feature values. None of the observations in the 20% holdout test set were seen during pretraining.
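The tokenization and masking steps described above can be sketched without any deep learning framework. The sketch below covers only quantile binning, the extra bin for missing values, and 15% masking; it does not include the embeddings or the Transformer itself. The use of four bins, the bin indices, and all function names are assumptions for illustration.

```python
import bisect
import random
from statistics import quantiles

N_BINS = 4             # assumption: quartile bins, chosen for illustration
MISSING_BIN = N_BINS   # extra bin reserved for missing values
MASK_BIN = N_BINS + 1  # stand-in index for the learnable MASK embedding


def fit_edges(values, n_bins=N_BINS):
    """Quantile cut points estimated from the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    return quantiles(observed, n=n_bins)  # n_bins - 1 cut points


def to_bin(value, edges):
    """Map a raw value to a bin index in 0..N_BINS-1, or MISSING_BIN."""
    if value is None:
        return MISSING_BIN
    return bisect.bisect_right(edges, value)


def mask_tokens(tokens, rate=0.15, rng=random):
    """Replace about `rate` of the positions with MASK_BIN and return the
    masked sequence plus the true bins the model would be asked to recover."""
    n_mask = max(1, round(rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_BIN if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets


# One numerical feature across eight patients (values are made up).
creatinine = [0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.6, None]
edges = fit_edges(creatinine)
bins = [to_bin(v, edges) for v in creatinine]

# One patient represented as 20 binned features, then masked for pretraining.
row = [i % N_BINS for i in range(20)]
masked, targets = mask_tokens(row, rng=random.Random(0))
```

In the model described above, each position would then be embedded as the sum of a feature embedding and a bin embedding, and the pretraining loss would be computed only on the positions stored in `targets`.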

Modeling one-, three-, and five-year mortality and lung function outcomes

Mortality label retrieval

To obtain the correct labels for one-, three-, and five-year mortality, we used the patient’s date of death (COMPOSITE_DEATH_DATE) to confirm patient death. We then calculated survival in days by subtracting the date of transplant (TX_DATE) from the date of death. Survival time was then used to identify which patients had died within the one-, three-, and five-year outcomes of interest. To retrieve patients who were alive at one, three, and five years, we used the “patient status”, patient status date, and date of death variables (PX_STAT = “A” for alive, PX_STAT_DATE, COMPOSITE_DEATH_DATE = NA).

PX_STAT_DATE was used to determine total known survival time. We then added this subset of patients to the patients who had died but whose survival exceeded the one-, three-, or five-year outcome of interest for each task. To further ensure the model was not relying on shortcut features, we removed patients who died during or within 90 days of their index stay. This was determined by filtering out patients whose length of stay (LOS) exceeded survival time by at least 90 days.
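The labeling logic can be summarized in a short sketch. The function below is a hypothetical illustration whose arguments stand in for TX_DATE, COMPOSITE_DEATH_DATE, PX_STAT, and PX_STAT_DATE. Returning `None` for patients whose known follow-up is shorter than the horizon is an assumption, and the index-stay exclusion is not shown.

```python
from datetime import date


def mortality_label(tx_date, death_date, status, status_date, years):
    """Return 1 if the patient died within `years` of transplant,
    0 if known to have survived past that horizon, else None."""
    horizon = round(365.25 * years)
    if death_date is not None:
        survival_days = (death_date - tx_date).days
        return 1 if survival_days <= horizon else 0
    if status == "A" and status_date is not None:
        followup_days = (status_date - tx_date).days
        # Assumption: alive patients count only if followed past the horizon.
        return 0 if followup_days >= horizon else None
    return None


tx = date(2015, 1, 10)
early_death = mortality_label(tx, date(2015, 8, 1), "D", None, years=1)
late_death_1y = mortality_label(tx, date(2017, 6, 1), "D", None, years=1)
late_death_3y = mortality_label(tx, date(2017, 6, 1), "D", None, years=3)
short_followup = mortality_label(tx, None, "A", date(2015, 9, 1), years=1)
alive_5y = mortality_label(tx, None, "A", date(2021, 1, 10), years=5)
```

A patient who died 2.4 years after transplant is therefore a negative example for the one-year task and a positive example for the three- and five-year tasks.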

Lung function <70% of predicted label retrieval

To obtain labels for lung function <70% of predicted, we used the follow-up feature FEV_percent available at 1 year post-transplant (+/- 3 months). For patients for whom FEV_percent at 1 year post-transplant was unavailable, we used the spiref³² package with GLI-2012³³ reference values to calculate FEV_percent of predicted from their absolute FEV1 (L) at one year (FEV). Downsampling of the majority class was also applied, as described in detail in the subsequent section EHRFormer fine-tuning and evaluation.

EHRFormer fine-tuning and evaluation

We fine-tuned the pretrained EHRFormer for binary classification of the patient’s mortality and lung function at 1 year. Specific outcome retrieval is described above. To perform this task within specific time frames, we used the one of the 4 pretrained models that corresponded to the time frame: 1987-2004, 2005-2014, 2015-Present, or all years. With the data and pretrained model spanning all years, we also performed identical binary classification of whether the patient was alive or dead at 3 and 5 years. Due to class imbalance for all tasks, the majority class was downsampled at random to yield a 1:1 negative to positive class ratio in the training and test sets separately.
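Random downsampling to a 1:1 class ratio, applied separately to the training and test sets, can be sketched as follows. The function name, seeds, and class counts are illustrative assumptions.

```python
import random


def downsample_majority(labels, seed=0):
    """Return sorted row indices that keep every minority-class row and an
    equally sized random subset of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Made-up label vectors with roughly 12% positives.
train_labels = [1] * 12 + [0] * 88
test_labels = [1] * 3 + [0] * 22

# Each split is balanced on its own, so no rows move between train and test.
train_idx = downsample_majority(train_labels, seed=1)
test_idx = downsample_majority(test_labels, seed=2)
```

Balancing the test set as well means that reported accuracy and AUROC refer to a 1:1 cohort rather than to the natural outcome prevalence.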

For each task, following downsampling, hyperparameter tuning was performed using Optuna’s³⁴ Tree-structured Parzen Estimator (objective = accuracy, direction = maximize, n_trials = 75) on the 80% train split from pretraining. We searched for optimal hyperparameters within the following space: learning_rate = [1e-6, 1e-2], per_device_train_batch_size = [16, 32, 64, 128, 256], and weight_decay = [1e-4, 1e-1]. With the tuned hyperparameters, we performed 5-fold cross-validation on the 80% train split from pretraining. Splits and performance metrics (accuracy, AUROC, F1, precision, recall, and specificity) were determined using scikit-learn²⁶. The tuned models were finally evaluated on the remaining 20% holdout test set, which was not seen during pretraining, hyperparameter tuning, or 5-fold cross-validation. To determine the variability of performance metrics within the test set, 50 random bootstraps were performed on the test set when calculating model metrics.
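The bootstrap evaluation can be illustrated with a small, dependency-free sketch. The rank-based AUROC below is a textbook formulation rather than the scikit-learn routine used in the study. The synthetic scores and the rule of redrawing resamples that contain a single class are assumptions.

```python
import random


def auroc(y_true, scores):
    """Rank-based AUROC: the probability that a random positive outscores
    a random negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(y_true, scores, n_boot=50, seed=0):
    """Resample the test set with replacement n_boot times and return the
    AUROC of each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC is undefined without both classes
            values.append(auroc(ys, [scores[i] for i in idx]))
    return values


demo = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Synthetic balanced test set whose scores are weakly informative.
rng = random.Random(42)
y_test = [i % 2 for i in range(200)]
scores = [0.5 * y + rng.random() for y in y_test]
boot = bootstrap_auroc(y_test, scores)
mean_auc = sum(boot) / len(boot)
```

The spread of the 50 values in `boot` gives an estimate of test-set variability, which is what the between-model comparisons in the Statistical tests section operate on.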

XGBoost evaluation

We similarly used XGBoost for binary classification of the patient’s mortality and lung function at 1 year within time frames corresponding to 1987-2004, 2005-2014, 2015-Present, and all years. We also performed identical binary classification of whether the patient was alive or dead at 3 and 5 years. Additional tasks included assessing the performance of XGBoost on all features, on features only available at or before the time of transplant, and on the LAS. For these additional tasks, center-specific features were also included (i.e., CTR_CODE, see Table S1). Where center code features were included, the CatBoost Encoder package was used to perform target encoding of center codes. Finally, XGBoost was also used to determine prediction performance on single and bilateral lung transplant recipients separately. Downsampling of the majority class was performed as for EHRFormer. Since there is no pretraining process, we divided entire datasets into an 80/20 train/test split at random. 5-fold cross-validation was performed within the training set with scikit-learn. Hyperparameter tuning was performed using a Bayesian optimizer with a Gaussian Process based estimator (scikit-optimize’s BayesSearchCV)³⁵. The search space was defined as follows: learning_rate = (0.01, 1.0), max_depth = (2, 12), subsample = (0.1, 1.0), colsample_bytree = (0.1, 1.0), reg_lambda = (1e-9, 100), reg_alpha = (1e-9, 100), min_child_weight = (1, 10), gamma = (0, 5), n_estimators = (50, 1000). To determine the variability of performance metrics within the test set, 50 random bootstraps were performed on the test set when calculating model metrics. For interpretability and insight into XGBoost model decisions, we used SHAP (SHapley Additive exPlanations)³⁶.

EHRFormer perturbations

To gain model insights from EHRFormer, we pretrained a tabular BERT following the pretraining process described in the modeling tasks. We specifically included data from all years and included additional features of interest such as transplant year (TX_YEAR) as well as all the features from the 2015-Present model. The entire dataset was used for pretraining as opposed to setting aside a test cohort. Hyperparameter search was performed within the entire dataset as described previously. We then ran this new model on the fine-tuning binary classification task of 1-year mortality. To understand feature importance, we randomly sampled half of the input observations 10 times. From these sampled observations, we manipulated features of interest by manually changing the bins for one or multiple features in each of the sampled observations. These new “perturbed” inputs were fed to the model during fine-tuning. This process returned new probabilities of a positive class outcome (i.e., 1-year mortality) determined by the model when given a perturbed set of features, visualized here as probability distributions before and after the perturbation was applied.
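The perturbation procedure can be sketched with a stand-in model. Everything below is hypothetical: `toy_predict` replaces the fine-tuned EHRFormer, the feature names `los_bin` and `age_bin` are invented, and the sketch applies the perturbation when scoring the rows, which is a simplifying assumption about where in the pipeline the perturbed inputs are used.

```python
import random
from statistics import mean


def perturb_and_compare(rows, predict, feature, new_bin, n_repeats=10, seed=0):
    """Sample half of the observations n_repeats times, overwrite one
    feature's bin, and return the mean change in predicted probability."""
    rng = random.Random(seed)
    shifts = []
    for _ in range(n_repeats):
        sample = rng.sample(rows, len(rows) // 2)
        before = [predict(r) for r in sample]
        after = [predict({**r, feature: new_bin}) for r in sample]
        shifts.append(mean(after) - mean(before))
    return shifts


def toy_predict(row):
    """Hypothetical stand-in for the model's predicted 1-year mortality."""
    return min(1.0, 0.2 + 0.1 * row["los_bin"] + 0.05 * row["age_bin"])


# 64 made-up patients, each described by two binned features.
rows = [{"los_bin": i % 4, "age_bin": (i // 4) % 4} for i in range(64)]

# Move every sampled patient to the highest length-of-stay bin.
shifts = perturb_and_compare(rows, toy_predict, "los_bin", new_bin=3)
```

With a real model, the `before` and `after` lists would be plotted as two probability distributions, and `shifts` summarizes how far the perturbation moves the mean prediction in each of the 10 samples.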

Hierarchical clustering for length of stay analysis

Hierarchical clustering was performed on all observations using all the features that were used for prediction of 1-year mortality (140 features). Numerical features were mean imputed while categorical features were mode imputed. The Ward linkage method and Euclidean distance were used. Whether an individual died during their index stay was used as the row annotation feature.

Propensity score matching for creating matched cohorts for length of stay analysis

To examine associations between those who died during the index stay and those who did not, we performed propensity score matching with psmpy³⁷ to generate a matched cohort of controls. Patients were matched on transplant year, indication grouping, gender, age, and ethnicity. Missing data were imputed prior to matching with a simple mean strategy. KNN matching on propensity logits was performed at a 1:5 case:control ratio to mitigate class imbalance.

Figure S1:

Heatmaps showing data missingness of features from Table S1. A. Missingness of all waitlist, follow-up, and peri-transplant features sorted by transplant date (780 features). B. Missingness of all waitlist and peri-transplant features sorted by transplant date that were at least 90% complete and used for modeling. 2005 and 2015 are indicated on the heatmap to demonstrate features that are missing from the time periods used for modeling. C. Missingness of all waitlist, follow-up, and peri-transplant features sorted by transplant date, used for PCA and at least 90% complete (357 features).

Figure S2:

Area under the receiver operating characteristic curves (A) and test set confusion matrices (B) for the prediction of 1-year mortality using data from 1987-2004, 2005-2014, 2015-present, and from all years. The top row and bottom row in each panel show the performance of EHRFormer and XGBoost, respectively.

Figure S3:

Area under the receiver operating characteristic curves (A) and test set confusion matrices (B) for the prediction of 1-, 3-, and 5-year mortality using data from 1987-2004, 2005-2014, 2015-present, and from all years. The top row and bottom row in each panel show the performance of EHRFormer and XGBoost, respectively.

Figure S4:

Area under the receiver operating characteristic curves (A) and test set confusion matrices (B) for the prediction of FEV1 <70% at 1 year using data from 1987-2004, 2005-2014, 2015-present, and from all years. The top row and bottom row in each panel show the performance of EHRFormer and XGBoost, respectively.

Figure S5: Area under the receiver operating characteristic curves and additional SHAP values for the prediction of 1-year mortality stratified by transplant type using all features vs. pre-transplant features only.

A. Area under the receiver operating characteristic curves for XGBoost models predicting 1-year mortality in single vs. bilateral lung transplant recipients using either all features or only pre-transplant features as specified in Table S1. B. SHAP values for 1-year mortality prediction in single lung transplant (left) vs. bilateral lung transplant (right) recipients using only pre-transplant features.

Figure S6: Stratification of patients by prolonged ischemic time (>10h) reveals a higher proportion of 1 year mortality, ECMO at 72 hours after transplant, and death during the index stay.

Stacked proportional bar plots demonstrating the differences in proportions of 1 year mortality, ECMO at 72 hours after transplant, and death during the index stay (rows) between different stratifications of the patients (columns). Green indicates a positive flag for the binary outcome of interest whereas blue indicates a negative flag. Stratification by EVLP in comparison to matched controls with <10h of lung ischemic time (middle column) revealed some mitigation of poor outcomes. On the other hand, stratification by EVLP in comparison to those with >10h of lung ischemic time (last column) revealed greater mitigation of poor outcomes. Numbers of patients in each grouping are indicated on the graphs.

Table S1: Data dictionary of all UNOS features used for analysis.

The columns indicate which features were used for data exploration and which features were used in various models.

Acknowledgements

This research was supported by the computational resources and staff contributions provided for the Quest high-performance computing facility at Northwestern University, which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology. This research was supported in part through the computational resources and staff contributions provided by the Genomics Compute Cluster which is jointly supported by the Feinberg School of Medicine, the Center for Genetic Medicine, and Feinberg’s Department of Biochemistry and Molecular Genetics, the Office of the Provost, the Office for Research, and Northwestern Information Technology. The Genomics Compute Cluster is part of Quest, Northwestern University’s high-performance computing facility, with the purpose to advance research in genomics. We thank J. Milhans, A. Kinaci, S. Coughlin and all members of the Research Computing and Data Services team at Northwestern for their support.

References

1. Graham CN, Watson C, Barlev A, Stevenson M, Dharnidharka VR. Mean lifetime survival estimates following solid organ transplantation in the US and UK. J Med Econ. 2022;25(1):230–237. doi:10.1080/13696998.2022.2033050
2. Yusen RD, Edwards LB, Dipchand AI, et al. The Registry of the International Society for Heart and Lung Transplantation: Thirty-third Adult Lung and Heart-Lung Transplant Report-2016; Focus Theme: Primary Diagnostic Indications for Transplant. J Heart Lung Transplant. 2016;35(10):1170–1184. doi:10.1016/J.HEALUN.2016.09.001
3. Bos S, Vos R, Van Raemdonck DE, Verleden GM. Survival in adult lung transplantation: where are we in 2020? Curr Opin Organ Transplant. 2020;25(3):268–273. doi:10.1097/MOT.0000000000000753
4. Foroutan F, Malik A, Clark KE, et al. Predictors of 1-year mortality after adult lung transplantation: Systematic review and meta-analyses. The Journal of Heart and Lung Transplantation. 2022;41(7):937–951. doi:10.1016/J.HEALUN.2022.03.017
5. Valapour M, Lehr CJ, Wey A, Skeans MA, Miller J, Lease ED. Expected effect of the lung Composite Allocation Score system on US lung transplantation. American Journal of Transplantation. 2022;22(12):2971–2980. doi:10.1111/AJT.17160
6. Inci I, Schuurmans M, Ehrsam J, et al. Lung transplantation for emphysema: impact of age on short- and long-term survival. European Journal of Cardio-Thoracic Surgery. 2015;48(6):906–909. doi:10.1093/EJCTS/EZU550
7. Allen JG, Arnaoutakis GJ, Weiss ES, Merlo CA, Conte J V., Shah AS. The impact of recipient body mass index on survival after lung transplantation. Journal of Heart and Lung Transplantation. 2010;29(9):1026–1033. doi:10.1016/J.HEALUN.2010.05.005
8. Ambur V, Taghavi S, Jayarajan S, et al. The impact of lungs from diabetic donors on lung transplant recipients. European Journal of Cardio-Thoracic Surgery. 2017;51(2):285-290. doi:10.1093/ejcts/ezw314
9. Aramini B, Kim C, Diangelo S, et al. Donor surfactant protein D (SP-D) polymorphisms are associated with lung transplant outcome. American Journal of Transplantation. 2013;13(8):2130–2136. doi:10.1111/AJT.12326
10. Yoon S, Jang EJ, Kim GH, et al. Adult lung transplantation case-volume and in-hospital and long-term mortality in Korea. J Cardiothorac Surg. 2019;14(1):1–8. doi:10.1186/S13019-019-0849-3/TABLES/4
11. Whitson BA, Hertz MI, Kelly RF, et al. Use of the donor lung after asphyxiation or drowning: Effect on lung transplant recipients. Annals of Thoracic Surgery. 2014;98(4):1145–1151. doi:10.1016/J.ATHORACSUR.2014.05.065
12. Tague LK, Scozzi D, Wallendorf M, et al. Lung transplant outcomes are influenced by severity of neutropenia and granulocyte colony-stimulating factor treatment. American Journal of Transplantation. 2020;20(1):250–261. doi:10.1111/AJT.15581
13. Hayes D, Black SM, Tobias JD, Higgins RS, Whitson BA. Influence of donor and recipient age in lung transplantation. Journal of Heart and Lung Transplantation. 2015;34(1):43–49. doi:10.1016/J.HEALUN.2014.08.017
14. Huppmann P, Neurohr C, Leuschner S, et al. The Munich-LTX-Score: Predictor for survival after lung transplantation. Clin Transplant. 2012;26(1):173–183. doi:10.1111/J.1399-0012.2011.01573.X
15. Kurihara C, Fernandez R, Safaeinili N, et al. Long-Term Impact of Cytomegalovirus Serologic Status on Lung Transplantation in the United States. Annals of Thoracic Surgery. 2019;107(4):1046–1052. doi:10.1016/J.ATHORACSUR.2018.10.034
16. Gholamzadeh M, Abtahi H, Safdari R. Machine learning-based techniques to improve lung transplantation outcomes and complications: a systematic review. BMC Med Res Methodol. 2022;22(1):331. doi:10.1186/S12874-022-01823-2
17. Agrawal A, Al-Bahrani R, Russo MJ, Raman J, Choudhary A. Lung Transplant Outcome Prediction using UNOS Data. IEEE Xplore. December 2013. doi:10.1109/BigData.2013.6691751
18. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016;13–17-August-2016:785-794. doi:10.1145/2939672.2939785
19. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference. 2018;1:4171–4186. https://arxiv.org/abs/1810.04805v2. Accessed July 30, 2024.
20. OPTN database - OPTN. https://optn.transplant.hrsa.gov/data/about-data/optn-database/. Accessed July 30, 2024.
21. Egan TM, Murray S, Bustami RT, et al. Development of the new lung allocation system in the United States. Am J Transplant. 2006;6(5 Pt 2):1212-1227. doi:10.1111/J.1600-6143.2006.01276.X
22. Lundberg SM, Allen PG, Lee SI. A Unified Approach to Interpreting Model Predictions. https://github.com/slundberg/shap. Accessed July 30, 2024.
23. Thabut G, Mal H, Cerrina J, et al. Graft ischemic time and outcome of lung transplantation: a multicenter analysis. Am J Respir Crit Care Med. 2005;171(7):786–791. doi:10.1164/RCCM.200409-1248OC
24. Darley DR, Ma J, Huszti E, et al. Diffusing Capacity for Carbon Monoxide (DLCO): Association with long-term outcomes after Lung Transplantation in a 20-year longitudinal study. European Respiratory Journal. 2021;59(1). doi:10.1183/13993003.03639-2020
25. Sage AT, Donahoe LL, Shamandy AA, et al. A machine-learning approach to human ex vivo lung perfusion predicts transplantation outcomes and promotes organ utilization. Nat Commun. 2023;14(1). doi:10.1038/S41467-023-40468-7
26. Pedregosa F, Michel V, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12(85):2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html. Accessed August 27, 2024.
27. Hunter JD. Matplotlib: A 2D graphics environment. Comput Sci Eng. 2007;9(3):90–95. doi:10.1109/MCSE.2007.55
28. Waskom ML. seaborn: statistical data visualization. J Open Source Softw. 2021;6(60):3021. doi:10.21105/JOSS.03021
29. Seabold S, Perktold J. Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference. 2010:92-96. doi:10.25080/MAJORA-92BF1922-011
30. Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–272. doi:10.1038/S41592-019-0686-2
31. Wolf T, Debut L, Sanh V, et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. October 2019. https://arxiv.org/abs/1910.03771v5. Accessed August 27, 2024.
32. Spiref: Spirometry Reference Value Calculator - KU Leuven. https://kuleuven.limo.libis.be/discovery/fulldisplay?docid=lirias3999656&context=SearchWebhook&vid=32KUL_KUL:Lirias&lang=en&search_scope=lirias_profile&adaptor=SearchWebhook&tab=LIRIAS&query=any,contains,LIRIAS3999656&offset=0. Accessed August 27, 2024.
33. Quanjer PH, Stanojevic S, Cole TJ, et al. Multi-ethnic reference values for spirometry for the 3–95 year age range: the Global Lung Function 2012 equations: Report of the Global Lung Function Initiative (GLI), ERS Task Force to establish improved Lung Function Reference Values. Eur Respir J. 2012;40(6):1324. doi:10.1183/09031936.00080312
34. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 2019:2623–2631. doi:10.1145/3292500.3330701
35. scikit-optimize/scikit-optimize. doi:10.5281/ZENODO.5565057
36. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst. 2017;2017-December:4766-4775. https://arxiv.org/abs/1705.07874v2. Accessed August 27, 2024.
37. Kline A, Luo Y. PsmPy: A Package for Retrospective Cohort Matching in Python. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. 2022;2022-July:1354-1357. doi:10.1109/EMBC48229.2022.9871333

Posted October 21, 2024.

Subject Area

Transplantation

[1] 1.↵
Graham CN, Watson C, Barlev A, Stevenson M, Dharnidharka VR. Mean lifetime survival estimates following solid organ transplantation in the US and UK. J Med Econ. 2022;25(1):230–237. doi:10.1080/13696998.2022.2033050
OpenUrl CrossRef PubMed

[2] 2.↵
Yusen RD, Edwards LB, Dipchand AI, et al. The Registry of the International Society for Heart and Lung Transplantation: Thirty-third Adult Lung and Heart-Lung Transplant Report-2016; Focus Theme: Primary Diagnostic Indications for Transplant. J Heart Lung Transplant. 2016;35(10):1170–1184. doi:10.1016/J.HEALUN.2016.09.001
OpenUrl CrossRef PubMed

[3] 3.↵
Bos S, Vos R, Van Raemdonck DE, Verleden GM. Survival in adult lung transplantation: where are we in 2020? Curr Opin Organ Transplant. 2020;25(3):268–273. doi:10.1097/MOT.0000000000000753
OpenUrl CrossRef PubMed

[4] 4.↵
Foroutan F, Malik A, Clark KE, et al. Predictors of 1-year mortality after adult lung transplantation: Systematic review and meta-analyses. The Journal of Heart and Lung Transplantation. 2022;41(7):937–951. doi:10.1016/J.HEALUN.2022.03.017
OpenUrl CrossRef

[5] 5.↵
Valapour M, Lehr CJ, Wey A, Skeans MA, Miller J, Lease ED. Expected effect of the lung Composite Allocation Score system on US lung transplantation. American Journal of Transplantation. 2022;22(12):2971–2980. doi:10.1111/AJT.17160
OpenUrl CrossRef PubMed

[6] 6.↵
Inci I, Schuurmans M, Ehrsam J, et al. Lung transplantation for emphysema: impact of age on short- and long-term survival. European Journal of Cardio-Thoracic Surgery. 2015;48(6):906–909. doi:10.1093/EJCTS/EZU550
OpenUrl CrossRef PubMed

[7] 7.
Allen JG, Arnaoutakis GJ, Weiss ES, Merlo CA, Conte J V., Shah AS. The impact of recipient body mass index on survival after lung transplantation. Journal of Heart and Lung Transplantation. 2010;29(9):1026–1033. doi:10.1016/J.HEALUN.2010.05.005
OpenUrl CrossRef PubMed Web of Science

[8] 8.
Ambur V, Taghavi S, Jayarajan S, et al. The impact of lungs from diabetic donors on lung transplant recipients. academic.oup.comV Ambur, S Taghavi, S Jayarajan, S Kadakia, H Zhao, J Gomez-Abraham, Y ToyodaEuropean Journal of Cardio-Thoracic Surgery, 2017•academic.oup.com. 2017;51(2):285-290. doi:10.1093/ejcts/ezw314
OpenUrl CrossRef

[9] 9.
Aramini B, Kim C, Diangelo S, et al. Donor surfactant protein D (SP-D) polymorphisms are associated with lung transplant outcome. American Journal of Transplantation. 2013;13(8):2130–2136. doi:10.1111/AJT.12326
OpenUrl CrossRef PubMed

[10] 10.
Yoon S, Jang EJ, Kim GH, et al. Adult lung transplantation case-volume and in-hospital and long-term mortality in Korea. J Cardiothorac Surg. 2019;14(1):1–8. doi:10.1186/S13019-019-0849-3/TABLES/4
OpenUrl CrossRef PubMed

[11] 11.
Whitson BA, Hertz MI, Kelly RF, et al. Use of the donor lung after asphyxiation or drowning: Effect on lung transplant recipients. Annals of Thoracic Surgery. 2014;98(4):1145–1151. doi:10.1016/J.ATHORACSUR.2014.05.065
OpenUrl CrossRef PubMed

[12] 12.
Tague LK, Scozzi D, Wallendorf M, et al. Lung transplant outcomes are influenced by severity of neutropenia and granulocyte colony-stimulating factor treatment. American Journal of Transplantation. 2020;20(1):250–261. doi:10.1111/AJT.15581
OpenUrl CrossRef PubMed

[13] 13.
Hayes D, Black SM, Tobias JD, Higgins RS, Whitson BA. Influence of donor and recipient age in lung transplantation. Journal of Heart and Lung Transplantation. 2015;34(1):43–49. doi:10.1016/J.HEALUN.2014.08.017
OpenUrl CrossRef PubMed

[14] 14.
Huppmann P, Neurohr C, Leuschner S, et al. The Munich-LTX-Score: Predictor for survival after lung transplantation. Clin Transplant. 2012;26(1):173–183. doi:10.1111/J.1399-0012.2011.01573.X
OpenUrl CrossRef PubMed

[15] 15.↵
Kurihara C, Fernandez R, Safaeinili N, et al. Long-Term Impact of Cytomegalovirus Serologic Status on Lung Transplantation in the United States. Annals of Thoracic Surgery. 2019;107(4):1046–1052. doi:10.1016/J.ATHORACSUR.2018.10.034
OpenUrl CrossRef PubMed

[16] 16.↵
Gholamzadeh M, Abtahi H, Safdari R. Machine learning-based techniques to improve lung transplantation outcomes and complications: a systematic review. BMC Med Res Methodol. 2022;22(1):331. doi:10.1186/S12874-022-01823-2
OpenUrl CrossRef PubMed

[17] 17.↵
Agrawal A, Al-Bahrani R, Russo MJ, Raman J, Choudhary A. Lung Transplant Outcome Prediction using UNOS Data. IEEE Xplore. December 2013. doi:10.1109/BigData.2013.6691751
OpenUrl CrossRef

[18] 18.↵
Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016;13–17-August-2016:785-794. doi:10.1145/2939672.2939785
OpenUrl CrossRef

[19] 19.↵
Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference. 2018;1:4171–4186. https://arxiv.org/abs/1810.04805v2. Accessed July 30, 2024.
OpenUrl

[20] 20.↵
20. OPTN database - OPTN. https://optn.transplant.hrsa.gov/data/about-data/optn-database/. Accessed July 30, 2024.

[21] 21.↵
Egan TM, Murray S, Bustami RT, et al. Development of the new lung allocation system in the United States. Am J Transplant. 2006;6(5 Pt 2):1212-1227. doi:10.1111/J.1600-6143.2006.01276.X
OpenUrl CrossRef

[22] 22.↵
Lundberg SM, Allen PG, Lee SI. A Unified Approach to Interpreting Model Predictions. https://github.com/slundberg/shap. Accessed July 30, 2024.

[23] 23.↵
Thabut G, Mal H, Cerrina J, et al. Graft ischemic time and outcome of lung transplantation: a multicenter analysis. Am J Respir Crit Care Med. 2005;171(7):786–791. doi:10.1164/RCCM.200409-1248OC
OpenUrl CrossRef PubMed Web of Science

[24] 24.↵
Darley DR, Ma J, Huszti E, et al. Diffusing Capacity for Carbon Monoxide (DLCO): Association with long-term outcomes after Lung Transplantation in a 20-year longitudinal study. European Respiratory Journal. 2021;59(1). doi:10.1183/13993003.03639-2020

[25]
Sage AT, Donahoe LL, Shamandy AA, et al. A machine-learning approach to human ex vivo lung perfusion predicts transplantation outcomes and promotes organ utilization. Nat Commun. 2023;14(1). doi:10.1038/S41467-023-40468-7

[26]
Pedregosa F, Michel V, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12(85):2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html. Accessed August 27, 2024.

[27]
Hunter JD. Matplotlib: A 2D graphics environment. Comput Sci Eng. 2007;9(3):90–95. doi:10.1109/MCSE.2007.55

[28]
Waskom ML. seaborn: statistical data visualization. J Open Source Softw. 2021;6(60):3021. doi:10.21105/JOSS.03021

[29]
Seabold S, Perktold J. Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference. 2010:92-96. doi:10.25080/MAJORA-92BF1922-011

[30]
Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–272. doi:10.1038/S41592-019-0686-2

[31]
Wolf T, Debut L, Sanh V, et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. October 2019. https://arxiv.org/abs/1910.03771v5. Accessed August 27, 2024.

[32]
Spiref: Spirometry Reference Value Calculator - KU Leuven. https://kuleuven.limo.libis.be/discovery/fulldisplay?docid=lirias3999656&context=SearchWebhook&vid=32KUL_KUL:Lirias&lang=en&search_scope=lirias_profile&adaptor=SearchWebhook&tab=LIRIAS&query=any,contains,LIRIAS3999656&offset=0. Accessed August 27, 2024.

[33]
Quanjer PH, Stanojevic S, Cole TJ, et al. Multi-ethnic reference values for spirometry for the 3–95 year age range: the Global Lung Function 2012 equations: Report of the Global Lung Function Initiative (GLI), ERS Task Force to establish improved Lung Function Reference Values. Eur Respir J. 2012;40(6):1324. doi:10.1183/09031936.00080312

[34]
Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 2019:2623–2631. doi:10.1145/3292500.3330701

[35]
scikit-optimize/scikit-optimize. doi:10.5281/ZENODO.5565057

[36]
Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst. 2017;2017-December:4766-4775. https://arxiv.org/abs/1705.07874v2. Accessed August 27, 2024.

[37]
Kline A, Luo Y. PsmPy: A Package for Retrospective Cohort Matching in Python. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. 2022;2022-July:1354-1357. doi:10.1109/EMBC48229.2022.9871333
