Abstract
Rationale COVID-19-associated respiratory failure offers the unprecedented opportunity to evaluate the differential host response to a uniform pathogenic insult. Prior studies of Acute Respiratory Distress Syndrome (ARDS) have identified subphenotypes with differential outcomes. Understanding whether there are distinct subphenotypes of severe COVID-19 may offer insight into its pathophysiology.
Objectives To identify and characterize distinct subphenotypes of COVID-19 critical illness defined by the post-intubation trajectory of Sequential Organ Failure Assessment (SOFA) score.
Methods Intubated COVID-19 patients at two hospitals in New York city were leveraged as development and validation cohorts. Patients were grouped into mild, intermediate, and severe strata by their baseline post-intubation SOFA. Hierarchical agglomerative clustering was performed within each stratum to detect subphenotypes based on similarities amongst SOFA score trajectories evaluated by Dynamic Time Warping. Statistical tests defined trajectory subphenotype predictive markers.
Measurements and Main Results Distinct worsening and recovering subphenotypes were identified within each stratum, which had distinct 7-day post-intubation SOFA progression trends. Patients in the worsening suphenotypes had a higher mortality than those in the recovering subphenotypes within each stratum (mild stratum, 29.7% vs. 10.3%, p=0.033; intermediate stratum, 29.3% vs. 8.0%, p=0.002; severe stratum, 53.7% vs. 22.2%, p<0.001). Worsening and recovering subphenotypes were replicated in the validation cohort. Routine laboratory tests, vital signs, and respiratory variables rather than demographics and comorbidities were predictive of the worsening and recovering subphenotypes.
Conclusions There are clear worsening and recovering subphenotypes of COVID-19 respiratory failure after intubation, which are more predictive of outcomes than baseline severity of illness. Organ dysfunction trajectory may be well suited as a surrogate for research in COVID-19 respiratory failure.
Scientific Knowledge on the Subject COVID-19 associated respiratory failure leads to a significant risk of morbidity and mortality. It is clear that there is heterogeneity in the viral-induced host response leading to differential outcomes, even amongst those treated with mechanical ventilation. There are many studies of COVID-19 disease which use intubation status as an outcome or an inclusion criterion. However, there is less understanding of the post intubation course in COVID-19.
What This Study Adds to the Field We have developed and validated a novel subphenotyping model based on post-intubation organ dysfunction trajectory in COVID-19 patients. Specifically, we identified clear worsening and recovering organ dysfunction trajectory subphenotypes, which are more predictive of outcomes than illness severity at baseline. Dynamic inflammatory markers and ventilator variables rather than baseline severity of illness, demographics and comorbidities differentiate the worsening and recovering subphenotypes. Trajectory subphenotypes offer a potential road map for understanding the evolution of critical illness in COVID-19.
Introduction
The COVID-19 pandemic has created an unprecedented opportunity to explore a large cohort of patients infected with a single pathogen thus providing a window to examine patient variability in response to a uniform insult. Despite this opportunity, distinct subphenotypes of severe-COVID-19 associated respiratory failure remain largely unexplored(1-3). SARS-CoV-2 infection often leads to hypoxemic respiratory failure requiring treatment with mechanical ventilation which meets clinical and pathologic criteria for Acute Respiratory Distress Syndrome (ARDS)(4-6). In COVID-19 respiratory failure, like other forms of ARDS, there is significant risk of morbidity and mortality. However, there is clear heterogeneity in outcomes, even in those treated with mechanical ventilation(4, 5, 7-9). The baseline clinical characteristics and predictors of mortality of those requiring mechanical ventilation have been described(4, 7, 8, 10). These studies offer some insight into a differential host response but are limited to characterizing patients at baseline.
In prior studies of ARDS(11, 12), unique subphenotypes have been described, which identify hyperinflammatory and hypoinflammatory populations with differential demographics, clinical characteristics, inflammatory markers and outcomes. These subphenotypes are primarily characterized by host response inflammatory markers and patterns of organ injury, but are agnostic of the type of insult or infection. In COVID-19, baseline risk stratification may be insufficient to characterize subphenotypes that accurately reflect the complexity of the disease arc(13). Serial, temporally ordered, Sequential Organ Failure Assessment (SOFA)(14-17) and comprehensive Electronic Health Records (EHR) data are well suited to develop data-driven subphenotypes(18), where the goal is to identify coherent patient groups with similar clinical courses. Dynamic time warping (DTW)(19) is a well-established technique for evaluating the similarities among temporal sequences(20, 21). DTW is particularly well suited to evaluate longitudinal changes in organ dysfunction in COVID19. Characterizing a more complete representation of the disease course in COVID19 may offer insight into its pathophysiology.
We used DTW to conduct a two staged post-intubation trajectory analysis of SOFA-based organ dysfunction in patients with COVID19 to identify unique subphenotypes. In order to understand the differential disease course, we then explored clinical and biologic features including demographics, comorbidities, clinical characteristics, inflammatory markers, and treatments predictive of these trajectories.
Methods
This was a retrospective two staged modeling analysis on two cohorts of intubated COVID-19 patients. The overall workflow of our study is illustrated in Figure 1.
Study design and cohort description
We used individual patient data from two New York Presbyterian (NYP) system hospitals located in New York city: the New York Presbyterian Hospital-Weill Cornell Medical Center (NYP-WCMC), an 862-bed quaternary care hospital, and the New York Presbyterian-Lower Manhattan Hospital (NYP-LMH), a 180-bed non-teaching academic affiliated hospital. Patients were admitted from Mar 3, 2020 to May 12, 2020. SARS-CoV2 diagnosis was made through reverse-transcriptase–PCR assays performed on nasopharyngeal swabs. The critical care response to the pandemic has been previously described(22). The NYP-WCMC cohort was used as the development cohort to derive subphenotypes, and the NYP-LMH cohort was used for validation. The focus of this study was critically ill patients with COVID-19 who were treated with intubation (Supplemental Appendix 1).
Data collection
We collected all data from either the Weill Cornell-Critical carE Database for Advanced Research (WC-CEDAR), Weill Cornell Medicine COVID Institutional Data Repository (COVID-IDR), or via manual chart abstraction (REDCap). WC-CEDAR aggregates and transforms data from institutional electronic health records for all patients treated in ICUs in NYP-WCMC and NYP-LMH(23). The COVID-IDR contains additional aggregate EHR data on all patients who were tested for SARS-CoV-2 at NYP-WCMC or NYP-LMH. The REDCap database contains high-quality manually abstracted data on all patients who tested positive for COVID-19 at NYP-WCMC or NYP-LMH(24). In our analysis, the patient information incorporated included demographics, laboratory tests, vital signs, and respiratory variables obtained from WC-CEDAR, comorbidity information obtained from the REDCap database, and medication data obtained from the COVID-IDR. Data analyzed were detailed in Supplemental Appendix 2.
SOFA calculation
The SOFA score is the sum of six organ dysfunction subscores, including cardiovascular, central nervous system (CNS), coagulation, liver, renal, and respiration(14, 17). In this study, the CNS, coagulation, liver, and renal subscores were derived according to the standard SOFA scoring system(14). The respiration subscore was calculated using a combination of the traditional and modified scoring method(25). The cardiovascular SOFA subscore was calculated with additional vasopressors according to a norepinephrine equivalency table, where phenylephrine and vasopressin were converted to a norepinephrine equivalency(26). SOFA scores were derived every 24 hours from the time of intubation, and the worst score within that 24-hour data period was selected for each patient.(14)
Inclusion exclusion criteria
We included patients with positive results on viral RNA detection by real-time reverse transcriptase polymerase chain reaction (RT-PCR) test from nasopharyngeal swabs specimens and treated with mechanical ventilation at the ICU in NYP-WCM and NYP-LMH. We excluded patients who were less than 18 years old. Since our aim was to identify clinically meaningful organ dysfunction progression patterns of intubated patients, trajectories with low quality (20 (5.7%) patients missing over 50% SOFA records) and outlier trajectories (10 (2.9%) patients with unchanged or heavily fluctuated within the 7-day window after intubation) were excluded from the analysis (Supplemental Appendix 3 and Figure E-1).
Subphenotyping model description
SOFA scores were derived every 24 hours and post intubation 7 day SOFA trajectories were constructed for analysis. Missing values within a trajectory were imputed based on the last observation carried forward (LOCF) strategy.
A two-staged subphenotyping method was performed to derive SOFA trajectory subphenotypes (Figure 1). In the first stage, we used baseline SOFA to group patients with a similar upfront risk of death(17), as additive organ dysfunction has previously been identified to be associated with poor outcomes in COVID19(8). We partitioned the patients into three baseline severity strata (mild, intermediate, and severe) according to their SOFA scores within the first 24 hours after intubation. The SOFA score cut-offs were set to 0-10, 11-12, and 13-24 in order to obtain a balanced distribution of patients across the three strata. In the second stage, we identified the subphenotypes with similar 7-day SOFA progression patterns. Dynamic Time Warping (DTW)(19) was adopted to evaluate the similarities between pairwise patient SOFA trajectories within each baseline stratum and then hierarchical agglomerative clustering (HAC)(27) was performed on these similarities to derive the similar patient clusters as trajectory subphenotypes. DTW can account for the differences among the evolution heterogeneity among the temporal curves and is thus able to evaluate their similarity more robustly.(19) The optimal numbers of subphenotypes were determined by clear separation illustrated by clustergram according to the McClain index(28). DTW was implemented with Python 3.7 based on tslearn package 0.3.1 and HAC was performed with Python 3.7 based on scikit-learn package 0.22.2.
To validate these findings, we replicated these subphenotypes from the NYP-LMH cohort.
Clinical outcomes
We analyzed 30-day all-cause mortality as the primary outcome for patients within each phenotype. Successful extubation or need for tracheostomy within 30 days after intubation were secondary outcomes.
Statistical analysis
We examined the associations between patient characteristics and clinical variables and the identified trajectory subphenotypes, to see if there are early markers that can discriminate between them. Patient characteristics we investigated included demographics, comorbidities, medications prescribed within the window from 3-day before to 5-day after intubation, and blood type(29). Laboratory test results included: complete blood count, basic metabolic panel, liver function tests, coagulation profile and inflammatory markers including d-dimer, fibrinogen, ferritin, erythrocyte sedimentation rate, lactic acid, troponin, lactate dehydrogenase, creatine kinase, procalcitonin and C-reactive protein. Vital signs included: GCS, mean arterial pressure and temperature, urine output. Respiratory variables included: P/F ratio, FiO2, Pao2, PaCO2, PH, PEEP, peak inspiratory pressure, plateau pressure, driving pressure, static compliance, minute ventilation, ventilatory ratio, and tidal volume indexed to ideal body weight at day 1 and day 3 post-intubation.
Univariate statistical tests were performed in those association analyses. Specifically, one-way analysis of variance (ANOVA, with Tukey HSD post hoc test), Kruskal–Wallis test (with Dunn post hoc test), student’s t-test, Mann-Whitney test, Chi-square test, and Fisher’s exact test have been used whenever appropriate. The p-values were then corrected for multiple testing using false discovery rate (FDR) estimation. Analysis of covariance (ANCOVA) for the between-strata/subphenotypes comparisons was also applied based on the generalized linear model (GLM) with adjustment on age at baseline. All statistical tests were performed with Python 3.7 based on statsmodels package 0.11.1.
Subphenotype prediction modeling
We trained a random forest model with the trajectory subphenotypes as targets and the patient clinical characteristics at specific time points after intubation as input predictors to define if these trajectory subphenotypes can be predicted early. Our implementation was with Python 3.7 based on scikit-learn package 0.22.2. Candidate predictors included demographics, comorbidities, medications prescribed around the intubation event, SOFA subscores, laboratory tests, vital signs, and respiratory variables as described above. Prediction performances were measured by area under the receiver operating characteristics (AUC-ROC). The importance of predictors was visualized as a heatmap to demonstrate their contributions on subphenotype prediction.
IRB approval
The study is approved by the IRB of Weill Cornell Medicine with protocol number 20-04021909.
Results
Patients and baseline severity strata
A total of 318 mechanically ventilated COVID-19 patients from the NYP-WCMC cohort were included for analysis, consisting of 100 females (31.45%) and an average age of 62.78 ± 14.34. One day post-intubation the mean SOFA score for this cohort is 11.89 ± 2.56. A total of 84 mechanically ventilated COVID-19 patients from the NYP-LMH were included as a validation cohort, consisting of 33 (39.29%) females and an average age of 66.06 ± 13.06. One day post-intubation the mean SOFA score is 12.51 ± 2.25. The clinical characteristics of both cohorts are summarized in Table 1.
For the NYP-WCMC cohort, patients were first partitioned into mild, intermediate, and severe strata based on the SOFA scores within one day after intubation, consisting of 76 (23.29%), 116 (36.48%), and 126 (39.62%) patients, respectively; while for the NYP-LMH validation cohort, the three strata consist of 10 (11.90%), 35 (41.67%), and 39 (46.43%) patients, respectively. As shown in Table 1, the patients in both NYP-WCMC and NYP-LMH cohorts exhibit additive patterns of post intubation baseline organ dysfunction according to the SOFA subscores. Specifically, CNS and respiration dysfunction were present in the mild stratum; the intermediate stratum had additional cardiovascular dysfunction on top of CNS and respiratory dysfunction compared to the mild stratum; and the severe stratum had renal dysfunction in addition to all other organ failure. Liver and coagulation dysfunction were rare in all strata. Patients in the severe stratum were generally older and were more likely to suffer from chronic comorbidities at baseline.
SOFA trajectory subphenotypes
The clustergrams built upon the pairwise SOFA trajectory distance matrix derived by DTW are shown in Supplemental Figure E-2. The optimal number of subphenotypes within each stratum as determined by the McClain Index(28) are shown in Supplemental Table E-1, suggesting two being the best choice across all strata in both cohorts. Figure 2 demonstrates the individual averaged SOFA curves for patients in the two subphenotypes across all strata: a worsening subphenotype of which SOFA score increased within the 7-day observation window, and a recovering subphenotype of which SOFA score improved. The clinical characteristics of these subphenotypes were summarized in Table 2. Overall, there was no marked difference in terms of demographics, comorbidity burden, and pattern of organ dysfunction (distribution of SOFA subscores and total score) between the worsening and recovering subphenotypes within each baseline severity stratum at baseline. This suggests that, though the subphenotypes varied in 7-day organ dysfunction progression patterns, they have similar clinical status immediately after intubation. We further investigated medications prescribed within each subphenotype and didn’t find significant signal as well (Supplemental Table E-3). In addition, clinical characteristics and medications of the subphenotypes re-derived in the NYP-LMH validation cohort were summarized in Supplemental Tables E-2 and 4.
30-days clinical outcomes
Statistics of 30-day post-intubation clinical primary and secondary outcomes (mortality, extubation, and tracheostomy) of subphenotypes were illustrated in Figure 2A and Supplemental Figure E-3A. The worsening subphenotypes, across baseline strata, suffered from a significantly higher risk of mortality within the 30-day window after intubation (worsening vs recovering, mortality proportion: mild stratum, 29.7% vs. 10.3%, p=0.033; intermediate stratum, 29.3% vs. 8.0%, p=0.002; severe stratum, 53.7% vs. 22.2%, p<0.001). The recovering subphenotypes, across all baseline strata, showed significantly higher extubation proportions within the 30-day window compared to the worsening subphenotypes (recovering vs. worsening, extubation proportion: mild stratum, 76.9% vs. 27.0%, p<0.001; intermediate stratum, 54.7% vs. 31.7%, p=0.018; severe stratum 50.0% vs. 14.8%, p<0.001). There was no significant difference of 30-day tracheostomy detected between the subphenotypes. Importantly, the recovering subphenotype within the severe baseline stratum had a lower mortality risk compared to the worsening subphenotypes at mild and intermediate baseline strata.
The trajectory subphenotypes derived in the NYP-LMH validation cohort had similar trends in all three clinical outcomes within the 30-day window after intubation (see Figure 2A and Supplemental Figure E-3B). Across all baseline strata, the worsening subphenotypes accounted for higher risk of mortality (worsening vs recovering, mortality proportion: mild stratum, 57.1% vs. 0.0%, p=0.200; intermediate stratum, 31.8% vs. 7.7%, p=0.211; severe stratum, 83.3% vs. 17.4%, p<0.001), while the recovering subphenotypes showed higher extubation proportion within 30-days after intubation (recovering vs. worsening, extubation proportion: mild stratum, 33.3% vs. 14.3%, p=0.490; intermediate stratum, 69.2% vs. 27.3%, p=0.015; severe stratum, 48.1% vs. 9.1%, p=0.017).
Correlation of subphenotypes with early-stage markers
Vital signs, laboratory variables, and respiratory variables were evaluated to identify early-stage markers predictive of the two-stage classification. First of all, the three baseline strata of the NYP-WCMC cohort were observed to be well separated by a series of clinical variables in addition to the differential organ dysfunction pattern noted above (Supplemental Table E-5). For instance, ANOVA(or Kruskal–Wallis test) results showed significantly increased laboratory values like procalcitonin, ferritin, lactate dehydrogenase (LDH), and creatinine, and decreased carbon dioxide within the severe stratum, at day 1 post-intubation; Vital sign such as the Glasgow Coma Scale (GCS) and urine output volume and respiratory variables such as peak inspiratory pressure (PIP) were also observed to separate the strata; the mild stratum was associated with significantly decreased levels of troponin, creatinine, and glucose. Detailed statistical analyses are described in Supplemental Table E-5. Statistics of these clinical variables across baseline strata within the NYP-LMH validation cohort showed similar signals and were detailed in Supplemental Table E-6.
We further evaluated the clinical variable values at day 1 and day 3 post-intubation to compare the worsening and recovering subphenotypes within each baseline stratum. Markers, which significantly separated the worsening and recovering subphenotypes (p-value<0.05), varied across different baseline strata (Supplemental Table E-7). Specifically, on day 1, laboratory tests including white blood cell (WBC) count, procalcitonin, CRP, creatinine, neutrophil count, and globulin, and respiratory variables such as minute ventilation and ratio of tidal volume to predicted body weight (PBW) were found to significantly differentiate the trajectory subphenotypes within the mild stratum. Markers that significantly separated the trajectory subphenotypes within the intermediate stratum on day 1 included AST, creatinine, LDH and FiO2. The severe stratum had more markers separating the worsening and recovering subphenotypes on day 1, including laboratory tests like lymphocyte count, platelet count, triglycerides, procalcitonin level, ferritin, troponin, LDH, creatinine, and AST, respiratory variables such as PIP, PaO2, PaO2/FiO2 ratio (P/F ratio), and SpO2. In addition, more markers separating subphenotypes emerged within each stratum on day 3 after intubation. For instance, additional day 3 markers including GCS, platelet, sodium, and glucose were found within the mild stratum. For the intermediate stratum, GCS, albumin, WBC count, lymphocyte percentage, CRP, sodium, neutrophil percentage and count, carbon dioxide level, urine output volume, and P/F ratio were found to be additional markers on day 3. The additional markers found within the severe stratum on day 3 included vital signs GCS and temperature, lymphocyte percentage, hemoglobin level, albumin level, potassium level, ESR, CK, carbon dioxide level, and lactic acid, positive end-expiratory pressure (PEEP), plateau pressure, minute ventilation, static compliance, driving pressure, FiO2, and ventilator ratio.
As shown in Supplemental Table E-8, most markers identified within the NYP-WCMC cohort showed consistent signals within the NYP-LMH subphenotypes, even though some significance vanishes, as the confidence intervals were wide.
Subphenotype prediction models
We trained random forest models for predicting the worsening and recovering trajectory subphenotypes within each baseline stratum according to the early stage marker values. Overall, as shown in Supplemental Figure E-4, within the mild, intermediate, and severe strata, the prediction models achieved the AUC-ROCs of 0.70 (95% confidence interval [CI], [0.59, 0.80]), 0.67 (95% CI, [0.62, 0.72]), and 0.73 (95% CI, [0.67, 0.79]) respectively, with the predictor values evaluated at day 1 post-intubation. AUC-ROCs of the models increased to 0.77 (95% CI, [0.66, 0.89]), 0.77 (95% CI, [0.74, 0.81]), and 0.80 (95% CI, [0.75, 0.85]), with the predictor values evaluated at day 3 post-intubation; and to 0.83 (95% CI, [0.72, 0.94]), 0.91 (95% CI, [0.88, 0.94]), and 0.88 (95% CI, [0.80, 0.95]), with the predictor values evaluated at day 5 post-intubation.
Importance of the predictors were illustrated as heatmaps, where color intensity represents the normalized importance of specific predictors (Supplemental Figure E-5). Generally, predictor importance varied as the progress of time. Models trained on day 1-3 after intubation were observed to involve more contributions from the laboratory tests, vital signs, respiratory variables than other predictors; SOFA subscores, especially cardiovascular, CNS, and renal subscores showed relatively higher importance over models trained on day 4 or 5 data within the intermediate and severe strata. Age contributed to day 1-3 prediction to some extent, while other demographics, medications and comorbidities showed weak importance in prediction.
Correlation to blood type
We finally assessed ABO/RH blood type distribution across the subphenotypes (Supplemental Tables E-9 and 10). Overall, there is no significant signal detected.
Discussion
In this study, we identified novel trajectory subphenotypes of COVID-19 patients with an objective machine learning approach. The subphenotypes we identified are based on organ dysfunction trajectory over 7-days following intubation, which is different from existing data-driven subphenotyping methods that focus on patient data at a specific timestamp (12, 30, 31). The use of novel methodology, in addition to the robust size of our cohort, ensure that the identified trajectory based subphenotypes are less likely to suffer from cognitive bias(13) and are likely to be temporally stable(32). More concretely, we adopted a divide and conquer approach to identify the subphenotypes. Prior research has identified that additive organ dysfunction is predictive of increased mortality in COVID-19 associated ARDS(8). Therefore, we divided the patients into three different baseline strata (mild, intermediate and severe) according to additive SOFA based organ dysfunction. We identified two salient trajectory subphenotypes within each stratum.
Importantly, the baseline demographics, comorbidities and pattern of organ dysfunction did not differ between the worsening and recovering subphenotypes at each stratum. This suggests the existence of differential progression pathways that are irrespective of baseline risk factors for severe disease. This finding is unique compared to other subphenotyping projects as we are including a more complete picture of the disease course(12, 30, 31). It also highlights the temporal heterogeneity of COVID-19 and the importance of avoiding prognostication based on early post intubation clinical characteristics. We found that the worsening subphenotypes in the baseline mild and intermediate strata showed an even higher risk of death compared to the recovering subphenotype within the baseline severe stratum (Figure 3). Indeed, there is an urgent need to understand the pathophysiology of progressive non-pulmonary organ dysfunction in this disease.
We assessed the differences between a broad range of laboratory tests, vital signs, and respiratory variables in the worsening and recovering subphenotypes. Importantly, basic laboratory tests and inflammatory markers were differentially associated with the worsening and recovering subphenotypes over time, which suggests that there is value in clinically following markers such as D-dimer, LDH, ferritin, procalcitonin and C-reactive protein. In the mild stratum, markers from the regular blood panel such as total white blood cell count and neutrophil counts, while inflammatory markers, such as ferritin and LDH, differentiate in the severe stratum. Laboratory tests, vitals and inflammatory markers in the intermediate stratum were less able to distinguish between the worsening and recovering subphenotypes. However, differences emerged over a longer time horizon (e.g., day 3). This further highlights the dynamic nature of COVID-19 and the difficulty in early prognosis in the critically ill population, despite severely deranged baseline organ dysfunction and inflammatory markers.
We built multivariable prediction models for the identified trajectory subphenotypes from patient baseline characteristics and early-stage clinical feature values. Models were built on at successive time points (day 1, 2, 3, 4, and 5) after intubation. Predictive performances measured by AUC-ROC improved as the number of days increased. The predictors’ importance to differentiating worsening and recovering subphenotypes showed varying patterns (Figure E-5). Importantly, aside from age and BMI, demographics, baseline comorbidities, and medications prescribed around intubation did not contribute to discriminating the subphenotypes in any of the strata. The persistence or development of renal failure, predicts subphenotype assignment later in the severe and intermediate strata, respectively. The persistence of vasodilatory shock in the intermediate stratum predicts the worsening subphenotype. While the development of thrombocytopenia, is discriminative late in the severe stratum. Interestingly, over the course of the first 7 days following intubation, liver failure remained rare. At different points in the course, inflammatory markers such as creatine kinase and D-dimer predicted worsening and recovering subphenotypes.
Our study was conducted on the two NYP system hospitals. Woresning and recovering SOFA subphenotypes, clinical characteristics, and outcomes from the validation cohort was consistent with the original subphenotypes. Although, due to the limited size of NYP-LMH validation cohort, statistical significance of some markers vanished, most of the results reflected the development cohort’s findings. This consistency ensures the existence of the worsening and recovering trajectory subphenotypes at each baseline stratum of the critically ill COVID-19 patients.
Limitations
While this study presents a step forward in the efforts to parse the progression heterogeneity of critically ill patients with COVID-19, several limitations remain. The first limitation could be SOFA’s inadequacy in tabulating organ dysfunction in COVID-19 associated respiratory failure. For example, COVID-19 is associated with a different pattern of hypercoagulability compared to sepsis, which is reflected in the preserved platelet count in most this cohort’s patients.(33) Also despite elevations in liver chemistries in many patients, hyperbilirubinemia was rare. Despite this limitation, SOFA trajectory subphenotypes predicted mortality and will allow for future comparisons with other diseases.
Second, we did not use the progression of inflammatory markers such as C-reactive protein, D-dimer or ferritin, which are known risk factors for this disease, to identify the subphenotypes. Nor did we stratify patients based the severity of respiratory failure alone. Instead, we chose to see how these factors interacted with traditional organ dysfunction, as most patients with COVID19 die from multisystem organ failure and not refractory respiratory failure(8, 9).
Third, differentiating trajectory subphenotypes in this critically ill population was difficult, as AUC-ROC metrics of prediction modeling using data at day 1 post-intubation were around 0.7. By restricting our analysis to a very high-risk population, we decreased the discriminative power of many of our biomarkers to predict outcomes. All patients were high risk. However, we have added to our understanding of patients with critical COVID-19, by documenting the natural history of organ dysfunction in this population. Future research efforts, with novel biomarkers, are needed to predict worsening and recovering subphenotypes at an earlier time point in those with respiratory failure.
Fourth, the surge conditions in New York City during the study period could affect the study. Care may have been influenced by the surge conditions during this difficult time. However, all patients were cared for in a critical care environment and despite the massive patient burden, the all cause 30-day mortality was 25.9%.
Conclusions
In a population of critically ill patients with COVID-19 respiratory failure, there are distinct worsening and recovering organ dysfunction trajectory subphenotypes. Worsening status was predictive of poor outcomes in all strata regardless of baseline severity. These findings highlight the importance of supportive care for sequential organ failure in addition to respiratory failure in this disease. Trajectory based subphenotypes offer a potential road map for understanding the evolution of critical illness in COVID-19. We call for further analysis.
Data Availability
The patient data are from NYP-WCM and NYP-LMH and not publicly available.
Supplementary Materials
Appendix 1. Patient care
Care of the patient was at the discretion of each attending physician. Patients were intubated prior to arrival to the ICU. Daily briefings were held during the surge to reinforce best practices and evidence related to caring for acute hypoxemic respiratory failure. Volume control ventilation targeting a tidal volume of 6-8 mls/kg IDBW and a plateau pressure < 30 cm/h20 was recommended. No particular PEEP titration protocol was mandated, however the ARDSnet moderate and high PEEP documents were made available to practitioners. Propofol was the first sedative of choice with the addition of opiates for dyspnea and perceived pain. An even to positive fluid balance was suggested on days one and two of mechanical ventilation. Patients with hypotension despite volume loading were treated with vasopressors, norepinephrine as the first choice to target a MAP of 65 mmhg. Prone positioning was suggested if a patient had a PF ratio of <150 despite optimization of PEEP and vent synchrony. Neuromuscular blockade was suggested in cases of refractory hypoxemia. Inhaled nitric oxide was available in cases of refractory respiratory failure at the discretion of the treating intensivists. Extra corporeal membrane oxygenation support was not offered due to resource limitations. HProphylactic anticoagulation was recommended for all patients and a higher dose prophylactic regimen was implemented during the course of the surge. Enoxaparin was the recommended first choice anticoagulation, even in the setting of renal insufficiency with strict monitoring of anti-factor Xa levels. Heparin was used for patients with concern for active bleeding or patients unable to tolerate enoxaparin. Corticosteroids were used at the discretion of the individual attending physician for the treatment of ARDS and for other traditional complications such as septic shock, bronchospasm and airway edema. Hydroxychloroquine without azithromycin was used to treat most patients as an antiviral. Remdesivir was available through compassionate use and through several clinical trials. Off label use of IL-6 inhibition was directed by consultation with infectious disease physicians. Choices regarding antibiotics and other therapeutics were at the discretion of treating intensivists with consultation with infectious disease physicians.
Appendix 2. Data description
In this study, all data were collected from either the Weill Cornell-Critical carE Database for Advanced Research (WC-CEDAR)(1), Weill Cornell Medicine COVID Institutional Data Repository (COVID-IDR)(2), or via manual chart abstraction (REDCap). A wide range of data were included for analysis, including:
Demographics include age, sex, race, and body mass index (BMI) were obtained when admitted to ICU.
Chronic comorbidities were assessed at ICU admission and collected via chart abstraction. Comorbidities studied include coronary artery disease, cerebrovascular accident (stroke), heart failure, hypertension, diabetes mellitus, pulmonary disease, renal disease, cirrhosis, hepatitis, active cancer, transplant, inflammatory bowel disease, human immunodeficiency viruses, rheumatologic disease, and other immunosuppressed state.
Medications prescribed were collected from the COVID-IDR database. Medications screened include tocilizumab, hydroxychloroquine, corticosteroids (such as prednisone, methylprednisolone, dexamethasone, and hydrocortisone), enoxaparin, heparin, and antibiotics (such as ceftriaxone, azithromycin, piperacillin-tazobactam, meropenem, vancomycin, and doxycycline). Enoxaparin was further assessed as prophylactic dose (<.5 mg per KG), high prophylactic dose (<1 mg per KG), and treatment dose (>1 mg per KG), while heparin was assessed as prophylactic dose (subcutaneous delivery) and treatment dose (intravenous delivery).
In order to identify markers associated to identified subphenotypes, we assessed a broad spectrum of laboratory tests related to organ failure, including alanine aminotransferase (ALT), albumin level, aspartate aminotransferase (AST), total bilirubin, carbon dioxide, creatine kinase (CK) level, creatinine, C-reactive protein (CRP) level, D-dimer, erythrocyte sedimentation rate (ESR), ferritin level, globulin level, glucose level, hemoglobin level, lactate dehydrogenase (LDH), lactic acid level, lymphocyte percentage and count, neutrophil percentage and count, platelets level, potassium level, procalcitonin level, sodium level, troponin level, triglycerides, and white blood cell (WBC) count.
Several vital signs were also assessed, including Glasgow Coma Scale score (GCS), and mean arterial pressure (MAP), temperature.
Clinical variables reflecting respiratory status of patients were also studied, including fraction of inspired oxygen (FiO2), partial pressure of oxygen (PaO2), PaO2/FiO2 ratio (P/F ratio), and oxygen saturation (SpO2).
Several parameters of ventilator setting were studied, including driving pressure, minute ventilation, arterial partial pressure of carbon dioxide (PCO2) level, positive end-expiratory pressure (PEEP), PH, peak inspiratory pressure (PIP), plateau pressure, static compliance, tidal volume, tidal volume to predicted body weight (PBW) ratio, and ventilator ratio.
ABO/RH blood type data were collected from COVID-IDR database.
Appendix 3. Inclusion exclusion criteria
Inclusion
We included patients with positive results on viral RNA detection by real-time reverse transcriptase polymerase chain reaction (RT-PCR) test from nasopharyngeal swabs specimens and treated with mechanical ventilation at the ICU in NYP-WCMC and NYP-LMH.
Exclusion
Patients who were less than 18 years old when admitted to ICU were excluded for analysis.
Since our aim was to identify organ dysfunction progression patterns within 7-days after intubation, missing too many (>50%) SOFA data may mislead understanding the trajectory trends. Hence, we excluded patients with <3 days SOFA data. Specifically, 20 patients of NYP-WCMC cohort were excluded, among which 9, 3, and 3 dead within day 1, 2, 3 after intubation, respectively, 2 have no records, and 2 and 1 only have 2- and 3-days SOFA data, respectively.
11 patients of NYP-LMH cohort were excluded, among which 2, 5, and 4 dead within day 1, 2, 3 after intubation, respectively.
Outliers such as unchanged or heavily fluctuated trajectories which may hamper the model to capture clinically meaningful SOFA progression trends were excluded for analysis. Specifically, 7 unchanged trajectories of NYP-WCM cohort, and 3 and 5 heavily fluctuated trajectories within 7-day post-intubation of NYP-WCMC and NYP-LMH cohorts were excluded for analysis.
The overall inclusion exclusion criteria were provided in Figure E-1.
Tables
Figures
Acknowledgments
We thank the following Weill Cornell Medicine medical students for their contributions to the COVID-19 Registry through medical chart abstraction: Zara Adamou BA, Haneen Aljayyousi BA, Mark N. Alshak BA (student leader), Bryan K. Ang BA, Elena Beideck BS, Orrin S. Belden BS, Sharmi Biswas MD, Anthony F. Blackburn BS, Joshua W. Bliss PharmD, Kimberly A. Bogardus BA, Chelsea D. Boydstun BA, Clare A. Burchenal MPH, Eric T. Caliendo BS, John K. Chae BA, David L. Chang BS, Frank R. Chen BS, Kenny Chen BA, Andrew Cho PhD, Alice Chung BA, Alisha N. Dua MRes, Andrew Eidelberg BS, Rahmi S. Elahjji BA, Mahmoud Eljaby MMSc, Emily R. Eruysal BS, Kimberly N. Forlenza MSc, Rana Khan Fowlkes BA, Rachel L. Friedlander BA, Gary George BS, Shannon Glynn BS, Leora Haber BA, Janice Havasy BS, Alex Huang BA, Hao Huang BS, Jennifer H. Huang BS, Sonia Iosim BS, Mitali Kini BS, Rohini V. Kopparam BS, Jerry Y. Lee BA, Mark Lee BS BA, Aretina K. Leung BA, Han A. Li BA (student leader), Bethina Liu AB, Charalambia Louka BS, Brienne Lubor BS, Dianne Lumaquin BS, Matthew L. Magruder BA, Ruth Moges MSc, Prithvi M. Mohan BS, Max F. Morin BS, Sophie Mou BA, J. J. Nario BS, Yuna Oh BS, Noah Rossen BA, Emma M. Schatoff PhD, Pooja D. Shah BA, Sachin P. Shah BA, Daniel Skaf BS, Shoran Tamura BS, Ahmed Toure BA, Camila M. Villasante BA, Gal Wald BA, Graham T. Wehmeyer BS (student leader), Samuel Williams BA, Ashley Wu BS, Andrew L. Yin BA, Lisa Zhang BA
Footnotes
Author Contributions: ES and FW for conceptualization, investigation, writing, reviewing and editing of the manuscript. CS for data analysis, drafting, editing and reviewing manuscript. ZX, KH for data analysis. TC for providing data support. PG, MS, SAM, LGE, DP, JS, LT, FM, RK, AC for discussion, commenting and editing the manuscript.
Funding: This study received support from NewYork-Presbyterian Hospital (NYPH) and Weill Cornell Medical College (WCMC), including the Clinical and Translational Science Center (CTSC) (UL1 TR000457) and Joint Clinical Trials Office (JCTO). FW and CS are supported by NSF IIS 2027970, 1750326, ONR N00014-18-1-2585.