Abstract
Objective Our primary objective was to use initial data available to clinicians to characterize and predict survival for hospitalized coronavirus disease 2019 (COVID-19) patients. While clinical characteristics and mortality risk factors of COVID-19 patients have been reported, a practical survival calculator based on data from a diverse group of U.S. patients has not yet been introduced. Such a tool would provide timely and valuable guidance in decision-making during this global pandemic.
Design We extracted demographic, laboratory, clinical, and treatment data from electronic health records and used them to build a survival probability calculator, referred to as “the Northwell COVID-19 Survival (‘NOCOS’) calculator,” and to test its predictive accuracy.
Setting Thirteen acute care facilities at Northwell Health.
Participants 5,233 hospitalized COVID-19–positive patients.
Main outcome measures The NOCOS calculator was constructed using multivariate regression with L1 regularization (LASSO) to predict survival during hospitalization. The predictive performance of the calculator and of comparator scores was measured using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC).
Results Multivariate LASSO regression identified patient age, serum blood urea nitrogen, Emergency Severity Index, red cell distribution width, absolute neutrophil count, serum bicarbonate, and glucose as the optimal predictors of survival. The NOCOS calculator achieved an AUC of 0.832, approaching 0.91 when its inputs were updated daily with each patient’s latest values, and its predictive performance remained stable over 14 consecutive days. It outperformed other established models, including the Sequential Organ Failure Assessment (SOFA) score (AUC 0.732).
Conclusions We present a practical estimate of survival probability that outperforms other general risk models. The seven early predictors of in-hospital survival can help clinicians identify patients with increased probabilities of survival and provide critical decision support as COVID-19 spreads across the U.S.
Trial registration N/A
Introduction
The World Health Organization declared coronavirus disease 2019 (COVID-19) a global pandemic on March 11, 2020; confirmed cases worldwide have since exceeded 1 million.1 Estimates of severe disease range from 20% to 30%, and case fatality rates from 2% to 7%.2,3 As healthcare facilities across the world struggle to provide care for increasing numbers of critically ill patients, many countries are reporting or anticipating significant shortages of ventilators and other equipment.4-6 Evidence-based tools and processes can facilitate medical decision-making while aligning treatment plans with patients’ goals of care and likelihood of benefit.7
Predictive models of patient survival are common in medical practice and can facilitate conversations with patients, the selection of appropriate therapy, and the just allocation of scarce resources. In particular, during acute illness and hospitalization, accurate models of survival and mortality can help ensure appropriate consultations and care plans.8 Robust predictive survival models can support more informed decision-making and more efficient resource allocation, and may reduce physician stress and burnout. During the current pandemic, concerns about resource limitations and the fair and appropriate allocation of resources6,9 can be mitigated by clinically relevant, objective, and accurate decision-support prediction tools. Reports from China have identified age, Sequential Organ Failure Assessment (SOFA) score, and d-dimer level as potential predictors of patient survival.10 Published multivariate models predicting survival in patients with COVID-19 are few and largely not peer reviewed, and they have been found to be poorly reported and at risk of bias.11
Our objectives were to use parameters available early to clinicians to characterize and predict survival for hospitalized COVID-19 patients within the largest health system in New York State, the current epicenter of the global COVID-19 pandemic. We consider significant variables reported from previous work and describe the demographics, baseline comorbidities, presenting clinical studies, and outcomes of hospitalized patients with COVID-19. We then present a simple, powerful, and clinically relevant predictive model of patient survival—the Northwell COVID-19 Survival (“NOCOS”) calculator—for all non-mechanically ventilated patients at the time of hospital admission with parameters available early in the care of all patients. The model utilizes routinely collected data, typically available within 60 minutes of patient arrival in the emergency department, and predicts hospital survival at a time that permits planning and proper decision-making around goals of care and resource allocation. This actionable model can be easily implemented and used to support providers during the current worldwide crisis.
Methods
This analysis of a COVID-19 survival calculator uses data from a retrospective cohort study approved by the Northwell Health Institutional Review Board. The cohort includes all hospitalized adult patients (aged 18 years and older) with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection confirmed by a positive polymerase chain reaction test of a nasopharyngeal sample. Patients were excluded if they were placed on mechanical ventilation before presentation or while in the emergency department. These patients’ clinical characteristics and outcomes are described more completely in prior publications on this cohort.12,13 Patients were admitted to 1 of 13 Northwell Health acute care hospitals on or after March 1, 2020, and were discharged or died before April 12, 2020. Clinical outcomes (discharge, mortality, and length of stay) were monitored until April 12, 2020, the final date of follow-up. With approximately 4,844 hospital beds and 672 intensive care unit (ICU) beds, and serving approximately 11 million people in Long Island, Westchester, and New York City, Northwell Health is the largest academic health system in New York. Notably, during the current pandemic, the numbers of general hospital beds and ICU beds have increased substantially and fluctuate daily.
Data
Data were collected from the enterprise electronic health record (EHR; Sunrise Clinical Manager, Allscripts, Chicago, IL) reporting database. Transfers from one in-system hospital to another were merged and considered as one visit. Data collected included patient demographic information, comorbidities, home medications, Emergency Severity Index (ESI; an objective marker of presenting acuity in the emergency department), initial laboratory values and studies, prescribed medications, treatments (including oxygen therapy and mechanical ventilation), and outcomes (including length of stay, discharge, and mortality). Initial laboratory tests were defined as those obtained while the patient was in the emergency department. Continuous variables are presented as median and interquartile range (IQR), and categorical variables as number of patients (percentage). Acute kidney injury was identified according to the Kidney Disease: Improving Global Outcomes (KDIGO) definition.14 Acute hepatic injury was defined as an elevation in aspartate aminotransferase (AST) or alanine aminotransferase (ALT) of more than 15 times the upper limit of normal. Oxygen requirements were recorded as the highest requirement level during the emergency department stay. We used the chi-squared test for categorical variables and the Kruskal-Wallis test for continuous variables to test for differences by survival status.
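As an illustration of the categorical comparison, the sketch below computes a Pearson chi-squared test for a 2x2 table (for example, a comorbidity present or absent versus survived or expired). It assumes no continuity correction, which the text does not specify, and the function name is hypothetical. With one degree of freedom the p-value has a closed form, so only the Python standard library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], e.g. rows = comorbidity present/absent and
    columns = survived/expired. Returns (statistic, p_value) with 1 df."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z^2, so P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For real analyses, `scipy.stats.chi2_contingency` (which applies a continuity correction to 2x2 tables by default) and `scipy.stats.kruskal` cover both tests named above.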
Predictive Modeling
LASSO regression was used to identify a small subset of 85 EHR measurements that, when linearly combined, predict the survival of hospitalized COVID-19 patients (Table 1).15 Because it includes an L1-norm regularization term that promotes sparsity, LASSO regression is well suited to selecting an optimal subset of measurements: the magnitudes of the coefficients reflect the predictive value of the normalized measurements, while the coefficients of non-predictive measurements are shrunk exactly to 0.
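The sparsity property can be seen in a minimal cyclic coordinate-descent sketch of LASSO. This is not the authors’ MATLAB implementation; it assumes a least-squares LASSO on already z-scored inputs with the outcome coded 1 = survived and 0 = expired, and minimizes (1/2n)·||y − b0 − Xb||² + λ·||b||₁, the same scaling used by scikit-learn’s `Lasso`. The soft-thresholding step is what sets weak coefficients exactly to 0.

```python
def soft_threshold(z, gamma):
    """Shrink z toward 0 by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of rows whose columns are already z-scored (mean 0).
    y: list of targets (assumed coding: 1 = survived, 0 = expired).
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n  # intercept equals the mean outcome because columns are centered
    b = [0.0] * p
    residual = [yi - b0 for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the residual after adding back its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, b
```

On toy data where only the first column carries signal, the second coefficient is driven exactly to 0 while the first is shrunk by λ.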
The data were normalized by z-scoring, so that each measurement has a mean of 0 and a standard deviation of 1. The means and standard deviations of the measurements with non-zero coefficients are stored as model hyperparameters during training and applied to the test data. Missing measurements were imputed with the mean.
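A minimal sketch of this normalization, assuming missing values are represented as `None` and that the mean used for imputation is the training mean (so an imputed value becomes a z-score of 0). The function names are hypothetical.

```python
import statistics


def fit_zscore(train_rows):
    """Compute per-column (mean, standard deviation) from training rows,
    ignoring missing values (None). These pairs are the stored hyperparameters."""
    params = []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        mu = statistics.fmean(observed)
        sd = statistics.pstdev(observed) or 1.0  # guard against constant columns
        params.append((mu, sd))
    return params


def apply_zscore(rows, params):
    """Standardize rows with the stored training parameters; a missing value
    is imputed to the training mean, i.e. a z-score of 0."""
    return [
        [0.0 if x is None else (x - mu) / sd for x, (mu, sd) in zip(row, params)]
        for row in rows
    ]
```

Fitting on the training data only and reusing the stored parameters on test data keeps test information out of the normalization.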
The regularization factor λ is another hyperparameter; it was determined by sweeping λ over a range of values, evaluating performance at each value, and choosing the value that gave the best tradeoff between maximizing performance and minimizing the number of predictors. Performance was measured as the area under the Receiver Operating Characteristic (ROC) curve. After optimization of λ, the number of predictors was fixed at 7.
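The sweep can be sketched with an AUC function (the Mann-Whitney formulation) and a selection rule. The paper does not state its exact tradeoff rule, so `choose_lambda` below uses a hypothetical one: among λ values whose out-of-sample AUC is within a small tolerance of the best, keep the one with the fewest predictors.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive (label 1) outscores a
    random negative (label 0), counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def choose_lambda(sweep, tolerance=0.005):
    """sweep: list of (lam, auc, n_predictors) from out-of-sample evaluation.
    Hypothetical rule: among lambdas whose AUC is within `tolerance` of the best,
    keep the one with the fewest predictors (ties broken by higher AUC)."""
    best_auc = max(auc for _, auc, _ in sweep)
    candidates = [row for row in sweep if row[1] >= best_auc - tolerance]
    return min(candidates, key=lambda row: (row[2], -row[1]))
```

For example, with an illustrative (not study) sweep of `[(0.001, 0.835, 20), (0.01, 0.833, 7), (0.1, 0.79, 3)]`, the rule keeps λ = 0.01, giving up 0.002 AUC to drop 13 predictors.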
To estimate the class-conditional distributions (survived and expired) of the LASSO predictions as Gaussian likelihood functions without overfitting, the predictions for the training set were obtained using leave-one-out cross-validation. The posterior probability that a patient will survive is then given by Bayes’ rule:

$$P(c = C_{s} \mid x) = \frac{p_{s}(x \mid \mu_{s}, \sigma_{s})\, P(c = C_{s})}{\sum_{k \in \{s,\, e\}} p_{k}(x \mid \mu_{k}, \sigma_{k})\, P(c = C_{k})}$$

where $x$ is the LASSO prediction for the patient, $p_k(x \mid \mu_k, \sigma_k)$ is the Gaussian likelihood function estimated from the LASSO predictions of training patients in class $k$, $k$ is an element of the set $\{s, e\}$ containing the outcomes survived ($s$) and expired ($e$), and $P(c = C_k)$ is the prior probability of class $k$ derived from the training set.
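A minimal sketch of this posterior calculation, assuming the leave-one-out LASSO predictions and outcome labels are already available. The class labels `"survived"`/`"expired"` and the function names are hypothetical, and the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_class_conditionals(loo_predictions, outcomes):
    """Fit a Gaussian to the leave-one-out LASSO predictions of each class and
    estimate class priors from training-set frequencies.
    Returns {class: (mean, sd, prior)}."""
    n = len(outcomes)
    model = {}
    for c in ("survived", "expired"):
        xs = [x for x, o in zip(loo_predictions, outcomes) if o == c]
        model[c] = (statistics.fmean(xs), statistics.stdev(xs), len(xs) / n)
    return model


def survival_probability(x, model):
    """Posterior P(survived | x) via Bayes' rule with Gaussian likelihoods."""
    weighted = {c: gaussian_pdf(x, mu, sd) * prior for c, (mu, sd, prior) in model.items()}
    return weighted["survived"] / (weighted["survived"] + weighted["expired"])
```

When the two likelihoods are equal at x, the posterior reduces to the prior, which is a quick check that the class priors enter the formula as intended.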
Two instances of the calculator were tested: one fixed, trained on data acquired until March 29, 2020, and tested daily on new patients; and one retrained daily to incorporate new data.
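The two evaluation schedules can be sketched as a loop over test days. This assumes each record carries a hypothetical `"day"` field (the date its outcome became available) and that `train` and `score` are caller-supplied functions; neither is an interface from the paper.

```python
from datetime import date, timedelta


def evaluate_schedules(records, train, score, first_test_day, last_test_day):
    """Compare a fixed model with a daily retrained model.

    records: dicts with a hypothetical 'day' key (a datetime.date).
    train(records) -> model; score(model, records) -> performance (e.g. AUC).
    Returns {day: (fixed_score, retrained_score)} for days with test data."""
    # Fixed model: trained once on everything before the first test day.
    fixed_model = train([r for r in records if r["day"] < first_test_day])
    results = {}
    day = first_test_day
    while day <= last_test_day:
        test = [r for r in records if r["day"] == day]
        if test:
            # Retrained model: uses all data available before this day.
            daily_model = train([r for r in records if r["day"] < day])
            results[day] = (score(fixed_model, test), score(daily_model, test))
        day += timedelta(days=1)
    return results
```

Each day’s test patients are never part of that day’s training set, so both schedules give out-of-sample estimates.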
We also tested the predictive value of the SOFA score and the CURB-65 score for pneumonia severity, as well as a linear regression model, termed SOFA+, that uses the SOFA score, age, and D-dimer >1 μg/mL, based on a recently published study.10 All models were tested on each day from March 30, 2020, to April 12, 2020, using ROC curves and AUC, and statistical differences in predictive performance were tested using the nonparametric DeLong method.16,17 All analyses were performed in MATLAB 2019b (MathWorks Inc.).
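For readers unfamiliar with the DeLong method, the sketch below compares two correlated AUCs computed on the same patients using the standard structural-component variance estimate and a two-sided normal approximation. It is a plain O(m·n) illustration, not the MATLAB code used in the study, and `delong_test` is a hypothetical name.

```python
import math


def _psi(x, y):
    """Kernel: 1 if the positive case outscores the negative, 1/2 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for the difference between two correlated AUCs
    on the same patients (label 1 = positive class).
    Returns (auc_a, auc_b, z, p_value)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    v10, v01, aucs = [], [], []
    for s in (scores_a, scores_b):
        # Structural components for each positive and each negative case.
        comp_pos = [sum(_psi(s[i], s[j]) for j in neg) / n for i in pos]
        comp_neg = [sum(_psi(s[i], s[j]) for i in pos) / m for j in neg]
        v10.append(comp_pos)
        v01.append(comp_neg)
        aucs.append(sum(comp_pos) / m)

    def cov(u, v):
        mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
        return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)

    # Var(AUC_a - AUC_b) = [1, -1] (S10 / m + S01 / n) [1, -1]^T
    var_diff = (
        (cov(v10[0], v10[0]) + cov(v10[1], v10[1]) - 2 * cov(v10[0], v10[1])) / m
        + (cov(v01[0], v01[0]) + cov(v01[1], v01[1]) - 2 * cov(v01[0], v01[1])) / n
    )
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return aucs[0], aucs[1], z, p_value
```

Because both scores are evaluated on the same patients, the covariance terms matter: ignoring them would overstate the variance and make real differences harder to detect.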
Patient and Public Involvement
No patients or members of the public were involved in the design or conduct of this study. The authors, as listed, developed the research question and study design and determined the outcome measures. Patients and members of the public will not be involved in choosing the methods for dissemination of the study results. Results of the study will be disseminated through article publication and a public website featuring the NOCOS calculator.
Results
Between March 1, 2020, and April 12, 2020, 1,185 of the 5,233 patients admitted with COVID-19 died in the hospital (Table 1). As reported previously,9 those who died were more frequently older, white, non-Hispanic, and male, with a higher comorbidity burden, including coronary artery disease, diabetes mellitus, hypertension, heart failure, and kidney disease. With lower diastolic blood pressure, faster respiratory rate, and lower oxygen saturation, they were generally more acutely ill on arrival at the emergency department (based on ESI score). Almost all initial laboratory values differed significantly between survivors and non-survivors (Table 1), although many non-routine laboratory tests were not available for all patients. While length of stay did not differ between the groups, patients who died were far more likely to have required mechanical ventilation.
The NOCOS calculator was built after optimizing the L1 regularization parameter λ on the basis of out-of-sample AUCs, with multivariate LASSO regression selecting 7 of the 85 inputs available in the emergency department as the best predictors of survival upon hospitalization: patient age, serum blood urea nitrogen (BUN), ESI, red cell distribution width (RCDW), absolute neutrophil count, serum bicarbonate, and glucose. The fixed NOCOS calculator was trained using all cases hospitalized until March 30, 2020, while the daily retrained NOCOS calculator was retrained every day and tested using data only from the following day. Both versions were compared with the clinical benchmarks SOFA and CURB-65, as well as with a variation of the SOFA score (SOFA+).10 Based on ROC curves and AUC values, the daily retrained NOCOS calculator (AUC 0.832) outperformed all other calculators, with the fixed NOCOS calculator and SOFA+ following very closely (AUC 0.825 and 0.830, respectively) (Figure 1). CURB-65 and SOFA had significantly lower predictive performance than these three calculators (AUC 0.739 and 0.732, respectively; DeLong’s method, p<0.05 compared with the daily retrained NOCOS calculator), and they could not always be calculated because some patients had missing values.
Operating points for the survival predictions of each calculator can be established by choosing thresholds on the probability scores. We chose three operating points for each calculator and report the numbers of true positives, true negatives, false positives, and false negatives, as well as the Positive Predictive Value (PPV) and Negative Predictive Value (NPV), for each (Table 2). At every operating point, the daily retrained NOCOS calculator outperformed all other calculators.
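A minimal sketch of how one operating point turns probability scores into the counts reported in Table 2, assuming "positive" means predicted survival and that a probability at or above the threshold counts as positive. The function name and the tie-handling convention are assumptions.

```python
def operating_point(probabilities, survived, threshold):
    """Predict 'survives' when the probability is >= threshold and tabulate
    confusion counts, PPV, and NPV (positive class = survived)."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, survived):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif not y:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "PPV": ppv, "NPV": npv}
```

Raising the threshold generally trades fewer false positives (higher PPV) for more false negatives, which is the kind of adjustment left to local teams.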
The NOCOS calculator also demonstrated stability, both in its predictive ability and in the selection of predictors, across multiple days. As shown in Figure 2, panel A, the NOCOS calculator maintained an AUC roughly between 0.8 and 0.9 from March 30, 2020, through April 12, 2020, whether it was trained once or retrained daily. The daily retrained NOCOS calculator was significantly more predictive than CURB-65 on 10 of the 14 days, than SOFA on 7 of the 14 days, and than the fixed NOCOS calculator on 5 of the 14 days (DeLong’s method, p<0.05). It was not significantly more predictive than SOFA+ on any of the days.
Figure 2, panel B shows the coefficients of the daily retrained NOCOS calculator chosen by LASSO regularization across daily retrainings, together with the number of days on which each was selected. The final 7 parameters were patient age, ESI, BUN, serum bicarbonate, absolute neutrophil count, RCDW, and serum glucose. Five of these 7 predictors were chosen on at least 13 of the 14 days; the exceptions were serum bicarbonate and serum glucose, which were each chosen on 6 of the 14 days. Other measurements, such as platelet count, body temperature, serum albumin, oxygen saturation, and estimated glomerular filtration rate (eGFRi), were chosen on fewer days and were not included in the final model. In the latest iteration of the daily retrained NOCOS calculator (trained with data up to April 11, 2020), the negative predictors of survival, in order of their contribution to the probability estimate, are patient age, BUN, RCDW, absolute neutrophil count, and serum bicarbonate (Figure 2, panel C). The positive predictors of survival are ESI (lower scores indicate higher acuity) and serum glucose (Figure 2, panel C).
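The selection counts can be tallied as sketched below, assuming each daily retraining yields a mapping from predictor name to LASSO coefficient. The input format and function name are hypothetical, and the coefficients in any example would be illustrative, not the study’s.

```python
from collections import Counter


def selection_counts(daily_coefficients, eps=0.0):
    """daily_coefficients: one dict per retraining day mapping predictor name
    to its LASSO coefficient. Counts the days on which each predictor was
    selected, i.e. had a coefficient with magnitude greater than eps."""
    counts = Counter()
    for coefs in daily_coefficients:
        counts.update(name for name, b in coefs.items() if abs(b) > eps)
    return counts
```

A predictor selected on nearly every day, such as the five chosen on at least 13 of 14 days, is less likely to reflect day-to-day noise than one selected only occasionally.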
The performance of the NOCOS calculator was also tested using each patient’s latest measurements of the 7 parameters as inputs, rather than only the values obtained in the emergency department. When tested with these up-to-date values, the AUC of the fixed NOCOS calculator increased steadily to values close to 0.91 (Figure 3).
Discussion
In this study, we developed a simple and practical survival calculator for hospitalized COVID-19 patients that uses only discrete, objective data values acquired during the patient’s initial time in the emergency department. Our Northwell COVID-19 Survival (NOCOS) calculator, modeled on over 5,200 COVID-19–positive patients, had an AUC of 0.83 and outperformed well-established risk calculators, including CURB-65 and SOFA, while performing similarly to a COVID-19–specific enhancement of SOFA.10 Because the calculator was designed to be parsimonious and easy to use, its predicted survival probability can assist clinical decision-making and ease the burden on physicians in this unprecedented situation. The output of this calculator (freely available at https://feinstein.northwell.edu/nocos) is an easily comprehensible probability that can be communicated to physicians, nurses, families, and administrative teams.
The variables included in our model, which were selected by LASSO regularization, all have clinical face validity. It is well established for many diseases, and particularly for COVID-19, that older age confers increased mortality risk.10 ESI, a well-established emergency department triage tool, is an early indicator of presenting severity of illness. The abnormal laboratory values included in our model have all been independently associated with poor outcomes in other populations,18,19 and an elevated BUN (in particular, as a marker of kidney dysfunction) was recently associated with increased mortality risk in COVID-19 patients.20 Elevated RCDW values, which often suggest chronic disease states and inflammation,21,22 may also reflect recently reported effects of COVID-19 on iron displacement from the heme molecule, leading to impaired red blood cells as well as free radical formation and toxic effects on the lungs.23 These findings may point to therapeutic approaches to reduce sudden decompensation, organ failure, and death in these patients.
A major strength of this work is the development of a powerful predictive model that is typically usable by clinicians within 60 minutes of a patient’s initial presentation. Although the calculator performs well with these very early measurements, its predictive performance improves when the measurements are updated throughout the patient’s hospitalization (Figure 3), showing that, as expected, the most accurate predictions come from the most up-to-date values of the seven measures. We also restricted inputs to commonly collected, discrete, and objective data. This simplicity and reliance on quantitative measurements make the calculator generalizable and easy to deploy for all interested stakeholders, including front-line providers and hospital administrators organizing the distribution of scarce resources. While we present the calculator output as a probability score, a specific operating point can also be chosen to provide a binary outcome prediction. Choosing an operating point is left to stakeholders; local clinical teams can adjust thresholds toward a more stringent or more risk-averse solution (Table 2), based on the rapidly changing needs during this pandemic.
Estimates of survival or mortality from clinical measurements can range from simple algorithmic rules and thresholds to linear regression models and more complex machine learning (ML) algorithms. To augment medical decision-making, approaches ranging from single-parameter rules to advanced predictive modeling have been applied to forecast decompensation, mortality, and survival, among other clinical outcomes.24-26 Early work with small cohorts of COVID-19 patients (Yan et al., 2020; Jiang et al., 2020) has produced models that use some clinical characteristics to predict severe cases.27,28 However, these studies included small numbers of patients and relied on qualitative and subjective variables that are prone to mislabeling and are not always readily available. Our approach benefits from a simple, straightforward formula based on typical measurements acquired from emergency department patients; a patient base at least 20-fold larger than those of previous studies; and data-driven feature selection based on predictive value through LASSO regularization.
The ongoing COVID-19 global health crisis creates a need for robust tools to aid complex clinical decision-making. Well-known clinical calculators such as SOFA or CURB-65 show ostensible promise; however, they are limited both in accuracy and in the ease of collecting the measurements needed to compute them. Input variables such as confusion (for CURB-65) and the Glasgow Coma Scale (for SOFA) are ambiguous, hard to measure, and frequently unavailable. Similar difficulties arise with the novel combination of SOFA score, age, and D-dimer values.10 In our study, 78.3% of patients were missing a D-dimer measurement in the emergency department. In contrast, the NOCOS calculator is based on commonly collected laboratory results and the guideline-based ESI triage acuity score.
Moreover, the calculator is trained and tested on the patient cohort of interest and can account for the evolving nature of this pandemic by daily or more frequent updates and model retraining.29
The proposed calculator has some limitations. It was designed to be linear, with only essential predictors included, and non-linear or convolutional/recurrent models may provide improved performance. Moreover, the model does not integrate additional, more complex information such as X-ray or CT scan reads. Because of the retrospective study design, not all laboratory tests (including lactate dehydrogenase, interleukin-6, and serum ferritin) were performed on all patients, so the predictive value of these variables could not be adequately assessed. The data were automatically extracted from the EHR database, and some patient-level details could not be extracted. However, the NOCOS calculator was designed to leverage easily obtainable data, obviating the need to sift through charts to obtain a prediction.
Given the complexity of data acquisition and model development in the midst of a pandemic, we prioritized the creation and rapid dissemination of a more straightforward, clinically relevant implementation. While the model validation included only patients admitted to hospitals within the New York metropolitan area, we believe it will generalize well given the diverse demographic composition of the region and the Northwell Health patient population.
In an unprecedented way, the severity of the SARS-CoV-2 pandemic has strained hospitals’ resources, including space, materials, and front-line healthcare workers. Providers are often forced to make important clinical decisions under immense time pressure and with limited information. Tools that can aid providers and patients in these circumstances are timely and important. The Northwell COVID-19 Survival calculator addresses a clinical need and provides early information to physicians who make a range of difficult but critical decisions every day.
Data Availability
The data that support the findings of this study are available on request from COVID19@northwell.edu. The data are not publicly available because sharing them could compromise the privacy of research participants.
Contributor and guarantor information
The lead authors contributed to the planning, conduct, and reporting of the work described in the article, and the Northwell COVID-19 Research Consortium contributed to the article’s development prior to submission. Theodoros P. Zanos is responsible for the overall content as guarantor. The guarantor accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish. The Corresponding Author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Copyright/license for publication
The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide licence to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future), to i) publish, reproduce, distribute, display and store the Contribution, ii) translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or, abstracts of the Contribution, iii) create any other derivative work(s) based on the Contribution, iv) to exploit all subsidiary rights in the Contribution, v) the inclusion of electronic links from the Contribution to third party material where-ever it may be located; and, vi) licence any third party to do any or all of the above.
Competing interests
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Data sharing statement
The data that support the findings of this study are available on reasonable request from COVID19@northwell.edu. The data are not publicly available because sharing them could compromise the privacy of research participants.
Ethics approval
This study was approved by the Northwell Health Institutional Review Board.
Transparency statement
The manuscript’s guarantor affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned have been explained.
Role of the funding sources
This work was supported by grants R24AG064191 from the National Institute on Aging of the National Institutes of Health and R01LM012836 from the National Library of Medicine of the National Institutes of Health. The views expressed in this paper are those of the authors and do not represent the views of the National Institutes of Health, the United States Department of Health and Human Services, or any other government entity.
Summary box
Section 1: What is already known on this topic
While clinical characteristics and a range of mortality risk factors of coronavirus disease 2019 (COVID-19) patients have been reported, a practical clinical survival calculator based on data from U.S. patients has not yet been introduced.
Such a tool would provide timely and valuable guidance in clinical care decision-making during this global pandemic.
Section 2: What this study adds
We present a practical estimate of survival probability that outperforms other general risk models.
The seven early predictors of in-hospital survival can help clinicians identify patients with increased probabilities of survival and provide critical decision support as COVID-19 spreads across the U.S.
Acknowledgments
We acknowledge and honor all of our Northwell team members who consistently put themselves in harm’s way during the COVID-19 pandemic. We dedicate this article to them; their vital contributions to knowledge about COVID-19 and their sacrifices on behalf of patients made it possible.