Supplementing the National Early Warning Score (NEWS2) for anticipating early deterioration among patients with COVID-19 infection
==================================================================================================================================

* Ewan Carr
* Rebecca Bendayan
* Kevin O’Gallagher
* Daniel Bean
* Andrew Pickles
* Daniel Stahl
* Rosita Zakeri
* Thomas Searle
* Anthony Shek
* Zeljko Kraljevic
* James T. Teo
* Ajay M. Shah
* Richard JB Dobson

## Abstract

**Importance** An early minimally symptomatic phase is often followed by deterioration in patients with COVID-19 infection. This study shows that the addition of age and a minimal set of common blood tests taken in patients on admission to hospital significantly improves the National Early Warning Score (NEWS2) for risk-stratification of severe COVID disease.

**Objective** To supplement the NEWS2 score with a small number of easily obtained additional demographic, physiological and blood variables indicative of severity of COVID-19 infection.

**Design** Retrospective observational cohort with internal and temporal held-out external validation.

**Setting** Acute secondary care.

**Participants** 708 patients admitted to an acute multi-site UK NHS hospital with confirmed COVID-19 disease from 1st March to 5th April 2020.

**Intervention** Not applicable.

**Main outcome and measures** The primary outcome was patient status at 14 days after symptom onset categorised as severe disease (WHO-COVID-19 Outcomes Scales 6-8: i.e. transferred to intensive care unit or death). 218 of the 708 patients reached the primary end point. A range of physiological and blood biomarkers were assessed for their association with the primary outcome. Adjustments included age, gender, ethnicity and comorbidities (hypertension, diabetes, heart, respiratory and kidney diseases).

**Results** NEWS2 total score was a weak predictor for severity of COVID-19 infection at 14 days (internally validated AUC = 0.628).
The addition of age and common blood tests (CRP, neutrophil count, estimated GFR and albumin) provided substantial improvements to a risk stratification model but performance was still only moderate (AUC = 0.75). Common comorbidities hypertension, diabetes, heart, respiratory and kidney diseases have minor additional predictive value. **Conclusions and relevance** Adding age and a minimal set of common blood parameters to NEWS2 improves the risk stratification of patients likely to develop severe COVID-19 outcomes. The addition of a few common parameters is likely to be much easier to implement in a short time-scale than a novel risk-scoring system. ## Introduction While approximately 80% of individuals with COVID-19 infection have mild or no symptoms1, some develop severe COVID-19 disease requiring hospital admission. As of 23rd April 2020, there have been >2.5 million confirmed cases worldwide2. Within the subset of those requiring hospitalisation, early identification of those who deteriorate and require transfer to an intensive care unit (ICU) for organ support or may die is invaluable12. Currently available risk scores for deterioration of acutely ill patients include (1) widely-used generic ward-based risk indices such as the National Early Warning Score (NEWS2)3 or modified sequential organ failure assessment (mSOFA)4; and (2) the pneumonia-specific risk index, CURB-655 which usefully capture a combination of physiological observations with limited blood markers and comorbidities. The NEWS2 is a summary score of six physiological parameters or ‘vital signs’ (respiratory rate, oxygen saturation, systolic blood pressure, heart rate, level of consciousness, temperature and supplemental oxygen dependency), used to identify patients at risk of early clinical deterioration in the UK NHS hospitals6,7. 
The physiological parameters assessed in the NEWS2 score - particularly patient temperature, oxygen saturations and the supplemental oxygen dependency - have been associated with COVID-19 outcomes1; however, little is known about their predictive value for the severity of COVID-19 disease. Additionally, a number of COVID-19-specific risk indices are being developed8–10 as well as unvalidated online calculators11 but generalisability is not yet known10. A Chinese study has suggested a modified version of NEWS2 with addition of age only12 but without any data on performance. With near universal usage of NEWS2 in UK NHS Trusts since March 201913, minor adaptation to NEWS2 would be relatively easy to implement. As the SARS-Cov2 pandemic has progressed, evidence has emerged regarding potentially useful blood biomarkers1,14–17. Although most of these early reports contain data from small numbers of patients, a number of markers have been found to be associated with severity. These include neutrophilia and lymphopenia, particularly in older adults9,16,18,19, neutrophil-to-lymphocyte ratio20, raised C-Reactive Protein (CRP) and lymphocyte-to-CRP ratio20, markers of liver and cardiac injury such as alanine aminotransferase (ALT), aspartate aminotransferase (AST) and cardiac troponin21 and elevated D-dimers, ferritin and fibrinogen2,5,7. Furthermore, plasma levels of cytokines such as IL-6 have been found to be higher in COVID-19 patients compared to controls1. Our aim is to understand the performance of NEWS2 and identify a supplemental combination of simple clinical and blood biomarkers routinely measured in hospitals to supplement the NEWS2 score to improve prediction of a severe disease outcome at 14 days from symptom onset. To reach this aim, our specific objectives were: 1. 
To explore independent associations of routinely measured physiological and blood parameters (including NEWS2 parameters) at or near hospital admission with disease severity (i.e., ICU admission or death), adjusting for socio-demographics and comorbidities; 2. To examine which minimal combination of these potential determinants of disease severity (physiological and blood parameters, sociodemographics and comorbidities) are the best predictors of disease severity at 14 days since symptom onset; and 3. To compare the predictive value of the resulting model with a model based on the NEWS2 total score alone.

## Methods

### Patients

The study cohort was defined as all adult inpatients testing positive for SARS-Cov2 by reverse transcription polymerase chain reaction (RT-PCR) between 1st March and 5th April 2020 at a multi-site acute NHS hospital in South East London (UK). The catchment area of King’s College Hospital NHS Foundation Trust includes the most severely affected part of the UK during the current pandemic. All patients included in the study had symptoms consistent with COVID-19 disease (e.g. cough, fever, dyspnoea, myalgia, delirium). We excluded subjects who were seen in the emergency department but not admitted. For purposes of temporal external validation, detailed below, patients were split into training and temporal external validation samples: those who tested positive before 31st March 2020 were assigned to training, and those who tested positive on or after 31st March 2020 were assigned to validation.

This project operated under London South East Research Ethics Committee (reference 18/LO/2048) approval granted to the King’s Electronic Records Research Interface (KERRI); specific work on COVID-19 research was reviewed with expert patient input on a virtual committee with Caldicott Guardian oversight.
### Data Processing The data (demographics, emergency department letters, discharge summaries, clinical notes, lab results, vital signs) were retrieved and analyzed in near real-time from the structured and unstructured components of the electronic health record (EHR) using a variety of natural language processing (NLP) informatics tools belonging to the CogStack ecosystem22, namely MedCAT23 and MedCATTrainer24. The CogStack NLP pipeline captures negation, synonyms, and acronyms for medical SNOMED-CT concepts as well as surrounding linguistic context using deep learning and long short-term memory networks. MedCAT produces unsupervised annotations for all SNOMED-CT concepts under parent terms Clinical Finding, Disorder, Organism, and Event with disambiguation, pre-trained on MIMIC-III25. The annotated SNOMED-CT terms are summarised in Supplementary Table 1. Starting from our previous model26, further supervised training improved detection of annotations and meta-annotations such as experiencer (is the concept annotated experienced by the patient or other), negation (is the concept annotated negated or not) and temporality (is the concept annotated in the past or present) with MedCATTrainer. Meta-annotations for hypothetical, historical and experiencer were merged into “Irrelevant” allowing us to exclude any mentions of a concept that do not directly relate to the patient currently. Performance of the MedCAT NLP pipeline for disorders mentioned in the text was evaluated on 4343 annotations in 146 clinical documents by a clinician (JT). F1 scores, precision, and recall are presented in Supplementary Table 2. ### Measures #### Outcome The primary outcome was patient status at 14 days after symptom onset, or admission to hospital where symptom onset was missing, categorised as transfer to ICU/death (WHO-COVID-19 Outcomes Scales 6-8) vs. not ICU/death (Scales 3-5). The WHO-COVID-19 Outcome Scales 6-7 incorporate admission to an ICU while Outcome Scale 8 indicates death. 
Date of symptom onset, date of ICU transfer and date of death were ascertained and verified manually by a clinician.

#### Blood parameters

We focused on biomarkers that were routinely obtained at or shortly after admission and were therefore available for the vast majority of patients. These comprised: albumin (g/L), alanine aminotransferase (ALT; IU/L), creatinine (µmol/L), C-reactive protein (CRP; mg/L), estimated Glomerular Filtration Rate (eGFR; mL/min), haemoglobin (g/L), lymphocyte count (× 10⁹/L), neutrophil count (× 10⁹/L), and platelet count (PLT; × 10⁹/L). We also derived the neutrophil-to-lymphocyte ratio (NLR) and the lymphocyte-to-CRP ratio13. Troponin-T (ng/L) and ferritin (µg/L) were included, although these measures were only available for a subset of participants. D-dimers and HbA1c were excluded since they were measured in very few patients at admission and insufficient samples were available for analysis.

#### Physiological parameters

We included the six physiological parameters that form the basis of the NEWS2 score, namely, respiratory rate (breaths per minute), oxygen saturation (%), systolic blood pressure (mmHg), heart rate (beats/min), temperature (°C), and consciousness (measured by Glasgow Coma Scale (GCS) total score). All were measured at or shortly after admission. We assessed these parameters individually as well as the NEWS2 total score. Diastolic blood pressure, which is not part of the NEWS2 score, was also included in the analyses.

#### Demographics and comorbidities

Age, sex, ethnicity and comorbidities were considered. Where ethnicity data were available, ethnicity was categorised as Caucasian vs. BAME (Black, Asian and minority ethnic). For supplementary models adjusting for ethnicity, patients with ethnicity reported as ‘unknown/mixed/other’ were excluded. We included binary measures (present vs.
not present) of relevant comorbid chronic health conditions derived from the NLP pipeline described above: hypertension, diabetes, heart disease (heart failure and ischemic heart disease), respiratory disease (asthma and chronic obstructive pulmonary disease, COPD) and chronic kidney disease. ### Statistical analyses Preliminary descriptive and exploratory analyses were performed. To address our first objective – exploring independent associations of physiological and blood parameters with 14-day death/ICU – we used penalised maximum likelihood logistic regression which reduces bias due to small sample size27. Each parameter was tested independently, adjusted for age and sex (Model 1) and then additionally adjusted for comorbidities (Model 2). Parameters exhibiting skewed distributions were transformed before modelling with logarithmic or square-root transformations. All parameters were scaled (mean = 0, standard deviation = 1) to improve interpretability. Outlying high values for some blood parameters were retained after individual examination by clinicians who ascertained their plausibility. We used the maximal available sample when testing each parameter. Given the number of tests conducted, *P*-values were adjusted using the Benjamini-Hochberg procedure to keep the False discovery rate at 5%28. These models were conducted with R 3.623 using the logistf24 package. To address our second and third objectives – which combination of parameters performed best in predicting the 14-day outcome over and above NEWS2 – we estimated models combining all parameters using regularized logistic regression with a LASSO (Least Absolute Shrinkage and Selection Operator) estimator which shrinks parameters according to their variance, reduces overfitting and enables automatic variable selection29. The optimal degree of regularization was determined by identifying a tuning parameter λ using cross-validation30. 
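To make the shrinkage and variable-selection behaviour of the LASSO concrete, here is a minimal pure-Python sketch of L1-penalised logistic regression fitted by proximal gradient descent (soft-thresholding). It is an illustration under stated assumptions, not the study's implementation: the analyses were run with Scikit-Learn, whose solver differs, and the function names, step size, toy data and λ values below are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """L1-penalised logistic regression via proximal gradient descent.

    X: rows of standardised features; y: 0/1 outcomes; lam: penalty strength.
    The intercept is left unpenalised. Hypothetical helper for illustration.
    """
    n, p = len(X), len(X[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Predicted probabilities under the current coefficients.
        probs = [
            1.0 / (1.0 + math.exp(-(intercept + sum(w * x for w, x in zip(weights, row)))))
            for row in X
        ]
        errors = [pr - yi for pr, yi in zip(probs, y)]
        # Gradient step on the mean log-loss, then the L1 proximal step.
        intercept -= step * sum(errors) / n
        for j in range(p):
            grad_j = sum(e * row[j] for e, row in zip(errors, X)) / n
            weights[j] = soft_threshold(weights[j] - step * grad_j, step * lam)
    return intercept, weights


# Toy data (invented): feature 0 carries signal, feature 1 is noise.
X = [[-1.5, 0.5], [-1.0, -0.5], [-0.5, 0.5], [-0.5, -0.5],
     [0.5, -0.5], [0.5, 0.5], [1.0, 0.5], [1.5, -0.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.0, 0.05, 1.0):
    _, w = fit_lasso_logistic(X, y, lam)
    print(f"lambda={lam}: weights={w}")
```

Increasing `lam` drives more coefficients to exactly zero, which is what allows a cross-validated choice of λ to act as automatic variable selection; in practice λ would be tuned over a grid rather than fixed by hand as in this toy run.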
LASSO regression provides a sparse, interpretable model, which allows us to predict individual risk scores (i.e. probability of severe outcome). Starting from an initial model with NEWS2 total score only, sets of features were added in order of (i) age and sex; (ii) blood and physiological parameters; (iii) comorbid conditions. A final model was estimated using NEWS2 total score alongside the top five most influential features from previous models.

To estimate the predictive performance of our model on new unseen cases of the same underlying population, we performed internal nested cross-validation (10 folds and 20 repeats for the inner loop; 10 folds and 100 repeats for the outer loop). Overall discrimination was assessed based on the area under the curve (AUC). All continuous features were scaled (mean = 0, standard deviation = 1). Missing feature information was imputed (after scaling) using k-Nearest Neighbours imputation (k=5). Scaling and kNN imputation were incorporated within the model development and selection process to avoid data leakage which would otherwise result in optimistic performance measures31.

To assess whether a more complex machine learning estimator would improve predictive performance, we repeated this set of models using gradient boosted trees implemented in the XGBoost library32. Procedures for internally validating these models were equivalent to those described above for regularized logistic regression except the imputation step was omitted due to the ability of XGBoost to handle missing data.

The predictive performance of the derived regularized logistic regression model was then evaluated by temporal external validation33 with a hold-out sample of 256 patients who were admitted to hospital after the training sample (see Supplementary Figure 1). This involved estimating the original model exactly as presented, including scaling and imputation models derived in the training data set.
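The leakage point above can be illustrated with a short sketch: the scaling parameters and the kNN reference set are derived from the training rows only and then applied unchanged to hold-out rows. This is a simplified stdlib-only illustration, not the study's Scikit-Learn pipeline; the helper names, the distance definition (mean squared difference over jointly observed features) and the toy values are assumptions made for this example.

```python
import math


def fit_scaler(rows):
    """Per-column (mean, sd) from training rows, ignoring missing values (None)."""
    params = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
        params.append((mean, sd))
    return params


def scale(rows, params):
    """Standardise rows with previously fitted parameters; None stays None."""
    return [
        [None if v is None else (v - m) / s for v, (m, s) in zip(r, params)]
        for r in rows
    ]


def knn_impute(row, reference, k=5):
    """Fill each missing value with the mean of that feature among the k
    nearest reference rows, measured on the features both rows have observed."""
    filled = list(row)
    for j, value in enumerate(row):
        if value is not None:
            continue
        candidates = []
        for ref in reference:
            if ref[j] is None:
                continue
            shared = [(a - b) ** 2 for a, b in zip(row, ref)
                      if a is not None and b is not None]
            if shared:
                candidates.append((sum(shared) / len(shared), ref[j]))
        candidates.sort(key=lambda pair: pair[0])
        nearest = [v for _, v in candidates[:k]]
        # Fall back to 0.0 (the training mean after scaling) if no neighbour exists.
        filled[j] = sum(nearest) / len(nearest) if nearest else 0.0
    return filled


# Invented values: columns are (age, CRP); None marks a missing measurement.
train = [[50.0, 10.0], [60.0, 20.0], [70.0, 30.0], [80.0, None]]
holdout = [[62.0, None]]

params = fit_scaler(train)               # fitted on training data only
train_scaled = scale(train, params)
holdout_scaled = scale(holdout, params)  # reuses the training parameters
holdout_imputed = [knn_impute(r, train_scaled, k=2) for r in holdout_scaled]
print(holdout_imputed)
```

Within cross-validation the same pattern would be repeated inside every fold, so that no statistic computed from a validation fold influences how that fold is scaled or imputed.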
Discrimination performance was assessed using AUC, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Model calibration was assessed using a calibration plot (model predicted probability vs. true probability). These models were estimated in Python 3.634 using NumPy35 and Scikit-Learn36.

Sensitivity analyses were performed to account for potential demographic variability. Recent evidence suggests sex differences, with men more likely to experience worse outcomes16. Therefore, in separate models, we tested interactions between each physiological and blood parameter and sex using likelihood-ratio tests (comparing a null model with the main effects only vs. a model additionally including the interaction term). In addition, we replicated all models with adjustment for ethnicity in the subset of individuals with available data for ethnicity (n=285 in training sample).

## Results

The initial inpatient cohort comprised 452 inpatients testing positive for COVID-19 of whom 159 (35%) were transferred to ICU or died (COVID-19 WHO Score 6-8) within 14 days of symptom onset. Table 1 describes the clinical characteristics of the cohort: the mean age was 67 years (standard deviation = 18.5); 54% (n=248) were male; 42% (n=120) were categorised as BAME. Patients with a more severe outcome were significantly older (71 vs. 65 years; p = 0.004) but there was no evidence of differences by sex or ethnicity. There were some differences between groups in the prevalence of comorbidities but these did not reach statistical significance after multiple testing correction. For example, compared to patients with less severe outcomes, those who transferred to ICU or died had higher rates of hypertension (60% vs. 50%; p = 0.11), diabetes (38% vs. 32%; p = 0.33), heart failure (16% vs. 11%; p = 0.33) and chronic kidney disease (24% vs. 16%; p = 0.11). Rates of other comorbidities were similar between the two groups.
There were differences between outcome groups for most blood and physiological parameters. Patients who had transferred to ICU or died within 14 days had, at admission, lower levels of Albumin, ALT, and estimated GFR; and elevated levels of CRP, creatinine, Ferritin, and Neutrophils. Mean NEWS2 total scores were significantly different (3.4 vs 2.1; p < 0.001; corresponding to Cohen’s d of −0.57) in patients who transferred to ICU or died, compared to inpatients experiencing less severe outcomes. View this table: [Table 1:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T1) Table 1: Patient characteristics at hospital admission Logistic regression models were used to assess independent associations between each physiological and blood parameter and disease severity measured as transfer to ICU or death (Table 2). Individuals were more likely to have transferred to ICU/died within 14 days of symptom onset if: they had higher CRP, NEWS2 score, heart rate, neutrophils, neutrophil-lymphocyte ratio, respiration rate; or if they had lower lymphocyte/CRP ratios, eGFR, creatinine, and oxygen saturation. These associations remained after adjustment for age, sex and comorbidities. There was no evidence of differences by sex (results not presented) and findings were consistent when additionally adjusting for ethnicity in secondary analyses using the subset of individuals with ethnicity data (Supplementary Table 3). View this table: [Table 2:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T2) Table 2: Logistic regression models for each blood and physiological measure tested separately, sorted by effect size ### Combining physiological and blood parameters to assess ability to improve on NEWS2 in predicting 14-day outcome To identify which minimal set of parameters were best able to improve on NEWS2 in predicting the 14-day outcome (ICU/death vs. 
not ICU/death), we combined all predictors in a single logistic regression model using LASSO regularisation. Internally validated predictive performance based on the area under the ROC curve (AUC) is presented in Table 3 for different feature sets. NEWS2 shows poor discrimination with an AUC of 0.628. Adding age and sex to a baseline model of NEWS2 total score only increased the AUC by 0.025 to 0.653 (+/- 2SD range: 0.639, 0.667). Further adding in all other blood and physiological parameters (except NEWS2) increased the AUC further by 0.089, to 0.742 (+/- 2SD: 0.726, 0.758). Additionally including comorbidities in this model did not improve performance. A final model was estimated including NEWS2 and the top five most important features taken from Model 4. This simpler model resulted in a slightly larger AUC of 0.751 (+/- 2SD range: 0.737, 0.764) which may indicate some overfitting due to the pre-selection of variables from previous analyses. Results were consistent when repeating these models in the subset of patients with information available on ethnicity (Supplementary Table 5). View this table: [Table 3:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T3) Table 3: Internally validated predictive performance (n=452) *Notes*. AUC based on repeated, nested cross-validation (inner loop: 10-fold, 20 repeats; outer loop = 10-fold, 100 repeats). Missing values imputed at each outer loop with k-Nearest Neighbours (KNN) imputation. Figure 1 summarises feature importances from the LASSO logistic regression models. When adding blood and physiological parameters to NEWS2 (‘NEWS2 + DBP’), 8 features were retained, in order of effect sizes: NEWS2 total score, CRP, neutrophils, estimated GFR, albumin, age, Troponin T, and oxygen saturation. Notably, when additionally considering comorbid conditions (‘NEWS2 + DBPC’), the retained features were similar, and no comorbid conditions were retained. 
This suggests that most of the variance is already captured by the top 5 parameters. ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/04/29/2020.04.24.20078006/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/F1) Figure 1: Feature importances from LASSO logistic regression in training sample (n=452) *Notes*. Feature importances refer to absolute values of standardised coefficients from logistic regression, sorted by effect size in model ‘NEWS2 + DBPC’. Where a feature is labelled on the y-axis, it was entered into the model. Features retained following LASSO regularisation are represented by a coloured bar; the absence of a bar indicates that this feature was omitted during regularisation. When these models were repeated using a more complex estimator (gradient boosted trees, using XGBoost32) the pattern of results was consistent with those from regularized logistic regression (Supplementary Table 5). Namely, the internally validated AUC improved from 0.646 for a model with NEWS2 alone, to 0.722 for a model that additionally included the five parameters: CRP, neutrophils, estimated GFR, albumin, and age. Importantly, while the pattern of results was consistent, a more complex machine learning estimator produced no improvements to predictive performance. Temporal external validation was conducted on a hold-out sample of 256 patients. This sample was similar to the training sample on all parameters (Supplementary Table 6) except the proportion who transferred to ICU or died was lower. Overall, results from the hold-out sample were consistent with those from internal validation. The AUC for NEWS2 alone was 0.700, and this improved to 0.730 when adding all blood and physiological parameters (sensitivity = 0.441; specificity = 0.873). 
The AUC for the simplified final model including NEWS2 and the top five features (CRP, neutrophils, estimated GFR, albumin and age) was similar (AUC = 0.730; sensitivity = 0.458; specificity = 0.873) (Supplementary Table 7). Calibration for these models (Supplementary Figure 2) was acceptable but showed some consistent overestimation of risk probabilities. ## Discussion To our knowledge our study is the first to systematically attempt to improve performance of NEWS2 specifically for COVID-19. We found that the NEWS2 score shows overall poor discrimination with high specificity but poor sensitivity for severe outcomes in COVID-19 infection (transfer to ICU or death). However, its value for risk stratification (especially sensitivity) can be significantly improved by adding age and a small number of additional blood parameters (CRP, neutrophils, estimated GFR and albumin). A number of blood measures previously linked with more severe outcomes – such as lymphocyte and ALT14, or transformations of inflammatory markers such as CRP/lymphocyte or neutrophil/lymphocyte ratio – did not provide additional value to the model over and above the existing features despite being more common in those individuals with more severe outcomes. Moreover, cardiac disease and myocardial injury has been described to be commonly seen in the severe COVID-19 cases in China1,21. In our model, blood Troponin-T, a marker of myocardial injury, had additional salient signal but was only measured in a subset of our cohort at admission, so it was not included in our final model. This would have to be explored further in larger datasets. A systematic review of 10 prediction models for mortality in COVID-19 infection10 found broad similarities with the features retained in our models, particularly regarding CRP and neutrophil levels. 
However, existing prediction models suffer several methodological weaknesses including over-fitting, selection bias, and reliance on cross-sectional data without accounting for censoring. Additionally, almost all existing studies have relied on ethnically homogenous Chinese cohorts and thus may be unrepresentative of other global populations. With regards to pre-existing disease comorbidities (hypertension, diabetes mellitus, heart failure, ischaemic heart disease, COPD, asthma and chronic kidney disease), these were more common in patients with severe outcomes but had minimal contribution to the risk prediction and were not retained in the final model. This was unexpected and suggests potential shared variance between pre-existing health conditions and some of the included blood or physiological markers. Future research should explore further the potential underlying shared mechanisms that can predict deterioration. NEWS2 is a summary score derived from six physiological parameters, including oxygen saturation. While NEWS2 total score was one of the most influential parameters in our models, the oxygen saturation sub-parameter remained influential and was retained following regularisation (i.e. model ‘NEWS + DBP’). This suggests some residual association over and above what is captured by the NEWS2 score between oxygen saturation and more severe outcomes, and reinforces Royal College of Physicians guidance that the NEWS2 score ceilings with respect to respiratory function37. ### Strengths and limitations Our study included data from a large sample of patients admitted to hospital with high rates of the primary outcome (transfer to ICU or death) and considered a large number of potential predictors including demographics, physiological and blood parameters and comorbidities. However, some limitations should be acknowledged. First, there are likely to be other parameters not measured in this study that could improve the risk stratification model substantially (e.g. 
radiological features, other comorbidities or comorbidity load). This could be addressed by future work to introduce additional data modalities, but these were not considered in the present study to avoid limiting the real-world implementation of the risk stratification model; a complex model with many parameters will be harder to implement in clinical practice. Second, we used a 14-day time window from the symptom onset date as this provides a balance between medium-term prognostication and actionable risk stratification at the usual period of deterioration. Longer timeframes may be useful for prognostication but are harder to generalise due to the greater number of factors affecting outcomes, including institutional, regional or national policies. Since NEWS2 score is optimised for very near-term deterioration at 24 hours7, a 14-day window was used as a compromise. Third, while the hold-out sample used for temporal external validation was similar in terms of demographics, blood and physiological parameters, the rate of more severe outcomes differed significantly. Perhaps due to changes in hospital procedures over time, this again suggests the need to validate these models in other hospitals or regions. Finally, while the model was derived from two hospital sites providing a mixed population, this study highlights that initial prediction models still have poor sensitivity and recalibration would be required before implementation as a risk model in clinical practice. Validation across datasets from a wider geographical region will be necessary to ensure generalisability. ## Conclusion In conclusion, this study suggests that the simple addition of a limited number of blood parameters to the existing and widely implemented NEWS2 system can contribute to improved risk stratification among COVID-19 patients. Our model can be easily implemented in clinical practice and predicted risk score probabilities of individual patients are easy to communicate. 
The additional parameters are widely collected on patients at hospital admission, and with near universal usage of NEWS2 in NHS Trusts since March 201913, a minor adaptation to NEWS2 is substantially easier to implement in a variety of health settings than a bespoke risk score. ## Data Availability The data are not publicly available. ## Supplementary Materials ![Supplementary Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/04/29/2020.04.24.20078006/F2.medium.gif) [Supplementary Figure 1:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/F2) Supplementary Figure 1: Timing of 14-day endpoints for training (n=452) and validation (n=256) samples ![Supplementary Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/04/29/2020.04.24.20078006/F3.medium.gif) [Supplementary Figure 2:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/F3) Supplementary Figure 2: Calibration plot from temporal external validation View this table: [Supplementary Table 1:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T4) Supplementary Table 1: SNOMED terms View this table: [Supplementary Table 2:](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T5) Supplementary Table 2: F1, precision and recall for NLP co-morbidity detection MedCATTrainer24 was used to collect manual annotations for 146 clinical documents totalling 4343 annotations. Each co-morbidity is defined using one or more SNOMED terms. Predicted true positive labels (TP), precision (P), recall (R), F1-score (F1) are shown for these aggregated concepts. These results only consider entity detection and not meta annotation. 
[Supplementary Table 3](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T6): Logistic regression models for each blood measure tested separately, adjusted for ethnicity for patients with information on ethnicity

[Supplementary Table 4](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T7): Internally validated predictive performance, adjusted for ethnicity for patients with information on ethnicity (n=285)

[Supplementary Table 5](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T8): Internally validated predictive performance using XGBoost (Gradient Boosting Trees) (n=452). AUC based on repeated, nested cross-validation (inner loop: 10-fold, 20 repeats; outer loop: 10-fold, 100 repeats).

[Supplementary Table 6](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T9): Comparison of training and held-out validation samples

[Supplementary Table 7](http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078006/T10): Temporal external validation, using hold-out sample (n=256)

## Acknowledgments

DMB is funded by a UKRI Innovation Fellowship as part of Health Data Research UK MR/S00310X/1 ([https://www.hdruk.ac.uk](https://www.hdruk.ac.uk)). RB is funded in part by grant MR/R016372/1 for the King’s College London MRC Skills Development Fellowship programme funded by the UK Medical Research Council (MRC, [https://mrc.ukri.org](https://mrc.ukri.org)) and by grant IS-BRC-1215-20018 for the National Institute for Health Research (NIHR, [https://www.nihr.ac.uk](https://www.nihr.ac.uk)) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London.

RJBD is supported by:

1. Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome Trust.
2. The BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No. 116074. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA; it is chaired by DE Grobbee and SD Anker, partnering with 20 academic and industry partners and ESC.
3. The National Institute for Health Research University College London Hospitals Biomedical Research Centre.
4. National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London.

KO’G is supported by an MRC Clinical Training Fellowship. RZ is supported by a King’s Prize Fellowship. AS is supported by a King’s Medical Research Trust studentship. KO is supported by grant MR/R017751/1. AMS is supported by the British Heart Foundation (CH/1999001/11735), the National Institute for Health Research (NIHR) Biomedical Research Centre at Guy’s & St Thomas’ NHS Foundation Trust and King’s College London (IS-BRC-1215-20006), and the Fondation Leducq. AP is partially supported by NIHR NF-SI-0617-10120. This work was supported by the National Institute for Health Research (NIHR) University College London Hospitals (UCLH) Biomedical Research Centre (BRC) Clinical and Research Informatics Unit (CRIU), NIHR Health Informatics Collaborative (HIC), and by awards establishing the Institute of Health Informatics at University College London (UCL).
This work was also supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust. This paper represents independent research part funded by the National Institute for Health Research (NIHR) Biomedical Research Centres at South London and Maudsley NHS Foundation Trust, and Guy’s & St Thomas’ NHS Foundation Trust, both with King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

We would also like to thank all the clinicians managing the patients, the patient experts of the KERRI committee, Professor Irene Higginson, Professor Alastair Baker, Professor Jules Wendon, Dan Persson and Damian Lewsley for their support.

## Footnotes

* \* joint author
* Received April 24, 2020.
* Revision received April 24, 2020.
* Accepted April 29, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet. 2020;395(10229):1054–1062.
doi:10.1016/S0140-6736(20)30566-3
2. WHO. WHO COVID-19 Dashboard. [https://who.sprinklr.com/](https://who.sprinklr.com/). Published 2020. Accessed April 20, 2020.
3. Scott LJ, Redmond NM, Tavaré A, Little H, Srivastava S, Pullyblank A. Association between National Early Warning Scores in primary care and clinical outcomes: an observational study in UK primary and secondary care. Br J Gen Pract. April 2020. doi:10.3399/bjgp20X709337
4. Lambden S, Laterre PF, Levy MM, Francois B. The SOFA score—development, utility and challenges of accurate assessment in clinical trials. Crit Care. 2019;23(1):374. doi:10.1186/s13054-019-2663-7
5. Lim WS, Eerden MM van der, Laing R, et al. Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study. Thorax. 2003;58(5):377–382. doi:10.1136/thorax.58.5.377
6. Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the Assessment of Acute-Illness Severity in the NHS. Updated Report of a Working Party. London: RCP; 2017.
7. Smith GB, Prytherch DR, Meredith P, Schmidt PE, Featherstone PI. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation. 2013;84(4):465–470. doi:10.1016/j.resuscitation.2012.12.016
8. Ji D, Zhang D, Xu J, et al. Prediction for Progression Risk in Patients with COVID-19 Pneumonia: the CALL Score. Clin Infect Dis. doi:10.1093/cid/ciaa414
9. Shi Y, Yu X, Zhao H, Wang H, Zhao R, Sheng J. Host susceptibility to severe COVID-19 and establishment of a host risk score: findings of 487 cases outside Wuhan. Crit Care. 2020;24(1):108. doi:10.1186/s13054-020-2833-7
10. Wynants L, Calster BV, Bonten MMJ, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369.
doi:10.1136/bmj.m1328
11. COVIDAnalytics. [https://www.covidanalytics.io/calculator](https://www.covidanalytics.io/calculator). Accessed April 21, 2020.
12. Liao X, Wang B, Kang Y. Novel coronavirus infection during the 2019–2020 epidemic: preparing intensive care units—the experience in Sichuan Province, China. Intensive Care Med. 2020;46(2):357–360. doi:10.1007/s00134-020-05954-2
13. NHS England » National Early Warning Score (NEWS). [https://www.england.nhs.uk/ourwork/clinical-policy/sepsis/nationalearlywarningscore/](https://www.england.nhs.uk/ourwork/clinical-policy/sepsis/nationalearlywarningscore/). Accessed April 23, 2020.
14. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet. 2020;395(10223):497–506. doi:10.1016/S0140-6736(20)30183-5
15. Li K, Wu J, Wu F, et al. The Clinical and Chest CT Features Associated with Severe and Critical COVID-19 Pneumonia. Invest Radiol. 2020;Publish Ahead of Print.
doi:10.1097/RLI.0000000000000672
16. Xie J, Tong Z, Guan X, Du B, Qiu H. Clinical Characteristics of Patients Who Died of Coronavirus Disease 2019 in China. JAMA Netw Open. 2020;3(4):e205619–e205619. doi:10.1001/jamanetworkopen.2020.5619
17. Zhang J-J, Dong X, Cao Y-Y, et al. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy. February 2020. doi:10.1111/all.14238
18. Ruan Q, Yang K, Wang W, Jiang L, Song J. Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care Med. March 2020. doi:10.1007/s00134-020-05991-x
19. Guan W, Ni Z, Hu Y, et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N Engl J Med. February 2020. doi:10.1056/NEJMoa2002032
20. Lagunas-Rangel FA.
Neutrophil-to-lymphocyte ratio and lymphocyte-to-C-reactive protein ratio in patients with severe coronavirus disease 2019 (COVID-19): A meta-analysis. J Med Virol. April 2020. doi:10.1002/jmv.25819
21. Guo T, Fan Y, Chen M, et al. Cardiovascular Implications of Fatal Outcomes of Patients With Coronavirus Disease 2019 (COVID-19). JAMA Cardiol. March 2020. doi:10.1001/jamacardio.2020.1017
22. Jackson R, Kartoglu I, Stringer C, et al. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak. 2018;18. doi:10.1186/s12911-018-0623-9
23. Kraljevic Z, Bean D, Mascio A, et al. MedCAT -- Medical Concept Annotation Tool. arXiv:1912.10166 [cs, stat]. December 2019. [http://arxiv.org/abs/1912.10166](http://arxiv.org/abs/1912.10166). Accessed April 17, 2020.
24. Searle T, Kraljevic Z, Bendayan R, Bean D, Dobson R. MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation. arXiv:1907.07322 [cs]. July 2019. [http://arxiv.org/abs/1907.07322](http://arxiv.org/abs/1907.07322). Accessed April 17, 2020.
25. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1–9.
doi:10.1038/sdata.2016.35
26. Bean D, Kraljevic Z, Searle T, et al. Treatment with ACE-inhibitors is associated with less severe disease with SARS-Covid-19 infection in a multi-site UK acute Hospital Trust. medRxiv. April 2020:2020.04.07.20056788. doi:10.1101/2020.04.07.20056788
27. Firth D. Bias Reduction of Maximum Likelihood Estimates. Biometrika. 1993;80(1):27–38. doi:10.2307/2336755
28. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate - a Practical and Powerful Approach. J R Stat Soc Ser B-Methodol. 1995;57(1):289–300.
29. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267–288. doi:10.1111/j.2517-6161.1996.tb02080.x
30. Hastie T, Tibshirani R, Wainwright M. Statistical Learning with Sparsity: The Lasso and Generalizations. New York: CRC Press; 2015.
31. Kuhn M, Johnson K. Applied Predictive Modeling. Vol 26. Springer; 2013.
32. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
KDD ’16. New York, NY, USA: ACM; 2016:785–794. doi:10.1145/2939672.2939785
33. Steyerberg E. Clinical Prediction Models. Second Edition. Cham, Switzerland: Springer; 2019.
34. Van Rossum G, Drake FL. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace; 2009.
35. Oliphant T. NumPy: A guide to NumPy. 2006. [http://www.numpy.org/](http://www.numpy.org/).
36. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
37. NEWS2 and deterioration in COVID-19. RCP London. [https://www.rcplondon.ac.uk/news/news2-and-deterioration-covid-19](https://www.rcplondon.ac.uk/news/news2-and-deterioration-covid-19). Published April 14, 2020. Accessed April 24, 2020.