Abstract
Background Accurate risk prediction of clinical outcome would usefully inform clinical decisions and intervention targeting in COVID-19. The aim of this study was to derive and validate risk prediction models for poor outcome and death in adult inpatients with COVID-19.
Methods Model derivation using data from Wuhan, China used logistic regression with death and poor outcome (death or severe disease) as outcomes. Predictors were demographic, comorbidity, symptom and laboratory test variables. The best performing models were externally validated in data from London, UK.
Findings 4.3% of the derivation cohort (n=775) died and 9.7% had a poor outcome, compared to 34.1% and 42.9% of the validation cohort (n=226). In derivation, prediction models based on age, sex, neutrophil count, lymphocyte count, platelet count, C-reactive protein and creatinine had excellent discrimination (death c-index=0.91, poor outcome c-index=0.88), with good-to-excellent calibration. Using two cut-offs to define low, high and very-high risk groups, derivation patients were stratified in groups with observed death rates of 0.34%, 15.0% and 28.3% and poor outcome rates 0.63%, 8.9% and 58.5%. External validation discrimination was good (c-index death=0.74, poor outcome=0.72) as was calibration. However, observed rates of death were 16.5%, 42.9% and 58.4% and poor outcome 26.3%, 28.4% and 64.8% in predicted low, high and very-high risk groups.
Interpretation Our prediction model using demography and routinely-available laboratory tests performed very well in internal validation in the lower-risk derivation population, but less well in the much higher-risk external validation population. Further external validation is needed. Collaboration to create larger derivation datasets, and to rapidly externally validate all proposed prediction models in a range of populations is needed, before routine implementation of any risk prediction tool in clinical care.
Funding MRC, Wellcome Trust, HDR-UK, LifeArc, participating hospitals, NNSFC, National Key R&D Program, Pudong Health and Family Planning Commission
Evidence before this study Several prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay in COVID-19 have been published.1 Commonly reported predictors of severe prognosis in patients with COVID-19 include age, sex, computed tomography scan features, C-reactive protein (CRP), lactic dehydrogenase, and lymphocyte count. Symptoms (notably dyspnoea) and comorbidities (e.g. chronic lung disease, cardiovascular disease and hypertension) are also reported to have associations with poor prognosis.2 However, most studies have not described the study population or intended use of prediction models, and external validation is rare and to date done using datasets originating from different Wuhan hospitals.3 Given different patterns of testing and organisation of healthcare pathways, external validation in datasets from other countries is required.
Added value of this study This study used data from Wuhan, China to derive and internally validate multivariable models to predict poor outcome and death in COVID-19 patients after hospital admission, with external validation using data from King’s College Hospital, London, UK. Mortality and poor outcome occurred in 4.3% and 9.7% of patients in Wuhan, compared to 34.1% and 42.9% of patients in London. Models based on age, sex and simple routinely available laboratory tests (lymphocyte count, neutrophil count, platelet count, CRP and creatinine) had good discrimination and calibration in internal validation, but performed only moderately well in external validation. Models based on age, sex, symptoms and comorbidity were adequate in internal validation for poor outcome (ICU admission or death) but had poor performance for death alone.
Implications of all the available evidence This study and others find that relatively simple risk prediction models using demographic, clinical and laboratory data perform well in internal validation but at best moderately in external validation, either because derivation and external validation populations are small (Xie et al3) and/or because they vary greatly in casemix and severity (our study). There are three decision points where risk prediction may be most useful: (1) deciding who to test; (2) deciding which patients in the community are at high-risk of poor outcomes; and (3) identifying patients at high-risk at the point of hospital admission. Larger studies focusing on particular decision points, with rapid external validation in multiple datasets are needed. A key gap is risk prediction tools for use in community triage (decisions to admit, or to keep at home with varying intensities of follow-up including telemonitoring) or in low income settings where laboratory tests may not be routinely available at the point of decision-making. This requires systematic data collection in community and low-income settings to derive and evaluate appropriate models.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
HW and HZ are supported by Medical Research Council and Health Data Research UK Grant (MR/S004149/1), Industrial Strategy Challenge Grant (MC_PC_18029) and Wellcome Institutional Translation Partnership Award (PIII054). RD is supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. DMB is funded by a UKRI Innovation Fellowship as part of Health Data Research UK MR/S00310X/1 (https://www.hdruk.ac.uk). KD is supported by LifeArc STOPCOVID award. This work uses data provided by patients and collected by the NHS as part of their care and support. XW is supported by National Natural Science Foundation of China (grant number:81700006). QL is supported by National Key R&D Program (2018YFC1313700), National Natural Science Foundation of China (grant number: 81870064) and the “Gaoyuan” project of Pudong Health and Family Planning Commission (PWYgy2018-06). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. We acknowledge Lingyu Ran and Yongsheng Du for their contribution in data collection.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Data from patient records used in the study will not be available due to inability to fully anonymise up to the Information Commissioner Office (ICO) standards and ethical requirements. A subset of the dataset limited to anonymisable information is available on request to researchers with suitable training in information governance and human confidentiality protocols subject to approval by the King's College Hospital Information Governance committee and Shanghai East Hospital committee; applications for research access to KCH cohort data should be sent to [kch-tr.cogstackrequests@nhs.net] and applications for research access to Wuhan cohort data should be sent to [ting.shi{at}ed.ac.uk]. This dataset cannot be released publicly due to the risk of re-identification of such granular individual-level data. Risk prediction models are published at (https://covid.datahelps.life/prediction/).