Abstract
Background An updatable understanding of the onset and progression of individuals' COVID-19 trajectories underpins pandemic mitigation efforts. To identify and characterize individual trajectories, we defined and validated ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework.
Methods Cohort study of 56.6 million people in England alive on 23/01/2020, followed until 31/05/2021, using eight linked national datasets spanning COVID-19 testing, vaccination, primary and secondary care, and death registration data. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity using a combination of international clinical terminologies (e.g. SNOMED-CT, ICD-10) and bespoke data fields: positive test, primary care diagnosis, hospitalisation, critical care (four phenotypes), and death (three phenotypes). Using these phenotypes, we constructed patient trajectories illustrating the transition frequency and duration between phenotypes. Analyses were stratified by pandemic wave and vaccination status.
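As an illustration of what "transition frequency and duration between phenotypes" can mean in practice, the following is a minimal sketch, not the authors' implementation (their protocol and code are in the repository linked under Clinical Protocols). It assumes a hypothetical event table of (patient identifier, phenotype, date) tuples; the records, the phenotype labels and the function name `summarise_transitions` are invented for the example, and the median is used here only as one plausible duration summary.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median

# Hypothetical toy records: (patient_id, phenotype, event_date).
# Neither the values nor the phenotype labels come from the study data.
events = [
    ("p1", "positive_test", date(2020, 4, 1)),
    ("p1", "hospitalisation", date(2020, 4, 6)),
    ("p1", "death", date(2020, 4, 20)),
    ("p2", "positive_test", date(2020, 11, 2)),
    ("p2", "primary_care_diagnosis", date(2020, 11, 3)),
    ("p3", "positive_test", date(2021, 1, 10)),
    ("p3", "hospitalisation", date(2021, 1, 13)),
]


def summarise_transitions(events):
    """Return {(from_phenotype, to_phenotype): (count, median_days)}."""
    # Group each patient's events into a timeline.
    by_patient = defaultdict(list)
    for patient_id, phenotype, when in events:
        by_patient[patient_id].append((when, phenotype))

    counts = Counter()
    gaps = defaultdict(list)
    for timeline in by_patient.values():
        timeline.sort()  # chronological order (ties broken by label)
        # Walk consecutive pairs of events within one patient's timeline.
        for (d0, src), (d1, dst) in zip(timeline, timeline[1:]):
            counts[(src, dst)] += 1
            gaps[(src, dst)].append((d1 - d0).days)

    return {pair: (counts[pair], median(gaps[pair])) for pair in counts}


summary = summarise_transitions(events)
for (src, dst), (n, days) in sorted(summary.items()):
    print(f"{src} -> {dst}: n={n}, median days={days}")
```

Each key of the returned dictionary corresponds to one edge of a trajectory network, with the count as the edge weight; stratifying by wave or vaccination status would amount to running the same summary on filtered subsets of the events.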
Findings We identified 3,469,528 infected individuals (6.1%) with 8,825,738 recorded COVID-19 phenotypes. Of these, 364,260 (11%) were hospitalised and 140,908 (4%) died. Of those hospitalised, 38,072 (10%) were admitted to intensive care (ICU), 54,026 (15%) received non-invasive ventilation and 21,404 (6%) invasive ventilation. Amongst hospitalised patients, first-wave mortality (30%) was higher than second-wave mortality (23%) in non-ICU settings, but remained unchanged for ICU patients. The highest mortality was for patients receiving critical care outside of ICU in wave 1 (51%). 13,083 (9%) COVID-19-related deaths occurred without diagnoses on the death certificate but within 30 days of a positive test, while 10,403 (7%) of cases were identified from mortality data alone with no prior phenotypes recorded. We observed longer patient trajectories in the second pandemic wave compared to the first.
Interpretation Our analyses illustrate the wide spectrum of severity that COVID-19 displays and significant differences in incidence, survival and pathways across pandemic waves. We provide an adaptable framework to answer questions of clinical and policy relevance, such as new variant impact and booster dose efficacy, and a way of maximising existing data to understand individuals' progression through disease states.
Evidence before the study We searched PubMed on October 14, 2021, for publications with the terms “COVID-19” or “SARS-CoV-2”, “severity”, and “electronic health records” or “EHR” without date or language restrictions. Multiple studies explore factors associated with severity of COVID-19 infection, and model predictions of outcome for hospitalised patients. However, most work to date focused on isolated facets of the healthcare system, such as primary or secondary care only, was conducted in subpopulations (e.g. hospitalised patients) of limited sample size, and often utilized dichotomised outcomes (e.g. mortality or hospitalisation) ignoring the full spectrum of disease. We identified no studies which comprehensively detailed severity of infections while describing disease severity across pandemic waves, vaccination status, and patient trajectories.
Added value of this study To our knowledge, this is the first study providing a comprehensive view of COVID-19 across pandemic waves using national data and focusing on severity, vaccination, and patient trajectories. Drawing on linked electronic health record (EHR) data on a national scale (56.6 million people alive and registered with a GP in England), we describe key demographic factors, frequency of comorbidities, impact of the two main waves in England, and effect of full vaccination on COVID-19 severities. Additionally, we identify and describe patient trajectory networks which illustrate the main transition pathways of COVID-19 patients in the healthcare system. Finally, we provide reproducible COVID-19 phenotyping algorithms reflecting clinically relevant stages of disease severity, i.e. positive tests, primary care diagnoses, hospitalisation, critical care treatments (e.g. ventilatory support) and mortality.
Implications of all the available evidence The COVID-19 phenotypes and trajectory analysis framework outlined here provide a reproducible, extensible and repurposable means to generate national-scale data to support critical policy decision making. By modelling patient trajectories as a series of interactions with healthcare systems, and linking these to demographic and outcome data, we provide a means to identify and prioritise care pathways associated with adverse outcomes and highlight healthcare system ‘touch points’ which may act as tangible targets for intervention.
Competing Interest Statement
The authors have declared no competing interest.
Clinical Protocols
https://github.com/BHFDSC/CCU013_01_ENG-COVID-19_event_phenotyping/tree/main/protocol
Funding Statement
The British Heart Foundation Data Science Centre (grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK) funded co-development (with NHS Digital) of the trusted research environment, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK data and connectivity component of the UK government's chief scientific adviser's national core studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. AA is supported by Health Data Research UK (HDR-9006), which receives its funding from the UK Medical Research Council (MRC), Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF), and Wellcome Trust; and Administrative Data Research UK, which is funded by the ESRC (grant ES/S007393/1). AB is supported by research funding from the National Institute for Health Research (NIHR), British Medical Association, AstraZeneca, and UK Research and Innovation. AB, AW, HH, and SD are part of the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No 116074. AW and SI are supported by the BHF-Turing Cardiovascular Data Science Award (BCDSA\100005) and by core funding from UK MRC (MR/L003120/1), BHF (RG/13/13/30194; RG/18/13/33946), and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014).
JAC and JS are supported by the Health Data Research (HDR) UK South West Better Care Partnership and the NIHR Bristol Biomedical Research Centre at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol. SD and HH are supported by HDR UK London, which receives its funding from HDR UK funded by the UK MRC, EPSRC, ESRC, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh government), Public Health Agency (Northern Ireland), BHF, and Wellcome Trust; HH and SD are supported by the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. SD is supported by an Alan Turing Fellowship (EP/N510129/1). HH is an NIHR Senior Investigator. SD and HH are supported by the BHF Accelerator Award AA/18/6/24223. CT is supported by a UCL UKRI Centre for Doctoral Training in AI-enabled Healthcare studentship (EP/S021612/1), an MRC Clinical Top-Up and a studentship from the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. WW is supported by a Scottish senior clinical fellowship, CSO (SCAF/17/01).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Data access approval was granted to the CVD-COVID-UK consortium (under project proposal CCU013 High-throughput electronic health record phenotyping approaches) through the NHS Digital online Data Access Request Service (ref. DARS-NIC-381078-Y9C5K). NHS Digital data have been made available for research under the Control of Patient Information (COPI) notice which mandated the sharing of national electronic health records for COVID-19 research (more info: https://digital.nhs.uk/coronavirus/coronavirus-covid-19-response-information-governance-hub/control-of-patient-information-copi-notice). For further detail see supplementary methods.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
† Joint first authors
Data Availability
Data access approval was granted to the CVD-COVID-UK consortium (under project proposal CCU013 High-throughput electronic health record phenotyping approaches) through the NHS Digital online Data Access Request Service (ref. DARS-NIC-381078-Y9C5K). Requests for data access should be made to NHS Digital.
https://digital.nhs.uk/services/data-access-request-service-dars
https://web.www.healthdatagateway.org/dataset/7e5f0247-f033-4f98-aed3-3d7422b9dc6d