STRUCTURED ABSTRACT
Importance Algorithms for classification of inpatient COVID-19 severity are necessary for confounding control in studies using real-world data (RWD).
Objective To explore use of electronic health record (EHR) data to inform an administrative data algorithm for classification of supplemental oxygen or noninvasive ventilation (O2/NIV) and invasive mechanical ventilation (IMV) to assess disease severity in hospitalized COVID-19 patients.
Design In this retrospective cohort study, we developed an initial procedure-based algorithm to identify O2/NIV, IMV, and NEITHER O2/NIV nor IMV in two inpatient RWD sources. We then expanded the algorithm to explore the impact of adding diagnoses indicative of clinical need for O2/NIV (hypoxia, hypoxemia) or IMV (acute respiratory distress syndrome) and O2-related patient vitals available in the EHR. Observed changes in severity categorization were used to augment the administrative algorithm.
Setting Optum de-identified COVID-19 EHR data and HealthVerity claims and chargemaster data (March – August 2020).
Participants Among patients hospitalized with COVID-19 in each RWD source, our motivating example selected dexamethasone (DEX+) initiators and a random selection of patients who were non-initiators of a corticosteroid of interest (CSI-) matched on date of DEX initiation, age, sex, baseline comorbidity score, days since admission, and COVID-19 severity level (NEITHER, O2/NIV, IMV) on treatment index.
Main Outcome and Measures Inpatient COVID-19 severity was defined using the algorithms developed to classify respiratory support requirements among hospitalized COVID-19 patients (NEITHER, O2/NIV, IMV). Measures were reported as the treatment-specific distributions of patients in each severity level, and as observed changes in severity categorization between the initial procedure-based and expanded algorithms.
Results In the administrative data cohort with 5,524 DEX+ and CSI- patient pairs matched using the initial procedure-based algorithm, 30% were categorized as O2/NIV, 5% as IMV, and 65% as NEITHER. Among patients assigned NEITHER via the initial algorithm, use of an expanded algorithm informed by the EHR-based algorithm shifted 54% DEX+ and 28% CSI- to O2/NIV, and 2% DEX+ and 1% CSI- to IMV. Among patients initially assigned O2/NIV, 7% DEX+ and 3% CSI- shifted to IMV.
Conclusions and Relevance Application of learnings from an EHR-based exploration to our administrative algorithm minimized treatment-differential misclassification of COVID-19 severity.
1. INTRODUCTION
Since coronavirus disease 2019 (COVID-19) emerged in the United States (US), there have been over 42.9 million cases {WHO 2020} with COVID-related hospitalizations at an estimated cumulative incidence over 676.4 per 100,000 persons and COVID-19 related deaths exceeding 688,000 as of October 1, 2021 {CDC 2020}. Patients hospitalized for COVID-19 may have respiratory manifestations ranging from mild dyspnea to acute respiratory failure, septic shock, and/or multiple organ dysfunction {NIH 2021}. Respiratory support including supplemental oxygen (O2) delivery with or without noninvasive ventilation (NIV) and invasive mechanical ventilation (IMV) may be life-saving for patients experiencing moderate-to-severe disease. Guidelines recommend maintaining blood oxygen saturation (SpO2) of 92-96% {NIH 2021; Barrot 2020; Chu 2018}. High-flow oxygen via nasal cannula or loose-fitting mask at a titrated flow rate is preferred over noninvasive positive pressure ventilation and conventional oxygen therapy {NIH 2021; Ni 2018}. IMV, which is indicated for patients receiving oxygen support with a low ratio of arterial oxygen partial pressure to fractional inspired oxygen (PaO2/FIO2 ≤ 200), may be initiated among patients who continue respiratory deterioration {Rochwerg 2017; Fan 2018}. According to NIH guidelines {NIH 2021}, ventilator maintenance among COVID-19 patients should follow standard guidelines for management of hypoxemic respiratory failure due to other causes {Papazian 2019}. In more severe cases when organs start to fail, additional support may be added to IMV to help the heart and lungs pump oxygen into the blood (extracorporeal membrane oxygenation, ECMO), help the kidneys with filtration (renal replacement therapy), or improve blood and oxygen delivery to vital organs (vasopressors). Despite the increasing availability of vaccines, determining the effectiveness of potential COVID-19 therapeutics continues to be an urgent concern. As indicators of COVID-19 severity, baseline O2, NIV, and IMV respiratory support requirements are critical measures of risk, prognosis and outcomes. This was made clear in the United Kingdom RECOVERY trial, which reported lower 28-day mortality for dexamethasone (DEX) versus usual care among patients with O2 or NIV (risk ratio: 0.82, 95% confidence interval: 0.72-0.94) and IMV (0.64, 0.51-0.81), but no discernable difference among patients receiving neither (1.19, 0.91-1.55) {RECOVERY 2020}. The US Food and Drug Administration (FDA) has since issued guidance recommending that patients be classified according to baseline disease severity in all studies determining the effectiveness of new COVID-19 treatments and prevention {FDA 2021}. Therefore, we sought to define COVID-19 severity for an inpatient comparative effectiveness study of DEX+ versus matched non-users of corticosteroid of interest (CSI-; dexamethasone, methylprednisolone, prednisone, hydrocortisone) at the time of match (i.e., the DEX+ treatment initiation day) using an administrative real-world data (RWD) source (protocol posted to clinicaltrials.gov; NCT04926571). Matching treated and untreated patients on the day of inpatient treatment initiation requires the ability to classify severity at study entry.
The World Health Organization (WHO) developed a Clinical Progression Scale to classify COVID-19 severity {WHO Working Group 2020}. While this scale was developed for determining patient outcomes, it can also be used to determine baseline severity. However, it relies heavily on the availability of clinical information that may not always be available in RWD. The FDA Sentinel Initiative also developed a practical severity score to classify patient severity ranging from asymptomatic to critical using RWD {Yih 2020}. However, the categories rely on day-level diagnoses that are often unavailable or under-recorded within inpatient data sources. There is a clear need for additional mechanisms, including algorithms to determine COVID-19 severity using RWD, especially in the inpatient setting. We therefore developed a modified version of the WHO Scale that was influenced by the FDA Sentinel score to determine O2/NIV and IMV respiratory support requirements. We started with a procedure-based initial algorithm. Given its invasiveness and cost, we assumed IMV would be fairly well recorded in administrative (billing based) RWD. We had concerns, however, that O2 and NIV procedures may not be as well captured.
This paper is a result of a research collaboration agreement with the FDA to use RWD to advance the understanding and the natural history of COVID-19 in specific patient populations, as well as treatment and diagnostic patterns during the COVID-19 pandemic. The study aimed to use an electronic health record (EHR) RWD source to develop an expanded algorithm to improve upon our initial algorithm applied to administrative data for categorization of O2/NIV and IMV in patients hospitalized for COVID-19. While the motivating example was the comparative effectiveness of DEX+ versus CSI-in HealthVerity data, we aimed to develop an algorithm for an inpatient score with broad utility beyond this example to other EHR, administrative claims, and hospital billing data sources.
2. MATERIALS AND METHODS
2.1. Data Sources
Our study used two US RWD sources (see Appendix C for additional detail). First, HealthVerity data comprises medical and pharmacy open claims (sourced in near-real-time from practice management systems, billing systems and claims clearinghouses) and closed claims (sourced from insurance providers and payers), laboratory test history and results, and chargemaster administrative hospital billing data for inpatient and outpatient encounters from all US states. The data include all major payer types (commercial, Medicaid and Medicare). Second, Optum de-identified COVID-19 EHR data includes patients with suspected COVID-19, sourced from medical and administrative encounters from hospitals, emergency departments, outpatient centers, and laboratories from across the US that include diagnosis data, laboratory data with results, procedures, vital sign measurements, prescriptions written, and medications administered. There has been scientific publication of COVID-19 research using both the HealthVerity {Burn 2020; Gordon 2020; Harvey 2021; Murk 2020} and Optum {Hughes 2020} datasets.
In the medical claims of both datasets, procedures are captured via current procedural terminology (CPT), healthcare common procedure coding system (HCPCS), and international classification of disease 10th revision procedural classification system (ICD-10-PCS) codes, diagnoses are captured via ICD-10, clinical modification (ICD-10-CM) codes, and O2 or related supplies indicating an O2 procedure may additionally be captured via revenue codes. In the chargemaster data of HealthVerity, vendor charge codes also capture procedures and diagnoses; although procedures are captured daily while hospitalized, diagnoses are primarily captured at discharge, with some further capture of diagnoses present at admission. In contrast, the Optum de-identified COVID-19 EHR has day-level diagnosis encounters while hospitalized. Comparing the available data in these two data sources offers the opportunity to create a more widely-applicable COVID-19 severity score that can be used in EHR, claims and/or hospital billing data.
2.2. COVID-19 severity levels
The WHO Clinical Progression Scale scores range of COVID-19 severity outcomes from uninfected (score of 0) to dead (10; see Table 1) {WHO Working Group 2020}. We developed a modified version, referred to here as the mWHO score, that restricts to severity levels applicable to hospitalized COVID-19 patients (WHO original scores of 4-9), collapsed into three mutually exclusive categories for neither O2/NIV nor IMV, O2 or NIV without IMV, and any IMV with or without additional support (NEITHER, O2/NIV, and IMV, respectively; see Appendix A for additional mWHO algorithm detail). These categories correspond to the FDA Sentinel’s Moderate, Severe, and Critical categories, respectively, leaving out the Asymptomatic and Mild categories that apply only to non-hospitalized patients {Yih 2020}.
To operationalize the score, patients were assigned to mWHO categories based on the highest severity recorded on the treatment index date, considering both the first day of treatment and the day prior to minimize potential misclassification from situations in which patients received O2 or IMV in other medical settings (e.g., emergency room or ambulance), or cases where the billing date for a procedure was captured at a later calendar date than the procedure was performed due to a lag in reporting.
In each data source, two different algorithms were developed to identify the COVID-19 severity levels (see summary in Table 2). First, we developed procedure-based algorithms that were similar for both data sources (initial-EHR and initial-administrative), except procedure-based charge codes were only applicable for the administrative chargemaster data. Next, to increase the specificity of O2 and IMV identification, we expanded the algorithms to include diagnoses indicating a clinical need for O2/NIV (hypoxia or hypoxemia) or IMV (acute respiratory distress syndrome), among other additions. The expanded-EHR algorithm included available clinical patient vitals and day-level diagnosis data. We then applied the EHR-based learnings to the expanded-administrative algorithm, adapting as needed to analogous data available in the administrative dataset. For example, day-level diagnosis data was not available in the administrative data during the hospitalizations, but we were able to utilize the flag indicating ‘present at admission’ to differentiate the admitting diagnoses from all diagnoses reported at discharge.
2.3. Study population
In each dataset, we identified patients hospitalized with confirmed COVID-19 (diagnosis or positive SARS-CoV-2 laboratory results) between March and August 2020 with at least 1 encounter during the 183-day baseline and no prior CSI use during the 90-day washout period. From these populations, we selected primary cohorts of DEX initiators (DEX+) and a random selection of patients who had not or had not yet initiated any CSI (CSI-) matched 1:1 on date of DEX initiation, age, sex, Charlson-Quan comorbidity score over the 183-day baseline period {Quan 2005}, days since admission, and the mWHO severity level (NEITHER, O2/NIV, IMV) on treatment index (see NCT04926571 protocol for additional detail). The primary analysis cohorts matched on mWHO severity using the initial algorithms, while secondary analysis cohorts were rematched using the expanded algorithms.
2.4. Approach / statistical analysis
The primary analysis evaluated the treatment-specific distributions of patients in each mWHO severity level, according to both the initial algorithms that patients were matched on and the expanded algorithms. This was done to describe how many of the patients from each initial level would have shifted to a higher severity level via the expanded algorithms (the percent of NEITHER who shifted to O2/NIV or IMV, and the percent of O2/NIV that shifted to IMV). We started with the EHR data and then repeated the same process using the administrative data.
We performed two secondary analyses and one sensitivity analysis largely focusing on the administrative data. The first secondary analysis rematched cohorts in both datasets using the expanded algorithms to illustrate the distribution of patients who qualified for O2/NIV and IMV via each measure component (i.e., procedures, medical diagnosis claims, chargemaster diagnosis data, and revenue codes) using Venn diagrams to understand the relative contribution of each component type. The second compared the distribution of select patient characteristics in each treatment arm of the administrative data cohort in the categories they were matched on using the initial-administrative algorithm to the same cohort recategorized using the expanded-administrative algorithm. Lastly, a sensitivity analysis evaluated the shifts in severity and the relative contribution of each component type among a subset of patients in the administrative data cohort who were early initiators (initiated treatment on the same day or 1 day after the admission date).
All analyses were conducted using the Aetion Evidence Platform® (2021), a software for real-world data analysis, validated for a range of studies. {Wang 2016} Visualizations were created using the tidyverse {Wickham 2020} and plotly {Sievert 2019} packages in R (v4.0.3).
3. RESULTS
3.1. Primary analyses
In the EHR-based cohort with 1,768 DEX+ and CSI− patient pairs matched using the initial-HER algorithm, 17% were categorized as O2/NIV, 2% as IMV, and 81% as NEITHER (Figure 1). Among those categorized as NEITHER, use of the expanded-EHR algorithm shifted 56% DEX+ and 32% CSI- to O2/NIV, and 5% DEX+ and 3% CSI- to IMV. Among the patients initially categorized as O2/NIV, 10% DEX+ and 8% CSI-patients shifted to IMV.
In the administrative data cohort with 5,524 DEX+ and CSI− patient pairs matched using the initial-administrative algorithm, 30% were categorized as O2/NIV, 5% as IMV, and 65% as NEITHER (Figure 2). Similar to the EHR-based cohort, among those categorized as NEITHER, use of the expanded-administrative algorithm shifted 54% DEX+ and 28% CSI- to O2/NIV, and 2% DEX+ and 1% CSI- to IMV. Among the patients initially categorized as O2/NIV, 7% DEX+ and 3% CSI− patients shifted to IMV.
3.2. Secondary and sensitivity analyses
After matching using the expanded-administrative algorithm, the sample size slightly decreased to 5,337 DEX+ and CSI− patient pairs with 61% categorized as O2/NIV, 7% as IMV, and 32% as NEITHER (Appendix Figure B.1; Appendix Table B.1). Illustrated areas of overlap indicate the distribution of patients with more than one measure component that qualifies them for either O2/NIV or IMV, while areas of non-overlap indicate only one qualifying component. Among the patients categorized as O2/NIV, 49% of both DEX+ and CSI− qualified only due to the added clinical diagnosis of hypoxia or hypoxemia of the expanded-administrative algorithm (i.e., did not have a procedure-based component). In contrast, 27% of DEX+ and 24% of CSI− qualified as IMV due to only the clinical diagnosis of acute respiratory distress syndrome. The inclusion of revenue codes only added 1 DEX+ patient to the O2/NIV group.
Additional analyses of the EHR-based cohort matched using the expanded-EHR algorithm are shown in Appendix B. The addition of clinical diagnoses in this expanded algorithm had a similar impact on the mWHO severity categorization to that seen in the expanded-administrative algorithm. Additionally, patient vitals were responsible for 14% of DEX+ and 27% of CSI− patients categorized as O2/NIV (Appendix Figure B.2).
Compared to the mWHO severity categorization using the initial-administrative algorithm that the patients were matched on, recategorization using the expanded-administrative algorithm showed that there was a significant decrease in the IMV group for heart disease for DEX+ patients (ASD: 0.12; Table 3). There was a treatment-differential change that was greater for DEX+ versus CSI− patients in the NEITHER, O2/NIV, and IMV groups for diabetes (ASD: 0.04 vs 0.02, 0.05 vs 0.02, and 0.08 vs 0.06), NEITHER and O2/NIV groups for age (ASD: 0.08 vs 0.07 and 0.04 vs 0.02), and in the IMV group for heart disease (ASD: 0.12 vs 0.01) chronic pulmonary disease (ASD: 0.09 vs 0.04, and end-stage renal disease (ASD: 0.09 vs 0.07). There was also a treatment-differential change that was less for DEX+ versus CSI− patients in the IMV group for age (ASD: 0.04 vs 0.05).
When the administrative cohort was restricted to early initiators in the sensitivity analysis, similar shifts in mWHO severity and relative contributions of each component type were identified (Figures B.3-B.4).
4. CONCLUSION
Application of the expanded algorithms resulted in treatment-differential shifts to higher mWHO severity levels in both datasets, suggesting a substantial increase in the specificity. As expected, IMV was mostly captured via procedure-related encounters, but there were still notable shifts from NEITHER and from O2/NIV to IMV. In contrast, the shifts from NEITHER to O2/NIV under the expanded algorithms were substantial. More than half of the DEX+ patients (56% and 49% in the EHR and administrative data cohorts, respectively) and more than one-quarter of the CSI− patients (32% and 28%) had no evidence of an O2/NIV procedure-related encounter, but had one of the other O2/NIV components (largely diagnosis of hypoxia or hypoxemia). Rematching the administrative data cohort using the expanded-administrative algorithm further supported the value of adding the diagnoses indicating clinical need for procedure. There were important treatment-differential changes in the distributions of patient characteristics seen when patients who had been matched using the initial-administrative algorithm had been recategorized via the expanded-administrative algorithm that would have been masked had a procedure-based algorithm alone been used.
Since the patients randomized in the RECOVERY trial were older and had more comorbidities {RECOVERY 2020} than our real-world administrative data cohort, we anticipated that our patients may be less severe at baseline. However, we were surprised to see that compared to the RECOVERY trial DEX+ treatment arm, the initial-administrative algorithm identified 42% more patients in the NEITHER group, and 31% and 11% less patients in the O2/NIV and IMV groups, respectively. After matching using the expanded-administrative algorithm, the severity classifications aligned with our expectations, i.e., we had 8% more patients in the NEITHER group compared with RECOVERY, approximately the same in O2/NIV, and 8% less patients in IMV.
Advantages and limitations of the HealthVerity administrative data were taken into consideration. First, although the included open claims data has the benefit of near-real-time capture, it may be less complete for the most recent calendar dates. However, because our study period ended August 2020 to align with the Optum data we had available, this concern was minimal. Additionally, although access to inpatient medication use via the chargemaster data is a benefit over most other RWD sources, inpatient corticosteroids (DEX and other CSIs) are captured via non-standardized vendor descriptions rather than standardized fields (e.g., national drug codes). We have therefore developed a series of search string algorithms in collaboration with a licensed pharmacist to minimize the potential for misclassification of these medications. Finally, while inpatient day-level diagnoses are not consistently available in HealthVerity, which could lead to under-reporting of severity on days during the hospitalization, we were able to incorporate the admitting diagnoses available via the chargemaster data into the expanded-administrative algorithm and we assume to have complete diagnosis data for the majority of patients since 90% of our cohort were early initiators with a treatment index either on or one day after the admission date. This concern was further mitigated by the finding that the results among early initiators were similar.
Access to the Optum data with patient vitals and day-level diagnoses informed and improved our administrative data algorithm. The fact that the shifts using the administrative data cohort were similar to the shifts in the EHR-based cohort provides additional assurance that we were able to still capture necessary diagnoses via admitting diagnoses alone. Therefore, the expanded algorithms have applicability for use in any RWD that allows for valid identification of inpatient hospitalization and includes procedure codes. While day-level diagnoses are ideal, access to admitting diagnoses should be sufficient (in particular when study entry happens close to admission) with noted limitations.
Applying learnings from use of an EHR-based algorithm to our administrative data algorithm minimized treatment-differential misclassification, and allowed for more specific identification of lower and higher severity patients through the addition of relevant clinical diagnoses as well as patient vitals (where available). These findings demonstrate the importance of these expanded algorithms, as O2 and NIV, which are critical to COVID-19 severity categorization, may be substantially under-reported in RWD using procedure-based encounters alone. Although further validation efforts may be considered, our expanded algorithms are an important addition to the COVID-19 RWD literature with applicability for any study requiring categorization of disease severity in inpatient EHR or administrative data.
Data Availability
Patient-level data belong to third parties and cannot be shared publicly. Upon reasonable request, researchers may obtain limited access to data and analytics infrastructure.
Acknowledgements
We wish to thank the following additional members of the COVID-19 Research Collaborative team for their additional review and discussion of the research design and interpretation of findings: Marie C. Bradley, PhD, MScPH, MPharm; Silvia Perez-Vilar, PharmD, PhD; Laura M. Roe, MMCi (U.S. Food and Drug Administration), Amy Abernethy, MD, PhD (former Principal Deputy Commissioner, U.S. Food & Drug Administration), Nevine Zariffa, M.Math. (NMD Group LLC consulting), and Anthony Louder, PhD (Aetion, Inc.). We wish to also thank Melanie Wang, MPH (Aetion, Inc.) for her management and coordination of this collaborative research effort.
APPENDIX A: mWHO COVID-19 Severity Detailed Breakdown and Code Lists
APPENDIX B: Additional Results
APPENDIX C. Additional Data Source Detail
C.1 HealthVerity Data
The HealthVerity COVID-19 data includes 16 unique US data sources from all 50 states across four main data types (described in Table C.1 and illustrated in Figure C.1). Patients with at least one medical and pharmacy claim, and any available CDM, EHR, and lab data are linked across data types using a unique patient identifier. Data capture began December 2018, with the exception of two lab sources that began in March/June 2020. The data is refreshed approximately every 2 weeks with varying data lag by data type and vendor (see Table C.1). Additional specifications of the HealthVerity COVID-19 dataset are summarized in Table C.2.
C.2 Optum COVID-19 De-identified Electronic Health Records
The Optum COVID-19 de-identified electronic health records (EHR) is sourced from laboratories and hospital and emergency department EHRs from integrated delivery networks (IDNs) and smaller outpatient clinics from all over the country. The data in the analysis is entirely inpatient and includes diagnosis data, laboratory data with results, procedures, vital sign measurements, prescriptions written, and medications administered. Sourced from the legacy Humedica database, now Optum EHR, the limited dataset includes a subset of patients as described in the COVID-specific data selection criteria of Table C.3. Data capture began February 1, 2020 and ended on September 24, 2020 with no scheduled updates and includes approximately 2 million patients (N=2,018,728). If patients were already in the Optum EHR database, patient history was included. The underlying data is representative of the US, but the COVID-19 cut of data is skewed towards the Midwest and Northeast. Additional specifications of the Optum COVID-19 dataset are summarized in Table C.3.
Footnotes
Disclosures: This paper is part of a research collaboration agreement between the U.S. Food and Drug Administration (FDA) and Aetion, Inc. to use real-world data to advance the understanding and the natural history of coronavirus disease (COVID-19) in specific patient populations, as well as treatment and diagnostic patterns during the COVID-19 pandemic.
This paper reflects the views of the authors and should not be construed to represent FDA’s views or policies. EMG, ARW, SEV, IJE, JAR, and NMG are employees of Aetion, Inc., with stock options or existing equity.
ABBREVIATIONS
- COVID-19
- coronavirus disease 19
- CPT
- current procedural terminology
- CSI
- corticosteroid of interest (DEX, methylprednisolone, prednisone, hydrocortisone)
- CSI-
- random selection of patients who had not (yet) initiated any CSI
- DEX
- dexamethasone
- DEX+
- dexamethasone initiators in the matched cohort
- ECMO
- extracorporeal membrane oxygenation
- EHR
- electronic health record
- FDA
- United States Food and Drug Administration
- HCPCS
- healthcare common procedure coding system
- ICD-10-CM
- international classification of disease codes, 10th revision, clinical modification
- ICD-10-PCS
- international classification of disease codes, 10th revision, procedural classification system
- IMV
- invasive mechanical ventilation
- mWHO
- modified version of WHO Clinical Progression Scale for COVID-19 severity
- NEITHER
- neither O2/NIV nor IMV
- NIV
- noninvasive ventilation
- O2
- oxygen/supplemental oxygen
- PaO2/FIO2
- ratio of arterial oxygen partial pressure to fractional inspired oxygen
- RWD
- real-world data
- SpO2
- blood oxygen saturation
- US
- United States
- WHO
- World Health Organization