Abstract
Purpose The goal of this study was to compare noninvasive respiratory support to invasive mechanical ventilation as the initial respiratory support in COVID-19 patients with acute hypoxemic respiratory failure.
Methods All patients admitted to a large healthcare network with acute hypoxemic respiratory failure associated with COVID-19 and requiring respiratory support were eligible for inclusion. We compared patients treated initially with noninvasive respiratory support (noninvasive positive pressure ventilation by facemask or high flow nasal oxygen) with patients treated initially with invasive mechanical ventilation. The primary outcome was time-to-in-hospital death analyzed using an inverse probability of treatment weighted Cox model adjusted for potential confounders. Secondary outcomes included unweighted and weighted assessments of mortality, lengths-of-stay (intensive care unit and hospital) and time-to-intubation.
Results Over the study period, 2354 patients met inclusion criteria. Nearly half (47%) received invasive mechanical ventilation first and 53% received initial noninvasive respiratory support. There was an overall 38% in-hospital mortality (37% for invasive mechanical ventilation and 39% for noninvasive respiratory support). Initial noninvasive respiratory support was associated with an increased hazard of death compared to initial invasive mechanical ventilation (HR: 1.61, p < 0.0001, 95% CI: 1.33 - 1.94). However, patients on initial noninvasive respiratory support also experienced an increased hazard of leaving the hospital sooner, but the hazard ratio waned with time (HR: 0.97, p < 0.0001, 95% CI: 0.96 - 0.98).
Conclusion These data show that the COVID-19 patients with acute hypoxemic respiratory failure initially treated with noninvasive respiratory support had an increased hazard of in-hospital death.
Introduction
The optimal management strategy for patients with acute hypoxemic respiratory failure due to SARS-CoV-2 infection has undergone particular interest and scrutiny. Early, intense discussion occurred in the published literature and on social media. High failure rates with noninvasive positive pressure ventilation (NIPPV) during the SARS-CoV outbreak in 2003,1 concerns over aerosol transmission with noninvasive support strategies,2–8 potential novelty of respiratory physiology with COVID-19,9–18 impending ventilator shortages,19–21 and the high mortality initially reported with invasive mechanical ventilation (IMV)22, 23 all factored in the discussion over timing of intubation and the utility of noninvasive strategies.
The current literature on noninvasive respiratory support (NIRS) — NIPPV or high flow nasal oxygen (HFNO) — for patients with COVID-19 report disparate outcomes. Importantly, these studies compare noninvasive strategies to each other, or to conventional oxygen therapy, but not to invasive mechanical ventilation. The goal of this study was to explore the optimal initial treatment for COVID-19 patients admitted with acute hypoxemic respiratory failure by comparing invasive versus noninvasive strategies.
Methods
Study Design and Setting
This study is a retrospective cohort observational study using clinical data. Clinical data were obtained from the Banner Health Network clinical data warehouse, covering 26 hospitals across six states in the western United States. Data were extracted for all adult patients (≥18 years) admitted for respiratory failure associated with COVID-19 between January 1, 2020 and January 7, 2021. Data consist of de-identified structured data generated from the Cerner (Cerner Corporation, North Kansas City, MO, USA) electronic health record. This work adheres to the STROBE reporting guidelines and was approved by the University of Arizona (IRB #1907780973) and Banner Health Institutional Review Boards (IRB #483-20-0018).
Study Participants and Treatment Assignment
We generated seven cohorts using a phenotyping algorithm24 based on the sequence of therapies received: (1) IMV only, (2) NIPPV only, (3) HFNO only, (4) NIPPV requiring subsequent IMV, (5) HFNO requiring subsequent IMV, (6) IMV extubated to NIPPV, (7) IMV extubated to HFNO, and (8) evidence of all three treatments but unclear treatment ordering. Those treated with any noninvasive respiratory support modality first were compared to those treated with invasive mechanical ventilation first. Subsequent analyses separated NIPPV and HFNO and included pairwise comparisons of initial IMV, NIPPV, and HFNO.
We estimated the propensity to be given each modality using generalized boosted models and used inverse probability of treatment weighting in the models to account for non-random treatment assignment.25 The variables for propensity score estimation included age, body mass index, sex, ethnicity (non-Hispanic, Hispanic), race (white, non-white), respiratory rate and SPO2/FIO2 ratio most immediately prior to first treatment, hospital size by American Hospital Association category (small [< 100 beds], medium [100-499 beds], large [> 500 beds]), and either hours from hospital admission to first treatment or hours from ICU admission to first treatment, transformed via the Box-Cox method with negatives.26 These variables were additionally included in Cox models to further improve balance between treatment groups.
Outcomes and Data Analysis
The primary outcome is time-to-in-hospital death. We fit a Cox model from first treatment initiation to death with censoring occurring at hospital discharge. However, there are multiple potential pathways for a patient (1. death, 2. discharge alive [from hospital or ICU], 3. remain in the hospital, and 4. intubation [if on NIRS]). We modeled discharge alive and intubation with three additional hazard models. First, we fit a cause-specific hazard model for time-to-hospital discharge alive with death as a competing risk where time zero was set as time of first treatment initiation. For this outcome, we performed a planned sensitivity analysis with time zero set as time of hospital entrance. Second, we fit a cause-specific hazard model for time-to-ICU discharge alive (from ICU admission) with death as a competing risk. Only visits with an ICU stay were included in this analysis; thus, results are conditional on having been admitted to the ICU and by nature excluded ICU-level of care patients who remained outside of the ICU. Third, we conducted a competing risks analysis for time-to-intubation with death as a competing risk and censoring occurring at hospital discharge. Only visits with known time to event (96.3% of possible visits) and a noninvasive initial treatment were included. The competing risks analysis used a modification of Gray’s test that incorporates inverse probability of treatment weighting.27 Statistical significance was judged via the median p-value approach.28 Planned sensitivity analyses were performed to assess preprocessing and analysis decisions made for the competing risks (see online supplement for details).
We analyzed unweighted outcomes of mortality, intubation rate, duration of mechanical ventilation in patients who failed initial NIRS treatment, length-of-hospital-stay, and ICU-free days using Fisher’s Exact and Kruskal-Wallis rank sum tests where appropriate. For all outcomes, the predictor of interest was first treatment (noninvasive respiratory support or invasive mechanical ventilation). We assessed if the proportional hazard assumption was violated in the Cox models by including an interaction of time by first treatment. If the interaction was statistically significant at α = 0.05, we report the interaction model; otherwise, we report the proportional hazards model.
Data preprocessing is described in detail in the online supplement. As a retrospective study based on routinely collected EHR data, it is expected that the extent of observed data will be different for different clinical variables.29–31 Missing data were handled by using multiple imputation by chained equations (MICE).32, 33 For each non-competing risks analysis, we created 50 imputed data sets using all variables in the propensity score construction, the Nelson-Aalen estimate of the available time-to-event data, the time-to-event itself, and event information (e.g., death, hospital discharge alive, ICU discharge alive). Age, body mass index, SpO2/FiO2, the transformed time-from-hospital admission (or ICU admission) to first treatment, respiratory rate, time-to-event, and the Nelson-Aalen estimate of the time-to-event were imputed via predictive mean matching. Sex, ethnicity, race, the outcome event, hospital size, and first treatment were imputed with logistic regression (two-category variables) or multinomial log-linear models via neural networks (> two category variables) as appropriate. Raw time-to-event was imputed but not used to predict other variables in the MICE algorithm. Instead, for the Cox models and some competing risk models, temporal information was used in the prediction of other missing values via the Nelson-Aalen estimate. For the competing risk models using the median p-value inference approach, no outcome information was allowed to predict other variables in the MICE algorithm. Since each imputed data set had different values for variables used in the propensity score algorithm, we estimated propensity scores for each imputed data set separately. For the Cox models, the propensity scores from a specific imputed data set were used to do inverse probability of treatment weighting with the analysis of that data set,34 and results were combined using Rubin’s Rules. For the competing risks analyses, the propensity scores from each imputed data set were handled differently depending on the inference approach. See the supplementary information for further details. All data preprocessing and statistical analyses were done using R version 4.0.435 and included the following packages: twang34, survival36, 37, survminer,38 MICE32, xtable,39 and tidyverse.40
Results
During the study period, there were 2354 COVID-19 patients that met criteria for inclusion. There were 326 that were not classified by the phenotyping algorithm but received all three treatments. Of the 2028 patients that were reliably classified, invasive mechanical ventilation was used as the first therapy in 947 (47%) patients and noninvasive respiratory support in 1081 (53%), Figure 1, Table 1. Of those on noninvasive respiratory support, 811 (75%) received NIPPV first and 270 (25%) received HFNO first. There was imbalance in NIRS use among hospitals of different sizes with larger hospitals disproportionately using NIPPV first (large hospitals 81% [433/536], medium hospitals 70% [346/495], small hospitals 64% [32/50]) Table 1. For the 326 unclassifiable records, the first treatment was imputed using multiple imputation. Since treatment assignments may have varied between imputed data sets, demographics for this group are reported separately in the online supplement.
STROBE Statement
326 participants had evidence of high-flow nasal oxygen, noninvasive positive pressure ventilation, and invasive mechanical ventilation but were unable to be reliably classified by the phenotyping algorithm because of nonsensical time stamps for timing of therapies. See text and online supplement for further description and demographics of this group. “Additional exclusions” includes 4 participants that were excluded because of nonsensical hospital discharge time stamps, and an additional group of patients excluded only from the model assessing time-to-ICU exit.
Demographics
All-cause in-hospital mortality was 38%. For those on IMV first, mortality was 37%. For those treated with noninvasive respiratory support first and never required intubation, mortality was 29%, but rose to 60% for those that required intubation. There was also an imbalance in mortality rates between NIPPV and HFNO, Table 2. Intubation rates for those treated with HFNO and NIPPV were 32% (87/270) and 33% (268/811), respectively.
Unmatched outcomes
NIRS vs IMV
Initial noninvasive respiratory support was associated with an increased hazard of in-hospital death compared to initial invasive mechanical ventilation (HR: 1.61, p < 0.0001, 95% CI:1.33 - 1.94), and there was no significant interaction of treatment and time with this association (HR: 1.00, p = 0.6038, 95% CI: 0.98 - 1.01) Figure 2. However, being on initial noninvasive respiratory support was also associated with an early increased hazard of leaving the hospital alive, but this hazard ratio decreased over time due to an interaction between time and treatment. This eventually resulted in the hazard ratio reversing such that initial noninvasive respiratory support was later associated with a decreased hazard of leaving the hospital alive (HR: 0.97, p < 0.0001, 95% CI: 0.96 - 0.98) FIGURE 2. This was consistent with the sensitivity analysis where time zero is hospital admission (online supplement). There was also a similar pattern in the difference between initial NIRS and initial IMV in the time-to-ICU discharge alive: early on, initial noninvasive respiratory support was associated with an increased hazard of discharge alive from the ICU, but this hazard ratio decreased over time because of a statistically significant interaction between treatment and time (HR: 0.98, p = 0.0321, 95% CI: 0.97 - 1.00), resulting in no significant differences between non-invasive and invasive initial support at later time points..
Kaplan-Meier curves for NIRS vs IMV for death (left) and discharge alive (right)
HFNO vs NIPPV vs IMV
Both HFNO (HR: 2.37, p < 0.0001, 95% CI: 1.85 - 3.03) and NIPPV (HR: 1.44, p = 0.0003, 95% CI: 1.18 - 1.74) were associated with increased hazards of in-hospital mortality compared to invasive mechanical ventilation. HFNO had an increased hazard of in-hospital death compared to NIPPV (HR: 1.65, p = 0.0001, 95% CI: 1.29 - 2.10). The interactions of time and both NIPPV and HFNO were not statistically significant and were excluded from the final model (both p > 0.05).
For time-to-hospital discharge alive, there were significant interactions between treatment and time for both NIRS modalities (time by HFNO HR: 0.96, p = 0.0011, 95% CI: 0.94 - 0.98 and time by NIPPV HR: 0.98, p = 0.0231, 95% CI: 0.97 - 1.00). Similar to the models that grouped NIPPV and HFNO, these models showed increased hazards of discharge alive for both HFNO and NIPPV compared to invasive mechanical ventilation that decreased over time but at different rates (HFNO faster, NIPPV slower), FIGURE 3. This resulted in no statistically significant differences between NIPPV and invasive mechanical ventilation at later time points but a decreased hazard of discharge alive for HFNO compared to invasive mechanical ventilation.
Estimated Hazard Ratios for Time-to-Discharge Alive HFNO vs. IMV (left) and NIPPV vs. IMV (right) at various points in time. Pointwise 95% confidence intervals included.
Patients who started on HFNO first and were intubated were more likely to be intubated sooner than those who started on NIPPV (p < 0.001). First treatment with both IMV and NIPPV were associated with a higher hazard of ICU discharge alive than first treatment with HFNO (IMV HR: 1.50, p = 0.0226, 95% CI: 1.06 - 2.13; NIPPV HR: 1.84, p = 0.0014, 95% CI: 1.27 - 2.68), and first treatment with NIPPV was associated with a higher hazard of leaving the ICU alive than first treatment with IMV (HR: 1.23, p = 0.0341, 95% CI: 1.02 - 1.48). There was no significant interaction between time and treatment for time-to-ICU discharge alive (time by HFNO HR: 0.99, p = 0.4065, 95% CI: 0.97 - 1.01; time by NIPPV HR: 0.99, p = 0.3047, 95% CI: 0.97 - 1.01), and these interactions were left out of the final model. All competing risk sensitivity analyses showed the same pattern and were also statistically significant (all p < 0.001, see online supplement). In the sensitivity analysis of time-to-hospital discharge alive, the interaction of time with HFNO was statistically significant (p = 0.0055) but the interaction of time with NIPPV was not (p = 0.0848). Despite this, the relationship between therapies with the time by treatment interaction was consistent with the model where time zero was set to first treatment initiation.
Discussion
Our results show that initial noninvasive respiratory support presents a double-edged sword for COVID-19 patients with acute respiratory failure. Patients supported first by NIRS experience a greater hazard of in-hospital death compared to those intubated first. However, those same patients also experience a greater hazard of being discharged alive up to about 15-35 days, after which that hazard decreases either to the point of reversing direction (for NIRS in general and HFNO specifically) or no longer being statistically significant (for NIPPV). Essentially, patients initially treated noninvasively have higher hazards of both in-hospital death and discharge alive up to about a month. After that, their hazard of in-hospital death remains elevated but their hazard of discharge alive decreases. Moreover, those started on HFNO first experience faster intubation compared to those started on NIPPV first.
These results offer several important contributions. Studies utilizing NIRS for patients with COVID-19 compare one or both noninvasive respiratory support strategies either to each other or to conventional oxygen therapy and have reported mixed findings. Some report improved outcomes with high flow nasal oxygen compared to conventional oxygen therapy,41–43 or to noninvasive positive pressure ventilation,44 while others report improved outcomes with noninvasive positive pressure ventilation compared to conventional oxygen therapy,45 or to high flow nasal oxygen.46 Yet some studies report no difference in outcomes compared to conventional oxygen therapy,43, 45, 47 or to each other.46–50 Those studies utilize intubation directly as the primary outcome alone or as part of a composite outcome with mortality.>41–>48, >51, >52 We found that invasive mechanical ventilation as a comparator intervention may be the optimal method of respiratory support for COVID-19 patients that need more than conventional oxygen.
Additionally, most studies are limited to patients admitted to the ICU, which carries inherent selection bias and confounding potentially reflected as higher intubation rates of around 50%. We included all patients regardless of location as many patients were cared for outside of the physical ICU location or the ICU service from pandemic-strained resources.
We found that the benefit of leaving the hospital alive has a relationship with time. There are several potential explanations for this. NIRS cohorts are classified by a phenotyping algorithm based on the first therapy received but could have crossed over between NIRS modalities at any point with or without intubation. Crossover in those patients requires failing NIRS twice before intubation. Second, intubation could have been delayed for various reasons at the cost of worsening acute lung injury that may have benefitted from lung protective ventilation. Gershengorn found similar results in that the overall 46% failure rate on HFNO evolved from a decreased odds of failure for those on HFNO for a short time to an increased odds of failure for those on HFNO for longer.52 Similar findings were reported with NIPPV, which reduced mortality only in patients with short hospital stays, but was associated with higher mortality in those hospitalized longer than seven days.53
There are several issues to consider when generalizing and interpreting data on noninvasive respiratory support and invasive mechanical ventilation in COVID-19 patients. Early in the pandemic, early intubation was often recommended to avoid aerosol exposure to healthcare staff and based on the high failure rates of NIPPV in the SARS epidemic. This was followed by a concerningly high early mortality reported in patients on mechanical ventilation22 and debate over the clinical syndrome seen in COVID-19 that contributed to variability in mechanical ventilation (e.g., high tidal volumes or alternative ventilator modes) or an avoidance of intubation altogether in some patients.54–59 Simultaneously, pharmacological adjuncts to standard of care have changed over time, especially during the duration of this study60, including convalescent plasma 61–63, corticosteroids,64–68 interleukin 69–76 and janus kinase 77–79 inhibitors, antivirals 80–83, hydroxychloroquine 80, 84, 85, and anticoagulation strategies.86–88 Lastly, patient surges89 and surge capacity almost certainly contributed to patient outcomes as more patients were managed outside of traditional ICUs, or in expansion ICUs, and facilities were faced with impending ventilator shortages, staffing ratio changes, and an increase in traveling staff. These results are, however, hypothesis generating for non-COVID-19 acute respiratory distress syndrome patients where NIPPV has been shown in some studies to be associated with higher mortality than IMV.90
There are also important limitations to our study. Our data are limited to the COVID-19 patients in 2020 and the evolution of COVID-19 management potentially limits generalizability. Additionally, results are based on the first assigned therapy and patients who crossed over between NIRS and any imbalance between those crossovers could confound the findings. There were also some necessary assumptions based on the nature of the observational dataset from the electronic health record. For instance, we only considered a given patient’s first COVID-19 hospital visit where they required respiratory support. If two visits that appeared to belong to the same person were less than 24 hours apart, we assumed a hospital transfer occurred and combined these records into the same visit. We attempted to control for confounding, including by indication, by weighting our analysis by the inverse probability for treatment assignment, further adjusting for potential confounders in the Cox models, and by a competing risks analysis where death and intubation were competing risks. Additionally, some patients had evidence of all three treatments but no clear treatment ordering for which we performed sensitivity analyses to assess how dependent our results are on our specific inclusion, exclusion, or imputation decisions. Lastly, clinical care was not protocolized and there were likely important differences between flow rates with HFNO, airway pressures with NIPPV (of which only facemask NIPPV was available), and mechanical ventilator settings that could confound these results. Despite these limitations, the size of our dataset and statistical analyses still provide valuable results.
We found that patients intubated early without a trial of noninvasive respiratory support had a lower hazard of in-hospital death, but also a lower hazard of hospital discharge alive up to about a month. Clinical focus should shift to not delaying intubation in patients in whom noninvasive respiratory support has not reduced work of breathing or hypoxemia. Meanwhile, studies are needed to identify optimal patients for each noninvasive respiratory support modality and early prediction and identification of failure of that modality.
Statements and Declarations
Source of Support
This work was supported by an Emergency Medicine Foundation grant sponsored by Fisher & Paykel, and in part by the National Science Foundation under grant#1838745 and the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number 5T32HL007955. Neither funding agency or sponsor was involved in the design or conduct of the study or interpretation and presentation of the results.
Disclosures
JMM, VS, and JMF received grant support for this work by the Emergency Medicine Foundation.
Author Contributions
JMM, VS, JMF, and PE conceived the study idea. PE, VS, and JMM developed the phenotyping algorithm. PE, JMF, JMM, and SP preprocessed the data. PE and VS applied the phenotyping algorithm. JMF, EJB, and SP performed the statistical analysis. JMM and JMF drafted the initial manuscript, and all authors participated in manuscript revisions.
Data Availability
Data are available upon reasonable request, IRB approval, and a data use agreement.
Online Supplement
Data Preprocessing
Data were preprocessed to exclude participants under age 18, visits for which a phenotype was not available, and participants who exited the hospital (either alive or dead) before treatment. Some patients appeared to have multiple applicable hospital visits. Hospital transfers were identified by looking for multiple entries with the same ID where the patients had the same gender, race, ethnicity, age (max one year difference), the patient was alive at transfer, and consecutive hospital discharge and admission times were fewer than 24 hours apart. Identified transfers were combined into a single visit entry, and for six visits, phenotypes were adjusted as appropriate. Overall, 48 transfers were identified, and 122 later visits were discarded. Later visits were discarded in order to sample from the population of first COVID-19 visits. Height and weight were defined preferentially as the first recorded height and weight and secondarily as the admission height and weight (available for ICU data only); among ICU visits, there were 72 instances of using an admission instead of a time-stamped height and 11 instances of using an admission instead of a time-stamped weight. Heights below 48 inches and weights below 50 pounds were classified as potential data errors, and the next valid height and/or weight was sought. For nine people, this resulted in no height data being available; for two people these rules resulted in no weight data being available. In the ICU data, 72 visits had an admission height but not a time-stamped height, and 11 visits had an admission weight but not a time-stamped weight. In order to work with a deidentified data set, for the 42 participants with age over 89 years (18 participants with “>89”, and 24 with specific ages >89), their age was marked as 90. Race and ethnicity were standardized. For the respiratory rate, FiO2, oxygen flow rate, and SpO2 variables, the value closest to and before first treatment were identified. For people with phenotype 8 (evidence of all three treatments but unclear order), the first treatment start time was taken to be the earliest treatment start time identified by the phenotyping algorithm. Oxygen flow rates were used to calculate FiO2 (FiO2 = 100(0.21 + oxygen flow [L/min-1] x 0.03), and the FiO2 value (either raw or calculated) selected for later use was the latter one.
Propensity Score Estimation via Generalized Boosted Models
We follow McCaffrey et al.’s (2013) approach to estimating propensity scores by using generalized boosted models (GBMs).1 GBMs estimate propensity scores for the three treatments by fitting the models to the dichotomized outcomes of each treatment versus the other two. This results in three GBMs being fit each time propensity scores are estimated for models comparing NIPPV, HFNO, and IMV. Because each GBM consists of an iterative process of combining simple regression trees into a piecewise constant function, for each GBM, we must choose a point at which to stop adding onto the regression tree. Four criteria can readily be used to choose a stopping point; all four attempt to describe the difference between the distributions of covariates for the comparison groups for each dichotomized outcome (e.g., between patients who received IMV and all patients); the goal is to minimize the differences between the groups. Note that we wish to estimate the average treatment effect between pairs of treatments within the context of our entire population. This is considered in each stopping criterion:1
Mean Absolute Standardized Bias: mean across covariates of the absolute standardized difference between the weighted mean of the covariate for the treatment group of interest and the unweighted mean of the covariate for the entire sample. Weights are the inverse of the estimated propensity scores. Because we are interested in estimating the treatment effect in the general population of people receiving any of the three treatments, we standardize (i.e., divide) here by the estimated standard deviation from the unweighted, pooled, full sample.
Maximum Absolute Standardized Bias: maximum across covariates of the absolute standardized difference between the weighted mean of the covariate for the treatment group of interest and the unweighted mean of the covariate for the entire sample. Weights and standardization are as described above.
Mean Kolmogorov-Smirnov mean across covariates of the supremum of pointwise absolute differences between the weighted empirical distribution function for the covariate among those treated with the treatment of interest and the unweighted empirical distribution function for the entire sample.
Maximum Kolmogorov-Smirnov: maximum across covariates of the supremum of pointwise absolute differences between the weighted empirical distribution function for the covariate among those treated with the treatment of interest and the unweighted empirical distribution function for the entire sample.
For each GBM of a dichotomized outcome, we allow a maximum of 5000 simple regression trees. Empirically, this number appeared to allow all four criteria above to achieve their minimum and thus was large enough without being excessively large. For each outcome and each stopping criterion, the optimal number of trees was selected. Thus, for each treatment, we estimated four propensity scores for having been given the treatment in question. Inverse probability of treatment weighting was used in subsequent analyses by taking the inverse of the propensity scores.
For all non-interaction and non-sensitivity models, a separate analysis was done for each stopping criterion to ensure that results were not dependent on the particular criterion used. No result was dependent on stopping criterion.
Sensitivity Analyses for Time to Intubation
The competing risks analyses of time-to-intubation made a series of analytical decisions that could impact the subsequent inference. We thus designed a series of sensitivity analyses that systematically varied our assumptions and analytical approach to ensure that results were not dependent on specific decisions.
The primary reason for sensitivity analyses was that 326 visits could not be reliably classified using the phenotyping algorithm. These unclassifiable visits always contained evidence of all three treatments (IMV, HFNO, and NIPPV). However, in these cases, conflicting information in the medical record often made it unclear to the algorithm which treatment came first. For example, a patient might appear to be intubated, switched to NIPPV, then reintubated in a 30-minute period. Such information would cause the algorithm to fail to assign a treatment ordering to that visit. While these same cases were present in the other analyses, in those analyses, we were able to impute treatment, estimate propensity scores for each imputed data set, model each data set, then combine model estimates across data sets using Rubin’s Rules.
The weighted Gray’s test of competing risks, however, used a bootstrapped null distribution. There was thus no obvious way to use Rubin’s Rules. A common alternative approach to combining propensity scores and multiple imputation --- estimating propensity scores for each imputed data set, averaging them, and choosing a single imputed data set at random to use in the analysis along with the averaged propensity scores --- wasn’t applicable because of the need to impute treatment. When treatment is imputed, it is unclear how well a particular data set approximates the true but unknown treatment identities, and thus using the average propensity score approach is inappropriate.2 Moreover, different observations could be included in each data set since people imputed to have an invasive first treatment would be excluded. Determining how to compute average propensity scores for people who were not consistently included in the analyses and how to appropriately handle the data from those individuals left out of the randomly sampled data set is beyond the scope of this paper. A final challenge of using the average propensity score approach came from the need to impute the time to event for 52 people; if the imputed outcomes were particularly outlying and influential for a randomly selected data set, then the inference may not be correct. An alternative to Rubin’s Rules — judging statistical significance via the median p-value from the pooled p-values across all the imputed data sets, while appealing, is less common and would need to be utilized in an analysis scenario not covered in Eekhout et al.3
The sensitivity analyses thus encompass a range of ways to handle the above challenges. Each combines a data decision, an outcome decision, an imputation decision, and an inference approach decision to arrive at an analysis pipeline. Note that some combinations of the four decisions were not possible. Moreover, where it was possible to take either the average propensity score or median p-value approach, we did not always conduct both corresponding analyses. Rather, we defaulted to the average propensity score approach except when we are imputing time-to-event information; in those cases, we also conduct the corresponding analysis with the median p-value inference approach as an extra confirmation that the specific data set chosen in the average propensity score approach didn’t have particularly influential imputed times-to-event.
Data Decision: How and how much of the unclassifiable data to include.
- Option A: We assume that we know everything about the unclassifiable data and base our knowledge on treatment start information. We exclude patients with IMV as the first marked treatment and assign patients with HFNO or NIPPV marked as the first treatment to the corresponding treatment group. If the two noninvasive therapies tie for first treatment (7 people), we randomly select one.
- Option B: We assume that we know whether the first treatment was IMV or NIRS based on the treatment time stamps. Patients with IMV as the first treatment are excluded from analysis. However, because of potential nomenclature inconsistencies between NIPPV and HFNO, we assume that the first noninvasive treatment could have been either NIPPV or HFNO. We impute the initial noninvasive treatment type.
- Option C: We assume that we do not know what treatment came first and thus whether or not anyone with the unclassifiable phenotype should be included in this competing risks analysis. Thus, we exclude them all from the analysis.
Unknown Outcome Decision:
- Yes: Include patients with unknown time-to-event.
- No: Do not include patients with unknown time-to-event.
Imputation Decision: The decision whether to do imputation impacts which predictors are used in the propensity score calculations. If imputation is conducted, the propensity score algorithm uses the previously described predictors; otherwise, it uses a restricted subset for which all information is available (age, gender, hospital size).
- Yes: Conduct multiple imputation by chained equations.
- No: Do not impute missing data.
Inference Approach Decision:
- Average Propensity Score Approach: In this approach, the propensity scores for each visit are averaged across imputed data sets. These averaged scores are used with a randomly-selected imputation dataset. This approach is not appropriate when treatment is imputed.
- Median P-Value Approach: In this approach, the weighted Gray’s test is conducted on each imputed data set (with its own set of propensity scores). Then, the median p-value across data sets is taken for inference.
- Single Propensity Score Approach: This case arises when we are not imputing data. Then, we estimate propensity scores once (based on the limited predictor set) and use them in the weighted Gray’s test.
The ten proposed competing risks analyses correspond to the following combinations of the four decision points (all p-values were <0.001):
We considered the analysis Data Decision B + Unknown Outcome Decision: No + Imputation Decision: Yes + Inference Approach Decision: Median P-Value to be primary and the others to be sensitivity analyses. Additionally, given the lack of inferential difference associated with the different propensity score estimation stopping rules, for all sensitivity analyses, we only used the mean absolute standardized bias stopping rule. We did, however, use all four stopping rules for the primary competing risks analysis.
In the estimation of these weighted survival plots, people who die remain in the risk set after their death because they have neither been censored nor experienced intubation. All time-to-intubation analyses showed statistically significant differences between initial treatment with HFNO and NIPPV (p < 0.001).
Demographics of Unclassifiable Cohort
Model estimates shown below correspond to the mean absolute standardized bias stopping rule in the generalized boosted models. Inferential results were the same for the other stopping rules
Time-to-in-hospital death weighted Kaplan-Meier curves for noninvasive respiratory support versus invasive mechanical ventilation. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
Death is treated as a competing risk. Time zero is time of first treatment assignment.
Time-to-hospital discharge alive weighted Kaplan-Meier curves for noninvasive respiratory support versus invasive mechanical ventilation. Time zero is time of first treatment assignment. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
Death is treated as a competing risk. Time zero is time of hospital admission.
Time-to-hospital discharge alive weighted Kaplan-Meier curves for noninvasive respiratory support versus invasive mechanical ventilation. Time zero is time of hospital admission. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
with death as a competing risk for noninvasive respiratory support versus invasive mechanical ventilation. Time zero is time of ICU admission.
Weighted Kaplan-Meier curves for noninvasive respiratory support versus invasive mechanical ventilation. Time zero is time of ICU admission. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
Model estimates shown below correspond to the mean absolute standardized bias stopping rule in the generalized boosted models. Inferential results were the same for the other stopping rules.
Time-to-in-hospital death weighted Kaplan-Meier curves for HFNO versus NIPPV versus invasive mechanical ventilation. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
Death is treated as a competing risk. Time zero is time of first treatment assignment.
Time-to-hospital discharge alive weighted Kaplan-Meier curves for HFNO versus NIPPV versus invasive mechanical ventilation. Time zero is time of first treatment assignment. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
Death is treated as a competing risk. Time zero is time of hospital admission.
Time-to-hospital discharge alive weighted Kaplan-Meier curves for HFNO versus NIPPV versus invasive mechanical ventilation. Time zero is time of hospital admission. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
with death as a competing risk for HFNO versus NIPPV versus invasive mechanical ventilation. Time zero is time of ICU admission. Model estimates shown below correspond to the mean absolute standardized bias stopping rule in the generalized boosted models. Inferential results were the same for the other stopping rules.
Weighted Kaplan-Meier curves for HFNO versus NIPPV versus invasive mechanical ventilation. Time zero is time of ICU admission. In these Kaplan-Meier estimates, people experiencing death are treated as censored. Curves use propensity scores estimated via the mean absolute standardized bias stopping rule in the generalized boosted models.
STROBE Checklist
Acknowledgements
The authors would like to thank Don Saner and Mario Arteaga from the Banner Health Network Clinical Data Warehouse for their support during this project.
References
- 1.↵
- 2.↵
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.↵
- 9.↵
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.↵
- 19.↵
- 20.
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.
- 56.
- 57.
- 58.
- 59.↵
- 60.↵
- 61.↵
- 62.
- 63.↵
- 64.↵
- 65.
- 66.
- 67.
- 68.↵
- 69.↵
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.↵
- 77.↵
- 78.
- 79.↵
- 80.↵
- 81.
- 82.
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.
- 88.↵
- 89.↵
- 90.↵