SUMMARY
This study applied causal criteria in directed acyclic graphs for handling covariates in associations for prognosis of severe COVID-19 (coronavirus disease 2019) cases. To identify nonspecific blood tests and risk factors as predictors of hospitalization due to COVID-19, one has to exclude noisy predictors by comparing the concordance statistics (AUC) for positive and negative cases of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). Predictors with significant AUC at the negative stratum should be either controlled for their confounders or eliminated (when confounders are unavailable). Models were classified according to the difference of AUC between strata. The framework was applied to an open database with 5644 patients from Hospital Israelita Albert Einstein in Brazil with SARS-CoV-2 RT-PCR (Reverse Transcription – Polymerase Chain Reaction) exam. C-reactive Protein (CRP) was a noisy predictor: hospitalization could have happened due to causes other than COVID-19 even when SARS-CoV-2 RT-PCR is positive and CRP is reactive, as most cases are asymptomatic to mild. Candidates for a characteristic response in the transition from moderate to severe COVID-19 inflammation were combinations of eosinophils, monocytes, and neutrophils, with age as a risk factor; creatinine, as an additional risk factor, sharpened the odds ratios of the model with monocytes, neutrophils, and age.
INTRODUCTION
COVID-19 (coronavirus disease 2019), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), stands out for its high rate of hospitalization and long stays in hospital and in intensive care units (ICU). COVID-19 disease severity can be mild, moderate, severe, or critical [1]. While 81% of those infected have mild or moderate symptoms, the World Health Organization (WHO) estimates that 14% of cases are severe and require hospitalization and oxygen support, and 5% are critical and are admitted to intensive care units [1]. Reported median hospital length of stay (LoS) ranged from 4 to 21 days (outside China) and ICU LoS from 4 to 19 days [2].
The severity of COVID-19 states is associated with many risk factors. Early reports suggest advanced age, comorbidities, multi-comorbidities, and immunosuppression [3,4]. The enlarging list includes diabetes, cardiac disease, chronic lung disease, cerebrovascular disease, chronic kidney disease, cancer, liver disease, obesity, hypertension, dyspnea, fatigue, and anorexia [1,5,6].
Early identification of severe cases allows the optimization of emergency care support [1] and the improvement of patient outcomes [7]. However, patients who do not yet meet supportive care criteria may fail to receive the necessary care, when there is rapid deterioration or inability to promptly go to a hospital. In the transition from moderate to severe cases there can be avoidable delays in life support interventions with non-optimized treatments.
Interest in developing predictive models of COVID-19 outcomes is widespread [7,8]. A review of 50 prognostic models concluded that they are at high risk of bias [8]. As most studies focus on reporting statistical findings, our concern is the lack of minimum causal criteria to evaluate fragmented findings and to identify potentially useful associations that are effectively related to COVID-19 inflammation.
In this context, a path to optimized supportive treatments is a more reliable assessment of the transition from moderate to severe cases of COVID-19 inflammation. We chose nonspecific blood tests because they are widely available, and the hospitalization decision as a proxy to characterize the transition from moderate to severe cases (when not constrained by inpatient capacity). After formalizing an analytical framework with causal reasoning, the goal is to identify candidate sets of blood tests associated with hospitalization (with risk factors), excluding noisy predictors that are not related to COVID-19 inflammation.
METHODS
Whereas causal effects are clearly predictive, prediction studies usually refer to noncausal analysis that uses observational data to make predictions beyond the observed ones and confounding bias is generally considered a nonissue [9]. But when one needs more reliable predictions, confounding bias and causality should be accounted for in associations. This study applies analytical tools from the causal effect estimation of directed acyclic graph theory [10] to investigate associations between two sets of outcomes (hospitalization and blood tests) that are related to a common cause (moderate to severe COVID-19 inflammation) considering covariates.
The strength of the association depends on the specificity and sensitivity of the COVID-19 inflammation pattern, as a kind of distinctive signature of the disease. A low association can also occur and means that the pattern with that set of variables allows weak inferences. If a substantial association is identified and it is also stable and representative of the target population, then these blood tests may be useful as proxies in COVID-19 surveillance protocols and screening interventions.
Theoretical framework for analyzing associations with causal criteria
A common use of directed acyclic graphs (DAGs) in epidemiologic research is to identify sources of bias that may introduce spurious correlations [11,12]. A hypothetical DAG model with a latent variable was conceived to evaluate the effect of various types of covariates on the focal association (figure 1). The causal path starts with the infection by SARS-CoV-2 (exposure E) that, in some cases, leads to “Moderate to Severe Inflammation due to COVID-19” (MSIC, hypothetical latent variable [E→MSIC]), and that inflammation causes two outcomes (mutual dependent relationship [H←MSIC→B]): (H) hospitalization decision; and (B={B1,…,Bk}) blood tests measured at hospital admission. The blood tests are selected according to the strength of their association with hospitalization. The hypothetical covariates that contribute directly to COVID-19 inflammation were considered risk factors (RF={RF1,…,RFL}, mutual causation relationships [RFi→MSIC←RFj]). Covariates that affect both outcomes are identified as Both-Outcome-Covariate (BOC={BOC1,…,BOCm}) and those that affect only one outcome as Single-Outcome-Covariate (SOC={SOC1,…,SOCn}). These covariates are not exhaustive; they serve to generate causal graph criteria for handling confounding factors with the d-separation and d-connection concepts [10].
The d-separation concept attempts to separate (make independent) two focal sets of variables by blocking the causal ancestors and by avoiding statistical control for mutual causal descendants [10,13]. Here, in contrast, to preserve the association between descendants of MSIC, the focal outcomes (H and B) must remain d-connected (dependent on each other only through MSIC) and their relations with other covariates (that may introduce unwanted dependencies) have to be d-separated (conditionally independent).
Causal relationships in the DAGs are defined with the concept of the do(.) operator [10,14]. The association caused by COVID-19 inflammation can be understood as a comparison of the conditional probabilities of hospitalization (H) given a set of blood tests (B) under exposure intervention, do(SARS-CoV-2=1), and without exposure intervention, do(SARS-CoV-2=0):
P[H|B=b,do(SARS-CoV-2=1)] (1)
P[H|B=b’,do(SARS-CoV-2=0)] (2)
where (1) represents the population distribution of H (hospitalization) given a set of blood tests equal to b if everyone in the population had been exposed to SARS-CoV-2, and (2) represents the same distribution, given blood tests equal to b’, if no one in the population had been exposed.
The interventions with do(.) generate modified DAGs (or single-world intervention graphs [9]) that allow the analysis of the covariates:
The do(SARS-CoV-2=0) eliminates all arrows directed towards SARS-CoV-2 and to MSIC, because MSIC is assumed to be non-existent without exposure (figure 2). Ignoring the floating covariates, there are single arrow covariates pointing to hospitalization (RF3, RF4A, SOC1, SOC3) and to blood tests (RF4B, SOC2, SOC4) and fork covariates pointing to both outcomes (RF5, BOC1, BOC2).
Similarly, the modified graph of do(SARS-CoV-2=1) is equal to the former graph and adds single arrows from RF1 and RF2 to MSIC; and converts RF3, RF4A, RF4B, and RF5 to fork types with arrows directed to MSIC.
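The graph surgery described above can be sketched in a few lines of Python. The edge list below is an assumption reconstructed from the prose (figure 1 is not reproduced here), and removing the arrows that leave MSIC is one reading of the statement that MSIC is non-existent without exposure. The function names are hypothetical.

```python
# Edge list reconstructed from the text (assumption: figure 1 may contain more nodes).
# E = SARS-CoV-2 exposure, MSIC = latent inflammation, H = hospitalization, B = blood tests.
EDGES = {
    ("E", "MSIC"), ("MSIC", "H"), ("MSIC", "B"),
    ("RF1", "MSIC"), ("RF2", "MSIC"),
    ("RF3", "MSIC"), ("RF3", "H"),
    ("RF4A", "MSIC"), ("RF4A", "H"),
    ("RF4B", "MSIC"), ("RF4B", "B"),
    ("RF5", "MSIC"), ("RF5", "H"), ("RF5", "B"),
    ("SOC1", "H"), ("SOC3", "H"), ("SOC2", "B"), ("SOC4", "B"),
    ("BOC1", "H"), ("BOC1", "B"), ("BOC2", "H"), ("BOC2", "B"),
}

def do_no_exposure(edges):
    """Graph under do(SARS-CoV-2=0): drop arrows into E and every arrow touching MSIC."""
    return {(a, b) for (a, b) in edges if b != "E" and "MSIC" not in (a, b)}

def classify_covariates(edges):
    """Group the remaining covariates by the outcomes (H, B) they point to."""
    targets = {}
    for a, b in edges:
        if b in ("H", "B"):
            targets.setdefault(a, set()).add(b)
    groups = {"H_only": set(), "B_only": set(), "fork": set()}
    for node, outs in targets.items():
        if outs == {"H", "B"}:
            groups["fork"].add(node)
        elif outs == {"H"}:
            groups["H_only"].add(node)
        else:
            groups["B_only"].add(node)
    return groups

groups = classify_covariates(do_no_exposure(EDGES))
for name in ("H_only", "B_only", "fork"):
    print(name, sorted(groups[name]))
```

Under these assumptions the three groups match the single-arrow and fork covariates listed for figure 2, while RF1 and RF2 are left without arrows (floating).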
As most covariates are either unmeasured or unknown, their absence can be evaluated following the intuition of the back-door criteria [10,11]:
(a) Covariates with arrow into the causal node (MSIC) are risk factors that may increase the focal association between outcomes by increasing the effect of MSIC; and their absence tends to weaken the focal association.
(b) Single arrow covariates and unbalanced fork covariates into one outcome (H or B) may distort the association and their absence introduces errors in the focal association, reducing the discriminative ability.
(c) Fork covariates into both outcomes (H and B) may add spurious relations (through the back-door) into the focal association, and their absence may inadvertently increase the focal association.
Type (c) introduces non-causal relations into the focal association, and the influence of covariates RF5, BOC1, and BOC2 can be estimated with the modified model without exposure (figure 2). A strong association of the outcomes (without exposure) can be due to these covariates and suggests that additional effort is needed to control for them. Another possibility is to exclude the noisy exams that have strong spurious associations.
Model assessment with naïve estimation
A naïve estimation of equations (1) and (2) is to assume that they are equal to their conditional probabilities available in a given dataset at each stratum. The cost of this simplification is that the analysis is no longer causal (in a counterfactual sense, because we are not contrasting the whole population exposed and the whole population not exposed [9,14]) and the estimation becomes an association between two disjoint sets, each representing a separate part of the target population. As hospitalization is a dichotomous variable, the conditional probability P[H|B=b,SARS-CoV-2=1] (3) can be computed through a logistic regression of hospitalization (dependent variable) given a set of blood tests at SARS-CoV-2=1. Similarly, P[H|B=b’,SARS-CoV-2=0] (4) can be obtained with another model (same variables but different coefficients). Conditioning on a proper set of covariates at each stratum is implicit.
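As a minimal sketch of this stratified estimation, the snippet below fits one logistic model per SARS-CoV-2 stratum with plain gradient ascent. The records are hypothetical toy values, not HIAE data, and the hand-written fitting routine is only a stand-in for the statistical package used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=3000):
    """Fit P[H=1|x] = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy records: (pcr_positive, standardized_blood_test, hospitalized)
records = [
    (1, -1.5, 0), (1, -1.0, 0), (1, -0.5, 1), (1, 0.0, 0),
    (1, 0.5, 0), (1, 1.0, 1), (1, 1.5, 1), (1, 2.0, 1),
    (0, -1.5, 0), (0, -1.0, 1), (0, -0.5, 0), (0, 0.0, 0),
    (0, 0.5, 1), (0, 1.0, 0), (0, 1.5, 1), (0, 2.0, 0),
]

# Same variables, separate coefficients: one model per stratum.
models = {}
for stratum in (1, 0):
    xs = [x for s, x, _ in records if s == stratum]
    ys = [h for s, _, h in records if s == stratum]
    models[stratum] = fit_logistic(xs, ys)

for stratum, (b0, b1) in models.items():
    print(f"SARS-CoV-2={stratum}: intercept={b0:.3f}, slope={b1:.3f}")
```

In this toy example the slope is clearly positive in the positive stratum and close to zero in the negative one, which is the pattern the framework looks for.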
The concordance statistic (C-statistic) of a logistic regression model is a standard measure of its predictive accuracy and is calculated as the Area Under the receiver operating characteristic Curve (AUC) [9,15]. A simple way to compare the discriminative ability of (3) and (4) is to calculate the difference of the AUC at each stratum. A difference of 0.0 means no association with COVID-19, and 0.5 means perfect focal association of the outcomes and perfect differentiation among strata. Assessing the magnitude of the naïve estimation bias requires further refinements with potential outcomes and the selection of relevant covariates, so that the modeling effort remains analytically tractable when evaluating specific configurations. As a minimum, the comparison of the models with AUC values at the negative stratum of SARS-CoV-2 is a necessary improvement in the assessment of prognostic models. This is similar to the null value concept in measures of association of two groups with two outcomes, used to assess whether there is any difference between them [9], but generalized for continuous multivariable prognostic models.
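The comparison of AUC between strata can be illustrated with a small pairwise implementation of the C-statistic. The scores and labels are hypothetical and chosen so that the positive stratum is perfectly discriminated while the negative stratum is uninformative, which reproduces the limiting difference of 0.5 mentioned above.

```python
def auc(scores, labels):
    """C-statistic: share of (event, non-event) pairs in which the event
    has the higher predicted score; ties count as one half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for e in events:
        for q in non_events:
            if e > q:
                wins += 1.0
            elif e == q:
                wins += 0.5
    return wins / (len(events) * len(non_events))

# Hypothetical predicted probabilities and hospitalization labels per stratum.
auc_pos = auc([0.1, 0.2, 0.3, 0.8, 0.9], [0, 0, 0, 1, 1])   # SARS-CoV-2=1
auc_neg = auc([0.2, 0.4, 0.6, 0.8], [0, 1, 1, 0])           # SARS-CoV-2=0
delta = auc_pos - auc_neg

print(auc_pos, auc_neg, delta)
```

Real models fall between the two extremes, so the difference is read as a ranking device rather than as an effect estimate.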
Model selection criteria
The above framework guided our approach to identify sets of blood tests associated with the hospitalization due to COVID-19 together with:
Acceptable overall statistical properties of each model at the positive stratum of SARS-CoV-2, considering the magnitude of the odds ratios of the coefficients and their statistical significance without and with bootstrap procedure (resampling);
Consistency of the blood test coefficients across models with one variable and with multiple variables: considering causal effects, coefficients should not change sign when properly conditioned across models [16]; and
Elimination of models with high AUC at the negative stratum of SARS-CoV-2 and classification of the sets of blood tests by the difference of AUC between strata.
Source dataset
We identified one observational database in which, at least partially, we could apply the framework and generate candidate prognostic models. Hospital Israelita Albert Einstein (HIAE), in São Paulo, Brazil, made public a database (HIAE_dataset) [17] on the Kaggle platform of 5644 patients screened with the SARS-CoV-2 RT-PCR (Reverse Transcription – Polymerase Chain Reaction) exam, with a few additional laboratory tests collected during a visit to this hospital from February to March 2020. All blood tests were standardized to have a mean of zero and a unit standard deviation. As this research is based on a public and anonymized dataset, it was not reviewed by any institutional board or ethics commission. The logistic regression models were analyzed with the aid of IBM SPSS version 22.0 and the causal map with DAGitty.net version 3.0.
RESULTS
Of the 5644 patients in the HIAE_dataset [17], 558 presented positive results for SARS-CoV-2 RT-PCR. Of the 170 patients hospitalized (in regular ward, semi-intensive unit or intensive care unit), 52 were positive (9.3% rate of hospitalization due to COVID-19). Patient age quantile, from 0 to 19, with sample mean of 9.32, was the only demographic variable available. Age was not independent of the SARS-CoV-2 RT-PCR result: only 0.9% were positive in age quantiles 0, 1, and 2 (8 positive cases in 883 exams), while the incidence (not weighted) in age quantiles 3 to 19 was 11.7% ± 2.6%.
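The rates quoted above follow directly from the reported counts, as the short check below shows. The pooled rate for age quantiles 3 to 19 is an added illustration that assumes the 883 exams in quantiles 0 to 2 are part of the 5644 total; being a weighted figure, it need not coincide with the unweighted 11.7%.

```python
total_exams, total_positive = 5644, 558
hospitalized_positive = 52
young_exams, young_positive = 883, 8   # age quantiles 0, 1, and 2

# Hospitalization rate among RT-PCR positive patients.
hosp_rate = round(100 * hospitalized_positive / total_positive, 1)

# Positivity in the youngest three age quantiles.
young_rate = round(100 * young_positive / young_exams, 1)

# Pooled (weighted) positivity in age quantiles 3 to 19, under the stated assumption.
older_rate = round(100 * (total_positive - young_positive) / (total_exams - young_exams), 1)

print(hosp_rate, young_rate, older_rate)
```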
In the first round, the following blood tests were discarded because of poor performance of the univariate model when SARS-CoV-2=1: Basophils, Hematocrit, Hemoglobin, Leukocytes, Mean platelet volume, Mean corpuscular hemoglobin (MCH), Mean corpuscular hemoglobin concentration (MCHC), Mean corpuscular volume (MCV), Platelets, Potassium, Red blood Cells, Red blood cell distribution width (RDW), Serum Glucose, Sodium, and Urea (Table 1).
The remaining blood tests are creatinine, C-Reactive Protein (CRP), eosinophils, lymphocytes, monocytes, and neutrophils (Table 1). Only creatinine is not directly related to the immune system, and it will be evaluated initially as a risk factor. Of the 5644 patients, eosinophils were recorded for 602 patients, lymphocytes for 602, monocytes for 601, neutrophils for 513, CRP for 506, and creatinine for 424. In dealing with missing cases, all observations with the required data were included in each model (available-case analysis).
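Available-case analysis means that each model keeps every patient who has all of the variables that particular model needs, so the sample size varies from model to model. A minimal sketch, with hypothetical patient records and a hypothetical helper name:

```python
def available_cases(rows, variables):
    """Keep the rows in which every variable required by the model is recorded."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]

# Hypothetical standardized records; None marks a test that was not taken.
patients = [
    {"age": 9,  "monocytes": -0.4, "neutrophils": 0.7,  "creatinine": 0.2},
    {"age": 15, "monocytes": -1.1, "neutrophils": None, "creatinine": None},
    {"age": 4,  "monocytes": 0.3,  "neutrophils": -0.2, "creatinine": None},
    {"age": 12, "monocytes": None, "neutrophils": None, "creatinine": None},
]

n_monocytes = len(available_cases(patients, ["age", "monocytes"]))
n_two_tests = len(available_cases(patients, ["age", "monocytes", "neutrophils"]))
n_with_creatinine = len(
    available_cases(patients, ["age", "monocytes", "neutrophils", "creatinine"])
)

print(n_monocytes, n_two_tests, n_with_creatinine)  # sample shrinks as variables are added
```

The shrinking counts mirror the pattern in the dataset, where creatinine is recorded for fewer patients than the leukocyte differentials.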
CRP is a biomarker of various types of inflammation [18,19]. At SARS-CoV-2=1, the model with CRP and age has good discriminative ability, with AUC of .872 (95% confidence interval (CI), lower bound (LB)=.783; upper bound (UB)=.961). But at SARS-CoV-2=0, AUC is also substantial, .774 (95% CI, LB=.713; UB=.836), with significant overlap between strata at 95% CI. CRP is a predictor of hospitalization in general, but the substantial AUC value at the negative stratum suggests that the focal association due to COVID-19 is contaminated with other non-related associations. Models with CRP demonstrated sensitivity to resampling within the HIAE_dataset [17]: the significance of the coefficient moved from .005 to .144 (from .140 to .148 in other simulations). Similar effects were found in models that include CRP with other blood tests, and sensitivity to bootstrapping was reduced by dichotomizing CRP (reactive/not-reactive). Models with CRP_reactive, neutrophils, and age generated AUC of .901 (LB=.826; UB=.977) and .755 (LB=.684; UB=.827) in the positive and negative strata, and CRP_reactive, monocytes, neutrophils, and age generated AUC of .920 (LB=.853; UB=.987) and .753 (LB=.678; UB=.827), respectively. High levels of AUC at the negative stratum mean that CRP is a response with significant associations due to causes other than COVID-19. Unlike other prognostic studies [20,21,22,23,24,25] (none of which used data at the negative stratum), CRP was excluded as a candidate.
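The AUC values reported for the three CRP models can be turned into differences between strata with a few lines of arithmetic. The interval-overlap flag below is only a crude descriptive check added for illustration, not a formal test used in the study.

```python
# (AUC, lower bound, upper bound) at the positive and negative strata, as reported above.
crp_models = {
    "CRP + age": ((0.872, 0.783, 0.961), (0.774, 0.713, 0.836)),
    "CRP_reactive + neutrophils + age": ((0.901, 0.826, 0.977), (0.755, 0.684, 0.827)),
    "CRP_reactive + monocytes + neutrophils + age": ((0.920, 0.853, 0.987), (0.753, 0.678, 0.827)),
}

summary = {}
for name, (pos, neg) in crp_models.items():
    delta = round(pos[0] - neg[0], 3)      # difference of discriminative ability
    overlap = pos[1] <= neg[2]             # do the 95% intervals touch or overlap?
    summary[name] = (delta, overlap)
    print(f"{name}: delta={delta:.3f}, CI overlap={overlap}")
```

All three differences (.098, .146, and .167) stay below the Δ>.220 reported for the models with two or more of eosinophils, monocytes, and neutrophils.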
The Neutrophil to Lymphocyte Ratio (NLR) is considered a possible indicator of severity of COVID-19 [21,24,26,27], but the NLR could not be evaluated with the HIAE_dataset [17] because the variables were standardized (the ratio would involve division by values at or near zero), so the two blood tests were analyzed separately. Lymphocytes presented inconsistent behavior in models with two blood tests. Models with only lymphocytes (with and without age quantile) indicated lymphopenia when SARS-CoV-2=1, as expected [28,29]. The model with lymphocytes, neutrophils, and age reversed the sign of the lymphocytes coefficient (SARS-CoV-2=1), possibly due to collinearity between these blood tests (Pearson correlation of −.925 and −.937 at the positive and negative strata, both significant at .01 (two-tailed)). As there are indications of collinearity issues at both strata, lymphocytes and neutrophils should not be in the same model as independent variables. As models with combinations of neutrophils were slightly better than those with lymphocytes, lymphocytes were dropped from the analysis.
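The problem with computing NLR from standardized variables can be seen in a toy example. The raw counts below are hypothetical and deliberately perfectly (negatively) correlated, a more extreme case than the correlations reported above; the helper names are also hypothetical.

```python
import statistics

def zscores(values):
    """Standardize to mean zero and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def safe_ratio(numerator, denominator):
    """Return None instead of raising when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator

# Hypothetical raw counts for five patients.
neutrophils = [2.0, 3.0, 4.0, 5.0, 6.0]
lymphocytes = [3.0, 2.5, 2.0, 1.5, 1.0]

raw_nlr = [n / l for n, l in zip(neutrophils, lymphocytes)]
z_ratio = [safe_ratio(n, l) for n, l in zip(zscores(neutrophils), zscores(lymphocytes))]

print(raw_nlr)   # increases steadily from about 0.67 to 6.0
print(z_ratio)   # undefined for the average patient, about -1 for everyone else
```

The raw ratio orders the patients by severity, while the ratio of z-scores is undefined at the mean and carries no ordering information elsewhere, so the original NLR cannot be recovered without the unstandardized values.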
In the second round, models with all combinations of eosinophils, monocytes, and neutrophils with age were tested systematically. Table 2 presents the models with combinations of eosinophils, monocytes, and neutrophils (with age) and the best model with creatinine (as risk factor). Table 3 presents the AUC of each model with the difference of discriminative ability of the association between strata.
Considered individually, eosinophils, monocytes, and neutrophils generated models that have good discriminative performance to estimate the probability of hospitalization (models 1, 2, 3 with AUC>.810 at positive stratum). The combination of these blood tests generates models (4, 5, 6, 7) with better discriminative ability (AUC>.856 at SARS-CoV-2=1). None of these blood tests presented AUC superior to .680 at SARS-CoV-2=0. All models with two or more blood tests presented a difference of discriminative ability of Δ>.220.
Two patterns of associations are more salient: (1) age as a risk factor with combinations of eosinophils, monocytes, and neutrophils as predictors; (2) age and creatinine as risk factors with monocytes and neutrophils as predictors. The interpretation of the conditional probabilities will focus on models 6, 7, and 8, but models with at least two blood tests (4 to 8) are potential candidate associations.
Models 6 and 8 have significant blood test coefficients at p<.05 (with and without bootstrapping), but model 6 has an intermediate performance in the difference of discriminative ability between strata. Model 7 can also be seen as an extension of model 6 by adding eosinophils into the model. Considering creatinine as a risk factor (as a marker of renal function), model 8 is the overall best model, with significant coefficients at p<.05 and the highest difference of discriminative ability between strata (Δ=.268). This inclusion eliminated the influence of eosinophils from the model and can be considered an improvement on model 6 (monocytes and neutrophils with age) by adding creatinine.
When the coefficients of model 6 (table 2) are converted to conditional probabilities, we find that with monocytes and neutrophils at average, hospitalization probability is >50% for age quantile >11. At average age (quantile 9), −1 SD (one standard deviation below average) of monocytes (or +1 SD of neutrophils) results in hospitalization probability >50%. At age quantile 15, −1/2 SD of monocytes (or +1/2 SD of neutrophils) results in hospitalization probability of 86%. Model 7 has similar predictions for monocytes and neutrophils (as in model 6) with the addition of eosinophils. When age, monocytes, and neutrophils are at average, there is a hospitalization probability of 51.1% with eosinophils at −1 SD, and 90.2% when age quantile is 15.
Model 8 with creatinine has different responses than models 6 and 7. The age quantile coefficient is more pronounced and the odds ratio of creatinine is steep (8.338), so average levels of creatinine result in a probability of hospitalization >50% for age quantile >9 (with monocytes and neutrophils at average). When creatinine is +1 SD at age quantile 9, hospitalization probability is 85.9% (monocytes and neutrophils at average). In fact, only below-average levels of creatinine lower hospitalization probabilities. The coefficients of monocytes and neutrophils are also steeper than in models 6 and 7. At age quantile 9, +1/2 SD of creatinine, −1/2 SD of monocytes, and +1/2 SD of neutrophils result in a hospitalization probability of 92.5%.
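The link between an odds ratio and the quoted probabilities can be worked through with the two figures given above for creatinine in model 8. The sketch assumes a standard logistic model on standardized predictors, so that moving one predictor by k SD multiplies the odds by the odds ratio raised to k; the baseline probability it produces is implied by those two figures and is not a value reported in the study.

```python
def shift_probability(p, odds_ratio, sd_units):
    """Probability after moving one standardized predictor by sd_units,
    holding the other predictors fixed (logistic model assumed)."""
    odds = p / (1.0 - p) * odds_ratio ** sd_units
    return odds / (1.0 + odds)

OR_CREATININE = 8.338   # odds ratio per SD, as reported for model 8
P_PLUS_1SD = 0.859      # reported probability at age quantile 9, creatinine +1 SD

# Step back from +1 SD to average creatinine (other predictors unchanged).
baseline = shift_probability(P_PLUS_1SD, OR_CREATININE, -1.0)
print(round(baseline, 3))   # roughly 0.42

# Moving forward again by +1 SD recovers the reported value.
round_trip = shift_probability(baseline, OR_CREATININE, 1.0)
print(round(round_trip, 3))
```

An implied baseline of roughly 42% at age quantile 9 is consistent with the statement that the probability exceeds 50% only for age quantiles above 9.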
The main model biases may be due to contamination with noisy associations and to the selection of cases with missing data. The AUC at SARS-CoV-2=0 is a simplified measure of the magnitude of the spurious association bias in both outcomes, and all models presented relevant noisy associations (AUC from .588 up to .679, but not as high as CRP with .774). Most likely, data are missing not at random (MNAR). We performed the bootstrapping procedure to identify potential sensitivity to resampling and, indirectly, to selection bias. The selected models maintained the magnitude and statistical significance of the coefficients. Apparently, as no significant deviation was detected, the missing cases bias may be less pronounced than the spurious association bias.
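The resampling idea can be sketched as a percentile bootstrap. The study bootstrapped the regression coefficients; to keep the example short, this sketch swaps in the AUC as the resampled statistic, and the scores, labels, and function names are hypothetical.

```python
import random

def auc(scores, labels):
    """Pairwise C-statistic; ties count as one half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if e > q else 0.5 if e == q else 0.0
               for e in events for q in non_events)
    return wins / (len(events) * len(non_events))

def bootstrap_auc_interval(scores, labels, n_boot=1000, seed=0):
    """95% percentile interval of the AUC over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:   # a resample needs both outcomes to define an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Hypothetical predicted probabilities and hospitalization outcomes.
scores = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.85, 0.90]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

point = auc(scores, labels)
lower, upper = bootstrap_auc_interval(scores, labels)
print(point, lower, upper)
```

A wide interval, or one that shifts markedly between seeds, would flag the kind of sensitivity to resampling reported for the CRP models.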
DISCUSSION
We focused on models with discriminative ability to identify peculiar responses in the transition from moderate to severe inflammation only due to COVID-19. The models are not intended to predict mortality or the severe to critical cases that require ICU. The risk of overfitting was minimized by not accepting isolated variables in “ad hoc mined” models, and by applying causal criteria to evaluate associations. The AUC evaluation at the negative SARS-CoV-2 stratum to estimate the influence of unwanted covariates on the focal association, together with equivalent criteria of severity state at both strata, is, to the best of our knowledge, a needed improvement in prognosis studies of COVID-19 (as an operational procedure of the null values in measures of association for continuous multivariable prognostic models). The selected models are candidates only: the dataset [17] on which they are based cannot be representative beyond the patient health profiles of this hospital. HIAE is a reference hospital in Brazil, with its own practices, standards, and hospitalization criteria, that serves a high socioeconomic segment (mostly living in São Paulo). The observational sample refers to the initial phase of the pandemic in Brazil, and the patterns may change with prescribed treatments and with changes in SARS-CoV-2 itself.
In comparison to other prediction studies, we identified a few focused on the transition from moderate to severe cases of COVID-19 [20,21,22,23,24,25,26,27]. Most studies recommend NLR and CRP as predictors. None considered data from the negative stratum of SARS-CoV-2; therefore, these models are biased by not excluding noisy predictors.
We eliminated variables with “high” AUC at SARS-CoV-2=0, so that variables with more peculiar responses to COVID-19 were included. CRP is a general marker and not a peculiar response to COVID-19. Reactive levels of CRP together with a positive SARS-CoV-2 RT-PCR exam may be a predictor of hospitalization, but this can happen due to causes other than COVID-19 (most cases of COVID-19 are asymptomatic to mild), and so the protocol should be different. To include CRP in a model, one should control for all other causes of reactive CRP.
We have not included lymphocyte count in the models and have not evaluated the neutrophil to lymphocyte ratio (NLR) as a predictor. Lymphocytes and neutrophils are strongly related in this dataset. The similarity of correlations at both strata of SARS-CoV-2 suggests that NLR can also be a noisy association with hospitalization. If other studies validate that CRP is a noisy predictor (and possibly NLR as well), the remaining associations will have less specificity and sensitivity, but at least they will not generate unreliable COVID-19 predictions.
We evaluated age and creatinine as risk factors. Controlling for age quantile improved the AUC of all models at the positive stratum of SARS-CoV-2. There are other risk factors that can be evaluated with this framework, but not with the HIAE_dataset [17], and they could lead to different patterns or reinforce some of the existing ones. The difference between risk factor and outcome among blood tests is subtle. The emergent literature is cautious about whether eosinopenia may be a risk factor [30] and whether creatinine (and other renal markers) may be associated with a COVID-19 renal inflammatory response [31]. If creatinine reflects an acute inflammatory kidney response to COVID-19, the interpretation changes and further refinement of the framework is necessary. If eosinopenia is a risk factor, the prevalence of this condition should be considered, it must be properly diagnosed at admission, and the models should be reviewed with new data. As “inflammation” is latent in the DAG, one cannot test key conditional independencies from this framework (DAGs must be hypothesis driven). Additionally, the usefulness of characteristic associations only due to COVID-19 (when existent) is that they can help in the identification and estimation of risk factors.
As we drop noisy predictors, we are effectively dealing with hypotheses about the physiopathology of COVID-19 inflammation. Even though they are mentioned less frequently than neutrophils, there are studies on the complex role of eosinophils [30,32] and monocytes [33,34] in COVID-19 inflammation, indicating eosinopenia in severe cases and monocytopenia in some phase of the cytokine storm and other COVID-19 pathologies [35].
We selected two patterns of blood tests that are associated with hospitalization due to COVID-19 inflammation: age with combinations of eosinophils, monocytes and neutrophils; and age and creatinine with monocytes and neutrophils. The model findings are aligned with the known physiopathology of COVID-19 but in a more integrative framework of analysis (not as individual predictors, but as a set that is related to risk factors). The selected blood tests are broadly available even in regions with scarce health care resources. It is unlikely that we will have just one or two overall best models; given different sets of risk factors, we should expect a few representative patterns of the COVID-19 inflammation from moderate to severe.
All models can be reproduced by downloading the dataset [17]. More importantly, we believe that most hospitals (and COVID-19 care centers) in regions affected by the pandemic can apply the framework to generate similar models appropriate to their own target population by making systematic efforts to collect blood tests and potential risk factors at admission, together with the SARS-CoV-2 RT-PCR and other clinical data. By making these databases public (anonymized and with standardized data), they will allow future external validation in larger target populations and meta-analysis efforts.
Data Availability
The data that support the results of this study are openly available.
https://www.kaggle.com/dataset/e626783d4672f182e7870b1bbe75fae66bdfb232289da0a61f08c2ceb01cab01
Author Contributions
G. Ishikawa: Conceptualization, methodology, and formal analysis
G. Argenti: Conceptualization, formal analysis, and clinical and epidemiological validation
C. B. Fadel: Clinical and epidemiological validation and critical review
All authors: Writing, editing, visualization, review and final approval of manuscript
Statements
The authors declare no conflicts of interest.
This paper has not been published previously in whole or part.
The data that support the results of this study are openly available in reference number [17]. Although this research received no specific grant from any funding agency, commercial or not-for-profit sectors, as institutionally required we inform that “this study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001”.
Acknowledgements
We are grateful to Antônio Magno Lima Espeschit and Sônia Mara de Andrade who contributed with suggestions to this research.