Abstract
Background Many different treatments were heavily administered to patients with COVID-19 during the peak of the pandemic in Madrid without robust evidence supporting them.
Methods We examined the association between sixteen treatments in four groups (steroids, antivirals, antibiotics and immunomodulators) and intubation or death. Data were obtained from patients that were admitted to an HM hospital with suspicion of COVID-19 until 24/04/2020, excluding unconfirmed diagnosis, those who were admitted before the epidemic started in Madrid, had an outcome that was not discharge or death or died within 24 hours of presentation. We compared outcomes between treated and untreated patients using propensity-score caliper matching.
Results Of 2,307 patients in the dataset, 679 were excluded. Of the remaining 1,645 patients, 263 (16%) died and 311 (18.9%) died or were intubated. Except for hydroxychloroquine and prednisone, patients that were treated with any of the medications were more likely to go through an outcome of death or intubation at baseline. After propensity matching we found an association between treatment with hydroxychloroquine and prednisone and better outcomes (hazard ratios with 95% CI of 0.83 ± 0.06 and 0.85 ± 0.03). Results were similar in multiple sensitivity analyses.
Conclusions In this multicenter study of patients admitted with COVID-19 hydroxychloroquine and prednisone administration was found to be associated with improved outcomes. Other treatments were associated with no effect or worse outcomes. Randomized, controlled trials of these medications in patients with COVID-19 are needed to avoid heavy administration of treatments with no strong evidence to support them.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic, declared by the WHO Director General at the media briefing on March 11th 2020, is caused by the named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)1. It started in Wuhan, China, but it later spread to the rest of the world, with currently over 8.7 million confirmed cases worldwide, most of them in Europe and North America2.
Spain, with over 246,000 confirmed cases and over 28,000 confirmed deaths as of June 22nd 20202, has one of the highest burdens of COVID-19 per inhabitant worldwide (over 524 cases per 100,000 inhabitants). The numerical data provided by the Spanish Health Ministry is becoming confusing, and the Financial Times regarded the figures as deeply compromised3. After a quick search in PubMed with the keywords “novel coronavirus”, “COVID-19” or “SARS-CoV2” and “Spain”, we have found no articles that describe the characteristics of infected patients.
Because it is a new epidemic, the specific mechanisms and pathophysiology remain elusive, and the risk factors for death have not been accurately defined (mainly age and comorbidities)4. No effective vaccine or antiviral treatment is currently available5. The analysis of data on the clinical characteristics, received treatment and outcomes of COVID-19 patients is of vital importance to reduce its mortality6. It will allow the identification of possible prognostic factors and provide preliminary data for the future development of management algorithms.
In this article we report on the clinical characteristics, previous history, received treatments and clinical outcome of 1,645 COVID-19 patients admitted to several hospitals in Madrid, Spain. The goal of this report is to identify those clinical profiles most likely to benefit from specific treatment and possible early clinical prognostic factors.
Methods
Study Design and participants
This is a multi-center, retrospective, observational study done with anonymized records provided by several HM hospitals in Madrid, Spain. These records included information of 2,307 patients admitted with a diagnosis of COVID POSITIVE or COVID PENDING, since the first officially recognized cases in Madrid (24/02/2020) to the last update to the database (24/04/2020). Patients are monitored up to discharge or death. The full description of the dataset is included in Appendix 1.
Exclusion criteria
Patients admitted before the first cases were declared in Madrid (24/02/2020), who had not yet reached an outcome (discharge or death) by 24/04/2020, transferred to a different hospital for admission, voluntarily discharged, with a “COVID PENDING” diagnosis or that were interned for less than 24 hours were excluded from the analysis. After exclusion there were 1,645 patients remaining. The exclusion process is explained in Figure 1.
Flow diagram of the patients excluded from the intial sample
Clinical outcomes
In-hospital mortality was chosen as the primary outcome, defined as registered death of a patient before their discharge. As a secondary outcome, we chose a composite event that integrated both in-hospital mortality and intubation.
Data sources
We obtained the data from the HM hospital network in Madrid, thanks to its project ‘COVID DATA SAVE LIVES’7. This anonymized clinical dataset comes from the HM hospitals HER system. It was openly released on April 25th on demand to any research groups that wanted to analyse it, provided they presented a project beforehand and said project was approved by the corresponding board of experts.
The data included patients’ age, sex, past diagnoses, smoking status, admission data, initial vital signs and complementary tests performed in the Emergency Room, vitals and tests performed throughout their hospital stay, treatments received (including previous medications continued and specific treatment for COVID-19), destination at discharge (or death) and diagnoses during their stay.
Statistical analysis
Variable selection
Since the dataset had an enormous number of variables for each patient, there were not always enough patients to effectively analyze each variable and obtain statistically significant associations while correcting for multiple comparisons and properly stratifying. Therefore, we chose to focus on only a few variables, as to avoid the statistical problems that come with high dimensional data. To do so, we consulted two independent experts on what the most important indicators for prognosis were. These variables are the following: age (divided into the following groups: <40, 40-59, 60-79 and ≥80), gender, past diagnoses (diabetes, hypertension, hypercholesterolemia, ischaemic events [recorded as continued use of anticoagulants and/or aspirin], chronic obstructive pulmonary disease [COPD] and cancer), toxic habits (current or past smoking), allergies (to penicillin and other medications), initial vital signs (temperature [measured in °C], heart rate, oxygen saturation [measured through pulse oximetry] and blood pressure [systolic and diastolic, measured with an arm cuff]) and initial laboratory parameters (lactate dehydrogenase [LDH], D-dimer, ferritin, C-reactive protein [CRP], lactate, creatinine, procalcitonine, lymphocyte count and neutrophil count). For these features we took the first analysis they had after arriving at the hospital or, when possible, the average of the first two if taken within 24 hours so as to reduce the amount of missing data. Viral load and Interleukin-6 level were considered too but not enough patients were tested for them to be able to do the analysis. Viral load was not available in the data and only 30 patients had been tested for IL-6 level in their first two tests. All these variables and their distribution in our sample for patients that reached the primary outcome and those who did not are presented in Table 1. Table A2a shows the distribution stratified by the secondary outcome.
To analyze the differences in the distributions between patients that reached one of the outcomes and those who did not we conducted Mann-Whitney U tests for the continuous variables and Chi-square tests for the categorical ones. The p-values for the differences in the distributions both for the totality of the patients and stratified by age are shown in Table A3. After Bonferroni corrections to adjust for multiple comparisons we required P < .001 for statistical significance.
Analysis
The main objective of our analysis was to examine the association between treatment and death rate for 16 drugs of interest. Two independent experts were prompted to choose which medication they considered critical from the 447 ATC7 identifiers in the original dataset, previously screened to include only the ones administered to at least 50 patients. The chosen drugs were of one of the following types: steroids, antivirals, antibiotics and immunomodulators. Both experts marked the same 16 drugs as critical. The full list of medication is reported in Table 2 along with the results. Appendix 9 is a short literature review of the chosen treatments with a summary of previous results and reasons why these treatments were considered useful for patients diagnosed with COVID-19.
An initial crude univariate analysis was done in which the treated and untreated groups for each medication were compared according to their outcomes. Due to the non-randomized nature of assignment of treatment, we then used a propensity-score based method8 to reduce the effects of confounding.
The primary analysis used was propensity-score caliper matching9. We first trained a logistic regression model using the variables in Table 1 to predict whether the patient took the medication or not, then we used the predicted probabilities for each patient as propensity scores for the respective medication. Table A4 shows the AUROC and AUPRC for these models and Appendix 5 shows the propensity-score distributions for all 16 treatments. Then, we used a caliper of 0.2 of the standard deviation in the propensity score of the population as recommended in the literature10,11 to match patients one-to-many from the smallest group (treated or untreated, depending on the medication) with patients from the other group with replacement, that is, we allowed the same patient from the bigger group to be matched to more than one patient of the smaller group. Patients that did not have any matches were discarded, although this was almost never the case and the worst case only excluded 6 patients, median exclusion rate at this caliper size was 1.1%. (See Table A7 for sensitivity analysis on caliper size and exclusion rates. Caliper size did not have a big effect in the results as long as the population of unmatched patients was not too big). Using these sets of matched patients, we calculated the weighted difference in means and effect sizes which we report as hazard ratios and Cohen’s d in Table 2 with 95% confidence intervals. For Cohen’s d we apply the usual interpretation12 where an effect size of less than 0.2 is considered negligible, 0.2 to 0.5 is small, 0.5 to 0.8 is medium and more than 0.8 is large. All the analyses were done using Python v3.6.1 with the sklearn, numpy, pandas and scipy libraries.
Results
Characteristics of the patients
Of the 2,307 patients in the original dataset (last updated 24/04/2020), only 2098 had been resolved. Following the exclusion criteria described above, 1645/ rema//ined for analysis. Their characteristics are shown in Table 1. Over the two-month /period of the study, 263 patients (16%) died and 1382 were discharged. For the initial laboratory tests and initial vital signs, missing data was imputed using multiple iterated imputation13. Appendix 2 shows the results of the statistical analysis considering a composite event (intubation or death) instead of only death. Appendix 3 shows the results of hypothesis tests for the association between these characteristics and both chosen outcomes (death and death or intubation), firstly considering the whole sample and afterwards stratifying by the two eldest age groups (60-79 years and ≥80years) since the other two did not have enough patients that underwent primary or secondary outcomes.
Among the initial vital signs, the main difference between the patients that died and those who did not, was the initial oxygen saturation. When analysing the difference in the distributions, it was the only parameter with P<.001 in all groups. Diastolic blood pressure also showed a significant effect, but this disappeared when separately considering the eldest age groups.
As for the initial laboratory tests, the analysed variables were chosen by experts according to their relevance for assessing risk, and their distributions are clearly distinct between the groups. After performing statistical hypothesis tests, all of these laboratory parameters showed statistically significant association (P<.001) with both outcomes chosen for our study. Only for D-dimer and when stratifying our sample according to age, did this significance disappear in the eldest group (≥80 years) (P=.02 and P=.04 for the primary and secondary outcomes respectively).
Finally, our data confirmed some previous clinical history as risk factors for a higher mortality rate, as they have likewise been proven to be in numerous other diseases, also with statistical significance in most of them although this effect was mostly explained by age, since it disappeared when stratifying. Noticeably, smoking, hypertension and medication allergies did not show a significant association with any of the chosen outcomes even in the whole population. In general, patients with worse clinical outcome had a bigger disease burden at baseline and, therefore, a smaller physiologic reserve.
Effect of treatments in death rate
Table 2 shows the crude and propensity-matched hazard ratios for 16 medications, with their ATC7 identifier, given to COVID-19 patients, along with the estimated effect sizes. To estimate the effects we use hazard ratios with 95% confidence intervals and to show the size of the effect relative to the total variation between patients, we use Cohen’s d14 also with 95% confidence intervals.
In the initial crude analysis, all medications except for prednisone, ritonavir and lopinavir, azithromycin and hydroxychloroquine, were associated with higher mortality. Of these four, associated with a reduction in the mortality rate, only hydroxychloroquine had a medium positive effect (d= −0.58 ± 0.17).
But treatment administration is not random, since patients that receive specific treatment for any disease are usually those with a more severe clinical presentation. Therefore, we performed a second analysis applying propensity-score matching methods. In general, effect sizes for all medications were attenuated, except for prednisone. Its influence on outcome increased from a non-significant to a small positive value (hazard ratio 0.85 ± 0.06 and d= −0.42 ± 0.18). Hydroxychloroquine maintained a similarly positive effect despite its attenuation due to the propensity-score matched analysis, with hazard ratio 0.84 ± 0.08 and d = −0.44 ± 0.17.
Regarding the remaining medications received by the patients in our study, after propensity-matching, nine of them (methylprednisolone, ritonavir and lopinavir, oseltamivir, amoxicillin, azithromycin, ceftriaxone, levofloxacin, tocilizumab and interferon beta 1-b) had no significant effect on mortality (d<0.2). Two of them (dexamethasone and piperacillin) had a small negative effect (0.2<d<0.5), slightly increasing mortality. The remaining three drugs (hydrocortisone, linezolid and meropenem) had medium or large negative effects (d>0.5).
When comparing these results with the analysis according to composite critical event, as appears in Table A2b, the difference is minimal, with the only change being that amoxicillin goes to d=0.21± 0.23, a small negative effect. As an added step to check the robustness of the results we carried out an analysis with nearest-neighbour matching, where each patient in the smaller group (treated or untreated) was matched with the patient with the closest characteristics but different treatment value with no replacement. Table A6 shows the results. In general, this increased the confidence intervals of the effect sizes since the samples were usually smaller. The results were similar and mostly within error or with overlapping confidence intervals of the results in Table 2, with the biggest differences found for methylprednisolone and tocilizumab which both showed a small negative effect not within error of their results in Table 2.
Negative d or hazard ratio less than 1 implies patients treated with that medication died less than those that did not. The opposite is true for positive d or >1 hazard ratio. Medications highlighted in red have at least a small positive d (>0.2) after propensity-score matching. Similarly, medications highlighted in green have d < −0.2.
Discussion
In this analysis of 1645 patients hospitalized with COVID-19 we find that the risk of death was significantly lower for patients treated with azithromycin, prednisone and, especially, hydroxychloroquine. The confidence intervals for these treatments overlap with those found in previous observational studies15. When comparing our results with a small review of the current literature about effectiveness of treatments for COVID-19, we found evidence in favour of hydroxychloroquine, with good virological and clinical outcomes in some studies, both alone and in combination with azithromycin, although these results were not uniform in all studies16. There were no studies found for standalone use of antibiotics, and regarding antivirals, we found no evidence of benefits that would recommend their use17. Finally, there is evidence in favour of tocilizumab18 and interferon beta 1b19, but mostly for patients in critical state. The evidence regarding corticosteroids is controversial20, with no significant proof of positive or negative effect, although they appear to improve outcome in critical cases21.
However, the effects of hydroxychloroquine and prednisone are medium after matching, and persist even when considering intubated patients (Appendix 2). We think this is at least small evidence in favor of hydroxychloroquine and prednisone having a positive effect in the treatment of COVID-19.
We find that risk factors in our data are in agreement with what is already common knowledge about COVID-19: age, comorbidities (mainly cardiovascular risk factors: Diabetes, hypertension, ischaemic coronary disease [measured through use of anticoagulants or aspirin]; as well as chronic obstructive pulmonary disease). We also find that smokers are less prevalent in our sample than in the Spanish population at large, indicating some measure of resistance to infection. However, once admitted in the hospital, their outcomes are not significantly different in any way we can measure from the overall population.
Also in accordance with expert opinion, no aspects of the initial clinical presentation except for O2Sat were found to be associated with a higher risk for a bad prognosis. However, laboratory parameters associated with systemic inflammatory response (lactate, LDH, D-dimer, ferritin and neutrophil count) did prove to have a relationship with higher mortality and intubation rates. A lower initial lymphocyte count and worse kidney function (measured through plasmatic creatinine) was also found in deceased and intubated patients. These parameters might be useful for an initial estimation of a patient’s prognosis, but these results alone do not give enough information to alter clinical practice.
Appendix
Appendix 1: HM dataset description
The following text has been copied from a document released by the HM hospital network along with its database explaining its project and the structure of the dataset.
The information is organized in tables according to their content, all of them linked by a unique admission identifier. This identifier is the de-anonymization key, explicitly created for this purpose, and has nothing to do with the actual identifier of each admission.
The main table includes data on the admission and the patient (age and sex), data on the previous emergency if there has been one (2,226 records), data on their stay in the ICU if there has been one and records of the first and last set of emergency constants.
The medication table shows all the medication administered to each patient during admission (more than 60,000 records), with the dates corresponding to the first and last administration of each drug, identified by their brand name and classification in the ATC5/ATC7.
In the table of vital signs, there are all the basic records of constants (54,000 records so far) collected during admission with their date and time of registration.
The laboratory table contains the results of the determinations (398,884 records) of all the requests made to each patient during admission and in the previous emergency, if any.
And finally, the ICD10 coding tables show the records of diagnostic and procedural information coded according to the international ICD10 classification in its latest distributed version (does not include COVID), for the patients referred, both for episodes of hospital admission (more than 1,600) and for the emergency (more than 1,900) prior to those episodes, if any.
Appendix
Appendix 2: Results by critical composite event
Characteristics of patients
Treatment hazard ratios for critical event patients
Negative d or hazard ratio less than 1 implies patients treated with that medication died less than those that did not. The opposite is true for positive d or >1 hazard ratio. Medications highlighted in red have at least a small positive d (>0.2) after propensity-score matching. Similarly, medications highlighted in green have d < −0.2.
Appendix
Appendix 3: Differences in the characteristic distributions in Table 1 and Table A2a
The table shows the p-values of testing if the distributions of the patients that underwent end-point events and those who did not are similar.
Appendix
Appendix 4: Propensity-score models area under curves
We present AUPRC in addition to AUROC due to the class imbalance of these classification problems, since it is a better metric here. For a dummy model that predicted the most frequent class with probability equal to its relative frequency, AUROC is 0.5 and AUPRC is the relative frequency of the treated group.
Appendix
Appendix 5: Propensity score distributions
Appendix
Appendix 6: Treatment effect with nearest neighbour matching
Every patient in the smaller group was matched to the closest patient in terms of characteristics. (Euclidean distance after normalizing the continuous variables). Medications highlighted in red have at least a small positive d (>0.2) after propensity-score matching. Similarly, medications highlighted in green have d < −0.2.
Appendix
Appendix 7: Caliper size sensitivity and exclusion rates
To ensure that the choice of caliper did not have an outsized effect on the results we repeated the analysis for a range of caliper sizes (from 0.01 to 0.5 as a fraction of the standard deviation). Table A7a shows the median and inter quartile range exclusion rates at each of the caliper sizes tested (we excluded any unmatched patients from the analysis) which justifies using a caliper in the bigger range to avoid excluding too many patients. Table A7b shows the median and interquartile range for the effect size (Cohen’s d) over the caliper range. The results reported in the discussion mostly do not change when taking into account different caliper sizes, except for amoxicillin which at d=0.19 was at the edge of what we considered significant effect and goes to d=0.22 median effect size over the range of calipers.
The smaller the caliper, the easier it is to have some patients too far away from any other in the opposite treatment group, so the exclusion rate rises. In blue, the caliper size we used for the main analysis.
Results are very similar to those of Tables 2 and A2b and do not change the conclusions presented. This shows that the results are robust to changing the caliper size.
Appendix
Appendix 8: Missing data for each variable
Missing data was handled by first using the median to impute and iteratively training multivariate regressions on all the variables except for one to predict the one out. The improved predictions were then used to train new models in the next iteration and this was repeated until convergence. Due to the small sample size, procalcitonin was not used to train the propensity-score models.
Appendix
Appendix 9: Literature review of treatment effectiveness
Each of the 16 medications in the main analysis were chosen because of their therapeutic relevance during the peak of the pandemic in Madrid. The evidence of their effectiveness was very limited during that time, and is still controversial at most. This is our attempt at collecting the evidence for each of these treatments, both the physiopathologic reasons for them being considered possibly useful in the beginning and a small review of the currently published studies regarding their effectiveness.
Steroids
Glucocorticoids were given for their possible role in management of acute respiratory distress syndrome and refractory shock in critically ill patients with COVID-19, apart from their standardised use for exacerbation of chronic obstructive pulmonary disease1
During the period when the data was collected (beginning and peak of the pandemic in Madrid), steroids were given very sparingly because of a possibly unfavourable risk/benefit ratio. They were believed to have severe adverse effects in this particular disease, being precisely patients in worse clinical condition those at the highest risk to develop them.
The evidence found in the literature is controversial2, but recent results indicate that steroids might have a strong positive effect in patients receiving oxygen or intubated3, which would indicate that patients treated with stronger steroids are those with a worse clinical presentation. These results are concordant with the impression of the authors that worked with COVID-19 patients during the pandemic.
Antivirals
Ritonavir and lopinavir are viral protease inhibitors used as treatment for HIV. Due to SARS-CoV-2 being a retrovirus that requires similar enzymes to complete its intracellular cycle, they were hypothesized to have some effect in reducing viral load. However, no effect on clinical outcome has been observed in several studies 4,5.
Oseltamivir was used with almost no evidence to support it, but it was an antiviral readily available in hospitals. To the best of our knowledge, there is no mechanism through which it could work and studies that have tested its effectiveness have found negative results 5,6.
Antibiotics
Antibiotics are protocolarily given to patients with viral pneumonia as prophylaxis for bacterial superinfection, a common and grave complication. Amongst them only azithromycin has been hypothesized to have a direct benefit in COVID-19, because of its immunomodulating effect through inhibition of the interleukin-6 inflammatory pathway7. There have been some studies that have tested its effectiveness in association with hydroxychloroquine, with discrepant results8.
Immunomodulators
Hydroxicloroquine was reported to inhibit SARS-CoV-2 in vitro 9 and heralded from the beginning of the pandemic as a working treatment for COVID-19 patients. Many studies, mostly observational, have been done to test its effectiveness with very mixed results. However, we agree with Saibal et al.10 in that most studies have major methodological limitations, many of them with very small sample sizes and some of the bigger ones with no attempt to control for any confounders 11, suggesting that the evidence is very poor. The biggest studies have found no significant effects 10,12,13. However the evidence is still unclear. RCTs are ongoing to better assess its effectiveness.
Markedly elevated inflammatory markers (eg, D-dimer, ferritin) and elevated proinflammatory cytokines (including interleukin-6) are associated with critical and fatal COVID-19. Usage of tocilizumab to block the inflammatory pathway has been hypothesized to prevent disease progression14. Preliminary studies with small sample sizes seem to suggest positive effects but only in patients at risk of cytokine storm. There are ongoing RCTs to clarify the effects of tocilizumab15.
Interferon beta 1-b was suggested to have in vitro activity against Middle East respiratory syndrome coronavirus (MERS-CoV) and good outcomes in an animal model of MERS-CoV infection16. Several interferons have been studied in Covid-19 patients. Dastan et al.17 found positive results with interferon beta 1a in a non-controlled trial, and interferon beta 1b showed significant improvement in virological and clinical outcomes in a phase 2 trial, both in critical and non-critical patients18.