ABSTRACT
BACKGROUND Corona Virus Disease 2019 (COVID-19) is spreading worldwide. Effective screening for patients is important to limit the epidemic. However, some defects make the currently applied diagnosis methods are still not very ideal for early warning of patients. We aimed to develop a diagnostic model that allows for the quick screening of highly suspected patients using easy-to-get variables.
METHODS A total of 1,311 patients receiving severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nucleicacid detection were included, whom with a positive result were classified into COVID-19 group. Multivariate logistic regression analyses were performed to construct the diagnostic model. Receiver operating characteristic (ROC) curve analysis were used for model validation.
RESULTS After analysis, signs of pneumonia on CT, history of close contact, fever, neutrophil-to-lymphocyte ratio (NLR), Tmax and sex were included in the diagnostic model. Age and meaningful respiratory symptoms were enrolled into COVID-19 early warning score (COVID-19 EWS). The areas under the ROC curve (AUROC) indicated that both of the diagnostic model (training dataset 0.956 [95%CI 0.935-0.977, P < 0.001]; validation dataset 0.960 [95%CI 0.919-1.0, P < 0.001]) and COVID-19 EWS (training dataset 0.956 [95%CI 0.934-0.978, P < 0.001]; validate dataset 0.966 [95%CI 0.929-1, P < 0.001]) had good discrimination capacity. In addition, we also obtained the cut-off values of disease severity predictors, such as CT score, CD8+ T cell count, CD4+ T cell count, and so on.
CONCLUSIONS The new developed COVID-19 EWS was a considerable tool for early and relatively accurately warning of SARS-CoV-2 infected patients.
INTRODUCTION
The Corona Virus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is erupting worldwide.1-4 Due to the negative effects it brought, the World Health Organization (WTO) defined the outbreak a public health emergency of international concern on January 31th, 2020.5 As early as December 2019, the COVID-19 epidemic were reported in Wuhan, Hubei Province, China. In a short period of time, the number of COVID-19 confirmed patients in Zhejiang Province also increase rapidly.6,7
Currently, the diagnosis of COVID-19 relies mainly on SARS-CoV-2 nucleicacid detection. However, due to the shortcomings of false-negative results caused by the low viral load in the samples and relatively insufficient detection kits, many patients cannot be detected in time. Although clinical symptoms, epidemiological exposure history and computed tomography (CT) manifestations have been added to the diagnostic criteria of the disease, the atypical early clinical symptoms, unclear epidemiological exposure history and ambiguous imaging conclusions also bring difficulties to the screening of the patients.6,8 An effective and simple multi-parameter diagnostic tool is urgently needed.
Our study selected the clinical data of patients who came to our hospital during the first phase of the epidemic (by February 5th, 2020, most of the confirmed patients were from Wuhan or had a clear history of close contact with the returning population from Wuhan) as the training dataset. By comparing the characteristics of the confirmed cases and excluding cases, the independent risk factors related to the disease were identified and included in the diagnostic model. The purpose of establishing a multifaceted diagnostic model is to improve the detection rate and detection speed of infected patients, so as to truly achieve early detection, early isolation and early treatment to curb the further development of the epidemic.
METHODS
POPULATIONS
We included clinical data of a total of 1,311 patients who came to the First Affiliated Hospital, School of Medicine, Zhejiang University and performed at least once SARS-CoV-2 nucleicacid detection for analysis. This study was approved by the Ethical Committee of the First Affiliated Hospital, School of Medicine, Zhejiang University (code number IIT20200025A). Data of patients between January 20th, 2020 and February 5th, 2020 were enrolled into training dataset, and between February 6th, 2020 and February 19th, 2020 were enrolled into validate dataset. All patients who had a positive SARS-CoV-2 nucleicacid detection result were classified into COVID-19 group, while the others were classified into non-COVID-19 group. Due to the huge number of non-COVID-19 group patients, we eliminated the enrolled patients according to the following steps: (1) all negative patients from the COVID-19 designated district, Zhijiang District of the First Affiliated Hospital of Zhejiang University School of Medicine; (2) negative outpatients from Qingchun District were sorted by “Z-A” based on Chinese first name, the first 60% between January 20th, 2020 and February 5th, 2020, and the first 40% between February 6th, 2020 and February 19th, 2020 were included. Exclusion criteria: (1) asymptomatic patients without history of exposure but had strong willingness for detection; (2) patients with important information deficits. Figure 1 presented the concrete procedures of patients screening. The definitions of severity was based on the New Coronavirus Pneumonia Prevention and Control Program (6th edition) published by the National Health Commission of China9: (1) mild: patients had no pneumonia on imaging;(2) moderate: patients with symptoms and imaging examination shows pneumonia; (3) severe: patients meet any of the following: (i) respiratory rate ≥ 30/min; (ii) resting pulse oxygen saturation (SpO2) ≤ 93%; (iii) partial pressure of oxygen (PaO2) / fraction of inspired oxygen (FiO2) ≤ 300mmHg (1mmHg = 0.133kPa); (iiii) multiple pulmonary lobes showing more than 50% progression of lesion in 24-48 hours on imaging; (4) critical: patients meet any of the following: (i) respiratory failure requires mechanical ventilation; (ii) shock; (iii) combination of other organ failure that requires admitted into intensive care unit (ICU). Mild or moderate patients were included in the non-severe group, and severe or critical patients were included in the severe group.
DATA COLLECTION
Data were extracted from electronic medical record system of the the First Affiliated Hospital, School of Medicine, Zhejiang University. Epidemiological exposure history within the 14 days before illness onset is defined as: (1) having exposure related to Wuhan: had recently lived, traveled, or had close contact with someone who has been to Wuhan; (2) close contacts: had contacted with a COVID-19 confirmed patient. Except for SARS-CoV-2 nucleicacid detection, the data used in this study were the first inhospital results. Based on the clinical judgment of doctors and the medical burden considerations, patients who did not performed CT in non-COVID-19 group were assigned as a negative CT report by default.
LABORATORY CONFIRMATION
Laboratory confirmation was achieved by real time reverse transcription-polymerase chain reaction (RT-PCR) assay for SARS-CoV-2 according to the protocol established by the WHO.10 Sputum samples were preferred in our hospital for RT-PCR assay within 3 hours. Two target genes of SARS-CoV-2 were tested during the process, including open reading frame 1ab (ORF1ab) and nucleocapsidprotein (N).
CT SCORING
The CT scores were analyzed retrospectively by two radiologists blinded to the patient’s diagnosis and other clinical features. Two lung is divided into the upper (above trachea juga), middle (between trachea juga and pulmonary vein) and lower (under pulmonary vein) parts. According to the area occupied by different CT signs in each region: normal lung tissue, 0 points; lesion area < 25% of this layer, 1 point; 25% ∼ 50%, 2 points; 50% ∼ 75%, 3 points; > 75% 4 points. The score of each part were added to obtain the final CT score.11
STATISTICAL ANALYSIS
Continuous variables were expressed as medians with interquartile ranges (IQR). Categorical variables were expressed as numbers and percentages in each category. Mann-Whitney U-test were used to evaluate continuous data. Chi-square test were used for categorical variables. Independent risk factor analysis was performed using multivariate logistic regression analyses with forward stepwise method. The discrimination capacity of the diagnostic model and COVID-19 early warning score (COVID-19 EWS) were accessed by calculating the area under the receiver operating characteristic (AUROC) curve. All statistical analyses were done by SPSS statistical software package (version 25.0). P < 0.05 was considered statistically significant.
RESULTS
DEMOGRAPHIC AND CLINICAL CHARACTERISTICS OF PATIENTS IN THE TRAINING DATASET
A total of 304 patients were enrolled in the training dataset, including 73 patients in the COVID-19 group and 231 patients in the non-COVID-19 group (Table 1). There was a higher proportion of males in the COVID-19 group (63.0% vs 37.0%, P = 0.007). Patients in COVID-19 group had a higher age distribution than non-COVID-19 group (53 years [IQR, 43.5-62.0] vs 34 years [IQR, 29.0-49.0]; P < 0.001). As for epidemiological exposure, proportion of the patients with history of Wuhan-related exposure and close contact was higher than the COVID-19 group (57.5% vs 40.7%, P = 0.012; 34.2% vs 5.6%, P < 0.001). Patients in the COVID-19 group had a longer out-hospital disease course than the non-COVID-19 group (7 [IQR, 4.0-9.0] vs 3 [IQR, 1.0-5.0]; P < 0.001). More patients in the COVID-19 group showed fever (94.5% vs 68.0%, P < 0.001), dyspnea (21.9% vs 8.7%, P= 0.002), cough (60.3% vs 45.9%, P = 0.032), expectoration (23.3% vs 12.6%, P = 0.026), and myalgia or fatigue (32.9% vs 19.9%, P = 0.022). On the other hand, more patients in the non-COVID-19 group showed pharyngalgia (22.9% vs 6.8%, P = 0.002), and rhinobyon or rhinorrhoea (10.4% vs 1.4%, P = 0.014). In addition, patients in the COVID-19 group had a higher maximum body temperature (Tmax) during the outhospital phase (38.2 °c [IQR,37.5-39.0], P < 0.001) and a lower SpO2 (97.0% [IQR, 95.0-99.0] vs 98.0% [98.0-99.0], P < 0.001).
LABORATORY AND RADIOLOGIC CHARACTERISTICS OF PATIENTS IN THE TRAINING DATASET
As shown in Table 1, the white blood cell count (P = 0.002), lymphocyte count (P < 0.001), monocyte count (P < 0.001), and platelet count (P < 0.001) were lower in the COVID-19 group than the non-COVID-19 group. Although neutrophil count had no obvious difference, the neutrophil-to-lymphocyte ratio (NLR) were significantly higher in the COVID-19 group (5.00 [IQR, 2.2-13.9] vs 2.7 [1.7-4.7], P < 0.001). Many indicators associated with inflammation or cell damage were increased, including erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), procalcitonin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), lactose dehydrogenase (LDH), and hydroxybutyrate dehydrogenase (HBDH) (all P < 0.001). Nearly all patients (95.9%) in the COVID-19 had signs of pneumonia on CT scans. Only 26.4% patients in the non-COVID-19 group had abnormal findings on lung CT scans, and some of them were considered as bacteria or other virus infection.
DIAGNOSTIC MODEL AND COVID-19 EWS
Variables with P < 0.05 were selected to perform multivariate logistic regression analysis by using the forward stepwise method. In order to facilitate the later establishment of a scoring tool, NLR and Tmax were transformed into a categorical variable (< 5.8 / ≥ 5.8; <37.8 / ≥ 37.8) based on the cut-off value. Signs of pneumonia on CT, history of close contact, fever, NLR, Tmax and sex were considered as the independent factors related to the onset of COVID-19 (Supplementary Table 1). The equation was derived: Probability (COVID-19) = 1 / 1 + exp - [−9.106 + (2.79 x Fever) + (4.58 x History of close contact) + (5.10 x Signs of pneumonia on CT) + (0.97 x NLR) + (0.94 x Tmax) + (0.90 x Sex)].
To further improve the operability of this diagnostic model, we converted the beta coefficient values into score. Other significant parameters, age and meaningful lower respiratory symptoms, were also included into the COVID-19 EWS (Figure 2A). According to the results of ROC curve analysis, a cut-off value of 10 was selected as the threshold to distinguish COVID-19 patient, with sensitivity and specificity of 0.932 and 0.874, respectively. As SARS-CoV-2 nucleicacid positive is the accepted standard for COVID-19 diagnosis, and all positive patients need isolated observation and/or treatment, we did not include this single item in COVID-19 EWS.
In order to determine whether the diagnostic model and COVID-19 EWS are advantageous, we compared their diagnostic capacity to SARS-CoV-2 nucleicacid detection and CT scan (Figure 2B-F). We used the first SARS-CoV-2 nucleicacid detection result to value its sensitivity and specificity. Results showed that although nucleicacid detection had a high specificity, its sensitivity was lower than COVID-19 EWS (training dataset 0.918 vs 0.932; validate dataset 0.889 vs 0.944). Screening only by CT scan had a high sensitivity, but low specificity (training dataset 0.737 vs 0.874; validate dataset 0.737 vs 0.842). Either of the diagnostic model or the COVID-19 EWS had a quite large AUROC both in the training dataset and the validation dataset.
PREDICTIVE FACTORS ANALYSIS OF COVID-19 DISEASE SEVERITY
We compared the clinical data between the severe group and the non-severe group to find predictive factors of COVID-19 disease severity (Table 2).The median age of the severe group was higher than the non-severe group (55.5 years [IQR 48.0-64.3] vs 48 years [IQR 37.0-59.0], P < 0.001), indicating that older patients were more likely to develop severe illness. The proportion of male patients in the severe group was great larger than that of female (71.4% vs 28.6%). Percentage of patients with hypertension was higher in severe group (22 [52.4%] vs 4 [12.9%], P < 0.001), while other coexisting disorders, including diabetes, cardiovascular disease and chronic obstructive pulmonary disease showed no significant differences. In addition, many laboratory items and CT score presenting significant differences too. In order to better distinguish severe and non-severe patients, we defined the new threshold value of the selected parameters (P < 0.05) by calculating the cut-off value using ROC curve analysis. Table 3 showed the parameters with AUROC > 0.60.
DISCUSSION
The worldwide outbreak of COVID-19 is intensifying.12 As the main battlefield on the first stage, early detection and effective quarantine of patients and close contacts has made the epidemic in China so far under effective control. However, practice has shown that detection of highly suspected patients remains problematic. Considering the relative shortage of SARS-CoV-2 nucleicacid detection kits and false-negative results caused by various reasons, such as the quality of the samples taken, the number of viruses and the stage of the disease, medical experts have proposed screening of highly suspected patients with lung CT. However, due to mild COVID-19 patients are defined as no pneumonia on imaging, and imaging findings of some patients are atypical, screening based on CT findings is greatly dependent on the physician’s experience and the effectiveness is limited. All the above reasons increase the diagnosis difficult of COVID-19 patients when they first come to the doctors.
In our study, we firstly developed a diagnostic model based on multivariate logistic regression analysis. Since this diagnostic model was a logistic regression equation and inconvenient to operate in clinical environment, we converted the diagnostic model into an scoring tool by assigning values to each variable in the equation. Besides of that, we enrolled two more parameters into the final COVID-19 EWS, including age and meaningful respiratory symptoms. Based on the statistical analysis, we found that elderly people were more susceptible to COVID-19 and develop severe, which may due to their lower disease resistance and more basic diseases; some significant symptoms, such as cough, expectoration and dyspnea, can help to strengthen the association between fever and lower respiratory symptoms and weaken the weight of fever caused by other reasons. As a result, the COVID-19 EWS contained easy-to-get parameters on clinical manifestation, epidemiological characteristics, basic vital signs, laboratory and radiologic data, which provide a more comprehensive assessment of patients. Through re-evaluation in the validation dataset and comparison with other methods, results showed that both of our diagnostic model and COVID-19 EWS had relatively high and stable sensitivity and specificity.
The clinical features of patients in our medical center were similar to those reported in previous studies13-15: Male were more susceptible to COVID-19 but had no significance to predict of disease severity in our study may due to the small sample size, although the male to female ratio was quite high in the severe group (71.4 vs 28.6); most patients had initial symptoms of fever and cough; as for laboratory results, the decrease of white blood cell, lymphocyte, monocyte, and platelet, and the increase of the levels of ESR, CRP, Cr, serum urea nitrogen, LDH, and HBDH were found in COVID-19 patients, suggesting that SARS-CoV-2 infection may be associated with the injury of cellular immune, cardiomyocytes, liver and kidney.16 NLR was a widely used marker for the assessment of the severity of bacterial infections and the prognosis of patients with pneumonia and tumors.17-19 In our study, we also found the increase of NLR in COVID-19 patients, and it was positively related to the disease severity.
Consistent with a previous study, we found CD4+ T cell count and CD8+ T cell count were negatively correlated to the severity of COVID-19, indicating that SARS-CoV-2 may direct or indirect damage to T lymphocytes and further aggravates disease progression.20,21 Although inflammatory cytokine storms were thought to be a mechanism for COVID-19 progression,15 we did not find an significantly increased level of IL-2, IL-4, IL-6, IL-10, TNF-α, and IFN-γ. These findings might due to the limited size of our study and the large heterogeneity at the time-points of the first detection.
There were also many shortcomings in our study that should be considered. Firstly, it was a retrospective study, which might contain selection bias, although we tried to avoid the bias through sorting patients by “Z-A” based on Chinese first name and strictly abide the inclusion and exclusion criteria. In addition, because of the incomplete of potentially valuable information, such as smoking history and basic diseases, we cannot conduct a comprehensive analysis of these factors currently. Besides of that, additional multi-center, multi-ethnic and prospective studies are expected to revised our diagnostic model and COVID-19 EWS, and we also plan to implement a multi-center study with a larger sample size to further validate and optimize the model. More over, it is hoped that better statistical algorithms will make the diagnostic model more practical.
CONCLUSION
We established a novel and easy-to-get early warning score for COVID-19 screening. This scoring tool allows clinicians to more quickly and relatively accurately detect COVID-19 patients, especially when the nucleicacid detection capacity is relatively lacking.
Data Availability
The data used in this study are available from the corresponding author.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest regarding the publication of this paper.
ACKNOWLEDGMENTS
We thank for Ying-ying Mao, Ph.D., providing the help on statistical analyses.