A Novel Triage Tool of Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 Pneumonia in Fever Clinics
==============================================================================================================================

* Cong Feng
* Zhi Huang
* Lili Wang
* Xin Chen
* Yongzhi Zhai
* Feng Zhu
* Hua Chen
* Yingchan Wang
* Xiangzheng Su
* Sai Huang
* Lin Tian
* Weixiu Zhu
* Wenzheng Sun
* Liping Zhang
* Qingru Han
* Juan Zhang
* Fei Pan
* Li Chen
* Zhihong Zhu
* Hongju Xiao
* Yu Liu
* Gang Liu
* Wei Chen
* Tanshi Li

## Abstract

Currently, the prevention and control of COVID-19 outside Hubei province in China, and in other countries, has become increasingly critical. We developed and validated a diagnosis aid model without CT images for early identification of suspected COVID-19 pneumonia (S-COVID-19-P) on admission in adult fever patients and made the validated model available via an online triage calculator. Patients admitted from Jan 14 to Feb 26, 2020 with the epidemiological history of exposure to COVID-19 were included [Model development (n = 132) and validation (n = 32)]. Candidate features included clinical symptoms, routine laboratory tests and other clinical information on admission. Features selection and model development were based on Lasso regression. The primary outcome is the development and validation of a diagnosis aid model for S-COVID-19-P early identification on admission. The development cohort contains 26 S-COVID-19-P cases, 7 of which were confirmed COVID-19 pneumonia cases. The model performance in the held-out testing set and the validation cohort resulted in AUCs of 0.841 and 0.938, F-1 scores of 0.571 and 0.667, recall of 1.000 and 1.000, specificity of 0.727 and 0.778, and precision of 0.400 and 0.500, respectively. Based on this model, an optimized strategy for S-COVID-19-P early identification in fever clinics has also been designed.
S-COVID-19-P could be identified early, with a recall of 100%, by a machine-learning model that used only clinical information collected on admission to fever clinics, without CT images. The validated model has been deployed as an online triage tool, which is available at: [https://intensivecare.shinyapps.io/COVID19/](https://intensivecare.shinyapps.io/COVID19/).

KEYWORDS

* Suspected COVID-19 pneumonia
* Diagnosis Aid model
* Fever Clinics
* Machine Learning

## Introduction

Since December 2019, the outbreak of novel coronavirus disease (COVID-19; previously known as 2019-nCoV) [1], which causes severe pneumonia and acute respiratory syndrome, has emerged in Wuhan, China, and rapidly spread worldwide [2-5]. As of February 29th, 2020, the total number of reported confirmed COVID-19 pneumonia (C-COVID-19-P) cases had reached 85,403 worldwide, including 79,394 in China and 6,009 in other countries, and the number of cases is increasing rapidly and internationally [6, 7]. The main reason for the surge of infected cases in the early stage of the epidemic was the lack of capacity to rapidly and effectively detect such a large number of suspected cases [8]. Outside Hubei Province, for example in Beijing with its large population, sporadic and clustered cases have continuously been reported. Some other countries and regions, most prominently South Korea, Japan and Iran, are reporting more and more confirmed cases [4, 6, 9, 10]. Currently, epidemic prevention and control outside Hubei province and in other countries has become increasingly critical. Therefore, establishing an early identification method for suspected COVID-19 pneumonia (S-COVID-19-P) and optimizing triage strategies for fever clinics is urgent and essential for the coming global challenge.

The identification of S-COVID-19-P relies on the following criteria: the epidemiological history, clinical signs and symptoms, routine laboratory tests (such as lymphopenia) and positive chest computerized tomography (CT) findings [3]. However, clinical symptoms and routine laboratory tests are sometimes non-specific [2, 3]. Although CT is becoming a major diagnostic tool for early screening of S-COVID-19-P, the capacity of the designated CT rooms is relatively limited, especially in less-developed regions and when the influx of patients substantially outweighs the medical service capacity of fever clinics [11, 12]. Moreover, not all patients with clinical symptoms or abnormal blood routine values need a CT examination, which also carries radiation exposure, high cost and other restrictions. Therefore, it is critical to integrate and make the most of the clinical signs and symptoms, routine laboratory tests and other clinical information that are available on admission before further CT examination. Doing so would strengthen the ability to identify S-COVID-19-P early, improve the triage strategies in fever clinics and strike a balance between standard medical principles and limited medical resources.

The increase of secondary analysis in the emergency department and intensive care unit has made it feasible to obtain 'real time' data from electronic medical records, thus enabling 'real world' research [13, 14]. Such research applies machine-learning algorithms to analyze specific clinical cohorts and develop models for diagnosis aid or decision support in emergency triage [15]. Such models could be a cost-effective assistive tool that integrates clinical signs and symptoms, blood routine values and infection-related biomarkers on admission for S-COVID-19-P early identification.

The aim of this study was to develop and validate a diagnosis aid model, used on admission and without CT images, for early identification of S-COVID-19-P in adult fever patients with the epidemiological history of exposure to COVID-19. The model performance was also compared with that of some infection-related biomarkers on admission in the general population admitted to the fever clinic. The well-performing model is available as an online triage calculator, and based on it, an optimized strategy for S-COVID-19-P early identification in fever clinics is also discussed.

## Materials and methods

### Study design and population: development and validation cohorts

We developed a novel diagnosis aid model for early identification of S-COVID-19-P based on the retrospective analysis of a single-center study. All patients admitted to the fever clinic of the emergency department of the First Medical Center, Chinese People's Liberation Army General Hospital (PLAGH) in Beijing with the epidemiological history of exposure to COVID-19 according to WHO interim guidance were enrolled in this study. The fever clinic is a department for adults (*i.e.*, aged ≥14 years) specializing in the identification of infectious diseases, especially S-COVID-19-P. We recruited patients from Jan 14 to Feb 9, 2020 as the model development cohort, and patients from Feb 10 to Feb 26, 2020 as the dataset for model validation.

### The definition of S-COVID-19-P

On admission, all recruited patients underwent measurement of vital signs, blood routine tests, infection-related biomarker tests, an influenza virus (A+B) test and a chest CT examination.
Patients who had the epidemiological history, CT imaging characteristics of viral pneumonia, and any one of the following two clinical signs were diagnosed as S-COVID-19-P, according to the "Guidelines for diagnosis and management of novel coronavirus pneumonia (The sixth Edition)" published by Chinese National Health and Health Commission on Feb 18, 2020 (6th-Guidelines-CNHHC). The two clinical signs were: 1) fever and/or respiratory symptoms; 2) a normal or decreased total leukocyte count, or lymphopenia (<1.0 × 10⁹/L).

### The definition of C-COVID-19-P

For patients who were clinically identified as S-COVID-19-P, the throat swab specimens from the upper respiratory tract, which were obtained from all patients on admission, were maintained in viral-transport medium [3]. Laboratory confirmation of COVID-19 infection was done in four different institutions: the PLAGH, the Haidian District Center for Disease Control and Prevention (CDC) of Beijing, the Beijing CDC and the Academy of Military Medical Sciences. COVID-19 infection was confirmed by real-time RT-PCR using the same protocol described previously [2]. RT-PCR detection reagents were provided by the four institutions.

### Data extraction

For each patient, we extracted all data on admission, which included demographic information, comorbidities, epidemiological history of exposure to COVID-19, vital signs, blood routine values, clinical symptoms, infection-related biomarkers, influenza virus (A+B) test, CT findings, and days from illness onset to first admission. All data were checked, and missing data were obtained by direct communication with two other attending doctors (XC and YZ).

### Outcomes

The primary outcome is the development and validation of a diagnosis aid model for S-COVID-19-P early identification on admission. The secondary outcome is the comparison of diagnostic performance between the diagnosis aid model and infection-related biomarkers on admission.

### Diagnosis aid model and candidate features

For early identification of S-COVID-19-P on admission, we developed a diagnosis aid model intended to use early clinical information, based on its availability in patients' medical records. We included the following candidate features:

1) 2 variables of demographic information (*e.g.*, age and gender);
2) 4 variables of vital signs (*e.g.*, temperature, heart rate, etc.);
3) 20 variables of blood routine values (*e.g.*, white blood cell count, red blood cell count, hemoglobin, hematocrit, etc.);
4) 17 variables of clinical signs and symptoms [*e.g.*, fever, fever classification (°C, normal: <= 37.0, mild fever: 37.1-38.0, moderate fever: 38.1-39.0, severe fever: >= 39.1), cough, muscle ache, etc.];
5) 2 infection-related biomarkers (*e.g.*, C-reactive protein and Interleukin-6);
6) 1 other variable: days from illness onset to first admission (DOA).

The complete candidate features list is shown in Table 1.

[Table 1:](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T1) Candidate features for diagnosis aid model

### Features selection and model development

Candidate features were selected based on expert opinion and availability in the medical records. For the model, we compared 4 different algorithms: 1) logistic regression with LASSO, 2) logistic regression with Ridge regularization, 3) decision tree, and 4) Adaboost. Logistic regression with LASSO achieved the best overall performance in the testing set and the external validation set in terms of AUC and recall (Table S1). Features selection and model development were performed in the development cohort only, using logistic regression with Lasso regularization (Lasso regression). This model shrinks regression coefficients toward zero and sets some of them exactly to zero, thereby effectively selecting important features and improving the interpretability of the model [16].
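
The shrinkage behaviour described above can be illustrated with a minimal sketch. It runs on synthetic data rather than the study cohort, and the sample layout, `C=0.1`, the `liblinear` solver and all variable names are assumptions chosen for the demonstration, not the authors' settings. The sketch fits the same logistic regression with an L1 (Lasso) and an L2 (Ridge) penalty and counts how many coefficients remain non-zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small clinical table (assumption, not study data):
# 132 rows and 46 candidate features, of which only 5 carry signal.
X, y = make_classification(
    n_samples=132,
    n_features=46,
    n_informative=5,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.5,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# L1 (Lasso) penalty. In scikit-learn, C is the inverse of the
# regularization strength, so a smaller C shrinks coefficients harder.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X, y)

# L2 (Ridge) penalty with the same C, for contrast.
ridge_lr = LogisticRegression(penalty="l2", solver="liblinear", C=0.1)
ridge_lr.fit(X, y)

# Features with a non-zero coefficient are the ones the model keeps.
n_selected_l1 = int(np.count_nonzero(lasso_lr.coef_))
n_selected_l2 = int(np.count_nonzero(ridge_lr.coef_))
print(f"non-zero coefficients with L1: {n_selected_l1} of {X.shape[1]}")
print(f"non-zero coefficients with L2: {n_selected_l2} of {X.shape[1]}")
```

With the L1 penalty, only a subset of the coefficients stays non-zero, whereas the L2 penalty shrinks coefficients without removing any. This is why a Lasso-regularized model can double as a feature selector.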

Features selection and model development were performed in Python 3.7. During model training, we randomly held out 20% of the cohort data as the testing set, and then used 10-fold cross-validation on the remaining training and validation data to select the optimal LASSO regularization parameter. All features were normalized to a standard uniform distribution using parameters estimated on the training and validation sets, and the same transformation was then applied to both the held-out testing set and the external validation set. All computations were carried out with scikit-learn (version 0.22.1) in Python. Random oversampling was performed to construct balanced data in the training and validation sets by using the imblearn Python package (version 0.6.2).

### Model validation

After model development, we used the cohort with the epidemiological history from Feb 10 to Feb 26, 2020 for model validation. The model validation was also performed in Python.

### Features Importance Ranking

Feature importance was assessed in the development cohort. The coefficient weights of the logistic regression model were used for identifying and ranking feature importance.

### Comparison of diagnostic performance among diagnosis aid model and infection-related biomarkers

Lymphocyte count (LYMPH#), C-reactive protein (CRP) and Interleukin-6 (IL-6) were evaluated on admission. Lymphopenia (<1.0 × 10⁹/L) was one of the three diagnostic criteria for S-COVID-19-P according to the 6th-Guidelines-CNHHC. Elevated CRP (>0.8 mg/L) and elevated IL-6 (>5.9 pg/mL) were both important infection-related biomarkers. The diagnostic performance of the diagnosis aid model and of these biomarkers for early identification of S-COVID-19-P was also compared. The entire workflow is shown in Figure 1.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/03/20/2020.03.19.20039099/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/F1)
The study overview of the Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 Pneumonia, including (1) Development and validation cohorts, (2) Outcomes, (3) Diagnosis aid model and candidate features, (4) Features selection and diagnosis aid model development, (5) Model validation, and (6) Feature Importance ranking and comparison of diagnostic performance between model and biomarker. S-COVID-19-P = suspected COVID-19 pneumonia.

### Statistical Analysis and Performance Evaluation

Continuous variables were expressed as median with interquartile range (IQR) and compared with the Mann-Whitney U test; categorical variables were expressed as absolute (n) and relative (%) frequency and compared by the χ² test or Fisher's exact test. A two-sided α of less than 0.05 was considered statistically significant. Statistical analysis was performed with R version 3.5.1.

Model performance was evaluated by: 1) the area under the ROC curve (AUC) [17], 2) F-1 score, 3) Precision, 4) Sensitivity (Recall), and 5) Specificity. AUC ranges from 0 to 1, and a higher value indicates better performance of the algorithm. Precision is the fraction of true positives among all samples classified as positive by the algorithm; a higher precision indicates that a positive result from the algorithm is more reliable. Recall is the fraction of true positives among all actually positive samples, and describes the ability to identify the positive samples (S-COVID-19-P) in the whole population. The F1 score is the harmonic mean of precision and recall; a higher F1 score indicates better performance. In this study, to avoid missing suspected cases, recall is the most important reference [18]. We considered a model with AUC above 0.80 and recall above 0.95 to be adequate and well-performing.

## Results

### Study population: development and validation cohorts

In the development cohort, a total of 132 unique admissions with the epidemiological history of exposure to COVID-19 were included from Jan 14 to Feb 9, 2020.
Of these, 26 patients were clinically identified as S-COVID-19-P according to the 6th-Guidelines-CNHHC, and 7 of them were further identified as C-COVID-19-P in Beijing. Ten (38.5%) of the 26 S-COVID-19-P cases were transferred to the CDC after the first laboratory confirmation of COVID-19 infection by PLAGH. The remaining 16 (61.5%) S-COVID-19-P cases were kept in hospital for quarantine and further laboratory confirmation of COVID-19 infection. The 7 C-COVID-19-P cases all belonged to the moderate type based on the 6th-Guidelines-CNHHC, and no ICU admission or death occurred. (Table 2)

[Table 2:](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T2) Demographics, baseline and clinical characteristics of 132 patients admitted to PLA General Hospital (Jan 14–Feb 9, 2020) with the epidemiological history of exposure to COVID-19 in development cohort.

The S-COVID-19-P cases had a median age of 39.5 (36.3-52.3) years; 17 (65.4%) were male, and the median DOA was 2.5 (1.0-4.8) days. The non-suspected COVID-19 pneumonia (N-S-COVID-19-P) cases had a median age of 33.0 (28.0-40.0) years; 57 (53.8%) were male, and the median DOA was 2.0 (1.0-5.0) days. The C-COVID-19-P cases had a median age of 39.0 (37.0-41.5) years; 5 (71.4%) were male, and the median DOA was 5.0 (3.5-5.5) days. (Table 2)

Within 14 days before the onset of the disease, 3 (11.5%), 7 (6.6%) and 2 (28.6%) patients had a history of contact with COVID-19 infected patients (laboratory-confirmed infection) among the suspected, non-suspected and confirmed COVID-19 pneumonia cases, respectively. On admission, the median heart rate [107.5 (100.0-116.2) vs 99.5 (89.5-110.0), p=0.035], diastolic blood pressure [89.5 (80.5-96.3) vs 81.0 (75.0-88.0), p=0.014], systolic blood pressure [145.5 (136.2-156.8) vs 134.0 (124.0-143.0), p<0.001] and the highest temperature [37.9 (37.4-38.5) vs 37.4 (36.8-37.8), p=0.006] were significantly higher in S-COVID-19-P cases than in N-S-COVID-19-P cases.
(Table 2)

The most common symptoms at onset of illness were fever [23 (88.5%), 70 (66.0%)], sore throat [15 (57.7%), 43 (40.6%)], and cough [12 (46.2%), 53 (50.0%)] in S-COVID-19-P and N-S-COVID-19-P cases, respectively. In C-COVID-19-P cases, however, muscle ache [6 (85.7%)] and headache [5 (71.4%)] were also among the most common symptoms, besides fever [6 (85.7%)], cough [5 (71.4%)] and sore throat [5 (71.4%)]. (Table 2)

The blood routine values of patients on admission showed lymphopenia [lymphocyte count <1.0 × 10⁹/L; 9 (34.6%), 17 (16.0%) and 1 (14.3%)] and elevated monocyte ratio [monocyte ratio > 0.08; 12 (46.2%), 18 (17.0%) and 4 (57.1%)] in S-COVID-19-P, N-S-COVID-19-P and C-COVID-19-P cases, respectively. Early lymphopenia (p=0.051) and elevated monocyte ratio (p=0.003) were more prominent in S-COVID-19-P than in N-S-COVID-19-P cases, but did not differ significantly between C-COVID-19-P and non-C-COVID-19-P cases within the S-COVID-19-P group. The proportion of cases with elevated CRP on admission was higher in S-COVID-19-P cases than in N-S-COVID-19-P cases [13 (50.0%) vs 29 (27.4%), p=0.035], but did not differ significantly between C-COVID-19-P and non-C-COVID-19-P cases within the S-COVID-19-P group [6 (85.7%) vs 7 (36.8%), p=0.190]. The proportion of cases with elevated IL-6 on admission was also higher in S-COVID-19-P cases than in N-S-COVID-19-P cases [16 (61.5%) vs 34 (32.1%), p=0.007], but did not differ significantly between C-COVID-19-P and non-C-COVID-19-P cases within the S-COVID-19-P group [6 (85.7%) vs 10 (52.6%), p=0.190]. (Table 3)

[Table 3:](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T3) Laboratory results and CT findings of 132 patients admitted to PLA General Hospital (Jan 14–Feb 9, 2020) with the epidemiological history of exposure to COVID-19 in development cohort.

On admission, 26 (100%) and 10 (9.4%) patients had positive CT findings in S-COVID-19-P and N-S-COVID-19-P cases, respectively.
In S-COVID-19-P cases, multiple macular patches and interstitial changes accounted for 53.8% (n=14) and multiple mottling and ground-glass opacity accounted for 8.5% (n=9). Positive CT findings in 11 (42.3%) S-COVID-19-P cases and 6 (85.7%) C-COVID-19-P cases were obvious in the extra-pulmonary zone. (Table 3)

The descriptions and statistics of the development cohort's demographics, baseline and clinical characteristics are summarized in Table 2, and the laboratory results and CT findings are summarized in Table 3. The same details for the validation cohort, a total of 33 unique admissions with the epidemiological history of exposure to COVID-19 from Feb 10 to Feb 26, 2020, are summarized in Table S2 and Table S3.

### Features selection

Candidate features and their univariable association with S-COVID-19-P, taken from the resulting coefficients of the LASSO regularized logistic regression, are listed in Table S4. The final selected features for model development include: 1) 1 variable of demographic information (age); 2) 4 variables of vital signs [*e.g.*, Temperature (TEM), Heart rate (HR), etc.]; 3) 5 variables of blood routine values [*e.g.*, Platelet count (PLT), Monocyte ratio (MONO%), Eosinophil count (EO#), etc.]; 4) 7 variables of clinical signs and symptoms [*e.g.*, Fever, Fever classification, Shiver, etc.]; 5) 1 infection-related biomarker [Interleukin-6 (IL-6)]. The final selected features list is shown in Table 4.

[Table 4:](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T4) Final selected features for model development

### Model performance in development and validation cohort

The diagnosis aid model for S-COVID-19-P early identification on admission performed well in both the development and validation cohorts according to all evaluation criteria.
For the LASSO regularized logistic regression, we varied the regularization parameter C (in scikit-learn, the inverse of the regularization strength) from 0.25 to 7.5 with a step size of 0.25 and found that C = 7.0 achieved optimal performance with respect to the AUC in the validation set. In the held-out testing set, we found AUC = 0.8409, F-1 score = 0.5714, precision = 0.4000, recall = 1.0000 and specificity = 0.727. In the validation set, we found AUC = 0.9383, F-1 score = 0.6667, precision = 0.5000, recall = 1.0000 and specificity = 0.778. (Table S1)

### Identifying Feature Importance

We analyzed feature importance from the coefficient weights in the LASSO regularized logistic regression model. The feature importance ranking of the diagnosis aid model for S-COVID-19-P early identification in the development cohort is shown in Figure 2. The top 5 features most strongly associated with S-COVID-19-P were Age (0.1115), IL-6 (0.0880), SYS_BP (0.0868), MONO% (0.0679), and Fever classification (0.0569).

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/03/20/2020.03.19.20039099/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/F2) Features Importance Ranking. Feature importance was assessed in the development cohort. The coefficient weights of the logistic regression model were used for identifying and ranking feature importance. Interleukin-6 (IL-6), Systolic blood pressure (SYS_BP), Monocyte ratio (MONO%), Fever classification (°C, Normal: <= 37.0; mild fever: 37.1-38.0; moderate fever: 38.1-39.0; severe fever: >= 39.1), platelet count (PLT), diastolic blood pressure (DIAS_BP), Heart rate (HR), Mean corpuscular hemoglobin content (MCH), Temperature (TEM), Eosinophil count (EO#), Basophil count (BASO#).
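
As a quick consistency check on the performance figures reported above, the F-1 score can be recomputed from precision and recall, since it is their harmonic mean. The sketch below uses only the precision and recall values given in the text; the function and variable names are illustrative and are not taken from the authors' code.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text for each evaluation split.
reported = {
    "held-out testing set": (0.4000, 1.0000),
    "validation set": (0.5000, 1.0000),
}

f1_by_split = {
    name: f1_score_from(precision, recall)
    for name, (precision, recall) in reported.items()
}

for name, f1 in f1_by_split.items():
    print(f"{name}: F1 = {f1:.4f}")
# held-out testing set: F1 = 0.5714
# validation set: F1 = 0.6667
```

Both recomputed values match the reported F-1 scores. With recall fixed at 1.0, the F-1 score is driven entirely by precision, which reflects the trade-off the model makes: no suspected case is missed, at the cost of some false alarms.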

### Comparison of diagnostic performance among diagnosis aid model and infection-related biomarkers

The comparison of diagnostic performance among the diagnosis aid model and prominent infection-related biomarkers (lymphopenia, elevated CRP, and elevated IL-6) for early identification of S-COVID-19-P in the development cohort is shown in Table 5. The diagnosis aid model performed better than lymphopenia, elevated CRP, and elevated IL-6, with AUCs of 0.841, 0.407, 0.613 and 0.599 and recall of 1.0000, 0.346, 0.500 and 0.615, respectively.

[Table 5:](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T5) Comparison of diagnostic performance among diagnosis aid model and infection-related biomarkers

### Online Suspected COVID-19 Pneumonia Diagnosis Aid System

We deployed the validated diagnosis aid model, built with the LASSO regularized logistic regression algorithm, as the "Suspected COVID-19 pneumonia Diagnosis Aid System", which is publicly available through our online portal at [https://intensivecare.shinyapps.io/COVID19/](https://intensivecare.shinyapps.io/COVID19/).

## Discussion

In this retrospective observational study, we developed and validated a diagnosis aid model, based on a machine-learning algorithm and clinical data without CT images, for S-COVID-19-P early identification. The clinical data come from the demographic information, routine clinical signs, symptoms and laboratory tests obtained before further CT examination. Therefore, in fever clinics during an epidemic outbreak, such a diagnosis aid model might improve triage efficiency, optimize the medical service process, and save medical resources. According to the results of the LASSO regularized logistic regression, although some false alarms may occur, the model is able to identify 100% of the suspected cases in both the held-out testing set and the external validation set. In applying this stringent rule to clinical diagnosis, our main interest is to avoid any missed cases.
This suggests that our diagnosis aid system is able to help doctors make decisions about suspected cases in a highly reliable manner.

According to the analysis of features selection and feature importance ranking, most individual variables among the demographic information, clinical signs, symptoms and blood routine values on admission did not show a remarkable association with S-COVID-19-P. This indicates that they may not be informative on their own, which increases the difficulty of identifying S-COVID-19-P early with routine clinical information. Therefore, it is necessary to integrate all of the above nonspecific but important features by machine-learning algorithms for secondary analysis and for developing cost-effective diagnosis aid models [19, 20].

The infection-related biomarkers, most prominently lymphopenia, elevated CRP and elevated IL-6, play a key role in identifying clinical infections; for example, lymphopenia has been included as one of the three diagnostic criteria for S-COVID-19-P based on the 6th-Guidelines-CNHHC [3, 21, 22]. In this study, all three of these biomarkers, measured on admission, could distinguish S-COVID-19-P from N-S-COVID-19-P well. In the comparison of diagnostic performance, the diagnosis aid model clearly outperformed these biomarkers in AUC and recall, which highlights its potential use for clinical triage. Moreover, we also found that an early elevated monocyte ratio in the development cohort and an early elevated monocyte count could distinguish S-COVID-19-P from N-S-COVID-19-P well in this study, which suggests that monocyte ratio or monocyte count may also be a new potential infection-related biomarker for S-COVID-19-P early identification [22].

Although the CT scan is becoming a major diagnostic tool for early screening of S-COVID-19-P cases, it cannot be offered to every patient when medical resources are insufficient during an epidemic outbreak.
According to the CT findings in the development and validation cohorts, only 10 (9.4%) and 4 (14.8%) N-S-COVID-19-P cases had mild CT findings on admission, which indicates that triage strategies for CT scans based mainly on fever or lymphopenia need further optimization [23]. Therefore, it is meaningful to use machine-learning algorithms to comprehensively analyze clinical symptoms, routine laboratory tests and other clinical information before further CT examination, and to develop a diagnosis aid model to improve the triage strategies in fever clinics, which would strike a good balance between standard medical principles and limited medical resources.

The performance of the developed and validated model indicates that S-COVID-19-P in fever clinics could be accurately triaged early based only on clinical information on admission, without CT images. After features selection, the final model, based on fewer predictors, performed well according to most evaluation criteria and also achieved a better result in further validation. Therefore, the final model, based on a small number of features, is likely to be applicable in most fever clinics.

One of the most effective strategies to control an epidemic outbreak is the establishment of an efficient triaging process for early identification of S-COVID-19-P in fever clinics [23]. Based on our successful experience in Beijing and the well-performing 'Suspected COVID-19 Pneumonia Diagnosis Aid System', we have designed the following improved S-COVID-19-P early identification strategy for adult fever clinics (Figure 3). For all patients with fever, sore throat or cough, whether or not hypoxia is present, we propose routinely performing blood routine, CRP, IL-6 and influenza virus (A+B) tests.
Then, if the results of the above tests are normal and the patient has no epidemiological history, home quarantine, regular treatment (such as oral antibiotics) and continuous monitoring of clinical signs and symptoms are suggested. If not, a rapid, artificial intelligence assisted evaluation of all clinical results based on our 'Suspected COVID-19 Pneumonia Diagnosis Aid System' will be required for S-COVID-19-P early identification, which supports the decision of whether a subsequent CT examination is needed. Home-quarantine patients whose clinical symptoms are not relieved within a few days would be required to return for further examination (such as a CT scan). Meanwhile, patients with negative CT findings would also be advised to undergo home quarantine with regular treatment and continuous monitoring. In this way, an artificial intelligence assisted diagnosis aid system for S-COVID-19-P would make the most of the clinical symptoms, routine laboratory tests and other clinical information that are available on admission before further CT examination, in order to improve the triage strategies in fever clinics and strike a balance between standard medical principles and limited medical resources.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/03/20/2020.03.19.20039099/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/F3) Flow chart for improved S-COVID-19-P early identification strategies in adult fever clinics in PLAGH, China. CRP = C-reactive protein, IL-6 = Interleukin-6.

Our current study has several strengths. First, we successfully used a machine-learning algorithm to analyze clinical datasets without CT images and develop a diagnosis aid model for early identification of S-COVID-19-P cases in the fever clinic, which could become a key method for addressing insufficient medical resources during an epidemic outbreak.
Second, we integrated most of the routinely available data on admission, including 46 features, which could be considered to contain the largest number of predictors. Third, we found that the monocyte ratio or monocyte count in the blood routine test on admission was more discriminant in S-COVID-19-P cases and might be a new potential infection-related biomarker for early identification. Fourth, we also discussed an optimized triage strategy in fever clinics for early identification of S-COVID-19-P with the help of our new diagnosis aid model, which would help to strike a balance between standard medical principles and limited medical resources. Fifth, the small number of features on which the final model is based are likely to be available in most fever clinics, which increases the possibility of worldwide use and generalizability. Lastly, the developed and validated diagnosis aid model is publicly available as an online triage calculator. This is the first such method, and it provides a platform and useful tool for future studies of biomarkers and S-COVID-19-P early identification in limited resource settings.

Although the diagnosis results are highly reliable according to the recall score, this study still has the following limitations. First, we only evaluated lymphopenia, elevated CRP and elevated IL-6, while other biomarkers might be more discriminant. Second, the data size was relatively small and based on only a single-center fever clinic, which calls for 'big data' analysis based on multiple-center fever clinics. Third, the model was developed and validated for mildly ill patients with fewer comorbidities; therefore, additional well-performing models would be welcome for specific subpopulations. Fourth, since the model was developed and validated in a single-center fever clinic, the performance might vary when evaluated in other fever clinics, particularly if they differ in patient characteristics and COVID-19 prevalence.
Therefore, the diagnosis aid model of this study requires further external validation in populations with different backgrounds. Fifth, there is a potential risk of misuse of the online calculator. The suitable patient population and the classification threshold should be considered carefully in order to make the right decision [24]. Last but not least, the “Suspected COVID-19 Pneumonia Diagnosis Aid System” should be used only as one auxiliary reference for clinical and management decisions.

## Conclusion

We successfully used a machine-learning algorithm to develop a diagnosis aid model without CT images for early identification of S-COVID-19-P, and its diagnostic performance was better than that of lymphopenia, elevated CRP, and elevated IL-6 on admission. The recall was 100% on both the held-out testing set and the validation set, suggesting that the model is highly reliable for clinical diagnosis. We also discussed an optimized triage strategy for early identification of S-COVID-19-P in fever clinics, supported by our new diagnosis aid model, which would balance standard medical principles against limited medical resources. To facilitate further validation, the developed diagnosis aid model is available online as a triage calculator.

## Data Availability

The data that support the findings of this study are available from the corresponding author on reasonable request. Participant data without names and identifiers will be made available after approval from the corresponding author, PLAGH and the National Health Commission. After publication of the study findings, the data will be available for others to request. The research team will provide an email address for communication once the data are approved to be shared with others. A proposal with a detailed description of the study objectives and statistical analysis plan will be needed to evaluate the reasonableness of the data request.
The corresponding author, PLAGH and the National Health Commission will make a decision based on these materials. Additional materials may also be required during the process.

## Author Contributions

CF designed the study, conducted the data collection, data analysis, and data interpretation, and wrote the manuscript. ZH and LL conducted the data analysis and data interpretation, built the online calculator, developed the website, and wrote the manuscript. WS, XC, YZ, FZ, XS and YW conducted the data interpretation and reviewed the manuscript. FP, LT, WZ, HC, LZ, and QH conducted the data interpretation and wrote the manuscript. LC, ZZ, JZ, HX and YL reviewed the manuscript. GL, WC, and TL conducted the data interpretation and reviewed the manuscript.

## Compliance with Ethical Standards

Data collection was passive and had no impact on patient safety. This study was approved by the PLA General Hospital ethics committee.

## Conflicts of Interest

The authors declare that they have no conflict of interest.
[Table S1](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T6): Comparison of different algorithms.

[Table S2](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T7): Demographics, baseline and clinical characteristics of 33 patients admitted to PLA General Hospital (Feb 10–Feb 26, 2020) with an epidemiological history of exposure to COVID-19 in the validation cohort.

[Table S3](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T8): Laboratory results and CT findings of 33 patients admitted to PLA General Hospital (Feb 10–Feb 26, 2020) with an epidemiological history of exposure to COVID-19 in the validation cohort.

Continuous variables are expressed as median with interquartile range (IQR) and compared with the Mann-Whitney U test; categorical variables are expressed as absolute (n) and relative (%) frequency and compared by the χ2 test or Fisher’s exact test. A two-sided α of less than 0.05 was considered statistically significant. “Increased” means above the upper limit of the normal range and “decreased” means below the lower limit of the normal range. COVID-19: coronavirus disease 2019.

[Table S4](http://medrxiv.org/content/early/2020/03/20/2020.03.19.20039099/T9): Candidate features and univariable association with S-COVID-19-P.

## Acknowledgements

The present study was supported by grants from the PLA Science and Technology Project (14CXZ005, AWS15J004, 16BJZ19), the National Key R&D Program of China (2019YFF0302300), the Construction Project of Key Disciplines in the 13th Five-Year Plan of the PLA (Traumatic Surgery in the Battlefield, 2019-126, 2019-513), the Beijing Science and Technology New Star Project (XX2018019/Z181100006218028), and the PLA General Hospital Science and Technology Project (2019XXJSYX20, 2018XXFC-20, ZH19016).

* Received March 19, 2020.
* Revision received March 19, 2020.
* Accepted March 20, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1. Wu F, Zhao S, Yu B, et al. A new coronavirus associated with human respiratory disease in China. Nature 2020.
2. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (London, England) 2020; 395(10223): 497–506.
3. Chen N, Zhou M, Dong X, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet (London, England) 2020; 395(10223): 507–13. doi:10.1016/S0140-6736(20)30211-7
4. Chan JF, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet (London, England) 2020; 395(10223): 514–23. doi:10.1016/S0140-6736(20)30154-9
5. Xu Z, Shi L, Wang Y, et al. Pathological findings of COVID-19 associated with acute respiratory distress syndrome. The Lancet Respiratory medicine 2020.
6. Kim JY, Choe PG. The First Case of 2019 Novel Coronavirus Pneumonia Imported into Korea from Wuhan, China: Implication for Infection Prevention and Control Measures. 2020; 35(5): e61.
7. Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. Lancet (London, England) 2020; 395(10223): 470–3. doi:10.1016/S0140-6736(20)30185-9
8. The Lancet. Emerging understandings of 2019-nCoV. Lancet (London, England) 2020; 395(10221): 311. doi:10.1016/S0140-6736(20)30186-0
9. Chang, Lin M, Wei L, et al. Epidemiologic and Clinical Characteristics of Novel Coronavirus Infections Involving 13 Patients Outside Wuhan, China. Jama 2020.
10. Holshue ML, DeBolt C, Lindquist S, et al. First Case of 2019 Novel Coronavirus in the United States. The New England journal of medicine 2020; 382(10): 929–36. doi:10.1056/NEJMoa2001191
11. Lee EYP, Ng MY, Khong PL. COVID-19 pneumonia: what has CT taught us? The Lancet Infectious diseases 2020.
12. Shi H, Han X, Jiang N, et al. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. The Lancet Infectious diseases 2020.
13. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. The New England journal of medicine 2019; 380(14): 1347–58. doi:10.1056/NEJMra1814259
14. Bailly S, Meyfroidt G, Timsit JF. What’s new in ICU in 2050: big data and machine learning. 2018; 44(9): 1524–7.
15. Raita Y, Goto T, Faridi MK, Brown DFM, Camargo CA Jr, Hasegawa K. Emergency department triage prediction of clinical outcomes using machine learning models. Critical care 2019; 23(1): 64.
16. Reid S, Tibshirani R. Regularization Paths for Conditional Logistic Regression: The clogitL1 Package. Journal of statistical software 2014; 58(12).
17. Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 1997; 30(7): 1145–59.
18. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology (Cambridge, Mass) 2010; 21(1): 128–38. doi:10.1097/EDE.0b013e3181c30fb2. PMID: 20010215
19. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Science translational medicine 2015; 7(299): 299ra122.
20. Komorowski M, Celi LA. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. 2018; 24(11): 1716–20.
21. Wong CK, Lam CW, Wu AK, et al. Plasma inflammatory cytokines and chemokines in severe acute respiratory syndrome. Clinical and experimental immunology 2004; 136(1): 95–103. doi:10.1111/j.1365-2249.2004.02415.x. PMID: 15030519
22. Wu J, Wu X, Zeng W, et al. Chest CT Findings in Patients with Corona Virus Disease 2019 and its Relationship with Clinical Features. Investigative radiology 2020.
23. Zhang J, Zhou L, Yang Y, Peng W, Wang W, Chen X. Therapeutic and triage strategies for 2019 novel coronavirus disease in fever clinics. The Lancet Respiratory medicine 2020.
24. Flechet M, Guiza F, Schetz M, et al. AKIpredictor, an online prognostic calculator for acute kidney injury in adult critically ill patients: development, validation and comparison to serum neutrophil gelatinase-associated lipocalin. Intensive care medicine 2017; 43(6): 764–73.