Accuracy of deep learning based computed tomography diagnostic system of COVID-19: a consecutive sampling external validation cohort study

Tatsuyoshi Ikenoue; Yuki Kataoka; Yoshinori Matsuoka; Junichi Matsumoto; Junji Kumasawa; Kentaro Tochitatni; Hiraku Funakoshi; Tomohiro Hosoda; Aiko Kugimiya; Michinori Shirano; Fumiko Hamabe; Sachiyo Iwata; Shingo Fukuma; Japan COVID-19 AI team

doi:10.1101/2020.11.15.20231621

Abstract

Objectives Ali-M3, an artificial intelligence, analyses chest computed tomography (CT) and detects the likelihood of coronavirus disease (COVID-19) in the range of 0 to 1. It demonstrates excellent performance for the detection of COVID-19 patients with a sensitivity and specificity of 98.5 and 99.2%, respectively. However, Ali-M3 has not been externally validated. Our purpose is to evaluate the external validity of Ali-M3 using Japanese sequential sampling data.

Methods In this retrospective cohort study, COVID-19 infection probabilities were calculated using Ali-M3 in 617 symptomatic patients who underwent reverse transcription-polymerase chain reaction (RT-PCR) tests and chest CT for COVID-19 diagnosis at 11 Japanese tertiary care facilities, between January 1 and April 15, 2020.

Results Of 617 patients, 289 patients (46.8%) were RT-PCR-positive. The area under the curve (AUC) of Ali-M3 for predicting a COVID-19 diagnosis was 0.797 (95% confidence interval [CI]: 0.762-0.833) and goodness-of-fit was P = 0.156. With a cut-off of probability of COVID-19 by Ali-M3 diagnosis set at 0.5, the sensitivity and specificity were 80.6% and 68.3%, respectively, while a cut-off of 0.2 yielded a sensitivity and specificity of 89.2% and 43.2%, respectively. Among 223 patients who required oxygen support, the AUC was 0.825 and the sensitivities at cut-offs of 0.5 and 0.2 were 88.7% and 97.9%, respectively. Although the sensitivity was lower when the days from symptom onset were few, sensitivity increased for both cut-off values after 5 days.

Conclusions Ali-M3 was evaluated by external validation and shown to be useful to exclude a diagnosis of COVID-19.

Key Points

The area under the curve (AUC) of Ali-M3, which is an AI system for diagnosis of COVID-19 based on chest CT images, was 0.797 and goodness-of-fit was P = 0.156.
With a cut-off of probability of COVID-19 by Ali-M3 diagnosis set at 0.5, the sensitivity and specificity were 80.6% and 68.3%, respectively, while a cut-off of 0.2 yielded 89.2% and 43.2%.
Sensitivity was low in the first few days after symptom onset but increased after 5 days. In patients requiring oxygen support, the AUC was higher, at 0.825.

Introduction

A proper triage system is necessary during this coronavirus disease (COVID-19) pandemic era,[1, 2] as improper triage systems may disadvantage patients and lead to wastage of personal protective equipment (PPE) and hospital infections through admission of infected patients to facilities, causing collapse of the medical system. Although reverse transcription-polymerase chain reaction (RT-PCR) tests have been developed, the delay in waiting for RT-PCR results can hamper proper triage.

Computed tomography (CT) is a fast and useful diagnostic tool. Some studies have reported the characteristic findings on chest CT images of COVID-19 patients.[3-8] Use of chest CT images by radiologists has shown high diagnostic performance for COVID-19. However, even radiologists’ interpretations vary widely, depending on how familiar they are with interpreting COVID-19 CT images.[9] Therefore, using CT as a diagnostic tool in general clinical practice is difficult in the current situation.

Diagnostic support systems using artificial intelligence (AI) have the potential to replace many of the routine detection, characterisation, and quantification tasks currently performed by radiologists using cognitive ability.[10] AI can reduce the diagnostic variability that arises from inter- and intra-reader differences. In China, where COVID-19 infection originated, many AI systems were developed for establishing a diagnosis of COVID-19 based on chest CT images.[11-15] One such system, Ali-M3, outputs the likelihood of COVID-19 in the range of 0 to 1, and has been reported to detect COVID-19 with an accuracy, sensitivity, and specificity of 99.0, 98.5, and 99.2%, respectively. However, Ali-M3 was developed in a virtual population, which consisted of 3,067 examinations for COVID-19; 1,996 for community-acquired pneumonia; and 1,975 for non-pneumonia. Because this population differed from the general population, its accuracy could be overestimated.[16]

To use Ali-M3 to rule out COVID-19, its external validity must be evaluated based on the distribution of diseases in a real-world setting. We conducted a retrospective cohort study to evaluate the external validity of Ali-M3 using the Japanese sequential sampling data of patients who underwent RT-PCR tests and chest CT for diagnosis of COVID-19.

Materials and Methods

Study design

This retrospective cohort study was conducted at 11 Japanese tertiary care facilities that provided treatment for COVID-19 in their respective areas. We partially followed the guidelines of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis Statement to plan and report this study (Supplemental Table 1).[17] The institutional review board of each facility approved the study and the need to obtain written informed consent was waived.

Table 1.

Demographics of patients’ characteristics.

Participants

We included patients who underwent both RT-PCR examinations and chest CT for the diagnosis of COVID-19. The potentially eligible participants were identified on the advice of physicians that both RT-PCR test and chest CT be obtained when the patients presented with symptoms or were suspected of having COVID-19. The detailed information of the inclusion criteria is shown in Supplemental Table 2. We selected patients by using consecutive sampling methods between January 1 and April 15, 2020. The RT-PCR results were extracted from the patients’ medical records at each facility. Patients were excluded when the time-interval between chest CT and the first RT-PCR assay was longer than 7 days.

Table 2.

Moving cut-off confidence score and test performance.

All available data on the database were used to maximize the power and generalizability of the results.

Chest CT protocols

All images were obtained on one of five types of CT systems, with the patient in the supine position.

The details of scanning parameters and systems are shown in Supplemental Table 3.

Image analysis

We used a three-dimensional deep learning framework for the detection of COVID-19 infections.[16] The details of this model are included in Appendix 1. Training of Ali-M3 was stopped before our evaluation. We set a cut-off point for the model output at 0.5, because this cut-off point was used during the development stage. The investigators who entered the CT image data into Ali-M3 were blinded to the RT-PCR results.

Reference standard

The diagnosis of COVID-19 was established by the RT-PCR test, which detected the nucleic acid of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the sputum, throat swabs, and secretions of the lower respiratory tract samples.[18] We established the RT-PCR tests as the main reference standard. Although the findings of chest CT, interpreted by radiologists, were included as the reference standard in the derivation study, we did not include it as the reference standard in the present study.

Statistical analysis

Statistical analysis was performed using R statistical software, version 3.6.3 (R Foundation for Statistical Computing). Data analysis was performed in a complete-case dataset. Continuous variables are presented as means (standard deviation) and categorical variables are presented as counts and percentages. Using the RT-PCR results as reference, the area under the curve (AUC), sensitivity, specificity, positive-predictive value, and negative-predictive value of the likelihood of COVID-19 as derived from the Ali-M3’s analysis of chest CT imaging were calculated. A 95% confidence interval (CI) was determined by the Wilson score method. The goodness-of-fit was calculated using the Le Cessie-Van Houwelingen normal test statistic for the unweighted sum of squared errors.
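As an illustration of the interval estimate named above, the following is a minimal Python sketch of the Wilson score interval; it is not the R code used in the study. The function name `wilson_interval` is hypothetical, and the counts in the usage lines are the sensitivity counts reported in the Results (233 of 289). This is the plain form without continuity correction, so its output need not agree with the reported intervals to the last digit.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval (no continuity correction) for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Sensitivity at the 0.5 cut-off: 233 of 289 RT-PCR-positive patients.
lower, upper = wilson_interval(233, 289)
print(f"sensitivity = {233 / 289:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The same function applies to specificity and to the predictive values by passing the corresponding numerator and denominator.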

Sensitivity analysis

1. Moving cut-off point

The objective of this study was to determine whether this AI model could be used as a screening tool for COVID-19 in the real world. In a clinical situation, physicians require an accurate diagnosis of COVID-19 and therefore place more weight on sensitivity than on specificity. For sensitivity analysis, we moved the cut-off point and observed the resulting sensitivities and specificities to minimize overlooking COVID-19 patients.
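The effect of moving the cut-off point can be sketched as follows. This is a minimal Python illustration rather than the study's analysis code: the function name, the convention that a score at or above the cut-off counts as positive, and the toy scores and labels are all assumptions made for the example and are not study data.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) for a given cut-off.

    Assumption: a case is called positive when score >= cutoff.
    labels use 1 for reference-positive and 0 for reference-negative.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both reference classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, for illustration only.
toy_scores = [0.05, 0.15, 0.25, 0.45, 0.55, 0.65, 0.85, 0.95]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]

for cutoff in (0.2, 0.5):
    sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On the toy data, lowering the cut-off from 0.5 to 0.2 raises sensitivity at the cost of specificity, which is the trade-off this sensitivity analysis examines.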

2. Simulation of imperfect reference

In the main analysis, we assumed RT-PCR as the perfect reference (100% sensitivity and 100% specificity). However, in the real world, RT-PCR is not the perfect reference since the sensitivity of the RT-PCR test was estimated at 0-80%.[19] To evaluate the effect of this imperfect reference, we calculated the sensitivity, specificity, and AUC of Ali-M3 using the methods and R code described in the Supplemental Material when varying the sensitivity, but fixing the specificity of RT-PCR at 100%.[20]
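To make the direction of this bias concrete, the following Python sketch applies a simple closed-form adjustment. It is not the method or R code from the Supplemental Material; it assumes that the reference test has 100% specificity, that Ali-M3 errors are conditionally independent of RT-PCR errors given true disease status, and that the RT-PCR sensitivity is known. The function name and the value 0.8 used for the RT-PCR sensitivity are hypothetical choices for illustration.

```python
def adjust_for_imperfect_reference(tp, fn, fp, tn, reference_sensitivity):
    """Adjust index-test specificity when the reference misses some true cases.

    Assumptions (illustrative only): reference specificity is 100%, and the
    index test and the reference err independently given true disease status.
    Returns (index_sensitivity, adjusted_specificity).
    """
    if not 0 < reference_sensitivity <= 1:
        raise ValueError("reference_sensitivity must be in (0, 1]")
    reference_positive = tp + fn
    reference_negative = fp + tn
    # With a perfectly specific reference, every reference-positive case is
    # truly diseased, so the observed sensitivity is left unchanged.
    index_sensitivity = tp / reference_positive
    # Expected number of truly diseased patients hidden among reference negatives.
    missed = reference_positive * (1 - reference_sensitivity) / reference_sensitivity
    if missed >= reference_negative:
        raise ValueError("inputs imply no truly non-diseased patients")
    # Some apparent false positives are expected to be missed true cases.
    true_false_positives = max(0.0, fp - index_sensitivity * missed)
    truly_non_diseased = reference_negative - missed
    return index_sensitivity, 1 - true_false_positives / truly_non_diseased


# Counts at the 0.5 cut-off from the Results; 0.8 is a hypothetical RT-PCR sensitivity.
sens, spec = adjust_for_imperfect_reference(233, 56, 104, 224, reference_sensitivity=0.8)
print(f"sensitivity {sens:.3f}, adjusted specificity {spec:.3f}")
```

Under these assumptions the adjusted specificity is higher than the observed one while the sensitivity is unchanged, because only the reference-negative group is contaminated by missed cases.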

3. Effect of the number of days after symptom onset

The number of days that have passed since the onset of symptoms affects the performance of antibody and RT-PCR tests in COVID-19 patients.[19, 21] However, it was not clear if this could affect CT images in COVID-19 patients. Among patients whose symptom onset date was known, sensitivity and specificity were calculated in 2-day strata from day 0 to day 13 after symptom onset and in a single stratum for 14 days or more.
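The grouping into 2-day strata from day 0 to day 13, plus a final stratum for 14 days or more, can be sketched in Python as below. The function name and the label format are hypothetical, and whole days since onset are assumed; patients with an unknown onset date are represented by `None` and left unassigned.

```python
def onset_stratum(days_since_onset):
    """Map whole days since symptom onset to a stratum label.

    Strata are 0-1, 2-3, ..., 12-13 and '14+'. Returns None when the
    onset date is unknown (input None).
    """
    if days_since_onset is None:
        return None
    if days_since_onset < 0:
        raise ValueError("days_since_onset must be non-negative")
    if days_since_onset >= 14:
        return "14+"
    start = (days_since_onset // 2) * 2
    return f"{start}-{start + 1}"


# Hypothetical example values, not study data.
for days in (0, 5, 13, 20, None):
    print(days, "->", onset_stratum(days))
```

Sensitivity and specificity can then be computed separately within each label.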

4. Effect of symptom severity

Imaging is not routinely indicated as a screening test for COVID-19 in asymptomatic individuals.[22] However, CT images are used in assessment of disease severity. We established the severity by evaluating whether oxygen therapy was required and if the patient was asymptomatic while undergoing CT.

5. Effect of reconstruction slice

The thickness of the reconstruction slice can affect the diagnostic performance.[23] Because the slice thicknesses in our data set showed a gap between 3 mm and 4 mm, we separated the dataset for the main analysis at a reconstruction slice thickness of 3 mm and calculated the performance of the model in each dataset.

Results

Study population characteristics

Figure 1 shows the patient flow diagram. Data of 749 patients were evaluated. We assessed 617 symptomatic patients in this validation study. The characteristics of the study population for the main analysis datasets are shown in Table 1. Overall, 289 patients (46.8%) were diagnosed with COVID-19 using the RT-PCR test. Thirteen patients needed more than two RT-PCR tests before being diagnosed with COVID-19. Major symptoms were dry cough (37.6%), fever (33.5%), and sore throat (25.8%).

Figure 1.

Patient flow.

Abbreviations: CT, computed tomography; RT-PCR, reverse transcription polymerase chain reaction; DICOM, digital imaging and communications in medicine

Model performance

The performance of the confidence score after validation among symptomatic patients is shown in Figure 2. The goodness-of-fit test gave P = 0.156, and the AUC was 0.797 (95% CI: 0.762-0.833). The relationship between the score and predicted probability is shown in Figure 2. The optimal cut-off point with maximal sensitivity and specificity was 0.5, and the sensitivity and specificity were 80.6% (233 of 289) [95% CI: 75.6-85.0%] and 68.3% (224 of 328) [95% CI: 63.0-%], respectively.
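The counts reported above define a complete 2x2 table at the 0.5 cut-off, from which the other summary measures listed in the Statistical analysis can be recomputed. The Python sketch below is an illustration only: the function name is hypothetical, and the false-negative and false-positive counts (56 and 104) are derived here by subtraction from the reported totals rather than quoted from the paper.

```python
def summarize_2x2(tp, fn, fp, tn):
    """Sensitivity, specificity, predictive values and Youden's J from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Youden's J statistic: sensitivity + specificity - 1.
        "youden_j": sensitivity + specificity - 1,
    }


# 233 of 289 RT-PCR-positive and 224 of 328 RT-PCR-negative patients were
# classified correctly, so FN = 289 - 233 = 56 and FP = 328 - 224 = 104.
for name, value in summarize_2x2(tp=233, fn=56, fp=104, tn=224).items():
    print(f"{name}: {value:.3f}")
```

Predictive values computed this way depend on the 46.8% prevalence in this cohort and would differ in a population with a different proportion of RT-PCR-positive patients.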

Figure 2.

Differential performance of Ali-M3 for coronavirus disease in symptomatic patients.

(A) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cutoff level confidence score. The area under the receiver operating characteristic curve is 0.797 and the Youden index is 0.50. (B) A plot of test sensitivity, specificity, positive predictive value (PV+), and negative predictive value (PV-) in y-coordinate versus confidence score obtained from Ali-M3 in x-coordinate. The PV+ is dark gray and the PV- is light gray. The maximum PV+ is 46.8% and the maximum PV- is 53.2%. (C) This graph shows the goodness-of-fit. The dashed line is an ideal line that predicts the probability obtained from the confidence score of Ali-M3 equal to the actual probability. The pointed line is the fitted line that is estimated with non-linear assumption alone. The dashed line is the fitted line that is estimated with non-linear assumption and considering the bias in nonparametric estimation using the le Cessie-van Houwelingen method.

Sensitivity analysis

1. Moving cut-off point

Table 2 shows the relationship between cut-off points for the confidence score and performance. When the cut-off point was 0.2, the sensitivity and specificity were 89.2% and 43.3%, respectively.

2. Simulation of imperfect reference

Figure 3 shows the sensitivity and specificity, with the assumption of imperfect reference of RT-PCR test. The AUC was 0.865. When the cut-off point was set at 0.5, using the Youden Index, the sensitivity and specificity were 80.6% and 81.3%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 89.2% and 51.9%, respectively.

Figure 3.

Relationship between test performance and the number of days after the onset of symptoms.

(A) The graph shows the relationship between test performance and the number of days after the onset of symptoms when the confidence score from Ali-M3 is at 0.20. (B) The graph shows the relationship between test performance and the number of days after onset of symptoms when the confidence score from Ali-M3 is at 0.50. Light gray bars show the number of patients included in the strata of days after the onset of symptoms, following the right axis. One stratum includes 2 days from day 0 to day 13. The stratum to the extreme right includes 14 days or more. Following the left axis, solid lines are sensitivity in strata and dashed lines are specificity in strata.

3. Effect of number of days after symptom onset

Of all symptomatic patients, 600 patients (97.2%) were included in this sensitivity analysis; the number of days after the onset of symptoms was not known for the remaining 17 patients. Figure 4 shows the relationship between test performance and the number of days since the onset of symptoms when the confidence score of Ali-M3 was set at 0.5 or 0.2. Sensitivity values started at 0.7 and increased up to 1.0 until 10-11 days in both cases. However, specificity values remained similar across the strata. Sensitivity was higher, exceeding 0.9, when the cut-off for the confidence score was set at 0.2 than when it was set at 0.5.

Figure 4.

Receiver operating characteristic (ROC) curves in ignoring imperfect reference and considering imperfect reference.

(A) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cutoff level confidence score ignoring imperfect reference. The area under the ROC curve is 0.797. (B) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cutoff level confidence score considering imperfect reference. The area under the ROC curve is 0.865.

4. Changing the eligibility criteria

The effects of changing the criteria for patient eligibility are shown in Figure 5.

Figure 5.

Differential performance of Ali-M3 for coronavirus disease in asymptomatic patients and patients using oxygen support.

(A) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cutoff level confidence score in asymptomatic patients. The area under the receiver operating characteristic (ROC) curve is 0.623 and the Youden index is 0.25. (B) A plot of test sensitivity, specificity, positive predictive value (PV+), and negative predictive value (PV-) in y-coordinate versus confidence score obtained from Ali-M3 in x-coordinate among asymptomatic patients. The PV+ is dark gray and the PV- is light gray. The maximum PV+ is 43.0% and the maximum PV- is 57.0%. (C) A plot of test sensitivity (y-coordinate) versus its false-positive rate (x-coordinate) obtained at each cutoff level confidence score in patients using oxygen support. The area under the ROC curve is 0.623 and the Youden index is 0.25. (D) A plot of test sensitivity, specificity, PV+, and PV- in y-coordinate versus confidence score obtained from Ali-M3 in x-coordinate in patients using oxygen support. The PV+ is dark gray and the PV- is light gray. The maximum PV+ is 43.5% and the maximum PV- is 56.5%.

Dataset focused on asymptomatic patients

There were 86 asymptomatic patients (RT-PCR positive: 37). Using these patients only, the AUC was 0.623. When the cut-off point was 0.5, the sensitivity and specificity were 51.4% and 59.2%, respectively. When the cut-off point was 0.2, the sensitivity and specificity were 44.9% and 73.0%, respectively.

Dataset focused on patients needing oxygen therapy

There were 223 patients who required oxygen support (RT-PCR positive: 97). When using these patients only, the AUC was 0.828. When the cut-off point was set at 0.5, the sensitivity and specificity were 88.7% and 57.9%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 97.9% and 34.9%, respectively.

5. Effect of the thickness of the CT reconstruction slice

There were 320 patients (RT-PCR positive: 121) with a reconstruction slice thickness of under 3 mm. When considering these patients only, the AUC was 0.825. When the cut-off point was set at 0.5, the sensitivity and specificity were 82.6% and 69.7%, respectively. When the cut-off point was set at 0.2, the sensitivity and specificity were 94.2% and 51.5%, respectively. In patients with a reconstruction slice thickness over 3 mm, the AUC was 0.789 (Supplement Figure 1).

Discussion

In this external validation study, our results indicated that Ali-M3 could be useful for early triage of suspected COVID-19 patients with symptoms at a lower cut-off. In particular, higher accuracy was observed in patients with higher severity and a few days since symptom onset, and with images with a thinner reconstructed CT slice thickness.

Currently, all patients with symptoms, such as fever, are triaged as COVID-19 patients. Thus, medical practitioners must use PPE for each patient.[24] Additionally, bed zoning is essential to avoid contamination of non-infected patients.[25] On the other hand, under-triage causes hospital infections through admission of infected patients to facilities. These precautions should continue until a definitive diagnosis is established. Since Ali-M3 is available on the cloud, the physician can receive the results immediately by sending the Digital Imaging and Communications in Medicine (DICOM) images from the ordinary picture archiving and communication system. When applying triage, clinicians require sufficient accuracy in terms of sensitivity, but specificity is less important.[19] The high sensitivity obtained at a cut-off of 0.2 with the AI diagnosis is useful for excluding the diagnosis of COVID-19.

Ali-M3 also has the potential to support a diagnosis of COVID-19. The tools currently used for diagnosing COVID-19 infection are antibody, antigen, and RT-PCR tests. Both antigen and RT-PCR tests use tracheal secretions or saliva. An antigen test requires an antigen protein above a given detectable level, and is currently inferior to RT-PCR tests. As the same patient sample is used, the antigen test cannot support the RT-PCR test. The RT-PCR test is currently used as a gold standard, but the sensitivity changes depending on the number of days after the onset of symptoms.[19] Therefore, for an exclusion diagnosis, multiple tests staggered over time are needed, rather than a single negative RT-PCR test. Even when this test is performed as rapidly as possible, it still requires a few days to obtain multiple test results. On the other hand, Ali-M3 uses the configurational information of patients’ lungs, and can add different information than obtained from RT-PCR, thereby complementing the drawbacks of RT-PCR.

In this study, the diagnostic accuracy at the validation stage was lower than the accuracy at the development stage. A two-gate (case-control) design was used in the development of the AI system, but in the present study, for evaluating the ability of Ali-M3 to assess a COVID-19 diagnosis by chest CT image, we used a single-gate (cohort) design. Although many studies have used the two-gate design in evaluation of AI for the diagnosis of COVID-19,[26] the two-gate design is generally prone to overestimation of diagnostic test results.[27] Thus, blindly using the results based on a two-gate design in a clinical situation can be inappropriate. Moreover, other factors should be considered. With the use of a two-gate design, the fact that RT-PCR is an imperfect reference standard is typically ignored. Furthermore, performing culture and tests to ascertain the true sensitivity of this test is difficult. In the present study, we simulated the diagnostic ability of Ali-M3 with consideration that the sensitivity of the reference standard was imperfect, which leads to underestimation of the specificity and AUC of Ali-M3, without distortion of the sensitivity. Furthermore, the outcomes while developing Ali-M3 and while examining its adequacy were different. Taking into account the patient flow in China, the outcomes at the development stage were set as positive cases with RT-PCR negative results and positive CT image findings.[28] This had a small effect on the sensitivity, but a large effect on the specificity. For example, if in the development stage, 33.9% of the positive patients had negative RT-PCR results and positive CT image findings,[28] then the performance that showed a sensitivity of 98.5% and specificity of 99.2% in the development of Ali-M3[16] changes from 97.7% to 100% for sensitivity and from 80.8% to 81.6% for specificity when positive RT-PCR is the only reference used. Upgrading to a diagnostic AI that targets only RT-PCR-positive cases at the development stage is desirable.

This study had some limitations. First, the differentiation performance of Ali-M3 was poor in asymptomatic patients; thus, Ali-M3 should not be used to screen asymptomatic patients. While an alternative to the RT-PCR test for COVID-19 is expected in terms of screening for nosocomial infections and screening on admission for patients with other diseases, Ali-M3 is not recommended for this purpose. Second, we could not differentiate COVID-19 from other viral pneumonias. Compared to the past five seasons, the number of Japanese people infected with influenza during this season was markedly low.[29] In fact, only a few cases in our cohort were diagnosed with other viral pneumonias. Third, our analysis could not reflect the difference in imaging features caused by different COVID-19 types. In addition to type A COVID-19 that was initially prevalent in Asia, type B and type C were prevalent in Europe and in the United States. These different types were not determined in the PCR test, and thus we could not evaluate these differences.

In conclusion, we conducted a retrospective cohort study for external validation of Ali-M3. Our results indicated that AI-based CT diagnosis could be useful for a diagnosis of exclusion of COVID-19 in symptomatic patients, particularly those requiring oxygen and with only a few days since symptom onset. Using Ali-M3 support can reduce PPE consumption and prevent hospital infections through the admission of covertly infected patients. Moreover, Ali-M3 also has the potential to support RT-PCR-based diagnosis in suspected COVID-19 patients. However, as Ali-M3 had some limitations in terms of development, further studies and learning are warranted for updating the system.

Data Availability

The data that support the findings of this study are available from the corresponding author, Tatsuyoshi Ikenoue, upon permission of IRB at Hyogo Prefectural Amagasaki General Medical Center and reasonable request.

Compliance with ethical standards

Guarantor

The scientific guarantor of this publication is Tatsuyoshi Ikenoue.

Conflict of interest

The authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article.

Funding

The authors state that this work has not received any funding.

Statistics and biometry

No complex statistical methods were necessary for this paper.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Institutional Review Board approval was obtained.

Study subjects or cohorts overlap

No study subjects or cohorts have been previously reported.

Methodology

retrospective
diagnostic or prognostic study
multicentre study

Acknowledgments

We thank M3 Inc. and Clinical Porter for their support in providing free access to Ali-M3 and data storage, although they did not participate in the preparation of the protocol or manuscript. Readers who wish to access Ali-M3 can contact M3 (m3-ai-lab@m3.com). We also thank Ms. Kyoko Wasai, who assisted in retrieving data.

Footnotes

Funding information: None

Abbreviations

AI: artificial intelligence
COVID-19: coronavirus disease 2019
CT: computed tomography

References

1. Maves RC, Downar J, Dichter JR, Hick JL, Devereaux A, Geiling JA, Kissoon N, Hupert N, Niven AS, King MA et al: Triage of Scarce Critical Care Resources in COVID-19 An Implementation Guide for Regional Allocation: An Expert Panel Report of the Task Force for Mass Critical Care and the American College of Chest Physicians. Chest 2020, 158(1):212–225.
2. Carenzo L, Costantini E, Greco M, Barra FL, Rendiniello V, Mainetti M, Bui R, Zanella A, Grasselli G, Lagioia M et al: Hospital surge capacity in a tertiary emergency referral centre during the COVID-19 outbreak in Italy. Anaesthesia 2020, 75(7):928–934.
3. Li Y, Xia L: Coronavirus Disease 2019 (COVID-19): Role of Chest CT in Diagnosis and Management. AJR American journal of roentgenology 2020:1–7.
4. Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A: Coronavirus Disease 2019 (COVID-19): A Systematic Review of Imaging Findings in 919 Patients. AJR American journal of roentgenology 2020:1–7.
5. Zhou S, Wang Y, Zhu T, Xia L: CT Features of Coronavirus Disease 2019 (COVID-19) Pneumonia in 62 Patients in Wuhan, China. AJR American journal of roentgenology 2020:1–8.
6. Chaganti S, Balachandran A, Chabin G, Cohen S, Flohr T, Georgescu B, Grenier P, Grbic S, Liu S, Mellot F et al: Quantification of Tomographic Patterns associated with COVID-19 from Chest CT. ArXiv 2020.
7. Liu K-C, Xu P, Lv W-F, Qiu X-H, Yao J-L, Gu J-F, Wei W: CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity. European Journal of Radiology 2020, 126:108941.
8. Pan F, Ye T, Sun P, Gui S, Liang B, Li L, Zheng D, Wang J, Hesketh RL, Yang L et al: Time Course of Lung Changes at Chest CT during Recovery from Coronavirus Disease 2019 (COVID-19). Radiology 2020, 295(3):715–721.
9. Bai HX, Hsieh B, Xiong Z, Halsey K, Choi JW, Tran TML, Pan I, Shi LB, Wang DC, Mei J et al: Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT. Radiology 2020, 296(2):E46–E54.
10. Pesapane F, Codari M, Sardanelli F: Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp 2018, 2(1):35.
11. Huang L, Han R, Ai T, Yu P, Kang H, Tao Q, Xia L: Serial Quantitative Chest CT Assessment of COVID-19: Deep-Learning Approach. Radiology: Cardiothoracic Imaging 2020, 2(2):e200075.
12. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q: Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT. Radiology 2020:200905.
13. Liu W, Liu M, Guo X, Zhang P, Zhang L, Zhang R, Kang H, Zhai Z, Tao X, Wan J et al: Evaluation of acute pulmonary embolism and clot burden on CTPA with deep learning. European radiology 2020, 30(6):3567–3575.
14. Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, Topol EJ, Ioannidis JPA, Collins GS, Maruthappu M: Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ (Clinical research ed) 2020, 368:m689.
15. Wynants L, Van Calster B, Bonten MMJ, Collins GS, Debray TPA, De Vos M, Haller MC, Heinze G, Moons KGM, Riley RD et al: Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ (Clinical research ed) 2020, 369:m1328.
16. The Alibaba DAMO Academy: COVID-19 AI Assisted Analysis Based On Chest CT Imaging, vol. 2. The Alibaba DAMO Academy; 2020.
17. Collins GS, Reitsma JB, Altman DG, Moons KGM: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Annals of Internal Medicine 2015, 162(1):55–63.
18. Lippi G, Simundic AM, Plebani M: Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19). Clin Chem Lab Med 2020.
19. Kucirka LM, Lauer SA, Laeyendecker O, Boon D, Lessler J: Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure. Ann Intern Med 2020, 173(4):262–267.
20. Limmathurotsakul D, Turner EL, Wuthiekanun V, Thaipadungpanit J, Suputtamongkol Y, Chierakul W, Smythe LD, Day NPJ, Cooper B, Peacock SJ: Fool’s Gold: Why Imperfect Reference Tests Are Undermining the Evaluation of Novel Diagnostics: A Reevaluation of 5 Diagnostic Tests for Leptospirosis. Clinical Infectious Diseases 2012, 55(3):322–331.
21. Long QX, Liu BZ, Deng HJ, Wu GC, Deng K, Chen YK, Liao P, Qiu JF, Lin Y, Cai XF et al: Antibody responses to SARS-CoV-2 in patients with COVID-19. Nat Med 2020, 26(6):845–848.
22. Rubin GD, Ryerson CJ, Haramati LB, Sverzellati N, Kanne JP, Raoof S, Schluger NW, Volpi A, Yim JJ, Martin IBK et al: The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society. Radiology 2020, 296(1):172–180.
23. He L, Huang Y, Ma Z, Liang C, Liang C, Liu Z: Effects of contrast-enhancement, reconstruction slice thickness and convolution kernel on the diagnostic performance of radiomics signature in solitary pulmonary nodule. Sci Rep 2016, 6:34921.
24. World Health Organization: Rational use of personal protective equipment (PPE) for coronavirus disease (COVID-19): interim guidance, 19 March 2020. World Health Organization; 2020.
25. Liu J, Yang J, Li S, Chen J, Yang L, Zhao Z, Hong L: Gynecological prevention and control model based on ward rearrangement and zoning management in pandemic period of COVID-19. Panminerva Med 2020.
26. Pham TD: A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks. Sci Rep 2020, 10(1):16942.
27. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ : Canadian Medical Association journal = journal de l’Association medicale canadienne 2006, 174(4):469–476.
28. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L: Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 2020:200642.
29. Sakamoto H, Ishikane M, Ueda P: Seasonal Influenza Activity During the SARS-CoV-2 Outbreak in Japan. JAMA 2020, 323(19):1969–1971.

Posted November 18, 2020.

Subject Area

Radiology and Imaging

[1] 1.↵
Maves RC, Downar J, Dichter JR, Hick JL, Devereaux A, Geiling JA, Kissoon N, Hupert N, Niven AS, King MA et al: Triage of Scarce Critical Care Resources in COVID-19 An Implementation Guide for Regional Allocation: An Expert Panel Report of the Task Force for Mass Critical Care and the American College of Chest Physicians. Chest 2020, 158(1):212–225.
OpenUrl

[2] 2.↵
Carenzo L, Costantini E, Greco M, Barra FL, Rendiniello V, Mainetti M, Bui R, Zanella A, Grasselli G, Lagioia M et al: Hospital surge capacity in a tertiary emergency referral centre during the COVID-19 outbreak in Italy. Anaesthesia 2020, 75(7):928–934.
OpenUrl CrossRef PubMed

[3] 3.↵
Li Y, Xia L: Coronavirus Disease 2019 (COVID-19): Role of Chest CT in Diagnosis and Management. AJR American journal of roentgenology 2020:1–7.

[4] 4.
Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A: Coronavirus Disease 2019 (COVID-19): A Systematic Review of Imaging Findings in 919 Patients. AJR American journal of roentgenology2020:1–7.

[5] 5.
Zhou S, Wang Y, Zhu T, Xia L: CT Features of Coronavirus Disease 2019 (COVID-19) Pneumonia in 62 Patients in Wuhan, China. AJR American journal of roentgenology2020:1–8.

[6] 6.
Chaganti S, Balachandran A, Chabin G, Cohen S, Flohr T, Georgescu B, Grenier P, Grbic S, Liu S, Mellot F et al: Quantification of Tomographic Patterns associated with COVID-19 from Chest CT. ArXiv 2020.

[7] 7.
Liu K-C, Xu P, Lv W-F, Qiu X-H, Yao J-L, Gu J-F, Wei W: CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity. European Journal of Radiology 2020, 126:108941.
OpenUrl PubMed

[8] 8.↵
Pan F, Ye T, Sun P, Gui S, Liang B, Li L, Zheng D, Wang J, Hesketh RL, Yang L et al: Time Course of Lung Changes at Chest CT during Recovery from Coronavirus Disease 2019 (COVID-19). Radiology 2020, 295(3):715–721.
OpenUrl PubMed

[9] 9.↵
Bai HX, Hsieh B, Xiong Z, Halsey K, Choi JW, Tran TML, Pan I, Shi LB, Wang DC, Mei J et al: Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT. Radiology 2020, 296(2):E46–E54.
OpenUrl

[10] 10.↵
Pesapane F, Codari M, Sardanelli F: Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp 2018, 2(1):35.
OpenUrl CrossRef PubMed

[11] 11.↵
Huang L, Han R, Ai T, Yu P, Kang H, Tao Q, Xia L: Serial Quantitative Chest CT Assessment of COVID-19: Deep-Learning Approach. Radiology: Cardiothoracic Imaging 2020, 2(2):e200075.
OpenUrl

[12] 12.
Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q: Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT. Radiology2020:200905.

[13] 13.
Liu W, Liu M, Guo X, Zhang P, Zhang L, Zhang R, Kang H, Zhai Z, Tao X, Wan J et al: Evaluation of acute pulmonary embolism and clot burden on CTPA with deep learning. European radiology 2020, 30(6):3567–3575.
OpenUrl

[14] 14.
Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, Topol EJ, Ioannidis JPA, Collins GS, Maruthappu M: Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ (Clinical research ed) 2020, 368:m689.
OpenUrl Abstract/FREE Full Text

[15] 15.↵
Wynants L, Van Calster B, Bonten MMJ, Collins GS, Debray TPA, De Vos M, Haller MC, Heinze G, Moons KGM, Riley RD et al: Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ (Clinical research ed) 2020, 369:m1328.
OpenUrl Abstract/FREE Full Text

[16] 16.↵
Academy TAD: COVID-19 AI Assisted Analysis Based On Chest CT Imaging. In., vol. 2: The Alibaba DAMO Academy; 2020.

[17] 17.↵
Gary S. Collins JBR, Douglas G. Altman, Karel G.M. Moons: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Annals of Internal Medicine 2015, 162(1):55–63.
OpenUrl CrossRef PubMed

[18] 18.↵
Lippi G, Simundic AM, Plebani M: Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19). Clin Chem Lab Med 2020.

[19] 19.↵
Kucirka LM, Lauer SA, Laeyendecker O, Boon D, Lessler J: Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure. Ann Intern Med 2020, 173(4):262–267.
OpenUrl CrossRef PubMed

[20] 20.↵
Limmathurotsakul D, Turner EL, Wuthiekanun V, Thaipadungpanit J, Suputtamongkol Y, Chierakul W, Smythe LD, Day NPJ, Cooper B, Peacock SJ: Fool’s Gold: Why Imperfect Reference Tests Are Undermining the Evaluation of Novel Diagnostics: A Reevaluation of 5 Diagnostic Tests for Leptospirosis. Clinical Infectious Diseases 2012, 55(3):322–331.
OpenUrl CrossRef PubMed

[21] 21.↵
Long QX, Liu BZ, Deng HJ, Wu GC, Deng K, Chen YK, Liao P, Qiu JF, Lin Y, Cai XF et al: Antibody responses to SARS-CoV-2 in patients with COVID-19. Nat Med 2020, 26(6):845–848.
OpenUrl PubMed

[22] 22.↵
Rubin GD, Ryerson CJ, Haramati LB, Sverzellati N, Kanne JP, Raoof S, Schluger NW, Volpi A, Yim JJ, Martin IBK et al: The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society. Radiology 2020, 296(1):172–180.
OpenUrl

[23] 23.↵
He L, Huang Y, Ma Z, Liang C, Liang C, Liu Z: Effects of contrast-enhancement, reconstruction slice thickness and convolution kernel on the diagnostic performance of radiomics signature in solitary pulmonary nodule. Sci Rep 2016, 6:34921.
OpenUrl

[24] 24.↵
Organization WH: Rational use of personal protective equipment (PPE) for coronavirus disease (COVID-19): interim guidance, 19 March 2020. In.: World Health Organization; 2020.

[25] 25.↵
Liu J, Yang J, Li S, Chen J, Yang L, Zhao Z, Hong L: Gynecological prevention and control model based on ward rearrangement and zoning management in pandemic period of COVID-19. Panminerva Med 2020.

[26] 26.↵
Pham TD: A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks. Sci Rep 2020, 10(1):16942.
OpenUrl

[27] 27.↵
Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ : Canadian Medical Association journal = journal de l’Association medicale canadienne 2006, 174(4):469–476.
OpenUrl

[28] 28.↵
Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L: Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology2020:200642.

[29] 29.↵
Sakamoto H, Ishikane M, Ueda P: Seasonal Influenza Activity During the SARS-CoV-2 Outbreak in Japan. JAMA 2020, 323(19):1969–1971.
OpenUrl
