Summary
Background COVID-19 is an acute respiratory illness caused by the novel coronavirus SARS-CoV-2. The disease has rapidly spread to most countries and territories and has caused 14·2 million confirmed infections and 602,037 deaths as of July 19th 2020. Massive molecular testing for COVID-19 has been identified as fundamental to moderating the spread of the disease. Pooling methods can enhance testing efficiency, but they are viable only at very low incidences of the disease. We propose Smart Pooling, a machine learning method that uses clinical and sociodemographic data from patients to increase the efficiency of pooled molecular testing for COVID-19 by arranging samples into all-negative pools.
Methods We developed machine learning methods that estimate the probability that a sample will test positive for SARS-CoV-2 based on complementary information from the sample. We use these predictions to exclude samples predicted as positive from pools. We trained our machine learning methods on samples from more than 8,000 patients tested for SARS-CoV-2 from April to July in Bogotá, Colombia.
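The exclusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 probability threshold, the pool size of 5, and the assumption of a perfectly accurate molecular test are all hypothetical choices made for illustration. Samples whose predicted probability reaches the threshold are tested individually; the remainder are pooled, and every member of a positive pool is retested.

```python
from typing import Sequence


def count_tests(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,   # hypothetical cut-off, not from the paper
    pool_size: int = 5,       # hypothetical pool size, not from the paper
) -> int:
    """Count molecular tests used when predicted positives are kept out of pools.

    probs  -- predicted probability that each sample is positive
    labels -- true status of each sample (1 = positive, 0 = negative),
              standing in for the result of an assumed error-free test
    """
    # Samples predicted positive are tested one by one.
    individual = [y for p, y in zip(probs, labels) if p >= threshold]
    # All remaining samples are candidates for pooling.
    pooled = [y for p, y in zip(probs, labels) if p < threshold]

    tests = len(individual)
    for start in range(0, len(pooled), pool_size):
        pool = pooled[start:start + pool_size]
        tests += 1  # one test for the pool itself
        # A positive pool of more than one sample needs a second stage.
        if any(pool) and len(pool) > 1:
            tests += len(pool)
    return tests


def efficiency(probs: Sequence[float], labels: Sequence[int], **kwargs) -> float:
    """Samples processed per test spent (1.0 equals individual testing)."""
    return len(labels) / count_tests(probs, labels, **kwargs)
```

In this sketch the classifier affects only how many tests are spent. The status of every sample is still decided by the molecular test, so a wrong prediction costs efficiency rather than diagnostic accuracy.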
Findings Our method, Smart Pooling, shows efficiency of 306% at a disease prevalence of 5% and efficiency of 107% at a disease prevalence of up to 50%, a regime in which two-stage pooling offers marginal efficiency gains compared to individual testing (see Figure 1). Additionally, we calculate the possible efficiency gains of one- and two-dimensional two-stage pooling strategies, and present the optimal strategies for disease prevalences up to 25%. We discuss practical limitations of conducting pooling in the laboratory.
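For context on these figures, the textbook efficiency of one-dimensional two-stage pooling can be computed directly. The sketch below assumes that efficiency means samples processed per test, that samples are independent, and that the test is error-free; the function names and the maximum pool size of 32 are hypothetical. With pools of size n at prevalence p, the expected number of tests per sample is 1/n + 1 - (1 - p)^n.

```python
def dorfman_efficiency(prevalence: float, pool_size: int) -> float:
    """Expected samples per test for one-dimensional two-stage pooling.

    Assumes independent samples and a perfectly accurate test.
    """
    if pool_size == 1:
        return 1.0  # a pool of one is just individual testing
    # One pool test shared by pool_size samples, plus one retest per
    # sample whenever the pool contains at least one positive.
    tests_per_sample = 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / tests_per_sample


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size in 1..max_pool_size with the highest expected efficiency."""
    return max(
        range(1, max_pool_size + 1),
        key=lambda n: dorfman_efficiency(prevalence, n),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.25, 0.50):
        n = best_pool_size(p)
        print(f"prevalence={p:.2f} pool_size={n} "
              f"efficiency={dorfman_efficiency(p, n):.0%}")
```

Under these assumptions, the best pool size at 5% prevalence is 5, giving an efficiency of roughly 235%, and at 50% prevalence no pool size beats individual testing. This is only the simplest one-dimensional baseline, not the paper's own calculation of one- and two-dimensional strategies.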
Interpretation Pooled testing has been a theoretically alluring option to increase the coverage of diagnostics since its proposition by Dorfman during World War II. Although there are examples of successfully using pooled testing to reduce the cost of diagnostics, its applicability has remained limited because efficiency drops rapidly as prevalence increases. Not only does our method provide a cost-effective solution to increase the coverage of testing amid the COVID-19 pandemic, but it also demonstrates that artificial intelligence can be used to complement well-established techniques in medical practice.
Funding Faculty of Engineering, Universidad de los Andes, Colombia.
Evidence before this study The acute respiratory illness COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The World Health Organization (WHO) labeled COVID-19 as a pandemic in March 2020. Reports from February 2020 indicated the possibility of asymptomatic transmission of the virus, which has called for molecular testing to identify carriers of the disease and prevent them from spreading it. The dramatic rise in the global need for molecular testing has made reagents scarce. Pooling strategies for massive diagnostics were initially proposed to diagnose syphilis during World War II, but have not yet seen widespread use mainly because their efficiency falls even at modest disease prevalence.
We searched PubMed, BioRxiv, and MedRxiv for articles published in English from inception to July 15th 2020 using the keywords “pooling”, “testing” AND “COVID-19”, AND “machine learning” OR “artificial intelligence”. Early studies of pooled molecular testing for SARS-CoV-2 revealed the possibility of detecting single positive samples in dilutions of samples from up to 32 individuals. The first reports of pooled testing came in March from Germany and the USA. These works suggested that it was feasible to conduct pooled testing as long as the prevalence of the disease was low. Numerous theoretical works have focused only on finding or adapting the ideal pooling strategy to the prevalence of the disease. Nonetheless, many do not consider the practical limitations of applying these strategies in the laboratory. Reports from May 2020 indicated that it was feasible to predict an individual’s status with machine learning methods based on reported symptoms.
Added value of this study We show how artificial intelligence methods can be used to enhance, but not replace, existing well-proven methods, such as diagnostics by qPCR. We show that in this fashion, pooled testing can yield efficiency gains even as prevalence increases. Our method does not compromise the sensitivity or specificity of the diagnostics, as these are still given by the molecular test. The artificial intelligence models are simple, and we make them free to use. Remarkably, artificial intelligence methods can continuously learn from every set of samples and thus increase their performance over time.
Implications of all the available evidence Using artificial intelligence to enhance rather than replace molecular testing can make pooled testing feasible, even as disease incidence rises. This approach could make pooled testing an effective tool to tackle the disease’s progression, particularly in territories with limited resources.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding has been received for this project.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Smart Pooling has been approved by the ethics committee at Universidad de Los Andes in Bogotá, Colombia. Both the Patient Dataset and the Test Center Dataset described in section 3.1.1 have been anonymized, and their use was authorized by the district health department.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
We will make our computational platform freely available for the general public in order to optimize COVID-19 testing.