Abstract
Importance Open-access data challenges have the potential to accelerate innovation in artificial-intelligence (AI)-based tools for global health. A specimen-free rapid triage method for tuberculosis (TB) is a global health priority.
Objective To develop and validate cough sound-based AI algorithms for tuberculosis (TB) through the Cough Diagnostic Algorithm for Tuberculosis (CODA TB) DREAM challenge.
Design In this diagnostic study, participating teams were provided with cough sound, clinical, and demographic data. They were asked to develop AI models over a four-month period and then submit the algorithms for independent validation.
Setting Data were collected using smartphones at outpatient clinics in India, Madagascar, the Philippines, South Africa, Tanzania, Uganda, and Vietnam.
Participants We included data from 2,143 adults who were consecutively enrolled with at least two weeks of cough. Data were randomly split evenly into training and test partitions.
Exposures Standard TB evaluation was completed, including Xpert MTB/RIF Ultra and culture. At least three solicited coughs were recorded using the Hyfe Research app.
Main Outcomes and Measures We invited teams to develop models using 1) cough sound features only and/or 2) cough sound features with routinely available clinical data to classify microbiologically confirmed TB disease. Models were ranked by area under the receiver operating characteristic curve (AUROC) and partial AUROC (pAUROC) to achieve at least 80% sensitivity and 60% specificity.
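As a minimal sketch of the ranking metrics named above, the following pure-Python example computes AUROC as a rank statistic and the specificity attained at a sensitivity of at least 80%. The function names, the threshold rule (a score at or above the threshold counts as a positive call), and the toy labels and scores are illustrative assumptions, not the challenge's evaluation code or data. The partial AUROC is not computed here because its exact definition is not given in this abstract.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sensitivity=0.80):
    """Highest specificity over all thresholds whose sensitivity meets the target.

    Assumed decision rule: predict TB when score >= threshold.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        specificity = sum(n < threshold for n in neg) / len(neg)
        if sensitivity >= target_sensitivity:
            best = max(best, specificity)
    return best


# Hypothetical toy data: 1 = microbiologically confirmed TB, 0 = no TB.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.35, 0.2, 0.1]

toy_auroc = auroc(labels, scores)
toy_specificity = specificity_at_sensitivity(labels, scores, 0.80)
print(f"AUROC = {toy_auroc:.2f}")
print(f"Specificity at >=80% sensitivity = {toy_specificity:.2f}")
```

Under this sketch, a model meets the stated accuracy target when the second value is at least 0.60.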
Results Eleven cough models were submitted, as well as six cough-plus-clinical models. AUROCs for cough models ranged from 0.69 to 0.74, and the highest-performing model achieved 55.5% specificity (95% CI 47.7-64.2) at 80% sensitivity. The addition of clinical data improved AUROCs (range 0.78-0.83): five of the six submitted models reached the target pAUROC, and the highest-performing model had 73.8% specificity (95% CI 60.8-80.0) at 80% sensitivity. In post-challenge subgroup analyses, AUROCs varied by country and were higher among males and HIV-negative individuals. The probability of TB classification correlated with Xpert Ultra semi-quantitative levels.
Conclusions and Relevance In a short period, new and independently validated cough-based TB algorithms were developed through an open-source and transparent process. Open-access data challenges can rapidly advance and improve AI-based tools for global health.
Question Can an open-access data challenge support the rapid development of cough-based artificial intelligence (AI) algorithms to screen for tuberculosis (TB)?
Findings In this diagnostic study, teams were provided with well-characterized cough sound data from seven countries, and they developed and submitted AI models for independent validation. Multiple models that combined clinical and cough data achieved the target accuracy of at least 80% sensitivity and 60% specificity to classify microbiologically confirmed TB.
Meaning Cough-based AI models have promise to support point-of-care TB screening, and open-access data challenges can accelerate the development of AI-based tools for global health.
Competing Interest Statement
PMS is employed by Hyfe AI.
Funding Statement
The CODA TB DREAM Challenge and post-challenge evaluation were funded in part by the Bill & Melinda Gates Foundation. R2D2 was funded by the U.S. National Institutes of Health (U01 AI152087), and the Digital Cough Monitoring study was funded by the Patrick J. McGovern Foundation. SGL is supported by a Junior 1 Salary Award from the Fonds de Recherche Sante Quebec. DJ is supported by funding from the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethical approvals for the studies were obtained from institutional review boards (IRBs) in the US (R2D2 TB Network, University of California, San Francisco) and Canada (Digital Cough Monitoring Project, University of Montreal), as well as IRBs in each country in which participants were enrolled. In Vietnam, approval was obtained from the Ministry of Health Ethical Committee for National Biological Medical Research (94/CN-HĐĐĐ), the National Lung Hospital Ethical Committee for Biological Medical Research (566/2020/NCKH) and the Hanoi Department of Health, Hanoi Lung Hospital Science and Technology Initiative Committee (22/BVPHN). In India, approval was obtained from Christian Medical College IRB (13256). In South Africa, approval was obtained from Stellenbosch University Health Research Ethics Committee (17047). In Uganda, approval was obtained from Makerere University, College of Health Sciences, School of Medicine, Research Ethics Committee (2020-182). In the Philippines, approval was obtained from De La Salle Health Sciences Institute Independent Ethics Committee (2020-33-02-A). In Madagascar, approval was obtained from the Comité d'Éthique à la Recherche Biomédicale (IORG0000851). In Tanzania, approval was obtained from the Ifakara Health Institute IRB (31-2021) and the National Institute for Medical Research (NIMR/HQ/R.8a/Vol IX/3805).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The challenge training data and links to the code and write-ups for the model submissions are available at www.synapse.org/TBcough. Additionally, users can register to submit models for evaluation against the validation data in an ongoing manner.