RT Journal Article SR Electronic T1 Digital Omicron detection using unscripted voice samples from social media JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2022.09.13.22279673 DO 10.1101/2022.09.13.22279673 A1 Anibal, James T. A1 Landa, Adam J. A1 Nguyen, Hang T. A1 Peltekian, Alec K. A1 Shin, Andrew D. A1 Song, Miranda J. A1 Christou, Anna S. A1 Hazen, Lindsey A. A1 Rivera, Jocelyne A1 Morhard, Robert A. A1 Bagci, Ulas A1 Li, Ming A1 Clifton, David A. A1 Wood, Bradford J. YR 2022 UL http://medrxiv.org/content/early/2022/12/22/2022.09.13.22279673.abstract AB The success of artificial intelligence in clinical environments relies upon the diversity and availability of training data. In some cases, social media data may be used to counterbalance the limited amount of accessible, well-curated clinical data, but this possibility remains largely unexplored. In this study, we mined YouTube to collect voice data from individuals with self-declared positive COVID-19 tests during time periods in which Omicron was the predominant variant1,2,3, while also sampling non-Omicron COVID-19 variants, other upper respiratory infections (URI), and healthy subjects. The resulting dataset was used to train a DenseNet model to detect the Omicron variant from voice changes. Our model achieved 0.85/0.80 specificity/sensitivity in separating Omicron samples from healthy samples and 0.76/0.70 specificity/sensitivity in separating Omicron samples from symptomatic non-COVID samples. In comparison with past studies, which used scripted voice samples, we showed that leveraging the intra-sample variance inherent to unscripted speech enhanced generalization. Our work introduced novel design paradigms for audio-based diagnostic tools and established the potential of social media data to train digital diagnostic models suitable for real-world deployment.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study was funded by the NIH Center for Interventional Oncology and the Intramural Research Program of the National Institutes of Health, National Cancer Institute, and the National Institute of Biomedical Imaging and Bioengineering, via intramural NIH Grants Z1A CL040015 and 1ZIDBC011242.Work also supported by the NIH Intramural Targeted Anti-COVID-19 (ITAC) Program, funded by the National Institute of Allergy and Infectious Diseases.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:IRB of the National Institutes of Health waived ethical approval for this work.I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.YesAll data produced in the present study are available upon reasonable request to the authors