PT - JOURNAL ARTICLE
AU - Wen, Zhi
AU - Powell, Guido
AU - Chafi, Imane
AU - Buckeridge, David
AU - Li, Yue
TI - Inferring global-scale temporal latent topics from news reports to predict public health interventions for COVID-19
AID - 10.1101/2021.06.10.21257749
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.06.10.21257749
4099 - http://medrxiv.org/content/early/2021/06/10/2021.06.10.21257749.short
4100 - http://medrxiv.org/content/early/2021/06/10/2021.06.10.21257749.full
AB - The COVID-19 global pandemic has highlighted the importance of non-pharmacological interventions (NPIs) for controlling epidemics of emerging infectious diseases. Despite the importance of NPIs, their implementation has been monitored in an ad hoc and uncoordinated manner, mainly through the manual efforts of volunteers. Given the absence of systematic NPI tracking, authorities and researchers are limited in their ability to quantify the effectiveness of NPIs and to guide decisions regarding their use during the progression of a global pandemic. To address this issue, we propose a 3-stage machine learning framework called EpiTopics to facilitate the surveillance of NPIs by mining the vast amount of unlabelled news reports about these interventions. Building on topic modelling, our method characterizes online government reports and media articles related to COVID-19 as mixtures of latent topics. Our key contribution is the use of transfer learning to address the limited number of NPI-labelled documents, combined with topic modelling to support interpretation of the results. At stage 1, we trained a modified version of the unsupervised dynamic embedded topic model (DETM) on 1.2 million international news reports related to COVID-19. At stage 2, we used the trained DETM to infer topic mixtures from a small set of 2000 NPI-labelled WHO documents and used these mixtures as the input features for predicting the NPI labels of each document.
At stage 3, we supplied the inferred country-level temporal topics from the DETM to the pretrained document-level NPI classifier to predict country-level NPIs. We identified 25 interpretable topics spanning 4 distinct and coherent COVID-related themes. These topics contributed to significant improvements in predicting the NPIs labelled in the WHO documents and in predicting country-level NPIs. Together, our work lays the machine learning methodological foundation for future research in global-scale surveillance of public health interventions. The EpiTopics code is available on GitHub: https://github.com/li-lab-mcgill/covid-npi. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work is supported by CIHR through the Canadian 2019 Novel Coronavirus (COVID-19) Rapid Research Funding Opportunity (Round 1) (Application number: 440236). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: N/A. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The datasets analyzed during the current study are from publicly available repositories or data portals. The acquisition and quality control steps for all datasets are included in the supplementary information. https://github.com/li-lab-mcgill/covid-npi
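The data flow between stages 2 and 3 described in the abstract can be illustrated with a minimal sketch. This is not the EpiTopics implementation: the DETM of stage 1 is not reproduced, the topic mixtures are random Dirichlet draws standing in for DETM output, the labels are synthetic, and the classifier is a plain multi-label logistic regression chosen only for illustration. The names `train_classifier`, `N_LABELS`, `N_COUNTRIES`, and `N_WEEKS` are hypothetical; only the 25 topics and the 2000 labelled documents come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TOPICS = 25     # number of topics reported in the abstract
N_DOCS = 2000     # size of the NPI-labelled WHO set reported in the abstract
N_LABELS = 3      # hypothetical number of NPI labels (assumption)
N_COUNTRIES = 4   # hypothetical (assumption)
N_WEEKS = 10      # hypothetical number of time steps (assumption)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_classifier(X, Y, lr=1.0, epochs=200):
    """Fit one logistic regression per label by batch gradient descent."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        err = sigmoid(X @ W + b) - Y      # gradient of the log-loss w.r.t. logits
        W -= lr * (X.T @ err) / len(X)
        b -= lr * err.mean(axis=0)
    return W, b


# Stand-in for DETM inference on labelled documents (stage 2 input):
# each row is a topic mixture that sums to 1.
doc_topics = rng.dirichlet(np.ones(N_TOPICS), size=N_DOCS)

# Synthetic multi-label NPI targets; real labels would come from the WHO data.
hidden_w = rng.normal(size=(N_TOPICS, N_LABELS))
doc_labels = (doc_topics @ hidden_w > 0).astype(float)

# Stage 2: train a document-level classifier on topic mixtures.
W, b = train_classifier(doc_topics, doc_labels)

# Stand-in for the DETM's country-level temporal topic mixtures,
# shape (countries, weeks, topics).
country_topics = rng.dirichlet(np.ones(N_TOPICS), size=(N_COUNTRIES, N_WEEKS))

# Stage 3: reuse the document-level classifier on country-level mixtures.
# Result has shape (countries, weeks, labels): one probability per NPI
# per country per time step.
country_probs = sigmoid(country_topics @ W + b)
```

The point of the sketch is the transfer step: because documents and countries are both represented in the same topic space, a classifier fitted on document-level mixtures can be applied unchanged to country-level mixtures.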