ABSTRACT
The COVID-19 pandemic poses a heightened risk to health workers, especially in low- and middle-income countries such as Indonesia. Because mass RT-PCR testing of health workers is difficult to implement, high-performing and cost-effective methods are needed to identify COVID-19-positive health workers and protect those at the forefront of the fight against the pandemic. This study investigated the use of machine learning classifiers to predict the risk of COVID-19 positivity (confirmed by RT-PCR) from data collected through a survey designed specifically for health workers. Machine learning tools can expand COVID-19 screening capacity for high-risk populations such as health workers in settings where cost limits access to adequate testing and screening supplies. We built two sets of COVID-19 Likelihood Meter (CLM) models: one trained and tested on data from a broad population of health workers in Jakarta and Semarang (full model), and one trained on health workers from Jakarta only (Jakarta model) and tested both on Jakarta health workers and on an independent population of health workers in Semarang. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), average precision (AP), and the Brier score (BS). SHapley Additive exPlanations (SHAP) were used to analyse feature importance. The final dataset included 5,393 health workers. For the full model, random forest was selected as the algorithm. It achieved a mean cross-validation AUC of 0.832 ± 0.015, AP of 0.513 ± 0.039, and BS of 0.124 ± 0.005, and performed well on the test set, with an AUC of 0.849 and an AP of 0.51. The random forest classifier also showed the best and most robust performance for the Jakarta model, with an AUC of 0.856 ± 0.015, AP of 0.434 ± 0.039, and BS of 0.08 ± 0.0003. When tested on the Semarang health workers, this model achieved an AUC of 0.745 and an AP of 0.694; on the Jakarta 2022 test set, it achieved an AUC of 0.761 and an AP of 0.535. Our models achieved high predictive performance and can be used as an alternative COVID-19 screening method for health workers in Indonesia, helping to predict increases in transmission during the transition to endemicity.
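The abstract reports discrimination (AUC), precision-recall performance (AP), and calibration (Brier score), with cross-validation results given as mean ± standard deviation across folds. As a minimal sketch of what each number measures, the pure-Python functions below compute the three metrics from binary RT-PCR labels (1 = positive) and predicted probabilities. The function names and toy data are hypothetical and are not the authors' code or data. The step-wise AP definition follows the convention used by scikit-learn's `average_precision_score`, which is an assumption about how AP was computed. Whether the reported ± values are sample or population standard deviations is not stated; the sketch assumes the sample standard deviation.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random
    negative (ties count as half). O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise AP = sum_n (R_n - R_{n-1}) * P_n over descending thresholds
    (assumed to match scikit-learn's average_precision_score convention)."""
    total_pos = sum(1 for y in y_true if y == 1)
    if total_pos == 0:
        raise ValueError("AP needs at least one positive label")
    pairs: List[Tuple[float, int]] = sorted(
        zip(y_score, y_true), key=lambda t: t[0], reverse=True
    )
    ap, prev_recall, tp, fp, i = 0.0, 0.0, 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at the same score enter the positive set together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def summarize_folds(fold_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_values), stdev(fold_values)


if __name__ == "__main__":
    # Toy labels and scores (hypothetical, not study data).
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s))            # 0.75
    print(average_precision(y, s))  # about 0.833
    print(brier_score(y, s))        # about 0.158
    # Hypothetical per-fold AUCs, reported in the abstract's "mean ± SD" style.
    m, sd = summarize_folds([0.82, 0.84, 0.83])
    print(f"AUC {m:.3f} ± {sd:.3f}")
```

AUC and AP depend only on how the scores rank positives against negatives, whereas the Brier score also penalizes miscalibrated probabilities, which is why the three are reported together.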
Competing Interest Statement
The authors' affiliations are listed on the first page. SS, MAM, LS, JK, AI, FA, BR, and AT are affiliated with Nalagenetics. The models used in this study were also used by Nalagenetics in a now-defunct COVID-19 screening tool previously offered to hospitals in Indonesia. OH, SAKS, FZK, NL, DS, and AT are affiliated with CISDI, a healthcare think tank that uses research findings to advocate for public health policies. The authors have no financial gain or loss of any kind that could result from the publication of this manuscript; however, because every author has an organizational affiliation, this could be considered a non-financial competing interest.
Funding Statement
This study was primarily funded by Yayasan Satriabudi Dharma Setia, a philanthropic organization funded by the Indonesian Coordinating Ministry for Maritime and Investment Affairs (Kemenkomarves). The foundation provided the laboratories and PCR test kits used in the study but had no role in sample collection or data analysis. Additional funding for operational and staffing expenses was provided by Nalagenetics Pte Ltd (Singapore), Nalagenetics, CISDI, and RSND Semarang. The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Institutional Review Board (IRB) approval was granted by the Institute of Research and Community Service of Universitas Katolik Indonesia Atma Jaya (Jakarta, Indonesia) under IRB Reference Number 626A /III/LPPM.PM.10.05/05/2020. Informed consent was obtained online from respondents before they were enrolled in the study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Author list updated; Introduction updated to reflect more current data; total number of participants increased; supplementary files containing the STROBE checklist (S1 File) and the survey questions (S2 File) added; Modelling and Prediction section updated with an additional algorithm and additional model training; Data Summary updated; Model Performance and Explainability updated to reflect the additional model training; figures updated.
Data Availability
The datasets generated and analyzed in this study are not publicly available because they contain sensitive personal information, but they are available from the corresponding author on reasonable request.