Abstract
Objective Smoking is very common in Indonesia: among adults, around 66% of males and 7% of females are smokers. Smoking is harmful not only for people who smoke but also for people who are regularly exposed to second-hand smoke. Previous research in various countries has shown changing trends in smoking during the COVID-19 pandemic. However, despite the high prevalence of smoking in Indonesia and the shifting trend during COVID-19, no studies have used machine learning to investigate the potential increase in daily cigarette consumption during the pandemic. This study aimed to predict increases in daily cigarette consumption among smokers during the pandemic, focusing on smokers recruited from vaccination registrants in the Special Region of Yogyakarta.
Design Five machine learning algorithms were developed and tested to assess their performance: decision tree (DT), random forest (RF), logistic regression (LoR), k-nearest neighbors (KNN), and naive Bayes (NB). The results showed a significant difference in the number of cigarettes consumed daily before and during the pandemic (statistic=2.8, p=0.004).
Setting To our knowledge, this is the first study to develop a prediction model for increased cigarette consumption during the COVID-19 pandemic in Indonesia.
Results The study found that both the DT and LoR algorithms were effective in predicting increased daily cigarette consumption during the COVID-19 pandemic. They outperformed the other three algorithms in terms of precision, recall, accuracy, F1-score, sensitivity, and AUC (area under the receiver operating characteristic curve). LoR showed a precision of 92%, recall of 99%, accuracy of 93%, F1-score of 96%, sensitivity of 91% and AUC of 78%. DT showed a precision of 88%, recall of 91%, accuracy of 81%, F1-score of 89%, sensitivity of 95% and AUC of 98%.
Conclusion We recommend using the DT and LoR algorithms, as they demonstrated better prediction performance. This study can serve as a pilot study for predicting smokers’ continued smoking behaviour and for informing smoking cessation promotion among smokers. As this is a short report, we suggest expanding it with more factors and a larger dataset to provide more informative and reliable results. The recommendations based on the current findings can serve as a starting point for initial actions and can be further validated and refined in larger-scale studies in the future.
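The evaluation metrics named in the Results can be illustrated with a minimal sketch, which assumes the standard textbook definitions rather than the authors' actual evaluation code. The function names and all counts and scores below are hypothetical and are not taken from the study. Under these standard definitions, recall and sensitivity are the same quantity, TP / (TP + FN).

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives (hypothetical inputs).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall and sensitivity share one definition: TP / (TP + FN).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical example values, NOT data from the study.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=45)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3])

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {auc:.3f}")
```

The pairwise AUC computation is quadratic in the number of cases, which is acceptable for illustration; a rank-based implementation would be preferable for larger datasets.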
STRENGTHS AND LIMITATIONS OF THIS STUDY
⟹ This is the first study to investigate the increased number of cigarettes consumed daily by Indonesian smokers during the pandemic using machine learning models.
⟹ This paper used multiple algorithms: rather than relying on a single algorithm, the authors compared five different ML methods, providing a comprehensive analysis.
⟹ By using external research as a reference, the authors established a solid basis for their methodology and ensured that their research was supported by existing literature.
⟹ The paper clearly identified the DT model as superior, bringing clarity to the readers.
⟹ The paper suggests that the developed framework has wide applicability in healthcare, increasing its relevance and potential impact.
⟹ This paper considered only a few features (27); more data on economic factors could be incorporated in future research, which would support real-life application of this model.
⟹ Selection bias may have been introduced by recruiting participants from among those who came for vaccination, so the sample may not fully represent the general population.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Funding: None
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was approved by the ethics committee of the Polytechnic of Health Department Yogyakarta, Republic of Indonesia, with approval No. e-KEPK/POLKESYO/0751/X/2021, granted on October 5th, 2021.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability statement
Data are available upon reasonable request.