Toward AI-Guided Smoking Cessation: Individualized Nicotine Addiction Modeling Using Gaussian Processes
=======================================================================================================

* Anirudh Chari

## Abstract

Cigarette smoking remains the leading cause of preventable disease and death in the United States, accounting for nearly half a million deaths annually. Given the recent rise of artificial intelligence in healthcare applications, computational assessment of smoking behaviors is a promising direction. In this study, we aim to recognize and classify addiction patterns in individual smokers' daily usage from time series data. To this end, we use Gaussian process modeling to iteratively learn a function that describes a smoker's behavior as usage data accumulate. In particular, we aim to learn weekly periodic trends in usage and then use the model to predict future trends. We demonstrate that the resulting predictions resemble the actual data well and that these informed forecasts are significantly more accurate than those of a naive prediction model. Finally, we propose strategies for using these predictions for goal-setting as part of a computer-supervised gradual cessation program.

## 1 Introduction

Smoking cessation generally takes one of two forms: abrupt or gradual. Abrupt cessation has been shown to be more effective than self-supervised gradual cessation in maintaining long-term abstinence [1]. However, computer-supervised gradual cessation (i.e., cessation driven by goal-setting algorithms) has not been widely explored, and a preliminary approach showed significantly higher long-term abstinence rates than both abrupt and self-supervised gradual cessation attempts, at least in adolescents [2]. Given the recent rise and notable success of artificial intelligence (AI) in healthcare applications [3], this is a promising direction.

Computational modeling of drug use and addiction has been widely explored in recent years [4].
Popular approaches include reinforcement learning [5] and dopamine-based modeling [6]. In [7], the author demonstrates that Bayesian frameworks can be effective for analyzing decision-making behaviors in drug addicts. In [8], the authors develop a computational model of nicotine addiction that classifies the severity of a user's addiction based on neurophysiological indicators.

Time series forecasting is the problem of predicting future data points or trends in a time series, given the data observed so far. Popular time series forecasting techniques include exponential smoothing, ARIMA models, Kalman filters, long short-term memory (LSTM) models [9], and Gaussian processes (GPs) [10]. Because they are built on Bayesian inference, GPs offer the additional advantage of quantifying the uncertainty of their predictions, which can be especially helpful in evaluating their reliability in the proposed setting. For an intuitive tutorial on the mechanics of GPs for regression, see [11].

The main challenge in forecasting is to learn the underlying patterns within potentially noisy data. When this succeeds, forecasting becomes a valuable tool in healthcare, as exemplified by its application to disease diagnosis and prognosis [12, 13]. Time series analysis and forecasting have also been applied to modeling large-scale nicotine use and cessation [14], especially in evaluating interventions [15, 16]. However, applying forecasting methods to assess the behavior of individual smokers has not been widely studied in recent years. We propose that smokers' nicotine use over time can be modeled using GPs, and that this model can be used to classify and predict addiction and cessation behaviors. Ultimately, we aim to leverage these insights to develop a personalized and adaptive gradual smoking cessation program.
## 2 Methods

### 2.1 Preliminary: Gaussian Processes

#### 2.1.1 Definition

A Gaussian process (GP) defines a probability distribution over functions; conditioned on a set of observed points, it yields a distribution over the functions consistent with them. A GP model uses Bayesian inference to update this posterior as new data are obtained: at any input, the posterior is Gaussian, with its mean serving as the prediction and its variance quantifying the uncertainty of that prediction. The posterior can therefore be used as a regression model to make predictions about new data.

#### 2.1.2 Kernels

The *kernel* of a GP model defines its general curve-fitting behavior, i.e., our prior beliefs about how the function should behave. More formally, a kernel $k_\theta(x_1, x_2)$ defines the covariance between the function values $f(x_1)$ and $f(x_2)$ at inputs $x_1$ and $x_2$, using some hyperparameters $\theta$. Multiple kernels can be combined through *kernel composition* (e.g., sums and products of kernels) to fit more complex functions.

#### 2.1.3 Hyperparameters

The hyperparameters $\theta$ (for example, length scales, periods, and variances) parameterize the kernel and therefore determine the exact curve-fitting behavior of the GP model. Setting them manually is difficult; in practice they are estimated from data, for example by maximizing the marginal likelihood or, as in Section 2.2.2, by maximum a posteriori estimation.

### 2.2 Modeling Smoking Habits

Weekly patterns have been observed in smokers' nicotine use [17]. We can therefore formulate the problem as periodic time series forecasting, a task for which GPs have proven effective [18]. At a high level, the role of the GP is to learn a function that describes the user's behavior. Our approach builds on the framework of [18], which we summarize below as it relates to our work.

#### 2.2.1 Kernel Composition

We use a composite kernel *K*, defined as

![Formula][1]

Here, the periodic kernel PER provides weekly seasonality, the linear kernel LIN provides a linear trend, and the squared exponential (RBF) kernel and the two spectral mixture kernels SM1 and SM2 together represent nonlinear trends. See [18] for explicit definitions of each of these kernels.
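
As an illustration of Sections 2.1 and 2.2.1, the sketch below implements simple versions of the five kernel components, combines them additively, and computes the exact GP posterior mean and variance. The additive form, the parameter names, and all hyperparameter values are assumptions chosen for illustration; the exact composition and parameterization are given in [18]. The toy series is synthetic, not study data.

```python
import numpy as np

# Kernel components for inputs measured in days. Parameter names and default
# values are illustrative assumptions, not the settings used in [18].
def per(x1, x2, var=1.0, period=7.0, ell=1.0):
    """Periodic kernel; period=7 encodes weekly seasonality."""
    d = np.abs(x1[:, None] - x2[None, :])
    return var * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell**2)

def lin(x1, x2, var=1e-4):
    """Linear kernel: a linear trend in time."""
    return var * np.outer(x1, x2)

def rbf(x1, x2, var=0.5, ell=20.0):
    """Squared exponential (RBF) kernel: smooth nonlinear drift."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * d**2 / ell**2)

def sm(x1, x2, w=0.25, mu=0.05, v=1e-3):
    """Single-component, one-dimensional spectral mixture kernel."""
    tau = x1[:, None] - x2[None, :]
    return w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * mu * tau)

def composite(x1, x2):
    """Assumed additive composition PER + LIN + RBF + SM1 + SM2."""
    return (per(x1, x2) + lin(x1, x2) + rbf(x1, x2)
            + sm(x1, x2, mu=0.02) + sm(x1, x2, mu=0.1))

def gp_posterior(x_tr, y_tr, x_te, kernel=composite, noise_var=0.1):
    """Exact GP regression: posterior mean and variance of f at x_te."""
    K = kernel(x_tr, x_tr) + noise_var * np.eye(len(x_tr))
    K_s = kernel(x_tr, x_te)                  # m x n cross-covariance
    L = np.linalg.cholesky(K)                 # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(kernel(x_te, x_te)) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage on a synthetic, noise-free weekly pattern (not study data).
days = np.arange(84, dtype=float)
y = np.sin(2.0 * np.pi * days / 7.0)
mean, var = gp_posterior(days[:56], y[:56], days[56:])
```

Because the periodic component repeats exactly every seven days, it can carry the weekly pattern into the 28-day forecast window, whereas the RBF and SM components revert toward the zero prior mean far from the training data.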
#### 2.2.2 Hyperparameter Estimation

While the model learns the specific function describing an individual's behavior, it also has hyperparameters $\theta$ that encode general behavior shared across the function space, i.e., across all users' behaviors. We begin by assigning log-normal priors to each hyperparameter and then refine these estimates by training the model on sample data using *maximum a posteriori* (MAP) estimation, an iterative optimization procedure outlined in [18].

#### 2.2.3 The Forecasting Problem

GP regression is naturally applicable to time series forecasting. We formally represent our addiction forecasting task as the regression problem

$$y = f(x) + \varepsilon,$$

where $y$ is the nicotine dose per day (measured in number of cigarettes), $x$ is time in days, $f \sim \mathcal{GP}(0, k_\theta)$ is the unknown function with a GP prior, and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ is noise with variance $\sigma^2$. Then, given $m$ training points $\mathbf{x} = (x_1, \ldots, x_m)$, $\mathbf{y} = (y_1, \ldots, y_m)$ and $n$ test inputs $\mathbf{x}_* = (x_{*1}, \ldots, x_{*n})$, the forecasting problem is to compute the posterior distribution $p(\mathbf{f}_* \mid \mathbf{x}, \mathbf{y}, \mathbf{x}_*)$ of the function values at the test inputs.

## 3 Experiments

### 3.1 Dataset

We use the dataset provided in [17], which contains a sample of 62 participants who had been smoking for more than two consecutive years and who smoked more than five cigarettes a day. Participants were male and female college students between 18 and 26 years of age. Each participant recorded the number of cigarettes he or she smoked each day for up to 12 consecutive weeks. Because some records were incomplete, we consider only the participants who recorded data for the full 12-week (84-day) period, which yields $T = 50$ time series. For each time series, we form the training set from the first $m = 56$ days and the test set from the last $n = 28$ days. We normalize each time series to have a mean of 0 and a standard deviation of 1.
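
The following sketch walks through the pipeline of Sections 2.2.2 to 3.3 on one synthetic 84-day series: normalization to mean 0 and SD 1 (using the full series, following the description in Section 3.1), the 56/28 train/test split, MAP selection of hyperparameters under log-normal priors, the GP forecast, and a comparison with the naive mean baseline using MAE. To keep it short, it assumes a simplified PER + RBF kernel with unit variances and replaces the gradient-based optimizer normally used for MAP estimation with an exhaustive search over a small hypothetical grid; the prior parameters are likewise illustrative assumptions.

```python
import itertools

import numpy as np

def kernel(x1, x2, per_ell, rbf_ell):
    """Simplified PER (7-day period) + RBF kernel with unit variances (assumed)."""
    d = x1[:, None] - x2[None, :]
    per = np.exp(-2.0 * np.sin(np.pi * np.abs(d) / 7.0) ** 2 / per_ell**2)
    rbf = np.exp(-0.5 * d**2 / rbf_ell**2)
    return per + rbf

def log_marginal_likelihood(x, y, per_ell, rbf_ell, noise_var):
    """log p(y | x, theta) for a zero-mean GP with Gaussian noise."""
    K = kernel(x, x, per_ell, rbf_ell) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def log_lognormal(value, mu, sigma=1.0):
    """Log density of a LogNormal(mu, sigma) prior at value > 0."""
    return (-np.log(value * sigma * np.sqrt(2.0 * np.pi))
            - (np.log(value) - mu) ** 2 / (2.0 * sigma**2))

def log_posterior(x, y, theta):
    """MAP objective: log marginal likelihood + log-normal log priors."""
    per_ell, rbf_ell, noise_var = theta
    log_prior = (log_lognormal(per_ell, 0.0)
                 + log_lognormal(rbf_ell, np.log(20.0))
                 + log_lognormal(noise_var, np.log(0.2)))
    return log_marginal_likelihood(x, y, per_ell, rbf_ell, noise_var) + log_prior

def map_estimate(x, y):
    """Exhaustive search over a hypothetical grid of (per_ell, rbf_ell, noise_var)."""
    grid = itertools.product([0.5, 1.0, 2.0], [5.0, 20.0, 60.0], [0.05, 0.2, 0.5])
    return max(grid, key=lambda theta: log_posterior(x, y, theta))

def gp_forecast(x_tr, y_tr, x_te, per_ell, rbf_ell, noise_var):
    """Posterior mean of f at the test inputs."""
    K = kernel(x_tr, x_tr, per_ell, rbf_ell) + noise_var * np.eye(len(x_tr))
    K_s = kernel(x_tr, x_te, per_ell, rbf_ell)
    return K_s.T @ np.linalg.solve(K, y_tr)

# Synthetic 84-day series with a weekly pattern (illustrative, not study data).
rng = np.random.default_rng(0)
days = np.arange(84, dtype=float)
cigs = 12.0 + 3.0 * np.sin(2.0 * np.pi * days / 7.0) + rng.normal(0.0, 1.0, 84)
z = (cigs - cigs.mean()) / cigs.std()          # normalize to mean 0, SD 1
x_tr, y_tr, x_te, y_te = days[:56], z[:56], days[56:], z[56:]   # m = 56, n = 28

theta = map_estimate(x_tr, y_tr)
gp_pred = gp_forecast(x_tr, y_tr, x_te, *theta)
naive_pred = np.full(len(x_te), y_tr.mean())   # naive baseline: training mean
gp_mae = np.mean(np.abs(y_te - gp_pred))
naive_mae = np.mean(np.abs(y_te - naive_pred))
```

A finer grid, or a gradient-based optimizer applied to the same log posterior, would be the natural refinement; the grid simply keeps the example free of dependencies beyond NumPy.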
### 3.2 Baseline

We compare the performance of the GP model with that of a naive baseline model, which uses the mean of the historical (training) data as its prediction for all future days. More formally,

$$\hat{y}_{m+j} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad j = 1, \ldots, n.$$

The naive model makes the educated guess that smoking behavior remains constant, but it uses no knowledge of weekly patterns in behavior.

### 3.3 Evaluation Metric

The *mean absolute error* (MAE) on the test set is given by

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \left| y_{m+j} - \hat{y}_{m+j} \right|.$$

Lower MAE values indicate better model performance. MAE is a common metric for evaluating regression models, and on normalized data it is comparable across time series.

### 3.4 Results

MAE values across the time series were significantly lower for the GP model (*M* = 0.823, *SD* = 0.168) than for the naive model (*M* = 0.908, *SD* = 0.150), *t*(49) = 4.78, *p* < .001. The GP model discerns a wide variety of periodicities (Fig. 1), and the predicted values and associated error bounds resemble the actual data well, χ²(26, *N* = 50) = 767.65, *p* < .001.

![](https://www.medrxiv.org/content/medrxiv/early/2023/10/27/2023.10.25.23297563/F1/graphic-5.medium.gif)

![](https://www.medrxiv.org/content/medrxiv/early/2023/10/27/2023.10.25.23297563/F1/graphic-6.medium.gif)

Figure 1. Example outputs from GP forecasting.

## 4 Guided Cessation

We have shown that GPs are an effective tool for modeling nicotine addiction behaviors. We expect the model to be useful in designing a computer-supervised gradual smoking cessation program. However, such a program requires an additional goal-setting procedure that actually uses the model's insights. Here, we outline some suggested procedures.

### 4.1 Naive Bounding Procedure

At time $t$, let ![Graphic][9] denote the posterior distribution of $f \sim \mathcal{GP}(0, k_\theta)$, i.e.
the regression output up to $x = x_m$, where $t < x_m$. A simple goal-setting function is then

![Formula][10]

for some constant $k \in (0, 1)$ that enforces a downward trend.

### 4.2 Bayesian Inference Procedure

More generally, we can express the goal-setting function as

![Formula][11]

where $\lambda$ is the user's *learning rate*, i.e., the rate at which he or she can successfully reduce nicotine intake. The learning rate can loosely be interpreted as reflecting the strength of the user's addiction. If $\lambda$ is too low, the goals are unachievable and the user is prone to relapse. If $\lambda$ is too high, progress is minimal and the cessation process is unnecessarily prolonged. In some sense, our learning rate is analogous to the learning rate in gradient descent (usually denoted $\alpha$), which determines the step size of each iteration, although here a larger $\lambda$ corresponds to a smaller reduction per step. Each user has a unique learning rate that may even change during the cessation attempt, so we must learn the optimal $\lambda$ over time.

We propose that $\lambda$ can be learned using Bayesian inference. On day $t = 0$ we begin with a prior distribution, for example $\lambda \sim \mathcal{N}(0.9, 0.2)$. Then, on each day $t > 0$, we update this distribution with one iteration of Bayes' theorem using the new evidence

![Formula][12]

and sample the new $\lambda$ from the posterior distribution.

## 5 Conclusion

In this paper, we showed that Gaussian processes are an effective method for individualized modeling of nicotine addiction behaviors. In particular, GPs perform well at finding weekly patterns in a user's smoking habits, and their predictions of future usage are significantly more accurate than those of a naive forecasting model. We also proposed methods for using the GP forecasts for goal-setting as part of a guided gradual cessation program. In future work, we intend to conduct a user study to validate the effectiveness of GPs in computer-guided smoking cessation.
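
The $\lambda$ update of Section 4.2 can be sketched on a discretized grid. Because the evidence term and its likelihood are not spelled out in the text, this sketch assumes a hypothetical binary evidence model: each day the user either meets or misses a goal set with factor $\lambda_t$, and a goal is assumed more likely to be met when $\lambda_t$ is at least the user's true $\lambda$ (a gentler goal). It also reads the 0.2 in $\mathcal{N}(0.9, 0.2)$ as a standard deviation and truncates the prior to the grid on (0, 1]. The grid, the logistic likelihood, and its `scale` parameter are all illustrative assumptions; only the sampling step mirrors the text directly.

```python
import numpy as np

# Hypothetical support for lambda; the text does not specify one.
GRID = np.linspace(0.01, 1.0, 100)

def prior(mean=0.9, sd=0.2):
    """Normal(0.9, 0.2) prior restricted to GRID and renormalized."""
    p = np.exp(-0.5 * ((GRID - mean) / sd) ** 2)
    return p / p.sum()

def likelihood(met_goal, lam_used, scale=0.05):
    """Hypothetical P(evidence | true lambda) for each grid value.

    A goal set with factor lam_used is assumed to be met with high probability
    when lam_used exceeds the user's true lambda (logistic transition of width scale).
    """
    p_met = 1.0 / (1.0 + np.exp(-(lam_used - GRID) / scale))
    return p_met if met_goal else 1.0 - p_met

def bayes_update(posterior, met_goal, lam_used):
    """One iteration of Bayes' theorem: posterior is proportional to likelihood x prior."""
    unnormalized = likelihood(met_goal, lam_used) * posterior
    return unnormalized / unnormalized.sum()

rng = np.random.default_rng(1)
post = prior()
lam = 0.9                                    # initial goal factor (assumed)
for met in [True, True, False, True]:        # hypothetical daily outcomes
    post = bayes_update(post, met, lam)
    lam = float(rng.choice(GRID, p=post))    # draw the next day's lambda
```

Drawing $\lambda$ from the posterior, rather than always using its mean, keeps some exploration in the goal schedule: a met goal shifts mass toward lower $\lambda$ (more ambitious goals), and a missed goal shifts it back toward higher $\lambda$.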
## Data Availability

All data produced are available online at [https://repositori.uji.es/xmlui/handle/10234/180682](https://repositori.uji.es/xmlui/handle/10234/180682).

## Footnotes

* achari{at}imsa.edu
* Received October 25, 2023.
* Revision received October 25, 2023.
* Accepted October 27, 2023.
* © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1. N. Lindson-Hawley, M. Banting, R. West, S. Michie, B. Shinkins, and P. Aveyard, “Gradual versus abrupt smoking cessation: A randomized, controlled noninferiority trial,” Ann. Intern. Med., vol. 164, no. 9, p. 585, 2016. doi: 10.7326/M14-2805.
2. W. Riley, A. Jerome, A. Behar, and S. Zack, “Feasibility of computerized scheduled gradual reduction for adolescent smoking cessation,” Subst. Use Misuse, vol. 37, no. 2, pp. 255–263, 2002.
3. A. Bohr and K. Memarzadeh, “The rise of artificial intelligence in healthcare applications,” in Artificial Intelligence in Healthcare, Elsevier, 2020, pp. 25–60.
4. J. A. Mollick and H. Kober, “Computational models of drug use and addiction: A review,” J. Abnorm. Psychol., vol. 129, no. 6, pp. 544–555, 2020. doi: 10.1037/abn0000503.
5. K. C. Berridge, “From prediction error to incentive salience: mesolimbic computation of reward motivation,” Eur. J. Neurosci., vol. 35, no. 7, pp. 1124–1143, 2012. doi: 10.1111/j.1460-9568.2012.07990.x.
6. W. Schultz, P. Dayan, and P. R. Montague, “A neural substrate of prediction and reward,” Science, vol. 275, no. 5306, pp. 1593–1599, 1997.
7. K. Friston, “A theory of cortical responses,” Philos. Trans. R. Soc. Lond. B Biol. Sci., vol. 360, no. 1456, pp. 815–836, 2005. doi: 10.1098/rstb.2005.1622.
8. S. Metin and N. S. Sengor, “From occasional choices to inevitable musts: A computational model of nicotine addiction,” Comput. Intell. Neurosci., vol. 2012, pp. 1–14, 2012.
9. R. Williams, S. Hochreiter, and J. Schmidhuber, “Long Short-Term Memory,” Cmu.edu, 1997. [Online]. Available: [https://deeplearning.cs.cmu.edu/S23/document/readings/LSTM.pdf](https://deeplearning.cs.cmu.edu/S23/document/readings/LSTM.pdf). [Accessed: 02-Oct-2023].
10. S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain, “Gaussian processes for time-series modelling,” Philos. Trans. A Math. Phys. Eng. Sci., vol. 371, no. 1984, p. 20110550, 2013.
11. J. Wang, “An intuitive tutorial to Gaussian processes regression,” Arxiv.org. [Online]. Available: [http://arxiv.org/abs/2009.10862](http://arxiv.org/abs/2009.10862). [Accessed: 02-Oct-2023].
12. C. Bui, N. Pham, A. Vo, A. Tran, A. Nguyen, and T. Le, “Time series forecasting for healthcare diagnosis and prognostics with the focus on cardiovascular diseases,” in 6th International Conference on the Development of Biomedical Engineering in Vietnam (BME6), Singapore: Springer Singapore, 2018, pp. 809–818.
13. A. S. Billis and P. D. Bamidis, “Employing time-series forecasting to historical medical data: an application towards early prognosis within elderly health monitoring environments,” in Proceedings of the 3rd International Conference on Artificial Intelligence and Assistive Medicine - Volume 1213, 2014, pp. 31–35.
14. L. Dierker et al., “Tobacco, alcohol, and marijuana use among first-year U.S. college students: A time series analysis,” Subst. Use Misuse, vol. 43, no. 5, pp. 680–699, 2008.
15. S. J. Hoffman, M. J. P. Poirier, S. Rogers Van Katwyk, P. Baral, and L. Sritharan, “Impact of the WHO Framework Convention on Tobacco Control on global cigarette consumption: quasi-experimental evaluations using interrupted time series analysis and in-sample forecast event modelling,” BMJ, vol. 365, p. 2287, 2019.
16. J. M. Lightwood, S. Anderson, and S. A. Glantz, “Smoking and healthcare expenditure reductions associated with the California Tobacco Control Program, 1989 to 2019: A predictive validation,” PLoS One, vol. 18, no. 3, p. e0263579, 2023.
17. J. F. Rosel et al., “Pooled time series modeling reveals smoking habit memory pattern,” Front. Psychiatry, vol. 11, 2020.
18. G. Corani, A. Benavoli, and M. Zaffalon, “Time series forecasting with Gaussian Processes needs priors,” Arxiv.org. [Online]. Available: [http://arxiv.org/abs/2009.08102](http://arxiv.org/abs/2009.08102). [Accessed: 02-Oct-2023].

[1]: /embed/graphic-1.gif
[2]: /embed/graphic-2.gif
[3]: /embed/inline-graphic-1.gif
[4]: /embed/inline-graphic-2.gif
[5]: /embed/inline-graphic-3.gif
[6]: /embed/inline-graphic-4.gif
[7]: /embed/graphic-3.gif
[8]: /embed/graphic-4.gif
[9]: /embed/inline-graphic-5.gif
[10]: /embed/graphic-7.gif
[11]: /embed/graphic-8.gif
[12]: /embed/graphic-9.gif