Abstract
Background Health utility data are rarely routinely collected in mental helath services. Mapping models that predict health utility from other outcome measures are typically derived from cross-sectional data but often used to predict longitudinal change.
Objective We aimed to develop models to map six psychological measures to adolescent Assessment of Quality of Life – Six Dimensions (AQOL-6D) health utility for youth mental health service clients and assess the ability of mapping models to predict longitudinal change.
Methods We recruited 1107 young people attending Australian primary mental health services, collecting data at two time points, three months apart. Five linear and three generalised linear models were explored to identify the best mapping model. Ten-fold cross-validation using R2, root mean square error (RMSE) and mean absolute error (MAE) were used to compare models and assess predictive ability of six candidate measures of psychological distress, depression and anxiety. Linear / generalised linear mixed effect models were used to construct longitudinal predictive models for AQoL-6D change.
Results A depression measure (Patient Health Questionnaire-9) was the strongest independent predictor of health utility. Linear regression models with complementary log-log transformation of utility score were the best performing models. Between-person associations were slightly larger than within-person associations for most of the predictors.
Conclusions Adolescent AQoL-6D utility can be derived from a range of psychological distress, depression and anxiety measures. Mapping models estimated from cross-sectional data can approximate longitudinal change but may slightly bias health utility predictions.
Data Replication code and model catalogues are available at: https://doi.org/10.7910/DVN/DKDIB0.
1 Introduction
Quality adjusted life years (QALYs) are indices of outcome that inform public health policy in many countries [1]. One of the main uses of QALYs is as the measure of benefit in Cost-Utility Analyses (CUAs). Compared to economic evaluations that use other outcome measures, CUAs facilitate comparisons of the value for money claims of interventions for different health conditions. CUAs have this advantage because of the generic nature of health outcomes measured by QALYs and the existence of well understood policymaker willingness to pay thresholds for QALYs.
The “quality” in QALYs is measured via the use of multi-attribute utility instruments (MAUIs), where domains of quality of life measured by a questionnaire are weighted using people’s preferences [2]. This approach produces a single health utility value for each individual at each measured health state, anchored on a scale where 0 represents a state equivalent to death and 1 represents perfect health. Health utilities can be converted to QALYs by weighting the duration (the “years” part of QALYs) each individual spends in each health state.
MAUIs are regularly used in research studies such as clinical trials and epidemiological surveys, but rarely feature in routine data collection by mental health services. In the absence of direct measurement, mapping analysis has been developed to predict health utility from standard health status measurements [3, 4]. MAUIs such as the Assessment of Quality of Life – Eight Dimensions (AQoL-8D [5]) have been shown to be sensitive to psychological measures [6] and can be mapped to [7] using measures of psychological distress (measured using Kessler Psychological Distress Scale – 10 items, K10 [8]) and depression and anxiety symptoms (measured using Depression, Anxiety, and Stress Scale – 21 items, DASS-21 [9]). Measures of anxiety (Generalised Anxiety Disorder Scale (GAD-7; [10]) and depression (Patient Health Questionnaire-9 (PHQ-9; [11]) have recently been used to map to other MAUIs [12]. However, it is unclear which mental health measures are the most predictive of health utility and existing algorithms developed for adult [7] or child [13] general populations are of questionable appropriateness for predicting health utility in clinical youth mental health samples.
Currently available mapping algorithms are largely derived from cross-sectional data and assume that the associations between psychological measurements and health utility are time-invariant. Therefore, the between-person association (variations in mental health symptoms associated with variations in health utility observed cross-sectionally in a population) can be applied to estimate the within-person association (changes in mental health symptoms associated with changes in health utility over time). However, this time invariant assumption may not be true (for example, health utility measures may be less sensitive to change compared with measures of mental health symptoms).
We are developing a computational model of the systems shaping the mental health of young people [14]. To enable that model to be used in cost-utility analyses of primary youth mental health services, we aimed to use data from a sample of help-seeking young people to: (i) identify the best mapping regression models to predict adolescent weighted AQoL-6D utility from six candidate measures of psychological distress, depression and anxiety; and (ii) assess ability of the mapping algorithms to predict longitudinal (three-month) change.
2 Methods
2.1 Ethics approval
The study was approved by the University of Melbourne’s Human Research Ethics Committee and the local Human Ethics and Advisory Group (1645367.1)
2.2 Sample and setting
This study forms part of a youth outcomes measurement research program, and the study sample has previously been described [15]. Briefly, young people aged 12 to 25 years who presented for a first appointment for mental health or substance use related issues were recruited from three metropolitan and two regional Australian youth-focused primary mental health clinics (headspace centres) between September 2016 to April 2018. Sample characteristics are similar to previous descriptions of headspace clients, with slight differences in age (less aged 12-14, more aged 18-20), cultural background (more Culturally and Linguistically Diverse and less Aboriginal and Torres Strait Islander young people), sexuality (fewer heterosexual clients) and housing (more in unstable accommodation) [15].
2.3 Measures
We collected data on health utility, six candidate predictors of health utility including psychological distress, depression and anxiety measures as well as demographic, clinical and functional population information.
2.3.1 Health utility
We assessed health utility using the adolescent version of the Assessment of Quality of Life – Six Dimension scale (AQoL-6D; [16]) MAUI. It was selected due to its validity for use in adolescents, the relevance of its domains for a clinical mental health sample [6] and its acceptable participant time-burden. The AQoL-6D instrument contains 20 items across the six dimensions of independent living, social and family relationship, mental health, coping, pain and sense. Health utility scores were calculated using a published algorithm (available at https://www.aqol.com.au/index. php/aqolinstruments?id=92), using Australian population preference weights. The distribution of AQoL-6D instrument item specific scores within our sample has been described in a previous study [17].
2.3.2 Candidate predictors
Data from six measures of psychological distress (one measure), depression (two measures) and anxiety (three measures) symptoms were used as candidate predictors to construct mapping models. These measures are widely used in clinical mental health services or clinically relevant to the profiles of young people seeking mental health care. The Kessler Psychological Distress Scale (K6; [8]) was used to measure psychological distress over the last 30 days. It includes six items (nervousness, hopelessness, restlessness, sadness, effort, and worthlessness) of the 10 item version of this measure, K10. We used the US scoring system, in which each item uses a scale that spans from 0 (“none of the time”) to 4 (“all of the time”).
The Patient Health Questionnaire-9 (PHQ-9; [11]) and Behavioural Activation for Depression Scale (BADS; [18]) were used to measure degree of depressive symptomatology. PHQ-9 includes nine questions measuring the frequency of depressive thoughts (including self-harm/suicidal thoughts) as well as associated somatic symptoms (e.g., sleep disturbance, fatigue, anhedonia, appetite, psychomotor changes) in the past two weeks. PHQ-9 uses a four-point frequency scale ranging from 0 (“Not at all”) to 3 (“Nearly every day”). For the PHQ-9 a total score is derived (0-27) with higher scores depicting greater symptom severity. BADS measures a range of behaviours (activation, avoidance/rumination, work/school impairment as well as social impairment) reflecting severity of depression. BADS includes 25 questions on behaviours over the past week, scored on a seven-point scale ranging from 0 (“Not at all”) to 6 (“Completely”). A total score is derived for the BADS (0-150) as well as subscale scores, with higher scores indicating greater activation.
The Generalised Anxiety Disorder Scale (GAD-7; [10]), Screen for Child Anxiety Related Disorders (SCARED; [19]) and Overall Anxiety Severity and Impairment Scale (OASIS; [20]) were used to measure anxiety symptoms. GAD-7 measures symptoms such as nervousness, worrying and restlessness, over the past two weeks using seven questions, with a four-point frequency scale ranging from 0 (“Not at all”) to 3 (Nearly every day”). A total score is calculated with scores ranging from 0 to 21 and higher scores indicating more severe symptomatology. SCARED is an anxiety screening tool designed for children and adolescents which can be mapped directly on specific Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) anxiety disorders including generalised anxiety disorder, panic disorder, separation anxiety disorder and social phobia. It includes 41 questions on a three-point scale of 0 (“Not true or hardly ever true”), 1 (“Somewhat True or Sometimes True”) and 2 (“Very true or often true”) to measure symptoms over the last three months. A total score is derived with scores ranging from 0-82, with higher scores indicative of the presence of an anxiety disorder. The OASIS was developed as a brief questionnaire to measure severity of anxiety and impairment in clinical populations. The OASIS includes five questions about frequency and intensity of anxiety as well as related impairments such as avoidance, restricted activities and problems with social functioning over the past week. Total scores range from 0-20 with higher scores depicting more severe symptomatology.
2.3.3 Population characteristics
We collected self-reported demographic measures (age, gender, sex at birth, education and employment status, languages spoken at home and country of birth). We also collected clinician or research interviewer assessed measures of mental health including primary diagnosis, clinical stage [21] and functioning (measured by the Social and Occupational Functioning Assessment Scale (SOFAS) [22]).
2.4 Procedures
Eligible participants were recruited by trained research assistants and responded to the questionnaire via a tablet device. Participants’ clinical characteristics were obtained from clinical records and research interview. At three-months post-baseline, participants were contacted in person or by telephone, to complete a follow-up assessment.
2.5 Statistical analysis
We undertook a three-step process to: i) gain insight into our dataset and measures; ii) specify models and assess model performance using cross-sectional data; and iii) specify longitudinal models and assess whether modelling assumptions were met.
2.5.1 Pre-modelling steps
Basic descriptive statistics characterised the cohort in terms of baseline demographics and clinical variables. Pearson’s Product Moment Correlations (r) were used to determine the relationships between candidate predictors and AQoL-6D utility score.
2.5.2 Cross-sectional model specification and assessment
Good practice guidance on mapping studies [4] does not advocate specific model types as the appropriate choice will vary depending on factors that include the health utility measure being mapped to, the applicable health condition and target population, the clinical variables used as predictors and the intended use of the mapping algorithm. Compared to the EQ-5D utility measures most commonly used in the mapping literature [23], AQOL-6D has better dimensional overlap with mental health measures [6] and is derived from a greater number of questionnaire items (which can generate more continuous utility score distributions). The types of models we explored reflected these considerations.
We used a cross-section of our dataset (baseline measures only) to explore appropriate type(s) of models to use, compare the relative predictive performance of candidate predictors, and identify other potential risk factors associated with quality of life independent of mental health measures.
As adolescent AQoL-6D utility score is normally left skewed and constrained between 0 and 1, ordinary least squares (OLS) models with different types of outcome transformations (such as log and logit) have been previously used in mapping [3]. Similarly, generalised linear models (GLMs) address this issue via modelling the distribution of the outcome variable and applying a link function between the outcome and linear combination of predictors [24]. Beta regressions, which can be considered a special form of GLMs, are another popular strategy for modelling health utility [25]. We chose to explore OLS, GLM and beta models with commonly adopted transformation algorithms. The models we selected for comparison were OLS regression with log, logit, log-log (f (y) = −log(−log(y))) and clog-log (f (y) = log(−log(1 − y))) transformation; GLM using Gaussian distribution with log link; and beta regression with logit and clog-log link. For each candidate model type, we evaluated the modelling performance and predictive ability of a univariate model using the candidate predictor with the highest Pearson correlation coefficient with utility scores. Ten-fold cross-validation was used to compare model fitting using training datasets and predictive ability using testing datasets using three indicators including R2, root mean square error (RMSE) and mean absolute error (MAE) [26, 27]. After identifying the best performing model type, we used 10-fold cross-validation to compare predictive ability of six mental health measures in predicting AQol-6D (one model for each candidate predictor).
To evaluate whether candidate predictors could independently predict utility scores, we added a range of independent variables to each of the six models that included participants’ age, sex at birth, clinical stage, cultural and linguistic diversity, education and employment status, primary diagnosis, region of residence (whether metropolitan – based on location of attending service) and sexual orientation. Functioning (as measured by SOFAS) was also included in each model to evaluate whether it can jointly predict utility with clinical symptom measurements.
2.5.3 Longitudinal modelling and assumption testing
After identifying the best mapping regression model(s) for predicting between person change, we established longitudinal models to predict within person change. This was achieved using generalised linear mixed-effect models (GLMMs) including both the baseline and follow-up data. All records with complete baseline data (with or without follow-up data) were included.
Bayesian linear mixed models were used to avoid common convergence problems in frequentist tools [28]. Linear mixed effect models (LMMs) can be fitted in the same framework with Gaussian distribution and identify link function. Clustering at individual level is controlled via including random intercepts. Model fitting was evaluated using Bayesian R2 [29].
We compared the model coefficients for each predictor’s score at baseline and score change from baseline to assess the potential bias of estimating score changes within individuals using score difference between individuals.
2.6 Replicability and reuse
We undertook all our analyses using R 4.0.5 [30], using a wide range of packages (see Online Supplement, A.5). To aid study replication and appropriate out of sample re-use of our mapping models, we developed novel software that we have previously described [14] and distributed study code, data and documentation in online repositories (see Availability of data and materials).
3 Results
3.1 Cohort characteristics
Participants characteristics are summarised in Table 1. This study included 1068 (out of the 1073) participants with complete AQoL-6D data at baseline. This cohort predominantly comprised individuals with anxiety/depression (76.650%) at early (prior to first episode of a serious mental disorder) clinical stages (91.707%). Participant ages ranged between 12-25 with a mean age of 18.129 (SD = 3.263).
There were 643 participants (60.205%) who completed AQoL-6D questions at the follow-up survey.
3.2 AQoL-6D and candidate predictors
Distributions of AQoL-6D total utility score and sub-domain scores are displayed in Figure 1. The mean utility score at baseline is 0.589 (SD = 0.235) and is 0.678 (SD = 0.238) at follow-up. Distributions of candidate predictors, K6, BADS, PHQ-9, GAD-7, OASIS and SCARED, are summarised in Table 2. PHQ-9 was found to have the highest correlation with utility score both at baseline and follow-up followed by OASIS and BADS; baseline and follow-up SCARED was found to have the lowest correlation coefficients with utility score, although all correlation coefficients can be characterised as strong.
3.3 Cross-sectional models
The 10-fold cross-validated model fitting indices from mapping models using PHQ-9 are reported in Online Supplement Table A.1. There were convergence issues with beta regression models and, as they do not show major advantages in predictive performance, these models were not considered further.
Model diagnoses (such as heteroscedasticity, residual normality) suggested better model fit of the clog-log transformed OLS model, as the distribution of clog-log transformed utility are closest to normal distribution among all transformation methods. Another benefit of the clog-log transformed model is that the predicted utility score will be constrained with an upper bound of 1, thus preventing out of range prediction. Therefore, OLS with clog-log transformation was chosen as the best model for further evaluation. The GLM with Gaussian distribution and log link is commonly used in mapping studies [23], so was included to provide comparisons with other published work.
PHQ9 had the highest predictive ability followed by OASIS, BADS, GAD7 and K6 (Online Supplement Table A.2). SCARED had the least predictive capability. The confounding effect of other participant characteristics were also evaluated, with SOFAS found to independently predict utility scores in models for all six candidate predictors (p<0.005). Sex at birth was found to be a confounder for the K6 model (p<0.01). A few other confounders, including primary diagnosis, clinical staging and age were identified as weakly associated with utility in mapping models using anxiety and depression measurements other than PHQ-9. Many of these factors are unlikely to change over three months, so were not evaluated in the mixed effect models.
3.4 Longitudinal models
Regression coefficients of the baseline score and score changes (from baseline to follow-up) estimated in individual GLMM and LMM models are summarised in Table Table 3). In GLMM and LMM models, the prediction models using OASIS and PHQ-9 respectively had the highest R2 (0.681 and 0.762). R2 was between 0.581 and 0.681 for all GLMMs and between 0.712 and 0.762 for all LMMs. Variance of the random intercept was comparable with the residual variance.
Distribution of observed and predicted utility scores and their association from GLMM (Gaussian distribution with log link) and LMM (c-loglog transformed) using PHQ-9 are plotted in Figure 2. Compared with GLMM, the predicted utility scores from LMM converge better to the observed distribution and provide better estimations at the tail of the distribution. When the observed utility scores were low, the predicted utility were too high in the GLMM model, see Figure 2 (B). The observed and predicted distributions of utility scores for other anxiety and depression measurements were similar for LMM. However, GLM had low coverage in utility scores below 0.3 and also made predictions out of range (over 1).
Model coefficients of score change from baseline were generally estimated to be lower compared with coefficients of baseline score (except for SCARED). The mean (across GLMM and LMM models) ratio of the two coefficients (βchange/βbaseline) is 0.823 for K6, between 0.802 and 0.850 for depression measurements and between 0.900 and 1.086 for anxiety measurements.
4 Discussion
Although there is encouraging evidence about the quality, effectiveness and cost-effectiveness of youth mental health service innovations worldwide [31, 32], the public health and economic returns from systemic reforms to support better mental health in young people needs to be better understood [33]. Our study contributes to this goal by developing tools that can facilitate derivation of QALYs from measures commonly collected in youth mental health services.
Our study is the first to evaluate longitudinal mapping ability between a wide range of affective symptom measurements and health utility in a cohort of help seeking young people. We were able to independently predict adolescent AQoL-6D from each of the six candidate measures we assessed, with PHQ-9 having the best predictive performance. Predictive performance was improved when adding a measure of functioning (SOFAS) as an additional predictor or confound to each model; SOFAS also performed well as an independent predictor. As many youth mental health services routinely collect data on at least one of the predictors included in our models, these mapping models may have widespread applicability.
Study results may be useful for helping service system planners to prioritise the measures to include in routine data collection. Although direct measurement of health utility with measures such as the ReQoL [34] may be feasible in some mental health services, relying on clinical measures that can also map to health utility may be an attractive alternative.
Our study extends recent work that showed both PHQ-9 and GAD-7 can be mapped to EQ-5D and ReQoL MAUIs [12] using data from a UK adult clinical mental health sample. Compared to that study, our mapping models have been derived from a clinical youth mental health sample, map to a different MAUI (adolescent AQoL-6D) and allow use of four additional clinical and one additional functional measures as predictors. Notably, although our sample had similar mean baseline PHQ-9 and GAD-7 scores (indicative of moderate severity) to that of the recent study, mean baseline health utility was lower (0.589 compared to 0.653 – 0.778), suggesting that the choice of instrument to map to has the potential to significantly influence the results of economic evaluations. Unlike the prior study, we did not use mixture models, but as health utility in our sample was more normally distributed, this is unlikely to be a limitation. Our modelling diagnostics suggest high performance.
A key feature of QALYs is their longitudinal dimension – health utilities are weighted and aggregated based on the time spent in varying health states. Our results suggest that cross-sectional variations in psychological distress, depression and anxiety measurements can be used to approximate the longitudinal change in health utility in this cohort. However, for psychological distress and depression measures at least, mapping algorithms developed from cross-sectional data may slightly over-estimate these changes, introducing bias into QALY predictions (overestimating QALYs for populations whose health utility improves over time, underestimating QALYs for those with deteriorating mental health).
Key strengths of our study include the novelty of our clinical youth mental health study sample, the use of clinically relevant and frequently collected outcome measures as predictors, the appropriateness and range of statistical methods deployed, the comparison of within-person and between-person differences in health utility weight predictions and highly replicable, publicly disseminated study algorithms. We acknowledge limitations that our data pertained to a single country, and we explored only one MAUI-derived utility weight. However, using utility weight input data derived from the same country as that to which an analysis pertains may be relatively unimportant [35], particularly when the MAUI is well suited to the relevant health condition (as is the case with AQoL and mental health [6]). We did not examine some potential predictors that may be more common in some mental health services (for example we explored K6, as opposed to the expanded, and commonly used measure, the K10). We also did not develop age-group specific models which would be a potentially useful extension of this work.
5 Conclusions
We have found that it is possible to predict both within-person and between-person differences in adolescent AQOL-6D utility weights from measures routinely collected in youth mental health services. Mapping algorithms developed from cross-sectional data can approximate longitudinal changes in health utility, but may slightly over-estimate these changes. The mapping algorithms we have developed can enable greater use of CUAs in economic evaluations using youth mental health data collections.
Availability of data and materials
Utility mapping models, instructions on how to apply them and study replication code are distributed as part of the study data repository (https://dataverse.harvard. edu/privateurl.xhtml?token=a437cc9c-b809-4513-bbef-a2333c1c934a). R packages are available for replicating and transferring study methods (https://ready4-dev.github. io/TTU/) and applying the mapping models to out of sample data (https:// ready4-dev.github.io/youthu/).
Competing Interests
None declared.
Funding
The study was funded by the National Health and Medical Research Council (NHMRC, APP1076940), Orygen, headspace and an Australian Government Research Training Program (RTP) Scholarship.
Ethics
The study was reviewed and granted approval by the University of Melbourne’s Human Research Ethics Committee and the local Human Ethics and Advisory Group (1645367.1)
Data Availability
Detailed results in the form of catalogues of the utility mapping models produced by this study and other supporting information are available in the results repository https://doi.org/10.7910/DVN/DKDIB0. Tools for finding and using the utility mapping models with new prediction datasets are available as part of the youthu R package (https://ready4-dev.github.io/youthu/). The TTU (https://ready4-dev.github.io/TTU/) has tools for both replicating the study and generalising our algorithms to develop utility mapping algorithms with other utility measures and predictors. A program to replicate all steps in the study from data ingest to manuscript creation is available at https://doi.org/10.5281/zenodo.6116077.
https://doi.org/10.7910/DVN/DKDIB0
https://ready4-dev.github.io/TTU
Footnotes
Modified abstract. Removed some content on computational implementation. Correction of incorrect referencing.