Bivariate joint modelling for mixed continuous and binary responses in the presence of non-monotone, non-ignorable missingness: The case of prostate cancer

Madiha Liaqat; Shahid Kamal; Florian Fischer

doi:10.1101/2023.08.22.23294418

Abstract

Joint modelling for mixed longitudinal responses has played a prominent part in disease decision-making. It is based on a joint strategy of estimating joint likelihood with shared random effects. Non-ignorable missingness in outcomes increases complexity in joint model; a shared parameter model is proposed to incorporate non-ignorable missing data for joint modelling of longitudinal responses and missing data mechanism. Parameters are estimated under the Bayesian paradigm and implemented via Markov chain Monte Carlo (MCMC) methods with Gibbs sampler. To demonstrate the effectiveness of the proposed method, the joint model is applied to analyze a prostate cancer dataset. The objective is to assess whether there is an association between two mixed longitudinal biomarkers, which could have important implications for understanding disease progression and guiding treatment decisions. The dataset contains non-monotone missingness pattern. To evaluate the performance and robustness of the proposed joint model, simulation studies are conducted.

1. Introduction

Longitudinal studies have a wide range of applications in medicine, where measurements are repeatedly observed for individuals over time. These measurements are correlated even for different types of responses and collected for the same individuals. For example, in oncology research, more than one biomarker of mixed types is measured repeatedly over time for the same patient. These biomarkers are inherently associated – and this association is accounted for valid conclusions. Simultaneous modelling of mixed biomarkers is an optimal approach to account for correlation among multiple biomarkers [1].

Missing data is an unavoidable issue in longitudinal studies because patients may not come for a pre-specified follow-up visit at a particular time point for many reasons. In this regard, missingness exists in collected data and creates hindrances to extracting inference without making assumptions about the missingness mechanism. According to Rubin [2], missing completely at random (MCAR) and missing at random (MAR) mechanisms being not dependent upon unobserved measurements are ignorable. Not missing at random (NMAR) mechanism may depend upon both observed and unobserved responses and is considered non-ignorable. Non-ignorable missingness mechanism is considered in data analysis with longitudinal measurements to get unbiased estimates.

In literature, selection models, pattern-mixture models (PMMs), and shared parameter models (SPMs) are mostly applied to tackle non-ignorable missing data. These models use different factorizations for missingness and measured processes to jointly model longitudinal responses incorporating missingness. The SPM approach is widely applied to model MAR data in longitudinal responses. Still, there exists a gap to employ SPM for NMAR. Non-ignorable missing data is addressed by employing logistic or probit models, which are specifically designed to handle situations where the missing data mechanism is influenced by unobserved variables or responses, resulting in NMAR mechanisms. By integrating the logistic or probit model into the analysis, researchers can effectively mitigate potential biases caused by non-ignorable missing data that consequently makes precise inferences based on the available data. Logistic and probit models provide valuable insights into the underlying patterns of missingness, contributing to obtaining reliable results in longitudinal studies and other scenarios where missing data is common.

Joint modelling terms are applied to simultaneously analyzed more than one outcome. Many methods have been proposed to jointly analyze single or multivariate longitudinal measurements alongside event time data in recent years [3,4]. Another joint modelling term is used to simultaneously analyze more than one longitudinal outcome possibly of different types using random effects, marginal or conditional models [5,6]. Catalano and Ryan [7] analyzed toxicity data proposing bivariate latent variable models to account for the relationship between fetal weight and malformation in live fetuses. Leon and Carriere [8] studied one longitudinal and one binary response to assess maternal smoking’s effect on the respiratory illness of children. Liu et al. [9] formulated a joint model to handle longitudinal binary and continuous responses by incorporating an ignorable missing data mechanism. Li et al. [10] applied joint modelling for continuous, binary, and ordinal responses under the Bayesian framework, while Kürüm et al. [11] proposed a joint model to analyze binary and continuous responses under frequentist statistics.

Under the umbrella of non-ignorable missing data (NMAR) fall both non-monotone and monotone missingness patterns, which in turn provide incomplete observations. Intermittent missingness comes under non-monotone missingness pattern, where an individual misses any visit during the follow-up time, and returns to show up for subsequent visits. The monotone missingness pattern usually refers to informative drop-out, where an individual leaves the study before completion and never comes back to complete follow-up time. Stubbendick and Ibrahim [12] used a likelihood-based approach to incorporate non-monotone NMAR by proposing a joint likelihood of outcome and missing data indicators. Hogan et al. [13] worked on monotone missingness in longitudinal data. Gaskins et al. [14] worked on non-ignorable drop-out mixed longitudinal responses using PMM under joint modelling methodology.

In this article, a joint model is proposed for longitudinal continuous-binary biomarkers using a conditional modelling approach, and two shared parameter models are proposed for non-ignorable missingness under the conditional model. For estimating the model’s parameters different approaches are employed: likelihood-based approach, linear programming technique, and Bayesian approach. We work under the Bayesian framework to obtain parameter estimates.

The article presents a robust statistical approach to analyze longitudinal mixed-effects biomarker measurements, accounting for non-monotone, non-ignorable missingness in the context of prostate cancer research. By utilizing joint modelling techniques and considering the missingness mechanisms, the researchers aim to obtain more reliable and meaningful results from their data analysis.

2. Motivation: Prostate cancer dataset

This research is motivated by a comprehensive dataset obtained from prostate cancer patients who underwent external beam radiotherapy (EBRT), Androgen deprivation therapy (ADT), and a combination of ADT along with other therapies. This dataset spans from 2012 to 2019 and includes information collected during up to 5 follow-up visits after treatment. The data was sourced from Mayo hospital, a renowned referral hospital in Pakistan. Data was received in 2021 by the authors of this study. The authors had no access to information that could identify individual participants.

The evaluation of patients took place from the pre-treatment phase to the post-treatment follow-up. During the pre-treatment phase, demographic characteristics and medical history of the patients were recorded. We specifically focused on patients’ age, body mass index (BMI), and Gleason score as baseline predictors for our study. Blood tests were also conducted to measure prostate-specific antigen (PSA), alkaline phosphatase (ALT), platelets, and bilirubin. These biomarkers were tracked after treatment at each follow-up visit, with PSA and ALP values considered as indicators of prostate cancer progression after treatment, and endogenous platelets and bilirubin as time-dependent covariates. Our study included a total of 1,504 patients diagnosed with prostate cancer, out of which 1,026 received ADT and combination therapies as part of their treatment.

3. Model specification

3.1 Model for longitudinal measurements

Data analysis is described by the joint model specification; sub-models illustrate such mixed types of complex data modelling. The first sub-model assumes to follow linear mixed-effects model specified for continuous longitudinal biomarker. Binary longitudinal response is assumed to follow mixed-effects logistic regression model.

Let Y_ij be the continuous longitudinal outcome for i^th individual i = 1,2,3,….,n at time t_ij j = 1,2,3,….,m^y, assume linear mixed effects model as where, α₁ is a p^y dimensional vector of fixed effects regression coefficients, vector of covariates. is q^y dimensional vector of random effects, u_1i is a vector of random effects, that is independently and identically distributed as multivariate normal with mean vector 0 and covariance matrix D. is the random error term.

Let, Z_ij be the binary repeated measurements observe for i^th individual, i = 1,2,3,….,n at time s_ij. Where s_ij = t_ij, j = 1,2,3,….,m^z, binary response Z_ij assumes to follow logistic mixed effects model that is given by where, α₂ is a p^z dimensional vector of fixed effects regression coefficients, vector of covariates. is q^y dimensional vector of random effects, u_2i∼MVN(0, D) is a vector of random effects, with the addition of associated parameter ϒ_j to measure the effect of response Y_ij at time t_ij on response Z_ij at the same time s_ij.

3.2 Missing data mechanism and shared parameter model

Longitudinal data are not fully observed which leads to incomplete measurements. The most important consideration is to specify an appropriate missing data mechanism based on assumptions related to unobserved and observed data. Little and Schluchter [15] as well as Fitzmaurice and Laird [16] applied general location model described by Olkin and Tate [17] assuming the ignorability assumption. This assumption leads to the usage of observed responses only without considering a model for missingness mechanism. Usually, the expectation-maximization (EM) algorithm is employed for parameter estimation.

In this study, we examine two distinct missingness mechanisms to account for the presence of two different missingness patterns in longitudinal outcomes, two SPM are specified for missingness mechanisms.

Let R_i = (R_i1, R_i2,R_i3,….,R_in)′ be the vector of response indicators, if Y_ij observed R_ij = 1, and R_ij = 0 for missing Y_ij. In this paper, a SPM based on non-ignorable missigness mechanisms for Y_i and Z_i consider, assuming logistic mixed effects regression models for and responses.

Let be the missingness indicator for continuous response at time t_ij, such that if Y_ij is not fully observed, the model for is given by

Let be the missingness indicator for continuous response at time s_ij, such that if Z_ij is not fully observed, the model for is given by

To formulate joint distribution of responses given the random effects, let Y_ij and Z_ij are partitioned into and , respectively.

Joint model given the random effects is given as

Joint probability distribution function is formulated further as

3.3 Prior distributions and hierarchical model

The unknown parameters of the proposed model are estimated by employing the Bayesian framework using Markov chain Monte Carlo (MCMC) techniques [18]. MCMC has an advantage over conventional methods, with MCMC conditional distribution of each parameter given others is easily specifying. Priors are chosen to carry out Bayesian inference for unknown parameters, let Ø = {α₁,α₂,σ²,D₁,D₂,Φ₁,Φ₂,π^y,π^z,ϒ_l,κ_k} be the unknown parameters’ vector where k = 1,2,3,…., m^y and l = 1,2,3,….,m^z. We specify proper prior for each set of unknown parameters: Gaussian distribution for the fixed effects, and Inverse Gamma distribution for random effects. Different levels of variances for each distribution are tried out to make robust choice of fixed effects estimates. Independent prior distributions for unknown parameters are chosen by assigning hyperparameters that lead to low-informative prior distributions. where IΓ(a,b) and Γ(a,b) denote inverse gamma distribution and gamma distribution with shape parameter a and scale parameter b. IWishart (ψ, v) represents the inverse Wishart distribution with scale and matrix parameters v and ψ, respectively. N(μ, ∑) denotes a normal distribution with mean vector μ and covariance matrix ∑.

The joint posterior density is formulated as where, Θ(Ø) is the joint prior distribution of unknown parameters.

Samples are drawn iteratively from conditional posterior distributions derived from (8) using Gibbs sampler, the full conditional distributions for parameters are given by

Where

4. Simulation studies

Some simulation studies are conducted to assess the performance of the proposed model. The first simulation is designed with assuming no association between two responses, and between measurements and missingness processes. The second simulation assumes different associations between biomarkers, also between biomarkers and missingness processes. Each simulation has 100 random samples and each sample includes 500, 1,000, and 3,000 subjects.

For each individual, the two longitudinal measurements are generated from the following joint model, where , Z_ij = 1 for and Z_ij = 0 for

For missingness processes, following models are assumed to generate data, where, where,

To make simulation feasible, each dataset included small to large individuals with longitudinal measurements. Follow-up visits are scheduled at time = {0, 1, 2, 3, 4, 5}. The true values considered for α₁ = α₂ = (1, ― 1,0,2,1,1,1, ― 1, ― 1)′ and Φ₁ = Φ₂ = (2,1)′. x₁₁,x₁₂,x₁₃,x₁₄ are generated from N(0,1), and x₁₅,x₁₆,x₁₇ are generated from binom(n,1,0.1). u_i = (u_1i,u_2i) is generated from MVN = (0,D), where

For the SPM incorporating non-monotone, non-ignorable missing data mechanism, model (26) and model (27) are formulated following Fitzmaurice et al. [19]. We first considered independence (ϒ_j,κ_j) = (0,0)′. Sample sizes are assessed by considering n = 500, n = 1,000, and n = 3,000 individuals with different association parameters as (ϒ_j,κ_j) = (1.000, ― 1.000)′, (ϒ_j,κ_j) = (0.500, ― 0.500)′, (ϒ_j,κ_j) = (― 0.500, ― 0.500)′.

Sensitivity analysis of posterior distribution is assessed by selecting different prior distributions. In the case of informative priors, the structure of prior distributions take main focus during the sensitivity analysis. Non-informative priors are assessed based on changes in posterior inference. All our parameters are checked for different parameters’ values to assess robustness of posterior means. In addition, sensitivity analysis can also be done for missingness mechanisms, where the assessment for ignorable and non-ignorable missingness can be evaluated. We do not apply sensitivity analysis for different missingness mechanisms, as it is beyond the scope of our study objective.

A computational procedure to estimate parameters in proposed joint model is conducted using Gibbs sampling by WinBUGS software. Two parallel MCMC sampling chains run with different starting values, convergence of chains is examined by trace plots and with diagnostic statistics suggested by Gelman et al. [18]. Posterior estimates are based on 10,000 iterations after discarding 5,000 of the burn-in period.

Data are generated under two scenarios: biomarker variability is not associated with each other (ϒ = 0), and both biomarkers are associated with each other (ϒ = ± 0.5, or ± 1.0). With that, it is also evaluated whether either missingness process is associated with the measurements process (κ = ± 0.5, 0r ± 1.0) or not. Each dataset is then analyzed using our proposed joint model.

Based on the results presented in Table 1, it is to be noted that the performance of the proposed joint model is good. Consistency of parameter estimates is evaluated by increasing sample size which causes reduced bias and standard error for respective parameters. The same interpretation of parameter estimates holds for zero to non-zero association parameters.

View this table:

Table 1:

Results of the simulation study for binary and continuous longitudinal biomarkers data incorporating NMAR based on joint modelling for the simulation study. Estimate (Est.), standard error (SE), Bias for N = 100 simulated data with sample size 500, 1,000, and 3,000 considering association (ϒ_j,κ_j) = (0.5, ― 0.5)′.

5. Prostate cancer data analysis

The primary objective of prostate cancer (PC) data analysis is to detect any potential association between PSA and ALP biomarkers, and to simultaneously analyze both. PC data were collected from n =1504 patients who had at least 2 measurements of PSA, and ALP. Patients’ Age, Platelets, BMI, Bilirubin, Gleason Score, Grade are recorded along with the prescribed Drug. Following joint model is considered, and

In addition, for non-ignorable missingness mechanisms following models are proposed, where where, , and

Two parallel MCMC chains runs with different initial values for 10,000 iterations and discarded first 5,000 iterations as pre-convergence burn-in. Convergence of MCMC chains is checked using Gelman-Rubin diagnostic test. In addition, trace plots can plot for unknown parameters to assess convergence. To get posterior inference, prior distributions for unknown parameters are selected as, u_i = (u_1i,u_2i)∼MVN((0,0),D_u), i = 1,…,n, where D_u∼IWishart , such that the hyper-parameters of and which lead to low-informative priors. It is also to be assumed that , the regression coefficients (α_i1,α_i2,ψ_i1,ψ_i2)′ are fixed effects unknown parameters and the prior distributions for them are N(0,1000), γ_j, j = 1,…,m^y is the associated parameter of the continuous longitudinal log(PSA) on the binary longitudinal ALP at time j and the prior distributions for it is N(0,1000). Prior distribution for associated parameter κ_j, j = 1,…,m^z is N(0,1000), π^logPSA,π^ALP∼N(0,1000).

For model selection, we apply the deviance information criterion (DIC), which identifies the most suitable model by balancing between likelihood and parameters’ numbers. Posterior means and the number of parameters are used in this criteria to find the best-fitted model [20].

Where, and .

D(Ø) and are posterior estimates of D(Ø) and Ø. w = w_i = (w₁,w₂,w₃,….,w_n) is full data with marginal density, and Ø are vectors of model parameters. The lowest DIC leads us to the best model fit, that is our proposed joint model.

The parameter estimates and their 95% posterior intervals are presented in Table 2 and summarized for joint model: PSA level decreases with respect to time after treatment, and it is a good sign to evaluate efficiency of prescribed treatment when this decrease is significant. PSA alone is not an adequate biomarker, ALP measurements are also taken into account. ALP increases with respect to time for individuals, but this increase has not had any significant effect. The results indicate that with increasing age, both PSA and ALP increase. This positive increase is significant for PSA but non-significant for ALP. Platelets are another predictor of increasing PSA significantly, but due to one unit increase in platelets, ALP level decreases non-significantly with 0.026 on average. With increasing BMI, patients’ PSA and ALP both tend to significantly decrease. One unit increase in bilirubin measurement makes a significant increase of 0.201, and 0.617 in PSA and ALP measurements, respectively. The results show that PSA and ALP measurements are lower among those patients with a Gleason score greater than or equal to (4+3) as compared to those whose score is lower than and equal to (3+4), it is due to treatment effect. This study’s results revealed that patients who received ALT, prostatectomy, and their combinations have lower PSA and ALP measurements as compared to those who received EBRT; and this is a good sign to prove drug efficacy. There exists a positive correlation between PSA and ALP measurements that can be revealed with associated parameters ϒ; while a negative association exists between missingness and measurement processes. Results revealed that due to the existence of a positive association both PSA and ALP measurements should be simultaneously taken on PC patients to get insights about PC progression. In addition, missingness must be incorporated into data to avoid loss of information.

View this table:

Table 2: Parameter estimates (Mean), standard deviation (Sd), and 95% CI (credible interval) of PC data, by applying our proposed joint SPM, and separate models.

6. Conclusion

The primary goal of this manuscript was to jointly model different types of longitudinal outcomes incorporating non-ignorable missingness. One can directly apply mixed modelling techniques with considering missingness as an ignorable phenomenon. However, follow-up studies could get over the hurdles of loss of information in case of missingness by proposing SPM for mixed-longitudinal outcomes with missingness. For each missing data process, a logit model is proposed via a latent variable to depict the tendency of change for individuals. SPM proposed to join longitudinal and missingness models using random effects. This paper presents a joint modelling approach to analyze longitudinal biomarkers measurements in the presence of non-ignorable missing data due to both intermittent and monotone missingness. In our PC dataset, the non-monotone missingness pattern exists due to patients who missed visits, and monotone missingness data are from those patients who left the follow-up and never came back to complete the treatment process. We use logistic models to describe the missingness patterns. Our proposed model is adequate to account for the association between measurements and missingness processes, and it can be extended to models with more than two mixed longitudinal outcomes incorporating more than two missing outcomes processes.

Correlated random effects provide inference for non-ignorable missing data. However, the non-ignorable missingness assumption is untestable for the data at hand. In our collected PC dataset, non-ignorable assumption is verified as missingness occurred due to lack of treatment efficacy. Therefore, we incorporated missingness process in data analysis. However, researchers must put extra care to check assumptions about missingness mechanism, if the cause of missing data is not internally related to responses; local sensitivity analysis should be performed.

This study emphasis that PC patients must be monitored for PSA and ALP simultaneously. It is important to take into consideration both PSA and ALP levels as both can influence the health of PC patients. In addition, missing observations must be incorporated in data analysis to get full information about patients’ health and wellbeing. An elevated level of PSA shows the non-effectiveness of particular treatment, and an elevated level of ALP depicts the spread of PC tumor cells.

Data Availability

Data cannot be shared publicly because it is secondary data taken from a hospital. Data is available from the corresponding author upon reasonable request for researchers who meet the criteria for access to confidential data.

Acknowledgements

None.

Footnotes

madiha.phd.stat{at}pu.edu.pk
kamal0shahid{at}gmail.com

References

1.↵
Buhule OD, Wahed AS, Youk AO. Bayesian hierarchical joint modeling of repeatedly measured continuous and ordinal markers of disease severity: Application to Ugandan diabetes data. Stat Med. 2017;36(29):4677–91.
OpenUrl
2.↵
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.
OpenUrl CrossRef Web of Science
3.↵
Gao F, Miller JP, Xiong C, Beiser JA, Gordon M. The Ocular Hypertension Treatment Study (OHTS) Group.: A joint-modeling approach to assess the impact of biomarker variability on the risk of developing clinical outcome. Stat Methods Med. 2011;20(1):83–100.
OpenUrl
4.↵
Alam K, Maity A, Sinha SK, Rizopoulos D, Sattar A. Joint modeling of longitudinal continuous, longitudinal ordinal, and time-to-event outcomes. Lifetime Data Analysis. 2021;27:64–90.
OpenUrl
5.↵
Bahrami Samani E, Ganjali M. Bayesian latent variable model for mixed continuous and ordinal responses with possibility of missing responses. J Appl Stat. 2011;38(6):1103–16.
OpenUrl
6.↵
Dunson DB, Herring AH. Bayesian latent variable models for mixed discrete outcomes. Biostatistics. 2005;6(1):11–25.
OpenUrl CrossRef PubMed Web of Science
7.↵
Catalano P, Ryan LM. Bivariate latent variable models for clustered discrete and continuous outcomes. J A Stat Ass. 1992;87(419):651–8.
OpenUrl
8.↵
Leon AR, Carriere KC. On the one sample location hypothesis for mixed bivariate data. Commun. Statist. Theory and Meth. 2000;29(11):2573–81.
OpenUrl
9.↵
Liu X, Daniels MJ, Marcus B. Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial. J Am Stat Assoc. 2009;104(486):429–38.
OpenUrl PubMed
10.↵
Li Q, Pan J, Belcher J. Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events. Stat Methods Med Res. 2016;25(6):2521–40.
OpenUrl CrossRef PubMed
11.↵
Kürüm E, Hughes J, Li R. A semivarying joint model for longitudinal binary and continuous outcomes. Canadian J Stat. 2016;44(1):44–57.
OpenUrl
12.↵
Stubbendick AL, Ibrahim JG. Maximum likelihood methods for nonignorable missing responses and covariates in random effects models. Biometrics. 2003;59(4):1140–50.
OpenUrl CrossRef PubMed Web of Science
13.↵
Hogan JW, Roy J, Korkontzelou C. Handling drop-out in longitudinal studies. Stat Med. 2004;23(9):1455–97.
OpenUrl CrossRef PubMed Web of Science
14.↵
Gaskins JT, Daniels MJ, Marcus BH. Bayesian methods for nonignorable dropout in joint models in smoking cessation studies. J Am Stat Assoc. 2016;111(516):1454–65.
OpenUrl
15.↵
Little RJ, Schluchter M. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika. 1985;72(3):497–512.
OpenUrl CrossRef Web of Science
16.↵
Fitzmaurice GM, Laird NM. Regression models for mixed discrete and continuous responses with potentially missing values. Biometrics. 1997;53:110–22.
OpenUrl CrossRef PubMed Web of Science
17.↵
Olkin I. Tate RF. Multivariate correlation models with mixed discrete and continuous variables. Annals Math Stat. 1997;32:448–65.
OpenUrl
18.↵
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. CRC Press; 2013.
19.↵
Fitzmaurice GM, Laird NM. Ware JH. Applied Longitudinal Analysis. Wiley, Hoboken; 2012.
20.↵
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639.
OpenUrl CrossRef