Characterization of direct and/or indirect genetic associations for multiple traits in longitudinal studies of disease progression
==================================================================================================================================

* Myriam Brossard
* Andrew D. Paterson
* Osvaldo Espin-Garcia
* Radu V. Craiu
* Shelley B. Bull

## Abstract

When quantitative longitudinal traits are risk factors for disease progression, endogenous, and/or subject to random errors, joint model specification of multiple time-to-event and multiple longitudinal traits can effectively identify direct and/or indirect genetic association of single nucleotide polymorphisms (SNPs) with time-to-event traits. Here, we present a joint model that integrates: *i)* a linear mixed model describing the trajectory of each longitudinal trait as a function of time, SNP effects and subject-specific random effects, and *ii)* a frailty Cox survival model that depends on SNPs, longitudinal trajectory effects, and a subject-specific frailty term accounting for unexplained dependency between time-to-event traits. Inference is based on a two-stage approach with bootstrap joint covariance estimation. We develop a hypothesis testing procedure to identify direct and/or indirect SNP association with each time-to-event trait. Motivated by complex genetic architecture of type 1 diabetes complications (T1DC) observed in the Diabetes Control and Complications Trial (DCCT), we show by realistic simulation study that joint modelling of two time-to-T1DC (retinopathy, nephropathy) and two longitudinal risk factors (HbA1c, systolic blood pressure) reduces bias and improves identification of direct and/or indirect SNP associations, compared to alternative methods ignoring measurement errors in intermediate risk factors. Through analysis of DCCT, we identify two SNPs with indirect associations with multiple time-to-T1DC traits and obtain similar conclusions using alternative formulations of time-dependent HbA1c effects on T1DC. In total, joint analysis of multiple longitudinal and multiple time-to-event traits provides insight into aetiology of complex traits.

Key words
*   joint models
*   longitudinal study
*   direct and/or indirect genetic association
*   pleiotropy
*   complex genetic architecture
*   multiple-trait analysis
*   measurement error
*   patient’s trajectory
*   mixed model
*   frailty model

## Introduction

Despite their known ability to improve inference in clinical and epidemiological studies, particularly in the presence of informative censoring/dropout or when longitudinal traits are measured with error1–3, joint models for longitudinal and time-to-event outcomes have received limited attention in genetic association studies. Genome-wide association studies (GWAS) of intermediate quantitative traits (QTs) typically require follow-up studies to identify whether SNP associations detected with each of the QT(s), analyzed separately, also affect related outcomes through direct and/or indirect effects induced by those traits. Such intermediate traits may include established risk factors for related clinical outcomes and be measured with errors (e.g. biological variation). By accounting for QT measurement error and dependencies among traits, joint models can improve accuracy and efficiency of effect estimation, as well as detection of SNP associations.

The so-called shared-random-effects joint model4,5 consists of a sub-model for a single longitudinal trait and a sub-model for a single right-censored time-to-event trait. The longitudinal sub-model describes the QT as an underlying smoothed trajectory that depends on fixed effects of time and baseline covariates, as well as subject-specific random effects. The joint model association structure is induced via the functional dependence between the hazard of an event at any time *t* and the longitudinal trait trajectory6,7. This relationship can be based on prior biological knowledge of the link between that longitudinal and time-to-event traits. As elucidated by Ibrahim et *al*1, this class of joint models lends itself to interpretations of direct/indirect effects because the relationship between a baseline covariate, such as a SNP genotype, and each of the longitudinal and time-to-event traits, as well as the relationship between the longitudinal and time-to-event traits can be specified via model parameters corresponding to direct, indirect and overall effects. Extensions of joint models for multiple longitudinal and/or multiple time-to-event traits can further improve inference by borrowing information among related traits. Extensions of joint models have been reviewed for multiple longitudinal traits6,7 or multiple time-to-event traits8. A few extensions for both multiple longitudinal and multiple time-to-event traits have been developed (9–11, among others), but these models are often formulated for a specific study question, and thus can lack generalizability. Such extensions raise computational challenges for maximisation of the likelihood under the multivariate random effect’s distribution. Two-stage approaches12–15 for joint model fitting appear more computationally efficient and lends itself to more flexible model formulation but inference can be mis-calibrated because parameter estimates and predictions from Stage 1 are obtained from the longitudinal model without consideration of the time-to-event outcome, and the uncertainty in Stage 1 estimates is ignored during Stage 2 estimation, a problem known as propagation of errors16.

Motivated by the complex genetic architecture of long-term type 1 diabetes complications (T1DC), we propose joint-model extension to evaluate genetic associations with multiple time-to-event traits and multiple longitudinal risk factors. Risk of development of T1DC, including diabetic retinopathy (DR) and diabetic nephropathy (DN), is hypothesized to result from multiple genetic factors with potential direct and/or indirect effects via multiple - shared and/or specific - risk factors17. In addition to genetic factors, hyperglycemia (measured by Hemoglobin A1c, referred as HbA1c) represents the major QT risk factor for T1DC. Other measured longitudinal traits, that can also be influenced by genetic factors, have postulated associations with T1DC, for example, association of systolic blood pressure (SBP) with DN. Because conventional analysis methods in GWAS ignores dependency between related traits, SNPs can appear associated with multiple traits (including SNPs with direct and/or indirect effects induced by intermediate QT(s) measured or unmeasured). Accurately distinguishing between direct and/or indirect associations is crucial to elucidate the biological pathways through which genetic factors operate into the etiology of those complex traits, like in our motivating example of the T1DC genetic architecture. A naïve approach to disentangle direct from indirect associations can be achieved by including the intermediate QT as a time-dependent covariate in a Cox Proportional Hazard (PH) model. However, measurement errors in intermediate QT(s), presence of informative censoring in longitudinal trait, unmeasured shared risk factors between longitudinal/time-to-event traits can challenge accurate classification of direct and/or indirect SNP associations.

The *major* contribution of our work is a general formulation for the joint analysis of multiple time-to-event traits and multiple longitudinal QT risk factors in genetic association studies. We develop inference methods for statistical genetic analysis based on joint model parameters estimation, including a hypothesis testing procedure to classify direct and/or indirect SNP associations with each time-to-event trait. A *second* contribution of the paper is the implementation of a data-informed simulation algorithm to generate multiple causal SNPs with various direct effects on *simulated* time-to-event traits and/or indirect effects via observed (*measured*) longitudinal QTs in the Diabetes Control and Complications Trial (DCCT) study18,19 and unobserved (*simulated*) longitudinal QTs. This algorithm also provides a general approach to estimate achievable power given sample size. Our simulation results show bias reduction and improved classification of direct and/or indirect SNP associations in comparison with alternative approaches ignoring measurement errors in longitudinal QT(s). *Thirdly*, we apply the joint model in DCCT data and clarifies two SNPs as having indirect associations via the HbA1c longitudinal risk factor, with consistent conclusions using alternative time-dependent association structures that account for established cumulative and time-weighted effects of HbA1c on T1DC traits20,21. Example R codes for data simulation and for application of the proposed joint model are available on GitHub.

## Methods

### Joint Modelling Approach

#### Model Formulation

We assume that a set of *M* SNPs are available with *K* (*1 ≤ k ≤ K)* unordered and non-competing time-to-event traits, such as multiple disease complications, and *L* (*1 ≤ l ≤ L)* longitudinal QT traits (*i*.*e*. intermediate risk factors) possibly measured with error (*i*.*e*. biological variation) in *N* unrelated individuals indexed by *i* (*1* ≤ *i* ≤ *N*). For each SNP*m* (*1 ≤ m ≤ M*), with minor allele frequency (MAF) *p**m*, the genotypes are coded as the number of copies of the minor allele, under an additive genetic model. For each individual ![Graphic][1]</img> denotes the vector of quantitative measures collected over scheduled visits at *t**ij* for *j*=1,…, *n**i* for the *l**th* longitudinal trait *(1≤l≤ L)* with *n**i* represents the maximum number of visits recorded. Let (*T**i*(*k*), *δ**i*(*k*)) be the vector of observed right-censored event time *T**i*(*k*) and event indicator *δ**i*(*k*) for the *k**th* time-to-event trait.

We assume ![Graphic][2]</img>, where ![Graphic][3]</img> is the latent (uncensored) time-to-event *k* and *C**i* is the censoring time (e.g. administrative censoring). We define ![Graphic][4]</img>, with *δ**i*(*k*) = 1 if the event occurs during the observation period ![Graphic][5]</img> and *δ**i*(*k*) = 0 otherwise.

To characterize the genetic architecture of a system of *multiple* time-to-events and *multiple* longitudinal risk factors, we formulate a shared-random-effects joint model, as specified in **Figure 1**, combining: ***(i) longitudinal*** and ***(ii) time-to-event sub-models*** connected by ***(iii) specified time-dependent association structures***.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/05/11/2021.05.10.21256880/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/F1)

Figure 1. Proposed joint modelling approach for characterization of complex genetic architecture of multiple disease progression

##### (i) Longitudinal sub-model

The *longitudinal sub-model* is specified by a *multivariate mixed model* for the *L* longitudinal QTs (or risk factors), based on the multivariate extension of the Laird and Ware linear mixed model22. It describes each vector of *observed* longitudinal QT measures *Y**i*(*l*)(*t**ij*) as noisy observations of a *true* and *unobserved smoothed* subject-specific longitudinal trajectory, ![Graphic][6]</img>, with noise terms ![Graphic][7]</img>. Each *smooth* trajectory describes subject-specific evolution and depends on time, SNP effect, individual-level random effects *b**i*(*l*) and can also include effects of other baseline covariates, for example confounding factors or ancestry-related principal components. To simplify the presentation, we assume a linear trajectory, but the longitudinal sub-models can be adapted for nonlinear trajectories using, for example, higher order polynomials or splines23.

###### Multivariate mixed model

![Formula][8]</img>  ![Formula][9]</img>  

Where:

*   *β*0(*l*), *β*1(*l*), *β**g*(*l*) and *β**h*(*l*) denote, respectively, the fixed intercept, and baseline effects of time, SNP and other covariate(s) on each longitudinal trait *l*.

*   *b**i* = ((*b**i*0(*l*), *b**i*1(*l*))′, …, (*b**i*0(*L*), *b**i*1(*L*))′) ′ is the 2x*L* vector of random effects with *b**i*∼*N*2*L*(0, *D*).

*   ![Graphic][10]</img> denotes the *overall* variance-covariance matrix for random effects, accounting for serial dependencies *within* longitudinal traits (*D**l,l*, unstructured) and cross-dependencies *between* longitudinal traits (*D**l,m* for two traits *l*≠*m*).

*   ![Graphic][11]</img> is the vector of error terms where ![Graphic][12]</img> is assumed independent and identically distributed.

*   ![Graphic][13]</img> is the residual variance of the QT *l*. We further assume that *ε**ij*(*l*) and *b**i*(*l*) are independent22.

##### (ii)Time-to-event sub-model

The *time-to-event sub-model* is specified by a Proportional Hazards (PH) mixed effects model (also known as frailty PH model) where the hazard function of each time-to-event trait *k* depends on the SNP effect adjusted for a function of the subject’s longitudinal trajectories *W**i*(*k*)(*t*). We introduce a subject-specific random effect (frailty term, *u**i*) to capture unexplained dependencies (e.g. due to unmeasured shared factors) among the *K* time-to-event traits. Given the common frailty term *u**i*, the *K* time-to-event traits are assumed independent24.

###### Proportional Hazards mixed effects sub-model

![Formula][14]</img>  

Where:

*   *λ*0(*k*)(*t*) is a (parametric or non-parametric) baseline hazard function for the time-to-event trait *k*.

*   *γ**g*(*k*) denotes the SNP effect on the time-to-event *k* adjusted for the indirect SNP effects via longitudinal QT(s), taken into account by *W**i*(*k*)(*t*).

*   *γ**v*(*k*) is the effect of the baseline covariate vector(s). Those covariates can be shared or trait-specific covariates. They can also be time-dependent covariates for all or a subset of the *K* time-to-event traits (in this case, *V**i*(*k*) = *V**i*(*k*)(*t*)).

*   *u**i* and *b**i*, random effects from the time-to-event and longitudinal sub-models respectively, are assumed independent. We assume *u**i*∼Γ(*a, b*), with *a, b*>0.

##### (iii) Time-dependent association structures

The *time-dependent association structures, W**i*(*k*)(*t*), that connect the *longitudinal* and *time-to-event sub-models* (**Figure 1**), account for the specific temporal relationships between *each* set of *L**k* (1 ≤ *L**k* ≤ *L*) associated QT(s) and *each* time-to-event trait *k*. ![Formula][15]</img>  

Where:

*   1 ≤ *L**k* ≤ *L* is the set of longitudinal risk factors of the time-to-event trait *k*.

*   ![Graphic][16]</img> denotes the *functional* form of the exposure effect on time-to-event trait *k* corresponding to the *l*-th longitudinal risk. In the case of a *contemporaneous* parametrization, the hazard of an event *k* at a time *t* depends on the longitudinal trait value at the same time *t* (i.e. ![Graphic][17]</img>. Other parametrizations have been described in the literature6,7,25.

*   *α**l*(*k*) denotes the association parameter of the function of the *smooth* longitudinal trajectory ![Graphic][18]</img> with the time-to-event *k*.

*   ![Graphic][19]</img>, *with* 0 ≤ *s* ≤ *t*, 1 ≤ *l* ≤ *L*} denotes the history of the underlying longitudinal trajectory for trait *l* up to time *t*.

###### Interpretation

When the joint model is *correctly* specified, the parameters accurately represent multiple relationships between each SNP and the correlated traits. Note that *β**g*(*l*) is the SNP effect on the longitudinal risk factor *l, γ**g*(*k*) is the *direct* SNP effect on the time-to-event trait *k* via other mechanisms than the indirect SNP association induced by the observed longitudinal risk factors. In this context, *θ**k*(*l*) = µ*g*(*k,l*) + *γ**g*(*k*) is interpreted as the *overall* SNP effect on the time-to-event trait *k*, where µ*g*(*k,l*) = *α**l*(*k*)*β**g*(*l*) represents the *indirect*1 SNP effect on the time-to-event trait *k* via the longitudinal QT *l*.

Under the proposed joint model formulation, when a longitudinal risk factor *l* is *associated* with a time-to-event trait *k* (*α**l*(*k*) ≠ 0), a SNP that has an effect on the longitudinal QT risk factor *l* (*β**g*(*l*) ≠ 0), but no effect on the time-to-event trait *k* (*γ**g*(*k*) = 0), is interpreted as having an ***indirect* SNP association** and its *overall* effect is equal to its *indirect* effect (*θ**k*(*l*) = µ*g*(*k,l*), with µ*g*(*k,l*) = *α**l*(*k*)*β**g*(*l*)). A SNP with an effect on the time-to-event trait *k* (*γ**g*(*k*) ≠ 0), but no effect on the longitudinal QT *l* (*β**g*(*l*) = 0), is interpreted as having a ***direct* SNP association** and its *overall* effect is equal to its *direct* effect (*θ**k*(*l*) = *γ**g*(*k*)). A SNP with an effect on the longitudinal risk factor *l* (*β**g*(*l*) ≠ 0) and an effect on the time-to-event trait *k* (*γ**g*(*k*) ≠ 0) is interpreted as having both ***direct* and *indirect* SNP associations** via distinct genetic pathways. In this case, its *overall* effect is an aggregation of both *direct* and *indirect* effects (*θ**k*(*l*) = µ*g*(*k,l*) + *γ**g*(*k*), with µ*g*(*k,l*) = *α**l*(*k*)*β**g*(*l*)). It is obvious that when an associated longitudinal risk factor is omitted from a time-to-event model, as occurs in separate analyses of longitudinal and time-to-event traits, the marginal SNP effect on the time-to-event trait will reflect partially the overall SNP effect *θ**k*(*l*). This is particularly the case in GWAS when intermediate risk factors are ignored in the analysis models.

###### Effect estimation and test statistic construction

To mitigate the computational complexity involved in the maximization of the joint likelihood and allow more flexibility in the model formulated, we estimate the parameters using a two-stage approach (see **Appendix**). We work within the framework defined by Tsiatis and Davidian26,27 which guarantees that our estimators are consistent and asymptotically normal. Specifically, in Stage 1 we fit the multivariate mixed model (Equation 1) using the *mvlme*() function from the R package JoineRML28 to estimate the parameters of the longitudinal trajectories of the risk factors (Equation 2), and obtain fitted values of the smoothed trajectories. In Stage 2, we fit a Cox PH frailty time-to-event model (Equation 3) adjusting for functions of the smoothed trajectories as time-dependent covariates using the *coxph*() function from the R survival29,30 package. We assume a Gamma distribution for the frailty term (*u**i*) and a separate baseline hazard function for each time-to-event trait using the *strata* argument in *coxph*(). To account for propagation of errors, due to the uncertainty in Stage 1 estimates ignored during Stage 2 parameter estimation16 and empirically estimate the joint covariance matrix of SNP and trajectories effects we apply a nonparametric bootstrap31. For each bootstrap sample *b* (1 ≤ *b* ≤ *B, B* is the total number of bootstraps), we generate a new dataset by randomly sampling *N* individuals with replacement and refitting the joint model on each new dataset *b*. We compute the empirical joint covariance matrix of the estimated parameters using the *B* bootstrap parameter estimates. Wald statistics for each *β**g*(*l*) are computed as ![Graphic][20]</img> using the empirical bootstrap standard errors ![Graphic][21]</img>, and similarly for each *γ**g*(*k*) as ![Graphic][22]</img>.

###### Procedure to classify direct/indirect SNP associations

In **Table 1**, we present a practical hypothesis testing procedure to classify SNPs as having direct and/or indirect association with a time-to-event trait *k* and each associated longitudinal risk factor *l*. This procedure requires two significance thresholds, ![Graphic][23]</img> and ![Graphic][24]</img> for *β**g*(*l*) and *γ**g* (*k*) respectively, to be specified prior to the analysis and adjusted for the number of SNPs tested. Depending on the research question, we can choose different values for ![Graphic][25]</img> and ![Graphic][26]</img>, or the same one ![Graphic][27]</img>. The latter is applicable for instance, when we want to systematically classify direct/indirect association among a set of *M* SNPs, and the former when assessing which SNPs, among the ones reported to be associated with the longitudinal risk factor, have a direct effect on a time-to-event trait.

View this table:
[Table 1.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/T1)

Table 1. Hypothesis testing procedure to classify a SNP as having a direct and/or an indirect association with a time-to-event trait *k* and an associated longitudinal risk factor *l*.

### Motivation: The DCCT Studies

The DCCT randomized-controlled trial was *pivotal* in demonstrating that intensive insulin therapy prevents and delays progression of long-term T1DC18. Due to significant outcome differences between the intensive and conventional treatment groups at interim analysis, the DCCT trial was stopped early. At the closeout visit, patients were administratively censored and had a mean follow-up time of 6.5 years (range 3 to 9), 99% had completed the study, and more than 95% of all scheduled examinations were completed18. DCCT participants continued to be followed for disease progression after the trial in a long-term epidemiologic cohort study and were genotyped for GWAS in the DCCT Genetics Study. Because the goal of intensive therapy was to reduce HbA1c into the non-diabetic range, which produced treatment differences in HbA1c values, we focus on *N*=667 unrelated individuals of European descent ancestry from the Conventional treatment group. Longitudinal measurements for HbA1c and SBP were collected at up to 39 quarterly visits during DCCT, while DR and DN events were diagnosed at annual and semi-annual visits respectively. HbA1c and SBP were recorded irrespective of the occurrence of any complication event(s). Although the DCCT implemented robust quality assurance procedures to minimize potential sources of error during and after data collection32,33, HbA1c and SBP are subject to measurement error (e.g. individual variability). The first GWAS to analyze the DCCT study phenotypes19 identified two SNPs associated with cumulative HbA1c at genome-wide significance in the Conventional treatment arm, namely rs10810632 (in *BNC2*, 9p22.2) and rs1358030 (near *SORCS1*, 10q25.1), and reported consistent associations with secondary outcomes of time-to-DR and/or time-to-DN. Subsequent association studies also reported suggestive evidence for pleiotropy between DR and DN34. To preserve the inherent within-individual variability, we designed simulation study evaluations of the joint modelling approach that generates event times from HbA1c and SBP measurements in the DCCT Study data.

### Simulation study

#### Design of the DCCT-data-based simulation study

To assess the performances of the methods, we simulate the data under a *genetic causal* scenario that imitates the complex genetic architecture of T1DC (**Figure 2**). The genetic association model involves *N*=667 DCCT subjects with *K*=2 non-independent *simulated* time-to-T1DC traits (DR, DN) that depend on *M*=5 causal SNPs with *direct* effects on each time-to-T1DC and/or *indirect* effects via *L*=3 longitudinal QTs: two as *measured* in DCCT (HbA1c, SBP) and one other *simulated* (*U*) to induce some unexplained shared dependency between the T1DC traits. We also introduce an effect of sex on SBP, and effects of T1D duration (at baseline) on both T1DC traits, as observed in the original DCCT data.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/05/11/2021.05.10.21256880/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/F2)

Figure 2. Realistic DCCT-data-based *causal* genetic scenario
The effect of gender on SBP (noted as *H**i*(2)), and effects of T1D duration at baseline (noted as *V**i*(*k*)) on both time-to-T1DC (*k*=1, 2) traits are not represented in this figure, but are included in the data generating model, see **Supplementary Information SI-1** for details. We simulated *R*=1000 replicates of *N*=667 DCCT individuals with *M*=5 causal variants and *K*=2 time-to-event traits simulated under this causal genetic scenario and *R*=1000 replicates of *M*=5 SNPs (with same MAF as the causal ones) under a global null genetic scenario where none of the SNPs is associated with any traits.

#### Algorithm for realistic data generation under a complex genetic architecture

To generate such a data structure that combines *observed* and *simulated* traits, we formulate a data generating model including: *(i) L=3* linear mixed models linking each of the SNPs with an indirect effect to each longitudinal risk factor, and *(ii) K=2* non-independent parametric time-to-event models depending on SNPs with *direct* effects and on functions of the longitudinal QT trajectories. For each DCCT individual *i* with observed HbA1c (*Y**i*(1)), SBP (*Y**i*(2)), visit times (*t**i*) vectors at *n**i* time points, and covariates (*H**i*(*l*), *V**i*(*k*)), we simulate longitudinal trait vector *U**i*, time-to-event traits and genetic data at *M* causal SNPs as illustrated in **Figure 3** and detailed in **Supplementary Information SI-1**. All SNPs are simulated under Hardy-Weinberg and linkage equilibrium assumptions. Particularly, SNPs with indirect effects are simulated from the *observed* (SNP1, SNP5) or *simulated* (SNP3) longitudinal QTs, while SNPs with direct effects (SNP2, SNP4) on time-to-T1DC traits are simulated independently of the longitudinal QTs and are included in the specified hazard function used to simulate each time-to-event trait *k* (**Figure 3**).

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/05/11/2021.05.10.21256880/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/F3)

Figure 3. Illustration of the procedure developed for DCCT-based simulation study under the scenario from Figure 2
*Ω*(1), *Ω*(2) and *Ω*(*U*) are specified parameter values for the longitudinal trait models, and Γ(1), Γ(2) for the time-to-event trait models, *p* is the specified vector of minor allele frequencies vector of the *M* SNPs. Parameters for the causal genetic scenario in **Figure 2** are summarized in **Table 2** and in **Table S1**. The DCCT-data-based simulation algorithm uses longitudinal trait/baseline covariates from DCCT and simulated genetic data and time-to-event traits. SNPs with indirect effects are simulated from a multinomial distribution with conditional genotype probabilities (*π**i*, *π*1*i*, *π*2*i*) using longitudinal trait values for individual *i*. Here, *X**i* and *Z**i* denote the design matrices of the fixed and random effects of the mixed model assumed for each longitudinal trait and *D* is the specified covariance matrix for the random effects. SNPs with direct effects are simulated from the population probabilities, that depend on the MAF. Each time-to-event trait *k* is simulated by calculating the inverse of the cumulative specified hazard function using the Brent univariate root-finding method34,35. For DR, ![Graphic][28]</img>and for ![Graphic][29]</img>. We further added the effect of the longitudinal trajectory ![Graphic][30]</img> to each *η**i*(*k*) in the hazard function of each trait *k* to induce some unexplained dependencies between the simulated time-to-event traits. Details of the DCCT-based simulation procedure are presented in **Supplementary Information SI-1**.

For each SNP*m*, we specify MAF (*p**m*) and the SNP effects (*β**g*(*l*), *γ**g*(*k*)), as well as the other parameter values according to the DCCT Genetics Study and the T1DC literature (**Table 2, Tables S1**). We assume contemporaneous association structures for HbA1c and SBP effects on simulated time-to-T1DC traits. We simulate each time-to-event trait *k* under a Weibull model, with shape and scale parameters specified to generate ∼54% DR events and ∼25% DN events in each of the *R*=1000 replicated datasets generated under the *causal* scenario from **Figure 2**. Under the *global null* genetic scenario, where none of the SNPs is associated with any traits, we simulate *M* SNPs independently of the traits with the same MAF as for the causal SNPs.

View this table:
[Table 2.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/T2)

Table 2. Simulated scenarios of direct and/or indirect SNP associations, assessed using *R*=1000 replicates of *N*=667 DCCT subjects simulated under the *causal* genetic scenario.
This table presents parameters specified for SNPs and trajectory effects on time-to-event traits for the DCCT-based simulation study and illustrates how each type of SNP would be detected in GWAS discovery analysis based on separate analysis of each trait.

#### Scenarios for DCCT-based complex genetic architecture

Overall, the simulated complex genetic architecture covers multiple types of SNP-trait associations (**Figure 2** and **Table 2**): *direct* association with each T1DC trait (SNP2, SNP4), *indirect* associations with both T1DC traits via *measured* (SNP1) and *unmeasured* (SNP3) longitudinal QTs; and *direct and indirect* associations via a *measured* longitudinal QT (SNP5); all longitudinal risk factors are subject to measurement error. Except for SNP3, all other SNP scenarios represent GWAS discovery SNPs detectable by separate GWAS of a longitudinal risk factor (SNP1, SNP5) or a time-to-event trait (SNP2, SNP4, SNP5).

SNP1, SNP3 and SNP5 have indirect effects on T1DC traits, such that their associations with the T1DC traits are detectable with the marginal Cox PH time-to-event models (**Tables 2** and **S2**). SNP1 corresponds roughly to rs10810632 and rs1358030 reported in the motivating DCCT GWAS of HbA1c19, while SNP5 represents a top hit detected by each GWAS (longitudinal and time-to-event traits), analyzed separately. Because of their association with longitudinal and time-to-event traits, SNP1 and SNP5 associations with time-to-event outcomes are conventionally investigated using the Cox PH model adjusting for the *observed* longitudinal trait as a time-dependent risk factor to determine if SNP association with the time-to-event trait is fully (or partially) explained via the associated longitudinal QT(s).

### Analysis of the simulated data

For the analysis of *each* SNP, done *separately*, we compare four analysis models on the simulated data:

*   JM-cmp: a *completely* specified joint model that includes all other non-SNP variables used for the data simulation.

*   JM-mis: includes the same variables as JM-cmp, except *U*, the unobserved longitudinal QT.

*   JM-sep(*k*): includes the same variables as JM-mis but is fitted separately for the *two* time-to-event traits. JM-sep(*k*) does not account for the shared dependency between the time-to-event traits.

*   CM-obs: a Cox PH frailty model that includes the same variables as JM-mis, but adjusts for the *observed* longitudinal QT values as time-dependent covariates.

Due to the latent nature of *U, JM-cmp* cannot be fitted in practice because *U* is unobserved, but we include it as a benchmark for comparison against the *observable* models fitted without *U*. For all these models, empirical covariance matrices are computed using 500 bootstrap iterations. We assess the performance of the hypothesis testing procedure to classify direct/indirect association for each of the 5 causal SNPs analyzed separately with each joint model (JM-cmp, JM-mis). To assess the impact of longitudinal trait measurement errors on classification of SNPs, we substitute test ![Graphic][31]</img> for *γ**g*(*k*) estimated by the joint model by ![Graphic][32]</img> obtained with CM-obs. In our evaluations, we assume the level of significance *P**∗* such that ![Graphic][33]</img> with *P**∗* varying from 0.05 to 5×10−8. Under the *causal* genetic scenario, we estimate for each SNP the proportion of replicates that detect the correct direct and/or indirect association simulated (correct classification rate) and the proportion of replicates where the causal SNP association is misclassified (misclassification rate). Under the *global null* genetic scenario, we assess the classification test for each SNP and each type of association at *P**∗* = 0.05 and *P**∗* = 0.01 to assess the type I error control of the hypothesis testing procedure for each test of direct and/or indirect association.

### Availability

DCCT data are available to authorized users at [https://repository.niddk.nih.gov/studies/edic/](https://repository.niddk.nih.gov/studies/edic/) and [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v3.p1](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v3.p1). Example R codes for DCCT-data-based simulation and analysis of the simulated data are provided on GitHub ([https://github.com/brossardMyriam/Joint-model-for-multiple-trait-genetics](https://github.com/brossardMyriam/Joint-model-for-multiple-trait-genetics)). File S1 includes the Supplementary Information; File S2 includes the Supplementary Figures and File S3 includes the Supplementary Tables.

## Results

### Simulation results

#### Estimation accuracy and single-parameter tests validity

When the longitudinal QT(s) are observed and measured with random *errors*, the simulation results confirm that the proposed joint model reduces the bias of the parameter estimates for all types of SNP association (**Figure S1** and **Tables S3-S6**), even when the analysis model is not fully specified. This applies when: *(i)* two correlated time-to-event traits are analyzed *jointly* rather than *separately* in the joint model (JM-cmp/JM-mis compared to JM-sep(*k*)); *(ii)* a SNP has a *direct* effect on a time-to-event trait with a low event rate (SNP4/SNP5, JM-cmp/JM-mis compared to CM-obs); *(iii)* a SNP has an *indirect* effect via a longitudinal QT measured with errors (for example SNP5, JM-cmp/JM-mis compared to CM-obs). Particularly, for SNP5, CM-obs overestimates the *direct* SNP5 effect on time-to-DN due to its inability to fully account for the indirect SNP5 effect induced via SBP. Direct SNP2 and SNP4 effects on time-to-T1DC traits show overall the larger relative bias reduction using joint models versus CM-obs (Ranges with JM-mis 0.702-0.983, with JM-cmp 0.934-0.966, see **Tables S4 and S6**).

When a SNP has an indirect effect on both T1DC traits fully explained via an *unmeasured* longitudinal QT (SNP3), all methods that ignore the longitudinal QT (JM-mis, JM-sep(*k*) or CM-obs compared to JM-cmp) produce biased direct SNP3 estimates for both T1DC towards the overall SNP3 effect (*θ**k*(*U*) = *α**U*(*k*)*β**g*(*U*) =0.36, **Figure S1-c** and **S1-d, Table S5**). In this scenario, the use of the frailty term in JM-mis does not reduce this bias (JM-mis *vs* JM-sep).

#### Classification of direct/indirect SNP associations with time-to-event traits

Under the *global null* genetic scenario of no genetic association with any of the traits, single-parameter SNP test *P*-values ![Graphic][34]</img> and ![Graphic][35]</img> from the joint model do not show departure from the expected large sample distributions (*χ*2 with 1 degree of freedom (*df)*, see **Figure S2**) and the type I error of each test under the null is reasonably well controlled (**Table S7**). The hypothesis testing procedure applied to single-parameter SNP tests, classifies each SNP as direct or indirect association with each T1DC trait with rates close to the nominal level (**Table 3** and **Figure S3**). However, the classification rates for test of *direct and indirect* SNP association test appears less than nominal (see **Table 3** for SNP5 with SBP/DN traits and **Figure S3** for the other SNPs with other QT/time-to-event trait pairs).

View this table:
[Table 3.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/T3)

Table 3. Classification rates of the procedure for each SNP under the *global null* genetic scenario, assessed using *R*=1000 replicates of *N*=667 DCCT subjects.

Under the *causal* scenario, the hypothesis testing procedure has high correct classification rates for *indirect* (SNP1) or *direct* associations (SNP2, SNP4), as shown in **Figures 4-a, 4-b** and **S4**. Conclusions are similar for JM-mis, with attenuated correct classification rates compared to JM-cmp for direct SNP associations but still higher than classification rates based on CM-obs (particularly for SNP4). Compared to results based on CM-obs, the largest relative classification improvement is found for direct SNP4 association with DN (ranges: 1.18-251.9 with JM-cmp and 1.18-110.98 with JM-mis, **Figure 4**).

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/05/11/2021.05.10.21256880/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/F4)

Figure 4. 
**Performance of the testing procedure to recover underlying generating *causal* relationships from Figure 2, assessed using *R*=1000 replicates of *N*=667 DCCT subjects**.

The *Y* axis represents the proportion of replicates that accurately classify each causal SNP as: **(a.)** indirect; **(b.)** direct and **(c.)** direct and indirect causal associations for SNP1, SNP2, SNP4 and SNP5. SNP3 is tested as a direct association because the longitudinal risk factor *U* is assumed unmeasured. Results obtained for direct association of SNP3 and SNP4 with DN/SBP are similar to those presented for DN/HbA1c (**Figure S4**) and are not shown here. Plot **(d.)** shows that SNP5 is misclassified as an indirect association in a larger number of replicates compared to **(c.)** by the joint models compared to CM-obs. This misclassification rate tends to increase with decreasing *P**∗*. Classification rates of all these SNPs with other time-to-event/longitudinal trait pairs are presented in **Figure S4**.

On the other hand, SNP3 with an *indirect* effect via the unobserved longitudinal trait *U*, is most frequently detected incorrectly as a *direct* association with each T1DC trait by both JM-mis and CM-obs (**Figure 4-b**). SNP5, which has both *direct* and *indirect* effects on DN induced via SBP, is most frequently misclassified by JM-cmp and JM-mis as an *indirect* association at significance levels varying from *P**∗* = 0.01 to 5×10−8 (**Figure 4-c** and **4-d**). SNP5 misclassification rate even increases with *P**∗* decreasing to 5×10−8, due to the direct SNP5 ![Graphic][36]</img>. In comparison, the procedure based on CM-obs results most frequently correctly classify SNP5 association in more than 75% of the replicates at the same significance levels (**Figures 4-c** and **4-d**); this is explained by the overestimated direct SNP5 effect, arising from measurement error in SBP as noted before.

In summary, our simulations show that by accounting for measurement errors in the longitudinal QT risk factors and for dependencies between/within the traits, the proposed joint model improves estimation accuracy and correct classification of associations of SNPs *directly* associated with each time-to-event trait or *indirectly* associated via a measured longitudinal QT in comparison to classification using the CM-obs approach. However, when a SNP has both *direct* and *indirect* effects on a time-to-event trait, the proposed testing procedure can be conservative since it requires the joint significance of the two SNP effects, *β**g*(*l*) and *γ**g*(*k*), where the power of each test depends on effect size and the trait distribution. As a result, a SNP with a direct and an indirect association can be misclassified as either a direct or an indirect association. Finally, when a SNP has an indirect effect on both T1DC traits via an unmeasured QT, the testing procedure based on JM-mis, that captures some of the unexplained dependency between time-to-event traits through the frailty term, does not prevent misclassification as a direct association. This observation also shows the importance of the correct specification of the joint model - including all the intermediate QT(s) - to avoid misclassification of direct and/or indirect SNP associations.

### Application in the DCCT Genetics Study data

We demonstrate the feasibility of the proposed approach by an application in the DCCT Genetics Study data. We use time to mild DR and time to persistent microalbuminuria, for DR and DN outcomes respectively, as previously defined in the motivating GWAS of HbA1c19 (see **Supplementary information SI-2** for details). Genome-wide genotyping in DCCT subjects was performed using HumanCoreExome Bead Array (Illumina, San Diego, CA, USA) and standard quality controls procedures were applied to individuals and genetic markers19,35. Ungenotyped autosomal SNPs were imputed using 1000 Genomes data phase 336 (v5) and minimac337 (v.1.0.13), as previously described35.

Out of the 667 DCCT individuals, we analyze *N*=516 subjects with genetic data, without mild to moderate non-proliferative retinopathy or without DN event at DCCT baseline. By the time of the DCCT close-out visit, 297 (57.6%) experienced a DR event, 61 (11.8%) a DN event, including 47 subjects (9.1%) that experienced both events. After SNP filtering and pruning on linkage disequilibrium (see **Supplementary Information SI-2** for details), we analyze 307 candidate SNPs reported as associated with HbA1c, SBP, and multiple definitions of DR and/or DN19,34,38–42 (see **Table S8** for the full list of SNPs).

Compared to the DCCT-based simulated datasets, the proportion of DN events observed in the application is lower. We expect that the test power for direct SNP association with DN under the joint model to be reduced; this implies reduce chances to correctly classify a SNP with either a *direct* or a *direct and indirect* association with DN. We also observe larger effects of HbA1c on T1DC traits (*α*1(*k*), *k*=1, 2) and a smaller effect of SBP on DN (*α*2(2)), as shown in **Tables 2 vs S9**. Larger *α**l*(*k*) values increase the contribution of the indirect effect (µ*g*(*l,k*)) to the overall SNP effect (*θ**k*(*l*)), which can result in a direct SNP effect (*γ**g*(*k*)) more severely biased if the longitudinal QT is not accounted properly in the joint model, while lower *α**l*(*k*) values reduce µ*g*(*l,k*) and thus *γ**g*(*k*) is less severely biased. Overall, in the DCCT application, we expect a SNP with an *indirect* effect via HbA1c on any T1DC trait to be more subject to be misclassified as having both a *direct* and *indirect* association, compared to the simulation results, while a SNP with an *indirect* effect via SBP on DN is less subject to this misclassification.

Given prior evidence for cumulative effects of HbA1c on T1DC traits20,21, we compare joint model results obtained with contemporaneous, updated cumulative mean, and time-weighted cumulative HbA1c effects on T1DC (**Supplementary Information SI-2** for details). We present results under the latter association structure since it has a stronger prior association with T1DC and in the DCCT individuals analyzed here (**Table S9**). We obtain similar conclusions with alternative association structures (**Figure S5-a**). As shown in **Figure 5-a**, rs10810632 and rs1358030 are classified as indirect associations with both T1DC traits via their association with HbA1c shared risk factor (![Graphic][37]</img> and ![Graphic][38]</img>, *P**∗**=*1.7×10−4 after Bonferroni correction for the 289.02 effective SNPs tested43). These two SNPs have calculated indirect effects (µ*g*(1,*k*)) and 95% bootstrap confidence intervals that do not include 0 (**Figure 5-b**). As noted before, rs10810632 and rs1358030 were discovered previously in a GWAS of HbA1c in DCCT19. Classification of these two SNPs as direct and/or indirect associations may be affected by the Winner’s curse bias, due to the modest sample size in this analysis, and the same individuals contributing to discovery and classification44,45. For the other candidate SNPs which were discovered in larger and/or independent studies19,34,38–42, we find little evidence of association with the discovery traits in marginal analyses (**Table S8**).

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/05/11/2021.05.10.21256880/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2021/05/11/2021.05.10.21256880/F5)

Figure 5. Classification of direct and/or indirect SNP associations in the DCCT Genetics Study data.
**(a.)** Scatter plots of the P-values (-log10) for tests of *β**g*(*l*) (*X* axis) and *γ**g*(*k*) (*Y* axis) for DR/HbA1c, DN/HbA1c and DN/ SBP trait pairs. Significance levels *P**∗* =1.7×10−4 and *P**∗* = 0.05 are indicated by red and grey horizontal and vertical dashed lines. **(b.)** Association results for the two SNPs that exibit indirect association with DR/DN outcomes at *P**∗* =1.7×10−4. Left panel present results from separate analysis of each trait; and right panel presents results from the joint model with bootstrap 95% confidence intervals for the direct and indirect SNP effects. Results are presented using the time-weighted cumulative HbA1c effects on T1DC traits.

Conclusions regarding classification of direct and/or indirect associations based on the procedure using ![Graphic][39]</img> from CM-obs are similar to those based on the joint model results for all SNPs (**Figure S5-b**), although we noticed some differences in ![Graphic][40]</img>; that could be explained by the attenuated association of HbA1c and SBP with T1DC traits due to measurement errors ignored from CM-obs (**Tables S10 *vs* S9**).

## Discussion

We present new methods for statistical genetic analysis under a joint model specification of multiple time-to-event traits and multiple longitudinal risk factors designed to characterize the complex genetic architecture of related traits in longitudinal studies of disease progression. The proposed model accounts for dependencies within and between traits and can account for trait-specific and shared covariates, measurement errors in the longitudinal traits, as well as effects of unobserved baseline confounding factors between the time-to-event traits through the subject-specific frailty term.

Evaluation by *realistic* data-informed simulation study of complex T1DC genetic architecture shows that the proposed joint model improves estimation accuracy and efficiency compared to the Cox PH frailty model adjusted for observed longitudinal trait values as time-dependent covariates. This improvement holds particularly when the longitudinal risk factors are measured with errors or the time-to-event outcome has a low event rate. Indeed, measurement error attenuates the association of the longitudinal risk factor with the time-to-event trait, and as a result, if a SNP has an indirect effect induced by the QT, the estimated direct SNP effect on a time-to-event is biased away from zero. The magnitude of this bias tends to be worse with larger measurement error14,15. Identification of direct and/or indirect SNP associations is also facilitated under joint modelling in the presence of risk factor measurement error. When, however, a SNP has an indirect effect via an unmeasured longitudinal risk factor, both the proposed and Cox PH model approaches produce biased estimates of the direct SNP effects. The extent of this bias depends on the magnitude of the effect sizes *β**g*(*l*) and *α**l*(*k*). Although the frailty term in the joint model time-to-event sub-model captures some of the unexplained shared dependency between the time-to-event traits, we did not observe bias reduction of the direct SNP estimate in the case of an unmeasured risk factor, which can lead to misclassification of the SNP as a direct association. This may be a limitation of using a time-invariant frailty term that does not fully account for the indirect SNP effect via the unmeasured longitudinal risk factor. Because joint significance of both SNP effects is required to classify a SNP as having both direct and indirect associations, a SNP with both direct and an indirect association can be misclassified as having either a direct or an indirect association if one of the SNP effects is too small to be detected at the specified level. The proposed procedure can be adapted to other research questions, for example to identify SNPs that have direct association among the SNPs found associated with a longitudinal risk factor. In this case, it may be desirable to apply a less stringent significance threshold for the test of the direct SNP effect while controlling the overall type I error. Alternatively, intersection-union tests, as based on the likelihood ratio test46, could be used to assess joint significance of *β**g*(*l*) and *γ**g*(*k*) but would require computation of the full joint likelihood.

Application of the proposed joint model approach in longitudinal studies of disease progression, such as in the DCCT Genetics Study, improves classification of direct and/or indirect SNP association which can help to elucidate the genetic architecture of complex traits. In the context of mediation analysis, Liu et al47 discussed various formulations and interpretations of joint models, with shared-random-effects accounting for potential unmeasured baseline confounding factors between *one* longitudinal and *one* time-to-event traits. Using applications in datasets from two clinical trials, they illustrate interpretation of sensitivity analysis to unmeasured baseline confounders. Adaptation of the joint model we propose for multiple longitudinal and multiple time-to-event traits for mediation analysis requires extension of the mediation assumptions48,49 to the case of multiple mediators and multiple time-to-event traits. Specific evaluations of the proposed model under these assumptions are also warranted.

Although our primary aim in this report is to develop statistical methods to accurately distinguish among direct and/or indirect SNP associations with each time-to-event trait, the multi-trait aspect of the joint model lends itself to development of multi-trait SNP association testing for SNP discovery. In **Supplementary Information SI-3**, we present a joint-parameter test based on a generalized Wald statistic. In application to the simulated DCCT-based complex genetic architecture, we observe good type I error control under the global genetic null scenario, improved power for SNP discovery when a SNP has multiple trait effects, and power maintenance in other SNP association scenarios.

We acknowledge several features of the proposed joint model approach that warrant examination in further work. *Firstly*, to reduce computational complexity and improve model flexibility, we used two-stage parameter estimation. In some circumstances, this approach can produce biased estimates and/or underestimated standard errors16. Biased estimates can result from non-random censoring of the longitudinal trait values due to the occurrence of an event or from informative dropout50,51. Our simulation results show minimal biases in the absence of informative censoring, even when the joint model is mis-specified. In the DCCT application, characterized by administrative censoring and a high completion rate, these biases are of minimal concern because longitudinal trait values continued to be recorded regardless of the occurrence of any T1DC events; we estimated the trajectories using all the available measurements. Furthermore, we obtain robust bootstrap estimates of the covariance matrix, and simulation results under the null did not show deviation from expected distributions. Use of the bootstrap also facilitates joint testing of parameters from both stages. In the presence of informative censoring, we recommend sensitivity analysis using existing implementations joint likelihood estimation. To our knowledge those implementations only exist for simpler joint model formulation with either one longitudinal and one time-to-event trait52 or multiple longitudinal traits and one time-to-event outcome28,53. *Secondly*, because the joint model integrates longitudinal and time-to-event sub-models, model misspecification can occur in multiple ways and lead to invalid inference54. Therefore, we recommend a careful assessment of the model assumptions using joint model diagnostic tools23. *Thirdly*, patient visits were scheduled with high frequency in DCCT, so we ignored the modest degree of interval censoring in the current implementation of the joint model; when there are longer gaps between visits, extended methods are needed to account for interval censoring with additional simulation studies to assess impact on joint model estimates. *Lastly*, joint models are very computational demanding, particularly for genetic association studies that test millions of variants. In DCCT application, it took ∼60 seconds to fit the joint model for each SNP and ∼1090.32 more seconds (∼ 18 minutes) to estimate the covariance matrix with 500 bootstraps run in parallel on 4 nodes (each node with 40 CPU and 202 GB RAM). While analysis at the genome level involving, up to 85 million 1000G-imputed SNPs in DCCT, seems computationally unrealistic, a screening approach without bootstrap to select the SNPs based on their association *P*-values, followed by the bootstrap refinement would reduce the computational burden. Recent computationally efficient algorithms have been developed to improve the efficiency of the linear mixed model55 and Cox PH model56,57 for genetic association studies, but to date, they remain to be implemented for multivariate outcomes.

With the increasing development of national biobanks58–60 consisting of a large collection of phenotypes, environmental factors, biomarkers and other risk factors collected over time and linked with genetic data, we anticipate that joint model methods can be applied to characterize the genetic architecture of complex traits using these data resources61. In addition, the results of such analysis could contribute towards the translation of human genetic findings to personalized medicine by providing more efficient SNP effect estimates for increased precision in polygenic risk score development62, and causal inference using mediation and mendelian randomization studies. Lastly, the joint model framework can also enable dynamic prediction beneficial for dynamic risk assessment7,63 and optimization of intervention strategies64,65.

## Supporting information

File S1 includes the Supplementary Information [[supplements/256880_file02.pdf]](pending:yes)

File S2 includes the Supplementary Figures [[supplements/256880_file03.pdf]](pending:yes)

File S3 includes the Supplementary Tables [[supplements/256880_file04.xlsx]](pending:yes)

## Data Availability

DCCT data are available to authorized users at https://repository.niddk.nih.gov/studies/edic/ and https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v3.p1. Example R codes for DCCT-data-based simulation and analysis of the simulated data are provided on GitHub (https://github.com/brossardMyriam/Joint-model-for-multiple-trait-genetics).

[https://repository.niddk.nih.gov/studies/edic/](https://repository.niddk.nih.gov/studies/edic/) 

[https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study\_id=phs000086.v3.p1](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v3.p1). 

[https://github.com/brossardMyriam/Joint-model-for-multiple-trait-genetics](https://github.com/brossardMyriam/Joint-model-for-multiple-trait-genetics) 

## Contributions

All named authors affirm that authorship is merited based on the International Committee of Medical Journal Editors (ICMJE) authorship criteria. MB, SBB and RVC designed the study. MB conducted the simulation study and data analysis and drafted the manuscript. SBB, RVC and OEG contributed to the data analysis and interpretation of the results and reviewed/edited the manuscript. ADP contributed to the acquisition of the data and reviewed/edited the manuscript. All authors approve the final version to be published. SBB and MB are the guarantors of this work.

## Acknowledgements

This study uses the data provided by the Diabetes Control and Complications Trial / Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Research Group which is sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases contract (#N01-DK-6-2204), National Institute of Diabetes and Digestive and Kidney Diseases grants (#R01-DK-077510, #R01-DK077489, and #P60-DK20595). Funding for genotyping by Illumina HumanCoreExome was provided by JDRF (#17-2013-9). The authors are grateful to the subjects in the DCCT/EDIC cohort for their long-term participation. A complete list of the individuals and institutions participating in the DCCT/EDIC Research Group can be found in **Supplementary Information**. This project was supported by: CIHR Operating/Project Grants (#MOP-84287, #PJT159509), CANSSI postdoctoral fellowship (MB), CIHR STAGE fellowships (MB and OEG, #GET-101831). Computations were performed on the Niagara supercomputer at the SciNet HPC Consortium and Galen, the HPC facility at the Lunenfeld-Tanenbaum Research Institute (LTRI). SciNet is funded by: the Canada Foundation for Innovation under the auspices of Compute Canada; the Government of Ontario; Ontario Research Fund - Research Excellence; and the University of Toronto. The LTRI HPC facility was also supported by the Canada Foundation for Innovation.

## Appendix

Inference under the joint likelihood can be based on joint maximum-likelihood estimation, Bayesian estimation of the joint likelihood parameters, or a two-stage strategy23. Joint maximum-likelihood and Bayesian methods enable joint parameter estimation but can be computationally challenging, especially for multiple traits. Two-stage approaches12–15 sequentially fit the longitudinal (Stage 1) and time-to-event (Stage 2) sub-models, with trajectory estimates from Stage 1 incorporated into Stage 2; this can be relatively computationally efficient and lends itself to flexible model formulation. However, as noted in introduction, inference can be mis-calibrated because of propagation of errors16.

### Joint likelihood function

Let *Ω* be the full parameter vector containing all fixed parameters from the longitudinal and time-to-event sub-models respectively. We assume: ![Formula][41]</img>  ![Formula][42]</img>  ![Formula][43]</img>  ![Formula][44]</img>  

Given (A1)-(A4), it is appropriate to assume that the random effects *b**i* account for: the association between the *L* longitudinal traits; the association between both the longitudinal and time-to-event outcomes66; and the serial correlation between the repeated measurements in each longitudinal process22; and that given (A4), the frailty term accounts for the residual dependency between the time-to-event traits24. Under these conditional independence assumptions, the joint likelihood function of the model parameters given the observed data is: ![Formula][45]</img>  where:

*   (*T**i* *δ**i*) = ((*T**i*(1), *δ**i*(1)) ′,…, (*T**i*(*k*), *δ**i*(*k*)) ′, …, (*T**i*(*K*), *δ**i*(*K*)) ′) ′

*   *Y**i* *=* (*Y**i*(1) ′, …, *Y**i*(*l*) ′, …, *Y**i*(*L*) ′) ′

*   ![Graphic][46]</img>

*   ![Graphic][47]</img>

*   ![Graphic][48]</img>

*   ![Graphic][49]</img>, where *q* is the dimension of the *D* matrix

*   ![Graphic][50]</img> i.e. we assume *u**i*∼Γ(*a, b*), with *a, b*>0.

Calculation of this full likelihood requires multivariate integration with respect to the random effect’s distribution, which can lead to demanding computation. When the random effect vector *b**i*, has a small dimension, say less than 3, the integral can be evaluated via Gaussian quadrature which approximates the integral by a weighted sum of the target function evaluated at pre-specified sample points. However, when the dimension is larger, it is demanding to calculate the integrals with satisfactory approximation accuracy. Although a full likelihood specification enables rigorous study of asymptotic properties, its large sample approximation may not be accurate when sample size is small. In comparison, the Bayesian paradigm does not require asymptotic approximations, but the design of an efficient sampling algorithm to study the posterior distribution is challenging.

### Likelihood functions under the two-stage approximation

Let *Ω* and *Г* be the vectors containing all fixed parameters from the longitudinal and time-to-event sub-models respectively.

#### *Stage1:* Multivariate mixed model

![Formula][51]</img> 

#### *Stage2:* Multivariate Cox PH model adjusted for fitted values of the trajectories ![Graphic][52]</img>

![Formula][53]</img>  Where:

*   ![Graphic][54]</img>

*   ![Graphic][55]</img>

*   With ![Graphic][56]</img>

Unlike the *joint likelihood function*, where the shared random effects *b**i* account for the dependencies between the longitudinal and the time-to-event traits, the two-stage approach accounts for the dependencies between the longitudinal and time-to-event traits via the fitted values of the longitudinal trajectories. As mentioned previously, this approximation can produce biased estimates and/or underestimated standard errors16, particularly in presence of non-random censoring of the longitudinal trait values due to the occurrence of an event or from informative dropout50,51 and/or because of propagation errors of Stage 1 parameters in Stage216. Indeed, informative missingness/dropouts lead in differential follow-up between patients with and without an event, the time-to-event processes are related to the length of the follow-up, and thus the random effects *b**i*(*l*) can depend on the event times (e.g. patients who have an event early are more likely to have positive random slopes). However, in absence of informative dropouts/missingness, as we showed in our simulation studies, this approach has low bias and is computationally feasible for genetic association studies.

## Abbreviations

DCCT
:   Diabetes control and complications trial
DR
:   diabetic retinopathy
DN
:   diabetic nephropathy
GWAS
:   genome-wide association study
HbA1c
:   Hemoglobin A1c
LD
:   linkage disequilibrium
MAF
:   minor allele frequency
PH
:   proportional hazards
QT(s)
:   quantitative trait(s)
SBP
:   systolic blood pressure
SNP
:   single nucleotide polymorphism
T1DC
:   type 1 diabetes complications

*   Received May 10, 2021.
*   Revision received May 10, 2021.
*   Accepted May 11, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1.  1.Ibrahim, J.G., Chu, H., and Chen, L.M. (2010). Basic concepts and methods for joint models of longitudinal and survival data. J. Clin. Oncol. 28, 2796–2801.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamNvIjtzOjU6InJlc2lkIjtzOjEwOiIyOC8xNi8yNzk2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDUvMTEvMjAyMS4wNS4xMC4yMTI1Njg4MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

2.  2.Chen, L.M., Ibrahim, J.G., and Chu, H. (2011). Sample size and power determination in joint modeling of longitudinal and survival data. Stat. Med. 30, 2295–2309.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21590793&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

3.  3.Hogan, J.W., and Laird, N.M. (1998). Increasing efficiency from censored survival data by using random effects to model longitudinal covariates. Stat. Methods Med. Res. 7, 28–48.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/096228029800700104&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9533260&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

4.  4.Wu, L., Liu, W., Yi, G.Y., and Huang, Y. (2012). Analysis of longitudinal and survival data: joint modeling, inference methods, and issues. J. Probab. Stat. 2012, 1–17.
    
    
5.  5.Asar, Ö., Ritchie, J., Kalra, P.A., and Diggle, P.J. (2015). Joint modelling of repeated measurement and time-to-event data: An introductory tutorial. Int. J. Epidemiol. 44, 334–344.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyu262&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25604450&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

6.  6.Hickey, G.L., Philipson, P., Jorgensen, A., and Kolamunnage-Dona, R. (2016). Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Med. Res. Methodol. 16, 117.
    
    
7.  7.Papageorgiou, G., Mauff, K., Tomer, A., and Rizopoulos, D. (2019). An overview of joint modeling of time-to-event and longitudinal outcomes. Annu. Rev. Stat. Its Appl. 6, 223–240.
    
    
8.  8.Hickey, G.L., Philipson, P., Jorgensen, A., and Kolamunnage-Dona, R. (2018). Joint models of longitudinal and time-to-event data with more than one event time outcome: a review. Int. J. Biostat. 14,.
    
    
9.  9.Zhu, H., Ibrahim, J.G., Chi, Y.Y., and Tang, N. (2012). Bayesian influence measures for joint models for longitudinal and survival data. Biometrics 68, 954–964.
    
    
10. 10.Tang, N.S., Tang, A.M., and Pan, D.D. (2014). Semiparametric Bayesian joint models of multivariate longitudinal and survival data. Comput. Stat. Data Anal. 77, 113–129.
    
    
11. 11.Tang, A.M., and Tang, N.S. (2014). Semiparametric Bayesian inference on skew-normal joint modeling of multivariate longitudinal and survival data. Stat. Med. 34, 824–843.
    
    
12. 12.Bycott, P., and Taylor, J. (1998). A comparison of smoothing techniques for CD4 data measured with error in a time-dependent Cox proportional hazards model. Stat. Med. 17, 2061–2077.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/(SICI)1097-0258(19980930)17:18<2061::AID-SIM896>3.0.CO;2-O&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9789914&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000075939000003&link_type=ISI) 

13. 13.Self, S., and Pawitan, Y. (1992). Modeling a marker of disease progression and onset of disease. In AIDS Epidemiology, (Boston, MA: Birkhäuser Boston), pp. 231–255.
    
    
14. 14.Tsiatis, A.A., Degruttola, V., and Wulfsohn, M.S. (1995). Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. J. Am. Stat. Assoc. 90, 27–37.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2291126&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QH03000004&link_type=ISI) 

15. 15.Dafni, U.G., and Tsiatis, A.A. (1998). Evaluating Surrogate Markers of Clinical Outcome When Measured with Error. Biometrics 54, 1445.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2533670&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9883544&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000077898700020&link_type=ISI) 

16. 16.Wulfsohn, M.S., and Tsiatis, A.A. (2006). A joint model for survival and longitudinal data measured with error. Biometrics 53, 330.
    
    
17. 17.Paterson, A.D., and Bull, S.B. (2012). Does familial clustering of risk factors for long-term diabetic complications leave any place for genes that act independently? J. Cardiovasc. Transl. Res. 5, 388–398.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22729868&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

18. 18. The Diabetes Control and  Complications Trial Research Group (1993). The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N. Engl. J. Med. 329, 977–986.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJM199309303291401&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8366922&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1993LY58700001&link_type=ISI) 

19. 19.Paterson, A.D., Waggott, D., Boright, A.P., Hosseini, S.M., Shen, E., Sylvestre, M.P., Wong, I., Bharaj, B., Cleary, P.A., Lachin, J.M., et al. (2010). A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose. Diabetes 59, 539–549.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZGlhYmV0ZXMiO3M6NToicmVzaWQiO3M6ODoiNTkvMi81MzkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNS8xMS8yMDIxLjA1LjEwLjIxMjU2ODgwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

20. 20.Lind, M., Odén, A., Fahlén, M., and Eliasson, B. (1995). The relationship of glycemic exposure (HbA(1c)) to the risk of development and progression of retinopathy in the diabetes control and complications trial. Diabetes 53, 1093–1098.
    
    
21. 21.Lind, M., Odén, A., Fahlén, M., and Eliasson, B. (2010). The shape of the metabolic memory of HbA1c: Re-analysing the DCCT with respect to time-dependent effects. Diabetologia 53, 1093–1098.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00125-010-1706-z&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20237754&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

22. 22.Laird, N.M., and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963–974.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2529876&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7168798&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1982QF89700008&link_type=ISI) 

23. 23.Rizopoulos, D. (2012). Joint models for longitudinal and time-to-event data, with applications in R (Chapman and Hall/CRC).
    
    
24. 24.Hougaard, P. (1995). Frailty models for survival data. Lifetime Data Anal. 1, 255–273.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF00985760&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9385105&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

25. 25.Mauff, K., Steyerberg, E.W., Nijpels, G., van der Heijden, A.A.W.A., and Rizopoulos, D. (2017). Extension of the association structure in joint models to include weighted cumulative effects. Stat. Med. 36, 3746–3759.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.7385&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28714278&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

26. 26.Tsiatis, A.A., and Davidian, M. (2001). A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika 88, 447–458.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/88.2.447&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000169573400012&link_type=ISI) 

27. 27.Tsiatis, A.A., and Davidian, M. (2004). Joint modeling of longitudinal and time-to-event data: An overview. Stat. Sin. 14, 809–834.
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000223652100011&link_type=ISI) 

28. 28.Hickey, G.L., Philipson, P., Jorgensen, A., and Kolamunnage-Dona, R. (2018). JoineRML: A joint model and software package for time-to-event and multivariate longitudinal outcomes. BMC Med. Res. Methodol. 18,.
    
    
29. 29.Therneau, T.M. (2020). A Package for survival analysis in R. R package version 3.2-7, URL [http://CRAN.R-project.org/package=survival](http://CRAN.R-project.org/package=survival).
    
    
30. 30. Terry M. Therneau, and  Patricia M. Grambsch (2000). Modeling survival data: Extending the Cox model (New York: Springer).
    
    
31. 31.Efron, B. (1981). Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika 68, 589.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/68.3.589&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1981MR77800002&link_type=ISI) 

32. 32.Steffes, M., Cleary, P., Goldstein, D., Little, R., Wiedmeyer, H.-M.M., Rohlfing, C., England, J., Bucksa, J., and Nowicki, M. (2005). Hemoglobin A1c measurements over nearly two decades: sustaining comparable values throughout the Diabetes Control and Complications Trial and the Epidemiology of Diabetes Interventions and Complications study. Clin. Chem. 51, 753–758.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiY2xpbmNoZW0iO3M6NToicmVzaWQiO3M6ODoiNTEvNC83NTMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNS8xMS8yMDIxLjA1LjEwLjIxMjU2ODgwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

33. 33.Lorenzi, G.M., Braffett, B.H., Arends, V.L., Danis, R.P., Diminick, L., Klumpp, K.A., Morrison, A.D., Soliman, E.Z., Steffes, M.W., Cleary, P.A., et al. (2015). Quality control measures over 30 years in a multicenter clinical study: Results from the Diabetes Control and Complications Trial / Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) study. PLoS One 10, e0141286.
    
    
34. 34.Hosseini, S.M., Boright, A.P., Sun, L., Canty, A.J., Bull, S.B., Klein, B.E.K., Klein, R., and Paterson, A.D. (2015). The association of previously reported polymorphisms for microvascular complications in a meta-analysis of diabetic retinopathy. Hum. Genet. 134, 247–257.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00439-014-1517-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25487307&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

35. 35.Roshandel, D., Gubitosi-Klug, R., Bull, S.B., Canty, A.J., Pezzolesi, M.G., King, G.L., Keenan, H.A., Snell-Bergeon, J.K., Maahs, D.M., Klein, R., et al. (2018). Meta-genome-wide association studies identify a locus on chromosome 1 and multiple variants in the MHC region for serum C- peptide in type 1 diabetes. Diabetologia 61, 1098–1111.
    
    
36. 36.1000 Genomes Project Consortium, Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., et al. (2015). A global reference for human genetic variation. Nature 526, 68–74.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

37. 37.Das, S., Forer, L., Schönherr, S., Sidore, C., Locke, A.E., Kwong, A., Vrieze, S.I., Chew, E.Y., Levy, S., McGue, M., et al. (2016). Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3656&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27571263&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

38. 38.Wheeler, E., Leong, A., Liu, C.-T.T., Hivert, M.-F.F., Strawbridge, R.J., Podmore, C., Li, M., Yao, J., Sim, X., Hong, J., et al. (2017). Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. PLoS Med. 14, e1002383.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1002383&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28898252&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

39. 39.Evangelou, E., Warren, H.R., Mosen-Ansorena, D., Mifsud, B., Pazoki, R., Gao, H., Ntritsos, G., Dimou, N., Cabrera, C.P., Karaman, I., et al. (2018). Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0205-x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30224653&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

40. 40.Pollack, S., Igo, R.P., Jensen, R.A., Christiansen, M., Li, X., Cheng, C.-Y., Ng, M.C.Y., Smith, A. V, Rossin, E.J., Segrè, A. V, et al. (2019). Multiethnic genome-wide association study of diabetic retinopathy using liability threshold modeling of duration of diabetes and glycemic control. Diabetes 68, 441–456.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZGlhYmV0ZXMiO3M6NToicmVzaWQiO3M6ODoiNjgvMi80NDEiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNS8xMS8yMDIxLjA1LjEwLjIxMjU2ODgwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

41. 41.Grassi, M.A., Tikhomirov, A., Ramalingam, S., Below, J.E., Cox, N.J., and Nicolae, D.L. (2011). Genome-wide meta-analysis for severe diabetic retinopathy. Hum. Mol. Genet. 20, 2472–2481.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddr121&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21441570&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000290849200017&link_type=ISI) 

42. 42.Sandholm, N., Salem, R.M., McKnight, A.J., Brennan, E.P., Forsblom, C., Isakova, T., McKay, G.J., Williams, W.W., Sadlier, D.M., Mäkinen, V.-P.P., et al. (2012). New susceptibility loci associated with kidney disease in type 1 diabetes. PLoS Genet. 8, e1002921.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1002921&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23028342&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

43. 43.Li, J., and Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity (Edinb). 95, 221–227.
    
    
44. 44.Kraft, P. (2008). Curses—Winner’s and Otherwise—in Genetic Epidemiology. Epidemiology 19, 649–651.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/EDE.0b013e318181b865&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18703928&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000258712000002&link_type=ISI) 

45. 45.Sun, L., Dimitromanolakis, A., Faye, L.L., Paterson, A.D., Waggott, D., DCCT/EDIC Research Group, and Bull, S.B. (2011). BR-squared: a practical solution to the winner’s curse in genome-wide scans. Hum. Genet. 129, 545–552.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00439-011-0948-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21246217&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

46. 46.Schaid, D.J., Tong, X., Larrabee, B., Kennedy, R.B., Poland, G.A., and Sinnwell, J.P. (2016). Statistical methods for testing genetic pleiotropy. Genetics 204, 483–497.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZ2VuZXRpY3MiO3M6NToicmVzaWQiO3M6OToiMjA0LzIvNDgzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDUvMTEvMjAyMS4wNS4xMC4yMTI1Njg4MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

47. 47.Liu, L., Zheng, C., and Kang, J. (2018). Exploring causality mechanism in the joint analysis of longitudinal and survival data. Stat. Med. 37, 3733–3744.
    
    
48. 48.Sobel, M.E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociol. Methodol. 13, 290.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/270723&link_type=DOI) 

49. 49.Mackinnon, D.P., Warsi, G., and Dwyer, J.H. (1995). A simulation study of mediated effect measures. Multivariate Behav. Res. 30, 41.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1207/s15327906mbr3001_3&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20157641&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QX94500003&link_type=ISI) 

50. 50.Albert, P.S., Shih, J.H., Albert, P.S., Shih, J.H., Albert1’, P.S., and Shih2, J.H. (2010). On estimating the relationship between longitudinal measurements and time-to-event data using a simple two-stage procedure. Biometrics 66, 983–991.
    
    
51. 51.Ye, W., Lin, X., and Taylor, J.M.G. (2008). Semiparametric modeling of longitudinal measurements and time-to-event data - A two-stage regression calibration approach. Biometrics 64, 1238–1246.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1541-0420.2007.00983.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18261160&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000261054500027&link_type=ISI) 

52. 52.Rizopoulos, D. (2010). JM: An R Package for the joint modelling of longitudinal and time-to-event data. J. Stat. Softw. 35,.
    
    
53. 53.Rizopoulos, D. (2016). The R package JMbayes for fitting joint models for longitudinal and time-to-event data using MCMC. J. Stat. Softw. 72,.
    
    
54. 54.Arisido, M.W., Antolini, L., Bernasconi, D.P., Valsecchi, M.G., and Rebora, P. (2019). Joint model robustness compared with the time-varying covariate Cox model to evaluate the association between a longitudinal marker and a time-to-event endpoint. BMC Med. Res. Methodol. 19, 222.
    
    
55. 55.Sikorska, K., Lesaffre, E., Groenen, P.J.F., Rivadeneira, F., and Eilers, P.H.C. (2018). Genome-wide analysis of large-scale longitudinal outcomes using penalization - GALLOP algorithm. Sci. Rep. 8,.
    
    
56. 56.Rizvi, A.A., Karaesmen, E., Morgan, M., Preus, L., Wang, J., Sovic, M., Hahn, T., and Sucheston-Campbell, L.E. (2019). gwasurvivr: an R package for genome-wide survival analysis. Bioinformatics 35, 1968–1970.
    
    
57. 57.Bi, W., Fritsche, L.G., Mukherjee, B., Kim, S., and Lee, S. (2020). A fast and accurate method for genome-wide time-to-event data analysis and its application to UK Biobank. J. Clean. Prod. 107, 222–233.
    
    
58. 58.Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L.T., Sharp, K., Motyer, A., Vukcevic, D., Delaneau, O., O’Connell, J., et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

59. 59.Scholtens, S., Smidt, N., Swertz, M.A., Bakker, S.J.L., Dotinga, A., Vonk, J.M., Van Dijk, F., Van Zon, S.K.R., Wijmenga, C., Wolffenbuttel, B.H.R., et al. (2015). Cohort Profile: LifeLines, a three-generation cohort study and biobank. Int. J. Epidemiol. 44, 1172–1180.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyu229&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25502107&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F05%2F11%2F2021.05.10.21256880.atom) 

60. 60.Dummer, T.J.B., Awadalla, P., Boileau, C., Craig, C., Fortier, I., Goel, V., Hicks, J.M.T., Jacquemont, S., Knoppers, B.M., Le, N., et al. (2018). The Canadian Partnership for Tomorrow Project: A pan-Canadian platform for research on chronic disease prevention. Cmaj 190, E710–E717.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiY21haiI7czo1OiJyZXNpZCI7czoxMToiMTkwLzIzL0U3MTAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNS8xMS8yMDIxLjA1LjEwLjIxMjU2ODgwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

61. 61.Gasparini, A., Abrams, K.R., Barrett, J.K., Major, R.W., Sweeting, M.J., Brunskill, N.J., and Crowther, M.J. (2020). Mixed-effects models for health care longitudinal data with an informative visiting process: A Monte Carlo simulation study. Stat. Neerl. 74, 5–23.
    
    
62. 62.Young, A.I., Benonisdottir, S., Przeworski, M., and Kong, A. (2019). Deconstructing the sources of genotype-phenotype associations in humans. Science 365, 1396–1400.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNjUvNjQ2MC8xMzk2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDUvMTEvMjAyMS4wNS4xMC4yMTI1Njg4MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

63. 63.Bull, L., Lunt, M., Martin, G., Hyrich, K., and Sergeant, J. (2020). Harnessing repeated measurements of predictor variables for clinical risk prediction: a review of existing methods. Diagnostic Progn. Res. 4, 1–16.
    
    
64. 64.Sweeting, M.J., and Thompson, S.G. (2011). Joint modelling of longitudinal and time-to-event data with application to predicting abdominal aortic aneurysm growth and rupture. Biometrical J. 53, 750–763.
    
    
65. 65.Yuen, H.P., Mackinnon, A., Hartmann, J., Amminger, G.P., Markulev, C., Lavoie, S., Schäfer, M.R., Polari, A., Mossaheb, N., Schlögelhofer, M., et al. (2018). Dynamic prediction of transition to psychosis using joint modelling. Schizophr. Res. 202, 333–340.
    
    
66. 66.Ibrahim, J.G., Chen, M.-H., and Sinha, D. (2001). Joint models for longitudinal and survival data. In EMR-IBS Bi-Annual Meeting, pp. 262–289.

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif
 [4]: /embed/inline-graphic-4.gif
 [5]: /embed/inline-graphic-5.gif
 [6]: /embed/inline-graphic-6.gif
 [7]: /embed/inline-graphic-7.gif
 [8]: /embed/graphic-2.gif
 [9]: /embed/graphic-3.gif
 [10]: /embed/inline-graphic-8.gif
 [11]: /embed/inline-graphic-9.gif
 [12]: /embed/inline-graphic-10.gif
 [13]: /embed/inline-graphic-11.gif
 [14]: /embed/graphic-4.gif
 [15]: /embed/graphic-5.gif
 [16]: /embed/inline-graphic-12.gif
 [17]: /embed/inline-graphic-13.gif
 [18]: /embed/inline-graphic-14.gif
 [19]: /embed/inline-graphic-15.gif
 [20]: /embed/inline-graphic-16.gif
 [21]: /embed/inline-graphic-17.gif
 [22]: /embed/inline-graphic-18.gif
 [23]: /embed/inline-graphic-19.gif
 [24]: /embed/inline-graphic-20.gif
 [25]: /embed/inline-graphic-21.gif
 [26]: /embed/inline-graphic-22.gif
 [27]: /embed/inline-graphic-23.gif
 [28]: F3/embed/inline-graphic-24.gif
 [29]: F3/embed/inline-graphic-25.gif
 [30]: F3/embed/inline-graphic-26.gif
 [31]: /embed/inline-graphic-27.gif
 [32]: /embed/inline-graphic-28.gif
 [33]: /embed/inline-graphic-29.gif
 [34]: /embed/inline-graphic-30.gif
 [35]: /embed/inline-graphic-31.gif
 [36]: /embed/inline-graphic-32.gif
 [37]: /embed/inline-graphic-33.gif
 [38]: /embed/inline-graphic-34.gif
 [39]: /embed/inline-graphic-35.gif
 [40]: /embed/inline-graphic-36.gif
 [41]: /embed/graphic-13.gif
 [42]: /embed/graphic-14.gif
 [43]: /embed/graphic-15.gif
 [44]: /embed/graphic-16.gif
 [45]: /embed/graphic-17.gif
 [46]: /embed/inline-graphic-37.gif
 [47]: /embed/inline-graphic-38.gif
 [48]: /embed/inline-graphic-39.gif
 [49]: /embed/inline-graphic-40.gif
 [50]: /embed/inline-graphic-41.gif
 [51]: /embed/graphic-18.gif
 [52]: /embed/inline-graphic-42.gif
 [53]: /embed/graphic-19.gif
 [54]: /embed/inline-graphic-43.gif
 [55]: /embed/inline-graphic-44.gif
 [56]: /embed/inline-graphic-45.gif