Statistical tests for heterogeneity of clusters and composite endpoints
=======================================================================

* Anthony J. Webster

## Abstract

Clinical trials and epidemiological cohort studies often group similar diseases together into a composite endpoint, to increase statistical power. A common example is to use a 3-digit code from the International Classification of Diseases (ICD), to represent a collection of several 4-digit coded diseases. More recently, data-driven studies are using associations with risk factors to cluster diseases, leading this article to reconsider the assumptions needed to study a composite endpoint of several potentially distinct diseases. An important assumption is that the (possibly multivariate) associations are the same for all diseases in a composite endpoint (not heterogeneous). Therefore, multivariate measures of heterogeneity from meta-analysis are considered, including multivariate versions of the $I^2$ and $Q$ statistics. Whereas meta-analysis offers tools to test heterogeneity of clustering studies, clustering models suggest an alternative heterogeneity test, of whether the data are better described by one, or more, clusters of elements with the same mean. The assumptions needed to model composite endpoints with a proportional hazards model are also considered. It is found that the model can fail if one or more diseases in the composite endpoint have different associations. Tests of the proportional hazards assumption can help identify when this occurs. It is emphasised that in multi-stage diseases such as cancer, some germline genetic variants can strongly modify the baseline hazard function and cannot be adjusted for, but must instead be used to stratify the data.

## Introduction

It is common for epidemiological studies [1] and clinical trials [2–4] to group similar diseases together into a cluster of diseases, providing more total cases, and more statistical power to detect associations.
This procedure has been essential since the pioneering epidemiological studies of John Graunt in the 1600s [5], providing sufficient cases to allow meaningful statistical study, while attempting to ensure that clustered diseases have a similar etiology. Today an increasingly detailed biological understanding informs the clustering of diseases. One common approach is to use the hierarchical classification of the International Classification of Diseases (ICD) as a guide for which diseases to group [6]. The ICD system clusters increasingly detailed disease descriptions into larger clusters with similar disease etiology, often allowing larger clusters to be used in epidemiological studies. In practice, diseases are selected by a clinician to help ensure that only diseases with a clearly defined and common etiology are clustered together [1].

With data-driven clustering studies increasingly being used to identify potential composite endpoints [1, 7–11], it seems appropriate to review the assumptions needed to study a cluster of diseases, and the existing arguments for and against doing so in clinical trials.

In composite endpoints, or diseases clustered using shared risk factors [1], a necessary requirement is that risk-factor associations are the same (homogeneous). A similar requirement arose in meta-analysis, and led to the development of heterogeneity tests [12]. The multivariate versions of these tests, their origin, and their application to assess the heterogeneity of risk-factor associations within a composite endpoint or a clustering study are described. In the process, an alternative clustering-based heterogeneity test is suggested that offers a different perspective to the conventional $Q$ and $I^2$ statistics [13, 14] that are commonly used in meta-analyses [12].

Proportional hazards modelling is ubiquitous in medical research and statistical epidemiology, so the implicit assumptions needed when using it to model a composite endpoint are explored.
Heterogeneity of disease associations with risk factors in a composite endpoint is sufficient to cause the proportional hazards assumption to fail. Therefore it is necessary for tests of the proportional hazards assumption to be satisfied, and these can be used when tests of heterogeneity are inappropriate, for example due to insufficient data. From a statistical modelling perspective, the key requirement is that risk-factor associations are the same for diseases within a composite endpoint; baseline incidence rates can be arbitrarily different. (Although, as will be discussed in the context of clinical trials, a disease with too few cases to allow tests of heterogeneity should only be included if there are strong prior reasons to expect the same risk-factor associations as for the other diseases in the endpoint.) The conceptual multi-stage model of diseases [15] such as cancer [16–18] and motor neuron disease [19–21] is used to highlight some additional implicit assumptions that are needed when the proportional hazards methodology is used.

### Composite endpoints in clinical trials

It is worth briefly recapping the benefits and concerns about using composite endpoints in clinical trials.

### Reasons to consider composite endpoints

There are several good reasons to use composite endpoints:

1. Diseases are implicitly defined as a composite endpoint of symptoms and biological measurements. (More precise sub-groups sometimes have different behaviours [9, 10, 22, 23].)
2. Statistical power - If a drug targets a pathway that modifies the risk of several otherwise distinct diseases, or if a disease is a symptom of onset of more serious disease, then it makes sense to study these together to increase statistical power.

The importance of both items 1 and 2 is apparent in the tables of disease studied by John Graunt in the 1600s [5], where it is difficult to compare diseases when cases are rare or erratic. Weaker arguments to use composite endpoints include:

3. Interest is in avoiding any negative disease outcomes, not just particularly severe ones.
4. Interest is in testing the potential influence of a potential drug or risk factor on a wide range of diseases.

Arguments 3 and 4 are more easily criticised as “fishing”, casting a wider net to try and catch more diseases that may be influenced by a new drug without strong prior reason to do so, and increasing the risk of false positive associations.

### Criticisms of composite endpoints

There are several serious criticisms and limitations of using composite endpoints, some of which are easily corrected, but others that appear unavoidable:

1. Early criticisms of composite endpoints were, to quote Cordoba et al. [24], that “Components are often unreasonably combined, inconsistently defined, and inadequately reported.”
2. Endpoints can be of different importance to patients. This has led several authors to argue for weighting of endpoints based on importance to patients [2, 25, 26].
3. Infrequent disease - Insufficient cases of a disease in a composite endpoint can make it impossible to test if effect sizes are comparable, e.g. due to unreliable confidence intervals.
4. Similar effect sizes - If effect sizes are dissimilar, then it is impractical to meaningfully interpret results. If disease risk is being modified through a shared disease pathway, then we might expect a similar proportional reduction in disease risk.

The criticisms in item 1 can be avoided by careful study design, specification, and reporting. The concerns in item 3 limit studies to diseases with sufficient cases of each distinct disease to allow meaningful tests of similar effect sizes. A more nuanced concern is item 2, the argument that endpoints should be of equal importance to different patients. To develop understanding and treatments, it is most important to identify endpoints with a shared disease pathway.
Whereas diseases sharing this pathway could have very different relevance to patients, diseases should only be included if they have the same underlying cause.

Despite the limitations, reasons 1 and 2 for considering composite endpoints emphasise that composite endpoints are often reasonable, or necessary, and in these cases it is important to understand the assumptions being made when we study them. Research that is intended to cluster diseases by common disease pathways will also lead to newly hypothesised composite endpoints, and it is helpful to establish tests and understand the assumptions that disease clusters are consistent with.

### Heterogeneity of associations

Heterogeneity tests are widely used in meta-analyses, and are intended to assess whether the reported associations are the same in several different studies [12]. For composite endpoints we wish to assess whether one or more associations are the same for all diseases in a composite endpoint. The multivariate generalisation of the method, which is needed for clustering studies such as Ref. [1], is described below. Multivariate heterogeneity tests were originally developed for meta-analyses [27–29], in which case a random effects model is often more suitable. A discussion of the multivariate test for meta-analyses in a random effects model can be found in [30]. Here we explain the basis for the (fixed effects) multivariate heterogeneity test that is most relevant to composite endpoints. Later a test that originates from clustering studies is suggested, which offers an alternative approach with some advantages over conventional $Q$ and $I^2$ statistics [12].

The null hypothesis (in a fixed effects model) is that all diseases in a composite endpoint have the same associations with one or more parameters, such as a drug, or a collection of potential risk factors. These might be a subset of associations, with potential confounders adjusted for, and subsequently removed by marginalisation [1].
Consider $m$ composite endpoints (or clusters of diseases), labelled by $g$. Under the null hypothesis of the same associations for diseases in a composite endpoint, labelled $i = 1$ to $i = n_g$,

![Formula][1]

where ![Graphic][2] is the covariance ($\Gamma_i$ is the precision matrix), and $\mu_g$ are the (unknown) associations that we are estimating, which are assumed to be the same for all diseases in the composite endpoint. Eq. 1 requires [31],

![Formula][3]

where $p$ is the dimension. Therefore, because the sum of $n_g$ random variables that are individually ![Graphic][4] distributed is ![Graphic][5],

![Formula][6]

For $p = 1$ this has,

![Formula][7]

for standard deviations $\sigma_i$. Because $\mu_g$ is unknown and must be estimated, the test statistic is modified, as explained next.

Using Bayes' theorem with either a flat or normal prior for the mean $\mu_g$, a cluster of diseases with covariances $\{\Gamma_i\}$ and the same mean $\mu_g$ has [32],

![Formula][8]

where,

![Formula][9]

and,

![Formula][10]

where if a normal prior is used then the sum over $i$ includes the prior’s mean $\mu$ and covariance $\Lambda$, and the sum is from $i = 0$ to $i = n_g$. For a flat prior, the sum is from $i = 1$ to $i = n_g$. The subscripts $g$ allow the discussion to include more than one cluster of diseases, as was considered in Webster et al. [1]. Here unless stated otherwise, we consider a single composite endpoint, and the subscript $g$ could be omitted. As a result of Eq. 5, we have,

![Formula][11]

These observations, together with Eq. 1, can be used to derive a multivariate test for the assumption that the normal distributions have the same mean. Appendix A shows that,

![Formula][12]

Using this with Eqs. 3 and 8,

![Formula][13]

The left side of Eq. 10 is the $Q$ statistic. It provides a test for the assumption that the (approximately) normally distributed estimates $\{X_i\}$ have the same mean, as is assumed for a composite endpoint or cluster of diseases.
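As an illustration of the fixed effects test described above, the following is a minimal sketch for a single composite endpoint. It assumes the standard inverse-variance forms, with pooled precision equal to the sum of the $\Gamma_i$, pooled estimate equal to the pooled covariance applied to the sum of $\Gamma_i X_i$, and $(n_g - 1)p$ degrees of freedom. The function name `multivariate_q` and the example numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def multivariate_q(estimates, precisions):
    """Fixed-effects Q statistic for one cluster (assumed standard form).

    estimates  : n_g estimates X_i, each of dimension p
    precisions : n_g precision matrices Gamma_i (inverse covariances), each p x p
    """
    X = np.asarray(estimates, dtype=float)       # shape (n_g, p)
    G = np.asarray(precisions, dtype=float)      # shape (n_g, p, p)
    gamma_g = G.sum(axis=0)                      # pooled precision: sum_i Gamma_i
    weighted_sum = np.einsum("ijk,ik->j", G, X)  # sum_i Gamma_i X_i
    mu_hat = np.linalg.solve(gamma_g, weighted_sum)  # pooled estimate of the mean
    resid = X - mu_hat
    # Q = sum_i (X_i - mu_hat)^T Gamma_i (X_i - mu_hat)
    q = float(np.einsum("ij,ijk,ik->", resid, G, resid))
    dof = (X.shape[0] - 1) * X.shape[1]          # (n_g - 1) * p
    return q, dof, mu_hat


# Hypothetical example: three diseases, two risk factors, unit precision matrices.
estimates = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
precisions = [np.eye(2)] * 3
q, dof, mu_hat = multivariate_q(estimates, precisions)
print(q, dof, mu_hat)
```

Under the null hypothesis, `q` would be compared with a chi-squared distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(q, dof)` if SciPy is available.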
For $p = 1$, these expressions give the well known inverse variance weighted heterogeneity test that is regularly used in meta-analyses and 2-sample Mendelian randomisation studies [12, 33].

For the situation described in Webster et al. [1], the aim is to assess the goodness of fit for a clustering of diseases. In effect, there is a set of composite endpoints being considered, and the task is to determine the optimum split of diseases into composite endpoints with similar risk-factor associations. For this situation, Eq. 10 is modified to sum over all $m$ clusters, and the $Q$ statistic becomes,

![Formula][14]

where we used ![Graphic][15], with $n_g$ diseases in cluster $g$, and $C_g$ is the set of diseases in cluster $g$ (composite endpoint $g$).

The results above correspond to a fixed effects model where the data are assumed to have the same mean, as opposed to the means being sampled from an underlying distribution (a random effects model). A random effects model samples each study’s mean $\mu_i$, with ![Graphic][16], and assumes measured estimates $X_i$ have ![Graphic][17]. Marginalising out $\mu_i$ gives ![Graphic][18], which replaces Eq. 1. Assuming that the data belong to a single cluster, then Eq. 5 becomes ![Graphic][19], and Eqs. 6 and 7 become,

![Formula][20]

and

![Formula][21]

as in Jackson et al. [30] (where ![Graphic][22] is the inverse of the covariance). To calculate the $Q$ and $I^2$ statistics using Eq. 10, $\Gamma_i$ is replaced by ![Graphic][23] with ![Graphic][24], and $n_g$ is the total number of studies. The above arguments and results could be modified to consider a random effects model with different priors for each cluster.

### Assessing heterogeneity within meta-analyses

The classical measure of heterogeneity uses the $Q$ statistic, which was derived above in a multivariate context. The $I^2$ statistic [13, 14] is closely related to the $Q$ statistic [12], and in the notation above, is for Eq. 11,

![Formula][25]

where $Q$ is the left side of Eq. 11, and the factor of 100 is conventionally used to express $I^2$ as a percentage. Unlike in Figures 1 and 2, $I^2$ is usually set to zero if its evaluation is negative. The equivalent expression for a single cluster that uses the left side of Eq. 10 for $Q$ would replace the number of degrees of freedom $(n - m)p$ with $(n_g - 1)p$ in Eq. 14. The $I^2$ statistic replaces a test with a more nuanced measure of heterogeneity that is particularly useful when some heterogeneity is expected, but it no longer provides an objective statistical test.

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2021/07/06/2021.06.16.21258900/F1.medium.gif)

Figure 1: For a flat prior and normally distributed test data [32], the $Q$-statistic and log-likelihood correctly identify the 50 clusters of test data with 2 to 14 members each [32]. Interestingly, the rate of reduction of $I^2$ slowed after the 50 clusters were identified.

![Figure 2](https://www.medrxiv.org/content/medrxiv/early/2021/07/06/2021.06.16.21258900/F2.medium.gif)

Figure 2: For a flat prior and UK Biobank disease data [1], the clustering log-likelihood of Webster [32] is minimised at $m = 37$, where $I^2 \simeq 50\%$. A statistical heterogeneity test using $Q$ fails for fewer than 63 clusters. With a normal prior centred on zero, $I^2$ reduces and the maximum in log-likelihood moves to $m \sim 25$, and was fairly insensitive to the prior’s covariance [32].

This article was originally motivated by the need to better understand and characterise the results from clustering studies, such as assessing the heterogeneity of the underlying clusters. However, clustering studies are becoming increasingly rigorous (Figures 1 and 2), and can offer an alternative approach to test the heterogeneity of composite endpoints.
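The relationship between the $Q$ and $I^2$ statistics discussed in this section can be sketched for the univariate case ($p = 1$). The sketch assumes the conventional form of Higgins and Thompson [13], $I^2 = 100 \times (Q - \mathrm{df})/Q$, with $\mathrm{df} = n_g - 1$ for a single cluster. The function names and the numerical values are hypothetical, and the `truncate` flag reflects the usual convention of setting negative values to zero.

```python
def q_statistic(estimates, std_errors):
    """Univariate inverse-variance weighted Q statistic and pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, estimates))
    return q, pooled


def i_squared(q, dof, truncate=True):
    """I^2 as a percentage, assuming I^2 = 100 * (Q - dof) / Q."""
    if q <= 0:
        return 0.0  # no observed dispersion, so no measurable heterogeneity
    value = 100.0 * (q - dof) / q
    return max(0.0, value) if truncate else value


# Hypothetical effect estimates for three diseases with unit standard errors.
estimates = [0.0, 2.0, 4.0]
std_errors = [1.0, 1.0, 1.0]
q, pooled = q_statistic(estimates, std_errors)
dof = len(estimates) - 1
print(q, pooled, i_squared(q, dof))
```

Passing `truncate=False` keeps negative values, which is the untruncated behaviour described for Figures 1 and 2.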
Webster [32] provides a log-likelihood for clustering normally distributed data such as that from maximum likelihood estimates, that assumes equal means in each cluster. The log-likelihood is maximised to determine whether the data fit more naturally in one or more clusters. This provides an intuitive and objective assessment of heterogeneity that, in the examples shown in Figures 1 and 2, was more lenient than a statistical test using $Q$.

Conventional heterogeneity tests estimate the mean effect size across several groups, and then assess or test whether the mean effect size is consistent with all the studies. In contrast, when assessing clusterings of data with the model in Webster [32], equal means are assumed in all clusters, and minimising the log-likelihood determines whether the study data are best explained by one, or more clusters. This objective test is different to asking whether there are statistical differences between data, and is more tolerant of heterogeneity. It is easy to incorporate prior information into the approach, such as a weak normal prior for zero effect size. The selection of priors in a (1-dimensional) Bayesian meta-analysis is discussed in Ott et al. [34]. In principle, clustering methods could allow the identification of statistically distinct subgroups or outliers. This and the relative merits of the approach need considering in greater detail elsewhere, but it appears to offer a potentially useful alternative to conventional measures of heterogeneity.

### Composite endpoints in proportional hazards studies

Composite endpoints are frequently studied with proportional hazards models. This section explores the assumptions that are needed for the model to be correct, and for its estimates to be meaningful. Firstly consider $m$ diseases in a composite endpoint that may be influenced by the same factors or processes, but are otherwise independent of each other.
The survival probability is determined from the probability of an individual surviving all $m$ diseases until time $t$. If $S_j(t)$ is the probability of surviving disease $j$ to time $t$, and $S(t)$ is the probability of surviving all $m$ independent diseases, then [15],

![Formula][26]

Writing $S_j(t)$ in terms of its hazard function $h_j(t)$, with ![Graphic][27], then Eq. 15 requires,

![Formula][28]

giving the hazard function $h(t)$ for a group of diseases as [15],

![Formula][29]

The proportional hazards model takes ![Graphic][30] for an individual $i$ with covariates $X_i$, and assumes that $h(t)$ is the same for all individuals in a population. If the assumption holds for each disease in a cluster or composite endpoint, then we can write,

![Formula][31]

If all the diseases have the same risk-factor associations $\beta$, then,

![Formula][32]

and the proportional hazards assumption will remain true for the cluster of diseases, with all the time-dependence in a single factor that will be the same for all individuals. However if any disease $k$ has $\beta_k \neq \beta$, then,

![Formula][33]

and the proportional hazards assumption fails. Note that this is the case if the assumption fails for any single covariate.

The observations described by Eqs. 15-20 have the following implications:

1. For the proportional hazards assumption to hold, it is necessary that all clustered diseases have the same associations.
2. Counter-intuitively, the baseline hazard for each individual disease could be very different with diseases prevalent at very different ages, but the cluster of diseases can still satisfy the proportional hazards assumption.
3. If the proportional hazards model fails, it could be due to one or more of the clustered diseases having different risk associations - it need not necessarily be caused by different baseline hazards $h_j(t)$ in the underlying population.
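The implications listed above can be checked numerically with a small sketch. It assumes each disease $j$ follows a proportional hazards form, with a baseline hazard multiplied by $\exp(\beta_j x)$ for a single scalar covariate $x$, and that the composite hazard is the sum over diseases. The two baseline hazards, the coefficients, and the function names are hypothetical values chosen only to illustrate the argument.

```python
import math


def composite_hazard(t, x, baselines, betas):
    """Hazard of the composite endpoint: sum over diseases of h_j(t) * exp(beta_j * x)."""
    return sum(h(t) * math.exp(b * x) for h, b in zip(baselines, betas))


def hazard_ratio(t, baselines, betas, x1=1.0, x0=0.0):
    """Ratio of composite hazards for covariate values x1 and x0 at time t."""
    return composite_hazard(t, x1, baselines, betas) / composite_hazard(t, x0, baselines, betas)


# Two hypothetical diseases with very different baseline hazards:
# one constant in time, one growing as the square of time.
baselines = [lambda t: 0.01, lambda t: 1e-4 * t ** 2]
times = (1.0, 10.0, 50.0)

# Same association for both diseases: the hazard ratio does not depend on time.
same_beta = [hazard_ratio(t, baselines, [0.5, 0.5]) for t in times]

# Different associations: the hazard ratio drifts with time, so hazards are not proportional.
diff_beta = [hazard_ratio(t, baselines, [0.5, 1.5]) for t in times]

print(same_beta)
print(diff_beta)
```

With equal coefficients the ratio stays at $\exp(0.5)$ at every time despite the very different baseline hazards. With unequal coefficients it moves from near $\exp(0.5)$ towards $\exp(1.5)$ as the second disease comes to dominate.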
Implications 1 and 2 emphasise that diseases must be clustered on the basis of similar underlying etiology, and that incidence rates are irrelevant and potentially misleading for determining whether diseases can be considered as a single cluster and modelled with a proportional hazards model. Implication 3 indicates that failure of the proportional hazards model, as identified by a statistical test using the Schoenfeld residuals for example [35], could be due to one or more diseases having different risk factors. This provides a consistency test for whether a cluster of diseases have the same risk factors, without fitting and comparing each disease individually, which may not always be possible.

### General remarks

If statistical tests are consistent with the proportional hazards assumption, then it can be reported that the diseases in the composite endpoint are consistent with having the same associations (within a proportional hazards model). This does not ensure that diseases have the same associations, only that there is no strong statistical evidence for them being different. Importantly, as was noted in the context of clinical trials, if the cases of a disease are few then they are unlikely to have much influence on a test result. In those cases the disease should only be included if there are strong prior (usually biological) reasons to include it.

The discussion above has considered the commonly used proportional hazards model. It has not considered the influence of model misspecification or the censoring distribution [36], or any newer methods for modelling composite endpoints that are being developed [37, 38].

### Multi-stage disease processes

Although Eq. 19 allows different baseline hazards for the diseases in a composite endpoint, the proportional hazards methodology requires that the resulting baseline hazard $\sum_j h_j(t)$ is the same for all individuals (or individuals in a stratum of a stratified analysis).
The incidence rates of some diseases, such as cancers [16–18] and amyotrophic lateral sclerosis [19–21], can be described by multi-stage disease processes [15], where one or more rate-limiting steps to disease may be skipped by a germline genetic variation. This will produce different baseline hazards for individuals with the genetic change. Importantly however, such changes will qualitatively modify the incidence rate to a different power of time [15–21], and cannot be adjusted for in a proportional hazards model. Instead, the variant should be used to stratify the data, so that different strata have differing baseline hazards.

## Conclusions

Composite endpoints are intrinsic to how we define and study disease. Since John Graunt’s studies [5], there has always been a trade-off between disease definitions that are sufficiently specific to distinguish different underlying disease processes, but also sufficiently broad to allow a meaningful statistical study. This is particularly apparent in clinical trials and epidemiological studies where data are costly or unavailable. Large population datasets with detailed genetic and biological information are providing a new data-driven source of composite endpoints, by identifying composite endpoints with potential shared underlying causes.

Statistical methods can assess whether a composite endpoint is consistent with its assumed properties, such as testing its constituent diseases for heterogeneity among their disease-risk associations. Heterogeneity is conventionally tested with a $Q$ or $I^2$ statistic [12]. An alternative clustering-based approach is to assume that diseases are in one or more clusters with equal associations, and test if the log-likelihood for the model [32] is minimised by one, or more clusters. This objective test can incorporate a prior, and examples suggest it is more lenient than a $Q$ test, with the disease clustering data of Webster et al. [1] minimising the log-likelihood when $I^2 \simeq 50\%$.
The merits of this approach for applications such as meta-analyses will need exploring in greater detail elsewhere.

When proportional hazards models are used, heterogeneity of associations in a composite endpoint will cause the proportional hazards assumption to fail. However, the statistical model does allow arbitrarily different baseline incidence rates for the diseases in the composite endpoint (although each baseline hazard $h_j$ must be the same for all individuals in the studied population). A test for the assumption of proportional hazards provides a necessary consistency test when heterogeneity tests are impractical or impossible, although the proportional hazards assumption could fail for other reasons.

The multi-stage model of disease emphasises that germline genetic variants that modify the number of rate-limiting steps prior to disease cannot be adjusted for within a proportional hazards model but must be used to stratify the data. This remark holds more generally, regardless of whether a disease is part of a composite endpoint or studied alone.

## Data Availability

UK Biobank data are available by application from www.ukbiobank.ac.uk. Simulated datasets used in the examples will be made available via the Open Science Foundation after publication.

## Acknowledgements

This research has been conducted using data from UK Biobank, a major biomedical database, under application number 42583. Anthony Webster is supported by a fellowship from the Nuffield Department of Population Health (NDPH).

## Appendix: Multivariate inverse variance weighted sum of squares

Note that the covariances and their inverses are symmetric, and expand,

![Formula][34]

If ![Graphic][35] takes the specific form given by Eq. 6, then the terms $(\sum_i \Gamma_i X_i)$ and ![Graphic][36] in the last term of the final line cancel, and the resulting equation can be rearranged to give,

![Formula][37]

## Footnotes

* Minor errors corrected.
* Received June 16, 2021.
* Revision received July 5, 2021. * Accepted July 6, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## References 1. [1].Webster, A., Gaitskell, K., Turnbull, I., Cairns, B. & Clarke, R. Characterisation, identification, clustering, and classification of disease. Scientific Reports 11, 5405 (2021). 2. [2].Sankoh, A. J., Li, H. & D’Agostino, R. B., Sr.. Use of composite endpoints in clinical trials. Statistics in Medicine 33, 4709–4714 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.6205&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24833282&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) 3. [3].Mccoy, C. E. Understanding the Use of Composite Endpoints in Clinical Trials. Western Journal of Emergency Medicine 19, 631–634 (2018). 4. [4].Palileo-Villanueva, L. M. & Dans, A. L. Composite endpoints. Journal of Clinical Epidemiology 128, 157–158 (2020). 5. [5].Graunt, C. J. Natural and Political OBSERVATIONS Mentioned in a following INDEX, and made upon the Bills of Mortality (Printed by John Martyn, Printer to the Royal Society, at the Sign of the Bell in St. Paul’s Church-yard. MDCLXXVI., 1665). Appendix - The table of casualties - Table of Casualties in Economic Writings (vol. 2) by William Petty (1899), between p. 406 and 407. 6. [6].Organization, W. H. International statistical classification of diseases and related health problems 10th revision (2016). URL [https://icd.who.int/browse10/2016/en](https://icd.who.int/browse10/2016/en). 7. [7].Alhasoun, F. et al. Age density patterns in patients medical conditions: A clustering approach. PLOS Computational Biology 14 (2018). 8. [8].Zhou, X. et al. A Systems Approach to Refine Disease Taxonomy by Integrating Phenotypic and Molecular Networks. 
EBioMedicine 31, 79–91 (2018). 9. [9].Guillamet, R. V., Ursu, O., Iwamoto, G., Moseley, P. L. & Oprea, T. Chronic obstructive pulmonary disease phenotypes using cluster analysis of electronic medical records. Health Informatics Journal 24, 394–409 (2018). 10. [10].Seymour, C. W. et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA 321, 2003–2017 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2019.5791&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) 11. [11].Kuan, V. et al. Data-driven identification of ageing-related diseases from electronic health records. Scientific Reports 11 (2021). 12. [12].Borenstein, M., Hedges, M.J.P.T. H., & Rothstein, H. Introduction to Meta-Analysis (Wiley, 2009). 13. [13].Higgins, J. P. T. & Thompson, S. G. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 21, 1539–1558 (2002). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.1186&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12111919&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000176016900005&link_type=ISI) 14. [14].Higgins, J., Thompson, S., Deeks, J. & Altman, D. Measuring inconsistency in meta-analyses. British Medical Journal 327, 557–560 (2003). [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEyOiIzMjcvNzQxNC81NTciO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8wNi8yMDIxLjA2LjE2LjIxMjU4OTAwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 15. [15].Webster, A. 
Multi-stage models for the failure of complex systems, cascading disasters, and the onset of disease. PLOS One 14, e0216422. (2019). 16. [16].Nordling, C. A new theory on the cancer-inducing mechanism. British Journal of Cancer 7, 68–72 (1953). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/bjc.1953.8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=13051507&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1953YC50500008&link_type=ISI) 17. [17].P.A. & R., D. The age distribution of cancer and a multistage theory of carcinogenesis. British Journal of Cancer 8, 1–12 (1954). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/bjc.1954.1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=13172380&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1954YC50900001&link_type=ISI) 18. [18].Moolgavkar, S. H. Commentary: Multistage carcinogenesis and epidemiological studies of cancer. International Journal of Epidemiology 45, 645–649 (2015). 19. [19].Ai-Chalabi, A. et al. Analysis of amyotrophic lateral sclerosis as a multistep process: a population-based modelling study. The Lancet Neurology 13, 1108 – 1113 (2014). 20. [20].Chiò, A. et al. The multistep hypothesis of als revisited. Neurology 91, e635–e642 (2018). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) 21. [21].Corcia, P. et al. In als, a mutation could be worth two steps. Revue Neurologique 174, 669–670 (2018). 22. [22].Macedo Hair, F. N. F.B. P., G. Characterization of clinical patterns of dengue patients using an unsupervised machine learning approach. BMC Infectious Diseases 19, 649 (2019). 23. [23].Kueffner, Z. N. B. M. e. 
a. R., Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach. Scientific Reports 9, 690 (2019). 24. [24].Cordoba, G., Schwartz, L., Woloshin, S., Bae, H. & Gøtzsche, P. C. Definition, reporting, and interpretation of composite outcomes in clinical trials: systematic review. BMJ 341 (2010). 25. [25].Duc, A. N. & Wolbers, M. Weighted analysis of composite endpoints with simultaneous inference for flexible weight constraints. Statistics in Medicine 36, 442–454 (2017). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F06%2F2021.06.16.21258900.atom) 26. [26].Armstrong, P. W. & Westerhout, C. M. Composite End Points in Clinical Research A Time for Reappraisal. Circulation 135, 2299 (2017). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjExOiIxMzUvMjMvMjI5OSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzA3LzA2LzIwMjEuMDYuMTYuMjEyNTg5MDAuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 27. [27].Kalaian, H. & Raudenbush, S. A multivariate mixed linear model for meta-analysis. Psychological Methods 1, 227–235 (1996). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1037//1082-989X.1.3.227&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996VV66100001&link_type=ISI) 28. [28].van Houwelingen, H., Arends, L. & Stijnen, T. Advanced methods in meta-analysis: multivariate approach and metaregression. Statistics in Medicine 21, 589–624 (2002). 
29. Nam, I., Mengersen, K. & Garthwaite, P. Multivariate meta-analysis. Statistics in Medicine 22, 2309–2333 (2003).
30. Jackson, D., White, I. R. & Riley, R. D. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses. Statistics in Medicine 31, 3805–3820 (2012).
31. Härdle, W. & Simar, L. Applied Multivariate Statistical Analysis, 4th edition, 1–580 (2015).
32. Webster, A. Clustering parametric models and normally distributed data. arXiv:2008.03974v3 (2020).
33. Burgess, S., Bowden, J., Fall, T., Ingelsson, E. & Thompson, S. G. Sensitivity Analyses for Robust Causal Inference from Mendelian Randomization Analyses with Multiple Genetic Variants. Epidemiology 28, 30–42 (2017).
34. Ott, M., Plummer, M. & Roos, M. How vague is vague? How informative is informative? Reference analysis for Bayesian meta-analysis. Statistics in Medicine.
35. Klein, J. & Moeschberger, M. Survival Analysis: Techniques for Censored and Truncated Data, Second Edition, 1–531 (2003).
36. Wu, L. & Cook, R. J. Misspecification of Cox regression models with composite endpoints. Statistics in Medicine 31, 3545–3562 (2012).
37. Hara, H. et al. Statistical methods for composite endpoints. EuroIntervention 16, E1484+ (2021).
38. Rauch, G., Kieser, M., Binder, H., Bayes-Genis, A. & Jahn-Eimermacher, A. Time-to-first-event versus recurrent-event analysis: points to consider for selecting a meaningful analysis strategy in clinical trials with composite endpoints. Clinical Research in Cardiology 107, 437–443 (2018).