RT Journal Article
SR Electronic
T1 Statistical tests for heterogeneity of clusters and composite endpoints
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2021.06.16.21258900
DO 10.1101/2021.06.16.21258900
A1 Anthony J. Webster
YR 2021
UL http://medrxiv.org/content/early/2021/07/06/2021.06.16.21258900.abstract
AB Clinical trials and epidemiological cohort studies often group similar diseases together into a composite endpoint to increase statistical power. A common example is to use a 3-digit code from the International Classification of Diseases (ICD) to represent a collection of several 4-digit coded diseases. More recently, data-driven studies have used associations with risk factors to cluster diseases, which leads this article to reconsider the assumptions needed to study a composite endpoint of several potentially distinct diseases. An important assumption is that the (possibly multivariate) associations are the same for all diseases in a composite endpoint (not heterogeneous). Therefore, multivariate measures of heterogeneity from meta-analysis are considered, including multivariate versions of the I² and Q statistics. Whereas meta-analysis offers tools to test heterogeneity of clustering studies, clustering models suggest an alternative heterogeneity test: whether the data are better described by one cluster, or by more than one cluster, of elements with the same mean. The assumptions needed to model composite endpoints with a proportional hazards model are also considered. It is found that the model can fail if one or more diseases in the composite endpoint have different associations. Tests of the proportional hazards assumption can help identify when this occurs.
It is emphasised that in multi-stage diseases such as cancer, some germline genetic variants can strongly modify the baseline hazard function and cannot be adjusted for, but must instead be used to stratify the data.
Competing Interest Statement: The authors have declared no competing interest.
Clinical Trial: NA
Funding Statement: Anthony Webster is supported by a fellowship from the Nuffield Department of Population Health (NDPH), University of Oxford.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not applicable - the research involves only previously collected, fully anonymised non-NHS data from the UK Biobank study (www.ukbiobank.ac.uk).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Data Availability: UK Biobank data are available by application from www.ukbiobank.ac.uk. Simulated datasets used in the examples will be made available via the Open Science Foundation after publication.
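
The abstract refers to the Q and I² heterogeneity statistics from meta-analysis. As a minimal sketch, the snippet below assumes the standard univariate definitions (Cochran's Q with inverse-variance weights, and I² = max(0, (Q - df)/Q) with df = k - 1), not the multivariate versions developed in the article. The function name `cochran_q_i2` and the example numbers are hypothetical and are not taken from the paper.

```python
def cochran_q_i2(estimates, std_errors):
    """Return (Q, I2) for k univariate estimates with known standard errors.

    Assumes the standard fixed-effect definitions from meta-analysis;
    this is an illustration, not the article's multivariate statistic.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    # Inverse-variance weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    # Weighted (pooled) mean of the estimates.
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted sum of squared deviations from the pooled mean.
    q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    df = len(estimates) - 1
    # I2: proportion of variation beyond what chance alone would give, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical log hazard ratios for three diseases in one composite endpoint.
q, i2 = cochran_q_i2([0.10, 0.30, 0.50], [0.1, 0.1, 0.1])
print(round(q, 3), round(i2, 3))  # 8.0 0.75
```

Under the null hypothesis of no heterogeneity, Q is compared with a chi-squared distribution on k - 1 degrees of freedom, so a large Q (and an I² close to 1) would suggest that the grouped diseases do not share a common association.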