Abstract
Clinical trials and epidemiological cohort studies often group similar diseases together into a composite endpoint to increase statistical power. A common example is the use of a 3-digit code from the International Classification of Diseases (ICD) to represent a collection of several 4-digit coded diseases. More recently, data-driven studies have used associations with risk factors to cluster diseases, which leads this article to reconsider the assumptions needed to study a composite endpoint of several potentially distinct diseases. An important assumption is that the (possibly multivariate) associations are the same for all diseases in a composite endpoint, that is, they are not heterogeneous. Multivariate measures of heterogeneity from meta-analysis are therefore considered, including multivariate versions of the I² and Q statistics. Whereas meta-analysis offers tools to test heterogeneity of clustering studies, clustering models suggest an alternative heterogeneity test: whether the data are better described by one cluster, or by more than one cluster, of elements with the same mean. The assumptions needed to model composite endpoints with a proportional hazards model are also considered. It is found that the model can fail if one or more diseases in the composite endpoint have different associations, and tests of the proportional hazards assumption can help identify when this occurs. It is emphasised that in multi-stage diseases such as cancer, some germline genetic variants can strongly modify the baseline hazard function; these cannot be adjusted for, and must instead be used to stratify the data.
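As a rough illustration of the heterogeneity measures mentioned above, the sketch below computes one common fixed-effect multivariate generalisation of Cochran's Q, together with an I² of the form max(0, (Q - df)/Q). It is an assumption that this matches the article's definitions, which may differ in detail. The function name `multivariate_q` and the example estimates are hypothetical and are not taken from UK Biobank data.

```python
import numpy as np


def multivariate_q(betas, covs):
    """Fixed-effect multivariate Q and I^2 for k diseases with p associations each.

    betas: list of k length-p vectors of estimated associations (e.g. log hazard ratios).
    covs:  list of k (p x p) covariance matrices of those estimates.
    Assumes the estimates for different diseases are independent.
    """
    betas = [np.asarray(b, dtype=float) for b in betas]
    # Inverse-covariance weights, one per disease.
    weights = [np.linalg.inv(np.asarray(v, dtype=float)) for v in covs]
    total_weight = sum(weights)
    # Inverse-variance weighted pooled estimate across diseases.
    pooled = np.linalg.solve(total_weight, sum(w @ b for w, b in zip(weights, betas)))
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = float(sum((b - pooled) @ w @ (b - pooled) for w, b in zip(weights, betas)))
    # Under homogeneity, Q is approximately chi-squared with (k - 1) * p degrees of freedom.
    df = (len(betas) - 1) * betas[0].size
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2, pooled


# Hypothetical example: three diseases, two risk factors, equal precision.
example_betas = [[0.2, 0.1], [0.2, 0.1], [0.8, -0.3]]
example_covs = [0.01 * np.eye(2) for _ in example_betas]
q, df, i2, pooled = multivariate_q(example_betas, example_covs)
print(f"Q = {q:.2f} on {df} degrees of freedom, I^2 = {i2:.2f}")
```

A Q value that is large relative to its degrees of freedom suggests that at least one disease in the candidate composite endpoint has different associations from the others.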
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
NA
Funding Statement
Anthony Webster is supported by a fellowship from the Nuffield Department of Population Health (NDPH), University of Oxford.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Not applicable - the research involves only previously collected, fully anonymised non-NHS data from the UK Biobank study (www.ukbiobank.ac.uk).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Minor errors corrected.
Data Availability
UK Biobank data are available by application from www.ukbiobank.ac.uk. Simulated datasets used in the examples will be made available via the Open Science Foundation after publication.