Clusters of multiple long-term conditions in three UK datasets: a latent class analysis
=======================================================================================

* Lewis Steell
* Stefanie J. Krauth
* Sayem Ahmed
* Grace Dibben
* Emma McIntosh
* Peter Hanlon
* Jim Lewsey
* Barbara I. Nicholl
* David McAllister
* Rod S. Taylor
* Sally J. Singh
* Frances S. Mair
* Bhautesh D. Jani

## Abstract

**Introduction** Latent class analysis (LCA) can be used to identify subgroups within populations based on unobserved variables. LCA can be used to explore whether certain long-term conditions (LTC) occur together more frequently than others in patients with multiple-long term conditions. In this manuscript we present findings from applying LCA in three large-scale UK databanks.

**Methods** We applied LCA to three different UK databanks: Secure Anonymised Information Linkage databank [SAIL], UK Biobank, and Understanding Society: the UK Household Longitudinal Study [UKHLS] and four different age groups: 18-36, 37-54, 55-73, and 74+ years. The optimal number of classes in each LCA was determined using maximum likelihood. Sample size adjusted Bayesian Information Criterion (aBIC) was used to assess model fit and elbow plots and model entropy were used to assess the best number of latent classes in each model.

**Results** Between three to six clusters were identified in the different datasets and age groups. Although different in detail, similar types of clusters were identified between datasets and age groups which combine disorders around similar systems incl. Cardiometabolic clusters, Pulmonary clusters, Mental health clusters, Painful conditions clusters, and cancer clusters.

## Introduction

Latent class analysis (LCA) is a method of finite mixture modelling used to identify unobserved subgroups (i.e. ‘latent classes’) within a population, based on a series of observed indicator variables [1]. LCA assumes that the distribution of the indicator variables is due to the existence of a finite number of underlying latent classes in the population. Typically applied in behavioural and social sciences, LCA has gained traction in medicine as a means of clustering patients into underlying disease phenotypes, based on their clinical and demographic characteristics. LCA has also been used as a method for identifying clusters of multiple long-term conditions that exist across different populations [2–6]. Identifying clusters of multiple LTCs may facilitate deeper understanding and extraction of the underlying cause and effects of multimorbidity profiles on patient and healthcare outcomes.

LCA facilitates the use of statistical inference in the selection of the optimal model solution and does not rely on arbitrary distance-based measures used in other methods for cluster identification (e.g. k-means or hierarchical clustering). As such, LCA is regarded as a more statistically robust and reproducible method of clustering. Additionally, in simulation studies where the underlying distribution of latent classes was known but concealed during analyses, LCA performed better than other clustering algorithms at accurately classifying individuals into the correct cluster [4, 7]. In clustering at the individual rather than condition level, LCA may uncover a more accurate reflection of the clinical landscape whereby highly prevalent conditions can appear across several identified multiple long-term condition clusters.

Therefore, in this analysis we apply LCA to identify and characterise clusters of multiple long-term conditions across three UK population datasets.

## Methods

### Data source

This study is part of an ongoing NIHR-funded Research project “Personalised exercise rehabilitation for people with multiple long-term conditions (PERFORM)”. The UK Biobank has full ethical approval from the NHS National Research Ethics Service (16/NW/0274). This study was conducted as part of UK Biobank Project 14151. The use and analysis of SAIL data was approved by the SAIL information governance review panel (Project 0830). UKHLS data access and use was granted by the UK Data Service (Project ID: 221571).

This work uses data provided by patients and collected by the NHS as part of their care and support, copyright © (2022), NHS England. Re-used with the permission of the NHS England and UK Biobank. All rights reserved.

This research used data assets made available by National Safe Haven as part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (research which commenced between 1st October 2020 – 31st March 2021 grant ref MC\_PC_20029; 1st April 2021 -30th September 2022 grant ref MC_PC_20058)

### Approach

We applied LCA to three UK datasets (Secure Anonymised Information Linkage databank [SAIL], UK Biobank, and Understanding Society: the UK Household Longitudinal Study [UKHLS]) to identify clusters of multiple long-term conditions in people with multimorbidity. To account for different morbidity profiles across the lifespan, LCA was applied separately to the following age strata: 18 – 36 years, 37 – 54 years, 55 – 73 years, and 74+ years. LCA analyses were applied participants aged 18+ years at baseline data in both research datasets (UK Biobank & UKHLS), and to adults (aged 18+ years) registered with a participating practice on 1st January 2011 in the unselected community cohort (SAIL). This date was chosen as *de facto* ‘baseline’ in SAIL as electronic data capture was most complete after this period, plus it coincided approximately with baseline recruitment of the other datasets [8].

For LCA, input indicators were single LTCs that were derived from either self-report (UK Biobank & UKHLS) or from Read codes [with associated prescription data for some LTCs] from individually linked healthcare data (SAIL). In our analyses, LTCs were deemed present if self-reported or recorded in linked health data, or otherwise absent, meaning there were no missing input data in our LCA models. Due to heterogeneity of data collection across these datasets, different LTCs were used in the definition of multimorbidity and subsequent construction of LCA models. In UK Biobank and SAIL, multimorbidity was defined using 43 LTCs derived from a list commonly applied in multimorbidity research [9] (Table 1). In UKHLS, data were collected on only 17 LTCs, some of which were merged to map with LTCs on the longer list (e.g. ‘hyperthyroidism’ and ‘hypothyroidism’ were merged to become ‘thyroid disease’). Resultantly, a total of 13 LTCs were used to define multimorbidity in UKHLS (Table 2). Only individuals with 2 or more LTCs (i.e. multimorbidity) were included in LCA. The number of people included in LCA per dataset and for each age group is presented in Table 1.

View this table:
[Table 1.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T1)

Table 1. Number of participants with multimorbidity included in LCA models per age group in each datase

### LCA Parameters

LCA was conducted in R using the *poLCA* package [10]. LCA was applied iteratively to identify the optimal number of latent classes in each age group and dataset combination (n = 10). For each combination, we constructed and compared LCA models containing 1 to 10 latent classes. As described above, our indicator variables were single LTCs recorded in each of the datasets (n=43 SAIL & UK Biobank; n=13 UKHLS). *poLCA* uses the expectation-maximization (EM) algorithm with a Newton-Raphson step to identify the maximum likelihood estimates of the model parameters. In our analyses, the EM algorithm was set to complete a maximum of 5000 iterations to achieve model convergence on the maximum likelihood solution. Each LCA model was repeated 25 times to improve the search for a global, rather than local, maximum solution. Model parameters were then validated against models with increasing number of random start values (n = 50 & 100). If the maximum likelihood solution (i.e. the single largest log-likelihood value) could not be replicated from increasing repetitions of model fitting, this indicated poor fit of the data to the specified model and the model was rejected.

### Model selection criteria & cluster assignment

Identifying the optimal latent class solution requires careful consideration of multiple criteria. Here, we interpreted the best model through a combined assessment of model fit data with substantive knowledge and clinical interpretability of clusters. Firstly, we used the Bayesian Information Criteria (BIC) and sample size adjusted BIC (aBIC) to guide model selection. These criteria are derived from the maximum likelihood solution and award model parsimony, with lower values indicating better model fit. Simulation studies have suggested the BIC to be the most reliable model fit indicator, particularly in large samples [11]. Despite this, in LCA models with large sample sizes and high number of indicators (as in our SAIL & UK Biobank analyses), the BIC tends to favour more complex models with a higher number of classes. The BIC was therefore not definitive in our model selection and used as a guide only. We plotted the model fit data using elbow plots, to identify where any subsequent increases in number of latent classes resulted in comparatively lesser improvement in BIC, as has been suggested previously [12]. In addition to this, we assessed the underlying distribution of LTCs within the latent classes for clinical interpretability. Where models may have had similar performance in terms of statistical criteria, the substantive clinical interpretation and face validity of identified clusters were considered prior to model selection. Finally, we considered model entropy, with higher values (close to 1) representing models with more accurate assignment of individuals into the appropriate classes.

After identifying the optimal LCA solution, individuals were assigned to the latent class for which they had the highest posterior probability. Conditional item response probabilities were used to investigate within cluster prevalence of each individual LTC, then the clusters were subsequently labelled based on within and between cluster prevalence of these LTCs, as detailed below.

### Applied convention for labelling MLTC clusters

*   Considered between cluster differences first, include LTCs that had substantially higher prevalence in one cluster compared to all others in name (rule of thumb approximately twice than next highest cluster)

*   Labels include LTCs with 100% within-cluster prevalence and/or the highest between-cluster prevalence

*   Labelled according to affected systems where LTCs with highest within and between prevalence affect similar physiology and body systems (e.g. Pulmonary, cardiovascular, etc.)

*   Used ‘+’ when cluster named after single condition with highest within and/or between cluster prevalence

*   Where a single LTC dominates the cluster, a + was added to the name to indicate that a collection of other LTCs are also in this cluster albeit without other obvious between-cluster differences that warrant naming

*   Clusters with several LTCs with high within and between cluster prevalence which affect multiple systems and without one single dominant LTC in the cluster, were labelled Discordant

### Note

The clusters have been named for convenience in discussion and write-up. Clusters are complex and no naming convention will comprehensively describe a cluster. Readers are advised to investigate the prevalence of LTCs in detail when interpreting the data and results.

## Results

### Identified MLTC clusters

Below we present elbow plots containing model fit criteria and statistics and rationale for selection of the optimal LCA model in each age group and dataset combination.

#### SAIL LCA Models

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F1)

Figure 1. Model fit statistics for LCA models in adults 18-36 years in SAIL

Model selected: 5-class model.

Reason: Stepwise improvement in aBIC and BIC was reduced beyond the 5-class solution. Only local maxima were identified in 6-class model and beyond, meaning poor data fit and models being rejected. 5-class model had good clinical interpretability and better fit statistics than 4-class model, with almost identical classification.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F2)

Figure 2. Model fit statistics for LCA models in adults 37-54 years in SAIL

Model selected: 5-class model.

Rationale: Stepwise improvement in aBIC and BIC reduces beyond the 5-class model. Failure to replicate maximum likelihood function in 7-class model onwards. Marginal improvement in model classification between 5-class and 6-class model, but 6-class did not improve clinical interpretability.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F3)

Figure 3. Model fit statistics for LCA models in adults 55 - 73 years in SAIL

Model selected: 7-class model.

Rationale: Improvement in aBIC and BIC smaller after 6-class model. Classification improves in 7-class model. Clinical interpretation of 7-class model better than that of additional models with small differences in fit criteria and classification.

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F4)

Figure 4. Model fit statistics for LCA models in adults 55 - 73 years in SAIL

Model selected: 6-class model.

Rationale: Stepwise improvement in aBIC and BIC reduces around 5 to 6-class model. All models beyond 2-class model have similar classification. Two class model not appropriate as showed very similar clinical profiles split only by the presence/absence of COPD. Similar model fit between 6-class and 7-class, but clinical interpretation favourable in 6-class model.

#### UK Biobank LCA Models

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F5)

Figure 5. Model fit statistics for LCA models in adults 37 – 54 years in UK Biobank

Model selected: 5-class model.

Rationale: Stepwise improvement in aBIC and BIC reduced beyond the 5-class model. Failed to replicate maximum log-likelihood function in 6-class model onwards, so models rejected due to instability and poor data fit.

![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F6.medium.gif)

[Figure 6.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F6)

Figure 6. Model fit statistics for LCA models in adults 55 - 73 years in UK Biobank

Model selected: 4-class model.

Rationale: Stepwise improvement in aBIC and BIC reduced beyond the 5-class model. However, failed to replicate maximum log-likelihood function in 5-class model onwards, so these models were rejected due to instability and poor data fit.

#### UKHLS LCA Models

![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F7.medium.gif)

[Figure 7.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F7)

Figure 7. Model fit statistics for LCA models in adults 18 – 36 years in UKHLS

Model selected: 3-class model.

Rationale: BIC almost identical in 2, 3 and 4-class models, reduction in aBIC improvement beyond 4-class model. Failed to replicate maximum log-likelihood in 4-class models and beyond so these were rejected. Clinical interpretation of 3-class model preferable vs 2-class model. Classification of this model was still good despite being the lowest among comparable models.

![Figure 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F8.medium.gif)

[Figure 8.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F8)

Figure 8. Model fit statistics for LCA models in adults 37 - 54 years in UKHLS

Model selected: 5-class model.

Rationale: Lowest BIC and deflection of aBIC at 5-class model. Larger models were unstable and therefore rejected. Very clear clinical interpretation.

![Figure 9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F9.medium.gif)

[Figure 9.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F9)

Figure 9. Model fit statistics for LCA models in adults 55 – 73 years in UKHLS

Model selected: 4-class model.

Rationale: BIC and aBIC showed consistent improvement until 7-class model, but maximum log-likelihood function could not be replicated in any models beyond the 4-class model. Indeed, the EM algorithm failed to converge after 5000 iterations in the 7 to 10-class models, highlighting instability of these models. 4-class model had improved classification and clinical interpretation compared to smaller models.

![Figure 10.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F10.medium.gif)

[Figure 10.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F10)

Figure 10. Model fit statistics for LCA models in adults 74+ years in UKHLS

Model selected: 3-class model.

Rationale: Lowest BIC and reduced improvement of aBIC beyond the 3-class model. Model classification in 3-class model is ideal (entropy = 1). Model failed to converge in 4-class and 6-class model, and maximum likelihood function could not be replicated in 7-class model onwards.

### MLTC clusters

#### SAIL MLTC Clusters

![Figure 11.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F11.medium.gif)

[Figure 11.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F11)

Figure 11. Within cluster conditional item response probabilities for adults 18 - 36 years in SAIL.

![Figure 12.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F12.medium.gif)

[Figure 12.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F12)

Figure 12. Within cluster conditional item response probabilities for adults 37 - 54 years in SAIL

![Figure 13.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F13.medium.gif)

[Figure 13.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F13)

Figure 13. Within cluster conditional item response probabilities for adults 55 - 73 years in SAIL.

![Figure 14.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F14.medium.gif)

[Figure 14.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F14)

Figure 14. Within cluster conditional item response probabilities for adults 74+ years in SAIL.

### UK Biobank MLTC clusters

![Figure 15.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F15.medium.gif)

[Figure 15.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F15)

Figure 15. Within cluster conditional item response probabilities for adults 37 - 54 years in UK Biobank.

![Figure 16.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F16.medium.gif)

[Figure 16.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F16)

Figure 16. Within cluster conditional item response probabilities for adults 55-73 years in UK Biobank.

### UKHLS MLTC clusters

![Figure 17.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F17.medium.gif)

[Figure 17.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F17)

Figure 17. Within cluster conditional item response probabilities for adults 18 - 36 years in UKHLS.

![Figure 18.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F18.medium.gif)

[Figure 18.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F18)

Figure 18. Within cluster conditional item response probabilities for adults 37 – 54 years in UKHLS.

![Figure 19.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F19.medium.gif)

[Figure 19.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F19)

Figure 19. Within cluster conditional item response probabilities for adults 55 - 73 years in UKHLS.

![Figure 20.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/09/06/2023.09.05.23294158/F20.medium.gif)

[Figure 20.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/F20)

Figure 20. Within cluster conditional item response probabilities for adults 74+ years in UKHLS.

### MLTC Clusters – Sociodemographic characteristics

#### SAIL

##### SAIL: 18-36 years

View this table:
[Table 2.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T2)

Table 2. Sociodemographic characteristics of adults 18-36 years in SAIL stratified by multimorbidity clusters.

### SAIL: adults 37 – 54 years

View this table:
[Table 3.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T3)

Table 3. Sociodemographic characteristics of adults 37 – 54 years in SAIL stratified by multimorbidity cluster.

### SAIL: adults 55 – 73 years

View this table:
[Table 4.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T4)

Table 4. Sociodemographic characteristics of adults 55 - 73 years in SAIL stratified by multimorbidity cluster.

### SAIL: adults (74+ years)

View this table:
[Table 5.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T5)

Table 5. Sociodemographic characteristics of adults 74+ years in SAIL stratified by multimorbidity cluster

#### UK BIOBANK

### UK BIOBANK: adults 37 – 54 years

View this table:
[Table 6.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T6)

Table 6. Sociodemographic characteristics of adults 37 – 54 years in UK Biobank stratified by multimorbidity cluster

### UK BIOBANK: adults 55 – 73 years

View this table:
[Table 7.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T7)

Table 7. Sociodemographic characteristics of adults 55 - 73 years in UK Biobank stratified by multimorbidity cluster

#### UKHLS

### UKHLS: adults 18 – 36 years

View this table:
[Table 8.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T8)

Table 8. Sociodemographic characteristics of adults 18-36 years in UKHLS stratified by multimorbidity cluster

### UKHLS: adults 37 – 54 years

View this table:
[Table 9.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T9)

Table 9. Sociodemographic characteristics of adults 37 - 54 years in UKHLS stratified by multimorbidity clusters.

### UKHLS: adults 55 - 73 years

View this table:
[Table 10.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T10)

Table 10. Sociodemographic characteristics of adults 55 - 73 years in UKHLS stratified by multimorbidity clusters.

### UKHLS: adults 74+ years

View this table:
[Table 11.](http://medrxiv.org/content/early/2023/09/06/2023.09.05.23294158/T11)

Table 11. Sociodemographic characteristics of adults 74+ years in UKHLS stratified by multimorbidity clusters.

## Discussion

Using LCA we identified latent classes of LTCs across three databanks and four age groups. Clusters identified in different age groups in all databanks center around specific systems in the body. For example, in all age groups and databanks LCA identified clusters centring around pulmonary conditions. Other common clusters combine cardiometabolic diseases, mental health or neurological conditions, or painful conditions. Although the exact combination of LTCs in clusters necessarily differ depending on the data used for LCA, previous studies applying LCA to different cohorts or databanks have likewise consistently identified clusters combining conditions around different systems including among others, cardiometabolic clusters, pain-related clusters, respiratory clusters, and neurological/mental health clusters [13–17]. In addition to clusters which group disorders affecting specific systems, discordant clusters are clusters grouping disorders with different aetiologies and affected systems which may have a more complex interpretation. Despite the differences in details on how many conditions were considered and how the conditions were entered into the databanks and cohorts between previous studies, as well as between the different databanks in this study, similar types of clusters tend to emerge across cohorts/databanks. These similarities in clusters suggest that there might be important associations and potentially connected aetiologies between certain LTCs, leading them to cluster together more frequently.

## Data Availability

Access to data from Biobank can be requested via the UK Biobank Access Management System [https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access](https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access) Access to SAIL data can be requested via the independent Information Governance Review Panel [https://saildatabank.com/data/apply-to-work-with-the-data/](https://saildatabank.com/data/apply-to-work-with-the-data/) Access to UKHLS data can be requested via UK Data Service [https://www.understandingsociety.ac.uk/documentation/access-data](https://www.understandingsociety.ac.uk/documentation/access-data)

[https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access](https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access) 

[https://saildatabank.com/data/apply-to-work-with-the-data/](https://saildatabank.com/data/apply-to-work-with-the-data/) 

[https://www.understandingsociety.ac.uk/documentation/access-data](https://www.understandingsociety.ac.uk/documentation/access-data) 

## Footnotes

*   * Senior authors.

*   Received September 5, 2023.
*   Revision received September 5, 2023.
*   Accepted September 6, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  1.Collins, L.M. and  S.T. Lanza, Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. Vol. 718. 2009: John Wiley & Sons.
    
    
2.  2.Bayes-Marin, I., et al., Multimorbidity patterns in low-middle and high income regions: a multiregion latent class analysis using ATHLOS harmonised cohorts. BMJ Open, 2020. 10(7): p. e034441.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYm1qb3BlbiI7czo1OiJyZXNpZCI7czoxMjoiMTAvNy9lMDM0NDQxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMDkvMDYvMjAyMy4wOS4wNS4yMzI5NDE1OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

3.  3.Larsen, F.B., et al., A Latent Class Analysis of Multimorbidity and the Relationship to Socio-Demographic Factors and Health-Related Quality of Life.A National Population-Based Study of 162,283 Danish Adults. PLoS One, 2017. 12(1): p. e0169426.
    
    
4.  4.Nichols, L., et al., In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm. J Clin Epidemiol, 2022. 152: p. 164–175.
    
    
5.  5.Whitson, H.E., et al., Identifying Patterns of Multimorbidity in Older Americans: Application of Latent Class Analysis. J Am Geriatr Soc, 2016. 64(8): p. 1668–73.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F09%2F06%2F2023.09.05.23294158.atom) 

6.  6.Zhu, Y., et al., Characteristics, service use and mortality of clusters of multimorbid patients in England: a population-based study. BMC medicine, 2020. 18(1): p. 1–11.
    
    
7.  7.Magidson, J. and  J. Vermunt, Latent class models for clustering: A comparison with K-means. Canadian journal of marketing research, 2002. 20(1): p. 36–43.
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000173353800006&link_type=ISI) 

8.  8.Hanlon, P., et al., Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLOS Medicine, 2022. 19(3): p. e1003931.
    
    
9.  9.Barnett, K., et al., Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. The Lancet, 2012. 380(9836): p. 37–43.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/s0140-6736(12)60240-2&link_type=DOI) 

10. 10.Linzer, D.A. and  J.B. Lewis, poLCA: An R Package for Polytomous Variable Latent Class Analysis. Journal of Statistical Software, 2011. 42(10): p. 1–29.
    
    
11. 11.Nylund, K.L.,  T. Asparouhov, and  B.O. Muthén, Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural equation modeling: A multidisciplinary Journal, 2007. 14(4): p. 535–569.
    
    
12. 12.Nylund-Gibson, K. and  A.Y. Choi, Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 2018. 4: p. 440–461.
    
    
13. 13.Ho, H.E., et al., Trends of Multimorbidity Patterns over 16 Years in Older Taiwanese People and Their Relationship to Mortality. Int J Environ Res Public Health, 2022. 19(6).
    
    
14. 14.Nguyen, Q.D., et al., Multimorbidity Patterns, Frailty, and Survival in Community-Dwelling Older Adults. J Gerontol A Biol Sci Med Sci, 2019. 74(8): p. 1265–1270.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F09%2F06%2F2023.09.05.23294158.atom) 

15. 15.Simoes, D., et al., Patterns and Consequences of Multimorbidity in the General Population: There is No Chronic Disease Management Without Rheumatic Disease Management. Arthritis Care Res (Hoboken), 2017. 69(1): p. 12–20.
    
    
16. 16.Zheng, D.D., et al., Multimorbidity patterns and their relationship to mortality in the US older adult population. PLoS One, 2021. 16(1): p. e0245053.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F09%2F06%2F2023.09.05.23294158.atom) 

17. 17.Zhu, Y., et al., Characteristics, service use and mortality of clusters of multimorbid patients in England: a population-based study. BMC Med, 2020. 18(1): p. 78.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F09%2F06%2F2023.09.05.23294158.atom)