Longitudinal Faecal Calprotectin Profiles Characterise Disease Course Heterogeneity in Crohn’s Disease

Nathan Constantine-Cooke; Karla Monterrubio-Gómez; Nikolas Plevris; Lauranne A.A.P Derikx; Beatriz Gros; Gareth-Rhys Jones; Riccardo E. Marioni; Charlie W. Lees; Catalina A. Vallejos

doi:10.1101/2022.08.16.22278320

Abstract

Background and Aims The progressive nature of Crohn’s disease is highly variable and hard to predict. In addition, symptoms correlate poorly with mucosal inflammation. There is therefore an urgent need to better characterise the heterogeneity of disease trajectories in Crohn’s disease by utilising objective markers of inflammation. We aimed to better understand this heterogeneity by clustering Crohn’s disease patients with similar longitudinal faecal calprotectin profiles.

Methods Latent class mixed models were used to model faecal calprotectin trajectories within five years of diagnosis and to cluster subjects. Information criteria, alluvial plots, and cluster trajectories were used to decide the optimal number of clusters. Chi-squared, Fisher’s exact test, and ANOVA were used to test for associations with variables commonly assessed at diagnosis.

Results Our study cohort comprised of 365 patients with newly diagnosed Crohn’s disease and 2856 faecal calprotectin measurements taken within five years of diagnosis (median 7 per subject). Four distinct clusters were identified by characteristic calprotectin profiles: a cluster with consistently high faecal calprotectin and three clusters characterised by different downward longitudinal trends. Cluster membership was significantly associated with smoking (p = 0.015), upper gastrointestinal involvement (p < 0.001), and early biologic therapy (p < 0.001).

Conclusions Our analysis demonstrates a novel approach to characterising the heterogeneity of Crohn’s disease by using faecal calprotectin. The group profiles do not simply reflect different treatment regimes and do not mirror classical disease progression endpoints. We believe these profiles represent an entirely new way of classifying disease behaviour in Crohn’s disease.

1 Introduction

Crohn’s disease (CD) affects around 1 in 350 people in the UK ^1,2 with substantial variation in phenotypes and disease outcomes. Historically, 30% follow a quiescent disease course ³, whilst many will require surgery due to strictures, fistulas, or lack of response to medical therapy. Despite this heterogeneity, our ability to characterise disease variability remains poor and, in the case of Montreal location and behaviour, involves invasive examinations which limits the suitability of frequent longitudinal measurements.

Faecal calprotectin (FCAL) is routinely used to monitor mucosal inflammation and guide treatment decisions ⁴, and is well known to be associated with poor outcomes in CD ^5,6. It is therefore sensible to consider using FCAL to characterise heterogeneity found in intestinal inflammation. By incorporating all FCAL data, instead of only FCAL measurements which can be dichotomised into specific time points, FCAL can be modelled as a continuous longitudinal process. Whilst FCAL has previously been modelled in this way, no published research has attempted to cluster CD patients by longitudinal FCAL profiles: instead capturing heterogeneity across patients through a priori selected covariates (such subjects in endoscopic or clinical remission and those who have relapsed) ^7,8.

Disease heterogeneity in CD has previously been described longitudinally by the IBSEN study ³. In the IBSEN study, subjects with CD chose which profile they believed best described their disease activity out of four profiles specified a priori. We aimed to perform a modernised iteration of this work by instead using FCAL profiles to characterise patient heterogeneity. We hypothesise that an unsupervised analysis to uncover latent patient subgroups with distinct longitudinal FCAL patterns can lead to better disease characterisation.

2 Materials and Methods

2.1 Study Design

We performed a retrospective cohort study at the Edinburgh IBD Unit, a tertiary referral centre, to determine if there were subgroups within the CD patient population identifiable from FCAL measurements which had been collected within five years of diagnosis. We modelled longitudinal FCAL profiles using latent class mixed models (LCMMs) ⁹, an extension of linear mixed effects models, which enables the identification of distinct subgroups with shared longitudinal patterns. LCMMs have been used to model biomarker trajectories in many contexts (e.g. modelling disease activity score in rheumatoid arthritis ¹⁰ and estimated glomerular filtration rate in type 2 diabetes ¹¹).

The data were obtained from a retrospective cohort study by Plevris et al. which identified all incident CD cases between 2005 and 2017 at The Edinburgh IBD unit which fulfilled set inclusion criteria ¹². For all patients, electronic health records (TrakCare; InterSystems, Cambridge, MA) were used to extract demographic as well as outcomes and FCAL values (both up to June 2019). Data for drug treatments and disease location were also extracted.

2.2 Criteria & Definitions

First, the inclusion criteria from Plevris et al. were applied: (1) CD diagnosis between 2005 and 2017; (2) an initial FCAL measurement at diagnosis (or within 2 months) and prior to treatment; (3) initial FCAL result ≥ 250μg/g; (4) an accurate date of diagnosis; (5) at least one additional FCAL measurement within 12 months of diagnosis; (6) at least 12 months of followup; (7) neither having surgery nor a Montreal disease progression/new perianal disease within 12 months of diagnosis. Second, the following additional criterion was applied in this study: (8) at least 3 FCAL measurements within 5 years of diagnosis.

The following information was available at diagnosis: sex, age, smoking status, FCAL, Montreal location (alongside upper gastrointestinal inflammation), and Montreal behaviour (alongside perianal disease). Treatments prescribed within one year of diagnosis were also recorded: 5-ASAs (aminosalicylates), thiopurines, corticosteroids, methotrexate, exclusive enteral nutrition, and biologic therapies (either infliximab, adalimumab, ustekinumab, or vedolizumab).

2.3 FCAL Assay

The Edinburgh IBD Unit has been using FCAL for diagnostic and monitoring purposes since 2005. Stool samples have been routinely collected at all healthcare interactions ¹². Patients are also given collection kits in the clinic or sent by post to their home. Samples are stored at -20ºC and FCAL is measured using a standard enzyme-linked immunosorbent assay technique (Calpro AS, Lysaker, Norway). All FCAL measurements in this study were performed using the same protocol and assay.

2.4 Statistical Analysis

Descriptive statistics are presented as median and interquartile range (IQR) for continuous variables. Frequencies with percentages are provided for categorical variables.

FCAL measurements greater than 2500μg/g were set to 2500 μg/g, the upper range for the assay. Likewise, measurements reported as less than the lower range for the assay, 20μg/g, were set to 20μg/g. FCAL values were log-transformed before the models were fitted. To model the FCAL trajectories and find clusters, we used LCMMs with longitudinal patterns captured using natural cubic splines ⁹. Natural cubic splines provide a flexible framework to model FCAL trajectories whilst remaining stable at either end of the study followup period ¹³. Using natural cubic splines results in fewer parameters needing to be estimated compared to polynomial regression which requires a high-degree polynomial to achieve the same level of flexibility ¹⁴ Between two and five knots were considered for the splines and their performance was compared using Akaike information criterion (AIC). Three knots were found to produce the optimal AIC within this range. The knots were placed at the first quartile, median, and third quartile of all FCAL measurement times. A full model description is provided as an Appendix.

LCMMs assuming two to six clusters were fitted. For each number of clusters, the optimal model was found via a grid search approach (50 runs with 10 maximum iterations) following the vignette provided as part of the lcmm R package. Models were deemed to converge based on parameter and likelihood stability, and on the negativity of the second derivatives. After each optimal model was found, the maximum log-likelihood, AIC, and Bayesian information criterion (BIC) were calculated. An alluvial plot was produced to provide intuition of how additional clusters are formed as the number of assumed clusters increases. These findings were used to decide on the appropriate number of clusters in our study population. Uncertainty in cluster assignments was quantified using posterior classification probabilities. To visualise overall trajectories within each cluster, point estimates for each of the model parameters were used, and statistical uncertainty was visualised using 95% confidence intervals.

Marginal associations between cluster membership and information available at the time of diagnosis were explored. Chi-square tests and Fisher’s exact tests, dependent on suitability, were used for categorical variables. ANOVA was used for continuous variables. Upper gastrointestinal inflammation (L4) and perianal disease (P) were tested separately to Montreal location (L1-L3) and Montreal behaviour (B1-B3) respectively.

Potential evidence of treatment effects was garnered by testing for associations between cluster membership and whether each treatment was prescribed within one year of diagnosis using Fisher’s exact test. Biologic prescriptions within three months of diagnosis were also considered to study potential earlier treatment effects..

A 5% significance level was used for all statistical tests. Bonferroni adjustments have also been used to provide adjusted p-values (p_adj).

As an exploratory analysis, a multinomial logistic regression model ¹⁵ and a random forest classifier ¹⁶ were used to predict cluster allocations using information available at the time of diagnosis and biologic prescriptions. For this purpose, a 75:25 train:test split with 4-fold cross validation was used ¹⁷. Classification performance was assessed via area under the curve (AUC) extended to multiple classes ¹⁸.

R ¹⁹ (v.4.2.1) was used for all statistical analyses using the lcmm ²⁰ (v.1.9.5), survival²¹ (v.3.3-1), survminer ²² (v.0.4.9), nnet ²³ (v.7.3-17), ranger ²⁴ (v.0.13.1), datefixR²⁵ (v.0.1.4), tidyverse²⁶ (v.1.3.1), tidymodels ²⁷ (v.0.2.0), vip ²⁸ (v0.3.2) and ggalluvial²⁹ (v.0.12.3) R libraries. The analytical reports generated for this study and corresponding source code are hosted online^*.

2.5 Ethics

As this study was considered a retrospective audit due to all data having been collected as part of routine clinical care, no ethical approval or consent was required as per UK Health Research Authority guidance. Caldicott guardian approval (NHS Lothian) was granted (Project ID: 18002).

3 Results

3.1 FCAL Measurements

356 subjects with incident CD met the inclusion criteria for this study (Figure 1, Table 1). Across all patients, 2856 FCAL measurements were recorded within five years of diagnosis. The median frequency of FCAL measurements for a subject within this period was 7 (IQR 5-10). The overall distribution is presented in Figure S1.

View this table:

Table 1:

Cohort characteristics and treatments prescribed to the cohort. All prescriptions were prescribed within one year of diagnosis unless otherwise stated. Percentages when stratified across clusters are out of the total number of subjects in the cluster. Biologic is defined as either infliximab, adalimumab, ustekinumab or vedolizumab prescription. † Perianal disease may be present concomitantly to B1, B2 or B3 disease behaviour or separately. ‡ Upper gastrointestinal inflammation may be present in addition to ileal, colonic, or ilealcolonic inflammation. p unadjusted p-value. p_adj p-value after Bonferroni correction. ^* Significant at a 5% significance level. ^** Significant at a 1% significance level. ^*** Significant at a 0.1% significance level.

Figure 1:

Flowchart demonstrating data processing steps. FCAL: faecal calprotectin.

3.2 Modelling FCAL Trajectories

LCMMs fitted with two to six assumed clusters all converged as per default convergence criteria. As seen in Figure 2, cluster assignments were largely stable across differing assumed clusters, particularly when comparing the 3-cluster, 4-cluster and 5-cluster models. Performance metrics for each model considered are provided in Table S1. BIC suggested the 2-cluster model was most appropriate, but this model was discarded as visual inspection of the inferred trajectories suggested a larger number of distinct clusters (Figure S2). AIC and the maximum log-likelihood favoured the 5-cluster and 6-cluster models, respectively. However, those models were found to overfit the data as some of the inferred trajectories were similar (Figure S4 and Figure S5). Therefore, as a parsimonious choice, we selected the 4-cluster model.

Figure 2:

Alluvial plot demonstrating how cluster membership obtained from the faecal calprotectin profiles of Crohn’s disease patients changes as the assumed number of clusters increase. The height of each band indicates the size of each cluster.

Figure 3 presents the log mean profiles for the 4-cluster model alongside subject-specific observed FCAL trajectories. The model identified three main groups of patients: clusters 1, 2 and 3 (92, 191, and 58 subjects, respectively) and a small cluster 4 with 15 subjects. Clusters 1 and 3 display similar profiles — both showing a sharp decrease in FCAL which then remains low. However, cluster 1 is differentiated by the decrease occurring immediately after diagnosis, whilst this decrease does not occur until around a year after diagnosis for cluster 3. In contrast, cluster 2 is characterised by a mean profile which remains consistently high: never dropping below the 250μg/g clinical threshold for disease activity. Finally, the mean profile for cluster 4 exhibits an initial decrease, but this is not sustained during the first 3 years.

Figure 3:

Log-transformed subject-specific five-year faecal calprotectin profiles for the study cohort for A, cluster 1; B, cluster 2; C, cluster 3; D, cluster 4. The red solid line represents the predicted mean trajectory for each cluster, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject. The blue dotted line indicates an FCAL of log(250 μg/g): the commonly accepted threshold for biochemical remission in Crohn’s disease. See Figure S6 for the fits in the original measurement scale.

3.3 Association with Variables Available at Diagnosis

Out of the eight variables typically available at diagnosis we tested for association with class membership, two variables were found to be significant at the 5% significance level before applying a Bonferroni adjustment: smoking status (p = 0.01; p_adj = 0.08) and the presence of upper gastrointestinal inflammation (p < 0.001; p_adj = 0.002). 24% and 23% of cluster 1 and cluster 2 respectively were smokers when they were diagnosed, whereas only 7% of cluster 3 and cluster 4 smoked during this period. Only 9% of cluster 1 had upper gastrointestinal involvement at diagnosis in comparison to the 27%, 34%, and 33% in cluster 2, cluster 3, and cluster 4 respectively.

3.4 Association with Treatments

A difference in the percentage of subjects prescribed a biologic therapy within one year of diagnosis was observed across classes (Table 1): 46% of class 1 were prescribed one of these treatments, compared to 18% and 21% for class 2 and class 3 respectively.

Out of the prescriptions considered, being prescribed a thiopurine within one year of diagnosis (p = 0.023; p_adj = 0.16) and being prescribed a biologic either within three months (p < 0.001; p_adj = 0.004) or one year of diagnosis (p < 0.001; p_adj < 0.001) were found to be significant before Bonferroni adjustment. However, class membership could not be predicted from demographic data and biologic prescriptions (AUC of 0.68 for the multinomial logistic regression model and 0.66 for the random forest classifier).

4 Discussion

In this study, four patient clusters in the CD population with distinct FCAL trajectories have been identified and described (Figure 3). To the best of our knowledge, we are the first to apply LCMMs to characterise latent patient heterogeneity using FCAL data, although others have applied linear mixed models to FCAL data ^7,8 or have applied LCMMs in other disease contexts ^30,31.

We have demonstrated that membership to these clusters is associated with smoking and upper gastrointestinal inflammation. A comparatively high number of subjects who smoked at diagnosis were found in both cluster 1 and cluster 2 despite cluster 1 being characterised by an overall decrease in FCAL and cluster 2 being characterised by a consistently high profile. The interpretation of this finding is not clear from our data. Previous research has found smoking to be associated with low drug concentrations for infliximab and adalimumab, mediating low remission rates in CD patients ³² in addition to being associated with undergoing surgery and disease progression ³³. Upper gastrointestinal involvement is likely a proxy for a more severe CD sub-phenotype. We also observed cluster membership to be associated with early biologic treatment. This is arguably reasonable given the often-reported association between FCAL and endoscopic activity and an association between biologic treatments and endoscopic healing for CD patients ^34,35.

The approach demonstrated here has notable advantages over the methodology used by the IBSEN study which required participants to choose which diagram they believed best described their disease activity out of four possible options ³. Using FCAL profiles allows us to quantify inflammation in an objective manner rather than using patient reported symptom activity which may be influenced by recency bias and the tendency for patient-reported data to exhibit extreme responses ³⁶.Furthermore, using FCAL allows longitudinal profiles to be generated in a data-driven manner. Instead of profiles needing to be generated based on prior beliefs and opinion, we can allow these profiles to be formed naturally. Finally, FCAL profiles can be readily generated for many CD patients from electronic healthcare records without requiring active involvement from study subjects.

Some similarities can be observed between the clinically derived profiles in the IBSEN cohort patterns and the cluster-specific mean profiles uncovered in this study. Both studies identified a large group of patients that exhibit a decline in severity of symptoms (cluster 1 and cluster 3 in our study) and a group with chronic continuous symptoms (cluster 2 in our study). However, the IBSEN study identified a group with increasing intensity of symptoms which was not found by our analysis. Such differences may be due to the disconnect between symptoms and inflammation which is commonly seen when using endoscopic activity scores ³⁷. Moreover, the IBSEN study findings were gathered before the widespread emergence of biologic therapies for CD and may not represent more modern trends which also may not be well known a priori: demonstrating the advantage of being able to infer subgroup profiles in a data-driven manner.

In this study, eight potential associations with variables typically available at diagnosis,and seven potential associations with treatments have been explored. As such, we potentially invite criticism due to multiple testing. Indeed, some associations reported here (e.g. between cluster membership and smoking) fail to be significant after applying Bonferroni corrections. However, we believe our findings here are biologically plausible and in line with other published literature.

The retrospective design of this study remains a limitation, and the results reported may be due to observational biases and should not be assigned a causal interpretation. In particular, quantifying causal treatment effects from such observational data is an active area of research and such analysis is beyond the scope of this study ^38,39. The data gathering process is observational and whilst FCAL is collected routinely at all clinical interactions, subjects with more complicated disease are still likely to have more measurements available. The retrospective study design also means all subjects did not have the same treatment options at the same stage in their disease trajectories, as subjects may have been diagnosed any time between 2005 and 2017. However, the date of diagnosis, converted to the number of days the subject was diagnosed after 01/01/2001, was considered for potential association with cluster membership and no significant association was found (p = 0.12). We also acknowledge the potential for inclusion bias in this study. The study by Plevris et al. required subjects to have an FCAL of at least 250μg/g at diagnosis and excluded subjects which met one of the endpoints within a year of diagnosis. The former potentially excludes subjects with milder disease, whilst the latter potentially excludes subjects with more aggressive disease.

The clusters reported here are intended purely for exploring heterogeneity in CD and are not intended for use as predictors in a risk score. Indeed, some FCAL measurements were taken after typical outcomes of interest (e.g. surgery), hence cluster membership information is not a suitable risk factor. However, our approach provides an objective way to characterise disease trajectory heterogeneity using a routinely collected inflammation marker, providing a proof of concept for novel longitudinal patient stratification in the context CD.

5 Conclusion

We have demonstrated the suitability and utility of latent class mixed modelling for identifying clusters within the CD population based on FCAL profiles. After we found and described four clusters, we reported cluster membership to be significantly associated with smoking and upper gastrointestinal involvement. We believe our findings are an important first step towards embracing longitudinal FCAL measurements to explain disease heterogeneity in CD.

Data Availability

The data used in this study is not publicly available as it originates from patients who have not given consent for the data to be publicly shared. For access to the data, please contact CWL.

https://vallejosgroup.github.io/lcmm-site/

6 Authorship

NC-C, KM-G, NP, REM, CWL, and CAV contributed to the conception and study design for the manuscript. NC-C, NP, LD, BG, and CWL collected the data for this study. All authors except REM had access to the study data. NC-C performed all statistical analysis. NC-C, BG, and KM-G drafted the manuscript. All authors were involved with critical revision of the manuscript, and all authors reviewed and approved the final manuscript prior to submission.

7 Data Availability

The data used in this study is not publicly available, as it originates from patients who have not given consent for the data to be publicly shared. For access to the data, please contact CWL.

8 Funding

This work was supported by the Medical Research Council & University of Edinburgh Precision Medicine PhD studentship (MR/N013166/1, to NC-C) and the UKRI Future Leaders Fellowship (MR/S034919/1, to CWL. KM-G was supported by an MRC University Unit grant to the MRC Human Genetics Unit. GRJ is supported by a Wellcome Trust Clinical Research Career Development Fellowship.

9 Conflicts of Interest

NC-C: none declared; KM-G: none declared; NP has received consultancy fees from Takeda, speaker fees and/or travel support from Abbvie, Takeda, Norgine; LAAPD has received consultancy fees from Sandoz, speaking fees from Janssen; BG has received consultancy fees from Abbvie; GRJ has received speaker fees from Abbvie, Takeda, Pfizer, Ferring and Janssen; REM: none declared; CWL has received research support from Abbvie and Gilead, consultancy fees from Abbvie, Pfizer, Janssen, Gilead, Celltrion, Pharmacosmos, Takeda, Vifor, Iterative Scopes, Trellus Health, Galapagos, Vifor Pharma, Bristol Meyers Squibb, Boehringer Ingelheim, Sandoz, Novartis, Fresnius, and Kabi Tillotts; speaker fees and/or travel support from Janssen, Abbvie, Pfizer, Dr Falk, Ferring, Hospira, GSK, and Takeda; CAV: none declared.

Supplemental Materials for Constantine-Cooke et al

Appendix A Supplementary Figures and Tables

Figure S1:

Distribution of number of FCAL measurements within five years of diagnosis per subject.

Figure S2:

Assuming two clusters, log-transformed subject-specific five-year faecal calprotectin profiles for the study cohort for A, cluster 1; B, cluster 2. The red solid line represents the predicted mean trajectory for each group, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject. The blue dotted line indicates an FCAL of log(250 μg/g): the commonly accepted threshold for biochemical remission in Crohn’s disease.

Figure S3:

Assuming three clusters, log-transformed subject-specific five-year faecal calprotectin profiles for the study cohort for A, cluster 1; B, cluster 2; C, cluster 3. The red solid line represents the predicted mean trajectory for each group, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject. The blue dotted line indicates an FCAL of log(250 μg/g): the commonly accepted threshold for biochemical remission in Crohn’s disease.

View this table:

Table S1:

Model fit statistics for latent class models fitted to the faecal calprotectin data for different numbers of latent subgroups. G: number of assumed clusters; AIC: Akaike information criterion; BIC: Bayesian information criterion.

Figure S4:

Assuming five clusters, log-transformed subject-specific five-year faecal calprotectin profiles for the study cohort for A, cluster 1; B, cluster 2; C, cluster 3; D, cluster 4; E, cluster 5. The red solid line represents the predicted mean trajectory for each group, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject. The blue dotted line indicates an FCAL of log(250 μg/g): the commonly accepted threshold for biochemical remission in Crohn’s disease.

Figure S5:

Assuming six clusters, log-transformed subject-specific five-year faecal calprotectin profiles for the study cohort for A, cluster 1; B, cluster 2; C, cluster 3; D, cluster 4; E, cluster 5; F, cluster 6. The red solid line represents the predicted mean trajectory for each group, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject. The blue dotted line indicates an FCAL of log(250 μg/g): the commonly accepted threshold for biochemical remission in Crohn’s disease.

Figure S6:

Five-year mean faecal calprotectin (FCAL) trajectories for clusters obtained by fitting latent class mixed models for A, cluster 1; B, cluster 2; C, cluster 3; D, cluster 4. The red solid line represents the predicted mean trajectory for each group, whilst the red dotted lines represent 95% confidence intervals. The grey lines indicate the trajectory of each subject.

Appendix B Statistical Methods

Formal Definition of the Model

We assume a population of N individuals is heterogeneous and composed of G latent classes (or clusters): each characterised by a distinct mean profile of FCAL (in logarithmic scale) across time. We assume each subject i has a vector of repeated FCAL measurements of length n_i: allowing the number of measurements to differ across subjects. Random effects specification are used to capture intra-individual correlation in FCAL measurements. We allow each subject i to belong to only one latent class and introduce a discrete random variable c_i which is equal to g if subject i belongs to the latent class g, where g = 1, …, G.

The logarithm of the FCAL measurement for the ith subject taken at time t_ij is denoted by Y_ij. Given that the subject i belongs to class g, the latter is modelled using a latent class mixed model LCMM: where the vector of regression coefficients β_g capture class-specific fixed effects, u_ig denote random effects distributed such that u_ig ∼ N (0, B) (the variance-covariance matrix is shared across classes) and ϵ_ij indicates an independently distributed Gaussian error term with zero mean and variance

In (S1), X(t) = (1, X₁(t), X₂(t), X₃(t), X₄(t))^′ is a vector of time-dependent covariates used to capture non-linear dependency between log-FCAL values and time (the first element ensures the model includes an intercept term). These are defined using natural cubic splines with three knots (4 cubic polynomials) ¹. The natural cubic splines were calculated as a pre-processing step prior to estimating the model in (S1) using the ns function of the splines R library ². For this purpose, knots were located at the first, second and third quantiles of measurement times across all FCAL measurements.

The probability of c_i = g is given as a class specific probability and is described by a multinomial logistic model: where ξ_0g indicates the intercept for class g in this model. For identifiability, ξ_0G = 0.

After inferring all model parameters, posterior class-membership probabilities for each subject are given by: where Y_i denotes a vector of length n_i containing all longitudinal measurements recorded for subject i, X(t_i·) is a matrix (n_i × 4) comprised of all the corresponding time-dependent covariates for subject i, denotes the estimates obtained for all model parameters and corresponds to (S2) evaluated on Finally, denotes a multivariate normal density function with mean and variance covariance , where denotes an identify matrix with dimension n_i.

Footnotes

↵† Shared senior authorship
Manuscript has been revised to make clearer the model presented is intended purely as an explanatory model and not to be used for risk predictions in the clinic; Survival analysis followup is now across five years from diagnosis, the same followup period as faecal calprotectin observations (previously survival analysis followup was across all EHR data available).
↵^* https://vallejosgroup.github.io/lcmm-site/

References

[1].↵
Jones GR, Lyons M, Plevris N, et al. IBD prevalence in Lothian, Scotland, derived by capture–recapture methodology. Gut 2019; 68(11): 1953–1960. doi: 10.1136/gutjnl-2019-318936
OpenUrl Abstract/FREE Full Text
[2].↵
Hamilton B, Green H, Heerasing N, et al. Incidence and prevalence of inflammatory bowel disease in Devon, UK. Frontline Gastroenterol. 2021; 12(6): 461–470. doi: 10.1136/flgastro-2019-101369
OpenUrl Abstract/FREE Full Text
[3].↵
Henriksen M, Jahnsen J, Lygren I, et al. Clinical course in Crohn’s disease: Results of a five-year population-based follow-up study (the IBSEN study). Scand. J. Gastroenterol. 2007; 42(5): 602-610. PMID: 17454881doi: 10.1080/00365520601076124
OpenUrl CrossRef PubMed
[4].↵
D’Amico F, Nancey S, Danese S, Peyrin-Biroulet L. A practical guide for faecal calprotectin measurement: myths and realities. J. Crohns Colitis 2020; 15(1): 152–161. doi: 10.1093/ecco-jcc/jjaa093
OpenUrl CrossRef
[5].↵
Kennedy NA, Jones GR, Plevris N, Patenden R, Arnott ID, Lees CW. Association between level of fecal cal-protectin and progression of Crohn’s disease. Clin. Gastroenterol. Hepatol. 2019; 17(11): 2269-2276.e4. doi: 10.1016/j.cgh.2019.02.017
OpenUrl CrossRef
[6].↵
Plevris N, Lees CW. Disease Monitoring in Inflammatory Bowel Disease: Evolving Principles and Possibilities. Gastroenterology 2022; 162(5): 1456-1475.e1. doi: 10.1053/j.gastro.2022.01.024
OpenUrl CrossRef
[7].↵
De Vos M, Dewit O, D’Haens G, et al. Fast and sharp decrease in calprotectin predicts remission by infliximab in anti-TNF naïve patients with ulcerative colitis. J. Crohns Colitis 2012; 6(5): 557–562. doi: 10.1016/j.crohns.2011.11.002
OpenUrl CrossRef PubMed Web of Science
[8].↵
Zhulina Y, Cao Y, Amcoff K, Carlson M, Tysk C, Halfvarson J. The prognostic significance of faecal calprotectin in patients with inactive inflammatory bowel disease. Aliment. Pharmacol. Ther. 2016; 44(5): 495–504. doi: 10.1111/apt.13731
OpenUrl CrossRef PubMed
[9].↵
Proust-Lima C, Philipps V, Liquet B. Estimation of extended mixed models using latent classes and latent processes: The R package lcmm. J. Stat. Softw. 2017; 78(2): 1–56. doi: 10.18637/jss.v078.i02
OpenUrl CrossRef
[10].↵
Courvoisier D, Alpizar-Rodriguez D, Gottenberg J, et al. Rheumatoid arthritis patients after initiation of a new biologic agent: trajectories of disease activity in a large multinational cohort study. EBioMedicine 2016; 11: 302–306. doi: 10.1016/j.ebiom.2016.08.024
OpenUrl CrossRef
[11].↵
Jiang G, Luk AOY, Tam CHT, et al. Progression of diabetic kidney disease and trajectory of kidney function decline in Chinese patients with Type 2 diabetes. Kidney Int. 2019; 95(1): 178–187. doi: 10.1016/j.kint.2018.08.026
OpenUrl CrossRef PubMed
[12].↵
Plevris N, Fulforth J, Lyons M, et al. Normalization of fecal calprotectin within 12 months of diagnosis is associated with reduced risk of disease progression in patients with Crohn’s disease. Clin. Gastroenterol. Hepatol. 2021; 19(9): 1835–1844.e6. doi: 10.1016/j.cgh.2020.08.022
OpenUrl CrossRef
[13].↵
Elhakeem A, Hughes RA, Tilling K, et al. Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Med. Res. Methodol. 2022; 22(1): 68. doi: 10.1186/s12874-022-01542-8
OpenUrl CrossRef
[14].↵
James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning ch. 7: 297–317; Springer Texts in Statistics. Springer US. 2nd ed. 2021
OpenUrl
[15].↵
Kwak C, Clayton-Matthews A. Multinomial logistic regression. Nurs. Res. 2002; 51(6).
[16].↵
Breiman L. Random forests. Mach. Learn. 2001; 45(1): 5–32. doi: 10.1023/A:1010933404324
OpenUrl CrossRef
[17].↵
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: 241–247; Springer Series in Statistics. Springer New York. 2nd ed. 2009
[18].↵
Hand DJ, Till RJ. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 2001; 45(2): 171–186. doi: 10.1023/a:1010920819831
OpenUrl CrossRef
[19].↵
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2022.
[20].↵
Proust-Lima C, Philipps V, Diakite A, Liquet B. lcmm: extended mixed models using latent classes and latent processes. https://cran.r-project.org/package=lcmm; 2021. R package version: 1.9.3.
[21].↵
Therneau TM. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival; 2021. R package version 3.2-13.
[22].↵
Kassambara A, Kosinski M, Biecek P. survminer: Drawing Survival Curves using ‘ggplot2’. https://CRAN.R-project.org/package=survminer; 2021. R package version 0.4.9.
[23].↵
Venables WN, Ripley BD. Modern applied statistics with S. New York: Springer. fourth ed. 2002. ISBN 0-387-95457-0.
[24].↵
Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 2017; 77(1): 1–17. doi: 10.18637/jss.v077.i01
OpenUrl CrossRef
[25].↵
Constantine-Cooke N. datefixR: fix really messy dates in R. https://CRAN.R-project.org/package=datefixR; 2022. R package version 0.1.4
[26].↵
Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J. Open Source Softw. 2019; 4(43): 1686. doi: 10.21105/joss.01686
OpenUrl CrossRef PubMed
[27].↵
Kuhn M, Wickham H. Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles.. https://CRAN.R-project.org/package=tidymodels;.
[28].↵
Greenwell BM, Boehmke BC. Variable importance plots—An introduction to the vip package. The R Journal 2020; 12(1): 343–366. doi: 10.32614/RJ-2020-013
OpenUrl CrossRef
[29].↵
Brunson JC. ggalluvial: layered grammar for alluvial plots. J. Open Source Softw. 2020; 5(49): 2017. doi: 10.21105/joss.02017
OpenUrl CrossRef
[30].↵
Chapuis N, Ibrahimi N, Belmondo T, et al. Dynamics of circulating calprotectin accurately predict the outcome of moderate COVID-19 patients. EBioMedicine 2022; 80. doi: 10.1016/j.ebiom.2022.104077
OpenUrl CrossRef
[31].↵
Vistisen D, Andersen GS, Hulman A, Persson F, Rossing P, Jørgensen ME. Progressive Decline in Estimated Glomerular Filtration Rate in Patients With Diabetes After Moderate Loss in Kidney Function—Even Without Albuminuria. Diabetes Care 2019; 42(10): 1886–1894. doi: 10.2337/dc19-0349
OpenUrl Abstract/FREE Full Text
[32].↵
Kennedy NA, Heap GA, Green HD, et al. Predictors of anti-TNF treatment failure in anti-TNF-naive patients with active luminal Crohn’s disease: a prospective, multicentre, cohort study. Lancet Gastroenterol. Hepatol. 2019; 4(5): 341–353. doi: 10.1016/s2468-1253(19)30012-3
OpenUrl CrossRef
[33].↵
Lawrance IC, Murray K, Batman B, et al. Crohn’s disease and smoking: Is it ever too late to quit?. J. Crohns Colitis 2013; 7(12): e665–e671. doi: 10.1016/j.crohns.2013.05.007
OpenUrl CrossRef PubMed
[34].↵
Jusué V, Chaparro M, Gisbert JP. Accuracy of fecal calprotectin for the prediction of endoscopic activity in patients with inflammatory bowel disease. Dig. Liver Dis. 2018; 50(4): 353–359. doi: 10.1016/j.dld.2017.12.022
OpenUrl CrossRef PubMed
[35].↵
Narula N, Wong EC, Dulai PS, Marshall JK, Jairath V, Reinisch W. Comparative effectiveness of biologics for endoscopic healing of the ileum and colon in Crohn’s disease. Am. J. Gastroenterol. 2022; Publish Ahead of Print. doi: 10.14309/ajg.0000000000001795
OpenUrl CrossRef
[36].↵
Vaerenbergh YV, Thomas TD. Response Styles in Survey Research: A Literature Review of Antecedents, Consequences, and Remedies. Int. J. Public Opin. Res. 2012; 25(2): 195–217. doi: 10.1093/ijpor/eds021
OpenUrl CrossRef
[37].↵
Koutroumpakis E, Katsanos K. Implementation of the simple endoscopic activity score in Crohn’s disease. Saudi. J. Gastroenterol. 2016; 22(3): 183. doi: 10.4103/1319-3767.182455
OpenUrl CrossRef
[38].↵
Nogueira AR, Pugnana A, Ruggieri S, Pedreschi D, Gama J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022; 12(2). doi: 10.1002/widm.1449
OpenUrl CrossRef
[39].↵
Hammerton G, Munafò MR. Causal inference with observational data: the need for triangulation of evidence. Psychol. Med. 2021; 51(4): 563–578. doi: 10.1017/s0033291720005127
OpenUrl CrossRef

References

[1].
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: 241–247; Springer Series in Statistics. Springer New York. 2nd ed. 2009
[2].
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2022.

View the discussion thread.

Posted October 27, 2022.

Download PDF

Data/Code

Citation Tools

Subject Area

Gastroenterology

Subject Areas

All Articles

Addiction Medicine (410)
Allergy and Immunology (723)
Anesthesia (214)
Cardiovascular Medicine (3075)
Dentistry and Oral Medicine (347)
Dermatology (260)
Emergency Medicine (461)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1093)
Epidemiology (13006)
Forensic Medicine (13)
Gastroenterology (858)
Genetic and Genomic Medicine (4819)
Geriatric Medicine (444)
Health Economics (749)
Health Informatics (3047)
Health Policy (1100)
Health Systems and Quality Improvement (1123)
Hematology (406)
HIV/AIDS (955)
Infectious Diseases (except HIV/AIDS) (14310)
Intensive Care and Critical Care Medicine (878)
Medical Education (453)
Medical Ethics (119)
Nephrology (498)
Neurology (4593)
Nursing (244)
Nutrition (680)
Obstetrics and Gynecology (842)
Occupational and Environmental Health (764)
Oncology (2379)
Ophthalmology (672)
Orthopedics (268)
Otolaryngology (333)
Pain Medicine (300)
Palliative Medicine (87)
Pathology (514)
Pediatrics (1236)
Pharmacology and Therapeutics (517)
Primary Care Research (518)
Psychiatry and Clinical Psychology (3944)
Public and Global Health (7174)
Radiology and Imaging (1596)
Rehabilitation Medicine and Physical Therapy (951)
Respiratory Medicine (942)
Rheumatology (458)
Sexual and Reproductive Health (474)
Sports Medicine (402)
Surgery (508)
Toxicology (65)
Transplantation (219)
Urology (188)

[1] [1].↵
Jones GR, Lyons M, Plevris N, et al. IBD prevalence in Lothian, Scotland, derived by capture–recapture methodology. Gut 2019; 68(11): 1953–1960. doi: 10.1136/gutjnl-2019-318936
OpenUrl Abstract/FREE Full Text

[2] [2].↵
Hamilton B, Green H, Heerasing N, et al. Incidence and prevalence of inflammatory bowel disease in Devon, UK. Frontline Gastroenterol. 2021; 12(6): 461–470. doi: 10.1136/flgastro-2019-101369
OpenUrl Abstract/FREE Full Text

[3] [3].↵
Henriksen M, Jahnsen J, Lygren I, et al. Clinical course in Crohn’s disease: Results of a five-year population-based follow-up study (the IBSEN study). Scand. J. Gastroenterol. 2007; 42(5): 602-610. PMID: 17454881doi: 10.1080/00365520601076124
OpenUrl CrossRef PubMed

[4] [4].↵
D’Amico F, Nancey S, Danese S, Peyrin-Biroulet L. A practical guide for faecal calprotectin measurement: myths and realities. J. Crohns Colitis 2020; 15(1): 152–161. doi: 10.1093/ecco-jcc/jjaa093
OpenUrl CrossRef

[5] [5].↵
Kennedy NA, Jones GR, Plevris N, Patenden R, Arnott ID, Lees CW. Association between level of fecal cal-protectin and progression of Crohn’s disease. Clin. Gastroenterol. Hepatol. 2019; 17(11): 2269-2276.e4. doi: 10.1016/j.cgh.2019.02.017
OpenUrl CrossRef

[6] [6].↵
Plevris N, Lees CW. Disease Monitoring in Inflammatory Bowel Disease: Evolving Principles and Possibilities. Gastroenterology 2022; 162(5): 1456-1475.e1. doi: 10.1053/j.gastro.2022.01.024
OpenUrl CrossRef

[7] [7].↵
De Vos M, Dewit O, D’Haens G, et al. Fast and sharp decrease in calprotectin predicts remission by infliximab in anti-TNF naïve patients with ulcerative colitis. J. Crohns Colitis 2012; 6(5): 557–562. doi: 10.1016/j.crohns.2011.11.002
OpenUrl CrossRef PubMed Web of Science

[8] [8].↵
Zhulina Y, Cao Y, Amcoff K, Carlson M, Tysk C, Halfvarson J. The prognostic significance of faecal calprotectin in patients with inactive inflammatory bowel disease. Aliment. Pharmacol. Ther. 2016; 44(5): 495–504. doi: 10.1111/apt.13731
OpenUrl CrossRef PubMed

[9] [9].↵
Proust-Lima C, Philipps V, Liquet B. Estimation of extended mixed models using latent classes and latent processes: The R package lcmm. J. Stat. Softw. 2017; 78(2): 1–56. doi: 10.18637/jss.v078.i02
OpenUrl CrossRef

[10] [10].↵
Courvoisier D, Alpizar-Rodriguez D, Gottenberg J, et al. Rheumatoid arthritis patients after initiation of a new biologic agent: trajectories of disease activity in a large multinational cohort study. EBioMedicine 2016; 11: 302–306. doi: 10.1016/j.ebiom.2016.08.024
OpenUrl CrossRef

[11] [11].↵
Jiang G, Luk AOY, Tam CHT, et al. Progression of diabetic kidney disease and trajectory of kidney function decline in Chinese patients with Type 2 diabetes. Kidney Int. 2019; 95(1): 178–187. doi: 10.1016/j.kint.2018.08.026
OpenUrl CrossRef PubMed

[12] [12].↵
Plevris N, Fulforth J, Lyons M, et al. Normalization of fecal calprotectin within 12 months of diagnosis is associated with reduced risk of disease progression in patients with Crohn’s disease. Clin. Gastroenterol. Hepatol. 2021; 19(9): 1835–1844.e6. doi: 10.1016/j.cgh.2020.08.022
OpenUrl CrossRef

[13] [13].↵
Elhakeem A, Hughes RA, Tilling K, et al. Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Med. Res. Methodol. 2022; 22(1): 68. doi: 10.1186/s12874-022-01542-8
OpenUrl CrossRef

[14] [14].↵
James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning ch. 7: 297–317; Springer Texts in Statistics. Springer US. 2nd ed. 2021
OpenUrl

[15] [15].↵
Kwak C, Clayton-Matthews A. Multinomial logistic regression. Nurs. Res. 2002; 51(6).

[16] [16].↵
Breiman L. Random forests. Mach. Learn. 2001; 45(1): 5–32. doi: 10.1023/A:1010933404324
OpenUrl CrossRef

[17] [17].↵
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: 241–247; Springer Series in Statistics. Springer New York. 2nd ed. 2009

[18] [18].↵
Hand DJ, Till RJ. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 2001; 45(2): 171–186. doi: 10.1023/a:1010920819831
OpenUrl CrossRef

[19] [19].↵
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2022.

[20] [20].↵
Proust-Lima C, Philipps V, Diakite A, Liquet B. lcmm: extended mixed models using latent classes and latent processes. https://cran.r-project.org/package=lcmm; 2021. R package version: 1.9.3.

[21] [21].↵
Therneau TM. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival; 2021. R package version 3.2-13.

[22] [22].↵
Kassambara A, Kosinski M, Biecek P. survminer: Drawing Survival Curves using ‘ggplot2’. https://CRAN.R-project.org/package=survminer; 2021. R package version 0.4.9.

[23] [23].↵
Venables WN, Ripley BD. Modern applied statistics with S. New York: Springer. fourth ed. 2002. ISBN 0-387-95457-0.

[24] [24].↵
Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 2017; 77(1): 1–17. doi: 10.18637/jss.v077.i01
OpenUrl CrossRef

[25] [25].↵
Constantine-Cooke N. datefixR: fix really messy dates in R. https://CRAN.R-project.org/package=datefixR; 2022. R package version 0.1.4

[26] [26].↵
Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J. Open Source Softw. 2019; 4(43): 1686. doi: 10.21105/joss.01686
OpenUrl CrossRef PubMed

[27] [27].↵
Kuhn M, Wickham H. Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles.. https://CRAN.R-project.org/package=tidymodels;.

[28] [28].↵
Greenwell BM, Boehmke BC. Variable importance plots—An introduction to the vip package. The R Journal 2020; 12(1): 343–366. doi: 10.32614/RJ-2020-013
OpenUrl CrossRef

[29] [29].↵
Brunson JC. ggalluvial: layered grammar for alluvial plots. J. Open Source Softw. 2020; 5(49): 2017. doi: 10.21105/joss.02017
OpenUrl CrossRef

[30] [30].↵
Chapuis N, Ibrahimi N, Belmondo T, et al. Dynamics of circulating calprotectin accurately predict the outcome of moderate COVID-19 patients. EBioMedicine 2022; 80. doi: 10.1016/j.ebiom.2022.104077
OpenUrl CrossRef

[31] [31].↵
Vistisen D, Andersen GS, Hulman A, Persson F, Rossing P, Jørgensen ME. Progressive Decline in Estimated Glomerular Filtration Rate in Patients With Diabetes After Moderate Loss in Kidney Function—Even Without Albuminuria. Diabetes Care 2019; 42(10): 1886–1894. doi: 10.2337/dc19-0349
OpenUrl Abstract/FREE Full Text

[32] [32].↵
Kennedy NA, Heap GA, Green HD, et al. Predictors of anti-TNF treatment failure in anti-TNF-naive patients with active luminal Crohn’s disease: a prospective, multicentre, cohort study. Lancet Gastroenterol. Hepatol. 2019; 4(5): 341–353. doi: 10.1016/s2468-1253(19)30012-3
OpenUrl CrossRef

[33] [33].↵
Lawrance IC, Murray K, Batman B, et al. Crohn’s disease and smoking: Is it ever too late to quit?. J. Crohns Colitis 2013; 7(12): e665–e671. doi: 10.1016/j.crohns.2013.05.007
OpenUrl CrossRef PubMed

[34] [34].↵
Jusué V, Chaparro M, Gisbert JP. Accuracy of fecal calprotectin for the prediction of endoscopic activity in patients with inflammatory bowel disease. Dig. Liver Dis. 2018; 50(4): 353–359. doi: 10.1016/j.dld.2017.12.022
OpenUrl CrossRef PubMed

[35] [35].↵
Narula N, Wong EC, Dulai PS, Marshall JK, Jairath V, Reinisch W. Comparative effectiveness of biologics for endoscopic healing of the ileum and colon in Crohn’s disease. Am. J. Gastroenterol. 2022; Publish Ahead of Print. doi: 10.14309/ajg.0000000000001795
OpenUrl CrossRef

[36] [36].↵
Vaerenbergh YV, Thomas TD. Response Styles in Survey Research: A Literature Review of Antecedents, Consequences, and Remedies. Int. J. Public Opin. Res. 2012; 25(2): 195–217. doi: 10.1093/ijpor/eds021
OpenUrl CrossRef

[37] [37].↵
Koutroumpakis E, Katsanos K. Implementation of the simple endoscopic activity score in Crohn’s disease. Saudi. J. Gastroenterol. 2016; 22(3): 183. doi: 10.4103/1319-3767.182455
OpenUrl CrossRef

[38] [38].↵
Nogueira AR, Pugnana A, Ruggieri S, Pedreschi D, Gama J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022; 12(2). doi: 10.1002/widm.1449
OpenUrl CrossRef

[39] [39].↵
Hammerton G, Munafò MR. Causal inference with observational data: the need for triangulation of evidence. Psychol. Med. 2021; 51(4): 563–578. doi: 10.1017/s0033291720005127
OpenUrl CrossRef

Longitudinal Faecal Calprotectin Profiles Characterise Disease Course Heterogeneity in Crohn’s Disease

Abstract

1 Introduction

2 Materials and Methods

2.1 Study Design

2.2 Criteria & Definitions

2.3 FCAL Assay

2.4 Statistical Analysis

2.5 Ethics

3 Results

3.1 FCAL Measurements

3.2 Modelling FCAL Trajectories

3.3 Association with Variables Available at Diagnosis

3.4 Association with Treatments

4 Discussion

5 Conclusion

Data Availability

6 Authorship

7 Data Availability

8 Funding

9 Conflicts of Interest

Supplemental Materials for Constantine-Cooke et al

Appendix A Supplementary Figures and Tables

Appendix B Statistical Methods

Formal Definition of the Model

Footnotes

References

References

Citation Manager Formats

Subject Area