Abstract
Background Genetic susceptibility to inflammatory bowel disease (IBD) has been widely studied, whereas the genetic contribution to disease progression over time remains largely unknown. In a unique population-based cohort, we explored whether genetic susceptibility to IBD is associated with disease course severity.
Methods In a Danish nationwide cohort of 3688 patients with Crohn’s disease (CD), 4491 patients with ulcerative colitis (UC), and 9469 controls, we estimated polygenic scores (PGS) for IBD susceptibility. We then investigated the association between susceptibility PGS and severe versus less severe disease courses. Patients were categorized as having a severe disease course if they had I) at least two IBD-related hospitalizations exceeding two days, II) at least two IBD-related major surgeries, III) one hospitalization and one major surgery not overlapping in time, or IV) a total use of at least 5000 mg systemic corticosteroids within the first three years after diagnosis. Secondary analyses explored the association with other severity measures including inflammatory markers, exposure to biologics and immunomodulators, and time to hospitalization and major IBD-related surgery. Statistical analyses included logistic, linear, and Cox proportional hazards regression, and predictive modeling using random forest.
Findings Patients with severe disease courses had higher susceptibility PGS than patients with less severe disease courses (CD: OR=1·27, P=7·73×10−10, UC: OR=1·35, P=7·29×10−17 per SD increase in susceptibility PGS). When comparing the highest versus lowest PGS quintile, we observed a hazard ratio (HR) for major surgery of 2·74 (P=7·19×10−18) in patients with CD and 2·04 (P=4·36×10−7) in patients with UC. A higher susceptibility PGS was also associated with longer and more frequent hospitalizations, higher levels of C-reactive protein, decreased hemoglobin, and a higher need for corticosteroids, immunomodulators, and biologic therapies.
Interpretation Patients with a higher genetic burden for developing IBD also experience a more severe disease course, suggesting a shared genetic architecture between disease susceptibility and severity.
Funding Danish National Research Foundation, DNRF148, Lundbeck Foundation, R313-2019-857, Novo Nordisk Foundation, NNF23OC0087616, Danish Colitis Crohn Association.
Introduction
Inflammatory bowel disease (IBD) is a chronic, progressive, immune-mediated intestinal disease with an increasing incidence and prevalence globally. The two most common subtypes of the disease are Crohn’s disease (CD) and ulcerative colitis (UC).1,2 Patients are often diagnosed in early adulthood and experience a serious, life-altering condition that does not currently have a curative treatment. The disease course is highly heterogeneous, ranging from relatively mild symptoms to severe disease characterized by uncontrolled gastrointestinal inflammation and repeated hospitalizations and surgeries.3,4 A better understanding of which patients will experience a more severe disease course would be the first step toward improving treatment of this subgroup. However, our current limited understanding of IBD and the factors driving its heterogeneity makes it difficult to anticipate a patient’s disease course.
Genome-wide association studies (GWAS) of IBD have primarily focused on disease onset by comparing patients with IBD to population controls. To date, more than 600 susceptibility loci have been identified.5,6 However, the genetic contribution to disease progression over time remains largely unknown. In an analysis of approximately 2700 patients with CD performed by Lee et al.,7 four individual loci were associated with the need for immunomodulators or gastrointestinal surgeries. These loci were not associated with disease susceptibility.
Polygenic scores (PGS) use the results from GWAS to calculate the aggregated effect of trait-associated variants carried by an individual. To date, PGS has primarily been used to model the individual risk of disease onset for complex diseases where a well-powered, large-scale GWAS exists. For IBD, PGS can predict disease onset with limited performance (area under the receiver operating characteristic curve (AUC) = 0·63) and only identifies ∼3·2% of the population with a threefold increased risk of IBD.8 As for genetics overall, very little is known about the impact of genetics on the course of IBD. Although a strong association between PGS and disease location has been reported,9 previous studies have failed to identify an association between susceptibility PGS and anti-TNF response,10 early onset,11 and CD severity stratified by disease location.7
In a large, unique population-based IBD cohort with both genetics and long-term clinical data, we investigated the hypothesis that an individual’s genetic burden, as measured by their IBD susceptibility PGS, impacts the severity of the disease course.
Methods and materials
Danish registries
The study was based on data from Danish registries, including the Danish Civil Registration System,12 The Danish National Patient Registry (DNPR13), The Danish National Prescription Registry,14 and The Danish nationwide Registry of Laboratory Results for Research (RLRR15). They hold information on inpatient hospital contacts, International Classification of Diseases 8th and 10th revision (ICD-8 and -10) codes, surgical and other procedures since 1977, outpatient contacts since 1995, redeemed prescriptions for Danish residents at all pharmacies since 1994, and results of biochemistry and hematological tests from hospitals and general practitioners since 2008 with complete coverage from 2015. We had data available until September 2022. All registry codes used are listed in Table S1.
Source populations
The PREDICT neonatal blood spot cohort (PREDICT-NBS cohort)
The Danish National Biobank stores neonatal blood spots from almost all Danes born since 1982.16 The PREDICT-NBS cohort includes neonatal blood spots from individuals born and diagnosed with IBD between April 1981 and December 2018. Identification of patients with IBD was based on previously described criteria.2 IBD patients should have at least two IBD-related hospital registrations within a two-year period recorded in the DNPR. Assignment of CD or UC was based on the two most recent hospital contacts if these were concordant, and otherwise according to the highest number of diagnoses. Date of diagnosis was the date of the first IBD-related hospital registration. Patients not living in Denmark during the two years before this first IBD-related hospital registration were excluded to ensure a valid date of diagnosis. Each IBD patient was matched to controls based on sex, date of birth, and no diagnosis of IBD prior to the patient’s diagnosis date. Matched controls that developed IBD from end of cohort selection (December 2018) to end of available registry data (September 2022) were considered IBD patients.
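The case definition above can be illustrated with a minimal Python sketch. It is not the registry code used in the study: the function name, the input layout, the 730-day reading of "two-year period", and the tie-break toward the most recent contact are all assumptions made for illustration.

```python
from datetime import date

def classify_ibd(contacts, window_days=730):
    """contacts: list of (registration_date, subtype), subtype in {"CD", "UC"}.

    Returns (diagnosis_date, subtype), or None if the case definition is not met.
    """
    contacts = sorted(contacts)
    # At least two IBD-related registrations within a two-year period
    # (assumption: two years = 730 days). After sorting, it is enough
    # to check adjacent registrations.
    close_pair = any((b[0] - a[0]).days <= window_days
                     for a, b in zip(contacts, contacts[1:]))
    if not close_pair:
        return None
    # Subtype: the two most recent contacts if concordant ...
    if contacts[-1][1] == contacts[-2][1]:
        subtype = contacts[-1][1]
    else:
        # ... otherwise the subtype with the highest number of diagnoses.
        n_cd = sum(1 for _, s in contacts if s == "CD")
        n_uc = len(contacts) - n_cd
        if n_cd == n_uc:
            subtype = contacts[-1][1]  # assumption: the text does not cover ties
        else:
            subtype = "CD" if n_cd > n_uc else "UC"
    # Date of diagnosis is the first IBD-related registration.
    return contacts[0][0], subtype
```

A patient with registrations coded UC, CD, CD would be assigned CD with the date of the first registration as date of diagnosis.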
The North Denmark IBD cohort (NorDIBD)
The NorDIBD cohort is a population-based cohort of all patients from the North Denmark Region with a confirmed IBD diagnosis from 1978 to 2020.17 From this cohort, 940 patients with IBD and a further 973 blood donors with no IBD diagnosis were genotyped from whole blood. The date of IBD diagnosis was based on patient health records.
Overlapping individuals between the two cohorts (n=233) were randomly excluded from one of the two cohorts. Details on genotyping, quality control, and generation of genetic principal components (PCs) have been described previously18 and are available in Supplement 1 and summarized in Figure S1.
Study population
We excluded patients diagnosed with IBD before January 1, 1996, and after September 2019, to ensure at least three years of post-diagnosis follow-up in the registries. This resulted in a final study population of 3688 patients with CD (3340 from the PREDICT-NBS cohort and 348 from the NorDIBD cohort), and 4491 patients with UC (4153 from the PREDICT-NBS cohort and 338 from the NorDIBD cohort) (Figure S2).
Polygenic score
We downloaded variant loadings for calculating PGS for CD and UC susceptibility generated by Middha et al.19 and applied them to our dataset. They generated the variant loadings from summary statistics from IIBDGC20 calibrated on UK Biobank data. We calculated PGS using Plink; 738,417 out of 744,682 (>99%) and 738,314 out of 744,575 (>99%) available variants were included for calculating CD and UC susceptibility PGS, respectively. We also calculated CD and UC susceptibility PGS for the controls. The mean (SD) of CD PGS across controls and CD patients was 1·00 (1·27) and the corresponding mean (SD) of UC PGS across controls and UC patients was −0·95 (1·01). The PGS were normalized to standard deviations (SDs) from the mean value to improve interpretability.
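The general idea of a PGS and its normalization can be sketched in a few lines of Python. This is a schematic illustration only, not the Plink procedure used here: the variant identifiers, the dictionary layout, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) over variants
    present in both the genotype data and the loading file."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)

def normalize(scores):
    """Express each score as the number of SDs from the mean."""
    mu, sd = mean(scores), pstdev(scores)  # assumption: population SD
    return [(s - mu) / sd for s in scores]

# Hypothetical variants and loadings, for illustration only.
weights = {"rs1": 0.5, "rs2": -0.2, "rs9": 1.0}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
raw = polygenic_score(person, weights)  # only rs1 and rs2 contribute
```

After normalization, a one-unit difference corresponds to one SD, which is the scale on which all effect estimates are reported.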
Outcomes
Our primary outcome was severe versus less severe disease course. We defined a severe IBD course based on having one of the following outcomes within three years after date of diagnosis: I) two or more IBD-related hospitalizations exceeding two days, II) two or more IBD-related major surgeries, III) one IBD-related hospitalization and one IBD-related major surgery not overlapping in time, or IV) a total use of at least 5000 mg systemic corticosteroids (Figure S3). The threshold for systemic corticosteroids was set based on visualization of the data. Patients not fulfilling these criteria were considered less severe.
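The four criteria can be expressed as a small decision rule. The Python sketch below is illustrative and not the study's registry code: the `Course` container is hypothetical, inputs are assumed to be restricted to the three years after diagnosis, length of stay is assumed to be discharge minus admission in days, and a surgery is assumed to overlap a hospitalization when its date falls within the stay.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Course:
    """Events within three years after diagnosis (hypothetical container)."""
    hospitalizations: list  # (admission, discharge) date pairs
    surgeries: list         # dates of IBD-related major surgeries
    steroid_mg: float       # cumulative systemic corticosteroid dose

def is_severe(c: Course) -> bool:
    # I) two or more hospitalizations exceeding two days
    long_stays = sum(1 for adm, dis in c.hospitalizations
                     if (dis - adm).days > 2)
    # III) one hospitalization and one major surgery not overlapping in time
    separate = any(not (adm <= s <= dis)
                   for adm, dis in c.hospitalizations
                   for s in c.surgeries)
    return (long_stays >= 2                 # I
            or len(c.surgeries) >= 2        # II
            or separate                     # III
            or c.steroid_mg >= 5000)        # IV
```

A patient meeting none of the four conditions is classified as less severe.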
Secondary outcomes were age at diagnosis, and the following outcomes within three years after date of diagnosis: two or more IBD-related hospitalizations exceeding two days, two or more IBD-related major surgeries, one or more IBD-related major surgeries, a total use of at least 5000 mg systemic corticosteroids, treatment with biologics, treatment with immunomodulators, other immune-mediated inflammatory diseases (IMIDs), and days spent in hospital due to IBD. Additionally, we analyzed time to first IBD-related hospitalization and time to first IBD-related major surgery. Finally, we examined markers of inflammation, i.e., C-reactive protein (CRP), fecal calprotectin (f-calpro), and hemoglobin, at time of diagnosis and during follow-up. The at-diagnosis interval was defined as ± six months from the date of diagnosis, and the post-diagnosis phase was defined as three years starting from six months after date of diagnosis. Definitions and codes for primary and secondary outcomes are listed in Table S1.
Statistical analyses
We analyzed the association between susceptibility PGS and the primary outcome (severe versus less severe IBD) using logistic regression, adjusting for sex, cohort, calendar year, age at diagnosis, and the first ten PCs. A similar model was used for the analysis of the binary secondary outcomes (two or more IBD-related hospitalizations exceeding two days, two or more IBD-related major surgeries, one or more IBD-related major surgeries, a total use of at least 5000 mg systemic corticosteroids, treatment with biologics, treatment with immunomodulators, and other IMIDs). For the continuous secondary outcomes (age at IBD diagnosis and days spent in hospital due to IBD), we used linear regression adjusting for the above-mentioned covariates. All results are per SD increase of susceptibility PGS.
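For the binary outcomes, the model can be written schematically as follows. The notation is introduced here for illustration and does not appear in the original analysis: Y_i indicates a severe course for patient i, z_i is the normalized susceptibility PGS, and x_i collects the covariates (sex, cohort, calendar year, age at diagnosis, and the first ten PCs).

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_{\mathrm{PGS}}\, z_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i ,
\qquad
\mathrm{OR}_{\text{per SD}} = \exp\!\left(\beta_{\mathrm{PGS}}\right)
```

Because z_i is expressed in SD units, the exponentiated coefficient is the odds ratio per SD increase in PGS. For the continuous outcomes, the same linear predictor models the outcome directly and the coefficient is reported as β.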
For analysis of time to first hospitalization and first major IBD-related surgery, we divided patients into PGS quintiles and used Cox proportional hazards regression models to test for differences across PGS quintiles. These survival analyses included patients previously excluded based on not having three years of follow-up (CD=44, UC=44), and censoring was based on death, emigration, or end of study. The model included adjustment for sex, cohort, calendar year, age at diagnosis, and the first ten PCs.
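Division into PGS quintiles can be sketched as a rank-based split. This is a minimal illustration under assumptions, not the procedure used in the study: the function name is hypothetical, and ties are broken by input order.

```python
def assign_quintiles(scores):
    """Return a quintile label (1 = lowest PGS, 5 = highest) per score."""
    n = len(scores)
    # Indices ordered by increasing score; sorted() is stable,
    # so tied scores keep their input order (assumption).
    order = sorted(range(n), key=lambda i: scores[i])
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile

pgs = [0.3, -1.2, 2.0, 0.0, 1.1, -0.5, 0.9, -2.0, 1.5, 0.1]
print(assign_quintiles(pgs))  # [3, 1, 5, 2, 4, 2, 4, 1, 5, 3]
```

The resulting labels would enter the Cox model as a categorical covariate with the lowest quintile as reference, so that each hazard ratio compares a quintile with the lowest one.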
At- and post-diagnosis biochemistry test values of CRP, f-calpro, and hemoglobin were evaluated for association with susceptibility PGS (per SD increase) using a linear regression model including the above-mentioned covariates. Where there were multiple samples per individual, the sample closest to date of diagnosis was chosen for the at-diagnosis interval, whereas the median value was chosen for the post-diagnosis interval. Test values were transformed using log transformation for CRP and f-calpro to approach a Gaussian distribution.
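The selection of one value per patient and interval can be sketched as follows. The helper names are hypothetical, and the day counts used for "six months" and "three years", the inclusive and exclusive window boundaries, and the use of the natural logarithm are assumptions.

```python
from datetime import date, timedelta
from math import log
from statistics import median

SIX_MONTHS = timedelta(days=183)       # assumption
THREE_YEARS = timedelta(days=3 * 365)  # assumption

def at_diagnosis_value(tests, dx):
    """tests: list of (sample_date, value). Returns the value closest
    to the date of diagnosis within +/- six months, or None."""
    window = [(abs(d - dx), v) for d, v in tests if abs(d - dx) <= SIX_MONTHS]
    return min(window)[1] if window else None

def post_diagnosis_value(tests, dx):
    """Median of all values in the three years that start six months
    after the date of diagnosis, or None if there are no samples."""
    start = dx + SIX_MONTHS
    end = start + THREE_YEARS
    values = [v for d, v in tests if start < d <= end]
    return median(values) if values else None

def log_transform(value):
    """Applied to CRP and f-calpro before regression (natural log assumed)."""
    return log(value)
```

Patients without a sample in a given interval return `None` and would be left out of the regression for that interval.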
Reported p-values from tests of secondary outcomes and from biochemistry test values were adjusted for multiple testing using the Bonferroni correction.
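The Bonferroni correction multiplies each p-value by the number of tests in the family and caps the result at 1. A minimal Python sketch, with a hypothetical function name and made-up p-values:

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three illustrative (made-up) p-values from one family of tests.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With three tests, a raw p-value of 0·04 is adjusted to 0·12 and would no longer pass a 0·05 threshold.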
Prediction models
Random forest models were generated for predicting severe CD and UC based on patient information available at date of diagnosis. We fitted a baseline model including only the covariates sex, cohort, calendar year and age at diagnosis, parental IBD status, and presence of other IMIDs at time of diagnosis. The baseline model was expanded to evaluate the increase in predictive power of adding a) susceptibility PGS, b) CRP and f-calpro test results within ± six months from the date of diagnosis, and c) susceptibility PGS together with CRP and f-calpro.
The random forest models were fitted on a random subset of 80% of the data, of which the less severe group was randomly down-sampled to have the same number of patients as the severe group. Parameter settings were tuned using three repeats of ten-fold cross-validation. The final model evaluation was based on the remaining 20% validation data. The performance of the models was reported as the area under the receiver operating characteristic curve (ROC-AUC).
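The data handling around the model (80/20 split, down-sampling of the less severe group in the training data, and ROC-AUC on the held-out data) can be sketched in plain Python. The random forest itself is not fitted here, and the function names, the seed, and the 1/0 coding of severe versus less severe are assumptions.

```python
import random

def split_and_downsample(labels, train_frac=0.8, seed=0):
    """labels: 1 = severe, 0 = less severe.
    Returns (balanced training indices, held-out test indices)."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    severe = [i for i in train if labels[i] == 1]
    less = [i for i in train if labels[i] == 0]
    # Down-sample the less severe group to the size of the severe group.
    less = rng.sample(less, min(len(less), len(severe)))
    return severe + less, test

def roc_auc(labels, scores):
    """Probability that a random severe patient receives a higher score
    than a random less severe patient (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Down-sampling is applied to the training data only, so the held-out 20% keeps the original ratio of severe to less severe courses.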
Ethical approval
The PREDICT-NBS cohort was approved by the Research Ethical Committee of the Capital Region of Denmark (H-20048987) and no direct consent was required. The NorDIBD cohort was approved by the Research Ethical Committee of the North Denmark Region (N-20170005-N-20200071). Oral and written consents were obtained, and we followed up on whether any participant withdrew their consent.
Results
We included 8179 patients with IBD (3688 (45%) patients with CD and 4491 (55%) with UC) and 9469 control individuals (Table 1). Among patients with CD, the mean age at diagnosis was 21·7 years (SD=7·4), 56% were female, and 1165 patients (31·6%) had a severe disease course based on three years of follow-up. Among patients with UC, the mean age at diagnosis was 23·3 years (SD=8·4), 53% were female, and 1203 patients (26·8%) had a severe disease course. Control individuals had a mean normalized PGS of −0·221 for CD PGS and −0·235 for UC PGS. Patients with CD had a mean normalized CD PGS of 0·528, whereas patients with UC had a mean normalized UC PGS of 0·469.
Numbers (percentage) are reported in the table.
Higher genetic burden of IBD susceptibility is associated with a more severe disease course
The PGS of IBD susceptibility was significantly associated with severe versus less severe disease for both CD and UC (Figure 1A-B). Patients with severe CD had higher PGS than the remaining patients with CD (odds ratio [OR]=1·26, 95% confidence interval [CI] =1·17-1·35 per SD increase). Accordingly, 39·3% of CD patients in the highest PGS quintile experienced a severe disease course compared to 23·6% of CD patients in the lowest PGS quintile. This was also the case for patients with UC (OR=1·35, 95% CI=1·26-1·45 per SD increase), where 34·7% of patients in the highest PGS quintile experienced a severe disease course compared to 20·1% in the lowest PGS quintile.
Figure 1. A) Violin plot of PGS values of CD susceptibility (normalized to SDs from the mean) for controls and CD severity groups. Summary statistics are from logistic regressions. B) Violin plot of PGS values of UC susceptibility (normalized to SDs from the mean) for controls and UC severity groups. Summary statistics are from logistic regressions. We analyzed the association between the PGS of susceptibility for developing CD and UC and ten different severity outcomes. C) Binary outcomes were analyzed using logistic regression, and the OR (±95% CI) per SD increase in PGS is shown for CD and UC patients separately. D) Numerical outcomes were analyzed using linear regression, and the β estimates (±95% CI) per SD increase in PGS are shown for CD and UC patients separately. P-values were adjusted for multiple testing using the Bonferroni correction for ten tests.
CD: Crohn’s disease, UC: ulcerative colitis, IBD: inflammatory bowel disease, OR: odds ratio, PGS: polygenic score, SD: standard deviation, IMID: immune-mediated inflammatory disease, CI: confidence interval
We investigated the association between susceptibility PGS and the secondary severity outcomes within the follow-up period: two or more IBD-related hospitalizations exceeding two days, total hospitalization days, two or more IBD-related major surgeries, one or more IBD-related major surgeries, a total use of at least 5000 mg systemic corticosteroids, treatment with biologics, treatment with immunomodulators, occurrences of other IMIDs, and age at IBD onset (Figure 1C-D). In general, we observed that all individual components of the main outcome were associated with susceptibility PGS, with consistent effect sizes ranging from 1·24 to 1·50 and with overlapping CIs.
The most significant association between susceptibility PGS and CD severity was detected for treatment with immunomodulators (OR=1·47, 95% CI=1·36-1·58 per SD increase), treatment with biologic therapies (OR=1·35, 95% CI=1·25-1·45 per SD increase) and need for major surgery (OR=1·49, 95% CI=1·35-1·65 per SD increase). CD PGS was further associated with having at least two hospitalization events, a higher number of hospitalization days, use of at least 5000 mg of systemic corticosteroids, and with younger age at diagnosis.
The most significant association between susceptibility PGS and UC severity was detected for having at least two separate hospitalization events (OR=1·43, 95% CI=1·31-1·56 per SD increase) and for more days in hospital (β=1·68, 95% CI=1·26-2·09 days of hospitalization per SD increase). UC PGS further associated with need for treatment with biologic therapies and immunomodulators, use of at least 5000 mg of systemic corticosteroids, and with the need of major surgery.
Extending beyond three years of follow-up
Separating the CD and UC cohorts into quintiles based on susceptibility PGS revealed cumulative incidence curves ranking based on their PGS quintile, where the quintile with lowest PGS had the longest event-free time (Figure 2). This pattern was consistent for both time to hospitalization and time to major surgery for both CD and UC. Comparing the highest PGS quintile to the lowest revealed a quicker time to hospitalization (CD: hazard ratio [HR]=1·81, 95% CI=1·57-2·09, UC: HR=1·69, 95% CI=1·47-1·95) and to major surgery (CD: HR=2·74, 95% CI=2·18-3·45, UC: HR=2·04, 95% CI=1·55-2·69).
Figure 2. A) Cumulative incidence of IBD-related hospitalization split by CD susceptibility PGS quintiles within the CD cohort. B) Cumulative incidence of IBD-related major surgery split by CD susceptibility PGS quintiles within the CD cohort. C) Cumulative incidence of IBD-related hospitalization split by UC susceptibility PGS quintiles within the UC cohort. D) Cumulative incidence of IBD-related major surgery split by UC susceptibility PGS quintiles within the UC cohort.
IBD: Inflammatory bowel disease, UC: ulcerative colitis, CD: Crohn’s disease, PGS: polygenic score, Q1-Q5: Quintiles 1 to 5, where Q1 has the lowest quintile of PGS scores, and Q5 the highest.
Biochemistry profile is affected by susceptibility genetics
Increased susceptibility PGS was associated with disease activity markers of gastrointestinal and systemic inflammation (Figure 3). At diagnosis, f-calpro was increased (CD: β=0·32, 95% CI=0·23-0·42 mg/kg per SD increase, UC: β=0·23, 95% CI=0·14-0·33 mg/kg per SD increase) and hemoglobin decreased (CD: β=−0·11, 95% CI=−0·16 to −0·07 mmol/L per SD increase, UC: β=−0·08, 95% CI=−0·12 to −0·03 mmol/L per SD increase) with higher PGS, whereas CRP was only increased with higher PGS for patients with CD (β=0·17, 95% CI=0·10-0·24 mg/L per SD increase). For patients with CD, increased f-calpro levels remained significantly associated with higher PGS in the post-diagnosis phase (β=0·25, 95% CI=0·18-0·33 mg/kg per SD increase), whereas for patients with UC, decreased hemoglobin levels remained significant (β=−0·04, 95% CI=−0·07 to −0·01 mmol/L per SD increase).
Figure 3. Plot showing β estimates and their 95% CI for association between laboratory test results (CRP, f-calpro, and hemoglobin) and SD standardized susceptibility PGS for CD and UC separately. For each individual, the test results were calculated as follows: Diagnosis = test result closest to date of diagnosis (±six months), and post-diagnosis = the median value of all tests from six months to three years after date of diagnosis. Back-transformed estimates and 95% CIs are shown. P-values were adjusted for multiple testing using the Bonferroni correction for three tests. The dots are shaped based on significance after adjusting for multiple testing. Dot sizes are based on number of individuals with available test results in that given period.
CD: Crohn’s disease, UC: ulcerative colitis, PGS: polygenic score, SD: standard deviation, CRP: C-reactive protein, f-calpro: fecal calprotectin, CI: confidence interval, NS: not significant
Predicting IBD severity at the time of diagnosis
We constructed random forest models to separate severe and less severe disease courses of patients diagnosed with IBD using information available at time of diagnosis (Figure 4). The down-sampled, balanced dataset consisted of 367 severe and non-severe patients with CD and 440 severe and non-severe patients with UC with information on all predictors. The AUCs of the full models on the test data were 0·651 and 0·752 for CD and UC, respectively. Susceptibility PGS ranked as the most important predictor of CD severity and the second most important predictor of UC severity. Adding CRP and f-calpro values at time of diagnosis diminished the importance of PGS, and these markers ranked as the most important predictors of both CD and UC severity. Explicitly modelling HLA-DRB1*01:03,18 which is associated with both susceptibility to IBD and severity of UC, did not increase the performance of the model in UC (AUC=0·726).
Figure 4. Random forest models were fitted using patient information available at date of diagnosis as predictors. Receiver operating characteristic (ROC) curves for the performance on a test dataset are shown for A) CD and B) UC. C) Variable importance for predicting IBD severity is shown, ranked by decreasing mean decrease in Gini index.
CD: Crohn’s disease, UC: ulcerative colitis, IBD: inflammatory bowel disease, PGS: polygenic score, IMID: immune-mediated inflammatory disease, CRP: C-reactive protein, f-calpro: fecal calprotectin, TPR: true positive rate, FPR: false positive rate
Discussion
In this unique, population-wide study of >8000 patients with available GWAS data and long-term follow-up, we investigated the influence of IBD susceptibility genetics on the course of IBD using known susceptibility PGS for IBD and a range of longitudinal severity outcomes within the first three years, including hospitalizations, major surgeries, medication use, and test results from biochemistry laboratory tests.
We found an association between the aggregated genetic susceptibility to CD and UC and disease severity over time. In patients with CD, the susceptibility PGS associated most strongly with undergoing major surgery and needing treatment with biologics or immunomodulators. In patients with UC, the susceptibility PGS associated strongly with more frequent and longer hospitalizations, as well as with a need for systemic corticosteroids, immunomodulators, and biologics.
When splitting time to hospitalization and major surgery by PGS quintiles and introducing follow-up beyond three years, we observed an additive pattern for both CD and UC, where a higher genetic susceptibility to CD and UC associated with a proportionally higher hazard of hospitalization and major surgery.
Using longitudinal nationwide data from biochemistry laboratory tests, we further showed that the IBD susceptibility PGS is associated with more gastrointestinal inflammation in the form of higher values of f-calpro at time of diagnosis of both CD and UC and during follow-up in CD. Furthermore, susceptibility PGS associated with lower hemoglobin at diagnosis of CD and UC and during follow-up in UC; low hemoglobin is part of the diagnostic criteria for acute severe UC.21
Our prediction models of IBD severity showed that CRP and f-calpro at time of diagnosis were the most predictive markers for future severe disease and that adding susceptibility PGS did not improve predictive power. The performance of future prediction models could potentially be improved by the creation of a future IBD “severity PGS” based on genetics studies of IBD disease course.
Complementing the findings from Lee et al.7 and Cleynen et al.9, our results suggest that IBD susceptibility genetics only explain part of the severity of IBD, hence leaving room for other factors at play. These should be investigated together with genetics and included in future prognostic prediction models. This observation is promising, as it points towards the possibility of modulating the effects of genetic susceptibility through lifestyle interventions directed towards potential environmental factors that are yet to be uncovered. Future prediction models of IBD severity may also include the gut microbiota profile22 and biochemistry measurements in addition to CRP and f-calpro.23 The present study is hopefully a step towards the development of algorithms based on genetics, environmental exposures, and molecular pathways which in the future can identify patients in need of increased monitoring and intensified treatment to prevent their disease from progressing and improve their quality of life.
The primary strength of this study is its nationwide character and unselected nature, which assures generalizability of findings, as well as the complete longitudinal follow-up of patients. Identifying IBD patients using the Danish registries has previously been validated with a positive predictive value of 95%24 and completeness of 75-94%.24,25 A further strength is the use of unique population-based samples for genetic analyses, which was combined with lifetime health data.
Our study also has potential limitations, which need to be considered. The cohort is rather young, and our results may primarily reflect the course of IBD in this part of the patient population. Also, we did not have nationwide data on disease extent and behavior to study if the genetic burden of susceptibility modulates disease severity through these factors.9 However, the linearity of the association (especially for UC) between the different susceptibility PGS quintiles and hospitalization and surgery risk suggests that the effect is not limited to disease types that require more surgical interventions (ileal CD, extensive UC) and is therefore also driven by other factors. We used comparable definitions of severe disease for CD and UC, which may be debated; we therefore performed additional sensitivity analyses that demonstrated the consistency of the association. Laboratory analyses were not available for all patients and their availability may potentially reflect severity. However, no study has to our knowledge been able to test all these parameters over time in a cohort where genetics were also available.
Conclusion
This study presents novel evidence of an association between the genetic susceptibility burden and subsequent severity of IBD over time. Patients with a higher susceptibility PGS experience more hospitalizations, surgeries and medical treatments, suggesting that genetics of disease susceptibility and disease severity are not orthogonal. Our findings of a dose-response relationship between the genetic burden of IBD susceptibility and disease severity provide a new biological understanding of disease severity.
Declaration of interests
TJ reports consultancy for Ferring and Pfizer. LL reports speaker fee from Takeda and advisory board for Tillotts. HAJ reports speaker fee for Tillotts. The remaining authors have no disclosures.
Data sharing statement
This work uses data from the Danish National Health registries (https://sundhedsdatastyrelsen.dk), which are protected by the Danish Act on Processing of Personal Data. To access the data an application is required, which must be approved by the Danish Data Protection Agency and the Danish Health Data Authority. Handling of the data must be carried out by a Danish institution.
Acknowledgements
We thank the nurses at the outpatient clinic at the Department of Gastroenterology and Hepatology at Aalborg University Hospital, Denmark, for their important contribution to the sample collection.