Abstract
Type 1 diabetes (T1D) and other autoimmune diseases (AIDs) co-occur in families. We studied the aggregation of 50 parental AIDs with T1D in offspring and the contribution of a shared genetic background, which was partitioned into HLA and non-HLA variation. Leveraging nationwide registers of 7.2M Finns, including 58,284 family trios, we observed that 15 parental AIDs, such as coeliac disease and rheumatoid arthritis, were associated with an increased risk of T1D in offspring. The identified epidemiological associations were then tested by comprehensive genetic analyses performed on 470K Finns genotyped in the FinnGen study (12,563 trios). The within-family genetic transmission analysis further demonstrated that the aggregation of parental AIDs with offspring T1D could be partially explained by HLA and non-HLA polymorphisms in a disease-dependent manner. For example, the associations with offspring T1D for coeliac disease and psoriasis were mainly driven by HLA while autoimmune hypothyroidism and rheumatoid arthritis also had non-HLA contributors. We, therefore, proposed a novel parental polygenic score (PGS), integrating variations in both HLA and non-HLA genes, to understand the cumulative risk pattern of T1D in offspring. This raises an intriguing possibility of considering parental PGS, in conjunction with clinical diagnoses, to inform individuals about T1D risk in their offspring.
Type 1 diabetes (T1D) is an autoimmune disease (AID) characterized by a severe insulin deficiency resulting from the destruction of insulin-producing pancreatic beta cells. While the pathogenic mechanism remains largely unknown, T1D usually appears in genetically susceptible individuals1,2 and is triggered by environmental exposures3,4 - mostly before 20 years of age, with incidence increasing from birth and peaking at age 10-14 during puberty5–7. Notably, in many populations such as Europeans, Asians, and Latins, genetic polymorphisms in human leukocyte antigen (HLA) genes were reported to account for up to half of the T1D heritability, for which the strongest effects were attributable to two HLA class II haplotypes, HLA-DR3-DQ2 and DR4-DQ88–14. In addition to the well-documented large-effect HLA contribution, non-HLA susceptibility also plays a role2. Large-scale genome-wide association studies (GWAS) have identified dozens of independent non-HLA signals across the genome15–18, which together with HLA polymorphisms can be used to construct polygenic scores (PGS) to evaluate genetic predisposition to T1D prior to disease onset19–21 as well as to better differentiate T1D from other types of diabetes22–24.
Interestingly, T1D tends to co-occur with some other AIDs, both in the same individuals and within families (Supplementary notes 1). Previous studies mainly explored the co-occurrence from two angles. A popular and straightforward one was to adopt a population-based design and epidemiological analysis for the risk of T1D among people whose relatives were diagnosed with some AIDs. We have provided a detailed review of these studies in Table S1. To obtain conclusive findings, a sufficient number of trios or families is needed25–27. Another set of studies attempted to leverage genetic information for a better understanding of T1D-AID aggregation among genetically related individuals28–30. In one such study, among the one-third of children with T1D who were found to have a relative affected by an AID, having the HLA-DR3-DQ2 haplotype was associated with celiac disease (CD) in their extended family29. Whereas studies of this type mainly collected family history from questionnaires and focused only on HLA haplotype analysis due to data limitations, other studies systematically researched the disease associations of common variants in either HLA or non-HLA genes. Instead of individual-level data, many studies utilized publicly accessible summary statistics generated from population-based analyses. For example, a recent study captured intercorrelation of AIDs, such as T1D, rheumatoid arthritis (RA), and autoimmune hypothyroidism (HYPO), in two large HLA hotspots (locus 961 and 964)31. Further, GWAS revealed shared genetic susceptibility loci beyond the HLA regions for T1D and five other AIDs, including CD, RA, multiple sclerosis (MS), Crohn’s disease, and systemic lupus erythematosus32.
To date, there has been a lack of comprehensive investigation that combines the two angles and encompasses a wider spectrum of AIDs than CD, RA, HYPO and MS. Meanwhile, it remains an enigma why individuals with similar genetic backgrounds can develop such heterogeneous diseases ranging from potentially lethal insulin deficiency to mild hypothyroidism - even among members of the same family who also tend to share the same environment. One way to approach this question is to combine evidence from population-based multi-generational cohorts and large-scale biobanks with high-resolution diagnostic data, preferably of the same population to avoid any bias or confounding caused by across-population differences in socio-cultural contexts and clinical practices. In this study, we aim to comprehensively evaluate the genetic determinants of familial aggregation of T1D and other AIDs using the nationwide multi-generational health registers of the whole Finnish population in FinRegistry33, as well as the genomic and family trio data available in Finnish biobanks through the FinnGen Study34. Considering the world’s highest incidence of T1D in Finland35, these detailed and structured data resources provide us with a unique opportunity to quantify the associations between parental AIDs and offspring T1D, and more importantly, to answer three fundamental questions: 1) Which parental diagnoses of AIDs are associated with T1D in offspring? 2) To what extent is the association between parental AIDs and offspring T1D driven by shared genetic components, separately for HLA and non-HLA variants? And, 3) Can AID-related genetic information in parents complement disease family history when estimating the risk of having a child who develops T1D?
Results
Study populations and disease diagnoses
We leveraged nationwide socio-demographic and health registers of the entire Finnish population collected in the FinRegistry study33 (N = 7.2 million) and a subset of 473,681 genotyped Finns from the FinnGen study34 (Figure 1A and 1B). To avoid potentially biased estimates caused by incomplete medical records, we excluded individuals who had migrated in or out of Finland by 2019 (572,640 were excluded for FinRegistry and 9,196 for FinnGen). To have comparatively complete coverage of AID diagnoses for parents and T1D diagnoses for children, we considered parents born before 1976 and children born between 1960 and 1999, corresponding to at least 20 years of follow-up for children by the end of 2019 (Figure S2.2). We defined T1D cases as those who had their first diagnosis recorded before the age of 40. In total, we included 14,571 T1D cases in FinRegistry (0.61%) and 3,668 in FinnGen (0.83%). To examine the association between AIDs in parents and T1D in offspring, as well as the extent to which genetic factors could contribute to the identified association, we constructed 2.4 million family trios in FinRegistry based on Finnish multi-generation registers and 12,563 genotype-inferred trios in FinnGen (Methods). Of these trios, 14,571 in FinRegistry and 1,129 in FinnGen had children with T1D.
An overview of the study design and study populations. Panel A, the study population of FinRegistry (7.2 million individuals) represents every Finn alive in 2010 (5.3 million individuals) and their first-degree relatives. To maximize the coverage of diagnoses of AIDs for parents and T1D for children, we included only family trios with both parents born before 1976 and children born between 1960 and 1999 (the follow-up time in 2019 was at least 45 years for parents and 20 years for children). The solid lines denote the birth year range of the study population and the dashed lines the years of follow-up. Among the 2.4 million family trios, 14,571 had a child ever diagnosed with T1D (T1D trios), and for each of the T1D-trios, we matched three control trios based on sex, birth year, birthplace, and the number of siblings of the child, as well as birth years of both parents. In total, we included 14,571 T1D-trios and their matched 43,713 control trios. Panel B, the study population of FinnGen includes 473.7K Finns enrolled in a nationwide network of Finnish biobanks, of which 12,563 family trios can be accurately constructed with available genomic information. FinnGen includes 3,370 T1D cases, 385,786 controls, as well as 1,129 T1D-trios and 11,434 control trios. Panel C illustrates a matched case-control study conducted in the 58,284 FinRegistry family trios to examine the association between parents’ AID and the T1D status of offspring. The different symbols denote father, mother, and offspring, and colors the disease status (red, offspring with T1D; green, parents with AIDs; grey, individuals unaffected by T1D or other AIDs). Panel D depicts the examination of shared genetic components between T1D and other AIDs using population-based analyses (left panel with blue background) or trio-based family analyses (right panel with yellow background).
A haplotype and PGS-based analysis of HLA variants conducted in 473.7K FinnGen participants (left); a genetic correlation analysis of the non-HLA variants utilized GWAS summary statistic data (middle); and a polygenic transmission disequilibrium test (pTDT) examined in FinnGen participants whether the AID-associated common variants as a whole were over-transmitted from AID-unaffected parents to their T1D-affected offspring (right). Panel E, the average AID PGS of the parents and its predictive performance of T1D in offspring among 12,563 genotyped family trios in FinnGen. AID, autoimmune disease; T1D, type 1 diabetes; HLA, human leukocyte antigen; PGS, polygenic score; GWAS, genome-wide association study.
To have broad coverage of AIDs affecting different tissues or organ systems, we defined 50 AIDs or autoimmune-related disorders based on Finnish versions of the International Classification of Diseases (ICD) codes 8-10 (Supplementary notes 2.1). After excluding 13 AIDs with fewer than 50 parental cases in FinRegistry and 11 AIDs that were highly correlated with other AIDs (Table S2.1), we included 26 AIDs for further analyses: T1D, adrenocortical insufficiency (ADDISON), autoimmune hyperthyroidism (HYPER), autoimmune hypothyroidism (HYPO), autoimmune haemolytic anaemias (AIHA), allergic purpura, vitamin B12 deficiency anemia (B12A), idiopathic thrombocytopenic purpura (ITP), sarcoidosis, primary biliary cholangitis (PBC), celiac disease (CD), inflammatory bowel disease (IBD), IgA nephropathy, ankylosing spondylitis (AS), mixed connective tissue disease (MCTD), rheumatoid arthritis (RA), Sjögren’s syndrome (SjS), systemic sclerosis, Wegener granulomatosis (WG), systemic lupus erythematosus (SLE), Guillain-Barré syndrome (GBS), multiple sclerosis (MS), myasthenia gravis (MG), alopecia areata (AA), psoriasis, and vitiligo. The number of parental cases ranged from 55 for AIHA (corresponding to a prevalence of 0.05%) to 11,485 for HYPO (prevalence = 9.85%) (Table S2.1 and S3.1.1).
We designed four sets of complementary analyses (Figure 1). We first examined the association between parental AID and offspring T1D risk in FinRegistry. For AIDs showing significant epidemiological associations, we explored the HLA-related associations with haplotype and PGS-based analyses in FinnGen, and the genome-wide non-HLA associations with a summary statistic-based genetic correlation analysis. A polygenic transmission disequilibrium test (pTDT) was then used to examine the within-family transmission of AID genetic signals from AID-unaffected parents to their T1D-affected and unaffected offspring. Lastly, we proposed a novel parental PGS method, integrating both HLA and non-HLA contributors, to assess the risk of developing T1D in offspring.
Epidemiological associations between parental AIDs and T1D in offspring
We first wanted to examine the associations between parental AIDs and T1D in children using the large number of family trios available in the FinRegistry study. For each of the 14,571 trios with T1D-affected children, we employed a 1:3 matched case-control design considering the information of both the child (sex, birth year, birthplace, and number of siblings) and parents (birth year) (Methods and Figure 1C). For other available socio-demographic factors that were not used in matching, we saw limited differences between T1D trios and their matched controls (Table S3.1.2). Overall, children with T1D (42.2% [95% CI 41.4%, 43.0%]) were more likely to have AID-affected parent(s) compared to those without T1D (31.9% [31.5%, 32.3%]).
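The 1:3 matching described above can be sketched as follows. This is a minimal illustration that assumes exact matching on every listed variable and that skips cases whose stratum holds fewer than three controls; the field names (`trio_id`, `child_sex`, and so on) and the `match_controls` helper are hypothetical and are not taken from the study pipeline, which is described in Methods.

```python
import random
from collections import defaultdict

# Matching variables named in the text; the dictionary keys are hypothetical.
MATCH_FIELDS = (
    "child_sex", "child_birth_year", "child_birthplace",
    "n_siblings", "father_birth_year", "mother_birth_year",
)

def match_controls(case_trios, control_pool, ratio=3, seed=0):
    """Return {case trio_id: [control trios]} using exact matching.

    Controls are drawn without replacement, so no control trio is
    assigned to more than one case.
    """
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for trio in control_pool:
        by_key[tuple(trio[f] for f in MATCH_FIELDS)].append(trio)
    for candidates in by_key.values():
        rng.shuffle(candidates)  # random draw order within each stratum

    matched = {}
    for case in case_trios:
        key = tuple(case[f] for f in MATCH_FIELDS)
        candidates = by_key.get(key, [])
        if len(candidates) < ratio:
            continue  # assumption: drop cases that cannot be fully matched
        matched[case["trio_id"]] = [candidates.pop() for _ in range(ratio)]
    return matched
```

Drawing without replacement keeps the matched sets disjoint, which is one simple way to arrive at a fixed 1:3 ratio such as the 14,571 T1D trios and 43,713 control trios reported here.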
Of 26 parental AIDs examined in FinRegistry, fifteen were associated with increased risk of T1D in offspring at a nominal P-value <0.05, and nine remained statistically significant after Bonferroni correction for multiple testing (0.05/26 = 0.002) (Figure 2 and Table S3.2.1.1). The strongest association was seen for T1D (odds ratio (OR) [95% CI], 6.77 [5.44, 8.42], P=3.8×10−66), followed by CD (2.14 [1.90, 2.42], P=2.5×10−35), B12A (1.76 [1.54, 2.02], P=1.4×10−16), MCTD (1.54 [1.24, 1.92], P=8.5×10−5), HYPO (1.53 [1.46, 1.60], P=4.8×10−73), RA (1.48 [1.37, 1.60], P=1.8×10−23), sarcoidosis (1.45 [1.28, 1.65], P=1.1×10−8), HYPER (1.42 [1.20, 1.68], P=4.3×10−5), and SjS (1.33 [1.12, 1.59], P=1.6×10−3). In terms of sex, we did not observe significant differences either for parents or for offspring (Figure S3.2.2). Although fathers with T1D had a higher risk of having children with T1D than mothers with T1D (7.49 [5.68, 9.86] vs. 5.71 [4.01, 8.13]), as previously reported in the literature36–38, this difference was not statistically significant (Pdifference=0.24). We also noted that a paternal history of T1D could significantly impact the onset age of T1D among boys, resulting in T1D being diagnosed 1.85 [0.29, 3.41] years earlier compared to having a T1D-unaffected father (P=0.02) (Supplementary notes 3.2.3). A sensitivity analysis restricted to early-onset T1D (before age 20) yielded stronger associations for parental early-onset T1D and parental SLE than the main analysis (before age 40). For example, the OR increased from 6.77 [5.44, 8.42] to 10.90 [7.90, 15.04] for parental early-onset T1D (Pdifference=0.02) and from 1.29 [0.96, 1.73] to 1.77 [1.24, 2.51] for parental SLE (Pdifference=0.18) (Table S3.2.1.2).
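The Bonferroni threshold and the comparisons between odds ratios can be illustrated with a short calculation. The sketch below assumes that each standard error can be recovered from the reported 95% CI on the log scale and that the two estimates being compared are independent; the text does not state how Pdifference was computed, so this z-test is one plausible reading rather than the authors' procedure, and the helper names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()
_Z95 = _NORMAL.inv_cdf(0.975)  # about 1.96

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def log_or_se(ci_low, ci_high):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (log(ci_high) - log(ci_low)) / (2 * _Z95)

def p_difference(or_a, ci_a, or_b, ci_b):
    """Two-sided z-test for the difference between two log odds ratios.

    Assumes the two estimates are independent, which is an approximation
    when the underlying samples overlap.
    """
    se = sqrt(log_or_se(*ci_a) ** 2 + log_or_se(*ci_b) ** 2)
    z = (log(or_a) - log(or_b)) / se
    return 2 * (1 - _NORMAL.cdf(abs(z)))

# Paternal vs. maternal T1D, using the estimates quoted in the text.
p_parent_sex = p_difference(7.49, (5.68, 9.86), 5.71, (4.01, 8.13))
# Early-onset sensitivity analysis vs. main analysis for parental T1D.
p_onset = p_difference(10.90, (7.90, 15.04), 6.77, (5.44, 8.42))
```

Under these assumptions the two comparisons give P-values close to the reported 0.24 and 0.02, and `bonferroni_threshold(0.05, 26)` rounds to the 0.002 threshold used above.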
Epidemiological associations between parental AIDs and T1D in offspring ordered by P-value. Upper panel: Association (OR, and 95% CI) between parental AIDs and T1D in offspring using a matched case-control design in Finnish nationwide registry data (FinRegistry). Lower panel: number of T1D cases with parents having a given AID diagnosis. The dark green diamonds and bars indicate AIDs that are significantly associated with T1D after multiple testing corrections. AID, autoimmune disease; T1D, type 1 diabetes; OR, odds ratio; CI, confidence interval.
Shared genetic components from HLA and non-HLA variants at a population level
Parents and offspring share many environmental exposures and genetic components. We next aimed to quantify the extent to which the identified epidemiological associations are attributable to a shared genetic background. Given that T1D is known to be impacted by both HLA and non-HLA variations2,8–18, we analyzed HLA and non-HLA regions separately at a population level (Methods). We first examined the association of the main T1D predisposing / protective HLA haplotypes with other AIDs to understand to what extent a single haplotype can drive the identified epidemiological associations. Next, we combined multiple imputed HLA alleles to create multi-allelic HLA scores for each AID and examined their association with T1D. The analyses were conducted on 439,817 individuals from FinnGen (Figure 1B, Table S3.1.1). For non-HLA variations, we estimated the genetic correlation between T1D and other AIDs using the latest GWAS summary statistics.
Shared effect of HLA haplotypes for T1D and other AIDs
Given that for T1D, the strongest genetic associations are for Class II haplotypes, in particular HLA-DR3 and DR4 haplotypes8–14,39, we constructed 19 DRB1-DQA1-DQB1 haplotypes based on 64 imputed alleles of these genes in FinnGen40 (Methods).
We examined 19 haplotypes (frequencies > 0.5%, Table S3.3.1.1), of which six were strongly associated with T1D (P<1.0 ×10−50), with DRB1*04:01-DQA1*03:01-DQB1*03:02 (OR=7.54 [7.00, 8.13], P<1.0 ×10−602) conferring the strongest susceptibility and DRB1*15:01-DQA1*01:02-DQB1*06:02 (0.06 [0.04, 0.07], P=2.0×10−106) the strongest protection (Table S3.3.1.2). We then tested the association of these six haplotypes with the 24 other included AIDs using logistic regression and considering the age, sex, and first ten principal components (PCs) as covariates (Figure 3A). Many of these T1D-associated haplotypes were also associated with other AIDs (P<1.0 ×10−4) (Table S3.3.1.3). For example, the T1D risk haplotype DRB1*03:01-DQA1*05:01-DQB1*02:01 increased the risk of both CD (15.83 [14.56, 17.20], P<1.0 ×10−918) and autoimmune hyperthyroidism (2.42 [2.23, 2.63], P=6.3×10−100), and DRB1*04:01-DQA1*03:01-DQB1*03:02 increased the risk of RA (1.93 [1.84, 2.02], P=5.1 ×10−174). Of note, opposite effects were seen for some AIDs: the strongest haplotype protecting against T1D (DRB1*15:01-DQA1*01:02-DQB1*06:02) was the lead risk haplotype for MS41 (2.85 [2.60, 3.12], P=7.9×10−112) and the strongest T1D susceptibility haplotype (DRB1*04:01-DQA1*03:01-DQB1*03:02) protected against IBD (0.85 [0.80, 0.90], P=4.7×10−8).
Shared genetic background between T1D and other AIDs, stratified by HLA and non-HLA variations at a population level. Panel A, HLA haplotypes strongly associated with T1D and their associations with other AIDs in FinnGen (3,668 T1D cases and 436,149 T1D controls). The strength of the association is shown as a heatmap with blue depicting a susceptible haplotype associated with the AID in question. The haplotypes are sorted by the P-value for T1D as shown on the left. Panel B, upper panel: association (OR, and 95% CI) between HLA PGSs for different AIDs and T1D in FinnGen. The dark blue squares indicate that HLA PGS for a given disease has a significant association with T1D after multiple testing corrections. “×” denotes that an HLA PGS could not be robustly constructed for that AID (Methods). Bottom panel: blue bars represent the number of individuals with the given disease. Panel C, upper panel: non-HLA based genetic correlations (rg and 95% CI) between AIDs and T1D using GWAS summary statistics from European populations. Bottom panel: the number of cases for the given disease. The dark orange squares or bars indicate that the AIDs have a significant rg with T1D after multiple testing corrections. T1D, type 1 diabetes; AID, autoimmune disease; HLA, human leukocyte antigen; OR, odds ratio; CI, confidence interval; PGS, polygenic score; rg, genetic correlation; GWAS, genome-wide association study.
Multi-allelic scores (PGS) of HLA alleles for AIDs and T1D
Having observed that multiple HLA haplotypes could contribute to the susceptibility to T1D and other AIDs independently, we constructed polygenic HLA scores (HLA PGSs) to summarize the overall multi-allelic HLA effects for each AID. Considering the widespread variation in HLA allele frequencies across populations42, we opted to construct HLA PGSs within the FinnGen samples from 187 imputed Class I and Class II alleles using weighted ridge regression (Methods). We were able to construct reliable HLA PGSs (P≤0.05/26 and partial correlation |ρ|≥2% for predicting the susceptibility of the disease itself) in 23,336 individuals for 14 AIDs (including T1D). Of these, HLA PGSs for 13 AIDs exhibited significant associations with T1D susceptibility after multiple-test correction (0.05/14 = 0.003), with exceptions for psoriasis (P=3.6×10−3) and AS (P=6.1×10−2) (Figure 3B and Table S3.3.2.1). Overall, the effects of AID HLA PGS on individuals’ own T1D risk were highly correlated with the effects of parental AIDs on offspring T1D obtained from our FinRegistry epidemiological analyses (ρ from a Spearman’s rank correlation = 0.63, P=6.1×10−4). Among all AIDs other than T1D (OR=5.33 [4.79, 5.93], P =3.74×10−207), the strongest associations were seen for ADDISON (OR=2.30 [2.13, 2.49], P =4.2×10−100), RA (OR=2.10 [1.95, 2.27], P=1.8×10−78), and CD (OR=2.07 [1.94, 2.22], P =2.4×10−96). Consistent with the HLA haplotype analysis, we observed negative associations with T1D for HLA PGS of IBD (OR=0.64 [0.60, 0.69], P=1.2×10−30) and MS (OR=0.81 [0.75, 0.88], P=2.4×10−7). These associations were further replicated with a distinct analytic approach - leave-one-group-out (LOGO) (Supplementary notes 2.3.2 and Table S3.3.2.2).
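Once per-allele weights are available, the scoring step of a multi-allelic HLA PGS reduces to a weighted sum of allele dosages. The sketch below shows only that scoring step and a standardization, not the weighted ridge regression used to estimate the weights; the two weights are made-up illustrative numbers (chosen only to have the risk and protective signs discussed above), and `hla_pgs` and `standardize` are hypothetical helper names.

```python
from statistics import mean, pstdev

def hla_pgs(dosages, weights):
    """Weighted sum of imputed allele dosages (0, 1, or 2 copies) for one person."""
    return sum(weights[allele] * d for allele, d in dosages.items() if allele in weights)

def standardize(scores):
    """Rescale scores to mean 0 and SD 1 so that effects are per 1 SD."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

# Illustrative weights only; these are NOT estimates from the study.
weights = {"DRB1*04:01": 0.9, "DQB1*06:02": -1.2}
people = [
    {"DRB1*04:01": 2, "DQB1*06:02": 0},
    {"DRB1*04:01": 1, "DQB1*06:02": 0},
    {"DRB1*04:01": 0, "DQB1*06:02": 1},
]
raw_scores = [hla_pgs(person, weights) for person in people]
z_scores = standardize(raw_scores)
```

Standardizing the score is what allows an odds ratio to be read as the change in T1D odds per 1 SD increase in a given AID's HLA PGS.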
Genome-wide non-HLA correlation between T1D and AIDs
In addition to the major HLA contributors, non-HLA variants identified in GWAS are known to impact AID susceptibility as well. We, therefore, explored the genetic correlation between T1D and other AIDs using a summary statistics-based method - LD score regression (LDSC)43,44, excluding the HLA regions (Methods). Significant positive genetic correlations (rg) were observed between T1D and ten AIDs (P<0.05/25 after Bonferroni correction) (Figure 3C), including HYPO, CD, RA, B12A, sarcoidosis, HYPER, vitiligo, PBC, MG, and SLE. The first six AIDs were also among the parental AIDs that exerted significant effects on offspring T1D in our FinRegistry epidemiological analyses (ρ from a Spearman’s rank correlation=0.30, P=0.15). The highest rg was seen for B12A (rg=0.48 [0.34, 0.63], P=6.4×10−11), followed by HYPO (0.43 [0.34, 0.51], P=2.9×10−23) and RA (0.43 [0.25, 0.61], P=3.9×10−6) (Table S3.3.3.1). In contrast to the negative HLA associations, we did not observe a significant non-HLA genetic correlation between T1D and MS or between T1D and IBD. A sensitivity analysis excluding all chromosome 6 variants yielded similar estimates, suggesting that the observed non-HLA results were not driven by variants in high linkage disequilibrium with HLA alleles (Table S3.3.3.2).
In summary, both the HLA and non-HLA analyses recapitulated the multi-generational epidemiological effects observed in nationwide registers. While the results confirmed that the risk of T1D was positively associated with the risk of many other AIDs for both HLA and non-HLA genetic variants (e.g., HYPO, CD, RA), the opposite held for MS and IBD with respect to HLA susceptibility, whereas no genetic correlation was observed in non-HLA regions.
Transmitted and non-transmitted genetic liability within families
Polygenic transmission disequilibrium for HLA and non-HLA genetic factors
Following the population-based epidemiological evidence for clustering of other AIDs and T1D in the trios, and the genetic evidence for a shared genetic origin of the AIDs and T1D at a population level, we wanted to examine the transmission of HLA and non-HLA genetic liability within families using a polygenic transmission disequilibrium test (pTDT)45. The pTDT examines whether the AID risk variants are observed to be over-transmitted (or under-transmitted) to the offspring with T1D (compared to the expected transmission rate) and thus it is immune to many of the potential confounders arising from population studies on unrelated individuals. We considered 12,563 trios (genotyped in FinnGen), of which 1,159 had offspring with T1D (9.2%). The prevalence of T1D in these trios was higher than expected in the general population because of the inclusion of several studies that specifically targeted individuals with T1D. We analyzed 10 AIDs with reliable PGSs (ρ≥2%) for both HLA and non-HLA genes and six additional diseases with reliable PGSs only for the HLA regions (Methods; Supplementary notes 3.3.2 and Table S3.4.1). For offspring with T1D, both HLA and non-HLA PGSs for T1D deviated significantly from their mid-parent value (1.23 [1.16, 1.30], P=2.6×10−168 and 0.69 [0.62, 0.75], P=9.6×10−74) while no such deviation was observed in unaffected siblings (−0.06 [−0.13, 0.01], P=0.08 and 0.00 [−0.06, 0.05], P=0.88) (Figure 4 and Table S3.4.2). Overall, two major patterns were seen across the analyzed AIDs. The first group encompassed T1D, HYPO, RA, and SLE, for which significant over-transmission in offspring with T1D was seen for both HLA and non-HLA PGSs. The second group exhibited significant transmission only for HLA but not non-HLA PGSs: while psoriasis exhibited significant over-transmission in offspring with T1D compared to unaffected siblings, IBD and MS had a significant under-transmission. Taking IBD as an example, among 2,040 genotyped trios, 756 offspring with T1D presented under-transmitted HLA PGS (−0.38 [−0.45, −0.31], P=3.1×10−22) while their 1,269 unaffected siblings had comparable PGS to their parents (−0.01 [−0.08, 0.06], P=0.17). No disease showed a significant pTDT for the non-HLA PGS without also showing a significant pTDT for the HLA PGS.
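The pTDT statistic can be sketched in a few lines. The version below follows the commonly used definition: each offspring's PGS is compared with the mid-parent PGS, the difference is scaled by the standard deviation of the mid-parent PGS, and the mean deviation is tested against zero. The P-value here uses a normal approximation to the one-sample t-test, which is an assumption that is reasonable only for large samples, and `ptdt` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ptdt(trios):
    """Polygenic transmission disequilibrium for (father, mother, child) PGS triples.

    Returns the mean deviation in units of mid-parent SD, a 95% CI,
    the test statistic, and a two-sided P-value (normal approximation).
    """
    trios = list(trios)
    mid_parent = [(father + mother) / 2 for father, mother, _ in trios]
    sd_mid = stdev(mid_parent)
    deviations = [(child - mp) / sd_mid
                  for (_, _, child), mp in zip(trios, mid_parent)]

    estimate = mean(deviations)
    se = stdev(deviations) / sqrt(len(deviations))
    t_stat = estimate / se
    normal = NormalDist()
    z95 = normal.inv_cdf(0.975)
    return {
        "deviation": estimate,
        "ci": (estimate - z95 * se, estimate + z95 * se),
        "t": t_stat,
        "p": 2 * (1 - normal.cdf(abs(t_stat))),
    }
```

A positive deviation corresponds to over-transmission of the AID-associated alleles to the offspring group being tested, and a negative deviation to under-transmission, as seen for the IBD HLA PGS in offspring with T1D.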
Polygenic transmission disequilibrium tests (pTDT) assess whether AID PGSs transmitted to offspring significantly deviated from the parental PGS in T1D-affected offspring and their unaffected siblings. Deviations from parental PGS (mean and 95% CI) of 1 SD change in offspring PGS are shown separately for HLA and non-HLA PGSs, with the P value obtained from a two-sided t-test. Red color denotes offspring with T1D and gray unaffected siblings. AID, autoimmune disease; PGS, polygenic score; T1D, type 1 diabetes; CI, confidence interval; SD, standard deviation; HLA, human leukocyte antigen.
Offspring T1D risk based on parental AIDs clinical history and genetic information
The PGS for T1D has shown potential to predict disease susceptibility before its onset19–21. We asked whether T1D risk in offspring can be assessed using a couple’s genetic information before they plan to have a child, and how this compares to estimating the risk based on family history of T1D and other AIDs, as is currently being done in clinical practice. To address this question, we adopted a novel strategy to construct a full PGS (Full-PGS) by considering the contribution ratio of HLA PGS relative to non-HLA PGS (Methods; Supplementary notes 3.5). We tested eight AIDs (T1D, HYPO, RA, SLE, CD, psoriasis, IBD, and MS) with at least one reliable PGS in HLA regions or non-HLA regions for T1D prediction (|ρ|≥1%) (Table S3.5.1) using five-fold cross validation. By modelling the HLA and non-HLA variations separately and using imputed HLA alleles (187 alleles in 10 classical HLA genes rather than common genotyped SNPs), the Full-PGSs we proposed outperformed the standard SNP-based PGS methods such as PRS-CS46 at an individual level (Table S3.6.1; Table S3.6.2) and when predicting T1D in offspring using AID-specific parental PGS among the 12,563 FinnGen genotyped trios (Table S3.6.3; Figure S3.6.4).
Having established that a parental Full-PGS for T1D was strongly associated with T1D risk in offspring (AUCmean=0.817), we tested whether the parental Full-PGSs for all the eight AIDs could add any additional information to the parental Full-PGS for T1D. However, limited difference was seen by integrating the PGS of eight AIDs (AUCmean: 0.817 vs 0.820, Pdifference=0.77). This suggested that, instead of a direct effect, the impact of parental AID PGS on offspring T1D was most likely to be mediated through the genetic correlation between T1D and AID, and therefore provided limited additional information to the offspring T1D prediction when parental T1D PGS was already included and sufficiently accurate.
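The AUC values compared above can be read as the probability that a randomly chosen trio with a T1D-affected child has a higher parental score than a randomly chosen trio without one. The sketch below computes that quantity directly by pairwise comparison; it takes the parental score to be the average of the two parents' PGSs, as described for Figure 1E, and it does not reproduce the five-fold cross validation. The function names are hypothetical.

```python
def parental_pgs(father_pgs, mother_pgs):
    """Average PGS of the two parents."""
    return (father_pgs + mother_pgs) / 2

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for trios whose child has T1D, 0 otherwise.
    Ties between a case and a control count as half a win.
    This is O(cases x controls), which is fine for a small illustration.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy example: three of the four case-control pairs are ranked correctly.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 0.75
```

On this scale, a difference such as 0.817 versus 0.820 means that adding the other AID scores changes the ranking of very few case-control pairs.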
We thus focused on parental Full-PGS for T1D and estimated the cumulative risk of T1D in offspring using a Cox proportional hazards model adjusted for the first 10 PCs and birth year of the child among genotyped trios (Methods). When we stratified the PGS into four groups by percentiles (0-50th, 50-90th, 90-99th, 99-100th), we observed distinct trajectories in terms of T1D cumulative incidence rates (0.24%, 0.74%, 2.29%, 6.96% by age 20) (Figure 5A). That is, children whose parents were in the top percentile of T1D Full-PGS had a 29-fold higher risk of developing T1D than children whose parents were within the bottom 50% of T1D Full-PGS. When further stratified by the sex of children (Figure 5B and 5C), the adjusted survival curves suggested that sons overall had a higher cumulative incidence than daughters across all PGS groups. For example, in the top T1D parental Full-PGS group, sons (11.48%) could have a 2-fold higher cumulative risk than daughters (5.29%) by age 20. Overall, the cumulative incidence curves of T1D started to level off around the age of 14 to 16.
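The percentile grouping and the fold difference quoted above can be checked with a small sketch. It covers only the assignment of parental scores to the four percentile groups and the ratio of the reported cumulative incidences; it does not reproduce the adjusted Cox model. The boundary convention (a score exactly at a cut-off falls in the lower group) and the helper name `percentile_group` are assumptions.

```python
from bisect import bisect_right

CUTS = (0.50, 0.90, 0.99)
LABELS = ("0-50th", "50-90th", "90-99th", "99-100th")

def percentile_group(score, sorted_scores):
    """Assign a parental PGS to one of the four percentile groups.

    sorted_scores must be the full list of scores in ascending order.
    """
    # Fraction of all scores that are less than or equal to this one.
    frac = bisect_right(sorted_scores, score) / len(sorted_scores)
    return LABELS[sum(frac > cut for cut in CUTS)]

# Cumulative incidence of T1D by age 20 (%), as reported in the text.
incidence_by_age_20 = dict(zip(LABELS, (0.24, 0.74, 2.29, 6.96)))
fold_top_vs_bottom = incidence_by_age_20["99-100th"] / incidence_by_age_20["0-50th"]
```

The ratio of 6.96% to 0.24% is what gives the 29-fold figure for the top percentile relative to the bottom half of the parental Full-PGS distribution.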
Adjusted survival curves from Cox proportional hazards models for cumulative incidence of T1D in offspring by age 20, stratified by parental full T1D PGS percentiles comprising both HLA and non-HLA variants (Panels A-C; 8,872 FinnGen genotyped trios), parents’ T1D status (Panels D-F; 3,048,812 FinRegistry trios), and parents’ other AIDs (Panels G-I; 3,037,723 FinRegistry trios after removing trios having parent(s) diagnosed with T1D), including T1D, autoimmune hypothyroidism, rheumatoid arthritis, systemic lupus erythematosus, coeliac disease, psoriasis, inflammatory bowel disease, and multiple sclerosis. The panels show pooled data for both sexes (left), daughters (middle), and sons (right). T1D, type 1 diabetes; AID, autoimmune disease; HLA, human leukocyte antigen; PGS, polygenic score.
We then wanted to understand how informative the Full-PGS was compared to parental AID clinical history. In general, the offspring of couples affected by T1D had a higher cumulative incidence of T1D (Figure 5D) than the offspring of T1D-unaffected couples, and such differences were larger for boys (Figure 5E) than girls (Figure 5F). Having either father or mother with T1D (prevalencefather = 0.22%, prevalencemother = 0.15%) resulted in a lower cumulative risk than having both parents in the top percentile of Full-PGS for T1D (1.49% and 4.16% vs 6.95% by age 20 for children). Having both parents affected by T1D (a very rare event with a prevalence of only 8.5 per million) resulted in a higher cumulative risk (11.30%) than having parents in the top percentile of Full-PGS for T1D (6.95%). Among couples only affected by AIDs other than T1D, offspring also had a higher cumulative incidence of T1D than the general population (Figure 5G, 5H, and 5I) (0.96% vs 0.48%), although the cumulative incidence was smaller than that for the offspring of couples with physician-diagnosed T1D or couples in the top decile of the Full-PGS for T1D.
Discussion
In this study, we comprehensively explored the genetic determinants of the familial aggregation of T1D and other AIDs with data from high-dimensional nationwide registers and rich genetic information of the Finnish population. Our long follow-up (≥20 years) in offspring and the early age of T1D onset (median age = 12.7) allow us to maximize the number of T1D cases identified in the study. With systematically designed epidemiological and genetic analyses, we sought to answer three key but underexplored questions: which parental AIDs are related to T1D in offspring; to what extent the identified parental AID-offspring T1D association is attributable to genetic polymorphisms of HLA and non-HLA genes; and, for couples who are planning to have children, how well their genetic information can be used to evaluate the T1D risk of their offspring.
The nationwide registers of 7.2 million Finns collected in the FinRegistry study cover longitudinal health and sociodemographic information of 58,284 family trios, which provides us with a unique opportunity to perform a comprehensive exploration of 50 autoimmune diseases for their impacts on offspring T1D, on a scale and at a level of detail that could be difficult to achieve in questionnaire- or survey-based studies. This rich familial and health information enabled us to perform an efficient matched case-control design to better control for unmeasurable confounding factors than the standard unmatched design used in previous large population-based studies25–27. In total, we detected ten parental AID-offspring T1D associations, including T1D, HYPO, CD, RA, HYPER, and B12A, that have previously been linked to T1D risk in offspring in large population-based studies (Table S4), and more importantly, six novel associations encompassing AIHA, MCTD, MG, AA, psoriasis, and vitiligo.
Following the epidemiological analyses, we leveraged the genetic information of 470K genotyped Finns enrolled in the FinnGen study to better understand the genetic causes of the observed family aggregations. Given that T1D is known to be impacted by both HLA and non-HLA variations2,8–18, we designed a set of genetic analyses in a hypothesis-free manner to separate the contributions from HLA and non-HLA regions for each analyzed AID. These genetic analyses were conducted both at a population level, studying the shared effects of HLA and non-HLA variations among unrelated individuals, and within families using pTDT in 12,563 FinnGen family trios, which, to the best of our knowledge, is the largest family-based genetic analysis performed for T1D and AIDs. Similar to epidemiological family-based studies, which are widely accepted to yield less biased estimates than population-based studies, pTDT can condition out the impact of shared familial factors through within-family comparisons, allowing us, for the first time, to examine under- or over-transmission of genetic risk factors for AIDs in individuals with T1D and their unaffected siblings.
Generally, two major patterns were seen for the transmission of AID-associated variants to offspring with T1D: one group, encompassing T1D, HYPO, RA, and SLE, showed significant over-transmission for both HLA and non-HLA variants, whereas another group, comprising IBD, MS, CD, and psoriasis, showed significantly deviated transmission only for HLA. We noted that even in the disease group presenting over-transmission for both HLA and non-HLA, the transmission signal was more significant for HLA than for non-HLA. This is consistent with previous studies, which reported strong HLA associations for many AIDs2,47,48.
Decades of previous work have shown evidence that variation in genes in both HLA and non-HLA regions is associated with AIDs and shared between them, especially after the advances in GWAS and multi-omics data in recent years47,49–51. Specifically, genes modulating adaptive immunity through T cell activation and signaling (e.g., PTPN2252,53, TAGAP54–57, CD22658–60), B cell activation (e.g., KIF5A/PIP4K2C61–63, IL1064–66), T helper cells (e.g., STAT1-STAT467–72, PRKCQ73–75), and regulatory T cells57,76–79, are involved. Variation has also been shown in genes impacting innate immunity, which may play an important role in viral or microbial infections through the pathogen recognition receptor pathways, transcription factors, apoptosis, autophagy, and immune-complex clearance. A cross-disease genomic analysis of nine AIDs, including T1D, RA, and SLE, suggested that the top prioritized genes of each analyzed AID converged on the same common pathways relevant to T cell activation and signaling, although distinct genes were prioritized across diseases80. Many of these works indicated that T1D shares biological pathways with other AIDs, especially RA, CD, and SLE, which may to some extent explain why they were tightly associated with T1D in our analyses.
The interplay of HLA and non-HLA variants on disease susceptibility turned out to be complex and disease-dependent. In particular, our HLA haplotype analysis showed that multiple T1D lead haplotypes exerted strong effects on other AIDs, but largely in a disease-dependent manner. The HLA effects on other AIDs and on T1D could also be in opposite directions, as illustrated by some HLA haplotypes. For example, DRB1*15:01-DQA1*01:02-DQB1*06:02, the most protective haplotype for T1D, strongly increased the risk of MS. This opposite pleiotropic HLA effect was consistently observed with the multi-allelic HLA PGS. However, in non-HLA regions, we observed no link between T1D and MS from either genetic correlation or pTDT. IBD had a similar pattern in these analyses. These findings, taken together, point to potential explanations for the discrepancies observed between the epidemiological evidence of familial aggregation and the genetic results. In particular, our results suggest that HLA risk factors for IBD and MS are protective against T1D, while their non-HLA risk factors are not associated with T1D. The overall lack of familial aggregation suggests that this HLA protective effect could be counterbalanced by other non-captured genetic effects or by environmental effects shared within the family.
Most AIDs are more prevalent in women than in men, and disease progression and severity can also differ between the two sexes37,81. T1D is an exception in that a slight majority of at least pediatric cases are boys, and twice as many fathers as mothers of patients with T1D also have T1D37,38. In this study, we also observed that among individuals with T1D, men had a higher risk of having offspring with T1D than women, while significant sex-specific associations affecting transmission were overall not observed among parental AIDs. Further, when at least one parent had T1D, especially if it was the father, male offspring had a higher risk of developing T1D than female offspring. To date, the precise cause of the higher rate of T1D transmission from fathers than from mothers has not been identified, but potential hypotheses largely revolve around in utero exposure to different aspects of maternal T1D and/or its consequences, e.g., exposure to maternal hyperglycemia, exogenous insulin, maternal islet antibodies, and maternal enterovirus antibodies38. It has also been proposed that a protective effect of maternal T1D could be mediated by an E. coli dominant maternal microbiome, which fosters the development of neonatal T cells in the infant82–85.
Previous studies also showed that people with AIDs are less likely to have children86. This is likely to reflect multiple factors, including a higher rate of miscarriage amongst women with AIDs including T1D, amongst whom the rate of miscarriage is 15-30% higher than in the background population87–90. In some cases, individuals with AIDs may choose not to have children due to a fear of transmitting disease to offspring. Being able to estimate early T1D risk trajectories for a couple’s offspring, utilizing their PGS and AID disease status, will equip them with more accurate information to make such decisions around family planning, and provide reassurance to those whose offspring have a low predicted risk of T1D. For individuals with offspring predicted to be at high risk of disease, recent advances in T1D research may mean that targeted screening and/or early preventative therapies are a realistic option.
So far, we are not aware of any previous study using parental PGS to estimate the risk of an AID in offspring. We introduced a new approach to calculate PGSs for diseases that have a strong HLA signal, by training different models for HLA and non-HLA regions and combining them based on their expected contribution to the disease. We showed that parental PGS for T1D is strongly predictive of T1D risk before the age of 20 and that parental PGS for other AIDs does not improve the prediction of T1D. We also showed that a T1D PGS that represents the genetic risk factors from parents is positively associated with developing T1D in offspring, especially sons. The cumulative risk trajectories start to plateau around the age of 14-16, especially for children with high parental PGS (top 10%), which is consistent with the incidence peak at 10-14 years reported in previous studies5–7. The effect of high parental PGS on T1D risk in children was in many instances higher than that of a diagnosis of T1D in their mother or father. For example, having a mother with T1D resulted in a cumulative T1D risk of 1.49% by age 20, while the offspring of parents in the top 10% of PGS had a cumulative risk of 2.88%. Considering that only 0.15% of mothers with children had T1D in Finland, the PGS provides additional value on top of family history. These results indicate the intriguing possibility of considering parental PGS, in conjunction with clinical diagnoses, to inform parents about T1D risk in their offspring.
This study also has several limitations. First, AID diagnoses for some patients might not be well captured in the healthcare register. For example, the specific ICD-9 code for T1D was not implemented until 1987, although the Care Register for Finnish Health Care began as early as 1969. Second, rather than being nationally representative, the amalgamated Finnish biobanks represent a collection of biobanks with diverse methods of data collection. Some of the diseases might have a higher prevalence in the Finnish biobanks compared to the nationally representative FinRegistry. Although these aspects might limit the generalizability of the results, we noted that our findings in the Finnish biobanks were consistent with those we observed from FinRegistry and matched well to previous studies26,29,31,80,81. Third, rather than extending the analysis to siblings or other relatives, in this study we focused on parents and offspring. We expect that the HLA and non-HLA association patterns with T1D across AIDs would be similar to what we observed here, although the effect sizes would differ. Finally, while many risk factors (e.g., socioeconomic factors, genetic nurture) might have an impact on the development of T1D, our study mainly focused on family history of diseases and genetic factors, controlling for these other factors through careful analytical designs.
In conclusion, our results, while confirming the existence of general familial aggregation of AIDs and T1D, highlight a substantial heterogeneity in the impact of different AIDs on T1D. This heterogeneity is partially explained by different genetic effects within and outside the HLA regions and demonstrated two different transmission patterns regarding shared genetic liabilities in HLA and non-HLA. Overall, genetic effects inside and outside HLA regions are consistent with observational analyses, but in the case of IBD and MS we revealed unexpected divergence between the genetic effects and epidemiological observations. Investigating the mechanisms behind these findings may provide valuable insights into the origins of T1D and the etiology of the familial aggregations of AIDs. Moreover, the relative contribution of HLA and non-HLA genetic risk can be leveraged to create powerful PGSs that can complement a family history of AIDs as a tool to inform parents about expected T1D risk in their offspring.
Author contributions
FW, AL, TT, and AG designed the study and wrote the manuscript, with comments from all authors. FW processed the registry data, conducted the statistical analyses, and generated all figures and tables; AL and FW preprocessed the FinRegistry pedigree data and the FinRegistry team preprocessed all other registry data of FinRegistry; the FinnGen data team preprocessed the FinnGen data. TT, PV, and the FinnGen clinical team defined disease endpoints; TT oversaw the project and interpreted the findings from a clinical standpoint. AL, ZY, SJ, and AG advised the registry-based epidemiological analyses; AL, ZY, and AG advised genetic analyses; AL, SK, JR, JP, and TT advised HLA haplotype analyses. AG and TT supervised the study. MP is the principal investigator of the FinRegistry project. All authors discussed the results, revised the manuscript, and had final responsibility for the decision to submit for publication.
Funding
FW was funded by the University of Helsinki and the University of Edinburgh joint PhD program in Human Genomics. AG was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme [grant number 945733], starting grant AI-Prevent. AG was supported by the Academy of Finland (grant no. 323116). This project also received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no. 101016775. TT was supported by grants from Folkhälsan Research Foundation, The Academy of Finland (336822, 312072 and 336826), and the University of Helsinki.
Competing interests
AG is the founder of Real World Genetics.
RO holds a UK Medical Research Council Institutional Confidence in Concept Grant to develop a 10 SNP biochip T1D genetic test in collaboration with Randox.
Ethics statement
FinRegistry is a collaboration project of the Finnish Institute for Health and Welfare (THL) and the Data Science Genetic Epidemiology research group at the Institute for Molecular Medicine Finland (FIMM), University of Helsinki. The FinRegistry project has received approvals for data access from THL (THL/1776/6.02.00/2019 and subsequent amendments), Digital and Population Data Services Agency (VRK/5722/2019–2), Finnish Center for Pension (ETK/SUTI 22003) and Statistics Finland (TK-53–1451-19). The FinRegistry project has received IRB approval from THL (Kokous 7/2019).
The use of the biobank donor samples is in accordance with the biobank consent and meets the requirements of the Finnish Biobank Act 688/2012. Alternatively, separate research cohorts, collected prior to the Finnish Biobank Act coming into effect (in September 2013) and the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017.
The FinnGen study is approved by Finnish Institute for Health and Welfare (permit numbers: THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019 and THL/1524/5.05.00/2020), Digital and population data service agency (permit numbers: VRK43431/2017-3, VRK/6909/2018-3, VRK/4415/2019-3), the Social Insurance Institution (permit numbers: KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 134/522/2019, KELA 138/522/2019, KELA 2/522/2020, KELA 16/522/2020), Findata permit numbers THL/2364/14.02/2020, THL/4055/14.06.00/2020, THL/3433/14.06.00/2020, THL/4432/14.06/2020, THL/5189/14.06/2020, THL/5894/14.06.00/2020, THL/6619/14.06.00/2020, THL/209/14.06.00/2021, THL/688/14.06.00/2021, THL/1284/14.06.00/2021, THL/1965/14.06.00/2021, THL/5546/14.02.00/2020, THL/2658/14.06.00/2021, THL/4235/14.06.00/2021, Statistics Finland (permit numbers: TK-53-1041-17 and TK/143/07.03.00/2020 (earlier TK-53-90-20) TK/1735/07.03.00/2021, TK/3112/07.03.00/2021) and Finnish Registry for Kidney Diseases permission/extract from the meeting minutes on 4th July 2019.
The Biobank Access Decisions for FinnGen samples and data utilized in FinnGen Data Freeze 11 include: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1, BB2021_65, Finnish Red Cross Blood Service Biobank 7.12.2017 (000-2018), Helsinki Biobank HUS/359/2017, HUS/248/2020, HUS/430/2021 §28, §29, HUS/150/2022 §12, §13, §14, §15, §16, §17, §18, §23, §58 and §59, Auria Biobank AB17-5154 and amendment #1 (August 17 2020) and amendments BB_2021-0140, BB_2021-0156 (August 26 2021, Feb 2 2022), BB_2021-0169, BB_2021-0179, BB_2021-0161, AB20-5926 and amendment #1 (April 23 2020) and it’s modification (Sep 22 2021), BB_2022-0262, BB_2022-0256, Biobank Borealis of Northern Finland_2017_1013, 2021_5010, 2021_5018, 2021_5015, 2021_5015 Amendment, 2021_5023, 2021_5023 Amendment, 2021_5017, 2022_6001, 2022_6006 Amendment, BB22-0067, 2022_0262, Biobank of Eastern Finland 1186/2018 and amendment 22§/2020, 53§/2021, 13§/2022, 14§/2022, 15§/2022, 27§/2022, 28§/2022, 29§/2022, 33§/2022, 35§/2022, 36§/2022, 37§/2022, 39§/2022, 7§/2023, Finnish Clinical Biobank Tampere MH0004 and amendments (21.02.2020 & 06.10.2020), 8§/2021, 9§/2021, §9/2022, §10/2022, §12/2022, 13§/2022, §20/2022, §21/2022, §22/2022, §23/2022, 28§/2022, 29§/2022, 30§/2022, 31§/2022, 32§/2022, 38§/2022, 40§/2022, 42§/2022, 1§/2023, Central Finland Biobank 1-2017, BB_2021-0161, BB_2021-0169, BB_2021-0179, BB_2021-0170, BB_2022-0256, and Terveystalo Biobank STB 2018001 and amendment 25th Aug 2020, Finnish Hematological Registry and Clinical Biobank decision 18th June 2021, Arctic biobank P0844: ARC_2021_1001.
Methods
Data sources
Most of our analyses were conducted by leveraging two large datasets in Finland – FinRegistry33 and FinnGen34 (Table S3.1.1).
FinRegistry is a national registry database in Finland combining disease registries with comprehensive records on individuals’ demographics, socioeconomic status, death, drug purchases, prescriptions, and administrative information. It includes all the residents in Finland who were alive on 1st January 2010, and their first-degree relatives (in total approximately 7.2 million individuals). After excluding 572,640 individuals who had migrated in or out of Finland by 2019, 3,412,326 individuals with both father and mother information remained.
FinnGen (R11) is a national research project using samples and data from Finnish biobanks, comprising genome-wide genotype data and healthcare registry data for approximately 0.47 million Finnish biobank donors. After excluding 9,196 individuals who had migrated in or out of Finland by 2019, 10,588 individuals without imputed HLA data, and 8,627 individuals with other missing values in the cohort, we included 3,668 T1D cases and 441,602 controls without T1D, and recorded the lifetime prevalence and number of cases in FinnGen for each selected AID. The study population features slightly more women (N=251,294, 56.4%).
By using the datasets from the same country, we alleviate concerns about potential bias caused by differences in genetic background, healthcare system, or socio-cultural context.
Finnish population-based case-control study
The epidemiological association between AIDs in parents and T1D in their children was investigated by a case-control study within FinRegistry. To include most of the individuals and their parents in this analysis, we selected all individuals born in Finland between 1960 and 1999 who had both father’s and mother’s information available. The follow-up period started either from the date they were born or from the year the codes for our selected diseases were included in the national patient register, and ended on 31 December 2019, which is the latest follow-up date in FinRegistry. We then restricted our analysis to offspring whose father was born between 1917 and 1976 and whose mother was born between 1922 and 1976. Finally, we removed those individuals who died during the follow-up period.
T1D cases were individuals diagnosed with T1D before the age of 40 years. For each trio with a T1D-affected child, three controls without a T1D diagnosis were matched on the following variables: sex, birth year (five-year bin, 8 levels), and birthplace of the child (19 levels), birth year of father (five-year bin, 12 levels), birth year of mother (five-year bin, 11 levels), and family size (represented by the number of siblings: 0, 1, 2, 3, >=4). The differences in socio-demographic characteristics between cases and controls, including socio-economic status, marital status, number of offspring, occupation, education, and mother tongue, were further examined to ensure that the two groups were comparable. Cohen’s D91 (≥0.2) for ordinal characteristics such as number of siblings and number of children, and the Chi-squared test (P<0.05) for categorical variables such as education and marital status, were used to flag differences that were not minor.
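The balance check on ordinal characteristics can be illustrated with a minimal sketch of Cohen's d based on the pooled standard deviation. The function name and the toy sibling counts are illustrative assumptions and are not taken from the study code; the 0.2 cut-off is the one quoted in the text.

```python
import math
import statistics

def cohens_d(cases, controls):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(cases), len(controls)
    v1, v2 = statistics.variance(cases), statistics.variance(controls)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd

# Hypothetical numbers of siblings in case and control families (toy data)
case_siblings = [1, 2, 2, 3, 1, 0, 2]
control_siblings = [1, 2, 1, 3, 1, 0, 2]

d = cohens_d(case_siblings, control_siblings)
# A difference is treated as minor when |d| stays below the 0.2 cut-off
is_minor = abs(d) < 0.2
```

With these toy values the two groups differ by one sibling in a single family, so the standardized difference stays below the cut-off.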
Exposures of interest were all the parental endpoints related to AIDs mainly defined by Finnish versions 8-10 of International Classification of Diseases (ICD) (Supplementary notes 2.1). AIDs with fewer than 50 cases among parents in the final study population in the case-control study were filtered out from the list. We then excluded AIDs with an unclear definition or AID codes that were subtypes of other codes. The final AID list was used across all analyses in this study.
A conditional logistic regression, adjusted for birth year, birth year of father, birth year of mother, and the exact number of siblings, was applied to estimate the effect of having a given AID in parents on the risk of developing T1D in offspring. Bonferroni multiple testing correction was applied, and only the diseases that reached significant associations (P<=0.05/26) were included in the following analyses. To examine whether the identified associations were shared between father and mother, we used conditional logistic regression stratified by maternal AID status and paternal AID status. We also investigated whether parental AID status can influence the onset age of disease in offspring by regressing the age at onset of T1D in offspring on the AID status of their parents, adjusting for the same covariates described above. Statsmodels v.0.13.592 and SciPy v.1.7.3 93 were used for the analyses in this section in the context of Python v.3.7.11.
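The Bonferroni selection step amounts to simple arithmetic, sketched below. The function names and the three P-values are hypothetical and are not results from the study; only the significance level (0.05) and the number of tests (26) come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def significant_diseases(p_values, alpha=0.05, n_tests=26):
    """Return the names whose P-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [name for name, p in p_values.items() if p <= threshold]

# Hypothetical P-values for three parental AIDs (placeholders, not study results)
example_p = {"AID_A": 1e-6, "AID_B": 0.004, "AID_C": 0.0015}
kept = significant_diseases(example_p)
```

Here the corrected threshold is 0.05/26, roughly 0.0019, so a nominally significant P-value of 0.004 would not be carried forward.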
Shared genetic components at a population level
To explore to what extent the familial aggregations between AIDs and T1D identified in the population-based case-control study are attributable to genetic factors, we conducted three analyses covering both HLA and non-HLA regions at the individual level. Given the strong associations between HLA and AIDs4, we first performed an HLA haplotype analysis and an HLA-based PGS analysis using imputed HLA alleles. Then, we performed a genetic correlation analysis using variants in non-HLA regions.
HLA haplotype analysis
The HLA alleles were imputed by a population-specific reference panel using FinnGen data at the individual level37. We considered all the 187 unique available alleles on 10 classical genes in class I and II regions for analysis, including 27 alleles for HLA-A, 40 for HLA-B, 23 for HLA-C, 24 for HLA-DPB1, 15 for HLA-DQA1, 16 for HLA-DQB1, 33 for HLA-DRB1, and 3 each for HLA-DRB3, HLA-DRB4, and HLA-DRB5. According to previous studies8–10,36, several HLA haplotypes in Class II have been significantly associated with an increased risk of developing T1D and other AIDs. To understand whether HLA genetic liability to T1D is associated with other AIDs, we conducted a haplotype analysis (Figure S2.4.1) and only considered possible haplotypes from the gene combination HLA-DRB1-DQA1-DQB1. We first removed individuals with any ambiguous genotypes in the given haplotype. We defined an ambiguous genotype as a genotype with imputed allele dosage in [0.4, 0.6] or [1.4, 1.6]. Next, we extracted the potential haplotypes using the expectation–maximization algorithm and removed those haplotypes whose frequency was less than 0.005. We identified the most prevalent HLA haplotypes that were either positively or negatively associated with T1D using a haplotype score test (P < 1.0 × 10⁻¹⁰), a statistical test to calculate the contribution of a specific haplotype to a certain phenotype when the linkage phase is unknown95. Multiple testing corrections were applied for haplotype selection. We then only selected the three haplotypes with the highest P-value and the three with the lowest. Afterward, a multivariate logistic regression was built to estimate the associations between the remaining haplotypes and AID status, adjusting for age, sex and the first ten principal components (PCs). Haplotype analysis was done in Python v.3.7.13 using statsmodels v.0.13.292 and in R v.4.2.2 using gap v.1.3-196.
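The ambiguous-genotype filter can be sketched as follows. The dosage intervals are those stated in the text; the function names, the treatment of the interval end points as inclusive, and the toy dosages are assumptions made for illustration.

```python
# Dosage intervals treated as ambiguous, as stated in the text (end points assumed inclusive)
AMBIGUOUS_INTERVALS = ((0.4, 0.6), (1.4, 1.6))

def is_ambiguous(dosage):
    """True if an imputed allele dosage falls in one of the ambiguous intervals."""
    return any(low <= dosage <= high for low, high in AMBIGUOUS_INTERVALS)

def keep_individual(dosages):
    """Keep an individual only if none of the haplotype-gene dosages is ambiguous."""
    return not any(is_ambiguous(d) for d in dosages)

# Hypothetical imputed dosages for alleles of DRB1, DQA1 and DQB1 in two individuals
confident = [0.02, 1.98, 1.00]
uncertain = [0.45, 2.00, 1.00]

kept_flags = [keep_individual(confident), keep_individual(uncertain)]
```

Dosages close to 0, 1, or 2 pass the filter, while values near 0.5 or 1.5 remove the individual from the haplotype analysis.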
HLA multi-allelic score
To tackle the complex linkage disequilibrium (LD) structure amongst HLA alleles, we introduced a method to construct PGSs for T1D and other AIDs specific to HLA regions using FinnGen R11. To enable the downstream within-family analyses, we considered all individuals of genotyped family trios as our target population and the rest as a training set for weight calculation.
In the training set, to account for the complex LD and high polymorphism among HLA alleles, we calculated the weight for each HLA allele using weighted ridge classifiers with optimal regularization strength, which address this problem by imposing a penalty on the size of the coefficients. For covariates, we adjusted for age, sex, and the first ten PCs. For the genotyped trios, we calculated the PGS by summing each allele dosage multiplied by its weight from the corresponding training set. After PGS normalization, we first measured the partial correlation between each AID and its PGS. We dropped all the AIDs whose partial correlation with their own PGS was lower than 2%, to filter out diseases with low sample sizes or with weak HLA signals. Next, we conducted logistic regression to examine the association between PGS and T1D. The Bonferroni correction was applied to account for false positives related to multiple comparisons. Weighted ridge regression and logistic regression were applied using sklearn v.1.0.2 and statsmodels v.0.13.292 in Python v.3.7.13, and partial correlations were calculated using ppcor v1.197 in R v.4.3.2.
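The scoring step for the genotyped trios (dosage times weight, summed over alleles, then normalized) can be sketched as below. The allele labels, weights, and dosages are hypothetical placeholders, and z-score standardization is an assumption about what the normalization involved; the ridge-classifier training that produces the weights is not reproduced here.

```python
import statistics

def hla_pgs(dosages, weights):
    """Sum of allele dosage times allele weight over all weighted alleles."""
    return sum(dosages.get(allele, 0.0) * w for allele, w in weights.items())

def standardize(scores):
    """Z-score normalization (assumed form of the PGS normalization)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical per-allele weights, standing in for coefficients learned in the training set
example_weights = {"HLA_allele_1": 0.8, "HLA_allele_2": -1.2}

# Hypothetical imputed dosages for three individuals in the target trios
example_dosages = [
    {"HLA_allele_1": 2.0, "HLA_allele_2": 0.0},
    {"HLA_allele_1": 0.0, "HLA_allele_2": 1.0},
    {"HLA_allele_1": 1.0, "HLA_allele_2": 1.0},
]

raw_scores = [hla_pgs(d, example_weights) for d in example_dosages]
z_scores = standardize(raw_scores)
```

Keeping the weights from the training set and the scoring of the trios separate mirrors the split described above, in which trio members never contribute to weight estimation.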
Non-HLA genetic correlation
To evaluate the genetic overlap between T1D and other AIDs due to shared non-HLA effects, we utilized the largest available GWAS summary statistics from European ancestry individuals (Table S2.3.1) and estimated genetic correlation in non-HLA regions using linkage disequilibrium score regression (LDSC), which capitalizes on the patterns of LD among common genetic variants of the whole genome except for the HLA regions43,44. To separate the contribution of HLA from non-HLA genetic signals, we excluded single-nucleotide polymorphisms (SNPs) in the extended HLA regions on chromosome 6 (25–34 Mb). In a sensitivity analysis, we further examined whether the genetic correlations changed significantly when excluding all the SNPs on chromosome 6. We followed the suggested protocol for LDSC analyses from https://github.com/bulik/ldsc and used the package LDSC v.1.0.143 in Python v.3.7.11. We considered HapMap398 SNPs and constructed LD structures with European ancestry samples in the 1000 Genomes project99.
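The SNP exclusion applied before LDSC can be illustrated with a minimal sketch. The region bounds (chromosome 6, 25–34 Mb) come from the text; the function names, the inclusive boundaries, and the SNP records are illustrative assumptions, and the LDSC software itself is not called.

```python
HLA_CHROM = "6"
HLA_START_BP = 25_000_000  # 25 Mb, from the text
HLA_END_BP = 34_000_000    # 34 Mb, from the text

def in_extended_hla(chrom, pos_bp):
    """True if a SNP lies in the extended HLA region (boundaries assumed inclusive)."""
    return str(chrom) == HLA_CHROM and HLA_START_BP <= pos_bp <= HLA_END_BP

def filter_snps(snps, drop_whole_chr6=False):
    """Return SNP ids kept for the non-HLA analysis.

    drop_whole_chr6=True mimics the sensitivity analysis that removes chromosome 6 entirely.
    """
    kept = []
    for snp_id, chrom, pos_bp in snps:
        if drop_whole_chr6 and str(chrom) == HLA_CHROM:
            continue
        if not in_extended_hla(chrom, pos_bp):
            kept.append(snp_id)
    return kept

# Hypothetical SNP records: (id, chromosome, base-pair position)
example_snps = [
    ("snp_a", 1, 1_000_000),
    ("snp_b", 6, 30_000_000),   # inside the extended HLA region
    ("snp_c", 6, 100_000_000),  # chromosome 6, outside the region
]

main_set = filter_snps(example_snps)
sensitivity_set = filter_snps(example_snps, drop_whole_chr6=True)
```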
Analyses of inter-generation cross-trait transmission via polygenic transmission disequilibrium test
To understand the transmission of parental AIDs to T1D in offspring, we further estimated the contribution of parental transmitted and non-transmitted genetic liability to T1D in offspring in both non-HLA regions and HLA regions using data in FinnGen. While offspring are expected to receive, on average, half of their parents’ risk alleles for a disease, some will over- or under-inherit alleles associated with the disease, which will impact their disease risk. To understand how much of the shared genetic liability is on average transmitted to the next generation, we applied a polygenic transmission disequilibrium test (pTDT) to assess whether the mean of the offspring PGS distribution is consistent with its parentally derived expected value42.
To construct non-HLA PGS, we used the same GWAS summary statistics used in the genetic correlation analysis and adopted PRS-CS46 (Table S2.3.2). We included 12,563 genotyped trios where both parents and at least one offspring were directly genotyped. We considered HLA PGSs and non-HLA PGSs that were robustly associated with the corresponding AID (P<0.05/19 and |ρ|>=2%) (Table S2.3.2). This resulted in ten diseases with both robust HLA PGS and non-HLA PGS: T1D, HYPO, CD, RA, SLE, sarcoidosis, psoriasis, IBD, MS, and AS, as well as six additional diseases with a robust HLA PGS only (B12A, HYPER, PBC, MCTD, SjS, and ADDISON) (Supplementary notes 3.3.2 and Table S3.3.4).
For each remaining parental AID, we first removed genotyped families with affected parents and then subtracted the mid-parent PGS for that AID from the offspring PGS for the same disease. We normalized the obtained difference by dividing it by the standard deviation of the mid-parent distribution. We grouped the family trios according to whether the child was affected by T1D, so that we could compare how the mean PGS of children deviates from their mid-parent PGS in the affected sibling group and the unaffected sibling group separately, as well as the difference between the two groups. We conducted pTDT using SciPy v.1.7.393 in Python v.3.7.13.
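The pTDT deviation described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names and the toy PGS values are hypothetical, and the mean deviation is tested against zero with a hand-written one-sample t statistic rather than the SciPy routine used in the study.

```python
import math
import statistics

def ptdt_deviations(child_pgs, father_pgs, mother_pgs):
    """(offspring PGS - mid-parent PGS) / SD of the mid-parent PGS distribution."""
    mid_parent = [(f + m) / 2 for f, m in zip(father_pgs, mother_pgs)]
    sd_mid = statistics.stdev(mid_parent)
    return [(c - mp) / sd_mid for c, mp in zip(child_pgs, mid_parent)]

def one_sample_t(deviations):
    """Mean deviation and its t statistic against a null mean of zero."""
    n = len(deviations)
    mean_dev = statistics.mean(deviations)
    standard_error = statistics.stdev(deviations) / math.sqrt(n)
    return mean_dev, mean_dev / standard_error

# Hypothetical PGS values for four trios with an affected child (toy data)
father = [0.0, 1.0, -1.0, 0.5]
mother = [1.0, 0.0, 0.0, -0.5]
child = [1.0, 0.9, 0.0, 0.4]

deviations = ptdt_deviations(child, father, mother)
mean_dev, t_stat = one_sample_t(deviations)
```

A positive mean deviation indicates over-transmission and a negative one under-transmission; a P-value would follow from the t distribution with n − 1 degrees of freedom. The same computation would be run separately in the affected and unaffected sibling groups.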
Predicting T1D risk in offspring using parental PGS
To assess how well parental PGSs could be linked to T1D in offspring, we designed several analyses to test their associations and predictive ability. Considering the prediction performance of PGSs for parental AIDs, we focused only on those AIDs that showed significance in the previous PGS analyses in HLA regions and non-HLA regions. We introduced a new approach, Full-PGS, combining separately generated HLA and non-HLA PGSs.
Full-PGS construction
Theoretically, with pTDT, four patterns could be defined for parental AID - offspring T1D transmissions: 1) both HLA PGS and non-HLA PGS over-/under-transmitted from unaffected parents to affected children; 2) only HLA PGS over-/under-transmitted; 3) only non-HLA PGS over-/under-transmitted; 4) neither was transmitted.
We wondered whether we could improve PGS prediction across diseases by better considering genetic architecture of a target disease. For our target disease, T1D, we constructed HLA PGS and non-HLA PGS separately, and further considered the contribution ratio of HLA PGS relative to non-HLA PGS in a full PGS context.
We proposed an equation to quantify the exact contribution of HLA PGS and non-HLA PGS (rationew) for diseases following pattern 1 (effectHLA ≠ 0; effectnon-HLA ≠ 0), by taking HLA vs non-HLA contributions for both T1D and another AID into account.
Where the contribution ratio between HLA PGS and non-HLA PGS for T1D is 2:1, wHLA and wnon-HLA are the weight of HLA PGS and the weight of non-HLA PGS for the AID when predicting the AID itself, rationew denotes the contribution ratio between HLA PGS and non-HLA PGS when predicting T1D.
For the diseases following pattern 2 (effectHLA ≠ 0; effectnon-HLA = 0), since only HLA PGS is assumed to be over-/under-transmitted, we only used HLA PGS for these diseases to predict T1D.
For the diseases following pattern 3 (effectHLA = 0; effectnon-HLA ≠ 0), similarly, we only considered non-HLA PGS for these diseases to predict T1D.
For the diseases following pattern 4 (effectHLA = 0; effectnon-HLA = 0), neither HLA PGS nor non-HLA PGS was used for T1D prediction.
In a scenario of cross-disease prediction, e.g., using PGS for RA to predict T1D, we conducted the analyses with the following steps: 1) assigned RA to one of the four groups according to its pTDT results, 2) quantified the contribution ratio between HLA PGS and non-HLA PGS using ρHLA PGS, ρnon-HLA PGS and the proposed equation, and 3) built a Full-PGS using HLA PGS, non-HLA PGS and the ratio of their contributions.
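The first step of this procedure, assigning a disease to one of the four patterns and choosing which PGS components enter the T1D prediction, can be sketched as below. The function names and boolean inputs are illustrative assumptions; the contribution-ratio equation itself is not reproduced in this sketch.

```python
def transmission_pattern(hla_deviates, non_hla_deviates):
    """Map pTDT results (significant deviation or not) to patterns 1-4 defined above."""
    if hla_deviates and non_hla_deviates:
        return 1
    if hla_deviates:
        return 2
    if non_hla_deviates:
        return 3
    return 4

def components_for_prediction(pattern):
    """PGS components used to predict T1D under each pattern."""
    return {
        1: ("HLA", "non-HLA"),  # combined using the contribution ratio
        2: ("HLA",),
        3: ("non-HLA",),
        4: (),                  # neither component is used
    }[pattern]

# Illustrative inputs: a disease deviating in both regions and one deviating in HLA only
both_regions = components_for_prediction(transmission_pattern(True, True))
hla_only = components_for_prediction(transmission_pattern(True, False))
```

Only diseases in pattern 1 need the ratio between the two components; for the other patterns the Full-PGS reduces to a single component or is not built at all.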
We proposed that this equation could be extended to broader conditions for cross-disease prediction to fully utilize PGS information of a correlated disease to boost the prediction power of a target disease.
where wT1 and wT2 are the weights of two separate components of the target disease under a presumed genetic architecture (e.g., partition into HLA variants vs non-HLA variants, or common variants vs rare variants), wT1 and wT2 are the weights of two separate components of the correlated disease, and rationew is the contribution ratio between the two genetic components for the correlated disease when predicting the target disease.
Impact of parental AID information and cumulative risk of T1D in children
For each parental AID, we constructed a mid-parent Full-PGS using HLA PGS and non-HLA PGS, considering their ratio for cross-disease prediction, and examined the association with T1D in offspring. For comparison, we ran additional models using only parental diagnosis; only the mid-parent PGS calculated by a traditional PGS approach that takes genome-wide variants together, e.g., PRS-CS46; or both HLA PGS and non-HLA PGS from our previous analyses, a model assumed to be at least as good as Full-PGS. We used AUC as the primary metric to evaluate model performance. To understand how much additional value parental PGSs for other AIDs can add to a T1D prediction model, we then compared a model with all mid-parent PGSs and one with only the mid-parent T1D PGS. We evaluated the two models by comparing the AUCs among 12,563 genotyped trios in FinnGen. To avoid overfitting, the analyses were done using five-fold cross-validation.
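As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen affected child has a higher predicted score than a randomly chosen unaffected child, counting ties as one half. The function name and toy values are assumptions; the study itself used library implementations within five-fold cross-validation, which this sketch does not reproduce.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical mid-parent PGS values and offspring T1D status (1 = affected)
example_scores = [0.9, 0.4, 0.8, 0.1, 0.3]
example_labels = [1, 1, 0, 0, 0]

example_auc = auc(example_scores, example_labels)
```

In this toy example five of the six case-control pairs are ordered correctly, giving an AUC of 5/6.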
We then fitted a Cox proportional hazards model to assess the cumulative incidence of T1D in offspring before the age of 20 years, adjusting for birth year and the first 10 PCs of the offspring. To maximize statistical power, we considered all genotyped trios with children born between 1960 and 2010. Among the 8,827 trios analyzed, 1,035 had a child with T1D (11.7%). We then grouped the trios by parental Full-PGS percentile (0-50th, 50-90th, 90-99th, and 99-100th) and estimated the cumulative risk within each group at the mean of the children's birth year and PCs. We also stratified the model by the sex of the offspring. To make the results generalizable, we calibrated the FinnGen results to the FinRegistry nationwide T1D prevalence among all children born between 1960 and 2010 (mean birth year=1984; N=3,048,812; prevalence=0.6%).
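The percentile grouping step can be sketched as follows. This is a simplified illustration with hypothetical function names: it assigns each trio to one of the four Full-PGS percentile groups and reports the crude proportion of affected children per group. It does not fit the Cox model, adjust for covariates, or apply the prevalence calibration described above.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

CUTS = [50, 90, 99]  # percentile boundaries between the four groups
LABELS = ["0-50th", "50-90th", "90-99th", "99-100th"]


def percentile_ranks(scores):
    """Percent of trios with a strictly lower score, for each trio."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, s) / n for s in scores]


def assign_groups(scores):
    """Map each score to its percentile group label."""
    return [LABELS[bisect_right(CUTS, p)] for p in percentile_ranks(scores)]


def crude_risk_by_group(groups, has_t1d):
    """Proportion of trios with an affected child within each group."""
    totals, cases = defaultdict(int), defaultdict(int)
    for g, y in zip(groups, has_t1d):
        totals[g] += 1
        cases[g] += y
    return {g: cases[g] / totals[g] for g in totals}


# Hypothetical example: 100 trios with distinct scores 0..99.
toy_groups = assign_groups(list(range(100)))
```

With 100 distinct scores, the groups contain 50, 40, 9, and 1 trios respectively, which mirrors how sparse the top percentile group is and why a large cohort is needed to estimate its risk.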
For the analyses in FinRegistry, we used the same cohort as for the FinnGen calibration. Among the whole study population, we considered four groups: 1) parents without T1D, 2) maternal T1D only, 3) paternal T1D only, and 4) both parents with T1D. For the analyses excluding children with parental T1D, we considered four groups: 1) parents without AID, 2) only the mother had AID(s), 3) only the father had AID(s), and 4) both parents had AID(s). The parental AIDs in this analysis covered all the significant associations identified in the previous analyses. All the analyses in this section were done using statsmodels v.0.13.2 (ref. 92) and sklearn v.1.0.2 in Python v.3.7.13, and survival v.3.5.7 in R v.4.3.2.
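The four-group classification by parental disease status can be expressed as a short sketch. The function names, group labels, and record layout are hypothetical. The same logic applies whether the parental condition is T1D or any of the associated AIDs.

```python
from collections import defaultdict


def parental_group(mother_affected, father_affected):
    """Classify a family into one of the four parental status groups."""
    if mother_affected and father_affected:
        return "both parents"
    if mother_affected:
        return "mother only"
    if father_affected:
        return "father only"
    return "neither parent"


def prevalence_by_group(records):
    """Offspring T1D prevalence per group.

    records: iterable of (mother_affected, father_affected, child_has_t1d).
    """
    totals, cases = defaultdict(int), defaultdict(int)
    for mother, father, child in records:
        group = parental_group(mother, father)
        totals[group] += 1
        cases[group] += int(child)
    return {g: cases[g] / totals[g] for g in totals}
```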
Code availability
Code for the complete analyses is available at https://github.com/dsgelab/parentalAIDs_T1D.
Acknowledgments
We thank the entire FinRegistry and FinnGen teams for making the data available for this study. We also acknowledge Richard Oram (Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK) for discussion on this study, Samuel Jones (Institute for Molecular Medicine Finland (FIMM), Finland) for discussion on HLA PGS analysis, Bradley Jermy (FIMM, Finland) for discussion on genetic correlation analysis, Mary P Reeve (FIMM, Finland) for discussion on disease definitions, Om Dwivedi (FIMM, Finland) for discussion on T1D PGS, and Alessio Gerussi (University of Milano-Bicocca, Italy) for discussion on results from a clinical standpoint. We want to acknowledge the participants and investigators of the FinnGen study. The FinnGen project is funded by two grants from Business Finland (HUS 4685/31/2016 and UH 4386/31/2016) and the following industry partners: AbbVie Inc., AstraZeneca UK Ltd, Biogen MA Inc., Bristol Myers Squibb (and Celgene Corporation & Celgene International II Sàrl), Genentech Inc., Merck Sharp & Dohme LLC, Pfizer Inc., GlaxoSmithKline Intellectual Property Development Ltd., Sanofi US Services Inc., Maze Therapeutics Inc., Janssen Biotech Inc, Novartis Pharma AG, and Boehringer Ingelheim International GmbH.
The following biobanks are acknowledged for delivering biobank samples to FinnGen: Auria Biobank (www.auria.fi/biopankki), THL Biobank (www.thl.fi/biobank), Helsinki Biobank (www.helsinginbiopankki.fi), Biobank Borealis of Northern Finland (https://www.ppshp.fi/Tutkimus-ja-opetus/Biopankki/Pages/Biobank-Borealis-briefly-in-English.aspx), Finnish Clinical Biobank Tampere (www.tays.fi/en-US/Research_and_development/Finnish_Clinical_Biobank_Tampere), Biobank of Eastern Finland (www.ita-suomenbiopankki.fi/en), Central Finland Biobank (www.ksshp.fi/fi-FI/Potilaalle/Biopankki), Finnish Red Cross Blood Service Biobank (www.veripalvelu.fi/verenluovutus/biopankkitoiminta), Terveystalo Biobank (www.terveystalo.com/fi/Yritystietoa/Terveystalo-Biopankki/Biopankki/) and Arctic Biobank (https://www.oulu.fi/en/university/faculties-and-units/faculty-medicine/northern-finland-birth-cohorts-and-arctic-biobank). All Finnish Biobanks are members of BBMRI.fi infrastructure (www.bbmri.fi). Finnish Biobank Cooperative -FINBB (https://finbb.fi/) is the coordinator of BBMRI-ERIC operations in Finland. The Finnish biobank data can be accessed through the Fingenious® services (https://site.fingenious.fi/en/) managed by FINBB.