Abstract
Genetics as a science has roots in studying phenotypes of relatives, but molecular approaches facilitate direct measurements of genomic variation within individuals. Agricultural and human biomedical research are both emphasizing genotype-based instruments, like polygenic scores, as the future of breeding programs or precision medicine and genetic epidemiology. However, unlike in agriculture, there is an emerging consensus that family variables act nearly independent of genotypes in models of human disease. To advance our understanding of this phenomenon, we use 2,066,057 family records of 99,645 genotyped probands from the iPSYCH2015 case-cohort study to show that state-of-the-field genotype- and phenotype-based genetic instruments explain largely independent components of liability to psychiatric disorders. We support these empirical results with novel theoretical analysis and simulations to describe, in a human biomedical context, parameters affecting current and future performance of the two approaches, their expected interrelationships, and consistency of observed results with expectations under simple additive, polygenic liability models of disease. We conclude, at least for psychiatric disorders, that phenotype- and genotype-based genetic instruments are likely noisy measures of the same underlying additive genetic liability, should be seen for the near future as complementary, and integrated to a greater extent.
Introduction
In the first decades of genetics as a science, the only approach for estimating genetic contributions to traits was by assessing phenotypes of relatives1,2, but now molecular approaches provide direct measures of individual genomes cheaply, efficiently, and at scale. In agricultural applications, e.g., estimating breeding values of dairy cattle3,4, “genotype-based genetic instruments” derived from molecular data are proving so effective that “phenotype-based genetic instruments” obtained from pedigree records are becoming redundant3,5–7. At the same time, human biomedical research is emphasizing genotype-based instruments, polygenic scores (PGS)8, in precision medicine9–11 and genetic epidemiology12. However, it is not known how and why phenotype-based and genotype-based genetic instruments complement each other in human biomedical research. We examine these questions by combining theoretical analysis, simulations, and applications of state-of-the-field genotype- and phenotype-based genetic instruments in a unique psychiatric genetics data resource.
Somewhat counterintuitively, PGS and indicators of positive family history (FH) have low correlations and contribute nearly independently to classification and prediction. PGS and parental history for depression13, autism14, or schizophrenia15 are near-independent in epidemiological models, with similar results in others diseases16,17. Studies18–20 using family genetic risk scores (FGRS), shared environment adjusted, kinship weighted sums of diagnoses in proband genealogies, have reported a number of associations that are mirrored by studies using PGS (e.g.,21–24). Model-based estimates of liability from family records, liability threshold on family history (LT-FH)16 and Pearson-Aitken Family Genetic Risk Scores (PA-FGRS)21, have low correlations with PGS and contribute nearly independently to classification. Many intuit this as evidence that family variables reflect effects of shared environment but it is not clear that these reports are inconsistent with large additive genetic components in complex trait etiology25,26.
In this work we describe the relationships among phenotype-based genetic instruments (FH and PA-FGRS), genotype-based genetic instruments (PGS), and liability for five major psychiatric disorders. We use data from two independent cohorts of the iPSYCH2015 case-cohort study that include diagnostic information for 99,645 genotyped probands (62,357 psychiatric cases, 37,288 controls) and 2,066,657 relatives. We propose that the lack of redundancy between PA-FGRS (or FH) and PGS in human biomedical research is consistent with the hypothesis that both are simply very noisy measures of a highly similar underlying genetic liability. Different from trends in agriculture applications, we suggest that such instruments should be emphasized as complementary, at least over the near-term, and should be better-integrated going forward.
Results
Genotype- and phenotype-based genetic instruments explain nearly independent components of liability to psychiatric disorders
Genotype-based and phenotype-based genetic instruments are both known to contribute to models of human disease. Here, we estimate the variance in liability to five psychiatric disorders explained by combinations of polygenic scores (PGS), indicators of positive family-history (FH), and Pearson-Aitken Family Genetic Risk Scores (PA-FGRS) using a two-stage sampling weighted probit regression (Online Methods; Figure 1; Supplementary Table S1, S2). We introduce the two-stage sampling weighted probit as a preferred estimating approach because common estimators27 produced qualitatively different results when applied to our real data (Supplementary Figure S1) and subsequent simulation experiments suggested our new approach to be the least biased (Supplementary Information S1; Supplementary Figure S2; Supplementary Table S3). Nagelkirke’s pseudo-R2 from logistic models are shown in Supplementary Figure S3. In these models, PGS, FH, and PA-FGRS each explain significant variance in liability to each disorder, a result that replicates across two independent samples (Figure 1A,B, orange, yellow, green bars). For all disorders, PA-FGRS explain more liability than FH (mean increase 30%, range 11% to 46%; Supplementary Table S2). The variance explained by PGS and PA-FGRS (or FH) when fit together is nearly the sum of variance explained by PGS and PA-FGRS (or FH) when fit alone (Figure 1A,B, blue and purple; Supplementary Table S2). Including FH did not increase the variance explained of models including PA-FGRS and PGS (4th vs. 6th bars in each set, Figure 1A,B, Supplementary Table S2; exception: MDD in iPSYCH 2012). However, including PA-FGRS in models with PGS and FH did generally result in small but significant increases in variance explained (5th vs. 6th bars in each set, Figure 1A,B, Supplementary Table S2; exceptions: BPD, SCZ). Estimated odds ratios (OR) for PA-FGRS (or PGS) were only modestly reduced (mean reduction: 5%) when adjusting for PGS (or PA-FGRS, mean reduction: 8%; Figure 1C,D, Supplementary Table S3). Estimated ORs of FH adjusting for PA-FGRS were reduced to a larger extent (mean: 72%) than ORs of PA-FGRS when adjusting for FH (mean: 23%; Supplementary Figure S4-5, Supplementary Table S4). Taken together, this suggests PA-FGRS is a better instrument for capturing familial genetics than FH. Finally, we observe quite modest correlations between PGS and PA-FGRS in random population samples (Pearson correlation coefficient (r), 0.03 to 0.09; Figure 1E,F) that were consistently, but modestly, larger than between PGS and FH (r, 0.02 to 0.07), and much lower than between PA-FGRS and FH (r, 0.72 to 0.82, Supplementary Figures S6-7).
The current, future, and asymptotic performance of PA-FGRS depends on pedigree structure and trait architecture
In order to understand how and why these instruments appear independent, we need to understand how PA-FGRS is expected to behave in data and under generative models relevant to human biomedical research. Assuming the simple polygenic liability threshold model, we derived (Supplementary Information S2-6) expectations for the accuracy of PA-FGRS (rg,ĝPA-FGRS′, the correlation between our instrument, ĝPA-FGRS′ and the true latent genetic value, g; Online Methods, Eqs. 9-10) as well as for the performance of PA-FGRS (R2l,ĝPA-FGRS′ the variance in total liability, L, explained by the instrument; Online Methods). We use these equations to describe the expected behavior of PA-FGRS in predicting four hypothetical traits from human-relevant pedigree data (Figure 2A-F; Supplementary Figures S8-S16). First, PA-FGRS will never capture all additive genetic effects in human pedigrees for two well-described6,28 reasons: 1) unless a proband has half-sibling offspring (i.e., offspring from different mates such that they are half-siblings to each other) PA-FGRS becomes an estimate of between family genetic variance (i.e., mid-parental genotype, gMPV) and the accuracy and performance of bounded and (Figures 2A,B, dashed lines), and 2) it would require unrealistically many half-sibling offspring to achieve full accuracy and performance (Figures 2C,D, dashed lines). As such, we expect that incorporating additional phenotypes of full siblings (Figure 2A,B) to be less useful than such half-sibling offspring (Figure 2C,D). For plausible human pedigrees (Nrel < 10), the accuracy and performance of PA-FGRS are well below theoretical maxima (Figure 2A-D, Supplementary Figures S8-S13). Second, PA-FGRS that use complex, extended pedigrees do not gain information beyond four generations of records (i.e., beyond great-grandparents and their descendants; Figure 2E-H). Finally as an intuitive measure of information in a pedigree, we use the expected accuracy of PA-FGRS to define a number of full sibling equivalents (Nsibe; Online Methods, Eq. 12) from any arbitrary set of relatives (Figure 2I,J). For a trait with heritability of 0.7 and prevalence of 0.01, Nsibe of 5 is achieved by ∼1 monozygotic twin, ∼4 half-sibling offspring, or approximately one entire great-grandparent founded pedigree with two children per generation (for other simulated trait architectures see, Supplementary Figures S17). Family diagnoses are always more informative for common traits (e.g., Figure 2A-H, purple vs. orange lines), but more distant relatives (e.g., half-siblings or cousins, green or blue lines, respectively, Figure 2I,J) are relatively more predictive for rarer traits (i.e., correspond to higher Nsibe), while closer relatives (i.e., half-sibling offspring, yellow solid lines, Figure 2I,J) are relatively more informative for common traits. Simulations including shared (or “familial”) environmental contributions from siblings and parents show that PA-FGRS will incorporate confounded genetic and familial variance into predictions and this results in overperformance (i.e., excess variance explained) relative to the level expected from genetics alone (Supplementary Figures S18-S21). PA-FGRS will perform best for highly heritable, common traits and with many close relatives, but even in optimistic human pedigree data (Nsibe=5) it will explain a modest proportion of heritability.
PA-FGRS is expected to outperform PGS when GWAS samples are small
Understanding expectations for the realized performance of PGS is important for predicting if and when phenotype-based instruments may become redundant. This is because PGS can, in theory, achieve full accuracy (i.e., rg,PGS =1) and performance (i.e., ) if incorporating all causative variants and errorless effect sizes, but, in practice, limited or biased training GWAS introduce noise. We adapt published frameworks29,30 (Online Methods) to describe the expected performance of PGS trained on samples taken from the same population (Figure 3). We emphasize a set of plausible parameters, PGS tagging the human common variant genome (i.e., capturing m=60,000 independent genetic factors6,31) and accounting for half of the narrow sense heritability (i.e., ), and describe alternate parameters as supplementary analysis (Supplementary Figures S22-24). We then select the sample size needed for PGS to match the performance of PA-FGRS with Nsibe=1, 3, or 5 (Online Methods; Supplementary Information S7-8; Figure 3). The performance of PA-FGRS with Nsibe=1 should equal a PGS with effects taken from GWAS with N=65,000 samples from the same population (range of sample size estimates across simulated traits: 61,546 to 71,356; Figure 3A-D, Supplementary Table S5). The performance of PA-FGRS with Nsibe=3 and Nsibe=5 hould equal a PGS trained using GWAS with N=190,000 (range: 176,047 to 214,070) and N=300,000 (range: 280,368 to 356,784), respectively (Figure 3A-D, Supplementary Table S5). Parameters defining trait architecture ( and prevalence) have smaller impacts on these relationships than parameters defining the architecture of the PGS instrument (e.g., m, the number of independent genetic factors covered, h2l,SNP/h2, the proportion of heritability captured by the markers, and p, an effective polygenicity parameter; Supplementary Figures S22-S24; Supplementary Table S5). For rarer traits, maximally powered case-control sampling (c.c.; 50% cases, 50% controls) can increase performance of PGS substantially relative to PA-FGRS (e.g., Figure 3A), but total population size remains a limiting factor. For more common traits, the difference between the PGS and PA-FGRS is less, even under case-control sampling (e.g., Figure 3D). Minimal (Nsibe=1), realistic (Nsibe=3), and optimistic (Nsibe=5) PA-FGRS can outperform PGS when training GWAS have modest sample sizes or the architecture of the PGS is unfavorable, with the relative performance of PA-FGRS being larger for more common traits.
PA-FGRS are expected to complement PGS for at least the near future
The expected correlation and joint performance of the two instruments can be modeled from the previous expectations. Under a simple polygenic liability model (Online Methods), PGS and PA-FGRS can be viewed as estimators of the same latent additive genetic value, G, and from this, the expected correlation, as well as joint and conditional performances, can be written as simple functions of their accuracies (Online Methods, Eqs 15-17; Figure 4, Supplementary Figure S24, Supplementary Information S9). When the accuracy of PGS and PA-FGRS are modest, the correlation between the instruments is expected to be modest, even in generative models with no contribution from shared environment (Figure 4A-D). The consistently low accuracy of PA-FGRS across considered scenarios implies consistency modest correlations with PGS (Supplementary Figures S26-S28). Modest correlations between PGS and PA-FGRS imply that, even as PGS training GWAS sample sizes grow, PA-FGRS should remain complementary to PGS (Figures 4E-H). This trend is stronger when trait prevalence is higher (e.g., Figure 4E vs Figure 4F) and when the PGS is limited due to, e.g., small training GWAS or large contributions from non-tagged variants (i.e., when h2SNP/h2 is low or m is high; Figure 4, Supplementary Figures S29-31). Simulating shared environmental contributions from siblings and parents demonstrates that confounded familial and genetic variance in PA-FGRS reduces its empirical correlation with PGS relative to the expectations we derive (Supplementary Figures S18-S21). Taken together, low accuracies of PGS and PA-FGRS imply low to modest correlations between genotype- and phenotype-based genetic instruments are expected, even under a simple polygenic liability threshold model. This trend should persist even as GWAS sample sizes grow.
Expected and observed performance of genetic instruments are in alignment
Finally, we compared the empirical performance of our instruments (Figure 1) to our expectations under a simple polygenic liability model and given our particular input data (Online Methods; Figure 5; Supplementary Tables S5,S6). The iPSYCH proband pedigrees contain on average ∼22 relatives, but many are distant relatives and age-censored such that they correspond to average Nsibe of 1.43, 1.16, 2.11, 1.28, 1.70, when predicting ADHD, ASD, BPD, MDD, and SCZ, respectively (Online Methods; Supplementary Table S7). The modest empirical performance of PA-FGRS in our data (Figure 1AB) is broadly consistent with expectations for our theory, with the exception of SCZ which underperforms (Figure 5A). Similarly, the empirical performances of PGS in our data (Figure 1A,B) are broadly consistent with expectations, with the exception of ADHD which overperforms (Figure 5B, Online Methods, Supplementary Table S6). The modest empirical correlations between PGS and FGRS (Figure 1E,F) are, in fact, larger than expected from individual performances (Figure 5C). Our PA-FGRS (Observed Nsibe, 1.14 to 2.14, Supplementary Table S1) perform empirically as well as we expect PGS trained on population samples of 190,000 to 1,140,000 individuals from the same target population to perform (Figure 5D, Supplementary Table S7). Our PGS (Observed NPop. GWAS, 100,000 to 2,100,000, Supplementary Table S1) perform empirically as well as we expect PA-FGRS constructed from pedigrees with Nsibe from 0.3 to 4.7 (Figure 5E, Supplementary Table S7). In contrast with the expectations above, we observe PGS to require many more than 65,000 training samples to achieve the performance of PA-FGRS with Nsibe=1. We attribute this to differences in simulated versus empirical parameters, namely, in practice, h2SNP/h2 trend much less than 0.5 and genetic correlations between training and target populations trend less than 1 (Supplementary Table S6). These results do not suggest large confounding by shared environment, which would predict higher than expected performance of PA-FGRS and lower than expected correlation between PA-FGRS and PGS (Supplementary Figures S18-21). As such, low empirical correlations can be consistent with simple polygenic liability models of inheritance and PA-FGS with only a few relatives can be as useful for predicting disease, in practice, as PGS constructed using large GWAS.
Discussion
We demonstrate and replicate that phenotype-based genetic instruments (PA-FGRS, FH) and genotype-based genetic instruments (PGS) explain comparable, nearly independent, proportions of disease liability to five psychiatric disorders. This trend - that family phenotypes and genotypes are quite modestly correlated in models of human disease - is reflected in a number of contemporary studies13–17,21, but the reason for this behavior has been unknown. We explored this phenomenon, concluding that PA-FGRS, FH, and PGS behave consistent with noisy estimators of a highly similar underlying genetic liability. We believe that even as GWAS grow to sample sizes in the millions, phenotype-based genetic instruments will remain complementary. The current emphasis on molecular approaches in human genetics should not discard the continued potential for familial phenotypic records in genetic medicine and genetic epidemiology.
Phenotype and genotype-based instruments have been compared, contrasted, and combined much more extensively in agricultural sciences3,5–7. In such applications, genomic predictions are now the preferred method for assessing breeding potential, attributable to the genetic architectures of most agricultural species, which are very different from those underlying trait variation in human populations. Decades of controlled breeding have greatly reduced genetic diversity and simplified the genetic architecture of studied traits6. Pedigrees also have different densities of different relatives, all of whom may have available molecular genotypes6. A key parameter in determining the performance of a PGS is the number of independent genetic elements segregating among genomes of the target population29,30, which is a function of genetic diversity. For dairy cattle this number may be ∼5,000, but for humans it may range from ∼60,000 (the genome captured by SNP-arrays) to ∼500,000 (the genome captured by whole genome sequencing)6. This, along with a focus on binary disease outcomes, limits the pace at which PGS could make phenotypic records redundant in human biomedical applications.
Our proposal, that genomic prediction is a hard problem in human biomedicine and genetic instruments are simply very noisy, is in accordance with the preponderance of evidence from quantitative genetics25,26. Twin and family studies repeatedly purport a large additive genetic contribution to complex disorder etiology, with sparser, less consistent, and smaller contributions from shared (i.e., familial) environment. Among the five psychiatric disorders we studied, the contributions from shared environment may be largest for major depressive disorder. However, for MDD, the most direct and powerful tests for familial environment, comparing half-siblings reared together and apart32 or children and adoptive/step-parents33, and found significant, but very modest effects of shared environment. For some specific traits or disorders, e.g., substance use34–36 or educational attainment37, substantial shared environmental effects have been reported, but these appear to be exceptions rather than a rule26. Still, we can not interpret our results as suggesting effects of shared environment are always zero. It is, however, more likely, parsimonious, and consistent with a broader body of work to ascribe currently reported modest correlations between genotype- and phenotype-based instruments to error in measuring highly similar additive genetic liability.
At the moment, PGS outperform PA-FGRS for SCZ and BPD, but not for ADHD and ASD. For SCZ and BPD, this can be attributed to two things: the large training GWAS for BPD38,39 and SCZ38,39 with population sample sizes >1.5 million and the rarity and later onset of SCZ and BPD limiting PA-FGRS. In comparison, for ADHD and ASD, PGS are trained in much smaller GWAS (population sample sizes: 150,000 to 400,000) and PA-FGRS use more complete records as onset is younger and, at least for ADHD, prevalence is higher. However, we observe that our ADHD PGS outperforms BPD and SCZ PGS, despite the latter having much larger training samples. Here, being trained on a highly comparable sample (train-test sample genetic correlations: ∼1 for ADHD vs. ∼0.45 for SCZ and ∼0.78 for BPD) overcomes differences in training GWAS of >1,000,000 population samples. This illustrates that the best single genetic predictor will depend on disease parameters (h2 and prevalence), the usefulness of familial records, and, critically for PGS40, the genetic similarity between the training and target samples. The various instruments, then, have unique sources of error, further supporting integration of multiple predictors and expectations of modest overlap. Differences in, at least, genetic ancestry41, ascertainment schema42, and phenotype quality43 are known to reduce genetic correlation among test-training pairs, and will continue to affect PGS performance if not given attention. The within family nature of PA-FGRS mitigates some of this loss of efficiency. In scenarios where financial resources, trait (genetic) architecture, or obtainable/comparable population sizes are limiting, opting to collect detailed familial records may remain a more efficient approach for building best-in-practice genetic instruments.
Our work informs future research using genetic instruments. First, we found that widely implemented approaches for estimating variance explained on the liability scale27 can be biased for PA-FGRS or FH. We propose directly estimating liability scale variance using appropriately weighted probit models in order to protect against reporting strongly biased estimates. Second, the concept of heritability has a central place in generative and etiological models, but these models are abstract and describe asymptotic (i.e., infinite sample size) performance of potentially practical predictive instruments. As we have shown, for human biomedical research, genetic predictions based on phenotypes of relatives will never account for the full heritability, in practice - it would require at least hundreds to thousands of half-sibling offspring records per proband. Whole genome sequence based PGS, which may need to capture ∼500,000 independent genomic factors6, could require training samples equal to, or beyond, the current global population to match performance to heritability for some diseases. Such extreme sample sizes are, in part, a function of lost power when converting a continuous liability score to a binary diagnosis44 and may not be required for quantitative traits, such as human height45. There are and will continue to be methodological advances that improve efficiency of PGS beyond our simple modeling framework, but it is worth discussing if heritability is the right goal post for practical aims of human biomedicine, such as prediction or classification. Third, if our interpretation of PGS, PA-FGRS, and FH as highly noisy estimates of the same genetic liability is correct, covarying or adjusting, e.g., FH by PGS, will not provide adequate control for confounding of the underlying genetic liability construct. Models that consider or make adjustments for the expected amount of noise in an instrument46, a quantity we show can be estimated with plausible accuracy, could be preferred.
Our study should be viewed in light of some limitations. First, our familial records are incomplete but still may overestimate effect sizes of predictions in clinical settings. Our modeling leverages all familial information available at the end of follow-up, which is valid for etiological models, but reflects information that may not be available at the time of a clinical decision (e.g., birth or first hospital contact, etc.). Imperfect patient recall or age-censoring in relatives regarding diagnoses made subsequent to the time of evaluation of the proband would reduce the information available. Our results may not generalize to predictive studies which should carefully account for such “conditioning on the future” to better mimic real clinical assessments or risk-screens. Second, we found that the modest correlation observed between PA-FGRS and PGS was actually higher than our simple framework predicts, indicating that our model does not reflect the generative model underlying our real data. This could be due to a number of features we do not model, such as non-random mating, indirect genetic effects, frequency dependent genetic architectures, or other phenomena that we have not considered, but the trend is nevertheless, inconsistent with confounding by shared environment. Third, and related, PA-FGRS is derived under a simple, additive liability model - the predominant model for dichotomous traits in genetics for nearly a century - but this currently limits our ability to consider these more nuanced contributions.
Genetics as a field is rooted in resemblance among relatives and even as molecular techniques provide increasingly deep and specific access to variability across the genomes of millions of individuals, incorporating these foundational sources of data is expected to play a continued, critical role in describing genetic liability for complex traits and disorders.
Online Methods
iPSYCH2015 Case-Cohort Study
The Lundbeck Foundation initiative for Integrative Psychiatric Research (iPSYCH)47,48 samples singleton births between 1981 and 2008, alive on their first birthday, with known mothers residing in Denmark (N=1,657,449). The iPSYCH 2012 case-cohort includes 86,189 individuals (30,000 random population controls; 57,377 psychiatric cases)47, while the iPSYCH 2015i case-cohort includes an independent 56,233 individuals (19,982 random population controls; 36,741 psychiatric cases)47,48. Taken together, these two ascertainments comprise the iPSYCH 2015 case-cohort study. DNA was taken from dried blood spots housed at the Danish Neonatal Screening Biobank49. The Infinium PsychChip v1.0 array (2012) or the Global Screening Array v2 (2015i) were used for genotyping. Register based diagnoses are made by licensed practitioners during in- or out-patient specialty care (diagnoses or treatments assigned in primary care are not included) and are obtained from the Danish Psychiatric Central Research Register (PCRR)50 and the Danish National Patient Register (DNPR)51. Unique identifiers of the Danish Civil Registration System52 enables linkage across population registers, to parental identifiers, and to the neonatal blood spots. The use of this data follows standards of the Danish Scientific Ethics Committee, the Danish Health Data Authority, the Danish Data Protection Agency, and the Danish Neonatal Screening Biobank Steering Committee. Data access was via secure portals in accordance with Danish data protection guidelines set by the Danish Data Protection Agency, the Danish Health Data Authority, and Statistics Denmark.
Imputation and quality control of genotypes followed custom, mirrored protocols53
BEAGLEv5.154,55 was used to phase and impute the data in conjunction with reference haplotypes from the Haplotype Reference Consortium v1.1 (HRC)56. SNPs were checked for missing data across SNPs and individuals, deviations from Hardy-Weinberg equilibrium in controls, abnormal heterozygosity, minor allele frequency (MAF), batch artifacts, and imputation quality. 7,649,999 imputed allele dosages were retained for analysis. Samples were checked for genotype-phenotype sex discordance and abnormal heterozygosity. KING57 was used to estimate kinship and ensure no second degree or higher relatives remained within either cohort or across cohorts. The smartpca module of EIGENSOFT58 was used to estimate PCs and a classifier was used to remove individuals distant from those with parents and four grandparents born in Denmark. For this study, in particular, we defined the following disorders according to ICD-10 codes associated with at least one registered treatment: major depressive disorder (MDD; F32-3), bipolar disorder (BPD; F31), schizophrenia (SCZ; F20), autism spectrum disorders (ASD; F84, F84.1, F84.5, or F84.9), and attention deficit hyperactivity disorder (ADHD; F90.0). In total, we retain, in iPSYCH 2012 24,266 random probands, 14,970 ADHD probands, 12,101 ASD probands, 1581 BPD probands, 18238 MDD probands, 2624 SCZ probands, and in iPSYCH 2015i 15,381 random probands, 7499 ADHD probands, 5600 ASD probands, 1141 BPD probands, 8668 MDD probands, 2721 SCZ probands.
iPSYCH 2015 case-cohort genealogies
Details of the iPSYCH 2015 case-cohort genealogy are described previously21 following previous approaches59. Briefly, 2,066,657 unique relatives were obtained using mother-father-offspring linkages for the 141,26560 (before QC) genotyped probands. A population graph was constructed using kinship261 and FamAgg60 packages for R with edges defined by recorded trios. Relatedness coefficients were estimated by summing unique paths between two individuals, weighted by (0.5)^(number of edges in the path)62. Same-sex twins were given kinship of 0.375 and maternal siblings with missing paternal records were given kinship of 0.25 according to prior comparisons with SNP-data21.
Diagnoses made between 1968 and 1994 are limited to in-patient contacts and ICD-8 codes and were assigned to disorder labels to match ICD-10 groups as follows: major depressive disorder (MDD; 296.09, 296.29, 298.09, or 300.49, but not 296.19, 296.39, or 298.19), bipolar disorder (BPD; 296.19, 296.39, or 298.19), schizophrenia (SCZ; 295.x9, excluding 295.79), autism spectrum disorders (ASD; 299.01-299.04), and attention deficit hyperactivity disorder (ADHD; 308.01)47. FH variables were defined as having at least one affected parent, half sibling, or full sibling.
Narrow sense heritability (h2) for each psychiatric disorder was estimated using relatives in the genealogies of the random population sample. To account for cohort effects, we selected 250,057 sibling pairs in these genealogies such that both siblings were born in the same decade (either, 1960-1970, 1970-1980, 1980-1990, 1990-2000, 2000-2010). We then estimated the sib-pair tetrachoric correlation for each disorder within each decade, combining the estimates in an inverse variance weighted meta-analysis. The meta-analyzed tetrachoric correlation and its standard deviation were divided by the sibling kinship (0.5) to give an estimate of h2 and compute 95% confidence intervals.
A simple, additive polygenic liability threshold model
Our approach is based on a simple additive polygenic liability threshold model, which is tractable, useful, and commonly employed. Briefly, We assume that for a fully observed individual i their binary disease status, Di, is the result of a latent, normally distributed liability, Li. Di is expressed when Li is above a critical threshold, t, that is the upper tail standard normal quantile corresponding to the lifetime prevalence of the disease, Kpop. The liability Li is the sum of a genetic value, Gi, and an environmental deviation, ei, and we assume no G-E interactions and no contributions from shared environment. G is an additive function of genetic factors, x, with true allelic average effects, β, and explains h2 proportion of population variance in L. The genetic value may itself have two components, a GWAS related SNP component, Gi,SNP, explaining h2SNP proportion of variance in liability, and genetic residual, explaining h2 - h2SNP that may represent incomplete linkage or other genetic factors.
Computing Pearson-Aitken Family Genetic Risk Scores (PA-FGRS)
PA-FGRS21 estimates the genetic component of latent disease liability under assumptions of a modified polygenic liability threshold model that relaxes the assumption above that individuals are fully observed by defining where Ki is the cumulative incidence of the disease in the age window during which individual i was observed. The key difference from above is that here, the probability of a true disease status Di being observed (i.e., that Yi = 1 | Di = 1) is related to the proportion of the lifetime incidence for which an individual was observed. As a result, we model the expected liabilities of cases and controls as mixtures of truncated normals such that observed controls are treated as a probabilistic mixture of a true case and a true control.
PA-FGRS estimates the genetic component of latent disease liability under this model for a proband, p, from the observed disease statues of their nrel relatives, a modified relatedness matrix, Σ, describing the relationship between trait liability in relatives and genetic liability of the proband, and known parameters, θ, such as the age of each relative at the end of follow up, and the age-specific incidence of the trait. where rp,i is the relatedness coefficient between the proband and their i-th relative. Briefly, we estimate Ĝp,PA-FGRS according to the Pearson-Aitken (PA) selection formula to provide a direct, efficient, and highly accurate solution for an estimator that otherwise requires expensive resampling or numerical integration techniques 21.
The PA updating proceeds as follows. First, the vector of mean expected liabilities for all relatives, μ, and the expected covariance among liabilities of relatives, Ω, are initialized to expectations in the population assuming no observations have been made: μ0 = 0 and Ω0 = Σ. Then, μ and Ω are updated iteratively, conditioning on each relative x = 1,… nrel in sequence. At each step, the current expected liability of the selected relative, μx, is updated to a conditional expectation, , based on the current estimate, the observed disease status of the relative, and other known parameters. with ϕ(⋅), the PDF of the standard normal distribution, Ф(⋅), the CDF of the standard normal distribution, , and if Yx = 0 or πx = 1 if Yx = 1. All remaining (i.e., unselected) liabilities are then updated conditional on the updated liability of the selected relative. This can be thought of as propagating a non-linear (i.e., liability threshold model appropriate) weighting of the observed disease status of the selected relative into the expected liabilities of their genetic relatives, including the proband.
Similarly, the liability covariance, Ω, is updated at each step by first updating the variance of the mean expected liability for the selected relative, and then the rest of the covariance conditional on this updated variance for the selected relative. The procedure is repeated until only the proband remains unselected (i.e., proband liability has been conditioned on all relatives). For further details see Krebs et al 21 and Supplementary Information. We computed these PA-FGRS for individuals from the iPSYCH 2012 and iPSYCH 2015i cohorts separately (Supplementary Table S1) using trait parameters described in Supplementary Table S6.
Computing polygenic scores (PGS)
PGS approximate the genetic value under the polygenic liability threshold model by substituting a set of observed SNP marker genotypes for an individual, i, for the unknown generative genetic factors (X), and using transformations of per allele effect estimates from a training GWAS as a surrogate for generative allelic effects (β). We computed PGS combining imputed dosages with estimated using SBayesR63 to update results provided by large GWAS. For SBayesR we followed the developer recommendations using provided UKBiobank LD matrix estimated for 2.8 million SNPs and with parameters: gamma=0.0,0.01,01,1, pi=0.95,0.02,0.02,0.01, burn-in=5000, chain-length=25000, exclude-mhc, impute-n. For BPD 38, MDD 64, and SCZ 39 we use large published GWAS that are independent of iPSYCH. For ADHD and ASD we use the results of mixed effects model GWAS65 from the complementary iPSYCH cohort (e.g., beta from GWAS in iPSYCH 2015i for PGS in iPSYCH 2012).
Statistical model fitting to iPSYCH data
Variance in liability explained by combinations of genetic instruments was estimated with weighted multivariate generalized linear models incorporating a probit link function. A baseline model included age, gender, and the first ten principal components of an estimated genetic relatedness matrix with subsequent models including combinations of PGS, FH, and PA-FGRS. The liability scale variance for each model was estimated according to a modified approach presented in Lee et al 27 as, where is a vector of coefficients from the inverse probability weighted probit regression of the disease status on the matrix of covariates X. For the weighting, two-stage sampling weights were computed as the product of the case-control sampling weights suggest by Lee et al 27 (unit weights in controls and in cases) and a second, study specific, probability weight to reflect selection out of the sample due to relatedness pruning (Supplementary Note S1). X∗, a random subset of X such that w ∗= KPop.
Reported liability variance explained is after subtracting that of a baseline covariates only model. Empirical p-values were used to test for significance of changes in liability variance explained in nested models as the proportion of times out of 100,000 that the empirical change associated with adding a variable to a model was larger than adding a random permutation of the variable. Significance adjusted to p < 0.05/110 model contrasts = 4.55 × 10-4. Unadjusted and adjusted odds ratios (OR) were estimated using multivariate generalized linear models with a logit link function. Pearson’s correlations among genetic instruments were estimated in the random subcohort of each iPSYCH cohort and not adjusted for covariates.
Estimating the expected Accuracy of Pearson-Aitken Family Genetic Risk Scores (PA-FGRS)
Exact Solution
The expected accuracy of our PA-FGRS estimator was derived fully in the Supplementary Information S11 and is defined as the product of the inverse of the heritability and the variance of the unbiased estimator ĜPA-FGRS, where D is nrel-variate thresholded gaussian, and the conditional expectation of G is the PA estimator above. Where {d1,d2,…,df} is the set of possible configurations of disease in the selected pedigree. For pedigrees with relatively few numbers of relatives we can solve this by a explicit calculation of the nrel-variate integral over the f configurations (see code availability for our function fgrs_accuracy(…, method=”pa”,estimate= “theory”)), but for larger pedigrees it is intractable. This full solution is used for the FGRS expectation results presented in Figures 2-4.
Linearly Approximated Solution
Alternatively, one may use more tractable expectations of accuracy from a linear estimator (i.e., expected breeding value) in place of the full liability scale estimator PA-FGRS (Supplementary Information S3-5), similarly defined as the product of the inverse of the heritability and the variance of Ĝlinear, with c being a 1 x nrel vector describing the expected covariance between the proband genetic liability and the true disease status of the i-th relative such that . P is an nrel x nrel matrix describing the expected covariance in disease status among the relatives, which can be approximated such that and Pij,i=j = Var(Di) = Kpop(1 − Kpop) following the second order Taylor approximation used by Golan et al66. Here, rpi is the relatedness coefficient between the proband, p, and their i-th relative, rij is relatedness coefficient between the i-th and j-th relatives, and t is the liability threshold value, t = Ф−1(1 – KPop). We show that this is a good approximation of the full solution in smaller pedigrees (Supplementary Figures S14-S16) and use this to compute the expected accuracy of PA-FGRS for iPSYCH pedigrees as presented in Figure 5. When individuals are not fully observed, we use and where and to estimate this accuracy as,
Linearly Approximated Number of Full Sibling Equivalents (Nsibe)
In the special case of pedigree containing one relative type (e.g., Full siblings of the proband), the second order Taylor approximation of Eq. 10 reduces to, Here, rp is the relatedness coefficient between the relatives and the proband, rr is the relatedness coefficient among the relatives which may be different than rp (e.g., as for half-sibling offspring where rp=0.5, rr=0.25). To derive the Nsibe for any predictor, we set this equation to its observed or expected accuracy, which for an empirical predictor can be recovered its performance as , and solve this equation for the value of nrel when rp = rr = 0.5. This approximation was used to estimate Nsibe for iPSYCH pedigrees, simulated pedigrees, and to equate observed PGS sample size to equivalent PA-FGRS Nsibe.
Estimating the accuracy of polygenic scores (PGS)
The expected accuracy of polygenic scores has been described elsewhere by, e.g., Wu et al 30, Daetwyleer et al 29, and Dudbrige 67 under related, but not identical assumptions (see Supplementary Information S7-8). We use the formulas from Wu et al 30 to estimate the expected squared accuracy of a PGS as where M is the effective number of independent markers. Shrinkage-based estimators can gain power in the context of non-infinitesimal models (e.g., ldpred 68) and we can evaluate, where a non-standardized Gaussian , and with , and
Expected correlation between PA-FGRS and PGS
Under a simple liability framework as we propose, the ĜPA-FGRS and ĜPGS are statistically independent conditional on the true genetic value (G)69 (Supplementary Information S9) In this scenario, the expected correlation is Where and are the liability variance explained by the estimated genetic values and is the liability scale heritability.
Expected joint performance of PA-FGRS and PGS
The expected semi-partial accuracy in liability of the PA-FGRS, or expected correlation between the PA-FGRS and liability after accounting for the PGS, can be written as, which allows us to predict the expected joint performance of a PGS and PA-FGRS fit in the same model as,
Comparisons of expected and observed performance
Expected performance and relationships of genetic instruments depends on both assumed and estimated trait parameters that are described in Supplementary Table S6. The expected performance of PA-FGRS in iPYSCH were computed using the mean of the linear approximation for the expected accuracy of PA-FGRS for each iPSYCH pedigree using known values for and KPop.
The expected performance of the PGS in iPSYCH were calculated by first estimating, using the non-shrinkage estimator, the expected squared accuracy of the PGS in the training sample following the expectations of the non-shrinkage estimator above assuming M=60,000, and using training sample parameters Ncase, Ncontrols, , KPop. The expected out of population test sample performance (i.e., in iPSYCH) was then estimated as, Where is the squared genetic correlation between the phenotype as defined in the training GWAS and in iPSYCH, and is the SNP-based heritability for the iPSYCH trait.
The expected correlation between the observed PGS and PA-FGRS were estimated as above using provided values for .
The population GWAS sample size expected to result in a PGS with performance equal to that of the empirical PA-FGRS instruments, was calculated by setting the non-shrinkage based estimator (Eq. 13) equal to the observed value for squared accuracy of the PA-FGRs, (i.e., ) and solving for Nq assuming w = KPop. The calculations assumed the training GWAS was performed in the same population that the PGS was applied within (i.e., and rG,train,test = 1), that m=60,000 and known KPop.
The pedigree Nsibe expected to result in a PA-FGRS with performance equal to that of the empirical PGS instruments, was calculated by setting the linear approximation estimator for full sibling pedigrees above (Eq. 12) equal to the observed value for squared accuracy of the PGS, (i.e., ) and solving for nrel assuming KPop and .
Code availability
PA-FGRS, predicted performance, and simulations are implemented in R code available at https://github.com/BioPsyk/PAFGRS
Data Availability
All data produced in the present study are available upon reasonable request to the authors and in accordance with Danish legislation