Abstract
Genome-wide association studies (GWAS) indicate neuropsychiatric disorders to be highly polygenic. Polygenicity refers to the additive influence of multiple genes on variation in a disorder. GWAS have identified many single-nucleotide polymorphisms (SNPs) across the genome associated with neuropsychiatric disorders, each explaining a very small part of individual variance within a trait. This complicates the understanding of the genetic architecture and biological mechanisms underlying these disorders. Previous studies have successfully used common genetic variants associated psychiatric disorders to generate Polygenic Risk scores (PRS). PRSs estimate the aggregate genetic liability of an individual for a particular disorder or trait based on a genome-wide association study (GWAS) of said trait. Here, we present a novel bottom-up approach to polygenic scoring that starts at the brain, rather than at behavior or clinical diagnosis. We used GWAS of structural brain imaging derived phenotypes (IDPs) from the UK Biobank as a basis to generate polygenic imaging derived scores (PIDS). As a proof-of-concept of its application, we applied PIDS to quantify differences in the genetic influence on brain structure between persons with ADHD and unaffected controls. 94 IDPs were selected using the subcortical segmentation atlas and the Desikan-Killiany cortical atlas from FreeSurfer. In the polygenic model training stage, 72 out 94 PIDS were associated with their respective IDP in an independent sample. Global measures such as cerebellum white matter, cerebellum cortex and cerebral white matter ranked amongst the highest in variance explained ranging between 3% and 5.7%. Our results indicate that a majority of GWAS of structural neuroimaging traits are becoming sufficiently powered to enable reliable and meaningful use of polygenic scoring applications that accurately reflect the underlying polygenic architecture well. Larger discovery GWAS will further improve upon this. Conversely, our associations with ADHD were relatively weak. Larger target samples are required to establish robust links of PIDS with behavioral or clinical traits like ADHD. With this novel approach to polygenic risk scoring we provide a new tool for other researchers to build on in the field of psychiatric genetics.
Introduction
Genome-wide association studies (GWAS) indicate neuropsychiatric disorders to be highly polygenic (Watanabe et al., 2019). Polygenicity refers to the additive influence of multiple genes on variation in a disorder. GWAS have identified many single-nucleotide polymorphisms (SNPs) across the genome associated with neuropsychiatric disorders, explaining a very small part of individual variance within a trait (Selzam et al., 2019; Liu et al., 2021). This complicates the understanding of the genetic architecture and biological mechanisms underlying these disorders. Genes are likely expressed in mediating brain development and information processing rather than influencing behavior directly (Weinberger, 2011). We will explore the use of imaging derived phenotypes (IDPs) to examine brain-based intermediate phenotypes, also known as endophenotypes. IDPs consist of structural measures of brain anatomy including tissue and structure volumes. We will use IDPs to assess the genetic architecture of neurological and psychiatric disorders and explain individual variation.
Human brain structure and function vary greatly between individuals. This variation can be measured non-invasively using magnetic resonance imaging (MRI) (Krain & Castellanos, 2006). The effects of neurological and psychiatric disorders such as Schizophrenia, Bipolar disorder, Autism and ADHD can be studied through MRI (Bonvicini et al., 2016; Romme et al., 2017; Yang et al., 2020). Understanding the genetic determinants of robust MRI markers could help identify biological pathways relevant to brain development and various diseases. Human brain structure traits possess attractive properties for genetic study (Matoba et al., 2020). Moreover, the study of human brain structure traits as broad endophenotypes for neuropsychiatric disorders could help in the identification of biological mechanisms and pathways that mediate individual differences in complex behaviors and vulnerability to disease (Bigos & Weinberger, 2010).
Given GWAS results, the count of risk alleles present in an individual person can be weighted by their effect size and summed into polygenic risk scores (PRSs). PRS is then used to quantify genetic liability for that individual. Mapping individual genetic liability can help understand the genetic architecture of the studied trait. Previously, studies have used hypothesis-driven approaches to investigating genetic associations with psychiatric disorders. A single PRS per neuropsychiatric disorder can fall short in describing variation for everyone due to biological heterogeneity. The genetic variation at the basis of risk factors may vary between individuals. With the availability of large data resources such as UK Biobank we can perform a hypothesis-free study. Here, we present a novel bottom-up approach to polygenic scoring that starts at the brain, rather than at behavior or clinical diagnosis. We used GWAS of structural brain IDPs from the UK Biobank as a basis to generate a set of polygenic imaging derived scores (PIDS) to better grasp the genetic variation between individuals.
ADHD will be used as a proof of concept for testing the potential utility of PIDS to clinical research. We chose ADHD because it is highly heritable with an average reported heritability of 76% (Bonvicini et al., 2016). Genetic factors have been broadly linked to the susceptibility to develop ADHD and some susceptibility genes have been identified (Demontis et al., 2022; Ronald et al., 2021). Differences in brain structure have been reported in cases with ADHD compared with controls (Krain & Castellanos, 2006). Alterations found in prefrontal gray matter, occipital gray matter, and inferior fronto-occipital fasciculus white matter were found not only in ADHD patients, but also in their unaffected siblings relative to controls (Bralten et al., 2016). Moreover, reductions in total cerebral volume, the prefrontal cortex, the basal ganglia (striatum), the dorsal anterior cingulate cortex, the corpus callosum and the cerebellum are also associated with ADHD (Poissant & Joyal, 2008; Curatolo et al., 2010).
In the present study we use a genomic approach to evaluate the genetic link between brain structure and disease risk. We will use PIDS to explore the relationship between ADHD and the brain, in a brain-driven fashion. The aim of this polygenic score was not to predict risk for individual patients but explore the construct and face validity of polygenic imaging derived scores. This new approach could improve our understanding of factors involved in neuropsychiatric traits.
Methods
Data
Discovery Sample
Summary statistics were obtained from the UK Biobank, a long-term epidemiological study of 500,000 volunteers from across the United Kingdom between the age of 40 and 69. The dataset used for this project contains 3929 IDPs using 39,691 participants and QC measures available from UKB ranging over six distinct MRI modalities. (Smith et al., 2021). For this project we focused on the structural image modality. After removal of individuals as part of genetic processing a sample of 33,224 individuals remained (n = 33,224, 17,411 genetic females). The ages in the sample were as follows: females mean age = 63.7±7.4 years (min = 45.1, max = 81.8); males mean age = 65.0±7.6 years (min 46.1, max 81.8). Additional information on the data can be found in the original paper by Smith et al., 2021. The UK Biobank has approval from the Northwest Multi-centre Research Ethics Committee (MREC).
Neuroimaging traits / IDPs
Freesurfer V6.0.0 (https://surfer.nmr.mghm.harvard.edu) was used to define both cortical and subcortical structures. The definition of cortical structures, referred to as parcellation, is performed by mapping atlas regions of the brain onto an individual’s cortical surface model (Destrieux et al., 2010). The Desikan-Killiany atlas was used for automated parcellation (aparc) of the cortical structures (Desikan et al., 2006).
For this project we were interested in broad structural differences of the brain between ADHD patients and controls. All gray matter tissue, global white matter and cerebrospinal fluid labels from Freesurfer were included in this project. These labels are part of the ‘regional and tissue volume category’ for 18 subcortical volumes from the subcortical segmentation atlas and 29 cortical volumes from the Desikan-Killiany atlas were included, bilaterally, resulting in 94 IDP’s (Table 1. Supplementary materials). Confounds were removed from the data including 40 population genetic principal components supplied by UKB and with a greatly expanded new set of confounds including age, head size, sex, head motion, scanner table position, imaging center and scan date-related slow drifts. More details for the imaging and genetics processing and cohorts can be found in the online methods of the original paper by Smith et al., 2021.
Target Samples
The NeuroIMAGE study is a follow-up of the Dutch part of the International Multicenter ADHD Genetics (IMAGE) project (Müller et al., 2011). NeuroIMAGE sample is a longitudinal cohort of ADHD patients, their unaffected siblings, and controls. The original 365 ADHD families and 148 control IMAGE families, consisting of 506 participants with an ADHD diagnosis, 350 unaffected siblings, and 283 healthy controls (Kuntsi et al., 2006). In combination with new participants the NeuroIMAGE study consisted of 1,069 children (751 from ADHD families; 318 from control families) and 848 parents (582 from ADHD families, 266 from controls). The NeuroIMAGE dataset consists of 3 repeated measures ‘waves’, collected between 2003 and 2012. A slight adaptation of the Parental Account of Childhood Symptoms (PACS) clinical interview and the Conners Parent and teacher rating scales were used in wave 1 to more objectively asses children’s symptoms (Taylor, 1986; Conners, 1997,1998; Müller et al., 2011). In the immediate follow-up wave 2 & 3, Conners’ Adult ADHD Rating Scale (CARS R-L), Conners’ Parent Rating Scale (CPRS R:L), and Conners Wells’ Adolescent Self-Report scale: Short Form (CASS:S) (Conners et al., 1998; Conners et al., 1998; Parker & Sitarenios, 1999). MRI measurements were recorded for each child older than 7 years without contradiction for an MRI measurement (e.g., implanted metal and medical devices) and willingness. Freesurfer V6.0.0 (https://surfer.nmr.mghm.harvard.edu) was used to define both cortical and subcortical structures. The 94 IDP summary statistics from UKB were matched by volumetric measurements for the same cortical and sub-cortical brain regions. Participating children completed a session in a magnetic resonance imaging (MRI) scanner. Data was collected at the Radboud University Medical Centre, Donders Centre for Cognitive Neuroimaging or VU University Amsterdam and VU University Medical Centre. At both sites similar 1.5 T scanners were used (Siemens SONATA and Siemens AVANTO; Siemens, Erlangen Germany), both using 8-channel Phase Array Head Coils. The NeuroIMAGE study was approved by the regional ethics committee (Centrale Commissie Mensgebonden Onderzoek).
NeuroIMAGE
To determine ADHD diagnosis at the time of participation in NeuroIMAGE, a diagnostic algorithm was used combining the diagnostic interview (K-SADS) with the Conners rating scales as described previously in von Rhein et al (2015). Following the diagnostic algorithm of clinical interview and ADHD, participants were categorized as either diagnosed with ADHD, Unaffected, and ADHD Subthreshold. To be diagnosed with ADHD, participants were required to have ≥ 6 hyperactive/impulsive and/or inattentive symptoms, met the DSM-IV criteria for pervasiveness and impairment and showed an age of onset before 12. To be considered unaffected, participants were required to have a T < 63 on each subscale of the Conners questionnaires and ≤ 3 symptoms derived from the combined symptom counts of the K-SADS and CTRS-R: L/CAARS-S:L. Participants who did not meet the requirements for either ADHD or unaffected status received the ADHD-subthreshold classification. Criteria were adapted for adults (≥ 18 years) resulting in a diagnosis after a symptom count of ≥ 5. Adults were considered unaffected with an ADHD symptom count of ≤ 2. Table 1 summarizes the NeuroIMAGE sample according to clinical trajectory of persistence, partial remission, and full remission. Inclusion criteria for participants was a full diagnosis of ADHD at least once previously in NeuroIMAGE and having supplied quality-control passing DNA.
Target sample update DELTA
A subset of 60 NeuroIMAGE participants were reexamined in an additional MRI follow-up study, Determinants of Long-Term Trajectories of ADHD (DELTA), about five years after the third NeuroIMAGE wave. Participants were only included in DELTA if they had a full diagnosis of ADHD at least once previously in NeuroIMAGE and had contributed DNA which passed QC. The Diagnostic interview voor ADHD bij volwassenen (DIVA) interview was used to assess diagnosis of ADHD considering all participants were aged 18 or above (Kooij & Francken, 2010). DIVA is based on the DSM-IV criteria and is a follow up to earlier semi-structured interviews for ADHD in adults. Individuals were classified as “affected” if they had 6 or more symptoms for hyperactivity and/or inattention and clinically significant self-reported impairment in at least 2 domains. Individuals were labeled as “unaffected” with fewer than 3 symptoms for inattention and hyperactivity or “Subthreshold” if they fit neither category. Details concerning demographics and clinical information can be found in table 1. Individuals were categorized to be either (I) persistent, (II) partially in remittance or (III) fully remitted.
Persistence vs Remittance
To define this longitudinal categorical status of participants, only the first and last measurement were considered in the NeuroIMAGE study. Individuals with a full diagnosis of ADHD at the first or second wave and at their last measurement were considered persisting. Individuals with a full diagnosis of ADHD at the first and/or second wave and were unaffected at the last available measurement were considered fully remitted. Individuals with a full diagnosis at the first or second wave and subthreshold ADHD or unaffected at the last measurement were considered (at least) in partial remission. Individuals only affected at the last measurement were considered ‘late onset’ and excluded (N=48). Details on the diagnostic tools, imaging and genetics processing and cohorts can be found in the original paper by van Rhein et al., 2015.
Genotyping and quality control
DNA was extracted from whole blood or saliva. Genotyping was done using the Illumine PsyChip, in two batches (N=192 and N=798). Plink 1.9 was used to merge the batches ((https://www.cog-genomics.org/plink/1.9/) (Purcell, 2007; Yang et al., 2015). During quality control, mismatching SNPs between batches, sex chromosomes, insertions, deletions, other non-SNP variation; and ambiguous SNPs were excluded. RICOPILI pipeline was used to further process the data (https://sites.google.com/a/broadinstitute.org/ricopili). SNP exclusion criteria were set at call rate < 0 .05, Hardy-Weinberg equilibrium P,10−6, and minor allele frequency <0.01. Moreover, SNPs with a missing-rate difference > 0.01between batches were also excluded. Exclusion criteria for individuals were call rate < 0.05, identity by descent with another participant (quasi-random exclusion for each pair, with priority to more complete phenotyping data), inconsistent sex between self-report and GWAS data. 18 individuals were excluded in RICOPILI after principal component analysis (PCA) due to deviation from the main sample cluster (defined by 4 principal components, greater than 3 standard deviations). No association was found between the principal components and genotyping batch. A batch-effect of P,0.0008 resulted in the exclusion of 48 SNPs (P-threshold determined based on QQ-plot diagonal of the genome-wide batch effects). After quality control, 283,843 SNPs and 952 individuals were included in the re-run of the PCA analysis, required for subsequent analyses. Imputation was performed in accordance with the RICOPILI pipeline using minimac3 and Phase3 of the European 1000 genomes as reference. The PRS analysis were run with the strict best-guess output from RICOPILI (INFO score>0.1, MAF>0.005,2,840,886 SNPs).
PIDS calculation
The PIDS analysis consisted of three different stages. (1) First, PRSice was run to determine optimal P-thresholds and select the model using IDP’s (Choi et al., 2020; Euesden et al., 2016). PIDS’s able to successfully predict their respective imaging phenotype were optimized. As we sought to validate the genomic signature of the PIDS, establishing a significant association between the PIDS and IDP in an independent sample verifies the predictive quality of the genetic score. (2) Secondly, the optimal model for each PIDS was used to predict ADHD diagnosis in the same dataset using generalized logistic mixed linear models to account for family structure. In addition, an explorative analysis was run to explore the consistency of associations between PIDS and ADHD during the different waves of NeuroIMAGE. (3) Thirdly, further analyses were performed using PIDs in trend association with ADHD diagnosis. General linear mixed effects models were constructed to regress the PIDs on persistence/remittance of ADHD symptoms in the NeuroIMAGE and DELTA dataset, accounting for family structure. ADHD-relevant PIDS were run on persistence and remittance in the longitudinal NeuroIMAGE cohort and consequent follow-up DELTA wave. An overview of all different steps performed in this project can be seen in figure 1.
Step 1: Model training and PIDS selection: The UKB summary statistics were used as the ‘base data’ PRSice input with standard clumping parameters (P=1, R2 <0.1 and 250kb interval) and P-value thresholds (P-value thresholds between P=10^-4 and P=0.5 at P=5*10^-5 intervals). We chose to run this broad selection of P-thresholds to thoroughly test the performance of the PIDS and see if the known polygenic nature of the IDPs was reflected across P-thresholds. In the resulting bar plots of variance explained across P-thresholds, we expect to see a broad range of multiple P-thresholds to perform well, including higher P-thresholds, reflecting the polygenicity of the IDPs. P-threshold optimization was based on the best high-resolution prediction of each PIDS on their respective IDP counterpart in the NeuroIMAGE sample before the update of DELTA. Because our aim was to establish whether an association exists between the PIDS and their respective IDP, for the PIDS selection, a lenient significance threshold of P,0.05 was chosen for PIDs selection in this training phase.
Step 2: Exploratory associations with ADHD. PIDS that passed the training stage at P<0.05 were entered in logistic mixed models run in R 3.6.1, with family as a random factor, ADHD diagnosis as dependent variable, PRS and as covariates, genotype batch, site and sex and four principal components. Afterwards, an exploratory analysis was performed in which the PIDS were regressed on ADHD and ADHD-subthreshold at waves one, two and three to ascertain whether any effects of the PIDS were consistent across time. Mixed logistic regressions were run for each PIDS found to be in association with ADHD following the exploratory analysis. For step 3, PIDS associated with ADHD were regressed on Persistence-Remittance, Partial, Remittance, and Remittance-Category from the NeuroIMAGE dataset as outcome variables. Four regression models were constructed for DIVA DELTA diagnosis (DSM5), DELTA diagnosis von Rhein et al. (DSM4), Persistence-Remittance-updated, and Partial Remittance-updated from the DELTA dataset (N=60).
Power analysis
The sample included twin pairs and their siblings, meaning observations weren’t independent. Therefore, the effective sample size was calculated to perform the power analysis (NE = (N*M) / (1 + ICC*(M-1)) in which N = the number of families, M= the number of individuals in a family and ICC = the (average) phenotypic correlation within a family. The power analysis itself was performed using the DubridgeLab library (https://rdrr.io/github/DudbridgeLab/avengeme/src/R/sampleSizeForGeneScore.R) based on the Power and Predictive Accuracy of Polygenic Risk Scores paper (Dudbridge, 2013). The PIDS were tested against ADHD to test for a common genetic basis with the Imaging Derived Phenotypes.
Results
Model optimization and calculation of PIDS
Descriptive information on each PIDS such as variance explained (R2), optimal P-threshold and SNP number can be found in the supplements (supplementary tables 1, 2 & 3). 72 out of 94 PIDSs were associated with their respective IDP (p < 0.05) in the target sample. The 72 PIDS were separated into 4 subgroups of left hemisphere cortical PIDS (1), right hemisphere cortical PIDS (2), left hemisphere Subcortical PIDS (3), and right hemisphere Subcortical PIDS (4). The genetic effect size (R2), P-threshold, and heritability (h2) for the PIDS in each subgroup are displayed in circosplots (figure 2-5).
Amongst PIDS scoring highest in genetic effect size were global measures cerebellum white matter, cerebellum cortex, and cerebral white matter, consistently found across hemispheres. Amongst the PIDS with the lowest variance explained > 1%, were multiple gyral volumes, superior and middle temporal volume, isthmus cingulate gyrus and the global measure cortex volume, consistently found across hemispheres (supplementary table 3). The Subcortical and Cortical volumes used for the PIDS differ significantly in heritability. Heritability was found to be a good predictor for the genetic effect size of the PIDS (F (1,70) = 11.77, p = 0.001) with an R2 of 0.132. Moreover, with the greater heritability found in the Subcortical volumes used for the PIDS we would expect the Subcortical PIDS to outperform the cortical PIDS. However, this was not the case with no significant differences found between Cortical and Subcortical PIDS in genetic effect size, P-threshold, or number of independent markers. The PIDS are consistent in their performance across the four different subgroups. Interestingly, small correlations (−0.4 < h2 < 0.4) were found for genetic effect size, P-threshold, and number of SNPS between hemispheres for both cortical and subcortical PIDS. A sizeable correlation (r = 0.77) was found between the cortical hemispheres for heritability. 22 PIDS out of the original 94 that were excluded from further analyses after stage 1. This exclusion could potentially have influenced the correlation values we observe in the analysis.
Most PIDS (52 out of 72) had an optimal P-threshold between 0.001 and 1, reliably reflecting the polygenic nature of IDP’s. However, the 10 PIDSs with the lowest P-thresholds ranged between 0.00005 and 0.00000005, conflicting with the expected polygenicity. The PRSice generated bar plots further illustrate these findings (supplementary figure 2.1-2.80).
The PIDS amongst the lowest ranging P-thresholds produce bar plots with inconsistent R-squared values across different P-thresholds. This may indicate overfitting of these specific PIDS. The bar plots for 28 PIDSs were of note due to consistent performance across most p-thresholds. Amongst these 28 PIDS were the global measures: cerebellum cortex, cerebellum white matter, cerebral white matter, and cortex volume, consistently found across both hemispheres. PIDS with consistent performance across a range of p-thresholds also performed optimally with inclusive P-thresholds (0.01-1) (supplementary table and figure’s 3.1-3.80).
PIDS and ADHD status
In stage two of the project, PIDS were used to predict ADHD diagnosis and subthreshold ADHD using linear mixed modeling. Results indicated six out of the 72 PRS to be nominally associated with ADHD diagnosis of which none remained after FDR correction (table 2) (p ,0.05). These 6 PIDS were approximately normally distributed (supplementary figure 1). To account for the large number of tests across 94 IDP’s, we considered a false-discovery rate (FDR) of 0.05. After FDR correction for the 72 performed tests, no significant association remained. No interaction was found with sex, site, or batch for any of the PIDS. The six PIDS were used to predict ADHD persistence using linear regression models. No significant relationship was found between any of the six IDP PRS and ADHD persistence (all P > 0.4). Nor was any significant relationship found for partial remission or remission category. The six PIDS were also used to predict determinants of long-term ADHD (DELTA). No significant effects were found for DIVA interview in DELTA diagnosis based on the DSM5, diagnosis based on DSM4, updated persistence and updated partial remittance.
Power analysis
Polygenic scores were constructed associating IDPS with ADHD based on a panel of 100,000 independent markers with a discovery sample of 30,000 and an effective target sample of 964. Assuming prevalence of 4%, a SNP-h2 of 22% and a genetic covariance of 6% we achieve 20% power. Due to the small genetic covariance between the IDPS and ADHD a larger sample size is required to achieve acceptable power. However, when increasing both target and discovery sample sizes, leaps in power can be achieved. When increasing the discovery sample to 50,000 and target sample to 3,000, 0.80 power would be the result (figure 6). The sample sizes required for this amount of power are an order of magnitude higher than currently available but are achievable in future research. The UKB sample is projected to increase to a GWAS imaging sample of 100,000 participants. Moreover, with large increases in the GWAS discovery sample, larger SNP-h2 closer to the narrow-sense heritability can be achieved. This increase in genetic effect size was not considered for the estimated increase in power, leaving this estimation to be somewhat conservative.
Discussion
This study investigated the relationship between the combined effect of imaging derived phenotype associated SNPs and ADHD-related traits by using polygenic imaging derived scores or PIDS. Where previous work tended to use clinically derived PRS to predict neuroimaging traits, we use IDPs to create PIDS and predict ADHD-related traits. We showed PIDS can be successfully derived with the majority robustly reflecting the genetically driven inter-individual differences in brain structure. However, in the ADHD sample, the PIDS did not describe inter-individual differences in related clinical variables. This lack of performance is in part due to insufficient power to detect the small sub portion of genetic variation shared between ADHD and the IDPs.
The results indicate that the majority of PIDS were predictive of their respective IDP in an independent target sample. The polygenic nature of the IDPs was reflected through consistent performance of most cortical and sub-cortical PIDS across a wide variety of P-value thresholds (supplementary figures 2.1-1.80 & 3.1-2.80). PRSice-generated bar plots and high-resolution plots reflect most PIDS to be stable predictors of out-of-sample cortical and subcortical brain structure despite the large age difference in the base (UKB) and target (NeuroIMAGE) datasets. This large degree of predictive PIDS indicates GWAS data such as the UKB can be successfully used to predict inter-individual variance. Moreover, the most predictive PIDS reached variance explained ranges upwards of 3% - 5.7%. A trend of association was found between the right hemisphere postcentral volume, left hemisphere thalamus, left hemisphere putamen, and both hemispheres of the isthmus cingulate volume PIDS and ADHD. The variance in the ADHD phenotype explained by the PIDS did not reach beyond 1%. None of these associations survived FDR correction. No associations were found between PIDS and ADHD persistence-remittance variables. In short, we did not find any significant association between PIDS and ADHD.
The SNP-heritability of cortical volumes being greater than subcortical volumes was expected to result in greater performance of the cortical PIDS in genetic effect size, significance, and power. Interestingly, no differences were found in genetic effect size, nor significance. Moreover, low correlations were found between hemispheres for genetic effect size, heritability, and p-thresholds. However, Cortical PIDS did achieve greater power than Subcortical PIDS.
The results of this project should be interpreted considering some limitations. First, The effects sizes for brain structure in association with ADHD are generally very small (Hoogman et al al., 2019). Due to the small genetic covariance between ADHD and the PIDS our target sample was insufficiently large to meaningfully inter-individual differences in related clinical variables. We reached 0.20 power with the current sample of 30,000 participants in the GWAS dataset and 744 in the target sample. Clinical sample size remains a big issue in neuroimaging research. However, we project with an increase to 50.000 participants in the UKB GWAS and 3000 in the clinical NeuroIMAGE target sample we will achieve 0.80 power. Interesting patterns/trends emerged from the analysis, with this increase in target and base datasets these should be pursued further.
Second, genetic effects on brain measures are inherently correlated (fleary et al, 2021; Hofer et al., 2020; Smith et al., 2021). with global measures being the most heritable. We did not use global brain measures as covariates in calculating the PIDS. The UKB GWAS has already been corrected for total head volume accounting for a degree of the correlation between the global brain measures and our PIDS. If we were to remove variation by accounting for global brain measures our ability to quantify the SNP-effects associated with IDPs would decrease substantially.
PIDS may have use for studying psychiatric disorders in the future, as a basis for better understanding underlying biology in addition to heterogeneous diagnosis (Whalley et al., 2012) The application of PIDS extend beyond single disorders into transdiagnostic vulnerability, researching psychiatric disorders starting from the brain. Focusing on imaging derived traits rather than syndromes and using the resultant knowledge and statistics as a modelling basis for bottom-up clinical research, represents a framework shift for intermediate phenotype use in psychiatric genetics. Disorders that are phenotypically dissimilar may still share common neural vulnerability (Beauchaine & Constantino, 2017). Hence, we chose to start with the definition of this neural vulnerability and asses the genetic component to this trait rather than that of a disorder. These transdiagnostic and data-driven applications of intermediate phenotypes can be useful if we are to evaluate neural vulnerabilities and their interactions with or role in genetic risk at the root of behavior.
This project presents a new approach to characterize the genetic underpinnings of psychiatric traits by quantifying the common variants involved in relevant brain variation and using them to profile individual patients. GWAS data can be used successfully to explain Individual variance. However, this same success has yet to be achieved for variance in ADHD. Currently, the implementation PRS is hindered by weaker application to non-European ancestry and limited translation of percentile loadings to lifetime risk. With improved diverse ancestry databases and a combining of environmental and clinical risk factors with the PRS, polygenic scores based on imaging derived phenotypes could serve as a more objective tool for investigating underlying neurobiology and genetic architecture complementary to syndromic classification. Ultimately, PIDS may have potential value when included in prediction and classification models. Profiling individuals in terms of genetic influences on different areas of the brain could for example be useful when describing heterogeneity in etiological factors, prognosis, or treatment response. These findings encourage future studies to use PIDS to translate fundamental neuroimaging genetics to the field of psychiatry and beyond.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
Acknowledgements
We are appreciative to UK Biobank for making the data available and to all UK Biobank participants who donated their time and made this resource possible. The BIG40 Open Data Server is provided by the Wellcome Centre for integrative Neuroimaging, which is supported by center funding from the Wellcome Trust (203139/Z/16/Z). Compute resources were provided by the Oxford Biomedical Research Computing (BMRC) facility (a joint development between Oxford’s Wellcome Centre for Human genetics and Big Data Institute, supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre). Furthermore, we would like to thank all participants and families who participated in the IMAGE, NeuroIMAGE, and DELTA studies. The DELTA project was funded by a Radboudumc Hypatia Fellowship (E. S.). NeuroIMAGE was supported by a Dutch Research Council (NWO) Large Investment Grant (no. 1750102007010) and NWO Brain & Cognition an Integrative Approach Grant (no. 433–09-242 to J.K.B.), and grants from Radboud University Medical Center, University Medical Center Groningen and Accare, and VU University Amsterdam. B.F. has received educational speaking fees from Medice. J.K.B. has been in the past 3 years a consultant to / member of advisory board of / and/or speaker for Takeda/Shire, Roche, Medice, Angelini, Janssen, and Servier. He is not an employee of any of these companies, and not a stock shareholder of any of these companies. All other authors report no biomedical financial interests or potential conflicts of interest.
Footnotes
Updated and reformated all figures