Abstract
The UK Biobank’s brain imaging data is an essential resource for clinical research, but the cost and difficulty of acquiring it limit the imaging study to 100,000 participants, leaving the majority of UKB subjects without imaging data. However, because imaging-derived phenotypes (IDPs) are heritable and most UKB subjects have genetic data available, it is possible to predict IDPs for UKB subjects outside the imaging study from their genetic data. To this end, this study systematically developed and evaluated biobank-scale polygenic risk scores (PRS) for 4,206 IDPs from multiple brain imaging modalities and processing pipelines. The majority of IDPs (64.76%, 2,774/4,206) were significantly predicted by PRS developed using subjects with both genetic and imaging data. Moreover, genetically predicted IDPs were associated with a wide range of complex traits and diseases, and these association patterns were consistent across imaging pipelines. These findings suggest that genetic prediction through PRS is a cost-effective and practical way to extend the benefits of the UKB imaging study to a broader population. The PRS data resources developed in this study have been made publicly available through Zenodo and will be returned to the UK Biobank.
The large-scale brain imaging data from the UK Biobank (UKB) imaging study have proven to be an immensely valuable resource for characterizing the structural and functional organization of the brain1,2. These data have been instrumental in establishing links with clinical biomarkers3,4, predicting brain aging5-7, and facilitating early disease detection8. Launched in 2014, the UKB imaging study reached a milestone in 2022 when it acquired multimodal brain magnetic resonance imaging (MRI) for its 50,000th participant. Although it is the world’s largest imaging study, it will ultimately include only 100,000 participants9, leaving 80% (400,000) of the half-million UKB subjects without imaging data. Given the cost and difficulty of collecting additional imaging data, it is crucial to develop strategies that extend the utility of the UKB imaging study to more UKB participants and to a wider population.
Polygenic risk scores (PRS) can be used to predict traits or disease risk for individuals by aggregating genetic information across the genome10,11. The development of numerous prediction methods12, reporting standards13, genetic data resources14, and data sharing platforms15 has enabled the application of PRS to a wide variety of complex diseases and heritable traits. Both family-based and population-based studies have shown that variation in brain structure and function, as measured by brain MRI, is heritable2,16-18. Recent genome-wide association studies (GWAS)2,19-25 have identified many genetic loci associated with brain imaging-derived phenotypes (IDPs). Consequently, PRS methods can be used to predict brain IDPs for UKB subjects who are not part of the imaging study. Several GWAS have reported the prediction accuracy (out-of-sample R-squared) of PRS for brain IDPs19,26,27 in small independent testing datasets, indicating that genetic data can partially recover variation in brain IDPs, especially when the PRS is developed and applied within the same population or research cohort. These pilot studies demonstrate that genetically predicted IDPs can serve as valuable proxy imaging biomarkers when brain MRI data are not readily available.
In this study, we systematically developed and evaluated PRS for brain IDPs in UKB subjects without imaging data. We examined 4,206 brain IDPs from various imaging modalities and independent processing pipelines, including 3,905 traits generated by the UKB brain imaging team1,2,20 (UKB Data Category 100, referred to as UKB-Oxford data hereafter, https://open.win.ox.ac.uk/ukbiobank/big40/) and 301 traits processed by BIG-KP19,21,23 (https://bigkp.org/). These imaging biomarkers spanned the major MRI categories: structural MRI (sMRI, including regional brain volumes, cortical thicknesses, and surface areas), diffusion MRI (dMRI, with diffusion tensor imaging (DTI) parameters), resting-state functional MRI (rfMRI, with amplitude28 and functional connectivity traits), task-based functional MRI (tfMRI, with activation z-statistics), and susceptibility-weighted brain MRI (with regional median T2*). The Methods section and Table S1 provide more detailed information. Figure 1 presents an overview of the study design. The list of genetic variants and their weights for constructing PRS for brain IDPs can be found at https://github.com/xcyang17/IPRS_UKB.
RESULTS
Developing biobank-scale PRS for 4,206 brain IDPs
To develop and assess PRS for brain IDPs in UKB subjects without imaging data, we used data from individuals of white British ancestry with brain IDPs from the UKB phase 1 to 3 data releases2,20,28 (released through 2020) for training (average n = 34,224). We generated GWAS summary statistics for the brain IDPs, which were then used as input to the PRS models to create genetically predicted IDPs for all UKB subjects without imaging data (average n = 454,318). An independent hold-out dataset with brain IDP data served as the test set for evaluating the predictive performance of the genetically predicted IDPs (average n = 3,438). We constructed the PRS with PRS-CS29 using UKB genotyping data; 461,488 genetic variants were included in the prediction model after standard genetic quality control. Details are provided in Methods.
Overall, 64.76% (2,774/4,206) of brain IDPs were significantly predicted after controlling the false discovery rate (FDR) at 5% with the Benjamini-Hochberg procedure. Significantly predicted brain IDPs were present in both UKB-Oxford and BIG-KP across all brain MRI modalities (Fig. 2A and Table S2). Brain IDPs from the same imaging modality showed similar ranges of prediction accuracy (Fig. 2B). For instance, the sMRI modality in BIG-KP consisted of 101 regional brain volumes generated by advanced normalization tools (ANTs)19,30, with an average prediction R-squared of 1.13% (s.e. = 0.10%). The UKB-Oxford data contained 1,437 sMRI IDPs, and most subcategories displayed prediction R-squared ranges similar to those of the BIG-KP ANTs traits. Prediction accuracy also varied across modalities. Among the three modalities in BIG-KP, dMRI traits (110 DTI parameters from the ENIGMA-DTI TBSS pipeline21,31,32) showed significantly higher prediction accuracy than sMRI traits (101 ANTs regional brain volumes) or rfMRI traits (90 rfMRI traits using the Glasser360 atlas21,31,32; P < 2.2 × 10−16, Wilcoxon rank-sum test). Furthermore, we assessed the consistency of prediction performance across PRS methods by repeating the same analyses on the 301 BIG-KP traits with DBSLMM21,31,32. Figure 2C shows that the prediction accuracy of the two methods was consistent across traits (correlation = 0.9278, Table S2). These results suggest that brain IDPs can be consistently predicted by different PRS methods.
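The 5% FDR control above uses the standard Benjamini-Hochberg step-up rule. The sketch below is a minimal NumPy illustration, assuming a vector of per-IDP prediction P values is already available; how those P values were computed is not specified here, and the function name `benjamini_hochberg` is hypothetical, not part of the study's released code.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level q (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices that sort P values ascending
    ranked = p[order]
    thresholds = q * np.arange(1, m + 1) / m  # BH critical values k * q / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()        # largest rank with p_(k) <= k * q / m
        reject[order[: k + 1]] = True         # reject every hypothesis up to that rank
    return reject

# Illustrative P values only (not study results).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals).mean())  # fraction of "significantly predicted" IDPs
```

The step-up form matters: once the largest qualifying rank is found, every smaller P value is rejected even if it individually exceeds its own critical value.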
PRS of brain IDPs were widely associated with complex traits
Using the PRS developed for brain IDPs, we conducted association analyses with 265 phenotypes (Table S3) in UKB participants who lacked brain imaging data (Methods). Prior studies using brain imaging data have identified associations with various complex traits and diseases, such as intelligence, blood pressure, and education3,4. Our objective was to determine whether imaging-trait relationships can also be uncovered using the PRS of IDPs in the UKB non-imaging cohort. We first outline the results for the 301 BIG-KP IDPs.
At the Bonferroni significance level (265 × 301 tests), we identified 2,053 significant pairs between 97 complex traits and the PRS of 258 brain IDPs (|β| > 0.0053, P range = (1.75 × 10−115, 6.27 × 10−7)). Of these 2,053 pairs, 1,922 (93.62%) were replicated in an independent hold-out dataset (Fig. S1 and Table S4). Figure 3A shows the pattern of significant IDP-trait associations across phenotype groups and imaging modalities. The PRS of dMRI traits had the highest percentage of associations, followed by those of sMRI and rfMRI traits. Associations were observed across a wide range of phenotypes, including blood biochemistry biomarkers, curated disease phenotypes33, spirometry, body composition by impedance, and mental health. We provide several examples below, all of which were replicated.
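The Bonferroni cutoffs follow directly from the numbers of tests quoted in the text (265 phenotypes times 301 BIG-KP IDPs here, and times 3,905 UKB-Oxford IDPs for the later analysis), assuming a family-wise α of 0.05. This small sketch only reproduces that arithmetic; the helper name `bonferroni_threshold` is hypothetical.

```python
ALPHA = 0.05          # assumed family-wise error rate
N_PHENOTYPES = 265

def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test P-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests

bigkp = bonferroni_threshold(N_PHENOTYPES * 301)    # 79,765 tests
oxford = bonferroni_threshold(N_PHENOTYPES * 3905)  # 1,034,825 tests
print(f"BIG-KP:     {bigkp:.3e}")
print(f"UKB-Oxford: {oxford:.3e}")
```

This prints cutoffs of about 6.268e-07 and 4.832e-08, which, after rounding, match the upper ends of the reported P ranges (6.27 × 10−7 and 4.83 × 10−8).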
Significant associations were discovered and replicated between PRS and various disease-related phenotypes, including curated disease phenotypes, family health history, and health and medical history. Among the curated diseases, hyperthyroidism and hypothyroidism were widely associated with PRS from all three imaging modalities (|β| > 0.0104, P < 6.02 × 10−7). These results were consistent with recent studies34,35 that reported significant changes in white matter radial and axial diffusivity in adult patients with hyperthyroidism/hypothyroidism, changes known to be associated with memory dysfunction. In addition, hypertension and hypercholesterolemia were associated with the PRS of white matter structural connectivity traits and resting-state functional connectivity traits (|β| > 0.0102, P < 5.30 × 10−7). Hypertension was mostly associated with the PRS of DTI parameters involving the external capsule and the anterior limb of the internal capsule, and there was no association between hypertension and the PRS of regional brain volumes, which was consistent with a previous study using UKB brain IDPs36. Hypertension can lead to vascular stiffness and impaired cerebral perfusion, which in turn can cause microstructural white matter disruption and stroke37. Diabetes was significantly associated with PRS from all three modalities (|β| > 0.0105, P < 5.92 × 10−7), especially the DTI parameters of the superior and inferior longitudinal fasciculi and the inferior fronto-occipital fasciculus. These findings were consistent with two recent studies that investigated the effects of type 2 diabetes (T2D) on brain white matter38,39.
We found significant associations with multiple brain-related disorders and with family history of stroke and Alzheimer’s disease (|β| > 0.0109, P < 2.92 × 10−7). Family history of Alzheimer’s disease was significantly associated with the PRS of DTI parameters of the hippocampal cingulum tracts, consistent with previously reported DTI changes in patients with Alzheimer’s disease40-42. Multiple sclerosis was associated with PRS from all three imaging modalities, such as the PRS of DTI parameters of the cingulum and fornix-stria terminalis tracts. Previous research has linked structural damage in the cingulum to subjective fatigue in multiple sclerosis43, and fornix measures have been found to correlate with cognitive impairment in patients with multiple sclerosis44. We also uncovered widespread associations with brain-related complex traits, including mental health, alcohol use, smoking, cognitive function, and education. All mental health traits were associated with the PRS of regional brain volumes, and several of them (nervous feelings, visits to doctors/psychiatrists, and neuroticism score) were also widely associated with the PRS of multiple DTI parameters (|β| > 0.0104, P < 6.18 × 10−7). Cognitive functions were significantly associated with the PRS of both DTI parameters and regional brain volumes (|β| > 0.0101, P < 5.93 × 10−7). For instance, fluid intelligence was positively associated with the PRS of fractional anisotropy of the uncinate fasciculus; higher fractional anisotropy has been linked to shorter interhemispheric transfer time, faster information processing, more efficient cognitive functioning, and faster reaction times45. In summary, PRS for brain IDPs provide an opportunity to identify biologically relevant connections between the brain and complex traits and diseases.
Comparison of BIG-KP and UKB-Oxford IDPs in associations with phenotypes
Using the PRS of 3,905 IDPs from the UKB-Oxford database, we repeated the association analyses with the 265 phenotypes (Methods). The results confirmed that PRS-phenotype associations were consistent between brain IDPs from the BIG-KP and UKB-Oxford pipelines. We also discovered new associations from imaging modalities available only in UKB-Oxford. Below we compare the UKB-Oxford results with those of BIG-KP and highlight some interesting new PRS-phenotype associations.
We found 14,541 significant pairs between 100 phenotypes and 2,814 PRS at the Bonferroni significance level (265 × 3,905 tests; |β| > 0.0056, P range = (1.01 × 10−135, 4.83 × 10−8)), 13,899 (95.58%) of which were replicated in an independent dataset (Figs. 3B and S2, and Table S5). Comparing Figures 3A and 3B, both BIG-KP and UKB-Oxford PRS had the most associations in blood biochemistry, curated disease phenotypes, and mental health. The 1,437 sMRI traits in UKB-Oxford comprise multiple subcategories, including regional volumes, cortical areas, cortical grey-white contrast, cortical thickness, regional and tissue intensity, regional T2*, and white matter hyperintensity volume (Table S1). A high correlation (0.9506) was found between the numbers of significant associations obtained from the BIG-KP ANTs traits and from the UKB-Oxford regional volumes, suggesting that the PRS of volumetric measures from the two pipelines yield consistent patterns of phenotypic associations. The other sMRI subcategories revealed additional associations that were not detected by regional volumes. For example, playing computer games, a potentially addictive behavior, was negatively associated with the PRS of the area of the left inferior temporal gyrus (β = -0.0115, P = 4.35 × 10−8). A previous study reported that young male adults who played Internet video games had smaller inferior temporal gyri46. The associations detected by the 675 UKB-Oxford dMRI traits (tract-skeleton and probabilistic tractography traits) overlapped highly with those of the BIG-KP DTI parameters, with a correlation of 0.9797 between the numbers of significant associations. The PRS of the dMRI traits in multiple white matter tracts had additional significant associations with rheumatoid arthritis and liver biomarkers, such as gamma-glutamyl transferase and direct bilirubin (|β| > 0.0117, P < 4.30 × 10−8). Both gamma-glutamyl transferase and direct bilirubin have been related to rheumatoid arthritis47,48, and previous studies have shown brain atrophy in patients with rheumatoid arthritis49.
A wide range of phenotypes was associated with the PRS of the 1,777 rfMRI IDPs, including family history of Alzheimer’s disease, neuroticism, cardiovascular problems, and blood biomarkers (|β| > 0.0112, P < 4.22 × 10−8). For example, family history of Alzheimer’s disease was associated with the PRS of traits in the visual network and the three core cognitive networks (the central executive, default mode, and salience networks) (β < -0.0115, P < 4.67 × 10−8). Previous studies found that Alzheimer’s disease progressively reduces functional connectivity in the visual network50, and MRI measures of the three core cognitive networks are known to be predictive of Alzheimer’s disease51-53. We also detected multiple associations between neuroticism and the PRS of rfMRI IDPs in the cerebellum (|β| > 0.0131, P < 2.08 × 10−8). The cerebellum plays an important role in motor control and is involved in cognitive functions, and previous studies showed that functional connectivity of the cerebellum is closely involved in neuroticism54. Overall, the BIG-KP and UKB-Oxford IDPs provide consistent association patterns across different categories of phenotypes.
Concordance between brain IDPs and their PRS
In this section, we analyzed associations between IDPs and phenotypes in UKB subjects with brain imaging data (average n = 34,870, Methods) and compared these IDP-phenotype associations in the UKB imaging cohort with the PRS-phenotype associations in the UKB non-imaging cohort. At the 5% FDR level (265 × 301 tests), 4,717 pairs between 206 phenotypes and 297 IDPs were discovered and replicated (|β| > 0.0076, P range = (1.18 × 10−123, 8.19 × 10−3); Figs. 4A and S3, and Table S6). Of these 4,717 IDP-phenotype associations, 1,383 pairs between 121 phenotypes and 266 PRS were significant at the 5% FDR level (Figs. 4B and S4, and Table S7). That is, the PRS associations recovered 29.32% (1,383/4,717) of the IDP associations, covering 58.74% (121/206) of the phenotypes and 89.56% (266/297) of the imaging traits. The IDP signals in Figure 4A spanned more diverse phenotype groups than the PRS signals in Figure 4B. Both IDP and PRS signals were most abundant in blood biochemistry, curated disease phenotypes, and mental health traits, and the distribution of signals within each imaging modality was consistent. Among the 1,383 pairs significant in both the IDP and PRS analyses, the correlation between their regression coefficients was 0.5685, and 78.16% (1,081/1,383) had regression coefficients in the same direction. Across all 4,717 IDP-phenotype associations, the correlation between regression coefficients decreased to 0.4389. These results suggest that most PRS associations have the same sign as the corresponding IDP associations and that their effect sizes partially agree.
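The two concordance summaries above (the correlation between paired regression coefficients and the share of pairs with matching direction) can be computed as in this minimal sketch. It assumes the IDP-phenotype and PRS-phenotype coefficients have already been aligned pair by pair; the function name and the toy coefficients are hypothetical and are not taken from Tables S6 or S7.

```python
import numpy as np

def coefficient_concordance(beta_idp, beta_prs):
    """Pearson correlation and sign agreement between paired regression coefficients."""
    a = np.asarray(beta_idp, dtype=float)
    b = np.asarray(beta_prs, dtype=float)
    r = np.corrcoef(a, b)[0, 1]                    # Pearson correlation of coefficients
    same_sign = np.mean(np.sign(a) == np.sign(b))  # fraction with matching direction
    return r, same_sign

# Toy coefficients for four IDP-phenotype pairs and their PRS counterparts.
r, frac = coefficient_concordance([0.03, -0.02, 0.05, -0.01],
                                  [0.006, -0.004, 0.010, 0.002])
print(round(frac, 2))  # one of four pairs flips sign
```

Because PRS effects are typically much smaller than IDP effects (compare the |β| ranges above), a correlation of coefficients is more informative here than a direct comparison of magnitudes.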
Significant associations with various brain disorders, including stroke, multiple sclerosis, depression, and migraine, were identified by both brain IDPs (|β| > 0.0168, P < 4.62 × 10−3) and their PRS (|β| > 0.0064, P < 2.48 × 10−3). In multiple white matter tracts, mean diffusivity and residual diffusivity, as well as their PRS, were positively associated with stroke (IDP β > 0.0327, P < 1.21 × 10−7; PRS β > 0.0064, P < 2.48 × 10−3). These findings are consistent with the known white matter impairment and motor deficits following stroke55. Consistent with a recent study56, the mean diffusivity of the superior fronto-occipital fasciculus and its PRS were positively associated with depression (IDP β = 0.0390, P = 1.24 × 10−10; PRS β = 0.0079, P = 1.89 × 10−4). The brain-related complex traits associated with both brain IDPs and PRS included most mental health traits, cognitive functions, and electronic device usage, such as time spent watching TV, weekly use of mobile phone, and length of mobile phone use (IDP |β| > 0.0174, P < 4.52 × 10−3; PRS |β| > 0.0059, P < 4.83 × 10−3). In addition, some brain disorders, such as bipolar disorder, Parkinson’s disease, and epilepsy, were significantly associated only with brain IDPs and not with their PRS (|β| > 0.0015, P < 7.68 × 10−3). For example, there were strong negative associations between bipolar disorder and the mean fractional anisotropy of the body and genu of the corpus callosum (β < -0.0197, P < 5.82 × 10−4), consistent with findings reported in other studies57,58. In summary, we examined the overlap between IDP and PRS phenotypic associations and confirmed that PRS can partially recover the imaging associations with brain-related diseases and complex traits. PRS can therefore serve as proxy imaging biomarkers when brain MRI data are unavailable.
DISCUSSION
In this study, we generated PRS for 4,206 brain IDPs for UKB subjects without imaging data. We tested these PRS for association with a wide range of phenotypes and confirmed that they provide biologically relevant information about brain-related complex traits and diseases. We found consistent prediction accuracy and association patterns across IDPs from different pipelines, such as the volumetric measures in UKB-Oxford and BIG-KP. The PRS of brain IDPs partially recovered associations previously identified with imaging data: nearly 30% of the IDP associations could be detected using their PRS proxies, and most of these PRS associations had the same sign as the IDP associations. We have provided the data resources so that users can easily reconstruct PRS in the UKB database.
When real brain imaging data are not available, PRS can be used as genetically predicted variables for brain structure and function. However, as shown in our prediction and association analyses, PRS can only partially reconstruct the imaging phenotypes. PRS generally perform imperfectly in predicting most complex traits and diseases, which can be attributed to a number of factors, including limited training sample sizes, heritability, and weak genetic effects59. Another challenge in PRS applications lies in ancestry and population differences. Because most subjects in the current UKB imaging cohort are of European ancestry, PRS applied in non-UKB and/or non-European studies may show further reduced performance60. More powerful PRS methods that better address these limitations and account for cohort differences may yield more informative PRS for potential clinical applications.
Note: One supplementary information pdf file and one supplementary table zip file are available.
Data Availability
The PRS data resources have been made publicly available at Zenodo (https://doi.org/10.5281/zenodo.7709788). The individual-level data used in this study can be obtained from https://www.ukbiobank.ac.uk/.
METHODS
Imaging traits
The data used in our study were obtained from the UK Biobank (UKB) study61 (https://www.ukbiobank.ac.uk/), which recruited around 500,000 individuals aged 40 to 69 between 2006 and 2010. Ethics approval for the UKB study was obtained from the North West Multicentre Research Ethics Committee (approval number: 11/NW/0382). We used a total of 4,206 brain imaging-derived phenotypes (IDPs) from the UKB study, comprising 301 BIG-KP19,21,23 (https://bigkp.org/) and 3,905 UKB-Oxford1,2,20 (https://open.win.ox.ac.uk/ukbiobank/big40/) traits. The BIG-KP traits fall into three groups. First, we obtained 101 regional brain volumes19 from structural MRI (sMRI) images by applying the advanced normalization tools30 (ANTs). Second, we generated 110 tract-averaged diffusion tensor imaging (DTI) parameters from diffusion MRI (dMRI) using the ENIGMA-DTI pipeline31,32. Third, for resting-state functional MRI (rfMRI), we partitioned the cerebral cortex into 360 areas using the Glasser360 atlas62 and obtained 90 functional activity (amplitude) and functional connectivity (edge) traits for 12 functional networks63. The UKB-Oxford data comprised 1,437 IDPs from sMRI, 675 from dMRI, 1,777 from rfMRI, and 16 from task-based functional MRI (tfMRI). The sMRI IDPs consisted of FIRST (Category 1102), FAST (Category 1101), FreeSurfer ASEG (Category 190), FreeSurfer BA exvivo (Category 195), FreeSurfer a2009s (Category 197), FreeSurfer DKT (Category 196), FreeSurfer desikan gw (Category 194), FreeSurfer desikan pial (Category 193), FreeSurfer desikan white (Category 192), FreeSurfer subsegmentation (Category 191), regional T2* (Category 109), and white matter hyperintensity volume (Category 112). The 675 dMRI IDPs included 432 from Category 134 and 243 from Category 135. The 1,777 rfMRI IDPs included 76 amplitude (node) traits and 1,701 functional connectivity (edge) traits from whole-brain spatial independent component analysis1,64,65 (Category 111). Lastly, there were 16 tfMRI IDPs from Category 106. Image acquisition, preprocessing procedures, and quality control are detailed in the UKB Brain Imaging Documentation (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf). See Table S1 for the complete ID list of all brain IDPs.
PRS constructions
We performed the following genetic quality control steps on subjects with both brain IDPs and genetic data23: 1) removed individuals with a missing genotype rate > 0.1; 2) removed variants with a missing genotype rate > 0.1; 3) removed variants with minor allele frequency (MAF) < 0.01; and 4) removed variants that failed the Hardy-Weinberg equilibrium test at the 1 × 10−7 level. The GWAS was performed on individuals of white British ancestry using linear mixed-effect models implemented in fastGWA66 (average n = 34,224). The covariates included age (at imaging), age-squared, sex, the interaction between age and sex, the interaction between age-squared and sex, the first 40 genetic principal components28 (PCs), estimated total intracranial volume (eTIV), head motion measurements and their squares, brain position measurements and their squares, and volumetric scaling. For regional brain volume IDPs, total brain volume (TBV) was also included as a covariate; for TBV itself, eTIV and volumetric scaling were not included. With the GWAS summary statistics as input, we applied PRS-CS29 and DBSLMM67 to estimate variant effect sizes, using each method's default or automatically tuned hyperparameters. We then used PLINK to compute risk scores in the testing data by summing across genetic variants, weighted by the effect sizes estimated from PRS-CS29 and DBSLMM67.
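Conceptually, each score is a weighted sum of effect-allele dosages, PRS_i = Σ_j w_j x_ij, where w_j is the posterior effect size for variant j. The NumPy sketch below assumes dosages are already aligned to the effect allele and restricted to the 461,488 retained variants; it is an illustration, not the PLINK invocation used in the study, and PLINK's own output may be normalized differently (for example, averaged over variants). The function name is hypothetical.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages: PRS_i = sum_j w_j * x_ij.

    dosages: (n_individuals, n_variants) effect-allele counts in [0, 2],
             assumed already aligned to the allele the weights refer to
    weights: (n_variants,) posterior effect sizes, e.g. from PRS-CS or DBSLMM
    """
    X = np.asarray(dosages, dtype=float)
    w = np.asarray(weights, dtype=float)
    if X.shape[1] != w.shape[0]:
        raise ValueError("dosage columns must match the number of weights")
    return X @ w

# Two individuals, three variants (toy values).
print(polygenic_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.05]))
```

In practice, allele alignment (flipping dosages when the weight refers to the other allele) is the step most likely to go wrong, which is why the sketch states it as a precondition rather than handling it.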
The prediction accuracy of PRS was measured by the incremental R-squared, defined as the additional phenotypic variation explained by the PRS beyond the covariates in a linear regression model. The covariates included age, age-squared, sex, the interaction between age and sex, the interaction between age-squared and sex, and the first 40 genetic PCs. The prediction accuracy was estimated in a dataset of unrelated UKB individuals of non-British ancestry with brain IDP data (average n = 3,200).
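The incremental R-squared compares two nested ordinary least squares fits: one with covariates only and one with covariates plus the PRS. The sketch below is a minimal NumPy version, assuming complete data and using a single simulated covariate in place of the full covariate set listed above; the function names and simulated data are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def incremental_r2(y, covariates, prs):
    """Extra variance in y explained by adding the PRS to a covariates-only model."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    full = np.column_stack([C, np.asarray(prs, dtype=float)])
    return r_squared(y, full) - r_squared(y, C)

# Simulated example: one covariate (standing in for age) and a PRS with a modest effect.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
prs = rng.normal(size=n)
y = 0.5 * age + 0.3 * prs + rng.normal(size=n)
print(incremental_r2(y, age, prs))
```

Because the models are nested, the in-sample difference is never negative; a PRS of pure noise yields a value close to zero.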
PRS-phenotype and IDP-phenotype association analyses
We used a discovery-replication design to examine associations between PRS and phenotypes in UKB participants without brain IDPs. We randomly selected 70% of UKB white British individuals (average n = 202,893) as the discovery dataset for PRS-phenotype associations, while the remaining 30% of white British individuals, all white non-British individuals, and all non-white individuals (average n = 129,333) formed the replication dataset. Values more than five median absolute deviations from the median were treated as outliers and removed. We tested a total of 265 UKB phenotypes covering a wide range of trait domains. Specifically, the 265 UKB phenotypes included 24 mental health traits (Category 100060), 5 cognitive traits (Category 100026), 12 physical activity traits (Category 100054), 6 electronic device use traits (Category 100053), 8 sun exposure traits (Category 100055), 3 sexual factor traits (Category 100056), 3 social support traits (Category 100061), 12 family history of diseases (Category 100034), 21 diet traits (Category 100052), 9 alcohol drinking traits (Category 100051), 6 smoking traits (Category 100058), 34 blood biochemistry biomarkers (Category 17518), 3 blood pressure traits (Category 100011), 3 spirometry traits (Category 100020), 32 early life factors (Categories 135, 100033, 100034, and 100072), 9 greenspace and coastal proximity traits (Category 151), 2 hand grip strength traits (Category 100019), 13 residential air pollution traits (Category 114), 5 residential noise pollution traits (Category 115), 2 body composition traits by impedance (Category 100009), 4 health and medical history traits (Category 100036), 3 female-specific factors (Category 100069), 1 education trait (Category 100063), and 57 curated disease phenotypes based on Dey et al.33 (Table S3).
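The outlier rule above can be sketched as follows. Two details are assumptions, since the text does not specify them: the raw (unscaled) median absolute deviation is used, without the 1.4826 normal-consistency factor, and missing values are dropped before computing the median. The function name is hypothetical.

```python
import numpy as np

def remove_mad_outliers(values, k=5.0):
    """Drop values more than k median absolute deviations (MAD) from the median.

    Assumes the raw, unscaled MAD; NaNs are dropped first.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # Keep values within k * MAD of the median; values beyond it are outliers.
    return x[np.abs(x - med) <= k * mad]

print(remove_mad_outliers([1, 2, 3, 4, 5, 100]))  # 100 lies far beyond 5 * MAD
```

Whether the scaled or unscaled MAD is used changes the effective cutoff by a factor of about 1.5, so this choice is worth confirming against the released code.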
We then tested associations between the 4,206 IDP-derived PRS generated by PRS-CS and the 265 UKB phenotypes. We fitted linear regression models separately in the discovery and replication sets, adjusting for the same covariates in each: age, age-squared, sex, the interaction between age and sex, the interaction between age-squared and sex, and the first 40 genetic PCs. Specifically, we regressed the IDP-derived PRS onto the UKB phenotypes and calculated P values using two-sided t-tests. We prioritized results that met three criteria: 1) significant after Bonferroni correction in the discovery dataset, 2) significant at a nominal significance level (0.05) in the replication dataset, and 3) regression coefficients with matching directions in the discovery and replication datasets.
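The three prioritization criteria can be expressed as a single filter over per-pair regression results. This is a minimal sketch, assuming the discovery and replication coefficients and P values for each PRS-phenotype pair have already been computed; the `AssocResult` record and `prioritized` function are hypothetical names, and the discovery cutoff is passed in so the same filter could be reused with an FDR-based cutoff.

```python
from dataclasses import dataclass

@dataclass
class AssocResult:
    beta_disc: float  # regression coefficient in the discovery set
    p_disc: float     # two-sided P value in the discovery set
    beta_rep: float   # regression coefficient in the replication set
    p_rep: float      # two-sided P value in the replication set

def prioritized(res, disc_threshold, rep_alpha=0.05):
    """Discovery significance, nominal replication, and matching effect direction."""
    return (
        res.p_disc < disc_threshold
        and res.p_rep < rep_alpha
        and res.beta_disc * res.beta_rep > 0  # same (nonzero) sign in both sets
    )

# Bonferroni cutoff for the 265 x 301 BIG-KP tests.
threshold = 0.05 / (265 * 301)
print(prioritized(AssocResult(0.012, 1e-8, 0.009, 0.01), threshold))
```

Checking direction through the product of coefficients treats an exactly zero coefficient as non-replicating, which is the conservative choice.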
We analyzed the associations between the 301 BIG-KP IDPs and the 265 UKB phenotypes in a discovery-replication design. The discovery set included all unrelated white British subjects from the UKB phase 1 to 3 data releases, similar to the training GWAS dataset. The replication set consisted of all remaining unrelated subjects from the phase 1 to 3 releases and all unrelated subjects from the UKB phase 4 data release. We performed linear regression separately in the discovery and replication datasets by regressing the BIG-KP IDP on the UKB phenotype, adjusting for the same covariates as in the GWAS analysis. We reported P values from two-sided t-tests and prioritized results that met three criteria: 1) significant in the discovery dataset after controlling the false discovery rate (FDR) at 5% with the Benjamini-Hochberg procedure, 2) significant at a nominal significance level in the replication dataset, and 3) regression coefficients with matching directions in the discovery and replication datasets.
Code availability
We used publicly available software and tools. The list of genetic variants and their weights used to construct PRS for brain IDPs is available at https://github.com/xcyang17/IPRS_UKB.
ACKNOWLEDGEMENTS
The study has been partially supported by funding from the Wharton Dean’s Research Fund and Analytics at Wharton, as well as start-up funds from the Purdue Statistics Department. This research has been conducted using the UK Biobank resource (application number 76139), subject to a data transfer agreement. We thank the individuals who participated in the UK Biobank, and the research teams for their efforts in collecting, processing, and disseminating these datasets. We also thank the research computing groups at the University of North Carolina at Chapel Hill, Purdue University, and the Wharton School of the University of Pennsylvania for providing computational resources and support that have contributed to these research results.