MUTATE: A Human Genetic Atlas of Multi-organ AI Endophenotypes using GWAS Summary Statistics
============================================================================================

* Junhao Wen
* Christos Davatzikos
* Jian Zeng
* Li Shen
* Andrew Zalesky
* Ye Ella Tian
* Zhijian Yang
* Aleix Boquet-Pujadas

## Summary

Artificial intelligence (AI) has been increasingly integrated into imaging genetics to provide intermediate phenotypes (i.e., endophenotypes) that bridge the genetics and clinical manifestations of human disease. However, the genetic architecture of these AI endophenotypes remains largely unexplored in the context of human multi-organ system diseases. Using publicly available GWAS summary statistics from UK Biobank, FinnGen, and the Psychiatric Genomics Consortium, we comprehensively characterized the genetic architecture of 2024 multi-organ AI endophenotypes (MAEs). Two AI- and imaging-derived subtypes of schizophrenia1 showed lower polygenicity and weaker negative selection effects than the schizophrenia diagnosis2, supporting the endophenotype hypothesis3. Genetic correlation and Mendelian randomization results revealed both within-organ connections and cross-organ crosstalk. We identified bi-directional causal relationships between chronic human diseases and MAEs across multiple organ systems, including Alzheimer’s disease for the brain, diabetes for the metabolic system, asthma for the pulmonary system, and hypertension for the cardiovascular system. Finally, we derived the polygenic risk scores of the 2024 MAEs. Our findings underscore the promise of the MAEs as new instruments to improve overall human health. All results are encapsulated in the MUTATE genetic atlas and are publicly available at [https://labs-laboratory.com/mutate](https://labs-laboratory.com/mutate).

**Highlight**

*   Two AI- and neuroimaging-derived subtypes of schizophrenia (SCZ1 and SCZ2) show lower polygenicity and weaker negative selection signatures than the disease endpoint/diagnosis of schizophrenia, supporting the endophenotype hypothesis.

*   Brain AI endophenotypes are more polygenic than those of other organ systems.

*   Most multi-organ AI endophenotypes exhibit negative selection signatures, whereas a small proportion of brain patterns of structural covariance networks exhibit positive selection signatures.

*   The 2024 multi-organ AI endophenotypes are genetically and causally associated with within-organ and cross-organ disease endpoints/diagnoses.

![Figure1](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F1.medium.gif)


Keywords
*   Multi-organ AI endophenotypes
*   genetic correlation
*   Mendelian randomization
*   polygenic risk score

## Introduction

Multi-organ research1,4–11 represents a pivotal frontier in advancing our understanding of human aging and disease. In particular, integrating artificial intelligence (AI) into multi-organ imaging genetics1,4,12,6 has emerged as a novel approach that holds promise for advancing precision medicine13. This integration introduces a new array of endophenotypes14,15, which serve as intermediate, often quantitative, phenotypes and may reshape how we perceive and approach medical AI16 in imaging and genetic research.

In recent years, three primary catalysts have significantly advanced the field of genetics. The first is the extensive collaborative effort to consolidate large-scale multi-omics datasets, which has given researchers unprecedented statistical power. For example, the UK Biobank (UKBB) study17 stands out for its comprehensive collection of multi-organ imaging18, genetics19, and proteomics20,21 data within the United Kingdom. Similarly, the FinnGen study22, conducted in Finland, has amassed extensive clinical and genetic data. Second, open-science efforts have propelled the field, notably through publicly available resources, such as genome-wide association study (GWAS) summary statistics, and through widespread scientific dissemination.

Notably, the FinnGen study and the Psychiatric Genomics Consortium (PGC)23 have made all their GWAS summary statistics publicly accessible22. Public GWAS platforms such as the GWAS Catalog24, OpenGWAS25, and GWAS ATLAS26 have consolidated and harmonized vast GWAS datasets, making them suitable for downstream genetic analyses. The same practice has been adopted in the emerging field of brain imaging genetics27, including the BIG40 ([https://open.win.ox.ac.uk/ukbiobank/big40/](https://open.win.ox.ac.uk/ukbiobank/big40/)), BIG-KP ([https://bigkp.org/](https://bigkp.org/)), BRIDGEPORT ([https://labs-laboratory.com/bridgeport](https://labs-laboratory.com/bridgeport)), and MEDICINE ([https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine)) knowledge portals. Finally, advanced computational genomics methods that require only GWAS summary statistics and appropriate linkage disequilibrium (LD) reference information have been developed, offering an unprecedented opportunity to understand the genetic architecture of highly polygenic disease traits. For example, LD score regression (LDSC)28 has been extensively used to estimate single-nucleotide polymorphism (SNP)-based heritability and genetic correlations. Mendelian randomization29 further dissects these associations by probing potential causal relationships among complex human disease traits, although such methods often rely on several sensitive model assumptions30.

Despite these advances, the genetic architecture of these AI endophenotypes in the context of pleiotropic human disease endpoints (DEs) across multi-organ systems remains largely uncharted. We previously applied AI to imaging genetic data and derived 2024 multi-organ AI endophenotypes (MAEs). These encompassed 2003 multi-scale brain patterns of structural covariance (PSC) networks generated through a deep learning-analogous non-negative matrix factorization method12 (visualization for C32_1, encompassing deep subcortical structures: [https://labs-laboratory.com/bridgeport/MuSIC/C32_1](https://labs-laboratory.com/bridgeport/MuSIC/C32_1)), 9 dimensional neuroimaging endophenotypes (DNEs) quantifying neuroanatomical heterogeneity (i.e., disease subtypes) within 4 common brain diseases1, and 12 biological age gaps (BAGs) quantifying individual deviation from typical aging (i.e., acceleration or deceleration relative to chronological age) across 9 human organ systems4,6 (**Supplementary eTable 1a**). To contribute to open science31, we made all the GWAS summary statistics derived from UKBB data publicly available at the MEDICINE knowledge portal: [https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine). In addition, FinnGen analyzed genetic data for 2269 binary and 3 quantitative DEs from 377,277 individuals and 20,175,454 variants, and made these GWAS summary statistics publicly available to the community at [https://finngen.gitbook.io/documentation/](https://finngen.gitbook.io/documentation/) (**Supplementary eTable 1b**). Finally, PGC consolidated GWAS results for brain disorders from cohorts worldwide and made the GWAS summary statistics accessible to the research community ([https://pgc.unc.edu/](https://pgc.unc.edu/), **Supplementary eTable 1c**).

This study harnesses the extensive GWAS summary statistics made publicly available by us (from UKBB), FinnGen, and PGC (**Method 1**), together with several advanced computational genomics methods (see **Code Availability**), to thoroughly characterize the genetic architecture of the 2024 MAEs (**Method 2**) and 525 DEs (>5000 cases) in the context of multi-organ investigations. Importantly, our previous research explored the genetic basis of the 2024 MAEs but did not systematically incorporate the FinnGen or PGC data.

Here, we included 521 DEs released by the FinnGen study, accessible at [https://finngen.gitbook.io/documentation/v/r9/](https://finngen.gitbook.io/documentation/v/r9/), and 4 brain DEs (Alzheimer’s disease (AD), attention-deficit/hyperactivity disorder (ADHD), bipolar disorder (BIP), and schizophrenia (SCZ)) from PGC ([https://pgc.unc.edu/](https://pgc.unc.edu/)). This study extends our previous work by systematically benchmarking the genetic analyses and comprehensively comparing various statistical methodologies28,30,32–38 (**Method 3**). Specifically, we aimed to estimate the SNP-based heritability (*h*²SNP), polygenicity (π), and the relationship between SNP effect size and minor allele frequency (*S*, a signature of natural selection); to assess genetic correlation (*rg*) and causality between the 2024 MAEs and 525 DEs; and to derive polygenic risk scores (PRS) for the MAEs. These findings were encapsulated within the MUTATE (**MU**l**T**i-organ **A**I endopheno**T**yp**E**) genetic atlas, which is publicly available at [https://labs-laboratory.com/mutate](https://labs-laboratory.com/mutate).

## Results

### The genetic architecture of the 2024 MAEs and 525 DEs

We estimated three parameters to characterize the genetic architecture of the 2024 MAEs (**Method 3a**). For SNP-based heritability (*h*²SNP), SBayesS39 yielded a mean *h*²SNP of 0.13 [0.01, 0.38] across the 2016 brain MAEs; among the other organ systems, the highest estimates were for the pulmonary BAG (0.16±0.004), followed by the eye BAG (0.14±0.009), the cardiovascular BAG (0.12±0.003), the renal BAG (0.10±0.003), and the musculoskeletal BAG (0.10±0.003) (**Fig. 1a** and **Supplementary eFile 1**). Note that SNP-based heritability estimates vary across methods and depend on the input data, i.e., whether summary data or individual-level genotype data are used40. To benchmark the summary data-based methods, we compared the SBayesS results with those of LDSC28 and SumHer33. Although the estimates from the three methods were highly correlated (*r*=0.97 between LDSC and SumHer; *r*=0.99 between SBayesS and SumHer; *r*=0.99 between SBayesS and LDSC; **Supplementary eFigure 1**), SumHer (0.23±0.14) generally yielded larger *h*²SNP estimates than both LDSC (0.16±0.10) and SBayesS (0.13±0.08) (**Supplementary eFile 1**). **Supplementary eFigure 2** presents the *h*²SNP estimates of the 525 DEs and 2024 MAEs, and **Supplementary eFile 2** presents the results for the 525 DEs. For the 525 DEs, we converted the *h*²SNP estimates from the observed scale to the liability scale, following the recommendations of Ojavee et al. We did not intend to compare estimates between the two data sources because of differences in genotype coverage, sample sizes, allele frequencies, and other factors.
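The observed-to-liability conversion for binary DEs can be sketched as follows. This sketch uses the widely used transformation of Lee et al. (2011) as a stand-in; whether it matches the exact procedure recommended by Ojavee et al. is an assumption, as are the prevalence values and the case count in the example, which are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, K: float, P: float) -> float:
    """Convert observed-scale SNP heritability of a binary trait to the liability scale.

    K: assumed population prevalence of the disease.
    P: proportion of cases in the GWAS sample.
    Lee et al. (2011) transformation:
        h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2),
    where z is the standard normal density at the liability threshold Phi^{-1}(1 - K).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)
    z = std_normal.pdf(threshold)
    return h2_obs * K**2 * (1.0 - K) ** 2 / (P * (1.0 - P) * z**2)


# Hypothetical example: a DE with 6,000 cases among 377,277 participants,
# with the population prevalence assumed equal to the sample prevalence.
P = 6_000 / 377_277
print(observed_to_liability(0.01, K=P, P=P))
```

For rare diseases the liability-scale estimate is much larger than the observed-scale one, which is one reason observed-scale estimates for binary DEs should not be compared directly with those of quantitative MAEs.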

![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F2.medium.gif)


Figure 1: 
The genetic architecture of the 2024 MAEs. Three parameters are estimated by SBayesS to delineate the genetic architecture of the 2024 MAEs: (**a**) the SNP-based heritability (*h*²SNP), (**b**) the relationship between MAF and effect size (*S*), and (**c**) polygenicity (π). (**d**) We compared the π and *S* parameters using harmonized GWAS summary data for two AI- and imaging-derived subtypes of schizophrenia (SCZ1 and SCZ2)1 from UKBB and the disease endpoint of schizophrenia (SCZ)2 from PGC. FinnGen data were not used because of potential bias stemming from the unavailability of FinnGen-specific linkage disequilibrium data (**Supplementary eMethod 1**). The distribution of the estimated parameters for the 2016 brain MAEs is shown as a violin plot; the mean value is denoted by the black horizontal line.

![Figure 2:](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F3.medium.gif)


Figure 2: 
Genetic correlation between the 2024 MAEs and 525 DEs. The significant genetic correlation estimates (*rg*) between the 2024 MAEs and 525 DEs are depicted at two levels of correction for multiple comparisons, given the relatively small sample sizes (<40,000) of the brain MAEs compared with the other organ MAEs (>100,000). First, we show significant results shared between LDSC and GNOVA after Bonferroni correction based solely on the number of MAEs (P-value<0.05/2024), uncovering 133 MAE-DE pairs. Second, a stricter correction based on both the number of MAEs and DEs is applied, yielding 45 unique MAE-DE pairs marked as red squares; the numeric values shown are from LDSC. The genetic correlation for non-significant results was set to 0 for visualization purposes. For the MAEs, readers can explore the BRIDGEPORT portal for a visual representation of the 2003 brain PSCs (e.g., C256_225: [https://labs-laboratory.com/bridgeport/MuSIC/C256_225](https://labs-laboratory.com/bridgeport/MuSIC/C256_225)) and the other BAGs at the MEDICINE portal: [https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine).

We then estimated the natural selection signature (*S*) for the 2024 MAEs. The metabolic BAG showed a strong negative selection signature (*S*=-0.82±0.10), followed by the pulmonary BAG (*S*=-0.79±0.05), the hepatic BAG (*S*=-0.74±0.09), the renal BAG (*S*=-0.68±0.08), and the immune BAG (*S*=-0.66±0.11). Among the brain MAEs (mean *S*=-0.33 [-1, 0.43]), the brain BAG (*S*=-0.70±0.12) and the subtype (ASD1) for autism spectrum disorder42 (*S*=-0.90±0.11) showed strong negative selection effects (**Fig. 1b** and **Supplementary eFile 3**).
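To make the meaning of *S* concrete: in the BayesS model underlying SBayesS (Zeng et al., 2018), a SNP with a non-zero effect is assumed to have a per-allele effect variance proportional to [2pj(1 − pj)]^S, where pj is its minor allele frequency. A negative *S* therefore gives rarer variants larger expected effects, the pattern expected when negative selection keeps large-effect alleles rare. The hypothetical helper below only illustrates this prior; it is not the SBayesS estimation procedure.

```python
def effect_variance(maf: float, S: float, sigma2_beta: float = 1.0) -> float:
    """Prior variance of a non-zero per-allele SNP effect under the BayesS model:
    var(beta_j) = [2 p_j (1 - p_j)]^S * sigma2_beta.
    """
    heterozygosity = 2.0 * maf * (1.0 - maf)
    return heterozygosity**S * sigma2_beta


# Compare a rare (MAF = 0.01) and a common (MAF = 0.5) variant for S values from the text.
for S in (-0.82, -0.33, 0.0, 0.31):
    ratio = effect_variance(0.01, S) / effect_variance(0.5, S)
    print(f"S = {S:+.2f}: var(rare) / var(common) = {ratio:.2f}")
```

The ratio exceeds 1 when *S* < 0, equals 1 when *S* = 0 (no MAF dependence), and falls below 1 when *S* > 0.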

Finally, we estimated the polygenicity (π) of the 2024 MAEs. Brain MAEs (mean π=0.040 [0.003, 0.072]) showed the highest polygenicity, followed by the pulmonary BAG (0.018±0.001), the musculoskeletal BAG (0.013±0.001), and the cardiovascular BAG (0.011±0.001) (**Fig. 1c** and **Supplementary eFile 4**). The PSC C128_115 ([https://labs-laboratory.com/bridgeport/MuSIC/C128_115](https://labs-laboratory.com/bridgeport/MuSIC/C128_115)) showed the highest polygenicity estimate (0.072±0.002).
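Polygenicity π is the proportion of SNPs with non-zero effects in the same model. The simulation below is a sketch of a BayesS-style spike-and-slab prior, assuming MAFs drawn uniformly between 0.01 and 0.5 and illustrative values of π and *S* taken from the brain means above; it is not the authors' estimation code. It shows that π sets how many SNPs carry an effect, while *S* sets how those effects scale with MAF.

```python
import random


def sample_bayess_effects(mafs, pi, S, sigma2_beta=1.0, seed=0):
    """Draw per-allele SNP effects from a BayesS-style spike-and-slab prior.

    With probability pi a SNP gets an effect ~ N(0, [2p(1-p)]^S * sigma2_beta);
    otherwise its effect is exactly zero.
    """
    rng = random.Random(seed)
    effects = []
    for p in mafs:
        if rng.random() < pi:
            sd = ((2.0 * p * (1.0 - p)) ** S * sigma2_beta) ** 0.5
            effects.append(rng.gauss(0.0, sd))
        else:
            effects.append(0.0)
    return effects


maf_rng = random.Random(42)
mafs = [maf_rng.uniform(0.01, 0.5) for _ in range(100_000)]
effects = sample_bayess_effects(mafs, pi=0.04, S=-0.33)
frac_nonzero = sum(e != 0.0 for e in effects) / len(effects)
print(f"fraction of SNPs with non-zero effects: {frac_nonzero:.3f}")
```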

### Supporting evidence for the endophenotype hypothesis

Previous studies43,44 using traditional brain map-based signatures have found supporting evidence for the endophenotype hypothesis14,15, showing that more genes are associated with disease endpoints than with imaging-derived signatures (i.e., endophenotypes). Of note, given genetic differences between the FinnGen and UKBB samples, SBayesS with UKBB as the LD reference may give biased estimates of *S* and π for FinnGen data (FinnGen LD data are not fully available; **Method 3a**).

Therefore, to test this hypothesis, we used the GWAS summary data for PGC schizophrenia (SCZ)2 and two subtypes of SCZ (SCZ1 and SCZ2)1 from our UKBB analysis. The PGC GWAS summary statistics are well powered (large sample sizes) and derive from European ancestry groups across different countries. A data harmonization procedure, outlined in **Supplementary eMethod 1**, ensured a fair comparison of these estimates by using a common set of SNPs and linkage disequilibrium information to compute the *S* and π parameters. SCZ1 (π=0.048±0.002; *S*=-0.61±0.09) and SCZ2 (π=0.047±0.002; *S*=-0.54±0.12) showed lower polygenicity and weaker negative selection effects than SCZ (π=0.055±0.003; *S*=-0.82±0.04) (**Fig. 1d**). **Supplementary eFigure 3** shows the Manhattan plots of the harmonized summary data for SCZ1, SCZ2, and SCZ. These findings support the endophenotype hypothesis3, which posits that intermediate phenotypes (such as the SCZ subtype MAEs) lie on the causal pathway from genetics to exo-phenotypes (such as the binary SCZ diagnosis) and are therefore closer to the underlying etiology. Consistent with this hypothesis and with previous studies43,44, the SCZ subtypes were less polygenic than the diagnosis.

![Figure 3:](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F4.medium.gif)


Figure 3: 
Causal relationship from the 2024 MAEs to the 525 DEs. The analysis revealed 39 significant MAE-DE pairs, involving 633 MAEs as effective exposure variables (>8 instrumental variables before harmonization) and 525 DEs as outcomes. Bonferroni correction was applied to identify potentially significant causal signals based on *i*) the 633 MAEs (P-value<0.05/633) and *ii*) the 633 MAEs and 525 DEs (P-value<0.05/633/524, denoted by the 15 red rectangles). Furthermore, we verified that the statistical significance attained by the IVW estimator was consistent across at least one of the other four Mendelian randomization estimators (Egger, weighted median, simple mode, and weighted mode). For visualization purposes, the odds ratios for non-significant results were set to 1 and left blank. For the MAEs, readers can explore the BRIDGEPORT portal for a visual representation of the 2003 brain PSCs (e.g., C32_4: [https://labs-laboratory.com/bridgeport/MuSIC/C32_4](https://labs-laboratory.com/bridgeport/MuSIC/C32_4)) and the other BAGs at the MEDICINE portal: [https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine).

### The genetic correlation shows organ-specific and cross-organ associations

After applying two levels of Bonferroni correction, we found 132 (P-value<0.05/2024) and 45 (P-value<0.05/2024/525) significant genetic correlations (*rg*) shared by the LDSC28 and GNOVA34 methods (**Fig. 2**, **Method 3b**, **Supplementary eFile 5**, and **Supplementary eTable 2**). The HDL method encountered model convergence issues, as detailed in **Method 3b**.
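The two significance levels, together with the requirement that a pair be significant in both LDSC and GNOVA, can be sketched as follows. The pair labels and P-values in the example are hypothetical; only the counts (2024 MAEs, 525 DEs) and α = 0.05 come from the text.

```python
ALPHA = 0.05
N_MAES, N_DES = 2024, 525

lenient = ALPHA / N_MAES              # corrects for the number of MAEs only
stringent = ALPHA / (N_MAES * N_DES)  # corrects for all MAE-DE pairs


def shared_significant(p_ldsc, p_gnova, threshold):
    """Return the MAE-DE pairs whose P-value is below `threshold` in both methods."""
    common = p_ldsc.keys() & p_gnova.keys()
    return sorted(pair for pair in common
                  if p_ldsc[pair] < threshold and p_gnova[pair] < threshold)


# Hypothetical P-values keyed by (MAE, DE).
p_ldsc = {("MAE_A", "DE_1"): 1e-9, ("MAE_A", "DE_2"): 1e-6, ("MAE_B", "DE_1"): 0.01}
p_gnova = {("MAE_A", "DE_1"): 3e-8, ("MAE_A", "DE_2"): 2e-5, ("MAE_B", "DE_1"): 1e-10}

print(f"lenient threshold:   {lenient:.3g}")
print(f"stringent threshold: {stringent:.3g}")
print(shared_significant(p_ldsc, p_gnova, lenient))
print(shared_significant(p_ldsc, p_gnova, stringent))
```

Requiring agreement between two methods guards against method-specific artifacts, at the cost of discarding signals that only one method detects.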

The magnitude of the significant genetic correlations differed between methods: LDSC yielded 213 significant signals (mean *r̂*g=0.24 [-0.40, 0.52]), whereas GNOVA yielded 428 (mean *r̂*g=0.17 [-0.30, 0.62]) (**Fig. 2**). The three sets of converged estimates were strongly correlated: *r*=0.77 (P-value<1×10⁻¹⁰; *N*=1,062,577) between LDSC and GNOVA, *r*=0.81 (P-value<1×10⁻¹⁰; *N*=59,289) between LDSC and HDL, and *r*=0.82 (P-value<1×10⁻¹⁰; *N*=59,289) between GNOVA and HDL. **Supplementary eFigure 4** shows the correlation of the three sets of estimates.

![Figure 4:](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F5.medium.gif)


Figure 4: 
Causal relationship from the 525 DEs to the 2024 MAEs. The analysis revealed 47 significant DE-MAE pairs, involving 214 DEs as effective exposure variables (>8 instrumental variables before harmonization) and 787 MAEs as effective outcomes after quality checks. Bonferroni correction was applied to identify potentially significant causal signals based on *i*) the 787 MAEs (P-value<0.05/787) and *ii*) the 787 MAEs and 214 DEs (P-value<0.05/787/214, denoted by the 23 red rectangles). Furthermore, we verified that the statistical significance attained by the IVW estimator was consistent across at least one of the other four Mendelian randomization estimators (Egger, weighted median, simple mode, and weighted mode). For visualization purposes, the odds ratios for non-significant results were set to 1 and left blank. For the MAEs, readers can explore the BRIDGEPORT portal for a visual representation of the 2003 brain PSCs (e.g., C128_13: [https://labs-laboratory.com/bridgeport/MuSIC/C128_13](https://labs-laboratory.com/bridgeport/MuSIC/C128_13)) and the other BAGs at the MEDICINE portal: [https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine).

Among the significant signals, we observed *i*) organ-specific associations, in which the MAE was genetically associated with a DE originating from the same organ system, and *ii*) cross-organ connections, in which the MAE and DE primarily involved different organ systems. For example, two brain PSCs showed significant negative genetic correlations with BIP from PGC (C512\_368 vs. BIP: -0.16±0.03; C1024\_114 vs. BIP: -0.15±0.03). At the less stringent level, brain MAEs were also genetically associated with DEs from other organ systems, including a positive correlation between C1024\_808 and obesity (E4_OBESITY: *rg*=0.17±0.13). The cardiovascular BAG was positively correlated with several cardiovascular DEs, including ischemic heart disease (I9_IHD: *rg*=0.26±0.03), heart failure and coronary heart disease (I9\_HEARTFAIL_AND_CHD: *rg*=0.26±0.03), angina (I9_ANGINA: *rg*=0.25±0.03), and atrial fibrillation (I9_AF: *rg*=0.22±0.04). Likewise, the pulmonary BAG was positively associated with multiple DEs of the lung and respiratory system, including chronic obstructive pulmonary disease (COPD_EARLY: *rg*=0.47±0.04) and various forms of asthma (e.g., ASTHMA_NAS: *rg*=0.43±0.04). Cross-organ connections included those between the pulmonary BAG and substance abuse (KRA_PSY_SUBSTANCE_EXMORE: *rg*=0.20±0.03) and hypertension (I9_HYPTENS: *rg*=0.17±0.03). Lastly, the metabolic BAG was largely linked to different forms of diabetes (e.g., T2D: *rg*=0.40±0.04).

### The brain, cardiovascular, and pulmonary MAEs are causally linked to DEs of multiple organ systems

Using five two-sample Mendelian randomization estimators, we identified 39 (P-value<0.05/633) and 15 (P-value<0.05/633/524) significant causal relationships directed from MAE to DE that survived Bonferroni correction at two levels of stringency, based on the inverse variance weighted (IVW) estimator and at least one of the other four estimators (**Method 3c** and **Supplementary eTable 3**).
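For readers unfamiliar with the IVW estimator, the following is a minimal fixed-effect version computed from per-SNP summary statistics (exposure effects, outcome effects, and outcome standard errors). It is a sketch under simplifying assumptions: harmonized alleles, independent instruments, and no overdispersion. MR software packages often use a multiplicative random-effects variant instead, and the Egger, weighted median, and mode-based estimators are not shown. When the outcome is binary and effects are on the log-odds scale, exponentiating the estimate gives an odds ratio and 95% CI, the form in which causal estimates are reported in this section. The summary statistics in the example are hypothetical.

```python
import math
from statistics import NormalDist


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    Equivalent to a weighted average of per-SNP Wald ratios beta_out / beta_exp
    with first-order weights beta_exp^2 / se_out^2.
    """
    weights = [bx**2 / se**2 for bx, se in zip(beta_exp, se_out)]
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    beta = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))  # two-sided normal P-value
    return beta, se, p_value


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI for an estimate on the log-odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for four instruments (not from the study).
beta_exp = [0.12, 0.08, 0.15, 0.10]
beta_out = [0.05, 0.03, 0.07, 0.04]
se_out = [0.010, 0.012, 0.015, 0.011]

beta, se, p = ivw_estimate(beta_exp, beta_out, se_out)
print(f"IVW log-OR = {beta:.3f} (SE {se:.3f}), P = {p:.2g}")
print("OR (95% CI) = {:.2f} ({:.2f}, {:.2f})".format(*odds_ratio_ci(beta, se)))
```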

Within the 15 significant causal relationships, the brain MAEs showed causal connections with DEs from the brain as well as from other organ systems. For example, the brain PSC C1024_598 was causally linked to SCZ from PGC [P-value=9.89×10⁻⁸; OR (95% CI)=0.69 (0.59, 0.79); number of IVs=7], and C1024_684 was causally linked to ventral hernia from FinnGen [K11_VENTHER: P-value=1.09×10⁻⁷; OR (95% CI)=1.43 (1.25, 1.63); number of IVs=18]. The pulmonary BAG was causally linked to multiple pulmonary DEs, including chronic obstructive pulmonary disease (COPD) [J10_COPD: P-value=2.70×10⁻²⁰; OR (95% CI)=1.77 (1.56, 2.00); number of IVs=59] and asthma [ASTHMA_PNEUMONIA: P-value=1.51×10⁻¹⁴; OR (95% CI)=1.67 (1.41, 1.96); number of IVs=59]. The cardiovascular BAG was causally linked to ischemic heart disease (IHD) [I9_IHD: P-value=1.09×10⁻⁷; OR (95% CI)=1.64 (1.36, 1.96); number of IVs=37] (**Fig. 3**). The quality checks of the significant signals are presented in **Supplementary eFolder 1**. **Supplementary eFile 6** presents the full set of results for the 521 FinnGen DEs and 4 PGC DEs.

### The DEs involving Alzheimer’s disease, diabetes, asthma, and hypertension exert causal effects on multi-organ MAEs

We then tested the reverse direction by using the DEs as exposure variables and the MAEs as outcome variables. We identified 47 (P-value<0.05/787) and 23 (P-value<0.05/787/214) significant causal relationships directed from DE to MAE that survived Bonferroni correction at two levels of stringency (**Method 3c** and **Supplementary eTable 4**).

Within the 23 significant causal relationships (P-value<0.05/787/214), various forms of Alzheimer’s disease were causally linked to MAEs of the brain and of other organ systems, including the brain BAG [G6\_AD_WIDE: P-value=3.03×10⁻⁷; OR (95% CI)=1.10 (1.06, 1.13); number of IVs=8] and the metabolic BAG [G6_AD_WIDE: P-value=3.03×10⁻⁷; OR (95% CI)=1.07 (1.04, 1.09); number of IVs=8].

Type 1 diabetes (E4_DM1NASCOMP) was also causally linked to multiple brain PSCs. In addition, multiple cardiovascular diseases were causally linked to the cardiovascular BAG, including hypertension [I9_HYPTENS: P-value=4.67×10⁻³¹; OR (95% CI)=1.23 (1.19, 1.27); number of IVs=110]. Several forms of asthma were causally linked to the pulmonary BAG, such as allergic asthma [ALLERG_ASTHMA: P-value=2.38×10⁻⁹; OR (95% CI)=1.09 (1.06, 1.13); number of IVs=14]. Finally, obesity was causally linked to the renal BAG [E4_OBESITY: P-value=2.74×10⁻⁸; OR (95% CI)=1.11 (1.07, 1.15); number of IVs=19] (**Fig. 4**).

**Supplementary eFolder 2** presents the quality check results of the significant signals.

**Supplementary eFile 7** presents the full set of results for the 521 FinnGen DEs and 4 PGC DEs.

### The polygenic risk scores of the 2024 MAEs

Using the PRS-CS45 method, we derived PRS for the 2024 MAEs. The PRS of 1799 MAEs significantly (P-value<0.05/2024) predicted the corresponding phenotypes in the test/target data (split2 GWAS; detailed in **Method 3d**). Among these, the PRS of 1791 brain MAEs yielded significant incremental *R*² values ranging from 0.11% to 10.70%. For example, the PSC C1024_593 (part of the cerebellum: [https://labs-laboratory.com/bridgeport/MuSIC/C1024_593](https://labs-laboratory.com/bridgeport/MuSIC/C1024_593)) showed an incremental *R*² of 10.70%. The renal BAG showed an incremental *R*² of 5.92%, followed by the metabolic (*R*²=5.67%) and pulmonary BAGs (*R*²=3.86%) (**Fig. 5a** and **Supplementary eFile 8**).
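Incremental *R*² is the gain in variance explained when the PRS is added to a baseline model. The sketch below computes it with ordinary least squares on simulated data; the two covariates (standing in for, e.g., age and sex) and all effect sizes are hypothetical, and the exact covariate set used in **Method 3d** is not reproduced here.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an ordinary least-squares fit of y on X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()


def incremental_r2(y, covariates, prs):
    """R^2(covariates + PRS) minus R^2(covariates only)."""
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([base, prs])
    return r_squared(y, full) - r_squared(y, base)


# Simulated target data: a phenotype driven by two covariates and a PRS.
rng = np.random.default_rng(0)
n = 5_000
covariates = rng.normal(size=(n, 2))
prs = rng.normal(size=n)
y = covariates @ np.array([0.5, -0.2]) + 0.3 * prs + rng.normal(size=n)

print(f"incremental R^2 = {incremental_r2(y, covariates, prs):.2%}")
```

Computing the baseline model first ensures that the reported gain reflects only what the PRS adds beyond the covariates.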

![Figure 5:](https://www.medrxiv.org/content/medrxiv/early/2024/06/16/2024.06.15.24308980/F6.medium.gif)


Figure 5: 
The polygenic risk scores of the 2024 MAEs and PWAS. (**a**) The incremental *R*² of the PRS derived by PRS-CS to predict the 2024 MAEs in the target/test data (i.e., the split2 GWAS). The y-axis indicates the proportion of phenotypic variation that the PRS significantly and additionally explains (i.e., incremental *R*²). The x-axis lists the 8 organ systems. For the brain, we show the distribution of the significant results from the 1791 brain PRS-MAEs; each of the other organ systems has only one PRS-MAE. (**b**) The PWAS links the PRS-MAEs, computed in the entire UKBB sample, to 59 additional phenotypes not used to compute the PRS-MAEs (P-value<0.05/2024/59).

We then applied the models to the entire UKBB population and performed a PRS-wide association study (PWAS), in which the 2024 PRS-MAEs were linked to 59 phenotypes that were not used to compute the respective PRS, to avoid circular bias46 (**Supplementary eTable 5**; see **Method 3d** for details). We found 388 significant associations (P-value<0.05/2024/59) between 7 PRS-MAEs and 41 phenotypes. Among these, PSC C32_1 accounted for the most associations (94%), and the lifestyle factor oily fish intake (Field ID: 16) was linked to multiple PRS-MAEs (16%). These results are expected because the 59 phenotypes (e.g., cognitive and mental traits) are primarily brain-related, whereas lifestyle factors are linked to multiple organ systems (**Fig. 5b** and **Supplementary eFile 9**). All derived PRS will be returned to UKBB and made available to the community.

## Discussion

This study expands previously established genetic atlases47,32 by integrating the 2024 AI-derived MAEs into a multi-organ framework using only GWAS summary statistics. We demonstrate a promising avenue for advancing imaging genetic research in two key respects: *i*) integrating AI into imaging genetics and *ii*) exploring human aging and disease from a multi-organ perspective.

By comprehensively characterizing the genetic architecture of the 2024 MAEs, we showed that two AI endophenotypes, the SCZ subtypes, support the endophenotype hypothesis14,15: they showed lower polygenicity and weaker negative selection effects than the disease diagnosis. Several factors may explain this. First, these intermediate phenotypes may lie along the causal pathway, bridging underlying genetics and “exo-phenotypes” such as cognitive decline or disease diagnoses in case/control studies, and thus be positioned closer to the core etiology and pathology. Second, many of the 2024 MAEs originate from *in vivo* imaging methods such as magnetic resonance imaging (MRI). Consequently, they tend to have a higher signal-to-noise ratio in capturing disease-related effects and are less susceptible to biases such as misclassification48, case/control-covariate sample bias (e.g., studies matching comorbidities and other factors), and imbalanced case/control ratios, as evident in many FinnGen GWASs. Misclassification is a particular concern for binary traits, which rely on a diagnostic threshold that dichotomizes individuals into affected and unaffected categories. Third, the 525 DEs often represent complex diseases strongly influenced by multiple genetic and environmental factors. Their multifaceted nature, involving numerous genes with modest effects and environmental interactions49, can increase vulnerability to disease onset and clinical symptoms.

Consistent with this observation, we previously found that one AI- and imaging-derived subtype of Alzheimer’s disease50 (AD1), but not the binary disease diagnosis, was genetically correlated with brain age (GM- and WM-BAG)6.

We observed that brain MAEs were overall more polygenic than MAEs from other organ systems, consistent with the high polygenicity of brain disorders51. First, the brain is a highly complex organ with intricate functions, and disorders affecting it are likely influenced by a larger number of genetic variants12,52. Second, many brain disorders are multifaceted, involving various aspects of brain structure, function, and connectivity, each of which can be influenced by different genetic factors19.

Additionally, the brain regulates many physiological processes throughout the body, so disruptions in its function can have widespread effects, potentially involving interactions with multiple organ systems4. We also found that most of the brain MAEs showed negative selection signatures, including the 9 disease subtype DNEs and 4 brain BAGs, whereas some brain PSCs showed a positive *S* estimate (e.g., a PSC covering the occipital lobe and subcortical structures, *S*=0.31±0.09: [https://labs-laboratory.com/bridgeport/MuSIC/C32_18](https://labs-laboratory.com/bridgeport/MuSIC/C32_18)). The negative selection signatures of biological age across multiple organs and of disease subtypes align with our prior findings of pervasive signatures of natural selection across a range of complex human traits and functional genomic categories. Negative selection prevents mutations with large deleterious effects from becoming frequent in the population53. The positive selection signatures identified in certain brain PSCs suggest that positive selection may also play a role in shaping the genetic architecture of brain structural networks.

The MUTATE atlas uncovered both established and previously unreported interactions involving human systemic diseases within individual organs and across organ systems. For example, within the cardiovascular system, the cardiovascular BAG showed both substantial genetic correlation (**Fig. 2**) and bi-directional causality (**Fig. 3** and **4**) with multiple heart diseases, such as ischemic heart disease54, heart failure55, and atrial fibrillation56. Similarly, the pulmonary BAG was causally linked to multiple diseases of the lung and respiratory system, including COPD57 and various forms of asthma58. Organ-specific connections were also observed for brain diseases: AD59 and various mental disorders60 were linked to several brain MAEs, notably several PSCs and the WM-BAG. Several novel cross-organ connections also emerged. For instance, the brain PSCs exhibited causal connections to conditions extending beyond the brain, such as ventral hernia and vein diseases, as well as to systemic conditions like various forms of diabetes. Conversely, AD appears to causally affect multiple BAGs across various human organ systems, including the renal, immune, and metabolic systems. It is widely recognized that AD, being a complex condition, has detrimental effects on several human organ systems59,61. Our previous study used imaging genetics to investigate this multi-organ involvement along the disease continuum62. These results highlight the clinical relevance and interpretability of these AI endophenotypes for quantifying individual-level organ health.

Emphasizing preventive strategies for specific chronic diseases is crucial to improving overall multi-organ health. Our MAEs could serve as novel instruments for selecting clinical trial populations and facilitating therapeutic development. AD and various forms of diabetes exemplify disease endpoints that substantially affect multiple human organ systems. AD is the leading cause of dementia in older adults and remains a persistent challenge in medicine despite numerous pharmacotherapeutic clinical trials, including trials of anti-amyloid63,64 and anti-tau65 drugs. The complexity and multifaceted nature of the underlying neuropathological processes may account for the lack of effective treatments. We call on the scientific community to embrace diverse mechanistic hypotheses of AD pathogenesis beyond amyloid and tau66,67. Likewise, the many contributing factors of diabetes make prevention challenging68. Moreover, diabetes often coexists with other chronic conditions affecting multiple organ systems, such as cardiovascular diseases, hypertension, and dyslipidemia69. Successful prevention strategies require a holistic approach encompassing lifestyle adjustments, education, healthcare access, and societal considerations.

## Limitation

This study has several limitations. First, our analyses relied solely on GWAS summary statistics from individuals of European ancestry. Future investigations should extend these findings to other ancestry groups, particularly underrepresented ones, to establish broader applicability; this requires the research community's commitment to open science in AI and genetics. Second, the statistical genetics methods used in this study rely on several assumptions that could be violated and introduce bias. To mitigate this concern, we used multiple methods to estimate heritability, genetic correlation, and causality, and we conducted thorough sensitivity checks whose results are reported in detail. Finally, each MAE originated from a single biomedical data modality (e.g., MRI). Future investigations should apply AI across multi-omics data, for example by integrating imaging and genetics70, to capture underlying disease effects more comprehensively.

## Outlook

In summary, we introduced the MUTATE genetic atlas to comprehensively characterize the genetic architecture of AI endophenotypes and chronic diseases in multi-organ science. This investigation underscores the potential of integrating AI into genetic research and supports a comprehensive, multi-organ approach to investigating human diseases.

## STAR★Methods

### Method 1: GWAS summary statistics

The present study used only GWAS summary statistics; no individual-level data were used. We downloaded the GWAS summary statistics for the 2024 MAEs, the 521 DEs from FinnGen, and the 4 DEs from PGC from three respective web portals.

### UKBB

UKBB is a population-based study of approximately 500,000 people recruited from the United Kingdom between 2006 and 2010. The UKBB study has ethical approval, and the ethics committee is detailed here: [https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee](https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee).

The GWAS summary statistics for all 2024 MAEs are publicly available at the MEDICINE knowledge portal ([https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine)), which disseminates scientific findings on imaging genetics and AI methods in multi-organ science. Among the 2024 MAEs, 2003 PSCs, at scales ranging from C32 to C1024, are structural covariance networks derived with the sopNMF method12. Nine DNEs1 capture the neuroanatomical heterogeneity of four brain diseases (AD1-2, ASD1-3, LLD1-2, and SCZ1-2) and were derived with semi-supervised clustering or representation learning methods42,62,71,72. Twelve BAGs, comprising three multimodal brain BAGs (GM, WM, and FC)6 and nine organ BAGs (brain, cardiovascular, eye, hepatic, immune, musculoskeletal, metabolic, pulmonary, and renal)73, were derived from various machine learning models to quantify individual-level deviations from typical aging due to various pathological effects. The AI methodologies for the PSCs, DNEs, and BAGs are detailed in **Method 2**. All GWASs were performed in European-ancestry participants using the GRCh37 human genome assembly; the GWAS models (PLINK74 for linear models and fastGWA75 for linear mixed-effect models), sample sizes, and covariates are detailed in the original papers and in **Supplementary eTable 1a**.

Harmonizing GWAS summary statistics across models, consortia, and software (e.g., aligning the effect allele and the direction of the effect size) is crucial. There is currently no established standard for this process in the field, although some recommendations have been proposed76. Some software, such as the *TwoSampleMR* package77 for Mendelian randomization, uses the allele frequency of the effect allele during harmonization. In our UKBB MAE GWAS summary data, we set the effect allele to the alternative allele from PLINK and to A1 from fastGWA, and provided its corresponding allele frequency. P-values, effect sizes (BETA and SE), and sample sizes are also reported. Variants are identified by rsID rather than by a combination of chromosome and position.
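
The alignment step can be illustrated with a minimal Python sketch. It assumes each record carries an effect allele, other allele, BETA, and effect-allele frequency; the `Variant` class and `harmonize` function are hypothetical and not part of *TwoSampleMR*. For simplicity it drops strand-ambiguous palindromic SNPs, whereas *TwoSampleMR* can instead use allele frequencies to resolve some of them.

```python
from dataclasses import dataclass
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele: str) -> str:
    """Complement each base (SNP-oriented; multi-base indels would also need reversing)."""
    return "".join(COMPLEMENT.get(base, base) for base in allele)


@dataclass
class Variant:
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    eaf: float  # frequency of the effect allele


def harmonize(exposure: Variant, outcome: Variant) -> Optional[Variant]:
    """Re-express `outcome` on the exposure's effect allele, or return None
    if the alleles cannot be reconciled."""
    ea, oa = exposure.effect_allele, exposure.other_allele
    # Palindromic SNPs (A/T, C/G) are strand-ambiguous; this sketch drops them.
    if complement(ea) == oa:
        return None
    observed = (outcome.effect_allele, outcome.other_allele)
    other_strand = (complement(observed[0]), complement(observed[1]))
    if (ea, oa) in (observed, other_strand):
        # Same effect allele, possibly reported on the opposite strand.
        return Variant(outcome.rsid, ea, oa, outcome.beta, outcome.eaf)
    if (oa, ea) in (observed, other_strand):
        # Effect allele is swapped: flip the sign of BETA and the frequency.
        return Variant(outcome.rsid, ea, oa, -outcome.beta, 1.0 - outcome.eaf)
    return None
```

The key design point is that a swapped effect allele changes both the sign of the effect size and the reported frequency, which is why frequency information must travel with every record.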

### FinnGen

The FinnGen22 study is a research project based in Finland that combines genetic and health registry data to understand the causes and mechanisms of various disease endpoints. It focuses on the genetic basis of disease in the Finnish population (>500,000 participants), conducting extensive GWAS and analyzing large-scale genomic data in collaboration with multiple research institutions and organizations. FinnGen has generously made its GWAS results publicly available for research purposes ([https://www.finngen.fi/en/access_results](https://www.finngen.fi/en/access_results)).

The present study used the GWAS summary statistics version R9 released to the public on May 11, 2022, after harmonization by the consortium. In the R9 release, FinnGen analyzed 2269 binary and 3 quantitative endpoints from 377,277 individuals and 20,175,454 variants.

Regenie78 was used to fit the GWAS models, with sex, age, 10 PCs, and genotyping batch as covariates. Genotype imputation used the population-specific SISu v4.0 reference panel. Given the highly imbalanced case/control ratios, we included only binary DEs with more than 5000 cases to ensure adequate statistical power. Because the released data were based on the GRCh38 human genome assembly, we lifted the GWAS summary statistics over to GRCh37 for all genetic analyses. **Supplementary eTable 1b** details the included 521 DEs. More details can be found at the FinnGen website: [https://finngen.gitbook.io/documentation/v/r9/](https://finngen.gitbook.io/documentation/v/r9/).
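
The endpoint filter amounts to a simple selection over the release manifest. In this Python sketch, the manifest rows and their keys (`phenocode`, `is_binary`, `num_cases`) are hypothetical stand-ins; the actual FinnGen R9 manifest columns may be named differently.

```python
from typing import Iterable, List

MIN_CASES = 5000  # the text keeps binary endpoints with more than 5000 cases


def select_endpoints(manifest: Iterable[dict], min_cases: int = MIN_CASES) -> List[str]:
    """Return the phenocodes of binary endpoints with more than `min_cases` cases."""
    return [
        row["phenocode"]
        for row in manifest
        if row["is_binary"] and row["num_cases"] > min_cases
    ]


# Toy manifest with made-up endpoint codes.
toy_manifest = [
    {"phenocode": "ENDPOINT_A", "is_binary": True, "num_cases": 12000},
    {"phenocode": "ENDPOINT_B", "is_binary": True, "num_cases": 5000},
    {"phenocode": "ENDPOINT_C", "is_binary": False, "num_cases": 0},
]
```

Note that the comparison is strict: an endpoint with exactly 5000 cases is excluded.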

The FinnGen team systematically harmonized the GWAS summary data for the 521 DEs. The alternative allele serves as the effect allele. Each SNP is identified by its rsID, and the chromosome and position are also provided. The data include P-values, effect sizes, and allele frequencies for both the alternative and reference alleles.

### Psychiatric Genomics Consortium

PGC23 is an international coalition of researchers studying the genetic underpinnings of psychiatric disorders and beyond. This collaborative effort unites scientists worldwide to analyze extensive genomic datasets on various brain diseases. PGC's primary goal is to identify and understand the genetic factors that contribute to psychiatric disorders such as schizophrenia, bipolar disorder, and major depressive disorder. We downloaded GWAS summary statistics from the PGC website ([https://pgc.unc.edu/for-researchers/download-results/](https://pgc.unc.edu/for-researchers/download-results/)) and manually harmonized the data for our Mendelian randomization analyses to replicate the FinnGen findings.

PGC did not harmonize its GWAS summary statistics; the available information depends on each study. **Supplementary eTable 1c** details the 4 DEs (AD, ADHD, bipolar disorder, and schizophrenia) retained after the following filtering procedure. First, we ensured that the study population comprised individuals of European ancestry and, where necessary, lifted the data over to the GRCh37 human genome assembly. Second, we excluded two studies that lacked allele frequencies, because the *TwoSampleMR* package77 requires this information to harmonize exposure and outcome data (e.g., to flip the effect allele and effect size). Third, we confirmed that the GWAS summary statistics did not overlap with UKBB data; specifically, the AD GWAS summary data79 explicitly offered a version that excluded UKBB participants. In addition, the original dataset lacked an rsID column; we therefore mapped chromosome and position to the dbSNP database (build 150) to obtain the corresponding rsIDs. All 4 DE GWAS summary datasets then underwent the same harmonization procedure as FinnGen (**Method 3c**).
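
The position-to-rsID mapping can be sketched as a dictionary lookup. This Python sketch assumes an index keyed on (chromosome, GRCh37 position) that has already been parsed from dbSNP build 150; the index layout, the toy identifiers, and the `map_rsid` helper are hypothetical. Because several variants can share one position, the sketch also requires the alleles to match, an extra safeguard that the text does not state.

```python
from typing import Dict, List, Optional, Tuple

# (chromosome, position) -> [(rsid, ref_allele, alt_allele), ...]
DbSnpIndex = Dict[Tuple[str, int], List[Tuple[str, str, str]]]


def map_rsid(index: DbSnpIndex, chrom: str, pos: int, a1: str, a2: str) -> Optional[str]:
    """Return the rsID whose alleles match {a1, a2} at chrom:pos, or None."""
    for rsid, ref, alt in index.get((chrom, pos), []):
        if {ref, alt} == {a1.upper(), a2.upper()}:
            return rsid
    return None


# Toy index with made-up identifiers, for illustration only.
toy_index: DbSnpIndex = {
    ("1", 1000): [("rs_toy1", "A", "G"), ("rs_toy2", "A", "C")],
}
```

Unmatched variants return `None`, so they can be counted and dropped explicitly rather than silently mislabeled.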

### Method 2: 2024 multi-organ AI endophenotypes

#### (a) The 2003 patterns of structural covariance of the brain

In our earlier study12, we applied the sopNMF method to a large and diverse brain MRI dataset (*N*=50,699, including data from UKBB) to generate multi-scale brain PSCs. The scale C ranges from 32 to 1024, doubling at each step (C32, C64, C128, C256, C512, and C1024, i.e., 2016 components in total); 13 components vanished during model fitting, leaving 2003 PSCs.

Biologically, the 2003 PSCs represent data-driven structural networks that co-vary in a coordinated fashion across brain regions and individuals. Mathematically, sopNMF is a stochastic approximation (a “deep learning analogy”) that builds on and extends opNMF80,81. Consider an imaging dataset comprising *n* images, each containing *d* voxels. We represent the data as a matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$, where each column $\mathbf{x}_i$ is a flattened image. The method factorizes $\mathbf{X}$ into two low-rank matrices $\mathbf{W} \in \mathbb{R}^{d \times k}$ and $\mathbf{H} \in \mathbb{R}^{k \times n}$, where *k* is the number of components (the scale C), subject to two important constraints: *i*) non-negativity and *ii*) column-wise orthonormality of $\mathbf{W}$. Further mathematical details can be found in the original references12,80,81 and **Supplementary eMethod 2a**.
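
To illustrate the two constraints, the following NumPy sketch applies the multiplicative update commonly used for orthonormal projective NMF, in which $\mathbf{X} \approx \mathbf{W}\mathbf{W}^\top\mathbf{X}$ and hence $\mathbf{H} = \mathbf{W}^\top\mathbf{X}$. It is not the sopNMF implementation: the random initialization, iteration count, spectral-norm rescaling, and optional random mini-batch (a loose stand-in for sopNMF's stochastic approximation) are all assumptions.

```python
import numpy as np


def opnmf_sketch(X, k, n_iter=200, batch_size=None, seed=0, eps=1e-12):
    """Approximate a non-negative X (d x n) as W @ W.T @ X with W >= 0.

    Returns W (d x k) and H = W.T @ X (k x n).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k))
    for _ in range(n_iter):
        # Full data, or a random subset of columns (images) per iteration.
        if batch_size is None:
            Xb = X
        else:
            Xb = X[:, rng.choice(n, size=batch_size, replace=False)]
        numer = Xb @ (Xb.T @ W)        # X X^T W, shape (d, k)
        denom = W @ (W.T @ numer)      # W W^T X X^T W, shape (d, k)
        W *= numer / (denom + eps)     # multiplicative update keeps W >= 0
        W /= np.linalg.norm(W, 2)      # rescale by the largest singular value
    return W, W.T @ X
```

Because every factor in the update is non-negative for non-negative data, non-negativity holds by construction, while the $\mathbf{W}\mathbf{W}^\top$ term in the denominator pushes the columns of $\mathbf{W}$ toward orthonormality and hence toward non-overlapping spatial components.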

#### (b) The 9 dimensional neuroimaging endophenotypes of the brain

The nine DNEs capture the neuroanatomical heterogeneity of four brain diseases: AD1-2 for AD62, ASD1-3 for autism spectrum disorder42, LLD1-2 for late-life depression71, and SCZ1-2 for schizophrenia72. They were derived with two semi-supervised clustering or representation learning algorithms: Surreal-GAN82 and HYDRA83. See a review84 for details of semi-supervised learning in this setting, which primarily seeks so-called “*1-to-k*” mapping patterns or transformations from a reference domain (e.g., healthy controls) to target domains (e.g., patients).

Surreal-GAN82 was used to derive AD1-262. It disentangles disease-related heterogeneity through a deep representation learning approach. Compared with its predecessor Smile-GAN85, its methodological innovation lies in how it models disease heterogeneity: as a continuous dimensional representation, with disease severity increasing monotonically along each dimension, and with multiple dimensions allowed to co-occur in the same participant. Further mathematical details are presented in **Supplementary eMethod 2b**.

HYDRA83 was used to derive the other 7 DNEs. It uses a widely adopted discriminative technique, the support vector machine (SVM), to establish the “*1-to-k*” mapping. HYDRA combines multiple linear SVMs into a piecewise-linear, and thus nonlinear, decision boundary, performing classification and clustering simultaneously. Specifically, it constructs a convex polytope from the hyperplanes of *k* linear SVMs; this polytope separates the healthy control group from the *k* subpopulations within the patient group. Conceptually, each face of the polytope encodes one subtype (categorical trait) or dimension (continuous trait), capturing a distinct disease effect (see **Supplementary eMethod 2c**).
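
The polytope geometry can be made concrete with a small NumPy sketch of the assignment step only. It assumes *k* already-fitted hyperplanes $(\mathbf{w}_j, b_j)$ with the convention that controls fall on the negative side of every hyperplane (inside the polytope), and it assigns everyone else to the face with the largest score. HYDRA's actual training, which alternates between fitting the SVMs and reassigning patients, is not shown, and the weights in the usage example are hypothetical.

```python
import numpy as np


def polytope_assign(X, W, b):
    """Assign participants using k fitted hyperplanes.

    X: (n, p) features; W: (k, p) hyperplane normals; b: (k,) offsets.
    Returns -1 for points inside the polytope (all scores negative, control side)
    and otherwise the index of the face (hyperplane) with the largest score.
    """
    scores = X @ W.T + b                     # (n, k) signed scores
    labels = np.argmax(scores, axis=1)
    labels[np.all(scores < 0, axis=1)] = -1
    return labels
```

For example, with two axis-aligned hyperplanes at $x_1 = 1$ and $x_2 = 1$, the origin lies inside the polytope, while points far along either axis are assigned to the corresponding face, i.e., to different subtypes.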

#### (c) The 12 biological age gaps of nine human organ systems

The nine organ BAGs (brain, cardiovascular, eye, hepatic, immune, musculoskeletal, metabolic, pulmonary, and renal) were derived in a previous study5, in which AI models were trained to predict chronological age in healthy individuals without chronic medical conditions; each BAG is defined as AI-predicted age minus chronological age. For each organ system, we trained a linear support vector regression (SVR) model with a 20-fold cross-validation procedure. Before training in each iteration, measures (excluding categorical variables) were standardized within the training set. The model was solved using sequential minimal optimization with a gap tolerance of 0.001, and the remaining SVR settings followed established principles in the field86.
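
The BAG computation can be sketched with scikit-learn. This is an illustration under stated assumptions rather than the original code: `SVR(kernel="linear", tol=1e-3)` stands in for the linear SVR (scikit-learn's libsvm backend uses an SMO-type solver, and `tol` is its stopping tolerance), other hyperparameters are left at library defaults, the indices of categorical columns are supplied by the caller, and the restriction of training to healthy participants is not shown.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def organ_bag(X, age, categorical_cols=(), n_splits=20, seed=0):
    """Out-of-fold biological age gap: predicted age minus chronological age."""
    categorical = set(categorical_cols)
    continuous = [j for j in range(X.shape[1]) if j not in categorical]
    model = Pipeline([
        # Standardize continuous measures only; categorical columns pass through.
        ("scale", ColumnTransformer([("z", StandardScaler(), continuous)],
                                    remainder="passthrough")),
        # Linear SVR; libsvm's SMO-type solver stops at tolerance 1e-3.
        ("svr", SVR(kernel="linear", tol=1e-3)),
    ])
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each participant's age is predicted by a model that never saw them.
    predicted_age = cross_val_predict(model, X, age, cv=cv)
    return predicted_age - age
```

Placing the scaler inside the pipeline means the standardization parameters are re-estimated on each training fold, matching the text's requirement that standardization be applied within the training set.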

Alongside the nine organ BAGs, we previously derived three multimodal brain BAGs (GM, WM, and FC-IDP) using features from gray matter (GM), white matter (WM), and functional connectivity (FC) in MRI scans6. We systematically compared four machine learning models: SVR, LASSO regression, a multilayer perceptron, and a five-layer neural network. For a fair comparison across models and MRI modalities, we used nested cross-validation (CV) and an independent test dataset87. The outer CV loop used 100 repeated random splits (80% for training and validation, 20% for testing), and the inner loop used 10-fold CV for hyperparameter tuning. The independent test dataset was kept unseen until fine-tuning of the machine learning models88 (e.g., SVR hyperparameters) was complete.
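
The nested scheme can be sketched with scikit-learn. This is an illustration under assumptions, not the study's pipeline: LASSO stands in for the four compared models, the `alpha` grid and the mean absolute error metric are hypothetical choices, standardization is refit inside each training split, and the independent test dataset is not shown.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, KFold, ShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_mae(X, y, n_outer=100, alphas=(0.01, 0.1, 1.0), seed=0):
    """Outer loop: repeated 80/20 random splits; inner loop: 10-fold CV tuning."""
    outer = ShuffleSplit(n_splits=n_outer, test_size=0.2, random_state=seed)
    test_errors = []
    for train_idx, test_idx in outer.split(X):
        search = GridSearchCV(
            make_pipeline(StandardScaler(), Lasso(max_iter=10_000)),
            param_grid={"lasso__alpha": list(alphas)},
            cv=KFold(n_splits=10, shuffle=True, random_state=seed),
            scoring="neg_mean_absolute_error",
        )
        # Hyperparameters are tuned using the 80% training/validation split only.
        search.fit(X[train_idx], y[train_idx])
        predictions = search.predict(X[test_idx])
        test_errors.append(mean_absolute_error(y[test_idx], predictions))
    return np.array(test_errors)
```

The distribution of the returned errors across the outer repetitions, rather than a single split, is what supports comparisons between models and modalities.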

### Method 3: Genetic analyses based on GWAS summary statistics

#### (a) The genetic architecture of the 2024 MAEs and 525 DEs

Our primary method was SBayesS39, which we used to estimate three parameters characterizing the genetic architecture of the 2024 MAEs and 525 DEs. SBayesS extends the Bayesian mixed linear model89 to estimate these parameters for complex traits and requires only GWAS summary statistics and LD information from a reference sample. The parameters are SNP-based heritability ($h^2_{SNP}$), polygenicity (*π*), and the relationship between minor allele frequency (MAF) and effect size (*S*). We used the pre-computed sparse LD correlation matrix for European ancestry provided with the software by Zeng et al.39, where further mathematical details can be found. We ran the *gctb* command89 with the argument *--sbayes S* and left all other arguments at their defaults. When applying SBayesS to the 2024 MAEs and 525 DEs, 18 DEs failed to converge in the MCMC sampling, which may be due to LD differences between the FinnGen and UKBB samples (the latter was used as the LD reference in SBayesS).
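
For reference, the SBayesS model of Zeng et al.39 can be summarized as follows (our notation). Each SNP effect $\beta_j$ is non-zero with probability $\pi$, and its variance scales with the SNP's heterozygosity raised to the power $S$:

$$
\beta_j \sim
\begin{cases}
\mathcal{N}\!\left(0,\; \left[2p_j(1-p_j)\right]^{S}\sigma^2_{\beta}\right) & \text{with probability } \pi,\\[4pt]
0 & \text{with probability } 1-\pi,
\end{cases}
$$

where $p_j$ is the allele frequency of SNP *j*. A negative *S* means that rarer variants tend to have larger per-allele effects, the negative selection signature reported for most MAEs; a positive *S* means the opposite, which the text interprets as a possible positive selection signature. $h^2_{SNP}$ is the proportion of phenotypic variance explained jointly by all SNP effects.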

To benchmark summary-statistics-based methods for SNP-based heritability estimation, we also employed two other methods: *i*) LDSC28 and *ii*) SumHer33. LDSC exploits the fact that, for a polygenic trait, a SNP's expected association χ² statistic increases linearly with its LD score (the sum of its LD *r*² with neighboring SNPs); the slope of this regression estimates the proportion of phenotypic variance explained by all SNPs. For LDSC, we used the precomputed LD scores from the European-ancestry 1000 Genomes panel and set all other parameters to default. We chose the 1000 Genomes reference panel for a fair comparison between the two studies and to ensure that most SNPs were retained after merging with the GWAS summary statistics. For example, 1,171,361 SNPs remained for the DE RX_PARACETAMOL_NSAID after merging with the reference panel, and 1,092,510 SNPs remained for the first MAE (C32_1). Furthermore, FinnGen did not provide the original genotype data; it shared LD information only via the LDstore software, without allele information. Consequently, we could not generate in-sample LD scores with the LDSC software. Finally, a prior investigation90 demonstrated that LDSC heritability estimates for four lipid traits in a Finnish population were robust to the choice of LD reference panel (multi-ethnic European, Finnish-only, and non-Finnish European panels from 1000 Genomes Phase 3, and the FINRISK Finnish reference panel).
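
The LDSC principle can be written as $\mathrm{E}[\chi^2_j] = N h^2_{SNP} \ell_j / M + Na + 1$, where $\ell_j$ is the LD score of SNP *j*, *N* the GWAS sample size, *M* the number of SNPs, and *a* a confounding term. The NumPy sketch below estimates $h^2_{SNP}$ from the slope of an unweighted least-squares fit on simulated data; it omits the regression weights, block-jackknife standard errors, and other details of the actual LDSC software, so it is illustrative only.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n, m):
    """Unweighted LD score regression of chi2 on LD scores.

    Returns (h2_estimate, intercept); the slope equals n * h2 / m.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * m / n, intercept


# Simulated example: chi2 statistics whose mean follows the LDSC expectation.
rng = np.random.default_rng(1)
n_gwas, m_snps, true_h2 = 50_000, 1_000_000, 0.3
ld = rng.uniform(1, 200, size=20_000)
expected = 1 + n_gwas * true_h2 * ld / m_snps
chi2 = expected * rng.chisquare(1, size=ld.size)  # multiplicative noise with mean 1
h2_hat, intercept = ldsc_h2(chi2, ld, n_gwas, m_snps)
```

An intercept near 1 indicates little confounding inflation; this additive intercept is exactly the term that SumHer replaces with a multiplicative inflation factor.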

For SumHer, we used the BLD-LDAK model, as recommended by the software. BLD-LDAK combines the annotations of the baseline LD (BLD) model with the LDAK model, where LDAK stands for “LD-adjusted kinships,” i.e., genetic relatedness between individuals computed with SNP weightings that account for the correlation of alleles between nearby SNPs (linkage disequilibrium). We used the software-provided tagging file, generated from 2000 white British individuals, as the reference panel recommended for European-ancestry groups, and merged the HapMap3 SNPs ([https://www.broadinstitute.org/medical-and-population-genetics/hapmap-3](https://www.broadinstitute.org/medical-and-population-genetics/hapmap-3)) with the tested GWAS summary SNPs. As for LDSC, we ensured that sufficient SNPs remained after merging with the reference panel, and all other parameters were set to default. SumHer differs from LDSC in several ways: *i*) it models inflation multiplicatively, whereas LDSC uses an additive approach; *ii*) it accounts for uneven LD patterns and allows MAF to influence the expected heritability contributed by each SNP; and *iii*) it uses a restricted maximum likelihood solver rather than regression to estimate $h^2_{SNP}$.

#### (b) Genetic correlation

We used three methods to compute the pairwise MAE-DE genetic correlations ($r_g$; *N*=2024×525=1,062,600): *i*) LDSC28, *ii*) GNOVA34, and *iii*) HDL38.

An earlier study92 highlighted the importance of selecting an appropriate LD score reference panel for genetic correlation estimates based on summary statistics. For a fair comparison, we used LD scores from the same reference panel (1000 Genomes, European ancestry) for all three packages. For LDSC, we used the precomputed LD scores provided by the software and set all other parameters to default. For GNOVA, we generated the LD scores with the *--save-ld* argument of the *gnova.py* script. For HDL, we generated the LD scores with the scripts provided by HDL ([https://github.com/zhenin/HDL/wiki/Build-a-reference-panel](https://github.com/zhenin/HDL/wiki/Build-a-reference-panel)).

We found that the three packages differ in their model convergence rates, which matters for future users of these open-source tools. In particular, LDSC (1,062,577/1,062,600) and GNOVA (1,062,600/1,062,600) converged for nearly all tested MAE-DE pairs, whereas HDL failed for a substantial proportion of the analyses, converging for only 59,291 of the 1,062,600 MAE-DE pairs (see the reported issue: [https://github.com/zhenin/HDL/issues/30](https://github.com/zhenin/HDL/issues/30)). Therefore, **Fig. 2** presents results that were significant in both LDSC and GNOVA after Bonferroni correction, yielding 133 and 45 significant signals when correcting for *i*) the number of MAEs and *ii*) the number of MAEs and DEs, respectively.
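
The two Bonferroni thresholds and the requirement that a pair be significant in both packages can be written down directly. This Python sketch assumes a family-wise $\alpha$ of 0.05, which the text does not state, and represents each package's output as a hypothetical dictionary mapping (MAE, DE) pairs to P-values; pairs that did not converge are simply absent.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def common_significant(p_ldsc: dict, p_gnova: dict, threshold: float) -> set:
    """(MAE, DE) pairs significant in both LDSC and GNOVA.

    Pairs missing from either dictionary (e.g., non-converged runs) are excluded.
    """
    shared = p_ldsc.keys() & p_gnova.keys()
    return {pair for pair in shared if p_ldsc[pair] < threshold and p_gnova[pair] < threshold}


N_MAE, N_DE = 2024, 525
ALPHA = 0.05  # assumed family-wise error rate
per_mae = bonferroni_threshold(ALPHA, N_MAE)              # about 2.47e-5
per_mae_de = bonferroni_threshold(ALPHA, N_MAE * N_DE)    # about 4.71e-8
```

Requiring agreement between two methods is conservative: a pair is reported only if it passes the same threshold under both estimators.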

#### (c) Two-sample bidirectional Mendelian randomization

We performed bidirectional two-sample Mendelian randomization with the *TwoSampleMR* package77 to infer causal relationships between the 2024 MAEs, the 521 DEs from FinnGen, and the 4 brain DEs from PGC.

The forward Mendelian randomization examined causality from the 2024 MAEs to the 525 DEs, while the inverse analysis examined causality from the 525 DEs to the 2024 MAEs. We applied five Mendelian randomization estimators implemented in the *TwoSampleMR* package77. We report findings that were significant after Bonferroni correction with the inverse variance weighted (IVW) estimator and that remained significant with at least one of the other four estimators (Egger, weighted median, simple mode, and weighted mode). For significant signals, we performed several sensitivity analyses. First, a heterogeneity test was performed to check for violations of the IV assumptions. Second, horizontal pleiotropy, which violates the IV exclusivity assumption93, was assessed using a funnel plot, single-SNP Mendelian randomization, and the Mendelian randomization Egger estimator. Finally, a leave-one-out analysis excluded one instrument (SNP) at a time to assess the sensitivity of the results to individual SNPs.
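
The IVW, MR-Egger, and leave-one-out computations can be sketched from per-SNP summary statistics with NumPy. This is a simplified illustration, not *TwoSampleMR*'s code: the IVW standard error here is the fixed-effect one (*TwoSampleMR*'s `mr_ivw` uses a multiplicative random-effects model), MR-Egger is fitted by weighted least squares after orienting every SNP so that its exposure effect is positive, and the heterogeneity test and the median- and mode-based estimators are omitted.

```python
import numpy as np


def ivw(beta_x, beta_y, se_y):
    """Fixed-effect inverse variance weighted estimate and standard error."""
    weights = beta_x**2 / se_y**2
    estimate = np.sum(beta_x * beta_y / se_y**2) / np.sum(weights)
    return estimate, np.sqrt(1.0 / np.sum(weights))


def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger intercept and slope via weighted least squares (weights 1/se_y^2)."""
    sign = np.sign(beta_x)
    bx, by = beta_x * sign, beta_y * sign          # orient so that bx > 0
    sqrt_w = 1.0 / se_y
    design = np.column_stack([np.ones_like(bx), bx]) * sqrt_w[:, None]
    (intercept, slope), *_ = np.linalg.lstsq(design, by * sqrt_w, rcond=None)
    return intercept, slope


def leave_one_out(beta_x, beta_y, se_y):
    """IVW estimate recomputed with each instrument (SNP) removed in turn."""
    idx = np.arange(len(beta_x))
    return np.array([ivw(beta_x[idx != i], beta_y[idx != i], se_y[idx != i])[0] for i in idx])
```

A non-zero Egger intercept indicates directional horizontal pleiotropy, and a leave-one-out estimate that moves sharply when one SNP is dropped flags a result driven by a single instrument.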

Critically, to enhance transparency and reproducibility, we followed a systematic procedure guided by the STROBE-MR Statement94 in all causality analyses. This approach encompassed the selection of exposure and outcome variables, the reporting of full sets of statistics, and sensitivity checks to identify potential violations of underlying assumptions. First, we performed a quality check on the GWAS summary statistics. Sample-overlap bias29 was ruled out on the grounds that FinnGen and UKBB participants represent distinct European-ancestry populations without explicit overlap. For the four PGC DEs, we ensured that no UKBB participants were included in the GWAS summary data. Furthermore, all GWAS summary statistics were based on, or lifted over to, GRCh37. Subsequently, we selected effective exposure variables by assessing the statistical power of the exposure GWAS summary statistics in terms of instrumental variables (IVs), requiring more than 8 IVs before harmonizing the data. Crucially, the function *clump_data* was applied to the exposure GWAS data to select IVs that are independent in terms of LD. The function *harmonise_data* was then used to harmonize the exposure and outcome GWAS summary statistics. Because certain GWAS summary data did not have enough IVs, the number of effective exposure/outcome variables was smaller than 525 DEs or 2024 MAEs in both the forward and inverse Mendelian randomization analyses.
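
The instrument selection can be illustrated with a greedy LD clumping sketch in Python: SNPs are visited from the smallest P-value upward, and a SNP is kept only if it is not in LD with an already-kept SNP on the same chromosome within the window. The `r2` callable, the input tuples, and the defaults (a 10,000 kb window and an *r*² threshold of 0.001, which we believe mirror *TwoSampleMR*'s `clump_data` defaults) are assumptions; the text does not state the values used. The last function encodes the text's requirement of more than 8 IVs.

```python
from typing import Callable, List, Tuple

Snp = Tuple[str, str, int, float]  # (rsid, chromosome, position, P-value)


def greedy_clump(
    snps: List[Snp],
    r2: Callable[[str, str], float],
    r2_threshold: float = 0.001,
    window_bp: int = 10_000_000,
) -> List[Snp]:
    """Keep the most significant SNP in each LD neighbourhood."""
    kept: List[Snp] = []
    for snp in sorted(snps, key=lambda s: s[3]):
        rsid, chrom, pos, _ = snp
        in_ld = any(
            k_chrom == chrom
            and abs(k_pos - pos) <= window_bp
            and r2(rsid, k_rsid) > r2_threshold
            for k_rsid, k_chrom, k_pos, _ in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def has_enough_ivs(instruments: List[Snp], min_ivs: int = 8) -> bool:
    """The text requires the number of IVs to exceed 8."""
    return len(instruments) > min_ivs
```

Processing SNPs in order of significance guarantees that each retained instrument is the strongest signal in its LD neighbourhood.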

#### (d) PRS calculation

PRS calculation used the GWAS summary statistics from the split-sample sensitivity analyses of our previous studies12,6,4,1. We established PRS weights using the split1 GWAS data as the base/training set, while the split2 data served as the target/testing set. Details of the quality control (QC) procedures are given in our previous studies12,6,4,1. Following QC, PRS for the split2 group were computed using PRS-CS45, which infers posterior SNP effect sizes under continuous shrinkage priors from GWAS summary statistics and an LD reference panel (here, the UKBB reference). To identify the most suitable PRS, we fitted linear regressions across different P-value thresholds (0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5), controlling for age, sex, intracranial volume (if applicable), and the first 40 genetic principal components. The optimal P-value threshold for each PRS-MAE was the one yielding the highest incremental $R^2$.
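
The threshold selection can be illustrated as a comparison of incremental $R^2$ values, i.e., the $R^2$ of a model with the PRS and covariates minus the $R^2$ of the covariates-only model. In this NumPy sketch, `prs_by_threshold` is a hypothetical mapping from each P-value threshold to a vector of candidate scores; how those scores are produced is outside the sketch.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on an intercept plus predictors."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def incremental_r2(y, prs, covariates):
    """R^2 gained by adding the PRS to a covariates-only model."""
    return r_squared(y, np.column_stack([covariates, prs])) - r_squared(y, covariates)


def best_threshold(y, prs_by_threshold, covariates):
    """Return the threshold whose PRS gives the largest incremental R^2, plus all gains."""
    gains = {t: incremental_r2(y, prs, covariates) for t, prs in prs_by_threshold.items()}
    return max(gains, key=gains.get), gains
```

Using the incremental rather than the total $R^2$ ensures that the selected score is credited only for variance not already explained by age, sex, and the other covariates.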

After determining the optimal model, we applied it to the entire UKBB sample (∼500k individuals). We then performed a phenome-wide association study (PWAS) linking the 2024 PRS-MAEs to 59 additional phenotypes (**Supplementary eTable 5**) that were not used to compute the PRS-MAEs, to avoid circular bias46. The 59 phenotypes include cognitive scores (e.g., fluid intelligence score; Field ID: 20016), mental traits (e.g., fed-up feelings; Field ID: 1960), and lifestyle factors (e.g., tea intake; Field ID: 1488). Each linear regression included the following covariates: sex (Field ID: 31), smoking status (Field ID: 20116), weight (Field ID: 21002), standing height (Field ID: 50), waist circumference (Field ID: 48), age at recruitment (Field ID: 21022), and the first 40 genetic principal components (Field ID: 22009).
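
A PWAS of this kind can be sketched as one ordinary least-squares fit per phenotype. In this NumPy sketch, the phenotype names and toy data are hypothetical, missing values are assumed to have been removed beforehand, and only the PRS coefficient, its standard error, and its t-statistic are returned; P-values and any multiple-testing correction are left out because the text does not specify them here.

```python
import numpy as np


def pwas(prs, phenotypes, covariates):
    """Regress each phenotype on PRS plus covariates with ordinary least squares.

    prs: (n,) scores; phenotypes: {name: (n,) values}; covariates: (n, c) matrix.
    Returns {name: (beta, se, t)} for the PRS coefficient.
    """
    n = len(prs)
    design = np.column_stack([np.ones(n), prs, covariates])
    xtx_inv = np.linalg.inv(design.T @ design)
    dof = n - design.shape[1]
    results = {}
    for name, y in phenotypes.items():
        coef = xtx_inv @ (design.T @ y)
        resid = y - design @ coef
        sigma2 = resid @ resid / dof
        se = np.sqrt(sigma2 * xtx_inv[1, 1])
        results[name] = (coef[1], se, coef[1] / se)
    return results
```

Because the design matrix is shared across phenotypes, its inverse is computed once, which keeps a scan over many phenotypes inexpensive.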

## Data Availability

The results of the MUTATE atlas are disseminated at the MUTATE knowledge portal: [https://labs-laboratory.com/mutate](https://labs-laboratory.com/mutate). The GWAS summary statistics for the 2024 MAEs can be accessed publicly through the MEDICINE knowledge portal: [https://labs-laboratory.com/medicine](https://labs-laboratory.com/medicine) and the BRIDGEPORT knowledge portal: [https://labs-laboratory.com/bridgeport](https://labs-laboratory.com/bridgeport). The GWAS summary statistics for the 521 DEs from FinnGen are publicly available at: [https://finngen.gitbook.io/documentation/v/r9/](https://finngen.gitbook.io/documentation/v/r9/). The GWAS summary statistics for the 4 DEs from PGC are publicly available at: [https://pgc.unc.edu/for-researchers/download-results/](https://pgc.unc.edu/for-researchers/download-results/). The study used only GWAS summary statistics rather than individual-level data from the UK Biobank. However, the 2024 MAE GWAS data were originally derived from previous studies conducted under UK Biobank Application Numbers 35148 and 60698.

## Code Availability

The software and resources used in this study are all publicly available: 

*   *GCTB*: [https://cnsgenomics.com/software/gctb/#Overview](https://cnsgenomics.com/software/gctb/#Overview), SNP-based heritability, polygenicity, and MAF/effect size ratio

*   *LDSC*: [https://github.com/bulik/ldsc](https://github.com/bulik/ldsc), SNP-based heritability and genetic correlation

*   *SumHer*: [https://dougspeed.com/sumher/](https://dougspeed.com/sumher/), SNP-based heritability

*   *GNOVA*: [https://github.com/xtonyjiang/GNOVA](https://github.com/xtonyjiang/GNOVA), genetic correlation

*   *HDL*: [https://github.com/zhenin/HDL](https://github.com/zhenin/HDL), genetic correlation

*   *TwoSampleMR*: [https://mrcieu.github.io/TwoSampleMR/index.html](https://mrcieu.github.io/TwoSampleMR/index.html), Mendelian randomization

*   *PRS-CS*: [https://github.com/getian107/PRScs](https://github.com/getian107/PRScs), PRS

## Competing Interests

None

## Authors’ contributions

Dr. Wen has full access to all the study data and is responsible for its integrity and accuracy.

*Study concept and design*: W.J

*Acquisition, analysis, or interpretation of data*: W.J

*Drafting of the manuscript*: W.J

*Critical revision of the manuscript for important intellectual content*: All authors

*Statistical analysis*: W.J

## Supporting information

Supplement: supplements/308980_file02.docx

## Acknowledgment

We sincerely thank the UK Biobank ([https://www.ukbiobank.ac.uk/](https://www.ukbiobank.ac.uk/)), FinnGen ([https://www.finngen.fi/en](https://www.finngen.fi/en)), and PGC ([https://pgc.unc.edu/](https://pgc.unc.edu/)) teams for their invaluable contributions to advancing clinical research in our field.

*   Received June 15, 2024.
*   Revision received June 15, 2024.
*   Accepted June 16, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1. Wen, J. et al. Neuroimaging-AI Endophenotypes of Brain Diseases in the General Population: Towards a Dimensional System of Vulnerability. medRxiv 2023.08.16.23294179 (2023) doi:10.1101/2023.08.16.23294179.

2. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022) doi:10.1038/s41586-022-04434-5.

3. Kendler, K. & Neale, M. Endophenotype: a conceptual analysis. Mol Psychiatry 15, 789–797 (2010) doi:10.1038/mp.2010.8.

4. Wen, J. et al. The Genetic Architecture of Biological Age in Nine Human Organ Systems. medRxiv 2023.06.08.23291168 (2023) doi:10.1101/2023.06.08.23291168.

5. Tian, Y. E. et al. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med 1–11 (2023) doi:10.1038/s41591-023-02296-6.

6. Wen, J. et al. The genetic architecture of multimodal human brain age. Nat Commun 15, 2604 (2024).

7. Zhao, B. et al. Heart-brain connections: Phenotypic and genetic insights from magnetic resonance images. Science 380, abn6598 (2023).

8. McCracken, C. et al. Multi-organ imaging demonstrates the heart-brain-liver axis in UK Biobank participants. Nat Commun 13, 7839 (2022).

9. Nie, C. et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Reports 38, 110459 (2022).

10. Liu, Y. et al. Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning. eLife 10, e65554 (2021).

11. Oh, H. S.-H. et al. Organ aging signatures in the plasma proteome track health and disease. Nature 624, 164–172 (2023) doi:10.1038/s41586-023-06802-1.

12. Wen, J. et al. Genomic loci influence patterns of structural covariance in the human brain. Proceedings of the National Academy of Sciences 120, e2300842120 (2023).

13. Hodson, R. Precision medicine. Nature 537, S49 (2016) doi:10.1038/537s49a.

14. Gottesman, I. I. & Gould, T. D. The endophenotype concept in psychiatry: etymology and strategic intentions. Am J Psychiatry 160, 636–645 (2003).
    

15. Cannon, T. D. & Keller, M. C. Endophenotypes in the Genetic Analyses of Mental Disorders. Annual Review of Clinical Psychology 2, 267–290 (2006).
    
    
16. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat Med 28, 31–38 (2022).
    

17. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
    

18. Miller, K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 1523–1536 (2016).
    

19. Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018).
    

20. Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
    

21. Dhindsa, R. S. et al. Rare variant associations with plasma protein levels in the UK Biobank. Nature 622, 339–347 (2023).
    

22. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
    

23. O’Donovan, M. C. What have we learned from the Psychiatric Genomics Consortium. World Psychiatry 14, 291–293 (2015).
    

24. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005–D1012 (2019).
    

25. Elsworth, B. et al. The MRC IEU OpenGWAS data infrastructure. 2020.08.10.244293 Preprint at doi:10.1101/2020.08.10.244293 (2020).
    

26. Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51, 1339–1348 (2019).
    

27. Shen, L. & Thompson, P. M. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proceedings of the IEEE 108, 125–162 (2020).
    
    
28. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015).
    

29. Sanderson, E. et al. Mendelian randomization. Nat Rev Methods Primers 2, 1–21 (2022).
    
    
30. Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol 44, 512–525 (2015).
    

31. Open science. Nature 550, 7–8 (2017).
    
    
32. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–1241 (2015).
    

33. Speed, D. & Balding, D. J. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat Genet 51, 277–284 (2019).
    
    
34. Lu, Q. et al. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. The American Journal of Human Genetics 101, 939–964 (2017).
    

35. Smith, G. D. & Ebrahim, S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32, 1–22 (2003).
    

36. Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 23, R89–98 (2014).
    

37. Pierce, B. L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am J Epidemiol 178, 1177–1184 (2013).
    

38. Ning, Z., Pawitan, Y. & Shen, X. High-definition likelihood inference of genetic correlations across human complex traits. Nat Genet 52, 859–864 (2020).
    
    
39. Zeng, J. et al. Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 12, 1164 (2021).
    

40. Evans, L. M. et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat Genet 50, 737–745 (2018).
    

41. Ojavee, S. E., Kutalik, Z. & Robinson, M. R. Liability-scale heritability estimation for biobank studies of low-prevalence disease. The American Journal of Human Genetics 109, 2009–2017 (2022).
    
    
42. Hwang, G. et al. Assessment of Neuroanatomical Endophenotypes of Autism Spectrum Disorder and Association With Characteristics of Individuals With Schizophrenia and the General Population. JAMA Psychiatry (2023) doi:10.1001/jamapsychiatry.2023.0409.
    

43. Makowski, C. et al. Discovery of genomic loci of the human cerebral cortex using genetically informed brain atlases. 8 (2022).
    
    
44. Matoba, N., Love, M. I. & Stein, J. L. Evaluating brain structure traits as endophenotypes using polygenicity and discoverability. Human Brain Mapping 43, 329–340 (2022).
    
    
45. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).
    
    
46. Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F. & Baker, C. I. Circular analysis in systems neuroscience: the dangers of double dipping. Nat. Neurosci. 12, 535–540 (2009).
    

47. Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53, 1415–1424 (2021).
    

48. Lee, S. H. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet 45, 984–994 (2013).
    

49. Hunter, D. J. Gene–environment interactions in human diseases. Nat Rev Genet 6, 287–298 (2005).
    

50. Wen, J. et al. Genetic, clinical underpinnings of subtle early brain change along Alzheimer’s dimensions. 2022.09.16.508329 Preprint at doi:10.1101/2022.09.16.508329 (2022).
    

51. The Brainstorm Consortium et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
    
    
52. Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat Neurosci 24, 737–745 (2021).
    
    
53. Walsh, B. & Lynch, M. Evolution and Selection of Quantitative Traits. (Oxford University Press, 2018).
    
    
54. Plackett, B. A graphical guide to ischaemic heart disease. Nature 594, S3–S3 (2021).
    
    
55. Peisker, F. et al. Mapping the cardiac vascular niche in heart failure. Nat Commun 13, 3027 (2022).
    

56. Brundel, B. J. J. M. et al. Atrial fibrillation. Nat Rev Dis Primers 8, 1–23 (2022).
    
    
57. Barnes, P. J. et al. Chronic obstructive pulmonary disease. Nat Rev Dis Primers 1, 1–21 (2015).
    
    
58. Holgate, S. T. et al. Asthma. Nat Rev Dis Primers 1, 1–22 (2015).
    
    
59. Ballard, C. et al. Alzheimer’s disease. The Lancet 377, 1019–1031 (2011).
    

60. Marshall, M. The hidden links between mental disorders. Nature 581, 19–21 (2020).
    

61. Eiser, A. R. & Fulop, T. Alzheimer’s Disease Is a Multi-Organ Disorder: It May Already Be Preventable. J Alzheimers Dis 91, 1277–1281 (2023).
    
    
62. Wen, J. et al. Genetic, clinical underpinnings of subtle early brain change along Alzheimer’s dimensions. 2022.09.16.508329 Preprint at doi:10.1101/2022.09.16.508329 (2022).
    

63. Guthrie, H. et al. Safety, Tolerability, and Pharmacokinetics of Crenezumab in Patients with Mild-to-Moderate Alzheimer’s Disease Treated with Escalating Doses for up to 133 Weeks. J Alzheimers Dis 76, 967–979 (2020).
    
    
64. Sevigny, J. et al. The antibody aducanumab reduces Aβ plaques in Alzheimer’s disease. Nature 537, 50–56 (2016).
    

65. Congdon, E. E. & Sigurdsson, E. M. Tau-targeting therapies for Alzheimer disease. Nat Rev Neurol 14, 399–415 (2018).
    

66. Jack, C. R. et al. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol 12, 207–216 (2013).
    

67. Frisoni, G. B. et al. The probabilistic model of Alzheimer disease: the amyloid hypothesis revised. Nat Rev Neurosci 23, 53–66 (2022).
    

68. Tomic, D., Shaw, J. E. & Magliano, D. J. The burden and risks of emerging complications of diabetes mellitus. Nat Rev Endocrinol 18, 525–539 (2022).
    
    
69. DeFronzo, R. A. et al. Type 2 diabetes mellitus. Nat Rev Dis Primers 1, 1–22 (2015).
    
    
70. Yang, Z. et al. Gene-SGAN: discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering. Nat Commun 15, 354 (2024).
    
    
71. Wen, J. et al. Characterizing Heterogeneity in Neuroimaging, Cognition, Clinical Symptoms, and Genetics Among Patients With Late-Life Depression. JAMA Psychiatry (2022) doi:10.1001/jamapsychiatry.2022.0020.
    

72. Chand, G. B. et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using machine learning. Brain 143, 1027–1038 (2020).
    

73. Wen, J. et al. The Genetic Architecture of Biological Age in Nine Human Organ Systems. medRxiv 2023.06.08.23291168 (2023) doi:10.1101/2023.06.08.23291168.
    

74. Purcell, S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81, 559–575 (2007).
    

75. Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 51, 1749–1755 (2019).
    

76. MacArthur, J. A. L. et al. Workshop proceedings: GWAS summary statistics standards and sharing. Cell Genomics 1 (2021).
    
    
77. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
    

78. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 53, 1097–1103 (2021).
    

79. Wightman, D. P. et al. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat Genet 53, 1276–1282 (2021).
    

80. Sotiras, A., Resnick, S. M. & Davatzikos, C. Finding imaging patterns of structural covariance via Non-Negative Matrix Factorization. NeuroImage 108, 1–16 (2015).
    
    
81. Yang, Z. & Oja, E. Linear and Nonlinear Projective Nonnegative Matrix Factorization. IEEE Trans. Neural Netw. 21, 734–749 (2010).
    

82. Yang, Z., Wen, J. & Davatzikos, C. Surreal-GAN: Semi-Supervised Representation Learning via GAN for uncovering heterogeneous disease-related imaging patterns. ICLR (2021).
    
    
83. Varol, E., Sotiras, A. & Davatzikos, C. HYDRA: Revealing heterogeneity of imaging and genetic patterns through a multiple max-margin discriminative analysis framework. NeuroImage 145, 346–364 (2017).
    

84. Wen, J. et al. Subtyping Brain Diseases from Imaging Data. in Machine Learning for Brain Disorders (ed. Colliot, O.) 491–510 (Springer US, New York, NY, 2023). doi:10.1007/978-1-0716-3195-9_16.
    

85. Yang, Z. et al. A deep learning framework identifies dimensional representations of Alzheimer’s Disease from brain structure. Nat Commun 12, 7065 (2021).
    

86. Chang, C.-C. & Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011).
    

87. Samper-González, J. et al. Reproducible evaluation of classification methods in Alzheimer’s disease: Framework and application to MRI and PET data. NeuroImage 183, 504–521 (2018).
    
    
88. Wen, J. et al. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Medical Image Analysis 63, 101694 (2020).
    
    
89. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat Genet 50, 746–753 (2018).
    

90. Hautakangas, H. LD Score regression for estimating and partitioning heritability of lipid levels in the Finnish population. (University of Helsinki, Helsinki, 2018).
    
    
91. Speed, D., Holmes, J. & Balding, D. J. Evaluating and improving heritability models using summary statistics. Nat Genet 52, 458–462 (2020).
    

92. Zhang, Y. et al. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Brief Bioinform 22, bbaa442 (2021).
    
    
93. Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med 36, 1783–1802 (2017).
    

94. Skrivankova, V. W. et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA 326, 1614–1621 (2021).
    

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif