Dissecting the genetic overlap of education, socioeconomic status, and mental health
====================================================================================

* F. R. Wendt
* G. A. Pathak
* T. Lencz
* J. H. Krystal
* J. Gelernter
* R. Polimanti

## Abstract

Socioeconomic status (SES) and education (EDU) are phenotypically associated with psychiatric disorders and behavior. It remains unclear how these associations influence the genetic risk for mental health traits and EDU/SES individually. Using information from >1 million individuals, we conditioned the genetic risk for psychiatric disorders, personality traits, brain imaging phenotypes, and externalizing behaviors with genome-wide data for EDU/SES. Accounting for EDU/SES significantly affected the observed heritability of psychiatric traits ranging from 2.44% h2 decrease for bipolar disorder to 29.0% h2 decrease for Tourette syndrome. Neuroticism h2 significantly increased by 20.23% after conditioning with SES. After EDU/SES conditioning, novel neuronal cell-types were identified for risky behavior (excitatory), major depression (inhibitory), schizophrenia (excitatory and GABAergic), and bipolar disorder (excitatory). Conditioning with EDU/SES also revealed unidirectional causality between brain morphology and mental health phenotypes. Our results indicate genetic discoveries of mental health outcomes may be limited by genetic overlap with EDU/SES.

## Introduction

Education (EDU) and socioeconomic status (SES) are risk or protective factors for traits related to mental health and disease (1, 2). Social position has been repeatedly correlated with mood, anxiety, and substance use related disorders, while EDU phenotypes such as *educational attainment, math ability*, and *fluid intelligence* are overall protective factors for development of neurological and psychiatric conditions (2).
They are epidemiologically correlated, but the specific EDU and/or SES phenotypes used in epidemiological studies clearly account in part for observed differences between groups (3). It is therefore imperative to understand how EDU and SES phenotypes influence what we understand about human health and disease. Genome-wide association studies (GWAS) are powerful hypothesis-free genetic studies for detecting risk loci (e.g., single nucleotide polymorphisms (SNPs) or genes) for phenotypes of interest. Their widespread use has led to risk locus discovery underlying thousands of phenotypes across the spectrum of human health and disease, including mental and physical health and disease, personality, anthropometric measures, intelligence, and behavior (4). An observation generated from large-scale GWAS is the widespread presence of pleiotropy; a single SNP (or a set of SNPs) may have a range of relatively small effects on multiple similar or disparate phenotypes. On a genome-wide scale, these pleiotropic effects, detected using GWAS summary data, may be used to determine genetic correlations between phenotypes to putatively identify genetic underpinnings of trait pairs (5). The EDU phenotypes *educational attainment* and *cognitive performance* have relatively high SNP-heritability: the phenotypic variance explained by genetic information was 40-60% (6) and 21.5% (7), respectively. Socioeconomic status (SES) is defined as the social standing or class of an individual or group, often measured as a combination of education, income, and occupation (8). SES phenotypes such as *household income* and *Townsend deprivation index* (i.e., measure of SES based on whether individuals own their homes, their employment status, their access to a vehicle, and whether or not individuals share living accommodations with others) are significantly heritable and show strong genetic correlation with EDU traits (9). 
Additionally there is pleiotropy of genetic risks between EDU/SES and a range of mental health outcomes (e.g., psychiatric disorders, personality traits, internalizing and externalizing behaviors, social science outcomes, and brain imaging phenotypes) (10, 11). The epidemiological observations of high genetic correlations between genetic risk for EDU/SES and mental health outcomes (1, 2) raise two critical questions: (1) how might the strong genetic effects of EDU/SES affect our understanding of the overall genetic risk for mental health outcomes? and (2) is there evidence that genetic effects of mental health and disease phenotypes affect our understanding of the overall genetic risk for EDU/SES? The goal of this study was to investigate how the shared genetic effects between the general categories of EDU, SES, and mental health outcome phenotypes influence genetic risk for individual phenotypes within each of these classes. There are several ways to approach these questions. First, polygenic risk scoring (PRS) (12) is a tempting approach; but PRS using mental health/disease to predict the same or different phenotypes from an independent dataset often explain very little variance in the outcome phenotype (13-15). PRS also cannot detect specific biology underlying each phenotype. Second is multi-trait analysis of GWAS (MTAG), which jointly analyses GWAS summary statistics and adjusts per-SNP effect estimates and association p-values using the strength of the genetic correlation between phenotypes (16). Genetic correlations between EDU/SES and related phenotypes have, however, demonstrable biases from environmental confounders. If genetic correlations involving EDU and SES proxy phenotypes are significantly upwardly biased, an MTAG adjustment of summary statistics may inappropriately correct (i.e., bias) the summary statistics used for this study. 
To disentangle the complex genetic overlaps between EDU/SES and mental health, we therefore used multi-trait conditioning and joint analysis (mtCOJO), which generates conditioned GWAS summary statistics for each phenotype of interest after correcting for the per-SNP effects of another phenotype (17). The mtCOJO approach is not based on genetic correlation; it is based on the causal relationship between trait pairs inferred by Mendelian randomization (MR). For our phenotypes of interest mtCOJO is an advantageous approach, which, in theory, is independent of the effects of environmental confounders. MR detects causal inferences between trait pairs using non-modifiable risk factors (SNPs) associated with an exposure variable and only associated with an outcome variable through the exposure. Because SNPs are non-modifiable, environmental confounders of the relationship between SNP, exposure, and outcome should not influence MR estimates. We used the mtCOJO approach to condition mental health outcomes with the per-SNP effects of EDU and SES phenotypes and investigate their underlying biology at multiple levels: (1) risk locus detection, (2) heritability (h2), (3) gene-set enrichment, (4) tissue transcriptomic profile enrichment, (5) cell type transcriptomic profile enrichment, (6) phenotype relationships via structural equation modeling and genetic correlation, and (7) latent genetically causal relationships (see flow diagram Fig S1). Our findings identify several cell types and phenotype relationships that were masked by the shared genetic etiology between mental health outcomes and EDU/SES. Furthermore, we demonstrate that the same multi-level analyses of EDU and SES are largely robust to the effects of shared genetic etiology with mental health outcomes. 
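
The conditioning idea described above can be illustrated with a deliberately simplified, single-covariate sketch. It assumes no sample overlap between the two GWAS, ignores LD between SNPs and the uncertainty in the causal estimate, and replaces the MR step with a plain inverse-variance-weighted ratio estimate. It is not the GCTA mtCOJO implementation, and the function names `ivw_estimate` and `condition_snp`, as well as all numeric inputs, are hypothetical.

```python
import math

def ivw_estimate(bx, by, se_y):
    """Inverse-variance-weighted MR estimate of the effect of trait x on trait y.

    bx, by: per-SNP effects of the instrument SNPs on x and on y.
    se_y:   standard errors of the SNP effects on y.
    (Simplifying assumption: instruments are independent and valid.)
    """
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_y))
    return num / den

def condition_snp(b_y, se_y, b_x, se_x, b_xy):
    """Adjust one SNP's effect on y for its effect on covariate trait x.

    Returns (conditioned effect, standard error, two-sided p-value).
    The variance formula assumes independent samples and treats b_xy as fixed.
    """
    b_cond = b_y - b_xy * b_x
    se_cond = math.sqrt(se_y ** 2 + (b_xy ** 2) * se_x ** 2)
    z = b_cond / se_cond
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return b_cond, se_cond, p

# Hypothetical instrument SNPs for the covariate trait (e.g., an EDU phenotype)
b_xy = ivw_estimate(bx=[0.1, 0.2, 0.3], by=[0.05, 0.10, 0.15], se_y=[0.01, 0.01, 0.01])

# Hypothetical SNP from the mental health GWAS, conditioned on the covariate trait
b_cond, se_cond, p_cond = condition_snp(b_y=0.08, se_y=0.01, b_x=0.1, se_x=0.01, b_xy=b_xy)
```

In this toy setting, a SNP whose association with the outcome is fully explained by its effect on the covariate trait ends up with a conditioned effect near zero, while a SNP with an additional direct effect keeps a non-zero conditioned estimate.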
## Results

### Trait Inclusion

The genetic correlations (rg) between EDU (*educational attainment, cognitive performance, highest math class*, and *self-rated math ability*), SES (*household income* and *Townsend deprivation index*), and mental health outcomes (i.e., psychiatric disorders, personality traits, externalizing behaviors, social science outcomes, and brain imaging phenotypes) were detected using the Linkage Disequilibrium Score Regression (LDSC) method (Fig. 1, Table S1, Figs. S2 & S3) (18). Genetic correlations for EDU and SES phenotype categories were analyzed independently to identify brain imaging phenotypes nominally genetically correlated with at least two of the four EDU phenotypes and both SES phenotypes. We detected only two traits genetically correlated with at least two of the four EDU phenotypes: *left insular cortex* (mean rg = −0.122, se = 0.013) and *left subcallosal cortex* (mean rg = −0.106, se = 0.009). These two brain imaging phenotypes were included in EDU conditioning experiments.

![Fig. 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/01/13/2020.01.09.20017079/F1.medium.gif)

Fig. 1. Trait inclusion genetic correlations. Genetic correlation between mental health outcomes, education phenotypes, and socioeconomic status phenotypes. Genetic correlations labeled with an asterisk were at least nominally significant.

Twenty-nine brain imaging phenotypes were genetically correlated with both SES phenotypes. We tested genetic correlation between these 29 brain imaging phenotypes to identify a subset of high heritability traits to include in SES conditioning experiments. We identified six such brain imaging phenotypes (Fig. S2). These are: *cortex volume, left hemisphere medialorbitofrontal area, right insular cortex, right temporal fusiform cortex, subcortical gray matter volume*, and *volume of right-ventral diencephalon*.
The SES phenotypes *income* and *deprivation index* are inversely genetically correlated, as shown in Fig. 1.

### Conditioning Heritability and Risk Locus Discovery

We tested the effects of conditioning on observed-scale heritability (h2) using LDSC (18). Psychiatric disorders were most sensitive to shared genetic etiology with EDU/SES phenotypes. Except for *major depressive disorder* (*MDD*), *anxiety*, and *posttraumatic stress disorder* (*PTSD*), conditioning reduced the h2 for all psychiatric disorders relative to their original estimates (h2 decrease ranged from 2.44% ± 0.187 for *bipolar disorder* (original h2 = 4.39%; highest conditioned h2 = 2.22%, se = 0.460, p = 5.67×10−65; lowest conditioned h2 = 1.70%, se = 0.440, p = 4.05×10−80) to 29.0% ± 0.105 for *Tourette syndrome* (original h2 = 35.6%; highest conditioned h2 = 6.72%, se = 0.770, p = 2.61×10−18; lowest conditioned h2 = 6.43%, se = 0.730, p = 1.27×10−18); Fig. 2A). *Tourette syndrome* exhibited the largest decrease in h2 after conditioning with the effects of EDU/SES phenotypes (*Tourette syndrome* mean pdiff compared to original h2 = 2.24×10−11, se = 4.42×10−12). Conversely, two phenotypes exhibited significant increases in h2 after conditioning with EDU/SES phenotypes: *neuroticism* (highest conditioned h2 = 20.2%, se = 0.630, p = 3.08×10−226; lowest conditioned h2 = 18.1%, se = 0.590, p = 2.35×10−207) and *subjective well-being* (highest conditioned h2 = 3.65%, se = 0.220, p = 8.11×10−62; lowest conditioned h2 = 3.34%, se = 0.220, p = 4.67×10−52).

![Fig. 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/01/13/2020.01.09.20017079/F2.medium.gif)

Fig. 2. Heritability (h2) changes and risk locus discovery. **(A)** Observed-scale h2 changes of mental health outcomes after conditioning with education and socioeconomic status phenotypes. **(B)** Manhattan plots for neuroticism (SSGAC) before and after conditioning with education and socioeconomic status phenotypes. **(C)** Evidence that neuroticism (SSGAC) locus discovery is due to increased detection of polygenicity rather than exacerbated effects of population substructure.

Conditioning the *neuroticism* GWAS (original h2 = 9.41%) with EDU/SES phenotypes revealed several novel LD-independent risk loci, confirmed known ones, and increased heritability (range = 59 loci (*neuroticism* conditioned with *income*) to 100 loci (*neuroticism* conditioned with *deprivation index*); Fig. 2B). We observed an increase in the association signal in the *neuroticism* GWAS with the strongest effects observed after conditioning with SES phenotypes *income* (lambda GC = 1.36; intercept = 0.971, se = 0.009) and *deprivation index* (lambda GC = 1.75; intercept = 0.967, se = 0.009; Fig. 2C). This increase was not related to an increase in the potential bias of population stratification (there was no significant change in the LDSC intercept, p > 0.05), supporting that the observation was attributable to the increased detection of valid *neuroticism* polygenic signals. Using a physical proximity single-SNP-single-gene based annotation of conditioned *neuroticism* genomic risk loci, the top gene sets included Gene Ontology (GO) biological process synaptic signaling (enrichment FDR = 1.5×10−4), GO cellular component synapse part (enrichment FDR = 5.40×10−4), and Kyoto Encyclopedia of Genes and Genomes (KEGG) dopaminergic synapses (enrichment FDR = 0.046).

The significant increase in h2 for GWAS of *subjective well-being* (original h2 = 2.50%) uncovered a 5.7 kb genomic risk locus on chromosome 7 (minimum genome-wide significant p-value = 1.45×10−8) which maps to the α2δ1 subunit of calcium voltage-gated channel (*CACNA2D1*).
The protein encoded by *CACNA2D1* has been implicated in familial epilepsy and intellectual disability pedigrees but to our knowledge has not been implicated in genome-wide studies of these phenotypes (19, 20).

### Tissue-Type Transcriptomic Profile Enrichment Differences

After conditioning with GWAS of EDU/SES phenotypes, *schizophrenia* was the only mental health outcome demonstrating significant changes in tissue transcriptomic profile enrichment. Compared to original *schizophrenia* brain tissue transcriptomic profile enrichments, all conditioned *schizophrenia* brain tissue GTEx annotations, with the exception of c1 cervical spinal cord, had significantly decreased enrichments (Fig. S4). The maximum decrease was observed after conditioning *schizophrenia* with the EDU phenotype *educational attainment* (average beta decrease for all brain tissue annotations = 0.038 ± 0.004). After conditioning with EDU and SES phenotypes, the cerebellum and cerebellar hemisphere GTEx annotations remained the most enriched in the *schizophrenia* GWAS (original cerebellum enrichment = 0.080, p = 1.76×10−22; original cerebellar hemisphere enrichment = 0.077, p = 1.28×10−22; mean conditioned cerebellum enrichment = 0.047 ± 0.001, FDR < 0.05; mean conditioned cerebellar hemisphere enrichment = 0.047 ± 0.001, FDR < 0.05). After adjusting for the effects of *cognitive performance* and *educational attainment* and correcting for multiple testing, we uncovered enrichment of skeletal muscle tissue transcriptomic profiles in the *schizophrenia* GWAS (original skeletal muscle enrichment = 0.009, p = 0.135; skeletal muscle enrichment conditioned with *educational attainment* = 0.010, p = 0.032; skeletal muscle enrichment conditioned with *cognitive performance* = 0.011, p = 0.024) (21).
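
The tissue enrichments above are reported against an FDR threshold, but this section does not state which multiple-testing procedure was applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure; the function name `benjamini_hochberg` and the input p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment p-values for four tissue annotations
tissue_p = [0.01, 0.04, 0.03, 0.5]
tissue_q = benjamini_hochberg(tissue_p)

# Annotations passing an FDR threshold of 0.05 (assumed threshold, as in the text)
significant = [i for i, q in enumerate(tissue_q) if q < 0.05]
```

With many tissue annotations tested per GWAS, a nominal p-value below 0.05 does not guarantee that the adjusted value also falls below the FDR threshold, which is why the nominal and corrected results can differ.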
### Cell-Type Transcriptomic Profile Discoveries

Cell-type transcriptomic profile enrichments were evaluated in two ways: (1) by assessing differences in within-data-set cell-type enrichments before and after conditioning with EDU/SES (based on MAGMA cell-type enrichment Step 1 (22)) and (2) by assessing the effects of conditioning on the detection of conditionally independent proportionally significant (PS) cell type enrichments (based on MAGMA cell-type enrichment Step 3 (22)). PS cell-types are those whose genetic signals could be differentiated from one another. PS values ≥ 0.80 indicate independent genetic signals relative to a second cell type. We then used genes whose expression profiles define the excitatory (Ex) and inhibitory (In) cell types of PsychENCODE (23) to perform gene set enrichment analyses of GO and KEGG gene sets.

There were no differences in cell-type transcriptomic profile enrichments for mental health outcomes (MAGMA cell-type Step 1) after conditioning with EDU/SES; however, we discovered several PS cell-type pairs not detected in the unconditioned GWAS for (1) *risky behavior*, (2) *MDD*, and (3) *schizophrenia* (MAGMA cell-type Step 3). These PS cell-type findings and relevant gene set results are described in detail below.

In unconditioned GWAS of *risky behavior*, there were no PS cell-type enrichments. After conditioning with the EDU phenotypes *cognitive performance* and *educational attainment*, human cortex fetal quiescent and Ex2 were conditionally independent from one another (*risky behavior* conditioned with *cognitive performance* Ex2 β = 0.035, p = 7.48×10−4, PS = 1.37; fetal quiescent β = 0.023, p = 0.032, PS = 1.82; *risky behavior* conditioned with *educational attainment* Ex2 β = 0.034, p = 0.001, PS = 1.38; fetal quiescent β = 0.024, p = 0.030, PS = 1.77; Fig 3A).
Ex2 neurons also were detected in the *risk tolerance* GWAS after conditioning with *educational attainment*, but this signal could not be distinguished from hippocampal CA1 subfield cells. The genes that define the Ex2 cell type were enriched in nervous system development (GO:0007399; enrichment FDR = 3.70×10−4) and eye development (GO:0001654; enrichment FDR = 6.30×10−4) gene sets.

![Fig. 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/01/13/2020.01.09.20017079/F3.medium.gif)

Fig. 3. Cell-type transcriptomic profile enrichments underlying mental health outcomes. Cross-data-set proportionally significant (PS) and conditionally independent (i.e., genetic signatures of cell-type pairs are distinguishable) cell-type transcriptomic profile enrichments underlying unconditioned and conditioned GWAS for **(A)** risky behavior, **(B)** major depression, **(C)** schizophrenia, and **(D)** bipolar disorder. The human cell-type data sets from FUMA are labeled individually for each panel using different colors; cell types in the x and y directions are conditionally independent signals from within-data-set analysis performed in FUMA (cell-type enrichment step 2 (22)). Genetic signals from colinear cell types labeled with a single asterisk could not be differentiated from one another in FUMA.

The unconditioned *MDD* GWAS exhibited cell-type transcriptomic profile enrichments between adult GABAergic neurons, In6b, and gestational week 10 (GW10) stem cells. After conditioning with *self-rated math ability*, the genetic signal from human midbrain neurons was conditionally independent from lateral geniculate nucleus (LGN) GABAergic neurons (β relative to midbrain neurons = 0.041, p = 0.002, PS = 0.822; Fig 3B), In6b neurons (β relative to midbrain neurons = 0.517, p = 6.59×10−6, PS = 0.969), and In5 neurons (β relative to midbrain neurons = 0.039, p = 5.26×10−5, PS = 0.813).
The gene expression profiles of these cell types implicate the neurotransmitter transport (GO:0007269; enrichment FDR = 0.003) and locomotory behavior (GO:0007626; enrichment FDR = 0.015) gene sets in *MDD* psychopathology.

The cell-type transcriptomic profiles underlying *schizophrenia* initially highlighted the role of Ex7 and human cortical neurons with conditionally independent genetic signals. After conditioning the *schizophrenia* GWAS with *self-rated math ability*, we uncovered conditionally independent PS genetic signals from GW26 GABAergic neurons and GW10 stem cells (GABAergic neuron β = 0.046, p = 9.46×10−9, PS = 1.00; GW10 stem cell β = 0.031, p = 1.04×10−4, PS = 1.00; Fig 3C and 3D). Importantly, the independent genetic signals of Ex7 and human cortical neurons persisted after conditioning the *schizophrenia* GWAS with EDU and SES phenotypes.

There were no conditionally independent PS cell-type signals in the unconditioned GWAS of *bipolar disorder*, but the GWAS conditioned with *educational attainment* revealed PS genetic signals from (1) Ex7 (β relative to GABAergic neurons from the lateral geniculate nucleus (LGN) = 0.043, p = 1.50×10−10, PS = 0.952 and Ex7 beta relative to LGN human cortical neurons = 0.035, p = 5.46×10−6, PS = 0.999), (2) LGN GABAergic neurons (β relative to Ex7 = 0.045, p = 1.42×10−4, PS = 0.871), and (3) human cortical neurons (β relative to Ex7 = 0.001, p = 0.044, PS = 0.904) but could not distinguish genetic signals between the LGN GABAergic and human cortical neuron cell types. The genes contributing to the Ex7 cell type were enriched in gene sets related to nervous system processes (GO:0050877; enrichment FDR = 0.014) and synaptic signaling (GO:0099536; enrichment FDR = 0.023).

### Correlative, Latent, and Causal Relationships between Mental Health Outcomes

Genetic correlations were assessed between all mental health outcomes after conditioning with each EDU and SES phenotype.
Though small changes in genetic correlation magnitude were observed, the mental health outcome genetic correlations largely persisted even after conditioning with EDU/SES (Fig. S5). Two mental health outcomes, however, demonstrated significant changes in their genetic correlations after conditioning: (1) genetic correlations with *neuroticism* and (2) genetic correlations with *volume of the right-ventral diencephalon*. The genetic correlations between conditioned *neuroticism* and (1) *MDD* (original rg = 0.732, mean conditioned rg = 0.574 ± 0.008), (2) *subjective well-being* (original rg = −0.718, mean conditioned rg = −0.522 ± 0.009), and (3) *tiredness* (unconditioned rg = 0.638, mean conditioned rg = 0.490 ± 0.017) were at least nominally significant, and were in each case significantly lower than the unconditioned relationship. Unconditioned *right-ventral diencephalon volume* was significantly genetically correlated with *subcortical gray matter volume* (unconditioned rg = 0.620, p = 8.77×10−15) and *schizophrenia* (unconditioned rg = 0.134, p = 0.009). After conditioning with *income*, the genetic correlation between *volume of the right-ventral diencephalon* and (1) *subcortical gray matter volume* persisted (conditioned rg = 0.612, p = 1.44×10−14), (2) *schizophrenia* switched directions and remained significant (conditioned rg = −0.120, p = 0.011, pdiff = 2.72×10−4), and (3) *risk tolerance* became significant (conditioned rg = −0.123, p = 0.044). 
Conversely, after conditioning with the effects of *deprivation index*, the genetic correlation between *volume of the right-ventral diencephalon* and (1) *subcortical gray matter volume* was no longer significant (conditioned rg = −0.076, p = 0.315), (2) *schizophrenia* increased in magnitude (conditioned rg = 0.198, p = 5.20×10−12, pdiff = 0.294), and (3) several additional phenotypes became at least nominally significant (conditioned rg with *autism spectrum disorder* (*ASD*) = 0.478, p = 3.40×10−30; with *bipolar disorder* = 0.181, p = 2.93×10−8; with *risky behavior* = 0.418, p = 1.50×10−59; with *subjective well-being* = −0.363, p = 1.83×10−17; and with *tiredness* = 0.343, p = 1.82×10−20; Fig. S5).

Genomic Structural Equation Modeling (GenomicSEM) was used to identify how unconditioned and conditioned mental health outcomes relate to a latent unobserved genetic factor connecting them (Fig. 4). In unconditioned models, exploratory factor analysis (EFA) identified a two-factor model as best suited to explain the relationships among mental health outcomes. In confirmatory factor analysis (CFA), these two latent factors generally highlight relationships between all psychiatric disorders and brain imaging phenotypes (F1) and *anxiety, MDD, depressive symptoms*, and *neuroticism* (F2). The correlation between unconditioned F1 and F2 was 0.14. After conditioning with *highest math class, self-rated math ability*, and *deprivation index*, the GWAS of *neuroticism* and *MDD* were no longer major contributors to the same factor. Conditioned F1 had major contributions from *MDD* (mean loading = 0.611 ± 0.005) and *depressive symptoms* (loading = 0.538 ± 0.098) while conditioned F2 had major contributions from *neuroticism* (loading = 0.877 ± 0.080) and *anxiety* (loading = 0.658 ± 0.009).
Interestingly, after conditioning with the SES phenotype *income*, the SEM best-fit converged on a single common factor between all mental health outcomes with major contributions from *MDD* (loading = 0.808, se = 0.068) and *depressive symptoms* (loading = 0.831, se = 0.022).

![Fig. 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/01/13/2020.01.09.20017079/F4.medium.gif)

Fig. 4. Trait loading onto latent factors. Genomic structural equation modeling of mental health outcomes before and after conditioning with education and socioeconomic status phenotypes. Each column shows the confirmatory factor analysis (CFA) loading value (blue shading indicating that a trait is a major contributor to the latent factor and blue tinting indicating that a trait is a minor independent contributor to the latent factor) for each mental health outcome (in the y direction) into one of two factors (F1 and F2) from exploratory factor analysis (EFA). Grey boxes indicate that a given trait was not predicted to load onto a given factor column. Red boxes indicate that the trait was predicted by EFA to load onto a factor but did not independently load during CFA.

Latent Causal Variable (LCV) analyses were used to detect causal relationships between trait pairs that are independent of the genetic correlations between them (24). Considering only the unconditioned mental health outcomes, one trait pair exhibited significant genetic causality proportion (gĉp): *left subcallosal cortex*→*obsessive compulsive disorder* gĉp = 0.167, p = 4.54×10−6 (Table 1 and Fig. 5). This partial causal relationship did not survive conditioning; however, thirteen unique trait pairs demonstrated significant gĉp after conditioning both traits with an EDU or SES phenotype (Table 1).
Most notable were those causal relationships involving brain imaging phenotypes which became significant after conditioning with EDU phenotypes: (1) *extraversion*→*left subcallosal cortex* (mean gĉp = 0.188 ± 0.107, 1.23×10−13