A parametric bootstrap approach for computing confidence intervals for genetic correlations with application to genetically-determined protein-protein networks
===============================================================================================================================================================

* Yi-Ting Tsai
* Yana Hrytsenko
* Michael Elgart
* Usman Tahir
* Zsu-Zsu Chen
* James G Wilson
* Robert Gerszten
* Tamar Sofer

## Abstract

Genetic correlation refers to the correlation between genetic determinants of a pair of traits. When using individual-level data, it is typically estimated based on a bivariate model specification where the correlation between the two variables is identifiable and can be estimated from a covariance model that incorporates the genetic relationship between individuals, e.g., using a pre-specified kinship matrix. Inference relying on asymptotic normality of the genetic correlation parameter estimates may be inaccurate when the sample size is small, when the genetic correlation is close to the boundary of the parameter space, and when the heritability of at least one of the traits is low. We address this problem by developing a parametric bootstrap procedure to construct confidence intervals for genetic correlation estimates. The procedure simulates paired traits under a range of heritability and genetic correlation parameters, and it uses the population structure encapsulated by the kinship matrix. Heritabilities and genetic correlations are estimated using the closed-form, method-of-moments Haseman-Elston regression estimators. The proposed parametric bootstrap procedure is especially useful when genetic correlations are computed on pairs of thousands of traits measured on the exact same set of individuals. We demonstrate the parametric bootstrap approach on a proteomics dataset from the Jackson Heart Study.
Key words

* Genetic correlation
* Heritability
* Parametric bootstrap
* Sampling
* Protein-protein network

## Introduction

Genetic correlation measures the relationship between a pair of traits through their shared genetic variability (1). It is related to heritability, which measures the overall genetic contribution to a trait (2). Specifically, genetic correlation is defined as the correlation between the genetic effects of two traits. It can be estimated using individual-level data, or using summary statistics from genome-wide association studies (GWAS) (3). Scientific papers studying the genetic architecture of health and behavioral phenotypes now routinely report genetic correlation estimates between phenotypes, sometimes as a step preceding follow-up analyses, e.g., with polygenic risk scores or Mendelian randomization (4–7). Genetic correlations are further being studied at the local genomic region level (local genetic correlations), or stratified by genetic annotations, to localize sources of shared genetic underpinning of phenotypes (8–12).

Methods for estimating heritability and genetic correlations based on summary statistics from GWAS (3,13,14) became popular in recent years due to their computational tractability and the access to many phenotypes that were interrogated in GWAS by the research community. However, in diverse populations and in small datasets it is still preferable to estimate heritabilities and genetic correlations using individual-level data, rather than based on GWAS summary statistics (15). Methods using individual-level data typically rely on an underlying linear mixed model (LMM) formulation, where a genetic relationship matrix is used to model the relationship, or degree of similarity, between the phenotype levels of different individuals (16,17). When estimating genetic correlation between two phenotypes, a bivariate normal model is usually used.
Common algorithms for estimating heritability and genetic correlation include Restricted Maximum Likelihood (REML)-based normal likelihood models (18), and method-of-moments estimators such as the Haseman-Elston approach (15). Estimation of standard errors (SEs), confidence intervals (CIs), and p-values often relies on an asymptotic normal approximation. However, both heritability and genetic correlation have limited support: heritability is bounded on the [0,1] interval, and genetic correlation on the [-1,1] interval. This means that the asymptotic normal approximation may not be appropriate when estimates are close to the boundary of the parameter space, and the problem is more severe with smaller datasets. Previous publications addressed the problem of confidence interval estimation in the context of heritability (19,20), but, although the distribution of genetic correlations has been studied (21–23), methods for confidence interval computation in the era of large-scale genomic studies have not been as developed. Notably, we previously proposed a Fisher’s transformation-based approach and a blocked bootstrap that resamples from the data by blocks of related individuals (15). The blocked bootstrap worked better than the Fisher’s transformation approach, but was computationally more intensive, and we therefore only allowed for a small number of resamples, limiting the potential coverage of the confidence intervals as well as application at scale (i.e., for millions of traits).

Here, we build on prior work by Schweiger et al. (19) in the context of heritability. We expand their parametric bootstrap test-inversion method, which eliminates the dependency on asymptotic approximation, and develop a parametric bootstrap approach to construct CIs for genetic correlations that better models the unknown distribution of genetic correlations.
The procedure requires simulating pairs of phenotypes using the existing correlation structure between individuals in a given dataset, based on sets of values of heritabilities and genetic correlation between the phenotypes. The results from the simulation study are used to construct CIs for the genetic correlation parameter based on triplets of estimated heritabilities and genetic correlation of a pair of phenotypes, using the conditional empirical probability mass function (PMF) of the genetic correlation parameter. We demonstrate and compare, through simulations, the performance of two variations of the parametric bootstrap procedure, and further compare them with CIs based on the Fisher’s transformation of the estimated genetic correlation, and with CIs based on estimated standard errors (SEs) of the correlation parameter under an asymptotic normal assumption on restricted maximum likelihood estimates. Although resampling methods typically require many computations and are thus computationally costly, our approach is very useful when estimating genetic correlations between thousands of traits measured on the same dataset, because the simulation study used to construct PMFs is performed once and may be used many times. Thus, we demonstrate the application of the parametric bootstrap approach to study genetic correlations between a high-dimensional set of proteins and to develop protein-protein networks based on the genetic correlations estimated in the Jackson Heart Study dataset.

## Methods

### Linear Mixed Model (LMM) formulation

Let ***y*** be an *n* × 1 phenotype outcome vector and ***X*** be an *n* × *p* matrix containing values of *p* covariates measured on *n* participants. Let ***e*** be an *n* × 1 vector of residuals, or errors, which we assume are potentially correlated across participants due to shared genetic effects.
Suppose that the *n* × *n* matrix ***K*** models the genetic relationship between individuals, such that its (*i,j*) entry *k<sub>i,j</sub>* is, for example, (twice) the kinship coefficient between individuals *i* and *j*, and *k<sub>j,i</sub>* = *k<sub>i,j</sub>* (i.e., ***K*** is a symmetric matrix). Note that genetic relationship could be estimated by various quantities (24); without loss of generality, we here assume that we use a kinship matrix based on identity-by-descent estimates. Following the standard linear mixed model formulation of heritability, we model the outcome as

![Formula][1]

where ***β*** are the regression coefficients of the covariates, here treated as nuisance parameters. Suppose that the total variance is decomposed into a genetic variance and the remaining residual variance. Let ![Graphic][2] be the genetic variance component, and ![Graphic][3] be the residual variance component, so that

![Formula][4]

The narrow-sense heritability, defined as the proportion of total variance explained by additive genetic factors, is:

![Formula][5]

Here, we assume, without loss of generality, that ![Graphic][6]. Therefore, we have ![Graphic][7], meaning that the genetic variance is equal to the heritability. Under this assumption, the variance of the phenotype can be written as

![Formula][8]

Given two *n* × 1 vectors ***y***<sub>1</sub>, ***y***<sub>2</sub>, their covariance can be modelled as

![Formula][9]

where ![Graphic][10] is the genetic variance for phenotype *i*, ![Graphic][11] is the residual variance for phenotype *i*, *ρ<sub>k</sub>* is the genetic correlation between the two phenotypes, and *ρ<sub>e</sub>* is the residual correlation between the two phenotypes (15). Alternatively:

![Formula][12]

If we further plug in ![Graphic][13], then, for a single phenotype and for two phenotypes, we can write the variance model as:

![Formula][14]

![Formula][15]

which is the form that we will use to simulate outcomes in the following parametric bootstrap section.
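As a minimal illustration of how such a variance model can be used for simulation, the Python sketch below builds the joint covariance of two phenotypes and draws phenotype pairs from it. It assumes the common bivariate LMM form with unit total variances, i.e., Var(***y***<sub>i</sub>) = *h<sub>i</sub>*²***K*** + (1 − *h<sub>i</sub>*²)***I*** and Cov(***y***<sub>1</sub>, ***y***<sub>2</sub>) = *ρ<sub>k</sub>*√(*h*<sub>1</sub>²*h*<sub>2</sub>²)***K*** + *ρ<sub>e</sub>*√((1 − *h*<sub>1</sub>²)(1 − *h*<sub>2</sub>²))***I***. The function names, the zero mean (no covariates), and the toy kinship matrix are hypothetical choices made for illustration only.

```python
import numpy as np

def bivariate_covariance(K, h1_sq, h2_sq, rho_k, rho_e):
    """Assumed 2n x 2n covariance of the stacked vector (y1, y2).

    Each phenotype has total variance 1, so the genetic variance
    equals the heritability (h1_sq, h2_sq).
    """
    n = K.shape[0]
    I = np.eye(n)
    v11 = h1_sq * K + (1.0 - h1_sq) * I
    v22 = h2_sq * K + (1.0 - h2_sq) * I
    # Cross-covariance: genetic part through K, residual part through I.
    v12 = (rho_k * np.sqrt(h1_sq * h2_sq) * K
           + rho_e * np.sqrt((1.0 - h1_sq) * (1.0 - h2_sq)) * I)
    return np.block([[v11, v12], [v12, v22]])

def simulate_pairs(K, h1_sq, h2_sq, rho_k, rho_e, n_draws, rng):
    """Draw n_draws pairs (y1, y2) from a zero-mean multivariate normal."""
    n = K.shape[0]
    cov = bivariate_covariance(K, h1_sq, h2_sq, rho_k, rho_e)
    draws = rng.multivariate_normal(np.zeros(2 * n), cov, size=n_draws)
    return draws[:, :n], draws[:, n:]

# Toy kinship matrix (hypothetical): three unrelated pairs of first-degree relatives.
K_toy = np.kron(np.eye(3), np.array([[1.0, 0.5], [0.5, 1.0]]))
y1, y2 = simulate_pairs(K_toy, 0.5, 0.3, 0.4, 0.2, n_draws=5,
                        rng=np.random.default_rng(0))
```

In this sketch each row of `y1` and `y2` is one simulated phenotype vector; with covariates, a mean term would be added to each draw.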
### Parametric Bootstrap

We use a parametric bootstrap approach to compute confidence intervals. In brief, we simulate data for each set of potential values of heritabilities ![Graphic][16],![Graphic][17] and genetic and residual correlation ![Graphic][18] between two phenotypes based on the existing genetic relationship between individuals in the dataset of interest. Next, we compute confidence intervals by inferring ranges of potential values of *ρ<sub>k</sub>* (integrated over potential values of ![Graphic][19],![Graphic][20], as the true values are not known) that resulted in realized (estimated) values ![Graphic][21]. In practice, to limit computational burden, we fix ![Graphic][22] (and we assess the use of ![Graphic][23] also as fixed values in simulations).

**Step 1: Random sampling of genetically correlated outcomes**

For every given combination of the potential heritability of phenotype 1 ![Graphic][24], potential heritability of phenotype 2 ![Graphic][25], and potential genetic correlation ![Graphic][26], we draw *N* (e.g., 10,000) pairs of phenotype vectors (***y***<sub>1</sub>, ***y***<sub>2</sub>) from the multivariate normal distribution

![Formula][27]

where

![Formula][28]

where ![Graphic][29] and ![Graphic][30]. We note here that *ρ<sub>e</sub>* may take any value in the interval [−1,1], but we choose just one value, as mentioned earlier. We used 10 settings for ![Graphic][31], 10 settings for ![Graphic][32], and 20 settings for ![Graphic][33] as follows:

![Formula][34]

Under this setup, there are 2,000 distinct combinations of triplets ![Graphic][35] in total. We note that while developing this procedure we compared using finer grids of values, with sequences with differences of 0.01 between each two consecutive values, but the results remained essentially the same while the computational burden was substantially higher.
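The grid of parameter triplets can be enumerated as in the short Python sketch below. It assumes that the grid values are the midpoints of bins of width 0.1 (0.05, 0.15, …, 0.95 for the heritabilities and −0.95, −0.85, …, 0.95 for the genetic correlation); this choice and the helper name `bin_midpoints` are illustrative assumptions rather than a statement of the exact values used.

```python
import itertools
import numpy as np

def bin_midpoints(low, high, width):
    """Midpoints of equal-width bins covering [low, high]."""
    n_bins = int(round((high - low) / width))
    return np.round(low + width * (np.arange(n_bins) + 0.5), 10)

h_grid = bin_midpoints(0.0, 1.0, 0.1)      # 10 heritability settings
rho_grid = bin_midpoints(-1.0, 1.0, 0.1)   # 20 genetic correlation settings

# Every (h1_sq, h2_sq, rho_k) combination to be simulated.
triplets = list(itertools.product(h_grid, h_grid, rho_grid))
n_triplets = len(triplets)                 # 10 * 10 * 20 combinations
```

Under the same assumption, a grid with spacing 0.01 would contain 100 × 100 × 200 = 2,000,000 triplets, i.e., 1,000 times as many simulation settings.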
Because the grid size determines the accuracy level of potential confidence interval coverage, we later offer a solution to increase coverage without simulating a finer grid of values.

**Step 2: Genetic Correlation and Heritability Estimation**

Next, based on each sampled pair of phenotype vectors (***y***<sub>1</sub>, ***y***<sub>2</sub>) we estimate ![Graphic][36]. While the procedure is in principle agnostic to the specific estimator used, we use the closed-form Haseman-Elston formulas we previously derived (15,20):

![Formula][37]

![Formula][38]

where ***W*** is either the kinship matrix with all diagonal values set to zero, or a weighted sum of the kinship matrix ***K*** and the matrix modelling the random error (here, an identity matrix), with weights related to the relationship between the kinship matrix and the identity matrix. See (15) for more details, including the potential use of multiple matrices modelling correlations between individuals. In practice, it is appropriate to use the kinship matrix with diagonal values set to zero when only the kinship matrix is used to model the relationship between individuals. Using these formulas rather than likelihood-based procedures is computationally quicker, as no iterations are required.

**Step 3: PMF estimation for** ![Graphic][39]**)**

We now derive the expression for the conditional probability of *ρ<sub>k</sub>* given the estimated parameters. Because ![Graphic][40], ![Graphic][41], and *ρ<sub>k</sub>* have continuous supports (*h*<sub>1</sub>, *h*<sub>2</sub> ∈ [0,1] and *ρ<sub>k</sub>* ∈ [-1,1]), while the simulations are run on a discrete grid of values, we divide these ranges into bins, e.g., of size 0.1, i.e., we force them into a discrete distribution:

![Formula][42]

When estimating CIs for genetic correlations, we are given the estimates ![Graphic][43],![Graphic][44],![Graphic][45] and we want to identify a region 𝒜 such that ![Graphic][46].
Therefore, we want to estimate the probabilities ![Graphic][47] for *i* = 1, …, 20 in order to create an empirical probability mass function (PMF) and use it to generate confidence intervals, which can be derived using Bayes’ theorem. The derivation below uses the probabilities ![Graphic][48], which are the probabilities of ![Graphic][49],![Graphic][50],![Graphic][51] being in given regions conditional on the fixed values of ![Graphic][52],![Graphic][53],![Graphic][54] (note that the probabilities of the estimated heritabilities do not depend on the heritabilities of other traits or on the genetic correlation between them). Moving forward, we drop the notations showing that values refer to bins (regions) for brevity, with the understanding that all probabilities refer to parameters being in bins. Therefore, we will denote ![Graphic][55] instead of ![Graphic][56], etc. We first note that ![Graphic][57], and therefore, we need to estimate ![Graphic][58] and ![Graphic][59].

### Estimating ![Graphic][60]

We estimate ![Graphic][61] based on the following:

![Formula][62]

![Formula][63]

where, for bins of length 0.1, *n<sub>p</sub>* = 20, *n<sub>h</sub>* = 10.

### Estimating ![Graphic][64]

We estimate ![Graphic][65] based on the following:

![Formula][66]

Putting these together:

![Formula][67]

![Graphic][68] (computed for each pre-defined region) is then the empirical probability mass function of *ρ<sub>k</sub>* obtained by parametric bootstrap.

### Computing confidence intervals from the PMF

After obtaining the empirical PMF from parametric bootstrap, we can now derive the CIs for any given genetic correlation estimate ![Graphic][69] with a coverage probability of 1 − *α* (e.g., 95%). Because the parameters are bounded, constructed confidence intervals may be asymmetric both in the distance from the estimated ![Graphic][70] to the low and high values of the confidence interval, and in the cumulative probability provided by the two “sides” (around ![Graphic][71]) of the confidence interval.
We address this by considering the following three cases depending on the cumulative probability

![Formula][72]

Here *cp<sub>l</sub>* denotes the cumulative probability of potential *ρ<sub>k</sub>* values lower than or equal to the estimated ![Graphic][73]. Denote the low and the high values of the confidence interval for ![Graphic][74] by ![Graphic][75] and ![Graphic][76]. Then a 1 − *α* confidence interval needs to include all potential values ![Graphic][77] such that:

![Formula][78]

**Case 1:** If ![Graphic][79]

Here, ![Graphic][80] corresponds to the first potential value ![Graphic][81] (i.e. a point in the first considered bin, where bins are considered by order ![Graphic][82],![Graphic][83] …) where ![Graphic][84]. ![Graphic][85] corresponds to the smallest potential value ![Graphic][86] satisfying equation (6).

**Case 2:** If ![Graphic][87]

In this case we first identify ![Graphic][88] as the highest ![Graphic][89] (i.e. a point in the first considered bin, where bins are considered by order ![Graphic][90],![Graphic][91] …) with ![Graphic][92]. ![Graphic][93] corresponds to the highest potential value ![Graphic][94] satisfying equation (6).

**Case 3:** Both ![Graphic][95] and ![Graphic][96]

Here we require ![Graphic][97] to be the largest value and ![Graphic][98] to be the lowest such that

![Formula][99]

When the upper or lower bound of a CI obtained from the above procedure lies inside one of the bins defined by the grid of considered values, which is often the case, we use linear interpolation to position the bound within the bin.

### Empirical Beta Approximation to the PMF for CI estimation

Because the PMF is discrete, it limits the potential coverage of constructed CIs and the potential computation of accurate p-values. Thus, we study a continuous beta approximation to the empirical PMF from parametric bootstrap.
Since the range of genetic correlation is [-1, 1], and the range of the beta distribution is [0, 1], we first map the [-1, 1] range of genetic correlations to the [0, 1] range of the beta distribution through a location-scale transformation. After finding the 100 × (1 − *α*)% CIs of ![Graphic][100] on the beta scale using an approach similar to that described for the discrete PMF, we apply the inverse location-scale transformation from [0, 1] to [-1, 1] to retrieve the CIs of genetic correlations.

### The Jackson Heart Study

The Jackson Heart Study (JHS) is a longitudinal study following 5,306 individuals of African American background from Jackson, Mississippi (25,26). The study population included 2,050 related and unrelated JHS participants who had whole genome sequencing (WGS) through the Trans-Omics for Precision Medicine (TOPMed) program, proteomics data, and available body mass index (BMI). The TOPMed Data Coordinating Center used WGS data from the TOPMed freeze 8 release to compute a kinship matrix tabulating the genetic relationship between TOPMed participants. We subsetted this matrix to JHS participants. Concentration levels of 1,317 proteins were measured using the SomaScan platform (27). The JHS study was approved by Jackson State University, Tougaloo College, and the University of Mississippi Medical Center Institutional Review Boards, and all participants provided written informed consent. We excluded 5 proteins with more than 80% missing values. The remaining dataset had no missing protein measurements. Protein measurements were adjusted for batch effect by rank-normalizing each protein separately in each batch and then aggregating the data across batches. Next, the protein measurements were regressed over (1) only intercept, and (2) over age, sex, and BMI.
The residuals from each of these regressions were extracted and used for estimating heritabilities and genetic correlations between all protein pairs using the Haseman-Elston estimators provided in equations (4) and (5), in addition to heritabilities and genetic correlations estimated using the rank-normalized protein levels (without regressing them on covariates). We also compared the estimates of genetic correlations to Pearson correlations calculated using the *stats* R package (version 3.6.2).

### Simulation Studies

We used the kinship matrix from the JHS data to perform simulations. To study the methods’ performance in larger sample sizes, we also created simulated datasets mimicking the JHS in which we used block-diagonal matrices, with each block being the original JHS kinship matrix of n = 2,050 individuals. We used 2 and 3 times the original sample size to form block-diagonal kinship matrices with n = 4,100 and n = 6,150. We refer to simulations using the original kinship matrix and the 2- and 3-times block matrices as Settings A, B, and C, respectively. We used these kinship matrices to (1) perform simulations for the parametric bootstrap procedure, where in the primary analysis we fix ![Graphic][101] as a conservative potential high value of *ρ<sub>e</sub>*. We also performed simulations comparing the choice of ![Graphic][102]. Next, (2) we generated new simulated data and used the results of the parametric bootstrap simulations (1) to construct CIs. We performed 10,000 simulations for each combination of potential ![Graphic][103], with ![Graphic][104] and ![Graphic][105]. We constructed CIs for the estimated ![Graphic][106] in each simulation.
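Block-diagonal kinship matrices such as those used in Settings B and C can be formed, for example, with a Kronecker product. The Python sketch below is an illustration only: a small toy matrix stands in for the JHS kinship matrix, and the function name `replicate_kinship` is hypothetical.

```python
import numpy as np

def replicate_kinship(K, copies):
    """Block-diagonal matrix with `copies` copies of K on the diagonal."""
    return np.kron(np.eye(copies), K)

# Toy stand-in for a kinship matrix: one related pair and one unrelated individual.
K_toy = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

K_double = replicate_kinship(K_toy, 2)   # analogous to Setting B
K_triple = replicate_kinship(K_toy, 3)   # analogous to Setting C
```

With a kinship matrix of n = 2,050 individuals, the same call with `copies` set to 2 or 3 gives matrices of dimension 4,100 and 6,150; individuals in different blocks are then treated as unrelated.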
#### Comparison: four approaches of constructing CIs

We estimated the coverage and the width of the CIs constructed using four approaches: (a) percentiles of the empirical PMF constructed using the parametric bootstrap approach; (b) beta approximation to the empirical PMF; and two existing methods: (c) Fisher’s transformation; and (d) normal approximation to the distribution of the estimated genetic correlation implemented in the GCTA package (29). The Fisher’s transformation method assumes that genetic correlations follow the same distribution as Pearson correlations (30); more specifically, that they are normally distributed after Fisher’s transformation. For genetic correlation *ρ<sub>k</sub>*,

![Formula][107]

with *N<sub>eff</sub>* being the “effective sample size”, previously proposed to be trace(***K***<sup>−</sup>***K***<sup>−</sup>), with ***K***<sup>−</sup> being the kinship matrix with diagonal values set to zero (15). We can then construct the CIs of *z* by the standard approach assuming an asymptotic normal distribution. For example, the 95% CI of *z* would be [*μ* − 1.96*σ*, *μ* + 1.96*σ*]. After finding the 100 × (1 − *α*)% CIs of ![Graphic][108] on the Fisher’s transformed (*z*) scale, we apply the inverse Fisher’s transform to retrieve the CIs of the genetic correlation *ρ<sub>k</sub>*.

To compute CIs based on the existing approach that relies on a normal approximation, we estimate both the genetic correlation and its standard error using the bivariate REML procedure implemented in the GCTA package. We apply the ![Graphic][109] formula to construct 95% CIs. Due to the high computational resources required by GCTA, we focus only on the four scenarios in which the true *ρ<sub>k</sub>* equals one of {0.05, 0.15, 0.45, 0.95} with the original-size kinship matrix.

#### Performance evaluation metrics

We used coverage probabilities and CI widths as the metrics to evaluate and compare the performance of the four approaches for CI construction.
In primary results, for a given true value of genetic correlation *ρ<sub>k</sub>* we calculated both the coverage probability and the average width of 95% CIs using the constructed CIs for the estimated *ρ<sub>k</sub>* over all the 100 true heritability scenarios (10 for ![Graphic][110], and 10 for ![Graphic][111]). Ideally, the coverage probabilities would be at or above 95% across different *ρ<sub>k</sub>* values while the CI widths remain small. The coverage probability was estimated as the proportion of simulations in which the true *ρ<sub>k</sub>* was contained in its CI.

#### P-value estimation

We evaluated the use of the CI inversion method to obtain p-values for hypothesis testing. Here, our null hypothesis H<sub>0</sub> is *ρ<sub>k</sub>* = 0, and our alternative hypothesis H<sub>1</sub> is *ρ<sub>k</sub>* ≠ 0. Given any realization ![Graphic][112], we can estimate the CIs for *ρ<sub>k</sub>* using the parametric bootstrap procedure, focusing on the continuous beta approximation to the empirical PMF because smaller p-values can be obtained accurately if the underlying distribution is continuous. To determine the p-values of genetic correlation estimates, we use the CI inversion method. Suppose that we construct a 100 × (1 − *α*)% CI. Then, we can determine that the p-value is smaller than *α* if the constructed CI does not cover 0. For computational efficiency, we implemented a method that constructs CIs using a binary search over the *α* value, stopping when a pre-defined sensitivity level is reached.

#### Type 1 error

For each combination of potential heritability values ![Graphic][113], we simulated 10,000 pairs of phenotype vectors (***y***<sub>1</sub>, ***y***<sub>2</sub>) under the null, i.e., *ρ<sub>k</sub>* = 0, estimated their genetic correlations, and calculated their p-values as described above based on the beta approximation to the PMF. After obtaining the p-values for all 10,000 simulated datasets, we estimated the type 1 error rate, also called the size of the test, as the percentage of these simulations rejecting the null at a given *α* value.
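The binary search over *α* and the type 1 error estimate can be sketched in Python as follows. The sketch assumes that CIs are nested in *α* (a larger *α* never gives a wider interval), so that the p-value is the smallest *α* at which the CI excludes the null value. The function names `p_value_by_ci_inversion`, `toy_normal_ci`, and `rejection_rate` are hypothetical, and the normal-theory CI is only a stand-in for the beta-approximation CI so that the example is self-contained.

```python
from statistics import NormalDist

def p_value_by_ci_inversion(ci_fn, null_value=0.0, tol=1e-6):
    """Binary search for the smallest alpha whose (1 - alpha) CI excludes null_value.

    ci_fn(alpha) must return (lower, upper); CIs are assumed nested in alpha.
    """
    lo_alpha, hi_alpha = 0.0, 1.0
    while hi_alpha - lo_alpha > tol:
        alpha = 0.5 * (lo_alpha + hi_alpha)
        lower, upper = ci_fn(alpha)
        if lower <= null_value <= upper:
            lo_alpha = alpha    # null still covered: p-value is above alpha
        else:
            hi_alpha = alpha    # null excluded: p-value is at most alpha
    return hi_alpha

def toy_normal_ci(estimate, se):
    """Stand-in CI constructor (normal approximation), for illustration only."""
    def ci_fn(alpha):
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        return estimate - z * se, estimate + z * se
    return ci_fn

def rejection_rate(p_values, alpha):
    """Proportion of simulations rejecting the null at level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

p_toy = p_value_by_ci_inversion(toy_normal_ci(estimate=0.3, se=0.1))
```

For the toy CI, the search recovers the usual two-sided normal p-value up to the tolerance `tol`; applied to p-values simulated under the null, `rejection_rate` gives the estimated type 1 error.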
## Results

### Simulation studies

In primary results, for a given true value of genetic correlation *ρ<sub>k</sub>* we calculated both the coverage probability and the average width of 95% CIs by averaging the corresponding estimates over all the 100 true heritability scenarios (10 for ![Graphic][114], and 10 for ![Graphic][115]). Supplementary Table 1 provides estimated coverage by all combinations of (true) *ρ<sub>k</sub>*, ![Graphic][116],![Graphic][117]. Figure 1 provides the estimated coverage probabilities for the compared methods in simulations, and Figure 2 provides the averaged CI widths. The PMF approach provides appropriate coverage across the three settings defined by the kinship matrices. The beta approximation to the PMF results in under-coverage across the simulated *ρ<sub>k</sub>* values in setting A, but improved substantially in settings B and C when the simulated sample size increased. Still the average width of the CIs was lower when using the empirical PMF. In setting A, GCTA had appropriate coverage only when *ρ<sub>k</sub>* was set to 0.05. The Fisher’s transformation tended to result in under-coverage.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2023/10/25/2023.10.24.23297474/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F1) Estimated coverage of 95% confidence intervals of genetic correlations in the primary simulations. The columns represent different kinship matrix sizes: Setting A denotes the use of the original-size kinship matrix (n=2,050), Setting B denotes the use of the double-size kinship matrix (n=4,100), and Setting C denotes the use of the triple-size kinship matrix (n=6,150). The rows represent the four approaches for constructing CIs, including parametric bootstrap PMF, beta approximation for parametric bootstrap PMF, Fisher’s transformation, and GCTA package use of normal distribution approximation.
Only parts of the analyses were carried out with the GCTA package due to the high computational resources required.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2023/10/25/2023.10.24.23297474/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F2) Average width of the confidence intervals of the genetic correlations in the primary simulations. The columns represent different kinship matrix sizes: Setting A denotes the use of the original-size kinship matrix (n=2,050), Setting B denotes the use of the double-size kinship matrix (n=4,100), and Setting C denotes the use of the triple-size kinship matrix (n=6,150). The rows represent the four approaches for constructing CIs, including parametric bootstrap PMF, beta approximation for parametric bootstrap PMF, Fisher’s transformation, and GCTA package. Only parts of the analyses were carried out with the GCTA package due to the high computational resources required.

Figure 3 compares coverage and CI widths when using the empirical PMF approach to compute CIs and setting *ρ<sub>e</sub>* = 0, 0.2, or 0.4 in the simulations generating data. It demonstrates that there is essentially no difference in the results.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2023/10/25/2023.10.24.23297474/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F3) Setting environmental correlation between error terms has no effect on genetic correlation estimates. Comparison of coverage and CI widths when using the empirical PMF approach to compute CIs and setting *ρ<sub>e</sub>* = 0, 0.2, or 0.4 in the simulations generating data.

We also used the simulations to estimate type 1 error when using the confidence interval inversion method with the beta approximation to the PMF to compute association p-values. The results are visualized in Figure 4.
Here, we also estimated type 1 error by combinations of specific ![Graphic][118] values. With *α* = 0.05, the type 1 error was controlled across heritability combinations in settings B and C, but not in setting A. While it is unsurprising that the type 1 error is not controlled when the heritability of either one of the two traits is very small, the test was also somewhat inflated in setting A when the two heritabilities were fairly high. Overall, the beta approximation to the PMF is promising for computing high-coverage CIs and p-values when the sample size is sufficiently large.

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2023/10/25/2023.10.24.23297474/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F4) Type 1 error estimates when using the confidence interval inversion method and the beta approximation to perform association testing. Visualization of type 1 error when using the CI inversion approach coupled with the beta approximation to the PMF to generate CIs. The results are provided for each simulation setting and by heritability values.

### Application to genetically-determined protein-protein networks in JHS

We estimated heritabilities and genetic correlations for every pair of proteins among the 1,317 proteins available in JHS, in an analysis adjusted for age, sex, and BMI (in which protein measures were first regressed on these covariates prior to estimation of genetic correlations based on the resulting residuals), and in an unadjusted analysis. Characteristics of the JHS dataset are provided in Table 1. Of the study participants, 61% were women. Individuals were 55 (male) to 56 (female) years old on average, and were mostly overweight. Some individuals were closely related.
For example, there were 341 pairs of individuals with estimated coefficient of relationship ≥ 0.48, and 1,113 pairs of individuals with coefficient of relationship ≥ 0.12 (considering the total number of unique pairs of individuals, this corresponds to 0.05% of all pairs of participants).

[Table 1:](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/T1) JHS dataset characteristics stratified by sex.

Supplementary Tables 2 and 3 provide the estimated heritabilities of all proteins in the dataset from the analyses unadjusted and adjusted for covariates (age, sex, and BMI), respectively. Based on the simulations using this specific dataset, we removed from consideration proteins with estimated heritabilities ![Graphic][119], as genetic correlations and p-values computed using the beta approximation method are less reliable for such proteins than for proteins with higher (true, not estimated) heritabilities. We also excluded proteins with estimated ![Graphic][120] because such high values may suggest a problem with the measurement and/or genetic characterization of a protein (e.g., technical issue with the platform, genetic variants segregated to a few families, etc.). After the above filtering, there were 403 and 431 proteins, or 81,406 and 93,096 protein-protein pairs, available for genetic correlation analysis in the covariate-adjusted and unadjusted analyses, respectively. For each set of proteins (adjusted and unadjusted), it took around 2.5 hours to estimate the genetic correlations and around 12 minutes to construct the CIs, based on the previously constructed parametric bootstrap reference results, for all the protein-protein pairs on a MacBook Pro laptop with an M1 chip. Full results from genetic correlation estimates for these sets of proteins are provided in Supplementary Tables 4 and 5. Figure 5 visualizes the comparison between estimated phenotypic (Pearson) and genetic correlations across these phenotype pairs.
The figure suggests that, for this set of highly heritable proteins, genetic correlations tend to be higher than Pearson correlations (to see this, one needs to focus in Figure 5 on the bright hexbins because they represent many more protein pairs compared to dark hexbins).

![Figure 5.](https://www.medrxiv.org/content/medrxiv/early/2023/10/25/2023.10.24.23297474/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F5) Estimated Pearson versus genetic correlations between heritable proteins. The figure compares the sample Pearson correlation to the estimated genetic correlation ![Graphic][121] for all protein pairs for which the estimated heritability ![Graphic][122] for each of the proteins. The color of each hexbin represents the number (count) of protein pairs with x- and y-axis values falling within the hexbin.

#### Protein-Protein Network

We visualize the results in a protein-protein network. Due to the large number of protein pairs, we focused on the network resulting from protein-protein genetic correlations passing a p-value threshold. We computed p-values for the genetic correlations between the limited set of heritable and “valid” proteins (with heritability estimates that are not egregiously high) using the beta approximation to the PMF, and applied a False Discovery Rate (FDR) correction using the Benjamini-Hochberg procedure (31). The considered pairs of proteins are those with FDR-adjusted genetic correlation p-value < 0.01. This corresponds to 253 and 294 pairs of genetically correlated proteins in the adjusted and unadjusted analyses, respectively. Figure 6 visualizes these results. The size of each node represents its degree, with larger ones being “hub nodes/proteins”, (genetically) associated with a large number of proteins. See Supplementary Tables 6 and 7 for estimated genetic correlations between pairs of proteins selected based on the criteria described above.
Supplementary Table 8 contains a list of the top 10 hub nodes/proteins and their connections, i.e., the list of proteins connected to each of these hub nodes, in both covariate-adjusted and unadjusted analyses. Visually, the network appears to be less connected (and we also know that the number of connections decreased) in the analysis that adjusted for age, sex, and BMI. It is likely that genetic correlations decreased because BMI has strong effects on proteins, and the genetic effects on BMI are also strong, so when BMI was adjusted for, genetic effects inducing correlations between proteins were reduced.

[Figure 6.](http://medrxiv.org/content/early/2023/10/25/2023.10.24.23297474/F6) Network constructed from top pairs of genetically correlated proteins. Panel (a) visualizes the protein-protein genetic correlation network using the age, sex, and BMI-adjusted proteins; panel (b) visualizes the corresponding network based on unadjusted analysis. The blue edges represent positive genetic correlations, and the grey edges represent negative genetic correlations. Larger nodes are hub proteins, where multiple proteins have strong genetic correlations with each other in both covariate-adjusted and unadjusted analyses. Some of the hub proteins include *APO\_D, PREKALLIKREIN, NOTCH\_3, HPV\_E7\_TYPE18, CARBONIC\_ANHYDRASE\_IV, CDK5\_P35, DKK\_4, PAK3, TRKC, MIS, C5A, OMD, JAG1, HEPARIN\_COFACTOR\_II, BFGF\_R, MMP\_2, and GDF\_11\_8*.

## Discussion

We developed a parametric bootstrap procedure to estimate confidence intervals for the genetic correlation estimator, studied it in simulations, and applied it to learn a protein-protein network using a set of heritable proteins measured in the Jackson Heart Study. Our bootstrap procedure was inspired by a similar approach developed for heritability confidence intervals (19).
Compared to the previous publication focusing on heritability, our approach is complicated by the need to simulate pairs of traits, including their heritabilities and the genetic correlation between them, i.e., a grid of three parameters rather than one. Indeed, confidence intervals for genetic correlation depend on trait heritability, and are wider when at least one of the traits has low heritability. Thus, the computational burden of our procedure is higher. In particular, it is important to recognize that this procedure, like that of Schweiger et al., is dataset dependent, because it uses the kinship matrix of the specific dataset. However, our procedure is realistic and useful when many genetic correlations are estimated for the same dataset, as in this work. In this case the parametric bootstrap simulation step is performed once and its results are applied many times.

A limitation when many genetic correlation parameters are estimated is the limited level of coverage due to the discreteness of the bootstrap procedure: we cannot use the estimated conditional PMF of *ρ*<sub>*k*</sub> (conditional on the estimated genetic correlation and heritabilities) as it is to obtain confidence intervals at the 1-*α* level when *α* is very small (e.g., 10⁻⁷). To address this, we proposed the beta approximation, after location-scale transformation, to the PMF. The beta distribution has two parameters and can be fit to many distributions supported on a bounded interval. Based on our simulations, CIs based on the beta approximation tend to be wider than those using the PMF directly, and their coverage can still fall below the desired level in small samples. However, for larger sample sizes their performance improves. Overall, we think that for larger sample sizes, e.g., 6,000 individuals, the beta approximation to the PMF will be very useful in providing reliable confidence intervals and, using the inversion method, p-values.
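
To make the beta approximation and the inversion step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the values approximating the conditional distribution lie in [-1, 1], that the location-scale transformation maps them linearly to [0, 1], and that the beta parameters are fit by matching the mean and variance (the fitting method is an assumption here). All function names are hypothetical, and the simulated `sample` merely stands in for bootstrap output.

```python
import math
import random


def fit_beta_moments(values, lo=-1.0, hi=1.0):
    """Rescale values from [lo, hi] to [0, 1]; match mean and variance to Beta(a, b)."""
    u = [(v - lo) / (hi - lo) for v in values]
    mean = sum(u) / len(u)
    var = sum((x - mean) ** 2 for x in u) / (len(u) - 1)
    k = mean * (1.0 - mean) / var - 1.0
    if k <= 0:
        raise ValueError("variance too large for a method-of-moments beta fit")
    return mean * k, (1.0 - mean) * k


def _betacf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(p, a, b):
    """Quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_ci_and_pvalue(values, alpha=0.05, null=0.0, lo=-1.0, hi=1.0):
    """Equal-tailed 1-alpha interval and the p-value obtained by inverting it."""
    a, b = fit_beta_moments(values, lo, hi)
    lower = lo + (hi - lo) * beta_ppf(alpha / 2.0, a, b)
    upper = lo + (hi - lo) * beta_ppf(1.0 - alpha / 2.0, a, b)
    # Smallest alpha whose equal-tailed interval excludes the null value.
    f0 = beta_cdf((null - lo) / (hi - lo), a, b)
    return (lower, upper), min(1.0, 2.0 * min(f0, 1.0 - f0))


random.seed(1)
# Hypothetical stand-in for bootstrap values, clipped to [-1, 1].
sample = [max(-1.0, min(1.0, random.gauss(0.6, 0.15))) for _ in range(2000)]
ci, pval = beta_ci_and_pvalue(sample, alpha=0.05)
print(ci, pval)
```

Because the fitted beta distribution is continuous, the same functions can be evaluated at a very small *α*, which a discrete PMF built from a finite number of bootstrap draws cannot resolve.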
It is important to point out that while we performed simulations with a “triple size” JHS kinship matrix, i.e., of n=6,150 individuals, the effective sample size corresponding to it is much lower than that of a real dataset with 6,150 individuals. That is because we simulated a block-diagonal matrix, whereas realistic kinship matrices have non-zero off-diagonal values throughout (unless forced to be zero for computational efficiency purposes (32)).

Existing methods that compute confidence intervals for genetic correlations typically utilize an asymptotic normal distribution argument, at either the untransformed or Fisher’s transformation level. Whether this is appropriate depends on the combination of four factors: the sample size, the underlying (true) heritability of each of the two traits, and the underlying genetic correlation. For any given pair of traits and a dataset, any one of these factors may be suboptimal, potentially leading to poor performance of confidence intervals that rely on asymptotic normality. The bootstrap procedure addresses this shortcoming. However, this procedure too does not produce perfect confidence intervals: for low values of heritability of either one of the two traits, the coverage may still be lower than desired in small samples. Note that in reality we do not know the true heritability; we only have the estimated heritability. Therefore, we cannot determine from the estimated heritabilities alone whether a CI is unreliable. That is why our main results are provided at an aggregate level, across simulated values of potential heritabilities.

We demonstrated the use of genetic correlations to infer genetically-determined protein-protein networks. However, we acknowledge that our analysis is limited by the relatively small sample size, which led us to impose a stringent filter requiring at least 0.3 protein heritability for inclusion in the downstream analysis.
While we chose to include only edges with estimated FDR-adjusted p-value < 0.01 (with p-values estimated using the beta approximation), other statistical network approaches may generate sparsity using penalized multivariate regression techniques (33,34). It would be interesting to extend such approaches to genetic (rather than phenotypic) correlation-based networks. In future work we will apply the existing framework to larger datasets and develop approximation methods to further speed up the simulations required for the parametric bootstrap and the estimation of heritabilities and genetic correlations, for example, following (35).

## Supporting information

Supplementary Tables [supplements/297474_file02.xlsx]

## Data Availability

Individual-level JHS data can be obtained by application to dbGaP (accession phs000286) or by data use agreement with the JHS Data Coordinating Center (DCC); see the study website at [https://www.jacksonheartstudy.org/](https://www.jacksonheartstudy.org/). Summary statistics from analyses reported in this manuscript are provided as supplementary materials.

## Acknowledgements

This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases R01DK081572. The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staff and participants of the JHS. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S.
Department of Health and Human Services.

* Received October 24, 2023.
* Revision received October 24, 2023.
* Accepted October 25, 2023.
* © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1. van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet. 2019 Oct;20(10):567–81.
2. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era--concepts and misconceptions. Nat Rev Genet. 2008 Apr;9(4):255–66. doi:10.1038/nrg2322
3. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015 Nov;47(11):1236–41. doi:10.1038/ng.3406
4. O’Connor LJ, Price AL. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018 Dec;50(12):1728–34. doi:10.1038/s41588-018-0255-0
5. Zhang Y, Elgart M, Kurniansyah N, Spitzer BW, Wang H, Kim D, et al. Genetic determinants of cardiometabolic and pulmonary phenotypes and obstructive sleep apnoea in HCHS/SOL. EBioMedicine. 2022 Oct;84:104288.
6. Ikeda M, Tanaka S, Saito T, Ozaki N, Kamatani Y, Iwata N. Re-evaluating classical body type theories: genetic correlation between psychiatric disorders and body mass index. Psychol Med. 2018 Jul;48(10):1745–8. doi:10.1017/S0033291718000685
7. Kappelmann N, Arloth J, Georgakis MK, Czamara D, Rost N, Ligthart S, et al. Dissecting the Association Between Inflammation, Metabolic Dysregulation, and Specific Depressive Symptoms: A Genetic Correlation and 2-Sample Mendelian Randomization Study. JAMA Psychiatry. 2021 Feb 1;78(2):161–70.
8. Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. Am J Hum Genet. 2017 Nov 2;101(5):737–51. doi:10.1016/j.ajhg.2017.09.022
9. Zhang Y, Lu Q, Ye Y, Huang K, Liu W, Wu Y, et al. SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biol. 2021 Sep 7;22(1):262. doi:10.1186/s13059-021-02478-w
10. Guo H, Li JJ, Lu Q, Hou L. Detecting local genetic correlations with scan statistics. Nat Commun. 2021 Apr 1;12(1):2033. doi:10.1038/s41467-021-22334-6
11. Werme J, van der Sluis S, Posthuma D, de Leeuw CA. An integrated framework for local genetic correlation analysis. Nat Genet. 2022 Mar 14;54(3):274–82. doi:10.1038/s41588-022-01017-y
12. Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, et al. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. Am J Hum Genet. 2017 Dec 7;101(6):939–64. doi:10.1016/j.ajhg.2017.11.001
13. Weissbrod O, Flint J, Rosset S. Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics. Am J Hum Genet. 2018 Jul 5;103(1):89–99. doi:10.1016/j.ajhg.2018.06.002
14. Zhang Y, Cheng Y, Jiang W, Ye Y, Lu Q, Zhao H. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Brief Bioinformatics. 2021 Sep 2;22(5).
15. Elgart M, Goodman MO, Isasi C, Chen H, Morrison AC, de Vries PS, et al. Correlations between complex human phenotypes vary by genetic background, gender, and environment. Cell Reports Medicine. 2022 Dec 12;
16. Furlotte NA, Eskin E. Efficient multiple-trait association and estimation of genetic correlation using the matrix-variate linear mixed model. Genetics. 2015 May;200(1):59–68.
17. Lee SH, van der Werf JHJ. MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics. 2016 May 1;32(9):1420–2. doi:10.1093/bioinformatics/btw012
18. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012 Oct 1;28(19):2540–2. doi:10.1093/bioinformatics/bts474
19. Schweiger R, Kaufman S, Laaksonen R, Kleber ME, März W, Eskin E, et al. Fast and accurate construction of confidence intervals for heritability. Am J Hum Genet. 2016 Jun 2;98(6):1181–92. doi:10.1016/j.ajhg.2016.04.016
20. Sofer T. Confidence intervals for heritability via Haseman-Elston regression. Stat Appl Genet Mol Biol. 2017 Sep 26;16(4):259–73.
21. Brown GH. An empirical study of the distribution of the sample genetic correlation coefficient. Biometrics. 1969 Mar;25(1):63.
22. Balding DJ. Likelihood-based inference for genetic correlation coefficients. Theor Popul Biol. 2003 May;63(3):221–30. doi:10.1016/S0040-5809(03)00007-8
23. Liu BH, Knapp SJ, Birkes D. Sampling distributions, biases, variances, and confidence intervals for genetic correlations. Theor Appl Genet. 1997 Jan;94(1):8–19.
24. Legarra A. Comparing estimates of genetic variance across different relationship models. Theor Popul Biol. 2016 Feb;107:26–30. doi:10.1016/j.tpb.2015.08.005
25. Wyatt SB, Diekelmann N, Henderson F, Andrew ME, Billingsley G, Felder SH, et al. A community-driven model of research participation: the Jackson Heart Study Participant Recruitment and Retention Study. Ethn Dis. 2003;13(4):438–55.
26. Taylor HA, Wilson JG, Jones DW, Sarpong DF, Srinivasan A, Garrison RJ, et al. Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn Dis. 2005;15(4 Suppl 6):S6–4.
27. Katz DH, Tahir UA, Bick AG, Pampana A, Ngo D, Benson MD, et al. Whole genome sequence analysis of the plasma proteome in black adults provides novel insights into cardiovascular disease. Circulation. 2022 Feb;145(5):357–70.
28. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3).
29. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011 Jan 7;88(1):76–82. doi:10.1016/j.ajhg.2010.11.011
30. Mudholkar GS, Chaubey YP. On the distribution of Fisher’s transformation of the correlation coefficient. Communications in Statistics - Simulation and Computation. 1976 Jan;5(4):163–72.
31. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological). 1995 Jan;57(1):289–300. doi:10.2307/2346101
32. Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346–8. doi:10.1093/bioinformatics/btz567
33. Li H, Gui J. Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics. 2006 Apr;7(2):302–17. doi:10.1093/biostatistics/kxj008
34. Li B, Chun H, Zhao H. Sparse estimation of conditional graphical models with application to gene networks. J Am Stat Assoc. 2012 Jan 1;107(497):152–67. doi:10.1080/01621459.2011.644498
35. Wu Y, Burch KS, Ganna A, Pajukanta P, Pasaniuc B, Sankararaman S. Fast estimation of genetic correlation for biobank-scale data. Am J Hum Genet. 2022 Jan 6;109(1):24–32.