Robust Mendelian Randomization Analysis by Automatically Selecting Valid Genetic Instruments for Inferring Causal Relationships between Complex Traits and Diseases
===================================================================================================================================================================

* Minhao Yao
* Zijian Guo
* Zhonghua Liu

## Abstract

Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to estimate the causal effect of a modifiable exposure on the outcome of interest while removing unmeasured confounding bias. However, some genetic variants might be invalid IVs due to violations of IV assumptions, for example, in the presence of population stratification and/or widespread horizontal pleiotropy. Inclusion of invalid genetic IVs in MR analysis might lead to biased causal effect estimates and misleading scientific conclusions. To address this challenge, we propose a novel MR method that first Selects valid genetic IVs and then performs Post-selection Inference (MR-SPI) based on two-sample genome-wide summary statistics. Extensive simulations demonstrate that MR-SPI outperforms other competing methods. We apply MR-SPI to analyze 146 exposure-outcome pairs to establish putative causal relationships. We further analyze 912 proteins using the UK Biobank proteomics data, and identify 7 proteins significantly associated with the risk of Alzheimer’s disease.

## 1 Introduction

In epidemiological studies, it is essential to infer the causal effect of a modifiable risk factor on a health outcome of interest1,2. Although randomized controlled trials (RCTs) serve as the gold standard for causal inference, it is neither feasible nor ethical to perform RCTs for many harmful exposures. Mendelian randomization (MR) leverages the random assortment of genes from parents to offspring to mimic RCTs and establish causality in observational studies3,4,5. MR uses genetic variants, typically single-nucleotide polymorphisms (SNPs), as instrumental variables (IVs) to assess the causal association between an exposure and an outcome6. Recently, many MR methods have been developed to investigate causal relationships using genome-wide association study (GWAS) summary data that consist of effect estimates of SNP-exposure and SNP-outcome associations from two non-overlapping sets of samples; these are commonly referred to as two-sample MR methods7,8,9,10. Since summary statistics are often publicly available and provide abundant information on associations between genetic variants and complex traits, two-sample MR methods have become increasingly popular9,11,12,13.

Conventional MR methods require the genetic variants included in the analysis to be valid for reliable causal inference. A genetic variant is called a valid IV if the following three core assumptions hold4,14:

### (A1) Relevance

The genetic variant is associated with the exposure;

### (A2) Effective Random Assignment

The genetic variant is not associated with any confounder of the exposure-outcome relationship;

### (A3) Exclusion Restriction

The genetic variant affects the outcome only through the exposure.

Among the three IV assumptions, the first assumption (A1) can be tested empirically by selecting genetic variants associated with the exposure in GWAS. However, assumptions (A2) and (A3) cannot be empirically verified in general and may be violated in practice, which leads to a biased estimate of the causal effect. The violation of (A2) may occur due to the presence of population stratification4,15. The violation of (A3) may occur in the presence of the horizontal pleiotropy4,16, which is a widespread biological phenomenon that the genetic variant affects the outcome through some other biological pathway that does not involve the exposure in view17,18.

Recently, several two-sample MR methods have been proposed to handle invalid IVs under certain assumptions. The Instrument Strength Independent of Direct Effect (InSIDE) assumption has been proposed and used by multiple methods, for example, the random-effects inverse-variance weighted (IVW) method19, MR-Egger20 and MR-RAPS (Robust Adjusted Profile Score) 11. The InSIDE assumption requires that the SNP-exposure effect is asymptotically independent of the horizontal pleiotropic effect when the number of IVs goes to infinity. However, the InSIDE assumption is often not plausible in practice21, and thus the estimate of causal effect might be biased using random-effects IVW, MR-Egger or MR-RAPS10. Another strand of methods imposes assumptions on the proportion of invalid IVs included in the analysis. For example, the weighted median method22 and the Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) test23 are based on the majority rule condition that allows up to 50% of the candidate IVs to be invalid. However, the weighted median method and MR-PRESSO might produce unreliable results when more than half of the candidate IVs are invalid10. The plurality rule condition, which only requires a plurality of the candidate IVs to be valid, is weaker than the majority rule condition 24,25, and is also termed as the ZEro Modal Pleiotropy Assumption (ZEMPA) 10,26. The plurality rule condition (or ZEMPA assumption) has been applied to some existing two-sample MR methods, for example, the mode-based estimation26, MRMix27 and the contamination mixture method25. Among the aforementioned methods, MRMix and the contamination mixture methods require additional distributional assumptions on the genetic associations or the ratio estimates to provide reliable causal inference. 
Despite many efforts, most current MR methods require an ad-hoc set of pre-determined genetic instruments, which is often obtained by selecting genetic variants with strong SNP-exposure associations in GWAS28. Since the traditional way of selecting IVs only requires the exposure data, the same selected set of IVs is used for assessing the causal relationships between the exposure in view and different outcomes. This one-size-fits-all, exposure-specific strategy for selecting IVs might not work well for different outcomes due to complicated genetic architecture; for example, the pattern of horizontal pleiotropy might vary with different outcomes. It is thus desirable to develop an automatic algorithm to select a set of valid IVs for a specific exposure-outcome pair.

In this paper, we propose a novel two-sample MR method and algorithm that can automatically Select valid IVs for a specific exposure-outcome pair and then perform Post-selection Inference (MR-SPI) for the causal effect. More specifically, MR-SPI contains the following four steps: (i) select relevant SNP IVs that are associated with the exposure; (ii) each selected relevant IV first provides a ratio estimate for the causal effect, and then receives votes on itself to be valid from other relevant IVs whose degrees of violation of (A2) and (A3) are small (thus more likely to be valid) under this ratio estimate of the causal effect; (iii) select valid IVs that receive a majority/plurality of votes, or by finding the maximum clique of the voting matrix that encodes whether two relevant IVs mutually vote for each other; and (iv) perform post-selection inference to construct a confidence interval for the causal effect that is robust to finite-sample IV selection error.

To the best of our knowledge, MR-SPI is the first two-sample MR method that utilizes both exposure and outcome data to automatically select a set of exposure-outcome pair specific SNP IVs. Moreover, our proposed selection procedure does not require additional distributional assumptions to model the genetic effects or ratio estimates25,27. Extensive simulations show that our MR-SPI method outperforms other competing MR methods. We apply MR-SPI to infer the causal relationships among 146 exposure-outcome pairs involving COVID-19 related traits, ischemic stroke, cholesterol levels and heart disease, and detect significant associations. Furthermore, we employ MR-SPI to perform omics MR (xMR) with 912 proteins using UK Biobank proteomics data, and discover 7 proteins significantly associated with the risk of Alzheimer’s disease.

## 2 Results

### 2.1 Method overview

MR-SPI is an automatic procedure to select valid genetic instruments and perform robust causal inference using two-sample GWAS summary data. In brief, MR-SPI contains the following four steps, as illustrated in Figure 1: 

1.  select relevant SNPs with large IV strength in the GWAS summary data for the exposure;

2.  each relevant SNP provides a ratio estimate of the causal effect, and all the other relevant SNPs vote for it to be a valid IV if their degrees of violation of (A2) and (A3) are small under this ratio estimate of the causal effect;

3.  select valid IVs by majority/plurality voting or by finding the maximum clique of the voting matrix;

4.  estimate the causal effect of interest using the selected valid IVs, and construct a confidence interval for the causal effect that is robust to IV selection error in finite samples.
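The voting idea in step 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the MR-SPI implementation: it assumes the usual summary-data model in which the SNP-outcome effect equals the causal effect times the SNP-exposure effect plus a violation term, it uses a first-order (delta-method) standard error under independence of the estimates, and it uses a fixed cutoff `z_threshold`. The function name `vote_matrix`, the cutoff value and the toy numbers are all assumptions made for this example; the actual threshold is described in Online Methods.

```python
import numpy as np

def vote_matrix(gamma_hat, Gamma_hat, se_gamma, se_Gamma, z_threshold=2.0):
    """Return a boolean matrix with votes[k, j] = True when SNP k votes for SNP j.

    gamma_hat / Gamma_hat are SNP-exposure / SNP-outcome effect estimates.
    The cutoff z_threshold is an illustrative assumption, not the paper's rule.
    """
    p = len(gamma_hat)
    beta = Gamma_hat / gamma_hat  # ratio estimate of the causal effect from each SNP
    votes = np.zeros((p, p), dtype=bool)
    for j in range(p):
        # Estimated degree of violation of every SNP k if SNP j were valid.
        pi_kj = Gamma_hat - beta[j] * gamma_hat
        # Approximate variance of the ratio estimate of SNP j (delta method).
        var_beta_j = (se_Gamma[j] ** 2 + beta[j] ** 2 * se_gamma[j] ** 2) / gamma_hat[j] ** 2
        # Approximate standard error of the estimated violation of each SNP k.
        se_pi = np.sqrt(se_Gamma ** 2 + beta[j] ** 2 * se_gamma ** 2
                        + gamma_hat ** 2 * var_beta_j)
        votes[:, j] = np.abs(pi_kj) <= z_threshold * se_pi
    return votes

# Toy data: SNPs 0-2 are valid (no violation), SNPs 3-5 are invalid.
gamma_hat = np.array([0.10, 0.12, 0.08, 0.11, 0.09, 0.10])
violation = np.array([0.0, 0.0, 0.0, 0.10, 0.15, -0.10])
Gamma_hat = 0.5 * gamma_hat + violation  # true causal effect set to 0.5
se = np.full(6, 0.005)

votes = vote_matrix(gamma_hat, Gamma_hat, se, se)
print(votes.astype(int))
```

In this toy example the three valid SNPs vote for one another, while each invalid SNP receives only its own vote, which mirrors the picture in the second panel of Figure 1.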

![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2023/05/19/2023.02.20.23286200/F1)

Figure 1: The framework of MR-SPI. In the first step, MR-SPI selects the relevant IVs with strong SNP-exposure associations. In the second step, each relevant IV receives votes on itself to be valid from the other relevant IVs whose degrees of violation of (A2) and (A3) are small under its ratio estimate of the causal effect. For example, assuming SNP 1 is valid, the slope of the line connecting SNP 1 and the origin represents the ratio estimate of SNP 1; SNPs 2 and 3 vote for SNP 1 to be valid because they are close to that line, while SNPs 4, 5 and 6 vote against it since they are far away from that line. In the third step, MR-SPI selects valid IVs according to some voting rule. In the inference step, MR-SPI estimates the causal effect and provides the corresponding confidence interval using the selected valid IVs.

Current two-sample MR methods only involve step (i) to select (relevant) genetic instruments for downstream MR analysis, while the selected genetic instruments might violate assumptions (A2) and (A3), leading to possibly unreliable scientific findings. To address this issue, MR-SPI automatically selects valid genetic instruments for a specific exposure-outcome pair by further incorporating the outcome data. Our key idea for selecting valid genetic instruments is that, under the plurality rule condition, valid IVs will form the largest group of relevant IVs and give “similar” ratio estimates (see Online Methods). Specifically, we propose the following two criteria to measure the similarity between the ratio estimates of two SNPs *j* and *k* in step (ii):

**C1**: We say the *k*th SNP “votes for” the *j*th SNP to be a valid IV if, by assuming the *j*th SNP is valid, the *k*th SNP’s degree of violation of (A2) and (A3) is small;

**C2**: We say the ratio estimates of two SNPs *j* and *k* are “similar” if they mutually vote for each other to be valid.

In step (iii), we construct a symmetric and binary voting matrix to encode the votes that each relevant SNP receives from other relevant SNPs: the (*k, j*) entry of the voting matrix is 1 if SNPs *j* and *k* mutually vote for each other to be valid, and 0 otherwise. There are two ways to select valid genetic instruments based on the voting matrix (see Online Methods): (1) we can select as valid IVs the relevant SNPs that receive a majority or a plurality of votes; (2) we can use the SNPs in the maximum clique of the voting matrix as valid IVs29. Simulation studies show that the maximum clique method can empirically offer a lower false discovery rate (FDR)30 and a higher true positive proportion (TPP).
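Both selection rules can be illustrated on a small voting matrix. The sketch below is an assumption-laden simplification: `select_by_plurality` simply keeps the SNPs with the largest vote count (each SNP counts its own vote), and `maximum_clique` is a plain Bron-Kerbosch search that is adequate for a handful of SNPs. The function names and the example matrix are hypothetical, and the precise rules used by MR-SPI are those given in Online Methods.

```python
import numpy as np

def select_by_plurality(V):
    """Keep the SNPs whose number of received votes equals the maximum count."""
    counts = V.sum(axis=0)
    return [int(j) for j in np.flatnonzero(counts == counts.max())]

def maximum_clique(V):
    """Largest set of SNPs that all mutually vote for each other (Bron-Kerbosch)."""
    p = V.shape[0]
    neighbors = [{int(k) for k in np.flatnonzero(V[i]) if k != i} for i in range(p)]
    best = []

    def expand(R, P, X):
        nonlocal best
        if not P and not X:          # R is a maximal clique
            if len(R) > len(best):
                best = sorted(R)
            return
        for v in list(P):
            expand(R | {v}, P & neighbors[v], X & neighbors[v])
            P = P - {v}              # v has been fully explored
            X = X | {v}

    expand(set(), set(range(p)), set())
    return best

# Symmetric binary voting matrix for six relevant SNPs:
# SNPs 0-2 mutually agree, SNPs 3-4 agree with each other, SNP 5 stands alone.
V = np.eye(6, dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    V[a, b] = V[b, a] = 1

print(select_by_plurality(V))
print(maximum_clique(V))
```

Here both rules pick SNPs 0, 1 and 2. The two rules can differ when a SNP collects many votes from SNPs that do not agree among themselves, which is one intuition for why a clique-based rule may be more conservative.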

In step (iv), we estimate the causal effect and construct a confidence interval for this causal effect using the selected valid genetic instruments. In finite samples, some invalid IVs with small (but still nonzero) degrees of violation of (A2) and (A3) might be incorrectly selected as valid IVs, and we refer to them as “locally invalid IVs”31. We then propose to construct a robust confidence interval with guaranteed nominal coverage even in the presence of IV selection error in finite-sample settings, with main steps described in Figure 6 and Online Methods.
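To make the inference step concrete, the sketch below computes a point estimate and a Wald-type 95% confidence interval from the selected IVs only. It uses a standard fixed-effect inverse-variance-weighted (IVW) estimator as a stand-in; this is an assumption for illustration and is not claimed to be the estimator or the default interval used by MR-SPI, and it does not implement the robust interval. The function name `ivw_estimate` and the toy numbers are hypothetical.

```python
import math

def ivw_estimate(gamma_hat, Gamma_hat, se_Gamma, selected, z=1.96):
    """Fixed-effect IVW estimate and Wald interval using only the selected SNPs.

    Uncertainty in gamma_hat and in the selection step is ignored here,
    which is exactly what a robust interval is meant to guard against.
    """
    num = sum(gamma_hat[j] * Gamma_hat[j] / se_Gamma[j] ** 2 for j in selected)
    den = sum(gamma_hat[j] ** 2 / se_Gamma[j] ** 2 for j in selected)
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, (beta - z * se, beta + z * se)

# Toy summary statistics: SNPs 0-2 lie on a line of slope 0.5, SNPs 3-5 do not.
gamma_hat = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
Gamma_hat = [0.05, 0.06, 0.04, 0.155, 0.195, -0.05]
se_Gamma = [0.005] * 6

beta_selected, ci_selected = ivw_estimate(gamma_hat, Gamma_hat, se_Gamma, [0, 1, 2])
beta_all, ci_all = ivw_estimate(gamma_hat, Gamma_hat, se_Gamma, range(6))
print(beta_selected, ci_selected)
print(beta_all, ci_all)
```

With the three valid SNPs the estimate recovers the slope of 0.5, whereas including the three invalid SNPs pulls the estimate away from it. A locally invalid IV would cause a smaller shift of the same kind, which the default interval does not account for.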

### 2.2 Comparing MR-SPI to other competing methods with simulation studies

We conduct extensive simulations to evaluate the performance of MR-SPI in the presence of invalid IVs. We simulate data in a two-sample setting under four scenarios: (**S1**) majority rule condition holds and no locally invalid IVs exist; (**S2**) plurality rule condition holds and no locally invalid IVs exist; (**S3**) majority rule condition holds and locally invalid IVs exist; (**S4**) plurality rule condition holds and locally invalid IVs exist. The detailed simulation settings are provided in Online Methods.

We compare the bias, empirical coverage and average lengths of 95% confidence intervals of MR-SPI to the following competing methods: (i) the random-effects IVW method that performs random-effects meta-analysis to account for pleiotropy19, (ii) MR-RAPS that assumes pleiotropic effects are normally distributed and applies the maximum profile likelihood estimation to obtain the causal effect estimate11, (iii) MR-PRESSO that flags as outliers the SNPs whose omission from the analysis substantially reduces the residual sum of squares of the regression23, (iv) the weighted median method that takes the weighted median of the ratio estimates as the causal effect22, (v) the mode-based estimation that takes the mode of the smoothed empirical density function of the ratio estimates as the causal effect 26, (vi) MRMix that models the SNP-exposure and SNP-outcome effects with a bivariate normal mixture distribution27 and (vii) the contamination mixture method that models the ratio estimates of SNPs with a normal mixture distribution25. We exclude MR-Egger from this simulation since it is heavily biased under our simulation settings. Among those methods, the random-effects IVW method and MR-RAPS require the InSIDE assumption, MR-PRESSO and the weighted median method require the majority rule condition, while MR-SPI, the mode-based estimation, MRMix and the contamination mixture method require the plurality rule condition (or ZEMPA assumption). For simplicity, we shall use IVW to represent the random-effects IVW method hereafter.
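The difference between the majority rule and the plurality rule can be seen with a small weighted median calculation. The sketch below is a simplified version of a weighted median of ratio estimates with linear interpolation; the function name `weighted_median`, the equal weights and the toy ratio estimates are assumptions for illustration, and the published method uses inverse-variance weights and its own standard error procedure.

```python
import numpy as np

def weighted_median(ratios, weights):
    """Weighted median of ratio estimates, interpolated at the 50% weight point."""
    order = np.argsort(ratios)
    r = np.asarray(ratios, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    # Cumulative weight up to the midpoint of each estimate, scaled to (0, 1).
    cum = (np.cumsum(w) - 0.5 * w) / w.sum()
    return float(np.interp(0.5, cum, r))

weights = np.ones(5)

# Majority rule holds: three of five IVs give the true ratio 1.0.
majority_case = weighted_median([1.0, 1.0, 1.0, 3.0, 4.0], weights)

# Only the plurality rule holds: two valid IVs at 1.0 form the largest group,
# but the three invalid IVs (with different ratios) outnumber them.
plurality_case = weighted_median([1.0, 1.0, 2.5, 3.0, 4.0], weights)

print(majority_case, plurality_case)
```

In the first case the median sits at the true value; in the second it lands on an invalid IV's ratio estimate, even though the valid IVs still form the largest group with a common ratio estimate.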

In Figure 2(a), we present the bias of those MR methods in simulated data with sample sizes of 5000 for both the exposure and the outcome. Moreover, Supplementary Figure S1(a) and Supplementary Table S1 provide a comparison of bias of those MR methods across different sample sizes. Generally, the proposed MR-SPI has small bias in all four scenarios. IVW and MR-RAPS are biased since the InSIDE assumption does not hold in our parameter settings. Interestingly, the biases of these two methods are smaller in scenarios (**S3**) and (**S4**) compared to (**S1**) and (**S2**), as the degree of violation of (A2) and (A3) is generally smaller when some of the candidate IVs are only locally invalid. MR-PRESSO generally yields biased estimates as it fails to remove outliers in most of our settings. The weighted median estimator is biased when only the plurality rule condition holds, since it requires more than half of the candidate IVs to be valid. The mode-based estimation, MRMix and the contamination mixture method are all nearly unbiased, as these three methods only require the plurality rule condition to hold, which is satisfied in all of the four simulation scenarios.

![Figure 2:](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2023/05/19/2023.02.20.23286200/F2)

Figure 2: 
Performance of MR-SPI and the other competing MR methods in simulated data with sample sizes of 5000. (a) Boxplot of the absolute value of bias in causal effect estimates. (b) Empirical coverage of 95% confidence intervals. The green dashed line in (b) represents the nominal level (95%). (c) Average lengths of 95% confidence intervals.

Figure 2(b) reports the empirical coverage of the confidence intervals of those methods in simulated data with sample sizes of 5000. Additional results for empirical coverage of those methods under different sample sizes can be found in Supplementary Figure S1(b) and Supplementary Table S2. Under scenarios (**S1**) and (**S2**) where no locally invalid IV exists, the confidence interval of MR-SPI can attain the 95% coverage level even when the sample sizes are small (e.g., 5000). In the presence of locally invalid IVs, i.e., under scenarios (**S3**) and (**S4**), the empirical coverage of the confidence interval of MR-SPI can still attain the nominal level when sample sizes are 80000, as MR-SPI can correctly distinguish the locally invalid IVs from the valid IVs when sample sizes are large enough. However, MR-SPI fails to identify those locally invalid IVs under (**S3**) and (**S4**) if the sample sizes are small (e.g., 5000), and therefore the empirical coverage of MR-SPI is lower than 95%. In such cases, we suggest using the robust confidence interval constructed by Algorithm 2, which attains the 95% coverage level and thus is less vulnerable to the IV selection error in finite samples. The empirical coverage of the weighted median method is lower than that of MR-SPI even in scenario (**S1**) where the majority rule condition holds. For example, when the sample sizes are 20000, the empirical coverage of the weighted median method is 0.638. Compared to the confidence interval of MR-SPI, the confidence interval of the mode-based estimation is generally more conservative with coverage above the nominal level in our simulation settings, which is the price to pay for being less affected by the invalid instruments, as discussed in ref. 26. Both MRMix and the contamination mixture method cannot attain the 95% coverage level in all of the four simulation scenarios.
These two methods make distributional assumptions for either the genetic associations or the ratio estimates, which might be violated in our simulation settings, and thus the coverage levels are below the nominal level.

We report the average lengths of 95% confidence intervals of MR-SPI and the competing methods with sample sizes of 5000 in Figure 2(c), while additional results for various sample sizes are provided in Supplementary Figure S1(c) and Supplementary Table S3. Although MR-RAPS generally has the shortest confidence interval, it is biased and its coverage level is close to zero, as the InSIDE assumption does not hold in our simulation settings. Among the remaining methods, MR-SPI generally has the shortest confidence interval under the four simulation scenarios. The average length of the confidence interval of IVW does not decrease appreciably as the sample sizes increase, since we apply the random-effects IVW method here, which scales up the standard error of the causal effect estimate when heterogeneity in the ratio estimates exists19. In scenario (**S4**), MR-PRESSO has a longer confidence interval as the sample sizes increase, since it tends to treat none of the candidate SNPs as outliers under this simulation setting, i.e., when the majority rule does not hold and locally invalid IVs exist. When no outlier is identified, MR-PRESSO uses all candidate SNPs, and thus the standard error of MR-PRESSO under (**S4**) will be close to that of IVW when the sample sizes are large.

In Table 1, we report (1) the FDR, defined as the proportion of invalid IVs in the set of SNPs selected by MR-SPI, and (2) the TPP, defined as the proportion of the true valid IVs that are selected by MR-SPI. In our simulation, we select valid IVs by finding the maximum clique in the voting matrix. Generally, the TPP of MR-SPI is close to 1 under all scenarios, and the FDR of MR-SPI is close to 0 if no locally invalid IV exists and the plurality rule condition holds. Under scenarios (**S3**) and (**S4**), MR-SPI might incorrectly select those locally invalid IVs when the sample sizes are small (e.g., 5000). As the sample sizes increase, the FDR of MR-SPI approaches 0 even in the presence of locally invalid IVs. For example, the FDR is 0.005 under scenario **(S4)** when sample sizes are 80000. Therefore, even when locally invalid IVs exist, MR-SPI can still correctly identify valid IVs if the sample sizes are large.
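The two selection metrics follow directly from their definitions. The short sketch below computes them for one simulated replicate; the function name `fdr_tpp` and the example index sets are hypothetical, and in a simulation study the values would be averaged over replicates.

```python
def fdr_tpp(selected, true_valid):
    """FDR: share of selected SNPs that are invalid.
    TPP: share of truly valid SNPs that were selected."""
    selected, true_valid = set(selected), set(true_valid)
    fdr = len(selected - true_valid) / len(selected) if selected else 0.0
    tpp = len(selected & true_valid) / len(true_valid) if true_valid else 0.0
    return fdr, tpp

# One replicate: SNPs 0-2 are truly valid; SNP 3 is a locally invalid IV
# that was selected by mistake.
fdr, tpp = fdr_tpp(selected=[0, 1, 2, 3], true_valid=[0, 1, 2])
print(fdr, tpp)
```

In this replicate all valid IVs are recovered (TPP of 1) but one of the four selected SNPs is invalid (FDR of 0.25), which is the pattern described for small sample sizes under (**S3**) and (**S4**).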

[Table 1:](http://medrxiv.org/content/early/2023/05/19/2023.02.20.23286200/T1)

Table 1: 
The FDR and TPP of valid IV selection by MR-SPI under different scenarios and sample sizes. The FDR is close to 0 in the absence of locally invalid IV or when the sample sizes are large. The TPP is close to 1 under all scenarios.

The simulation studies demonstrate that MR-SPI performs better compared to the other competing MR methods under the plurality rule condition. When no locally invalid IV exists, MR-SPI can select the valid IVs correctly and provide nearly unbiased estimates of the causal effect, and the confidence interval of MR-SPI can attain the nominal coverage level. In practice, we can perform a sensitivity analysis of the causal effect estimate to the threshold in the voting step (see Online Methods and Supplementary Figure S12). If the causal effect estimate is sensitive to the choice of the threshold, then MR-SPI might suffer from the finite-sample IV selection error, and thus the robust confidence interval of MR-SPI is recommended for use in this case.

### 2.3 Evaluation of the performance of MR-SPI in two benchmark datasets

In this section, we apply the proposed MR-SPI method to two benchmark datasets to evaluate its performance. These two datasets serve as benchmarks because the exposure and the outcome are the same trait in each dataset, and thus the horizontal pleiotropic effects are expected to be zero. We first apply MR-SPI to the dataset in which both the exposure and the outcome are coronary artery disease (CAD), and we refer to it as the CAD-CAD dataset. Since both the exposure and the outcome are CAD, the causal effect is expected to be one. The exposure data come from the Coronary Artery Disease (C4D) Genetics Consortium 32, and the outcome data come from the Coronary ARtery DIsease Genome-wide Replication and Meta-analysis (CARDIoGRAM) consortium33. We first clump the SNPs in the exposure data using the software Plink34 with $r^2 < 1 \times 10^{-2}$ to obtain the independent genetic instruments, and then use $1 \times 10^{-6}$ as the *p*-value threshold to select relevant instruments. In total, five relevant instruments are included for downstream analysis. We compare MR-SPI to the other eight competing MR methods including IVW, MR-Egger, MR-RAPS, MR-PRESSO, the weighted median method, the mode-based estimation, MRMix and the contamination mixture method. The causal effect estimates and the corresponding 95% confidence intervals using those methods are presented in Figure 3(a). Generally, the confidence intervals of MR-SPI, IVW and MR-Egger all cover 1, and MR-SPI provides the shortest confidence interval. In addition, none of the relevant IVs is excluded in the voting step, which is in line with the expectation that horizontal pleiotropy should not exist in this dataset.
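The relevance filter described above can be sketched as follows, assuming that LD clumping has already been done and that only effect estimates and standard errors are available for the exposure. The two-sided *p*-value is computed from the normal approximation; the SNP identifiers, the numbers and the function name `two_sided_p` are hypothetical. When a GWAS file already reports *p*-values, those would normally be used directly.

```python
import math

def two_sided_p(beta, se):
    """Two-sided p-value of a Wald z-test under the normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))

P_THRESHOLD = 1e-6  # relevance threshold quoted in the text

# Hypothetical clumped exposure summary statistics: (SNP id, effect, standard error).
exposure_stats = [
    ("rsA", 0.10, 0.02),   # z = 5.0
    ("rsB", 0.08, 0.02),   # z = 4.0
    ("rsC", -0.12, 0.02),  # z = -6.0
]

relevant = [snp for snp, beta, se in exposure_stats
            if two_sided_p(beta, se) < P_THRESHOLD]
print(relevant)
```

A threshold of $1 \times 10^{-6}$ corresponds to an absolute z-score of roughly 4.9, so the SNP with a z-score of 4 is dropped while the other two are kept.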

![Figure 3:](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2023/05/19/2023.02.20.23286200/F3)

Figure 3: 
Point estimates and 95% confidence intervals for the causal effects of (a) CAD-CAD dataset and (b) BMI-BMI dataset using different MR methods. Confidence intervals are clipped to arrows if they exceed axis limits. CAD: coronary artery disease; BMI: body mass index.

Next, we apply MR-SPI to the dataset where the exposure data and the outcome data are the BMI GWAS data for physically active men and women respectively, and we refer to this dataset as the BMI-BMI dataset. In the BMI-BMI dataset, both the exposure data and the outcome data come from the GIANT consortium35. After LD clumping and filtering SNPs with the same parameters as in the CAD-CAD dataset, 64 candidate SNPs are selected as relevant IVs and none of them is detected to be invalid by MR-SPI. The point estimates of the causal effect and corresponding 95% confidence intervals using the above methods are shown in Figure 3(b). Overall, all the above methods except MR-Egger provide causal effect estimates that are significantly below one. As discussed in previous studies35, some significant loci of BMI might exhibit heterogeneity in genetic effects between men and women. Therefore, the “true” effect might not be equal to one in this dataset due to the gender difference in the genetic architecture of BMI.

### 2.4 Learning causal relationships of 146 exposure-outcome pairs

In this section, we examine the causal relationships between complex traits and diseases from four categories including ischemic stroke, cholesterol levels, heart disease, and Coronavirus disease 2019 (COVID-19) related traits. Since MR-SPI requires that the GWAS summary statistics of the exposure and the outcome come from two non-overlapping samples, we exclude the trait pairs whose exposure and outcome are in the same consortium. In addition, we also exclude trait pairs whose exposure and outcome are two similar phenotypes (for example, heart failure and coronary artery disease), and we finally get 146 pairwise exposure-outcome combinations. All the GWAS summary statistics used for MR analysis are publicly available with more detailed description of each dataset given in Supplementary Table S4.

We first perform LD clumping using the software Plink34 to obtain independent SNPs with $r^2 < 0.01$, and then use $1 \times 10^{-6}$ as the *p*-value threshold to select relevant IVs that are associated with each exposure trait. Among the 146 exposure-outcome pairs, MR-SPI detects invalid IVs in 16 exposure-outcome pairs. For example, MR-SPI detects one invalid SNP (rs616154, marked by a red triangle) in the causal relationship from cardioembolic stroke (CES) to SARS-CoV-2 infection, as illustrated in the left panel of Figure 4(a). SNP rs616154 is identified as invalid since its ratio estimate of the causal effect is 0.525, which is far away from the other SNPs’ ratio estimates, and thus no other relevant SNP votes for it to be a valid IV. We search for the human phenotypes that are strongly associated with SNP rs616154 using the PhenoScanner tool 36,37, and find that this SNP is also associated with the interleukin-6 (IL-6) level, which is a potential biomarker of COVID-19 progression38, indicating that SNP rs616154 might exhibit horizontal pleiotropy in the relationship of CES on SARS-CoV-2 infection and thus is a potentially invalid IV. After excluding SNP rs616154, the point estimate of the causal effect by MR-SPI (represented by the slope of the green solid line in the left panel of Figure 4(a)) is nearly zero, suggesting that cardioembolic stroke might not be a risk factor for SARS-CoV-2 infection. The causal effect estimate of MR-PRESSO (represented by the slope of the blue dashed line in the left panel of Figure 4(a)) is also close to zero, as MR-PRESSO detects SNP rs616154 as an outlier and excludes it from analysis. However, IVW and MR-RAPS include SNP rs616154 in the MR analysis, and thus their causal effect estimates (represented by the slopes of the black and orange dashed lines in the left panel of Figure 4(a), respectively) might be biased.
In contrast, the right panel of Figure 4(a) illustrates the causal effect estimates for the relationship of heart failure (HF) on any ischemic stroke (AIS) by MR-SPI, IVW, MR-PRESSO and MR-RAPS. In this relationship, MR-SPI does not identify any invalid IV, and thus MR-SPI gives a causal effect estimate that is similar to IVW and MR-RAPS.

![Figure 4:](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2023/05/19/2023.02.20.23286200/F4)

Figure 4: **(a)** Scatter plot of cardioembolic stroke on SARS-CoV-2 infection (left panel), and heart failure on any ischemic stroke (right panel). The slope of the green solid line represents the causal effect estimate of MR-SPI. The slopes of the black, blue and orange dashed lines represent the causal effect estimates of IVW, MR-PRESSO and MR-RAPS, respectively. Green circles represent the valid IVs and red triangles represent the invalid IVs detected by MR-SPI. **(b)** Direction of causal associations detected by MR-SPI. The significant positive and negative associations after Bonferroni correction are marked by blue filled circles and red filled circles, respectively. The radius of a circle is proportional to the $-\log_{10}$(*p*-value) of the corresponding exposure-outcome pair. Those pairs whose exposure and outcome come from the same consortium or are two similar phenotypes are marked as grey cells. **(c)** Venn diagram of significant associations detected by MR-SPI, the mode-based estimation, MRMix and the contamination mixture method after Bonferroni correction. **(d)** Significant associations detected by MR-SPI using the robust confidence interval. The red bars represent the default confidence intervals calculated using the causal effect estimates and the corresponding standard errors by MR-SPI, and the blue bars represent the robust confidence intervals constructed by the searching and sampling method. AIS: any ischemic stroke; LAS: large-artery atherosclerotic stroke; SVS: small vessel stroke; CES: cardioembolic stroke; LDL: low-density lipoprotein; HDL: high-density lipoprotein; TC: total cholesterol; TG: triglycerides; HF: heart failure; CAD: coronary artery disease; AF: atrial fibrillation; SEVERE: severe COVID-19; HOSPITAL: COVID-19 hospitalization; INFECTION: SARS-CoV-2 infection.

Figure 4(a) illustrates that the inclusion of invalid IVs might lead to misleading scientific findings, and thus MR-SPI selects only valid IVs for downstream analysis to provide reliable causal inference. After excluding those invalid IVs, MR-SPI identifies 27 significant associations after Bonferroni correction for multiple comparisons 39, which are given in Figure 4(b). We also apply the other eight competing MR methods including IVW, MR-Egger, MR-RAPS, MR-PRESSO, the weighted median method, the mode-based estimation, MRMix and the contamination mixture method to infer the causal relationships among these exposure-outcome pairs, and the results are presented in Supplementary Figures S2-S9. Some of our findings are in line with previous studies; for example, an increase in LDL level might be associated with increased risks of CAD and HF40,41. In addition, MR-SPI also detects significant associations that cannot be discovered by other competing MR methods. For example, MR-SPI suggests that SARS-CoV-2 infection might be a risk factor for HF (![Graphic][1], *p*-value = $1.43 \times 10^{-4}$), which cannot be identified by the other competing MR methods considered in this paper. Our finding is consistent with a former study that reported a significant increase in the risk of developing acute heart failure in patients with confirmed COVID-19 infection42.
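As a quick arithmetic check of what Bonferroni correction implies here, the sketch below assumes a family-wise level of 0.05 divided across all 146 exposure-outcome pairs. Both the level and the family size are assumptions for this illustration, since this section does not spell them out; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Assumed family-wise level and number of tests (146 exposure-outcome pairs).
threshold = bonferroni_threshold(alpha=0.05, n_tests=146)

# p-value reported in the text for SARS-CoV-2 infection on heart failure.
p_infection_hf = 1.43e-4
is_significant = p_infection_hf < threshold

print(f"threshold = {threshold:.3g}, significant = {is_significant}")
```

Under these assumptions the per-test threshold is about $3.4 \times 10^{-4}$, so the reported *p*-value of $1.43 \times 10^{-4}$ remains significant after correction.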

To demonstrate the similarities and differences in the results of MR-SPI and other MR methods, we plot Venn diagrams that show the number of significant associations that are either shared or uniquely detected by these methods. We present the Venn diagram of the significant pairs using MR-SPI, the mode-based estimation, MRMix and the contamination mixture method in Figure 4(c), as these four methods are all based on the plurality rule condition. Venn diagrams that compare MR-SPI and the other competing MR methods can be found in Supplementary Figures S10 and S11. From Figure 4(c), MR-SPI generally detects more significant associations than the mode-based estimation, MRMix and the contamination mixture method among these 146 exposure-outcome pairs. Indeed, these three competing MR methods fail to discover some causal relationships that have been supported by previous literature. For example, the mode-based estimation, MRMix and the contamination mixture method fail to detect that an increased HDL level might be associated with a decreased risk of CAD, which is identified by MR-SPI (![Graphic][2], *p*-value = $3.73 \times 10^{-17}$) and has been supported with evidence by previous epidemiological studies43,44. Supplementary Figure S10 compares the significant relationships detected by MR-SPI and three MR methods that require the InSIDE assumption (IVW, MR-Egger and MR-RAPS). MR-RAPS detects 17 significant associations that are not identified by MR-SPI, of which some associations might be spurious. For example, MR-RAPS suggests significant associations of AIS on low-density lipoprotein (LDL), high-density lipoprotein (HDL) and total cholesterol (TC) level. However, the reverse association, i.e., cholesterol level on the risk of stroke, has been reported in previous epidemiological studies45,46. In Supplementary Figure S11, we compare MR-SPI with MR-PRESSO and the weighted median method that both require the majority rule condition.
MR-PRESSO and the weighted median method detect 14 and 10 significant associations, respectively, all of which are also identified by MR-SPI. In addition, MR-SPI identifies 11 more significant associations, most of which are in line with previous epidemiological studies; for example, HF might be a risk factor for ischemic stroke47,48.

To deal with the issue of potential IV selection error, we also construct robust confidence intervals of these exposure-outcome pairs by MR-SPI according to Algorithm 2. Generally, MR-SPI discovers four significant associations whose robust confidence intervals do not include zero (CES on HF, LDL on CAD, TC on CAD, and atrial fibrillation (AF) on CES), and we compare the robust confidence intervals (represented by blue bars) with the default confidence intervals calculated by equation (12) (represented by red bars) in Figure 4(d). As shown in Figure 4(d), the robust confidence intervals are longer than the default confidence intervals of MR-SPI, indicating that locally invalid IVs might exist and might be incorrectly selected in these datasets. Therefore, we suggest using the robust confidence intervals for inference in these four relationships to provide more reliable causal findings.

### 2.5 Identification of Alzheimer’s disease-associated proteins using MR-SPI

Omics MR (xMR) aims to identify omics biomarkers (e.g., proteins) causally associated with complex traits and diseases. In particular, xMR with proteomics data enables the identification of disease-associated proteins, facilitating crucial advancements in drug target discovery, disease prevention, and treatment strategies. In this section, we apply MR-SPI to identify protein biomarkers putatively causally associated with the risk of Alzheimer’s disease (AD). The proteomics data used in our analysis comprise 54,306 participants from the UK Biobank Pharma Proteomics Project (UKB-PPP)49. Following the guidelines proposed by Sun et al.49, significant (*p*-value < $3.40 \times 10^{-11}$, accounting for Bonferroni correction) and independent ($r^2 < 0.01$) SNPs are extracted from the proteomics data as candidate genetic instruments, and thus all of these candidate SNPs are strongly associated with the exposures (proteins). Summary statistics for AD are obtained from a meta-analysis of GWAS studies for clinically diagnosed AD and AD-by-proxy, comprising 455,258 samples in total50. For MR method comparison, we analyze 912 proteins that share four or more candidate SNPs within the summary statistics for AD.

As presented in Figure 5(a), MR-SPI identifies 7 proteins that are significantly associated with AD after Bonferroni correction: CD33, CD55, EPHA1, PILRA, PILRB, RET, and TREM2. Among them, 4 proteins contribute to an increased risk of AD (CD33, PILRA, PILRB, and RET), while the other 3 proteins contribute to a decreased risk of AD (CD55, EPHA1, and TREM2). Previous studies have revealed that these proteins and the corresponding protein-coding genes might contribute to the pathogenesis of AD51,52,53,54,55. For example, it has been found that CD33 plays a key role in modulating microglial pathology in AD, with TREM2 acting downstream in this regulatory pathway53. Additionally, RET at mitochondrial complex I is activated during ageing, which might contribute to an increased risk of ageing-related diseases including AD55. These findings highlight the potential therapeutic opportunities that target these proteins for the treatment of AD.

![Figure 5](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F5.medium.gif)

Figure 5: 
**(a)** Volcano plot of associations of proteins with Alzheimer’s disease using MR-SPI. The *x*-axis represents the estimated effect (on the log odds ratio scale), and the *y*-axis represents $-\log_{10}$(*p*-value). Positive and negative associations are represented by green and red points, respectively. The size of a point is proportional to $-\log_{10}$(*p*-value). The blue dashed line represents the significance threshold accounting for Bonferroni correction (*p*-value < $5.48 \times 10^{-5}$). **(b)** Forest plot of significant associations of proteins with Alzheimer’s disease identified by MR-SPI. Point estimates and 95% confidence intervals for the associations using the other competing MR methods are presented in different colors. Confidence intervals are clipped to *y*-axis limits. **(c)** Bubble plot of GO analysis results using the 7 significant proteins detected by MR-SPI. The *x*-axis represents the *z*-score of the enriched GO term, and the *y*-axis represents $-\log_{10}$(*p*-value) after Bonferroni correction. Each point represents one enriched GO term. The blue dashed line represents the significance threshold (adjusted *p*-value < 0.05). **(d)** Table of the GO ID, description and source of the significant GO terms using the 7 significant proteins detected by MR-SPI. BP: biological process; CC: cellular component; MF: molecular function.

![Figure 6](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F6.medium.gif)

Figure 6: 
The procedure of constructing the robust confidence interval by MR-SPI. When locally invalid IVs exist in finite samples, MR-SPI might incorrectly select invalid IVs (marked by the red cross). In such cases, a robust confidence interval can be constructed to improve the coverage probability. First, we construct an initial interval using SNPs in ![Graphic][3]</img> and discretize it to a grid set. Second, we repeatedly sample the estimators of ***γ*** and **Γ**. Third, we find a confidence interval for each sampling (marked by blue line segments) by grid search, and then aggregate these confidence intervals to construct the robust confidence interval (marked by the red line segment). Note that the confidence interval in the third sampling is empty since the majority rule is violated.

In Figure 5(b), we present the point estimates and 95% confidence intervals of the effects (on the log odds ratio scale) of these 7 proteins on AD using the other competing MR methods. From Figure 5(b), these proteins are identified by most of the competing MR methods, supporting the robustness of our findings. Notably, in the relationship of TREM2 on AD, MR-SPI detects one possibly invalid IV, SNP rs10919543, which is associated with red blood cell count according to PhenoScanner. Red blood cell count is a known risk factor for AD56,57, and thus SNP rs10919543 might exhibit pleiotropy in the relationship of TREM2 on AD. After excluding this potentially invalid IV, MR-SPI suggests that TREM2 is negatively associated with the risk of AD (![Graphic][4]</img>, *p*-value = $1.20 \times 10^{-18}$). Additionally, we perform gene ontology (GO) enrichment analysis using the g:Profiler web server58 ([https://biit.cs.ut.ee/gprofiler/gost](https://biit.cs.ut.ee/gprofiler/gost)) to gain biological insights into the set of proteins identified by MR-SPI, and the results are presented in Figure 5(c) and 5(d). After Bonferroni correction, the GO analysis indicates that these proteins are significantly enriched in 21 GO terms, such as the metabolic process, MHC protein binding, and transmembrane receptor protein kinase activity.
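The Bonferroni threshold quoted in the caption of Figure 5 can be checked with a short calculation. The sketch below assumes a nominal significance level of 0.05, which is consistent with the reported threshold for 912 proteins but is not stated explicitly in this section; the function name `bonferroni_threshold` is illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 912 proteins are tested; the nominal level 0.05 is an assumption.
protein_threshold = bonferroni_threshold(0.05, 912)

# The TREM2 p-value reported above is compared against the threshold.
trem2_significant = 1.20e-18 < protein_threshold

print(f"{protein_threshold:.2e}", trem2_significant)
```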

## 3 Discussion

In this paper, we develop a novel two-sample MR method and algorithm, named MR-SPI, to automatically select valid SNPs from GWAS studies and perform post-selection inference. MR-SPI first selects relevant IVs with strong SNP-exposure associations, and then applies the voting procedure to select a plurality of the relevant IVs whose ratio estimates are similar to each other as valid IVs. In case the causal effect estimate of MR-SPI is biased due to the selection of locally invalid IVs in finite samples, MR-SPI can provide a robust confidence interval constructed by the searching and sampling method, which is less vulnerable to IV selection error. We show with extensive simulation studies that MR-SPI can select valid genetic instruments among candidate SNPs for a specific exposure-outcome pair and provide a robust confidence interval for the causal effect when invalid IVs exist. Through data analyses, we demonstrate that MR-SPI can provide reliable causal findings by automatically selecting valid genetic instruments. We apply MR-SPI to infer the causal relationships among 146 trait pairs, and detect 27 significant associations after Bonferroni correction. Furthermore, we employ MR-SPI to conduct xMR analysis with 912 proteins using the proteomics data from UK Biobank, and identify 7 proteins significantly associated with the risk of Alzheimer’s disease. These findings highlight the potential of MR-SPI as a powerful tool in the identification of therapeutic targets for disease prevention and treatment.

We emphasize two main advantages of MR-SPI in this paper. First, MR-SPI can incorporate both exposure and outcome data to automatically select a set of valid genetic instruments in genome-wide studies, and the selection procedure does not rely on additional distributional assumptions on the genetic effects. Therefore, MR-SPI is the first to offer such a practical approach to select valid instruments for a specific exposure-outcome pair from GWAS studies for MR analyses, which is especially helpful in the presence of widespread horizontal pleiotropy. Second, we propose a robust confidence interval for the causal effect using the searching and sampling method, which is less vulnerable to IV selection error. Therefore, when locally invalid IVs are incorrectly selected and the causal effect estimate is biased in finite samples, we can still provide reliable inference for the causal effect with the robust confidence interval.

MR-SPI also has some limitations. Currently, MR-SPI can only perform causal inference using independent SNPs from two non-overlapping samples. As future work, we plan to extend MR-SPI to include SNPs with LD structure from summary statistics of two possibly overlapping samples. In addition, the robust confidence interval is slightly more conservative than the confidence interval calculated from the limit distribution of the causal effect estimate, which is the price to pay for robustness to finite-sample IV selection error. Further studies are needed to construct less conservative confidence intervals that are robust to IV selection error.

In conclusion, MR-SPI provides an automatic approach to selecting valid instruments among candidate SNPs and performing causal inference using two-sample GWAS summary statistics. Simulation studies and data analyses have shown that MR-SPI can provide reliable inference for the causal relationships even in the presence of invalid IVs. Our developed software is also user-friendly and highly efficient. We hope that MR-SPI can help researchers to detect more trustworthy causal mechanisms with increasingly rich and publicly available GWAS and multi-omics datasets.

## Data Availability

All data produced in the present study are available upon reasonable request to the authors. 

## Software availability

The R package **MR.SPI** is publicly available at [https://github.com/MinhaoYaooo/MR-SPI](https://github.com/MinhaoYaooo/MR-SPI).

## Data availability

All of the GWAS data analyzed are publicly available with the following URLs:

*   CARDIoGRAMplusC4D consortium: [http://www.cardiogramplusc4d.org/data-downloads/](http://www.cardiogramplusc4d.org/data-downloads/);

*   GIANT consortium: [https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT\_consortium\_data\_files](https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files);

*   MEGASTROKE consortium: [http://megastroke.org/download.html](http://megastroke.org/download.html);

*   Global Lipids Genetics Consortium (GLGC): [http://csg.sph.umich.edu/willer/public/lipids2013/](http://csg.sph.umich.edu/willer/public/lipids2013/);

*   GWAS for heart failure: [https://www.ebi.ac.uk/gwas/publications/31919418](https://www.ebi.ac.uk/gwas/publications/31919418);

*   GWAS for atrial fibrillation: [https://www.ebi.ac.uk/gwas/publications/30061737](https://www.ebi.ac.uk/gwas/publications/30061737);

*   The COVID-19 Host Genetics Initiative: [https://www.covid19hg.org/](https://www.covid19hg.org/);

*   GWAS for Alzheimer’s disease: [https://ctg.cncr.nl/software/summary\_statistics](https://ctg.cncr.nl/software/summary_statistics);

*   UK Biobank proteomics data: [https://europepmc.org/article/ppr/ppr508031](https://europepmc.org/article/ppr/ppr508031).

## Online Methods

### Two-sample GWAS summary statistics

Suppose that we obtain $p$ independent SNPs $\boldsymbol{Z} = (Z_1, \ldots, Z_p)^\intercal$ by using LD clumping that keeps one representative SNP per LD region34. We also assume that the SNPs are standardized such that $\mathbb{E} Z_j = 0$ and $\mathrm{Var}(Z_j) = 1$ for $1 \le j \le p$. Let $D$ denote the exposure and $Y$ denote the outcome. Consider the following linear structural models24,59: ![Formula][5]</img>  ![Formula][6]</img>  where $\beta$ represents the causal effect of interest, $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_p)^\intercal$ represents the IV strength, and $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_p)^\intercal$ encodes the violation of assumptions (A2) and (A3). If assumptions (A2) and (A3) hold for SNP $j$, then $\pi_j = 0$; otherwise $\pi_j \neq 0$ (see Supplementary Section S1 for details). The error terms $\delta$ and $e$, with variances ![Graphic][7]</img> and ![Graphic][8]</img> respectively, are possibly correlated due to unmeasured confounding factors. By plugging the exposure model (1) into the outcome model (2), we obtain the reduced-form outcome model: ![Formula][9]</img>  where $\epsilon = \beta\delta + e$. Let $\boldsymbol{\Gamma} = (\Gamma_1, \ldots, \Gamma_p)^\intercal$ denote the SNP-outcome associations; then we have $\boldsymbol{\Gamma} = \beta\boldsymbol{\gamma} + \boldsymbol{\pi}$. If $\gamma_j \neq 0$, then SNP $j$ is called a relevant IV. If both $\gamma_j \neq 0$ and $\pi_j = 0$, then SNP $j$ is called a valid IV. Let $\mathcal{S} = \{j : \gamma_j \neq 0, 1 \le j \le p\}$ denote the set of all relevant IVs, and $\mathcal{V} = \{j : \gamma_j \neq 0 \text{ and } \pi_j = 0, 1 \le j \le p\}$ denote the set of all valid IVs. The majority rule condition can be expressed as ![Graphic][10]</img>, and the plurality rule condition can be expressed as $|\mathcal{V}| > \max_{c \neq 0} |\{j \in \mathcal{S} : \pi_j / \gamma_j = c\}|$24. If the plurality rule condition holds, then valid IVs with the same ratio of SNP-outcome effect to SNP-exposure effect will form a plurality. Based on this key fact, our proposed MR-SPI selects the largest group of SNPs with similar ratio estimates of the causal effect as valid IVs, using a voting procedure described in detail in the subsection "Selecting valid instruments by voting".
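The substitution step that yields the reduced-form model can be written out explicitly. The derivation below assumes that models (1) and (2) take the standard linear form shown in its first two lines; this form is inferred from the stated identities $\epsilon = \beta\delta + e$ and $\boldsymbol{\Gamma} = \beta\boldsymbol{\gamma} + \boldsymbol{\pi}$ and is not copied from the original displays.

```latex
\begin{align*}
D &= \boldsymbol{Z}^\intercal \boldsymbol{\gamma} + \delta, \\
Y &= D\beta + \boldsymbol{Z}^\intercal \boldsymbol{\pi} + e \\
  &= (\boldsymbol{Z}^\intercal \boldsymbol{\gamma} + \delta)\beta
     + \boldsymbol{Z}^\intercal \boldsymbol{\pi} + e \\
  &= \boldsymbol{Z}^\intercal (\beta\boldsymbol{\gamma} + \boldsymbol{\pi})
     + (\beta\delta + e) \\
  &= \boldsymbol{Z}^\intercal \boldsymbol{\Gamma} + \epsilon .
\end{align*}
% Consequence for a relevant IV j (gamma_j != 0):
\[
\frac{\Gamma_j}{\gamma_j} = \beta + \frac{\pi_j}{\gamma_j},
\qquad\text{so}\qquad
\frac{\Gamma_j}{\gamma_j} = \beta \iff \pi_j = 0 .
\]
```

Under this form, all valid IVs share the ratio $\beta$, whereas invalid IVs agree with each other only if they share the same value of $\pi_j/\gamma_j$, which is what the plurality rule condition constrains.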

Let ![Graphic][11]</img> and ![Graphic][12]</img> be the estimated marginal effects of SNP $j$ on the exposure and the outcome, and ![Graphic][13]</img> and ![Graphic][14]</img> be the corresponding standard errors, respectively. Let ![Graphic][15]</img> and ![Graphic][16]</img>. In the two-sample setting, the summary statistics ![Graphic][17]</img> and ![Graphic][18]</img> are calculated from two non-overlapping samples with sample sizes $n_1$ and $n_2$, respectively. When all the SNPs are independent of each other, the joint asymptotic distribution of ![Graphic][19]</img> and ![Graphic][20]</img> is: ![Formula][21]</img>  where the diagonal entries of $\mathbf{V}_\gamma$ and $\mathbf{V}_\Gamma$ are ![Graphic][22]</img> and ![Graphic][23]</img>, respectively, and the off-diagonal entries of $\mathbf{V}_\gamma$ and $\mathbf{V}_\Gamma$ are ![Graphic][24]</img> and ![Graphic][25]</img>, respectively. The derivation of the limit distribution (3) can be found in Supplementary Section S2. Therefore, with the summary statistics of the exposure and the outcome, we can estimate ![Graphic][26]</img> and ![Graphic][27]</img> as: ![Formula][28]</img>  

After obtaining ![Graphic][29]</img>, we can perform the proposed IV selection procedure as illustrated in Figure 1.

### Selecting valid instruments by voting

The first step of MR-SPI is to select relevant SNPs with large IV strength using GWAS summary statistics for the exposure. Specifically, we estimate the set of relevant IVs $\mathcal{S}$ by: ![Formula][30]</img>  where ![Graphic][31]</img> is the standard error of ![Graphic][32]</img> in the summary statistics, $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution, and $\alpha^*$ is the user-specified threshold with the default value of $1 \times 10^{-6}$. This step is equivalent to filtering the SNPs in the exposure data with *p*-value < $\alpha^*$, and is adopted by most of the current two-sample MR methods to select (relevant) genetic instruments for downstream MR analysis. Note that the selected genetic instruments may not satisfy the IV independence and exclusion restriction assumptions and thus may be invalid. In contrast, our proposed MR-SPI further incorporates the outcome data to automatically select a set of valid genetic instruments from ![Graphic][33]</img> for a specific exposure-outcome pair.
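As an illustration of this filtering step, the following minimal sketch keeps SNPs whose absolute z-score exceeds a normal quantile. It assumes a two-sided test, so that the cutoff is $\Phi^{-1}(1 - \alpha^*/2)$; the function name and the toy summary statistics are hypothetical and are not taken from the MR.SPI package.

```python
from statistics import NormalDist


def select_relevant_ivs(gamma_hat, se_gamma, alpha_star=1e-6):
    """Return indices j with |gamma_hat_j| / se_j above the normal cutoff.

    Assumes a two-sided test, i.e. cutoff = Phi^{-1}(1 - alpha_star / 2).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha_star / 2)
    return [
        j
        for j, (g, s) in enumerate(zip(gamma_hat, se_gamma))
        if abs(g) / s > z_crit
    ]


# Toy SNP-exposure estimates and standard errors (hypothetical values).
gamma_hat = [0.10, 0.02, -0.08, 0.001]
se_gamma = [0.01, 0.01, 0.01, 0.01]

relevant = select_relevant_ivs(gamma_hat, se_gamma)
print(relevant)
```

The same selection is obtained by computing a two-sided *p*-value for each SNP and keeping those below $\alpha^*$.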

Under the plurality rule condition, valid genetic instruments with the same ratio of SNP-outcome effect to SNP-exposure effect (i.e., Γ*j**/γ**j*) will form a plurality and yield “similar” ratio estimates of the causal effect. Based on this key fact, MR-SPI selects a plurality of relevant IVs whose ratio estimates are “similar” to each other as valid IVs. Specifically, we propose the following two criteria to measure the similarity between the ratio estimates of two SNPs *j* and *k*:

**C1**: We say the *k*th SNP “votes for” the *j*th SNP to be a valid IV if, by assuming the *j*th SNP is valid, the *k*th SNP’s degree of violation of (A2) and (A3) is small;

**C2**: We say the ratio estimates of two SNPs *j* and *k* are “similar” if they mutually vote for each other to be valid.

The ratio estimate of the $j$th SNP is defined as ![Graphic][34]</img>. By assuming the $j$th SNP is valid, the plug-in estimate of the $k$th SNP’s degree of violation of (A2) and (A3) can be obtained by ![Formula][35]</img>  as we have $\Gamma_k = \beta\gamma_k + \pi_k$ for the true causal effect $\beta$, and ![Graphic][36]</img> for the ratio estimate ![Graphic][37]</img> of the $k$th SNP. From equation (6), ![Graphic][38]</img> has two noteworthy implications. First, ![Graphic][39]</img> measures the difference between the ratio estimates of SNPs $j$ and $k$ (multiplied by the $k$th SNP-exposure effect estimate ![Graphic][40]</img>), and a small ![Graphic][41]</img> implies that the difference scaled by ![Graphic][42]</img> is small. Second, ![Graphic][43]</img> represents the $k$th IV’s degree of violation of (A2) and (A3) when the $j$th SNP’s ratio estimate ![Graphic][44]</img> is regarded as the true causal effect; thus a small ![Graphic][45]</img> provides strong evidence that the $k$th IV supports the $j$th IV being valid. Therefore, we say the $k$th IV votes for the $j$th IV to be valid if: ![Formula][46]</img>  where ![Graphic][47]</img> is the standard error of ![Graphic][48]</img>, which is given by: ![Formula][49]</img>  and the term ![Graphic][50]</img> in equation (7) ensures that the violation of (A2) and (A3) can be correctly detected with probability 1 as the sample sizes go to infinity, as shown in Supplementary Section S3.
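The voting rule can be sketched in a few lines. The plug-in quantity follows the description above ($\hat{\Gamma}_k$ minus the $j$th ratio estimate times $\hat{\gamma}_k$). Two ingredients are assumptions made for illustration, because equations (7) and (8) are not reproduced in the text: the standard error is a first-order delta-method approximation for independent SNPs and non-overlapping samples, and the multiplier is taken to be $\sqrt{\log n}$ with $n = \min\{n_1, n_2\}$. All names and numbers are hypothetical.

```python
import math


def votes_for(j, k, gamma_hat, Gamma_hat, se_gamma, se_Gamma, n_min):
    """Return True if SNP k votes for SNP j to be a valid IV."""
    beta_j = Gamma_hat[j] / gamma_hat[j]          # ratio estimate of SNP j
    pi_kj = Gamma_hat[k] - beta_j * gamma_hat[k]  # plug-in violation of SNP k
    r = gamma_hat[k] / gamma_hat[j]
    # Delta-method variance (assumed form, independent SNPs and samples).
    var = (
        se_Gamma[k] ** 2
        + beta_j ** 2 * se_gamma[k] ** 2
        + r ** 2 * (se_Gamma[j] ** 2 + beta_j ** 2 * se_gamma[j] ** 2)
    )
    # Assumed multiplier sqrt(log n) that grows slowly with sample size.
    threshold = math.sqrt(math.log(n_min)) * math.sqrt(var)
    return abs(pi_kj) <= threshold


# Toy data: true effect 1; SNP 3 has a direct effect of 0.2 on the outcome.
gamma_hat = [0.10, 0.12, 0.08, 0.10]
Gamma_hat = [0.10, 0.121, 0.079, 0.30]
se_gamma = [0.005] * 4
se_Gamma = [0.005] * 4

vote_1_for_0 = votes_for(0, 1, gamma_hat, Gamma_hat, se_gamma, se_Gamma, 50000)
vote_3_for_0 = votes_for(0, 3, gamma_hat, Gamma_hat, se_gamma, se_Gamma, 50000)
print(vote_1_for_0, vote_3_for_0)
```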

For each relevant IV in ![Graphic][51]</img>, we collect all relevant IVs’ votes on whether it is a valid IV according to equation (7). Then we construct a voting matrix ![Graphic][52]</img> to summarize the voting results and evaluate the similarity of two SNPs’ ratio estimates according to criterion **C2**. Specifically, we define the (*k, j*) entry of ![Graphic][53]</img> as: ![Formula][54]</img>  

From equation (9), we can see that the voting matrix ![Graphic][55]</img> is symmetric, and the entries of ![Graphic][56]</img> are binary: ![Graphic][57]</img> indicates that SNPs $j$ and $k$ vote for each other to be valid IVs, i.e., the ratio estimates of these two SNPs are close to each other; ![Graphic][58]</img> indicates that they do not. For example, in Figure 1, ![Graphic][59]</img> since the ratio estimates of SNPs 1 and 2 are similar, while ![Graphic][60]</img> because the ratio estimates of SNPs 1 and 4 differ substantially.

After constructing the voting matrix ![Graphic][61]</img>, we select the valid IVs by applying majority/plurality voting or finding the maximum clique of the voting matrix29. Let ![Graphic][62]</img> be the total number of SNPs whose ratio estimates are similar to SNP $k$. For example, $\mathbf{VM}_1 = 3$ in Figure 1, since 3 SNPs (including SNP 1 itself) yield similar ratio estimates to SNP 1 according to criterion **C2**. A large $\mathbf{VM}_k$ provides strong evidence that SNP $k$ is a valid IV, since we assume that valid IVs form a plurality of the relevant IVs. Let ![Graphic][63]</img> denote the set of IVs with majority voting, and ![Graphic][64]</img> denote the set of IVs with plurality voting; then the union ![Graphic][65]</img> can be a robust estimate for $\mathcal{V}$ in practice. Alternatively, we can also find the maximum clique in the voting matrix as an estimate for $\mathcal{V}$. A clique in the voting matrix is a group of IVs that mutually vote for each other to be valid, and the maximum clique is the clique with the largest possible number of IVs.
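The selection rules described above can be illustrated on a small symmetric 0/1 voting matrix. In this sketch the majority set is taken to be the SNPs that receive votes from more than half of the relevant IVs, and the plurality set the SNPs that receive the maximum number of votes; these definitions are assumptions consistent with the text rather than the exact displayed formulas. The maximum clique is found by brute force, which is only practical for a handful of SNPs.

```python
from itertools import combinations


def select_by_voting(vote):
    """Return vote counts and the union of majority and plurality sets."""
    s = len(vote)
    counts = [sum(row) for row in vote]  # votes received, including self
    majority = {k for k in range(s) if counts[k] > s / 2}
    plurality = {k for k in range(s) if counts[k] == max(counts)}
    return counts, majority | plurality


def maximum_clique(vote):
    """Largest group of SNPs that all mutually vote for each other."""
    s = len(vote)
    for size in range(s, 0, -1):  # try the largest groups first
        for group in combinations(range(s), size):
            if all(vote[a][b] == 1 for a, b in combinations(group, 2)):
                return set(group)
    return set()


# Hypothetical voting matrix: SNPs 0-2 agree with each other, as do SNPs 3-4.
vote = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

counts, selected = select_by_voting(vote)
clique = maximum_clique(vote)
print(counts, sorted(selected), sorted(clique))
```

In this toy example the two approaches agree; they can differ when the voting pattern is not a union of disjoint blocks.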

### Estimation and inference of the causal effect

After selecting the set of valid genetic instruments ![Graphic][66]</img>, the causal effect $\beta$ is estimated by: ![Formula][67]</img>  and the variance of ![Graphic][68]</img> is estimated by: ![Formula][69]</img>  where ![Graphic][70]</img> and ![Graphic][71]</img> are the estimates of SNP-exposure associations and SNP-outcome associations of the instruments in ![Graphic][72]</img>, respectively. The two expectation terms ![Graphic][73]</img> and ![Graphic][74]</img> in equation (11) can be approximated by ![Graphic][75]</img> and ![Graphic][76]</img>, respectively. The variance-covariance matrix of ![Graphic][77]</img> can be obtained by the delta method, as shown in Supplementary Section S4. Let $\alpha \in (0, 1)$ be the significance level and $z_{1-\alpha/2}$ be the $(1 - \alpha/2)$-quantile of the standard normal distribution; then the $(1 - \alpha)$ confidence interval for $\beta$ is given by: ![Formula][78]</img>  

As $\min\{n_1, n_2\} \to \infty$, we have ![Graphic][79]</img> under the plurality rule condition, as shown in Supplementary Section S5. Hence, MR-SPI provides a theoretical guarantee for the asymptotic coverage probability of the confidence interval under the plurality rule condition.
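Given a point estimate and its estimated variance, a normal-quantile interval of the kind described above can be computed as follows. The sketch assumes the usual symmetric form (estimate plus or minus the quantile times the standard error); the input values are hypothetical and the estimator of the causal effect itself is not reproduced here.

```python
import math
from statistics import NormalDist


def wald_ci(beta_hat, var_hat, alpha=0.05):
    """Symmetric (1 - alpha) interval based on the standard normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_hat)
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical estimate 0.5 with estimated variance 0.01 (standard error 0.1).
lower, upper = wald_ci(0.5, 0.01)
print(round(lower, 3), round(upper, 3))
```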

We summarize the procedure of selecting valid IVs and constructing the corresponding confidence interval by MR-SPI in Algorithm 1.

Algorithm 1: Selection of Valid Instruments and Inference by MR-SPI

![Algorithm 1](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F7.medium.gif)

### A robust confidence interval via searching and sampling

In finite-sample settings, the selected set of relevant IVs ![Graphic][80]</img> might include some invalid IVs whose degrees of violation of (A2) and (A3) are small but nonzero, and we refer to them as “locally invalid IVs”31. When locally invalid IVs exist and are incorrectly selected into ![Graphic][81]</img>, the confidence interval in equation (12) becomes unreliable, since its validity (i.e., the coverage probability attains the nominal level) requires that the invalid IVs are correctly identified. In practice, we can multiply the threshold ![Graphic][82]</img> on the right-hand side of equation (7) by a scaling factor $\eta$ to examine whether the confidence interval calculated by equation (12) is sensitive to the choice of the threshold. If the confidence interval varies substantially with the choice of the scaling factor $\eta$, then there might be finite-sample IV selection error, especially when locally invalid IVs are present. We demonstrate this issue with two numerical examples presented in Supplementary Figure S12. Supplementary Figure S12(a) shows an example in which MR-SPI provides robust inference across different values of the scaling factor, while Supplementary Figure S12(b) shows an example in which MR-SPI might suffer from IV selection error, as the causal effect estimate and the corresponding confidence interval are sensitive to the choice of the scaling factor $\eta$. This issue motivates us to develop a more robust confidence interval.

To construct a confidence interval that is robust to finite-sample IV selection error, we borrow the idea of searching and sampling31, with the main steps described in Figure 6. The key idea is to sample the estimators of $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$ repeatedly from the following distribution: ![Formula][83]</img>  where $M$ is the number of sampling times. Since ![Graphic][84]</img> and ![Graphic][85]</img> follow distributions centered at $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$, there exists $m^*$ such that ![Graphic][86]</img> and ![Graphic][87]</img> are close enough to the true genetic effects $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$ when the number of sampling times $M$ is sufficiently large, and thus the confidence interval obtained by using ![Graphic][88]</img> and ![Graphic][89]</img> instead of ![Graphic][90]</img> and ![Graphic][91]</img> might have a larger probability of covering $\beta$.

For each sampling, we construct the confidence interval by searching over a grid of $\beta$ values such that more than half of the instruments in ![Graphic][92]</img> are detected as valid. As for the choice of grid, we start with the smallest interval $[L, U]$ that contains all the following intervals: ![Formula][93]</img>  where ![Graphic][94]</img> is the ratio estimate of the $j$th SNP, ![Graphic][95]</img> is the variance of ![Graphic][96]</img>, and ![Graphic][97]</img> serves the same purpose as in equation (7). Then we discretize $[L, U]$ into $\mathcal{B} = \{b_1, b_2, \ldots, b_K\}$ as the grid set such that $b_1 = L$, $b_K = U$ and $|b_{k+1} - b_k| = n^{-0.6}$ for $1 \le k \le K - 2$. We set the grid size to $n^{-0.6}$ so that the error caused by discretization is smaller than the parametric rate $n^{-1/2}$.
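The discretization can be sketched as follows: equally spaced points with step $n^{-0.6}$ starting at $L$, with $U$ appended as the final point so that only the last gap may be shorter. The text does not specify which sample size $n$ refers to; taking $n = \min\{n_1, n_2\}$ is an assumption here, and the interval endpoints are hypothetical.

```python
def build_grid(lower, upper, n):
    """Grid on [lower, upper] with spacing n**-0.6; upper is always included."""
    step = n ** -0.6
    grid = []
    k = 0
    while lower + k * step < upper:
        grid.append(lower + k * step)
        k += 1
    grid.append(upper)  # the last gap may be shorter than the step
    return grid


# Hypothetical initial interval [0, 1] with n = 10000.
grid = build_grid(0.0, 1.0, 10000)
print(len(grid), grid[1] - grid[0])
```

With $n = 10000$ the step is about 0.004, which is smaller than $n^{-1/2} = 0.01$, in line with the stated rationale for the grid size.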

For each grid value $b \in \mathcal{B}$ and sampling index $1 \le m \le M$, we propose an estimate of the degree of violation of (A2) and (A3) of the $j$th SNP by: ![Formula][98]</img>  where ![Graphic][99]</img> is a data-dependent threshold, $\Phi^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution, $\alpha \in (0, 1)$ is the significance level, and ![Graphic][100]</img> ($\lambda < 1$ when $M$ is sufficiently large) is a scaling factor that makes the thresholding more stringent so that the confidence interval in each sampling is shorter, as can be seen from equation (16). Here, ![Graphic][101]</img> indicates that the $j$th SNP is detected as a valid IV in the $m$th sampling if we take ![Graphic][102]</img> as the estimates of genetic effects and $b$ as the true causal effect. Let ![Graphic][103]</img>; then we construct the $m$th sampling’s confidence interval $\mathrm{CI}^{(m)}$ by searching for the smallest and largest $b \in \mathcal{B}$ such that more than half of the SNPs in ![Graphic][104]</img> are detected to be valid according to equation (15), i.e., ![Formula][105]</img>  

From equations (15) and (16), we can see that, when $\lambda$ is smaller, fewer SNPs in ![Graphic][106]</img> are detected as valid for a given $b \in \mathcal{B}$, which leads to fewer $b \in \mathcal{B}$ satisfying ![Graphic][107]</img>; thus the confidence interval in each sampling will be shorter. If there does not exist $b \in \mathcal{B}$ such that the majority of IVs in ![Graphic][108]</img> are detected as valid, we set $\mathrm{CI}^{(m)} = \emptyset$. Let $\mathcal{M} = \{1 \le m \le M : \mathrm{CI}^{(m)} \neq \emptyset\}$ denote the set of all sampling indexes corresponding to non-empty searching confidence intervals; then the proposed robust confidence interval is given by: ![Formula][109]</img>  
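The search and aggregation steps can be sketched as follows. For each sampling, the interval spans the smallest and largest grid values at which more than half of the IVs are detected as valid, and a sampling with no such grid value contributes an empty interval. The aggregation used here (smallest lower endpoint to largest upper endpoint over the non-empty intervals) is an assumption based on the description of Figure 6, since the displayed formula is not reproduced in the text. The per-grid counts of valid IVs are supplied directly as hypothetical inputs rather than computed from equation (15).

```python
def search_interval(grid, valid_counts, n_ivs):
    """Interval of grid values where more than half of the IVs look valid."""
    accepted = [b for b, c in zip(grid, valid_counts) if c > n_ivs / 2]
    if not accepted:
        return None  # empty interval: the majority rule fails everywhere
    return min(accepted), max(accepted)


def aggregate(intervals):
    """Envelope of all non-empty per-sampling intervals (assumed rule)."""
    non_empty = [ci for ci in intervals if ci is not None]
    if not non_empty:
        return None
    lower = min(lo for lo, _ in non_empty)
    upper = max(hi for _, hi in non_empty)
    return lower, upper


# Hypothetical grid and counts of IVs detected as valid in three samplings.
grid = [0.8, 0.9, 1.0, 1.1, 1.2]
counts_per_sampling = [
    [1, 3, 4, 3, 1],
    [2, 2, 3, 4, 2],
    [1, 2, 2, 2, 1],  # never a majority, so this interval is empty
]

intervals = [search_interval(grid, c, n_ivs=5) for c in counts_per_sampling]
robust_ci = aggregate(intervals)
print(intervals, robust_ci)
```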

We summarize the procedure of constructing the proposed robust confidence interval in Algorithm 2.

Algorithm 2: Constructing A Robust Confidence Interval via Searching and Sampling

![Algorithm 2](https://www.medrxiv.org/content/medrxiv/early/2023/05/19/2023.02.20.23286200/F8.medium.gif)

### Simulation settings

We set the number of candidate IVs $p = 10$ and the sample sizes $n_1 = n_2 \in \{5000, 10000, 20000, 40000, 80000\}$. We generate the $j$th genetic instruments $Z_j$ and $X_j$ independently from a binomial distribution $\mathrm{Bin}(2, f_j)$, where $f_j \sim U(0.05, 0.50)$ is the minor allele frequency of SNP $j$. Then we generate the exposure ![Graphic][110]</img> and the outcome ![Graphic][111]</img> according to models (1) and (2), respectively. Finally, we calculate the genetic associations and their corresponding standard errors for the exposure and the outcome, respectively. As for the parameters in models (1) and (2), we fix the causal effect $\beta = 1$, and we consider 4 scenarios for $\boldsymbol{\gamma} \in \mathbb{R}^p$ and $\boldsymbol{\pi} \in \mathbb{R}^p$: ![Formula][112]</img>  ![Formula][113]</img>  ![Formula][114]</img>  ![Formula][115]</img>  

Scenarios (**S1**) and (**S3**) satisfy the majority rule condition, while (**S2**) and (**S4**) only satisfy the plurality rule condition. In addition, (**S3**) and (**S4**) simulate the cases where locally invalid IVs exist, as we shrink the degrees of violation of (A2) and (A3) of some SNPs to 0.25 times their original values in these two scenarios. In total, we run 1000 replications in each scenario.
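A minimal sketch of this data-generating process is given below. The genotype distribution, the number of candidate IVs and the causal effect follow the description above. Everything else is a hypothetical choice made for illustration: the values of the IV strengths and pleiotropic effects (the scenario definitions are not reproduced in the text), the shared confounder used to correlate the error terms, the smaller sample size, and the use of unstandardized genotypes.

```python
import math
import random


def marginal_association(x, y):
    """Slope and standard error from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


rng = random.Random(0)
p, n1, n2, beta = 10, 2000, 2000, 1.0
freqs = [rng.uniform(0.05, 0.50) for _ in range(p)]  # minor allele frequencies
gamma = [0.2] * p                # hypothetical IV strengths
pi = [0.0] * 6 + [0.2] * 4       # hypothetical: 6 valid and 4 invalid IVs


def draw_sample(n):
    """Simulate genotypes, exposure and outcome for n individuals."""
    geno = [[sum(rng.random() < f for _ in range(2)) for f in freqs]
            for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder (assumed)
    d = [sum(g * c for g, c in zip(row, gamma)) + ui + rng.gauss(0, 1)
         for row, ui in zip(geno, u)]
    y = [beta * di + sum(g * c for g, c in zip(row, pi)) + ui + rng.gauss(0, 1)
         for row, di, ui in zip(geno, d, u)]
    return geno, d, y


# Two non-overlapping samples: exposure GWAS and outcome GWAS.
Z, D1, _ = draw_sample(n1)
X, _, Y2 = draw_sample(n2)

# Marginal SNP-exposure and SNP-outcome associations with standard errors.
gamma_hat = [marginal_association([row[j] for row in Z], D1) for j in range(p)]
Gamma_hat = [marginal_association([row[j] for row in X], Y2) for j in range(p)]
print(gamma_hat[0], Gamma_hat[0])
```

The resulting pairs of estimates and standard errors play the role of the two-sample summary statistics used as input to the selection and inference steps.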

### Implementation of existing MR methods

We compare the performance of MR-SPI with eight other MR methods in simulation studies and data analyses. These methods are implemented as follows:

*   Random-effects IVW, MR-Egger, the weighted median method, the mode-based estimation and the contamination mixture method are implemented in the R package “MendelianRandomization” ([https://github.com/cran/MendelianRandomization](https://github.com/cran/MendelianRandomization)). The mode-based estimation is run with iteration=1000. All other methods are run with the default parameters.

*   MR-PRESSO is implemented in the R package “MR-PRESSO” ([https://github.com/rondolab/MR-PRESSO](https://github.com/rondolab/MR-PRESSO)) with outlier test and distortion test.

*   MR-RAPS is performed using the R package “mr.raps” ([https://github.com/qingyuanzhao/mr.raps](https://github.com/qingyuanzhao/mr.raps)) with the default options.

*   MRMix is run with the R package “MRMix” ([https://github.com/gqi/MRMix](https://github.com/gqi/MRMix)) using the default options.

## Footnotes

*   More data analyses added.

*   Received February 20, 2023.
*   Revision received May 19, 2023.
*   Accepted May 19, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  [1]. Kenneth J Rothman and  Sander Greenland. Causation and causal inference in epidemiology. American Journal of Public Health, 95(S1):S144–S150, 2005.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2105/AJPH.2004.059204&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16030331&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F19%2F2023.02.20.23286200.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000230898200023&link_type=ISI) 

2.  [2]. Jan P Vandenbroucke,  Alex Broadbent, and  Neil Pearce. Causality and causal inference in epidemiology: the need for a pluralistic approach. International Journal of Epidemiology, 45 (6):1776–1786, 2016.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F19%2F2023.02.20.23286200.atom) 

3.  George Davey Smith and Shah Ebrahim. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32(1):1–22, 2003.

4.  Debbie A Lawlor, Roger M Harbord, Jonathan AC Sterne, Nic Timpson, and George Davey Smith. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine, 27(8):1133–1163, 2008.

5.  George Davey Smith and Gibran Hemani. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics, 23(R1):R89–R98, 2014.

6.  George Davey Smith and Shah Ebrahim. Mendelian randomization: prospects, potentials, and limitations. International Journal of Epidemiology, 33(1):30–42, 2004.

7.  Stephen Burgess, Adam Butterworth, and Simon G Thompson. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology, 37(7):658–665, 2013.

8.  Brandon L Pierce and Stephen Burgess. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. American Journal of Epidemiology, 178(7):1177–1184, 2013.

9.  Debbie A Lawlor. Commentary: Two-sample Mendelian randomization: opportunities and challenges. International Journal of Epidemiology, 45(3):908, 2016.

10. Eric AW Slob and Stephen Burgess. A comparison of robust Mendelian randomization methods using summary data. Genetic Epidemiology, 44(4):313–329, 2020.

11. Qingyuan Zhao, Jingshu Wang, Gibran Hemani, Jack Bowden, and Dylan S Small. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. The Annals of Statistics, 48(3):1742–1769, 2020.
    
    
12. Jean Morrison, Nicholas Knoblauch, Joseph H Marcus, Matthew Stephens, and Xin He. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nature Genetics, 52(7):740–747, 2020.

13. Qing Cheng, Xiao Zhang, Lin S Chen, and Jin Liu. Mendelian randomization accounting for complex correlated horizontal pleiotropy while elucidating shared genetic etiology. Nature Communications, 13(1):1–13, 2022.
    
    
14. Vanessa Didelez and Nuala Sheehan. Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research, 16(4):309–330, 2007.

15. Eleanor Sanderson, Tom G Richardson, Gibran Hemani, and George Davey Smith. The use of negative control outcomes in Mendelian randomization to detect potential population stratification. International Journal of Epidemiology, 50(4):1350–1361, 2021.

16. Nadia Solovieff, Chris Cotsapas, Phil H Lee, Shaun M Purcell, and Jordan W Smoller. Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics, 14(7):483–495, 2013.

17. Shanya Sivakumaran, Felix Agakov, Evropi Theodoratou, James G Prendergast, Lina Zgaga, Teri Manolio, Igor Rudan, Paul McKeigue, James F Wilson, and Harry Campbell. Abundant pleiotropy in human complex diseases and traits. The American Journal of Human Genetics, 89(5):607–618, 2011.

18. Miles Parkes, Adrian Cortes, David A Van Heel, and Matthew A Brown. Genetic insights into common pathways and complex relationships among immune-mediated diseases. Nature Reviews Genetics, 14(9):661–673, 2013.

19. Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala Sheehan, and John Thompson. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Statistics in Medicine, 36(11):1783–1802, 2017.

20. Jack Bowden, George Davey Smith, and Stephen Burgess. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology, 44(2):512–525, 2015.

21. Stephen Burgess and Simon G Thompson. Interpreting findings from Mendelian randomization using the MR-Egger method. European Journal of Epidemiology, 32(5):377–389, 2017.

22. Jack Bowden, George Davey Smith, Philip C Haycock, and Stephen Burgess. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology, 40(4):304–314, 2016.

23. Marie Verbanck, Chia-Yen Chen, Benjamin Neale, and Ron Do. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nature Genetics, 50(5):693–698, 2018.

24. Zijian Guo, Hyunseung Kang, T Tony Cai, and Dylan S Small. Confidence intervals for causal effects with invalid instruments by using two-stage hard thresholding with voting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(4):793–815, 2018.
    
    
25. Stephen Burgess, Christopher N Foley, Elias Allara, James R Staley, and Joanna MM Howson. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nature Communications, 11(1):1–11, 2020.
    
    
26. Fernando Pires Hartwig, George Davey Smith, and Jack Bowden. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. International Journal of Epidemiology, 46(6):1985–1998, 2017.

27. Guanghao Qi and Nilanjan Chatterjee. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nature Communications, 10(1):1–10, 2019.
    
    
28. Daniel I Swerdlow, Karoline B Kuchenbaecker, Sonia Shah, Reecha Sofat, Michael V Holmes, Jon White, Jennifer S Mindell, Mika Kivimaki, Eric J Brunner, John C Whittaker, et al. Selecting instruments for Mendelian randomization in the wake of genome-wide association studies. International Journal of Epidemiology, 45(5):1600–1616, 2016.

29. Qi Ouyang, Peter D Kaplan, Shumao Liu, and Albert Libchaber. DNA solution of the maximal clique problem. Science, 278(5337):446–449, 1997.

30. Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.

31. Zijian Guo. Post-selection problems for causal inference with invalid instruments: A solution using searching and sampling. arXiv preprint arXiv:2104.06911, 2021.
    
    
32. The Coronary Artery Disease (C4D) Genetics Consortium. A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nature Genetics, 43(4):339–344, 2011.

33. Heribert Schunkert, Inke R König, Sekar Kathiresan, Muredach P Reilly, Themistocles L Assimes, Hilma Holm, Michael Preuss, Alexandre FR Stewart, Maja Barbalic, Christian Gieger, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature Genetics, 43(4):333–338, 2011.

34. Shaun Purcell, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul IW De Bakker, Mark J Daly, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3):559–575, 2007.

35. Adam E Locke, Bratati Kahali, Sonja I Berndt, Anne E Justice, Tune H Pers, Felix R Day, Corey Powell, Sailaja Vedantam, Martin L Buchkovich, Jian Yang, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature, 518(7538):197–206, 2015.

36. James R Staley, James Blackshaw, Mihir A Kamat, Steve Ellis, Praveen Surendran, Benjamin B Sun, Dirk S Paul, Daniel Freitag, Stephen Burgess, John Danesh, et al. PhenoScanner: a database of human genotype–phenotype associations. Bioinformatics, 32(20):3207–3209, 2016.

37. Mihir A Kamat, James A Blackshaw, Robin Young, Praveen Surendran, Stephen Burgess, John Danesh, Adam S Butterworth, and James R Staley. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics, 35(22):4851–4853, 2019.

38. Zulvikar Syambani Ulhaq and Gita Vita Soraya. Interleukin-6 as a potential biomarker of COVID-19 progression. Médecine et Maladies Infectieuses, 50(4):382, 2020.

39. Olive Jean Dunn. Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64, 1961.

40. Cholesterol Treatment Trialists et al. The effects of lowering LDL cholesterol with statin therapy in people at low risk of vascular disease: meta-analysis of individual data from 27 randomised trials. The Lancet, 380(9841):581–590, 2012.
    
    
41. Baris Gencer, Nicholas A Marston, KyungAh Im, Christopher P Cannon, Peter Sever, Anthony Keech, Eugene Braunwald, Robert P Giugliano, and Marc S Sabatine. Efficacy and safety of lowering LDL cholesterol in older patients: a systematic review and meta-analysis of randomised controlled trials. The Lancet, 396(10263):1637–1643, 2020.
    
    
42. Juan R Rey, Juan Caro-Codón, Sandra O Rosillo, Ángel M Iniesta, Sergio Castrejón-Castrejón, Irene Marco-Clement, Lorena Martín-Polo, Carlos Merino-Argos, Laura Rodríguez-Sotelo, Jose M García-Veas, et al. Heart failure in COVID-19 patients: prevalence, incidence and prognostic implications. European Journal of Heart Failure, 22(12):2205–2215, 2020.

43. Ronald M Krauss, Paul T Williams, John Brensike, Katherine M Detre, Frank T Lindgren, Sheryl F Kelsey, Karen Vranizan, and Robert I Levy. Intermediate-density lipoproteins and progression of coronary artery disease in hypercholesterolaemic men. The Lancet, 330(8550):62–66, 1987.
    
    
44. Daniel J Rader and G Kees Hovingh. HDL and cardiovascular disease. The Lancet, 384(9943):618–625, 2014.
    
    
45. Prospective Studies Collaboration et al. Blood cholesterol and vascular mortality by age, sex, and blood pressure: a meta-analysis of individual data from 61 prospective studies with 55 000 vascular deaths. The Lancet, 370(9602):1829–1839, 2007.
    
    
46. Cholesterol Treatment Trialists et al. Efficacy and safety of more intensive lowering of LDL cholesterol: a meta-analysis of data from 170 000 participants in 26 randomised trials. The Lancet, 376(9753):1670–1681, 2010.
    
    
47. Brandi J Witt, Robert D Brown Jr, Steven J Jacobsen, Susan A Weston, Karla V Ballman, Ryan A Meverden, and Véronique L Roger. Ischemic stroke after heart failure: a community-based study. American Heart Journal, 152(1):102–109, 2006.

48. Karl Georg Haeusler, Ulrich Laufs, and Matthias Endres. Chronic heart failure and ischemic stroke. Stroke, 42(10):2977–2982, 2011.

49. Benjamin B Sun, Joshua Chiou, Matthew Traylor, Christian Benner, Yi-Hsiang Hsu, Tom G Richardson, Praveen Surendran, Anubha Mahajan, Chloe Robins, Steven G Vasquez-Grinnell, et al. Genetic regulation of the human plasma proteome in 54,306 UK Biobank participants. bioRxiv, pages 2022–06, 2022.
    
    
50. Iris E Jansen, Jeanne E Savage, Kyoko Watanabe, Julien Bryois, Dylan M Williams, Stacy Steinberg, Julia Sealock, Ida K Karlsson, Sara Hägg, Lavinia Athanasiu, et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nature Genetics, 51(3):404–413, 2019.

51. Adam C Naj, Gyungah Jun, Gary W Beecham, Li-San Wang, Badri Narayan Vardarajan, Jacqueline Buros, Paul J Gallins, Joseph D Buxbaum, Gail P Jarvik, Paul K Crane, et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nature Genetics, 43(5):436–441, 2011.

52. Nisha Rathore, Sree Ranjani Ramani, Homer Pantua, Jian Payandeh, Tushar Bhangale, Arthur Wuster, Manav Kapoor, Yonglian Sun, Sharookh B Kapadia, Lino Gonzalez, et al. Paired Immunoglobulin-like Type 2 Receptor Alpha G78R variant alters ligand binding and confers protection to Alzheimer’s disease. PLoS Genetics, 14(11):e1007427, 2018.
    
    
53. Ana Griciuc, Shaun Patel, Anthony N Federico, Se Hoon Choi, Brendan J Innes, Mary K Oram, Gea Cereghetti, Danielle McGinty, Anthony Anselmo, Ruslan I Sadreyev, et al. TREM2 acts downstream of CD33 in modulating microglial pathology in Alzheimer’s disease. Neuron, 103(5):820–835, 2019.

54. Hafdis T Helgadottir, Pär Lundin, Emelie Wallén Arzt, Anna-Karin Lindström, Caroline Graff, and Maria Eriksson. Somatic mutation that affects transcription factor binding upstream of CD55 in the temporal cortex of a late-onset Alzheimer disease patient. Human Molecular Genetics, 28(16):2675–2685, 2019.
    
    
55. Suman Rimal, Ishaq Tantray, Yu Li, Tejinder Pal Khaket, Yanping Li, Sunil Bhurtel, Wen Li, Cici Zeng, and Bingwei Lu. Reverse electron transfer is activated during aging and contributes to aging and age-related disease. EMBO reports, 24(4):e55548, 2023.
    
    
56. Noel G Faux, Alan Rembach, James Wiley, Kathryn A Ellis, David Ames, Christopher J Fowler, Ralph N Martins, Kelly K Pertile, Rebecca L Rumble, B Trounson, et al. An anemia of Alzheimer’s disease. Molecular Psychiatry, 19(11):1227–1234, 2014.

57. Laura M Winchester, John Powell, Simon Lovestone, and Alejo J Nevado-Holgado. Red blood cell indices and anaemia as causative factors for cognitive function deficits and for Alzheimer’s disease. Genome Medicine, 10(1):1–12, 2018.
    
    
58. Uku Raudvere, Liis Kolberg, Ivan Kuzmin, Tambet Arak, Priit Adler, Hedi Peterson, and Jaak Vilo. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research, 47(W1):W191–W198, 2019.

59. Hyunseung Kang, Anru Zhang, T Tony Cai, and Dylan S Small. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. Journal of the American Statistical Association, 111(513):132–144, 2016.
