On Mendelian Randomisation Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) and Estimation for Causal Inference
==============================================================================================================================

* Zhonghua Liu
* Ting Ye
* Baoluo Sun
* Mary Schooling
* Eric Tchetgen Tchetgen

## Abstract

Standard Mendelian randomization analysis can produce biased results if the genetic variant defining instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment. We provide novel identification conditions for the causal effect of a treatment in the presence of unmeasured confounding by leveraging an invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; and (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. Although assumptions (i) and (ii) have, respectively appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MR MiSTERI is shown to be particularly advantageous in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale. For estimation, we propose a simple and consistent three-stage estimator that can be used as preliminary estimator to a carefully constructed one-step-update estimator, which is guaranteed to be more efficient under the assumed model. In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved estimation accuracy. Both simulation studies and UK Biobank data analysis results demonstrate the robustness of the proposed MR MiSTERI method.

Keywords
*   Treatment effect on the treated
*   Mendelian randomization
*   horizontal pleiotropy
*   invalid instrument
*   unmeasured confounding
*   UK Biobank
*   weak instrument

## 1 Introduction

Mendelian randomization (MR) is an instrumental variable (IV) approach that uses genetic variants, for example, single-nucleotide polymorphisms (SNPs), to infer the causal effect of a modifiable risk treatment on a health outcome (Smith and Ebrahim 2003). MR has recently gained popularity in epidemiological studies because, under certain conditions, it can provide unbiased estimates of causal effects even in the presence of unmeasured exposure-outcome confounding. For example, findings from a recent MR analysis assessing the causal relationship between low-density lipoprotein cholesterol and coronary heart disease (Ference et al. 2017) in an observational study are consistent with the results of earlier randomized clinical trials (Scandinavian Simvastatin Survival Study Group 1994).

For a SNP to be a valid IV, it must satisfy the following three core assumptions (Didelez and Sheehan 2007; Lawlor et al. 2008):

**(A1)** IV relevance: the SNP must be associated (not necessarily causally) with the exposure;

**(A2)** IV independence: the SNP must be independent of any unmeasured confounder of the exposure-outcome relationship;

**(A3)** Exclusion restriction: the SNP cannot have a direct effect on the outcome variable not mediated by the treatment, that is, no horizontal pleiotropic effect can be present.

The causal diagram in Figure 1(a) graphically represents the three core assumptions for a valid IV. It is well-known that even when assumptions (A1)-(A3) hold for a given IV, the average causal effect of the treatment on the outcome cannot be point identified without an additional condition, the nature of which dictates the interpretation of the identified causal effect. Specifically, Angrist et al. (1996) proved that under (A1)-(A3) and a monotonicity assumption about the IV-treatment relationship, the so-called complier average treatment effect is nonparametrically identified. More recently, Wang and Tchetgen Tchetgen (2018) established identification of population average causal effect under (A1)-(A3) and an additional assumption of no unmeasured common effect modifier of the association between the IV and the endogenous variable, and the treatment causal effect on the outcome. A special case of this assumption is that the association between the IV and the treatment variable is constant on the additive scale across values of the unmeasured confounder (Tchetgen Tchetgen et al. (2020)). In a separate strand of work, Robins (1994) identified the effect of treatment on the treated under (A1)-(A3) and a so-called “no current-treatment value interaction” assumption (A.4a) that the effect of treatment on the treated is constant on the additive scale across values of the IV. In contrast, Liu et al. (2020) established identification of the effect of treatment on the treated (ETT) under (A1)-(A3), and an assumption (A.4b) that the selection bias function defined as the odds ratio association between the potential outcome under no treatment and the treatment variable, is constant across values of the IV. Note that under the IV DAG Figure 1(a), assumption (A1) is empirically testable while (A2) and (A3) cannot be refuted empirically without a different assumption being imposed (Glymour et al. 2012). Possible violation or near violation of assumption (A1) known as the weak IV problem poses an important challenge in MR studies as the associations between a single SNP IV and complex traits can be fairly weak (Staiger and Stock 1997; Stock et al. 2002). But massive genotyped datasets have provided many such weak IVs. This has motivated a rich body of work addressing the weak IV problem under a many weak instruments framework, from a generalized method of moments perspective given individual level data (Chao and Swanson 2005; Newey and Windmeijer 2009; Davies et al. 2015), and also from a summary-data perspective (Zhao et al. 2019a,b; Ye et al. 2019). Violation of assumption (A2) can occur due to population stratification, or when a selected SNP IV is in linkage disequilibrium (LD) with another genetic variant which has a direct effect on the outcome (Didelez and Sheehan 2007). Violations of (A3) can occur when the chosen SNP IV has a non-null direct effect on the outcome not mediated by the exposure, commonly referred to as horizontal pleiotropy and is found to be widespread (Solovieff et al. 2013; Verbanck et al. 2018). A standard MR analysis (i.e. based on standard IV methods such as 2SLS) with an invalid IV that violates any of those three core assumptions might yield biased causal effect estimates.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/30/2020.09.29.20204420/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/F1)

Figure 1: 
Directed acyclic graph (DAG) with an instrument (*Z*), an outcome (*Y*), a treatment (*A*) and unmeasured confounders (*U*). The left panel shows a valid Mendelian randomization study and the right panel shows violations of IV independence and exclusion restriction assumptions.

Methods to address possible violations of (A2) or (A3) given a single candidate IV are limited. Two methods have recently emerged as potentially robust against such violation under certain conditions. The first, known as MR-GxE, assumes that one has observed an environmental factor (E), which interacts with the invalid IV in its additive effects on the treatment of interest, and that such interaction is both independent of any unmeasured confounder of the exposure-outcome relationship, and does not operate on the outcome in view (Spiller et al. 2019, 2020). In other words, MR-GxE essentially assumes that while the candidate SNP (the G variable) may not be a valid IV, its additive interaction with an observed covariate constitutes a valid IV which satisfies (A1)-(A3). In contrast, MR GENIUS relies on an assumption that the residual variance of the first stage regression of the treatment on the candidate IV is heteroscedastic with respect to the candidate IV, i.e., the variance of the treatment depends on the IV, an assumption that may be viewed as strengthening of the IV relevance assumption (Lewbel 2012; Tchetgen Tchetgen et al. 2020). Interestingly, as noted by Tchetgen Tchetgen et al. (2020), existence of a GxE interaction that satisfies conditions (A1)-(A3) required by MR GxE had such an E variable (independent of *G*) been observed, necessarily implies that the heteroscedasticity condition required by MR GENIUS holds even when the relevant E variable is not directly observed. Furthermore, as it is logically possible for heteroscedasticity of the variance of the treatment to operate even in absence of any GxE interaction, MR GENIUS can be valid in certain settings where MR GxE is not.

In this paper, we develop an alternative robust MR approach for estimating a causal effect of a treatment subject to unmeasured confounding, leveraging a potentially invalid IV which fails to fulfill either IV independence or exclusion restriction assumptions. The proposed “Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification” (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; and (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. Note that assumption (iii) is empirically testable. Although assumptions (i) and (ii) have, respectively appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MR MiSTERI is shown to be particularly advantageous in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale, a setting in which both MR GxE and MR GENIUS can be severely biased whenever heteroscedastic first stage residuals can fully be attributed to latent heterogeneity in SNP-treatment association (Spiller et al. 2020). For estimation, we propose a simple and consistent three-stage estimator that can be used as a preliminary estimator to a carefully constructed one-step-update estimator, which is guaranteed to be more efficient under the assumed model. In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MaWII MR MiSTERI) approach for strengthened identification and improved accuracy. Simulation study results show that our proposed MR method gives consistent estimates of the causal parameter and the selection bias parameter with nominal confidence interval coverage with an invalid IV, and the accuracy is further improved with multiple weak invalid IVs. For illustration, we apply our method to the UK Biobank data set to estimate the causal effect of body mass index (BMI) on average glucose level. We also develop an R package freely available for public use at [https://github.com/zhonghualiu/MRMiSTERI](https://github.com/zhonghualiu/MRMiSTERI).

## 2 Methods

### 2.1 Identification with a Binary Treatment and a Possibly Invalid Binary IV

Suppose that we observe data *O**i* = (*Z**i*, *A**i*, *Y**i*) of sample size *n* drawn independently from a common population, where *Z**i*, *A**i*, *Y**i* denote a SNP IV, a treatment and a continuous outcome of interest for the *i*th subject (1 ≤ *i* ≤ *n*), respectively. In order to simplify the presentation, we drop the sample index *i* and do not consider observed covariates at this stage, although all the conclusions continue to hold within strata defined by observed covariates. Let *z, a, y* denote the possible values that *Z, A, Y* could take, respectively. Let *Y**az* denote the potential outcome, had possibly contrary to fact, *A* and *Z* been set to *a* and *z* respectively, and let *Y**a* denote the potential outcome had *A* been set to *a*. We are interested in estimating the ETT defined as ![Formula][1]</img>  To facilitate the exposition, consider the simple setting where both the treatment and the SNP IV are binary, then the ETT simplifies to *β* = *E*(*Y*1 − *Y* | *A* = 1). By the consistency assumption (Hernan and Robins 2020), we know that *E*(*Y*1 | *A* = 1) = *E*(*Y* | *A* = 1). However, the expectation of the potential outcome *Y* among the exposed subpopulation *E*(*Y* | *A* = 1) cannot empirically be observed due to possible unmeasured confounding for the exposure-outcome relationship. The following difference captures this confounding bias on the additive scale *Bias* = *E*(*Y* | *A* = 1) − *E*(*Y* | *A* = 0), which is exactly zero only when exposed and unexposed groups are exchangeable on average (i.e. under no confounding) (Hernan and Robins 2020), and is otherwise not null. With this representation and the consistency assumption, we have ![Formula][2]</img>  This simple equation implies that one can only estimate the sum of the causal effect *β* and the confounding bias but cannot tease them apart using the data (*A, Y*) only. With the availability of a binary candidate SNP IV *Z* that is possibly invalid, we can further stratify the population by *Z* to obtain under consistency: ![Formula][3]</img>  where *z* is equal to either 0 or 1, and *β*(*z*) = *E*(*Y**a* − *Y* | *A* = *a, Z* = *z*), *Bias*(*z*) = *E*(*Y* | *A* = 1, *Z* = *z*) − *E*(*Y* | *A* = 0, *Z* = *z*) denote the causal effect and bias in the stratified population with the IV taking value *z*. Note that there are only two equations but four unknown parameters: *β*(*z*), *Bias*(*z*), *z* = 0, 1. Therefore, the causal effect cannot be identified without imposing assumptions to reduce the total number of parameters to two.

Our first assumption extends the no current-treatment value interaction assumption (A.4a) originally proposed by Robins (1994), that the causal effect does not vary across the levels of the SNP IV so that *β*(*z*) is a constant as function of *z*. Formally stated:

**(B1)**Homogeneous ETT assumption: *E*(*Y**a*=1,*z* − *Y**a*=0,*z* | *A* = 1, *Z* = *z*) = *β*.

It is important to note that this assumption does not imply the exclusion restriction assumption (A3); it is perfectly compatible with presence of a direct effect of *Z* on *Y* (the direct arrow from *Z* to *Y* is present in Figure 1(b)), i.e. *E*(*Y**a*=0,*z*=1 − *Y**a*=0,*z*=0 |*A* = 1, *Z* = *z*) ≠ 0 as well as unmeasured confounding of the effects of Z on (A,Y), both of which we accommodate.

In order to state our second core assumption, consider the following data generating mechanism for the treatment, which encodes presence of unmeasured confounding by making dependence between *A* and *Y* explicit under a logistic model: ![Formula][4]</img>  This model can of course not be estimated directly from the observed data without any additional assumption because it would require the potential outcome *Y* be observed both on the untreated (guaranteed by consistency) and the treated. Nevertheless, this model implies that the conditional (on *Z*) association between *A* and *Y* on the odds ratio scale is ![Graphic][5]</img>. Together, *γ* and ![Graphic][6]</img> encode the selection bias due to unmeasured confounding on the log-odds ratio scale. If both *γ* and ![Graphic][7]</img> are zeros, then *A* and *Y* are conditionally (on *Z*) independent, or equivalently, there is no residual confounding bias upon conditioning on *Z*. Our second identifying assumption formally encodes assumption A4.b of Liu et al. (2020) of homogeneous odds ratio selection bias ![Graphic][8]</img>:

**(B2)** Homogeneous selection bias: *OR*(*Y* = *y*, *A* = *a* | *Z* = *z*) = exp(*γay*).

Assumption (B2) states that the selection bias on the odds ratio scale is homogeneous across levels of the IV. Thus, this assumption allows for the presence of unmeasured confounding which, upon setting ![Graphic][9]</img> is assumed to be on average balanced with respect to the SNP IV (on the odds ratio scale). Following Liu et al. (2020), we have that under (B2) ![Formula][10]</img>  Define *ε* = *Y* − *E*(*Y* |*A, Z*), and suppose that *ε* |*A, Z* ~ *N* (0, *σ*2(*Z*)), then after some algebra we have, ![Formula][11]</img>  The selection bias on the odds ratio scale does not vary with the levels of the IV, however in order to achieve identification, the bias term *γσ*2(*Z*) must depend on the IV. This observation motivates our third assumption,

**(B3)** Outcome heteroscedasticity, that is *σ*2(*Z*) cannot be a constant.

Assumption (B3) is empirically testable using existing statistical methods for detecting non-constant variance in regression models. As shown by Paré et al. (2010), observed or unobserved gene–gene (GxG) and/or gene–environment (GxE) can result in changes in variance of a quantitative trait per genotype. Paré et al. (2010) found several SNPs associated with the variance of inflammation markers. Therefore, GxE interaction can be inferred from genetic variants associated with phenotypic variability without the need of measuring environmental factors, usually termed variance quatitative trait loci (vQTL) screening (Marderstein et al. 2021). In particular, Wang et al. (2019) performed a genome-wide vQTL analysis of about 5.6 million variants on 348,501 unrelated individuals of European ancestry in the UK Biobank and identified 75 significant vQTLs for 9 quantitative traits (out of 13 traits under study). Hence, one can generally expect this novel identification assumption (B3) to hold in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale for the outcome, a setting in which both MR GxE and MR GENIUS can be severely biased. Under assumptions (B1)-(B3), we have ![Formula][12]</img>  Note that *σ*2(*z*) is the variance of Y given z, and thus can be estimated with no bias. For the binary IV *Z, σ*2(*Z* = 0) and *σ*2(*Z* = 1) are the variance of *Y* within each IV stratum and can easily be estimated using the sample variance in each group. We will describe estimating procedures in the next section for more general settings where *Z* is not necessarily binary. Importantly, in the present case, we now have two equations and two unknown parameters *β, γ*. Denote *D*(*Z* = *z*) = *E*(*Y* | *A* = 1, *Z* = *z*) − *E*(*Y* | *A* = 0, *Z* = *z*). We have the following main identification result.

Theorem 1.
*Under assumptions (B1)-(B3), the selection bias parameter and the causal effect of interest are uniquely identified as follows:* ![Formula][13]</img>  

In an observed sample, one can simply use the sample versions of unknown quantities to obtain estimates for *β* and *γ* respectively. Standard errors can be deduced by a direct application of the multivariate delta method, or by using resampling techniques such as the jackknife or the bootstrap. **Remark 1**. Note that as stated in the theorem, the causal parameter *β* can be estimated based on data among either *Z* = 0 or *Z* = 1 group. In practical settings, in order to improve efficiency, one may take either unweighted average of the two estimates as given in the theorem, or their inverse variance weighted average. In fact, one may fit a saturated linear model for the variance *σ*2(*Z* = *z*) = *b* + *bz*; We give a graphical illustration of our identification strategy as shown in Figure 2. By using the outcome heteroscedasticity assumption (B3), we can identify the selection bias parameter *γ* and then the causal effect parameter *β*. Without the assumption (B3) as shown by the red dashed line, we can only obtain *β* + *γb* and thus cannot tease apart the causal effect and the selection bias.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/30/2020.09.29.20204420/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/F2)

Figure 2: 
An illustration of our identification strategy for a binary treatment *A* and binary IV *Z*. The green line represents *E*(*Y* |*A* = 0, *Z* = *z*) and its difference between *Z* = 1 and *Z* = 0 is *θ**z*. The red dashed line (parallel to the green line) represents the hypothetical setting when *σ*2(*Z* = *z*) is constant, and the red solid line represents the setting when *σ*2(*Z* = *z*) is not a constant function in *Z*. Note that *b* and *b* are treated as known.

### 2.2 Identification with a Continuous Treatment and a Possibly Invalid Discrete IV

The identification strategy in the previous section extends to the more general setting of a continuous treatment and a more general discrete IV, for example, the SNP IV taking values 0, 1, 2 corresponding to the number of minor alleles. To do so, we introduce the notion of generalized conditional odds ratio function as a measure of the conditional association between a continuous treatment and a continuous outcome. To ground idea, consider the following model for the treatment free potential outcome: ![Formula][14]</img>  where *ϵ* is independent standard normal; then one can show that the generalized conditional odds ratio function associated with *Y* and *A* given *Z* with (*Y* = 0, *A* = 0) taken as reference values (Chen 2007) is given by ![Formula][15]</img>  By Assumption (B2), we have ![Formula][16]</img>  Therefore, ![Formula][17]</img>  For practical interpretation, we assume *A* has been centered so that *A* = 0 actually represents average treatment value in the study population. Assume the following model *E*(*Y* | *A* = 0, *Z* = *z*) = *θ* + *θ**z**z* and a log linear model for *σ*2(*Z*) such that ![Formula][18]</img>  The conditional distribution of *Y* given *A* and *Z* is ![Formula][19]</img>  Then we can simply use standard maximum likelihood estimation (MLE) to obtain consistent and fully efficient estimates for all parameters under the assumed model. However, the likelihood may not be log-concave and direct maximization of the likelihood function with respect to all parameters might not always converge. Instead, we propose a novel three-stage estimation procedure which can provide a consistent but inefficient preliminary estimator of unknown parameters. We then propose a carefully constructed one-step update estimator to obtain a consistent and fully efficient estimator. Our three-stage estimation method works as follows.

**Stage 1:** Fit the following linear regression using standard (weighted) least-squares ![Formula][20]</img>  We obtain parameter estimates ![Graphic][21]</img>, with corresponding estimated residuals ![Graphic][22]</img>.

**Stage 2:** Regress the squared residuals ![Graphic][23]</img> on (1, *Z*) in a log-linear model and obtain parameter estimates ![Graphic][24]</img> and ![Graphic][25]</img>.

**Stage 3:** Regress *Y* − *Ê*(*Y* | *A* = 0, *Z*) on *A* and ![Graphic][26]</img> without an intercept term and obtain the two corresponding regression coefficients estimates ![Graphic][27]</img> and ![Graphic][28]</img> respectively. Alternatively, we may also fit ![Graphic][29]</img>.

The proposed three-stage estimation procedure is computationally convenient, appears to always converge and provides a consistent estimator for all parameters under the assumed model. Next we propose the following one-step update to obtain a fully efficient estimator. Consider the following normal model ![Formula][30]</img>  Denote Θ = (*β, γ, η*, *η**z*, *θ*, *θ**z*) and denote the log-likelihood function as *l*(Θ; *Y, A, Z*) for an individual. With a sample of size *n*, the log-likelihood function is ![Formula][31]</img>  The score function *S**n*(Θ) = *∂l**n*(Θ)*/∂*Θ is defined as the first order derivative of the log-likelihood function with respect to Θ, and use *i**n*(Θ) = *∂*2*l**n*(Θ)*/∂*Θ*∂*Θ*T* to denote the second order derivative of the log-likelihood function. Suppose we have obtained a consistent estimator ![Graphic][32]</img> of Θ using the three-stage estimating procedure, then one can show that a consistent and fully efficient estimator is given by the one-step update estimator ![Graphic][33]</img> ![Formula][34]</img>  **Remark 2**. In biomedical studies, continuous outcomes of interest are often approximately normally distributed, and the normal linear regression is also routinely used by applied researchers.

When the normal assumption is violated, a typical remedy is to apply a transformation to the outcome prior to modeling it, such as the log-transformation or the more general Box-Cox transformation to achieve approximate normality. Alternatively, in section 2.4 we propose to model the error distribution as a more flexible Gaussian mixture model which is more robust than the Gaussian model. Alternatively, as described in the Supplemental Materials, one may consider a semiparametric location-scale model which allows the distribution of standardized residuals to remain unrestricted, a potential more robust approach although computationally more demanding.

### 2.3 Estimation and Inference under Many Weak Invalid IVs

Weak identification bias is a salient issue that needs special attention when using genetic data to strengthen causal inference, as most genetic variants typically have weak association signals. When many genetic variants are available, we recommend using the conditional maximum likelihood estimator (CMLE) based on (1), where *Z* is replaced with a multi-dimensional vector. Let ![Graphic][35]</img> be the solution to the corresponding score functions, i.e., ![Graphic][36]</img>; let ![Formula][37]</img>  be the Fisher information matrix based on the conditional likelihood function for one observation. Let *k* be the total number of parameters in Θ, which is equal to 4 + 2*p* when *Z* is *p* dimensional. When *λ*min {*nI*1(Θ)}*/k* → ∞ with *λ*min {*nI*1(Θ)} being the minimum eigenvalue of *nI*1(Θ), we show in the Supplementary Materials (Theorem 2) that ![Graphic][38]</img> is consistent and asymptotically normal as the sample size goes to infinity, and satisfies ![Formula][39]</img>  In other words, the CMLE is robust to weak identification bias and the usual inference procedure can be directly applied.

The key condition *λ*min{*nI*1(Θ)}*/k* → ∞ warrants more discussion. Note that the quantity *λ*min {*nI*1(Θ)}*/k* measures the ratio between the amount of information that a sample of size *n* car-ries about the unknown parameter and the number of parameters. In classical strong identification scenarios where the minimum eigenvalue of *I*1(Θ) is assumed to be lower bounded by a constant, then the condition simplifies to that the number of parameter is small compared with sample size, i.e., *n/k* → ∞. However, when identification is weak, the minimum eigenvalue of *I*1(Θ) can be small. In practice, the condition *λ*min{*nI*1(Θ)}*/k* → ∞ can be evaluated by ![Formula][40]</img>  which is the ratio between the minimum eigenvalue of the negative Hessian matrix and the number of parameters. We remark that the condition stated in the Supplementary Materials (Theorem 2) is more general and applies to general likelihood-based methods, which, for example, implies the condition for consistency for the profile likelihood estimator (MR.raps) in Zhao et al. (2019b). In practice, we recommend checking that the ![Graphic][41]</img> is at least greater than 10 based on our simulation studies. For valid statistical inference, a consistent estimator for the variance covariance matrix for ![Graphic][42]</img> is simply the negative Hessian matrix ![Graphic][43]</img>, whose corresponding diagonal element estimates the variance of the treatment effect estimator ![Graphic][44]</img>. Other variants of the CMLE can also be used, for example, the one-step iteration estimator ![Graphic][45]</img> in (2) is asymptotically equivalent to the CMLE ![Graphic][46]</img> as long as the initial estimator ![Graphic][47]</img> is ![Graphic][48]</img> consistent (Shao 2003).

### 2.4 More Robust Gaussian Mixture Model for the Outcome

As mentioned in Remark 2, although many continuous outcomes can be well modelled using normal linear model, however sometimes, a more robust model for the outcome error term is desired. As shown by Verbeke and Lesaffre (1996), a Gaussian mixture model is more general and thus more robust than the normal model for modelling an outcome error distribution. Consider the general location-scale model ![Formula][49]</img>  where *ϵ* ⊥ *A, Z*. Under assumptions (B1)-(B3), the conditional mean function is given by ![Formula][50]</img>  where *F* (·) denotes the cumulative distribution function of *ϵ*. We assume that the outcome error distribution can be reasonably approximated by a Gaussian mixture model with enough components (Verbeke and Lesaffre 1996). Specifically, let ![Graphic][51]</img> where *ϕ**k*(·) is the normal density with mean *µ**k* and variance ![Graphic][52]</img> satisfying the constraints ![Formula][53]</img>  The conditional mean function with Gaussian mixture error is given by ![Formula][54]</img>  Where ![Graphic][55]</img>. The conditional distribution is ![Graphic][56]</img>. In practice, estimation of (*β, γ*), which are of primary interest, may proceed under a user-specified integer value *K <* as well as parametric models for *E*(*Y* |*A* = 0, *Z* = *z*) and *σ*(*Z* = *z*) via an alternating optimization algorithm which we describe in the Supplementary Materials.

## 3 Simulation Studies

In this section, we conduct extensive simulation studies to evaluate the finite-sample performance of MR-MiSTERI compared to its main competitors, standard two stage least squares (TSLS) and MR GENIUS. We considered sample sizes of *n* = 10000, 30000, 100000. The IV *Z* is generated from a Binomial distribution with probability equal to 0.3 (minor allele frequency). The treatment *A* is generated from a standard normal distribution *N* (0, 1). The outcome *Y* is also generated from a normal distribution with mean and variance compatible with assumptions B1-B3: ![Formula][57]</img>  where *η**z* is set to be 0.2, 0.15, 0.1, 0.05. Under this model, we have that the ETT is equal to *β* = 0.8 and *γ* = 0.2. We generated in total 1000 such data sets and applied our proposed method to each of them along with TSLS and MR GENIUS. We found that TSLS gives severely biased estimates of the causal effect with estimates as extreme as −195.72 and standard errors as large as 381.57. MR GENIUS is also severely biased as the average point estimate for the causal effect is −14.99. Numerical results for TSLS and MR GENIUS are expected as their corresponding identification conditions are not met in the simulation setup. Specifically, both exclusion restriction and heteroscedasticity of the treatment with respect to *Z* assumptions do not hold. Simulation results for MR MiSTERI are summarized in Table 1. We found that when sample size is 10000, the bias and standard error of the causal effect and selection bias parameter estimates become larger when the IV strength *η**z* decreases from 0.2 to 0.1. The causal effect is less sensitive to the IV strength compared to the selection bias parameter *γ*. As we increase sample size, the bias decreases and becomes negligible at n=100000. Confidence intervals achieved the nominal 95% coverage.

View this table:
[Table 1:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/T1)

Table 1: 
Simulation results for estimating the confounding bias parameter *γ* and the ETT *β* using the one-step estimator approach (without covariates) based on 1000 Monte Carlo experiments with a range of sample sizes n. ![Graphic][58]</img> is the averaged point estimates of *γ*, and bias is calculated in percentage format. Likewise for ![Graphic][59]</img>. SE is the averaged standard error. SD stands for the sample standard deviation of the 1000 point estimates for *γ* and *β*. Coverage is calculated as the proportion of 95% confidence intervals that contains the true parameter value among those 1000 experiments. We vary the value of *η**z* in *σ*2(*Z*) = *exp*(*η* + *η**z**Z*)) to assess the impact of decreasing IV strength on the estimation results.

The second simulation study is designed to evaluate the finite sample performance of the proposed three-stage estimator and the CMLE in the presence of many weak invalid IVs. The sample size is set to be 100000, each IV *Z**j*, *j* = 1, …, *p*, is still generated to take values 0,1,2 with minor allele frequency equal to 0.3. The treatment *A* is generated from standard normal distribution.

The outcome *Y* is generated from a normal distribution with mean and variance under B1-B3: ![Formula][60]</img>  Simulation results are presented in Table 2, where the standard error (SE) for the three-stage estimator is the bootstrap estimator approximated using 100 Monte Carlo simulations, the SE for the CMLE is obtained from the inverse Hessian matrix discussed in Section 2.3. From results in Table 2, the three-stage estimator has negligible bias when *p* = 20, but shows evidently larger bias when *p* grows to 50, which is also reflected by the fact that coverage of 95% CI is substantially lower than its nominal level. In contrast, the CMLE shows negligible bias in both scenarios. The standard errors for the CMLE are close to Monte Carlo standard deviation, and the CMLE estimators show nominal coverage in both simulation scenarios. These observations agree with our theoretical assessment that the CMLE is efficient and is robust to many weak invalid IVs. In particular, it is consistent and asymptotically normal as long as the condition ![Graphic][61]</img> is reasonably large. Based on empirical evaluations (in the Supplementary Materials), we recommend checking that the ![Graphic][62]</img> value is at least greater than 10.

View this table:
[Table 2:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/T2)

Table 2: 
Simulation results for estimating the confounding bias parameter *γ* = 0.2 and the ETT *β* = 0.8 using the three-stage estimator and the CMLE based on 1000 Monte Carlo experiments, with *n* = 100000 and varying *p*. The third column is the averaged ![Graphic][63]</img>, which is the consistency and asymptotic normality condition for the CMLE and is preferably to be large. ![Graphic][64]</img> is the averaged point estimates of *γ*, and bias is calculated in percentage format. Likewise for ![Graphic][65]</img>. SE is the averaged standard error. SD stands for the sample standard deviation of the 1000 point estimates for *γ* and *β*. Coverage is calculated as the proportion of 95% confidence intervals that contains the true parameter value among those 1000 experiments.

We similarly generate the treatment *A* from the standard normal distribution, the IV *Z* which takes values in {0, 1, 2} with minor allele frequency equal to 0.3. However we generate *Y* = *E*(*Y* | *A,Z*) + *σ*(*Z*)*ϵ* where ![Graphic][66]</img>, *σ*2(*Z*) = exp(0.1 + *η**z**Z*), and the error *ϵ* is generated from a Gaussian mixture distribution with two components, ![Formula][67]</img>  with the parameter values (*π*1, *π*2) = (0.4, 0.6), (*µ*1, *µ*2) = (−0.6, 0.4) and (*δ*1, *δ*2) = (0.5, 1.049). We vary the value of *η**z* in the set 0.1, 0.25, 0.5 to assess the impact of IV strength. The results using the proposed Gaussian mixture method with *K* = 2 are summarized in Table 3. There is noticeable finite-sample bias and under-coverage when *η**z* = 0.1, especially for estimation of the selection bias parameter *γ*, but the performance improves with larger *η**z* or sample size.

View this table:
[Table 3:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/T3)

Table 3: 
Simulation results for estimating the confounding bias parameter *γ* = 0.2 and the ETT *β* = 0.8 under Gaussian mixture error based on 1000 replicates. See the caption of Table 1 for description of the summary statistics.

## 4 An Application to the Large-Scale UK Biobank Study Data

UK Biobank is a large-scale ongoing prospective cohort study that recruited around 500,000 participants aged 40-69 in 2006-2010. Participants provided biological samples, completed questionnaires, underwent assessments, and had nurse led interviews. Follow up is chiefly through cohort-wide linkages to National Health Service data, including electronic, coded death certificate, hospital and primary care data (Sudlow et al. 2015). To control for population stratification, we restricted our analysis to participants with self-reported and genetically validated white British ancestry. For quality control, we also excluded participants with (1) excess relatedness (more than 10 putative third-degree relatives) or (2) mismatched information on sex between genotyping and self-report, or (3) sex-chromosomes not XX or XY, or (4) poor-quality genotyping based on heterozygosity and missing rates *>* 2%.

To illustrate our methods, we extracted a total of 289 010 white British subjects from the UK Biobank data with complete measurements in body mass index (BMI) and glucose levels. We further selected BMI associated SNP potential IVs among the list provided by Sun et al. (2019). We found seven SNPs (in Table 4) that are predictive of the residual variance of glucose levels in our Stage 2 regression (with p-values *P <* 0.01), that is, those seven SNPs might satisfy assumption (B3). The average BMI was 27.39 *kg/m*2 (SD: 4.75 *kg/m*2), and the average glucose level was 5.12 *mmol/L* (SD: 1.21 *mmol/L*). We applied MR MiSTERI to this data with the goal of evaluating the causal relationship between BMI as treatment and glucose level as outcome; analysis results are summarized in Table 4.

View this table:
[Table 4:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/T4)

Table 4: 
UK Biobank data analysis results. Columns 2-5 contain the results based on the model (1), columns 6-9 contain results based on the Gaussian mixture model described in Section 2.4, columns 10-11 contain results using the standard two-stage least squares using the R package *ivreg*.

For comparison purposes, we also include analysis results for standard two-stage least square (TSLS) implemented in the R package *ivreg* (Fox et al. 2020). The allele score for the seven SNPs selected is defined as the signed sum of their minor alleles, where the sign is determined by the regression coefficient in our stage 2 regression. We also implemented MR-GENIUS method (Tchetgen Tchetgen et al. 2020) using the same seven SNPs, the causal effect of BMI on glucose was estimated as 0.046 (se: 0.01, 95% CI: (0.0264,0.0656), *P*-value: 2.24 × 10−6). Crude regression analysis by simply regressing glucose on BMI gives an estimate of 0.041 (se: 4.65 × 10−4, 95% CI: (0.0401, 0.0419), *P*-value: *<* 2 × 10−16). Further, we included all seven SNPs into a conventional regression together with BMI, and obtained a similar estimate 0.039 (se: 4.65 × 10−4, 95% CI: (0.0381, 0.0399), *P*-value: *<* 2 × 10−16). It is interesting to consider results obtained by selecting each SNP as single candidate IV as summarized in the table. For instance, take the SNP rs2176040, the corresponding causal effect estimate is ![Graphic][68]</img> (se: 0.0139, 95% CI: (0.0138, 0.0682), *P*-value: 0.003), and the selection bias parameter is estimated as ![Graphic][69]</img> (se: 0.0098, 95% CI: (0.0398, 0.0782), *P*-value: 2.93 × 10−9). However, the TSLS method gives causal effect estimate −1.619 (se: 2.04, 95% CI: (−5.6174, 2.3794), *P*-value: 0.428). These conflicting findings may be due to SNP rs2176040 being an invalid IV, in which case TSLS likely yields biased results. We further combine 20 SNPs, 13 of which are weakly associated with the residual variance of glucose levels, we obtain using CMLE a causal effect estimate of BMI on glucose ![Graphic][70]</img> (se: 0.0025, 95% CI: (0.0231, 0.0329), *P*-value: 6.87 × 10−31), and a selection bias estimate of ![Graphic][71]</img> (se: 0.0017, 95% CI: (0.0057, 0.0123), *P*-value: 2.82 × 10−7). The consistency and asymptotic normality condition ![Graphic][72]</img>, greater than our recommended value 10. This final analysis suggests a somewhat smaller treatment effect size than all the other MR methods, while providing strong statistical evidence of a positive effect with a relatively small but statistically significant amount of confounding bias detected.

## 5 Discussion

In this paper, we have proposed a novel MR MiSTERI method for identification and inference about causal effects in observational studies. MR MiSTERI leverages a possible association between candidate IV SNPs and the variance of the outcome in order to disentangle confounding bias from the causal effect of interest. Importantly, MR MiSTERI can recover unbiased inferences about the causal effect in view even when none of the candidate IV SNPs is a valid IV. Key assumptions which the approach relies on include no additive interaction involving a candidate IV SNP and the treatment causal effect in the outcome model, and a requirement that the amount of selection bias (measured on the odds ratio scale) remains constant as a function of SNP values. Violation of either assumption may result in incorrect inferences about treatment effect. Although not pursued in this paper, robustness to such violation may be assessed via a sensitivity analysis in which the impact of various departures from the assumption may be explored by varying sensitivity parameters. Alternatively, as illustrated in our application, providing inferences using a variety of MR methods each of which relying on a different set of assumptions provides a robustness check of empirical findings relative to underlying assumptions needed for a causal interpretation of results. We did not include the MR GxE method for numerical comparison because it requires users to specify an observed covariate interacting with the SNP IV. For a particular data analysis problem as we did in Section **??**, we typically do not have sufficient prior knowledge about which covariate (out of thousands) might interact with a given SNP. In contrast, MR GENIUS method requires that the treatment variable admits a residual which is heteroscedastic with respect to the IV, while MR MiSTERI requires that the outcome admits a residual that is heteroscedastic with respect to the IV. In practice, one can in principle perform a formal statistical test to determine which method might be most suitable for a particular data analysis problem. Note that both MR MiSTERI and MR GENIUS methods do not require us to specify any covariate a priori interacting with the candidate SNP IV, and hence are more convenient for routine data analyses.

A notable robustness property of CMLE estimator of MR MiSTERI is that as we formally establish, it is robust to many weak IV bias, thus providing certain theoretical guarantees of reliable performance in many common MR settings where one might have available a relatively large number of weak candidate IVs, many of which may be subject to pleiotropy. While the proposed methods require a correctly specified Gaussian model for the conditional distribution of *Y* given *A* and *Z*, such an assumption can be relaxed. In fact, we also consider a Gaussian mixture model as a framework to assess and possibly correct for potential violation of this assumption. Furthermore, in the Supplementary Materials, we briefly describe a semiparametric three-stage estimation approach under the location-scale model which allows the distribution of standardized residuals to remain unrestricted. It is of interest to investigate robustness and efficiency properties of this more flexible estimator, which we plan to pursue in future work. It is likewise of interest to investigate whether the proposed methods can be extended to the important setting of a binary outcome, which is of common occurrence in epidemiology and other health and social sciences.

## Data Availability

This research has been conducted using the UK Biobank resource ([https://www.ukbiobank.ac.uk](https://www.ukbiobank.ac.uk)) under application number 44430.

[https://www.ukbiobank.ac.uk](https://www.ukbiobank.ac.uk) 

## Supplementary Materials

Proof of Theorem 1
*Proof*. Under assumptions (B1) - (B3), we have ![Formula][73]</img>  

Hence, we have the following two equations ![Formula][74]</img>  Solving these two equations simultaneously gives the result in Theorem 1.□

## Estimation and Inference under Weak Identification

Consider the following normal model also given in the main text ![Formula][75]</img>  Let ![Formula][76]</img>  be the Fisher information matrix for a sample of size *n, λ*min{*I**n*(Θ)} be the minimum eigenvalue of *I**n*(Θ), *tr*(*A*) be the trace of a square matrix *A*.

Condition 2 summaries the regularity conditions.

**Condition 2** (Regularity Conditions). *For every o* = (*y, a, z*) *in the support of O* = (*Y, A, Z*), *the likelihood function denoted by f* (Θ *o*) *is twice continuously differentiable in* Θ *and satisfies* ![Formula][77]</img>  *for ψ*Θ(*o*) = *f*Θ(*o*) *and* = *∂f*Θ(*o*)*/∂*Θ, *where ν is a suitable measure; there is a positive ϵ such that the Fisher information matrix I**n*(Γ) *is positive definite for* Γ : ‖ Γ− Θ ‖ *< ϵ and for any given* Θ, *there exists a positive number ϵ and a positive integrable function h, such that* ![Formula][78]</img>  *for all o in the support of O, where* ![Graphic][79]</img> *for any matrix A*.

Theorem 3 establishes the consistency and asymptotic normality of solutions to the score function.

Theorem 3.
*(a) Under Condition 2 and assume that the observed data O**i*, *i* = 1, …, *n are in-dependent and identically distributed, if k/*[*λ*min{*nI*1(Θ)}] → 0, *then the sequence of conditional maximum likelihood estimators* ![Graphic][80]</img> *is consistent, i*.*e*.,![Graphic][81]</img>.

*(b) Any consistent sequence* ![Graphic][82]</img>*such that* ![Graphic][83]</img> *is asymptotically normal, and as n* → ∞, ![Formula][84]</img>  

In fact, the general consistency condition in Theorem 3 can be written as *λ*min{*nI*1(Θ)} → ∞ and ![Formula][85]</img>  for general likelihood-based methods, which, for example, applies to the profile-likelihood estimator in Zhao et al. (2019b). Specifically, in Zhao et al. (2019b), in their notations, *k* = 1, *E* {*S**n*(Θ)*S**n*(Θ)*T*} = *V*1, *I**n*(Θ) = *V*2, and ![Graphic][86]</img>, and thus, condition (S4) becomes ![Graphic][87]</img>, which is the condition for consistency in Zhao et al. (2019b, Theorem 3.1).

*Proof*. First, notice that the condition *k/*[*λ*min{*nI*1(Θ)}] → 0 implies *λ*min{*nI*1(Θ)} → ∞. Let *B**n*(*c*) = {Γ : ‖ [*I**n*(Θ)]1*/*2(Γ − Θ) ‖ ≤ *c*} for *c >* 0, where ![Graphic][88]</img> for any matrix *A*. Since *B**n*(*c*) shrinks to Θ as *n* → ∞, the consistency result in Theorem 3(a) is implied by the fact that for any *e >* 0, there exists *c >* 0 and *n* *>* 1 such that ![Formula][89]</img>  and ![Graphic][90]</img> is unique for *n* ≥*n*, where *∂B**n*(*c*) is the boundary of *B**n*(*c*). For Γ ∈*∂B**n*(*c*), the Taylor expansion gives ![Formula][91]</img>  where *λ* = [*I**n*(Θ)]1*/*2(Γ ™ Θ)*/c* satisfying ‖*λ‖* = 1, *i**n*(Γ) = *∂*2*l**n*(Γ)*/∂*Γ*∂*Γ*T*, and Γ∗ lies between Γ and Θ. Note that ![Formula][92]</img>  which follows from the dominated convergence theorem combined with the facts that (a) *i*1(Γ) = *∂l*1(Γ)*/∂*Γ*∂*Γ*T* is continuous in a neighborhood of Θ for any fixed observation; (b) *B**n*(*c*) shrinks to {Θ} from *λ*min(*I**n*(Θ)) → ∞; and (c) for sufficiently large *n*, ![Graphic][93]</img>is bounded by an integrate function under Condition 2. By the strong law of large numbers, ![Graphic][94]</img>These results imply that ![Formula][95]</img>  Note that max*λ*{*λ**T* [*I**n*(Θ)]−1*/*2*S**n*(Θ)} = ‖*I**n*(Θ)−1*/*2*S**n*(Θ) ‖ with ![Formula][96]</img>  Therefore, (S5) follows from (S7) and ![Formula][97]</img>  from choosing *c* such that ![Graphic][98]</img> is large enough, which is possible because ![Formula][99]</img>  For *n* ≥ *n*, ![Graphic][100]</img> is unique because from Condition 2, the Fisher information *I**n*(Γ) is positive definite in a neighborhood of Θ.

(b) Using the mean value theorem for vector-valued functions, we obtain that ![Formula][101]</img>  Note that using a similar argument as in (S6), when ![Graphic][102]</img>, ![Formula][103]</img>  Since ![Graphic][104]</img> and *I* (Θ) = *nI* (Θ), ![Formula][105]</img>  This and Slutsky’s theorem imply that ![Graphic][106]</img> is asymptotically equivalent with ![Formula][107]</img>  by the central limit theorem for i.i.d. samples. □

### Empirical guidelines for asymptotics

In practice, it is important to know what value of ![Graphic][108]</img> would be considered large enough for the CMLE to perform well, i.e., to have small bias and adequate coverage probability, in presence of many weak invalid IVs.

We follow the same setting as the second simulation study in Section 3 of the main article. We consider two scenarios: (i) *n* = 105, *p* = 20 in Figure 3; and (ii) *n* = 106, *p* = 5 in Figure 4. For each of the 1000 simulation runs, we plot ![Graphic][109]</img> against ![Graphic][110]</img>. We also plot the two standard error bands centered at *β* (shaded area). From Figures 3–4, ![Graphic][111]</img> are all larger than 10, and ![Graphic][112]</img> are inside the shaded area, close to the true value *β* = 0.8. Note that because ![Graphic][113]</img> (which relates to the Hessian matrix) is a function of ![Graphic][114]</img>, we see that ![Graphic][115]</img> is not symmetric around the true value *β* = 0.8. But this usually does not affect using ![Graphic][116]</img> as a condition to evaluate whether the CMLE would perform well. We have also simulated extensively under other settings, we find that the CMLE has good performance when ![Graphic][117]</img>.

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/30/2020.09.29.20204420/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/F3)

Figure 3: 
Evaluation of the consistency and asymptotic normality condition for the CMLE under the same setting as the second simulation study in Section 3 of the main article, with *n* = 105, *p* = 20. The x-axis plots the condition ![Graphic][118]</img>, and the y-axis plots values of the CMLE in 1000 simulations. The shaded area represents the two-sided error bands centered at *β* = 0.8.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/30/2020.09.29.20204420/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2021/03/30/2020.09.29.20204420/F4)

Figure 4: 
Evaluation of the consistency and asymptotic normality condition for the CMLE under the same setting as the second simulation study in Section 3 of the main article, with *n* = 106, *p* = 5. The x-axis plots the condition ![Graphic][119]</img>, and the y-axis plots values of the CMLE in 1000 simulations. The shaded area represents the two-sided error bands centered at *β* = 0.8.

### Estimation method under Gaussian mixture error

We describe the alternating optimization procedure with user-specified *K* and parametric models *σ*(*Z* = *z*; *η*) and *E*(*Y* | *A* = 0, *Z* = *z*; *θ*) ≡ *µ*(*z*; *θ*) indexed by finite-dimensional parameters *η* and *θ* respectively.

1.  Initialization step: maximize the standard Gaussian location scale model ![Formula][120]</img>  with respect to (*β, θ, η, γ*) by maximum likelihood estimation, to obtain ![Graphic][121]</img>. Construct the standardized residuals ![Formula][122]</img>  

2.  Maximize the Gaussian mixture location scale model ![Formula][123]</img>  with respect to (*π*1, …, *π**K*, *µ*1, …, *µ**K*, *δ*1, …, *δ**K*) using constrained maximum likelihood estimation to obtain ![Graphic][124]</img>. Construct ![Formula][125]</img>  where ![Formula][126]</img>  

3.  Minimize ![Graphic][127]</img> using the least squares method with offset ![Graphic][128]</img> to obtain new ![Graphic][129]</img>, followed by regressing the squared residuals to obtain new ![Graphic][130]</img>. Construct the standardized residuals ![Formula][131]</img>  based on the new values![Graphic][132]</img>

4.  Iterate between steps (ii) and (iii) until change in log-likelihood derived at step (ii) falls below a user-specified tolerance level. In this simulation we iterate until |log *LL**j* − log *LL**j*−1 |*/*| log *LL**j*−1 *<* 0.001 at the *j*-th iteration.

Asymptotic variance is estimated based on standard M-estimation theory by stacking the estimating functions in steps (ii) and (iii), evaluated at the final parameter values at convergence.

### Semiparametric three-stage estimation

Consider the general location-scale model ![Formula][133]</img>  where *ϵ* ⊥ *A, Z*. Under assumptions (B1)-(B3), the conditional mean function is given by ![Formula][134]</img>  A semiparametric three-stage estimator of (*β, γ*) may be implemented via the following steps:

1.  Fit the regression model *µ*(*a, z*) = *m**a* (*a*)+*m**z* (*z*)+*m**az* (*a, z*), where 0 = *m**z* (0) = *m**az* (*a*, 0) = *m**az* (0, *z*) for identification, using a nonparametric method such as generalized additive model. For example, if the support of *Z* is 0, 1, 2, then the nonparametric model for the outcome is of the form ![Graphic][135]</img> with *m*1 (0) = *m*2 (0) = 0; a saturated model may be considered if *A* also has finite support. Let ![Graphic][136]</img> denote the resulting estimator of the mean *µ*(*a, z*).

2.  Using the residuals from (i), fit a nonparametric model for the conditional variance *σ*2 (*z*). If the support of *Z* is {0, 1, 2}, then we can specify the saturated model ![Graphic][137]</img>. Let ![Graphic][138]</img> denote the resulting estimator of *σ*2(*z*).

3.  Define ![Graphic][139]</img> and let ![Formula][140]</img> 
    
    The proposed semiparametric three-stage estimator of (*β, γ*) is ![Formula][141]</img> 

We note that a potentially more efficient approach is to maximize the log-likelihood for the model using a kernel estimator for the density of standardized residuals from step (iii). Specifically, let ![Formula][142]</img>  and define ![Formula][143]</img>  where ![Graphic][144]</img>is a kernel density estimator with bandwidth *h >* 0. The alterna-tive estimator of (*β, γ*) is given by ![Formula][145]</img>  

## Acknowledgements

Dr. Zhonghua Liu’s research is supported by the Start-up research fund (000250348) of the University of Hong Kong. Dr. Eric Tchetgen Tchetgen is supported by grants R01AG065276, R01CA222147 and R01AI27271. Dr. Baoluo Sun is supported by the National University of Singapore Start-up Grant (R-155-000-203-133). This research has been conducted using the UK Biobank resource ([https://www.ukbiobank.ac.uk](https://www.ukbiobank.ac.uk)) under application number 44430. Conflict of interest: None declared.

## Footnotes

*   * ett{at}wharton.upenn.edu

*   Revise tables, wordings and add more discussions.

*   Received September 29, 2020.
*   Revision received March 29, 2021.
*   Accepted March 30, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2291629&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996UP55200002&link_type=ISI) 

2.  Chao, J. C. and Swanson, N. R. (2005). Consistent estimation with a large number of weak instruments. Econometrica, 73(5):1673–1692.
    
    
3.  Chen, H. Y. (2007). A semiparametric odds ratio model for measuring association. Biometrics, 63(2):413–421.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1541-0420.2006.00701.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17688494&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

4.  Davies, N. M., von Hinke Kessler Scholder, S., Farbmacher, H., Burgess, S., Windmeijer, F., and Smith, G. D. (2015). The many weak instruments problem and mendelian randomization. Statistics in Medicine, 34(3):454–468.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.6358&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25382280&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

5.  Didelez, V. and Sheehan, N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research, 16(4):309–330.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0962280206077743&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17715159&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000248753000001&link_type=ISI) 

6.  Ference, B., Kastelein, J., Ginsberg, H., Chapman, M., Nicholls, S., Ray, K., Packard, C., Laufs, U., Brook, R., Oliver-Williams, C., Butterworth, A., Danesh, J., Smith, G., Catapano, A., and Sabatine, M. (2017). Association of genetic variants related to cetp inhibitors and statins with lipoprotein levels and cardiovascular risk. JAMA - Journal of the American Medical Association, 318(10):947–956.
    
    
7.  Fox, J., Kleiber, C., and Zeileis, A. (2020). ivreg: Two-Stage Least-Squares Regression with Diagnostics. R package version 0.5-0.
    
    
8.  Glymour, M. M., Tchetgen Tchetgen, E. J., and Robins, J. M. (2012). Credible Mendelian Randomization Studies: Approaches for Evaluating the Instrumental Variable Assumptions. American Journal of Epidemiology, 175(4):332–339.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwr323&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22247045&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000300042200011&link_type=ISI) 

9.  Hernan, M. A. and Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
    
    
10. Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., and Davey Smith, G. (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine, 27(8):1133–1163.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.3034&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17886233&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

11. Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1):67–80.
    
    
12. Liu, L., Miao, W., Sun, B., Robins, J., and Tchetgen Tchetgen, E. (2020). Identification and inference for marginal average treatment effect on the treated with an instrumental variable. Statistica Sinica.
    
    
13. Marderstein, A. R., Davenport, E. R., Kulm, S., Van Hout, C. V., Elemento, O., and Clark, A. G. (2021). Leveraging phenotypic variability to identify genetic interactions in human phenotypes. American journal of human genetics, 108:49–67.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2020.11.016&link_type=DOI) 

14. Newey, W. K. and Windmeijer, F. (2009). Generalized method of moments with many weak moment conditions. Econometrica, 77(3):687–719.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3982/ECTA6224&link_type=DOI) 

15. Paré, G., Cook, N. R., Ridker, P. M., and Chasman, D. I. (2010). On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the women’s genome health study. PLoS Genet, 6(6):e1000981.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1000981&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20585554&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

16. Robins, J. M. (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods, 23(8):2379–2412.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/03610929408831393&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1994NU67400017&link_type=ISI) 

17.  Scandinavian Simvastatin Survival Study Group (1994). Randomised trial of cholesterol lowering in 4444 patients with coronary heart disease: the scandinavian simvastatin survival study (4s). The Lancet, 344(8934):1383–1389.
    
    
18. Shao, J. (2003). Mathematical Statistics. Springer-Verlag New York.
    
    
19. Smith, G. D. and Ebrahim, S. (2003). ‘mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32(1):1–22.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyg070&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12689998&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000182341300001&link_type=ISI) 

20. Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M., and Smoller, J. W. (2013). Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics, 14:483–495.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrg3461&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23752797&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

21. Spiller, W., Hartwig, F. P., Sanderson, E., Smith, G. D., and Bowden, J. (2020). Interaction-based mendelian randomization with measured and unmeasured gene-by-covariate interactions. medRxiv.
    
    
22. Spiller, W., Slichter, D., Bowden, J., and Davey Smith, G. (2019). Detecting and correcting for bias in mendelian randomization analyses using gene-by-environment interactions. Int J Epidemiol, 48(3):702–712.
    
    
23. Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557–586.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2171753&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1997WV90300003&link_type=ISI) 

24. Stock, J. H., Wright, J. H., and Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20(4):518–529.
    
    
25. Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al. (2015). Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. Plos Medicine, 12(3):e1001779.
    
    
26. Sun, D., Zhou, T., Heianza, Y., Li, X., Fan, M., Fonseca, V. A., and Qi, L. (2019). Type 2 diabetes and hypertension: a study on bidirectional causality. Circulation research, 124(6):930–937.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

27. Tchetgen Tchetgen, E. J., Sun, B., and Walter, S. (2020). The genius approach to robust mendelian randomization inference. Statistical Science.
    
    
28. Verbanck, M., Chen, C.-Y., Neale, B., and Do, R. (2018). Detection of widespread horizontal pleiotropy in causal relationships inferred from mendelian randomization between complex traits and diseases. Nature Genetics, 50(5):693–698.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0099-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29686387&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F30%2F2020.09.29.20204420.atom) 

29. Verbeke, G. and Lesaffre, E. (1996). A linear mixed-effects model with heterogeneity in the randomeffects population. Journal of the American Statistical Association, 91(433):217–221.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2291398&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996UB27200021&link_type=ISI) 

30. Wang, H., Zhang, F., Zeng, J., Wu, Y., Kemper, K. E., Xue, A., Zhang, M., Powell, J. E., Goddard, M. E., Wray, N. R., Visscher, P. M., McRae, A. F., and Yang, J. (2019). Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the uk biobank. Science Advances, 5:eaaw3538.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo4OiJhZHZhbmNlcyI7czo1OiJyZXNpZCI7czoxMjoiNS84L2VhYXczNTM4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDMvMzAvMjAyMC4wOS4yOS4yMDIwNDQyMC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

31. Wang, L. and Tchetgen Tchetgen, E. (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3):531–550.
    
    
32. Ye, T., Shao, J., and Kang, H. (2019). Debiased inverse-variance weighted estimator in two-sample summary-data mendelian randomization. arXiv:1911.09802.
    
    
33. Zhao, Q., Chen, Y., Wang, J., and Small, D. S. (2019a). Powerful three-sample genome-wide design and robust statistical inference in summary-data mendelian randomization. International Journal of Epidemiology.
    
    
34. Zhao, Q., Wang, J., Hemani, G., Bowden, J., and Small, D. S. (2019b). Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. Annals of Statistics.

 [1]: /embed/graphic-2.gif
 [2]: /embed/graphic-3.gif
 [3]: /embed/graphic-4.gif
 [4]: /embed/graphic-5.gif
 [5]: /embed/inline-graphic-1.gif
 [6]: /embed/inline-graphic-2.gif
 [7]: /embed/inline-graphic-3.gif
 [8]: /embed/inline-graphic-4.gif
 [9]: /embed/inline-graphic-5.gif
 [10]: /embed/graphic-6.gif
 [11]: /embed/graphic-7.gif
 [12]: /embed/graphic-8.gif
 [13]: /embed/graphic-9.gif
 [14]: /embed/graphic-11.gif
 [15]: /embed/graphic-12.gif
 [16]: /embed/graphic-13.gif
 [17]: /embed/graphic-14.gif
 [18]: /embed/graphic-15.gif
 [19]: /embed/graphic-16.gif
 [20]: /embed/graphic-17.gif
 [21]: /embed/inline-graphic-6.gif
 [22]: /embed/inline-graphic-7.gif
 [23]: /embed/inline-graphic-8.gif
 [24]: /embed/inline-graphic-9.gif
 [25]: /embed/inline-graphic-10.gif
 [26]: /embed/inline-graphic-11.gif
 [27]: /embed/inline-graphic-12.gif
 [28]: /embed/inline-graphic-13.gif
 [29]: /embed/inline-graphic-14.gif
 [30]: /embed/graphic-18.gif
 [31]: /embed/graphic-19.gif
 [32]: /embed/inline-graphic-15.gif
 [33]: /embed/inline-graphic-16.gif
 [34]: /embed/graphic-20.gif
 [35]: /embed/inline-graphic-17.gif
 [36]: /embed/inline-graphic-18.gif
 [37]: /embed/graphic-21.gif
 [38]: /embed/inline-graphic-19.gif
 [39]: /embed/graphic-22.gif
 [40]: /embed/graphic-23.gif
 [41]: /embed/inline-graphic-20.gif
 [42]: /embed/inline-graphic-21.gif
 [43]: /embed/inline-graphic-22.gif
 [44]: /embed/inline-graphic-23.gif
 [45]: /embed/inline-graphic-24.gif
 [46]: /embed/inline-graphic-25.gif
 [47]: /embed/inline-graphic-26.gif
 [48]: /embed/inline-graphic-27.gif
 [49]: /embed/graphic-24.gif
 [50]: /embed/graphic-25.gif
 [51]: /embed/inline-graphic-28.gif
 [52]: /embed/inline-graphic-29.gif
 [53]: /embed/graphic-26.gif
 [54]: /embed/graphic-27.gif
 [55]: /embed/inline-graphic-30.gif
 [56]: /embed/inline-graphic-31.gif
 [57]: /embed/graphic-28.gif
 [58]: T1/embed/inline-graphic-32.gif
 [59]: T1/embed/inline-graphic-33.gif
 [60]: /embed/graphic-30.gif
 [61]: /embed/inline-graphic-34.gif
 [62]: /embed/inline-graphic-35.gif
 [63]: T2/embed/inline-graphic-36.gif
 [64]: T2/embed/inline-graphic-37.gif
 [65]: T2/embed/inline-graphic-38.gif
 [66]: /embed/inline-graphic-39.gif
 [67]: /embed/graphic-32.gif
 [68]: /embed/inline-graphic-40.gif
 [69]: /embed/inline-graphic-41.gif
 [70]: /embed/inline-graphic-42.gif
 [71]: /embed/inline-graphic-43.gif
 [72]: /embed/inline-graphic-44.gif
 [73]: /embed/graphic-35.gif
 [74]: /embed/graphic-36.gif
 [75]: /embed/graphic-37.gif
 [76]: /embed/graphic-38.gif
 [77]: /embed/graphic-39.gif
 [78]: /embed/graphic-40.gif
 [79]: /embed/inline-graphic-45.gif
 [80]: /embed/inline-graphic-46.gif
 [81]: /embed/inline-graphic-47.gif
 [82]: /embed/inline-graphic-48.gif
 [83]: /embed/inline-graphic-49.gif
 [84]: /embed/graphic-41.gif
 [85]: /embed/graphic-42.gif
 [86]: /embed/inline-graphic-50.gif
 [87]: /embed/inline-graphic-51.gif
 [88]: /embed/inline-graphic-52.gif
 [89]: /embed/graphic-43.gif
 [90]: /embed/inline-graphic-53.gif
 [91]: /embed/graphic-44.gif
 [92]: /embed/graphic-45.gif
 [93]: /embed/inline-graphic-54.gif
 [94]: /embed/inline-graphic-55.gif
 [95]: /embed/graphic-46.gif
 [96]: /embed/graphic-47.gif
 [97]: /embed/graphic-48.gif
 [98]: /embed/inline-graphic-56.gif
 [99]: /embed/graphic-49.gif
 [100]: /embed/inline-graphic-57.gif
 [101]: /embed/graphic-50.gif
 [102]: /embed/inline-graphic-58.gif
 [103]: /embed/graphic-51.gif
 [104]: /embed/inline-graphic-59.gif
 [105]: /embed/graphic-52.gif
 [106]: /embed/inline-graphic-60.gif
 [107]: /embed/graphic-53.gif
 [108]: /embed/inline-graphic-61.gif
 [109]: /embed/inline-graphic-62.gif
 [110]: /embed/inline-graphic-63.gif
 [111]: /embed/inline-graphic-64.gif
 [112]: /embed/inline-graphic-65.gif
 [113]: /embed/inline-graphic-66.gif
 [114]: /embed/inline-graphic-67.gif
 [115]: /embed/inline-graphic-68.gif
 [116]: /embed/inline-graphic-69.gif
 [117]: /embed/inline-graphic-70.gif
 [118]: F3/embed/inline-graphic-71.gif
 [119]: F4/embed/inline-graphic-72.gif
 [120]: /embed/graphic-56.gif
 [121]: /embed/inline-graphic-73.gif
 [122]: /embed/graphic-57.gif
 [123]: /embed/graphic-58.gif
 [124]: /embed/inline-graphic-74.gif
 [125]: /embed/graphic-59.gif
 [126]: /embed/graphic-60.gif
 [127]: /embed/inline-graphic-75.gif
 [128]: /embed/inline-graphic-76.gif
 [129]: /embed/inline-graphic-77.gif
 [130]: /embed/inline-graphic-78.gif
 [131]: /embed/graphic-61.gif
 [132]: /embed/inline-graphic-79.gif
 [133]: /embed/graphic-62.gif
 [134]: /embed/graphic-63.gif
 [135]: /embed/inline-graphic-80.gif
 [136]: /embed/inline-graphic-81.gif
 [137]: /embed/inline-graphic-82.gif
 [138]: /embed/inline-graphic-83.gif
 [139]: /embed/inline-graphic-84.gif
 [140]: /embed/graphic-64.gif
 [141]: /embed/graphic-65.gif
 [142]: /embed/graphic-66.gif
 [143]: /embed/graphic-67.gif
 [144]: /embed/inline-graphic-85.gif
 [145]: /embed/graphic-68.gif