Bayesian estimation of IVW and MR-Egger models for two-sample Mendelian randomization studies ============================================================================================= * Okezie Uche-Ikonne * Frank Dondelinger * Tom Palmer ## Abstract We present our package, mrbayes, for the open source software environment R. The package implements Bayesian estimation for IVW and MR-Egger models, including the radial MR-Egger model, for summary-level data in Mendelian randomization analyses. Users have the option of fitting the models using either JAGS or Stan software packages. We have implemented a choice of prior distributions for the model parameters, namely; weakly informative, non-informative, a joint prior for the MR-Egger model slope and intercept, and an informative prior (pseudo-horseshoe prior), or the user can specify their own prior. Similar prior distributions are included using the Stan software with the exception of a user-defined prior. We include We show how to use the package through an applied example investigating the causal effect of BMI on acute ischemic stroke. In future work, we plan to provide functions for Multivariable MR estimation. ## Introduction Observational epidemiology is limited by possible bias due to unmeasured confounders, reverse causation and other problems. Mendelian randomization (MR) is a method of testing and estimating causal effects for the aetiology of diseases1. MR uses genetic variants as instrumental variables related to a modifiable phenotype to estimate a causal effect of the phenotype on a disease outcome. By including multiple instruments, we can increase power for hypothesis testing. The trade-off with this approach is the risk of violating the second and third instrumental variable assumptions due to horizontal pleiotropy. Genome wide association studies (GWAS) provide many potential instruments, and we can obtain summary-level datasets for MR analyses. For two-sample MR, we get the values of instrument-phenotype associations and instrument-outcome associations from different samples. The inverse variance weighted (IVW) model estimates the causal effect for multiple independent instruments in summary data. However, in the presence of pleiotropy its estimates are biased. Methods have been derived which estimate causal effects that are robust to pleiotropy; such as the MR-Egger model.2 The MR-Egger model relies on its InSIDE (Instrument strength is independent of direct effect) assumption. The MR- Egger model has recently been adapted with its radial formulation which has the advantage of viewing the instrument specific IV estimates in a radial plot and the IVW model is its sub-model3. Extending MR analysis with Bayesian estimation allows specification of prior distributions on the model parameters. Several authors have considered Bayesian estimation of MR models including assessing different model parameterisations and Bayesian model averaging.4,5. Bayesian methods have also been applied towards dependant instruments and invalid instruments6. The use of weakly informative prior distributions in the MR-Egger model has been shown to have good coverage properties8. This paper introduces our mrbayes R package that implements Bayesian estimation of the IVW, MR-Egger, and Radial MR-Egger models. The models are estimated using Markov chain Monte Carlo(MCMC) methods through an R interface to the JAGS and Stan software (using the rjags and rstan packages).9,10 Our package includes some specified prior distributions; non-informative, weakly informative, a shrinkage prior on the causal effect estimate (Pseudo-Horseshoe prior), and a joint prior on the intercept and causal effect estimate in the MR-Egger and radial MR-Egger models. The package also allows users to specify their own prior distributions within JAGS software. In the next section, we briefly introduce the models included in our package. We then introduce the features of our mrbayes package, and we show how to use the package in an applied example. The supplementary material includes additional points relating to the methodology and examples. ### Methods Equations (1), (2) and (3) denote the IVW, MR-Egger and Radial MR-Egger models respectively. Please see the supplementary material for additional explanation of the models. ![Formula][1] ![Formula][2] ![Formula][3] #### Prior Distributions Table 1 gives a summary of the default prior distributions for the parameters in the mrbayes package. The supplementary material gives additional details about the prior distributions for the model including a covariance between intercept and slope in the MR-Egger models. View this table: [Table 1:](http://medrxiv.org/content/early/2020/08/20/19005868/T1) Table 1: Formula for default prior models in mrbayes. For functions in IVW model, there is no *α* parameter ### Implementation Our mrbayes package provides the following functions: * mr_format, a function for setting up the summary-level dataset for analysis. The functions that use JAGS/Stan software are; * mr\_ivw\_rjags/mr\_ivw\_stan, a function for estimating causal effects using the Bayesian IVW model, with a choice of prior distributions; * mr\_egger\_rjags/mr\_egger\_stan, a function for estimating causal effects through the Bayesian MR- Egger model, with a choice of prior distributions; * mr\_radialegger\_rjags/mr\_radialegger\_stan, a function for performing Bayesian analysis under the radial formulation of MR-Egger. The package allows users: * to specify custom prior distributions for the estimate of the causal effect (betaprior) and optionally for the residual standard error (sigmaprior) for the MR-Egger models (original and radial). This option is only for _rjags functions, the prior distributions are written in the JAGS syntax. For more information on how to specify prior distributions see page 34 of JAGS manual;11 * to choose a random seed for reproducible results and to choose the number of chains for MCMC, each chain should have a different seed; * to set parameter rho, the correlation coefficient between the average pleiotropic effect and causal estimate. This option is only relevant when using the *joint* prior method; * to plot the posterior density and investigate the MCMC diagnostics. The package also includes two summary-level datasets containing: * 185 SNPs with multiple instrument-phenotype associations for low-density lipoprotein cholesterol (LDL-c), while the instrument-outcome associations for coronary heart disease (CHD);12 * 14 SNPs with instrument-phenotype associations of body mass index (BMI) and instrument-outcome associations of insulin resistance.13 The next section shows an applied example with instructions in r code chunks on how to use the package. ### Example: Investigating the effect of BMI on acute ischemic stroke We demonstrate the package using the motivating example by zhao and colleagues 14 which is also the example dataset in mr.raps for estimating the causal effect of body mass index (BMI) on acute ischemic stroke (AIS). We estimate the causal effect on the summary-level dataset with GWAS p-value threshold as *p <* 5 × 10−8 and when all the instruments are included. We apply the prior distributions in table 1 and compare with the frequentist model. Firstly, we load the package into our R session (See the Availability section for installation instructions). library(mrbayes) The next stage involves the setting up the dataset by using the mr_format() function. In our illustration the datasets are denoted as bmi1 and bmi2; dat <- mr_format( rsid = bmi1$SNP, xbeta = bmi1$beta.exposure, ybeta = bmi1$beta.outcome, xse = bmi1$se.exposure, yse = bmi1$se.outcome ) dat2 <- mr_format( rsid = bmi2$SNP, xbeta = bmi2$beta.exposure, ybeta = bmi2$beta.outcome, xse = bmi2$se.exposure, yse = bmi2$se.outcome ) We show the R syntax to estimate the Bayesian models using the default prior distributions. The code chunk below describes the syntax for each prior distributions for JAGS and Stan software. For the joint prior we assume the correlation between the intercept and slope is 0.5. *## weakly informative prior* *### JAGS* vague_ivw <- mr\_ivw\_rjags( dat, prior = “weak”, seed = c(123456, 456789, 342564), n.chains = 3 ) vague_mregger <- mr\_egger\_rjags( dat, prior = “weak”, seed = c(123456, 456789, 342564), n.chains = 3 ) vague_radmregger <- mr\_radialegger\_rjags ( dat, prior = “weak”, seed = c(123456, 456789, 342564), n.chains = 3 ) *### Stan* vague\_ivw\_stan <- mr\_ivw\_stan(dat, prior = 2, seed = 12345, n.chains = 3) vague\_mregger\_stan <- mr\_egger\_stan(dat, prior = 2, seed = 12345, n.chains = 3) vague\_radmregger\_stan <- mr\_radialegger\_stan(dat, prior = 2, seed = 12345, n.chains = 3) *## Default shrinkage prior* *### JAGS* pseudo_ivw <- mr\_ivw\_rjags( dat, prior = “pseudo”, seed = c(123456, 456789, 342564), n.chains = 3 ) pseudo_mregger <- mr\_egger\_rjags( dat, prior = “pseudo”, seed = c(123456, 456789, 342564), n.chains = 3 ) pseudo_radmregger <- mr\_radialegger\_rjags( dat, prior = “pseudo”, seed = c(123456, 456789, 342564), n.chains = 3 ) *### Stan* pseudo\_ivw\_stan <- mr\_ivw\_stan(dat, prior = 3, seed = 12345, n.chains = 3) pseudo\_mregger\_stan <- mr\_egger\_stan(dat, prior = 3, seed = 12345, n.chains = 3 pseudo\_radmregger\_stan <- mr\_radialegger\_stan(dat, prior = 3, seed = 12345, n.chains = 3) *## Joint Prior* *### JAGS* joint_mregger <- mr\_egger\_rjags( dat, prior = “joint”, rho = 0.5, seed = c(123456, 456789, 342564), n.chains = 3 ) joint_radmregger <- mr\_radialegger\_rjags( dat, prior = “joint”, rho = 0.5, seed = c(123456, 456789, 342564), n.chains = 3 ) *### Stan* joint\_mregger\_stan <- mr\_egger\_stan(dat, prior = 4, seed = 12345, n.chains = 3) joint\_radmregger_stan <- mr_radialegger_stan(dat, prior = 4, seed = 12345, n.chains = 3) View this table: [Table 2:](http://medrxiv.org/content/early/2020/08/20/19005868/T2) Table 2: Estimates from summary-level data (GWAS p-value threshold). CI in the column indicates Confidence/Credible Interval The estimates and credible intervals from JAGS and Stan models are similar, we include estimates from JAGS in tables 2 and 3. The estimates derived from the models are seen in Tables 2 and 3 (dataset including all the instruments). In summary, estimates from the prior distributions for the MR-Egger model are consistent, figure 1 shows a graphical summary. View this table: [Table 3:](http://medrxiv.org/content/early/2020/08/20/19005868/T3) Table 3: Estimates from summary-level dataset when all instruments are included. CI in the column indicates Confidence/Credible Interval ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/20/19005868/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2020/08/20/19005868/F1) Figure 1: BMI and AIS ## Conclusion We present an R package, mrbayes, to perform Bayesian estimation of the IVW and MR-Egger models implemented through the JAGS and Stan software. In our example, we demonstrated the use of several different prior distributions to estimate the causal and average pleiotropic effects from these models. There are several R packages providing functions for MR analyses. The MendelianRandomization and TwoSampleMR packages implement various two-sample MR methods.15,16 The RadialMR R package implements the radial MR models and visualization of instruments through radial plots.3,17 Bayesian methods have not gained popularity in applied studies due to availability of a user-friendly software18. Our package complements previous MR packages by offering a Bayesian perspective with the choice of four prior distributions for the causal effect; non-informative, weakly informative, pseudo-horseshoe, and a joint prior distribution for the MR-Egger model’s intercept and slope. In a Bayesian analysis the prior distributions can have an important impact upon the final parameter estimates. Hence in the mrbayes package we offer a choice of prior distributions. In future work, we plan to provide functions for multivariate models. ## Data Availability Publicly available summary level data ## Availability The package is freely available, under the MIT license, on GitHub here [https://github.com/okezie94/mrbayes](https://github.com/okezie94/mrbayes). It can be installed in R using the following commands. # *install.packages(*“*remotes*”*) # uncomment if remotes not installed* remotes::install_github(okezie94/mrbayes”) There is a website of the package helpfiles at [https://okezie94.github.io/mrbayes/](https://okezie94.github.io/mrbayes/). ## Footnotes * Main and Supplementary documents were revised. The authors' affiliations were updated. * Received September 2, 2019. * Revision received August 20, 2020. * Accepted August 20, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. Oxford University Press; 2003;32(1):1–22. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyg070&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12689998&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000182341300001&link_type=ISI) 2. 2.Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology [Internet]. 2015 Jun;44(2):512–525. Available from: [https://dx.doi.org/10.1093/ije/dyv080](https://dx.doi.org/10.1093/ije/dyv080) [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyv080&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26050253&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 3. 3.Bowden J, Spiller W, Del Greco M F, et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. International Journal of Epidemiology [Internet]. 2018 Jun;47(4):1264–1278. Available from: [https://doi.org/10.1093/ije/dyy101](https://doi.org/10.1093/ije/dyy101) 4. 4.Jones E, Thompson J, Didelez V, Sheehan N. On the choice of parameterisation and priors for the Bayesian analyses of Mendelian randomisation studies. Statistics in Medicine. Wiley Online Library; 2012;31(14):1483–1501. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.4499&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22415699&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 5. 5.Burgess S, Thompson SG. Improving bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Statistics in Medicine. Wiley Online Library; 2012;31(15):1582–1600. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.4498&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22374818&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 6. 6.Shapland CY, Thompson JR, Sheehan NA. A bayesian approach to mendelian randomisation with dependent instruments. Statistics in medicine. Wiley Online Library; 2019;38(6):985–1001. 7. 7.Berzuini C, Guo H, Burgess S, Bernardinelli L. A Bayesian approach to Mendelian randomization with multiple pleiotropic variants. Biostatistics [Internet]. 2018 Aug; Available from: [https://dx.doi.org/10.1093/biostatistics/kxy027](https://dx.doi.org/10.1093/biostatistics/kxy027) 8. 8.Schmidt A, Dudbridge F. Mendelian randomization with Egger pleiotropy correction and weakly informative Bayesian priors. International Journal of Epidemiology. Oxford University Press; 2017;47(4):1217–1228. 9. 9.Plummer M. Rjags: Bayesian graphical models using mcmc [Internet]. 2018. Available from: [https://CRAN.R-project.org/package=rjags](https://CRAN.R-project.org/package=rjags) 10. 10.Stan Development Team. RStan: the R interface to Stan [Internet]. 2018. Available from: [http://mc-stan.org/](http://mc-stan.org/) 11. 11.Plummer M. JAGS Version 3.3.0 user manual. Lyon, France: International Agency for Research on Cancer; 2012. 12. 12.Do R, Willer CJ, Schmidt EM, et al. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nature Genetics. Nature Publishing Group; 2013;45(11):1345. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2795&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24097064&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 13. 13.Richmond R, Wade K, Corbin L, et al. Investigating the role of insulin in increased adiposity: Bi-directional Mendelian randomization study. *bioRxiv* [Internet]. Cold Spring Harbor Laboratory; 2017;155739. Available from: [https://www.biorxiv.org/content/early/2017/06/28/155739](https://www.biorxiv.org/content/early/2017/06/28/155739) 14. 14.Zhao Q, Wang J, Hemani G, Bowden J, Small DS. Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. arXiv preprint arXiv:180109652. 2018; 15. 15.Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. International Journal of Epidemiology. Oxford University Press; 2017;46(6):1734–1739. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyx034&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28398548&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 16. 16.Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. eLife Sciences Publications Limited; 2018;7:e34408. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7554/eLife.34408&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29846171&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F20%2F19005868.atom) 17. 17.Spiller W, Bowden J. Radial MR: A package for implementing radial inverse variance weighted and MR-Egger methods. [Internet]. 2019. Available from: [https://github.com/WSpiller/RadialMR](https://github.com/WSpiller/RadialMR) 18. 18.Sheehan NA, Didelez V. Epidemiology, genetic epidemiology and mendelian randomisation: More need than ever to attend to detail. Human genetics. Springer; 2020;139(1):121–136. [1]: /embed/graphic-1.gif [2]: /embed/graphic-2.gif [3]: /embed/graphic-3.gif