Proteome-wide Mendelian randomization in global biobank meta-analysis reveals multi-ancestry drug targets for common diseases
=============================================================================================================================

* Huiling Zhao
* Humaria Rasheed
* Therese Haugdahl Nøst
* Yoonsu Cho
* Yi Liu
* Laxmi Bhatta
* Arjun Bhattacharya
* Global Biobank Meta-analysis Initiative
* Gibran Hemani
* George Davey Smith
* Ben Michael Brumpton
* Wei Zhou
* Benjamin M. Neale
* Tom R. Gaunt
* Jie Zheng

## Abstract

Proteome-wide Mendelian randomization (MR) has shown value in prioritizing drug targets in Europeans, but limited data have made the identification of causal proteins in other ancestries challenging. Here we present a multi-ancestry proteome-wide MR analysis pipeline based on cross-population data from the Global Biobank Meta-analysis Initiative (GBMI). We estimated the causal effects of 1,545 proteins on eight complex diseases in up to 32,658 individuals of African ancestry and 1.22 million individuals of European ancestry. We identified 45 protein-disease pairs in European ancestry and seven in African ancestry with MR and genetic colocalization evidence. Fifteen protein-disease pairs showed evidence of differential effects between males and females. A multi-ancestry MR comparison identified two protein-disease pairs with MR evidence of an effect in both ancestries, seven pairs with European-specific effects and seven with African-specific effects. By integrating these MR signals with observational and clinical trial evidence, we evaluated the efficacy of one existing drug, identified seven drug repurposing opportunities and predicted seven novel effects of proteins on diseases. Our results highlight the value of proteome-wide MR in informing the generalisability of drug targets across ancestries and illustrate the value of multi-cohort and biobank meta-analysis of genetic data for drug development.
![Figure1](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F1.medium.gif)

**Graphical abstract.** Notation: genome-wide association study (GWAS); Mendelian randomization (MR); primary open-angle glaucoma (POAG); idiopathic pulmonary fibrosis (IPF); chronic obstructive pulmonary disease (COPD); heart failure (HF); venous thromboembolism (VTE); European ancestry (EUR); African ancestry (AFR). \*Of the seven protein-disease associations, one passed the FDR threshold of 0.05 in the proteome-wide MR and six additional associations passed an FDR of 0.05 in the multi-ancestry comparison analysis.

**Highlights**

* A multi-ancestry proteome-wide Mendelian randomization (MR) analysis of 1,545 proteins on eight diseases in more than 1.26 million individuals from a disease GWAS meta-analysis of 19 biobanks.
* We find evidence for putative causal effects in 45 protein-disease pairs in European ancestry and seven protein-disease pairs in African ancestry, with 15 pairs showing sex-specific effects.
* We identify evidence of causality for two protein-disease pairs that are common to both African and European ancestries, seven pairs with European-specific effects and seven pairs with African-specific effects.
* Triangulating with clinical trial and observational evidence prioritizes seven new targets, seven drug repurposing opportunities and one existing drug target that generalise to African ancestry.

**Key words**

* Plasma proteome
* complex diseases
* multi-ancestry Mendelian randomization
* drug target prioritisation

## INTRODUCTION

The efficacy of drugs is typically evaluated in one or a small number of populations during phase 3 clinical trials. However, there are known differences in drug response between ancestries¹, which may be due to genetic or environmental differences.
To improve the generalisability of drug interventions, we need to better understand these differences and find new approaches to predict treatment response in different populations. One possible approach is to use genetic variants that influence the drug target as proxies to cost-effectively predict treatment response²,³. Proteome-wide Mendelian randomization (MR) uses genetic predictors of protein levels to test the causal effects of proteins on common diseases. Subject to its key assumptions, MR can provide evidence of the putative causal roles of thousands of proteins in the risk of a wide range of diseases⁴,⁵,⁶. Some recent proteome studies have built proteome-phenome maps in samples of up to 35,559 European participants⁷,⁸, and have further suggested that drug targets with robust MR and colocalization evidence are more likely to succeed in drug trials⁵. Moving beyond these studies, MR could play a key role in prioritizing drug targets in different ancestries⁹, which may inform the design of future trials¹⁰.

Multi-ancestry studies are gaining prominence because of the importance of understanding differences in disease aetiology between ancestries. Others have developed and applied trans-ancestry methods for genetic correlation¹¹, polygenic risk score¹²,¹³ and fine-mapping¹⁴,¹⁵ analyses. However, multi-ancestry causal inference using MR is still in its infancy¹⁶. One major issue restricting the development of multi-ancestry MR is the unbalanced representation of genome-wide association study (GWAS) samples across ethnic groups, with one commentary reporting that 78% of participants across published GWAS were of European ancestry¹⁷. Consequently, most published proteome GWAS and MR studies have been restricted to European ancestry⁴,⁵,⁶,⁷,¹⁸,¹⁹.
This bias in population coverage causes two issues: (i) we lack sufficient proteomic GWAS in non-European ancestries, which restricts our ability to identify protein quantitative trait loci (pQTLs) in other ancestries; and (ii) without well-powered disease GWAS in non-European ancestries, we have little opportunity to identify multi-ancestry and ancestry-specific protein-disease associations. A recent large-scale plasma proteome GWAS identified pQTLs in both European and African ancestries and compared the genetic architecture of the proteome across ancestries²⁰. This study further estimated the associations of proteins with plasma urate and gout in Europeans using a previously described transcriptome-wide association study pipeline²¹. The integration of this unique multi-ancestry pQTL resource with the ancestry-enriched GWAS resources of the Global Biobank Meta-analysis Initiative (GBMI) presents an ideal opportunity for a multi-ancestry proteome MR analysis. GBMI has collated a multi-ancestry genetic data set of 2.6 million subjects, including samples of Asian, African, Hispanic American and European ancestries, standardized phenotype and disease definitions, and applied a universal GWAS analysis pipeline across its member biobanks²². This initiative has enabled us to conduct a multi-ancestry proteome MR analysis using well-harmonized GWAS data.

In this study, we systematically estimated the causal role of 1,311 and 1,310 proteins, measured in populations of African and European ancestry respectively (1,076 proteins in both ancestries), on eight complex diseases using a comprehensive ancestry-specific MR pipeline based on our previous approach in European datasets⁵. We further estimated the consistency of pQTLs across ancestries, identified potential multi-ancestry and ancestry-specific causal protein-disease pairs, and integrated MR findings with observational and clinical trial evidence²³ to prioritize drug targets.
We report our results in an openly accessible database, EpiGraphDB²⁴ ([https://epigraphdb.org/multi-ancestry-pwmr/](https://epigraphdb.org/multi-ancestry-pwmr/)).

## RESULTS

### Summary of selection and validation of proteins and diseases data

Cis-acting pQTLs within 500 kb of the protein-coding gene were selected as genetic instruments for the proteome MR analyses, since cis-acting pQTLs are more likely than trans-acting pQTLs to have protein-specific effects⁴. For the discovery MR analysis, 6,144 conditionally independent pQTLs of 1,310 proteins in 7,213 Europeans (**Table S1**) and 3,875 conditionally independent pQTLs of 1,311 proteins in 1,871 Africans (**Table S2**) from the Atherosclerosis Risk in Communities (ARIC) study²⁰ were selected as candidate genetic instruments for their respective proteins; the conditionally independent pQTLs were identified using the fine-mapping method SuSiE²⁵.

To increase reliability and boost power, we developed a three-step instrument validation pipeline to retain the pQTLs that best fit the MR assumptions. First, to avoid potential collinearity in the MR model, we applied linkage disequilibrium (LD) clumping to remove pQTLs that were strongly correlated with each other (retaining pQTLs with pairwise r² < 0.6). Second, we estimated instrument strength using F-statistics and excluded pQTLs with an F-statistic below 10 to avoid potential weak instrument bias²⁶. Third, we applied the MR Steiger filtering approach²⁷ to exclude pQTLs with potential reverse causality²⁸ (i.e. where genetic predisposition to disease has a causal effect on the protein). After selection, a total of 3,550 pQTLs of 1,311 proteins in Africans and 5,418 pQTLs of 1,310 proteins in Europeans were selected as instruments for the MR analysis (**Figure 1**).
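The three validation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `PQTL` record, the `toy_ld_r2` lookup and all variant IDs are hypothetical; the F-statistic uses the common single-variant approximation F ≈ (β/SE)²; and Steiger filtering is reduced to its directional check (keep a variant only if it explains more variance in the protein than in disease liability), whereas the full MR Steiger approach also tests whether that difference is statistically supported. The variance explained in disease liability (`r2_out`) is assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PQTL:
    """Hypothetical summary record for one candidate cis-pQTL."""
    snp: str
    beta_exp: float  # effect on protein level
    se_exp: float    # standard error of beta_exp
    p_exp: float     # pQTL P value
    r2_exp: float    # variance in protein level explained (assumed precomputed)
    r2_out: float    # variance in disease liability explained (assumed precomputed)


def f_statistic(beta: float, se: float) -> float:
    """Single-variant approximation of the first-stage F-statistic."""
    return (beta / se) ** 2


def ld_clump(pqtls: List[PQTL], ld_r2: Callable[[str, str], float],
             r2_threshold: float = 0.6) -> List[PQTL]:
    """Greedy clumping: visit pQTLs from most to least significant and drop
    any pQTL whose r2 with an already-kept pQTL is >= r2_threshold."""
    kept: List[PQTL] = []
    for q in sorted(pqtls, key=lambda q: q.p_exp):
        if all(ld_r2(q.snp, k.snp) < r2_threshold for k in kept):
            kept.append(q)
    return kept


def validate_instruments(pqtls: List[PQTL], ld_r2: Callable[[str, str], float],
                         f_min: float = 10.0, r2_threshold: float = 0.6) -> List[PQTL]:
    clumped = ld_clump(pqtls, ld_r2, r2_threshold)                              # step 1: LD clumping
    strong = [q for q in clumped if f_statistic(q.beta_exp, q.se_exp) >= f_min]  # step 2: F >= 10
    return [q for q in strong if q.r2_exp > q.r2_out]                           # step 3: directional Steiger check


# Toy usage with a hypothetical LD lookup (rs1 and rs2 are in strong LD).
LD = {frozenset(("rs1", "rs2")): 0.9}


def toy_ld_r2(a: str, b: str) -> float:
    return 1.0 if a == b else LD.get(frozenset((a, b)), 0.0)


candidates = [
    PQTL("rs1", 0.50, 0.05, 1e-20, 0.020, 0.001),  # strongest, kept
    PQTL("rs2", 0.45, 0.05, 1e-18, 0.018, 0.001),  # removed by clumping (r2 = 0.9 with rs1)
    PQTL("rs3", 0.10, 0.05, 4e-2, 0.001, 0.0005),  # removed as weak (F is about 4)
    PQTL("rs4", 0.30, 0.05, 1e-9, 0.005, 0.010),   # removed by the Steiger check
]
selected = validate_instruments(candidates, toy_ld_r2)
print([q.snp for q in selected])  # ['rs1']
```

The greedy clumping order (most significant first) mirrors how clumping tools usually behave, but the real pipeline may use a different LD reference and tie-breaking.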
![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F2.medium.gif)

Figure 1. Study design of the multi-ancestry proteome-wide Mendelian randomization in the Global Biobank Meta-analysis Initiative.

We further divided the pQTLs into three tiers using a refined instrument validation process we previously developed⁵ (details in **Figure S1** and Methods). In total, 5,635 tier 1 instruments passed all validation tests. The 5,418 candidate European instruments showed similar genetic effects across pQTL studies, including ARIC²⁰, Sun et al.⁴ and Folkersen et al.²⁹ (Pearson correlation=0.92; **Figure S2**), although 702 tier 2 instruments showed potentially heterogeneous genetic effects compared with previous pQTL studies²⁰,⁴,²⁹ (**Table S3**); and 277 tier 3 instruments showed non-specific (potentially pleiotropic) effects, on the basis of being associated with more than five proteins. Because heterogeneous and non-specific instruments may still provide some evidence for true causal relationships between proteins and diseases, we kept all instruments for the MR analysis but annotated our results with these tiers; we recommend that results based on tier 2 and 3 instruments be treated with caution.

For the replication MR analysis, 289 conditionally independent pQTLs of 289 proteins in up to 6,000 Europeans from Zheng et al.⁵ (**Table S4**) and 290 conditionally independent pQTLs of 290 proteins in 467 Africans from the African American Study of Kidney Disease and Hypertension (AASK) Cohort Study (**Table S5**) were selected as instruments (**Figure 1**). For the outcomes of the MR analysis, we selected eight of the 14 diseases in GBMI for which full GWAS summary statistics were available in both European and African ancestries and statistical power was relatively good (more than 100 cases).
The eight disease outcomes were idiopathic pulmonary fibrosis (IPF), primary open-angle glaucoma (POAG), heart failure (HF), venous thromboembolism (VTE), stroke, gout, chronic obstructive pulmonary disease (COPD) and asthma (**Table S6A** and **B**).

### Estimation of putative causal effects of proteins on diseases in African and European ancestries

We undertook two-stage (discovery and replication) MR and sensitivity analyses to systematically evaluate evidence for the causal effects of 1,311 plasma proteins on the eight diseases in African ancestry and, separately, of 1,310 proteins on the same eight diseases in European ancestry. Of these proteins, 1,076 had instruments in both ancestries (**Table S1** and **S2**). Across the 2,621 ancestry-specific protein instrument sets (1,311 African plus 1,310 European), 544 (20.8%) had only one pQTL, 599 (22.9%) had two pQTLs and 1,478 (56.4%) had three or more pQTLs in the cis region. For proteins with one pQTL, we applied the Wald ratio test³⁰. For proteins with two or more pQTLs, we applied a generalised inverse variance weighted approach (gIVW)³¹, which accounts for LD between nearby cis instruments and increases the reliability of the MR analysis (because conditionally independent pQTLs can still be in LD, a conventional IVW may double-count effects among these signals). For proteins with three or more pQTLs, we further applied a generalised MR-Egger regression (gEgger) approach³²,³¹, which allowed us to estimate the potential influence of pleiotropy on the MR estimates. When a pQTL was missing from the disease GWAS data, we used a proxy variant in high LD with that pQTL instead (r² > 0.8 in the 1000 Genomes data for the relevant population³³; **Figure 1**).

### Discovery MR and sensitivity analyses

We conducted discovery MR on 10,318 protein-disease pairs in European ancestry and 9,858 pairs in African ancestry.
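The estimators described above can be illustrated with a minimal pure-Python sketch, assuming summary statistics already harmonised to the same effect allele. The Wald ratio uses its first-order standard error; gIVW and gEgger are written as generalised least squares with covariance Ω_ij = se(β_Y,i)·se(β_Y,j)·ρ_ij, where ρ is the LD correlation matrix between pQTLs (assumed here to come from a reference panel and to be signed consistently with the effect alleles). The function names and toy numbers are hypothetical, the gIVW standard error shown is the fixed-effect version, and gEgger assumes alleles are oriented so that every pQTL effect on the protein is positive.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def wald_ratio(bx: float, by: float, sy: float) -> Tuple[float, float]:
    """Single-pQTL Wald ratio and its first-order standard error."""
    return by / bx, sy / abs(bx)


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def omega(sy: List[float], rho: Matrix) -> Matrix:
    """Covariance of the pQTL-disease estimates induced by LD between pQTLs."""
    n = len(sy)
    return [[sy[i] * sy[j] * rho[i][j] for j in range(n)] for i in range(n)]


def givw(bx: List[float], by: List[float], sy: List[float], rho: Matrix) -> Tuple[float, float]:
    """Generalised IVW: (bx' O^-1 bx)^-1 bx' O^-1 by, with fixed-effect SE."""
    w = solve(omega(sy, rho), bx)                    # O^-1 bx (O is symmetric)
    denom = sum(wi * xi for wi, xi in zip(w, bx))
    est = sum(wi * yi for wi, yi in zip(w, by)) / denom
    return est, math.sqrt(1.0 / denom)


def gegger(bx: List[float], by: List[float], sy: List[float], rho: Matrix) -> Tuple[float, float]:
    """Generalised MR-Egger: GLS regression of by on [1, bx]; returns (intercept, slope)."""
    om = omega(sy, rho)
    w1 = solve(om, [1.0] * len(bx))                  # O^-1 1
    wx = solve(om, bx)                               # O^-1 bx
    a11 = sum(w1)
    a12 = sum(u * x for u, x in zip(w1, bx))
    a22 = sum(u * x for u, x in zip(wx, bx))
    c1 = sum(u * y for u, y in zip(w1, by))
    c2 = sum(u * y for u, y in zip(wx, by))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


# Toy example: three correlated cis-pQTLs and a true causal effect of 0.2.
bx = [0.30, 0.50, 0.40]
sy = [0.02, 0.03, 0.025]
rho = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]]
by = [0.2 * x for x in bx]
print(givw(bx, by, sy, rho))
```

With ρ set to the identity matrix, `givw` reduces to the conventional IVW estimator; the gEgger intercept is the quantity tested for directional pleiotropy in the sensitivity analyses.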
In total, 830 and 388 protein-disease pairs showed marginal MR signals (P<0.05) and little evidence of horizontal pleiotropy in European and African ancestries, respectively (**Table 1**; **Table S7, S8**). Among these, 69 MR signals in European ancestry and one signal in African ancestry reached a Benjamini-Hochberg false discovery rate (FDR) of 0.05 and were therefore considered candidate protein-disease pairs in the discovery analysis (**Table S7A, S8A**).

[Table 1.](http://medrxiv.org/content/early/2022/01/11/2022.01.09.21268473/T1) Summary of proteome-wide Mendelian randomization results in European and African ancestries.

To maximize the likelihood that the retained signals reflect true causal effects, we conducted a range of sensitivity analyses on the MR signals that passed the FDR threshold: a pleiotropy test using the gEgger intercept term³², heterogeneity tests using Cochran's Q for the gIVW analysis and Rücker's Q for the gEgger analysis³⁴,³⁵, and a set of genetic colocalization analyses (conventional colocalization, pairwise conditional and colocalization [PWCoCo], and LD check)³⁶,⁵. The gEgger intercept test suggested that 10 of the 69 (14.5%) potential protein-disease effects in Europeans, and none of the effects in Africans, showed evidence of directional pleiotropy. Because pleiotropy violates the exclusion restriction assumption of MR, these 10 results were excluded from the candidate causal effects list (**Table S9A**). Of the remaining 59 protein-disease pairs in European ancestry and one pair in African ancestry, we were able to conduct a heterogeneity test on 53, of which 38 (71.7%) showed little evidence of heterogeneity (**Table S7A** and **S8A**). This implies that the conditionally independent cis pQTLs of the same protein tend to show proportionally similar effects on the relevant disease outcomes.
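The FDR step and the heterogeneity statistic can be sketched as follows. This is an illustrative pure-Python version rather than the authors' code: `benjamini_hochberg` returns step-up adjusted P values (q-values) in input order, and `cochran_q` computes Cochran's Q for a conventional IVW fit with first-order weights, assuming uncorrelated variants. The study's gIVW-based Q (and Rücker's Q for gEgger) additionally accounts for LD between pQTLs through the covariance matrix. Function names and toy numbers are hypothetical.

```python
from typing import List, Tuple


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def cochran_q(bx: List[float], by: List[float], sy: List[float]) -> Tuple[float, float, int]:
    """IVW estimate, Cochran's Q and its degrees of freedom (uncorrelated variants)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, sy)]   # 1 / first-order var(ratio)
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q, len(bx) - 1


pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
passing = [i for i, v in enumerate(qvals) if v <= 0.05]
print(passing)  # [0]
```

Note that the second and third pairs (P=0.04 and P=0.03) both fail at FDR 0.05 once the step-up adjustment is applied, which is why many marginal (P<0.05) signals do not survive the FDR filter.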
Because heterogeneity can arise from various factors¹⁰, we kept MR signals with evidence of heterogeneity in the candidate list but annotated this in our results. To further distinguish causal protein-disease pairs from confounding by LD, we applied three colocalization approaches: conventional colocalization³⁶, PWCoCo and LD check analysis⁵. The LD check analysis, which estimated the LD between each pQTL and the disease-associated GWAS signals in the cis region, suggested that 43 protein-disease pairs in European ancestry and one pair in African ancestry showed evidence of approximate colocalization (pairwise LD r²>0.7; **Table S7A** and **S8B**). These included ABO protein levels, which showed robust MR and LD check evidence for VTE, with the same direction of effect in European and African ancestries (OR in Europeans=1.11, P=5.59×10⁻¹¹, LD r²=1; OR in Africans=1.33, P=2.82×10⁻⁶, LD r²=0.80; **Table S7** and **S8**). Conventional colocalization and PWCoCo showed colocalization evidence (colocalization posterior probability>70%) for 18 protein-disease pairs in European ancestry and one in African ancestry.

For example, in our previous proteome-wide MR study⁵ we identified an effect of PROC protein level on VTE using a trans-acting variant in the PROCR region, and this effect was confirmed with the same variant in the GBMI VTE GWAS data³⁷. In this study, we estimated the same effect of PROC level on VTE in European ancestry using cis-acting pQTLs (P=1.45×10⁻⁸, colocalization probability=99%), but found weaker evidence in African ancestry (P=8.96×10⁻³, colocalization probability=11%). In summary, 46 of 60 (76.7%) MR signals in European and/or African ancestries showed colocalization and/or LD check evidence (**Table S7A** and **S8B**).

Finally, we considered potential aptamer-binding artefacts driven by protein-altering variants within the target sequence.
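The posterior probabilities behind conventional colocalization can be illustrated with a compact re-implementation of the single-causal-variant approximate Bayes factor model. This is a sketch under stated assumptions, not the coloc package or PWCoCo: it uses Wakefield approximate Bayes factors, prior probabilities p1 = p2 = 1e-4 and p12 = 1e-5, and assumed prior effect-size standard deviations of 0.15 for the protein and 0.2 for the disease (log OR scale). The function names and toy inputs are hypothetical. The last element, the posterior probability of one shared causal variant (H4), is typically what is reported as the colocalization probability.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield log approximate Bayes factor for association at one variant."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    if not xs:
        return -math.inf
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_pp(beta1, se1, beta2, se2, p1=1e-4, p2=1e-4, p12=1e-5,
             prior_sd1=0.15, prior_sd2=0.2) -> List[float]:
    """Posterior probabilities of H0..H4 for trait 1 (protein) and trait 2 (disease)."""
    l1 = [log_abf(b, s, prior_sd1) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, prior_sd2) for b, s in zip(beta2, se2)]
    n = len(l1)
    # H3 sums over ordered pairs of *different* causal variants (i != j).
    different = [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]
    log_h = [
        0.0,                                                          # H0: no association
        math.log(p1) + logsumexp(l1),                                 # H1: protein only
        math.log(p2) + logsumexp(l2),                                 # H2: disease only
        math.log(p1) + math.log(p2) + logsumexp(different),           # H3: two distinct variants
        math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),   # H4: one shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]


se = [0.05, 0.05, 0.05]
shared = coloc_pp([0.5, 0.01, 0.0], se, [0.5, 0.01, 0.0], se)    # same lead variant
separate = coloc_pp([0.5, 0.0, 0.0], se, [0.0, 0.5, 0.0], se)    # different lead variants
print(round(shared[4], 3), round(separate[3], 3))
```

The second toy case shows why colocalization complements MR: both traits are strongly associated in the region, but the evidence favours distinct causal variants (H3), the pattern expected under confounding by LD.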
Among the 5,418 and 3,550 pQTLs selected in European and African ancestries, 1,421 (15.8%) of the pQTLs or their LD proxies (r²>0.8 in the 1000 Genomes reference panel) were annotated as missense, stop-lost or stop-gained variants by the Ensembl Variant Effect Predictor (VEP)³⁸ (**Table S1** and **S2**). We flagged the MR estimates that used these pQTLs as instruments and recommend caution in their interpretation. Among the 59 robust European MR signals and one robust African MR signal, 26 (43.3%) were estimated using non-coding variants (whose LD proxies were also non-coding) as instruments in European and/or African ancestries (**Table S7** and **S8**) and are therefore unlikely to be influenced by aptamer-binding artefacts.

As summarized in **Table 1**, 45 MR signals in European ancestry and one signal in African ancestry passed the FDR-corrected threshold, showed colocalization/LD check evidence and showed little evidence of pleiotropy. These included 14 proteins with putative causal effects on asthma and 17 proteins with effects on VTE.

In a more extensive discovery analysis, we applied the same set of sensitivity and replication analyses to 788 pairs in Europeans and 387 pairs in Africans with marginal MR signals (P<0.05) that did not reach the FDR threshold. Seventeen of these showed evidence of horizontal pleiotropy and were excluded from all follow-up analyses (**Table S9B**); 738 pairs in Europeans and 373 pairs in Africans showed little evidence of heterogeneity; 341 pairs in Europeans and 86 pairs in Africans showed colocalization or LD check evidence; and 439 pairs in Europeans and 216 pairs in Africans were unlikely to be influenced by aptamer-binding artefacts (**Table S7B** and **S8B**).

### Replication of MR signals using pQTL data from independent samples

We selected the 59 European MR signals and one African signal that passed the FDR threshold of 0.05 for the replication MR analysis (**Table S7A** and **S8A**).
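The aptamer-artefact annotation described above amounts to checking a pQTL and its close LD proxies against a set of protein-altering consequence terms. The sketch below is a hypothetical illustration, not the authors' code: it assumes VEP output has already been parsed into a dictionary of Sequence Ontology consequence terms per variant, and that proxies and their r² values come from a reference panel; the input dictionaries and variant IDs are invented for the example.

```python
from typing import Dict, List, Set, Tuple

# Sequence Ontology consequence terms treated as protein-altering in this sketch.
PROTEIN_ALTERING = {"missense_variant", "stop_lost", "stop_gained"}


def flag_aptamer_risk(pqtl: str,
                      consequences: Dict[str, Set[str]],
                      proxies: Dict[str, List[Tuple[str, float]]],
                      r2_min: float = 0.8) -> bool:
    """True if the pQTL, or any LD proxy with r2 > r2_min, is annotated as protein-altering."""
    variants = [pqtl] + [v for v, r2 in proxies.get(pqtl, []) if r2 > r2_min]
    return any(consequences.get(v, set()) & PROTEIN_ALTERING for v in variants)


# Hypothetical annotations: rs1 is intronic but tags a missense proxy (rs2).
consequences = {"rs1": {"intron_variant"}, "rs2": {"missense_variant"}}
proxies = {"rs1": [("rs2", 0.9)], "rs3": [("rs2", 0.5)]}
print(flag_aptamer_risk("rs1", consequences, proxies))  # True: proxy rs2 is missense
print(flag_aptamer_risk("rs3", consequences, proxies))  # False: rs2 is not a close enough proxy
```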
Conditionally independent pQTLs were selected as instruments from two non-overlapping studies, Zheng et al.⁵ and AASK²⁰. After instrument selection and validation, we were able to conduct replication MR for 28 pairs in European ancestry and one pair in African ancestry. Of these, 14 pairs in European ancestry and one pair in African ancestry showed MR evidence (FDR<0.05 in the replication analysis; **Table S10A** and **S10B**). Where data were available, we applied the same sensitivity analyses, including colocalization analysis, to the replication MR signals; six pairs showed colocalization evidence in European ancestry (**Table S10A** and **S10B**). In an extended replication analysis, we investigated 336 and 222 MR pairs that had MR P<0.05 but did not reach the FDR threshold in the European and African discovery analyses, respectively. This analysis identified 48 and 43 MR pairs with LD check evidence (LD r²>0.7) in the two ancestries, respectively (**Table S11C** and **S11D**).

### Sex-specific MR analysis

Drug treatment response often differs by sex³⁹. To investigate the potential influence of sex-specific genetic effects on our proteome MR signals, we conducted sex-specific proteome MR using male-only and female-only disease GWASs provided by GBMI (**Table S6**). In total, 7,498 protein-disease pairs in European ancestry and 7,693 pairs in African ancestry had data available for MR analysis in both females and males. We applied a pairwise Z-score test comparing the male-only and female-only MR estimates to identify protein-disease pairs with sex-specific effects. After applying an FDR threshold of 0.05 to the pairwise Z-score P values in each ancestry, 12 protein-disease pairs in European ancestry and three pairs in African ancestry showed robust evidence of a difference in MR estimates between sexes (**Table S11**).
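The pairwise Z-score comparison above can be written in a few lines. This is a minimal sketch assuming the male-only and female-only MR estimates are on the log OR scale and come from non-overlapping samples, so their covariance is taken as zero; the function name and the toy estimates are hypothetical.

```python
import math
from typing import Tuple


def sex_difference_test(b_male: float, se_male: float,
                        b_female: float, se_female: float) -> Tuple[float, float]:
    """Z-score and two-sided P value for the difference between two independent
    MR estimates (male-only vs female-only), on the log OR scale."""
    z = (b_male - b_female) / math.sqrt(se_male ** 2 + se_female ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = sex_difference_test(0.30, 0.10, 0.00, 0.10)   # toy log ORs and SEs
print(round(z, 2), round(p, 4))  # 2.12 0.0339
```

The resulting P values across all tested pairs would then be FDR-adjusted (for example with the Benjamini-Hochberg procedure) before calling a pair sex-specific.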
Among these protein-disease pairs, three involved proteins that are targets of existing drugs: IL17RA level on asthma, ERAP1 level (target of Tosedostat) on IPF and NQO1 level (target of Vatiquinone) on HF (**Figure 2**). IL17RA is the target of Brodalumab, whose efficacy for asthma was tested in a phase II trial of 421 participants (Brodalumab vs placebo; [NCT01902290](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT01902290&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom)); the trial was terminated due to lack of efficacy. Our MR results showed that genetically increased IL17RA level was associated with increased asthma risk in males (OR=1.03, 95% CI=1.02 to 1.04, P=1.69×10⁻⁹) but had little effect in females (OR=1.00, 95% CI=1.00 to 1.01, P=0.35) in European ancestry. Although the protective effect of IL17RA inhibition on asthma was relatively minor in the sex-combined trial ([NCT01902290](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT01902290&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom)), our study suggests that the efficacy of this target for asthma in males may be worth reconsidering in future trials.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F3.medium.gif)

Figure 2. Proteome MR signals with distinct effects in males and females. The sex-combined and sex-specific MR estimates are presented for each protein-disease pair.

### Multi-ancestry comparison of pQTLs and proteome MR signals

#### Systematic evaluation of ancestry specificity of pQTLs

pQTLs may show different effects across ancestries owing to differences in allele frequencies, LD structure, sample sizes or interactions.
We systematically evaluated the ancestry specificity of pQTLs using the TAMR package ([https://github.com/universe77/TAMR](https://github.com/universe77/TAMR)). TAMR estimates how often pQTLs have a substantially different direction of effect, or a different level of statistical evidence, across two ancestries compared with expectation, where the expected degree of replication is calculated using a Bayesian winner's curse correction described in a previous study⁴⁰. Using the expected degree of replication as a benchmark allows pQTLs to be compared across ancestries while accounting for differences in allele frequency, sample size and effect size. The 1,076 proteins with full summary statistics and pQTL signals in both ancestries were included in this analysis.

We first estimated how often African pQTLs showed ancestry specificity or consistency compared with European pQTLs. Among the expected pQTLs, 73.0% showed a consistent direction of effect across the two ancestries, and 83.6% were observed to reach the genome-wide significance threshold (P<5×10⁻⁸) in both ancestries (**Table S12A**). Conversely, we estimated how often European pQTLs showed ancestry specificity or consistency compared with African pQTLs. In agreement with the results for African pQTLs, 71.8% of the expected pQTLs showed a consistent direction of effect across the two ancestries. However, only 60.8% of the expected pQTLs met the genome-wide significance threshold in both ancestries, largely driven by differences in statistical power (**Table S12B**).

#### Identification of multi-ancestry and ancestry specific pQTLs

We further identified pQTLs that were shared across ancestries. For any pQTL that passed the FDR threshold of 0.05 in one ancestry, if the variant (or an LD proxy with r²>0.8) showed a marginal signal (P<0.01) in the other ancestry, we considered it a signal shared across ancestries.
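The idea of an "expected degree of replication" can be sketched with a simple power calculation: for each discovery pQTL, compute the probability that it would reach genome-wide significance in the other ancestry given the standard error observed there, then compare the summed expectation with the observed count. This sketch is an assumption-laden simplification, not TAMR's implementation: it treats the discovery estimate as the true effect and omits the Bayesian winner's curse correction described above (so it overstates expected replication). The shared-signal rule from the text is included for completeness. All names and toy values are hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple

STD_NORMAL = NormalDist()


def replication_power(beta: float, se_other: float, alpha: float = 5e-8) -> float:
    """Probability of reaching two-sided significance `alpha` in the other ancestry,
    if the true effect there is `beta` and its standard error is `se_other`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = abs(beta) / se_other
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def sign_concordance_prob(beta: float, se_other: float) -> float:
    """Probability that the estimate in the other ancestry has the same sign as `beta`."""
    return STD_NORMAL.cdf(abs(beta) / se_other)


def expected_vs_observed(pqtls: List[Tuple[float, float, float]],
                         alpha: float = 5e-8) -> Tuple[float, int]:
    """pqtls: (discovery beta, SE in the other ancestry, P in the other ancestry)."""
    expected = sum(replication_power(b, s, alpha) for b, s, _ in pqtls)
    observed = sum(1 for _, _, p in pqtls if p < alpha)
    return expected, observed


def is_shared(q_this: float, p_other: float) -> bool:
    """Shared-signal rule from the text: FDR < 0.05 in one ancestry and P < 0.01
    for the variant (or an r2 > 0.8 proxy) in the other ancestry."""
    return q_this < 0.05 and p_other < 0.01


toy = [(1.0, 0.1, 1e-20), (0.0, 0.1, 0.3)]
print(expected_vs_observed(toy))
```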
We then split the tested proteins into four categories according to whether the pQTL signals in the tested protein region (defined as ±500 kb around the top pQTL) were shared across European and African ancestries (**Figure 3A**): (1) protein regions with pQTL signals in both ancestries, but none shared across ancestries; (2) protein regions with one or more shared pQTLs across ancestries (and no non-shared pQTLs); (3) protein regions with both shared and non-shared pQTLs; and (4) protein regions with pQTLs in only one of the ancestries. As shown in **Figure 3B**, among the 1,310 and 1,311 proteins with pQTLs in at least one ancestry, 1,076 proteins showed pQTL signals in both ancestries (situations 1, 2 and 3), with the remainder comprising 234 European-specific and 235 African-specific protein regions (situation 4; ancestry-specific pQTLs are listed in **Table S14A** and **B**). Of these 1,076 protein regions, 974 had shared pQTLs in the same region (situations 2 and 3; multi-ancestry pQTLs are listed in **Table S13**), while 102 had only non-shared pQTLs (situation 1; **Figure 3C**; ancestry-specific pQTLs in the shared protein regions are listed in **Table S15A** and **B**).

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F4.medium.gif)

Figure 3. Multi-ancestry investigation identifying shared protein regions or shared pQTLs. (A) Four situations used to identify multi-ancestry and ancestry-specific protein regions: (1) protein regions with only non-shared pQTLs across ancestries; (2) protein regions with one or more shared pQTLs across ancestries (and no non-shared pQTLs); (3) protein regions with both shared and non-shared pQTLs; (4) protein regions with pQTLs in only one of the ancestries.
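The four situations above form a simple decision rule, sketched here for clarity. The input representation (one boolean per pQTL in the region per ancestry, `True` meaning the pQTL is shared) and the protein labels are hypothetical; regions with no pQTL in either ancestry are assumed not to be passed in, since they were not tested.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_region(eur_shared: List[bool], afr_shared: List[bool]) -> int:
    """Situation (1-4) of a protein region, following Figure 3A.

    Each list holds one flag per pQTL in the region for that ancestry
    (True = shared across ancestries); an empty list means no pQTL there."""
    if not eur_shared or not afr_shared:
        return 4                       # pQTLs in only one ancestry
    flags = eur_shared + afr_shared
    if not any(flags):
        return 1                       # pQTLs in both ancestries, none shared
    if all(flags):
        return 2                       # only shared pQTLs
    return 3                           # both shared and non-shared pQTLs


regions: Dict[str, Tuple[List[bool], List[bool]]] = {   # hypothetical protein regions
    "PROT_A": ([True], [True]),
    "PROT_B": ([False], [False, False]),
    "PROT_C": ([True, False], [True]),
    "PROT_D": ([True], []),
}
counts = Counter(classify_region(e, a) for e, a in regions.values())
print(counts)
```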
(B) Number of protein regions with ancestry-specific pQTLs. (C) Number of protein regions with shared pQTLs in the cis region.

#### Identification of multi-ancestry and ancestry specific MR signals

We conducted a multi-ancestry comparison for the 59 unique protein-disease pairs that showed robust MR evidence (FDR<0.05) in at least one ancestry (the ABO effect on VTE appeared in both ancestries; **Table S7A** and **S8A**). Using an FDR threshold of 0.05 based on these 59 protein-disease pairs, we identified seven pairs with MR signals in both ancestries (**Figure 4A; Table S16A**). The MR effect estimates of these seven pairs were highly correlated across ancestries (**Table 1, Figure 4B**; Pearson r=0.858). Further considering the colocalization evidence from the discovery MR (**Figure 5A**), two protein-disease pairs showed colocalization evidence in both ancestries: genetically predicted SERPINE2 protein level on VTE (OR in Europeans=0.94, 95% CI=0.92 to 0.96, P=7.3×10⁻⁸, colocalization probability=99%; OR in Africans=0.82, 95% CI=0.67 to 0.95, P=1.28×10⁻², colocalization probability=100%; **Figure 5B**) and the above-mentioned genetically predicted ABO protein level on VTE (**Figure 5C**). Both pairs also showed MR evidence in the replication analysis (FDR<0.05), with the same direction of effect in the discovery and replication analyses (**Table S16A**).

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F5.medium.gif)

Figure 4. Comparison of multi-ancestry proteome MR signals in European and African ancestries. (A) Protein-disease pairs with FDR<0.05 in the multi-ancestry comparison.
(B) Comparison of MR effect estimates for the seven protein-disease pairs with MR evidence (FDR<0.05 in the multi-ancestry analysis); each point is one protein-disease pair, the x-axis is the MR estimate in European ancestry and the y-axis is the MR estimate in African ancestry. (C) Miami plot of the protein-disease causal estimates in European and African ancestries; each point is a protein-disease pair, the x-axis is the chromosome and position of the protein-coding gene, and the y-axis is the −log10(P) of the MR estimate in European (upper) and African (lower) ancestry. Coloured points are the seven, 12 and 89 protein-disease pairs with multi-ancestry, African-specific or European-specific MR effects (FDR<0.05 in the multi-ancestry analysis), with colours indicating the disease outcome; labelled points are the two, seven and seven protein-disease pairs that showed MR and colocalization evidence in the discovery and replication MR analyses, with label background colours indicating multi-ancestry (yellow), European-specific (green) and African-specific (blue) MR estimates. (D) Protein-disease pairs with MR (FDR<0.05) and colocalization evidence in the multi-ancestry comparison and replication analysis.

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F6.medium.gif)

Figure 5. Regional genomic plots of two protein-disease pairs with MR and colocalization evidence of potential causality in European and African ancestries. (A) The two theoretical models relevant to genetic colocalization: causality (colocalized signals) and confounding by LD. (B) Regional plots of SERPINE2 protein level on VTE in European and African ancestries. (C) Regional plots of ABO protein level on VTE in European and African ancestries.
We further identified ancestry-specific protein-disease pairs that showed an MR signal in only one of the ancestries. For the African MR results, we selected as candidates the 86 protein-disease pairs that showed marginal MR signals (P<0.05) and colocalization evidence in the discovery MR (**Table S8A** and **S8B**) and applied an FDR threshold of 0.05 based on these 86 pairs. After filtering, 14 pairs passed the FDR threshold. Two of the 14 pairs overlapped with the multi-ancestry MR list and were excluded from the African-specific list. Of the remaining 12 pairs (**Table 1, Table S16B**), seven showed marginal MR signals and LD check evidence in the African replication MR: genetically predicted SERPINF1 level on stroke, ACE level on COPD, B4GALT6 level on POAG, F7 level on stroke, LY75 level on asthma, AIF1 level on HF and CD248 level on gout (**Table S16B**). For the European MR results, we selected as candidates 341 protein-disease pairs with marginal MR evidence (P<0.05) and colocalization evidence in the discovery analysis. After applying an FDR threshold of 0.05 based on these 341 pairs, 95 pairs remained (**Table 1, Table S16C**). Six of these overlapped with the multi-ancestry MR list and were excluded from the European-specific list. Further filtering the remaining 89 signals on replication MR evidence, eight showed MR and LD check evidence in the European replication analysis, including the effects of F11 level on VTE, KLKB1 level on VTE, ERAP1 level on POAG, TNFSF12 level on asthma, ECM1 level on IPF, CD109 level on VTE and IL7R level on asthma (**Table S16C**).

**Figure 4** summarizes the multi-ancestry comparison results. The analysis first prioritized seven multi-ancestry, 12 African-specific and 89 European-specific candidate protein-disease pairs that passed the FDR threshold of 0.05 in the multi-ancestry analysis (**Figure 4A**; coloured points in **Figure 4C**).
The two multi-ancestry, seven European-specific and seven African-specific signals with MR and colocalization evidence in both the multi-ancestry comparison and replication (labelled points in **Figure 4C**) were considered protein-disease pairs with robust genetic evidence and were included in the triangulation analysis (**Figure 4D**).

### Drug target prioritization using MR, observational and clinical trial evidence

Triangulating evidence from genetics, observational studies and clinical trials has the potential to increase the reliability of causal inference23,41. The 16 protein-disease pairs with MR and colocalization evidence in both discovery and replication were selected as candidates for this analysis (**Figure 4C**). We conducted an observational analysis in up to 3,172 participants from the HUNT study42 (details in Methods). In logistic regression of disease status on protein levels, nine of the 16 observational associations showed the same direction of effect as the corresponding MR signals. Three of the 16 passed FDR<0.05: ACE level on COPD, AIF1 level on HF and SERPINF1 level on stroke (**Table S17**). We further mined clinical trial evidence for the 16 prioritized protein-disease pairs using the OpenTargets43 and DrugBank44 databases. As summarized in **Table 2**, we found clinical trial evidence (phase IV trial vs placebo; [NCT01014338](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT01014338&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom)) that supports our MR signal of ACE protein level on COPD. This evidence was obtained in Europeans. Our MR results support the efficacy of ACE inhibition in reducing COPD risk in African ancestry (OR in Africans=0.88, 95%CI=0.81 to 0.95, P=1.64×10−3; **Table S16**).
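As a worked check of how the reported African-ancestry estimate for ACE on COPD fits together, the sketch below recovers the log odds ratio, its standard error, the z statistic and a two-sided P value from the published OR and 95% CI. It assumes a Wald-type interval that is symmetric on the log-odds scale. The helper name `log_or_summary` is ours and is for illustration only.

```python
import math


def log_or_summary(odds_ratio: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.959964):
    """Back-calculate log(OR), SE, z and two-sided P from an OR and its 95% CI.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_or, se, z, p_two_sided


log_or, se, z, p = log_or_summary(0.88, 0.81, 0.95)
print(f"log(OR)={log_or:.3f}  SE={se:.3f}  z={z:.2f}  P={p:.1e}")
```

This gives a z of about -3.14 and a P of about 1.7×10−3. That is in line with the reported P=1.64×10−3 once rounding of the published OR and CI is taken into account.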
In addition, seven of the proteins are targets of existing drugs, and our MR signals indicate potential opportunities to repurpose these drugs for other indications (**Table 2**). For example, KLKB1 is the target of Ecallantide, which is used to treat hereditary angioedema. Our study showed strong genetic evidence that lowering KLKB1 protein levels reduces VTE risk (OR=0.78, P=4.59×10−15), which suggests a repurposing opportunity for Ecallantide in VTE prevention. The effect of ABO level on VTE has also been observed in recent genetic studies, including the VTE GWAS meta-analysis conducted in the GBMI consortium45. The remaining seven protein-disease pairs we identified were considered novel causal proteins and are therefore potential novel drug targets (**Table S18**). For example, the serpin family protein SERPINE2 has previously been reported to be associated with COPD46, and our MR results confirmed this association (MR P=9.28×10−5). Our disease-wide scan further suggested effects on VTE (P=7.3×10−8) and IPF (P=9.91×10−3). This implies that SERPINE2 could be a candidate drug target for the prevention of COPD, IPF and thromboembolism (**Figure S3**).

Table 2. Drug target validation and repurposing opportunities.

To prioritize the most valuable drug targets from this study, we summarized our drug target prioritization analyses, which comprised four filtering steps: discovery MR, sensitivity analyses and replication MR, multi-ancestry comparison, and triangulation.
As shown in **Figure 6A**, nine protein-disease pairs were ranked as the most valuable findings after filtering: IL7R and TNFSF12 levels on asthma, ACE level on COPD, AIF1 level on HF, SERPINF1 level on stroke, and SERPINE2, F11, KLKB1 and ABO levels on VTE (**Figure 6B**; **Table S18**). Some of these pairs showed robust genetic evidence, for example SERPINE2 and ABO on VTE. Others showed integrative evidence; for example, genetic, observational and clinical trial evidence all supported the effect of ACE on COPD. Apart from the five existing drugs listed in **Table 2** and ABO level on VTE, which has well-established genetic evidence45, we highlight two protein-disease pairs, SERPINF1 level on stroke and SERPINE2 level on VTE, as potential novel targets that are not yet under clinical investigation. Our study provides evidence to support formal investigation of these protein-disease pairs in future clinical trials (**Table S18**).

Figure 6. Drug target prioritization profiles of MR signals across European and African ancestries. (A) Drug target prioritization profile of this study; (B) evidence level for the eight prioritized drug targets (details in **Table S17** and **S18**).

## DISCUSSION

The application of GWASs to investigate complex traits and diseases is now over 15 years old47. With increasing participation in major consortia such as GBMI, we are entering a new era of multi-ancestry GWAS meta-analysis across biobanks, which brings new opportunities and challenges. In this study, we used multi-ancestry GWAS data from GBMI to implement proteome-wide MR in two ancestries.
By estimating 21,470 causal effects of proteins on diseases in European and African ancestries, we found evidence from both MR and colocalization for potential causal effects in 45 and seven protein-disease pairs in the two ancestries, respectively. The sex-specific MR suggested that five protein-disease pairs in European ancestry have different causal effects in males and females. After formal comparison of MR estimates across ancestries, two protein-disease pairs showed shared causal effects across the two ancestries, seven showed European-specific effects and seven showed African-specific effects, all with MR and colocalization evidence in both the multi-ancestry and replication analyses. By triangulating these 16 putative causal MR signals with clinical trial and observational evidence, we validated the efficacy of ACE inhibition on COPD and generalized this effect to individuals of African ancestry. We also suggested seven drug repurposing opportunities and identified seven novel protein-disease pairs that warrant further investigation, such as the effect of SERPINE2 on VTE. Collectively, our results highlight the value of multi-ancestry MR, applied to GWAS results from the Global Biobank Meta-analysis Initiative, for informing the generalisability of drug target efficacy across ancestries. A major issue for the generalisability of drug targets across populations is that most clinical trials are carried out in participants of European ancestry and assume that drug effects are consistent across ancestries, which is not always the case1. We have shown that multi-ancestry proteome MR can help address this bias by estimating the effects of drug targets in different populations. This could provide useful evidence to support the design of multi-center, multi-ancestry clinical trials. In this study, we identified multi-ancestry causal effects for two protein-disease pairs, despite limited statistical power.
This highlights the importance of large-scale genetic studies in diverse populations, which should be a key priority for the research community. In addition to generalizing targets across ancestries, our MR approach identified evidence of potential heterogeneity of drug response between ancestries for 14 protein-disease pairs. For example, SERPINF1 protein levels showed MR evidence of a causal effect on stroke in African ancestry (P=3.76×10−5) but little evidence of an effect in European ancestry (P=0.83) (see **Table S16B**; the full results can be queried using the EpiGraphDB web application). Moreover, some recent trans-ancestry fine-mapping studies have used multi-ancestry GWAS datasets to estimate the influence of potential pleiotropy on causal variant identification14,15. Owing to the limited power of the African datasets, it remains challenging to make firm claims about heterogeneity of drug response or to test for pleiotropy in this study. We hope future studies will address these questions as larger datasets become available. Our multi-ancestry MR pipeline (including the multi-ancestry application of PWCoCo) and the pQTL comparison approach implemented in TAMR provide a useful framework for such studies. The concept of generalizability of drug target effects can also be extended to sex-specific effects. In this study, we identified 15 MR signals with robust evidence of sex-specific effects. In addition to the protective effect of IL17RA inhibition on asthma in males, our results suggest that the efficacy of two further protein-disease pairs may warrant reconsideration (see **Table S11**). ERAP1 is the protein target of the anti-leukaemia drug Tosedostat, and NQO1 is the protein target of Vatiquinone, a drug for neurodegenerative disease.
However, our study found that the estimated causal effects of ERAP1 level on IPF and of NQO1 level on HF were relatively strong but in opposite directions in males and females. Whether drugs acting on these targets have different effects in males and females needs further investigation. In the future, more comprehensive sex-specific proteome MR could be conducted using sex-stratified pQTLs against male- and/or female-only diseases. For example, pregnancy and perinatal outcomes48 could be used to predict drug target effects in pregnant women. Our study also provides methodological guidance for future proteome MR. Previously, we showed that naïve application of MR without sensitivity analyses may yield over 30% unreliable results5. A recent study further suggested that 51% of results from transcriptome-wide association studies could not be confirmed by genetic colocalization49. Another study showed that evaluating reverse causality with genetic data is important for distinguishing disease-causing gene expression from disease-induced gene expression28,50. In this study, we considered these alternative explanations (reverse causality, confounding by LD and horizontal pleiotropy). We developed a pipeline that integrates several novel sensitivity analyses, e.g. TAMR for multi-ancestry pQTL comparison and the extension of PWCoCo to multiple ancestries. In the future, integrating our proteome MR pipeline with other methods, including transcriptome-wide association studies51 and drug discovery pipelines37, will provide more robust evidence to support causal gene/protein identification and drug target prioritization. Our study has several limitations. First, the statistical power of the African-specific pQTL data and disease GWASs was still limited compared with the European datasets.
GBMI includes one of the largest collections of African-ancestry GWASs, and we applied the generalised IVW and Egger regression methods31 to increase the number of instruments in the model and boost power52. Even so, the number of MR results reaching our evidence threshold in African ancestry was still 6.4 times lower than in European ancestry. This is mainly because the African pQTL sample was 3.8 times smaller than the European pQTL sample (7,212 European-ancestry vs 1,871 African-ancestry individuals). In addition, some diseases, e.g. IPF, have a limited number of cases in African ancestry (see **Table S6A**). Second, to reduce the possibility of false positive findings, we applied the FDR threshold separately in the discovery, replication and multi-ancestry comparison analyses. We further applied a rigorous set of sensitivity analyses and triangulated the MR findings with observational and clinical trial evidence. This increased the reliability of our top findings; however, some of them still showed relatively wide confidence intervals. We therefore recommend caution in interpreting the results and further validation of the findings in future proteome MR studies. Third, given the increased power of the pQTL study, 79.1% of the proteins now have two or more conditionally independent pQTL signals in their cis regions. We therefore applied conventional MR sensitivity approaches, including MR-Egger, which can test for horizontal pleiotropy. However, MR-Egger may be biased when there are few instruments, because the InSIDE (Instrument Strength Independent of Direct Effect) assumption is then unlikely to hold. With few instruments, the sample correlation between pleiotropic effects and instrument strengths can be large by chance. Ideally, over 30 variants are needed for the MR-Egger bias term to settle close to zero34. Finally, since we included non-specific pQTLs as instruments in this study, pleiotropy could influence some of the MR findings.
For instance, pQTLs within the ABO region are known to be pleiotropic and associated with multiple proteins4. In this study, we identified a putative causal role of ABO level on VTE, which aligns well with existing and recent genetic evidence for this protein-disease pair45. This example demonstrates that proteome MR using pleiotropic instruments may still yield useful evidence for drug target prioritization. However, caution and extra validation (e.g. pleiotropy tests) are needed when interpreting such results.

### Recommendations for multi-ancestry MR in the era of global biobank meta-analysis

Although proteome-wide MR shows promise in drug target prioritisation, there is little consistency in analytical strategies or in how MR findings are reported. The Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomisation (STROBE-MR) statement was recently released to define reporting standards for MR findings ([https://www.strobe-mr.org/](https://www.strobe-mr.org/))53,54. To complement these guidelines, we list here some specific challenges of MR and provide recommendations for multi-ancestry proteome-wide MR in the era of global biobank meta-analysis.

#### Selection and validation of genetic instruments for proteins

Instrument selection is a key step for all types of MR analyses. The selection of instruments for proteome MR has been discussed previously55,52,2. Our study provides a comprehensive pipeline for pQTL instrument selection and validation. We also offer some guidance on two key challenges: 1. Biologically, pQTLs can be categorized into cis-acting pQTLs (pQTLs within or close to the protein-coding gene) and trans-acting pQTLs (pQTLs not within or close to the protein-coding gene).
In general, we recommend using cis-acting pQTLs as genetic instruments in future MR, since they are considered to have a higher prior probability of specific biological effects52,56. However, cis-pQTLs that alter epitope-binding sites can produce artefactual variant-protein associations, resulting in biased MR estimates2,4, so they should be used with caution. Where cis-acting variants are not available, non-pleiotropic trans-acting pQTLs could be considered as a backup if they can be mapped to functional genes57,6. 2. Most proteome GWAS4,7,8,18,29,58,59 and MR5,6,60 studies have been conducted in plasma samples, and there are few tissue-specific data for the human proteome. Recently, some studies have identified brain- and cerebrospinal fluid-specific pQTLs and used MR to examine their roles in brain-related diseases61,62,63. This type of study may be able to detect tissue-specific protein effects on diseases.

#### Selection of outcomes

Outcome selection is the other key step in proteome MR. We summarise three key considerations: 1. Using MR in disease cases to evaluate drug efficacy on disease progression more closely represents the treatment of existing disease64. However, until now, most MR studies have used disease incidence data in a case-control setting rather than disease progression data. The future development of global biobank meta-analysis is likely to create a valuable source of data for studying disease progression. One advantage of studying disease progression in biobanks is that many biobanks have linked participants to their electronic health records, making disease progression information easier to obtain. In addition, novel genetic epidemiology methods such as the Slope-Hunter approach65 have been developed to detect and adjust for potential selection bias introduced by disease progression data66. 2.
Most MR analyses also assume that genetic effects on proteins are consistent across subgroups of the population (e.g. males and females; diseased patients and healthy controls). However, naïvely using genetic effects estimated in a general population as a proxy in disease subgroups may yield biased estimates of causal relationships, as we illustrated recently for C-reactive protein67. GBMI provides ancestry-specific and sex-specific GWAS data and also clearly defines disease cases and controls. Both features offer the opportunity to implement MR in population subgroups in the future. 3. For “two-sample” genetic epidemiology approaches (such as genetic correlation, polygenic risk score association, transcriptome-wide association studies and MR), a less considered issue is whether the exposure (i.e. protein) GWAS and the outcome (i.e. disease) GWAS used consistent covariates/adjustments. A recent study suggested that inconsistent adjustment can bias two-sample MR estimates68. Currently, the GBMI analysis pipeline adjusts for a standard set of covariates for genetic discovery, e.g. age, sex and principal components. Future GWASs in GBMI and related biobanks could consider providing multiple GWASs for each trait, with differing sets of covariates and/or environmental factors (e.g. with and without BMI).

#### Proteome-wide MR and sensitivity analyses

A large collection of MR and sensitivity methods can be used to estimate the causal roles of proteins in diseases. We list a few general recommendations here: 1. For discovery analysis, the Wald ratio and IVW work effectively30,69. However, to increase the power and reliability of the MR estimate, generalised IVW and MR-RAPS are potential alternatives31,70. 2. For sensitivity analyses, the following two methods address key assumptions: 1. Genetic colocalization is important in distinguishing causality from confounding by LD36,71,72.
Such confounding could lead to the false inference of a causal effect of the drug target on the disease. Some recent methods have been developed to relax the single-causal-variant assumption in colocalization5,73,74. 2. MR Steiger filtering was designed to estimate the direction of exposure-outcome effects27 and is key to addressing potential reverse causality.

## Conclusions

In summary, this MR study systematically investigated protein effects on eight complex diseases across European and African ancestries, providing evidence to inform the generalisability of drug targets to less studied ancestries. We anticipate that a new era of proteome MR will soon emerge, using new proteome resources from large-scale biobanks and consortia such as UK Biobank and CHARGE. Our findings, analysis pipeline and recommendations on proteome MR will help future studies to design, conduct and interpret multi-ancestry proteome-wide MR.

## STAR*METHODS

## KEY RESOURCES TABLE

[Table 3](http://medrxiv.org/content/early/2022/01/11/2022.01.09.21268473/T3)

## CONTACT FOR REAGENT AND RESOURCE SHARING

The multi-ancestry proteome Mendelian randomization pipeline is shared via the GBMI GitHub repository ([https://github.com/globalbiobankmeta/multi-ancestry-pwmr](https://github.com/globalbiobankmeta/multi-ancestry-pwmr)). Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Jie Zheng (jie.zheng{at}bristol.ac.uk).

## EXPERIMENTAL MODEL AND SUBJECT DETAILS

We accessed protein quantitative trait loci (pQTL) data from the ARIC and AASK cohorts20, the INTERVAL study4, Folkersen et al29, Yao et al59 and Emilsson et al60. Most of the proteins were measured using the SomaLogic platform ([https://somalogic.com/](https://somalogic.com/)).
For European samples, 7,212 European-ancestry individuals with protein data from the ARIC cohort were selected as discovery samples, and 3,301 Europeans with protein data from the INTERVAL study and/or Emilsson et al were selected as replication samples. For African samples, 1,871 African-ancestry individuals with protein data from the ARIC cohort were selected as discovery samples, and 467 African-ancestry individuals with protein data from AASK were selected as replication samples. For the disease outcomes, we assembled GWAS data from the Global Biobank Meta-analysis Initiative (GBMI; **Table S6**). These covered up to 1,219,215 European-ancestry individuals (100,736 cases and 1,118,479 controls) and up to 32,653 African-ancestry individuals (5,054 cases and 27,599 controls). We further used individual-level protein and disease data from the HUNT study42 to conduct an observational association analysis. Detailed information about these cohorts is given below.

### Atherosclerosis Risk in Communities Study (ARIC)

The Atherosclerosis Risk in Communities Study (ARIC), sponsored by the National Heart, Lung, and Blood Institute (NHLBI), is a prospective study. It investigates the etiology of atherosclerosis and its clinical outcomes, as well as variation in cardiovascular risk factors, medical care and disease by location, gender, race and date. Starting in 1987, each ARIC field centre randomly recruited around 4,000 adults aged 45-64 years for extensive examinations, which included collection of medical, social and demographic data. All participants were examined twice, three years apart, with follow-up conducted by telephone to maintain contact and carry out assessments. ARIC aims to use modern biochemical and observational methods to broaden and deepen the study of atherosclerosis75.
### African American Study of Kidney Disease and Hypertension Cohort Study (AASK)

The African American Study of Kidney Disease and Hypertension Cohort Study (AASK Cohort) is an extension of the AASK Clinical Trial. It is a multicentre prospective observational study. Its primary objective is to determine the course of kidney function and the risk factors for chronic kidney disease (CKD) progression in African Americans with hypertensive kidney disease, beyond blood pressure (BP) control and the use of recommended renoprotective antihypertensive medication. A secondary objective is to determine the occurrence of cardiovascular disease and to identify and evaluate its risk factors76.

### INTERVAL

The INTERVAL study is an open randomized trial of about 50,000 blood donors randomized to different blood donation intervals77. Its primary goal is to determine whether England’s National Health Service Blood and Transplant (NHSBT) can safely and acceptably collect blood from donors more frequently, at intervals similar to those used in other European countries. The INTERVAL BioResource provides a powerful research platform to store and analyse detailed information on donor health. Apart from serial collection of biological samples and clinical information, the study also includes extensive genetic, haematological, biochemical, lifestyle, side-effect and other donation-related characterisation of donors, all of which can be linked with electronic health records. For SomaLogic assays, Sun et al randomly selected two non-overlapping sub-cohorts of 2,731 and 831 participants from INTERVAL. After genetic quality control, 3,301 participants (2,481 and 820 in the two sub-cohorts) remained for analysis. No statistical methods were used to determine sample size. The experiments were not randomized. Laboratory staff conducting proteomic assays were blinded to the genotypes of participants.
### Global Biobank Meta-analysis Initiative (GBMI)

The Global Biobank Meta-analysis Initiative (GBMI) is a collaborative network of biobanks. Through meta-analysis, it combines their established genotype, phenotype and GWAS resources to develop a global and growing resource for human genetics research ([https://www.globalbiobankmeta.org/](https://www.globalbiobankmeta.org/)). GBMI currently represents 2.6 million research participants with health and genetic data from twenty-one biobanks across four continents. It incorporates diverse ancestries by including biobank samples from six main populations, and 14 endpoints were selected based on the common interests of the contributing biobanks22. Including samples of diverse ancestries in the biobank meta-analysis enables comparison of effect sizes of genomic loci across ancestries. Likewise, the sex-stratified meta-analysis allows effect sizes of genomic loci to be compared between sexes.

### The Trøndelag Health Study (HUNT)

The Trøndelag Health Study (HUNT) is a population-based cohort in Trøndelag County, Norway42. In the third survey, HUNT3, performed in 2006–2008, proteins were measured in a subset of the collected serum samples78,79. Results were available for 3,190 individuals whose status for the eight disease outcomes was also available. Serum samples were analysed using the multiplexed, aptamer-based affinity proteomics platform SOMAscan™.

## METHOD DETAILS

### Genetic instrument selection of plasma proteome

In this study, genetic variants associated with plasma proteins were used as genetic instruments for the MR analysis. We started the instrument selection process by accessing pQTL data from three cohorts: ARIC, INTERVAL and AASK. We selected all conditionally independent pQTLs associated with proteins at a false discovery rate (FDR) < 0.05.
To meet the data requirements of the MR and colocalization analyses, we only selected pQTLs with full summary statistics available in the cis-acting regions. Although the pQTLs are conditionally independent signals, we still applied LD clumping (LD r2<0.6) to remove pQTLs in strong LD with the top signals and so avoid collinearity in the MR model. For the discovery MR analysis, the following pQTLs were kept after selection: 6,614 pQTLs for 1,310 proteins in 7,212 Europeans from ARIC (**Table S1**) and 3,900 pQTLs for 1,311 proteins in 1,871 Africans from ARIC (**Table S2**)20. Of the proteins used in the discovery analysis, 1,076 were measured in both African and European ancestries and were used for the multi-ancestry comparison analysis (described in a later section).

### Genetic instrument validation

#### Validation of instrument strength

To quantify the statistical power of the pQTLs, we used F-statistics to estimate the strength of each tested variant as a genetic predictor. pQTLs with F-statistics below the widely used threshold of 10 were considered to have limited power (potentially causing weak instrument bias80) and were removed from the MR and follow-up analyses.

#### Validation of instruments using directionality test

From a drug development point of view, a valid drug acts by changing the protein level, which in turn alters disease risk. We therefore conducted a directionality test to understand the direction of effect of the MR findings. We used Steiger filtering27 to test the directionality of the pQTL-disease associations for all candidate instruments. We removed from the MR and follow-up analyses any pQTL with a Steiger filter flag of FALSE, meaning that the pQTL explains more of the variance in the outcome than in the exposure (**Table S1** and **S2**).
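The two instrument filters above can be sketched in Python as an illustrative approximation, not the study code. The sketch makes two simplifying assumptions:

* Per-variant strength is approximated as F = (beta/se)^2.
* The variance explained on each side is approximated from the t statistic as r2 = t^2/(t^2 + n - 2).

For binary outcomes, TwoSampleMR's Steiger implementation converts log odds ratios to the liability scale; that refinement is omitted here. The `Instrument` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    beta_exp: float  # pQTL effect on protein level
    se_exp: float
    n_exp: int       # pQTL study sample size
    beta_out: float  # same variant's effect on disease (log OR)
    se_out: float
    n_out: int       # disease GWAS sample size


def f_statistic(beta: float, se: float) -> float:
    """Single-variant approximation of instrument strength."""
    return (beta / se) ** 2


def r2_from_t(beta: float, se: float, n: int) -> float:
    """Approximate variance explained from the t statistic (simplification)."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)


def keep_instrument(iv: Instrument, f_min: float = 10.0) -> bool:
    """Keep variants with F >= f_min whose Steiger flag is TRUE (r2_exposure > r2_outcome)."""
    strong = f_statistic(iv.beta_exp, iv.se_exp) >= f_min
    steiger_flag = (r2_from_t(iv.beta_exp, iv.se_exp, iv.n_exp)
                    > r2_from_t(iv.beta_out, iv.se_out, iv.n_out))
    return strong and steiger_flag


# Hypothetical variants: strong and correctly oriented, weak, and reverse-oriented
print(keep_instrument(Instrument(0.30, 0.02, 7212, 0.02, 0.01, 1_000_000)))  # True
print(keep_instrument(Instrument(0.05, 0.02, 7212, 0.02, 0.01, 1_000_000)))  # False
print(keep_instrument(Instrument(0.10, 0.02, 7212, 0.80, 0.01, 1_000_000)))  # False
```

The third variant passes the strength check (F=25) but explains more variance in the disease than in the protein, so the directionality filter removes it.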
#### Validation of instruments using heterogeneity and specificity tests

**Figure S1** illustrates the instrument validation process, which uses a tier system we developed in a previous MR study5. We conducted two types of validation in the European samples and split the pQTLs into three tiers. First, we estimated the heterogeneity of pQTL effects across ARIC20 and INTERVAL4. pQTLs showing heterogeneous effects across the two studies (defined as a pair-wise Z-test P value < 0.001; **Table S3**) were set as tier 2 instruments (**Table S1**). A pQTL effect seen in only one study may reflect a true differential causal effect of the protein on disease. We therefore kept the tier 2 (heterogeneous) instruments and conducted the MR analysis using pQTLs from ARIC and INTERVAL separately. Second, when a pQTL is associated with multiple proteins, we cannot determine which protein(s) influence disease in an MR setting. We therefore estimated the specificity of the pQTLs in European samples from ARIC. Because ARIC only provided cis-acting pQTLs, we estimated the specificity of the ARIC pQTLs using INTERVAL pQTLs as a reference panel, for which full GWAS summary statistics in cis- and trans-acting regions were available. ARIC pQTLs (and their LD proxies with r2>0.8) associated with more than five proteins in the INTERVAL data were set as tier 3 (non-specific) instruments. We kept them in the MR analysis, with a caution about potential non-specificity. For African samples, we had too little additional African data to conduct heterogeneity and specificity tests, so we could not perform the above validation analyses. These tests need to be carefully considered once more African datasets are available.

### Outcome selection in the Global Biobank Meta-analysis Initiative

We selected disease GWASs from GBMI using four criteria: 1.
Both African and non-Finnish European GWAS summary statistics were available in GBMI. 2. The number of cases was over 100, so that SAIGE81, the logistic mixed model used for the GWAS, provided good power. 3. Because the pQTL data were obtained from both males and females, we used sex-combined disease GWASs for the main MR analysis, so that the exposure and outcome of the MR represented the same population. 4. Because we conducted a sex-specific proteome MR analysis as a follow-up, we also selected male- and female-only disease GWASs from GBMI and used them as outcomes in the sex-specific MR analysis. Based on these criteria, eight diseases were selected as outcomes for the MR analysis: idiopathic pulmonary fibrosis (IPF), primary open-angle glaucoma (POAG), heart failure (HF), venous thromboembolism (VTE), stroke, gout, chronic obstructive pulmonary disease (COPD) and asthma. The sample sizes of the eight African-specific GWASs ranged from 3,867 to 35,209 (**Table S6A**). Those of the eight European-specific GWASs ranged from 469,078 to 1,219,215 (**Table S6B**).

## QUANTIFICATION AND STATISTICAL ANALYSIS

### Discovery proteome-wide MR of complex diseases in European and African ancestries

In the discovery MR analysis, we estimated the putative causal effects of proteins on the eight selected diseases in European and African ancestries separately. To best represent the genetic signals in the cis-acting region and boost power, we conducted one of three sets of analyses depending on how many instruments had been selected for a protein. For proteins with only one instrument, we used the Wald ratio30 to estimate the effect of the protein on the disease. For proteins with two or more instruments, we used the conditionally independent pQTLs as genetic instruments. We applied a generalised inverse variance weighted (gIVW) approach that takes into account the correlation between nearby pQTLs31.
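The Wald ratio and the generalised IVW estimator can be sketched with NumPy as a minimal first-order, fixed-effect version. The sketch assumes that the outcome covariance between instruments is Ω = (se_y se_yᵀ) ∘ ρ, where ρ is the LD correlation matrix. The function names are ours. The MendelianRandomization package also offers second-order weights and random-effects standard errors, which are not reproduced here. The generalised Cochran's Q returned by the function is the heterogeneity statistic used in the sensitivity analyses.

```python
import numpy as np


def wald_ratio(bx: float, by: float, se_y: float):
    """Single-instrument estimate by/bx with first-order SE se_y/|bx|."""
    return by / bx, se_y / abs(bx)


def generalised_ivw(bx, by, se_y, rho):
    """First-order, fixed-effect generalised IVW for correlated instruments.

    bx, by : per-variant effects on the exposure (protein) and outcome (disease)
    se_y   : standard errors of by
    rho    : LD correlation matrix between the variants
    Returns (estimate, standard error, generalised Cochran's Q).
    """
    bx = np.asarray(bx, dtype=float)
    by = np.asarray(by, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    omega = np.outer(se_y, se_y) * np.asarray(rho, dtype=float)
    w = np.linalg.solve(omega, bx)            # Omega^{-1} bx
    precision = bx @ w                        # bx' Omega^{-1} bx
    estimate = (w @ by) / precision           # (bx' Omega^{-1} by) / (bx' Omega^{-1} bx)
    resid = by - estimate * bx
    q_stat = resid @ np.linalg.solve(omega, resid)
    return float(estimate), float(np.sqrt(1.0 / precision)), float(q_stat)


# Hypothetical example: three correlated cis-pQTLs for one protein
bx = [0.20, 0.30, 0.40]
by = [0.10, 0.16, 0.19]
se_y = [0.02, 0.02, 0.03]
rho = [[1.0, 0.3, 0.1],
       [0.3, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
print(wald_ratio(0.20, 0.10, 0.02))
print(generalised_ivw(bx, by, se_y, rho))
```

With ρ set to the identity matrix, the estimator reduces to the ordinary IVW estimate, Σ bx·by/se_y² / Σ bx²/se_y².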
For proteins with three or more instruments, we further applied a generalised MR-Egger regression (gEgger) approach that accounts for the correlation among pQTLs31. MR estimates with FDR-corrected P value < 0.05 were used to select candidate protein-disease signals for follow-up analyses (number of MR tests: 11,612 in Europeans and 9,858 in Africans). The MR analyses were conducted using the MendelianRandomization R package82 implemented in the TwoSampleMR R package (github.com/MRCIEU/TwoSampleMR)83. To select MR estimates with good genetic evidence, we applied two thresholds. 1. A Benjamini-Hochberg false discovery rate (FDR) of 0.05 was applied to select the MR estimates with robust signals. 2. An MR P value of 0.05 was applied to create an extended list of MR signals for the sensitivity and replication analyses.

### Sensitivity analysis of candidate MR signals

To increase the reliability of the MR signals, we applied a set of five sensitivity analyses to the candidate MR signals.

#### Estimation of horizontal pleiotropy and heterogeneity of MR signals

With the increasing power of protein GWASs, 79.2% of the proteins have two or more instruments in the cis region, so we applied two conventional MR sensitivity analyses. First, we applied the gEgger method31 and used its intercept term as an indicator of potential pleiotropy32. MR signals with a gEgger intercept P value lower than 0.05 were considered to be influenced by horizontal pleiotropy. Because controlling for pleiotropy is important in MR analysis, these MR signals were excluded from all follow-up analyses and are listed separately in **Table S9**. Second, we applied Cochran’s Q test to the gIVW results and Rücker’s Q test to the gEgger results34,35 to estimate potential heterogeneity of the MR estimates across pQTLs10.
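The Benjamini-Hochberg adjustment behind the FDR threshold above can be sketched with NumPy. It should agree with R's `p.adjust(p, method = "BH")`. The function name and the example P values are hypothetical. The two thresholds from the text are then applied to the adjusted and raw P values, respectively.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)          # p_(i) * m / i
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)            # cap at 1
    return adjusted


# Hypothetical MR P values for five protein-disease pairs
p_mr = np.array([0.0001, 0.004, 0.02, 0.045, 0.6])
q_mr = bh_adjust(p_mr)
robust = q_mr < 0.05     # threshold 1: FDR-corrected signals
extended = p_mr < 0.05   # threshold 2: extended list for sensitivity/replication
print(q_mr, robust, extended)
```

In this toy example, the fourth pair enters the extended list (P<0.05) but not the robust list, because its adjusted value exceeds 0.05.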
Heterogeneity can arise for various reasons (e.g. measurement error)10, so we kept MR signals with evidence of heterogeneity for the follow-up analyses but flagged the potential heterogeneity in the MR results tables (**Table S7** and **S8**).

#### Genetic colocalization analysis of the candidate MR signals

Results that passed the MR threshold of P<0.05 and showed little evidence of pleiotropy in MR-Egger regression were evaluated using genetic colocalization analysis, to distinguish causal MR signals from protein-disease pairs confounded by LD (see **Figure 5A**). We applied three colocalization approaches to obtain more reliable evidence.

First, we applied an approximate colocalization analysis we developed previously, referred to as the LD check5. We estimated the LD r² between each pQTL and all variants in the region associated with the disease outcome at GWAS P<1×10⁻³. An LD r² threshold of 0.7 between the pQTL and any of the outcome variants was taken as evidence of approximate colocalization.

Second, we applied conventional genetic colocalization analysis using the ‘coloc’ R package36. For these analyses, we used prior probabilities more relaxed than the package defaults that a variant is associated with each phenotype (p1=1×10⁻³; p2=1×10⁻³) and with both phenotypes jointly (p12=1×10⁻⁴), for two reasons: (i) the pQTLs had passed our instrument selection and validation and therefore had good instrument strength, indicating robust association with protein level, so we relaxed the prior probability p1; (ii) because this analysis was restricted to candidate MR signals, there was already some evidence for an effect of the proteins on the diseases, so we relaxed p2 and p12. A colocalization posterior probability (PP.H4) > 70% suggests that the two genetic association signals are likely to colocalize within the test region.
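For readers less familiar with coloc, the sketch below reproduces in Python the approximate Bayes factor calculation that underlies `coloc.abf` (Giambartolomei et al.36), using the priors stated above (p1=p2=1×10⁻³, p12=1×10⁻⁴). It is a sketch, not the package code, and it assumes that both traits’ summary statistics cover the same SNPs in the same order, that protein effects are on a standardised scale (prior effect SD 0.15) and that disease effects are log odds ratios (prior effect SD 0.2), matching coloc’s documented defaults for quantitative and case-control traits. The function names are hypothetical.

```python
import math
import numpy as np


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factors, one per SNP."""
    beta = np.asarray(beta, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    r = prior_sd ** 2 / (prior_sd ** 2 + v)
    z2 = beta ** 2 / v
    return 0.5 * (np.log(1.0 - r) + r * z2)


def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))


def coloc_posteriors(l1, l2, p1=1e-3, p2=1e-3, p12=1e-4):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits."""
    l1, l2 = np.asarray(l1, dtype=float), np.asarray(l2, dtype=float)
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l1 + l2)
    # H3 (distinct causal variants) uses sum(ABF1)*sum(ABF2) - sum(ABF1*ABF2).
    # The ratio is below 1 in exact arithmetic when more than one SNP is
    # tested; if it rounds to 1, H3 is given zero weight.
    ratio = math.exp(s12 - s1 - s2)
    lh3 = (math.log(p1) + math.log(p2) + s1 + s2 + math.log1p(-ratio)
           if ratio < 1.0 else -math.inf)
    lh = np.array([0.0,                    # H0: no association
                   math.log(p1) + s1,      # H1: protein only
                   math.log(p2) + s2,      # H2: disease only
                   lh3,                    # H3: both, distinct variants
                   math.log(p12) + s12])   # H4: both, shared variant
    return np.exp(lh - logsumexp(lh))
```

The last element of the returned array corresponds to PP.H4, the quantity compared against the 70% threshold.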
Third, conventional colocalization may provide unreliable inference in regions with multiple independent (but partially correlated) genetic association signals. We therefore applied pairwise conditional and colocalization (PWCoCo) analysis5 of all conditionally independent pQTLs against all conditionally independent association signals for the disease outcomes. For the 830 and 388 protein-disease pairs showing suggestive MR evidence in European and African ancestries, respectively (**Table S7** and **S8**), we conducted PWCoCo analysis using our newly developed C++ pipeline ([https://github.com/jwr-git/pwcoco](https://github.com/jwr-git/pwcoco)). 1000 Genomes genotype data for European and African samples were used separately as the LD reference panels33 for the PWCoCo analysis.

#### Estimation of potential aptamer-binding artefacts of pQTLs

Aptamer-binding artefacts driven by protein-altering variants may create spurious genetic associations between variants and proteins and therefore bias the causal estimates. We assessed this potential bias by checking whether the pQTL instruments or their LD proxies (r²>0.8) were annotated as missense, stop-lost or stop-gained variants by the Ensembl Variant Effect Predictor (VEP)38 (variant annotations listed in **Table S1** and **S2**). Where an MR signal involved one or more of these coding variants, we flagged it to warn readers of this potential bias (**Table S7** and **S8**).

### Replication MR analysis

In the replication MR analysis, for protein-disease pairs that passed the MR threshold of P<0.05, we selected conditionally independent pQTLs from two independent proteome GWAS: European pQTLs from Zheng et al5 and African pQTLs from the AASK study. For consistency, the same instrument selection and validation procedures were applied.
After selection, there were 285 pQTLs for 285 proteins in Europeans from Zheng et al5 (**Table S4**) and 290 pQTLs for 290 proteins in 467 Africans from the AASK study20 (**Table S5**). The same MR pipeline used for the discovery MR was applied in the replication MR, and a Benjamini-Hochberg FDR of 0.05 was used to identify MR estimates with replication evidence.

### Sex-specific MR of candidate protein-disease signals

We conducted sex-specific MR analysis to identify protein-disease pairs with different effect estimates in males and females. All proteins and disease outcomes were included in this analysis. In European ancestry, 8,649 and 8,527 protein-disease pairs had data available for MR in males and females, respectively; in African ancestry, 8,076 and 7,851 pairs did. The sex-specific GWAS of the eight diseases were used as outcomes, and the same statistical model and sensitivity analysis pipeline were applied. A P value below 0.01 from a pairwise Z-test comparing the male- and female-only MR estimates was used as the threshold for identifying sex-specific effects (**Table S11**).

### Multi-ethnic comparison of pQTL and proteome MR effect estimates across ancestries

#### Estimation of ancestry specificity of pQTL and MR effects across ancestries

We systematically evaluated the ancestry specificity of pQTLs using two functions implemented in the TAMR package ([https://github.com/universe77/TAMR](https://github.com/universe77/TAMR)): (i) estimating whether the direction of effect was consistent across ancestries; and (ii) estimating whether the signals were significant in both ancestries. To better account for differences in effect sizes, allele frequencies and power of pQTLs across the two ancestries, we applied a Bayesian Winner’s Curse correction described in a previous study40.
The method estimates the probability that a pQTL has a matching pQTL in the other ancestry, using the beta, standard error and sample size of the pQTLs in the two ancestries, and further estimates the expected number of pQTLs with the same direction of effect and/or the same level of significance. Of the 1,096 proteins with full summary statistics available, we first excluded those that showed no pQTL association in either ancestry. For the remaining 1,076 proteins, we conducted the ancestry specificity analysis using the European pQTL effects to predict the African pQTL effects and vice versa. In total, four analyses were conducted:

1. African pQTLs estimate the direction of effect of European pQTLs (**Table S12A**).
2. African pQTLs estimate the significance of European pQTLs (**Table S12A**).
3. European pQTLs estimate the direction of effect of African pQTLs (**Table S12B**).
4. European pQTLs estimate the significance of African pQTLs (**Table S12B**).

#### Estimation of multi-ancestry and ancestry-specific pQTLs

Having established the overall ancestry specificity of pQTLs, we generated a list of multi-ancestry and ancestry-specific pQTLs as follows:

1. For regions with a pQTL signal in one ancestry but not the other, we defined these as ancestry-specific regions (**Figure 3** situation 4) and classified pQTLs in these regions as ancestry-specific pQTLs in non-shared regions (**Table S13**).
2. For the remaining regions with pQTL signals in both ancestries, we looked up the European pQTLs (or their LD proxies with r²>0.8) in the African ancestry data and vice versa. If there was an overlapping signal (P<1×10⁻³), we classified them as multi-ancestry pQTLs (**Figure 3** situation 2 or 3; **Table S14**).
3. For regions with pQTLs in both ancestries, pQTLs without a replication signal in the other ancestry were classified as ancestry-specific pQTLs in shared regions (**Figure 3** situation 1; **Table S15**).
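The three classification rules above can be written as a small decision function. The sketch below is hypothetical: `lookup_p` and `classify_pqtl`, and the dictionary inputs for LD proxies and other-ancestry P values, are illustrative names rather than part of TAMR or any released pipeline. Only the thresholds (proxy r²>0.8, lookup P<1×10⁻³) come from the text.

```python
from typing import Dict, List, Optional, Tuple


def lookup_p(pqtl: str,
             proxies: Dict[str, List[Tuple[str, float]]],
             other_p: Dict[str, float],
             r2_min: float = 0.8) -> Optional[float]:
    """Smallest P value in the other ancestry for a pQTL or its LD proxies."""
    # Candidate variants: the pQTL itself plus proxies with r2 above r2_min
    candidates = [pqtl] + [snp for snp, r2 in proxies.get(pqtl, []) if r2 > r2_min]
    found = [other_p[snp] for snp in candidates if snp in other_p]
    return min(found) if found else None


def classify_pqtl(region_shared: bool, other_p: Optional[float],
                  p_threshold: float = 1e-3) -> str:
    """Apply the three rules: non-shared region, multi-ancestry, or shared-region specific."""
    if not region_shared:
        return "ancestry-specific (non-shared region)"  # Figure 3, situation 4
    if other_p is not None and other_p < p_threshold:
        return "multi-ancestry"                          # Figure 3, situations 2-3
    return "ancestry-specific (shared region)"          # Figure 3, situation 1
```

For example, a European pQTL in a shared region whose proxy (r²=0.9) has P=2×10⁻⁴ in the African data would be classified as multi-ancestry, whereas a proxy with r²=0.5 would be ignored by the lookup.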
#### Estimation of multi-ancestry and ancestry-specific proteome MR signals

We conducted a multi-ancestry comparison of the proteome MR estimates across European and African ancestries. To identify multi-ancestry protein-disease pairs with MR evidence in both ancestries, we selected the 60 pairs that passed the FDR threshold of 0.05 in the discovery MR analysis (**Table S7A** and **S8A**). Of these 60 pairs, 59 were unique, as the effect of ABO level on VTE showed strong MR signals in both ancestries. We then applied an FDR correction across these 59 unique pairs, with a threshold of 0.05. For pairs that passed this threshold in the multi-ancestry comparison analysis (**Table S16A**), we further checked their MR and colocalization evidence in the replication MR analysis. Pairs with MR and colocalization evidence in both the multi-ancestry comparison and the replication analysis were considered the most reliable multi-ancestry protein-disease signals.

We also sought to identify ancestry-specific protein-disease pairs that showed MR evidence in one ancestry but not in the other. For African-specific MR signals, we selected 86 protein-disease pairs with marginal MR evidence (MR P<0.05) and colocalization evidence (colocalization probability>0.7) in the discovery MR analysis, and applied an FDR threshold of 0.05 based on these 86 pairs. For pairs that passed the FDR threshold, we checked their effects in the European proteome MR and excluded pairs that appeared in the multi-ancestry MR list (**Table S16A**). For the remaining pairs, we checked their MR and colocalization evidence in the African replication MR (**Table S16B**). Protein-disease pairs with MR and colocalization evidence in both the African-specific analysis and the replication analysis were selected as African-specific protein-disease signals. For European-specific MR signals, we applied the same approach as in the African-specific analysis.
This analysis started from 386 protein-disease pairs with marginal MR evidence (MR P<0.05) and colocalization evidence (colocalization probability>0.7) in the discovery MR analysis. The same FDR correction was applied, and pairs that overlapped with the multi-ancestry list were excluded (**Table S16C**). The MR and colocalization evidence from the European replication analysis was then considered, and pairs with MR and colocalization evidence in both the European-specific analysis and the replication analysis were selected as European-specific protein-disease signals.

### Triangulation of protein-disease MR signals with observational and clinical trial evidence

Proteins are the targets of most drugs and are therefore of high value for drug target validation and drug repurposing. For the 16 protein-disease pairs with multi-ancestry or ancestry-specific evidence in the multi-ancestry comparison analysis, we triangulated the MR findings with observational evidence from HUNT (**Table S17**) and with clinical trial evidence from the Open Targets43 and DrugBank44 databases (**Table S18**).

For these 16 protein-disease pairs, we estimated observational associations using individual-level data from HUNT. Measurements of the 16 disease-associated proteins were rank-based inverse normal transformed using the RankNorm function in the RNOmni R package. Residuals were then extracted from a linear model of the transformed protein values on age and sex, and these residuals were included in a logistic regression model for disease status with age and sex as covariates. The eight disease outcomes (asthma, COPD, gout, POAG, VTE, IPF, stroke and HF) were defined according to the GBMI phecodes using ICD codes (details in the GBMI flagship paper22). All individuals provided informed written consent, and the study was approved by the Regional Committee for Medical and Health Research Ethics (REK # 2018/1622).
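The observational model described above can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the HUNT analysis code: the rank-based inverse normal transform uses the offset k = 3/8, which we understand to be the default of `RNOmni::RankNorm` (ties receive their average rank); the logistic model is fitted by plain Newton-Raphson with Wald standard errors; and `observational_association` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist


def rank_normalise(x, k=0.375):
    """Rank-based inverse normal transform; tied values get their average rank."""
    x = np.asarray(x, dtype=float)
    n = x.size
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    starts = np.cumsum(counts) - counts  # 0-based start position of each tie group
    ranks = starts[inverse] + (counts[inverse] + 1) / 2.0
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - k) / (n - 2 * k + 1)) for r in ranks])


def logistic_fit(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns coefficients and Wald SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def observational_association(protein, disease, age, sex):
    """Log odds ratio of disease per unit of the age/sex-residualised, rank-normalised protein."""
    z = rank_normalise(protein)
    covariates = np.column_stack([np.ones_like(z), age, sex])
    # Step 1: residualise the transformed protein on age and sex
    coef, *_ = np.linalg.lstsq(covariates, z, rcond=None)
    residual = z - covariates @ coef
    # Step 2: logistic model for disease status with residual, age and sex
    X = np.column_stack([covariates, residual])
    beta, se = logistic_fit(X, np.asarray(disease, dtype=float))
    log_or, se_log_or = beta[-1], se[-1]
    p = 2.0 * (1.0 - NormalDist().cdf(abs(log_or / se_log_or)))
    return log_or, se_log_or, p
```

Because the residual is orthogonal to age and sex by construction, adjusting for them again in the logistic model does not introduce collinearity.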
In addition, because the HUNT proteome data were collected in a cardiovascular disease (CVD)-enriched cohort, we considered the potential influence of sample selection on the observational associations and used a model comparing prevalent cases with controls (**Table S17**). For Open Targets, the score of each evidence category was recorded (**Table 2**). For proteins that are targets of existing drugs or drugs under clinical development, we further checked the details of the clinical trials on [ClinicalTrials.gov](https://clinicaltrials.gov) and recorded drug names and primary indications. In DrugBank, each protein was searched as a target, and protein-drug pairs with a documented action (e.g. inhibitor) were recorded together with their primary indications. For the observational associations, the direction and robustness of the MR and observational association signals were compared (**Table 2** and **Table S18**).

#### Drug target prioritization profiling

To summarise the evidence from this study, we developed a drug target prioritization profiling procedure. Four key steps were used to select the most promising protein-disease pairs from the 11,612 pairs in European ancestry and 9,858 pairs in African ancestry (**Figure 6A**). First, MR signals with FDR < 0.05 in the discovery (or multi-ancestry comparison) analysis were used to select candidate protein-disease pairs. Second, sensitivity analyses, including the pleiotropy test (little evidence of pleiotropy from MR-Egger regression) and the three types of colocalization (colocalization probability>0.7), were applied to select more robust pairs. Third, we selected protein-disease pairs with multi-ancestry or ancestry-specific MR evidence (FDR<0.05 in the multi-ancestry comparison) and replication MR evidence, and considered them as pairs with reliable genetic evidence.
Fourth, integrating the genetic evidence with observational and trial evidence, the most reliable pairs with high drug development value were prioritized. The evidence levels for the most promising pairs are summarised in **Figure 6B** and **Table S18**.

### Data Availability

The GBMI GWAS summary statistics used in the analyses described here are freely accessible on the GBMI website ([https://www.globalbiobankmeta.org/](https://www.globalbiobankmeta.org/)). All our MR estimates and colocalization results (including 11,612 protein-disease signals in European ancestry and 9,858 signals in African ancestry) are freely available to browse, query and download via the EpiGraphDB platform24 ([https://epigraphdb.org/multi-ancestry-pwmr/](https://epigraphdb.org/multi-ancestry-pwmr/)). An application programming interface (API) documented on the site enables users to access the data programmatically.

## Supporting information

Supplemental Tables (supplements/268473_file03.xlsx)

## Data Availability

All data produced are available online:

* [https://www.globalbiobankmeta.org/](https://www.globalbiobankmeta.org/)
* [http://nilanjanchatterjeelab.org/pwas](http://nilanjanchatterjeelab.org/pwas)
* [https://epigraphdb.org/trans-ancestry-pwmr/](https://epigraphdb.org/trans-ancestry-pwmr/)

## Author contribution

J.Z., T.R.G., B.M.N., W.Z. and B.M.B. conceived and designed the study and oversaw all analyses; H.L.Z. performed the Mendelian randomization analysis and the sensitivity analyses, including colocalization analysis; H.R., T.H.N., L.B. and B.M.B. conducted the observational analysis using HUNT data; G.H., Y.C. and J.Z. performed the pQTL and MR results comparison; J.Z. performed the triangulation and drug target prioritisation analysis; Y.L. developed the database and web browser. H.L.Z. and J.Z. wrote the manuscript; G.D.S., B.M.B., W.Z., B.M.N. and T.R.G. reviewed the paper and provided key comments.

## Conflicts of interest

J.Z., T.R.G.
and G.D.S. receive funding from Biogen for other work on drug target prioritization. B.M.N. is on the scientific advisory boards of Deep Genomics and Neumora, and is a consultant for Camp4 Therapeutics, Takeda Pharmaceutical and Biogen.

## Supplementary Figures

![Figure S1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F8.medium.gif)

Figure S1. Instrument validation using a tier system.

![Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F9.medium.gif)

Figure S2. Comparing genetic effects of pQTLs across Zhang et al and Sun / Folkersen et al. Pearson correlation = 0.92.

![Figure S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/11/2022.01.09.21268473/F10.medium.gif)

Figure S3. The putative causal effect of protein level of SERPINE2 on three tested diseases.

## Acknowledgements

The authors thank Dr. Shinichi Namba for the internal GBMI review of this manuscript. J.Z. is supported by the Academy of Medical Sciences (AMS) Springboard Award, the Wellcome Trust, the Government Department of Business, Energy and Industrial Strategy (BEIS), the British Heart Foundation and Diabetes UK (SBF006\1117). J.Z. is funded by the Vice-Chancellor Fellowship from the University of Bristol. GDS, TRG, GH, JZ, HR, YC and YL work in a Unit supported by the Medical Research Council for the Integrative Epidemiology Unit (MC_UU_00011/1 & 4) at the University of Bristol. TRG holds a Turing Fellowship from the Alan Turing Institute. L.B. works in a research unit funded by the K.G.
Jebsen Center for Genetic Epidemiology funded by Stiftelsen Kristian Gerhard Jebsen; Faculty of Medicine and Health Sciences, NTNU; The Liaison Committee for education, research and innovation in Central Norway; and the Joint Research Committee between St. Olavs Hospital and the Faculty of Medicine and Health Sciences, NTNU. The UK Medical Research Council and Wellcome (Grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and Jie Zheng will serve as guarantor for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website ([http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf](http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf)). This work is supported by a Cancer Research UK grant, Integrative Cancer Epidemiology Programme (C18281/A19169). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health and Social Care. This work was supported by the Elizabeth Blackwell Institute for Health Research, University of Bristol. We gratefully acknowledge all studies and biobanks that have contributed to the Global Biobank Meta-analysis Initiative: BioBank Japan, BioMe, BioVU, Canadian Partnership for Tomorrow, Colorado Center for Personalized Medicine, China Kadoorie, deCODE Genetics, East London Genes & Health, Estonian Biobank, FinnGen, Generation Scotland, HUNT, Lifelines, Michigan Genomics Initiative, Million Veteran Program, Netherlands twin register, Partners Biobank, QIMR Berghofer - QIMR Biobank (QSkin and GenEpi), Taiwan Biobank, UCLA Precision Health Biobank, UK Biobank. We gratefully acknowledge Zhang et al for making the proteome QTL data publicly available ([http://nilanjanchatterjeelab.org/pwas/](http://nilanjanchatterjeelab.org/pwas/)).

* Received January 9, 2022.
* Revision received January 9, 2022. * Accepted January 11, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Bachtiar, M. & Lee, C. G. L. Genetics of Population Differences in Drug Response. Curr Genet Med Rep 1, 162–170 (2013). 2. 2.Holmes, M. V., Richardson, T. G., Ference, B. A., Davies, N. M. & Davey Smith, G. Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development. Nat Rev Cardiol 18, 435–453 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41569-020-00493-1&link_type=DOI) 3. 3.Astle, W. J. et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell 167, 1415–1429.e19 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2016.10.042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27863252&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 4. 4.Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0175-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29875488&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 5. 5.Zheng, J. et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat Genet 52, 1122–1131 (2020). 6. 6.Folkersen, L. et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat. Metab. 2, 1135–1148 (2020). 7. 7.Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science eabj1541 (2021) doi:10.1126/science.abj1541. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1126/science.abj1541&link_type=DOI) 8. 8.Ferkingstad, E. et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 1–10 (2021) doi:10.1038/s41588-021-00978-w. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-021-00978-w&link_type=DOI) 9. 9.Boer, C. G. et al. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 184, 4784–4818.e17 (2021). 10. 10.Zheng, J. et al. Recent Developments in Mendelian Randomization Studies. Curr Epidemiol Rep 4, 330–345 (2017). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29226067&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 11. 11.Brown, B. C., Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 99, 76–88 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2016.05.001&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27321947&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 12. 12.Márquez-Luna, C., Loh, P.-R., South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium & Price, A. L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 41, 811–823 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.22083&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 13. 13.Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun 10, 3328 (2019). 
[PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 14. 14.Mahajan, A. et al. Trans-ethnic Fine Mapping Highlights Kidney-Function Genes Linked to Salt Sensitivity. Am J Hum Genet 99, 636–646 (2016). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27588450&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 15. 15.Morris, A. P. et al. Trans-ethnic kidney function association study reveals putative causal genes and effects on kidney-specific disease aetiologies. Nat Commun 10, 29 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-018-07867-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30604766&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 16. 16.Zheng, J. et al. Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease. Int J Epidemiol (2021) doi:10.1093/ije/dyab203. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyab203&link_type=DOI) 17. 17.Sirugo, G., Williams, S. M. & Tishkoff, S. A. The Missing Diversity in Human Genetic Studies. Cell 177, 26–31 (2019). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 18. 18.Emilsson, V. et al. Co-regulatory networks of human serum proteins link genetics to disease. Science (2018) doi:10.1126/science.aaq1327. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjEvNjQwNC83NjkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMS8xMS8yMDIyLjAxLjA5LjIxMjY4NDczLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 19. 
19.Zhou, S. et al. A Neanderthal OAS1 isoform protects individuals of European ancestry against COVID-19 susceptibility and severity. Nat Med 27, 659–667 (2021). 20. 20.Zhang, J. et al. Large Bi-Ethnic Study of Plasma Proteome Leads to Comprehensive Mapping of cis-pQTL and Models for Proteome-wide Association Studies. bioRxiv (2021) doi:10.1101/2021.03.15.435533. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMS4wMy4xNS40MzU1MzN2MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAxLzExLzIwMjIuMDEuMDkuMjEyNjg0NzMuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 21. 21.Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 48, 245–252 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3506&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26854917&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 22. 22.Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human diseases. medRxiv 2021.11.19.21266436 (2021) doi:10.1101/2021.11.19.21266436. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMS4xMS4xOS4yMTI2NjQzNnYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMTEvMjAyMi4wMS4wOS4yMTI2ODQ3My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 23. 23.Munafò, M. R. & Davey Smith, G. Robust research needs many lines of evidence. Nature 553, 399–401 (2018). 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/d41586-018-01023-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29368721&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 24. 24.Liu, Y. et al. EpiGraphDB: a database and data mining platform for health data science. Bioinformatics 37, 1304–1311 (2021). 25. 25.Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc Ser. B Stat Methodol 82, 1273–1300 (2020). 26. 26.Burgess, S., Thompson, S. G., & CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol 40, 755–764 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyr036&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21414999&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000293618300027&link_type=ISI) 27. 27.Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet 13, e1007081 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1007081&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29149188&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 28. 28.Holmes, M. V. & Davey Smith, G. Can Mendelian Randomization Shift into Reverse Gear? Clin Chem 65, 363–366 (2019). 
[FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiY2xpbmNoZW0iO3M6NToicmVzaWQiO3M6ODoiNjUvMy8zNjMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMS8xMS8yMDIyLjAxLjA5LjIxMjY4NDczLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 29. 29.Folkersen, L. et al. Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet 13, e1006706 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1006706&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28369058&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 30. 30.Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. & Davey Smith, G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med 27, 1133–1163 (2008). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.3034&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17886233&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 31. 31.Burgess, S., Zuber, V., Valdes-Marquez, E., Sun, B. B. & Hopewell, J. C. Mendelian randomization with fine-mapped genetic data: Choosing from large numbers of correlated instrumental variables. Genet Epidemiol 41, 714–725 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.22077&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28944551&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 32. 32.Bowden, J. et al. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I 2 statistic. Int J Epidemiol 45, 1961–1974 (2016). 
[PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 33. 33.1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 34. 34.Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med 36, 1783–1802 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.7221&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28114746&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 35. 35.Greco M F. D., Minelli, C., Sheehan, N. A. & Thompson, J. R. Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med 34, 2926–2940 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.6522&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25950993&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 36. 36.Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet 10, e1004383 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1004383&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24830394&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F11%2F2022.01.09.21268473.atom) 37. 37.Namba, S. et al. A practical guideline of genomics-driven drug discovery in the era of global biobank meta-analysis. (2021) doi:10.1101/2021.12.03.21267280. 
38. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
39. Anderson, G. D. Sex and racial differences in pharmacological response: where is the evidence? Pharmacogenetics, pharmacokinetics, and pharmacodynamics. J Womens Health 14, 19–29 (2005).
40. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
41. Lawlor, D. A., Tilling, K. & Davey Smith, G. Triangulation in aetiological epidemiology. Int J Epidemiol 45, 1866–1886 (2016).
42. Krokstad, S. et al. Cohort Profile: the HUNT Study, Norway. Int J Epidemiol 42, 968–977 (2013).
43. Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res 45, D985–D994 (2017).
44. Wishart, D. S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34, D668–D672 (2006).
45. Wolford, B. N. et al. Multi-ancestry GWAS for venous thromboembolism identifies novel loci followed by experimental validation. medRxiv.
46. Pillai, S. G. et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet 5, e1000421 (2009).
47. Loos, R. J. F. 15 years of genome-wide association studies and no signs of slowing down. Nat Commun 11, 5900 (2020).
48. Yang, Q. et al. Associations of insomnia on pregnancy and perinatal outcomes: Findings from Mendelian randomization and conventional observational studies in up to 356,069 women. bioRxiv (2021) doi:10.1101/2021.10.07.21264689.
49. de Leeuw, C., Werme, J., Savage, J., Peyrot, W. & Posthuma, D. Reconsidering the validity of transcriptome-wide association studies. bioRxiv (2021) doi:10.1101/2021.08.15.456414.
50. Porcu, E. et al. Differentially expressed genes reflect disease-induced rather than disease-causing changes in the transcriptome. Nat Commun 12, 1–9 (2021).
51. Bhattacharya, A. et al. Best practices of multi-ancestry, meta-analytic transcriptome-wide association studies: lessons from the Global Biobank Meta-analysis Initiative. medRxiv (2021) doi:10.1101/2021.11.24.21266825.
52. Gkatzionis, A., Burgess, S. & Newcombe, P. J. Statistical Methods for cis-Mendelian Randomization. arXiv [q-bio.QM] (2021).
53. Skrivankova, V. W. et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA 326, 1614–1621 (2021).
54. Skrivankova, V. W. et al. Strengthening the reporting of observational studies in epidemiology using mendelian randomisation (STROBE-MR): explanation and elaboration. BMJ 375, n2233 (2021).
55. Hemani, G., Bowden, J. & Davey Smith, G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet 27, R195–R208 (2018).
56. Schmidt, A. F. et al. Genetic drug target validation using Mendelian randomisation. Nat Commun 11, 3255 (2020).
57. Swerdlow, D. I. et al. Selecting instruments for Mendelian randomization in the wake of genome-wide association studies. Int J Epidemiol 45, 1600–1616 (2016).
58. Suhre, K. et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun 8, 14357 (2017).
59. Yao, C. et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat Commun 9, 3268 (2018).
60. Emilsson, V. et al. Human serum proteome profoundly overlaps with genetic signatures of disease. bioRxiv (2020) doi:10.1101/2020.05.06.080440.
61. Robins, C. et al. Genetic control of the human brain proteome. Am J Hum Genet 108, 400–410 (2021).
62. Yang, C. et al. Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders. Nat Neurosci 24, 1302–1312 (2021).
63. Kibinge, N. K., Relton, C. L., Gaunt, T. R. & Richardson, T. G. Characterizing the Causal Pathway for Genetic Variants Associated with Neurological Phenotypes Using Human Brain-Derived Proteome Data. Am J Hum Genet 106, 885–892 (2020).
64. Davey Smith, G., Paternoster, L. & Relton, C. When Will Mendelian Randomization Become Relevant for Clinical Practice and Public Health? JAMA 317, 589–591 (2017).
65. Mahmoud, O., Dudbridge, F., Davey Smith, G., Munafo, M. & Tilling, K. Slope-Hunter: A robust method for index-event bias correction in genome-wide association studies of subsequent traits. bioRxiv (2020) doi:10.1101/2020.01.31.928077.
66. Paternoster, L., Tilling, K. M. & Davey Smith, G. Genetic Epidemiology And Mendelian Randomization For Informing Disease Therapeutics: Conceptual And Methodological Challenges. bioRxiv (2017) doi:10.1101/126599.
67. Zheng, J. et al. Genetic effect modification of cis-acting C-reactive protein variants in cardiometabolic disease status. bioRxiv (2021) doi:10.1101/2021.09.23.461369.
68. Walker, V. et al. The consequences of adjustment, correction and selection in genome-wide association studies used for two-sample Mendelian randomization. Wellcome Open Res 6, 103 (2021).
69. Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 37, 658–665 (2013).
70. Zhao, Q., Wang, J., Hemani, G., Bowden, J. & Small, D. S. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. (2018).
71. Giambartolomei, C. et al. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34, 2538–2545 (2018).
72. Foley, C. N. et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat Commun 12, 764 (2021).
73. Wallace, C. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet 16, e1008720 (2020).
74. Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. bioRxiv (2021) doi:10.1101/2021.02.23.432421.
75. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol 129, 687–702 (1989).
76. Appel, L. J. et al. The rationale and design of the AASK cohort study. J Am Soc Nephrol 14, S166–S172 (2003).
77. Di Angelantonio, E. et al. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet 390, 2360–2371 (2017).
78. Nayor, M. et al. Aptamer-Based Proteomic Platform Identifies Novel Protein Predictors of Incident Heart Failure and Echocardiographic Traits. Circ Heart Fail 13, e006749 (2020).
79. Ganz, P. et al. Development and Validation of a Protein-Based Risk Score for Cardiovascular Outcomes Among Patients With Stable Coronary Heart Disease. JAMA 315, 2532–2541 (2016).
80. Burgess, S. & Thompson, S. G. Bias in causal estimates from Mendelian randomization studies with weak instruments. Stat Med 30, 1312–1323 (2011).
81. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet 50, 1335–1341 (2018).
82. Yavorska, O. O. & Burgess, S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol (2017) doi:10.1093/ije/dyx034.
83. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7 (2018).