Abstract
Understanding the genomic basis of human proteomic variability provides powerful tools to probe potential causal relationships of proteins and disease risk, and thus to prioritise candidate drug targets. Here, we investigated 6432 plasma proteins (1533 previously unstudied in large-scale proteomic GWAS) using the SomaLogic (v4.1) aptamer-based technology in a Scottish population from the Viking Genes study. A total of 505 significant independent protein quantitative trait loci (pQTL) were found for 455 proteins in blood plasma: 382 cis-pQTL (P < 5×10⁻⁸) and 123 trans-pQTL (P < 6.6×10⁻¹²). Of these, 31 cis-pQTL were for proteins with no previous GWAS. We leveraged these pQTL to perform causal inference using bidirectional Mendelian randomisation and colocalisation against complex traits of biomedical importance. We discovered 42 colocalising associations (with a posterior probability >80% that pQTL and complex traits share a causal variant), pointing to plausible causal roles for the proteins. These findings include hitherto undiscovered causal links of leukocyte receptor tyrosine kinase (LTK) to type-2 diabetes and beta-1,3-glucuronyltransferase (B3GAT1) to prostate cancer. These new connections will help guide the search for new or repurposed therapies. Our findings provide strong support for continuing to increase the number of proteins studied using GWAS.
Competing Interest Statement
P.T. and L.K. are currently employed by and have share options in BioAge Labs. The remaining authors declare no conflicts of interest.
Funding Statement
The Viking Health Study - Shetland (VIKING1) was supported by the MRC Human Genetics Unit quinquennial programme grant QTL in Health and Disease (U. MC_UU_00007/10). DNA extractions and genotyping were performed at the Edinburgh Clinical Research Facility, University of Edinburgh. J.K. acknowledges the MRC Doctoral Training Programme in Precision Medicine (MR/N013166/1). L.K. was supported by an RCUK Innovation Fellowship from the National Productivity Investment Fund (MR/R026408/1). P.N. was supported by the UKRI Medical Research Council (MC_PC_U127592696, MC_PC_U127561128 and MC_UU_00007/10) and the BBSRC (BBS/E/RL/230001A).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All participants gave informed consent, and the study was approved by the Southeast Scotland Research Ethics Committee, NHS Lothian (reference: 12/SS/0151).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Added ORCID identifiers for coauthors. Clarified the Conflict of Interest section.
Data availability
The summary association statistics for all proteomic GWAS in this study have been deposited in the DataShare repository (available under https://datashare.ed.ac.uk/handle/10283/705). There is neither Research Ethics Committee approval, nor consent from individual participants, to permit open release of the individual-level research data underlying this study. The datasets generated and analysed during the current study are therefore not publicly available. Instead, the research data and/or DNA samples are available by managed access from accessQTL@ed.ac.uk on reasonable request, following approval by the QTL Data Access Committee and in line with the consent given by participants. Each approved project is subject to a data or materials transfer agreement (D/MTA) or commercial contract. The UK Biobank genotypic data used in this study as an LD reference panel were approved under application 19655 and are available to qualified researchers via the UK Biobank data access process.
Code availability
All analyses were conducted using publicly accessible software tools, which are detailed both in the main text and within the Methods section.
Data handling was done in Python 3. The main modules used were pandas (v1.4), scipy (v1.4) and numpy (v1.20) for data transformation and statistical analysis, requests (v2.22) for data download via API, and matplotlib (v3.2) for creating graphs. Scripts will be made available upon request.
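As an illustration only, the two significance thresholds quoted in the Abstract (P < 5×10⁻⁸ for cis-pQTL and P < 6.6×10⁻¹² for trans-pQTL) could be applied to a list of associations as sketched below. This is not the authors' script: the `Association` class, the `is_significant` helper, and the example records are hypothetical, and plain Python is used here only to keep the sketch free of dependencies.

```python
from dataclasses import dataclass

# Thresholds as quoted in the Abstract.
CIS_P_THRESHOLD = 5e-8
TRANS_P_THRESHOLD = 6.6e-12


@dataclass(frozen=True)
class Association:
    """One variant-protein association (hypothetical record layout)."""
    protein: str
    variant: str
    is_cis: bool      # True if the variant lies near the protein-coding gene
    p_value: float


def is_significant(assoc: Association) -> bool:
    """Apply the cis or trans threshold depending on the association type."""
    threshold = CIS_P_THRESHOLD if assoc.is_cis else TRANS_P_THRESHOLD
    return assoc.p_value < threshold


# Made-up example records, for illustration only.
example = [
    Association("PROT_A", "var1", is_cis=True, p_value=1e-9),
    Association("PROT_B", "var2", is_cis=True, p_value=1e-7),
    Association("PROT_C", "var3", is_cis=False, p_value=1e-9),
    Association("PROT_D", "var4", is_cis=False, p_value=1e-13),
]

significant = [a for a in example if is_significant(a)]
for a in significant:
    kind = "cis" if a.is_cis else "trans"
    print(f"{a.protein}\t{a.variant}\t{kind}\t{a.p_value:.1e}")
```

The stricter trans threshold means that an association with P = 1×10⁻⁹ passes as a cis-pQTL but not as a trans-pQTL, as the made-up records show.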