Abstract
The implications of selection bias due to volunteering (volunteer bias) for genetic association studies are poorly understood. Because of its large sample size and extensive phenotyping, the UK Biobank (UKB) is included in almost all large genomewide association studies (GWAS) to date. Yet, it is known to be highly selected. We develop inverse probability weighted GWAS (WGWAS) to estimate GWAS summary statistics in the UKB that are corrected for volunteer bias. Compared to GWAS, WGWAS substantially decreases the effective sample size, by an average of 61% (from 337,543 to 130,684), with the size of the reduction depending on the phenotype. The extent to which volunteer bias affects GWAS associations and downstream results is phenotype-specific. Through WGWAS we find 11 novel genomewide significant loci for type 1 diabetes and 3 for breast cancer; none of these loci has been identified in any prior GWAS. Further, genetic variants' effect sizes and heritability estimates become more predictive in WGWAS for certain phenotypes (e.g., educational attainment, drinks per week, breast cancer and type 1 diabetes). WGWAS also alters biological annotation relations in gene-set analyses. This suggests that not accounting for volunteer-based selection can result in biased GWASs, which in turn may drive spurious associations. GWAS consortia may therefore wish to provide population weights for their data sets or rely more on population-representative samples.
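To make the weighting idea concrete, the sketch below fits one variant's effect by weighted least squares. Each participant gets the weight w_i = 1 / P(participation_i), so over-represented groups count less. It also shows why the effective sample size falls when these weights are used. This is a minimal illustration under stated assumptions, not the authors' WGWAS implementation. The abstract does not specify how participation probabilities are modelled, which covariates enter the regression, or how effective sample size is defined. Here we assume Kish's formula, (Σw)² / Σw². The function names `kish_effective_n` and `ipw_snp_effect` and the example probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def kish_effective_n(weights: Sequence[float]) -> float:
    """Kish effective sample size, (sum w)^2 / sum(w^2).

    Assumption: the abstract does not say which effective-N definition
    WGWAS uses; Kish's formula is used here for illustration only.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def ipw_snp_effect(
    genotype: Sequence[float],
    phenotype: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, float]:
    """Weighted least-squares slope of phenotype on genotype dosage (0/1/2).

    Hypothetical helper: weights are inverse participation probabilities.
    No covariates and no standard errors are included, unlike a real GWAS.
    Returns (beta, kish_effective_n).
    """
    if not (len(genotype) == len(phenotype) == len(weights)):
        raise ValueError("genotype, phenotype and weights must have equal length")
    sw = sum(weights)
    # Weighted means of genotype and phenotype.
    g_bar = sum(w * g for w, g in zip(weights, genotype)) / sw
    y_bar = sum(w * y for w, y in zip(weights, phenotype)) / sw
    # Weighted cross-product and weighted genotype variance (unnormalised).
    sxy = sum(
        w * (g - g_bar) * (y - y_bar)
        for w, g, y in zip(weights, genotype, phenotype)
    )
    sxx = sum(w * (g - g_bar) ** 2 for w, g in zip(weights, genotype))
    return sxy / sxx, kish_effective_n(weights)


if __name__ == "__main__":
    # Hypothetical participation probabilities: two groups volunteer far less often.
    p_participate = [0.5, 0.5, 0.1, 0.1, 0.25, 0.25]
    weights = [1.0 / p for p in p_participate]
    geno = [0, 1, 1, 2, 0, 2]
    pheno = [1.2, 2.9, 3.1, 5.2, 0.8, 4.9]
    beta, n_eff = ipw_snp_effect(geno, pheno, weights)
    print(f"beta={beta:.3f}  n_eff={n_eff:.2f} of n={len(geno)}")
```

With these illustrative weights the effective sample size is about 4.27 of 6. Unequal weights always shrink effective N relative to the raw count, which is the same direction as the reduction reported above. With equal weights the function reduces to an ordinary least-squares slope and the effective N equals n.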
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the National Institute on Aging of the National Institutes of Health (RF1055654 and R56AG058726), the Dutch National Science Foundation (016.VIDI.185.044), and the Jacobs Foundation.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee of Vrije Universiteit Amsterdam gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
UKB data can be accessed upon request for research projects that have obtained the necessary approval. These requests can be submitted through https://www.ukbiobank.ac.uk/