PT - JOURNAL ARTICLE
AU - Griffith, Gareth
AU - Morris, Tim T
AU - Tudball, Matt
AU - Herbert, Annie
AU - Mancano, Giulia
AU - Pike, Lindsey
AU - Sharp, Gemma C
AU - Palmer, Tom M
AU - Smith, George Davey
AU - Tilling, Kate
AU - Zuccolo, Luisa
AU - Davies, Neil M
AU - Hemani, Gibran
TI - Collider bias undermines our understanding of COVID-19 disease risk and severity
AID - 10.1101/2020.05.04.20090506
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.05.04.20090506
4099 - http://medrxiv.org/content/early/2020/05/14/2020.05.04.20090506.short
4100 - http://medrxiv.org/content/early/2020/05/14/2020.05.04.20090506.full
AB - Observational data on COVID-19 including hypothesised risk factors for infection and progression are accruing rapidly. Here, we highlight the challenge of interpreting observational evidence from non-random samples of the population, which may be affected by collider bias. We illustrate these issues using data from the UK Biobank in which individuals tested for COVID-19 are highly selected for a wide range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the sampling mechanisms that leave aetiological studies of COVID-19 infection and progression particularly susceptible to collider bias. We also describe several tools and strategies that could help mitigate the effects of collider bias in extant studies of COVID-19 and make available a web app for performing sensitivity analyses. While bias due to non-random sampling should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage. Key messages: Collider bias can occur in studies that non-randomly sample people from the population of interest.
This bias can distort associations between variables or induce spurious associations. It may be possible to estimate the underlying selection model or run sensitivity analyses to examine the credibility of the threat of collider bias, but it is difficult to prove that bias has been reduced or eliminated. Tested samples in the UK Biobank cohort are highly selected for a range of traits. Sampling strategies that are resilient to collider bias issues should be used at the design stage of data collection where possible. Where this is not possible, linkage or collection of data on the target population can help in sensitivity and validation analyses.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This research has been conducted using the UK Biobank Resource under Application Number 16729. The Medical Research Council (MRC) and the University of Bristol support the MRC Integrative Epidemiology Unit [MC_UU_12013/1, MC_UU_12013/9, MC_UU_00011/1]. NMD is supported by a Norwegian Research Council Grant number 295989. GH is supported by the Wellcome Trust and Royal Society [208806/Z/17/Z].
Author Declarations:
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All analysis was performed on UK Biobank data https://github.com/explodecomputer/covid_ascertainment
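
As a rough illustration of the key message that non-random sampling can induce spurious associations, the following Python sketch simulates two independent traits and then keeps only the individuals "selected" into a sample. It is not the authors' analysis or their web app. The function names, the selection rule (an individual is selected when `x + y + noise` exceeds a threshold), and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, threshold=1.0, seed=0):
    """Simulate collider bias under an assumed (hypothetical) selection rule.

    x and y are drawn independently, so they are unrelated in the full
    population. Selection depends on both, which makes "being selected"
    a collider; the correlation is then recomputed in the selected subset.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical risk factor
    y = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical outcome liability

    # Assumed selection rule: selected when x + y + noise exceeds the threshold.
    selected = [xi + yi + rng.gauss(0, 1) > threshold for xi, yi in zip(x, y)]

    x_sel = [xi for xi, keep in zip(x, selected) if keep]
    y_sel = [yi for yi, keep in zip(y, selected) if keep]
    return pearson(x, y), pearson(x_sel, y_sel), len(x_sel)


if __name__ == "__main__":
    r_population, r_selected, n_selected = simulate()
    print(f"correlation in full population: {r_population:.3f}")
    print(f"correlation in selected sample: {r_selected:.3f} (n = {n_selected})")
```

Under these assumptions the two traits are close to uncorrelated in the full simulated population but negatively correlated among those selected, which is the pattern the abstract warns about when only a selected subset is analysed.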