PT - JOURNAL ARTICLE
AU - Evans, Benjamin D.
AU - Słowiński, Piotr
AU - Hattersley, Andrew T.
AU - Jones, Samuel E.
AU - Sharp, Seth
AU - Kimmitt, Robert A.
AU - Weedon, Michael N.
AU - Oram, Richard A.
AU - Tsaneva-Atanasova, Krasimira
AU - Thomas, Nicholas J.
TI - Estimating population level disease prevalence using genetic risk scores
AID - 10.1101/2020.02.20.20025528
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.02.20.20025528
4099 - http://medrxiv.org/content/early/2020/02/23/2020.02.20.20025528.short
4100 - http://medrxiv.org/content/early/2020/02/23/2020.02.20.20025528.full
AB - Clinical classification is essential for estimating disease prevalence in a population but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining the prevalence of a disease within a population using genetic risk scores. We compare and evaluate methods based on the means of the genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We assess the impact on estimates resulting from the population size and proportion of cases to non-cases. Using less discriminative genetic risk scores still results in robust estimates of proportion. Genetic stratification techniques provide exciting research tools enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: BDE and PS acknowledge that this work was generously supported by the Wellcome Trust Institutional Strategic Support Awards (WT204909MA and 204909/Z/16/Z respectively). KTA gratefully acknowledges the financial support of the EPSRC via grant EP/N014391/1.
NJT is funded by an NIHR Academic Clinical Fellowship and undertook the research as part of a Wellcome Trust funded secondment within the translational research exchange at Exeter University (WT204909MA and 204909/Z/16/Z respectively). S.A.S. is supported by a Diabetes UK PhD studentship (17/0005757). M.N.W. is supported by the Wellcome Trust Institutional Support Fund (WT097835MF). RAO is funded by a Diabetes UK Harry Keen Fellowship (16/0005529). SEJ is funded by an MRC grant. ATH is supported by the NIHR Exeter Clinical Research Facility and a Wellcome Senior Investigator award and an NIHR Senior Investigator award. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript: Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived: Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance): Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable: Yes. Data Availability: UK Biobank data is an open access resource. The software implementing these methods (in Python and Matlab) will be open-sourced upon acceptance through a public version-controlled code repository.