PT - JOURNAL ARTICLE
AU - Shringarpure, Suyash S.
AU - Wang, Wei
AU - Jiang, Yunxuan
AU - Acevedo, Alison
AU - Dhamija, Devika
AU - Cameron, Briana
AU - Jubb, Adrian
AU - Yue, Peng
AU - ,
AU - Sarov-Blat, Lea
AU - Gentleman, Robert
AU - Auton, Adam
TI - Large-scale trans-ethnic replication and discovery of genetic associations for rare diseases with self-reported medical data
AID - 10.1101/2021.06.09.21258643
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.06.09.21258643
4099 - http://medrxiv.org/content/early/2021/06/16/2021.06.09.21258643.short
4100 - http://medrxiv.org/content/early/2021/06/16/2021.06.09.21258643.full
AB - A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases.
This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.
Competing Interest Statement: Adam Auton, Briana Cameron, Devika Dhamija, Robert Gentleman, Yunxuan Jiang, Adrian Jubb, Suyash Shringarpure, Wei Wang, and Peng Yue are current or former employees of 23andMe and hold stock or stock options in 23andMe, Inc. Alison Acevedo and Lea Sarov-Blat are employees of GlaxoSmithKline and own company stock.
Funding Statement: No external funding.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). Participants were included in the analysis on the basis of consent status as checked at the time data analyses were initiated.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The full GWAS summary statistics for the 23andMe discovery data set will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.