PT - JOURNAL ARTICLE
AU - Knight, Spencer C.
AU - McCurdy, Shannon R.
AU - Rhead, Brooke
AU - Coignet, Marie V.
AU - Park, Danny S.
AU - Roberts, Genevieve H.L.
AU - Berkowitz, Nathan D.
AU - Zhang, Miao
AU - Turissini, David
AU - Delgado, Karen
AU - Pavlovic, Milos
AU - Haug Baltzell, Asher K.
AU - Guturu, Harendra
AU - Rand, Kristin A.
AU - Girshick, Ahna R.
AU - Hong, Eurie L.
AU - Ball, Catherine A.
TI - COVID-19 susceptibility and severity risks in a survey of over 500,000 individuals
AID - 10.1101/2020.10.08.20209593
DP - 2021 Jan 01
TA - medRxiv
PG - 2020.10.08.20209593
4099 - http://medrxiv.org/content/early/2021/01/27/2020.10.08.20209593.short
4100 - http://medrxiv.org/content/early/2021/01/27/2020.10.08.20209593.full
AB - Background: The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analyzing population-scale datasets in real time to monitor and better understand the evolving pandemic.
Methods: The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors, and exposures for over 563,000 adult individuals in the U.S. in just under four months, including over 4,700 COVID-19 cases as measured by a self-reported positive test.
Results: We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for males even after adjusting for known exposures and age (adjusted odds ratio [aOR] = 1.36, 95% confidence interval [CI] = (1.19, 1.55)). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualized COVID-19 susceptibility (area under the curve [AUC] = 0.84) and severity outcomes including hospitalization and critical illness (AUC = 0.87 and 0.90, respectively).
The risk models achieved robust discriminative performance across different age, sex, and genetic ancestry groups within the study.
Conclusion: The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.
What is already known on this subject: The COVID-19 pandemic has exacted a historic toll on human lives, healthcare systems and global economies, with over 83 million cases and over 1.8 million deaths worldwide as of January 2021. COVID-19 risk factors for susceptibility and severity have been extensively investigated by clinical and public health researchers. Several groups have developed risk models to predict COVID-19 illness outcomes based on known risk factors.
What this study adds: We performed association analyses for COVID-19 susceptibility and severity in a large, at-home survey and replicated much of the previous clinical literature. Associations were further adjusted for known COVID-19 exposures, and we observed elevated positive test odds for males even after adjustment for these known exposures. We developed risk models and evaluated them across different age, sex, and genetic ancestry cohorts, and showed robust performance across all cohorts in a holdout dataset. Our results establish large-scale, self-reported surveys as a potential framework for investigating and monitoring rapidly evolving pandemics.
Competing Interest Statement: The authors declare competing financial interests: authors affiliated with AncestryDNA may have equity in Ancestry.
Funding Statement: All work was supported and funded by AncestryDNA.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: All data for this research project were from subjects who have provided informed consent to
participate in AncestryDNA's Human Diversity Project, as reviewed and approved by our external institutional review board, Advarra (formerly Quorum, IRB approval number: Pro00034516). Advarra operates under ethical principles underlying the involvement of human subjects in research, including the Declaration of Helsinki. All data were de-identified prior to use.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
A dataset (EGAC00001001762) is available to qualified scientists through the European Genome-phenome Archive (EGA). The EGA dataset includes the risk factors and outcomes studied here. The EGA dataset is de-identified and comprises ~15,000 individuals who tested for COVID-19, including more than 3,000 individuals who tested positive, many of whom are in this study. The EGA cohort is sufficient to nominally replicate the vast majority of susceptibility and severity associations from this study. https://ega-archive.org/dacs/EGAC00001001762
aOR, adjusted odds ratio; AUC, area under the curve; BMI, body mass index; CI, confidence interval; CDC, U.S. Centers for Disease Control; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disease; EGA, European Genome-phenome Archive; GWAS, genome-wide association studies; HWF, “How We Feel” study; ICU, intensive care unit; LR, logistic regression; OR, odds ratio; ROC, receiver operating characteristic
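
The abstract reports an adjusted odds ratio for males (aOR = 1.36) with a 95% confidence interval of (1.19, 1.55). As an illustrative sketch only, the following Python snippet shows how such an interval relates to a logistic regression coefficient. It assumes a Wald-type interval that is symmetric on the log-odds scale; the record does not state which interval method the authors used, and the helper names `se_from_ci` and `odds_ratio_ci` are hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of the log-odds coefficient implied by a CI.

    Assumption (not stated in the record): the interval is a Wald
    interval, i.e. exp(beta - z*se) to exp(beta + z*se).
    """
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) for coefficient beta."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values reported in the abstract for male sex.
reported_or, reported_lo, reported_hi = 1.36, 1.19, 1.55

beta = math.log(reported_or)               # coefficient on the log-odds scale
se = se_from_ci(reported_lo, reported_hi)  # implied standard error
or_hat, lo, hi = odds_ratio_ci(beta, se)

print(f"beta = {beta:.3f}, implied SE = {se:.3f}")
print(f"OR = {or_hat:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Under this assumption the reconstructed interval agrees with the reported one to two decimal places, which is consistent with, but does not prove, a Wald interval on the log-odds scale.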