RT Journal Article
SR Electronic
T1 The All of Us Research Program: data quality, utility, and diversity
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.05.29.20116905
DO 10.1101/2020.05.29.20116905
A1 Ramirez, Andrea H.
A1 Sulieman, Lina
A1 Schlueter, David J.
A1 Halvorson, Alese
A1 Qian, Jun
A1 Ratsimbazafy, Francis
A1 Loperena, Roxana
A1 Mayo, Kelsey
A1 Basford, Melissa
A1 Deflaux, Nicole
A1 Muthuraman, Karthik N.
A1 Natarajan, Karthik
A1 Kho, Abel
A1 Xu, Hua
A1 Wilkins, Consuelo
A1 Anton-Culver, Hoda
A1 Boerwinkle, Eric
A1 Cicek, Mine
A1 Clark, Cheryl R.
A1 Cohn, Elizabeth
A1 Ohno-Machado, Lucila
A1 Schully, Sheri
A1 Ahmedani, Brian K.
A1 Argos, Maria
A1 Cronin, Robert M.
A1 O’Donnell, Christopher
A1 Fouad, Mona
A1 Goldstein, David B.
A1 Greenland, Philip
A1 Hebbring, Scott J.
A1 Karlson, Elizabeth W.
A1 Khatri, Parinda
A1 Korf, Bruce
A1 Smoller, Jordan W.
A1 Sodeke, Stephen
A1 Wilbanks, John
A1 Hentges, Justin
A1 Lunt, Christopher
A1 Devaney, Stephanie A.
A1 Gebo, Kelly
A1 Denny, Joshua C.
A1 Carroll, Robert J.
A1 Glazer, David
A1 Harris, Paul A.
A1 Hripcsak, George
A1 Philippakis, Anthony
A1 Roden, Dan M.
YR 2020
UL http://medrxiv.org/content/early/2020/06/03/2020.05.29.20116905.abstract
AB Importance The All of Us Research Program hypothesizes that accruing one million or more diverse participants engaged in a longitudinal research cohort will advance precision medicine and ultimately improve human health. Launched nationally in 2018, to date All of Us has recruited more than 345,000 participants.
All of Us plans to open beta access to researchers in May 2020.
Objective To demonstrate the quality, utility, and diversity of the All of Us Research Program’s initial data release and the beta launch of its cloud-based analysis platform, the Researcher Workbench.
Evidence We analyzed the initial All of Us data release, comprising surveys, physical measurements (PM), and electronic health record (EHR) data, to characterize All of Us participants, including self-reported descriptors of diversity. Data depth, density, and quality were evaluated using medication sequencing analyses for depression and type 2 diabetes. Replication of known oncologic associations with smoking exposure ascertained by EHR and survey data, and calculation of population-based atherosclerotic cardiovascular disease risk scores, demonstrated the utility of the data and the capability of the platform.
Findings The beta launch of the All of Us Researcher Workbench contains data on 224,143 participants. Seventy-seven percent of this cohort were identified as Underrepresented in Biomedical Research (UBR), including over forty-eight percent self-reporting non-White race. Medication usage patterns in two common diseases, depression and type 2 diabetes, replicated findings previously reported in the literature and showed differences based on race. Oncologic associations with smoking were replicated, and effect sizes for EHR and survey exposures were compared, showing general agreement. A cardiovascular disease score was calculated utilizing multiple data elements curated across sources. The cloud-based architecture built in the Researcher Workbench provided secure access and powerful computational resources at a low cost. All analyses have been made available for replication and reuse by registered researchers.
Conclusions and Relevance The All of Us Research Program’s initial release of cohort data contains longitudinal and multidimensional data on diverse participants that replicate known associations.
This dataset and the cloud-based Researcher Workbench advance the mission of All of Us to make data widely and securely available to researchers to improve human health and advance precision medicine.
Competing Interest Statement The authors have declared no competing interest.
Funding Statement The All of Us Research Program is supported by grants through the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition to the funded partners, the All of Us Research Program would not be possible without the contributions made by its participants.
Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Approval to use the dataset for the specified demonstration projects was obtained from the All of Us Institutional Review Board.
Results reported are in compliance with the All of Us Data and Statistics Dissemination Policy, which disallows disclosure of group counts under 20 to protect participant privacy.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The dataset was accessed through the All of Us Researcher Workbench platform, a cloud-based analytic platform custom built by the program for approved researchers. The Workbench is built on top of the Terra platform, which is also utilized for a number of other NIH-funded studies, including the NCI Cloud Resources, the NHLBI BioData Catalyst, and the NHGRI AnVIL. Access to the Researcher Workbench and its data is free; compute and storage incur usage costs. All researchers who accessed the data for analyses were authorized and approved via a 3-step process that included registration, completion of ethics training, and attestation to a data use agreement. The Researcher Workbench uses Google Compute Engine for computational resources in the cloud and Google Cloud Storage for storage in the cloud.
https://workbench.researchallofus.org/workspaces/aou-rw-dd7cff0e/medicationspathwaysequencesbyracephase1/notebooks
https://workbench.researchallofus.org/workspaces/aou-rw-a8fc912d/duplicateofframinghamahariskscore/notebooks
https://workbench.researchallofus.org/workspaces/aou-rw-d59956e4/jamaphewasfinalreview05212020/data
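
The dissemination policy cited in the declarations above (no disclosure of group counts under 20) can be illustrated with a minimal sketch. Only the threshold of 20 comes from the record; the function name `suppress_small_counts`, the masking string, and the example group labels and counts are hypothetical and are not taken from the All of Us Researcher Workbench or its notebooks. The sketch also assumes that every count below the threshold is masked, including zero, which the record does not specify.

```python
from typing import Dict, Union

# Threshold stated in the record: group counts under 20 are not disclosed.
MIN_REPORTABLE_COUNT = 20


def suppress_small_counts(
    counts: Dict[str, int],
    threshold: int = MIN_REPORTABLE_COUNT,
) -> Dict[str, Union[int, str]]:
    """Return a copy of `counts` with every count below `threshold` masked.

    Assumption (not from the source): masked cells are shown as "<20"
    (or "<threshold" in general), and zero counts are masked as well.
    """
    mask = f"<{threshold}"
    return {
        group: (n if n >= threshold else mask)
        for group, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical group counts, for illustration only.
    example = {"group_a": 150, "group_b": 19, "group_c": 20}
    print(suppress_small_counts(example))
```

A real reporting pipeline would likely also need to guard against small counts being recovered by subtraction from totals; that step is outside the scope of this sketch.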