PT - JOURNAL ARTICLE
AU - Jones, Sara
AU - Bradwell, Katie R.
AU - Chan, Lauren E.
AU - Olson-Chen, Courtney
AU - Tarleton, Jessica
AU - Wilkins, Kenneth J.
AU - Qin, Qiuyuan
AU - Faherty, Emily Groene
AU - Lau, Yan Kwan
AU - Xie, Catherine
AU - Kao, Yu-Han
AU - Liebman, Michael N.
AU - Mariona, Federico
AU - Challa, Anup
AU - Li, Li
AU - Ratcliffe, Sarah J.
AU - McMurry, Julie A.
AU - Haendel, Melissa A.
AU - Patel, Rena C.
AU - Hill, Elaine L.
TI - Who is pregnant? defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C)
AID - 10.1101/2022.08.04.22278439
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.08.04.22278439
4099 - http://medrxiv.org/content/early/2022/08/08/2022.08.04.22278439.short
4100 - http://medrxiv.org/content/early/2022/08/08/2022.08.04.22278439.full
AB - Objective: To define pregnancy episodes and estimate gestational aging within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C). Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS) and applied it to EHR data in the N3C from 1 January 2018 to 7 April 2022. HIPPS combines: 1) an extension of a previously published pregnancy episode algorithm, 2) a novel algorithm to detect gestational aging-specific signatures of a progressing pregnancy for further episode support, and 3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated three types of pregnancy cohorts based on the level of precision for gestational aging and pregnancy outcomes for comparison of COVID-19 and other characteristics. Results: We identified 628,165 pregnant persons with 816,471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, spontaneous abortions), and 23.3% had unknown outcomes.
We were able to estimate start dates within one week of precision for 431,173 (52.8%) episodes. 66,019 (8.1%) episodes had incident COVID-19 during pregnancy. Across varying COVID-19 cohorts, patient characteristics were generally similar though pregnancy outcomes differed. Discussion: HIPPS provides support for pregnancy-related variables based on EHR data for researchers to define pregnancy cohorts. Our approach performed well based on clinician validation. Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational aging that addresses data inconsistency and missingness in EHR data.

Competing Interest Statement: KRB is an employee of Palantir Technologies. YK and LL are employees of Sema4, ML is Managing Director of IPQ Analytics, LLC.

Funding Statement: The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave covid.cd2h.org/enclave and N3C Attribution & Publication Policy v1.2-2020-08-25b, and supported by NCATS U24 TR002306, and NIGMS National Institute of General Medical Sciences, 5U54GM104942-04. Individual authors were supported by the following funding sources: NIMH R01131542 (Rena C. Patel), NICHD R21105304 (Anup P. Challa).

Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Data partner sites transfer their N3C-eligible data to NCATS/NIH under a Johns Hopkins University Reliance Protocol (IRB00249128) or via individual site agreements with NCATS (see below). Managed under the NIH authority, the N3C Data Enclave can be accessed as previously described [10] and at ncats.nih.gov/n3c/resources, https://covid.cd2h.org/for-researchers.
Site | IRB Name | Exempted vs approved | Protocol number
Medical University of South Carolina | Health Sciences South Carolina Institutional Review Board | exempt | Pro00111335
National Institutes of Health | NIH Office of IRB Operations | exempt | N/A
University of Minnesota | University of Minnesota Institutional Review Board | approved | STUDY00012706
University of Rochester | University of Rochester Research Subjects Review Board | exempt | STUDY00005366
University of Washington | Human Subjects Division | approved | STUDY00013147

Institutional IRBs determine that it is not human subjects research, it is research on the data.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.

N3C Data Enclave can be accessed at ncats.nih.gov/n3c/resources, https://covid.cd2h.org/for-researchers