Abstract
Background We created a United States-based real-world data resource to better understand the continued impact of the COVID-19 pandemic on immunocompromised patients, who are typically under-represented in prospective studies and clinical trials.
Methods The COVID-19 Real World Data infrastructure (CRWDi) was created by linking and harmonizing deidentified HealthVerity medical and pharmacy claims data from December 1, 2018 to December 31, 2023, with SARS-CoV-2 virologic and serologic laboratory data from major commercial laboratories and Northwell Health; COVID-19 vaccination data; and for patients with cancer, 2010 to 2021 National Cancer Institute Surveillance, Epidemiology, and End Results registry data.
Results The CRWDi dataset contains data on 5.2 million people. Four populations were included in the dataset: (1) patients with cancer (n=1,294,022); (2) patients with rheumatic conditions receiving pharmacotherapy (n=1,636,940); (3) non-cancer solid organ (n=249,797) and hematopoietic stem cell (n=30,172) transplant recipients; and (4) people from the general population including adults (>18 years of age; n=1,790,162) and pediatric patients (<18 years of age; n=198,907).
Conclusions We have created a complex real-world data system to address unanswered questions that have arisen during the COVID-19 pandemic. Further, because its data are broadly and freely available to academic researchers in the United States, the CRWDi represents an important complement to the consortia studies and clinical trials that have emerged during the healthcare crisis, and it can be readily reproduced for future purposes.
Summary The COVID-19 Real World Data infrastructure dataset contains 5.2 million deidentified patient records, with a focus on immunocompromising conditions, and is freely available to approved researchers to study the impact of coronavirus disease 2019 (COVID-19) on patient morbidities and outcomes.
Competing Interest Statement
The following authors have no disclosures: L.P.; L.A.P.; S.K.; J.W.L.; C.B.S.; S.Y.; Y.C.Z. These authors disclose the following: J.M.C. received support from the NCI for this project, and is a board member of the Project Santa Fe Foundation, LLC; K.N.A. has received investigator-initiated research grants from the NIH (to the institution) and consultation fees (both unrelated to the current work) from the All of Us Research Program (NIH; payment to the author), TrioHealth (payment to the author as advisory board member), and Kennedy Dundas; K.N.A. also reports royalties or licenses from Coursera as the director of a 5-course specialization (payment to the author and institution); M.M.A. is an employee of and holds stock options in Aetion, Inc., and also reports receiving honoraria from the American Society of Nephrology and the International Society of Nephrology outside the submitted work; O.C. was previously employed by Labcorp; L.G. is an employee of and owns stock in Labcorp; T.L.H. is an employee of HealthVerity; H.W.K. was previously employed by Quest Diagnostics; D.K. is an employee of HealthVerity; W.A.M. is a consultant to and owns stock in Quest Diagnostics; S.L.R. is an employee of and owns stock in Aetion, Inc.; S.S. receives consulting and advisory board fees from ADC Therapeutics; C.T. is an employee of HealthVerity; Z.S.W. reports research support from Bristol-Myers Squibb and Principia/Sanofi and consulting fees/advisory board fees from Zena Biopharma, Horizon, Sanofi, Shionogi, Viela Bio, Biocryst, Visterra, Novartis and MedPace; and J.L.W. provides consulting for Westat and The Lewin Group and has ownership of HemOnc.org, LLC.
Funding Statement
This project was funded in whole or in part by federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. 75N91019D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does the mention of trade names, commercial products, or organizations imply endorsement by the U.S. government.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The contribution of Northwell Health data to this data resource was under a Northwell Health IRB approved protocol, with exempt status.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
This data resource was created to support academic, non-commercial research projects in the United States. Submitted proposals are reviewed by the NCI for appropriateness of the proposal to the data resource (https://seer.cancer.gov/data-software/crwdi/). Upon approval, obtaining access to CRWDi requires both NCI-authorized access to the SEER Registry, and HealthVerity-authorized access to the cloud-based cohort discovery tool and analytic platform housing the CRWDi data.
Abbreviations
- CDC: Centers for Disease Control and Prevention
- CPT: Current Procedural Terminology
- CRWDi: COVID-19 Real World Data infrastructure
- HCPCS: Healthcare Common Procedure Coding System
- HSC: Hematopoietic stem cell
- HVID: HealthVerity de-identified patient identification number
- HVM: HealthVerity Marketplace
- ICD-10-CM: International Classification of Diseases, 10th Revision, Clinical Modification
- ICD-10-PCS: International Classification of Diseases, 10th Revision, Procedure Coding System
- LOINC: Logical Observation Identifiers Names and Codes
- NAAT: Nucleic Acid Amplification Test
- NCI: National Cancer Institute
- NDC: National Drug Code
- PPRL: Privacy-preserving record linkage
- SEER: Surveillance, Epidemiology, and End Results program