ABSTRACT
Background and Objective The Surveillance, Epidemiology, and End Results (SEER) Program and the National Program of Cancer Registries (NPCR) are authoritative sources for population cancer surveillance and research in the US. An increasing number of recent oncology studies are based on the electronic health record (EHR)-derived de-identified databases created and maintained by Flatiron Health. This report describes the differences in the originating sources and data development processes, and compares baseline demographic characteristics in the cancer-specific databases from Flatiron Health, SEER, and NPCR, to facilitate interpretation of research findings based on these sources.
Methods Patients with documented care from January 1, 2011 through May 31, 2019 in a series of EHR-derived Flatiron Health de-identified databases covering multiple tumor types were included. SEER incidence data (obtained from the SEER 18 database) and NPCR incidence data (obtained from the US Cancer Statistics public use database) for malignant cases diagnosed from January 1, 2011 to December 31, 2016 were included. Comparisons of demographic variables were performed across all disease-specific databases, for all patients and for the subset diagnosed with advanced-stage disease.
Results As of May 2019, a total of 201,570 patients with 19 different cancer types were included in Flatiron Health datasets. In an overall comparison with the national cancer registries, patients in the Flatiron Health databases had similar sex, age at initial diagnosis, and geographic distributions but appeared to be diagnosed at later stages of disease. For variables such as stage and race, Flatiron Health databases had a greater degree of incompleteness. These trends varied by cancer type.
Conclusions These three databases present general similarities in demographic and geographic distribution, but there are overarching differences across the populations they cover. Differences in data sourcing (medical oncology EHRs vs cancer registries), in sampling approaches, and in rules of data acquisition may explain some of these divergences. Furthermore, unlike the steady information flow entered into registries, the availability of medical oncology EHR-derived information reflects the extent of involvement of medical oncology clinics at different points in the specialty management of individual diseases, resulting in inter-disease variability. These differences should be considered when interpreting study results obtained with these databases.
Competing Interest Statement
At the time of the study, all authors report employment at Flatiron Health, Inc., which is an independent member of the Roche Group, and stock ownership in Roche. SSB and LL own equity in Flatiron Health.
Clinical Trial
Not applicable
Funding Statement
This study was sponsored by Flatiron Health, Inc. (Flatiron Health), which is an independent member of the Roche group.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Author roles and contributions: Study design and concept: XM, LL, SM, BJSA, SSB; Data collection: Flatiron Health; Data analysis and interpretation: XM, LL, SSB; Manuscript writing, review, and approval: all authors.
The analysis of patients in the Flatiron Health databases was refreshed in April 2023 to incorporate an update to the birth year variable, reflecting best practices in patient de-identification. This refresh resulted in updates to the distributions in calculated age at initial diagnosis. These updates have been incorporated in the methods section, Tables 2 through 20 and A3.A through A3.S, and a new appendix (Appendix II). Additionally, race/ethnicity and sex/gender variables have been clarified in Table 1.
Data Availability
The data that support the findings of this study originated from Flatiron Health, Inc. Requests for data sharing by license or by permission for the specific purpose of replicating results in this manuscript can be submitted to dataaccess@flatiron.com.