Abstract
Objective To evaluate the validity of death ascertainment from publicly available internet media (IM) sources by benchmarking against state and federal vital statistics data for patients in two large US healthcare systems.
Methods We extracted names and dates of birth and death from publicly available data, including obituaries and memorial websites, using previously developed natural language processing models. These data were probabilistically matched to electronic health records (EHRs) from Mass General Brigham (MGB) and Vanderbilt University Medical Center (VUMC) on first name, last name, and date of birth. Using state vital statistics databases from MA, CT, and VT as the reference standard for MGB patients and the National Death Index (NDI) for VUMC patients, we reported positive predictive values (PPVs), counting a case as a true positive when the date of death from IM sources was within 7 days of the reference standard. We also reported the sensitivity of deaths ascertained from IM sources.
Results When 8.1 million deaths extracted from public data were probabilistically matched to 78,848 deaths observed in the reference standards across the two sites, 30,607 (38.8%) matched exactly. The PPV for exact matches was 98.2% for MGB and 98.9% for VUMC, whereas the PPV for non-exact matches was <6%. Considering only the exact matches, IM sources improved the sensitivity of death capture by 24% in MGB and 18% in VUMC, compared to using EHRs alone for death ascertainment.
Conclusions Using public information to augment mortality data meaningfully increased the capture of deaths compared with relying on EHRs alone.
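The validation metrics described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the inclusive handling of the 7-day window, the choice of denominator (all IM-derived deaths matched to a patient), and the toy dates are assumptions made for illustration only. The final line only re-derives the exact-match share reported in the Results.

```python
from datetime import date

def within_window(im_death, ref_death, window_days=7):
    """True if the IM death date is within window_days of the reference date.
    Treating the boundary (exactly 7 days) as a match is an assumption."""
    return abs((im_death - ref_death).days) <= window_days

def ppv(matched_pairs, window_days=7):
    """Positive predictive value over IM-derived deaths matched to patients.

    matched_pairs: list of (im_death_date, ref_death_date or None), where
    None means the reference standard holds no death record (hypothetical
    input layout, not taken from the paper).
    """
    if not matched_pairs:
        return float("nan")
    true_pos = sum(
        1
        for im, ref in matched_pairs
        if ref is not None and within_window(im, ref, window_days)
    )
    return true_pos / len(matched_pairs)

def sensitivity(n_captured, n_reference):
    """Share of reference-standard deaths captured by a given source."""
    return n_captured / n_reference

# Toy, made-up pairs purely to exercise the functions.
toy_pairs = [
    (date(2020, 1, 10), date(2020, 1, 12)),  # 2 days apart: true positive
    (date(2020, 3, 1), date(2020, 3, 20)),   # 19 days apart: false positive
    (date(2020, 5, 5), None),                # no reference death: false positive
    (date(2021, 7, 1), date(2021, 6, 24)),   # exactly 7 days apart: true positive
]
toy_ppv = ppv(toy_pairs)

# Exact-match share from the Results: 30,607 of 78,848 reference deaths.
exact_match_share = round(100 * sensitivity(30_607, 78_848), 1)
```

With real data, the matched pairs would come from the probabilistic linkage on first name, last name, and date of birth, which this sketch does not attempt to reproduce.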
Competing Interest Statement
Michele LeNoue-Newton reports financial support was provided by US Food and Drug Administration. Michele LeNoue-Newton reports a relationship with GE Healthcare that includes: funding grants. Michele LeNoue-Newton has patent #US2022-032075/US2023-020068 pending to GE Healthcare/Vanderbilt. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Funding Statement
This project was supported by Master Agreement 75F40119D10037 from the US Food and Drug Administration (FDA).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of Mass General Brigham (MGB) determined that this project involves public health surveillance activity and does not require IRB approval, as it does not meet the criteria for human subjects research. The IRB of Vanderbilt University Medical Center (VUMC) made the same determination.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes