Abstract
Background Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can introduce dataset shifts, a data quality issue when electronic health records (EHRs) are reused for secondary purposes. Temporal dataset shifts can appear as trends or as abrupt or seasonal changes in the statistical distributions of data over time, and they are particularly difficult to address in multi-modal and highly coded data. If not delineated, these changes can harm population and data-driven research, such as machine learning. Because biomedical research repositories are increasingly populated with large volumes of historical EHR data, specific software methods are needed to delineate temporal dataset shifts and ensure reliable data reuse.
Findings EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of which is available for reproducibility.
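To make the idea of estimating distributions over time more concrete, the following is a minimal, standard-library Python sketch. It is not the EHRtemporalVariability API and not the authors' implementation. It assumes a toy list of (date, code) records with made-up codes, estimates the relative frequency of each code per month, and compares consecutive months with the Jensen-Shannon distance. The choice of that distance, the monthly period, and the function names `monthly_distributions` and `js_distance` are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from math import log2, sqrt


def monthly_distributions(records):
    """Estimate the relative frequency of each code per calendar month.

    `records` is an iterable of (date, code) pairs with dates as 'YYYY-MM-DD'.
    Returns (months, codes, dists), where dists[i][j] is the relative
    frequency of codes[j] in months[i].
    """
    counts = defaultdict(Counter)
    for date, code in records:
        counts[date[:7]][code] += 1  # 'YYYY-MM' is the monthly batch key
    months = sorted(counts)
    codes = sorted({code for counter in counts.values() for code in counter})
    dists = []
    for month in months:
        total = sum(counts[month].values())
        dists.append([counts[month][code] / total for code in codes])
    return months, codes, dists


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms of x.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
    return sqrt(max(0.0, divergence))  # clamp tiny negative rounding errors


# Hypothetical toy data: codes "A" and "B" are replaced by "C" in March,
# imitating an abrupt change in coding practice.
records = [
    ("2020-01-05", "A"), ("2020-01-12", "A"), ("2020-01-20", "B"), ("2020-01-28", "B"),
    ("2020-02-03", "A"), ("2020-02-11", "A"), ("2020-02-19", "B"), ("2020-02-27", "B"),
    ("2020-03-02", "C"), ("2020-03-10", "C"), ("2020-03-18", "C"), ("2020-03-26", "C"),
]

months, codes, dists = monthly_distributions(records)
# Distance between each month and the one before it.
steps = [js_distance(dists[i - 1], dists[i]) for i in range(1, len(dists))]
for month, step in zip(months[1:], steps):
    print(month, round(step, 3))
```

In this toy example the distance between January and February is zero, while the step into March is the largest possible, which is the kind of abrupt change a temporal variability analysis aims to surface. A fuller analysis would compare all pairs of time batches and project them into a low-dimensional space rather than only comparing neighbours.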
Conclusions EHRtemporalVariability enables the exploration and identification of dataset shifts, contributing to the broad examination and repurposing of large, longitudinal datasets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is suited both to technical users, who can use the R package programmatically, and to users unfamiliar with programming, who can use the Shiny user interface.
Availability https://github.com/hms-dbmi/EHRtemporalVariability/ (source code), https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html (reproducible vignette), http://ehrtemporalvariability.upv.es/ (online demo)
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by UPV grant PAID-00-17, GVA grant BEST/2018, and projects H2020-SC1-2016-CNECT No. 727560 and H2020-SC1-BHC-2018-2020 No. 825750.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The data for the NHDS case study are publicly available at https://www.cdc.gov/nchs/nhds/index.htm. A random subset of this dataset is included in the EHRtemporalVariability package as a proxy for testing purposes, and reproducible examples are available in the package help, its vignette, and the online demo. Access to the BCH-ASD case study data is restricted by the Boston Children’s Hospital Institutional Review Board. Access to the Mortality case study data is restricted by the Conselleria de Sanitat Universal i Salut Pública, Generalitat Valenciana, Spain.
Abbreviations
- BCH-ASD: Boston Children’s Hospital Autism Spectrum Disorders cohort
- DTH: Data Temporal Heatmap
- EHR: Electronic Health Record
- ICD: International Classification of Diseases
- ICD-9-CM: ICD Ninth Revision, Clinical Modification
- IGT plot: Information Geometric Temporal plot
- NHDS: National Hospital Discharge Survey
- PheWAS: Phenome Wide Association Studies