RT Journal Article
SR Electronic
T1 EHRtemporalVariability: delineating temporal dataset shifts in electronic health records
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.04.07.20056564
DO 10.1101/2020.04.07.20056564
A1 Sáez, Carlos
A1 Gutiérrez-Sacristán, Alba
A1 Kohane, Isaac
A1 García-Gómez, Juan M
A1 Avillach, Paul
YR 2020
UL http://medrxiv.org/content/early/2020/04/11/2020.04.07.20056564.abstract
AB Background: Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends or as abrupt or seasonal changes in the statistical distributions of data over time, and they are particularly complex to address in multi-modal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large amounts of historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts and ensure reliable data reuse.
Findings: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of them available for reproducibility.
Conclusions: EHRtemporalVariability enables the exploration and identification of dataset shifts, helping users to broadly examine and repurpose large, longitudinal datasets.
Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is suited both to technical users working programmatically with the R package and to users unfamiliar with programming, who can use the Shiny user interface.
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/
Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html
On-line demo: http://ehrtemporalvariability.upv.es/
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by UPV grant PAID-00-17, GVA grant BEST/2018, and projects H2020-SC1-2016-CNECT No. 727560 and H2020-SC1-BHC-2018-2020 No. 825750.
Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The data of the NHDS case study are publicly available at https://www.cdc.gov/nchs/nhds/index.htm.
A random subset of this dataset is available as a proxy for testing purposes within the EHRtemporalVariability package, and reproducible examples are available within the package help, its vignette, and the on-line demo. Access to the BCH-ASD case study data is restricted by the Boston Children's Hospital Institutional Review Board. Access to the Mortality case study data is restricted by the Conselleria de Sanitat Universal i Salut Pública, Generalitat Valenciana, Spain.
Abbreviations: BCH-ASD, Boston Children's Hospital Autism Spectrum Disorders cohort; DTH, Data Temporal Heatmap; EHR, Electronic Health Record; ICD, International Classification of Diseases; ICD-9-CM, ICD Ninth Revision, Clinical Modification; IGT plot, Information Geometric Temporal plot; NHDS, National Hospital Discharge Survey; PheWAS, Phenome-Wide Association Studies.