Abstract
Linear mixed models (LMMs) are commonly used in many areas, including epidemiology, to analyze multi-site data with heterogeneous site-specific random effects. However, because of regulations protecting patient privacy, sensitive individual patient data (IPD) usually cannot be shared across sites. In this paper we propose a novel algorithm for distributed linear mixed models (DLMMs). The proposed DLMM algorithm achieves exactly the same results as an analysis of IPD pooled from all sites; in this sense it is lossless. It requires each site to contribute aggregated data (AD) in only one iteration. We apply the DLMM algorithm to analyze the association of length of stay for COVID-19 hospitalizations with demographic and clinical characteristics, using administrative claims from the UnitedHealth Group Clinical Research Database.
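To illustrate why a one-shot exchange of aggregated data can be lossless, the following is a minimal sketch and not the authors' implementation. It assumes a random-intercept LMM in which the first column of each site's design matrix is the intercept, and it holds the variance components `sigma2` and `tau2` fixed rather than estimating them. All function names are hypothetical. Under these assumptions, the generalized least squares estimate of the fixed effects computed from each site's summaries (sample size, X'X, X'y) matches the estimate computed from the pooled individual-level data.

```python
import numpy as np

def site_summary(X, y):
    """Aggregated data a site would share; no row-level records leave the site."""
    return {"n": X.shape[0], "XtX": X.T @ X, "Xty": X.T @ y, "yty": float(y @ y)}

def gls_from_summaries(summaries, sigma2, tau2):
    """Fixed-effect GLS estimate for a random-intercept LMM from site summaries only.

    Assumes column 0 of every X is the intercept, so X'1 = XtX[:, 0] and
    1'y = Xty[0]. Uses the Sherman-Morrison identity
    (sigma2*I + tau2*11')^{-1} = (I - c*11') / sigma2, c = tau2 / (sigma2 + n*tau2).
    """
    p = summaries[0]["XtX"].shape[0]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for s in summaries:
        c = tau2 / (sigma2 + s["n"] * tau2)
        x1 = s["XtX"][:, 0]  # equals X'1 because column 0 is the intercept
        A += (s["XtX"] - c * np.outer(x1, x1)) / sigma2
        b += (s["Xty"] - c * x1 * s["Xty"][0]) / sigma2
    return np.linalg.solve(A, b)

def gls_from_ipd(sites, sigma2, tau2):
    """Reference estimate computed directly from individual-level data."""
    p = sites[0][0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, y in sites:
        n = X.shape[0]
        V = sigma2 * np.eye(n) + tau2 * np.ones((n, n))  # within-site covariance
        V_inv = np.linalg.inv(V)
        A += X.T @ V_inv @ X
        b += X.T @ V_inv @ y
    return np.linalg.solve(A, b)

def simulate_sites(seed=0, sizes=(30, 50, 80)):
    """Synthetic multi-site data with a site-specific random intercept."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, -0.5])
    sites = []
    for n in sizes:
        X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
        u = rng.normal(scale=1.0)  # random intercept for this site
        y = X @ beta + u + rng.normal(scale=0.5, size=n)
        sites.append((X, y))
    return sites

if __name__ == "__main__":
    sites = simulate_sites()
    summaries = [site_summary(X, y) for X, y in sites]
    beta_ad = gls_from_summaries(summaries, sigma2=0.25, tau2=1.0)
    beta_ipd = gls_from_ipd(sites, sigma2=0.25, tau2=1.0)
    print(np.allclose(beta_ad, beta_ipd))
```

The sketch covers only the fixed-effect estimate at given variance components. Estimating the variance components, and handling more general random-effect structures, would require additional steps that are not shown here.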
Competing Interest Statement
Drs. Sheils and Islam and Mr. Buresh are full-time employees of Optum Labs and own stock in its parent company, UnitedHealth Group, Inc.
Funding Statement
Research reported in this article was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2019C3-18315).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
UnitedHealth Group Certificate of Action
Office of Human Research Affairs (OHRA) 160 Second Street 3rd Floor Cambridge MA, 02142
Federalwide Assurance #: FWA00028881
OHRP Registration #: IORG0010356
Investigator Name: Sheils, Natalie
Action Date: 12 Nov 2020
UHG Project Number: 2020 0088
Action Type: New Initial Application
Project Sponsor: No Sponsor
Action ID: 2020 0088 01
Project Expiration: None
Action Level: Exempt
Project Risk Level: Negligible Risk
Action Decision: Approved
Project Title: Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data
The following are included in the decision for this action:
Documents: Protocol, dated 11-10-2020
CITI Training: Biomedical Research (Sheils)
Regulatory Determinations: The research was determined to qualify as negligible risk and is permissible under exempt category 4 (ii). The research was determined to be exempt as the activities did not identify subjects directly or through identifiers via the claims database, subjects were not contacted, and investigators will not re-identify subjects.
Notes & Reminders: Annual renewal of this project is not required. If any aspects of the project change, please engage the OHRA to determine whether any additional research determinations or risk assessments are needed. Please inform the OHRA when the project and any associated data utilization have concluded.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Institution information updated for authors: Md. Nazmul Islam, Natalie E. Sheils, and Rui Duan.
Data Availability
The data are proprietary and are not available for public use but can be made available under a data use agreement to confirm the findings of the current study.