ABSTRACT
Machine learning (ML) models require large datasets, which may be siloed across different healthcare institutions. Using federated learning, an ML technique that avoids aggregating raw clinical data across multiple institutions, we predict mortality within seven days in hospitalized COVID-19 patients. Patient data were collected from Electronic Health Records (EHRs) at five hospitals within the Mount Sinai Health System (MSHS). Logistic regression with L1 regularization (LASSO) and multilayer perceptron (MLP) models were trained in three ways: local models using only the data at each site, a pooled model using combined data from all five sites, and a federated model that shared only parameters with a central aggregator. Both the federated LASSO and federated MLP models performed better than their local model counterparts at four hospitals. The federated MLP model also outperformed the federated LASSO model at all hospitals. Federated learning shows promise for developing robust predictive models from COVID-19 EHR data without compromising patient privacy.
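To illustrate the idea of sharing only parameters with a central aggregator, the following is a minimal sketch and not the study's released code. It assumes a size-weighted parameter average (in the style of federated averaging), proximal gradient steps for the L1 penalty, and synthetic data. All function names, hyperparameters, and the data generator are hypothetical and do not come from the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lr=0.1, l1=0.001, epochs=5):
    """Run a few proximal gradient steps of L1-regularized logistic
    regression on one site's data; raw X and y never leave this function."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

def federated_average(site_weights, site_sizes):
    """Aggregator step: average parameters, weighted by site sample count
    (an assumed aggregation rule, not confirmed by the source)."""
    sizes = np.asarray(site_sizes, dtype=float)
    return np.average(np.stack(site_weights), axis=0, weights=sizes)

def train_federated(sites, n_features, rounds=20):
    """Alternate local training at each site with central averaging."""
    w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in sites]
        w = federated_average(updates, [len(y) for _, y in sites])
    return w

def make_synthetic_sites(n_sites=5, n_features=4, seed=0):
    """Hypothetical stand-in for per-hospital data; not real patient data."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    sites = []
    for _ in range(n_sites):
        n = int(rng.integers(50, 200))
        X = rng.normal(size=(n, n_features))
        y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)
        sites.append((X, y))
    return sites

sites = make_synthetic_sites()
w_fed = train_federated(sites, n_features=4)
```

In this sketch only the parameter vector `w` crosses site boundaries. A local-only baseline would call `local_update` repeatedly on a single site, and a pooled baseline would concatenate all sites' data before training.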
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by U54 TR001433-05, National Center for Advancing Translational Sciences, National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai (IRB-20-03271).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
This article is written following the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines, which are further elaborated in Supplementary Table 3. Furthermore, we release all code used for building the classifier under the GPLv3 license in a public GitHub repository.