PT - JOURNAL ARTICLE
AU - Morales, Félix L.
AU - Xu, Feihong
AU - Lee, Hyojun Ada
AU - Navarro, Helio Tejedor
AU - Bechel, Meagan A.
AU - Cameron, Eryn L.
AU - Kelso, Jesse
AU - Weiss, Curtis H.
AU - Nunes Amaral, Luís A.
TI - Open-source machine learning pipeline automatically flags instances of acute respiratory distress syndrome from electronic health records
AID - 10.1101/2024.05.21.24307715
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.05.21.24307715
4099 - http://medrxiv.org/content/early/2024/05/26/2024.05.21.24307715.short
4100 - http://medrxiv.org/content/early/2024/05/26/2024.05.21.24307715.full
AB - Physicians could greatly benefit from automated diagnosis and prognosis tools to help address information overload and decision fatigue. Intensive care physicians stand to benefit greatly from such tools as they are at particularly high risk for those factors. Acute Respiratory Distress Syndrome (ARDS) is a life-threatening condition affecting >10% of critical care patients and has a mortality rate over 40%. However, recognition rates for ARDS have been shown to be low (30-70%) in clinical settings. In this work, we present a reproducible computational pipeline that automatically adjudicates ARDS on retrospective datasets of mechanically ventilated adult patients. This pipeline automates the steps outlined by the Berlin Definition through implementation of natural language processing tools and classification algorithms. We train an XGBoost model on chest imaging reports to detect bilateral infiltrates, and another on a subset of attending physician notes labeled for the most common ARDS risk factor in our data. Both models achieve high performance: a minimum area under the receiver operating characteristic curve (AUROC) of 0.86 for adjudicating chest imaging reports in out-of-bag test sets, and an out-of-bag AUROC of 0.85 for detecting a diagnosis of pneumonia.
We validate the entire pipeline on a cohort of MIMIC-III encounters and find a sensitivity of 93.5% (an extraordinary improvement over the 22.6% ARDS recognition rate reported for these encounters) along with a specificity of 73.9%. We conclude that our reproducible, automated diagnostic pipeline exhibits promising accuracy, generalizability, and probability calibration, thus providing a valuable resource for physicians aiming to enhance ARDS diagnosis and treatment strategies. We surmise that proper implementation of the pipeline has the potential to aid clinical practice by facilitating the recognition of ARDS cases at scale.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Feihong Xu was supported in part by the National Institutes of Health Training Grant (T32GM008449) through Northwestern University's Biotechnology Training Program. Curtis H. Weiss was supported by the National Heart Lung and Blood Institute (R01HL140362 and K23HL118139). Luís A. Nunes Amaral was supported by the National Heart Lung and Blood Institute (R01HL140362). Luís A. Nunes Amaral and Feihong Xu are supported by the National Institute of Allergy and Infectious Diseases (U19AI135964).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Institutional Review Board of Northwestern University gave ethical approval for this work (STU00208049).
Institutional Review Board of Endeavor Health gave ethical approval for this work (EH17-325).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
The datasets analyzed in this study will be made available upon publication at the ARCH repository hosted by Northwestern University (https://arch.library.northwestern.edu).