RT Journal Article
SR Electronic
T1 Detecting Goals of Care Conversations in Clinical Notes with Active Learning
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2024.01.03.24300801
DO 10.1101/2024.01.03.24300801
A1 Weissenbacher, Davy
A1 Courtright, Katherine
A1 Rawal, Siddharth
A1 Crane-Droesch, Andrew
A1 O’Connor, Karen
A1 Kuhl, Nicholas
A1 Merlino, Corinne
A1 Foxwell, Anessa
A1 Haines, Lindsay
A1 Puhl, Joseph
A1 Gonzalez-Hernandez, Graciela
YR 2024
UL http://medrxiv.org/content/early/2024/01/04/2024.01.03.24300801.abstract
AB Objective: Goals of Care (GOC) discussions are an increasingly used quality metric in serious illness care and research. Wide variation in documentation practices within the Electronic Health Record (EHR) presents challenges for reliable measurement of GOC discussions. Novel natural language processing approaches are needed to capture GOC discussions documented in real-world samples of seriously ill hospitalized patients’ EHR notes, a corpus with a very low event prevalence. Methods: To automatically detect utterances documenting GOC discussions outside of dedicated GOC note types, we proposed an ensemble of classifiers aggregating the predictions of rule-based, feature-based, and three transformer-based classifiers. We trained our classifier on 600 manually annotated EHR notes from patients with serious illnesses. Our corpus exhibited an extremely imbalanced ratio between utterances that discuss GOC and utterances that do not. This imbalance makes it difficult to train a classifier with standard supervised methods.
Therefore, we trained our classifier with active learning. Results: Using active learning, we reduced the annotation cost to fine-tune our ensemble by 70% while improving its performance on our test set of 176 EHR notes, with an F1-score of 0.557 for utterance classification and 0.629 for note classification. Conclusion: When classifying notes, with a true positive rate of 72% (13/18) and a false positive rate of 8% (13/158), our performance may be sufficient for deploying our classifier in the EHR to facilitate point-of-care access to GOC conversations documented outside of dedicated note types, without overburdening clinicians with false positives. Improvements are needed before using it to enrich trial populations or as an outcome measure.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the National Heart, Lung, and Blood Institute, Grant Number NHLBI K23-HL143181.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Institutional Review Board of the University of Pennsylvania determined that this study protocol met the eligibility criteria for exemption (45 CFR 46), and a waiver of the HIPAA authorization requirement was granted (45 CFR 164). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
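The note-level rates quoted in the abstract's conclusion can be checked directly from the counts it reports (13 of 18 GOC notes detected, 13 of 158 non-GOC notes flagged). The sketch below is only an arithmetic illustration, not the authors' evaluation code. The variable and function names are hypothetical, and the precision value is derived here from the reported counts rather than stated in the abstract.

```python
# Note-level counts as reported in the abstract's conclusion.
true_positives = 13    # GOC notes correctly flagged (13 of 18)
positive_notes = 18    # test notes that document a GOC discussion
false_positives = 13   # non-GOC notes incorrectly flagged (13 of 158)
negative_notes = 158   # test notes without a GOC discussion


def note_level_rates(tp, positives, fp, negatives):
    """Return (true positive rate, false positive rate, precision)."""
    tpr = tp / positives
    fpr = fp / negatives
    # Precision is an assumption-free derivation from the same counts,
    # but it is not a figure the abstract itself reports.
    precision = tp / (tp + fp)
    return tpr, fpr, precision


tpr, fpr, precision = note_level_rates(
    true_positives, positive_notes, false_positives, negative_notes
)

print(f"Test notes: {positive_notes + negative_notes}")
print(f"True positive rate: {tpr:.0%}")
print(f"False positive rate: {fpr:.0%}")
print(f"Precision (derived): {precision:.0%}")
```

The counts sum to the 176 test notes mentioned in the results, and the two rates round to the 72% and 8% given in the conclusion. Because positive notes are rare, even an 8% false positive rate produces as many false alarms as true detections in this test set.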