Accurate, Robust, and Scalable Machine Abstraction of Mayo Endoscopic Subscores from Colonoscopy Reports

Anna L. Silverman; Balu Bhasuran; Arman Mosenia; Fatema Yasini; Saransh Gupta; Taline Mardirossian; Rohan Narain; Justin Sewell; Atul J. Butte; Vivek A. Rudrapatna

doi:10.1101/2022.06.19.22276606

Abstract

Importance Electronic health records (EHR) data are growing in importance as a source of evidence on real-world treatment effects. However, many clinically important measures are not directly captured as structured data by these systems, limiting their utility for research and quality improvement. Although this information can usually be manually abstracted from clinical notes, this process is expensive and subject to variability. Natural language processing (NLP) is a scalable alternative but has historically been subject to multiple limitations including insufficient accuracy, data hunger, technical complexity, poor generalizability, algorithmic unfairness, and an outsized carbon footprint.

Objective To compare different algorithmic approaches for classifying colonoscopy reports according to their ulcerative colitis Mayo endoscopic subscores.

Design Other observational study – NLP algorithm development and validation

Setting Academic medical center (UCSF) and safety-net hospital (ZSFG) in California

Participants Patients with ulcerative colitis

Exposures Colonoscopy

Main Outcomes and Measures The primary outcome was accuracy in identifying reports suitable for Mayo subscoring (binary yes/no) and then separately assigning a Mayo subscore where relevant (ordinal). Secondary outcomes included learning efficiency from training data, generalizability, computational costs, fairness, and sustainability.
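The two-step outcome described above (first decide whether a report can be scored, then assign an ordinal subscore) can be sketched as a small pipeline. This is only an illustrative sketch: the function names and the keyword-based toy classifiers are hypothetical stand-ins, not the trained models from this study.

```python
from typing import Callable, Optional


def abstract_mayo(
    report: str,
    is_scorable: Callable[[str], bool],
    assign_subscore: Callable[[str], int],
) -> Optional[int]:
    """Return a Mayo endoscopic subscore (0-3), or None when step 1 abstains."""
    # Step 1 (binary): is this report suitable for Mayo subscoring at all?
    if not is_scorable(report):
        return None
    # Step 2 (ordinal): assign the subscore only to reports that passed step 1.
    score = assign_subscore(report)
    if score not in (0, 1, 2, 3):
        raise ValueError(f"subscore out of range: {score}")
    return score


# Hypothetical toy classifiers, used only to make the sketch runnable.
def toy_is_scorable(report: str) -> bool:
    return "mucosa" in report.lower()


def toy_assign_subscore(report: str) -> int:
    text = report.lower()
    if "spontaneous bleeding" in text:
        return 3
    if "friab" in text:
        return 2
    if "erythema" in text:
        return 1
    return 0


if __name__ == "__main__":
    for note in (
        "Exam aborted due to poor preparation.",
        "Mucosa with friability and erosions.",
    ):
        print(note, "->", abstract_mayo(note, toy_is_scorable, toy_assign_subscore))
```

Keeping the two steps as separate callables mirrors the separate reporting of the binary and ordinal accuracies: each step can be evaluated, or replaced, on its own.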

Results Using automated machine learning (autoML), we trained a pair of classifiers that were 98% [91-99%] accurate at determining which reports to score and 97% [88-99%] accurate at assigning the correct Mayo endoscopic subscore. The binary classifiers trained on UCSF data achieved 96% accuracy on hold-out test data from ZSFG. Training these classifiers required 4 hours of computation on a standard laptop. Classification errors were not associated with either gender or area deprivation index. The carbon footprint of this approach was 24 times smaller than that of current deep learning algorithms for clinical text classification.
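The abstract does not state how the bracketed intervals around each accuracy were computed. As one common option, a Wilson score interval for a proportion can be sketched as follows; the counts in the usage example are invented for illustration and are not the study's test-set counts.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for an accuracy of correct/total (z=1.96 gives ~95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical counts: 97 correct out of 100 held-out reports.
    low, high = wilson_interval(97, 100)
    print(f"accuracy 97%, 95% interval {low:.1%} to {high:.1%}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and is asymmetric near the boundaries, which suits accuracies close to 100%.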

Conclusions and Relevance We identified autoML as an efficient and robust method for training clinical text classifiers. AutoML-trained classifiers demonstrated many favorable properties including generalizability, limited effort needed for data annotation and algorithm training, fairness, and sustainability. More generally, these results support the feasibility of using unstructured EHR data to generate real-world evidence and drive continuous improvements in learning health systems.

Question Is natural language processing (NLP) a viable alternative to manually abstracting disease activity from procedure notes?

Findings We compared different methods for abstracting the ulcerative colitis Mayo endoscopic subscore from colonoscopy reports. Classifiers trained using automated machine learning (autoML) achieved the greatest accuracy (97%), recognized when to abstain, generalized well to other health systems, required limited effort for annotation and programming, demonstrated fairness, and had a small carbon footprint.

Meaning NLP methods like autoML appear to be sufficiently mature technologies for clinical text classification, and thus are poised to enable many downstream endeavors using electronic health records data.

Competing Interest Statement

Disclosures: Vivek Rudrapatna receives research support from the following for-profit entities: Janssen Research and Development, Alnylam, and Genentech. Atul Butte is a co-founder and consultant to Personalis and NuMedii; consultant to Mango Tree Corporation, and in the recent past, Samsung, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, Merck, and Roche; is a shareholder in Personalis and NuMedii; is a minor shareholder in Apple, Meta (Facebook), Alphabet (Google), Microsoft, Amazon, Snap, 10x Genomics, Illumina, Regeneron, Sanofi, Pfizer, Royalty Pharma, Moderna, Sutro, Doximity, BioNTech, Invitae, Pacific Biosciences, Editas Medicine, Nuna Health, Assay Depot, and Vet24seven, and several other non-health related companies and mutual funds; and has received honoraria and travel reimbursement for invited talks from Johnson and Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, and many academic institutions, medical or disease specific foundations and associations, and health systems. Atul Butte receives royalty payments through Stanford University for several patents and other disclosures licensed to NuMedii and Personalis. The research of Atul Butte has been funded by NIH, Peraton (as the prime on an NIH contract), Genentech, Johnson and Johnson, FDA, Robert Wood Johnson Foundation, Leon Lowenstein Foundation, Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, and in the recent past, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor's Office of Planning and Research, California Institute for Regenerative Medicine, L'Oreal, and Progenity. These institutions had no influence on the study or the manuscript.
The remaining authors have nothing to disclose and no conflicts of interest.

Funding Statement

Funding Source: This study was supported by funding from the UCSF Bakar Computational Health Science Institute and the National Center for Advancing Translational Sciences of the NIH, grant number UL1TR001872. VAR was supported by funding from the NIH/National Center for Advancing Translational Sciences, grant number TL1TR001871.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The IRB of University of California gave ethical approval for this work (#18-24588).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The analytic code has been made publicly available at https://github.com/rwelab/MayoClassifier. The data used for this study contain protected health information and thus have not been made available for reuse. However, a machine-redacted version of the data can be made available to requesting researchers by mutual agreement and following the execution of a data use agreement.