Abstract
Background Low health literacy is associated with poor health outcomes. Hospital discharge instructions are often written at advanced reading levels, which limits the ability of patients with low health literacy to follow medication instructions or complete other necessary care. Previous research demonstrates that improving the readability of discharge instructions reduces hospital readmissions and decreases healthcare costs. We aimed to use artificial intelligence (AI) to improve the readability of discharge instructions.
Methodology/Principal Findings We collected a series of discharge instructions for adults hospitalized for heart failure (n=423), which were then manually simplified to a lower reading level to create two parallel sets of discharge instructions. Of these, 343 pairs of original and simplified instructions were used to train an AI-based machine learning algorithm, and the remaining 80 discharge instructions were held out to test it. Output was evaluated quantitatively, using Simple Measure of Gobbledygook (SMOG) and Flesch-Kincaid readability scores and cross-entropy analysis, and qualitatively. In this test dataset (n=80), the average reading levels were: original discharge instructions (SMOG: 10.5669±1.2634, Flesch-Kincaid: 8.6038±1.5509), human-simplified instructions (SMOG: 9.4406±1.0791, Flesch-Kincaid: 7.2221±1.3794), and AI-simplified instructions (SMOG: 9.3045±0.9531, Flesch-Kincaid: 7.0464±1.1308). AI-simplified instructions were significantly different from original instructions (p<0.00001). The algorithm made appropriate changes to the original discharge instructions in 26.1% of instances and improved average reading levels by 1.26±0.32 grade levels (SMOG) and 1.02±0.47 grade levels (Flesch-Kincaid). Cross-entropy analysis showed that the algorithm's performance improved as the size of the dataset increased.
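The abstract does not say which software computed the SMOG and Flesch-Kincaid scores. As a minimal sketch, assuming the standard published formulas (SMOG grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291; Flesch-Kincaid grade = 0.39 × words/sentences + 11.8 × syllables/words − 15.59), the Python below shows how a grade level and a grade-level improvement (original minus simplified) can be computed. The helper names (`count_syllables`, `text_stats`), the regex sentence splitter, the vowel-group syllable heuristic, and the two example sentences are hypothetical illustrations, not the authors' pipeline; dictionary-based tools will give somewhat different numbers.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A silent trailing "e" (e.g. "take") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentences, words, syllables, polysyllabic words) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)
    return len(sentences), len(words), sum(syllables), polysyllables


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level from words/sentence and syllables/word."""
    n_sent, n_words, n_syll, _ = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


def smog_grade(text: str) -> float:
    """SMOG grade; the 30/sentences factor normalizes to a 30-sentence sample."""
    n_sent, _, _, n_poly = text_stats(text)
    return 1.0430 * math.sqrt(n_poly * (30 / n_sent)) + 3.1291


if __name__ == "__main__":
    # Hypothetical example text, not drawn from the study dataset.
    original = ("Take your diuretic medication every morning. Weigh yourself daily "
                "and contact your physician if your weight increases rapidly.")
    simplified = ("Take your water pill every morning. Weigh yourself each day. "
                  "Call your doctor if you gain weight fast.")
    for name, score in (("SMOG", smog_grade), ("Flesch-Kincaid", flesch_kincaid_grade)):
        before, after = score(original), score(simplified)
        print(f"{name}: {before:.2f} -> {after:.2f} "
              f"(improvement {before - after:.2f} grade levels)")
```

Averaging the per-document improvement over a held-out set such as the 80 test instructions would give a summary comparable in form to the grade-level improvements reported above, though not necessarily identical to the authors' calculation.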
Conclusions/Significance The AI-based algorithm learned meaningful phrase-level simplifications from the human-simplified discharge instructions. The AI simplifications, while not in complete agreement with the human simplifications, represent statistically significant improvements in SMOG and Flesch-Kincaid reading levels. The algorithm will likely produce more meaningful and concise simplifications of discharge instructions as it is trained on more data. This study demonstrates an important opportunity for AI integration into healthcare delivery to address health disparities related to limited health literacy and potentially improve patient health.
Author summary Patient-facing materials, such as hospital discharge instructions, are often written at a reading level that is too high for many patients. These instructions provide critical information on how to control health conditions, take medications, and attend follow-up visits. Patients who have difficulty understanding these instructions may not know how to control their health condition and may return to the hospital as a result.
Improving the readability of discharge instructions can reduce hospital readmissions. It may also improve health outcomes for patients and reduce healthcare costs. Artificial intelligence (AI) may be used to improve the readability of patient-facing materials. Our work aims to create a tool that can accomplish this goal.
We obtained hospital discharge instructions for patients with heart failure. Medical experts edited these instructions to improve their readability, creating two sets of discharge instructions that were then processed using AI. We created and tested an AI tool to automatically simplify discharge instructions. Although the tool was not perfect, we found that it was successful. This research shows that AI can be used to address health literacy needs within health care by making patient-facing health materials easier to understand. This is important for empowering all patients to take action to improve their health.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The authors received no specific funding for this work. However, we have the following financial disclosures: JK is the recipient of grants from NIH, PCORI, CDC, SAMHSA, HRSA, AHRQ, and the State of PA.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Not Applicable
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Study ID: 00018117. Penn State Office for Research Protections, Human Research Protection Program. Phone: 814-865-1775; Fax: 814-863-8699; Email: irb-orp@psu.edu
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Not Applicable
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Not Applicable
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Not Applicable
Data Availability
The code used for algorithm training is publicly available at https://www.kaggle.com/code/daveyfoley/readability-transformer. The SNOMED ontology used in the development of the algorithm can be accessed at snomedct.org. Because SNOMED is protected by copyright, elements of the code involving SNOMED, as well as the algorithm itself, cannot be shared publicly. Out of concern for the protection of patient privacy and confidentiality, we are unable to make the full training dataset publicly available. However, a deidentified partial training set and data analysis can be found at kaggle.com/datasets/ntcannon/SimplifyTrainingDataset. Additionally, the full dataset and algorithm are available to researchers who meet the criteria for access to confidential data, on request to the Penn State Institutional Review Board (irb-orp@psu.edu) or the corresponding author, Nathan Cannon (ncannon@pennstatehealth.psu.edu).