Applications of Natural Language Processing at Emergency Department Triage: A Systematic Review

Jonathon Stewart; Juan Lu; Adrian Goudie; Glenn Arendts; Shiv A Meka; Sam Freeman; Katie Walker; Peter Sprivulis; Frank Sanfilippo; Mohammed Bennamoun; Girish Dwivedi

doi:10.1101/2022.12.20.22283735

ABSTRACT

INTRODUCTION Millions of patients attend emergency departments (EDs) around the world every year. Patients are triaged on arrival by a trained nurse who collects structured data and an unstructured free-text history of presenting complaint. Natural language processing (NLP) uses various computational methods to analyse and understand human language, and has been applied to data acquired at ED triage to predict various outcomes. The objective of this systematic review is to evaluate how NLP has been applied to ED triage, assess whether NLP-based models outperform humans or current risk stratification techniques, and assess whether incorporating free-text improves the predictive performance of models when compared to predictive models that use only structured data.

METHODS All English language peer-reviewed research that applied an NLP technique to free-text obtained at ED triage was eligible for inclusion. We excluded studies focusing solely on disease surveillance, and studies that used information obtained after triage. We searched the electronic databases MEDLINE, Embase, Cochrane Database of Systematic Reviews, Web of Science, and Scopus for medical subject headings and text keywords related to NLP and triage. Databases were last searched on 01/01/2022. Risk of bias in studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Due to the high level of heterogeneity between studies, a meta-analysis was not conducted. Instead, a narrative synthesis is provided.

RESULTS In total, 3584 studies were screened, and 19 studies were included. The population size varied greatly between studies, ranging from 1.8 million patients to 762 simulated encounters. The most common primary outcomes assessed were prediction of triage score, prediction of admission, and prediction of critical illness. NLP models achieved high accuracy in predicting need for admission and critical illness, and in mapping free-text chief complaints to structured fields. In the one study that made this comparison, an NLP model predicted admission with greater accuracy than emergency physicians; NLP models also outperformed an abnormal vital sign trigger and triage score at predicting critical illness, and were more accurate than nurses at assigning triage scores in two out of three papers. Incorporating both structured data and free-text data improved results when compared to models that used only structured data. The majority of studies (79%) were assessed to have a high risk of bias, and only one study reported the deployment of an NLP model into clinical practice.

CONCLUSION Unstructured free-text triage notes contain valuable information that can be used by NLP models to predict clinically relevant outcomes. The use of NLP at ED triage appears feasible and could allow for early and accurate prediction of multiple important patient-oriented outcomes. However, there are few examples of implementation into clinical practice, most research is retrospective, and the potential benefits of NLP at triage are yet to be realised.

INTRODUCTION

Millions of patients attend emergency departments (EDs) around the world every year.¹ Queues for care are common, so patients are often triaged on arrival to the ED by a trained nurse. Triage is central to the practice of emergency medicine.² In the face of excess demand, triage allows EDs to allocate their finite resources in an equitable, efficient, and standardised way.^3,4 Triage systems in current use include the Emergency Severity Index (ESI), the Australasian Triage Scale (ATS), the Manchester Triage System (MTS), and the Korean Triage and Acuity Scale (KTAS).^3,5 Triage systems aim to aid emergency care providers in making a structured decision regarding the urgency of care that a patient requires and, in doing so, to identify and prioritise those patients with time-sensitive care needs.^3,4 However, urgency of care does not necessarily reflect severity of illness (as judged by morbidity or mortality). For example, a young patient with a known history of recurrent renal calculi who presents with severe flank pain may be appropriately triaged as high urgency to receive analgesia, but will most likely have a good clinical outcome, whereas an elderly patient with undifferentiated abdominal pain may be triaged as lower urgency but have a higher risk of morbidity and mortality. No triage tool is perfect, and all have limitations in sensitivity and specificity, resulting in over- and under-triage, particularly for certain demographic groups and conditions.^6-8 There is an opportunity to improve triage performance in identifying patients with critical illness, and to improve triage accuracy and the consistency of triage categorisation between healthcare workers.³

Machine learning (ML) is a subfield of artificial intelligence (AI) that uses various methods to automatically identify patterns in data and then make predictions.⁹ These patterns are learned from the data rather than being explicitly pre-programmed by humans. ML models are iteratively improved through a process called training. In supervised ML training, the model’s predicted output is compared to a “ground truth”, and the error between the predicted value and the ground truth is progressively reduced through the training process.⁹ ML models have the potential to improve risk stratification and outcome prediction in the ED setting.^10-12
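To make the training process concrete, the following is a minimal sketch, not drawn from any included study, of supervised training for a logistic regression classifier fitted by batch gradient descent. The two features (a scaled age and an abnormal vital sign flag), the admission labels, and the function names are hypothetical. Each epoch compares predictions with the ground-truth labels and adjusts the weights to reduce the error.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability of the positive class (e.g. admission)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mean_log_loss(weights, bias, xs, ys):
    """Average cross-entropy between predictions and ground-truth labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train(xs, ys, lr=0.5, epochs=200):
    """Batch gradient descent; returns weights, bias, and the loss after each epoch."""
    weights, bias, history = [0.0] * len(xs[0]), 0.0, []
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            error = predict(weights, bias, x) - y  # prediction minus ground truth
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        n = len(xs)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        history.append(mean_log_loss(weights, bias, xs, ys))
    return weights, bias, history


# Hypothetical triage features: [scaled age, abnormal vital sign flag]; label 1 = admitted.
xs = [[0.2, 0], [0.3, 0], [0.4, 0], [0.7, 1], [0.8, 1], [0.9, 1]]
ys = [0, 0, 0, 1, 1, 1]
weights, bias, history = train(xs, ys)
```

The loss history falls from epoch to epoch; with real data, performance would be judged on held-out patients rather than on the training set.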

Triage has been identified as a promising area in which to apply ML in the ED.^{13, 14} ML has previously been applied successfully to structured data acquired at triage (such as patient age and vital signs) to predict outcomes including need for admission and intensive care.^{15, 16} Triage nurses routinely collect structured data and an unstructured free-text history of presenting complaint, capturing their impression and subjective assessment of the presentation. This free-text may be more expressive and nuanced, and may contain more information, than structured data.¹⁷ Prior work has suggested that incorporating free-text may improve the performance of ML at ED triage, and that this is an important area for future research despite the challenges of incorporating free-text data into models.^18-20

Natural language processing (NLP) uses computational methods to analyse and understand human language and its structure.²¹ Early NLP techniques were relatively simple. For example, a “bag-of-words” model bases its decision on the relative frequencies of words in the text, ignoring their order.²² These early models often could not account for context or negation, and as a result had numerous limitations.²³ Significant advancements in NLP have been made over the last few years through the use of deep learning (DL), a subfield of ML.^{24, 25} DL models pass data through multiple processing layers and, in doing so, achieve increasingly abstract representations of the input data, enabling them to learn complex functions.²⁶ Very large DL-based NLP models have recently been developed.^27-29 These models have been trained on datasets containing billions of words and have achieved high levels of performance.^27-29 Some large, pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT), are publicly available.²⁷ Using a pre-trained model allows researchers to take a high-performing model as their starting point and then customise it to their needs by fine-tuning the model on their local data. For example, Tahayori et al. were able to accurately predict admission from the ED using only free-text triage notes and a BERT-based NLP model.³⁰ Multimodal models integrate NLP with other types of ML to analyse combinations of free-text data and structured data (such as age and vital signs).
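A minimal sketch of a bag-of-words representation follows, assuming simple lower-casing and splitting on non-letter characters (real pipelines differ in their tokenisation). The two invented triage notes negate different symptoms yet produce identical word counts, which illustrates why order-insensitive models struggle with negation and context.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased alphabetic tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def relative_frequencies(text: str) -> dict:
    """Convert token counts into relative frequencies that sum to 1."""
    counts = bag_of_words(text)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical notes with different clinical meanings.
note_a = "Chest pain, no shortness of breath"
note_b = "Shortness of breath, no chest pain"

print(bag_of_words(note_a) == bag_of_words(note_b))  # True: the representation cannot tell them apart
```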

Objectives

This systematic review aims to evaluate the applications of NLP at ED triage by answering the following questions:

How has NLP been applied to ED triage?
Do NLP based models outperform humans or current risk stratification techniques?
Does incorporating free-text improve predictive performance of ML models when compared to ML models that use only structured data?

METHODS

A systematic review protocol was prepared in accordance with PRISMA-P guidelines and registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 04/10/2021 (Registration ID: CRD42021276980).^31,32 All English language peer-reviewed research that applied an NLP technique to free-text obtained at ED triage was eligible for inclusion. As this study aims to broadly assess the capability of NLP at triage, all outcomes and comparators were included. We excluded studies focusing solely on disease surveillance, and studies that used information obtained after triage (such as emergency physician clinical notes and investigations performed within the ED).

We searched PubMed (MEDLINE), Embase, Cochrane Database of Systematic Reviews, Web of Science, and Scopus for research published from 01/01/2012 to present. Electronic databases were first searched on 16/09/2021 and last searched on 01/01/2022. We searched for medical subject headings (MeSH) and text keywords related to NLP and triage. The search strategy was iteratively developed by the multidisciplinary project team that included emergency physicians and computer scientists. The MEDLINE search strategy is provided in Appendix 1, and was adapted to the other databases. Reference lists of the included studies and the authors’ personal archives were reviewed for further relevant literature.

Citations and abstracts were screened independently by two reviewers (JS and JL) against the inclusion and exclusion criteria. Both reviewers were blind to the journal titles, study authors, and institutions. Full-text articles were obtained for any article identified by either reviewer as meeting the inclusion criteria. Two reviewers (JS and JL) then evaluated the full-text reports against the inclusion and exclusion criteria. Data were extracted by JS and JL using a standardised form that included study country, study design, primary outcome, number of sites, study population, input data, NLP and ML models used, comparison, and results. The form was piloted, and calibration exercises were conducted prior to formal data extraction to ensure consistency between reviewers. In cases of conflict or discrepancy, additional review authors were involved until a decision was reached. There were no uncertainties that required authors of the included studies to be contacted.

In addition, we extracted whether study datasets and model code were publicly available. Risk of bias in studies was assessed independently by two authors (JS and JL) using the Prediction model Risk of Bias Assessment Tool (PROBAST).³³ Due to the high level of heterogeneity between studies, a meta-analysis was not conducted. Instead, a narrative synthesis is provided to summarise review findings.

RESULTS

Study selection

The study selection process is summarised in a PRISMA flow diagram (Figure 1). There were 5099 records identified through database searching and a further 11 records identified through other sources. Following removal of duplicates, 3584 records remained and underwent title and abstract screening, of which 3448 were excluded. The remaining 136 full-text articles were assessed for eligibility. In total, 117 articles were excluded, and 19 studies remained for inclusion. There were no unresolved disagreements as to study inclusion or results of data extraction.

Characteristics of included studies

A summary of the included studies is shown in Table 1. There were 18 retrospective studies.^{17, 18, 30, 34-48} One study reported that its ML model was developed using retrospective data and then validated using prospective data.⁴⁹ All used observational cohort designs. Two studies were international multi-centre studies (USA and Portugal); 11 were conducted in the USA; two were from South Korea; and one each was from Australia, Brazil, China, and France. The most common primary outcomes assessed were prediction of triage score (six studies), prediction of admission (five studies), and prediction of critical illness (three studies). Two studies predicted need for imaging within the ED, two studies predicted the provider-assigned chief complaint label, and one study predicted diagnosis of infection in the ED.


Table 1. Summary of included studies.

Abbreviations

KTAS - Korean Triage and Acuity Scale

ESI - Emergency Severity Index

ICU - Intensive Care Unit

ED - Emergency Department

ML - Machine learning

FHx - Family history

SHx - Social history

PMHx - Past medical history

Vitals - Respiratory rate (RR), heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), temperature (Temp), and oxygen saturation (SpO2).

MTS - Manchester Triage System

BERT - bidirectional encoder representations from transformers

XGBoost - eXtreme Gradient Boosting

LSTM - Long short-term memory

DNN - Deep neural network

LR - Logistic regression

RF - Random forest

CNN - Convolutional neural network

ANN - Artificial neural network

BoW - Bag-of-words

PCA - Principal component analysis

SVM - Support vector machine

KNN - k-nearest neighbors

F1 - the harmonic mean of precision and recall

AUC - Area under the receiver operating characteristic curve

ELMo - Embeddings from Language Models

NEWS - National Early Warning Score

The population size varied greatly between studies, ranging from 1.8 million patients to 762 simulated encounters. Four studies used a population of under 100 000, four studies had a population of between 100 000 and 200 000, six studies had a population of between 200 001 and 300 000, and six studies had a population of over 300 000. Eleven studies used data from a single site and eight studies used data from multiple sites. The largest number of sites (642) was used by Zhang et al.

Fourteen studies applied NLP to a free-text history of presenting complaint, seven studies applied NLP to a free-text chief complaint, two studies applied NLP to a structured chief complaint label, and one study applied NLP to simulated triage dialogues that had been transcribed by either a human or an ML model. Other frequently used input variables were patient demographics (13 studies), patient vital signs (heart rate, respiratory rate, oxygen saturation, blood pressure, and temperature) (15 studies), pain score (12 studies), triage score (10 studies), mode of arrival (10 studies), time of arrival (8 studies), and past medical history (7 studies). Other input variables included mental status (5 studies) and blood glucose level (5 studies).

Prediction of admission

NLP models and multimodal models were able to accurately predict admission at the time of triage for adult and paediatric patients.^{18, 30, 35, 41, 46} Of the five studies focusing on predicting admission to hospital, Roquette et al. achieved the highest area under the receiver operating characteristic curve (AUC) using a gradient boosting model (AUC 0.89). Tahayori et al. achieved a similar AUC (0.88) using only the free-text history of presenting complaint. Tahayori et al. were the only authors who compared their model to emergency physician performance. Their model achieved higher accuracy than five emergency consultants (0.83 vs 0.78) and higher specificity (0.86 vs 0.77), but lower sensitivity (0.72 vs 0.90). Roquette et al. and Zhang et al. both compared ML models trained using structured data only with ML models that incorporated both structured data and text data. They found that adding text data resulted in a small improvement over structured data alone.
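Because accuracy, sensitivity, and specificity can move in different directions, a short worked example may help. The sketch below uses invented counts (not data from any included study) for a hypothetical test set in which 20% of patients are admitted. Accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so when most patients are not admitted, a model with high specificity can achieve higher accuracy than a comparator with higher sensitivity.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # admitted, predicted admitted
    fn: int  # admitted, predicted discharged
    fp: int  # discharged, predicted admitted
    tn: int  # discharged, predicted discharged

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.total

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total


cm = ConfusionMatrix(tp=30, fn=10, fp=20, tn=140)  # hypothetical counts
# Accuracy is a prevalence-weighted average of sensitivity and specificity.
weighted = cm.prevalence * cm.sensitivity + (1 - cm.prevalence) * cm.specificity
print(cm.sensitivity, cm.specificity, cm.accuracy)  # 0.75 0.875 0.85
print(math.isclose(cm.accuracy, weighted))  # True
```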

Prediction of critical illness

Multimodal models were able to accurately predict critical illness in adult patients, defined as ICU admission, cardiopulmonary arrest within 24 hours, or death within 24 hours of triage.^43-45 Of the three studies that predicted critical illness at triage, Fernandes et al. achieved the highest AUC (0.96) in predicting in-hospital death or cardiopulmonary arrest within 24 hours of triage using an extreme gradient boosting model. They found no difference in AUC when using clinical variables only or clinical variables and structured chief complaint processed by NLP. Joseph et al. found their NLP model (AUC 0.857) significantly outperformed an abnormal vital sign trigger (AUC 0.521) and ESI score ≤ 2 (AUC 0.672) in predicting critical illness. The addition of free-text data improved the performance of their neural network model (from AUC 0.820 to AUC 0.857).
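The AUC values reported above can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes AUC directly from that definition using invented scores. It also shows that for a binary rule, such as a hypothetical trigger that is either on or off, the AUC reduces to the average of sensitivity and specificity, which limits how high a single-threshold rule's AUC can be unless it is both highly sensitive and highly specific.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: P(score of a positive > score of a negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Hypothetical model probabilities; label 1 = critical illness.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
print(round(auc(model_scores, labels), 3))  # 0.889

# A hypothetical binary trigger (1 = triggered): sensitivity 2/3, specificity 2/3.
trigger = [1, 1, 0, 1, 0, 0]
print(round(auc(trigger, labels), 3))  # 0.667
```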

Prediction of triage score

NLP has been applied across multiple triage systems. NLP models and multimodal models were able to accurately assign triage categories using structured and free-text data.^{17, 36-38, 47, 48} Wang et al. achieved the highest performance in predicting ESI using their “DeepTriager” model (AUC 0.96). Kim et al. achieved an AUC of 0.89 in assigning a KTAS category to auto-transcribed simulated triage dialogue, only slightly lower than the performance achieved using human-transcribed simulated triage dialogue (AUC 0.90). Three studies compared the accuracy of triage scores assigned by multimodal models incorporating NLP to triage scores assigned by nurses.^{17, 36, 47} Such models were more accurate than nurses in two out of three papers.^{36, 47} Adding text data to structured data improved performance in assigning triage score.^{36, 37}
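Comparisons between model-assigned and nurse-assigned triage categories depend on how the reference category is defined. As a minimal sketch, assuming numeric scales where 1 is the most urgent category (as in ESI, ATS, and KTAS) and an invented reference standard, agreement can be split into exact matches, under-triage (assigned less urgent than the reference), and over-triage (assigned more urgent). The function name and data are hypothetical.

```python
def triage_agreement(assigned, reference):
    """Proportion of exact matches, under-triage, and over-triage.

    Assumes 1 = most urgent, so a larger assigned number than the
    reference means the patient was triaged as less urgent (under-triage).
    """
    if len(assigned) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty lists")
    n = len(reference)
    pairs = list(zip(assigned, reference))
    return {
        "exact": sum(a == r for a, r in pairs) / n,
        "under_triage": sum(a > r for a, r in pairs) / n,
        "over_triage": sum(a < r for a, r in pairs) / n,
    }


reference = [2, 2, 3, 5, 1]       # hypothetical reference categories
model_assigned = [2, 3, 3, 4, 1]  # hypothetical model output
print(triage_agreement(model_assigned, reference))
# {'exact': 0.6, 'under_triage': 0.2, 'over_triage': 0.2}
```

The same function could be applied to nurse-assigned categories, allowing model and nurse error patterns to be compared against one reference.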

Prediction of provider-assigned chief complaint

NLP models and multimodal models incorporating NLP were able to accurately map free-text history of presenting complaint to structured chief complaints.^{42, 49} Chang et al. (2020) used BERT to accurately predict provider-assigned chief complaint labels (Top-5 structured label AUC 0.92). Greenbaum et al. (2019) iteratively developed their own structured ontology and were eventually able to map 97.2% of presentations to their structured ontology using their NLP based predictive model.

Prediction of investigations

Multimodal models incorporating NLP were able to predict diagnostic imaging performed in the ED.^{39, 40} Zhang et al. developed a model to predict need for advanced diagnostic imaging (computed tomography, ultrasound, magnetic resonance imaging) in the ED, and obtained an AUC of 0.78 using a “bag-of-words” model. Zhang et al. were also able to predict the need for any diagnostic imaging in a paediatric population with an AUC of 0.824. The inclusion of unstructured variables improved performance slightly in both cases.

Identifying infection

Horng et al. (2017) found that incorporating free-text data improved discriminatory ability (AUC increased from 0.67 to 0.86) for identifying sepsis (defined by ICD-9-CM code) in the ED at triage.

Multimodal models

Eleven papers compared ML models that used only structured data to multimodal models that incorporated both structured data and free-text data.^{34-40, 43-46} The best-performing model in each of these papers incorporated free-text. The largest improvement in model performance from incorporating free-text was found by Horng et al. (increase in AUC from 0.67 to 0.86 for identifying infection). In one case, the addition of free-text did not improve model AUC but did improve average precision.⁴⁴ There were no cases where incorporating free-text into the model resulted in worse performance. Six papers assessed models that used only free-text, with no structured data.^{30, 36, 37, 39, 40, 42} Tahayori et al. were able to use only free-text data to predict admission with high accuracy (83%). Zhang et al. used free-text alone to predict whether diagnostic imaging would be performed. Gligorijevic et al.’s “Deep Attention” models using only unstructured data outperformed those using only structured data. Incorporating both structured data and free-text data improved results when compared to models that used only free-text data, though the improvement was often small.
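The included studies combined the two data types using a variety of model architectures. One of the simplest approaches, sometimes called early fusion, concatenates structured variables with a numeric representation of the text before passing both to a single classifier. The sketch below illustrates this with a bag-of-words text vector; the vocabulary, notes, and the scaled age and heart rate values are hypothetical, and this is not presented as the method of any particular study.

```python
import re


def tokenize(note: str) -> list:
    """Lower-case and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", note.lower())


def build_vocabulary(notes) -> dict:
    """Assign each distinct training token a fixed column index."""
    words = sorted({tok for note in notes for tok in tokenize(note)})
    return {word: i for i, word in enumerate(words)}


def text_vector(note: str, vocab: dict) -> list:
    """Bag-of-words counts; tokens unseen during training are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(note):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def fuse(structured, note, vocab) -> list:
    """Early fusion: structured features first, then text features."""
    return list(structured) + text_vector(note, vocab)


vocab = build_vocabulary(["fall, hip pain", "fever and cough"])
# Hypothetical structured inputs: [scaled age, scaled heart rate].
features = fuse([0.72, 1.1], "Hip pain after fall", vocab)
print(features)  # [0.72, 1.1, 0, 0, 1, 0, 1, 1]
```

A structured-only model would use the first two columns and a text-only model the remaining ones, which mirrors the comparisons described above.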

Modern NLP compared to traditional NLP

Three papers directly compared modern NLP based on DL to more traditional ML techniques such as bag-of-words and topic modelling.^{30, 38, 48} Modern DL based NLP outperformed traditional ML based NLP in two cases.^{30, 38} In contrast, Kim et al. found that a BERT based DL model did not perform better than ML based models, though their population was relatively small. Chang et al. compared the performance of multiple modern DL based models, finding BERT slightly outperformed Embeddings from Language Models (ELMo) and Long Short-Term Memory (LSTM) networks in mapping free-text chief complaints to structured fields.

Integration into practice

Greenbaum et al. was the only study that reported the deployment of an NLP-based model into clinical practice. Greenbaum et al. aimed to make high-quality structured data collection at triage easier through the use of an NLP-based model. Their model used both free-text triage notes and structured data to provide contextual autocomplete of the chief complaint label, and to show the user a list of the five most likely chief complaints. Prior to implementation of their model, 26.2% of patient encounters resulted in structured data capture. Following implementation, this increased to 97.2%. The authors aggregated multiple incidents of unscheduled downtime that occurred throughout the study to assess the impact of their model. When ML-based autocomplete was not operational (and alphabetised autocomplete was shown instead), the percentage of encounters that resulted in structured data capture decreased from 97.2% to 89.2%. The number of keystrokes typed for each presenting problem decreased from 11.6 pre-implementation to 0.6 post-implementation. Contextual autocomplete was associated with qualitatively more complete and higher-quality structured documentation of chief complaints.
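The difference between contextual and alphabetised autocomplete can be sketched as follows. This is a hypothetical illustration, not Greenbaum et al.’s implementation: it assumes a model has already produced a probability for each chief complaint label from the triage note, and uses simple prefix matching on the typed text.

```python
def contextual_autocomplete(typed: str, label_probs: dict, k: int = 5) -> list:
    """Labels matching the typed prefix, ranked by model probability (ties alphabetical)."""
    prefix = typed.strip().lower()
    matches = [(label, p) for label, p in label_probs.items()
               if label.lower().startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [label for label, _ in matches[:k]]


def alphabetical_autocomplete(typed: str, labels, k: int = 5) -> list:
    """Fallback: labels matching the typed prefix in alphabetical order."""
    prefix = typed.strip().lower()
    return sorted(label for label in labels if label.lower().startswith(prefix))[:k]


# Hypothetical model output for a note describing exertional chest discomfort.
label_probs = {
    "abdominal pain": 0.05,
    "chest pain": 0.62,
    "palpitations": 0.08,
    "shortness of breath": 0.21,
    "syncope": 0.04,
}
print(contextual_autocomplete("", label_probs))
# ['chest pain', 'shortness of breath', 'palpitations', 'abdominal pain', 'syncope']
print(alphabetical_autocomplete("", label_probs))
# ['abdominal pain', 'chest pain', 'palpitations', 'shortness of breath', 'syncope']
```

With nothing typed, the contextual version already places the most likely label first, which is one way such an interface could reduce keystrokes.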

Study quality—Risk of bias within and across studies

A summary of the PROBAST assessment is provided in Table 2. Overall, 15 studies were considered to have a high risk of bias. Four studies were assessed as having a low risk of bias. One study had high applicability concerns and 18 studies had low applicability concerns. The four studies assessed as having low risk of bias also had low applicability concerns. No studies referred to a previously published or publicly registered protocol.


Table 2. PROBAST assessment of the included studies.

PROBAST = Prediction model Risk Of Bias ASsessment Tool; ROB = risk of bias.

+ indicates low ROB/low concern regarding applicability;

− indicates high ROB/high concern regarding applicability; and

? indicates unclear ROB/unclear concern regarding applicability

Availability of datasets and code

Availability of study datasets and code is shown in Table 3. Data were publicly available for three studies (all by Zhang et al.) and were available on request from study authors for a further four studies.^{30, 34, 35, 39, 40, 43, 44} One study reported plans to release a modified, de-identified dataset; however, at the time of this review, approval was still pending.⁴⁵ The model code was publicly available for two studies.^{42, 45} Notably, the code repository from Chang et al. was well organised and contained clear instructions for researchers on how to download their pre-trained model and apply it to their own dataset.


Table 3. Availability of dataset and code for included studies.

* Data available on request from the authors and may be released to researchers following the signing of a data sharing agreement.

** Pending approval, a modified, de-identified dataset containing modified chief complaint text data will be uploaded. Approval was still pending at the time of this review.

*** All data freely and publicly available.

DISCUSSION

NLP at triage

This review finds that NLP has been applied to data available at the time of ED triage to predict a range of outcomes, with a focus on predicting need for admission and assigned triage score. The results of this review also highlight that unstructured free-text triage notes contain valuable information. Through NLP techniques, this information is starting to become accessible for automated prediction. The combination of free-text nursing triage notes with structured data appears to result in the best model performance; however, free-text nursing triage notes alone can be used by NLP algorithms to predict need for admission and need for diagnostic imaging.^{18, 30, 39, 40} A benefit of developing models that require only free-text as an input is that they may be more easily ported between different triage systems.³⁰

Structured data capture

Accurate and consistent structured capture of patients’ presenting complaints is important for research, service improvement, and public health initiatives.⁴⁹ Common medical ontologies also improve system interoperability.⁵⁰ However, collection of structured data is often difficult, especially when contrasted with the ease and expressiveness of free-text entry.⁴⁹ In the only example identified in this review of NLP being deployed into routine clinical practice at ED triage, Greenbaum et al. developed, implemented, and prospectively evaluated an NLP-driven user interface to mitigate the challenges of structured data capture.⁴⁹ Promisingly, they report that their NLP-based contextual autocomplete did not add burden for users and made structured data collection easier than unstructured data collection. As a result, the proportion of encounters with structured data capture increased substantially (from 26.2% to 97.2%).

Improving ED workflow and efficiency

ED overcrowding is a serious issue worldwide, with significant negative impact on patient morbidity and mortality. Having an emergency physician triage patients (or implementing a rapid assessment zone) enables early senior clinician input and decision making, and can lead to a reduced patient ED length of stay.^{51, 52} Patient time spent in the waiting room is likely underutilised.⁵² NLP could be applied to triage notes to predict which patients will likely require investigations such as blood tests or imaging, and in doing so allow for these investigations to be ordered immediately on arrival, rather than only being ordered after they are seen by a doctor. An emergency physician could review and then approve or reject suggested investigations. In this way, applying NLP to triage could leverage the expertise of the emergency physician.

Delays in specialist consultation and subsequent specialist review contribute to reduced ED throughput, and improvements in the consultation process from the ED have the potential to reduce ED length of stay.⁵³ Using NLP to identify, at the point of triage, patients who are likely to require admission could assist with hospital resource allocation, improve patient flow, and allow for anticipation of system stressors, such as worsening access block.^{18, 30} Bed allocation could begin at the time of patient triage, rather than hours into a patient’s ED stay.³⁰ To fully realise the potential of predicting admission at triage, the NLP model would need to be supported by other infrastructure. For example, an “early admission team” could review patients who are flagged as very likely to be admitted, or stable patients not needing acute resuscitation could be diverted away from the ED and sent to the appropriate specialty team.

NLP compared to humans

Human performance may be a reasonable baseline that ML models should meet before they are considered accurate enough for implementation into clinical practice. Few studies have compared NLP models at triage to human performance. Such comparisons will be crucial in future work.

Tahayori et al. was the only study that compared results from NLP models to those of emergency physicians.³⁰ Ivanov et al., Sterling et al., and Gligorijevic et al. compared NLP-based models to nurses in assigning triage scores and found model accuracy to be comparable to, or better than, that of nurses.^{17, 36, 47}

Interpretability

Few papers attempted to address the human interpretability of models. While DL has been criticised as being a “black box”, there is ongoing work to develop more “explainable AI”.^{54, 55} Wang et al. demonstrated one way in which models can be made somewhat more interpretable.³⁸ Their triage model highlights free-text triage notes, with a darker colour corresponding to the sections of text that were more heavily weighted by the model. This provides an initial “sense check” that humans can then combine with their own experience and knowledge.
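Highlighting of this kind can be approximated in a few lines once a model provides one importance weight per token. In the sketch below the weights are invented, and how such weights are obtained (for example from attention scores or another attribution method) is model-specific and assumed here. The code only normalises the weights and renders darker shading for more heavily weighted tokens as HTML; it is not Wang et al.’s implementation.

```python
import html


def shade_levels(weights, levels: int = 4) -> list:
    """Min-max normalise weights and bucket them into integer levels 0..levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0] * len(weights)
    return [round((w - lo) / (hi - lo) * levels) for w in weights]


def highlight_html(tokens, weights, levels: int = 4) -> str:
    """Wrap each token in a span whose background opacity reflects its weight."""
    spans = []
    for token, level in zip(tokens, shade_levels(weights, levels)):
        alpha = level / levels
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {alpha:.2f})">{html.escape(token)}</span>'
        )
    return " ".join(spans)


tokens = ["pt", "collapsed", "at", "home"]
weights = [0.1, 0.9, 0.0, 0.2]  # hypothetical per-token importance
print(shade_levels(weights))  # [0, 4, 0, 1]
print(highlight_html(tokens, weights))
```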

Modern NLP

While it is difficult to compare studies due to their heterogeneity, advanced DL-based NLP appears to outperform traditional NLP. This was the case in two of the three studies that compared the approaches directly, and is consistent with previous NLP research.⁵⁶ BERT appears to be the most commonly used advanced NLP model. BERT was released in October 2018 and, at the time of release, outperformed other NLP models.²⁷ However, of the 16 papers published since the release of BERT, only three have used it. Other large models have subsequently been released. For example, GPT-3 is a 175 billion parameter language model that was released in 2020 and is reported to outperform BERT in various circumstances.²⁸ Chowdhery et al. have recently published Pathways Language Model (PaLM), a 540-billion parameter model that achieves further increases in performance.²⁹

FUTURE DIRECTIONS

Triage is a promising place to start applying NLP in the ED. Large datasets with clearly labelled outcomes make triage well suited to applications of ML. Triage information is often available hours before emergency physician documentation, and accurate predictions made at triage have the potential to increase healthcare system efficiency.¹⁸ If models are deployed in practice, there is also the possibility of close human oversight. Future work could aim to predict other important patient-oriented outcomes at the time of triage, such as wait times, need for advanced cardiovascular investigations, or need for surgery.

Incorporating clinical gestalt

Sterling et al. 2020 noted the difficulty in capturing the general clinical impression of the triage nurse.¹⁷ Ivanov et al. also noted that important contextual aspects at triage were not available for consideration by ML models.⁴⁷ Future work could assess the impact of incorporating triage nurses’ gestalt into predictive models. This could be expanded to also capture patients’ predictions regarding their need for admission to hospital. Other contextual data available at the time of triage such as the number of patients currently waiting, the number of patients currently in the ED, and number of admitted patients in the hospital could also be incorporated into ML models.

Integration with other AI systems

There are opportunities to integrate NLP as part of a larger AI-based system. Kim et al. provide an interesting example of how various AI-based technologies can be combined.⁴⁸ Speech recognition could be used to automatically generate a transcript of the entire triage conversation, which could then be used by NLP models. However, the performance of speech recognition technologies would likely deteriorate in a noisy ED, and combining multiple complex AI-based technologies raises the possibility that small initial errors could be amplified as they propagate through the models. NLP models at triage could also be integrated with other novel AI-based interventions, such as automated monitoring of patients’ vital signs while they are in the waiting room, or with data entered by patients themselves in AI-based self-triage applications.

Pre-trained models for ED triage

Publicly available large DL-based language models have often been trained on corpora containing text from newspapers, books, and websites.^{27, 28} Triage notes are often short and contain many unique, idiosyncratic abbreviations and acronyms that are uncommon in everyday English.^{17, 30} Because these models were not developed for triage-specific purposes, the benefits of applying DL-based NLP models to triage notes may not yet be fully realised. DL-based NLP models that have been further trained on large corpora of medical text have been released; however, they have not yet been applied to ED triage. Large publicly available clinical databases that contain ED triage notes with linked outcomes, such as MIMIC-IV, are likely to help further model development and may facilitate direct comparisons between models developed by different research groups.^{57, 58} Triage-focused NLP research could also benefit from groups sharing large language models pre-trained on triage data. Others could use these models as starting points, though it is unknown whether their performance would generalise across different healthcare settings and triage systems. It is also unknown whether the length of triage notes affects model performance; this could be evaluated in future work.
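One way to approach the open question of whether note length affects performance is to stratify held-out results by note length. The sketch below assumes that a model’s predictions on a test set are already available; the bucket edges, whitespace tokenisation, function name, and example notes are hypothetical choices for illustration.

```python
from collections import defaultdict


def accuracy_by_note_length(notes, y_true, y_pred, edges=(10, 20, 40)):
    """Accuracy of predictions grouped by note length in whitespace-separated tokens.

    `edges` are arbitrary, exclusive upper bounds for the length buckets;
    notes at or above the last edge fall into a final overflow bucket.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for note, truth, pred in zip(notes, y_true, y_pred):
        n_tokens = len(note.split())
        # First bucket whose upper edge exceeds the token count, else the overflow bucket
        label = next((f"<{e}" for e in edges if n_tokens < e), f">={edges[-1]}")
        totals[label] += 1
        hits[label] += int(truth == pred)
    return {label: hits[label] / totals[label] for label in totals}


# Invented notes, outcomes (1 = admitted) and model predictions
notes = [
    "chest pain",
    "abdo pain 3/7 vomiting",
    "fall at home today hip pain unable to weight bear on warfarin",
]
result = accuracy_by_note_length(notes, y_true=[1, 0, 1], y_pred=[1, 1, 1])
```

Accuracy is used here for brevity; for imbalanced outcomes such as admission, the same stratification could be applied to AUROC or sensitivity.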

Prospective and external validation is needed

The majority of research so far has been retrospective and completed in the USA. There is a significant need for prospective evaluation and external validation, especially in other countries and triage systems.

Clinical impact and risk

NLP models have rarely been deployed at ED triage, so their impact on clinical practice is unknown. Introducing a new tool into a complex system is likely to have unintended consequences, and use of the tool may itself change practice; triage notes, for instance, may be written differently if staff know they are being used for prediction. There may also be unintended harms. For example, telling a patient at triage that they are likely to be admitted or to face a long wait could influence their behaviour and increase the number of patients who leave without being seen. It may be useful to establish performance benchmarks that predictive models must meet before implementation in clinical practice, which could be done through further studies comparing NLP model performance with that of emergency physicians and nurses. Further research is also required to understand how best to integrate early admission predictions into hospital systems and clinical practice.

NLP models can be retrained and updated as new data become available, so model performance may change over time. Appropriate algorithm stewardship will need to be in place before clinical use.⁵⁹ Predictive models are also trained on data that reflect current practice, which ingrains the assumption that current practice is appropriate; this may not be the case.
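Algorithm stewardship covers much more than monitoring, but one concrete ingredient is periodically checking a deployed model’s discrimination on recent, labelled data. A minimal sketch, assuming outcomes such as admission are recorded for each review period: AUROC is computed directly from its rank definition and compared against a hypothetical baseline and tolerance. The function names, period labels, and data are invented for illustration.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive case outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flag_drift(windows, baseline_auroc, tolerance=0.05):
    """Labels of review periods whose AUROC falls more than `tolerance` below the baseline.

    `windows` maps a period label to (observed outcomes, model scores).
    """
    return [
        label
        for label, (y_true, scores) in windows.items()
        if auroc(y_true, scores) < baseline_auroc - tolerance
    ]


# Invented quarterly data: outcome 1 = admitted, score = predicted probability
windows = {
    "2022-Q1": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]),
    "2022-Q2": ([1, 1, 0, 0], [0.6, 0.4, 0.5, 0.3]),
}
flagged = flag_drift(windows, baseline_auroc=0.85)
```

A flagged period would prompt human review, and possibly recalibration or retraining, rather than any automatic action; the 0.05 tolerance is arbitrary.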

Acceptability

It is also unknown if the use of NLP at triage is acceptable to patients and staff. It will be important to involve clinicians, patients, and healthcare consumer groups in the development and governance of any future implementation projects. It will also be important to ensure that these systems do not place further burden on users. Ease of use and perceived clinical impact will likely be important factors for adoption by clinicians.

Ethical issues

Race, age, and gender biases at ED triage have been previously reported.^{60-62} Concerns over bias in ML models have been well described, and new tools are being developed to assess such biases.^{63-65} At its best, NLP at triage could help reduce bias by standardising triage decisions and providing a more objective triage score. At its worst, however, NLP at triage could further ingrain existing biases into practice, under the guise of objectivity and hidden in the opacity of abstract algorithms. Patients’ apprehensions about the use of AI will also need to be considered. While the emerging literature shows that patients view AI largely positively, they do have some concerns about its use in healthcare.⁶⁶ These include perceptions that AI is less accurate than clinicians, that its predictions lack transparency, and that it poses risks to the privacy of their personal healthcare data.^{67-72} Further research investigating the impact of NLP-based tools on vulnerable and minority populations is warranted.
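Bias-assessment tools vary, but a common starting point is to compare error rates between patient groups. A minimal sketch, assuming binary predictions of admission and a single group attribute, compares sensitivity across groups (sometimes called an equal-opportunity check). The function names and the placeholder group labels are hypothetical, and a real audit would add confidence intervals, further metrics, and intersectional groups.

```python
from collections import defaultdict


def sensitivity_by_group(groups, y_true, y_pred):
    """Sensitivity (true-positive rate) within each subgroup.

    Only positive cases (y_true == 1) are counted, so a group with no
    positive cases is omitted from the result.
    """
    true_pos, positives = defaultdict(int), defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        if truth == 1:
            positives[group] += 1
            true_pos[group] += int(pred == 1)
    return {g: true_pos[g] / positives[g] for g in positives}


def max_gap(rates):
    """Largest difference in a metric between any two subgroups."""
    return max(rates.values()) - min(rates.values())


# Invented data: "A"/"B" are placeholder group labels; 1 = needed admission
groups = ["A", "A", "A", "B", "B", "B"]
rates = sensitivity_by_group(groups, y_true=[1, 1, 0, 1, 1, 0], y_pred=[1, 1, 0, 1, 0, 0])
gap = max_gap(rates)
```

Here group B’s admissions are missed more often than group A’s; in practice, such a gap would need to be interpreted alongside group sizes and clinical context before drawing conclusions about bias.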

LIMITATIONS

Study level

Only one study contained prospectively validated results, and no studies contained results that were externally validated at a separate site. Results reported may not be generalisable to other settings. There was inconsistent reporting of methods and results among studies. The majority of studies (79%) were assessed to have a high risk of bias.

Review level

Heterogeneity of the included studies precluded meta-analysis, which limits the level of evidence this review provides. All studies reported positive results for NLP at triage, which may reflect publication bias. Although we took care to ensure our search strategy was broad enough to capture all relevant literature, the variety of NLP and ML terminology means that some studies may have been missed. Non-English articles and articles published before 2012 were also excluded from our search.

CONCLUSION

The use of NLP at triage appears feasible and could accurately predict important patient-oriented outcomes including need for admission and need for critical care. However, there are few examples of implementation into clinical practice and most research is retrospective. The potential benefits of using NLP at triage are yet to be realised. Further research is needed to prospectively assess the acceptability and impact of implementing NLP at triage on staff, patients, and the healthcare system.

Funding

This project was supported by the Western Australian Health Translation Network’s Health Service Translational Research Project and the Australian Government’s Medical Research Future Fund (MRFF) as part of the Rapid Applied Research Translation program. Authors who received the grant: JS, GD, MB, PS, FS. Funder website: https://wahtn.org/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests

No authors declare any competing interests.

REFERENCES

1. Morley C, Unwin M, Peterson GM, Stankovich J, Kinsman L. Emergency department crowding: A systematic review of causes, consequences and solutions. PLoS One. 2018;13(8):e0203316.
2. Iserson KV, Moskop JC. Triage in medicine, part I: Concept, history, and types. Ann Emerg Med. 2007 Mar;49(3):275–81.
3. Hinson JS, Martinez DA, Cabral S, George K, Whalen M, Hansoti B, et al. Triage performance in emergency medicine: a systematic review. Ann Emerg Med. 2019 Jul;74(1):140–52.
4. Cameron P, Little M, Mitra B, Deasy C, editors. Textbook of adult emergency medicine. Fifth edition. Edinburgh: Elsevier; 2020.
5. Park JB, Lim TH. Korean Triage and Acuity Scale (KTAS). Journal of The Korean Society of Emergency Medicine. 2017;28(6):547–51.
6. Zachariasse JM, van der Hagen V, Seiger N, Mackway-Jones K, van Veen M, Moll HA. Performance of triage systems in emergency care: a systematic review and meta-analysis. BMJ Open. 2019 May 28;9(5):e026471.
7. Jeppesen E, Cuevas-Østrem M, Gram-Knutsen C, Uleberg O. Undertriage in trauma: an ignored quality indicator? Scand J Trauma Resusc Emerg Med. 2020 May 6;28(1):34.
8. Banco D, Chang J, Talmor N, Wadhera P, Mukhopadhyay A, Lu X, et al. Sex and race differences in the evaluation and treatment of young adults presenting to the emergency department with chest pain. J Am Heart Assoc. 2022 May 17;11(10):e024199.
9. Murphy KP. Machine learning: a probabilistic perspective. Cambridge, Mass.: MIT Press; 2012.
10. Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine. Emerg Med Australas. 2018 Dec;30(6):870–4.
11. Kareemi H, Vaillancourt C, Rosenberg H, Fournier K, Yadav K. Machine learning versus usual care for diagnostic and prognostic prediction in the emergency department: a systematic review. Acad Emerg Med. 2021 Feb;28(2):184–96.
12. Stewart J, Lu J, Goudie A, Bennamoun M, Sprivulis P, Sanfilippo F, et al. Applications of machine learning to undifferentiated chest pain in the emergency department: A systematic review. PLoS One. 2021;16(8):e0252612.
13. Levin S, Toerper M, Hamrock E, Hinson JS, Barnes S, Gardner H, et al. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Ann Emerg Med. 2018 May;71(5):565-574.e2.
14. Sánchez-Salmerón R, Gómez-Urquiza JL, Albendín-García L, Correa-Rodríguez M, Martos-Cabrera MB, Velando-Soriano A, et al. Machine learning methods applied to triage in emergency services: A systematic review. Int Emerg Nurs. 2022 Jan;60:101109.
15. Hong WS, Haimovich AD, Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One. 2018;13(7):e0201016.
16. Kwon JM, Lee Y, Lee Y, Lee S, Park H, Park J. Validation of deep-learning-based triage and acuity score using a large national dataset. PLoS One. 2018;13(10):e0205836.
17. Sterling NW, Brann F, Patzer RE, D. M, Koebbe M, Burke M, et al. Prediction of emergency department resource requirements during triage: An application of current natural language processing techniques. J Am Coll Emerg Physicians Open. 2020 Dec;1(6):1676–83.
18. Sterling NW, Patzer RE, D. M, Schrager JD. Prediction of emergency department patient disposition based on natural language processing of triage notes. Int J Med Inform. 2019 Sep;129:184–8.
19. Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform. 2020 Mar 31;8(3):e17984.
20. Leaman R, Khare R, Lu Z. Challenges in clinical natural language processing for automated disorder normalization. J Biomed Inform. 2015 Oct;57:28–37.
21. Russell SJ, Norvig P, Davis E. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River: Prentice Hall; 2010.
22. Manning CD, Schütze H. Foundations of statistical natural language processing. Cambridge, Mass: MIT Press; 1999.
23. Juluru K, Shih HH, Keshava Murthy KN, Elnajjar P. Bag-of-words technique in natural language processing: a primer for radiologists. Radiographics. 2021;41(5):1420–6.
24. Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing [review article]. IEEE Computational Intelligence Magazine. 2018 Aug;13(3):55–75.
25. Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020 Mar 1;27(3):457–70.
26. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 28;521(7553):436–44.
27. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) [Internet]. Minneapolis, Minnesota: Association for Computational Linguistics; 2019 [cited 2022 Apr 6]. p. 4171–86. Available from: https://aclanthology.org/N19-1423
28. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2022 Apr 8]. p. 1877–901. Available from: https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
29. Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, et al. PaLM: scaling language modeling with pathways [Internet]. arXiv; 2022 [cited 2022 Apr 5]. Available from: http://arxiv.org/abs/2204.02311
30. Tahayori B, Chini-Foroush N, Akhlaghi H. Advanced natural language processing technique to predict patient disposition based on emergency triage notes. Emerg Med Australas [Internet]. 2020;33(3):480–4. Available from: http://dx.doi.org/10.1111/1742-6723.13656
31. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015 Jan 1;4(1):1.
32. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021 Mar 29;372:n71.
33. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019 Jan 1;170(1):51–8.
34. Horng S, Sontag DA, Halpern Y, Jernite Y, Shapiro NI, Nathanson LA. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One. 2017;12(4):e0174708.
35. Zhang X, Kim J, Patzer RE, Pitts SR, Patzer A, Schrager JD. Prediction of emergency department hospital admission based on natural language processing and neural networks. Methods Inf Med. 2017 Oct 26;56(5):377–89.
36. Gligorijevic D, Stojanovic J, Satz W, Stojkovic I, Schreyer K, Del Portal D, et al. Deep attention model for triage of emergency department patients. In: Proceedings of the 2018 SIAM International Conference on Data Mining (SDM) [Internet]. Society for Industrial and Applied Mathematics; 2018 [cited 2022 Dec 17]. p. 297–305. (Proceedings). Available from: https://epubs.siam.org/doi/abs/10.1137/1.9781611975321.34
37. Choi SW, Ko T, Hong KJ, Kim KH. Machine learning-based prediction of Korean triage and acuity scale level in emergency department patients. Healthc Inform Res. 2019 Oct;25(4):305–12.
38. Wang G, Liu X, Xie K, Chen N, Chen T. DeepTriager: a neural attention model for emergency triage with electronic health records. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2019. p. 978–82.
39. Zhang X, Bellolio MF, Medrano-Gracia P, Werys K, Yang S, Mahajan P. Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department. BMC Med Inform Decis Mak. 2019 Dec 30;19(1):287.
40. Zhang X, Kim J, Patzer RE, Pitts SR, Chokshi FH, Schrager JD. Advanced diagnostic imaging utilization during emergency department visits in the United States: A predictive modeling study for emergency department triage. PLoS One. 2019;14(4):e0214905.
41. Arnaud É, Elbattah M, Gignon M, Dequen G. Deep learning to predict hospitalization at triage: integration of structured data and unstructured text. In: 2020 IEEE International Conference on Big Data (Big Data). 2020. p. 4836–41.
42. Chang D, Hong WS, Taylor RA. Generating contextual embeddings for emergency department chief complaints. JAMIA Open. 2020 Jul;3(2):160–6.
43. Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Predicting Intensive Care Unit admission among patients presenting to the emergency department using machine learning and natural language processing. PLoS One. 2020;15(3):e0229331.
44. Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Risk of mortality and cardiopulmonary arrest in critical patients presenting to the emergency department using machine learning and natural language processing. PLoS One. 2020;15(4):e0230876.
45. Joseph JW, Leventhal EL, Grossestreuer AV, Wong ML, Joseph LJ, Nathanson LA, et al. Deep-learning approaches to identify critically ill patients at emergency department triage using limited information. J Am Coll Emerg Physicians Open. 2020 Oct;1(5):773–81.
46. Roquette BP, Nagano H, Marujo EC, Maiorano AC. Prediction of admission in pediatric emergency department with deep neural networks and triage textual data. Neural Netw. 2020 Jun;126:170–7.
47. Ivanov O, Wolf L, Brecher D, Lewis E, Masek K, Montgomery K, et al. Improving ED Emergency Severity Index acuity assignment using machine learning and clinical natural language processing. J Emerg Nurs. 2021 Mar;47(2):265-278.e7.
48. Kim D, Oh J, Im H, Yoon M, Park J, Lee J. Automatic classification of the Korean triage acuity scale in simulated emergency rooms using speech recognition and natural language processing: a proof of concept study. J Korean Med Sci. 2021 Jul 12;36(27):e175.
49. Greenbaum NR, Jernite Y, Halpern Y, Calder S, Nathanson LA, Sontag DA, et al. Improving documentation of presenting problems in the emergency department using a domain-specific ontology and machine learning-driven user interfaces. Int J Med Inform. 2019 Dec;132:103981.
50. Liyanage H, Krause P, De Lusignan S. Using ontologies to improve semantic interoperability in health data. J Innov Health Inform. 2015 Jul 10;22(2):309–15.
51. Abdulwahid MA, Booth A, Kuczawski M, Mason SM. The impact of senior doctor assessment at triage on emergency department performance measures: systematic review and meta-analysis of comparative studies. Emerg Med J. 2016 Jul;33(7):504–13.
52. Begaz T, Elashoff D, Grogan TR, Talan D, Taira BR. Initiating diagnostic studies on patients with abdominal pain in the waiting room decreases time spent in an emergency department bed: a randomized controlled trial. Ann Emerg Med. 2017 Mar;69(3):298–307.
53. Asplin BR, Magid DJ, Rhodes KV, Solberg LI, Lurie N, Camargo CA. A conceptual model of emergency department crowding. Ann Emerg Med. 2003 Aug;42(2):173–80.
54. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138–60.
55. Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy (Basel). 2020 Dec 25;23(1):E18.
56. Li H. Deep learning for natural language processing: advantages and challenges. National Science Review [Internet]. 2018 Jan 1 [cited 2022 Jun 16];5(1):24–6. Available from: https://academic.oup.com/nsr/article/5/1/24/4107792
57. Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S. MIMIC-IV-ED [Internet]. PhysioNet; [cited 2022 Jun 22]. Available from: https://physionet.org/content/mimic-iv-ed/2.0/
58. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000 Jun 13;101(23):E215–220.
59. Eaneff S, Obermeyer Z, Butte AJ. The case for algorithmic stewardship for artificial intelligence and machine learning technologies. JAMA. 2020 Oct 13;324(14):1397–8.
60. Schrader CD, Lewis LM. Racial disparity in emergency department triage. J Emerg Med. 2013 Feb;44(2):511–8.
61. Kuhn L, Page K, Rolley JX, Worrall-Carter L. Effect of patient sex on triage for ischaemic heart disease and treatment onset times: A retrospective analysis of Australian emergency department data. Int Emerg Nurs. 2014 Apr;22(2):88–93.
62. Vigil JM, Coulombe P, Alcock J, Kruger E, Stith SS, Strenth C, et al. Patient ethnicity affects triage assessments and patient prioritization in U.S. Department of Veterans Affairs emergency departments. Medicine (Baltimore). 2016 Apr;95(14):e3191.
63. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018 Nov 1;178(11):1544–7.
64. Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy (Basel). 2020 Dec 25;23(1):18.
65. Vokinger KN, Feuerriegel S, Kesselheim AS. Mitigating bias in machine learning for medicine. Commun Med (Lond). 2021 Aug 23;1:25.
66. Young AT, Amara D, Bhattacharya A, Wei ML. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health. 2021 Sep;3(9):e599–611.
67. Ongena YP, Haan M, Yakar D, Kwee TC. Patients’ views on the implementation of artificial intelligence in radiology: development and validation of a standardized questionnaire. Eur Radiol. 2020 Feb;30(2):1033–40.
68. Bala S, Keniston A, Burden M. Patient perception of plain-language medical notes generated using artificial intelligence software: pilot mixed-methods study. JMIR Form Res. 2020 Jun 5;4(6):e16670.
69. Nelson CA, Pérez-Chada LM, Creadore A, Li SJ, Lo K, Manjaly P, et al. Patient perspectives on the use of artificial intelligence for skin cancer screening: a qualitative study. JAMA Dermatol. 2020 May 1;156(5):501–12.
70. Jutzi TB, Krieghoff-Henning EI, Holland-Letz T, Utikal JS, Hauschild A, Schadendorf D, et al. Artificial intelligence in skin cancer diagnostics: the patients’ perspective. Front Med (Lausanne). 2020;7:233.
71. Nadarzynski T, Miles O, Cowie A, Ridge D. Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: A mixed-methods study. Digit Health. 2019;5:2055207619871808.
72. Palmisciano P, Jamjoom AAB, Taylor D, Stoyanov D, Marcus HJ. Attitudes of patients and their relatives toward artificial intelligence in neurosurgery. World Neurosurg. 2020 Jun;138:e627–33.

Posted December 21, 2022.

Download PDF

Author Declarations

Supplementary Material

Data/Code

Citation Tools

Get QR code

Tweet Widget

Subject Area

Emergency Medicine

Reviews and Context

Comment

TRIP Peer Reviews

Community Reviews

Automated Services

Blogs/Media

Author Videos

Subject Areas

All Articles

Addiction Medicine (418)
Allergy and Immunology (740)
Anesthesia (217)
Cardiovascular Medicine (3175)
Dentistry and Oral Medicine (355)
Dermatology (268)
Emergency Medicine (469)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1128)
Epidemiology (13147)
Forensic Medicine (17)
Gastroenterology (878)
Genetic and Genomic Medicine (4980)
Geriatric Medicine (458)
Health Economics (761)
Health Informatics (3133)
Health Policy (1115)
Health Systems and Quality Improvement (1156)
Hematology (418)
HIV/AIDS (988)
Infectious Diseases (except HIV/AIDS) (14449)
Intensive Care and Critical Care Medicine (897)
Medical Education (462)
Medical Ethics (121)
Nephrology (511)
Neurology (4726)
Nursing (251)
Nutrition (699)
Obstetrics and Gynecology (858)
Occupational and Environmental Health (773)
Oncology (2433)
Ophthalmology (692)
Orthopedics (272)
Otolaryngology (335)
Pain Medicine (315)
Palliative Medicine (88)
Pathology (523)
Pediatrics (1263)
Pharmacology and Therapeutics (535)
Primary Care Research (536)
Psychiatry and Clinical Psychology (4060)
Public and Global Health (7296)
Radiology and Imaging (1634)
Rehabilitation Medicine and Physical Therapy (974)
Respiratory Medicine (953)
Rheumatology (468)
Sexual and Reproductive Health (486)
Sports Medicine (409)
Surgery (527)
Toxicology (66)
Transplantation (226)
Urology (196)

Comments

medRxiv aims to provide a venue for anyone to comment on a medRxiv preprint. Comments are moderated for offensive or irrelevant content (this can take ~24 h). Please avoid duplicate submissions and read our Comment Policy before commenting. The content of a comment is not endorsed by medRxiv.

medRxiv aims to inform readers about online discussion of this preprint occurring elsewhere. The content at the links below is not endorsed by either medRxiv or the preprint's authors.

Community reviews for this article:

There are no community reviews for this paper.

Automated Evaluations

Certain services provide automated analysis of preprints. Analyses invited by the authors are displayed at the top of this tab. Those done independently of authors are shown underneath . None of these analyses is endorsed by medRxiv.

Automated Evaluations:

There are no automated evaluations for this paper.

[1] 1.↵
Morley C, Unwin M, Peterson GM, Stankovich J, Kinsman L. Emergency department crowding: A systematic review of causes, consequences and solutions. PLoS One. 2018;13(8):e0203316.
OpenUrl CrossRef PubMed Google Scholar

[2] 2.↵
Iserson KV, Moskop JC. Triage in medicine, part I: Concept, history, and types. Ann Emerg Med. 2007 Mar;49(3):275–81.
OpenUrl CrossRef PubMed Web of Science Google Scholar

[3] 3.↵
Hinson JS, Martinez DA, Cabral S, George K, Whalen M, Hansoti B, et al. Triage performance in emergency medicine: a systematic review. Ann Emerg Med. 2019 Jul;74(1):140–52.
OpenUrl CrossRef PubMed Google Scholar

[4] 4.↵
Cameron P, Little M, Mitra B, Deasy C, editors. Textbook of adult emergency medicine. Fifth edition. Edinburgh: Elsevier; 2020.
Google Scholar

[5] 5.↵
Park JB, Lim TH. Korean Triage and Acuity Scale (KTAS). Journal of The Korean Society of Emergency Medicine. 2017;28(6):547–51.
OpenUrl Google Scholar

[6] 6.↵
Zachariasse JM, van der Hagen V, Seiger N, Mackway-Jones K, van Veen M, Moll HA. Performance of triage systems in emergency care: a systematic review and meta-analysis. BMJ Open. 2019 May 28;9(5):e026471.
OpenUrl Abstract/FREE Full Text Google Scholar

[7] 7.
Jeppesen E, Cuevas-Østrem M, Gram-Knutsen C, Uleberg O. Undertriage in trauma: an ignored quality indicator? Scand J Trauma Resusc Emerg Med. 2020 May 6;28(1):34.
OpenUrl Google Scholar

[8] 8.↵
Banco D, Chang J, Talmor N, Wadhera P, Mukhopadhyay A, Lu X, et al. Sex and race differences in the evaluation and treatment of young adults presenting to the emergency department with chest pain. J Am Heart Assoc. 2022 May 17;11(10):e024199.
OpenUrl Google Scholar

[9] 9.↵
Murphy KP. Machine learning: a probabilistic perspective. Cambridge, Mass.: MIT Press; 2012.
Google Scholar

[10] 10.↵
Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine. Emerg Med Australas. 2018 Dec;30(6):870–4.
OpenUrl CrossRef PubMed Google Scholar

[11] 11.
Kareemi H, Vaillancourt C, Rosenberg H, Fournier K, Yadav K. Machine learning versus usual care for diagnostic and prognostic prediction in the emergency department: a systematic review. Acad Emerg Med. 2021 Feb;28(2):184–96.
OpenUrl PubMed Google Scholar

[12] 12.↵
Stewart J, Lu J, Goudie A, Bennamoun M, Sprivulis P, Sanfillipo F, et al. Applications of machine learning to undifferentiated chest pain in the emergency department: A systematic review. PLoS One. 2021;16(8):e0252612.
OpenUrl Google Scholar

[13] 13.↵
Levin S, Toerper M, Hamrock E, Hinson JS, Barnes S, Gardner H, et al. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Ann Emerg Med. 2018 May;71(5):565-574.e2.
OpenUrl CrossRef PubMed Google Scholar

[14] 14.↵
Sánchez-Salmerón R, Gómez-Urquiza JL, Albendín-García L, Correa-Rodríguez M, Martos-Cabrera MB, Velando-Soriano A, et al. Machine learning methods applied to triage in emergency services: A systematic review. Int Emerg Nurs. 2022 Jan;60:101109.
OpenUrl PubMed Google Scholar

[15] 15.↵
Hong WS, Haimovich AD, Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One. 2018;13(7):e0201016.
OpenUrl CrossRef PubMed Google Scholar

[16] 16.↵
Kwon JM, Lee Y, Lee Y, Lee S, Park H, Park J. Validation of deep-learning-based triage and acuity score using a large national dataset. PLoS One. 2018;13(10):e0205836.
OpenUrl PubMed Google Scholar

[17] 17.↵
Sterling NW, Brann F, Patzer RE, D. M, Koebbe M, Burke M, et al. Prediction of emergency department resource requirements during triage: An application of current natural language processing techniques. J Am Coll Emerg Physicians Open. 2020 Dec;1(6):1676–83.
OpenUrl Google Scholar

[18] 18.↵
Sterling NW, Patzer RE, D. M, Schrager JD. Prediction of emergency department patient disposition based on natural language processing of triage notes. Int J Med Inform. 2019 Sep;129:184–8.
OpenUrl CrossRef PubMed Google Scholar

[19] 19.
Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform. 2020 Mar 31;8(3):e17984.
OpenUrl PubMed Google Scholar

[20] 20.↵
Leaman R, Khare R, Lu Z. Challenges in clinical natural language processing for automated disorder normalization. J Biomed Inform. 2015 Oct;57:28–37.
OpenUrl CrossRef Google Scholar

[21] 21.↵
Russell SJ, Norvig P, Davis E. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River: Prentice Hall; 2010
Google Scholar

[22] 22.↵
Manning CD, Schütze H. Foundations of statistical natural language processing. Cambridge, Mass: MIT Press; 1999.
Google Scholar

[23] 23.↵
Juluru K, Shih HH, Keshava Murthy KN, Elnajjar P. Bag-of-words technique in natural language processing: a primer for radiologists. Radiographics. 2021;41(5):1420–6.
OpenUrl Google Scholar

[24] 24.↵
Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing [review article]. IEEE Computational Intelligence Magazine. 2018 Aug;13(3):55–75.
OpenUrl CrossRef Google Scholar

[25] 25.↵
Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020 ar 1;27(3):457–70.
OpenUrl CrossRef PubMed Google Scholar

[26] 26.↵
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 28;521(7553):436–44.
OpenUrl CrossRef PubMed Google Scholar

[27] 27.↵
Devlin J, Chang MW, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) [Internet]. Minneapolis, Minnesota: Association for Computational Linguistics; 2019 [cited 2022 Apr 6]. p. 4171–86. Available from: https://aclanthology.org/N19-1423
Google Scholar

[28]
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2020 [cited 2022 Apr 8]. p. 1877–901. Available from: https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

[29]
Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, et al. PaLM: scaling language modeling with pathways [Internet]. arXiv; 2022 [cited 2022 Apr 5]. Available from: http://arxiv.org/abs/2204.02311

[30]
Tahayori B, Chini-Foroush N, Akhlaghi H. Advanced natural language processing technique to predict patient disposition based on emergency triage notes. Emerg Med Australas [Internet]. 2020;33(3):480–4. Available from: http://dx.doi.org/10.1111/1742-6723.13656

[31]
Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015 Jan 1;4(1):1.

[32]
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021 Mar 29;372:n71.

[33]
Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019 Jan 1;170(1):51–8.

[34]
Horng S, Sontag DA, Halpern Y, Jernite Y, Shapiro NI, Nathanson LA. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One. 2017;12(4):e0174708.

[35]
Zhang X, Kim J, Patzer RE, Pitts SR, Patzer A, Schrager JD. Prediction of emergency department hospital admission based on natural language processing and neural networks. Methods Inf Med. 2017 Oct 26;56(5):377–89.

[36]
Gligorijevic D, Stojanovic J, Satz W, Stojkovic I, Schreyer K, Del Portal D, et al. Deep attention model for triage of emergency department patients. In: Proceedings of the 2018 SIAM International Conference on Data Mining (SDM) [Internet]. Society for Industrial and Applied Mathematics; 2018 [cited 2022 Dec 17]. p. 297–305. (Proceedings). Available from: https://epubs.siam.org/doi/abs/10.1137/1.9781611975321.34

[37]
Choi SW, Ko T, Hong KJ, Kim KH. Machine learning-based prediction of Korean triage and acuity scale level in emergency department patients. Healthc Inform Res. 2019 Oct;25(4):305–12.

[38]
Wang G, Liu X, Xie K, Chen N, Chen T. DeepTriager: a neural attention model for emergency triage with electronic health records. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2019. p. 978–82.

[39]
Zhang X, Bellolio MF, Medrano-Gracia P, Werys K, Yang S, Mahajan P. Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department. BMC Med Inform Decis Mak. 2019 Dec 30;19(1):287.

[40]
Zhang X, Kim J, Patzer RE, Pitts SR, Chokshi FH, Schrager JD. Advanced diagnostic imaging utilization during emergency department visits in the United States: A predictive modeling study for emergency department triage. PLoS One. 2019;14(4):e0214905.

[41]
Arnaud É, Elbattah M, Gignon M, Dequen G. Deep learning to predict hospitalization at triage: integration of structured data and unstructured text. In: 2020 IEEE International Conference on Big Data (Big Data). 2020. p. 4836–41.

[42]
Chang D, Hong WS, Taylor RA. Generating contextual embeddings for emergency department chief complaints. JAMIA Open. 2020 Jul;3(2):160–6.

[43]
Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Predicting Intensive Care Unit admission among patients presenting to the emergency department using machine learning and natural language processing. PLoS One. 2020;15(3):e0229331.

[44]
Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Risk of mortality and cardiopulmonary arrest in critical patients presenting to the emergency department using machine learning and natural language processing. PLoS One. 2020;15(4):e0230876.

[45]
Joseph JW, Leventhal EL, Grossestreuer AV, Wong ML, Joseph LJ, Nathanson LA, et al. Deep-learning approaches to identify critically ill patients at emergency department triage using limited information. J Am Coll Emerg Physicians Open. 2020 Oct;1(5):773–81.

[46]
Roquette BP, Nagano H, Marujo EC, Maiorano AC. Prediction of admission in pediatric emergency department with deep neural networks and triage textual data. Neural Netw. 2020 Jun;126:170–7.

[47]
Ivanov O, Wolf L, Brecher D, Lewis E, Masek K, Montgomery K, et al. Improving ED Emergency Severity Index acuity assignment using machine learning and clinical natural language processing. J Emerg Nurs. 2021 Mar;47(2):265-278.e7.

[48]
Kim D, Oh J, Im H, Yoon M, Park J, Lee J. Automatic classification of the Korean triage acuity scale in simulated emergency rooms using speech recognition and natural language processing: a proof of concept study. J Korean Med Sci. 2021 Jul 12;36(27):e175.

[49]
Greenbaum NR, Jernite Y, Halpern Y, Calder S, Nathanson LA, Sontag DA, et al. Improving documentation of presenting problems in the emergency department using a domain-specific ontology and machine learning-driven user interfaces. Int J Med Inform. 2019 Dec;132:103981.

[50]
Liyanage H, Krause P, De Lusignan S. Using ontologies to improve semantic interoperability in health data. J Innov Health Inform. 2015 Jul 10;22(2):309–15.

[51]
Abdulwahid MA, Booth A, Kuczawski M, Mason SM. The impact of senior doctor assessment at triage on emergency department performance measures: systematic review and meta-analysis of comparative studies. Emerg Med J. 2016 Jul;33(7):504–13.

[52]
Begaz T, Elashoff D, Grogan TR, Talan D, Taira BR. Initiating diagnostic studies on patients with abdominal pain in the waiting room decreases time spent in an emergency department bed: a randomized controlled trial. Ann Emerg Med. 2017 Mar;69(3):298–307.

[53]
Asplin BR, Magid DJ, Rhodes KV, Solberg LI, Lurie N, Camargo CA. A conceptual model of emergency department crowding. Ann Emerg Med. 2003 Aug;42(2):173–80.

[54]
Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138–60.

[55]
Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy (Basel). 2020 Dec 25;23(1):18.

[56]
Li H. Deep learning for natural language processing: advantages and challenges. National Science Review [Internet]. 2018 Jan 1 [cited 2022 Jun 16];5(1):24–6. Available from: https://academic.oup.com/nsr/article/5/1/24/4107792

[57]
Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S. MIMIC-IV-ED [Internet]. PhysioNet; [cited 2022 Jun 22]. Available from: https://physionet.org/content/mimic-iv-ed/2.0/

[58]
Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000 Jun 13;101(23):E215–220.

[59]
Eaneff S, Obermeyer Z, Butte AJ. The case for algorithmic stewardship for artificial intelligence and machine learning technologies. JAMA. 2020 Oct 13;324(14):1397–8.

[60]
Schrader CD, Lewis LM. Racial disparity in emergency department triage. J Emerg Med. 2013 Feb;44(2):511–8.

[61]
Kuhn L, Page K, Rolley JX, Worrall-Carter L. Effect of patient sex on triage for ischaemic heart disease and treatment onset times: A retrospective analysis of Australian emergency department data. Int Emerg Nurs. 2014 Apr;22(2):88–93.

[62]
Vigil JM, Coulombe P, Alcock J, Kruger E, Stith SS, Strenth C, et al. Patient ethnicity affects triage assessments and patient prioritization in U.S. Department of Veterans Affairs emergency departments. Medicine (Baltimore). 2016 Apr;95(14):e3191.

[63]
Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018 Nov 1;178(11):1544–7.

[64]
Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy (Basel). 2020 Dec 25;23(1):18.

[65]
Vokinger KN, Feuerriegel S, Kesselheim AS. Mitigating bias in machine learning for medicine. Commun Med (Lond). 2021 Aug 23;1:25.

[66]
Young AT, Amara D, Bhattacharya A, Wei ML. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health. 2021 Sep;3(9):e599–611.

[67]
Ongena YP, Haan M, Yakar D, Kwee TC. Patients’ views on the implementation of artificial intelligence in radiology: development and validation of a standardized questionnaire. Eur Radiol. 2020 Feb;30(2):1033–40.

[68]
Bala S, Keniston A, Burden M. Patient perception of plain-language medical notes generated using artificial intelligence software: pilot mixed-methods study. JMIR Form Res. 2020 Jun 5;4(6):e16670.

[69]
Nelson CA, Pérez-Chada LM, Creadore A, Li SJ, Lo K, Manjaly P, et al. Patient perspectives on the use of artificial intelligence for skin cancer screening: a qualitative study. JAMA Dermatol. 2020 May 1;156(5):501–12.

[70]
Jutzi TB, Krieghoff-Henning EI, Holland-Letz T, Utikal JS, Hauschild A, Schadendorf D, et al. Artificial intelligence in skin cancer diagnostics: the patients’ perspective. Front Med (Lausanne). 2020;7:233.

[71]
Nadarzynski T, Miles O, Cowie A, Ridge D. Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: A mixed-methods study. Digit Health. 2019;5:2055207619871808.

[72]
Palmisciano P, Jamjoom AAB, Taylor D, Stoyanov D, Marcus HJ. Attitudes of patients and their relatives toward artificial intelligence in neurosurgery. World Neurosurg. 2020 Jun;138:e627–33.
