Evaluating the impact on clinical task efficiency of a natural language processing algorithm for searching medical documents: Prospective crossover study
=========================================================================================================================================================

* Eunsoo H Park
* Hannah I Watson
* Felicity V Mehendale
* Alison Q O’Neil

## Abstract

**Background** Information retrieval (IR) from the free text within Electronic Health Records (EHRs) is time-consuming and complex. We hypothesise that Natural Language Processing (NLP)-enhanced search functionality for EHRs can make clinical workflows more efficient and reduce cognitive load for clinicians.

**Objective** To evaluate the efficacy of three levels of search functionality (no search, string search, and NLP-enhanced search) in supporting IR for clinical users from the free text of EHR documents in a simulated clinical environment.

**Methods** A clinical environment was simulated by uploading three sets of patient notes into an EHR research software application and presenting these alongside three corresponding IR tasks. Tasks contained a mixture of multiple choice and free text questions.

A prospective crossover study design was used, for which three groups of evaluators were recruited, comprising doctors (n=19) and medical students (n=16). Evaluators performed the three tasks using each of the search functionalities in an order according to their randomly assigned group. The speed and accuracy of task completion were measured and analysed, and user perceptions of NLP-enhanced search were reviewed in a feedback survey.

**Results** NLP-enhanced search facilitated significantly more accurate task completion than both string search (5.26%, p=0.01) and no search (7.44%, p=0.05). NLP-enhanced search and string search facilitated similar task speeds, both showing an increase in speed over no search function (15.9%/11.6%, p=0.05). 93% of evaluators agreed that NLP-enhanced search would make clinical workflows more efficient than string search, with qualitative feedback reporting that NLP-enhanced search reduced cognitive load.

**Conclusions** To the best of our knowledge, this study is the largest evaluation to date of different search functionalities for supporting target clinical users in realistic clinical workflows, with a 3-way prospective crossover study design. NLP-enhanced search improved both accuracy and speed of clinical EHR IR tasks compared to browsing clinical notes without search. NLP-enhanced search improved accuracy and reduced the number of searches required for clinical EHR IR tasks compared to direct search term matching.

**Key words**

*   Clinical Decision Support
*   Electronic Health Records
*   Natural Language Processing
*   Semantic Search

## 1 Introduction

The benefits of the transition from storing patient information in paper notes to Electronic Health Records (EHRs) have been a topic of debate among healthcare professionals [1-4]. Many clinicians have expressed dissatisfaction with their current hospital systems and EHR use is consistently cited as a contributor to clinician burnout [5-7]. Approximately 40% of doctors’ time is spent documenting patient information, with evidence showing that this work burden has increased following EHR implementation [8,9]. However, difficulties in quickly and accurately retrieving relevant information from these documents mean this wealth of collected information is often not fully utilised [10,11]. Navigating EHR documents is challenging due to the complexity of medical text, which tends to include frequent misspellings, abbreviations, specialty-specific acronyms, and clinical shorthand [12-15]. Time-consuming and inaccurate information gathering from EHRs limits the efficiency of wider clinical workflows [16], with some doctors believing that difficulties in retrieving patient information significantly impact face-to-face patient care [17].

Despite the increasing sophistication of general search engines, there remain relatively limited search options within medical record software. One barrier is the need for patient data to be held securely; therefore, access to computing power and shared resources may be limited. To have clinical utility, search facilities must be fast and intuitive for use by time-pressured clinicians, including relatively junior members of staff to whom the task of searching through complex notes is frequently delegated. In addition, the search must handle high variability of text expression as mentioned above. Clinical text is error-prone; unlike journals and other publications, there is no editorial control to check for errors. Medical terminology, acronyms and abbreviations vary between regions, hospitals and even across different specialties; for instance, “CHD” may be related to chronic heart disease (Cardiology), congenital heart disease (Paediatrics), or congenital hip dislocation (Orthopaedics). Since clinical care is a high stakes environment, failure to find relevant information potentially has great implications; to effectively save the time of clinicians, search tools should ideally go beyond document-level results to locate and highlight all relevant sentences or even words within a document. Efforts to achieve easier information retrieval (IR) have included the integration of string search in some EHRs, similar to the “Ctrl-F” or “Find” function now frequently available on everyday platforms [18]. However, the effectiveness of string search is limited for heterogeneous clinical text; therefore, studies have also considered semantic search algorithms [19-22]. A large-scale retrospective analysis of searches performed in an EHR found that the use of search varied considerably across and within user roles, with physicians and pharmacists being the most active user groups [19]. 
A review of the use of search within EHRs found that few articles focussed on the impact of search within clinical workflows [23]; one study with 7 diabetes experts found that content-based search was both faster and more accurate than conventional search for finding relevant information [20], another study with 10 family and internal medicine physicians found that semantic search gave faster medical notes navigation for IR tasks [21], and a final study with 4 students found that a semantic search tool enabled faster clinical note summarisation [22]. Only one of the described studies [20] used a crossover study design. In this paper, a larger study is reported (n=35 valid task completions, n=42 qualitative responses), in which a 3-way prospective crossover study was conducted, comparing standard string search with no search and with Natural Language Processing (NLP)-enhanced search. The custom NLP-enhanced search tool combines ontologies with fuzzy matching to offer search functionality which captures not only semantically related terms (e.g., synonyms and hyponyms) but also linguistic variants of the search term, such as alternative spellings, misspellings and word forms. A simulated clinical environment was used alongside target-user feedback to determine whether search tools could make clinical workflows more efficient and reduce clinicians’ cognitive burdens when attempting to find information.

### 1.2 Aims and Hypotheses

#### Aims

To quantitatively and qualitatively compare the efficacy of three search functionalities for IR from medical free text documents, in terms of accuracy, speed, and ease of search.

#### Hypotheses

Search tools will allow clinical users to perform simulated clinical IR tasks faster and more accurately than when using no search, with the use of NLP techniques enabling NLP-enhanced search to perform more effectively than string search.

## 2 Materials and Methods

### 2.1 Search tools

The string search function is an open-source JavaScript library implementation (mark.js; see footnote a). NLP-enhanced search is a proprietary rule-based algorithm (developed at Canon Medical Research Europe) that leverages NLP techniques such as edit distance and stemming in conjunction with medical knowledge bases, notably the Unified Medical Language System (UMLS) semantic web, Metathesaurus [24], and the Wikipedia [25] and OpenMD medical abbreviation lists [26]. These sources are used to expand the original search term into a list of equivalent terms which are then located in the text. The tool was designed to locate linguistic variants such as misspellings and alternative spellings, wordforms, and abbreviations, as well as additional semantic synonyms.
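The general idea of expanding a search term and then locating the expanded terms in the text can be illustrated with a minimal Python sketch. This is not the proprietary algorithm: the `RELATED_TERMS` dictionary is a hypothetical toy stand-in for the knowledge bases, and the similarity ratio from the standard library's `difflib` (with an arbitrary 0.85 threshold) is assumed here as a simple substitute for edit distance.

```python
import re
from difflib import SequenceMatcher

# Toy stand-in for the knowledge bases (hypothetical entries, not taken from UMLS).
RELATED_TERMS = {
    "stroke": ["cva", "tia", "cerebrovascular accident"],
    "anticoagulant": ["warfarin", "anti-coagulant"],
}


def expand_term(term: str) -> list[str]:
    """Return the search term plus any related terms from the toy knowledge base."""
    key = term.lower()
    return [key] + RELATED_TERMS.get(key, [])


def is_fuzzy_match(word: str, target: str, threshold: float = 0.85) -> bool:
    """Tolerate small spelling differences using a similarity ratio.

    The measure and threshold are assumptions made for this sketch.
    """
    return SequenceMatcher(None, word, target).ratio() >= threshold


def find_matches(text: str, term: str) -> list[str]:
    """Return the words or phrases in `text` that match the expanded term list."""
    lowered = text.lower()
    words = re.findall(r"[a-z][a-z\-]*", lowered)
    hits = []
    for candidate in expand_term(term):
        if " " in candidate:
            # Multi-word candidates are matched as exact phrases.
            if candidate in lowered:
                hits.append(candidate)
            continue
        for word in words:
            if is_fuzzy_match(word, candidate) and word not in hits:
                hits.append(word)
    return hits
```

For example, `find_matches("No TIA or CVA. Previous strok ruled out.", "stroke")` picks up the two abbreviations and the misspelling, none of which an exact match on "stroke" would find.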

Search tools were integrated into a patient-centric viewer (EHR research software), which allowed the user to type in a search term and view the highlighted findings within the retrieved subset of documents, which the user could scroll through. In the case of no search, the user was expected to scroll through the patient’s EHR to find the relevant information. Figure 1 illustrates the difference between the two search tools in the patient-centric viewer.

![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2022/06/01/2022.05.24.22275490/F1.medium.gif)

Figure 1: Example results for string search (A) and NLP-enhanced search (B) for the search term “heart”. String search returns only direct matches to “heart” (green highlights) whereas NLP-enhanced search returns also semantically related terms (yellow highlights), such as: “coronary”, the misspelling of atrial (fibrillation) as “atriall”, and the appearance of “heart” within the abbreviation of heart failure, “HF”.

### 2.2 Simulating a Clinical Workflow

Free text medical documents were synthesised for three fictional patients. These materials were paired with corresponding sets of ten IR questions for each patient, grounded in relevant and realistic clinical scenarios. Patient documents were uploaded into the patient-centric viewer. Questions were uploaded onto a custom Evaluation Platform built using Vue.js that also displayed the clinical scenarios and task-specific instructions for the evaluator. Below we describe the document synthesis and question generation in more detail.

#### Patient document synthesis

Three patient profiles were created with varying age, sex, ethnic background, social history, and medical history. The three patients were assigned primary medical specialties of respiratory, neurology, and oncology. For each patient, 20 documents were created by selecting and augmenting publicly available anonymised medical documents [27], as well as manually synthesising additional documents, to provide a patient EHR with a coherent chronological sequence of clinical events. Documents were varied and included discharge letters, outpatient clinic letters, operation notes, and general practice (GP) referral letters. To imitate real-world medical text, common misspellings, abbreviations and acronyms were included in the text, using investigator clinical experience (HW) and reference papers [13].

#### Clinical scenarios and question generation

For each task, clinical scenarios were designed to focus on real-world situations where information can be extracted from patient notes. To ensure the tasks were comparable across patients (and therefore interventions), a master template of 10 questions prompting IR was created, which was then tailored to fit each patient scenario. Questions were inspired by past medical exam questions [28] and investigators’ clinical experience (HW, FM). Requested information resembled that required in typical clinical workflows to support clinical decision-making. Care was taken to ensure task questions tested search function and not clinical knowledge or judgement, therefore all answers could be found by searching the respective patient’s notes. Questions required a mixture of multiple choice and free text responses. Examples of scenarios and corresponding questions for each patient can be seen in Figure 2.

![Figure 2:](https://www.medrxiv.org/content/medrxiv/early/2022/06/01/2022.05.24.22275490/F2.medium.gif)

Figure 2: Examples of clinical scenarios for each patient and their corresponding question-answer options. Scenarios aimed to simulate a standard clinical workflow, providing context for the questions.

### 2.3 Study Design

The clinical evaluation pipeline was structured as a prospective crossover trial design. Evaluators were banded based on their level of clinical experience before being assigned pseudonymised evaluator IDs that were used for the remainder of the study and analysis. Evaluators in each band were then randomly allocated across the three study groups using a random number generator. This yielded three groups stratified for level of clinical experience. Each group had a predetermined order of search functionality; once the three tasks had been completed using the allocated search order, evaluators were asked to fill out a feedback survey that focused on their user experience.
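The banded random allocation described above can be sketched as follows. This is an illustrative assumption about the procedure rather than the code used in the study: the band labels and evaluator IDs are hypothetical, and shuffling within each band followed by round-robin dealing is one simple way to obtain groups stratified for clinical experience.

```python
import random
from collections import defaultdict


def allocate(evaluators: dict[str, str], n_groups: int = 3, seed: int = 0) -> dict[str, int]:
    """Randomly allocate evaluators to study groups within each experience band.

    `evaluators` maps a pseudonymised evaluator ID to an experience band label.
    Returns a mapping from evaluator ID to group number (1..n_groups).
    """
    rng = random.Random(seed)  # fixed seed only so that the sketch is reproducible
    by_band = defaultdict(list)
    for evaluator_id, band in evaluators.items():
        by_band[band].append(evaluator_id)

    allocation = {}
    for band in sorted(by_band):
        members = sorted(by_band[band])
        rng.shuffle(members)
        # Deal the shuffled members round-robin so each band is spread across groups.
        for position, evaluator_id in enumerate(members):
            allocation[evaluator_id] = position % n_groups + 1
    return allocation
```

With six evaluators in each of two bands, every group receives two evaluators from each band, so the groups are balanced for experience before any exclusions are applied.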

### 2.4 Evaluator Recruitment and Training

Recruitment for the study was accomplished via professional contacts and advertising on social media channels to reach evaluators from a variety of clinical specialties and years of clinical experience.

A training video was provided to evaluators, which comprised a brief introduction to the study, demonstrations of the three search interventions within the patient-centric viewer and detailed instructions on how to complete the evaluation. An example patient with a small set of curated medical documents was also provided for training, on which evaluators could familiarise themselves with the capabilities of the different search functionalities.

### 2.5 Data Collection

Evaluators were provided with secure remote access to the evaluation environment (Figure 3), allowing the evaluation to be performed remotely from personal machines. Using this setup, evaluators could view the patient-centric viewer and the Evaluation Platform. Answers had to be inputted sequentially on the Evaluation Platform, which did not allow evaluators to return to a question once they had submitted an answer.

![Figure 3:](https://www.medrxiv.org/content/medrxiv/early/2022/06/01/2022.05.24.22275490/F3.medium.gif)

Figure 3: Screenshot of evaluation environment during a task. Evaluators only had permission to view the two relevant sites: the patient-centric viewer (left) and Evaluation Platform (right). The patient-centric viewer contains the synthetic patient documents for a given patient (in this case “Joseph Williams”) with “hba1c” as the search term. The Evaluation Platform detailed the clinical scenarios and task-specific instructions, along with the question-and-answer sections.

During each task, the evaluators submitted answers to the task questions through the Evaluation Platform. To ensure accurate recording of task times, evaluators were asked to do each task in one go, and to take breaks between tasks rather than during tasks. Evaluators were free to spend as long as they needed on each task. In addition, a search log was kept that recorded the search terms entered by the evaluator along with the search functionality used, plus the time spent on each question.

### 2.6 Data Analysis

#### Exclusion criteria

Data was excluded where search logs showed that evaluators had used an incorrect search functionality for a given task.

#### Question marking

Two clinical investigators (EP, HW) came to a consensus on the correct answers for each question. Answers were then clustered depending on the document in which they were located, and marks were awarded for finding each relevant area of correct documents. For example, if 3 pieces of clinical information across 2 unique documents were required to correctly answer a question, then 2 marks were awarded for a correct answer, since the evaluator had successfully found both documents. Questions were weighted equally.
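One possible reading of this marking scheme is sketched below in Python. The function name, the item labels and the handling of partial answers (one mark for each document from which at least one required item was found) are assumptions made for illustration; the source does not specify how partially correct answers were scored.

```python
def score_question(evidence_docs: dict[str, str], found_items: set[str]) -> int:
    """Award one mark per unique document from which required information was found.

    `evidence_docs` maps each required piece of information to the document
    that contains it; `found_items` holds the pieces the evaluator reported.
    """
    return len({evidence_docs[item] for item in found_items if item in evidence_docs})
```

For instance, with three required items of which two sit in a discharge letter and one in a clinic letter, an evaluator who reports all three items receives 2 marks.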

#### Statistical analysis

Data analysis was performed using custom Python code. For all metrics, samples were weighted to compensate for imbalances in group size (see Section 3.1). Paired two-tailed t-tests were performed to determine if there was a significant difference in timing and accuracy between 1) string search and no search, 2) NLP-enhanced search and no search, and 3) NLP-enhanced search and string search.
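As a reminder of what a paired test computes, the sketch below derives the paired t statistic and its degrees of freedom using only the Python standard library. It is not the study's analysis code: the sample weighting is deliberately omitted because its exact form is not described, and the two-tailed p-value would still need to be obtained from the t distribution (for example with `scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and degrees of freedom for matched samples.

    Each position in `a` and `b` must belong to the same evaluator.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference uses the sample standard deviation.
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1
```

Because each evaluator used every search functionality, the test operates on within-evaluator differences, which is what allows the crossover design to control for individual ability.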

#### Search term analysis

Following the study, search term logs were analysed to extract the number and pattern of search terms for each type of search.
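A summary of this kind could be computed along the following lines. The input format (one list of search terms per submitted answer) is a hypothetical simplification of the search log, chosen only to show the two quantities of interest: the percentage of answers for which search was used, and the mean number of search terms among those answers.

```python
import statistics


def summarise_search_log(terms_per_answer: list[list[str]]) -> tuple[float, float]:
    """Return (% of answers that used search, mean number of terms for those answers)."""
    used = [terms for terms in terms_per_answer if terms]
    if not used:
        return 0.0, 0.0
    percent_used = 100 * len(used) / len(terms_per_answer)
    mean_terms = statistics.mean([len(terms) for terms in used])
    return percent_used, mean_terms
```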

## 3 Results

### 3.1 Evaluator Demographics and Group Stratification

In total, 60 evaluators were recruited with multiple levels of clinical experience from medical students to doctors, and from 9 specialties ranging from vascular surgery to general practice. Of 60 recruited evaluators, 44 evaluators completed the tasks; 35 were included in the final analysis (Table 1), whilst 9 were excluded. Evaluators were excluded from the quantitative analysis if their data was corrupted (n = 2) or they completed the tasks incorrectly (n = 7), for example by using the wrong search functionality for a given task. From the original 20 evaluators per group, this gave n = 7 (Group 1), n = 13 (Group 2) and n = 15 (Group 3) successful completions. There were 42 responses to the feedback survey. Table 1 shows the final distribution of clinical experience across the groups.

[Table 1:](http://medrxiv.org/content/early/2022/06/01/2022.05.24.22275490/T1) Summary table showing the allocation across clinical bands and study groups.

### 3.2 Effect of search functionality on speed and accuracy of task completion

Results are shown in Tables 2 and 3. Overall, NLP-enhanced search facilitated significantly more accurate task completion than both string search (p=0.01) and no search (p=0.05). In terms of speed, NLP-enhanced search and string search gave significantly faster task completion compared to no search (p=0.05); there was no significant time difference between string search and NLP-enhanced search.

[Table 2:](http://medrxiv.org/content/early/2022/06/01/2022.05.24.22275490/T2) Accuracy and time for different search functionalities, showing µ=mean (SD=standard deviation).

[Table 3:](http://medrxiv.org/content/early/2022/06/01/2022.05.24.22275490/T3) Pairwise comparison between different search functionalities, showing mean difference +/- standard error and p-value significance level.

### 3.3 Analysis of search terms employed by evaluators

Analysis of the logged search terms (Table 4) revealed that evaluators tried almost twice as many search terms when using string search compared to NLP-enhanced search. Uptake of string search was also slightly lower than that of NLP-enhanced search, i.e., the percentage of questions for which no searches were performed was higher for string search.

[Table 4:](http://medrxiv.org/content/early/2022/06/01/2022.05.24.22275490/T4) Analysis of used search terms, showing: the percentage of answers that used search, and the number of search terms for each of these answers i.e., µ=mean (SD=standard deviation). Best results shown in bold.

The higher number of search terms required for string search might intuitively be explained by the user needing to try multiple synonyms to find relevant information. For instance, for the question “*Does the patient have a history of stroke?*”, there were 4 negative mentions in the text, scattered through the documents: “does not look like she has a stroke”, “No TIA or CVA”, “No CVA” and “No CVA” (TIA = transient ischaemic attack; CVA = cerebrovascular accident). NLP-enhanced search found all mentions with the search term “stroke” (which was the only term that evaluators tried), but string search evaluators also tried “TIA”, “CVA”, “neurological”, “history” and “infarction” in their efforts to find all relevant information. Interestingly, we see that evaluators were sometimes searching for neighbouring words (“history”, “neurological”), most likely as a method to bypass the possible variation in textual mentions. Further, string search does not match spelling variants (or misspellings) and therefore evaluators sometimes tried different spellings, e.g., for the question “*Is the patient currently on full-dose anticoagulant treatment?*”, both “anti-coagulant” and “anticoagulant” were tried as successive search terms by evaluators using string search.

This analysis also highlighted that the strict parameter settings for string search meant that search terms matched only to whole words, not to substrings and thus evaluators could not search with a prefix. We see some evidence of evaluators adjusting to this e.g., searching first for “anticoag” and then “anticoagulant”, or searching for both “smoke” and “smoker”, and this also increases the number of search terms tried.
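The difference between whole-word and substring matching can be shown with a short Python sketch using regular expressions. It is illustrative only and does not reproduce the configuration of the JavaScript library used in the study; case-insensitive matching is an assumption.

```python
import re


def whole_word_hits(text: str, term: str) -> list[str]:
    """Return matches only where the term appears as a complete word."""
    return re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)


def substring_hits(text: str, term: str) -> list[str]:
    """Return every word that contains the term anywhere within it."""
    return re.findall(rf"\w*{re.escape(term)}\w*", text, flags=re.IGNORECASE)
```

On a note such as "Started on anticoagulant therapy. Ex-smoker.", a whole-word search for "anticoag" or "smoke" returns nothing, whereas a substring search returns "anticoagulant" and "smoker" respectively, which is consistent with evaluators retyping the longer forms.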

### 3.4 User perceptions of NLP-enhanced search

User perceptions were assessed via the feedback survey which included a mix of Likert scale ratings, from “strongly disagree” to “strongly agree”, and free text responses. Below we summarise responses to four questions.

#### How was NLP-enhanced search perceived?

Most respondents positively described the capabilities of NLP-enhanced search, noting its identification of misspellings, wordforms and synonyms, though some reported that NLP-enhanced search returned too many findings (*“[NLP-enhanced] search was very clever and thorough but could return 100 results”*). However, when rating the efficacy of NLP-enhanced search, 76% of respondents thought any unrelated findings, i.e., *false positives*, did not significantly impact the usefulness of the search algorithm.

#### Is NLP-enhanced search better than string search?

81% of respondents agreed that NLP-enhanced search facilitated more relevant IR than string search. However, many commented that the string search capabilities within the patient-centric viewer were more limited than they were accustomed to on everyday devices, stating that “*string search was too discriminatory*” (the parameter settings meant that only whole word matches were returned, not substrings, as discussed in Section 3.3).

#### Would NLP-enhanced search make clinical workflows more efficient?

93% (39 out of 42) of respondents agreed that NLP-enhanced search would make clinical workflows more efficient than string search, in particular during clinics and clerking of patients. Free text feedback reflected this, with respondents reporting that NLP-enhanced search was useful and less time-consuming when compared to string search or no search when retrieving specific information. One evaluator commented *“the [NLP-enhanced] search tool made it significantly easier for me to find the information I was looking for and also quicker”*. On the other hand, respondents further reported that NLP-enhanced search would not always be the best method for situations where a comprehensive overview of a patient is needed. In this case, assimilating information using manual review (no search) would be more effective. One evaluator said, *“I felt that using the [NLP-enhanced] search tool meant I wasn’t focussing on the case as much but just looking for words”*. A common opinion was that NLP-enhanced search would be a useful addition to manual review for clinical tasks.

#### Would NLP-enhanced search reduce cognitive load?

Respondents frequently mentioned that NLP-enhanced search made it easier to retrieve the information they were looking for, with one evaluator stating that *“[NLP-enhanced] search is an excellent tool for a quick way to filter through relevant information”*. While a few mentioned that too many results were returned, respondents also reported that going through the relevant findings was easier and preferable to a full manual review of the notes, with manual review being described as *“tedious”*, *“painstaking”* and *“very easy to miss vital information”*. One evaluator commented that NLP-enhanced search could *“improve the workload of an already overworked profession”*.

## 4 Discussion

Our results showed a significant increase in accuracy when NLP-enhanced search was used compared to string search and no search, whilst both NLP-enhanced search and string search offered time savings. There was a perception of easier navigation from evaluators and a measured decrease in required interactivity in the case of NLP-enhanced search (lower number of search terms compared to string search). We caveat this conclusion with the observation that the strict parameter settings of string search meant search terms matched only to whole words, not substrings; this increased the number of terms that evaluators used and potentially reduced the search accuracy, compared to a string search version that matches also to substrings.

There is limited literature on the potential impact of EHR search tools on day-to-day clinical care [29]. Our results support those of previous studies [20-22] which find that semantic search tools allow faster and more accurate EHR task completion in simulated clinical workflows. A related study showed Artificial Intelligence-optimised patient records to give speed improvements in answering clinical questions whilst maintaining the same accuracy [30]. Interestingly, the impact of the patient record search engine MorphoSaurus has been measured in a real clinical setting [31], albeit with user surveys only. This method would have had the benefit of involving real-world stresses such as task interruptions and time pressure, as well as the key element of patient interaction. Importantly, however, our method of using a controlled simulated clinical environment enabled us to control for variables such as distractions or interruptions, as well as variation in the complexity and length of medical records. Additionally, our crossover design controlled for individual participants’ ability, experience and diligence. This enabled robust comparison of quantitative and qualitative data for each search type, while minimising the impact of confounding factors.

Overall, evaluator feedback suggested that the optimum approach to navigating clinical notes is a hybrid of manual browsing and search, depending on the context. In the real world, NLP-enhanced search is likely best employed as a complementary tool to aid clinical users in navigating clinical notes, with the ability to manually parse and ingest relevant facts from a complex medical history remaining important.

## 5 Conclusion

In conclusion, this study suggests that search tools have a positive effect on both the measured and perceived accuracy and ease of clinical IR. Search tools that can leverage NLP techniques are more effective for retrieving all relevant terms from heterogeneous medical free text. There is potential to reduce clinicians’ cognitive burden and make clinical workflows more efficient. A critical direction for future research is to assess the use of search tools in real-world clinical practice.

## Data Availability

The data used in this study are not publicly available.

## Authors’ Contributions

*   **Eunsoo H Park** co-designed the study, co-designed the patient histories, reviewed the synthetic patient notes, designed the tasks, designed the clinical feedback survey, organised evaluator recruitment, recorded training materials for evaluators, performed preliminary analysis of the findings, and contributed to the paper draft.

*   **Hannah I Watson** co-designed the study, co-designed the patient histories, created the synthetic patient notes, reviewed the tasks, reviewed the clinical feedback survey, supported evaluator recruitment, organised the infrastructure for the practical evaluation, contributed to & reviewed the analysis, and contributed to & reviewed the paper draft.

*   **Felicity V Mehendale** co-designed the study, reviewed the patient histories, reviewed the synthetic patient notes, reviewed the tasks, reviewed the clinical feedback survey, reviewed the analysis, and contributed to & reviewed the paper draft.

*   **Alison Q O’Neil** co-designed the study, organised provision of the NLP-enhanced search, reviewed the tasks, reviewed the clinical feedback survey, contributed to & reviewed the analysis, and contributed to & reviewed the paper draft.

## Funding

This work is part of the Industrial Centre for AI Research in digital Diagnostics (iCAIRD), which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) project number 104690. FV Mehendale’s research at the University of Edinburgh is supported by the Caledonian Heritable Foundation.

## Conflicts of Interest

HI Watson and AQ O’Neil are employees of Canon Medical Research Europe, who provided the software and algorithms for this evaluation. EH Park was sponsored by Canon Medical Research Europe during her Spring 2021 BSc research project at the University of Edinburgh (“Evaluation of a natural language processing algorithm for searching medical documents”) which was the basis for this evaluation. EH Park had previously performed paid annotation work for the development of the NLP-enhanced search tool.

## Acknowledgements

Thank you to Professor Keith W Muir (Institute of Neuroscience & Psychology, University of Glasgow) for his clinical insights during the development of the NLP-enhanced search tool. We would like to thank the West of Scotland Safe Haven within NHS Greater Glasgow and Clyde for assistance in creating and providing a dataset which was used during development of the NLP-enhanced search tool.

Many thanks to the Canon Medical Research Europe staff who developed the infrastructure required for this evaluation: Yvonne Belton, Michael Corrigan, Vismantas Dilys, Francisco Gomez, Graham Jones, Hamish MacKinnon, David Miller, Emel Muzaç, Paul Norman, and Euan Robertson. Further, we would like to acknowledge the research team which was responsible for creating the NLP-enhanced search tool: Murray Cutforth, Vismantas Dilys, Matúš Falis, Aneta Lisowska, Hamish MacKinnon, Maciej Pajak, Alison O’Neil, and Hannah Watson.

Thank you to our evaluators: Fiona Auld, Anna Barton, Rong Bing, Cameron Brown, Khai Syuen Chew, Jane Yi Chiam, Vanessa Chou, Luisa Ciriello, George Cooper, Iona Cutworth, Jamie Donachie, Vivienne Evans, Magdalena Gabrysiak, Eilidh Gunn, Mohamed Hamed, Hamzah Hanif, Ewen Harrison, Kylla Hernandez, Lana Huang, Katie Hunter, Haider Khan, David Kluth, Niki Kouvroukoglou, Barbora Krivankova, Tommy Le, Charles Leeson-Payne, Alinah Sum-Ping Leung, Jenny Lockhart, Jack Lugue, Angus MacLeod, Tomos Morgan, Ellen Murgitroyd, Sarah Murphy, Helen O’Neil, Yusuke Onishi, Lisa Ragunathan, Nikita Rana, Qi Shun Yong, Lucy Taylor, Evangelos Tzolos, Miriam Veenhuizen, Philippa Veenhuizen, Olivia Yu, Sydney Zides.

Thank you to our pre-trial evaluators: Marcus Boyd, Elizabeth Daly, Greta Economides, Keziah Lewis, Abhishek Nambiar, Sumrah Naqvi, Risako Sakatsume, Faye Sikora, Emma Warburton.

Thank you to our internal Canon reviewers: William Clackett, Russell Hung and David Miller.

Many thanks to MTSamples for permitting free use and modification of their data to create the patient case studies.

## Footnotes

*   a [https://markjs.io/](https://markjs.io/)

## Abbreviations

EHR
:   Electronic Health Record
IR
:   Information Retrieval
NLP
:   Natural Language Processing
UMLS
:   Unified Medical Language System

*   Received May 24, 2022.
*   Revision received May 24, 2022.
*   Accepted June 1, 2022.


*   © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1.  Holanda AA, e Sá HL, Vieira AP, et al. Use and satisfaction with electronic health record by primary care physicians in a health district in Brazil. J Med Syst; 2012 Oct;36(5):3141–9. [https://doi.org/10.1007/s10916-011-9801-3](https://doi.org/10.1007/s10916-011-9801-3)

2.  King J, Patel V, Jamoom EW, et al. Clinical benefits of electronic health record use: national findings. Health Serv Res; 2014 Feb;49(1pt2):392–404. [https://doi.org/10.1111/1475-6773.12135](https://doi.org/10.1111/1475-6773.12135)

3.  Burke HB, Sessums LL, Hoang A, et al. Electronic health records improve clinical note quality. J Am Med Inform Assoc; 2015 Jan 1;22(1):199–205. [https://doi.org/10.1136/amiajnl-2014-002726](https://doi.org/10.1136/amiajnl-2014-002726)

4.  Entzeridou E, Markopoulou E, Mollaki V. Public and physician’s expectations and ethical concerns about electronic health record: Benefits outweigh risks except for information security. Int J Med Inform; 2018 Feb 1;110:98–107. [https://doi.org/10.1016/j.ijmedinf.2017.12.004](https://doi.org/10.1016/j.ijmedinf.2017.12.004)
    
    
5.  Kroth PJ, Morioka-Douglas N, Veres S, et al. Association of electronic health record design and use factors with clinician stress and burnout. JAMA Netw Open; 2019 Aug 2;2(8):e199609. [https://doi.org/10.1001/jamanetworkopen.2019.9609](https://doi.org/10.1001/jamanetworkopen.2019.9609)
    
    
6.  Starren JB, Tierney WM, Williams MS, et al. A retrospective look at the predictions and recommendations from the 2009 AMIA policy meeting: did we see EHR-related clinician burnout coming? J Am Med Inform Assoc; 2021 May;28(5):948–54. [https://doi.org/10.1093/jamia/ocaa320](https://doi.org/10.1093/jamia/ocaa320)
    
    
7.  Yan Q, Jiang Z, Harbin Z, et al. Exploring the relationship between electronic health records and provider burnout: A systematic review. J Am Med Inform Assoc; 2021 May;28(5):1009–21. [https://doi.org/10.1093/jamia/ocab009](https://doi.org/10.1093/jamia/ocab009)

8.  Sinsky C, Colligan L, Li L, et al. Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med; 2016 Dec 6;165(11):753–60. [https://doi.org/10.7326/M16-0961](https://doi.org/10.7326/M16-0961)

9.  Joukes E, Abu-Hanna A, Cornet R, et al. Time spent on dedicated patient care and documentation tasks before and after the introduction of a structured and standardized electronic health record. Appl Clin Inform; 2018 Jan;9(01):046–53. [https://doi.org/10.1055/s-0037-1615747](https://doi.org/10.1055/s-0037-1615747)

10. Beasley JW, Wetterneck TB, Temte J, et al. Information chaos in primary care: implications for physician performance and patient safety. J Am Board Fam Med; 2011 Nov 1;24(6):745–51. [https://doi.org/10.3122/jabfm.2011.06.100255](https://doi.org/10.3122/jabfm.2011.06.100255)

11. Blijleven V, Koelemeijer K, Jaspers M. Identifying and eliminating inefficiencies in information system usage: A lean perspective. Int J Med Inform; 2017 Nov 1;107:40–7. [https://doi.org/10.1016/j.ijmedinf.2017.08.005](https://doi.org/10.1016/j.ijmedinf.2017.08.005)
    
    
12. Meystre SM, Savova GK, Kipper-Schuler KC, et al. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform; 2008;17(01):128–44.
    
    
13. Sinha S, McDermott F, Srinivas G, et al. Use of abbreviations by healthcare professionals: what is the way forward? Postgrad Med J; 2011 Jul 1;87(1029):450–2. [https://doi.org/10.1136/pgmj.2010.097394](https://doi.org/10.1136/pgmj.2010.097394)

14. Turchin A, Chu JT, Shubina M, et al. Identification of misspelled words without a comprehensive dictionary using prevalence analysis. In AMIA Annu Symp Proc; 2007 (Vol. 2007, p. 751). American Medical Informatics Association.
    
    
15. Zhou L, Mahoney LM, Shakurova A, et al. How many medication orders are entered through free text in EHRs?-a study on hypoglycemic agents. In AMIA Annu Symp Proc; 2012 (Vol. 2012, p. 1079). American Medical Informatics Association.
    
    
16. Farri O, Pieckiewicz DS, Rahman AS, et al. A qualitative analysis of EHR clinical document synthesis by clinicians. In AMIA Annu Symp Proc; 2012 (Vol. 2012, p. 1211). American Medical Informatics Association.
    
    
17. Grabenbauer L, Skinner A, Windle J. Electronic Health Record Adoption–Maybe It’s not about the Money. Appl Clin Inform; 2011;2(4):460–71. [https://doi.org/10.4338/ACI-2011-05-RA-0033](https://doi.org/10.4338/ACI-2011-05-RA-0033)

18. Yang L, Mei Q, Zheng K, et al. Query log analysis of an electronic health record search engine. In AMIA Annu Symp Proc; 2011 (Vol. 2011, p. 915). American Medical Informatics Association.
    
    
19. Ruppel H, Bhardwaj A, Manickam RN, et al. Assessment of electronic health record search patterns and practices by practitioners in a large integrated health care system. JAMA Netw Open; 2020 Mar 2;3(3):e200512.
    
    
20. Duftschmid G, Rinner C, Kohler M, et al. The EHR-ARCHE project: Satisfying clinical information needs in a Shared Electronic Health Record System based on IHE XDS and Archetypes. Int J Med Inform; 2013 Dec 1;82(12):1195–207.

21. Tawfik AA, Kochendorfer KM, Saparova D, et al. Using semantic search to reduce cognitive load in an electronic health record. In: 2011 IEEE 13th International Conference on e-Health Networking, Applications and Services; 2011 Jun 13 (pp. 181–184). IEEE.
    
    
22. Hasan SA, Zhu X, Liu J, et al. Ontology-Driven Semantic Search for Brazilian Portuguese Clinical Notes. Stud Health Technol Inform; 2015 Jan 1;216:1022-.
    
    
23. Hill JR, Visweswaran S, Ning X, et al. Use, Impact, Weaknesses, and Advanced Features of Search Functions for Clinical Use in Electronic Health Records: A Scoping Review. Appl Clin Inform; 2021 May;12(03):417–28.
    
    
24. Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res; 2004 Jan 1;32(Suppl_1):D267–70.

25. Wikipedia. List of Medical Abbreviations. Available from: [https://en.wikipedia.org/wiki/List\_of\_medical\_abbreviations](https://en.wikipedia.org/wiki/List_of_medical_abbreviations) [Accessed 31st January 2022]
    
    
26. OpenMD. Medical Abbreviations & Acronyms. Available from: [https://openmd.com/dictionary/medical-abbreviations](https://openmd.com/dictionary/medical-abbreviations) [Accessed 23rd February 2022]
    
    
27. MTSamples. Medical documents. Available from: [https://www.mtsamples.com/index.asp](https://www.mtsamples.com/index.asp) [Accessed 31st January 2022]
    
    
28. PassMedicine. Available from: [https://passmedicine.com/index.php](https://passmedicine.com/index.php) [Accessed 31st January 2022]
    
    
29. Natarajan K, Stein D, Jain S, et al. An analysis of clinical queries in an electronic health record search utility. Int J Med Inform; 2010 Jul 1;79(7):515–22. [https://doi.org/10.1016/j.ijmedinf.2010.03.004](https://doi.org/10.1016/j.ijmedinf.2010.03.004)

30. Chi EA, Chi G, Tsui CT, et al. Development and Validation of an Artificial Intelligence System to Optimize Clinician Review of Patient Records. JAMA Netw Open; 2021 Jul 1;4(7):e2117391.
    
    
31. Schulz S, Daumke P, Fischer P, et al. Evaluation of a document search engine in a clinical department system. In AMIA Annu Symp Proc; 2008 (Vol. 2008, p. 647). American Medical Informatics Association.