Natural Language Word-Embeddings as a glimpse into healthcare at the End Of Life
================================================================================

* Shun Lau
* Zeljko Kraljevic
* Mohammad Al-Agil
* Shelley Charing
* Alan Quarterman
* Harold Parkes
* Victoria Metaxa
* Katherine Sleeman
* Wei Gao
* Richard Dobson
* James T Teo
* Phil Hopkins

## Abstract

**Introduction** Planning in advance and holding personalised discussions on limitation of life-sustaining treatment (LST) are indicators of good care. However, there are many linguistic nuances and misunderstandings around dying in hospital, and individual-level prognostication is often inaccurate.

**Methods** Using unsupervised natural language processing (NLP), we explored real-world terminology in the electronic health record of an urban teaching hospital, using clusters of phrases whose semantic embeddings were most similar to “Ceiling of Treatment”, and assessed their prognostic value.

**Results** The word embeddings most similar to “Ceiling of Treatment” clustered around phrases describing end-of-life care, ceiling of care and resuscitation discussions. The phrases had differing prognostic profiles, with the highest 7-day mortality in the phrases most implicitly referring to end of life: “terminal care”, “end of life care” (57.5%) and “unsurvivable” (57.6%).

**Conclusion** NLP can quantify and analyse real-world end of life discussions around prognosis and appropriate LST.

**Patient-friendly Summary** (by expert patients: Sherry Charing, Alan Quarterman, Harold Parkes)

Discussions between doctors, patients and family to decide the appropriate maximum treatment for a specific patient, based on their clinical condition, are complex. These discussions often involve expressions regarding “End Of Life” care, which are used to describe the maximum invasive treatments a patient should have or would want. A range of expressions is used, many with overlapping meanings, which can be confusing not only for the patient and family but also for doctors reading the patient’s clinical notes. In this study, a computational approach using Artificial Intelligence was used to read clinical notes from thousands of patient records at a large urban hospital. Expressions that doctors use to describe these discussions were analysed to show how particular words and phrases are associated with mortality. Using computer analysis, it was possible to quantify the use of these expressions and their relation to the “End Of Life”. Through this AI-based approach, real-world use of phrases and language relating to “End Of Life” can be analysed to understand how doctors and patients are communicating, and to identify possible misunderstandings of language.

## Introduction

Planning in advance for ‘End Of Life’ care is a complex and sensitive area of healthcare, and there is significant room for misunderstandings1–3. Such discussions and advance decisions can be mishandled without personalised counselling, as misperceptions may arise about what kinds of treatments are being referred to4. Phrases such as ‘ceiling of treatment’ and ‘treatment escalation plans’ attempt to clarify in more detail the context of the conversation and the different types of treatments being discussed. This has been supplemented by additional healthcare interventions to standardise the documentation used by teams transcribing and transferring information relating to ceiling of treatment5,6. As a result, there has been an expansion in the vocabulary around advance directives and end of life care.

Traditional approaches using standardised forms or integrated care pathways to record such sensitive advance care plans have been extremely helpful in recording such complex personalised discussions between healthcare professionals with patients, families and carers7. Many of such advance care plans are now captured in standardised electronic form templates often with details captured in typed free-text narrative. Often words and phrases in advance care plans have very specific technical meanings to a specialist which may not match intended meaning as interpreted by a non-specialist or a non-medical individual. Conventionally, studies in this domain have often used qualitative methodologies to disentangle this8–10.

To address this quantitative research gap, we used a computational linguistic approach, Natural Language Processing (NLP), to analyse large amounts of text using unsupervised algorithms that detect patterns in the use of words and phrases. The first approach used a data-driven technique called ‘word2vec’ to represent words from a large body of text in a multi-dimensional vector space (‘latent space’), based on the contextual use of surrounding words11. With a sufficient body of text, these ‘word embeddings’ begin to cluster, and words that cluster together often have similar meaning. These embeddings therefore follow the philosophical principle articulated by Ludwig Wittgenstein in 1953: *“… the meaning of a word is its use in the language”*12. This ecological data-driven approach has the advantage of also capturing jargon, acronyms and unconventional language used in the real world.
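As a rough illustration of what “contextual use of surrounding words” means, the sketch below lists the (target, context) pairs that a skip-gram style word2vec model would learn from. The example sentence, the window size of 2 and the function name `context_pairs` are illustrative assumptions and are not taken from the study.

```python
def context_pairs(tokens, window=2):
    """Return (target, context) pairs for tokens within +/- `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sentence, not drawn from the study corpus.
tokens = "ceiling of treatment discussed with family".split()
pairs = context_pairs(tokens, window=2)
```

Words that repeatedly share context words across a large corpus end up with similar vectors, which is why phrases of similar meaning tend to cluster.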

Using this data-driven approach in a large body of anonymised electronic clinical text at a large urban hospital in London, we analysed whether words or phrases (‘word embeddings’) discussing advance care planning and ceilings of treatment have similar semantic clusters. We also tested whether there is any correlation of these ‘word embeddings’ with mortality, and how ‘word embeddings’ are abstracted by AI into ‘concept embeddings’.

## Results

### Word embeddings

The root n-gram “ceiling of treatment” was selected *a priori* by the healthcare team (see Methods). The 40 non-nonsense n-grams (up to 4 tokens) most closely associated with this root n-gram were obtained and are shown below. Nonsense fragments included mentions of prescribed drugs such as midazolam.

View this table:
[Table1](http://medrxiv.org/content/early/2021/07/24/2021.07.22.21260874/T1)

### Relationship with outcome

The n-grams above were then divided into groups of phrases with similar meaning (poecilonyms), and repeated word/phrase searches were performed across the whole 2019 inpatient dataset at King’s College Hospital to provide aggregate counts of unique patients with those phrases. There were two broad groups of phrases with similar meaning: phrases relating to “ceiling of treatment” and phrases relating to “End Of Life”. These are summarised in Table 1, together with the proportions with recorded dates of death.

View this table:
[Table 1:](http://medrxiv.org/content/early/2021/07/24/2021.07.22.21260874/T2)

Table 1: Word and phrase counts per inpatient were searched across all inpatient records along groups of similar semantics and linked to whether there was an associated date of death. Absolute and relative risk are derived from these absolute values.
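The derivation of absolute and relative risk from counts can be sketched as follows. The counts used here are hypothetical and are not values from Table 1; the function names are assumptions made for illustration.

```python
def absolute_risk(deaths, patients):
    """Proportion of patients in a group with a recorded death."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return deaths / patients


def relative_risk(deaths_phrase, n_phrase, deaths_control, n_control):
    """Risk in the phrase group divided by risk in the control group."""
    control_risk = absolute_risk(deaths_control, n_control)
    if control_risk == 0:
        raise ValueError("relative risk is undefined when control risk is zero")
    return absolute_risk(deaths_phrase, n_phrase) / control_risk


# Hypothetical counts: 40 deaths among 200 patients with a phrase,
# 50 deaths among 5000 control patients.
ar_phrase = absolute_risk(40, 200)
ar_control = absolute_risk(50, 5000)
rr = relative_risk(40, 200, 50, 5000)
```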

Phrases indicating “End of Life” and “Terminal” clearly had higher rates of mortality, since this is implicit in their meaning, whereas terms referring to different limitations of LST had a more intermediate prognosis. It is noteworthy that “Treatment Escalation Plan”, the preferred hospital protocol term for such discussions and plans, was extremely common (>3,000 inpatients). However, it appeared to be used as a heading phrase, as it carried no semantic information about the level of advance care that was agreed. As a result, the 7-day mortality associated with “Treatment Escalation Plan” was extremely low. This suggests that these discussions are not foregone conclusions and that having such discussions does not imply early mortality.

### Concept embeddings

To correct for any misspellings and typographical errors, the word embeddings were converted to MedCAT concept embeddings and trained against the entire corpus. To visualise the semantic relationships between these concept embeddings, t-distributed stochastic neighbour embedding (t-SNE) was used to reduce the high-dimensional vectors (300 dimensions) to a 2-dimensional space (Figure 1)13. There are four broad groups, which only partially follow the clinical groupings used in Table 1. Of note, the regions outlined in green and red in this 2-dimensional semantic space correspond to the ‘End of Life’ grouping in Table 1, where the outcomes are poorest. The blue regions contain less discrete clusters of n-grams with overlapping outcomes; these n-grams describe limits of appropriate interventions and are similar in meaning to the Ceiling of Care group.

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2021/07/24/2021.07.22.21260874/F1.medium.gif)

[Figure 1](http://medrxiv.org/content/early/2021/07/24/2021.07.22.21260874/F1)

Figure 1: Clusters of concept embeddings on a t-distributed stochastic neighbour embedding (t-SNE) plot in two dimensions (X and Y). X and Y represent synthetic dimensions derived from the word embeddings, and are analogous to principal components in a Principal Component Analysis (PCA). Regions of clustering are expanded for clarity, with the green and red clusters corresponding most closely to End Of Life Care and the blue cluster corresponding to Ceiling of Care. The t-SNE plot is available as a dynamic figure in the Supplementary HTML file.

## Discussion

We present the first quantitative NLP evaluation of the language used in real-world discussions about ceilings of treatment and End Of Life care. Phrases recording clinical care from the treating medical team showed substantially varied language describing advance care planning, ranging from specific interventions to terminal prognostication. This study also showed that unsupervised word-embedding techniques (Word2Vec and MedCAT) were able to produce clusters that reflect phrases of similar meaning, as visualised using dimensionality reduction techniques. With a sufficiently large corpus of data, such unsupervised NLP techniques were able to capture implicit and inferred poor prognosis.

This study therefore has an inverted design to a previous Sentiment Analysis study of nursing notes from the MIMIC-III public ICU dataset which found a relationship between such ‘sentiment’ with survival14; the ‘sentiment’ was calculated using a rules-based semantic analysis tool (TextBlob15) designed for generic non-clinical text which assigns a positive or negative ‘sentiment’ score to a piece of text based on the adjectives, verbs and adverbs used in the text16,17. In the current study, both an *a priori* approach and an unsupervised clustering approach were used showing clear associations with the ‘ground truth’ of mortality. The derivation of ‘sentiment’ on prognosis from real world clinical text also makes this much more ecological rather than using rule-based text analysis designed for non-clinical uses.

One significant limitation is that this study did not explore temporal trends in prognosis or embeddings. The scope of this study was the ceiling of treatments towards the end of life and so the focus was very much on the discussions and words used very near the end of life (i.e. within the next 7 days). This narrows the vocabulary for prognosis without introducing noise around the vocabulary of tenses and accuracy of time-course prognostication. Another limitation is the lack of distinction between the different types of ceiling of treatment scenarios; it is likely a ceiling of treatment discussion about an elderly disabled patient is substantially different to that of a young patient with a terminal illness or a sudden traumatic event. Both aspects could be improved on with an expanding corpus as well as exploring the temporal relationship with medical and palliative interventions.

During this study, typographical errors and metonymic variations in free-text data entry were frequently detected, requiring the addition of a concept embedding approach. These variations in typing suggest clinicians do not simply copy-and-paste templated thoughts for a very ill patient but instead provide contextualised care to the individual (with manually composed typing) even in an era of increasing standardisation of care pathways.

In summary, this is the first real-world NLP study of ‘End Of Life’ care, mapping out how clinical language is used to describe ‘End Of Life’ discussions as well as to produce syntactic phrase or word clusters that capture information on prognosis. This study introduces quantitative NLP techniques into a field which has traditionally used qualitative approaches. Future work could explore the use of language in different professional groups or explore the temporality of interventions before and after such discussions.

## Methods

### Governance

The project operated under London South East Research Ethics Committee (reference 18/LO/2048) approval granted to the King’s Electronic Records Research Interface (KERRI); specific work on end-of-life care research was reviewed with expert patient input on a virtual committee with Caldicott Guardian oversight. Patient and public engagement was sought throughout this project with expert patients approving the projects as well as writing this article.

### Unsupervised Word and Concept Embeddings

The corpus of records consisted of ∼13 million out of a total of 18 million documents over ∼20 years (2001 to 2020) in the CogStack platform at King’s College Hospital18, which pooled data from the structured and unstructured components of the electronic health record using CogStack ecosystem tools such as DrugPipeline19, MedCAT20 and MedCATTrainer21. This includes all inpatient and outpatient document text, with the exclusion of form checklists and scanned documents of insufficient legibility (∼5 million).

The text was first split into words, then passed through a phraser, which merged separate tokens into phrases, before being passed to *word2vec*11. 2-, 3- and 4-grams were calculated from the text using MedCAT, which internally relies on gensim22. This allowed us to obtain embeddings for phrases and not just single words. Given a root n-gram (“Ceiling of Treatment”), the most similar n-grams, based on the cosine distance between their vector embeddings, were collected and again clustered. This resulted in 32 unique phrases.
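The two steps above, merging tokens into phrases and ranking phrases against a root n-gram, can be sketched in plain Python. This is a simplified stand-in and not the MedCAT or gensim implementation: the count-based merging rule, the 3-dimensional toy vectors, the phrase names and the function names are all assumptions made for illustration. Ranking by highest cosine similarity is equivalent to ranking by lowest cosine distance.

```python
import math
from collections import Counter


def merge_frequent_bigrams(sentences, min_count=2):
    """Join adjacent token pairs seen at least `min_count` times with '_'."""
    counts = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and counts[(sent[i], sent[i + 1])] >= min_count:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2  # skip the token that was merged
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(root, embeddings, top_n=3):
    """Rank all other phrases by cosine similarity to the root phrase."""
    root_vec = embeddings[root]
    scored = [
        (phrase, cosine_similarity(root_vec, vec))
        for phrase, vec in embeddings.items()
        if phrase != root
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical tokenised text; running the merge twice would allow up to 4-grams.
sentences = [
    ["ceiling", "of", "treatment"],
    ["ceiling", "of", "care"],
    ["end", "of", "life"],
]
merged = merge_frequent_bigrams(sentences, min_count=2)

# Hypothetical toy embeddings (the study used 300-dimensional vectors).
embeddings = {
    "ceiling_of_treatment": [0.9, 0.1, 0.0],
    "ward_based_care": [0.8, 0.2, 0.1],
    "end_of_life_care": [0.3, 0.9, 0.1],
    "physiotherapy": [0.0, 0.1, 0.9],
}
ranked = most_similar("ceiling_of_treatment", embeddings)
```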

### Explanation of grouping of Concept-Embeddings into meaning groups

After the top concepts were identified, these tokens were presented to three healthcare professionals (one critical care physician, one palliative care physician and one neurologist) to group into phrases of similar meaning. These phrase groupings were: “ceiling of care” clusters and “end of life”.

A CogStack elastic query was then performed for phrases within these clusters to generate total aggregate counts of unique inpatients with documents created in 2019 containing these phrases. The creation dates of the documents were also recorded and matched to whether a date of death was recorded within 7 days of the date of the recorded phrase. Seven days was chosen to limit the analyses to short-term prognostication. Dates of death were recorded based on the inpatient certification of death by a doctor. As a control, all documents in the same time period without these phrases were used. The short time window provides confidence in the accuracy of the mortality data, as any under-counting of outpatient mortality would not significantly affect the data.
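The matching of phrase dates to dates of death can be sketched as below. The record layout, the patient identifiers, the inclusive 0 to 7 day boundary and the choice to count a patient if any mention falls within the window are assumptions made for illustration; the study does not specify these details.

```python
from datetime import date, timedelta


def died_within_window(phrase_date, death_date, window_days=7):
    """True if death was recorded 0 to `window_days` days after the phrase."""
    if death_date is None:
        return False
    gap = death_date - phrase_date
    return timedelta(0) <= gap <= timedelta(days=window_days)


def window_mortality(records, window_days=7):
    """Count unique patients with a death inside the window after any mention.

    `records` maps patient id -> (list of phrase dates, death date or None).
    Returns (deaths, total patients); each patient is counted once.
    """
    deaths = sum(
        1
        for phrase_dates, death_date in records.values()
        if any(died_within_window(d, death_date, window_days) for d in phrase_dates)
    )
    return deaths, len(records)


# Hypothetical records, not study data.
records = {
    "p1": ([date(2019, 3, 1)], date(2019, 3, 5)),                      # 4 days
    "p2": ([date(2019, 3, 1), date(2019, 6, 10)], date(2019, 6, 12)),  # 2 days after second mention
    "p3": ([date(2019, 3, 1)], date(2019, 4, 1)),                      # 31 days
    "p4": ([date(2019, 3, 1)], None),                                  # no recorded death
}
deaths, total = window_mortality(records)
```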

### Visualisation of Concept Embeddings

All selected phrases were also converted into MedCAT concepts and the unsupervised training for concepts was rerun on a dataset consisting of ∼13M documents. This was done to avoid problems with spelling mistakes, metonyms and slight variations in phrasing.

To visualise the relationship between the chosen concepts, t-distributed stochastic neighbour embedding (t-SNE) was used to reduce the high-dimensional vectors (300 dimensions) to a 2-dimensional space13. In summary, t-SNE converts similarities between data points to joint probabilities and tries to minimise the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. This tends to keep word embeddings that are close in the high-dimensional space close in the low-dimensional representation. An alternative dimensionality reduction technique (Uniform Manifold Approximation and Projection, UMAP23) was also tested and is available as a Supplementary File.
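The quantity that t-SNE minimises can be sketched as follows. This is a simplified illustration and not the optimiser itself: it uses a single global Gaussian bandwidth for the high-dimensional similarities, whereas t-SNE calibrates a bandwidth per point from the perplexity and then symmetrises. The low-dimensional similarities use the Student-t kernel as in t-SNE. The toy points and function names are assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_probabilities(points, kernel):
    """Normalise pairwise kernel weights so that they sum to 1 over all i != j."""
    n = len(points)
    weights = {
        (i, j): kernel(sq_dist(points[i], points[j]))
        for i in range(n)
        for j in range(n)
        if i != j
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def gaussian_kernel(d2, sigma=1.0):
    # Simplification: one global sigma instead of perplexity-based bandwidths.
    return math.exp(-d2 / (2 * sigma ** 2))


def student_t_kernel(d2):
    # Heavy-tailed kernel used by t-SNE in the low-dimensional space.
    return 1.0 / (1.0 + d2)


def kl_divergence(p, q):
    """KL(P || Q) summed over all ordered pairs."""
    return sum(
        p_ij * math.log(p_ij / q[pair]) for pair, p_ij in p.items() if p_ij > 0
    )


# Hypothetical data: two tight pairs of points in 3 dimensions.
high = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5)]
good_layout = [(0, 0), (0.1, 0), (3, 3), (3.1, 3)]  # neighbours stay together
bad_layout = [(0, 0), (3, 3), (0.1, 0), (3.1, 3)]   # neighbours pulled apart

p = joint_probabilities(high, gaussian_kernel)
kl_good = kl_divergence(p, joint_probabilities(good_layout, student_t_kernel))
kl_bad = kl_divergence(p, joint_probabilities(bad_layout, student_t_kernel))
```

A layout that keeps high-dimensional neighbours together gives a lower divergence than one that separates them, which is the behaviour the optimiser exploits.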

## Supporting information

Supplementary Figure 1 (supplements/260874_file04.zip)

## Data Availability

The data are not publicly available as the source data analysed is unstructured textual data, which carries risk of patient re-identification.

## Competing Interests

The authors have received research funding support from the Cicely Saunders Institute on Palliative Care, NIHR Applied Research Centre South London and the NIHR Maudsley Biomedical Research Centre. There are no other relevant competing personal financial interests.

## Author Contributions

Study Design: JTT, PH

Data Collection: JT, ZK, MAA

Data Analysis: JT, ZK

Manuscript Drafting: SL, WG, AQ, HP, SC

Manuscript Criticism: RD, KS, VM, AQ, HP, SC

## Acknowledgements

We would like to thank the King’s Electronic Records Research Interface (KERRI), the Cicely Saunders Institute, the NIHR Applied Research Centre South London and the NIHR Maudsley Biomedical Research Centre.

*   Received July 22, 2021.
*   Revision received July 22, 2021.
*   Accepted July 24, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  O’Dowd A. End of life care services are in limbo after phasing out of Liverpool Care Pathway, MPs hear. BMJ 2015; 350:h386. doi: 10.1136/bmj.h386.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE2OiIzNTAvamFuMjJfNS9oMzg2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDcvMjQvMjAyMS4wNy4yMi4yMTI2MDg3NC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

2.  Seymour J, Clarke D. The Liverpool Care Pathway for the Dying Patient: a critical analysis of its rise, demise and legacy in England. Wellcome Open Research 2018; 3:15. doi: 10.12688/wellcomeopenres.13940.2.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.12688/wellcomeopenres.13940.2&link_type=DOI) 

3.  Booth R. ‘Do not resuscitate’ orders caused potentially avoidable deaths, regulator finds. The Guardian. 2020. [http://www.theguardian.com/society/2020/dec/03/do-not-resuscitate-orders-caused-potentially-avoidable-deaths-regulator-finds](http://www.theguardian.com/society/2020/dec/03/do-not-resuscitate-orders-caused-potentially-avoidable-deaths-regulator-finds) (accessed 21 Jun 2021).
    
    

4.  Fritz Z, Slowther A-M, Perkins GD. Resuscitation policy should focus on the patient, not the decision. BMJ 2017; 356:j813. doi: 10.1136/bmj.j813.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE2OiIzNTYvZmViMjhfNi9qODEzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDcvMjQvMjAyMS4wNy4yMi4yMTI2MDg3NC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

5.  Hamilton IJ. Advance care planning in general practice: promoting patient autonomy and shared decision making. Br J Gen Pract 2017; 67:104.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiYmpncCI7czo1OiJyZXNpZCI7czoxMDoiNjcvNjU2LzEwNCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzA3LzI0LzIwMjEuMDcuMjIuMjEyNjA4NzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

6.  Davies M, Couper K, Jeyes L, Slater P, Speakman J, Arolker M et al. Successful implementation of the ReSPECT (Recommended Summary Plan for Emergency Care and Treatment) process in a large UK based NHS Trust. Resuscitation 2017; 118:E95–E96.
    
    

7.  Higginson IJ, Koffman J, Hopkins P, Prentice W, Burman R, Leonard S et al. Development and evaluation of the feasibility and effects on staff, patients, and families of a new tool, the Psychosocial Assessment and Communication Evaluation (PACE), to improve communication and palliative care in intensive care and during clinical uncertainty. BMC Med 2013; 11:1–14.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1741-7015-11-1&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23281898&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F24%2F2021.07.22.21260874.atom) 

8.  Lim CT, Tadmor A, Fujisawa D, MacDonald JJ, Gallagher ER, Eusebio J et al. Qualitative Research in Palliative Care: Applications to Clinical Trials Work. J Palliat Med 2017; 20:857.
    

9.  Mistry B, Bainbridge D, Bryant D, Toyofuku ST, Seow H. What matters most for end-of-life care? Perspectives from community-based palliative care providers and administrators. BMJ Open 2015; 5:e007492.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYm1qb3BlbiI7czo1OiJyZXNpZCI7czoxMToiNS82L2UwMDc0OTIiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8yNC8yMDIxLjA3LjIyLjIxMjYwODc0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

10. Sleeman KE, Koffman J, Bristowe K, Rumble C, Burman R, Leonard S et al. ‘It doesn’t do the care for you’: a qualitative study of health care professionals’ perceptions of the benefits and harms of integrated care pathways for end of life care. BMJ Open 2015; 5:e008242.
    
    

11. Mikolov T, Chen K, Corrado G, Dean J. Efficient Estimation of Word Representations in Vector Space. 2013. [http://arxiv.org/abs/1301.3781](http://arxiv.org/abs/1301.3781) (accessed 21 Jun 2021).
    
    

12. Wittgenstein L. Philosophical Investigations: The English Text of the Third Edition. Prentice Hall, 1958.
    
    

13. Van der Maaten LG, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research 2008; 9:2579–2605.
    

14. Waudby-Smith IER, Tran N, Dubin JA, Lee J. Sentiment in nursing notes as an indicator of out-of-hospital mortality in intensive care patients. PLoS One 2018; 13:e0198687.
    
    

15. Loria S. textblob Documentation. 2020. [https://textblob.readthedocs.io/en/dev/](https://textblob.readthedocs.io/en/dev/) (accessed 21 Jun 2021).
    
    

16. Pang B, Lee L. Opinion mining and sentiment analysis. Found Trends(r) Inf Retr 2008; 2:1–135.
    
    

17. Liu B. Sentiment Analysis and Subjectivity. In: Indurkhya N, Damerau FJ (eds). Handbook of Natural Language Processing, 2nd edn. Chapman and Hall/CRC, 2010.
    
    

18. Jackson R, Kartoglu I, Stringer C, Gorrell G, Roberts A, Song X et al. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak 2018; 18:47.
    

19. Bean DM, Teo J, Wu H, Oliveira R, Patel R, Bendayan R et al. Semantic computational analysis of anticoagulation use in atrial fibrillation from real world data. PLoS One 2019; 14:e0225625.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0225625&link_type=DOI) 

20. Kraljevic Z, Searle T, Shek A, Roguski L, Noor K, Bean D et al. Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit. Artif Intell Med 2021; 117:102083.
    
    

21. Searle T, Kraljevic Z, Bendayan R, Bean D, Dobson R. MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations. 2019. doi: 10.18653/v1/d19-3024.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.18653/v1/d19-3024&link_type=DOI) 

22. Rehurek R, Sojka P. Software Framework for Topic Modelling with Large Corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. ELRA: Valletta, Malta, 2010, pp 45–50.
    
    

23. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426. 2020. [https://arxiv.org/abs/1802.03426](https://arxiv.org/abs/1802.03426) (accessed 21 Jun 2021).