Recognition of Patient-Related Named Entities in Noisy Tele-Health Texts

Authors:
Mi-Young Kim, University of Alberta, Canada
Ying Xu, University of Alberta, Canada
Osmar R. Zaiane, University of Alberta, Canada
Randy Goebel, University of Alberta, Canada

ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 4, Article No. 59, pp. 1–23. https://doi.org/10.1145/2651444

Published: 24 July 2015

Abstract

We explore methods for effectively extracting information from clinical narratives that are captured in a public health consulting phone service called HealthLink. Our research investigates the application of state-of-the-art natural language processing and machine learning to clinical narratives to extract information of interest. The currently available data consist of dialogues recorded by nurses while consulting patients by phone. Because the nurses transcribe these interviews during the phone conversations, the data include a significant volume and variety of noise. When we extract patient-related information from the noisy data, we have to remove or correct at least two kinds of noise: explicit noise, which includes spelling errors, unfinished sentences, omitted sentence delimiters, and variants of terms, and implicit noise, which includes non-patient information and untrustworthy information from patients. To filter explicit noise, we propose our own biomedical term detection/normalization method, which resolves misspellings, term variations, and arbitrary abbreviations of terms by nurses. To detect temporal terms, temperature, and other types of named entities (which show patients’ personal information such as age and sex), we propose a bootstrapping-based pattern learning process that captures the many arbitrary variations of these named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information, and we visualize the named entities by constructing a graph that shows the relations between them. The objective of this knowledge discovery task is to identify associations between biomedical terms and to clearly expose the trends in patients’ symptoms and concerns; the experimental results show that we achieve reasonable performance with our noise reduction methods.
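To make the explicit-noise step more concrete, here is a minimal sketch of dictionary-based term normalization. The abstract does not say which algorithm the authors use, so this is only an illustration: it assumes an edit-distance matcher (optimal string alignment, which counts adjacent transpositions as one edit) plus a lookup table for abbreviations. The names `VOCABULARY`, `ABBREVIATIONS`, `osa_distance`, and `normalize_term` are hypothetical and do not come from the paper.

```python
from typing import Optional

# Hypothetical controlled vocabulary; a real system would draw on a biomedical lexicon.
VOCABULARY = ["fever", "headache", "nausea", "vomiting", "diarrhea"]

# Hypothetical abbreviation table of the kind a nurse might use in notes.
ABBREVIATIONS = {"ha": "headache", "sob": "shortness of breath"}


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each as a single edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalize_term(token: str, max_distance: int = 2) -> Optional[str]:
    """Map a noisy token to a canonical term, or return None if nothing is close."""
    word = token.lower().strip(".,;:")
    if word in ABBREVIATIONS:
        return ABBREVIATIONS[word]
    best, best_distance = None, max_distance + 1
    for term in VOCABULARY:
        distance = osa_distance(word, term)
        if distance < best_distance:
            best, best_distance = term, distance
    return best


if __name__ == "__main__":
    for noisy in ["hedache", "feevr", "HA", "xyz"]:
        print(noisy, "->", normalize_term(noisy))
```

A fixed distance threshold is the simplest choice here; scaling the threshold with token length would likely reduce false matches on short words, at the cost of one more parameter to tune.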

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
ACM Transactions on Intelligent Systems and Technology Volume 6, Issue 4
Regular Papers and Special Section on Intelligent Healthcare Informatics
August 2015
419 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2801030
Editor: Yu Zheng, Microsoft Research, China
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 July 2015
- Accepted: 1 July 2014
- Revised: 1 May 2014
- Received: 1 October 2013
Author Tags
Tele-health mining
biomedical text mining
effective information retrieval
named entity recognition
Qualifiers
- research-article
- Research
- Refereed