research-article

Recognition of Patient-Related Named Entities in Noisy Tele-Health Texts

Published: 24 July 2015

Abstract

We explore methods for extracting information from clinical narratives captured by HealthLink, a public health consulting phone service. Our research applies state-of-the-art natural language processing and machine learning to these narratives to extract information of interest. The currently available data consist of dialogues recorded by nurses while consulting patients by phone. Because nurses transcribe the interviews during the phone conversations, the data contain a significant volume and variety of noise. To extract patient-related information from the noisy data, we have to remove or correct at least two kinds of noise: explicit noise, which includes spelling errors, unfinished sentences, omitted sentence delimiters, and term variants; and implicit noise, which includes non-patient information and untrustworthy information from patients. To filter explicit noise, we propose our own biomedical term detection and normalization method, which resolves misspellings, term variations, and arbitrary abbreviations introduced by nurses. To detect temporal terms, temperatures, and other types of named entities (which convey patients’ personal information such as age and sex), we propose a bootstrapping-based pattern learning process that captures a variety of arbitrary variations of named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is normalized patient information, which we visualize as a graph showing the relations between named entities. The objective of this knowledge discovery task is to identify associations between biomedical terms and to clearly expose trends in patients’ symptoms and concerns; the experimental results show that our noise reduction methods achieve reasonable performance.
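The abstract does not specify how misspelled or abbreviated terms are matched to their normalized forms. As a rough illustration only, the sketch below assumes a small hypothetical lexicon (`LEXICON`) and uses a restricted Damerau-Levenshtein distance (optimal string alignment) as a stand-in matching criterion; the function names, the lexicon entries, and the `max_dist` threshold are all assumptions made for this example and are not taken from the paper.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: edits are insertion, deletion,
    substitution, and transposition of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # adjacent transposition, e.g. "ev" <-> "ve"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical lexicon: surface form -> normalized term (illustrative only).
LEXICON = {
    "fever": "fever",
    "temp": "temperature",
    "temperature": "temperature",
    "vomiting": "vomiting",
    "diarrhoea": "diarrhea",
    "diarrhea": "diarrhea",
    "headache": "headache",
}


def normalize_term(token, lexicon=LEXICON, max_dist=2):
    """Return the normalized term for a noisy token, or None if no
    lexicon entry lies within max_dist edits (an assumed threshold)."""
    token = token.lower().strip()
    if token in lexicon:
        return lexicon[token]
    # Closest entry by edit distance; ties are broken alphabetically.
    best = min(lexicon, key=lambda entry: (osa_distance(token, entry), entry))
    if osa_distance(token, best) <= max_dist:
        return lexicon[best]
    return None
```

Under these assumptions, `normalize_term("feevr")` resolves to `"fever"` through a single transposition, and the abbreviation `"Temp"` maps to `"temperature"` by direct lookup. A real system would presumably draw its lexicon from a biomedical vocabulary and tune the threshold to term length.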

    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 4
      Regular Papers and Special Section on Intelligent Healthcare Informatics
      August 2015
      419 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/2801030
      • Editor: Yu Zheng

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 July 2015
      • Accepted: 1 July 2014
      • Revised: 1 May 2014
      • Received: 1 October 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
