It is widely acknowledged that healthcare today holds an extraordinary amount of unstructured clinical data, often referred to as “the text blob”. Although there have been numerous incentives to support the adoption of electronic health records (EHRs), evidence that these systems improve the efficiency or quality of care has been scarce. Natural language processing (NLP) appears to be a promising solution to this problem.
What does Natural Language Processing mean?
Natural language processing is a field that uses computational techniques to learn, understand, and produce human language content. In particular, NLP programmes process text and extract information from it using algorithms grounded in the rules of linguistics. In a typical process, NLP breaks sentences and phrases down into words and assigns each word a part of speech, for instance noun or adjective. It should be noted that natural language processing is a lot more than a computer or program recognizing a list of words. As Mark Kumar puts it, “when it comes to data variety, a large part of the challenge lies in putting the data into the right context.”
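To make the idea of breaking a sentence into words and assigning each a part of speech concrete, here is a minimal Python sketch. It is only an illustration: the tiny LEXICON and the tag names are hypothetical, and real part-of-speech taggers are trained on annotated corpora rather than looking words up in a hand-written table.

```python
import re

# Hypothetical mini-lexicon for illustration only; real taggers learn
# word/tag associations from large annotated corpora.
LEXICON = {
    "the": "DET", "a": "DET",
    "patient": "NOUN", "fever": "NOUN", "cough": "NOUN",
    "has": "VERB", "reports": "VERB",
    "mild": "ADJ", "persistent": "ADJ",
    "and": "CONJ",
}


def tokenize(sentence):
    """Split a sentence into alphanumeric words and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", sentence)


def tag(sentence):
    """Pair each token with a part-of-speech tag from the toy lexicon."""
    tagged = []
    for token in tokenize(sentence):
        if token.lower() in LEXICON:
            label = LEXICON[token.lower()]
        elif not token.isalnum():
            label = "PUNCT"   # anything that is not a word or number
        else:
            label = "UNK"     # word not covered by the toy lexicon
        tagged.append((token, label))
    return tagged


if __name__ == "__main__":
    for token, label in tag("The patient has a mild fever."):
        print(token, label)
```

Words outside the lexicon come back as UNK, which hints at why recognizing a fixed list of words is not enough: a practical system has to use context to handle vocabulary it has never seen.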
Origins of Natural Language Processing
Beginning in the 1980s, but more widely in the 1990s, early computational approaches to language research focused on automating the analysis of the linguistic structure of language and finding ways to develop basic technologies such as machine translation, speech recognition, and speech synthesis.
The evolution of NLP began with the design of models over large quantities of empirical language data. Historically, this transformation of NLP into a big data field occurred as a result of two developments. First, researchers gained early access to linguistic data in digital form through the Linguistic Data Consortium (LDC), founded in 1992. Statistical, or corpus-based, NLP, which relies on a shared corpus (literally a ‘body’ of words), was one of the first notable successes of big data, long before the term “big data” was even introduced. The second development was the performance improvement in NLP driven by shared-task competitions, funded and organized primarily by the U.S. Department of Defense. Later, the research community itself continued to organize such events, for example the CoNLL Shared Tasks.
NLP for healthcare and electronic medical records
Cutting-edge NLP technologies have been applied to internet search engines and automatic speech recognition with substantial success. However, they are only now being adapted for other sectors, such as healthcare and its electronic medical records. The fundamental requirement for meaningful use of NLP in healthcare is high-quality documentation. Tracking trends in patient care or improving treatment decisions can only succeed with data that are first digital and second meaningful. Such information captured in EHRs would be incredibly valuable. Making sense of the large volume of less structured parts of the medical record, such as clinicians’ notes, which include free-form entries regarding a patient’s history and status, is much harder. NLP could play a key role in extracting meaning from the unstructured text that is so much a part of our clinical ecosystem, making “the text blob” meaningful and actionable.
A study by Harvey Murff, published in JAMA, validates natural language processing (NLP) technologies as a powerful tool for unlocking data (meaning) from EHRs. Murff, a physician at Vanderbilt University, and collaborators tackled the problem using natural-language processing algorithms. These algorithms incorporate certain rules of speech and language into the analysis. For example, a keyword search could retrieve all instances of the word “pneumonia”. Natural-language processing, however, also takes modifiers into account, such as “no signs of” pneumonia, which yields a more accurate count. The Murff/JAMA study sets the stage for a broad spectrum of scenarios in which we can apply intelligence technologies such as NLP to improve care quality, reimbursement and efficiency (Murff, et al. 2011).
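The difference between a plain keyword search and a search that respects negating modifiers can be sketched in a few lines of Python. This is a simplified illustration, not the method used in the Murff study: the cue list NEGATION_CUES, the max_gap window, and the sample notes are all assumptions made for the example.

```python
import re

# Hypothetical negation cues; clinical systems use much larger curated lists.
NEGATION_CUES = ["no signs of", "no evidence of", "negative for",
                 "without", "denies", "no"]


def keyword_count(notes, term):
    """Count notes that mention the term at all, negated or not."""
    return sum(1 for note in notes if term in note.lower())


def affirmed_count(notes, term, cues=NEGATION_CUES, max_gap=3):
    """Count notes with at least one mention of the term that is not
    preceded, within the same sentence, by a negation cue."""
    cue_pattern = "|".join(re.escape(cue) for cue in cues)
    # A cue, then up to max_gap intervening words, then the term itself.
    negated = re.compile(
        rf"\b(?:{cue_pattern})\b(?:\s+\w+){{0,{max_gap}}}\s+{re.escape(term)}\b"
    )
    count = 0
    for note in notes:
        # Treat periods, semicolons, and newlines as sentence boundaries.
        for sentence in re.split(r"[.;\n]+", note.lower()):
            if term in sentence and not negated.search(sentence):
                count += 1
                break  # one affirmed mention per note is enough
    return count


if __name__ == "__main__":
    sample_notes = [
        "Chest X-ray consistent with pneumonia.",
        "No signs of pneumonia on imaging.",
        "Patient denies cough. Lungs clear, without evidence of pneumonia.",
        "Pneumonia treated with antibiotics; no fever today.",
    ]
    print(keyword_count(sample_notes, "pneumonia"))
    print(affirmed_count(sample_notes, "pneumonia"))
```

On these four made-up notes the keyword search reports every note, while the negation-aware count keeps only the two in which pneumonia is actually asserted.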
More recently, significant progress in the use of NLP for EMRs came from incorporating NLP into phenotype classification algorithms. In creating EMR phenotypes, researchers relied on NLP to identify so-called concepts in narrative clinical text, for example recognizing that the terms “atrial fibrillation(s)” and “auricular fibrillation(s)” express the same concept (Pradhan, et al. 2015). According to this study, incorporating data extracted by NLP into a phenotype algorithm proved to have several advantages.
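Mapping different surface terms onto one concept can be illustrated with a short Python sketch. The concept identifier ATRIAL_FIBRILLATION and the pattern table are hypothetical stand-ins; production systems normalize text against large controlled vocabularies rather than a hand-written dictionary.

```python
import re

# Hypothetical concept table: each concept identifier maps to a pattern
# covering the surface forms that should be treated as the same concept.
CONCEPT_PATTERNS = {
    "ATRIAL_FIBRILLATION": re.compile(
        r"\b(?:atrial|auricular) fibrillations?\b", re.IGNORECASE
    ),
}


def find_concepts(text):
    """Return the set of concept identifiers mentioned in the text."""
    return {
        concept_id
        for concept_id, pattern in CONCEPT_PATTERNS.items()
        if pattern.search(text)
    }


if __name__ == "__main__":
    print(find_concepts("History of auricular fibrillation."))
    print(find_concepts("Atrial fibrillations noted on ECG."))
    print(find_concepts("Episode of ventricular fibrillation."))
```

Both spellings resolve to the same identifier, so a phenotype algorithm built on top of this step can count patients by concept rather than by the exact wording a clinician happened to use.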
Thinking long term, NLP has enormous potential to change the way we interact with computer systems. When it comes to the value of NLP technologies for healthcare, we have yet to see a systematic implementation. Yet if we make use of the increase in both computing power and storage capacity, the coming years promise a revolution: starting from the handling of electronic medical records, NLP could become more than a query tool and inform point-of-care processes, providing doctors with real-time information about a patient drawn from the medical record. Gradually, physicians would be able to include the most thorough and accurate patient information in EMRs by using NLP programs as they dictate their notes.
References
Murff, H.J., FitzHenry, F., Matheny, M.E., Gentry, N., Kotter, K.L., Crimin, K., Dittus, R.S., Rosen, A.M., Elkin, P.L., Brown, S., Speroff, T. (2011) Automated Identification of Postoperative Complications Within an Electronic Medical Record Using Natural Language Processing. JAMA, 306(8), pp. 848-855. DOI: 10.1001/jama.2011.1204.
Pradhan, S., Elhadad, N., South, B.R., Martinez, D., Christensen, L., Vogel, A., et al. (2015) Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc, 22, pp. 143-154.