1. Field of the Invention.
The present invention relates to the field of natural language understanding, and more particularly to the application of natural language understanding in the area of medical information systems. In a broader sense, the inventive method and system could be applied in any area in which there is a need for extracting conceptual information from free-text.
2. Description of Related Art
Medical information systems are designed to capture and manipulate large amounts of medical data. In most modern information systems this data takes the form of either free-text or coded data. Free-text is typically the information that is dictated by a caregiver and typed into a computer by a transcriptionist. It is frequently referred to as natural language data. Coded data is data that is typically entered in a structured way and stored according to a data dictionary and a pre-defined storage structure. Natural language documents can be shown on a computer screen or printed and are easily understood by humans who read them. However, the data is largely inaccessible to computer programs that manipulate medical information for research, medical decision making, quality assurance initiatives, and the management of medical enterprises. In contrast, data in coded form can be conveniently used in research, decision support, quality assurance, analyses done for management purposes, and in a variety of focused reports that combine information from multiple sources, but is not readily accessible to a human reader unless translation of the coded forms and special formatting has been performed.
In order to make coded data available in a setting where a large subset of the information resides in natural language documents, a technology called natural language understanding (NLU) is required. This technology allows a computer system to "read" free-text documents, convert the language in these documents to concepts, and capture these concepts in a coded form in a medical database. NLU has been a topic of interest for many years. However, it represents one of the most difficult problems in artificial intelligence. Various approaches have been tried with varied degrees of success. Most current systems are still in the research stage, and have either limited accuracy or the capability to recognize only a very limited set of concepts.
NLU systems which have been developed for use in the field of medicine include those of Sager et al. ("Natural language processing and the representation of clinical data", JAMIA, vol. 1, pp. 142-160, 1994), and Gabrielli ("Computer assisted assessment of patient care in the hospital", J. Med. Syst., vol. 12, p. 135, 1989). One approach has been to make use of regularities in speech patterns to break sentences into their grammatical parts. Many of these systems work well in elucidating the syntax of sentences, but they fall short in consistently mapping the semantics of sentences.
The concepts and ultimate database representation of the text must be derived from its semantics. Systems which rely upon the use of semantic grammars include those of Sager et al. (Medical Language Processing: Computer Management of Narrative Data, Addison-Wesley, Menlo Park, Calif., 1987) and Friedman et al. ("A general natural-language text processor for clinical radiology," JAMIA, vol. 1, pp. 161-174, 1994). Zingmond and Lenert have described a system which performs semantic encoding of x-ray abnormalities ("Monitoring free-text data using medical language processing", Comp. Biomed. Res., vol. 265, pp. 467-481, 1993).
A few systems have been developed which use a combination of semantic and syntactic techniques, e.g., Haug et al. (as described in "A Natural Language Understanding System Combining Syntactic and Semantic Techniques," Eighteenth Annual Symposium on Computer Applications in Medical Care, pp. 247-251, 1994 and "Experience with a Mixed Semantic/Syntactic Parser," Nineteenth Annual Symposium on Computer Applications in Medical Care, pp. 284-288, 1995) and Gunderson et al. ("Development and Evaluation of a Computerized Admission Diagnoses Encoding System," Comp. Biomed. Res., vol. 29, pp. 351-372, 1996).
Bayesian networks, also known as causal or belief networks, are trainable systems that have been used to apply probabilistic reasoning to a variety of problems. These networks are described in some detail in Pearl (Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, Calif., 1988) and Neapolitan (Probabilistic Reasoning in Expert Systems, Wiley, New York, N.Y., 1990).
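By way of illustration only, the probabilistic reasoning performed by such a network can be sketched for the simplest possible case: a single hidden concept node with a single observed word node as its child. The sketch below is not taken from any of the cited systems. The concept names, the words, the probability values, and the function name posterior are all hypothetical and chosen solely to show how Bayes' rule updates belief in a concept once a word is observed.

```python
# Minimal sketch of inference in a two-node belief network: Concept -> Word.
# All names and probabilities are hypothetical, for illustration only.

# Prior belief in each concept before any word is observed: P(concept).
PRIOR = {"pneumonia": 0.2, "atelectasis": 0.3, "normal": 0.5}

# Conditional probability of observing a word given the concept: P(word | concept).
LIKELIHOOD = {
    "pneumonia": {"infiltrate": 0.6, "opacity": 0.3, "clear": 0.1},
    "atelectasis": {"infiltrate": 0.2, "opacity": 0.7, "clear": 0.1},
    "normal": {"infiltrate": 0.05, "opacity": 0.05, "clear": 0.9},
}


def posterior(word):
    """Return P(concept | word) for every concept using Bayes' rule."""
    # Joint probability P(concept, word) = P(concept) * P(word | concept).
    joint = {c: PRIOR[c] * LIKELIHOOD[c][word] for c in PRIOR}
    # Probability of the evidence, P(word), obtained by summing over concepts.
    evidence = sum(joint.values())
    # Normalize so the posterior distribution sums to one.
    return {c: p / evidence for c, p in joint.items()}


if __name__ == "__main__":
    for concept, prob in posterior("infiltrate").items():
        print(f"{concept}: {prob:.3f}")
```

In a trainable network the prior and conditional probability tables would be estimated from example data rather than written by hand as they are here, and a practical network would contain many interconnected nodes rather than two.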
All of the above references are incorporated herein by reference.