To make coded data available in a setting where a large subset of the information resides in natural language documents, a technology called natural language understanding (NLU) is required. This technology allows a computer system to “read” free-text documents, convert the language in these documents to concepts, and capture these concepts in coded form in a medical database. NLU has been a topic of interest for many years, yet it remains one of the most difficult problems in artificial intelligence. Various approaches have been tried with varying degrees of success. Most current systems are still in the research stage and have either limited accuracy or the capability to recognize only a very limited set of concepts.
NLU systems developed for use in the field of medicine include those of Sager et al. (“Natural language processing and the representation of clinical data”, JAMIA, vol. 1, pp. 142-160, 1994) and Gabrielli (“Computer assisted assessment of patient care in the hospital”, J. Med. Syst., vol. 12, p. 135, 1989). One approach has been to make use of regularities in speech patterns to break sentences into their grammatical parts. Many of these systems work well in elucidating the syntax of sentences, but they fall short in consistently mapping the semantics of sentences.
The concepts and ultimate database representation of the text may be derived from its semantics. Systems which rely upon the use of semantic grammars include those of Sager et al. (Medical Language Processing: Computer Management of Narrative Data, Addison-Wesley, Menlo Park, Calif., 1987) and Friedman et al. (“A general natural-language text processor for clinical radiology,” JAMIA, vol. 1, pp. 161-174, 1994). Zingmond and Lenert have described a system which performs semantic encoding of x-ray abnormalities (“Monitoring free-text data using medical language processing”, Comp. Biomed. Res., vol. 265, pp. 467-481, 1993).
A few systems have been developed which use a combination of semantic and syntactic techniques, e.g., Haug et al. (as described in “A Natural Language Understanding System Combining Syntactic and Semantic Techniques,” Eighteenth Annual Symposium on Computer Applications in Medical Care, pp. 247-251, 1994, and “Experience with a Mixed Semantic/Syntactic Parser,” Nineteenth Annual Symposium on Computer Applications in Medical Care, pp. 284-288, 1995) and Gunderson et al. (“Development and Evaluation of a Computerized Admission Diagnoses Encoding System,” Comp. Biomed. Res., vol. 29, pp. 351-372, 1996).
Bayesian networks, also known as causal or belief networks, are trainable systems which have been used to apply probabilistic reasoning to a variety of problems. These networks are described in some detail in Pearl (Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, Calif., 1988) and Neapolitan (Probabilistic Reasoning in Expert Systems, Wiley, New York, N.Y., 1990).
All of the above references are incorporated herein by reference.