1. Field of the Invention
The present invention is directed toward the field of knowledge bases for use in natural language processing systems, and more particularly toward integrating thesauri from disparate sources into a single knowledge base.
2. Art Background
In general, knowledge bases include information arranged to reflect ideas, concepts, or rules regarding a particular problem set. Knowledge bases have application in natural language processing systems (a.k.a. artificial linguistic or computational linguistic systems). Natural language processing knowledge bases store information about language, including how terminology relates to other terminology in that language. For example, such a knowledge base may store information that the term “buildings” is related to the term “architecture,” because there is a linguistic connection between these two terms.
Natural language processing systems use knowledge bases for a number of applications. For example, natural language processing systems use knowledge bases of terminology to classify information. One example of such a natural language processing system is described in U.S. Pat. No. 5,694,523, entitled “Content Processing System for Discourse,” issued to Kelly Wical on Dec. 2, 1997, which is expressly incorporated herein by reference. Terminological knowledge bases also have application for use in information search and retrieval systems. In this application, a knowledge base may be used to identify terms related to the query terms input by a user. One example for use of a knowledge base in an information search and retrieval system is described in U.S. patent application Ser. No. 09/095,515, entitled “Hierarchical Query Feedback in an Informative Retrieval System,” by Mohammad Faisal, filed on Jun. 10, 1998 and U.S. patent application Ser. No. 09/170,894, entitled “Ranking of Query Feedback Terms in an Information Retrieval System,” by Mohammad Faisal and James Conklin, filed on Oct. 13, 1998, both of which are incorporated herein by reference.
Natural language processing systems, including information search and retrieval systems, may be applied to domain specific applications. For example, a natural language processing system may process and classify information (e.g., documents) about medicine for a system tailored for the medical profession. For this example, a natural language processing system may compile and classify thousands of documents related to medicine. A commercially available natural language processing system may include a general knowledge base that includes terminology from a wide range of topics. However, this general knowledge base may not include specific terminology relating to a domain specific application. A user of the natural language processing system for the medical application may desire to augment the general knowledge base with terms specific to medicine. For example, the user may desire to augment the knowledge base to include terms that classify specific types of blood disorders. As illustrated by the above example, it would be impossible for a commercial developer of a knowledge base to thoroughly include all topics or domains of interest to all users. Accordingly, it is desirable to provide a means for a user to add domain or topic specific terminological information into a built-in knowledge base. It is also desirable to provide an automated means to enter the terminological information to facilitate easy use of a system, as well as to provide a seamless integration of domain specific terms and a general built-in knowledge base.
A terminological system automates the integration of terminological information into a knowledge base. The system contains a built-in knowledge base comprising a plurality of nodes, which represent terminology, arranged to depict relationships among the terminology. Input terminology information, which includes a plurality of input terms and information that specifies relationships among at least two of the input terms, is input to the terminological system. The terminological system parses the input terminology information to generate a logical structure that depicts relationships among the input terms in a format compatible with the built-in knowledge base. A determination as to whether at least one input term exists as a node in the knowledge base is made, and if there is no corresponding node, then an independent ontology comprising the logical structure is generated. If at least one input term exists as a node in the knowledge base, then the knowledge base is extended by logically coupling the logical structure to a node that matches the input term. The terminological system also resolves conflicts if an input term that matches a terminological node in the knowledge base connotes a different meaning than the terminological node.
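The decision described above (extend the knowledge base when an input term matches an existing node, otherwise keep the parsed structure as an independent ontology) may be illustrated by the following minimal sketch. It is not the implementation of the invention. The names `Node`, `walk`, `find_node`, and `integrate` are hypothetical, term matching is assumed to be a case-insensitive string comparison, only the subtree below the first matching input term is coupled, and conflict resolution for terms that match but connote a different meaning is omitted.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One terminology node; children hold narrower terminology."""
    term: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_node(root: Node, term: str) -> Optional[Node]:
    """Return the first node whose term matches, ignoring case (assumed rule)."""
    for node in walk(root):
        if node.term.lower() == term.lower():
            return node
    return None


def integrate(knowledge_base: Node, structure: Node,
              ontologies: List[Node]) -> str:
    """Couple the parsed structure to a matching node, or keep it separate."""
    for input_node in walk(structure):
        match = find_node(knowledge_base, input_node.term)
        if match is not None:
            # Extend the knowledge base below the matching node.
            match.children.extend(input_node.children)
            return "extended"
    # No input term exists in the knowledge base: independent ontology.
    ontologies.append(structure)
    return "independent"


# Hypothetical data: a tiny built-in knowledge base and a medical structure.
kb = Node("science", [Node("medicine")])
blood = Node("medicine", [Node("blood disorders", [Node("anemia")])])
independent: List[Node] = []
result = integrate(kb, blood, independent)
```

In this sketch, `result` is `"extended"` because the input term "medicine" already exists as a node, so "blood disorders" and "anemia" become reachable from the built-in knowledge base. A structure with no matching term would instead be appended to `independent`.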
In one embodiment, the input terminology information is received in an ISO 2788 format. For this embodiment, the input terminology information may include broader term and narrower term relationships between two input terms for conversion to parent-child and child-parent relationships in the built-in knowledge base. The input terminology information may also include synonym relationships between two terms for conversion to parent-child relationships between a common parent node in the knowledge base and the input terms specified as synonym relationships. Furthermore, the input terminology information may include related term (RT) relationships among at least two input terms for conversion to cross references between terms comprising a related term (RT) relationship in the input terminological information. In addition, the input terminology information may include preferred term (PT) relationships among at least two input terms for conversion to a canonical/alternate form index between terms comprising a preferred term (PT) relationship in the input terminological information.
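The conversions described for this embodiment may be sketched as follows. This is an illustration under stated assumptions, not the parser of the invention: the input is assumed to be already tokenized into hypothetical `(term, tag, other_term)` triples rather than read from an actual ISO 2788 file, the tags "SYN" and "PT" are labels chosen here for the synonym and preferred term relationships, each term is assumed to have a single parent, and the common parent for a synonym pair is assumed to be whichever parent is already recorded for either term.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (term, tag, other_term); layout is assumed


def convert(triples: Iterable[Triple]):
    """Map thesaurus relationships onto knowledge base style structures."""
    parent_of: Dict[str, str] = {}                       # child -> parent
    cross_refs: Dict[str, Set[str]] = defaultdict(set)   # symmetric RT links
    canonical_of: Dict[str, str] = {}                    # alternate -> canonical
    synonyms: List[Tuple[str, str]] = []

    for term, tag, other in triples:
        if tag == "BT":        # other is broader, so other is the parent
            parent_of[term] = other
        elif tag == "NT":      # other is narrower, so term is the parent
            parent_of[other] = term
        elif tag == "RT":      # related terms become cross references
            cross_refs[term].add(other)
            cross_refs[other].add(term)
        elif tag == "PT":      # other is the preferred (canonical) form
            canonical_of[term] = other
        elif tag == "SYN":     # resolved after all parents are known
            synonyms.append((term, other))
        else:
            raise ValueError(f"unknown relationship tag: {tag!r}")

    # Synonyms share a common parent when one of them already has a parent.
    for a, b in synonyms:
        parent = parent_of.get(a) or parent_of.get(b)
        if parent is not None:
            parent_of[a] = parent
            parent_of[b] = parent

    return parent_of, dict(cross_refs), canonical_of


# Hypothetical input terms for the medical example.
triples = [
    ("anemia", "BT", "blood disorders"),
    ("blood disorders", "NT", "hemophilia"),
    ("anaemia", "SYN", "anemia"),
    ("anemia", "RT", "iron deficiency"),
    ("RBC", "PT", "red blood cell"),
]
parent_of, cross_refs, canonical_of = convert(triples)
```

Here "anemia", "hemophilia", and the synonym "anaemia" all end up as children of "blood disorders", "anemia" and "iron deficiency" cross-reference each other, and "RBC" is indexed as an alternate form of "red blood cell". Deferring synonyms to a second pass lets a synonym pair appear in the input before the broader term relationship that supplies its parent.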