1. Field of the Invention
The present invention is directed toward the field of computational linguistics, and more particularly to an automated terminology learning and classification system.
2. Art Background
In general, automated terminology learning/classification systems attempt to understand terminology input from one or more documents. One goal of learning systems is to learn terminology in documents so that when a term is encountered in a subsequent document, the meaning of the term is understood. Another goal of learning systems is to use learned terminology for query operations in a search and retrieval system. For the query application, if the meaning of a term is understood or learned, then documents that address similar themes, but express them using different terminology, may be retrieved.
Typically, the attempt to understand the meaning of terminology from documents is not done in conjunction with an external reference that provides a definition or classification for the term. A general definition or classification for a term may be defined as a lexical association. Instead of using a lexical association, the prior art learning systems attempt to learn terminology from a document as it is associated with other terms used in the same document. The association of a term to other terms does not truly define the term or associate the term with a definition, but merely identifies semantic associations. For example, a semantic association of the term "restaurant" may yield terms such as "menu", "chef", "seafood", etc. However, a lexical meaning or definition of the term "restaurant" may yield the term "dining establishments", or the like. Learning the meaning of a term solely from its association with other terms in the same document limits the usefulness and application of these prior art learning systems. Thus, it is desirable to develop a learning system that learns terminology, not by statistical association, but through use of independent criteria, such as generating lexical associations.
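The distinction between a semantic association and a lexical association may be illustrated with a minimal sketch. The sketch is illustrative only and is not the method of any particular prior art system; the toy corpus, the external reference (`LEXICON`), and the function names are hypothetical and are assumed solely for purposes of explanation.

```python
from collections import Counter

# Hypothetical toy corpus; each document is a list of lowercase terms.
DOCUMENTS = [
    ["restaurant", "menu", "chef", "seafood"],
    ["restaurant", "chef", "menu", "wine"],
    ["museum", "exhibit", "curator"],
]

# Hypothetical external reference mapping a term to a classification.
LEXICON = {
    "restaurant": "dining establishments",
    "museum": "cultural institutions",
}


def semantic_associations(term, documents, top_n=3):
    """Return terms that co-occur with `term` in the same document,
    ordered by the number of documents in which they co-occur."""
    counts = Counter()
    for tokens in documents:
        if term in tokens:
            # Count each co-occurring term once per document.
            counts.update(t for t in set(tokens) if t != term)
    return [t for t, _ in counts.most_common(top_n)]


def lexical_association(term, lexicon):
    """Return the classification of `term` from the external
    reference, or None if the reference does not contain the term."""
    return lexicon.get(term)
```

In this sketch, `semantic_associations("restaurant", DOCUMENTS)` yields only terms that appear alongside "restaurant" in the corpus, whereas `lexical_association("restaurant", LEXICON)` yields a classification that is independent of the documents processed.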
Typically, prior art learning systems implement a sequential approach to learning. For this approach, the learning systems attempt to learn a term on a document-by-document basis. For example, a learning system may associate a term with other terms used in a document to understand the meaning of that term. If the term is encountered in a subsequent document, the learning system may attempt to change or modify the previous understanding (e.g., modify the association of the term). This sequential approach tends to lead to a substantially inaccurate understanding of the term. Although a decision to learn a term based on input from a single document may appear correct based on that document, a series of these isolated decisions ultimately leads the system away from the true meaning of the term. Thus, with the sequential method, the prior art learning systems get on the wrong track in terms of understanding terminology. Therefore, it is desirable to develop a learning system that does not use a sequential approach to learning terminology.