Automated natural language (NL) text processing typically refers to text processing, such as text retrieval, performed by a computer capable of "reading" and "understanding" the semantics of the text. Efficient natural language processing systems can be of great benefit in performing tasks such as information retrieval. By understanding the meaning, i.e., semantics, of the text, the computer can perform a more accurate search and bring only relevant information to the attention of the requestor.
In order to perform such "intelligent" searches, the computer itself must "understand" the text. Natural language processing systems therefore typically contain tools, or software modules, to facilitate generating a representation of an understanding of the text. Particularly, when text is input to an NL system, the system not only stores the text but also generates a representation, in a computer-understandable format, of the meaning, i.e., semantics, of the text.
For generating a computer-understandable semantic representation of text, natural language processing systems include, in general and at a high level, a lexicon module and a processing module. The lexicon module is a "dictionary", or database, containing words and semantic knowledge related to each word. The processing module typically includes a plurality of analyzer modules which operate upon the input text and the lexicon module in order to process the text and generate the computer-understandable semantic representation. Particularly, the processing module generates a recoded version for each word of text, and the recoded version includes fields which represent semantic knowledge. Once this semantic knowledge of the text is generated in a computer-understandable format, a system user can use the computer, via an application program such as a search program, to perform tasks such as text retrieval.
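The relationship between the lexicon module and the processing module described above can be illustrated with a minimal sketch. The sketch is an assumption made for illustration only and is not the implementation of any particular system: the names RecodedWord, LEXICON, lookup_analyzer, and process, the field names, and the sample lexicon contents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecodedWord:
    """Hypothetical recoded version of one word of input text."""
    surface: str                                    # the word as it appeared
    semantics: dict = field(default_factory=dict)   # fields of semantic knowledge


# Hypothetical lexicon module: words mapped to related semantic knowledge.
LEXICON = {
    "bank": {"pos": "noun", "category": "institution"},
    "loan": {"pos": "noun", "category": "financial-instrument"},
}


def lookup_analyzer(word, lexicon):
    """Hypothetical analyzer module: copy the lexicon entry for the word, if any."""
    return dict(lexicon.get(word.surface, {}))


def process(text, lexicon, analyzers):
    """Hypothetical processing module: run every analyzer over every word."""
    recoded = []
    for token in text.lower().split():
        word = RecodedWord(surface=token)
        for analyzer in analyzers:
            # Each analyzer contributes fields to the recoded version.
            word.semantics.update(analyzer(word, lexicon))
        recoded.append(word)
    return recoded
```

In this sketch, a word absent from the lexicon receives a recoded version with no semantic fields, which is the kind of gap an additional analyzer module, such as a morphological analysis module, would be expected to fill.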
One analyzer module which may be utilized to operate upon input text to facilitate generation of the computer-understandable semantic representation of the text is a morphological analysis module. Morphological analysis, as used herein, refers to the derivation of lexical entries for a word based on its internal structure. The internal structure of a word refers to the lexical constituents, i.e., the root and any affixes, that make up the word.
Known morphological analysis systems utilize affix stripping methods to obtain a word root. The affixes of the word are discarded, and the lexical entry is created from the word root alone. Utilizing only the word root, however, limits the accuracy of these known morphological analysis systems. For example, identical lexical entries will be generated for two separate words having a common root, even though the two words, in their entirety, may have totally different meanings. Until now, no known morphological analysis system has utilized affix information to generate more accurate lexical entries which contain more semantic knowledge.
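The limitation of affix stripping described above can be illustrated with a minimal sketch. The sketch is an assumption made for illustration only and does not reproduce any particular known system: the suffix list, the single-entry lexicon, and the names SUFFIXES, LEXICON, strip_affixes, and lexical_entry are all hypothetical, and only suffixes are handled.

```python
# Hypothetical suffix list and root lexicon, for illustration only.
SUFFIXES = ["ation", "ness", "able", "ing", "er", "ed", "s"]

LEXICON = {
    "teach": {"pos": "verb", "sense": "instruct"},
}


def strip_affixes(word):
    """Return a known root for the word, removing at most one suffix."""
    if word in LEXICON:
        return word
    # Try longer suffixes first so that "ing" is tested before "s", etc.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            root = word[:-len(suffix)]
            if root in LEXICON:
                return root  # the stripped suffix is discarded here
    return None


def lexical_entry(word):
    """Build a lexical entry from the root alone, ignoring the affix."""
    root = strip_affixes(word)
    if root is None:
        return None
    return dict(LEXICON[root], root=root)
```

Under these assumptions, lexical_entry("teacher") and lexical_entry("teaching") return the same entry, because the suffixes "er" and "ing" are discarded before the entry is built. The fact that one word names an agent and the other an activity is lost.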