The present invention relates to a natural language processing system such as a machine translation system, a query answering system or a document data base system, and more particularly to storing and retrieving information by using a natural language word as an index in such a system.
In such a system, a file which stores related information (grammars, equivalent words, meanings, references, etc.) on a number of words and allows retrieval of the related information of any given word is an essential element. The capacity and retrieval efficiency of such a file significantly influence the cost and performance of the system, and any improvement therein is very important.
In a prior art information storage/retrieval system, a record for each word including related information of the word is created using the word as a key, and the records are assembled in a file. For a given word, matching between the word and the record key is determined to retrieve the desired related information. In this system, the following problem is encountered with respect to derivatives.
A word usually has many derivatives, and the related information of those derivatives frequently includes common information. For example, there are many instances where a plurality of derivatives differ from each other only in part of speech while their descriptions of meaning are substantially identical. If, in spite of this, the common information is entered repeatedly in a plurality of records corresponding to the respective derivatives, not only must the memory capacity be increased but maintenance (correction and supplementation) of the stored information also becomes troublesome. Alternatively, an appropriate one of a family of derivatives may be selected, the common related information may be recorded only in the record corresponding to the selected derivative, and this record may be referenced by the records of the other derivatives. However, this complicates the file structure, and a long retrieval time is required when a word other than the selected derivative is to be retrieved.
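By way of illustration only, the two prior art arrangements described above may be sketched as follows. The sample words, the field names ("pos", "meaning", "see") and the function lookup are hypothetical and are not taken from any actual system; the sketch merely shows that the first arrangement stores the common information once per derivative, while the second stores it once but needs an additional key match for every derivative other than the selected one.

```python
# Illustrative sketch only; words, field names and helper are assumptions.
MEANING = "rendering of text from one language into another"

# Arrangement 1: the common information is repeated in every record.
duplicated_file = {
    "translate":   {"pos": "verb", "meaning": MEANING},
    "translation": {"pos": "noun", "meaning": MEANING},
    "translator":  {"pos": "noun", "meaning": MEANING},
}

# Arrangement 2: the common information is kept only in the record of the
# selected derivative ("translate"); the other records reference it.
referenced_file = {
    "translate":   {"pos": "verb", "meaning": MEANING},
    "translation": {"pos": "noun", "see": "translate"},
    "translator":  {"pos": "noun", "see": "translate"},
}


def lookup(file, word):
    """Return (related information, number of key matches performed)."""
    merged = dict(file[word])
    key_matches = 1
    # Follow references until the shared information has been collected.
    while "see" in merged:
        target = merged.pop("see")
        key_matches += 1
        # Fields of the derivative itself take precedence over shared ones.
        merged = {**file[target], **merged}
    return merged, key_matches


def stored_meanings(file):
    """Count how many records physically hold the common information."""
    return sum(1 for record in file.values() if "meaning" in record)
```

In this sketch, a correction to the common meaning must be applied to three records in the first arrangement, whereas in the second arrangement the retrieval of "translation" or "translator" requires two key matches instead of one.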
In certain applications of the file, it is required to retrieve the derivatives of a given word (for example, in a dictionary for generating a target language in an automated translation system, or in a key word file in a document data base system). In order to meet such a requirement, it is necessary in the prior art information retrieval system to store the derivatives as a kind of related information. As a result, the required memory capacity further increases, and the retrieval time is further extended if the related information of the derivatives is also retrieved.
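The cost of storing the derivatives as related information may likewise be sketched as follows, purely as an illustration. The sample words, the "derivatives" field and the function derivatives_with_info are hypothetical. Each record lists the other members of its family, so the list entries grow with the square of the family size, and retrieving the related information of the derivatives requires one further key match per derivative.

```python
# Illustrative sketch only; words, field names and helper are assumptions.
file_with_derivatives = {
    "translate": {
        "pos": "verb",
        "derivatives": ["translation", "translator"],
    },
    "translation": {
        "pos": "noun",
        "derivatives": ["translate", "translator"],
    },
    "translator": {
        "pos": "noun",
        "derivatives": ["translate", "translation"],
    },
}


def derivatives_with_info(file, word):
    """Return ({derivative: its record}, number of key matches performed)."""
    record = file[word]          # first key match: the given word
    key_matches = 1
    result = {}
    for derivative in record["derivatives"]:
        result[derivative] = file[derivative]   # one more match each
        key_matches += 1
    return result, key_matches


def stored_derivative_entries(file):
    """Count the derivative names stored across all records."""
    return sum(len(record["derivatives"]) for record in file.values())
```

For a family of three words, this sketch stores six derivative entries in addition to the records themselves, and three key matches are needed to obtain the related information of both derivatives of a given word.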