(1) Field of the Invention
The present invention generally relates to a character string retrieval system that uses an index and to a unit for making the index. More particularly, it relates to a character string retrieval system in which a character string input as a retrieval key is retrieved from a text file with reference to the index, and to a unit for making the index used in that character string retrieval system.
(2) Description of the Related Art
Conventionally, two types of character string retrieval systems have been proposed. In the first type, a character string is retrieved from characters in a text file without an index. That is, the text file is read and searched for a character string corresponding to a retrieval key. In the second type, a character string is retrieved from characters in a text file with reference to an index. That is, words which can be retrieval keys are extracted from the text file, and an index of the extracted words is made in advance. A character string corresponding to a retrieval key input by an operator is then searched for with reference to the index.
However, both the first and second types of character string retrieval systems have the following disadvantages.
In the first type, all characters in the text file must be read out, so the retrieving time increases when a text file including a large number of characters is searched. In the second type, it is difficult to choose the words which can be retrieval keys in each text file. Thus, a long time is required for making the index. Furthermore, a character string which is not included in the index cannot be retrieved.
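The two conventional types and their disadvantages can be illustrated with a minimal Python sketch. All names here (scan_search, build_word_index, index_search) are hypothetical, and the rule for choosing index words (every run of word characters) is an assumption made only for illustration; the text above does not say how such words are chosen.

```python
import re
from collections import defaultdict

def scan_search(text, key):
    """First type: search the whole text for the key, without an index."""
    positions = []
    start = text.find(key)
    while start != -1:
        positions.append(start)
        start = text.find(key, start + 1)
    return positions

def build_word_index(text):
    """Second type: make an index of extracted words in advance.
    Assumption: every run of word characters is treated as a possible key."""
    index = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        index[match.group()].append(match.start())
    return dict(index)

def index_search(index, key):
    """Look the key up in the index; keys that were not indexed are not found."""
    return index.get(key, [])

text = "index the text and search the text file"
word_index = build_word_index(text)

print(scan_search(text, "text"))         # found by reading the whole text
print(index_search(word_index, "text"))  # found through the index
print(scan_search(text, "ext f"))        # arbitrary string: found by scanning
print(index_search(word_index, "ext f")) # not an indexed word: not found
```

In this sketch the scan visits the entire text on every query, while the index answers only for strings that were extracted as words when the index was made.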
A character string retrieval system similar to the above second type is disclosed, for example, in Japanese Patent Laid Open Application No. 64-8441. This character string retrieval system is provided with an address table having entries corresponding to the characters forming a text file. In the address table, occurrences of the same character in the text file are linked using address information stored in the entries. A character string corresponding to a retrieval key is retrieved with reference to the address table.
In this system, entries for all the characters in the text file must be provided in the address table. For example, although "hiragana" (Japanese characters of a certain type) is almost never used for retrieving a character string, the address table must be provided with entries corresponding to "hiragana" characters. Thus, the address table must have a large capacity.
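An address table of this general kind might be sketched in Python as follows. This is only an illustration under stated assumptions, not the method of the cited application: the layout (one entry per character position holding the position of the next occurrence of the same character, plus a table of first positions) and the names build_address_table and table_search are hypothetical, as is the way candidates are checked against the text.

```python
def build_address_table(text):
    """One entry per character position. Entry i holds the position of the
    next occurrence of text[i], or -1 if there is none (assumed layout)."""
    next_same = [-1] * len(text)
    first = {}  # character -> position of its first occurrence
    last = {}   # character -> most recent position seen so far
    for pos, ch in enumerate(text):
        if ch in last:
            next_same[last[ch]] = pos  # link previous occurrence to this one
        else:
            first[ch] = pos
        last[ch] = pos
    return first, next_same

def table_search(text, first, next_same, key):
    """Follow the chain of the key's first character and keep the positions
    at which the whole key matches (assumed retrieval procedure)."""
    hits = []
    if not key:
        return hits
    pos = first.get(key[0], -1)
    while pos != -1:
        if text.startswith(key, pos):
            hits.append(pos)
        pos = next_same[pos]
    return hits

text = "abracadabra"
first, next_same = build_address_table(text)

print(next_same)                                   # one entry per character
print(table_search(text, first, next_same, "bra")) # positions of "bra"
```

Because the table holds one entry for every character position, its size grows with the length of the text regardless of whether a given character is ever used in a retrieval key.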