1. Field of the Invention
The present invention generally relates to a document retrieval apparatus for retrieving documents including a query character string by using index keys registered for a plurality of registered documents.
2. Description of the Related Art
Conventionally, a full text search has been used as a method for document retrieval. However, since the full text search must scan all registered documents, there is a problem in that retrieval takes a very long time when the number of documents is large. To eliminate this problem, index structures and document retrieval processing methods have been improved to realize high-speed retrieval. As an index structure, a method that associates each index key with a document ID has mainly been implemented. In this method, it can be determined whether an index key is present in each registered document. However, in general, a query character string is divided into a plurality of index keys and each index key is collated with the character strings in all registered documents. Hence, search noise (over-retrieved data, that is, documents that contain the index keys but not the query character string itself) is caused. A process for eliminating the search noise is therefore required, which limits how far the retrieval speed can be improved. In order to further improve the retrieval speed, another method has recently been proposed in which the appearance location of the index key in each document is additionally included in an index table.
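The index-key-to-document-ID structure and the resulting search noise can be illustrated with a minimal sketch. The two-character index keys, the function names (bigrams, build_index, candidates) and the sample documents are assumptions made only for illustration and are not taken from any cited application.

```python
from collections import defaultdict

def bigrams(text):
    """Split text into overlapping two-character index keys (an assumed key scheme)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def build_index(docs):
    """Map each index key to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for key in bigrams(text):
            index[key].add(doc_id)
    return index

def candidates(index, query):
    """Return documents containing every index key of the query, at any position."""
    keys = bigrams(query)
    if not keys:
        return set()
    result = set(index.get(keys[0], set()))
    for key in keys[1:]:
        result &= index.get(key, set())
    return result

# Hypothetical registered documents.
docs = {1: "abcd", 2: "bc ab cd", 3: "xyz"}
index = build_index(docs)

# The query "abc" is divided into the index keys "ab" and "bc".
hits = candidates(index, "abc")

# Document 2 contains both keys but not the query string itself (search noise),
# so a separate pass over the candidate text is needed to eliminate it.
exact = {doc_id for doc_id in hits if "abc" in docs[doc_id]}
noise = hits - exact
```

In this sketch the noise-elimination pass re-reads every candidate document, which is the additional process that limits the retrieval speed of this index structure.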
For example, in Japanese Patent Laid-open Application No. 6-52222, a character string appearing at a predetermined frequency in the registered documents is stored in the index table together with its appearance locations in the registered documents. The documents including a query character string are specified by using the appearance locations of the index keys relating to the query character string.
Further, in Japanese Patent Laid-open Application No. 8-101848, information including each single character and the appearance locations thereof in the registered documents is compressed and then registered in the index table. The documents including a query character string are specified by using the appearance locations of the index keys relating to the query character string.
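How appearance locations can be used to specify the documents including a query character string may be sketched as follows. This is an illustrative sketch only: it assumes single-character index keys, omits the compression step, and uses hypothetical names (build_positional_index, search) and sample documents; it is not the implementation of either cited application.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each character to {document ID: set of appearance locations}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch][doc_id].add(pos)
    return index

def search(index, query):
    """Return documents in which the query characters appear at consecutive locations."""
    if not query:
        return set()
    postings = [index.get(ch) for ch in query]
    if any(p is None for p in postings):
        return set()  # some query character appears in no registered document
    matched = set()
    for doc_id, starts in postings[0].items():
        alive = set(starts)  # candidate start locations of the query
        for offset, posting in enumerate(postings[1:], start=1):
            positions = posting.get(doc_id, set())
            alive = {s for s in alive if s + offset in positions}
            if not alive:
                break
        if alive:
            matched.add(doc_id)
    return matched

# Hypothetical registered documents.
docs = {1: "abcd", 2: "bc ab cd", 3: "xyz"}
pos_index = build_positional_index(docs)
result = search(pos_index, "abc")
```

Because the locations are checked for adjacency, document 2 is excluded without re-reading its text. The sketch consults one list of appearance locations per query character, so the amount of work grows with the length of the query character string.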
However, the above methods have the following disadvantages: the retrieval time increases as the index keys become shorter; a query character string including short index keys cannot be properly searched for in a case where longer index keys are defined; and the retrieval time increases as the query character string becomes longer.