1. Field of the Invention
The present invention generally relates to a document retrieval apparatus for retrieving documents including a query character string by using index keys registered for a plurality of registered documents.
2. Description of the Related Art
Conventionally, full text search has been used as a method for document retrieval. However, because a full text search must scan all registered documents, a huge amount of retrieval time is required when the number of documents is large. To eliminate this problem, index structures and document retrieval processing methods have been improved to realize high-speed retrieval. As an index structure, a method that associates each index key with a document ID has mainly been implemented. With this method, the presence of an index key in the registered documents can be determined. In general, however, a query character string is divided into a plurality of index keys, and each index key is collated individually with character strings in all registered documents. Hence, search noise (excess search results) is caused. A process for eliminating the search noise is therefore required, which limits how far the retrieval speed can be improved. To further improve the retrieval speed, another method has recently been proposed in which the appearance location of the index key in each document is additionally included in an index table.
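As a minimal sketch of the document-ID index described above, and of how search noise can arise from it, the following Python fragment assumes two-character index keys. The function names (`bigrams`, `build_index`, `candidates`) and the sample documents are hypothetical illustrations and are not taken from any of the cited methods.

```python
from collections import defaultdict

def bigrams(text):
    """Split text into overlapping two-character index keys (assumed key length)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def build_index(docs):
    """Map each index key to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for key in bigrams(text):
            index[key].add(doc_id)
    return index

def candidates(index, query):
    """Return documents containing every index key of the query.

    Key positions are ignored, so the result may include search noise.
    """
    keys = bigrams(query)
    if not keys:
        return set()
    result = set(index.get(keys[0], set()))
    for key in keys[1:]:
        result &= index.get(key, set())
    return result

# Hypothetical registered documents.
docs = {1: "xabcx", 2: "ab-bc"}
index = build_index(docs)

# Both documents contain the keys "ab" and "bc", so both are candidates.
hits = candidates(index, "abc")

# A separate pass over the candidate texts is needed to remove the noise.
verified = {doc_id for doc_id in hits if "abc" in docs[doc_id]}
```

Here document 2 is search noise: it contains both index keys of the query but not the query character string itself, so it is removed only by the additional verification pass.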
For example, in Japanese Patent Laid-open Application No. 6-52222, a character string appearing at a predetermined frequency in the registered documents is stored in the index table together with its appearance locations in the registered documents. The documents including a query character string are specified by using the appearance locations of the index keys relating to the query character string.
Further, in Japanese Patent Laid-open Application No. 8-101848, information including each single character and its appearance locations in the registered documents is compressed and then registered in the index table. The documents including a query character string are specified by using the appearance locations of the index keys relating to the query character string.
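The use of appearance locations can be illustrated with a small Python sketch. It assumes single-character index keys and uncompressed position lists; the compression step mentioned above is omitted, and the names `build_positional_index` and `search` are hypothetical rather than taken from the cited applications.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each character to {doc_id: [positions]} (no compression, by assumption)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch][doc_id].append(pos)
    return index

def search(index, query):
    """Return IDs of documents in which the query characters appear consecutively."""
    if not query:
        return set()
    # Possible start positions of the query, per document.
    starts = {d: set(p) for d, p in index.get(query[0], {}).items()}
    for offset, ch in enumerate(query[1:], start=1):
        postings = index.get(ch, {})
        next_starts = {}
        for doc_id, positions in starts.items():
            located = set(postings.get(doc_id, ()))
            # Keep a start position only if this character follows at the right offset.
            kept = {p for p in positions if p + offset in located}
            if kept:
                next_starts[doc_id] = kept
        starts = next_starts
    return set(starts)

# Hypothetical registered documents.
docs = {1: "xabcx", 2: "ab-bc"}
index = build_positional_index(docs)
result = search(index, "abc")
```

Because adjacency is checked against the stored positions, document 2 is excluded without rescanning its text. In this sketch every query character costs one lookup and one pass over the surviving positions, so the work grows with the length of the query character string.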
However, the above methods have the following disadvantages: the retrieval time increases when the index keys are shorter; a query character string including short index keys is not properly searched for in a case where longer index keys are defined; and the retrieval time increases when the query character string is longer.
It is a general object of the present invention to provide a document retrieval apparatus for retrieving documents in which the above-mentioned problems are eliminated.
A more specific object of the present invention is to provide a document retrieval apparatus for retrieving documents which improves a document dividing process and a retrieval condition evaluating process so as to effectively retrieve documents.
The above objects of the present invention are achieved by an apparatus for retrieving documents, the apparatus including: a document dividing part dividing each document into partial character strings as index keys; an index table maintaining the index keys and document information relating to each index key; a query character string dividing part dividing a query character string into a plurality of index keys; a retrieval condition analyzing part analyzing a retrieval condition including the index keys divided from the query character string and generating a retrieval condition tree where the index keys are synthesized by at least one operator that retrieves an intermediate retrieval result including the document information from said index table; and a retrieval condition evaluating part evaluating each intermediate retrieval result obtained by the retrieval condition tree and determining a final retrieval result.
According to the present invention, it is possible to reduce the size of a document set that may be searched for by an operation. Therefore, the document retrieval process can be effectively conducted.
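The flow from query division to evaluation of a retrieval condition tree can be sketched roughly as follows. This is only an illustration under stated assumptions: the text above does not specify the operators or the evaluation order, so the single `And` operator, the two-character key length, the smallest-result-first evaluation, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union

Index = Dict[str, Set[int]]

@dataclass
class Key:
    """Leaf node: one index key divided from the query character string."""
    text: str

@dataclass
class And:
    """Operator node (assumed): every child condition must match."""
    children: List["Node"]

Node = Union[Key, And]

def divide(text: str, n: int = 2) -> List[str]:
    """Divide a string into overlapping n-character index keys (n is assumed)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_index(docs: Dict[int, str]) -> Index:
    """Index table: index key -> document information (here, document IDs only)."""
    index: Index = {}
    for doc_id, text in docs.items():
        for key in divide(text):
            index.setdefault(key, set()).add(doc_id)
    return index

def analyze(query: str) -> Node:
    """Generate a retrieval condition tree joining the query's index keys."""
    return And([Key(k) for k in divide(query)])

def evaluate(node: Node, index: Index) -> Set[int]:
    """Evaluate the tree bottom-up; each node yields an intermediate result."""
    if isinstance(node, Key):
        return set(index.get(node.text, set()))
    # Combine the smallest intermediate results first so the working
    # document set shrinks as early as possible (an assumed strategy).
    results = sorted((evaluate(c, index) for c in node.children), key=len)
    if not results:
        return set()
    acc = results[0]
    for r in results[1:]:
        if not acc:
            break  # nothing left to narrow down
        acc = acc & r
    return acc

# Hypothetical registered documents.
docs = {1: "document retrieval", 2: "document index", 3: "retrieval tree"}
tree = analyze("retrieval")
final = evaluate(tree, build_index(docs))
```

In this sketch the index table holds only document IDs, so the final set is a set of candidate documents; richer document information, such as appearance locations, would be needed to confirm that the index keys are actually adjacent.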
The above objects of the present invention are achieved by a method for retrieving documents, the method including the steps of: (a) dividing each document into partial character strings as index keys; (b) maintaining, in an index table, the index keys and document information relating to each index key; (c) dividing a query character string into a plurality of index keys; (d) analyzing a retrieval condition including the index keys divided from the query character string and generating a retrieval condition tree where the index keys are synthesized by at least one operator that retrieves an intermediate retrieval result including the document information from said index table; and (e) evaluating each intermediate retrieval result obtained by the retrieval condition tree and determining a final retrieval result.
According to the present invention, the method can reduce the size of a document set that may be searched for by an operation. Therefore, the document retrieval process can be effectively conducted.
The above objects of the present invention are achieved by a computer-readable recording medium having program code recorded therein for causing a computer to retrieve documents, said program code comprising code for: (a) dividing each document into partial character strings as index keys; (b) maintaining, in an index table, the index keys and document information relating to each index key; (c) dividing a query character string into a plurality of index keys; (d) analyzing a retrieval condition including the index keys divided from the query character string and generating a retrieval condition tree where the index keys are synthesized by at least one operator that retrieves an intermediate retrieval result including the document information from said index table; and (e) evaluating each intermediate retrieval result obtained by the retrieval condition tree and determining a final retrieval result.
According to the present invention, a computer-readable recording medium can be provided in which the size of a document set, which may be searched for by an operation, can be reduced. Therefore, the document retrieval process can be effectively conducted.