The present invention relates to a method and system for retrieving stored document data in parallel using a plurality of computers, and more particularly, to a method for changing the number of computers that share the retrieval processing.
With the spread of personal computers and the Internet in recent years, the number of electronic documents has increased explosively, and this trend is expected to continue. Under such circumstances, user demand for high-speed searching of documents that include desired information is increasing.
As a technique for meeting this growing need, much attention is focused on full text search, which searches for documents that include a character string (hereinafter referred to as a query term) specified as a query condition.
As one such full text search technique, JP-A-8-194718 discloses a method that uses a previously generated character string index. For each character string of length n (called an n-gram) appearing in a document to be searched, the index holds the appearance positions of that n-gram together with the document identifier assigned to the document. Whether a document matches the query condition is decided by extracting the n-grams of the query term and checking, by reference to the corresponding character string index, whether those n-grams appear in the document in the same order as in the query term.
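The index-based matching described above can be illustrated with a minimal sketch. It is not the implementation of JP-A-8-194718; the function names `build_ngram_index` and `search`, the choice of n = 2, and the in-memory dictionary layout are assumptions made only for illustration.

```python
from collections import defaultdict


def build_ngram_index(docs, n=2):
    """Build a hypothetical character string index.

    Maps each n-gram to {document identifier: [appearance positions]}.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]][doc_id].append(pos)
    return index


def search(index, term, n=2):
    """Return sorted identifiers of documents containing `term`.

    Only the index is consulted; the document text is never rescanned.
    """
    if len(term) < n:
        raise ValueError("query term must be at least n characters long")
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    hits = []
    # Candidate start positions come from the first n-gram of the query term.
    for doc_id, starts in index.get(grams[0], {}).items():
        for start in starts:
            # The k-th n-gram of the term must appear at position start + k.
            if all(start + k in index.get(gram, {}).get(doc_id, ())
                   for k, gram in enumerate(grams)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {1: "full text search", 2: "text retrieval", 3: "fast searching"}
index = build_ngram_index(docs)
print(search(index, "text"))    # [1, 2]
print(search(index, "search"))  # [1, 3]
```

In this sketch, the query term "tet" matches no document even though document 2 contains both "te" and "et", because the recorded positions show that the two n-grams are not adjacent.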
According to JP-A-8-194718, retrieval processing can be carried out solely by referring to the character string index generated for the n-grams of a query term, so desired documents can be searched for at high speed regardless of the number of documents to be searched.
Thus, when the technique of JP-A-8-194718 is employed, a document including desired information can be searched for at high speed.
As a method for adding a search server as the quantity of data increases, JP-A-9-293006 discloses a database management method in which newly added data is allocated to the added search server and existing data is not moved.
In addition, JP-A-2001-142752 discloses a database management method that realizes high-speed data rearrangement by dividing data in advance into buckets, which are predetermined logical units, and managing the data in units of buckets.
Even when a document search system according to JP-A-8-194718 is introduced, however, the size of the character string index grows as the number of search target documents increases. As a result, the search speed gradually decreases.
To avoid this, a plurality of search servers are used in the document search system so as to reduce the number of documents that each search server has to search.
When a search server is added according to a database management method such as that of JP-A-9-293006, data stored in the existing search servers is not moved, so the retrieval processing time of the existing search servers is not improved. Consequently, the retrieval processing time of the entire system cannot be improved, and the purpose of adding the search server cannot be attained. In other words, to use the method of JP-A-9-293006 suitably, a search server must be added while the retrieval processing time is still acceptable, and in practice it is difficult to decide when to add a search server.
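The effect of adding a search server without moving existing data can be illustrated with a small sketch. It assumes, as a simplification, that the retrieval processing time of a search server grows with the number of documents it holds, so that the most heavily loaded server determines the retrieval processing time of the entire system. The function names `add_server_without_moving` and `largest_load` are hypothetical.

```python
def add_server_without_moving(servers, new_docs):
    """Add one search server that receives only newly added documents.

    Documents on the existing servers are left where they are.
    """
    return servers + [list(new_docs)]


def largest_load(servers):
    """Number of documents on the most heavily loaded search server."""
    return max(len(docs) for docs in servers)


# Two existing servers holding 1000 documents each (illustrative figures).
before = [list(range(1000)), list(range(1000, 2000))]
after = add_server_without_moving(before, range(2000, 2100))

print(largest_load(before))  # 1000
print(largest_load(after))   # 1000: the existing servers are no faster
```

Under this assumption, the added server only prevents further growth; it does not shorten the retrieval processing time that the existing servers already require.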
When the database management method of JP-A-2001-142752 is employed, data is allocated equally to all search servers, so the retrieval processing time of the entire system can be improved. Because the data is managed in units of predetermined buckets, however, the character string index must also be managed in units of buckets. Consequently, during retrieval processing, a first search result is generated for each bucket by referring to the character string index of that bucket, and the first search results of all the buckets on a search server are merged to generate a second search result for that search server. The second search results of all the search servers are then merged to obtain a final search result. In other words, search results must be merged a number of times that depends on the number of buckets, which disadvantageously leads to a large retrieval processing cost.
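The two-stage merging described above can be sketched as follows. This is an illustration under stated assumptions, not the method of JP-A-2001-142752: each bucket index is simplified to a dictionary that maps a query term directly to document identifiers, and the names `search_bucket`, `search_server`, and `search_all` are hypothetical. The sketch also counts how many result lists are fed into a merge, to show that the merging work grows with the number of buckets.

```python
import heapq


def search_bucket(bucket_index, term):
    """First search result: sorted document identifiers from one bucket."""
    return sorted(bucket_index.get(term, []))


def search_server(buckets, term):
    """Second search result: merge of the first results of all buckets."""
    first_results = [search_bucket(bucket, term) for bucket in buckets]
    return list(heapq.merge(*first_results)), len(first_results)


def search_all(servers, term):
    """Final search result, plus the number of result lists merged."""
    second_results = []
    merged_lists = 0
    for buckets in servers:
        result, count = search_server(buckets, term)
        second_results.append(result)
        merged_lists += count            # one list per bucket
    merged_lists += len(second_results)  # one list per search server
    return list(heapq.merge(*second_results)), merged_lists


# Two search servers holding two and three buckets respectively.
servers = [
    [{"db": [1, 9]}, {"db": [4]}],
    [{"db": [2]}, {"db": [7, 8]}, {"xml": [3]}],
]
print(search_all(servers, "db"))  # ([1, 2, 4, 7, 8, 9], 7)
```

With five buckets on two servers, seven result lists are merged for a single query, whereas a single index per search server would require merging only two.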