Recently, demand has arisen for fast text search over a large-scale document set that is frequently updated. For example, blogs and news published on the Internet or an intranet are frequently updated, and there is a demand to provide them through a search service without any time delay. Likewise, in a call center, there is a demand for fast search over a huge volume of correspondence (incident) records, including both stored records and records received immediately beforehand, without any delay in reflecting content changes.
To meet these demands, two methods are widely used for document search: one is a method in which an index is created, and the other is a string pattern matching method, which scans the text of the documents to be searched in order to judge whether or not they contain a search key.
The index method creates an index over the search targets in order to speed up the search, and a representative example is a method using an “inverted file”. The “inverted file” is an index structure that holds the words appearing in the documents together with, for each word, the sequence of document numbers of the documents containing it. Search using the “inverted file” is fast: the search throughput is several GB to several tens of GB per second on one Central Processing Unit (CPU) (e.g. a CPU whose frequency is 3 GHz), which makes the method suitable for large-scale document search. However, because implementations usually compress the large document number sequences, the update process is always difficult.
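The structure described above can be illustrated with a minimal sketch. The function names (`build_inverted_file`, `to_gaps`, `from_gaps`, `search`) and the sample documents are hypothetical, and gap (delta) encoding is assumed here only as one common way of compressing a document number sequence; it is not stated to be the compression used in any particular implementation.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each word to the ascending list of document numbers containing it."""
    postings = defaultdict(list)
    for doc_no in sorted(docs):
        for word in set(docs[doc_no].lower().split()):
            postings[word].append(doc_no)
    return dict(postings)

def to_gaps(doc_numbers):
    """Delta-encode an ascending document number sequence (assumed compression step)."""
    return [cur - prev for prev, cur in zip([0] + doc_numbers[:-1], doc_numbers)]

def from_gaps(gaps):
    """Restore the document number sequence from its gaps."""
    total, doc_numbers = 0, []
    for gap in gaps:
        total += gap
        doc_numbers.append(total)
    return doc_numbers

def search(index, *words):
    """Return the document numbers that contain every given word (AND query)."""
    sets = [set(index.get(word.lower(), [])) for word in words]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical sample documents keyed by document number.
docs = {1: "fast text search", 2: "index update delay", 5: "fast index search"}
index = build_inverted_file(docs)
hits = search(index, "fast", "index")   # only document 5 contains both words
gaps = to_gaps(index["index"])          # [2, 5] is stored as [2, 3]
```

In this sketch a lookup touches only the postings of the queried words, which is why the search is fast. With the assumed gap encoding, inserting a document number into the middle of a sequence changes the gap that follows it, which hints at one reason why in-place updates of a compressed sequence are awkward.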
On the other hand, string pattern matching (also called simply pattern matching) is a method that scans the text to be searched, without creating an index, to judge whether or not the search pattern exists in the text. When a search mechanism is implemented using string pattern matching, the search throughput is 10 to 100 MB per second on one CPU (e.g. a CPU whose frequency is 3 GHz), so the search processing is slow. However, because no index is used, the update process is completed merely by updating the search target, and is therefore easy and fast.
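A minimal sketch of this scanning approach follows. The function name `scan_search` and the sample documents are hypothetical, and Python's built-in substring test stands in for whatever pattern matching algorithm an actual implementation would use.

```python
def scan_search(docs, key):
    """Scan every document text and return the numbers of those containing key."""
    return sorted(doc_no for doc_no, text in docs.items() if key in text)

# Hypothetical sample documents keyed by document number.
docs = {1: "fast text search", 2: "index update delay"}
before = scan_search(docs, "search")

# An update only touches the search target itself; there is no index to maintain,
# so the very next scan already sees the new document.
docs[3] = "incident record search"
after = scan_search(docs, "search")
```

Every query reads all of the text, so the cost grows with the total volume of the documents, while an update costs no more than writing the document.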
Incidentally, JP-A-H08-272806 discloses a technique, for a database search system capable of executing plural search methods, that automatically judges the conditions of a search formula input by a searcher so that the search is carried out efficiently by exploiting the merits of the respective search methods. Specifically, index search means and full text search means, for example, are provided as search means for searching a database in a database storage. The database search system further has input means for inputting a search formula; dividing means for dividing the search formula into nominals; assigning means for assigning each of the nominals obtained by the dividing means, as a search key, to either the index search means or the full text search means; and operation means for carrying out a logical operation on the search results from the index search means and the full text search means based on the aforementioned search formula, and outputting the result of the logical operation to display means as the search result of the search formula. However, this publication does not consider the update of the database.
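The divide, assign, and combine flow described above might be sketched as follows. This is an illustration only and not the method of the cited publication: the function names, the restriction to AND-only formulas, and the assignment rule (a nominal found in the index goes to the index search, any other nominal goes to the full text search) are all assumptions.

```python
def index_search(index, term):
    """Index search means: look the term up in a word-to-document-numbers index."""
    return set(index.get(term, ()))

def full_text_search(docs, term):
    """Full text search means: scan every document text for the term."""
    return {doc_no for doc_no, text in docs.items() if term in text}

def hybrid_search(formula, index, docs):
    """Evaluate an AND-only search formula such as 'fast AND text search'."""
    result = None
    for term in (part.strip() for part in formula.split(" AND ")):
        # Assumed assignment rule: indexed words use the index, the rest are scanned.
        if term in index:
            hits = index_search(index, term)
        else:
            hits = full_text_search(docs, term)
        # Logical operation (AND) over the partial results.
        result = hits if result is None else result & hits
    return sorted(result or ())

# Hypothetical sample documents and a single-word index built from them.
docs = {1: "fast text search", 2: "fast index update", 3: "slow text search"}
index = {}
for doc_no, text in docs.items():
    for word in set(text.split()):
        index.setdefault(word, []).append(doc_no)

hits = hybrid_search("fast AND text search", index, docs)
```

Here "fast" is answered from the index, whereas the phrase "text search" is not an indexed word and is therefore answered by scanning; the two partial results are then intersected.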
In addition, U.S. Pat. No. 5,903,890 discloses a technique capable of carrying out a data search at high speed while effectively utilizing system resources. Specifically, a database system includes single-coupled databases, each having two coupled data columns; database drivers that respectively execute searches of them; and an interface driver that couples the search results from the database drivers. The interface driver instructs the database driver corresponding to the single-coupled database having a desired search item, and obtains the desired search results by coupling those search results. This publication also does not consider the update of the database.
Furthermore, JP-A-H01-98020 discloses a technique in which a cache index is used in addition to a base index having all key values. Update information is temporarily stored in the cache index by update means, and in response to a search request, a search that merges both indexes is carried out. When a portion corresponding to the base index exists in the cache index, the corresponding portion in the cache index is deleted, and its content is reflected in the corresponding portion of the base index. Although the update of the index is discussed, the update of the database is not considered.
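The base index and cache index arrangement can be pictured with the following minimal sketch. It is not taken from the cited publication: the class name `CachedIndex`, its methods, and the assumption that update information consists only of added document numbers are hypothetical.

```python
class CachedIndex:
    """Base index holding all key values plus a cache index for recent updates."""

    def __init__(self, base):
        self.base = {key: set(doc_nos) for key, doc_nos in base.items()}
        self.cache = {}

    def update(self, key, doc_no):
        """Store update information temporarily in the cache index only."""
        self.cache.setdefault(key, set()).add(doc_no)

    def search(self, key):
        """Answer a search request by merging the base index and the cache index."""
        return sorted(self.base.get(key, set()) | self.cache.get(key, set()))

    def reflect(self):
        """Delete each cached portion and reflect its content in the base index."""
        for key in list(self.cache):
            self.base.setdefault(key, set()).update(self.cache.pop(key))

# Hypothetical usage.
idx = CachedIndex({"fast": [1]})
idx.update("fast", 7)
merged = idx.search("fast")       # the update is visible before it reaches the base
base_before = set(idx.base["fast"])
idx.reflect()
```

Under these assumptions a search sees an update as soon as it is cached, while the larger base index is modified only when `reflect` is called.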
As for the aforementioned “inverted file”, because the document number sequences in the index are compressed, it is generally difficult to update the index while the search service is running. To deal with this problem, a method is adopted in which the index is held in duplicate: one copy is used for the search while the other is updated in the background, and when the background update is completed, the two indexes are exchanged. Although it depends on the scale of the documents to be updated, this method causes a delay of several tens of minutes between the update of the original documents and the completion of the index update.
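The duplicated-index method can be sketched as follows. The class `SwappedIndex`, the helper `build_index`, and the sample documents are hypothetical, and the background rebuild is collapsed into a single synchronous call for brevity.

```python
def build_index(docs):
    """Build a word-to-document-numbers index from scratch."""
    index = {}
    for doc_no in sorted(docs):
        for word in set(docs[doc_no].split()):
            index.setdefault(word, []).append(doc_no)
    return index

class SwappedIndex:
    """Serve searches from one index copy while another is rebuilt and exchanged."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.active = build_index(self.docs)

    def update_document(self, doc_no, text):
        """Update the original documents; the active index is now stale."""
        self.docs[doc_no] = text

    def rebuild_and_swap(self):
        """Rebuild the standby copy (in the background in practice), then exchange."""
        standby = build_index(self.docs)
        self.active = standby

    def search(self, word):
        return self.active.get(word, [])

# Hypothetical usage.
service = SwappedIndex({1: "fast search"})
service.update_document(2, "fast update")
stale = service.search("fast")    # the new document is not searchable yet
service.rebuild_and_swap()
fresh = service.search("fast")
```

The interval between `update_document` and the completion of `rebuild_and_swap` corresponds to the delay described above: until the exchange takes place, searches are answered from the old index.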
In addition, with string pattern matching, although updates are fast because there is no index to update (specifically, the delay time can in practice be regarded as zero), the search speed is slow, and when documents with a volume of several GB or more are searched, a large amount of hardware resources is required in order to hold the documents to be searched in a distributed manner. This is a serious problem.