1. Field of the Invention
The present invention generally relates to a database management system, a full-text search device, and a full-text search method, and more particularly, to a database management system for performing a merge processing by a delayed update, and a full-text search device and method for searching for a document containing a specified character string from a plurality of document data.
This invention is applicable to a system managing a large quantity of document data, such as a document management system, an electronic library system, and a patent publication search system.
2. Description of the Related Art
In a relational database, data is represented and managed as a table. A table is a set of tuples, each of which is a list of attribute values. The contents of the table are stored in a file.
Operations on a database are classified into the following four types.
(1) Search (Retrieval) Operation
This is an operation of providing conditions concerning attribute values as search conditions so as to retrieve a set of tuples that match the conditions.
(2) Insertion Operation
This is an operation of inserting a new tuple having given attribute values into a table.
(3) Updating Operation
This is an operation of changing attribute values of a tuple selected from a table into new values.
(4) Deletion Operation
This is an operation of deleting a tuple selected from a table.
Hereinafter, the above-mentioned insertion operation, the updating operation, and the deletion operation are collectively referred to as a changing operation.
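The four operations listed above can be illustrated with a minimal in-memory sketch. The `Table` class, its method names, and the use of a predicate function as the search condition are assumptions made for this illustration only; they are not taken from any particular database system.

```python
from typing import Callable

Row = dict[str, object]  # one tuple: attribute name -> attribute value


class Table:
    """Hypothetical in-memory table: a list of tuples (rows)."""

    def __init__(self) -> None:
        self.rows: list[Row] = []

    def search(self, cond: Callable[[Row], bool]) -> list[Row]:
        # (1) Search: return every tuple that matches the condition.
        return [r for r in self.rows if cond(r)]

    def insert(self, row: Row) -> None:
        # (2) Insertion: add a new tuple with the given attribute values.
        self.rows.append(dict(row))

    def update(self, cond: Callable[[Row], bool], changes: Row) -> int:
        # (3) Updating: change attribute values of the selected tuples.
        hits = self.search(cond)
        for r in hits:
            r.update(changes)
        return len(hits)

    def delete(self, cond: Callable[[Row], bool]) -> int:
        # (4) Deletion: remove the selected tuples.
        before = len(self.rows)
        self.rows = [r for r in self.rows if not cond(r)]
        return before - len(self.rows)
```

In this sketch, `insert`, `update`, and `delete` are the changing operations; `search` is the only operation that leaves the table unmodified.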
In a system using the relational database, a response time upon performing the search operation is an important performance indicator.
Therefore, in order to shorten the search response time, there is a method of building a relational database by using an index file.
This index file includes particular structures converted from one or more attribute values so as to evaluate conditions concerning the attribute values at high speed.
On the other hand, in the changing operation, the time required for updating the index file degrades performance.
In a common form of using the index file, only the search operation is performed during normal operation, since the changing operation is requested less frequently than the search operation; a batch of changing operations is then performed at night when the system is stopped. Therefore, the performance measured by the response time to the changing operation is not regarded as important.
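The trade-off between fast search and costlier changes can be sketched as follows. This is a simplified illustration, not the structure of any real index file: the `IndexedTable` class, the single indexed attribute, and the `index_writes` counter are all assumptions introduced here to make the extra work visible.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical table with one index on the attribute named `key`."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: dict[int, dict] = {}   # row id -> tuple
        self.index = defaultdict(set)     # attribute value -> row ids
        self.next_id = 0
        self.index_writes = 0             # index modifications so far

    def insert(self, row: dict) -> int:
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = dict(row)
        self.index[row[self.key]].add(rid)  # extra work on every insertion
        self.index_writes += 1
        return rid

    def update(self, rid: int, value) -> None:
        old = self.rows[rid][self.key]
        self.index[old].discard(rid)        # remove the stale index entry
        self.index[value].add(rid)          # add the new index entry
        self.index_writes += 2
        self.rows[rid][self.key] = value

    def search(self, value) -> list[dict]:
        # The condition is evaluated by an index lookup, not a table scan.
        return [self.rows[rid] for rid in sorted(self.index.get(value, ()))]
```

Here a search touches only the index entry for the requested value, while every insertion or update must also modify the index, which is the cost the text refers to.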
However, when a real-time property is required as in an on-line system, the response time to the changing operation is regarded as being important.
To solve these problems, in a “database management system” disclosed in Japanese Laid-Open Patent Application No. 10-143412, a write to a database is temporarily retained in a nonvolatile memory before being reflected on a magnetic disk, and the corresponding data is referred to by using the nonvolatile memory, in place of the magnetic disk, as a disk cache.
However, since only data having a simple structure can be retained in the disk cache, there is a problem that a highly functional index file cannot be used.
Besides, in a “database management method and device, and a machine-readable recording medium recording a program thereof” disclosed in Japanese Laid-Open Patent Application No. 2000-163294, a reference and an updating to a database on a secondary storage are performed on data buffers provided on a primary storage, and an updated page is reflected to the database asynchronously with a processing by an application program, thereby performing a delayed updating process using only one set of data buffers; this reduces a required capacity of the main memory.
However, since only data having a simple structure can be retained in this data buffer, there is a problem that a highly functional index file cannot be used, as in Japanese Laid-Open Patent Application No. 10-143412.
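The general idea of delayed updating through a buffer can be sketched as below. This is a generic write-back buffer written for illustration; it does not reproduce the method of either cited application. The `BufferedStore` class, the dictionary standing in for the secondary storage, and the explicit `flush` call (which a real system would run asynchronously) are assumptions.

```python
class BufferedStore:
    """Hypothetical write-back buffer in front of a slower page store."""

    def __init__(self, disk: dict[int, bytes]) -> None:
        self.disk = disk                    # stands in for secondary storage
        self.buffer: dict[int, bytes] = {}  # pages held in main memory
        self.dirty: set[int] = set()        # pages not yet written back

    def read(self, page: int) -> bytes:
        # References are served from the buffer, loading the page on a miss.
        if page not in self.buffer:
            self.buffer[page] = self.disk[page]
        return self.buffer[page]

    def write(self, page: int, data: bytes) -> None:
        # Updates touch only the buffer; the disk write is deferred.
        self.buffer[page] = data
        self.dirty.add(page)

    def flush(self) -> int:
        # Delayed update: reflect every dirty page on the disk.
        count = len(self.dirty)
        for page in self.dirty:
            self.disk[page] = self.buffer[page]
        self.dirty.clear()
        return count
```

The buffer holds only flat page images, which mirrors the limitation noted above: nothing in such a buffer understands the internal structure of a more elaborate index file.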
When a plurality of users simultaneously use a database system, the search operation and the changing operation are requested asynchronously. In this case, transaction processing is used so as to maintain the consistency of data. Transaction processing is explained in detail in (1) “‘Principles of Transaction Processing’ Philip A Bernstein, Eric Newcomer; Nikkei Business Publications, Inc.”.
Completely isolating transactions guarantees the consistency of data at any point in time. However, the resulting reduction in concurrency may considerably reduce the overall throughput. In order to solve this problem, the concept of an isolation level is used. The isolation level is explained in detail in (2) “‘A Critique of ANSI SQL Isolation Levels’ Hal Berenson, Philip A Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil Proc. ACM SIGMOD Conf. (June 1995) p. 1-10”.
In order to explain the above-described problems of the conventional technology, consider a merging operation that combines a plurality of divided data retaining parts (inverted files) used for full-text search.
A merging operation is started when the amount of data in an inverted file to be merged reaches a threshold value. There are two types of merging operation: one is a synchronous merge, which performs the merging as an operation (a foreground operation) in the same sequence as an insertion operation to the inverted file; the other is an asynchronous merge, which performs the merging as an operation (a background operation) separate from the insertion operation.
In the asynchronous merge, in order to perform an insertion correctly during merging, the inverted file to be merged has to be processed exclusively. Therefore, during the period of the exclusive processing, the merging operation and the insertion operation each have to wait for the other, as a result of which the response of the insertion operation deteriorates.
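A synchronous, threshold-triggered merge might look like the following sketch. The two-file layout (a small inverted file receiving insertions and a main inverted file receiving merges), the class name `SyncMergeIndex`, and the use of a posting count as the measure of "amount of data" are assumptions chosen for illustration.

```python
from collections import defaultdict

Postings = dict[str, list[int]]  # term -> sorted document ids


def merge_into(main: Postings, small: Postings) -> None:
    """Fold every posting list of `small` into `main`, then empty `small`."""
    for term, doc_ids in small.items():
        main[term] = sorted(set(main.get(term, [])) | set(doc_ids))
    small.clear()


class SyncMergeIndex:
    """Hypothetical index that merges in the foreground of insert()."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.main: Postings = {}
        self.small: Postings = defaultdict(list)
        self.small_size = 0  # number of postings held in the small file
        self.merges = 0

    def insert(self, doc_id: int, terms: list[str]) -> None:
        for term in set(terms):
            self.small[term].append(doc_id)
            self.small_size += 1
        # Synchronous merge: the caller of insert() waits for the merge.
        if self.small_size >= self.threshold:
            merge_into(self.main, self.small)
            self.small_size = 0
            self.merges += 1
```

Because the merge runs inside `insert`, the insertion that crosses the threshold pays the whole merge cost in its own response time.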
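The contention described here can be sketched with a background thread and a single lock. The `AsyncMergeIndex` class and its one-lock design are assumptions for illustration; a real system may use finer-grained exclusion.

```python
import threading


class AsyncMergeIndex:
    """Hypothetical index whose merge runs on a background thread."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # exclusive access to `small`
        self.main: dict[str, set[int]] = {}
        self.small: dict[str, set[int]] = {}

    def insert(self, doc_id: int, terms: list[str]) -> None:
        # Blocks for as long as a running merge holds the lock.
        with self.lock:
            for term in terms:
                self.small.setdefault(term, set()).add(doc_id)

    def merge(self) -> None:
        # Holds the lock for the whole merge so no insertion is lost.
        with self.lock:
            for term, ids in self.small.items():
                self.main.setdefault(term, set()).update(ids)
            self.small.clear()

    def merge_in_background(self) -> threading.Thread:
        worker = threading.Thread(target=self.merge)
        worker.start()
        return worker
```

An insertion issued while `merge` holds the lock cannot proceed until the merge finishes, which is the source of the degraded insertion response; conversely, the merge cannot start while an insertion holds the lock.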
In recent years, moreover, as information communication technology has developed, electronic documents and information about the documents have come to be distributed in large quantities via the Internet etc. Accordingly, document search devices for finding a desired document accurately and at high speed have been proposed.
In such a document search device, a keyword search method or a full-text search method is used. A full-text search device using the full-text search method compares any given search character string with all documents to be searched so as to extract all documents containing the search character string. Thus, unlike the keyword search method, the full-text search method does not require a large amount of manpower for assigning keywords beforehand to all the documents to be searched. Various types of full-text search device have been proposed, one type of which is a device adopting an inverted (index) file method. In the inverted-file method, an inverted file that records which documents contain each character, word, or n-gram (a sequence of n consecutive characters), or that records the appearance positions thereof in the documents, is built beforehand as an auxiliary file for searching; upon a full-text search, the search is performed by using only the inverted file. Thus, the inverted-file method enables a considerably high-speed search, and is effective for a system that requires a high-speed search of a large quantity of documents.
The full-text search method in general and the inverted-file method in particular are described in “Information Retrieval Algorithm” (Kenji Kitasato, Kazuhiko Tsuda, Masami Shishibori; Kyoritsu Shuppan Co., Ltd.; pp. 160-179), “Description of the Related Art” of Japanese Laid-Open Patent Application No. 11-073429, and 1998 Activity Report of Full-Text Search System Conference, etc., and are well-known; therefore, an explanation thereof is omitted.
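As an illustration of the inverted-file method, the sketch below builds an n-gram inverted file that records appearance positions and then answers a substring query from that file alone. The function names, the default of n = 2, and the in-memory dictionary layout are assumptions for this example.

```python
from collections import defaultdict


def build_inverted_file(docs: dict[int, str], n: int = 2):
    """Map each n-gram to {document id: [start positions]}."""
    inv = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            inv[text[pos:pos + n]][doc_id].append(pos)
    return inv


def search(inv, query: str, n: int = 2) -> list[int]:
    """Return ids of documents containing `query`, using only `inv`."""
    if len(query) < n:
        raise ValueError("queries shorter than n are not handled here")
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    if any(g not in inv for g in grams):
        return []
    hits = []
    for doc_id, starts in inv[grams[0]].items():
        # The query occurs at position s when its i-th n-gram
        # occurs at position s + i for every i.
        if any(
            all(s + i in inv[g].get(doc_id, ()) for i, g in enumerate(grams))
            for s in starts
        ):
            hits.append(doc_id)
    return sorted(hits)
```

For example, with `docs = {1: "full-text search", 2: "keyword search"}`, calling `search(build_inverted_file(docs), "search")` returns both document ids without rereading the document text. The sketch also makes the storage cost visible: every character position produces one posting, so the inverted file is larger than the text it indexes.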
As a conventional technology adopting the inverted-file method, Japanese Patent No. 3024544 (Japanese Laid-Open Patent Application No. 9-265420) describes an information search device that stores real-time processing data apart from a search index file so that a search processing can be performed even while the search index file is being updated. In addition, Japanese Laid-Open Patent Application No. 7-146880 describes a document search device and method that register a new document in a subindex smaller than a main index, thereby shortening the registration time.
However, in the inverted-file method, including those described in the above-described Laid-Open Patent Applications, it is commonly necessary to build an inverted file several times as large as the original data. Accordingly, as the amount of registered document data increases, a full-text index of the inverted-file method requires a longer time for a registration/deletion processing. Thus, from a user's point of view, the response time of the registration/deletion processing of the full-text search device becomes long.