Conventionally, when a large volume of data is managed using a tree data structure, a data structure called a B-tree is commonly used. Compared with a simple binary tree, a B-tree stores multiple data entries in a single block and consequently offers the advantage that the shape of the tree data structure changes less when data entries are added. For this reason, B-trees are often used to manage data on disks such as hard disks.
However, searching data that is managed on a disk by a tree data structure requires multiple data blocks to be read. Further, disk input/output (I/O) is generally slow compared with memory access; consequently, searching data on a disk is time-consuming.
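Why a search reads several blocks, and why packing many entries into each block helps, can be illustrated with a small calculation. The following is a minimal sketch under an idealized assumption not stated in the original: every block is full and routes a search to `fanout` children, so a tree of h levels can index up to fanout^h entries and one search reads one block per level. The function name `levels_needed` is hypothetical.

```python
def levels_needed(num_entries: int, fanout: int) -> int:
    """Idealized number of tree levels, and hence block reads, for one search.

    Assumes every block is full and routes a search to `fanout` children,
    so h levels can index up to fanout ** h entries.
    """
    if num_entries < 1 or fanout < 2:
        raise ValueError("need num_entries >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    # Add levels until the tree can index all entries.
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels


# A binary tree (fanout 2) versus a B-tree whose blocks hold about 100 entries.
print(levels_needed(1_000_000, 2), levels_needed(1_000_000, 100))  # 20 3
```

Even in this favorable model, the B-tree still needs a few block reads per search, which is the disk I/O the following paragraphs try to avoid.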
Therefore, to prevent search delays caused by disk I/O, countermeasures such as storing the tree data structure in memory have been considered. With a B-tree, however, the amount of memory required grows as the number of data entries increases. Thus, a method has also been considered in which only the frequently read portions of the tree data structure are stored in memory (a cache).
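Keeping only frequently read blocks in memory can be sketched as a fixed-size block cache. The original does not specify an eviction policy; this minimal sketch assumes least-recently-used (LRU) eviction as a common stand-in for "often read". The class `BlockCache` and the `read_block_from_disk` callable are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class BlockCache:
    """Keeps at most `capacity` tree blocks in memory (assumed LRU eviction)."""

    def __init__(self, capacity: int,
                 read_block_from_disk: Callable[[Hashable], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._read = read_block_from_disk  # hypothetical disk reader
        self._blocks: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.disk_reads = 0

    def get(self, block_id: Hashable) -> bytes:
        if block_id in self._blocks:
            # Cache hit: mark the block as most recently used; no disk I/O.
            self._blocks.move_to_end(block_id)
            return self._blocks[block_id]
        # Cache miss: read from disk and keep the block in memory.
        data = self._read(block_id)
        self.disk_reads += 1
        self._blocks[block_id] = data
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data


cache = BlockCache(2, lambda bid: f"block-{bid}".encode())
for bid in [0, 1, 0, 2, 0, 1]:  # block 0 plays the role of a hot root block
    cache.get(bid)
print(cache.disk_reads)  # 4
```

Blocks near the root are read on every search, so they tend to stay cached; the remaining misses are the disk I/O that, as noted later, cannot be eliminated entirely.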
Meanwhile, a data structure called a Bloom filter is also well known. A Bloom filter is a method of efficiently determining whether an entry belongs to a given set.
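As an illustration of the membership test described above, the following is a minimal sketch of a standard Bloom filter: a bit array of `num_bits` bits and `num_hashes` positions per entry. Deriving the positions by double hashing over a SHA-256 digest is an assumption of this sketch, not something the original specifies; the class and method names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit-array membership test: no false negatives, possible false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Double hashing: derive num_hashes positions from two 64-bit values.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add(b"block-A")
print(bf.might_contain(b"block-A"))  # True
```

Note that the filter keeps only bits, not the entries themselves or their locations, which is why it answers presence questions but cannot by itself serve as a data-management structure.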
Further, information-search technologies have been disclosed that search multiple servers using a daisy-chain technique; that estimate hit rates and read out content having a high hit rate; and that use alternative search conditions in descending order of the rate at which the data requested by the user is included in those conditions. See, for example, Japanese Laid-Open Patent Publication Nos. 2005-234759, S63-317859, and H7-302267.
As described, B-trees can handle a large volume of data, and if a suitable cache is provided, disk I/O can be reduced. However, disk I/O cannot be reduced beyond a certain point. Furthermore, if the tree data structure changes as a result of adding data entries, additional I/O for managing the tree data structure may become necessary. A Bloom filter, on the other hand, only determines whether data entries are present and therefore cannot be used as-is for data management.
In addition, the amount of memory that a Bloom filter consumes is proportional to the volume of data it can manage. Therefore, when a Bloom filter is applied to a deduplicating storage system (one that eliminates redundant data), a problem arises in that the capacity of the entire system is limited by the amount of memory that can be provided to a node.
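The proportionality can be made concrete with the standard Bloom filter sizing formula, which is general background rather than part of the original text: for n entries and a target false-positive rate p, with the optimal number of hash functions, the filter needs about m = -n ln p / (ln 2)^2 bits. This minimal sketch (the function name is hypothetical) shows that m grows linearly in n for a fixed p.

```python
import math


def bloom_filter_bytes(num_entries: int, false_positive_rate: float) -> int:
    """Bytes needed for an optimally sized Bloom filter (standard formula)."""
    if num_entries < 1 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need num_entries >= 1 and 0 < false_positive_rate < 1")
    # m = -n * ln(p) / (ln 2)^2 bits, rounded up to whole bits and bytes.
    bits = math.ceil(-num_entries * math.log(false_positive_rate)
                     / (math.log(2) ** 2))
    return (bits + 7) // 8


for n in (10**6, 10**7, 10**8):
    print(n, bloom_filter_bytes(n, 0.01))
```

At p = 0.01 this works out to roughly 9.6 bits per entry, so a node indexing 10^9 entries would need about 1.2 × 10^9 bytes of memory for the filter alone, and ten times as many entries need about ten times as much memory.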