Conventionally, when a large amount of data is managed in a tree structure, the data is often managed with a data structure called a B-tree. The B-tree stores multiple data entries in one block. Therefore, compared with a simple binary tree, the B-tree has the advantage that, even when a data entry is added, the range affected by the resulting change in the shape of the tree structure can be kept small. For this reason, B-trees are often employed as a data management method for disk media such as hard disks.
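The block-oriented behaviour described above can be illustrated with a minimal in-memory B-tree sketch. It assumes the common textbook scheme with a minimum degree `t` (the text does not fix a node capacity), in which each node stands in for one disk block and holds at most `2t - 1` keys. Inserting into a node that still has room rewrites only that node. A node is split, which touches the node, its new sibling and its parent, only when it is full. The class and method names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right, insort
from dataclasses import dataclass, field


@dataclass
class Node:
    """One B-tree node; in a disk-based tree this would be one block."""
    keys: list[int] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    """Minimal B-tree with minimum degree t: every node holds at most 2t - 1 keys."""

    def __init__(self, t: int = 3) -> None:
        self.t = t
        self.root = Node()

    def search(self, key: int) -> bool:
        # Walk down one node (one block read) per level.
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split; this is the only way the tree grows taller.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split a full child before descending so that it has room.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)  # Non-full leaf: only this node changes.

    def _split_child(self, parent: Node, i: int) -> None:
        t, full = self.t, parent.children[i]
        right = Node(keys=full.keys[t:], children=full.children[t:])
        parent.keys.insert(i, full.keys[t - 1])  # Median key moves up.
        parent.children.insert(i + 1, right)
        full.keys, full.children = full.keys[: t - 1], full.children[:t]

    def height(self) -> int:
        levels, node = 1, self.root
        while not node.leaf:
            levels, node = levels + 1, node.children[0]
        return levels

    def items(self) -> list[int]:
        out: list[int] = []

        def walk(node: Node) -> None:
            for i, key in enumerate(node.keys):
                if not node.leaf:
                    walk(node.children[i])
                out.append(key)
            if not node.leaf:
                walk(node.children[-1])

        walk(self.root)
        return out


tree = BTree(t=3)
for k in range(1, 1001):
    tree.insert(k)
print(tree.search(500), tree.search(1001))  # True False
print(tree.height())  # number of nodes (blocks) read per search
```

Because each node packs up to `2t - 1` keys, 1,000 keys with `t = 3` fit in at most six levels. A balanced binary tree needs at least ten levels for the same keys.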
However, when data managed with a tree structure is searched on a disk medium, multiple data blocks actually have to be read out. Since input/output (I/O) for a disk is typically slow compared with memory access, searching for data on the disk may take a large amount of time and processing.
For this reason, measures such as holding the tree structure in memory have recently been taken to avoid search delays caused by disk I/O. With the B-tree, however, the amount of memory used increases as the number of data entries increases. Thus, a method of storing (caching) in memory only the segments of the tree structure that are most frequently read out may also be employed.
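Caching only the most frequently read segments can be sketched as a small frequency-based block cache. The sketch assumes an LFU-style policy: read counts are kept per block, and a new block displaces the least-read cached block only if it has been read more often. It also assumes a hypothetical `load_block` callback that stands in for a disk read. The text does not prescribe a particular replacement policy.

```python
from collections import Counter
from typing import Callable, Dict, Hashable


class FrequencyBlockCache:
    """Keeps at most `capacity` blocks in memory, preferring the most frequently read ones.

    Read counts are kept for every block ever requested, so a block is admitted only
    when it has been read more often than the least-read block currently cached.
    """

    def __init__(self, capacity: int, load_block: Callable[[Hashable], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_block = load_block          # stands in for a disk read
        self.cached: Dict[Hashable, bytes] = {}
        self.reads: Counter = Counter()       # access count per block id
        self.disk_reads = 0

    def get(self, block_id: Hashable) -> bytes:
        self.reads[block_id] += 1
        if block_id in self.cached:
            return self.cached[block_id]      # memory hit, no disk I/O
        data = self.load_block(block_id)
        self.disk_reads += 1
        if len(self.cached) < self.capacity:
            self.cached[block_id] = data
        else:
            victim = min(self.cached, key=self.reads.__getitem__)
            if self.reads[victim] < self.reads[block_id]:
                del self.cached[victim]       # evict the least-read block
                self.cached[block_id] = data
        return data


cache = FrequencyBlockCache(2, load_block=lambda bid: f"block {bid}".encode())
for bid in ["root", "a", "root", "b", "root", "a", "c", "root", "a"]:
    cache.get(bid)
print(cache.disk_reads, sorted(cache.cached))  # 4 ['a', 'root']
```

The frequently read root and block `a` stay in memory. The rarely read blocks `b` and `c` cost one disk read each and are not kept.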
Recently, a data structure called a Bloom filter has also become available. The Bloom filter is a method for efficiently determining whether or not a certain entry belongs to an existing set. In addition, there is a disclosed technology for dial-pulse processing in an electronic branch exchange. In this technology, two bits, namely, a pulse speed bit and an even-numbered/odd-numbered bit, are provided for dial pulses, and group processing for capturing the bits is performed.
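The membership test performed by a Bloom filter can be sketched as follows. The sketch assumes SHA-256-based double hashing to derive the k bit positions, which is a common choice but is not prescribed by the text. The bit-array size `m_bits` and hash count `k_hashes` are illustrative parameters.

```python
import hashlib
from typing import List


class BloomFilter:
    """A bit array of m bits plus k hash functions; answers "possibly in set" or "definitely not"."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions from one SHA-256 digest (double hashing: h1 + i*h2).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(m_bits=1024, k_hashes=7)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print(bf.might_contain("beta"))   # True
print(bf.might_contain("delta"))  # usually False, but a false positive is possible
```

The filter stores only bits, not the entries themselves. It can therefore answer whether an entry is present but cannot return the entry's data.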
Further, there is a disclosed technology in which pieces of index information of data are grouped into hierarchical information and the hierarchical information is arranged in a distributed manner. There is also a disclosed database search system in which attributes of data and entry information are distributed: the data is divided, indices are created for the respective pieces of data, the attributes and the entry information are stored in a distributed manner across multiple processing apparatuses and storage devices, and nodes search for the data in parallel. There is also a disclosed storage system that checks whether or not the same data is already stored. In this system, a hash value to be used as an identifier is determined from the data, and when the same identifier is not found, the same data is regarded as not existing, and the data is therefore stored.
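The hash-identifier check in the last storage system can be sketched as a small content-addressed store. The sketch assumes SHA-256 as the hash function and a hypothetical in-memory `DedupStore` class. An absent identifier reliably means the data is not yet stored. Treating a matching identifier as "same data" relies on the hash being collision-resistant.

```python
import hashlib
from typing import Dict, Tuple


class DedupStore:
    """Stores each distinct piece of data once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> Tuple[str, bool]:
        ident = hashlib.sha256(data).hexdigest()  # hash value used as the identifier
        if ident in self.blocks:
            return ident, False    # same identifier found: treated as already stored
        self.blocks[ident] = data  # identifier not found: the data cannot be stored yet
        return ident, True


store = DedupStore()
first_id, stored = store.put(b"report contents")
second_id, stored_again = store.put(b"report contents")
print(stored, stored_again, first_id == second_id)  # True False True
```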
Further, there is a disclosed network database having multiple nodes. Each node determines bit vectors of a Bloom filter on the basis of a search query, generates at least one network message including a set of specific values, and transmits the network message to data sources. There is also a disclosed information retrieval technology in which hash values are determined for a stored lookup set, a third table (e.g., a Bloom filter) whose bit positions correspond to the hash values is created, and hash values are determined from an input value. If the encoded input value exists in the table, the input value is determined to be in the lookup set.
Since the B-tree can handle a large amount of data, as described above, the number of disk inputs/outputs can be reduced if appropriate caching is realized. However, it is difficult to reduce the number of disk inputs/outputs below a certain level. Moreover, when the tree structure changes as a result of the addition of a data entry, additional inputs/outputs for managing the tree structure may be required. The Bloom filter, on the other hand, is not directly applicable to data management, since it is designed only to check whether data entries are present. The Bloom filter also occupies memory whose size is proportional to the amount of data to be managed.
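The proportionality between a Bloom filter's memory and the amount of data it manages follows from the standard approximation for its false-positive rate p. The symbols are introduced here for illustration: n is the number of entries, m is the number of bits, and k is the number of hash functions.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2, \qquad
m = -\frac{n \ln p}{(\ln 2)^{2}} \approx 9.59\,n \ \text{bits for } p = 0.01 .
```

At a fixed target false-positive rate, the required number of bits m therefore grows linearly with n. For p = 1%, this comes to roughly 9.6 bits per entry regardless of the size of the entries themselves.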
Examples of the related art include Japanese Laid-open Patent Publication No. 2007-52698, Japanese Laid-open Patent Publication No. 04-18895, Japanese Laid-open Patent Publication No. 2001-101047, Japanese Laid-open Patent Publication No. 02-297670, Japanese Laid-open Patent Publication No. 2010-182302, Japanese National Publication of International Patent Application No. 2006-503342, and Japanese National Publication of International Patent Application No. 2007-524946.