Distributed storage systems can store vast amounts of data, often on the scale of terabytes or even petabytes. If the average record is small, such as a few dozen bytes, the number of records will be huge. There are two approaches to designing a disk storage engine. In the first approach, no in-memory indexes are created for the records; the disk is organized solely by hash, and multiple records are stored together. In the second approach, in-memory indexes are created for all records, so there is one index entry per record key. Because the number of keys is so large, the second approach often also has to store a number of small records together under one hash and maintain only one index for them. Thus, storing multiple records together under one hash is common in hash-based disk storage systems.
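The reason for grouping records under one hash can be illustrated with a small count of index entries. The sketch below is illustrative only: the function names, the MD5-based bucket function, the record sizes, and the bucket count of 256 are all hypothetical choices, not taken from any particular storage engine. It compares one in-memory index entry per key against one entry per hash bucket.

```python
import hashlib
from collections import defaultdict


def bucket_of(key: str, num_buckets: int) -> int:
    # Hypothetical bucket function: a stable hash (Python's built-in hash()
    # is salted per process for str), reduced modulo the bucket count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def group_records(records: dict, num_buckets: int) -> dict:
    # Store all records whose keys share a hash bucket together,
    # so only one index entry per bucket is needed in memory.
    buckets = defaultdict(list)
    for key, value in records.items():
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return dict(buckets)


# Hypothetical data set: 10,000 small records of 24 bytes each.
records = {f"user:{i}": b"x" * 24 for i in range(10_000)}

per_record_index_entries = len(records)  # one in-memory entry per key
buckets = group_records(records, num_buckets=256)
per_bucket_index_entries = len(buckets)  # one in-memory entry per non-empty bucket

print(per_record_index_entries, per_bucket_index_entries)
```

In this sketch the in-memory index shrinks from one entry per key to at most one entry per bucket, at the cost of several records sharing each entry.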
In such a hash-based disk storage system, there is no in-memory index for a set of records {K1, K2, K3, ..., Kn} stored together under one hash. To determine whether a particular record is in the set, all records in the set must be read through Input/Output (I/O) operations. When the requested record is not in the set, these reads are unnecessary and waste I/O resources.
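The cost of a lookup without an in-memory index can be sketched by simulating a bucket whose records are fetched one at a time and counting the reads. This is a minimal sketch under stated assumptions: the `DiskBucket` class and its methods are hypothetical, each record read is modeled as one I/O (a real engine might read a whole bucket in fewer, larger I/Os), and the scan stops at the first match, so a hit may read fewer than n records while a miss always reads all n.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DiskBucket:
    # Hypothetical model: records K1..Kn stored together under one hash,
    # with no per-record index held in memory.
    records: List[Tuple[str, bytes]] = field(default_factory=list)
    reads: int = 0  # simulated I/O counter

    def read_record(self, i: int) -> Tuple[str, bytes]:
        # Assumption: fetching one record from "disk" costs one read.
        self.reads += 1
        return self.records[i]

    def lookup(self, key: str) -> Optional[bytes]:
        # Without an index, the only way to test membership is to scan.
        for i in range(len(self.records)):
            k, v = self.read_record(i)
            if k == key:
                return v  # hit: stop scanning
        return None  # miss: every record in the bucket was read


bucket = DiskBucket([(f"K{i}", f"v{i}".encode()) for i in range(1, 9)])

bucket.lookup("K3")  # hit after reading K1, K2, K3
hit_reads = bucket.reads

bucket.reads = 0
bucket.lookup("K42")  # miss: all 8 records are read for nothing
miss_reads = bucket.reads

print(hit_reads, miss_reads)  # prints "3 8"
```

The miss case is the wasted work described above: all n reads are spent only to learn that the key is absent.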