1. Field
The present disclosure relates to a method for hash collision detection based on the sorting unit of the bucket, and more particularly, to such a method that pre-sorts the data stored in each bucket in order to reduce subsequent search time and improve the efficiency of searches within the database.
2. Description of Related Art
As is well known, an index data structure enables fast lookup of target data among records. The most basic index data structure keeps all records sorted in sequential order. Because an index is generally smaller than the data file, searching the index is more efficient than scanning all the data sequentially. However, as the number of records increases, so does the range that must be searched to find a target record; therefore, most database and file systems use hash or tree index data structures to manage large amounts of data.
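As a minimal sketch of the sorted (sequential-order) index described above, assuming binary search over an in-memory list of keys: the names `index_keys`, `index_offsets`, `lookup`, and `sequential_lookup` are hypothetical, and the offsets merely stand in for record locations in a data file.

```python
import bisect
from typing import List, Optional

# Hypothetical sorted index: keys in ascending order, each paired with the
# byte offset of its record in the (larger) data file.
index_keys: List[int] = [3, 8, 15, 21, 42, 57]
index_offsets: List[int] = [0, 128, 256, 384, 512, 640]


def lookup(key: int) -> Optional[int]:
    """Binary search over the sorted index; returns the record offset or None."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return index_offsets[i]
    return None


def sequential_lookup(key: int) -> Optional[int]:
    """Front-to-back scan of the same index; cost grows linearly with record count."""
    for k, offset in zip(index_keys, index_offsets):
        if k == key:
            return offset
    return None


print(lookup(21))             # 384
print(lookup(22))             # None
print(sequential_lookup(42))  # 512
```

Even with binary search, the work per lookup still grows with the number of records (logarithmically), and a sequential scan grows linearly, which is the scaling pressure that leads to hash and tree indexes.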
Tree-structured indexes are not efficient for large datasets: searches can take a long time because performance is sensitive to the number, format, and location of the records stored in files, and the difference between the best case and the worst case can be large.
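The best-case/worst-case gap can be illustrated with a plain, unbalanced binary search tree, used here only as a stand-in because the text does not name a specific tree structure (balanced variants such as B-trees are designed to bound this gap). In this sketch the same 1,023 keys yield a tree of height 10 when inserted median-first and a chain of height 1,023 when inserted in sorted order; all names are hypothetical.

```python
class Node:
    """Node of a plain (unbalanced) binary search tree."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iterative BST insertion with no rebalancing; returns the (possibly new) root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def height(root):
    """Number of levels, counted level by level (avoids deep recursion)."""
    h = 0
    level = [root] if root is not None else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return h


def median_order(lo, hi, out):
    """Append keys lo..hi so that each subtree's median comes first (best case)."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    out.append(mid)
    median_order(lo, mid - 1, out)
    median_order(mid + 1, hi, out)


N = 1023
best_keys = []
median_order(0, N - 1, best_keys)

best_tree = build(best_keys)    # perfectly balanced
worst_tree = build(range(N))    # sorted input degenerates into a chain

print(height(best_tree))   # 10
print(height(worst_tree))  # 1023
```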
Hashing locates records by using the numerical value of the key, that is, by calculation rather than by comparison. Hashing is now used in many areas of computing because it can make use of the available memory space, its access time is predictable, and records are easy to insert and remove.
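A minimal sketch of computing an address by calculation, assuming the common division method `h(k) = k mod m`; the function name and the table size of 7 are illustrative assumptions, not taken from the disclosure.

```python
def division_hash(key: int, num_buckets: int) -> int:
    """Division method: the bucket address is computed directly from the key."""
    return key % num_buckets


NUM_BUCKETS = 7  # hypothetical table size; a prime is a common choice

for key in (15, 22, 40):
    print(key, "->", division_hash(key, NUM_BUCKETS))
# 15 -> 1
# 22 -> 1
# 40 -> 5
```

Keys 15 and 22 map to the same bucket, which is the kind of collision that the separate chaining method handles.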
The separate chaining method treats each bucket of the hash table as a head node, builds the index portion, which serves as the link between the storage unit of a record key and the node, from a single bucket, and maintains one linked list for each head-node bucket. Each bucket is placed independently in the storage device.
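An in-memory sketch of separate chaining, assuming the division hash: the head node of each bucket is modeled as a reference stored in a `heads` array, and each bucket keeps its own singly linked list. The class and method names are hypothetical, and the storage device is abstracted away.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChainNode:
    key: int
    value: str
    next: Optional["ChainNode"] = None


class ChainedHashTable:
    """Separate chaining: one head reference per bucket, one linked list per bucket."""

    def __init__(self, num_buckets: int = 7):
        self.heads: List[Optional[ChainNode]] = [None] * num_buckets

    def _bucket(self, key: int) -> int:
        return key % len(self.heads)  # division-method hash (assumed)

    def insert(self, key: int, value: str) -> None:
        b = self._bucket(key)
        node = self.heads[b]
        while node is not None:       # overwrite if the key is already chained
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise link a new node in front of the current head of this bucket.
        self.heads[b] = ChainNode(key, value, self.heads[b])

    def search(self, key: int) -> Optional[str]:
        node = self.heads[self._bucket(key)]
        while node is not None:       # follow links until the key is found
            if node.key == key:
                return node.value
            node = node.next
        return None

    def chain_length(self, bucket: int) -> int:
        count, node = 0, self.heads[bucket]
        while node is not None:
            count += 1
            node = node.next
        return count


table = ChainedHashTable(7)
table.insert(15, "a")
table.insert(22, "b")           # 22 % 7 == 15 % 7 == 1: a collision
print(table.search(22))         # b
print(table.chain_length(1))    # 2
```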
Reading all the slots by traversing a linked list that is extended whenever a collision occurs, so that only the needed data is stored in the storage device, is suitable for environments such as RAM (Random Access Memory), where random access is possible and very fast. The same method is difficult to apply directly to NAND flash memory because, although NAND flash memory reads faster than a hard disk, it is slower than RAM. Another problem for the separate chaining method is that the minimum read unit of NAND flash memory is a page, unlike RAM, which can be accessed in much smaller units. Repeatedly reading and writing whole pages in order to access a small record is very costly.
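A rough cost model of the page-granularity problem, under stated assumptions: a 4 KiB page, 32-byte records, one full-page read whenever the traversal moves to a different page (a one-page buffer), and read costs only (write and erase behavior is not modeled). `PAGE_SIZE`, `RECORD_SIZE`, and `chain_read_cost` are hypothetical.

```python
from typing import List, Tuple

PAGE_SIZE = 4096   # assumed NAND page size in bytes (varies by device)
RECORD_SIZE = 32   # assumed size of one chained record in bytes


def chain_read_cost(node_offsets: List[int]) -> Tuple[int, int, int]:
    """Return (page_reads, bytes_read, useful_bytes) for visiting chain nodes
    stored at the given byte offsets, with a one-page read buffer."""
    page_reads = 0
    current_page = None
    for offset in node_offsets:
        page = offset // PAGE_SIZE
        if page != current_page:      # moving to another page forces a full-page read
            page_reads += 1
            current_page = page
    bytes_read = page_reads * PAGE_SIZE
    useful_bytes = len(node_offsets) * RECORD_SIZE
    return page_reads, bytes_read, useful_bytes


scattered = chain_read_cost([0, 12288, 40960])  # nodes in pages 0, 3 and 10
packed = chain_read_cost([0, 32, 64])           # nodes share page 0
print(scattered)   # (3, 12288, 96)
print(packed)      # (1, 4096, 96)
```

Under these assumptions, a three-record chain spread over three pages reads 12,288 bytes to obtain 96 bytes of records, a 128-fold read amplification, whereas packing the same nodes into one page needs a single page read.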
Ideally, a hash table is composed of buckets and slots matched to the number of records; however, the bucket size is difficult to determine because, in an index data structure, the set of records changes continuously as records are inserted and removed.
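The sizing difficulty can be stated with the standard load factor, writing $n$ for the number of records and $m$ for the number of buckets (notation assumed here, not from the disclosure):

$$
\alpha = \frac{n}{m}, \qquad \mathbb{E}[\text{chain length}] = \alpha \quad \text{(under simple uniform hashing)}
$$

With $m$ fixed at 1,024, growing $n$ from 512 to 8,192 moves $\alpha$ from 0.5 to 8, so a bucket size that suits one workload is poorly matched to the other.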
Setting the bucket size equal to the page size of the NAND flash memory is one way to achieve the best efficiency as the number of records changes. However, this approach has the disadvantage of requiring much more storage space than necessary in a hash table where collisions occur infrequently.
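A back-of-the-envelope sketch of the space cost, assuming every bucket occupies exactly one 4 KiB page, 32-byte records (128 per page), and records spread evenly across buckets; the function name and sizes are hypothetical.

```python
PAGE_SIZE = 4096                            # assumed flash page size in bytes
RECORD_SIZE = 32                            # assumed record size in bytes
SLOTS_PER_PAGE = PAGE_SIZE // RECORD_SIZE   # 128 records fit in one page


def page_bucket_utilization(num_records: int, num_buckets: int) -> float:
    """Fraction of allocated space holding records when each bucket is one page,
    assuming records are spread evenly so that no bucket overflows its page."""
    if num_records > num_buckets * SLOTS_PER_PAGE:
        raise ValueError("records do not fit in one page per bucket")
    return (num_records * RECORD_SIZE) / (num_buckets * PAGE_SIZE)


print(page_bucket_utilization(1024, 1024))     # 0.0078125 (1/128)
print(page_bucket_utilization(131072, 1024))   # 1.0
```

With 1,024 buckets and 1,024 records (about one per bucket, so few collisions), only 1/128 of the allocated space holds records; utilization reaches 1.0 only when every page-sized bucket is full.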
Even with an ideal hash function, under which collisions rarely occur and keys are evenly distributed, setting the bucket size used in the separate chaining method to the page unit of the flash memory, which is much larger than the sector unit of a conventional hard disk, wastes considerable resources and lowers the hit rate of the memory buffer, which can reduce the efficiency of the entire hash table. For a hash in which collisions occur frequently, it is more efficient to reduce collisions by rehashing with other hash functions.
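A minimal sketch of rehashing in the sense used above, trying a different hash function whenever the current one lands on an occupied slot; this is an open-addressing variant, and the three hash functions, the table size of 7, and the class name are arbitrary illustrative choices.

```python
from typing import Callable, List, Optional

# Hypothetical hash functions, tried in order; any sequence of distinct functions works.
HASH_FNS: List[Callable[[int], int]] = [
    lambda k: k,
    lambda k: 3 * k + 1,
    lambda k: 5 * k + 2,
]


class RehashTable:
    """Fixed-size table that resolves a collision by applying the next hash function."""

    def __init__(self, size: int, hash_fns: List[Callable[[int], int]]):
        self.size = size
        self.hash_fns = hash_fns
        self.slots: List[Optional[int]] = [None] * size

    def insert(self, key: int) -> int:
        """Store key and return its slot index; raise if every function collides."""
        for h in self.hash_fns:
            i = h(key) % self.size
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i
        raise RuntimeError(f"all hash functions collided for key {key}")

    def contains(self, key: int) -> bool:
        """Probe with the same function sequence; an empty slot ends the search
        (valid because this sketch never deletes keys)."""
        for h in self.hash_fns:
            i = h(key) % self.size
            if self.slots[i] == key:
                return True
            if self.slots[i] is None:
                return False
        return False


table = RehashTable(7, HASH_FNS)
print(table.insert(15))   # 1
print(table.insert(22))   # 4  (h1 collides with 15, h2 gives 67 % 7 == 4)
print(table.insert(8))    # 0  (h1 and h2 collide, h3 gives 42 % 7 == 0)
print(table.contains(22), table.contains(29))  # True False
```

Key 22 collides with 15 under the first function and is placed by the second; key 8 collides under the first two and is placed by the third.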