The present invention relates generally to data processing systems and, more particularly, to decreasing time to access data linked to hash tables.
As processors have become faster, memory access has become a major bottleneck to overall increased performance. To improve performance, memory caching schemes have been adopted to lessen the effect of the memory access bottleneck. Direct access tables, sorted tables, and tree structured tables are conventional techniques to reduce the blockage. Also, hashing schemes are efficient in accessing stored information. Hashing schemes are useful because they provide direct access to data by mapping a large range of values into a smaller range of values. Hashing provides an efficient method of accessing stored data. Conventional hashing mechanisms utilize a data structure known as a hash table to provide access to the stored data.
One example of a conventional hashing mechanism 10 is depicted in FIG. 1. The hashing mechanism 10 comprises a key 12, a hashing function 14, a hashing index 16, and a hash table 18. The hash table 18 contains M buckets 20 numbered 0 to Mxe2x88x921. Each bucket 20 contains data, such as a pointer. To access the hash table 18, the key 12 is input into hashing function 14 which generates an index 16 that indicates a specific bucket 20. The hashing function 14 is a very important part of the hashing mechanism. There are two principle criteria in selecting a hashing function. The hashing function should be easy and quick to compute, and it should achieve a generally even distribution of the indices across the range of indices. Numerous hashing functions meet this criteria to some extent. For example, direct, mid-square, division-remainder, folding, digit extraction, and non-numeric key hashing all meet the above criteria. With the exception of direct hashing, none of these methods use one-to-one mapping. This means that each time a new key is hashed to indicate a bucket, a collision may occur. A collision occurs when a first key and a second key both cause the hashing function to generate the same identical index. The conventional hashing mechanisms take steps to reduce the number of collisions.
A method for resolving collisions is called chaining. As compared with linear probing and secondary hashing, this method has the advantage that it can, if necessary, accommodate more pointers than there are buckets in the table 18, though with a progressive degradation of performance as the number of pointers and keys chained to each bucket grows longer and as the linear search down such chains comes to occupy a larger fraction of the running time. Chaining requires that for each of the M buckets in the hash table, there are pointers corresponding to all of the keys that have hashed to that bucket. For example, FIG. 2 depicts a hash table 22 that is subject to collisions. The hash table 22 contains M buckets (0 to Mxe2x88x921), and each bucket contains two linkages. The linkages are pointers to objects 24 and 26 that contain the first and last pointers and keys placed into the corresponding bucket. If a bucket is empty, then both the first and last linkages are nil. However, when a key is hashed into a bucket 23, such as the bucket 21, the first and last linkage values are adjusted to point to an object that contains a pointer and a copy of the corresponding key. If there is a collision, and accordingly there is more than one entry in a bucket, such as the bucket 25, then the linkages are adjusted so that two objects 24 and 26, each containing a pointer and a key, are linked to the bucket 25. Bucket 21 contains a linkage to the object 23, indicating the key K3 has been hashed into the index 3 which points to that bucket. Bucket 25 is linked to the objects 24 and 26, indicating that the keys K4 and K5 have been hashed into the index 9 which points to that bucket. When a key hashes into the index value 9, the computer searches all the linked objects associated with the given bucket 25, checking the keys, to access the proper pointer. As more collisions occur and the hash buckets become more populated, the benefits of using a hash table start to diminish. The processor must serially search the objects linked to buckets to locate the correct pointer, and this increases processor time.
The hashing function itself plays a key role in reducing the number of collisions and the length of the linkages that must be searched. A hashing function that distributes the indices relatively uniformly over the hash table reduces the probability of collisions. In the prior art, typically the index values are not uniformly distributed over their respective ranges. This non-uniformity of index values has the property of increasing the probability of collisions, thereby increasing the access time and decreasing processor performance. This undesirable property of prior art hash functions is illustrated in FIG. 7. FIG. 7 shows a hash table represented as a ring 92, with the bucket 0 positioned as the uppermost entry in the ring, and with lines radiating from the perimeter of the circle representing assigned bucket linkages. The xe2x80x9caxe2x80x9d indicates a relatively small distance between many nearby bucket linkages in the hash table. From the FIG. 7, it can be seen that the bucket linkages frequently tend to be bunched together and are non-uniformly distributed throughout the hash table. X, Y and Z indicate larger distances between groups of such object linkages, such that X, Y and Z are not equal. This nonuniform arrangement results in excess collisions and increased serial searching time. Therefore, the conventional hash mechanism must take further steps to reduce the number of collisions because of the non-uniformity of the hash function.
Briefly, the present invention comprises one embodiment for an operating system in a computer system. A virtual file management system for a computer, which generates an index to access data stored in hash table comprising: at least one device coupled to a processor, wherein each device has at least one memory unit which has been assigned an origin value; at least one allocated storage wherein each allocated storage corresponds to a particular device and each allocated storage has an identifier; a buffer hash table comprising a plurality of buckets wherein at least one bucket contains a linkage to a buffer cache; a program execution for generating an index into the buffer hash table using the identifier.
In a further aspect of the present invention, in an operating system for a computer system, a virtual file management system that allocates, whenever a mounted device requests a block of memory, an allocated storage area for each device, each allocated storage area having a unique identifier and each storage area containing device meta-data associated with a particular device, and where the operating system has at least one buffer hash array which links to a predetermined number of the allocated storage areas, a method for generating, for each the allocated storage area, an index into the buffer hash table by means of the linkages, to the corresponding allocated storage area, the method comprising the steps of: allocating an origin value for each mounted device, wherein the origin is a value representing a fractional value and the size of hash table, the fractional value has the property of being uniformly distributed over a range; operating on the allocated storage identifier with the origin value in order to produce the allocated storage unique identifier; generating an index into the buffer hash table, the index is computed by XOR the unique identifier with a block sequence number; and establishing a linkage to a bucket in the buffer hash table at the index location in the buffer hash table, if the bucket is having first been assigned a particular index value, or establish linkage chain to the correspondingly named bucket and to all other data structures having been assigned that same index value, if the bucket is the second or later bucket to be assigned that particular index.
In a further aspect of the present invention, the step of allocating an origin value includes using a fractional value multiplied by the size of the hash table, such that the fractional value is one of a series of values having the property of being uniform over a range of 0 to 1.
In a further aspect of the present invention, the step of allocating an origin value includes generating the series of value such that the series is a numeric series of fractional values of the form N/D, where N is one of a list of predetermined values and where D starts as 1 and is doubled each time a 1 is found in the list of values. This series has the property that it tends to be uniform over the range 0 to 1. Wherein, the fractional value is then multiplied by the size of the hash table to produce the origin value.
In a further aspect of the present invention, a virtual file management system that allocates, whenever a mounted device requests a block of memory, an allocated storage area for each device, each allocated storage area having a unique identifier and each allocated storage area containing device meta-data associated with a particular device, and where the operating system has at least one buffer hash table which links to a predetermined number of the allocated storage areas, a method for generating, for each the allocated storage area, an index into the buffer hash table by means of the linkages, to the corresponding allocated storage area, the method comprising the steps of: replacing at least a part of the identifier for each storage area with a fractional value chosen such that the set of all such fractional values chosen up to any given point in time hash the chosen fractional value relatively uniform distributed, numerically speaking, over the permissible universe of fractional values that may be chosen; thereby creating a modified identifier for each storage area; generating from the unique identifier for each storage area the index in the buffer hash table within the range of 0 to Lxe2x88x921, the generating function being a hashing function that generates an index whose value is within the range of 0 to Lxe2x88x921; and at the index location within the array, establishing a linkage to the correspondingly identified and allocated bucket, if the bucket is the first having been assigned that particular index value, or establishing a linkage chain to the correspondingly named and allocated bucket and to all other buckets having been assigned that same index value, if the bucket is the second or later bucket to be assigned that particular index; whereby, the number of buckets that are linked to the excessive number of allocated storage areas is minimized and the number of unlinked array locations is also minimized and accordingly, the time to access file meta-data is reduced. Further objects and advantages of the invention are apparent in the drawings and in the detailed description presented below. The features of novelty that characterize the invention are pointed out with particularities in the claims annexed to and forming apart of this specification.