1. Technical Field
This invention relates to data storage and retrieval systems, and, more particularly, to the employment of hashing techniques in shared memory of such systems.
2. Description of the Related Art
In real-time systems, the techniques employed to store and retrieve data must be efficient in the use of time and storage space. As the amount of data increases, efficient data storage and retrieval becomes even more important in order to satisfy the tight time constraints inherent to real-time systems.
Large amounts of related data are commonly retained for later use in well-known data structures, such as linked lists. A linked list data structure typically employs memory pointers to facilitate quick data access. In a typical linked list located in virtual (non-shared) memory, a number of records are linked together by memory pointers, and each record contains one or more data entries.
Data stored in a particular record can be retrieved by searching for a key data entry located in the desired record. Such a search involves a CPU-time-intensive string comparison of each data entry in the linked list, beginning with the first record in the list and working sequentially to the end of linked list, until the key data entry is located. In large storage and retrieval systems, such searching, even if augmented by efficient search procedures, often requires an excessive amount of time due to the large number of CPU-time-intensive string comparisons required.
Another well-known and much faster way of storing and retrieving information from computer storage, albeit at the expense of additional storage, is the so-called xe2x80x9chashingxe2x80x9d technique, also called the scatter-storage or key-transformation method. In such a system, the key is operated on by a hashing function to produce a storage address, called a xe2x80x9cbucketxe2x80x9d, in the storage space, called the xe2x80x9chash tablexe2x80x9d, which is a large one-dimensional array of record locations. A specified bucket is then accessed directly for the desired record. Hashing techniques are described in the classic text by D. E. Knuth entitled The Art of Computer Programming, Volume 3, Sorting and Searching, Addison-Wesley, Reading, Mass., 1973, pp. 506-549.
Hashing functions are designed to translate the universe of keys into buckets uniformly distributed throughout the hash table. Typical hashing functions include truncation, folding, transposition, and modulo arithmetic. A disadvantage of hashing is that more than one key will inevitably translate to the same bucket, causing xe2x80x9ccollisionsxe2x80x9d in storage. Some form of collision resolution must therefore be provided. One well-known collision resolution technique is referred to as xe2x80x9cexternal chaining.xe2x80x9d
In this scheme, each hash table bucket optimally has a pointer to the head of a linked list of records, all of whose keys translate under the hashing function to that very bucket. Each linked list is itself searched sequentially when retrieving, inserting, or deleting a record. Insertion and deletion are done by adjusting pointers in the linked list. External chaining is discussed in considerable detail in the aforementioned text by D. E. Knuth, in Data Structures and Program Design, Second Edition, by R. L. Kruse, Prentice-Hall, Incorporated, Englewood Cliffs, N.J., 1987, Section 6.5, xe2x80x9cHashing,xe2x80x9d and Section 6.6, xe2x80x9cAnalysis of Hashing,xe2x80x9d pp. 198-215, and in Data Structures with Abstract Data Types and Pascal, by D. F. Stubbs and N. W. Webre, Brooks/Cole Publishing Company, Monterey, Calif., 1985, Section 7.4, xe2x80x9cHashed Implementations,xe2x80x9d pp. 310-336.
The advantage that hash-based searching provides over linked-list-based searching is apparent. Specifically, the latter search requires a CPU-time-intensive sequential search through each data entry of each record, until locating the record containing the desired entry, a process that may require searching every record in the linked list. In contrast, a hash-based search could not require searching every record in a hash table, assuming the search request is based on a key entry that was hashed. This is because a hash table always breaks down its total number of records into smaller groups of records, set up as linked lists. And, a CPU-time-intensive sequential search through one linked list of a number of linked lists, throughout which a given number of records are distributed, will naturally be faster than searching the same number of records distributed in a single linked list.
However, a serious limitation of prior art hash-based approaches is that they partition records according to a hash calculation on only one data entry per record. Records include many data entries, but heretofore only one data entry was the key data entry from which the convenient sub-groupings of the hash table were based. Thus, in order to advantageously reduce the number of records that need to be searched to a subset of the total set of available records, previous hash-based searches must use the one key data entry that is hashed. A search based on any other data entry in the records, in effect, reverts to a traditional linked list search, requiring CPU-time-intensive sequential searching through each record until the desired data entry is found.
Another limitation flows from the fact that prior art hash-based approaches partition records according to a hash calculation on only one data entry. Specifically, searches taking advantage of the hashing scheme were fixed in scope. Search scope therefore encompassed any record containing the one key data entry in the search request, which of course, could still result in a large number of possible records, and a longer search. If it is known that a desired record includes a number of specific data entries, it would be highly advantageous to base the search on each of the known entries, in effect, narrowing the scope of the search to a more efficient level. Prior art hashing schemes fail to accommodate this desirable feature, since there is only one key data entry that is hashed upon which to efficiently base the search.
Accordingly, there is a need to provide a system and method of hash scanning which overcome the limitations of the prior art.