Information or data stored in a computer-controlled storage mechanism can be retrieved by searching for a particular key in the stored records. Records with a stored key matching the search key are then retrieved. Such searching techniques require repeated accesses or probes into the storage mechanism to perform the key comparisons. In large storage and retrieval systems, such searching, even if augmented by efficient search algorithms such as a binary search, often requires an excessive amount of time.
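To make the cost of key-comparison searching concrete, the following minimal sketch counts the probes a binary search makes over a sorted list of keys. It assumes the stored keys are held in sorted order and treats each access to a middle key as one probe; the function name is hypothetical. Even this efficient search needs on the order of log2(n) probes per lookup, about 20 for a million records.

```python
def binary_search(sorted_keys, target):
    """Search a sorted list for target.

    Returns (index, probes): index is None if target is absent, and probes
    counts how many stored keys were accessed for comparison.
    """
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one access into storage to compare against the middle key
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# One million even keys; every lookup needs at most 20 probes.
keys = list(range(0, 2_000_000, 2))
print(binary_search(keys, 1_234_568))
print(binary_search(keys, 7))  # odd key: not stored
```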
Another well-known and much faster method for storing and retrieving information from computer storage involves the use of so-called "hashing" techniques. These techniques are also sometimes called scatter-storage or key-transformation techniques. In a system using hashing, the key is operated upon by a hashing function to produce a storage address in the storage space, called the hash table. This storage address is then used to access the desired storage location directly, with fewer storage accesses or probes than sequential or binary searches require. Hashing techniques are described in the classic text by D. Knuth entitled The Art of Computer Programming, Volume 3, Sorting and Searching, pp. 506-549, Addison-Wesley, Reading, Massachusetts, 1973.
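A minimal sketch of the key-to-address idea: a string key is converted to an integer and reduced modulo the table size, and that address is used to access the storage location directly. The table size of 11, the multiplier 31, and all names are hypothetical choices for illustration, and collisions are deliberately ignored here (a second key hashing to the same address would simply overwrite the first).

```python
TABLE_SIZE = 11  # hypothetical table size; primes are a common choice for modulo hashing


def hash_address(key: str, table_size: int = TABLE_SIZE) -> int:
    """Transform a string key into a storage address in range(table_size)."""
    value = 0
    for ch in key:
        # Combine characters into one integer (a polynomial string hash,
        # chosen here only for illustration).
        value = value * 31 + ord(ch)
    return value % table_size  # modulo arithmetic yields the address


hash_table = [None] * TABLE_SIZE


def store(key: str, record) -> None:
    # One probe: the address is computed, not searched for.
    # Collisions are NOT handled in this sketch.
    hash_table[hash_address(key)] = (key, record)


def retrieve(key: str):
    entry = hash_table[hash_address(key)]  # one probe
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


store("smith", {"dept": "sales"})
print(retrieve("smith"))
print(retrieve("jones"))  # not stored
```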
Hashing functions are designed to translate the universe of keys into addresses uniformly distributed throughout the hash table. Typical hashing operations include truncation, folding, transposition and modulo arithmetic. A disadvantage of hashing techniques is that more than one key can translate into the same storage address, causing "collisions" in storage or retrieval operations. Some form of collision-resolution strategy (sometimes called "rehashing") must therefore be provided. For example, the simple strategy of searching forward, one location at a time, from the initial storage address until the desired storage location is found will resolve the collision. This technique is called linear probing. If the hash table is treated as circular, so that addresses beyond the end of the table map back to the beginning of the table, then the linear probing is done with "open addressing," i.e., with the entire hash table serving as overflow space in the event that a collision occurs.
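Linear probing with open addressing, as described above, can be sketched as follows. This is a minimal illustration, assuming non-negative integer keys and `key % size` as a hypothetical hashing function (so collisions are easy to construct); the class and method names are also hypothetical. A search stops either at the matching key or at the first empty location, and the address wraps from the end of the table back to the beginning.

```python
class LinearProbingTable:
    """Open-addressing hash table with linear probing (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # None marks an empty location

    def home(self, key):
        # Hypothetical hashing function: modulo arithmetic on an integer key.
        return key % self.size

    def insert(self, key, record):
        """Store the record and return the address it landed in."""
        addr = self.home(key)
        for _ in range(self.size):
            slot = self.slots[addr]
            if slot is None or slot[0] == key:
                self.slots[addr] = (key, record)
                return addr
            addr = (addr + 1) % self.size  # collision: probe the next location, wrapping around
        raise OverflowError("hash table is full")

    def search(self, key):
        addr = self.home(key)
        for _ in range(self.size):
            slot = self.slots[addr]
            if slot is None:
                return None  # an empty location ends the chain: key is absent
            if slot[0] == key:
                return slot[1]
            addr = (addr + 1) % self.size
        return None


table = LinearProbingTable(7)
table.insert(3, "a")   # home address 3
table.insert(10, "b")  # 10 % 7 == 3 collides, lands in 4
table.insert(6, "c")   # home address 6
table.insert(13, "d")  # 13 % 7 == 6 collides, wraps to 0
print(table.search(13), table.search(17))
```

Keys 3 and 10 now form a chain starting at address 3, and key 13 shows the wrap-around that makes the whole table usable as overflow space.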
Removing or deleting records from a hash table can also be a complicated procedure. The location of a record to be deleted cannot simply be emptied, since this location may be a link in a chain of locations previously created during a collision-resolution procedure; emptying it would cause later searches to stop short of records further along the chain. The typical solution to this problem is to mark the record as "deleted" rather than as "empty." Over time, however, the storage space can become contaminated by an excessive number of deleted storage locations that must be searched to locate desired records, and such contamination can reduce the performance of retrieval operations below acceptable levels. These problems are discussed in considerable detail in Data Structures and Program Design, by R. L. Kruse, Prentice-Hall, Englewood Cliffs, N.J., 1984, pp. 112-126, and Data Structures with Abstract Data Types and PASCAL, by D. F. Stubbs and N. W. Webre, Brooks/Cole Publishing, Monterey, California, 1985, pp. 310-336.
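The "mark as deleted" approach and the contamination it causes can be illustrated with a minimal sketch. It assumes linear probing, non-negative integer keys, and a hypothetical `key % size` hashing function; the `DELETED` marker and all names are hypothetical. Searches skip over deleted locations and stop only at an empty one, so a chain survives a deletion, but a table full of deleted markers still forces long searches.

```python
EMPTY = None
DELETED = object()  # hypothetical marker meaning "deleted", distinct from "empty"


class TombstoneTable:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def home(self, key):
        return key % self.size  # hypothetical hashing function

    def insert(self, key, record):
        # Assumes the key is not already present (kept short for the sketch).
        addr = self.home(key)
        for _ in range(self.size):
            if self.slots[addr] is EMPTY or self.slots[addr] is DELETED:
                self.slots[addr] = (key, record)  # deleted locations can be reused
                return addr
            addr = (addr + 1) % self.size
        raise OverflowError("hash table is full")

    def search(self, key):
        """Return (record or None, number of probes used)."""
        addr = self.home(key)
        for probes in range(1, self.size + 1):
            slot = self.slots[addr]
            if slot is EMPTY:
                return None, probes  # only a truly empty location ends the search
            if slot is not DELETED and slot[0] == key:
                return slot[1], probes
            addr = (addr + 1) % self.size  # deleted locations must be probed past
        return None, self.size

    def delete(self, key):
        addr = self.home(key)
        for _ in range(self.size):
            slot = self.slots[addr]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[addr] = DELETED  # not EMPTY: it may be a link in a chain
                return True
            addr = (addr + 1) % self.size
        return False


# Contamination: fill nine of ten locations, then delete every record.
contaminated = TombstoneTable(10)
for k in range(9):
    contaminated.insert(k, k)
for k in range(9):
    contaminated.delete(k)
print(contaminated.search(10))  # logically empty table, yet the search probes all 10 locations
```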
In the prior art, such contamination of the storage space was avoided by deletion procedures that eliminate the deleted record altogether, replacing it with another record from the same collision-resolution chain and thus closing the chain without leaving any records marked "deleted." One such procedure is shown in the aforementioned text by Knuth at page 527. Unfortunately, because they require successive probes into the storage space, such non-contaminating procedures take so much time that they can be used only in systems having very low fill factors. The fill factor, in this sense, is the fraction of the storage space that is occupied.
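A non-contaminating deletion for linear probing can be sketched as follows. This follows the general idea described above (moving a later record of the chain into the vacated location and repeating until an empty location ends the chain); it is an illustrative adaptation using forward probing, not a transcription of the procedure in Knuth's text. Non-negative integer keys, the `key % size` hashing function, and all names are hypothetical.

```python
class ChainClosingTable:
    """Linear probing whose delete() closes the chain instead of marking 'deleted'."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # None marks an empty location

    def home(self, key):
        # Hypothetical hashing function: modulo arithmetic on an integer key.
        return key % self.size

    def insert(self, key, record):
        addr = self.home(key)
        for _ in range(self.size):
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, record)
                return addr
            addr = (addr + 1) % self.size
        raise OverflowError("hash table is full")

    def _find(self, key):
        addr = self.home(key)
        for _ in range(self.size):
            if self.slots[addr] is None:
                return None
            if self.slots[addr][0] == key:
                return addr
            addr = (addr + 1) % self.size
        return None

    def search(self, key):
        addr = self._find(key)
        return None if addr is None else self.slots[addr][1]

    def delete(self, key):
        i = self._find(key)
        if i is None:
            return False
        self.slots[i] = None  # vacate the location
        j = i
        while True:
            j = (j + 1) % self.size  # next location in the run
            if self.slots[j] is None:
                return True  # empty location reached: the chain is closed
            k = self.home(self.slots[j][0])
            # The record at j may stay only if its home address k lies
            # cyclically in (i, j]; otherwise a search for it would stop at
            # the empty location i, so it is moved back into i.
            if (i < j and i < k <= j) or (i > j and (k > i or k <= j)):
                continue
            self.slots[i] = self.slots[j]
            self.slots[j] = None
            i = j


table = ChainClosingTable(7)
for key, rec in [(3, "a"), (10, "b"), (17, "c"), (5, "d")]:
    table.insert(key, rec)  # occupies addresses 3, 4, 5, 6
table.delete(3)
print(table.slots)
```

Each deletion may have to probe the entire run of occupied locations that follows the deleted one, and at high fill factors those runs grow long, which is the time cost the paragraph above attributes to such procedures.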
The problem, then, is to provide the speed of access of hashing techniques for large and heavily used information storage systems and, at the same time, prevent the large-scale contamination which normally results from deletions in such large and heavily used systems.