A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright (copyright)1999, Microsoft, Inc.
1. Technical Field of the Invention
The present invention pertains generally to data access with hash tables, and more particularly to a system for improving access to nearest logical records in logically ordered data contained in a non-logically ordered hash table.
2. Background of the Invention
Traditional linear hash tables optimize access time by evenly distributing records across the underlying table. In the process of entering records into the hash table, any logical ordering of the data is lost. While access time for a specific record given a specific key is fast, the ability to xe2x80x9cwalkxe2x80x9d to adjacent logically ordered (not physically ordered in the hash table) is lost. Further, in the absence of a record existing in the hash table for a given key, due to the logical ordering of the data being lost on entry into the hash table, it is impossible to perform a time optimal xe2x80x9cfind nearestxe2x80x9d, xe2x80x9cfind nearest precedingxe2x80x9d, or xe2x80x9cfind nearest proceedingxe2x80x9d type of query.
Referring now to FIG. 1, there is illustrated an example of the use of hashing to store and retrieve logically ordered data. In this example, employee records 1 include an employee name 2, and a social security number 3 used as key k for a hash function F(k) used to map employee records 1 to a hash table 4 with an Index 5 space of 2000 (0-1999). Records 1 are logically ordered according to social security number. As illustrated in FIG. 1, records 1 are mapped, in this example, with hash function F(k) (which may be any arbitrary function), as follows:
As further illustrated in FIG. 1 and Table 1 above, each record in the hash table is threaded by the inclusion of pointers 6 to the next succeeding and next preceding logically ordered record. In FIGS. 1 and 3, pointers with arrows on both ends represent both the xe2x80x9cnextxe2x80x9d and xe2x80x9cpreviousxe2x80x9d record pointers between records. With this threading, it is possible to xe2x80x9cwalkxe2x80x9d to adjacent logically ordered records by following pointers 6.
While the threaded linear hashing table example described above provides for optimized access time by allowing records to be located with a single hashing operation, and also provides the ability to walk the records in the table in logical order using pointers, a problem still remains in locating a record logically nearest a particular key where there is no record in the hash table corresponding to the key. For example, if one were to attempt to locate the nearest preceding logical record with the key 474-56-0789, the hashing function may point to a record in the hash table with the Index 1854. In the example of FIG. 1, no record exists at this Index, and thus it is not possible to enter the hash table at the index and walk to the nearest preceding record. Thus, this record may only be found by walking through the entire table, beginning with the first record (which, assuming that the table contains thousands or hundreds of thousands of records is likely very inefficient), or hashing each of the possible keys on the preceding record side of the key 474-56-0789, until a hashing operation locates a record. In this example, such a search would require hashing all of the keys in the range between 474-56-0789 and 470-70-2358. Again, this is an inefficient manner in which to locate the nearest logical record. Accordingly, there is a need for a more efficient system for locating a nearest logical record in a hash table where no record exists for the designated key.
According to various example embodiments of the invention, there is provided an efficient system for performing sequential linear access and other operations on logically ordered data stored in non-logical order in a hash table, as described herein below. In particular, the invention provides, among other things, that marker data records with known keys are inserted in the hash table together with the actual data records threaded in the hash table, wherein the marker records are used to obtain an initial position in the hash table data records.
By selecting a logically closest marker record, a valid entry into the hash table is provided, which should be near a desired record. Where it is desired to find a record which is nearest to a record which may not exist, using the marker closest to such record provides a known valid entry point into the hash table. A reasonably short walk through records in the hash table is then performed to locate a nearest logical record.