A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 1999, Microsoft, Inc.
The present invention pertains generally to hash tables, and more particularly to a system for determining the distribution of records in a hash table, and tuning the hash table in response to such determinations.
This application is related to a co-pending application having attorney docket number 777.256US1, entitled “Method and Apparatus for Finding Nearest Logical Record in a Hash Table,” which is assigned to the same assignee as the present application, filed on the same day herewith, and hereby incorporated by reference.
Linear hash tables optimize access time by evenly distributing records across the underlying table. Ideally, a record may be inserted into or accessed in the table with a single hashing operation. Furthermore, it is desirable to use system memory efficiently in order to, among other things, maximize the quantity of data that may be held in memory simultaneously. If the hashing function spreads data too sparsely across the hash table index, memory efficiency may be diminished. On the other hand, if the data is too closely spaced or “clumped” together, multiple access operations may be required to insert or locate a record. Accordingly, tuning or adjusting the hashing function to achieve better performance is a major goal of hash table design and operation.
The hashing function for any given data structure is thus selected to achieve optimal distribution of records in the hash table. In actual operation, the selected function is often checked for its performance, which may be done as records are initially inserted into the hash table, or later by analysis of the spread of data. Obtaining statistics for this analysis, however, may be cumbersome and time consuming. For example, in the case of a threaded linear hash table, the entire table may need to be “walked” in order to assess the distribution of data across the table. In the case of large databases, this operation may be time prohibitive to perform with any degree of regularity. Accordingly, there is a need for improved or alternate ways to assess the efficiency of the distribution of records in a hash table.
According to various example embodiments of the invention, there is provided a system for analyzing the efficiency or performance of a hash function by insertion of marker data records with known keys in a hash table together with the actual data records threaded in the hash table, and using the marker data records to analyze the distribution of actual data records around a marker data record.
In one aspect of the invention, the hash table is entered at various markers. Starting from each marker, a desired number of records is walked using the thread pointers, and the hash table index number of each record visited is recorded. The index numbers are then analyzed to determine an efficiency of distribution of logically consecutive records. The hashing function may then be tuned based on the distribution as compared to a desired distribution.
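The marker-based sampling described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Record`, `ThreadedHashTable`, `sample_indices`, and `distinct_fraction` are hypothetical, the table is bulk-loaded and threaded in sorted key order for brevity, and the fraction of distinct index numbers is only one assumed way of scoring the distribution, since no particular measure is fixed here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Record:
    key: int
    is_marker: bool = False
    next: Optional["Record"] = None  # thread pointer to the next record in key order


class ThreadedHashTable:
    """Hypothetical hash table whose records are also threaded in logical order."""

    def __init__(self, size: int, hash_fn: Callable[[int], int]):
        self.size = size
        self.hash_fn = hash_fn
        self.buckets: List[List[Record]] = [[] for _ in range(size)]

    def index_of(self, key: int) -> int:
        # Index number of the hash table slot a key maps to.
        return self.hash_fn(key) % self.size

    def build(self, keys: Iterable[int], marker_keys: Iterable[int]) -> None:
        # Assumption: bulk-load data and marker records together, threading
        # them in sorted key order instead of maintaining the thread per insert.
        markers = set(marker_keys)
        prev = None
        for key in sorted(set(keys) | markers):
            rec = Record(key, is_marker=key in markers)
            self.buckets[self.index_of(key)].append(rec)
            if prev is not None:
                prev.next = rec
            prev = rec

    def find(self, key: int) -> Optional[Record]:
        for rec in self.buckets[self.index_of(key)]:
            if rec.key == key:
                return rec
        return None


def sample_indices(table: ThreadedHashTable, marker_key: int, count: int) -> List[int]:
    """Enter the table at a marker and record the index of the next `count` data records."""
    marker = table.find(marker_key)  # the marker's key is known, so one hash finds it
    rec = marker.next if marker is not None else None
    indices: List[int] = []
    while rec is not None and len(indices) < count:
        if not rec.is_marker:  # other markers are not actual data; skip them
            indices.append(table.index_of(rec.key))
        rec = rec.next
    return indices


def distinct_fraction(indices: List[int]) -> float:
    """Assumed score: 1.0 means every walked record has its own slot; lower means clumping."""
    return len(set(indices)) / len(indices) if indices else 0.0
```

In use, a table would be built with a candidate hashing function, `sample_indices` called at several markers, and the resulting scores compared against a desired value to decide whether the hashing function should be tuned. Only a bounded number of records is visited per marker, so the whole table is never walked.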