1. Field of the Invention
The present invention relates generally to computer systems, and more specifically to the storage of data using a hash table.
2. Related Art
Since the creation of computer systems, engineers have struggled to maintain an optimal balance between efficient storage and fast location of data: Storage needs to be efficient because memory is finite and costly; Data needs to be quickly located to optimize system performance.
A traditional method for storing and accessing data employs a hash table, which is a type of database. A computer system employing such a methodology is shown in FIG. 1. The illustrated system 100 includes the standard array of components: a central processing unit 102; a user interface 104, which typically consists of a keyboard, a mouse, and a monitor; a primary memory 106, such as a random access memory, for program execution; and a secondary memory 108, such as a disk, for storing programs and data that are not immediately needed for execution. Contained within these memories 106, 108 are: an operating system 110 that manages the computer's resources; a database 112 that contains a variety of data objects 114, each containing a key 116 and associated data 118; and a traditional hash program 120.
The workings of the traditional hash program 120 are well-known to those skilled in the art, but will be explained here for the lay reader. Basically, the program 120 uses keys 116 to locate and store objects 114 in the database 112: each key 116 is unique and therefore can be used to identify the key's associated data 118. Typically, keys 116 are potentially large semi-random numbers drawn from a set of such numbers.
For example, consider a business that desires to track its clients' accounts. A database 112 would be created to hold the tracked data 118. In this case, possible keys 116 include the customers' credit card or social security numbers; the associated data 118 could be billing information, such as available credit and billing addresses.
No business would have an account for every social security number or a VISA account holder, and consequently it is unnecessary and wasteful to size a database large to house data for all such individuals. To more efficiently utilize memory, the near infinite set of social security or VISA card numbers is mapped to a smaller, more manageable set. This is done using a hash function, which is contained in the hash program 120. For example, a hash function might map 100 billion possible credit card numbers to a set of 150,000 possible numbers, using mathematical operations to translate the key to a smaller number. The resulting number is called a "hash address" or a "hash value" and identifies an entry of the database 112 for the key 118. In FIG. 1, the dotted boxes 122 represent the hash address at which each respective data object 114 is stored in the database 112.
Since the hash function greatly reduces the amount of information from the key 116 when translating it to the hash address 122, the function may translate two keys to the same address. This occurrence is called a "collision" and must be resolved by the hash program 120. That is, the program must have a method for storing two objects whose keys hash to the same address. One such method is known as "bucket chaining." "Bucket" refers to the database entry for a given hash address; "chain" refers to a linked list of data objects that hash to the same address. In FIG. 1, a "bucket chain" 124 is shown linking two such data objects 114, data object i and data object j. Other types of collision resolution are possible and are well-known those skilled in the art. Thus, they need not be described here.
When resolving collisions, by bucket chaining or similar methods, the traditional hash program 120 stores each object's key 116 along with the associated data 118 in the data object 114. When the program 120 later attempts to access a given object 114, it will again run that object's key 116 through the hash function to produce the key's hash address 120. The program then compares the key used to produce the hash address 122 with the key 116 stored in the data object 114 at the hash address. If these keys are identical, the object has been found. If not, the program 120 continues its search along the chain 124. If no stored key equals the newly produced key, there is no database entry for the key, so a new entry is created.
The traditional hashing methodology provides a reliable means for storing and locating data. But because memory costs are considerable, improvements to the scheme can represent significant cost savings. This is so because reducing the amount of memory needed to store a given set of objects frees memory for other resources. And when each object conserves memory, the overall cost of the computer system is reduced because less memory is required to achieve storage goals. Any reduction in the amount of memory needed to store and access data is therefore valuable.