The present invention relates to methods and systems for accessing and storing information in an indexed computer data structure. More particularly, the present invention relates to lock-free methods and systems for accessing and storing information in an indexed computer data structure having modifiable entries.
In computer programs, such as cache memory management systems, it is desirable to decrease the retrieval time of frequently accessed information. In order to decrease the retrieval time of frequently accessed information, cache memory systems may store the frequently accessed information in an indexed data structure, such as a hash table. As used herein, the phrase “indexed computer data structure” or “indexed data structure” refers to a set of elements referred to as entries, wherein each and every entry is uniquely selectable based on a selection function that produces an index. A hash table is an indexed data structure that may be stored in computer memory to reduce the number of comparisons required to locate and retrieve information. For example, a hash table may comprise an array of records or entries containing the information desired to be located. Each entry may also contain a key used by processes or threads performing lookup or search algorithms to locate the entry. A key is a unit of data that is compared with a search key to determine whether an entry contains the data of interest. A hash table reduces the number of keys that must be compared to locate a given entry using a hash function. A hash function is a mathematical algorithm that receives a key as an input and ideally computes a unique index as an output. For example, the hash function may always produce the same index for a given key. However, different keys may result in the same index, if the hash function is not perfect. If the hash table is stored in an array, the index may represent an offset from the table base address and a potential location for the entry containing a key that matches the search key. Thus, for a given entry e in the table with key k, INDEX=hash(k). If the table including the entry e is an array, e and the key k may be stored at array[INDEX].
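The relationship INDEX=hash(k) described above may be sketched as follows. This is a minimal illustration only: the table size, the toy byte-sum hash, and the names `hash_index`, `store`, and `table` are assumptions introduced for the example and are not taken from the description.

```python
from typing import List, Optional, Tuple

TABLE_SIZE = 8  # hypothetical table size chosen for the example


def hash_index(key: str, size: int = TABLE_SIZE) -> int:
    """Toy hash: sum of the key's bytes modulo the table size.

    It always yields the same index for the same key, but it is not
    perfect: different keys (e.g. "ab" and "ba") can share an index.
    """
    return sum(key.encode("utf-8")) % size


# The table is a plain array; each slot holds a (key, value) pair or None.
table: List[Optional[Tuple[str, str]]] = [None] * TABLE_SIZE


def store(key: str, value: str) -> int:
    """Store the entry at array[INDEX], where INDEX = hash(key)."""
    index = hash_index(key)
    table[index] = (key, value)
    return index
```

Because the toy hash is imperfect, `hash_index("ab")` and `hash_index("ba")` yield the same index, which is the collision case the paragraph mentions.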
Because hashing different keys may result in the same index, an entry may not be located at the index corresponding to the hash of the key for the entry. For example, when different keys hash to the same index, the first key may be stored at the location corresponding to hash(key). The second key may be stored at another location, e.g., a location corresponding to hash(key)+1. Thus, even though hashing reduces the number of comparisons required to locate an entry, more than one comparison may be required. For example, in order to locate an entry in a hash table, a thread may first hash a search key to determine an initial index. The thread then locates the entry corresponding to the index and compares the key in the entry with the search key. If the keys match, the thread may extract the desired information or a pointer to the desired information from the entry. If the keys do not match, the thread may search the remaining entries in the table to locate a matching key. In certain cases the hash table may be constructed or used in such a way as to eliminate the potential for duplicate hash results, i.e., as in the case of perfect hash functions. In such a case, only a single comparison may be required to locate an entry because each entry is stored exactly at the location corresponding to the hash of the key value for the entry.
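The collision handling and multi-comparison lookup described above may be sketched as follows, assuming a simple scheme in which a colliding key is stored at the next free location after hash(key). The function names and the toy hash are hypothetical and serve only to make the example self-contained.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[str, str]]


def toy_hash(key: str, size: int) -> int:
    """Imperfect toy hash: byte sum modulo the table size."""
    return sum(key.encode("utf-8")) % size


def probe_insert(table: List[Slot], key: str, value: str) -> int:
    """Store at hash(key), or at the next free slot if that one is taken."""
    size = len(table)
    start = toy_hash(key, size)
    for step in range(size):
        i = (start + step) % size
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)
            return i
    raise RuntimeError("table is full")


def probe_lookup(table: List[Slot], key: str) -> Tuple[Optional[str], int]:
    """Return (value or None, number of key comparisons performed).

    The search starts at hash(key) and, on a mismatch, continues through
    the remaining entries of the table.
    """
    size = len(table)
    start = toy_hash(key, size)
    comparisons = 0
    for step in range(size):
        entry = table[(start + step) % size]
        if entry is None:
            continue  # nothing to compare in an unused slot
        comparisons += 1
        if entry[0] == key:
            return entry[1], comparisons
    return None, comparisons
```

With an eight-slot table, inserting "ab" and then "ba" places the second key one slot after the first, so looking up "ba" takes two comparisons rather than one.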
If the hash table is stored in a fast memory device and used to cache data stored in another slower memory device, when a thread fails to locate an entry having a matching key in the hash table, the thread may extract the data from the slower memory device and attempt to insert the data into the hash table. Inserting the data in the hash table may include attempting to store the data at the location corresponding to the hash of the search key. If the entry at that location is uninitialized, i.e., if it has not been previously used to store information, the thread may insert the data in the entry. If the entry has been initialized, the thread may search for an empty entry in the table to store the new data. If the table is full, the thread may replace an existing entry with the new data. Once the new data is stored in the hash table, subsequent searches of the hash table for the entry will succeed, unless another process removes the new entry.
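The miss-then-insert behavior described above may be sketched as follows. The dictionary `SLOW_STORE` stands in for the slower memory device, and the replacement choice when the table is full (overwriting the entry at the hashed location) is an assumption made for the example; the description leaves the replacement policy open at this point.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[str, int]]

# Hypothetical stand-in for the slower memory device.
SLOW_STORE = {"alpha": 1, "beta": 2, "gamma": 3}


def toy_hash(key: str, size: int) -> int:
    return sum(key.encode("utf-8")) % size


def cached_get(cache: List[Slot], key: str) -> Tuple[int, bool]:
    """Return (value, hit). On a miss, fetch from SLOW_STORE and insert."""
    size = len(cache)
    start = toy_hash(key, size)

    # Lookup: start at hash(key), then search the remaining entries.
    for step in range(size):
        entry = cache[(start + step) % size]
        if entry is not None and entry[0] == key:
            return entry[1], True

    # Miss: extract the data from the slower store.
    value = SLOW_STORE[key]

    # Insert: prefer the hashed location, else the next uninitialized entry.
    for step in range(size):
        i = (start + step) % size
        if cache[i] is None:
            cache[i] = (key, value)
            break
    else:
        # Table full: replace an existing entry (assumed policy: the one
        # at the hashed location).
        cache[start] = (key, value)
    return value, False
```

After the first call for a key returns a miss, a second call for the same key returns a hit, unless an intervening insertion has replaced the entry.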
A problem with conventional methods for accessing and storing data in hash tables and other data structures used in cache memory systems is that these methods may require locking when multiple threads or multiple processes access a data structure to ensure the validity of information retrieved during a search. For example, in one conventional cache memory system, when a first thread accesses a cache, for example to perform a lookup operation, the first thread locks the cache, preventing other threads from performing any cache operations. When the first thread completes the search operation, the first thread unlocks the cache, allowing other threads to access the cache. Allowing a single thread to block all other attempted accesses to a cache is undesirable because the first thread may block during its access to the cache and prevent other threads from accessing the cache indefinitely. Even if the first thread does not fail or block, the remaining threads are delayed for at least the time for the first thread to lock the cache, perform the lookup operation, and unlock the cache. These operations introduce latency into cache accesses by the other threads. Thus, allowing a single thread to lock the entire cache may be undesirable in high-speed computer memory systems whenever there is likely to be contention among multiple threads for accesses to the cache.
In another conventional memory system, rather than locking an entire cache, a thread may lock each individual cache entry when the thread accesses the entry and unlock the entry when the thread completes the access to the entry. For example, in a lookup operation, a thread may search through entries or records in a table. In order to access each entry, the thread locks the entry, reads the information in the entry, determines whether the information matches the search key, and unlocks the entry. This process is repeated for each entry in the table until a match is found. Other threads may be prevented from accessing the entry due to the lock. Because the first thread locks, reads, compares, and unlocks each entry before another thread can access the entry, latency may be introduced into cache operations performed by other threads. In addition, locking and unlocking each entry during a search may introduce latency into memory operations performed by the first thread. Thus, a cache memory system that requires locking and unlocking of entries by each thread may be unsuitable for high-speed operations.
As described above, the phrase “indexed computer data structure” refers to a set of elements referred to as entries, wherein each entry is uniquely selectable utilizing a selection function that produces an index. The selection function may comprise any mathematical function that maps data from a first domain into a second domain, wherein the second domain is a set of indices capable of selecting all of the entries. Exemplary indexed computer data structures to which the lock-free methods and systems according to the present invention may apply include hash tables and linked lists. A hash table may be indexed using a perfect or an imperfect hash function. The lock-free methods and systems for accessing an indexed computer data structure are applicable to hash tables accessible by both perfect and imperfect hash functions. A linked list may be indexed by computing an index to a first entry in the list and following pointers to locate the data of interest. The lock-free methods and systems for accessing an indexed computer data structure according to the present invention may be used to access entries stored in a linked list utilizing any selection function to compute the initial index.
The present invention is not limited to methods and systems for accessing and storing information in a hash table or a linked list. For example, the lock-free methods and systems are applicable to any indexed computer data structure, such as a simple array. In such a case, the search key may be the offset from the table base address and the selection function may utilize the search key to directly access an entry.
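The notion of a selection function that maps a search key to an index may be sketched as follows. The helper names are hypothetical; the example only shows that a simple array (where the search key is itself the offset) and a hashed table can both be accessed through the same selection step.

```python
from typing import Callable, Sequence, TypeVar

K = TypeVar("K")
E = TypeVar("E")


def select_entry(entries: Sequence[E],
                 selection_fn: Callable[[K], int],
                 search_key: K) -> E:
    """Select an entry using the index produced by the selection function."""
    return entries[selection_fn(search_key)]


def identity_selection(offset: int) -> int:
    """Simple array: the search key is the offset from the table base."""
    return offset


def toy_hash_selection(key: str) -> int:
    """Hashed table of four entries: byte sum modulo 4 (toy hash)."""
    return sum(key.encode("utf-8")) % 4


slots = ["w", "x", "y", "z"]
```

For example, `select_entry(slots, identity_selection, 2)` selects the entry at offset 2 directly, without any key comparison.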
The lock-free methods and systems for accessing and storing information in an indexed computer data structure according to the present invention may be used by any form of concurrent execution or apparently concurrent execution provided by a computer operating system to access and store entries without requiring locks. For example, in operating systems that allow concurrent threads of execution on multiple processors in a shared memory multiprocessor, the lock-free methods and systems for accessing and storing information may be used by multiple threads to concurrently access and store entries in the indexed computer data structure. In an operating system that allows multiple processes, the lock-free methods and systems for accessing and storing information according to the present invention may be used by multiple processes to concurrently access and store information. In an operating system that allows both multiple threads and multiple processes, the lock-free methods and systems for accessing and storing information according to the present invention may be used by multiple threads and multiple processes to concurrently access information. Thus, while the discussion that follows may refer to concurrent access by threads or processes, the lock-free methods and systems according to the present invention are applicable to any form of concurrent execution, regardless of whether the concurrent execution is real or simulated.
According to a first aspect, the present invention includes a method for locating entries in an indexed computer data structure. The method may include executing a first thread for performing a lookup for a first entry in an indexed computer data structure, locating the first entry, and accessing the information stored in the first entry. While the first thread accesses the information in the first entry, the method may include allowing a second thread to concurrently access the information in the first entry. Allowing concurrent access to information in an entry greatly increases the efficiency of concurrent lookup operations.
According to another aspect, the present invention may include a lookup procedure including computer-executable instructions for allowing concurrent access to entries in an indexed computer data structure. The instructions may include starting a first thread for performing a lookup for a first entry in an indexed computer data structure and locating the first entry. After starting the first thread, a second thread may be allowed to concurrently access the first entry.
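Concurrent lookups that access the same entry without taking a lock may be sketched as follows. This is an illustration of the access pattern only: the table contents and function names are hypothetical, and in CPython the interpreter serializes bytecode execution, so the threads here are concurrent rather than truly parallel.

```python
import threading
from typing import List, Optional

# Hypothetical read-only table of (key, value) entries.
TABLE = [("k0", "v0"), ("k1", "v1"), ("k2", "v2")]


def lookup(search_key: str) -> Optional[str]:
    """Compare the search key against each entry; no lock is taken."""
    for key, value in TABLE:
        if key == search_key:
            return value
    return None


def run_concurrent_lookups(search_key: str,
                           thread_count: int = 2) -> List[Optional[str]]:
    """Start several threads that all look up the same entry."""
    results: List[Optional[str]] = [None] * thread_count

    def worker(slot: int) -> None:
        # Each thread writes only to its own result slot.
        results[slot] = lookup(search_key)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Neither thread waits for the other to finish with the entry; each simply reads it and returns the value it found.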
According to another aspect, the present invention may include a computer data structure for a concurrently accessible entry in an indexed computer data structure. The computer data structure for the entry may include at least one key value field for storing at least one key value for comparison with a search key to identify the entry. The data structure for the entry may also include a data field for storing data or a pointer to the data to be extracted from the entry. The data structure may further include an in-use counter field for storing an in-use counter for indicating whether the entry is in use.
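An entry having the three fields described above may be sketched as follows. The field names, the use of `None` to mark an unused key, and the convention that a counter greater than zero means "in use" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    key: Optional[str] = None  # key value compared with the search key
    data: Any = None           # the data, or a reference to the data
    in_use: int = 0            # in-use counter (assumed: > 0 means in use)

    def matches(self, search_key: str) -> bool:
        """True if this entry holds a key equal to the search key."""
        return self.key is not None and self.key == search_key

    def is_in_use(self) -> bool:
        """True if the in-use counter indicates the entry is in use."""
        return self.in_use > 0
```

A freshly constructed `Entry()` matches no search key and reports that it is not in use.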
According to another aspect, the present invention may include an insertion procedure for inserting entries in an indexed computer data structure. The insertion procedure may include computer-executable instructions embodied in a computer-readable medium for performing steps. The steps may include obtaining data to be inserted into an indexed computer data structure from a source external to the indexed computer data structure. After obtaining the data, the insertion procedure may access a first entry to insert the data while allowing other threads to access the first entry. The insertion procedure may then determine whether the first entry is empty. If the first entry is empty, the insertion procedure may increment an in-use counter associated with the first entry. After incrementing the in-use counter, the insertion procedure may determine whether the first entry remains empty. In response to determining that the first entry remains empty, the insertion procedure may write the data obtained from the external source in the first entry.
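The check, increment, re-check, and write sequence of the insertion procedure may be sketched as follows. The sketch is single-threaded and shows only the order of the steps. Several points are assumptions not stated in the description: an entry is treated as empty when its key is `None`, the counter is decremented when the procedure backs off or finishes, and the plain `+= 1` stands in for what would need to be an atomic increment in a real concurrent implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    key: Optional[str] = None
    data: Any = None
    in_use: int = 0


def is_empty(entry: Entry) -> bool:
    # Assumption: an entry with no key has never been initialized.
    return entry.key is None


def try_insert(entry: Entry, key: str, data: Any) -> bool:
    """Attempt to insert into one entry; return True on success."""
    # Step 1: determine whether the entry is empty.
    if not is_empty(entry):
        return False

    # Step 2: increment the in-use counter.
    # NOTE: this would have to be an atomic increment in concurrent code.
    entry.in_use += 1

    # Step 3: determine whether the entry remains empty.
    if not is_empty(entry):
        entry.in_use -= 1  # assumed back-off step
        return False

    # Step 4: write the data obtained from the external source.
    entry.data = data
    entry.key = key

    entry.in_use -= 1  # assumed release once the write is complete
    return True
```

A second call to `try_insert` on the same entry returns `False` and leaves the stored data unchanged, since the entry is no longer empty.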
According to another aspect, the present invention may include a removal and replacement procedure for removing and replacing entries in an indexed computer data structure. The removal and replacement procedure may include computer-executable instructions embodied in a computer-readable medium for performing steps. The steps may include accessing a first entry in an indexed computer data structure, reading an in-use counter associated with the first entry, and determining whether the first entry is a candidate for removal. The first entry may be identified as a candidate for removal based on any suitable removal condition or conditions. For example, exemplary removal conditions or criteria include the time of last use of the entry, random or pseudo-random selection algorithms, or a roving index that is incremented after each removal. However, even if the removal condition is satisfied for an entry, the entry may not be considered as a candidate for removal until the in-use counter indicates that no threads are using the entry. Once the removal and replacement procedure determines that the predetermined removal condition is satisfied and that no threads are using the entry, the removal and replacement procedure may increment the in-use counter of the entry and re-check whether the predetermined removal condition remains satisfied. If the condition remains satisfied, the entry may be identified as a candidate for removal. The removal and replacement procedure may repeat the testing for each entry in the indexed computer data structure until the best candidate for removal is identified. When the best candidate is identified, the removal and replacement procedure may write new values to the fields in the entry.
In one implementation of the invention, a least recently used policy is used to identify candidates for removal. In this implementation, the removal and replacement procedure may compare a time stamp value of the first entry with a predetermined time stamp value. The removal and replacement procedure may identify the first entry as a candidate for removal if the in-use counter indicates that the first entry is not in use and the time stamp value of the entry is less than the predetermined time stamp value.
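The least recently used variant of the removal and replacement procedure may be sketched as follows. The sketch is single-threaded and illustrative only. It assumes that "best candidate" means the qualifying entry with the smallest time stamp, that a previously held candidate is released when a better one is found, and that the plain increments stand in for atomic operations; none of these details is fixed by the description above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Entry:
    key: Optional[str] = None
    data: Any = None
    in_use: int = 0
    timestamp: int = 0  # time of last use


def find_removal_candidate(entries: List[Entry],
                           threshold: int) -> Optional[Entry]:
    """Return the best candidate, held with its in-use counter incremented."""
    best: Optional[Entry] = None
    for entry in entries:
        # Removal condition: not in use and older than the threshold.
        if entry.in_use != 0 or entry.timestamp >= threshold:
            continue
        entry.in_use += 1  # would be an atomic increment in concurrent code
        # Re-check that the removal condition remains satisfied.
        if entry.timestamp >= threshold:
            entry.in_use -= 1
            continue
        if best is None or entry.timestamp < best.timestamp:
            if best is not None:
                best.in_use -= 1  # assumed: release the previous candidate
            best = entry
        else:
            entry.in_use -= 1
    return best


def replace_entry(entry: Entry, key: str, data: Any, now: int) -> None:
    """Write new values to the fields of the chosen candidate."""
    entry.key, entry.data, entry.timestamp = key, data, now
    entry.in_use -= 1  # assumed release after the write
```

An entry whose in-use counter is nonzero is never selected, however old its time stamp is.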
According to another aspect, the present invention may include a virtual interface user agent comprising computer-executable instructions embodied in a computer-readable medium. The computer-executable instructions may include receiving a request for performing an operation for receiving data into or sending data from a memory location specified by a virtual memory address. The request may include a virtual memory address. In response to the request, the virtual interface user agent may perform a lookup procedure in an indexed computer data structure to locate a memory handle corresponding to the virtual memory address. The virtual interface user agent may locate the entry containing the memory handle and access the entry. While the virtual interface user agent accesses the entry, the virtual interface user agent may allow other threads to access the entry.
According to another aspect, the present invention includes a computer program including computer-executable instructions embodied in a computer-readable medium for coalescing entries in an indexed computer data structure. The computer-executable instructions instruct a computer to perform steps. The steps include searching an indexed computer data structure for a first entry having one or more key values having a first relationship with a search key. In response to failing to locate the first entry, the computer instructions may include identifying at least one second entry in the indexed computer data structure having a second relationship with the search key and incrementing an in-use counter associated with the second entry. After incrementing the in-use counter, the instructions may include determining whether the second relationship between the second entry and the search key still exists. In response to determining that the second relationship still exists, the instructions may include coalescing the second entry with data corresponding to the search key.
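The coalescing steps may be sketched as follows under a strong assumption about what the two relationships are, since the description above does not define them. In this hypothetical example, keys are address ranges, the first relationship is "the entry's range fully covers the searched range", the second relationship is "the entry's range overlaps or adjoins the searched range", and coalescing means widening the entry to span both. The sketch is single-threaded, and the plain increment stands in for an atomic operation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RangeEntry:
    start: int
    end: int  # exclusive
    in_use: int = 0


def covers(entry: RangeEntry, start: int, end: int) -> bool:
    """Assumed first relationship: entry range contains [start, end)."""
    return entry.start <= start and end <= entry.end


def touches(entry: RangeEntry, start: int, end: int) -> bool:
    """Assumed second relationship: ranges overlap or are adjacent."""
    return entry.start <= end and start <= entry.end


def lookup_or_coalesce(entries: List[RangeEntry],
                       start: int, end: int) -> Optional[RangeEntry]:
    # Search for a first entry having the first relationship.
    for entry in entries:
        if covers(entry, start, end):
            return entry
    # Failing that, identify a second entry having the second relationship.
    for entry in entries:
        if touches(entry, start, end):
            entry.in_use += 1  # would be atomic in concurrent code
            # Determine whether the second relationship still exists.
            if touches(entry, start, end):
                entry.start = min(entry.start, start)
                entry.end = max(entry.end, end)
                entry.in_use -= 1  # assumed release after coalescing
                return entry
            entry.in_use -= 1
    return None
```

For instance, searching for the range [300, 350) against an entry covering [200, 300) widens that entry to [200, 350) rather than creating a new one.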
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments which proceeds with reference to the accompanying figures.