In Oracle Express and other paging systems, access to data in a database is provided by demand paging of information into a fixed set of memory buffers. To provide access to a page of information, the paging system must locate the requested page in the main memory or, if the page is not in the main memory, the paging system must read the page into the main memory from disk. This process is called "page translation."
One of the first things done by typical virtual memory paging systems during a page translation is to locate an internal record with information about the requested page. This applies to both software and hardware virtual memory systems. In the presence of a great number of virtual pages shared by many users, this process typically requires searching large trees of data, involving expensive synchronization operations and additional page translations to access the data in those trees.
Almost every software application exhibits temporal locality of page translations. That is, the same relatively small set of pages is translated repeatedly during some period of time. Hence, paging systems very often search for the same information records repeatedly. These searches are redundant and very expensive in terms of time and computer resources.
Modern computer hardware architectures typically include specialized translation lookaside buffer (TLB) hardware to reduce the number of expensive and redundant searches. That hardware stores, or caches, the results of several recent searches so that those results can be reused if the same virtual page needs to be translated again.
Software applications, however, have little or no direct control over the TLB hardware. Hence, when a database server, for example, needs to access some record identified by, say, a page descriptor comprising a database number, a page space number and a page number, it cannot utilize the TLB. As a consequence, software paging systems have typically not addressed the problem of redundant searches and, therefore, perform redundant searches very often.
The present system applies hardware TLB techniques to a software virtual memory paging system. Experiments have indicated that software implementation of a TLB caching system eliminates the need for expensive searches in over 99% of page translations.
One embodiment of the present system uses a 2-way associative cache which contains a plurality of records. Each record has two cells for holding the results of searches for two page translations. During a page translation, a record is selected by computing a hash function of a page descriptor which may comprise a database identifier, a page space identifier, and a page number. If either cell contains the search result for the given page descriptor, no search is needed. Otherwise, a search is performed, and the search result replaces the least-recently-used cell in the record. This method can be generalized to an N-way associative cache method by maintaining N cells per record.
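The 2-way associative lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `TwoWayCache`, the `search` callback standing in for the expensive tree search, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# Page descriptor: (database id, page space id, page number).
Key = Tuple[int, int, int]


class TwoWayCache:
    """Hypothetical 2-way associative cache of page translation results."""

    def __init__(self, num_records: int, search: Callable[[Key], int]) -> None:
        self.num_records = num_records
        self.search = search  # stand-in for the expensive search
        # Each record holds two cells (key + address) and one LRU indicator.
        self.keys: List[List[Optional[Key]]] = [[None, None] for _ in range(num_records)]
        self.addrs: List[List[int]] = [[0, 0] for _ in range(num_records)]
        self.lru: List[int] = [0] * num_records  # index of least-recently-used cell
        self.hits = 0
        self.misses = 0

    def translate(self, key: Key) -> int:
        # Select a record by hashing the page descriptor.
        r = hash(key) % self.num_records
        for cell in (0, 1):
            if self.keys[r][cell] == key:
                # Hit: the other cell becomes the least recently used one.
                self.lru[r] = 1 - cell
                self.hits += 1
                return self.addrs[r][cell]
        # Miss: perform the search and overwrite the least-recently-used cell.
        self.misses += 1
        addr = self.search(key)
        victim = self.lru[r]
        self.keys[r][victim] = key
        self.addrs[r][victim] = addr
        self.lru[r] = 1 - victim
        return addr
```

With two cells per record, the LRU indicator only ever takes the values 0 and 1, so it fits in a single bit per record.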
Another embodiment uses an LRU (least recently used) cache which employs a hash table of doubly linked lists of records, where each record holds the result of only one page translation search. All records also belong to a doubly linked LRU list, which is maintained so that the least-recently-used record is at the head of the list, and the most-recently-used record is at the tail.
During a page translation, a list of records corresponding to the value of the hash function of the page descriptor is selected from the hash table. If a record containing the search result for the given page descriptor is located in that list, no search is needed. Otherwise, the search is performed, and its result replaces the least-recently-used record in the entire cache. That record is then removed from its hash list and placed in the hash list that corresponds to the value of the hash function of the page descriptor.
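A minimal sketch of this LRU embodiment is given below, assuming a fixed pool of records preallocated at start-up. The names `Record`, `LruTranslationCache`, and the `search` callback (standing in for the expensive search) are hypothetical and chosen only for illustration.

```python
class Record:
    """One cached translation, linked into a hash chain and the LRU list."""
    __slots__ = ("key", "addr", "bucket", "h_prev", "h_next", "l_prev", "l_next")

    def __init__(self):
        self.key = None
        self.addr = 0
        self.bucket = None                 # hash chain the record is on, if any
        self.h_prev = self.h_next = None   # hash chain links
        self.l_prev = self.l_next = None   # LRU list links


class LruTranslationCache:
    def __init__(self, capacity, num_buckets, search):
        self.search = search
        self.buckets = [None] * num_buckets  # heads of the hash chains
        self.head = self.tail = None         # head = least recent, tail = most recent
        for _ in range(capacity):
            self._append_lru(Record())

    def _append_lru(self, rec):
        rec.l_prev, rec.l_next = self.tail, None
        if self.tail is not None:
            self.tail.l_next = rec
        else:
            self.head = rec
        self.tail = rec

    def _unlink_lru(self, rec):
        if rec.l_prev is not None:
            rec.l_prev.l_next = rec.l_next
        else:
            self.head = rec.l_next
        if rec.l_next is not None:
            rec.l_next.l_prev = rec.l_prev
        else:
            self.tail = rec.l_prev

    def _unlink_hash(self, rec):
        if rec.bucket is None:
            return  # record has never been used
        if rec.h_prev is not None:
            rec.h_prev.h_next = rec.h_next
        else:
            self.buckets[rec.bucket] = rec.h_next
        if rec.h_next is not None:
            rec.h_next.h_prev = rec.h_prev
        rec.h_prev = rec.h_next = None
        rec.bucket = None

    def _push_hash(self, rec, b):
        rec.bucket = b
        rec.h_prev = None
        rec.h_next = self.buckets[b]
        if rec.h_next is not None:
            rec.h_next.h_prev = rec
        self.buckets[b] = rec

    def translate(self, key):
        b = hash(key) % len(self.buckets)
        rec = self.buckets[b]
        while rec is not None and rec.key != key:
            rec = rec.h_next
        if rec is None:
            # Miss: reuse the least-recently-used record in the entire cache.
            rec = self.head
            self._unlink_hash(rec)
            rec.key, rec.addr = key, self.search(key)
            self._push_hash(rec, b)
        # In both cases the record becomes the most recently used.
        self._unlink_lru(rec)
        self._append_lru(rec)
        return rec.addr
```

Each record carries four link fields in addition to its key and address, which gives a feel for the extra memory and bookkeeping this scheme pays for exact LRU ordering.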
The advantage of the LRU cache over the 2-way associative cache is that, for the same maximum number of cached searches, it retains exactly the most recent search results. However, a 2-way associative cache requires no more than one quarter of the memory for the same number of cached results, and therefore can store many more search results in the same amount of memory. Also, because the 2-way associative cache does less bookkeeping, it is faster.
The present system includes a method of translating, in a software paging system, an input key describing a virtual page to the address of the page in memory. The method comprises creating, in main memory, a translation buffer which has a plurality of records. Each record has a plurality of translation entries or cells, and each cell has a key field for storing at least a portion of a key which identifies a page in memory. In addition, each cell has an address field for storing the address of the identified page. A record in the translation buffer is dereferenced from the input key, for example, by applying a hashing function, or dereference, to the input key to obtain a pointer to the dereferenced record. The input key is then compared with the keys stored in the dereferenced record. If the input key matches one of the stored keys, the address associated with the identified page is retrieved from the corresponding address field. If the input key does not match any key stored in the dereferenced record, a paging manager is invoked to establish an address for the input key, and the input key and established address are saved in a translation entry, or cell, of the dereferenced record.
In a particular embodiment, each translation entry also has a version field. Upon saving the address in the address field of a translation entry, a version identifier is saved in the version field of the translation entry. The version identifier is incremented each time a different virtual page is associated with the address. Upon an input key match, the version identifier of the corresponding translation entry is compared with the last retrieved version identifier for the same input key. The data from the page associated with the address is retrieved only if the version identifiers match.
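The version check can be sketched as follows, under one possible reading of the paragraph above: each page frame carries a version counter that is incremented whenever a different virtual page is associated with it, and a cached cell is trusted only while its saved version equals the frame's current version. The names `FramePool`, `Cell`, and `is_current` are hypothetical, and frame indices stand in for addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Key = Tuple[int, int, int]  # (database number, page space number, page number)


class FramePool:
    """Hypothetical pool of page frames, each with a version counter."""

    def __init__(self, num_frames: int) -> None:
        self.owner: List[Optional[Key]] = [None] * num_frames
        self.version: List[int] = [0] * num_frames

    def assign(self, frame: int, key: Key) -> int:
        # The version is incremented only when a different page takes the frame.
        if self.owner[frame] != key:
            self.owner[frame] = key
            self.version[frame] += 1
        return self.version[frame]


@dataclass
class Cell:
    key: Key
    frame: int     # stands in for the address field
    version: int   # version saved together with the address


def is_current(cell: Cell, key: Key, pool: FramePool) -> bool:
    # A key match alone is not enough; the saved version must also match.
    return cell.key == key and cell.version == pool.version[cell.frame]
```

In this reading, a stale cell is detected without any notification from the paging manager: reusing the frame for another page changes the version, and the next comparison fails.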
Specifically, the key comprises a context and a page number, and the context comprises a database number and a page space number.
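For illustration, the key layout described above could be modeled as below. The type names `Context` and `PageKey`, and the particular multiply-and-add hash in `record_index`, are assumptions made for this sketch; the source does not specify a hash function.

```python
from typing import NamedTuple


class Context(NamedTuple):
    database: int    # database number
    page_space: int  # page space number


class PageKey(NamedTuple):
    context: Context
    page: int        # page number within the page space


def record_index(key: PageKey, num_records: int) -> int:
    """Map a key to a record slot; the mixing constant 31 is arbitrary."""
    h = (key.context.database * 31 + key.context.page_space) * 31 + key.page
    return h % num_records
```

Because the page number is the last term added, consecutive pages of one page space land in consecutive records under this hash, which spreads them across the buffer.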
In one embodiment, the least recently used order of memory pages addressed in the dereferenced record is indicated by updating a least-recently-used cell indicator associated with the dereferenced record. In an embodiment where each record has two translation entries, the least-recently-used cell indicator is a single bit.
Where the system is employed in a multithreaded system, each thread can be associated with its own translation buffer to eliminate the need for expensive synchronization.
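A minimal sketch of per-thread translation buffers follows, using Python's `threading.local`. A plain dictionary stands in for the translation buffer here, and the `search` callback stands in for the expensive search; both are simplifications for the example.

```python
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_buffer():
    """Return the calling thread's private translation buffer."""
    buf = getattr(_local, "buffer", None)
    if buf is None:
        buf = _local.buffer = {}  # created lazily, once per thread
    return buf


def translate(key, search):
    """Translate a key using only thread-private state, so no lock is taken."""
    buf = get_buffer()
    addr = buf.get(key)
    if addr is None:
        addr = search(key)  # miss: fall back to the expensive search
        buf[key] = addr
    return addr
```

The trade-off is that a translation cached by one thread is not visible to another, so each thread pays for its own first miss on a given page.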
In accordance with another embodiment of the present system, a table having a plurality of entries is created. Each entry references a respective chain of translation records in a main memory translation buffer. Each chain, or hash chain, is associated with a unique key. Preferably, each hash chain is a doubly-linked list. Each translation record has a key field for storing a key identifying a page, and an associated address field for storing the address of the identified page in memory. A chain of translation records associated with the input key is dereferenced from the input key. The records of the dereferenced hash chain are searched until a translation record is found which has a key value matching the input key. Upon finding a match, the address is retrieved from the address field of the translation record having the matching key, and the translation record is indicated as the most recently used. If, on the other hand, no match is found, a page manager is invoked which establishes the address corresponding to the input key. The address is saved in the address field of the least recently used translation record, which is then indicated as the most recently used translation record. The translation record is then placed into the hash chain associated with the input key.
Preferably, a list of translation records is created which is ordered by least recent use (LRU). The LRU chain thereby provides an indication of which translation record is the most recently used and which translation record is the least recently used. Preferably, the LRU chain is a doubly-linked list.