Microprocessors typically employ one or more caches to store data and instructions for use by the processes being executed. Often, these caches take the form of a cache hierarchy including two or three caches that are physically positioned successively farther away from the processor. The use of such caches makes it possible for the processor to access information more rapidly than would be the case if the processor had to obtain the information from main memory or some other memory device.
Although the information stored in caches can be accessed significantly faster than the information stored in other memory devices such as main memory, caches are only able to store relatively small amounts of information. That is, only a relatively small portion of all of the information required by a process being executed by a processor typically resides in the cache(s). Nevertheless, even though a cache at any given time may only store a fraction of the total amount of information that may be required by a given process, the operation of the cache can be managed in such a way that at any given time the cache holds much if not all of the information that is required by the process at that particular time. Information desired by a process that is absent from a cache can usually be transferred to the cache from another memory device.
Information is stored in multiple memory devices including cache(s), main memory, and elsewhere, and, as discussed above, such information is often moved among the different memory devices (e.g., to and from the caches). It would therefore be difficult to compose processes for execution on a processor if those processes were themselves responsible for controlling the movement and storage of information and for keeping track of the locations of the information so that it could be accessed. Instead of burdening the processes with such responsibilities, many modern microprocessors implement “virtual memory” systems in which the microprocessors organize and manage the operations of the various memory devices in such a manner that, from the perspective of the processes, the overall memory appears to take on a standardized form independent of the particular memory devices that are available.
To implement such a virtual memory system, actual physical memory locations or addresses are mapped to corresponding virtual addresses, typically in a one-to-one manner. Rather than having to access the actual memory addresses, processes operating on the processor merely refer to the virtual addresses. By virtue of a page table or similar map that stores the correspondences between the physical and virtual addresses, the referred-to virtual addresses are converted into the actual physical addresses, and as a result the data at those physical addresses can then be provided to the processes requesting that data.
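By way of illustration only, the translation described above can be sketched as follows. The page size, the dictionary-based page table, and the names `PAGE_SIZE` and `translate` are assumptions made for this sketch and are not taken from any particular processor.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical value)


def translate(page_table, virtual_address):
    """Convert a virtual address into a physical address.

    page_table maps a virtual page number to a physical frame number.
    """
    # Split the virtual address into a page number and an offset within the page.
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    if virtual_page not in page_table:
        raise KeyError(f"no mapping for virtual page {virtual_page}")
    # The offset is unchanged; only the page number is translated.
    return page_table[virtual_page] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0 -> frame 7, virtual page 1 -> frame 3.
example_table = {0: 7, 1: 3}
physical = translate(example_table, 4100)  # virtual page 1, offset 4
```

In this sketch the requesting process supplies only the virtual address; the correspondence stored in the table determines which physical location is actually read.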
Due to its size, the page table itself is stored in main memory. However, for much the same reasons that the substantive information desired by processes operating on the processor is stored in cache(s) (in particular, to facilitate the speed of accessing the stored information), portions of the page table that are most relevant to the processes being performed at a given time can also be stored in a specialized cache termed a translation lookaside buffer (“TLB”). When a process requests information at a given virtual address, the processor first consults the TLB cache rather than the page table in main memory to obtain the mapping or translation of that virtual address into a corresponding physical address. If the TLB cache does not include the desired translation, such that a “TLB miss” occurs, then the desired translation is obtained from the page table and inserted into the TLB.
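The lookup sequence described above (consult the TLB first, and on a TLB miss obtain the translation from the page table and insert it into the TLB) might be sketched as below. The class name `TLB`, its capacity, and the least-recently-used replacement of entries are assumptions made for illustration; the text above does not specify how a TLB entry is chosen for replacement.

```python
from collections import OrderedDict


class TLB:
    """A small translation cache in front of a (slower) page table."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table   # full mapping, standing in for main memory
        self.capacity = capacity       # assumed number of TLB entries
        self.entries = OrderedDict()   # virtual page -> physical frame
        self.misses = 0

    def lookup(self, virtual_page):
        # Fast path: the translation is already held in the TLB.
        if virtual_page in self.entries:
            self.entries.move_to_end(virtual_page)
            return self.entries[virtual_page]
        # TLB miss: obtain the translation from the page table.
        self.misses += 1
        frame = self.page_table[virtual_page]
        # Assumed policy: discard the least recently used entry when full.
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[virtual_page] = frame
        return frame
```

Repeated lookups of the same virtual page are served from the TLB and do not increase the miss count; a lookup of a page whose entry has been displaced incurs a fresh miss.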
Although the use of a TLB cache improves the speed with which a desired translation can be obtained, the TLB cache by itself fails to address another problem associated with the accessing of such translations. Because the amount of data in a page table is typically very large, the process of searching through a page table for a desired translation can itself take a relatively large amount of time. To reduce the time associated with such searching, rather than consulting standard page tables, many conventional processors instead consult a modified version of the page table, termed a virtual hash page table (“VHPT”), in main memory that assigns codes or hashes to the different translation information. The codes are typically representative of the different translations but are shorter in length, and consequently it is possible for a hardware- or software-implemented “page walker” to search through the translations as represented in the VHPT at a quicker pace than would be possible if a standard page table were being searched.
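One way in which hashing can shorten such a search is sketched below, purely as an illustration. The modulo hash, the bucket count, and the chaining of colliding entries are assumptions of this sketch; the hash function and layout actually used by any given VHPT are not specified here.

```python
NUM_BUCKETS = 8  # assumed table size (hypothetical)


def bucket_index(virtual_page):
    """Hypothetical hash: reduce the virtual page number to a short code."""
    return virtual_page % NUM_BUCKETS


def build_hashed_table(page_table):
    """Place each translation in the bucket selected by its hash."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for virtual_page, frame in page_table.items():
        buckets[bucket_index(virtual_page)].append((virtual_page, frame))
    return buckets


def walk(buckets, virtual_page):
    """Search only the hashed bucket; return (frame or None, entries examined)."""
    examined = 0
    for tag, frame in buckets[bucket_index(virtual_page)]:
        examined += 1
        if tag == virtual_page:  # the full tag confirms the short code matched
            return frame, examined
    return None, examined
```

In this sketch the walker examines only the entries sharing one hash code rather than scanning every translation, which is the source of the speed-up over searching an unhashed table.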
While the use of a TLB cache in conjunction with a VHPT in main memory improves the rapidity with which a processor can obtain desired address translation information, it would be desirable to achieve even faster rates of obtaining desired translations than are achieved with current systems. In particular, it would be desirable to improve the speed with which a processor is able to obtain desired translation information from the VHPT when TLB misses occur. Although inclusion of VHPT information within one or more of the caches of the cache hierarchy of the microprocessor might initially appear to provide some benefit, insofar as the caches can be accessed more quickly than main memory, this does not in fact provide any significant benefit.
More particularly, the caches of the cache hierarchy are operated in a manner that takes advantage of a “temporal locality” principle. That is, the caches are operated so as to favor the continued storing of information that has been more recently accessed by the processor and to overwrite/replace information that has been less recently accessed, with the underlying presumption being that the more recently a given piece of data has been requested, the more likely it is to be requested again in the future. Although caching information based upon the temporal locality principle works well generally in terms of making available to the processor that information which is most likely to be of use to the processor at a given time, the caching of VHPT entries within the caches of the cache hierarchy based upon this principle does not work well. TLB misses, and consequently searches for VHPT entries, occur with such a low frequency relative to the frequency with which other types of information (e.g., substantive information) are requested that relevant VHPT entries stored in the cache hierarchy are usually replaced/overwritten before those entries are requested again.
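The effect described above can be illustrated with a small simulation. The sketch assumes a least-recently-used replacement policy as one example of a policy based on temporal locality, and the capacities and the access trace are hypothetical values chosen only to show the behavior.

```python
from collections import OrderedDict


def lru_hits(capacity, accesses):
    """Replay accesses against an LRU cache; return a hit/miss flag per access."""
    cache = OrderedDict()
    results = []
    for key in accesses:
        hit = key in cache
        if hit:
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[key] = True
        results.append(hit)
    return results


# One VHPT entry is fetched, many substantive data lines are then accessed,
# and the same VHPT entry is requested again on a later TLB miss.
trace = ["vhpt_entry"] + [f"data{i}" for i in range(8)] + ["vhpt_entry"]
second_lookup_hit = lru_hits(4, trace)[-1]
```

With a capacity of four lines, the eight intervening data accesses displace the VHPT entry, so the second request for it misses; the entry would survive only if the cache were large enough to hold everything accessed in the interval.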
Therefore, it would be advantageous if a new system and method could be developed that enhanced the speed at which stored data could be accessed by a processor. More particularly, it would be advantageous if such a new system and method increased the speed at which appropriate VHPT entries representing correspondences between virtual and physical addresses could be obtained by the processor in response to TLB misses, such that the data stored at the physical addresses corresponding to those virtual addresses could be more rapidly accessed by the processor.