The present invention relates generally to memory devices, and more particularly to cache memories. A cache memory is a random access memory that buffers data from a main memory. A cache memory is typically employed to provide high-bandwidth memory access to a processor by storing the contents of selected locations of the main memory. A typical cache memory contains a memory array organized into a set of cache blocks, often referred to as cache lines. A cache line can be addressed using an address tag that identifies the main memory location corresponding to that cache line.
Many computer systems today use virtual memory systems to manage and allocate memory to various processes running within the system. An operating system (OS) maps the virtual address (VA) space for each process to the actual physical address (PA) space for the system. The mapping from a virtual address to a physical address is typically maintained through the use of page tables.
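By way of illustration only, the address mapping described above can be sketched as follows. The sketch assumes a single-level page table and 4 KB pages; the names `PAGE_SHIFT`, `page_table`, and `translate` are hypothetical and are not taken from any particular system, and practical systems typically use multi-level page tables.

```python
PAGE_SHIFT = 12                      # assumed 4 KB pages: low 12 bits are the page offset
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical per-process page table: virtual page number -> physical frame number.
page_table = {0x00400: 0x12345, 0x00401: 0x0ABCD}

def translate(va):
    """Return the physical address for va, or None if no mapping exists."""
    vpn = va >> PAGE_SHIFT           # virtual page number selects the page table entry
    pfn = page_table.get(vpn)
    if pfn is None:
        return None                  # unmapped: the OS would have to handle a page fault
    # The page offset is carried over unchanged from the virtual address.
    return (pfn << PAGE_SHIFT) | (va & PAGE_MASK)
```

In this sketch only the upper bits of the address are translated; the offset within the page is the same in the virtual and the physical address.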
One way to improve the performance of a processor is through the use of a multiple-stage pipeline architecture, in which various pipeline resources, such as caches, buffers, arrays, and the like, may be used to execute instructions more efficiently. One such pipeline resource that improves the use of virtual memory systems is a translation lookaside buffer (TLB). A TLB is a relatively small cache memory in a processor pipeline that caches part of the system's virtual address to physical address translations. Specifically, the TLB stores a few elements of the translation set, which the processor can then access extremely quickly.
It is common for TLBs to be organized in a set-associative manner. In operation, control logic for a set-associative TLB constructs an index for the TLB from information including bits from the virtual address of a request received from a processor, and checks to see if the needed translation is present. A translation is present if one of the currently valid entries at the presented index has a tag that matches appropriate bits of the virtual address presented. Further, the entry may also be required to match other bits corresponding to a processor state, such as a process identifier or address space identifier. If a translation for a particular request is not present in the TLB, a “translation miss” occurs and the address translation is resolved using more general mechanisms. Translations in a TLB typically cover a contiguous naturally-aligned memory region (such as 4 kilobytes (KB) or 1 megabyte (MB)), and the method chosen to construct the index depends on the size of the region covered by the translations.
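The lookup described above can be sketched, under stated assumptions, as follows. The sketch assumes 4 KB translations, 16 sets, 4 ways, and an index taken from the lowest bits of the virtual page number; the class and field names (`SetAssociativeTlb`, `TlbEntry`, `asid`, `pfn`) are hypothetical, and the replacement policy is left to the caller, which supplies the way explicitly.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12          # assumed 4 KB translations
NUM_SETS = 16            # assumed number of sets (power of two)
NUM_WAYS = 4             # assumed associativity
INDEX_BITS = NUM_SETS.bit_length() - 1

@dataclass
class TlbEntry:
    valid: bool = False
    tag: int = 0
    asid: int = 0        # address space identifier (processor state that must match)
    pfn: int = 0         # physical frame number

class SetAssociativeTlb:
    def __init__(self):
        self.sets = [[TlbEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

    @staticmethod
    def split(va):
        """Split a virtual address into (index, tag) using the virtual page number."""
        vpn = va >> PAGE_SHIFT
        return vpn & (NUM_SETS - 1), vpn >> INDEX_BITS

    def insert(self, va, asid, pfn, way):
        index, tag = self.split(va)
        self.sets[index][way] = TlbEntry(True, tag, asid, pfn)

    def lookup(self, va, asid):
        """Return the physical address on a hit, or None on a translation miss."""
        index, tag = self.split(va)
        for entry in self.sets[index]:
            if entry.valid and entry.tag == tag and entry.asid == asid:
                offset = va & ((1 << PAGE_SHIFT) - 1)
                return (entry.pfn << PAGE_SHIFT) | offset
        return None      # miss: resolved by a more general mechanism, e.g. a page table walk
```

Only the entries in the one indexed set are compared, which is what distinguishes this organization from a fully-associative TLB, where every entry would have to be compared on each request.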
However, when a system contains translations covering widely varying sizes, no single index can efficiently handle all of the different sizes. For example, a typical OS uses both 4 KB and 1 MB translation regions, represented by 4 KB and 1 MB page table entries, respectively. Conventional approaches break each 1 MB page table entry down into multiple 4 KB TLB entries, use two separate TLBs (one for 4 KB entries and one for 1 MB entries), or use a fully-associative TLB. None of these alternatives is efficient, as each suffers from one or more problems such as higher miss rates, wasted cache space, and excessive power consumption.
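The indexing difficulty can be illustrated with a short sketch. It assumes a 16-set TLB whose index is formed from the lowest bits of the virtual page number for the chosen region size; the function name `tlb_index` and the example addresses are hypothetical.

```python
NUM_SETS = 16                        # assumed number of TLB sets (power of two)
SHIFT_4KB = 12                       # 4 KB region: 12 offset bits
SHIFT_1MB = 20                       # 1 MB region: 20 offset bits

def tlb_index(va, page_shift):
    """Index built from the lowest virtual-page-number bits for a given region size."""
    return (va >> page_shift) & (NUM_SETS - 1)

# Two addresses that fall inside the same naturally aligned 1 MB region.
va_a = 0x00100000
va_b = 0x00105000

# With a 1 MB index, both addresses select the same set, as a 1 MB entry requires.
same_1mb = tlb_index(va_a, SHIFT_1MB) == tlb_index(va_b, SHIFT_1MB)

# With a 4 KB index, the index bits lie inside the 1 MB page offset, so the two
# addresses select different sets and a single 1 MB entry cannot serve both.
same_4kb = tlb_index(va_a, SHIFT_4KB) == tlb_index(va_b, SHIFT_4KB)
```

Under these assumptions `same_1mb` is true and `same_4kb` is false: an index chosen for 4 KB translations scatters the addresses of one 1 MB region across several sets, while an index chosen for 1 MB translations maps every 4 KB translation within a 1 MB region to the same set.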
Accordingly, a need exists for improved cache mechanisms.