The technical field encompasses computer architectures having prevalidated cache designs. In particular, the technical field encompasses an architecture to support virtual address aliasing and multiple page sizes.
Computer systems may employ a multi-level hierarchy of memory, with relatively fast, expensive, but limited-capacity memory at the highest level of the hierarchy proceeding to relatively slower, lower cost, but higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor or mounted physically close to the processor for speed. The computer system may employ separate instruction caches and data caches. In addition, the computer system may use multiple levels of caches. The use of a cache is transparent to a computer program at the instruction level and can thus be added to a computer architecture without changing the instruction set or requiring modification to existing programs.
A cache hit occurs when a processor requests an item from a cache and the item is present in the cache. A cache miss occurs when a processor requests an item from a cache and the item is not present in the cache. In the event of a cache miss, the processor retrieves the requested item from a lower level of the memory hierarchy. In many processor designs, the time required to access an item for a cache hit is one of the primary limiters for the clock rate of the processor, if the designer is seeking a single cycle cache access time. In other designs, the cache access time may be multiple cycles, but the performance of a processor can be improved in most cases when the cache access time in cycles is reduced. Therefore, optimization of access time for cache hits is critical to the performance of the computer system.
Associated with cache design is a concept of virtual storage. Virtual storage systems permit a computer programmer to think of memory as one uniform single-level storage unit but actually provide a dynamic address-translation unit that automatically moves program blocks, in pages, between auxiliary storage and the high speed storage (cache) on demand.
Also associated with cache design is a concept of fully associative or content-addressable memory (CAM). Content-addressable memory is a random access memory that, in addition to having a conventional wired-in addressing mechanism, also has wired-in logic that compares selected bit locations of every entry against a specified match value simultaneously, during one memory-cycle time. Thus, the specific address of a desired entry need not be known since a portion of its contents can be used to access the entry. All entries that match the specified bit locations are flagged and can be addressed on the current or subsequent memory cycles.
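The matching behavior of a CAM can be modeled in software as a minimal sketch. The function name `cam_match`, the 16-bit entry values, and the mask are hypothetical and chosen only for illustration; real CAM hardware compares all entries in parallel, whereas this model visits them one at a time.

```python
def cam_match(entries, key, mask):
    """Return the indices of every entry whose masked bits equal the masked key.

    `mask` has a 1 in each bit location that takes part in the comparison.
    """
    return [i for i, entry in enumerate(entries) if (entry & mask) == (key & mask)]


# Hypothetical stored entries.
entries = [0xA1F0, 0xB2F0, 0xA1C3, 0x00F0]

# Compare only the low byte; the upper bits of the key are ignored.
hits = cam_match(entries, key=0x12F0, mask=0x00FF)
print(hits)
```

More than one entry can be flagged by a single comparison, which is why the result is a list of indices rather than a single address.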
Memory may be organized into words (for example, 32 bits or 64 bits per word). The minimum amount of memory that can be transferred between a cache and the next lower level of memory hierarchy is called a line or a block. A line may be multiple words (for example, 16 words per line). Memory may also be divided into pages, or segments, with many lines per page. In some computer systems page size may be variable.
In modern computer memory architectures, a central processing unit (CPU) produces virtual addresses that are translated by a combination of hardware and software to physical addresses. The physical addresses are used to access a physical main memory. A group of virtual addresses may be dynamically assigned to each page. A special case of this dynamic assignment is when two or more virtual addresses are assigned to the same physical page. This is called virtual address aliasing. Virtual memory requires a data structure, sometimes called a page table, that translates the virtual address to the physical address. To reduce address translation time, computers may use a specialized associative cache dedicated to address translation, commonly called a translation lookaside buffer (TLB).
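Translation and aliasing can be illustrated with a minimal sketch. The 4 KiB page size, the page table contents, and the names `PAGE_SHIFT`, `page_table`, and `translate` are assumptions made for the example; the TLB is modeled as a plain dictionary that is filled from the page table on a miss.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical page table: virtual page number -> physical page number.
# Virtual pages 0x10 and 0x25 both map to physical page 0x7A (aliasing).
page_table = {0x10: 0x7A, 0x25: 0x7A, 0x11: 0x03}


def translate(vaddr, tlb, page_table):
    """Translate a virtual address, consulting the TLB before the page table."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & OFFSET_MASK
    if vpn not in tlb:               # TLB miss: walk the page table and fill
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_SHIFT) | offset


tlb = {}
pa1 = translate(0x10ABC, tlb, page_table)
pa2 = translate(0x25ABC, tlb, page_table)
print(hex(pa1), hex(pa2))
```

The two virtual addresses differ, yet both resolve to the same physical address, which is the aliasing case described above.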
If a cache stores an entire line address along with the data and any line can be placed anywhere in the cache, the cache is said to be fully associative. For a large cache in which any line can be placed anywhere, the hardware required to rapidly determine if and where an item is in the cache may be very large and expensive. For larger caches, a faster, space-saving alternative is to use a subset of the address (called an index) to designate a line position within the cache, and then store the remaining, more significant bits of each physical address, called a tag, along with the data. In a cache with indexing, an item with a particular address can be placed only within a set of lines designated by the index. If the cache is arranged so that the index for a given address maps to exactly one line, the cache is said to be direct mapped. If the index maps to a set of more than one line, each such line position being called a way, the cache is said to be set-associative. All or part of an address may be hashed to provide a set index that partitions the address space into sets.
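The division of an address into tag, index, and line offset can be sketched as follows. The line size of 64 bytes and the 128 sets are assumed values, and `split_address` is a hypothetical helper; no hashing of the index is modeled.

```python
LINE_BYTES = 64    # assumed line size in bytes (power of two)
NUM_SETS = 128     # assumed number of sets (power of two)

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits select a byte within the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7 bits select the set


def split_address(paddr):
    """Split a physical address into (tag, index, offset)."""
    offset = paddr & (LINE_BYTES - 1)
    index = (paddr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)   # remaining more significant bits
    return tag, index, offset


tag, index, offset = split_address(0x12345)
print(tag, index, offset)
```

With one line per set this arrangement is direct mapped; with several lines (ways) per set, the stored tag is what distinguishes the lines that share an index.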
With direct mapping, when a line is requested, only one line in the cache has matching index bits. Therefore, the data can be retrieved immediately and driven onto a data bus before the computer system determines whether the rest of the address matches. The data may or may not be valid, but in the usual case where the data is valid, the data bits are available on the data bus before the computer system determines validity. With set-associative caches, the computer system cannot know which line corresponds to an address until the full address is compared. That is, in set-associative caches, the result of a tag comparison is used to select which line of data bits within a set of lines is presented to the processor.
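The role of the tag comparison in a set-associative lookup can be shown with a small model. The class name `SetAssociativeCache`, its geometry (4 sets, 2 ways, 16-byte lines), and the explicit `way` argument to `fill` are assumptions for illustration; replacement policy and valid bits beyond an empty slot are not modeled.

```python
class SetAssociativeCache:
    """Toy set-associative cache: each set holds `ways` (tag, data) slots."""

    def __init__(self, num_sets=4, ways=2, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.sets = [[None] * ways for _ in range(num_sets)]

    def _split(self, addr):
        line = addr // self.line_bytes
        return line // self.num_sets, line % self.num_sets   # (tag, index)

    def fill(self, addr, data, way):
        tag, index = self._split(addr)
        self.sets[index][way] = (tag, data)

    def lookup(self, addr):
        """Return (way, data) on a hit, or None on a miss."""
        tag, index = self._split(addr)
        # The index picks the set; only the tag comparison picks the way.
        for way, entry in enumerate(self.sets[index]):
            if entry is not None and entry[0] == tag:
                return way, entry[1]
        return None


cache = SetAssociativeCache()
cache.fill(0x100, "A", way=0)   # both addresses fall in set 0
cache.fill(0x140, "B", way=1)
print(cache.lookup(0x100), cache.lookup(0x140), cache.lookup(0x180))
```

Both filled addresses share an index, so the data to return cannot be chosen until the tags have been compared.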
In a cache with a TLB, the critical timing path for a hit requires a sequence of four operations: 1) a virtual tag must be presented to the TLB to determine the location of a corresponding physical tag in random access memory (RAM) in the TLB; 2) the physical tag must then be retrieved from the TLB RAM; 3) the physical tag from the TLB RAM must then be compared to physical tags accessed from the tag section of the cache; and 4) the appropriate data line must be selected. The sequence of four operations is required to read the cache and can be a limiter to processor frequency and processor performance.
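The four operations can be traced in a minimal software sketch. The function `conventional_hit` and the list-based layout of the TLB and cache arrays are hypothetical; the point is only that each step consumes the result of the previous one, which is what makes the path serial.

```python
def conventional_hit(vtag, index, tlb_vtags, tlb_ptags, cache_tags, cache_data):
    """Return the selected data line on a hit, or None on a TLB or cache miss."""
    # 1) Present the virtual tag to the TLB to locate the matching slot.
    slots = [i for i, v in enumerate(tlb_vtags) if v == vtag]
    if not slots:
        return None                      # TLB miss
    # 2) Read the physical tag from the TLB RAM at that slot.
    ptag = tlb_ptags[slots[0]]
    # 3) Compare it with the physical tag stored for each way of the indexed set.
    for way, tag in enumerate(cache_tags[index]):
        if tag == ptag:
            # 4) Select the data line of the matching way.
            return cache_data[index][way]
    return None                          # cache miss


# Hypothetical contents: a 2-entry TLB and a single 2-way set.
tlb_vtags = [0x10, 0x25]
tlb_ptags = [0x7A, 0x03]
cache_tags = [[0x03, 0x7A]]
cache_data = [["line-A", "line-B"]]

result = conventional_hit(0x10, 0, tlb_vtags, tlb_ptags, cache_tags, cache_data)
print(result)
```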
A computer architecture with a prevalidated tag cache design includes a prevalidated tag that holds TLB bits corresponding to TLB entry slot numbers, instead of physical or virtual addresses. To support virtual address aliasing, additional logic circuits are added to the cache micro-architecture. The logic allows different users with different virtual addresses to access the same physical address space in the prevalidated tag cache design. The logic circuitry may include a physical address, content addressable memory (CAM) and a page size mask. The logic circuit compares physical addresses and page sizes between existing TLB entries and a new TLB entry. If a match is found, the logic circuit can direct invalidation of an existing TLB entry, replacement of the TLB entry, or saving of the current TLB entry in a new TLB slot location.
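The two ideas in this paragraph can be sketched under stated assumptions. First, it is assumed that each way's prevalidated tag is a bit vector with one bit per TLB slot and that a hit is detected by ANDing it with the TLB's hit bits. Second, it is assumed that each TLB entry carries a physical page address and a mask with a 1 in every page-number bit, so that two entries alias when they agree on the bits significant to the larger of the two pages. The names `prevalidated_way_hits` and `find_aliases` and the 32-bit addresses are hypothetical, and the sketch only detects a match; it does not choose among invalidation, replacement, or saving to a new slot.

```python
def prevalidated_way_hits(tlb_hit_bits, way_tlb_bits):
    """One boolean per way: True where the way's TLB-slot bits overlap the hit bits."""
    return [(tlb_hit_bits & bits) != 0 for bits in way_tlb_bits]


def find_aliases(tlb, new_paddr, new_mask):
    """Return the slots whose physical page overlaps the new entry's page.

    Each TLB entry is (physical_address, page_mask). ANDing the two masks
    keeps only the bits that are page-number bits for both page sizes.
    """
    aliases = []
    for slot, (paddr, mask) in enumerate(tlb):
        common = mask & new_mask
        if (paddr & common) == (new_paddr & common):
            aliases.append(slot)
    return aliases


# TLB slot 2 hit; way 1 was filled through slot 2, way 0 through slot 0.
way_hits = prevalidated_way_hits(0b0100, [0b0001, 0b0100])

# Hypothetical entries: a 4 KiB page and a 16 KiB page.
tlb = [(0x00012000, 0xFFFFF000), (0x00040000, 0xFFFFC000)]
aliases = find_aliases(tlb, new_paddr=0x00041000, new_mask=0xFFFFF000)
print(way_hits, aliases)
```

The new 4 KiB page at 0x41000 lies inside the existing 16 KiB page at 0x40000, so slot 1 is reported even though the two physical addresses are not bit-for-bit equal.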
To support multiple page sizes, additional logic circuits are added to the cache micro-architecture. This logic allows the prevalidated tag cache design to support multiple page sizes. The logic may include a subset of the physical address in the prevalidated tag, a page size mask and a content addressable memory (CAM) in the prevalidated tag. A page size mask, and optionally a CAM that uses the page size mask for virtual address hits, may be added to the TLB. Logic may be added to the cache hit logic to add in page variable results.
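How a page size mask lets one TLB serve several page sizes can be illustrated with a minimal sketch of the virtual address match. The entry layout `(virtual_base, physical_base, page_mask)`, the 32-bit address width, the page sizes, and the name `tlb_lookup` are all assumptions for the example; the prevalidated tag side of the design is not modeled here.

```python
ADDR_MASK = 0xFFFFFFFF   # assumed 32-bit addresses


def tlb_lookup(tlb, vaddr):
    """Return (slot, physical_address) for the first matching entry, else None.

    Each entry is (virtual_base, physical_base, page_mask), where page_mask
    has a 1 in every bit that belongs to the page number for that page size.
    """
    for slot, (vbase, pbase, mask) in enumerate(tlb):
        if (vaddr & mask) == (vbase & mask):      # compare page-number bits only
            offset = vaddr & ~mask & ADDR_MASK    # bits below the page boundary
            return slot, pbase | offset
    return None


# Hypothetical entries: one 4 KiB page and one 4 MiB page.
tlb = [
    (0x00400000, 0x10000000, 0xFFFFF000),
    (0x00800000, 0x20000000, 0xFFC00000),
]

small = tlb_lookup(tlb, 0x00400ABC)
large = tlb_lookup(tlb, 0x00812345)
print(small, large)
```

The same comparison circuit handles both entries; only the mask stored with each entry changes how many address bits take part in the match.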