The present invention relates generally to set-associative caches. More particularly, the present invention relates to a way-selecting translation lookaside buffer.
In high-performance cached-memory subsystems, single-cycle cache accesses are common. In such systems, the address tags and the associated data are read from all ways of the cache simultaneously. The incoming address is compared against all available address tags of the same set, providing a vector that is used to select which way of the available data corresponds to the request. In physically-addressed caches, the virtual-to-physical address translation typically occurs in parallel with the tag and data access. Generally, the critical timing path extends from the generation of the physical address (usually via a translation lookaside buffer), through the comparators associated with the address tags stored in the cache, to the switch network (commonly referred to as the way multiplexer) that controls the output of the requested cache data. A direct-mapped (that is, one-way) cache eases this timing constraint somewhat: the comparison involves only a single tag and serves only to determine a hit condition, so no switch network needs to be traversed to access the data. A set-associative cache, by contrast, exacerbates the timing constraint because the tag comparison must complete before the way of data can be selected.
For example, FIG. 1 shows a prior art set-associative cache 100. Cache 100 includes a data cache 102, a tag cache 104, a translation lookaside buffer (TLB) 106, a tag comparator 108, and a way multiplexer (MUX) 110. TLB 106 includes a content-addressable memory (TLB CAM) 116 and a random-access memory (TLB RAM) 118. A virtual address 120 is received that includes a set index 122 and a virtual tag 124.
Cache data 126 and cache tags 128 are read from all ways of data cache 102 and tag cache 104, respectively. TLB 106 provides a physical tag 130 of a physical address based on virtual tag 124. In particular, TLB CAM 116 provides a hit vector 134 based on virtual tag 124, and TLB RAM 118 provides physical tag 130 based on hit vector 134.
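The two-stage operation of TLB 106 can be illustrated with a minimal behavioral sketch. It is not a description of any particular hardware implementation: the function names `tlb_cam_lookup` and `tlb_ram_read`, the list-based storage, and the example tag values are all assumptions introduced here for illustration only.

```python
from typing import List, Optional


def tlb_cam_lookup(cam_entries: List[int], virtual_tag: int) -> List[bool]:
    """Model of TLB CAM 116: compare the virtual tag against every stored
    entry at once and return one hit bit per entry (the hit vector)."""
    return [entry == virtual_tag for entry in cam_entries]


def tlb_ram_read(ram_entries: List[int], hit_vector: List[bool]) -> Optional[int]:
    """Model of TLB RAM 118: return the physical tag in the row selected by
    the hit vector, or None when no row is selected (a TLB miss)."""
    for physical_tag, hit in zip(ram_entries, hit_vector):
        if hit:
            return physical_tag
    return None


# Hypothetical TLB contents: row i of the CAM pairs with row i of the RAM.
cam = [0x1A, 0x2B, 0x3C]     # virtual tags
ram = [0x700, 0x701, 0x702]  # corresponding physical tags

hit_vector = tlb_cam_lookup(cam, 0x2B)
physical_tag = tlb_ram_read(ram, hit_vector)
```

In this sketch the physical tag becomes available only after both the CAM match and the RAM read have completed, which is why the TLB contributes to the critical timing path.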
Tag comparator 108 compares physical tag 130 against all available address tags of the same set, providing a way selection 132 that way MUX 110 uses to select which way of cache data 126 corresponds to virtual address 120. The selected cache data is provided as data 136. In FIG. 1, the critical timing path includes TLB 106, tag comparator 108, and way MUX 110.
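The comparison and selection stages of FIG. 1 can likewise be sketched behaviorally. The sketch assumes that the physical tag has already been produced by the TLB; the names `tag_compare`, `way_mux`, and `cache_read`, the nested-list layout indexed by set and then by way, and the example contents are hypothetical and serve only to make the data flow concrete.

```python
from typing import List, Optional


def tag_compare(cache_tags: List[int], physical_tag: int) -> List[bool]:
    """Model of tag comparator 108: compare the physical tag against the tag
    read from each way of the set and return the way selection vector."""
    return [tag == physical_tag for tag in cache_tags]


def way_mux(cache_data: List[bytes], way_selection: List[bool]) -> Optional[bytes]:
    """Model of way MUX 110: pass through the data of the selected way,
    or None when no way is selected (a cache miss)."""
    for data, selected in zip(cache_data, way_selection):
        if selected:
            return data
    return None


def cache_read(tag_cache: List[List[int]], data_cache: List[List[bytes]],
               set_index: int, physical_tag: int) -> Optional[bytes]:
    """Read every way of the indexed set, then compare and select."""
    cache_tags = tag_cache[set_index]   # tags from all ways of the set
    cache_data = data_cache[set_index]  # data from all ways of the set
    return way_mux(cache_data, tag_compare(cache_tags, physical_tag))


# Hypothetical two-set, four-way cache contents.
tag_cache = [[0x700, 0x701, 0x702, 0x703],
             [0x710, 0x701, 0x712, 0x713]]
data_cache = [[b"a0", b"a1", b"a2", b"a3"],
              [b"b0", b"b1", b"b2", b"b3"]]

hit_data = cache_read(tag_cache, data_cache, set_index=1, physical_tag=0x701)
miss_data = cache_read(tag_cache, data_cache, set_index=0, physical_tag=0x7FF)
```

Because `way_mux` cannot produce its output until `tag_compare` has finished, and `tag_compare` cannot start until the physical tag is known, the three stages are serialized in the manner described for the critical timing path.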
Various methods are used in an attempt to alleviate this timing burden. For example, a cache based on the untranslated virtual address does not fully contain this timing path, although the TLB must still provide permission checks. However, a virtual cache must be invalidated on each context switch. In addition, many instruction set architectures require physically-addressed caches. A virtually-indexed, physically-tagged cache lessens this timing burden because the cache access occurs entirely in parallel with the address translation (again, save the permission checks). However, virtually-indexed caches suffer from address aliasing, in which multiple virtual addresses may be mapped to the same physical address. Additional hardware and/or software must be implemented to resolve the aliasing issue.
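The aliasing issue can be made concrete with a small worked example. The parameters below (4 KiB pages, 64-byte cache lines, 256 sets) and the page-table contents are assumptions chosen for illustration; they are not taken from any particular architecture. With these values the set index occupies address bits 6 through 13, so two of its bits lie above the 12-bit page offset and can differ between virtual addresses that translate to the same physical address.

```python
PAGE_OFFSET_BITS = 12  # assumed 4 KiB pages
LINE_OFFSET_BITS = 6   # assumed 64-byte cache lines
SET_INDEX_BITS = 8     # assumed 256 sets


def set_index(address: int) -> int:
    """Extract the set index field from an address."""
    return (address >> LINE_OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1)


def translate(page_table: dict, virtual_address: int) -> int:
    """Replace the virtual page number with the mapped physical page number,
    keeping the page offset unchanged."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset


# Hypothetical mapping: two virtual pages share one physical page.
page_table = {0x1: 0x5, 0x2: 0x5}
va_a, va_b = 0x1040, 0x2040

same_physical = translate(page_table, va_a) == translate(page_table, va_b)
different_sets = set_index(va_a) != set_index(va_b)
```

Under these assumptions both flags are true: the two virtual addresses name the same physical location, yet a virtually-indexed cache would place them in different sets, so the same data could reside in the cache twice unless additional hardware or software prevents it.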