A central processing unit (CPU) cache is a data storage structure that the CPU of a computer uses to reduce the average time that it takes to access memory. It stores copies of the data held in frequently used main memory locations. Cache memory (“cache”) is smaller than main memory and may be accessed more quickly. There are several different types of caches. These include physically indexed physically tagged (PIPT), virtually indexed virtually tagged (VIVT) and virtually indexed physically tagged (VIPT).
Caches that can accommodate multiple accesses in a single cycle provide performance advantages. In particular, such caches feature reduced access latencies. Conventional approaches to accommodating multiple accesses in a single cycle include the use of multi-ported caches and the provision of caches that include a plurality of tag and data banks.
A multi-ported cache is a cache that can serve more than one request at a time. In accessing some conventional caches a single memory address is requested, whereas in a multi-ported cache, N memory addresses can be requested at a time, where N is the number of ports of the multi-ported cache. An advantage of a multi-ported cache is that greater throughput (e.g., a greater number of load and store requests) may be accommodated. However, the number of cache ports needed to accommodate increasingly high levels of throughput may not be practical.
Caches that include a plurality of tag and data banks can serve more than one request at a time, as each tag and data bank can serve at least one request. However, when more than one request attempts to access the same bank at the same time, the request that will be allowed to access the bank must be determined. In one conventional approach, arbitration is used to determine which request will be allowed to access a given tag and data bank. In such conventional approaches, the time that it takes to execute the arbitration can delay access to the tag bank and thus delay the triggering of the critical Load Hit signal, typically found in the Level 1 cache of processors.
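The bank conflict described above can be illustrated with a minimal sketch. The line size, the number of banks, the address-to-bank mapping and the fixed-priority policy (earliest request wins) are all assumptions chosen for illustration; the text does not specify any of them.

```python
LINE_BYTES = 64  # assumed cache line size in bytes
NUM_BANKS = 4    # assumed number of tag and data banks


def bank_of(addr):
    """Map an address to a bank using its low-order line-index bits (assumed mapping)."""
    return (addr // LINE_BYTES) % NUM_BANKS


def arbitrate(requests):
    """Assumed fixed-priority arbitration: the earliest request to each bank wins.

    Returns (granted, stalled) address lists for a single cycle.
    """
    busy_banks = set()
    granted, stalled = [], []
    for addr in requests:
        bank = bank_of(addr)
        if bank in busy_banks:
            stalled.append(addr)   # bank already claimed this cycle
        else:
            busy_banks.add(bank)
            granted.append(addr)
    return granted, stalled


# 0x000 and 0x100 both map to bank 0, so only the first is granted.
granted, stalled = arbitrate([0x000, 0x040, 0x100, 0x080])
```

In hardware this selection sits ahead of the tag bank access, which is why the time spent on arbitration can delay the Load Hit signal.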
Further, in a conventional system which supports a plurality of load accesses of a cache in a single cycle, if multiple accesses target the same block within the same cache line, the arbitration scheme selects one access per block and signals a Load Hit (if there is a tag hit) only for the selected access while penalizing the other accesses. This is problematic because only one of the accesses signals a Load Hit while the other same-cycle accesses that are attempting to access the same data return a Load Miss. Further, if conventional systems supporting multiple load accesses in a single cycle encounter a load access that is unaligned, the access request is split into two components before it is sent to the Level-1 cache. Because these two components cannot be sent to the Level-1 cache at the same time, such accesses never result in a Level-1 cache hit and are penalized.
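Both penalties described above can be modeled in a short sketch. The block and line sizes, the rule that the first access to a block is the one selected, and the treatment of an unaligned access as one that crosses a cache line boundary are assumptions made for illustration only; the function names are hypothetical.

```python
LINE_BYTES = 64   # assumed cache line size in bytes
BLOCK_BYTES = 16  # assumed block size within a cache line


def select_per_block(addrs, tag_hit=True):
    """Model the conventional scheme: one access per block is selected
    (assumed here to be the first); the other same-cycle accesses to that
    block report a Load Miss even though the data is present."""
    selected_blocks = set()
    results = []
    for addr in addrs:
        block = addr // BLOCK_BYTES
        if block in selected_blocks:
            results.append((addr, "miss"))
        else:
            selected_blocks.add(block)
            results.append((addr, "hit" if tag_hit else "miss"))
    return results


def split_unaligned(addr, size):
    """Split an access that crosses a line boundary into two components.

    Assumes size <= LINE_BYTES, so at most two lines are touched.
    """
    end = addr + size
    boundary = (addr // LINE_BYTES + 1) * LINE_BYTES
    if end <= boundary:
        return [(addr, size)]
    return [(addr, boundary - addr), (boundary, end - boundary)]


same_block = select_per_block([0x10, 0x14, 0x20])
crossing = split_unaligned(60, 8)
aligned = split_unaligned(56, 8)
```

Here the access to 0x14 shares a block with 0x10 and is reported as a miss, and the 8-byte access at address 60 becomes two 4-byte components that a conventional system would have to send in separate cycles.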