The disclosed subject matter relates generally to computing devices having cache memories and, more particularly, to merging eviction and fill buffers for cache line transactions.
A typical computer system includes a memory hierarchy to obtain a relatively high level of performance at a relatively low cost. Instructions of different software programs are typically stored on a relatively large but slow non-volatile storage unit (e.g., a disk drive unit). When a user selects one of the programs for execution, the instructions of the selected program are copied into a main memory, and a processor (e.g., a central processing unit or CPU) obtains the instructions of the selected program from the main memory. Some portions of the data are also loaded into cache memories of the processor or processors in the system. A cache memory is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, central processing units (CPUs) are generally associated with a cache or a hierarchy of cache memory elements. Processors other than CPUs, such as, for example, graphics processing units (GPUs) and others, are also known to use caches.
The cache memory closest to the processor core is typically referred to as the L1 cache. An L2 cache may be located on a different die than the processor and L1 cache, and it may be shared across multiple processor cores. Due to the limited size of the L1 cache, it is sometimes necessary to evict a cache line residing in the L1 cache to make room for a cache line being added. Evicted cache lines are sent to the L2 cache, which is typically larger than the L1 cache.
To handle cache fills and evictions, a plurality of data fill buffers and data eviction buffers are typically employed. Data fill buffers hold the fill data before it can be written into the cache. The cache fill port may not always be available. For example, the fill port may be servicing an older fill transaction. Hence, the data fill buffers allow data to be temporarily buffered prior to scheduling the line fill into the cache. A fill request can only be sent to the L2 cache if a free data fill buffer is present. A miss in the L1 cache and the subsequent fill may also require a different line to be evicted out of the L1 cache. The data eviction buffers hold the evicted data before it can be sent out to the L2 cache. These cache evictions are triggered by fills or by external probes from other cores.
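The gating rule described above, in which a fill request is issued only when a free data fill buffer exists, can be illustrated with a minimal sketch. The names BufferPool and try_send_fill_request are hypothetical and chosen only for illustration; the sketch models buffer occupancy as a simple counter and does not represent any particular hardware implementation.

```python
class BufferPool:
    """Hypothetical fixed-size pool of buffers, tracked as a free count."""

    def __init__(self, count):
        self.capacity = count
        self.free = count

    def try_allocate(self):
        """Claim one buffer; return False if none is free."""
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self):
        """Return one buffer to the pool once its data has been drained."""
        if self.free < self.capacity:
            self.free += 1


def try_send_fill_request(fill_buffers):
    """Issue a fill request toward the L2 cache only if a fill buffer is free.

    Returns True if the request was issued, False if it must wait.
    """
    return fill_buffers.try_allocate()
```

In this sketch, a request that finds the pool empty simply stalls until release is called, which corresponds to a buffered fill having been written into the cache.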
For a 64-byte cache line, each fill transaction puts 64 bytes of line data into the cache in a sequence of four consecutive 16-byte beats. Since a cache fill can cause another line to be evicted out of the cache, the fill is sent only when a free eviction buffer is present to hold the evicted data, if necessary. The victim data is written out from the cache concurrently with the incoming fill, i.e., each incoming 16-byte fill beat on the fill port causes the victim line's corresponding 16-byte chunk to be evicted out on the read port.
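The beat-by-beat exchange described above can be sketched as follows. This is a simplified software model under stated assumptions: the function name fill_with_eviction is hypothetical, the cache line is represented as a bytearray, and the concurrent read-port and fill-port activity of each beat is modeled as two sequential steps within one loop iteration.

```python
LINE_BYTES = 64                      # cache line size from the text
BEAT_BYTES = 16                      # bytes transferred per beat
BEATS = LINE_BYTES // BEAT_BYTES     # four beats per line


def fill_with_eviction(cache_line, fill_data):
    """Write fill_data into cache_line one beat at a time.

    For each beat, the victim line's corresponding 16-byte chunk is copied
    into an eviction buffer before the fill chunk overwrites it.
    Returns the complete victim line collected in the eviction buffer.
    """
    if len(cache_line) != LINE_BYTES or len(fill_data) != LINE_BYTES:
        raise ValueError("cache line and fill data must both be 64 bytes")

    eviction_buffer = bytearray()
    for beat in range(BEATS):
        lo = beat * BEAT_BYTES
        hi = lo + BEAT_BYTES
        eviction_buffer += cache_line[lo:hi]   # victim chunk out (read port)
        cache_line[lo:hi] = fill_data[lo:hi]   # fill chunk in (fill port)
    return bytes(eviction_buffer)
```

After the fourth beat, the cache line holds the new data and the eviction buffer holds the full victim line, ready to be sent to the L2 cache.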
The performance of the cache is directly related to having a sufficient number of fill and eviction buffers to handle the cache traffic. If not enough buffers are present, bottlenecks can occur. However, the buffers consume an appreciable amount of real estate on the die and also consume power. Thus, there is a direct tradeoff between performance on the one hand and real estate and power consumption on the other.
This section of this document is intended to introduce various aspects of art that may be related to various aspects of the disclosed subject matter described and/or claimed below. This section provides background information to facilitate a better understanding of the various aspects of the disclosed subject matter. It should be understood that the statements in this section of this document are to be read in this light, and not as admissions of prior art. The disclosed subject matter is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.