The disclosed subject matter relates generally to computing devices having cache memories and, more particularly, to merging demand load requests with prefetch load requests.
A typical computer system includes a memory hierarchy to obtain a relatively high level of performance at a relatively low cost. Instructions of different software programs are typically stored on a relatively large but slow non-volatile storage unit (e.g., a disk drive unit). When a user selects one of the programs for execution, the instructions of the selected program are copied into a main memory, and a processor (e.g., a central processing unit or CPU) obtains the instructions of the selected program from the main memory. Some portions of the instructions and data are also loaded into cache memories of the processor or processors in the system. A cache memory is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, CPUs are generally associated with a cache or a hierarchy of cache memory elements. Other processors, such as graphics processing units (GPUs), are also known to use caches.
The cache memory closest to the processor core is typically referred to as the L1 cache. An L2 cache may be located on a different die than the processor and L1 cache, and it may be shared across multiple processor cores. When a processor executes a program, it looks in the cache for the data. The data might have been previously used during the course of execution of the program, in which case it may reside in the cache. The act of the processor finding the cache line, which holds the program data or program instructions, in the cache is called a “cache hit.” The act of the processor not finding the data or instruction in the cache is called a “cache miss.” On a cache miss, the processor issues a demand load request to load the cache line into the cache. These demand load requests read cache lines from the system memory and store copies of them in caches that are accessible by the processor. The act of storing a copy of a cache line in a cache is referred to as “filling a cache line” into the cache. To improve performance, prefetching is used to predict that a processor core will need a cache line before the core actually requests data from that cache line using a demand load request. A prefetch unit implemented in hardware monitors the data access patterns of the core and predicts future cache lines that will likely be needed.
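The hit, miss, and fill terminology above can be illustrated with a minimal sketch. The model below is an assumption made purely for illustration and does not describe any particular hardware: `ToyCache`, `run`, the 64-byte `LINE_SIZE`, and the next-line prefetch policy are hypothetical names and choices, the cache has unlimited capacity (no eviction), and fills complete instantly.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed value)


@dataclass
class ToyCache:
    """Hypothetical cache model: unlimited capacity, instantaneous fills."""
    lines: set = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def fill(self, line_addr):
        # "Filling a cache line": store a copy of the line in the cache.
        self.lines.add(line_addr)

    def demand_load(self, byte_addr):
        # Return True on a cache hit. On a cache miss, fill the line
        # (the demand load) and return False.
        line_addr = byte_addr // LINE_SIZE
        if line_addr in self.lines:
            self.hits += 1
            return True
        self.misses += 1
        self.fill(line_addr)
        return False


def run(addresses, prefetch_next_line):
    """Replay byte addresses; optionally prefetch the next sequential line."""
    cache = ToyCache()
    for addr in addresses:
        cache.demand_load(addr)
        if prefetch_next_line:
            # Assumed policy: predict that the following line is needed next.
            cache.fill(addr // LINE_SIZE + 1)
    return cache.hits, cache.misses


# Sequential stream touching eight distinct cache lines.
stream = [i * LINE_SIZE for i in range(8)]
print(run(stream, prefetch_next_line=False))  # (0, 8): every access misses
print(run(stream, prefetch_next_line=True))   # (7, 1): only the first access misses
```

Because fills are instantaneous in this sketch, every prefetch completes before the next demand load arrives. Real fills take many cycles, which is the source of the timing problem discussed next.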
The goal of prefetching is to fill the cache line before a demand load request targeting that cache line is serviced. If the cache line can be successfully prefetched, the latency for the later demand load request can be reduced because the demand load request will not see a cache miss. However, in some cases, the demand load request is processed before the cache line fill for the prefetch load request can be completed, so the demand load request is queued behind the prefetch load request. The demand load request must wait for the cache line fill to complete prior to being serviced. For cache line fills, the data can be forwarded to the requestor in parallel with the cache line being entered into the cache. If prefetching had not been implemented, the demand load request would have encountered a cache miss and would have been eligible for data forwarding during the cache line fill. On the other hand, if prefetching is implemented by the hardware, the demand load request may be received shortly after the prefetch load request has been issued, but before the cache line fill for the prefetch load request has completed. In such cases, when the cache line fill occurs on behalf of the prefetch load request, the data is not forwarded to the requesting processor that issued the demand load request. Consequently, the latency seen by the demand load request to obtain data after the cache line fill completes for the prefetch load request could be greater than it would have been if there had been no prior prefetch load request for this cache line.
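The three cases described above (no prefetch, prefetch completed, prefetch still in flight) can be compared with a simple timing sketch. All names and cycle counts here are hypothetical assumptions chosen for illustration: `MEM_LATENCY` is the time from issuing a fill request until the data arrives, and `HIT_LATENCY` is the assumed cost of the cache lookup a queued demand load performs once the fill has completed without forwarding.

```python
MEM_LATENCY = 100  # cycles from fill request to data arrival (assumed)
HIT_LATENCY = 4    # cycles for a cache lookup that hits (assumed)


def demand_latency(prefetch_issue, demand_arrival):
    """Cycles from demand load arrival until the requestor has the data.

    prefetch_issue: cycle at which a prefetch load for the same line was
    issued, or None if no prefetch was issued.
    demand_arrival: cycle at which the demand load request arrives.
    """
    if prefetch_issue is None:
        # Plain cache miss: the fill data is forwarded to the requestor
        # in parallel with being written into the cache.
        return MEM_LATENCY

    fill_done = prefetch_issue + MEM_LATENCY
    if demand_arrival >= fill_done:
        # Prefetch completed in time: the demand load is a cache hit.
        return HIT_LATENCY

    # Prefetch still in flight: the demand load is queued behind it, the
    # fill data is not forwarded, and the demand load must read the cache
    # after the fill completes.
    return (fill_done - demand_arrival) + HIT_LATENCY


print(demand_latency(None, 2))  # 100: miss with data forwarding
print(demand_latency(0, 150))   # 4: prefetch finished, cache hit
print(demand_latency(0, 50))    # 54: in-flight prefetch still helps
print(demand_latency(0, 2))     # 102: worse than having no prefetch at all
```

Under these assumed numbers, a demand load that arrives within `HIT_LATENCY` cycles of the prefetch being issued sees a longer latency than it would have seen with no prefetch, which is the penalty the paragraph above describes.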
This section of this document is intended to introduce various aspects of art that may be related to various aspects of the disclosed subject matter described and/or claimed below. This section provides background information to facilitate a better understanding of the various aspects of the disclosed subject matter. It should be understood that the statements in this section of this document are to be read in this light, and not as admissions of prior art. The disclosed subject matter is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.