In computer engineering, a cache is a block of memory used for temporary storage of frequently accessed data so that future requests for that data can be serviced more quickly. As opposed to a buffer, which is managed explicitly by a client, a cache stores data transparently; thus, a client requesting data from a system is not aware that the cache exists. The data stored within a cache may comprise results of earlier computations or duplicates of original values that are stored elsewhere. If requested data is contained in the cache, often referred to as a cache hit, the request can be served by simply reading the cache, which is comparatively faster than accessing the data from main memory. Conversely, if the requested data is not contained in the cache, often referred to as a cache miss, the data is recomputed or fetched from its original storage location, which is comparatively slower. Hence, the more requests that can be serviced from the cache, the better the overall system performance.
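The hit and miss behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular hardware: the class name `TransparentCache`, the `slow_square` function standing in for the slow original storage location, and the unbounded dictionary used as the cache store are all assumptions made for the example.

```python
class TransparentCache:
    """Wraps a slow lookup; callers only call read() and never touch the cache."""

    def __init__(self, backing_read):
        self._backing_read = backing_read  # slow path (hypothetical stand-in)
        self._store = {}                   # cached copies, unbounded for simplicity
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._store:
            # Cache hit: serve the request directly from the cache.
            self.hits += 1
            return self._store[key]
        # Cache miss: recompute or fetch from the original location, then keep a copy.
        self.misses += 1
        value = self._backing_read(key)
        self._store[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_square(n):
    """Hypothetical expensive computation standing in for main memory."""
    return n * n


cache = TransparentCache(slow_square)
results = [cache.read(k) for k in [3, 4, 3, 3, 4, 5]]
print(results, cache.hits, cache.misses, cache.hit_rate())
```

In this run, the first request for each of the keys 3, 4, and 5 misses and every repeated request hits, so half of the six requests are served from the cache.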
In this manner, caching is generally used to improve the performance of a processor core (or simply, core) in systems where data accessed by the core is located in comparatively slow and/or distant memory (e.g., double data rate 3 (DDR3) memory). A data cache is used to manage core accesses to data; an instruction cache is used to manage core accesses to instructions. A conventional data caching strategy is to fetch only one line of data on any request from the processor core that results in a cache miss. This approach, however, increases the application cycle count and is therefore undesirable. The cycle penalty is caused primarily by processor cycles that are spent bringing a cache line from main memory to the data cache.
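To make the cycle penalty of the one-line-per-miss strategy concrete, the following sketch counts cycles for a stream of addresses. The line size, hit latency, and miss penalty are assumed values chosen purely for illustration, and the cache is modeled as unbounded (no eviction) so that only the fetch-on-miss behavior is visible.

```python
LINE_SIZE = 32      # bytes per cache line (assumed)
HIT_CYCLES = 1      # cycles for an access served by the cache (assumed)
MISS_PENALTY = 20   # extra cycles to bring one line from main memory (assumed)


def run_demand_fetch(addresses):
    """Fetch exactly one line on each miss; return (total cycles, miss count)."""
    resident = set()  # line numbers currently held in the data cache
    cycles = 0
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        cycles += HIT_CYCLES
        if line not in resident:
            # Miss: the core stalls while the single missing line is fetched.
            misses += 1
            cycles += MISS_PENALTY
            resident.add(line)
    return cycles, misses


# 64 sequential 4-byte accesses covering 8 cache lines.
sequential = list(range(0, 256, 4))
cycles, misses = run_demand_fetch(sequential)
print(cycles, misses)
```

Under these assumed numbers, the eight line fills account for 160 of the 224 total cycles, which is the penalty the text attributes to bringing cache lines from main memory.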
A standard methodology that attempts to reduce this cycle penalty is to perform a hardware prefetch, also referred to as fetch ahead (FA), which brings a next line of data from main memory after any cache access (hit or miss). While the prefetch scheme may be helpful in reducing or even eliminating cycle penalties incurred during sequential instruction execution, this approach is insufficient for non-sequential access patterns, such as a change of flow in code execution, which may be encountered frequently depending upon the particular application in which the processor is utilized.
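The contrast between sequential and non-sequential access under fetch ahead can be illustrated with a small simulation. It is a simplified sketch under stated assumptions: the timing constants are invented, the cache never evicts, and the prefetch is idealized as fully overlapped with execution (it costs no cycles and always completes before the next access), which real hardware does not guarantee. The `jumpy` address list is a hypothetical stand-in for a change-of-flow pattern.

```python
LINE_SIZE = 32      # bytes per cache line (assumed)
HIT_CYCLES = 1      # cycles for an access served by the cache (assumed)
MISS_PENALTY = 20   # extra cycles to bring one line from main memory (assumed)


def run(addresses, fetch_ahead):
    """Return (total cycles, miss count), optionally prefetching the next line."""
    resident = set()
    cycles = 0
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        cycles += HIT_CYCLES
        if line not in resident:
            misses += 1
            cycles += MISS_PENALTY
            resident.add(line)
        if fetch_ahead:
            # Fetch ahead: after any access (hit or miss), bring in the next line.
            # Idealized here as free and instantaneous.
            resident.add(line + 1)
    return cycles, misses


# Sequential stream: 64 word accesses over 8 consecutive lines.
sequential = list(range(0, 256, 4))
# Non-sequential stream: each access jumps to an unrelated line.
jumpy = [line * LINE_SIZE for line in (0, 10, 3, 20, 7, 30, 14, 40)]

seq_plain = run(sequential, fetch_ahead=False)
seq_fa = run(sequential, fetch_ahead=True)
jump_plain = run(jumpy, fetch_ahead=False)
jump_fa = run(jumpy, fetch_ahead=True)
print(seq_plain, seq_fa, jump_plain, jump_fa)
```

In this model, fetch ahead removes all but the first miss on the sequential stream, while the non-sequential stream misses on every access with or without prefetching, since the prefetched next line is never the one requested.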