The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to embodiments of the claimed subject matter.
Generally speaking, memory closer to the CPU may be accessed faster. Memory within a CPU may be referred to as cache, and may be accessible at different hierarchical levels, such as Level 1 cache (L1 cache) and Level 2 cache (L2 cache). System memory, such as memory modules coupled with a motherboard, may also be available. Such externally available memory, which is separate from the CPU but accessible to the CPU, may be referred to as, for example, off-chip cache or Level 3 cache (L3 cache), and so on. This naming is not always consistent, however, as a third hierarchical level of cache (e.g., L3 cache) may be on-chip or “on-die” and thus be internal to the CPU.
CPU cache, such as L1 cache, is used by the central processing unit of a computer to reduce the average time to access memory. The L1 cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. L2 cache may be larger, but slower to access. Additional memory used as cache, whether on-chip or externally available system memory, may be larger still, but slower to access than the smaller and closer CPU cache levels. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
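By way of illustration only, the relationship between hit rate and average latency noted above may be sketched as a simple weighted average. The function name, the single-level model, and the cycle counts below are assumptions chosen for the example and do not describe any particular processor.

```python
def average_access_latency(hit_rate, cache_latency, memory_latency):
    """Weighted-average latency for a single cache level (simplified model).

    hit_rate is the fraction of accesses served by the cache; the remaining
    accesses are assumed to cost memory_latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * memory_latency


# Hypothetical latencies in cycles, chosen only for illustration.
CACHE_CYCLES = 4
MEMORY_CYCLES = 200

mostly_hits = average_access_latency(0.95, CACHE_CYCLES, MEMORY_CYCLES)
mostly_misses = average_access_latency(0.10, CACHE_CYCLES, MEMORY_CYCLES)
```

Under these assumed numbers, the average with a high hit rate lies near the cache latency, while the average with a low hit rate lies near the main memory latency.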
When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in one of its caches (e.g., L1, L2 caches, etc.). When a copy is available, the processor reads from or writes to the cache instead of seeking the data from the system's main memory, thus providing a faster result than reading from or writing to main memory of the system.
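The lookup order described above may be sketched, under simplifying assumptions, as follows. The class name `TwoLevelLookup`, the use of dictionaries as cache levels, and the absence of any capacity limit or eviction policy are all hypothetical choices made to keep the example short; only the read path is modeled.

```python
class TwoLevelLookup:
    """Toy model: check L1, then L2, then fall back to main memory."""

    def __init__(self, main_memory):
        self.l1 = {}                    # smallest, fastest level
        self.l2 = {}                    # larger, slower level
        self.main_memory = main_memory  # mapping of address -> value

    def read(self, address):
        """Return (value, source), where source names the level that served the read."""
        if address in self.l1:
            return self.l1[address], "L1"
        if address in self.l2:
            value = self.l2[address]
            self.l1[address] = value    # fill L1 so the next read hits there
            return value, "L2"
        value = self.main_memory[address]
        self.l2[address] = value        # fill both levels on a miss
        self.l1[address] = value
        return value, "memory"
```

In this sketch, the first read of an address is served from main memory and later reads of the same address are served from L1.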
Conventional caches utilize a store buffer to reduce cache latency and also to enable reads of data from store instructions that have not yet been written into cache. As a store goes down a pipeline, its data is placed in a store buffer, where the data persists until the store is retired from the pipeline, at which point the store writes the data to cache.
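A minimal software sketch of the conventional store buffer behavior described above is given below. It is an assumption-laden model rather than a description of any actual circuit: the class and method names are hypothetical, retirement is assumed to occur in program order, and a load is assumed to take the youngest matching buffered store.

```python
from collections import deque


class StoreBufferCache:
    """Toy model of a cache fronted by a store buffer."""

    def __init__(self):
        self.cache = {}
        self.store_buffer = deque()  # (address, value) pairs in program order

    def issue_store(self, address, value):
        """A store in the pipeline places its data in the store buffer."""
        self.store_buffer.append((address, value))

    def retire_oldest_store(self):
        """On retirement, the oldest buffered store writes its data to cache."""
        address, value = self.store_buffer.popleft()
        self.cache[address] = value

    def load(self, address):
        """Read buffered data first (youngest match wins), else read the cache."""
        for buffered_address, value in reversed(self.store_buffer):
            if buffered_address == address:
                return value
        return self.cache.get(address)
```

In this model, a load issued after a store but before that store retires still observes the stored value, because the value is found in the buffer rather than in the cache.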
Improvements to cache latency (e.g., reductions in cache latency) provide direct and immediate benefits to computational efficiency for an implementing circuit. Lower latency means that data required by, for example, a CPU pipeline is available sooner without having to expend cycles waiting for unavailable data.
However, the conventional cache design exhibits undesirable traits. For example, the store buffer necessitates additional circuit complexity and additional components on an integrated circuit that implements such circuitry. The store buffer requires the allocation of valuable area for address comparators, data buffering space, muxes (multiplexers) and so forth on an integrated circuit and further consumes power to operate such devices. Moreover, when data is directed to the store buffer, several cycles may be required before a subsequent cache read operation is able to “see” and retrieve the data from the store buffer, and still further cycles are required before the data can be retrieved from the cache. Thus, if an instruction to store “x” in the cache is triggered and an instruction to read “x” from the cache is issued shortly thereafter, the read must be stalled or replayed in the pipeline for multiple cycles until data “x” becomes available in the store buffer, thus introducing overhead inefficiencies and sub-optimal system performance.
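The stall described above may be illustrated with a small calculation. The function name and the `forwarding_delay` parameter (the assumed number of cycles before buffered data becomes visible to a read) are hypothetical and are introduced only for this example.

```python
def load_stall_cycles(store_issue_cycle, load_issue_cycle, forwarding_delay):
    """Cycles a dependent load waits for the stored data to become visible.

    forwarding_delay is an assumed number of cycles between issuing the store
    and the data being readable from the store buffer.
    """
    visible_cycle = store_issue_cycle + forwarding_delay
    return max(0, visible_cycle - load_issue_cycle)
```

For instance, with an assumed delay of 3 cycles, a load issued one cycle after the store waits 2 cycles, whereas a load issued well after the store does not wait at all.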
The present state of the art may therefore benefit from systems and methods for implementing a speculative cache modification design as described herein.