To cope with the high cost of memory access, modern architectures provide a large number of general purpose registers. These registers offer dense, short-term storage within the CPU so that memory accesses can be avoided. Unfortunately, short-lived values cannot always take advantage of these registers. Several situations are known to cause this: register pressure forces values to be spilled to and filled from memory; register contents must be demoted to memory across function calls; and compilers that cannot disambiguate pointers conservatively keep values in memory to guarantee correctness. While a variety of techniques to reduce these restrictions have been proposed, they have not seen widespread adoption, likely because they require changes to the programming interface. The most common architectural approach in modern out-of-order processors is not to prevent the situations listed above, but instead to speed up the short-term spills with a sophisticated load-store unit (LSU) in conjunction with a high-bandwidth L1 cache.
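The pointer disambiguation case can be illustrated with a minimal C sketch. The function names `accumulate_may_alias` and `accumulate_local` are hypothetical and do not come from the source. In the first function, `sum` may point into `data`, so a compiler typically has to write the running total back to memory on each iteration rather than hold it in a register for the whole loop. The second function makes the programmer's intent explicit by accumulating in a local variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: `sum` and `data` have compatible types and may
 * alias. Each update of *sum could change an element of data that is read
 * later, so the compiler generally cannot keep *sum in a register across
 * the whole loop. */
void accumulate_may_alias(int *sum, const int *data, size_t n) {
    assert(sum != NULL);
    for (size_t i = 0; i < n; i++) {
        *sum += data[i];
    }
}

/* Accumulating in a local removes the aliasing question: `acc` has no
 * address taken, so it is a natural register candidate. Memory is read
 * once before the loop and written once after it. */
void accumulate_local(int *sum, const int *data, size_t n) {
    assert(sum != NULL);
    int acc = *sum;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    *sum = acc;
}
```

The two functions agree when `sum` does not point into `data`, but they can produce different results when it does. That difference is why a compiler cannot rewrite the first form into the second on its own.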
Besides serving as a device for high-speed memory access, the LSU is also used as storage for speculative data. Stores cannot commit to memory until they are known to be valid architectural state, typically when they reach the head of the reorder buffer. While effective, many LSU designs are expensive: they feature comparator matrices, storage registers, ordering logic, and scheduling logic, and they require the L1 cache to be multi-ported with low latency. Many of these components duplicate functionality already present in other pipeline components. One example of this redundancy is that a single value may be stored simultaneously in the register file, the LSU, the cache, and main memory. Another is that the ordering of memory operations is maintained by both the reorder buffer and the LSU.
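The role of the LSU as speculative storage can be sketched with a small software model of a store queue. This is an illustrative simplification under stated assumptions, not a description of any particular hardware design. All names (`StoreQueue`, `sq_store`, `sq_load`, `sq_commit_oldest`, `sq_squash`, `sim_memory`) and both sizes are hypothetical. The model also assumes every load is younger than every buffered store, which a real out-of-order LSU cannot assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SQ_CAPACITY 8   /* hypothetical number of store queue entries */
#define MEM_WORDS   16  /* hypothetical size of the modeled memory */

typedef struct {
    uint32_t addr[SQ_CAPACITY];
    uint32_t data[SQ_CAPACITY];
    size_t head;   /* slot of the oldest uncommitted store */
    size_t count;  /* number of buffered stores */
} StoreQueue;

static uint32_t sim_memory[MEM_WORDS];  /* stands in for cache/memory */

/* Buffer a speculative store. Returns 0 if the queue is full. */
int sq_store(StoreQueue *q, uint32_t addr, uint32_t value) {
    assert(addr < MEM_WORDS);
    if (q->count == SQ_CAPACITY) return 0;
    size_t slot = (q->head + q->count) % SQ_CAPACITY;
    q->addr[slot] = addr;
    q->data[slot] = value;
    q->count++;
    return 1;
}

/* Load: compare the address against every buffered store, youngest first
 * (the job done in parallel by hardware comparators), and forward the
 * matching data. Fall back to memory if nothing matches. */
uint32_t sq_load(const StoreQueue *q, uint32_t addr) {
    assert(addr < MEM_WORDS);
    for (size_t i = q->count; i > 0; i--) {
        size_t slot = (q->head + i - 1) % SQ_CAPACITY;
        if (q->addr[slot] == addr) return q->data[slot];
    }
    return sim_memory[addr];
}

/* Commit the oldest store to memory once it is known to be valid
 * architectural state. Returns 0 if the queue is empty. */
int sq_commit_oldest(StoreQueue *q) {
    if (q->count == 0) return 0;
    sim_memory[q->addr[q->head]] = q->data[q->head];
    q->head = (q->head + 1) % SQ_CAPACITY;
    q->count--;
    return 1;
}

/* Discard all uncommitted stores, as after a misspeculation. */
void sq_squash(StoreQueue *q) {
    q->count = 0;
}
```

Even in this reduced form, a stored value lives in the queue and then again in memory, and the queue keeps its own age ordering. A pipeline with a reorder buffer tracks that same ordering a second time, which is the kind of duplication described above.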
The present technique seeks to address these issues.