The invention herein relates to the architecture of a processor, and in particular to an architecture for avoiding store-hit-store issues in a microprocessor.
For some microprocessors that implement store forwarding mechanisms, special problems may arise when there are multiple stores to one block in the pipeline. A problem arises if the background data for a store is read from the cache before all preceding stores to that block have been written to the cache. More specifically, the background data read from the cache is then old and cannot be used for merging to create an up-to-date block of data. For perspective, consider operation of a processor in general.
Most processors run programs by loading an instruction from memory and decoding the instruction; loading associated data that is needed to process the instruction; processing the instruction; and storing any associated results in registers or memory. Complicating this series of steps is the fact that access to the memory, which includes the cache, main memory (i.e., random access memory), and other memory such as non-volatile storage like hard disks (not shown), involves a lengthy delay (in terms of processing time).
One technique to improve performance is the use of “pipelining.” Pipelines improve performance by allowing a number of instructions to work their way through the microprocessor at the same time. For example, if each of the previously mentioned four steps of running programs is implemented as a pipeline cycle, then the microprocessor would start to decode (in the first step) a new instruction while the last instruction waits for results to continue. This would allow up to four instructions to be “in flight” at one time, making the microprocessor appear to be up to four times as fast. Although any one instruction takes just as long to complete (there are still four steps), the microprocessor as a whole “retires” instructions much faster and can be run at a much higher clock speed than in prior designs.
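The "up to four times as fast" figure can be checked with a small calculation. The following is a minimal sketch in Python, assuming an idealized pipeline in which every stage takes exactly one cycle and no instruction ever stalls; the function name `total_cycles` and the instruction count of 1000 are hypothetical and chosen only for illustration.

```python
def total_cycles(num_instructions: int, num_stages: int, pipelined: bool) -> int:
    """Cycles needed to finish all instructions (idealized: one cycle per stage, no stalls)."""
    if num_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs num_stages cycles to drain through the
        # pipeline; each later instruction completes one cycle after the previous.
        return num_stages + num_instructions - 1
    # Without pipelining, each instruction occupies the whole machine
    # for all of its stages before the next one may start.
    return num_stages * num_instructions


# Hypothetical workload: 1000 instructions on the four-step machine described above.
unpipelined = total_cycles(1000, 4, pipelined=False)
pipelined = total_cycles(1000, 4, pipelined=True)
speedup = unpipelined / pipelined
```

Under these assumptions the speedup approaches, but never reaches, the number of stages, which is consistent with the "up to" qualifier in the text.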
Unfortunately, in a pipelined microprocessor, a special condition exists. This condition is commonly referred to as “store-hit-store.” In store-hit-store, a store (also referred to herein as an “update” or a “write”) to the cache is generated and designated for an address(es). Concurrently, another store is designated for at least a portion of the same address(es).
As constant, fixed size blocks of data are simpler to transfer and manipulate than variable size blocks of data, it makes sense to use a single fixed block size as much as possible. However, not all data in a block may require update. Accordingly, the portion of a block that is not updated is referred to as “background data.”
Thus, for a pipelined microprocessor that can store data of variable lengths into a cache or memory hierarchy, it may be advantageous, at least some of the time, to merge this variable length store data into a larger fixed size block such that a fixed size up-to-date block of data may be passed on to the rest of the cache or memory hierarchy as the result of the store.
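The merge described above can be illustrated in software. The following is a minimal sketch in Python, not the hardware mechanism itself; the 8-byte block size, the function name `merge_store`, and the sample values are assumptions made only for illustration.

```python
BLOCK_SIZE = 8  # bytes per fixed-size block (hypothetical value for illustration)


def merge_store(background: bytes, store_data: bytes, offset: int) -> bytes:
    """Overlay variable-length store_data onto a fixed-size block of background data."""
    if len(background) != BLOCK_SIZE:
        raise ValueError("background must be exactly one block")
    if offset < 0 or offset + len(store_data) > BLOCK_SIZE:
        raise ValueError("store must fit inside the block")
    merged = bytearray(background)
    # Bytes covered by the store are replaced; all other bytes keep
    # their background values.
    merged[offset:offset + len(store_data)] = store_data
    return bytes(merged)


# A 2-byte store at byte offset 3 of a block whose background is 00 01 ... 07.
background = bytes(range(BLOCK_SIZE))
merged = merge_store(background, b"\xAA\xBB", 3)
```

The result is a full fixed-size block: the two stored bytes are new, and the remaining six bytes come from the background data. The merged block is therefore only as current as the background data supplied to it.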
One problem that arises is that there may be multiple stores in the pipeline to the same block of the cache. When a newer store reads the cache for its background data, that background data is not the correct, most recent value for the store if an outstanding older store to the same block has not yet written to the cache.
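The hazard can be reproduced with a small software model. The following is a minimal sketch in Python, assuming an 8-byte block and two stores to disjoint bytes of that block; the names `merge` and `run_stores` and the flag `read_before_older_write` are hypothetical, and the model ignores all real pipeline timing.

```python
BLOCK_SIZE = 8  # bytes per block (hypothetical value for illustration)


def merge(background: bytes, data: bytes, offset: int) -> bytes:
    """Overlay data onto a copy of background at the given byte offset."""
    merged = bytearray(background)
    merged[offset:offset + len(data)] = data
    return bytes(merged)


def run_stores(cache_block: bytes, stores, read_before_older_write: bool) -> bytes:
    """Apply stores (a list of (offset, data) in program order) to one cache block.

    If read_before_older_write is True, every store reads its background data
    before any older store has written to the cache (the hazard case).
    Otherwise each store reads the cache only after all older stores have written.
    """
    cache = cache_block
    if read_before_older_write:
        backgrounds = [cache for _ in stores]  # all stores see the original block
        for (offset, data), stale_background in zip(stores, backgrounds):
            cache = merge(stale_background, data, offset)
    else:
        for offset, data in stores:
            cache = merge(cache, data, offset)
    return cache


initial = bytes(BLOCK_SIZE)                    # block starts as all zeros
stores = [(0, b"\x11\x11"), (4, b"\x22\x22")]  # older store, then newer store
correct = run_stores(initial, stores, read_before_older_write=False)
stale = run_stores(initial, stores, read_before_older_write=True)
```

In the hazard case the newer store merges with the original all-zero background, so the block it writes no longer contains the two bytes written by the older store, even though the two stores touch different bytes of the block.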
What are needed are techniques for solving an overlap of stores. The techniques should guarantee that the correct background data will always be written into the background data register (either by the cache or by older stores to the same block) before the background data is needed for store merging and provide minimal impact upon system performance.