1. Field of the Invention
This invention relates to computer processors and, more particularly, to queuing and writing store data to cache.
2. Description of the Related Art
Microprocessors have evolved to include a variety of features aimed at improving the speed and efficiency with which instructions are executed. In addition to advances in clock speed and the resulting reduction in instruction execution time, microprocessors may include pipelines, multiple cores, multiple execution units, etc. that permit some degree of parallel instruction execution. Further performance improvements have also been realized through a variety of buffering, queuing, and caching features intended to overcome bottlenecks in the movement of data to and from memory. For example, microprocessors often include multiple memory caches, arranged hierarchically and shared by multiple cores or execution units. Because cache accesses are faster than memory accesses, various caching techniques are used to increase the likelihood that data is located in a cache when needed by a core or execution unit.
When multiple cores share memory or cache space, it is necessary to coordinate loading and storing of data in caches and in the shared memory so that a globally consistent view of the data at each location is maintained. For instance, it may be necessary for a given core to obtain exclusive access to a shared memory location before storing cached data in it. A similar problem may exist in the case where each core has its own level-1 cache but uses a shared level-2 cache. It may be advantageous to temporarily store data in one or more buffers or queues until exclusive access is obtained, so that the core can process additional instructions instead of waiting for the store operation to be completed.
One approach used to address the above concerns is for each core to have a store queue. A store queue may buffer memory operations that have been executed, but not yet committed to cache or memory. Memory operations that write data to memory may be referred to more succinctly herein as “stores”. A store may target a particular cache line (or portion of a cache line) and include an address identifying the targeted line as well as data to be stored within the cache line. In order to improve performance, modern microprocessor cores may execute instructions out-of-order or speculatively. These techniques create a need for stores to be held until the order in which they should be presented to memory is determined and exclusive access to the targeted memory location is granted. Once the order of commitment is determined, the store may be retired. A store queue may be used to hold stores until they are retired, after which they may be committed to cache or to memory when exclusive access to the targeted memory location is granted. Moving store operations to the store queue permits a core's instruction execution pipeline to be used to execute other, subsequent instructions. However, even though queuing stores decouples a core from the operations of retiring stores and acquiring exclusive access to memory, a core may still stall if the store queue becomes full. Accordingly, what is needed is a way to reduce the chances of a store queue becoming full and stalling its associated processor core.
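For illustration only, the store queue behavior described above may be modeled in software. The following is a minimal sketch, not a description of any particular hardware: the names Store, StoreQueue, enqueue, retire_oldest_unretired, and commit are hypothetical, the cache is stood in for by a dictionary keyed by address, exclusive access is represented by a caller-supplied predicate, and stores are assumed to commit strictly oldest-first.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Store:
    address: int            # identifies the targeted cache line (hypothetical encoding)
    data: bytes             # data to be written into that line
    retired: bool = False   # set once the order of commitment is determined


class StoreQueue:
    """Toy model of a per-core store queue with a fixed capacity (assumption)."""

    def __init__(self, capacity, cache):
        self.capacity = capacity
        self.cache = cache          # dict standing in for the cache
        self.entries = deque()      # oldest store sits at the left end

    def is_full(self):
        return len(self.entries) >= self.capacity

    def enqueue(self, store):
        """Buffer an executed store; False means the queue is full and the core would stall."""
        if self.is_full():
            return False
        self.entries.append(store)
        return True

    def retire_oldest_unretired(self):
        """Mark the oldest store that has not yet been retired as retired."""
        for store in self.entries:
            if not store.retired:
                store.retired = True
                return store
        return None

    def commit(self, has_exclusive_access):
        """Commit the oldest store only if it is retired and exclusive access is granted."""
        if not self.entries:
            return None
        head = self.entries[0]
        if head.retired and has_exclusive_access(head.address):
            self.entries.popleft()
            self.cache[head.address] = head.data
            return head
        return None
```

In this sketch, a store that is retired but still waiting for exclusive access continues to occupy a queue entry, which shows how a slow grant of exclusive access can fill the queue and cause enqueue to report a stall.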