1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to performing store-to-load forwarding (STLF) in a microprocessor.
2. Description of the Related Art
In high-performance microprocessors, the load store unit typically contains storage for several outstanding load and store operations waiting to access the L1 cache. A common performance enhancement for this type of microprocessor architecture is a mechanism for forwarding data from older store operations (store operations that occur earlier in program order) to younger load operations (load operations that occur later in program order).
This store-to-load forwarding of data significantly improves execution efficiency by circumventing accesses to the L1 data cache. STLF is typically achieved by searching all older store operations while the load operation is in the data cache access stage of the execution pipeline. The address of the load operation is compared with the addresses of all the store operations resident in the load store unit. First, all store operations that target the same address as the load operation are identified, and then this grouping is refined by eliminating any store operations that are younger than the load operation. Once all the matching store operations that are older than the load operation have been identified, the relative ages of these store operations are compared to find the youngest store operation that is older than the load operation. The data associated with this youngest store operation is then forwarded to the load operation, allowing it to complete normally without the need to access the L1 data cache.
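The search described above can be illustrated with a minimal behavioral sketch in Python. This is a software model for exposition only, not a description of any particular hardware implementation. The names `StoreEntry`, `forward_from_stores`, and the integer `age` field (a program-order sequence number in which a smaller value means an older operation) are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """Hypothetical model of one outstanding store in the load store unit."""
    address: int  # target address of the store
    data: int     # data waiting to be written
    age: int      # assumed program-order sequence number; smaller = older


def forward_from_stores(load_address: int, load_age: int,
                        stores: List[StoreEntry]) -> Optional[int]:
    """Return the data to forward to the load, or None if no store qualifies."""
    # Step 1: identify all stores that target the same address as the load.
    matching = [s for s in stores if s.address == load_address]

    # Step 2: eliminate any matching stores that are younger than the load.
    older = [s for s in matching if s.age < load_age]

    # No older matching store: the load must obtain its data from the cache.
    if not older:
        return None

    # Step 3: compare relative ages to find the youngest of the older stores.
    youngest_older = max(older, key=lambda s: s.age)
    return youngest_older.data
```

In this sketch, a load with age 5 to address 0x100 that encounters matching stores with ages 1, 3, and 7 would receive the data of the store with age 3, since the store with age 7 is younger than the load and is excluded.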
The address comparisons and searching algorithms used to locate the youngest store operation with the same target address as the load operation are relatively complex and require many levels of combinatorial logic for implementation. Typically, the load store unit provides storage for operations waiting to complete by accessing the L1 cache. Each line of this storage contains multiple entries for load or store operations. When the address of a load operation becomes available, it must be compared to the address of each entry, and all matching entries must be verified as store operations. Once all store operation entries matching the load operation's targeted address have been identified, a find-first algorithm may be employed to identify the youngest matching store operation that is older than the load operation. The data from the appropriate entry may then be forwarded to the load operation.
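The entry comparison and find-first steps may be sketched as operations on a bit vector, which loosely mirrors how such logic is often structured. The following Python model is illustrative only. It assumes, purely for the example, that entries are held in a list ordered from oldest (index 0) to youngest, so that relative age is given by position; the names `LsuEntry`, `match_vector`, and `find_youngest_older_store` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LsuEntry:
    """Hypothetical load store unit entry; list position encodes age."""
    valid: bool
    is_store: bool
    address: int
    data: int = 0


def match_vector(entries: List[LsuEntry], load_address: int) -> int:
    """Set bit i when entry i is a valid store to the load's address."""
    vec = 0
    for i, e in enumerate(entries):
        # The address must match AND the entry must be verified as a store.
        if e.valid and e.is_store and e.address == load_address:
            vec |= 1 << i
    return vec


def find_youngest_older_store(entries: List[LsuEntry],
                              load_index: int) -> Optional[int]:
    """Return the index of the youngest matching store older than the load."""
    vec = match_vector(entries, entries[load_index].address)

    # Keep only entries older than the load (indices below load_index,
    # under the assumed oldest-first ordering).
    candidates = vec & ((1 << load_index) - 1)
    if candidates == 0:
        return None  # nothing to forward; the load accesses the cache

    # Find-first from the load toward older entries: the highest set bit
    # is the youngest store that is still older than the load.
    return candidates.bit_length() - 1
```

The data to forward would then be read from `entries[index].data` for the returned index. In hardware, each of these steps (per-entry comparators, masking by age, and the priority selection) adds logic levels, which is the source of the complexity noted above.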
For high-performance microprocessors, this STLF functionality is typically part of the critical path for completion of load operations, and therefore significantly impacts effective load latency. The time taken to perform each of the procedures outlined above contributes to this effective load latency and, in some instances, may limit the maximum frequency at which the microprocessor can operate.