1. Technical Field
The present invention relates in general to the field of data processing systems, and more particularly to the field of enhancing the performance of data processing systems.
2. Description of the Related Art
Computer users desire to maximize the performance of microprocessors, and the computer industry faces corresponding pressure to increase the computing power and efficiency of microprocessors. This desire is especially evident in the server computer field, where entire businesses depend on their computer infrastructure to carry out and monitor day-to-day activities that affect revenue. Increased microprocessor performance provides additional resources for computer users while giving computer manufacturers a means to distinguish themselves from the competition.
Over the years, state-of-the-art microprocessors have evolved from fairly simple systems to extremely complex integrated circuits with millions of transistors on a single silicon substrate. Early microprocessors were only able to execute one instruction per cycle. Today, “superscalar” microprocessors are able to execute more than one instruction per cycle.
As is known in the art, certain situations result in instruction stalls, where instruction execution is limited or halted until the situation is resolved. An example of such a situation is a cache miss, which occurs when data required by an instruction is not available in the level one (L1) cache and the microprocessor is forced to wait until the data can be retrieved from a slower cache or main memory. Obtaining data from main memory is a relatively slow operation, and when out-of-order execution is limited due to the aforementioned complexities, subsequent instructions cannot be fully executed until valid data is received from memory.
More particularly, an older instruction that takes a long time to execute can create a stall that may prevent any subsequent instructions from executing until the time-consuming instruction completes. For example, in the case of a load instruction that requires access to data not in the L1 cache (a cache miss), a prolonged stall can occur while the data is fetched from a slower cache or main memory. Without facilities to support all out-of-order execution scenarios, the instruction order may not be changed to allow forward progress through the instruction stream while the missed data is retrieved.
In the Power6™ processor, a product of International Business Machines Corporation of Armonk, N.Y., the fixed point, load/store, and branch instructions are executed in-order with respect to each other. Therefore, when a load instruction encounters a cache miss, subsequent instructions are stalled while waiting for the missed request to complete.
To overlap cache misses, a feature called load lookahead (LLA) execution is implemented in Power6™. Under LLA, when a load instruction cannot execute due to a translation or cache miss, subsequent instructions are allowed to execute if the subsequent instructions do not (directly or indirectly) depend on the load instruction. The LLA mechanism enables Power6™ to generate multiple data fetch requests to the lower cache structure and to bring data required by the subsequent instructions into the L1 cache.
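The dependence rule described above can be illustrated with a minimal software sketch. This is a toy model rather than the Power6™ hardware: the `Instr` record, the `lookahead` function, the opcode names, and the register names are all hypothetical, and the model assumes each load's address is known in advance whenever its source registers are valid. It deliberately ignores how results are handled at the write-back stage.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                      # "load" or "alu" (hypothetical opcodes)
    dest: str                    # destination register name
    srcs: Tuple[str, ...] = ()   # source register names
    addr: Optional[int] = None   # address accessed by a load (assumed known)


def lookahead(stream, l1_cache):
    """Return the addresses requested from the lower cache structure.

    A register is 'tainted' when its value depends, directly or
    indirectly, on a load that missed the L1 cache.
    """
    tainted = set()
    requests = []
    for ins in stream:
        depends = any(src in tainted for src in ins.srcs)
        if ins.op == "load" and not depends and ins.addr not in l1_cache:
            # Independent load that misses: issue a fetch request and
            # treat its destination as unavailable to later instructions.
            requests.append(ins.addr)
            tainted.add(ins.dest)
        elif depends:
            # Dependent instruction: cannot execute usefully, and its
            # destination inherits the dependence.
            tainted.add(ins.dest)
        else:
            # Independent instruction that executes normally.
            tainted.discard(ins.dest)
    return requests


stream = [
    Instr("load", "r1", ("r0",), 100),  # misses, starts lookahead
    Instr("alu", "r2", ("r1",)),        # depends on the missed load
    Instr("load", "r3", ("r4",), 200),  # independent miss, overlapped
    Instr("load", "r5", ("r2",), 300),  # indirectly dependent, skipped
    Instr("load", "r6", ("r4",), 400),  # hits in L1, no request needed
]
requests = lookahead(stream, l1_cache={400})
```

In this sketch the second miss (address 200) is requested while the first (address 100) is still outstanding, which is the overlap that LLA is intended to provide; the load at address 300 generates no request because its address register depends indirectly on the first load.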
Results produced under LLA execution are not saved. A result is available when its instruction executes and while the instruction is being staged through the execution unit before the write-back stage. While the instruction is being staged, its result can be forwarded to subsequent instructions, if necessary. When the result passes through the write-back stage, the result is discarded and the general-purpose register (GPR) location targeted by the instruction under LLA is marked as “dirty”. Subsequent instructions that read that location after the write-back stage cannot rely on its data, since the architected location (e.g., the GPR) was not updated by the older instruction.
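The limitation described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the actual pipeline: the function name `run_lookahead`, the one-instruction-per-cycle timing, and the `forward_window` parameter (the assumed number of following instructions that can still receive a forwarded result before it reaches write-back) are all hypothetical.

```python
def run_lookahead(instrs, forward_window=3):
    """Model result forwarding and dirty marking under lookahead.

    instrs: list of (dest, srcs) tuples, one issued per cycle (assumed).
    Returns (usable, dirty): usable[i] is True when instruction i could
    obtain valid data for all of its sources; dirty is the set of
    registers marked dirty at the end.
    """
    in_flight = []   # [dest, stages_left] results not yet at write-back
    dirty = set()    # registers whose lookahead result was discarded
    usable = []
    for dest, srcs in instrs:
        forwardable = {d for d, _ in in_flight}
        # A source is valid if it can be forwarded from the pipeline or
        # if its register was never marked dirty.
        ok = all(s in forwardable or s not in dirty for s in srcs)
        usable.append(ok)

        # Advance the pipeline by one stage.
        for entry in in_flight:
            entry[1] -= 1
        for d, left in in_flight:
            if left == 0:
                dirty.add(d)   # result discarded at write-back
        # Drop retired results and any older write to the same register.
        in_flight = [e for e in in_flight if e[1] > 0 and e[0] != dest]

        if ok:
            in_flight.append([dest, forward_window])
        else:
            dirty.add(dest)    # no valid result was produced
    return usable, dirty


usable, dirty = run_lookahead(
    [
        ("r1", ()),        # produces r1 under lookahead
        ("r2", ("r1",)),   # r1 is still in the pipeline: forwarded
        ("r3", ()),
        ("r4", ()),
        ("r5", ("r1",)),   # r1 has passed write-back: marked dirty
    ],
    forward_window=2,
)
```

Here the second instruction can consume `r1` through forwarding, but the fifth cannot, because by then the result for `r1` has passed the write-back stage and was never written to the register file.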
Therefore, there is a need for a system and method for enabling subsequent instructions to utilize the results of the LLA executions beyond the write-back stage to address the aforementioned limitations of the prior art.