1. Field of the Invention
The present invention relates to instruction processing in a microprocessor. More particularly, the invention is a microprocessor that utilizes the time period associated with a stall condition in order to speculatively execute instructions and identify invalid data such that retrieval of valid data can be initiated.
2. Description of Related Art
There is a continual desire by computer users to maximize performance and a corresponding pressure on the computer industry to increase the computing power and efficiency of microprocessors. This is especially evident in the server computer field, where entire businesses depend on their computer infrastructure to carry out and monitor day-to-day activities that affect revenue, profit and the like. Increased microprocessor performance will provide additional resources for computer users while providing a mechanism for computer manufacturers to distinguish themselves from the competition.
Over the years, state of the art microprocessors have evolved from fairly straightforward systems to extremely complex integrated circuits having many millions of transistors on a single silicon substrate. One of the many improvements made to microprocessors was the ability to execute more than one instruction per cycle. This type of microprocessor is typically referred to as being “superscalar”. A further performance enhancement was the ability of microprocessors to execute instructions “out of order”. This out of order operation allows instructions having no dependencies to bypass other instructions that are waiting for certain dependencies to be resolved. The IBM Power and PowerPC series of microprocessors are examples of superscalar systems that provide out of order processing of instructions. Microprocessors may provide varying levels of out of order execution support, meaning that the ability to identify and execute instructions out of order may be limited.
One major motivation for limiting out of order execution support is the considerable complexity required to identify which instructions can execute early, and to track and store the out of order results. Additional complexities arise when instructions executed out of order are determined to be incorrect under the in order execution model. For example, when an older instruction causes an exception, the results of younger instructions that were executed out of order must not affect the architected state of the processor. As processor speeds continue to increase, it becomes more attractive to eliminate some of the complexities associated with out of order execution. This will eliminate logic (and its corresponding chip area, or “real estate”) from the chip which is normally used to track out of order instructions, thereby allowing additional “real estate” to become available for use by other processing functions.
As is known in the art, certain conditions that arise when instructions are executed by a microprocessor cause a stall, in which instruction execution is limited or halted until the condition is resolved. One example is a cache miss, which occurs when data required by an instruction is not available in a level one (L1) cache and the microprocessor is forced to wait until the data can be retrieved from a slower cache or main memory. Obtaining data from main memory is a relatively slow operation, and when out of order execution is limited due to the aforementioned complexities, subsequent instructions cannot be fully executed until valid data is received from memory.
More particularly, an older instruction that takes a long time to execute can create a stall that may prevent any younger, or subsequent, instructions from executing until the time-consuming instruction completes. For example, in the case of a load instruction that requires access to data not in the L1 cache (cache miss), a prolonged stall can occur while data is fetched from a slower cache or main memory. Without facilities to support all out of order execution scenarios, it may not be possible to change instruction ordering such that forward progress through the instruction stream can be made while the missed data is retrieved.
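The cost of such a stall can be illustrated with a toy cycle count. The following is a minimal sketch, not a description of any particular processor: the function name `in_order_cycles`, the instruction encoding, and the latencies (1 cycle for an ordinary instruction or L1 hit, 100 cycles for a miss) are all assumptions chosen only for illustration.

```python
# Toy model of a strictly in-order machine: every instruction waits for
# the one before it, so each L1 miss adds its full latency to the total.
BASE_CYCLES = 1     # assumed cost of an ordinary instruction or an L1 hit
MISS_CYCLES = 100   # assumed cost of fetching from a slower cache or memory


def in_order_cycles(program, l1_cache):
    """Return the cycle count for `program`, a list of (op, address) pairs."""
    cached = set(l1_cache)          # copy so the caller's set is not modified
    cycles = 0
    for op, addr in program:
        if op == "load" and addr not in cached:
            cycles += MISS_CYCLES   # pipeline stalls until the data arrives
            cached.add(addr)        # the line is now resident in the L1
        else:
            cycles += BASE_CYCLES
    return cycles


program = [("load", 0x100), ("add", None), ("load", 0x200), ("add", None)]
print(in_order_cycles(program, set()))     # 202: both loads miss
print(in_order_cycles(program, {0x200}))   # 103: second load already cached
```

In this sketch, having the second load's data resident before it executes removes one full miss latency, which is the opportunity a stall-time prefetch aims to exploit.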
Therefore, it can be seen that a need exists for a microprocessor with reduced or limited support for out of order execution that can make progress during stall conditions.
Load Lookahead Prefetch and Branch Lookahead Prefetch are mechanisms that reduce the performance impact of stalls by allowing the instruction stream to be examined during an extended stall condition in order to identify and speculatively execute future Load and Branch instructions without updating the architectural state of the machine.
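The idea behind Load Lookahead Prefetch can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the mechanism itself: the two-operation instruction set (`load`, `addi`), the tuple encoding `(op, dest, src, imm)`, and the function name `lookahead_prefetch` are hypothetical. The sketch scans past a stalled load using a shadow copy of the registers, treats any value derived from the missing data as invalid, and collects the addresses of later loads that can still be computed.

```python
def lookahead_prefetch(program, stalled_index, regs, memory, l1_cache):
    """Scan instructions after a stalled load and return prefetch addresses.

    program: list of (op, dest, src, imm) tuples (hypothetical encoding)
      ("load", d, s, imm) means d = memory[regs[s] + imm]
      ("addi", d, s, imm) means d = regs[s] + imm
    """
    shadow = dict(regs)                  # speculative copy; regs is never written
    invalid = {program[stalled_index][1]}  # destination of the stalled load
    prefetches = []
    for op, dest, src, imm in program[stalled_index + 1:]:
        if src in invalid:
            invalid.add(dest)            # result depends on data not yet valid
            continue
        if op == "addi":
            shadow[dest] = shadow[src] + imm
            invalid.discard(dest)
        elif op == "load":
            addr = shadow[src] + imm
            if addr in l1_cache:
                shadow[dest] = memory[addr]
                invalid.discard(dest)
            else:
                prefetches.append(addr)  # start fetching this line early
                invalid.add(dest)        # its data is not available yet
    return prefetches


regs = {"r1": 0x100, "r2": 0x200}
memory = {0x100: 7, 0x208: 9}
program = [
    ("load", "r3", "r1", 0),   # misses the L1 and stalls
    ("addi", "r4", "r3", 1),   # depends on r3, so r4 is invalid
    ("load", "r5", "r2", 8),   # address 0x208 is computable: prefetch it
    ("load", "r6", "r4", 0),   # base register invalid: cannot prefetch
]
print([hex(a) for a in lookahead_prefetch(program, 0, regs, memory, set())])
# ['0x208']
```

Because all speculative writes go to the shadow copy, the architected registers in `regs` are unchanged after the scan, mirroring the requirement that lookahead not update the architectural state.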
In their basic form, however, Load Lookahead Prefetch and Branch Lookahead Prefetch have no mechanism to store results beyond the length of the execution pipelines, which limits their ability to identify loads and branches that would qualify for prefetching and execution, respectively. This shortcoming can be addressed by adding facilities to store intermediate results, along with a method of managing the use of those values. The effect is increased performance of the Load Lookahead and Branch Lookahead mechanisms.
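The effect of retaining intermediate results can be shown with a toy model. This sketch rests on assumptions that are not stated in the text above: a speculative result is taken to remain visible to only the next `window` instructions (a stand-in for the length of the execution pipelines), and the `keep_results` flag stands in for an added storage facility. The function name `prefetchable_loads` and the `(op, dest, src, imm)` encoding are hypothetical.

```python
def prefetchable_loads(program, regs, window, keep_results):
    """Return the load addresses that lookahead can compute.

    window: assumed number of later instructions that can still see a
            speculative result when no extra storage exists
    keep_results: if True, speculative results never expire
    """
    shadow = dict(regs)    # speculative values; regs itself is not modified
    written_at = {}        # register -> index of its speculative producer
    lost = set()           # registers whose value is unavailable
    addresses = []
    for i, (op, dest, src, imm) in enumerate(program):
        age = i - written_at[src] if src in written_at else 0
        src_ok = src not in lost and (keep_results or age <= window)
        if not src_ok:
            lost.add(dest)               # cannot compute; result is unknown
            written_at.pop(dest, None)
            continue
        if op == "addi":
            shadow[dest] = shadow[src] + imm
            written_at[dest] = i
            lost.discard(dest)
        elif op == "load":
            addresses.append(shadow[src] + imm)
            lost.add(dest)               # loaded data itself is not modelled
            written_at.pop(dest, None)
    return addresses


regs = {"r1": 0x100, "r9": 0}
program = [
    ("addi", "r2", "r1", 0x40),  # speculative r2 = 0x140
    ("addi", "r9", "r9", 1),     # unrelated work
    ("addi", "r9", "r9", 1),
    ("addi", "r9", "r9", 1),
    ("load", "r3", "r2", 0),     # needs r2, produced four instructions earlier
]
print(prefetchable_loads(program, regs, window=2, keep_results=False))  # []
print(prefetchable_loads(program, regs, window=2, keep_results=True))   # [320]
```

With the short window, the value of `r2` has expired by the time the load needs it, so the load cannot be prefetched; retaining the result makes its address (0x140, printed as 320) available.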