I. Field of the Invention
This invention relates generally to computer technology, and more particularly, to improving processor performance in a computer system.
II. Background Information
Improving processor performance is a major concern in the field of computer systems. Since the primary function of many general-purpose computers is to execute a program which includes a sequence of instructions, a computer characterized as operating with improved performance completes a given program faster than a standard computer.
Pipelines are employed to increase processor performance. In pipelined processors, execution of instructions is broken down into separate pipeline stages. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction. Different aspects of different instructions are processed at the same time by different stages forming the pipeline. For example, while one instruction is being fetched from memory, another is being decoded, another is being executed, etc.
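The overlap of stages can be shown with a small timing model. The sketch assumes an ideal four-stage pipeline (fetch, decode, execute, store) in which one instruction enters per cycle, every stage takes one cycle, and nothing stalls, so instruction i occupies stage s during cycle i + s. The function name `pipeline_schedule` is hypothetical.

```python
# Illustrative model only: ideal 4-stage pipeline, one cycle per stage, no stalls.
STAGES = ["fetch", "decode", "execute", "store"]

def pipeline_schedule(num_instructions):
    """Return one dict per cycle mapping stage name -> instruction index."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # instruction i reaches stage s at cycle i + s
            if 0 <= i < num_instructions:
                occupancy[stage] = i
        schedule.append(occupancy)
    return schedule

for cycle, occupancy in enumerate(pipeline_schedule(5)):
    print(cycle, occupancy)
```

Under these assumptions, five instructions finish in eight cycles instead of the twenty a design that processed one instruction at a time through all four stages would need.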
A branch instruction is an instruction which typically causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; and a target address identifying an instruction stream beginning at another location in memory. When it is known whether or not an instruction being processed in the pipeline will cause a branch, and to what address the instruction will cause a branch (the “branch target”), the branch is resolved. Branch instructions typically are not resolved until after the execution stage.
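Once a branch is resolved, choosing the next fetch address reduces to picking one of the two addresses described above. This hypothetical helper assumes fixed-size 4-byte instructions, which the text does not specify.

```python
# Assumption for illustration: every instruction is INSTR_SIZE bytes long.
INSTR_SIZE = 4

def next_fetch_address(branch_pc, taken, target):
    """Return the branch target if taken, otherwise the sequential address."""
    sequential = branch_pc + INSTR_SIZE  # instruction directly after the branch
    return target if taken else sequential

print(hex(next_fetch_address(0x100, False, 0x200)))
print(hex(next_fetch_address(0x100, True, 0x200)))
```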
Waiting for the branch instruction to be resolved would leave many of the pipeline stages idle and severely impact performance, because it is unknown which instructions to load into the pipeline until after the branch is resolved. To maintain processor performance, it is therefore necessary to predict which instruction follows the control-flow instruction (branch) in program order and to dispatch that instruction into the instruction processing pipeline. A branch prediction mechanism indicates a predicted target for a branch instruction, allowing subsequent instruction fetching to continue within the predicted instruction stream indicated by the branch prediction.
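One widely used way to build a branch direction predictor is a table of 2-bit saturating counters. The sketch below is that generic textbook scheme, offered as an illustration and not as the mechanism of this invention; the table size, the indexing by low-order PC bits, and the 4-byte instruction assumption are all illustrative choices, and `TwoBitPredictor` is a hypothetical name.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits.

    Counter values 0-1 predict not-taken; 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken

    def _index(self, pc):
        # Drop the 2-bit byte offset (assumes 4-byte instructions), keep low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the resolved outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

p = TwoBitPredictor()
for outcome in (True, True, False, True):
    print("predicted", p.predict(0x400), "actual", outcome)
    p.update(0x400, outcome)
```

The two-bit hysteresis means a single atypical outcome (for example, a loop exit) does not immediately flip a strongly biased prediction.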
When a branch is resolved, if the fetch unit has not fetched from the correct instruction stream (i.e., a branch misprediction occurred), the instructions fetched and placed in the pipeline subsequent to that branch instruction must be flushed, i.e., removed from the pipeline, and the correct instructions must be fetched and executed. Branch mispredictions should be avoided because the resulting pipeline flush and refetching of instructions significantly decrease processor performance.
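A flush can be modeled by tagging each in-flight instruction with a program-order sequence number and discarding everything younger than the mispredicted branch. The `InFlight` record and `resolve_branch` helper are hypothetical names, and real hardware typically tracks this ordering in structures such as a reorder buffer rather than a Python list.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    seq: int  # program-order sequence number (assumed bookkeeping)
    pc: int   # address the instruction was fetched from

def resolve_branch(in_flight, branch_seq, predicted_pc, actual_pc):
    """Return (surviving instructions, redirect pc or None).

    On a correct prediction nothing is flushed and fetch continues (None).
    On a misprediction, every instruction younger than the branch is removed
    and fetch is redirected to the correct address.
    """
    if predicted_pc == actual_pc:
        return in_flight, None
    survivors = [ins for ins in in_flight if ins.seq <= branch_seq]
    return survivors, actual_pc

pipe = [InFlight(1, 0x100), InFlight(2, 0x104), InFlight(3, 0x200), InFlight(4, 0x204)]
survivors, redirect = resolve_branch(pipe, branch_seq=2, predicted_pc=0x200, actual_pc=0x108)
print([ins.seq for ins in survivors], hex(redirect))
```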
Another way to improve processor performance is to employ cache memory. Cache memory is a relatively high speed, relatively small memory in which active portions of program instructions and/or data are placed. The cache memory is typically much faster than main memory and approaches the speed of the processor. By keeping the most frequently accessed instructions and/or data in the high speed cache memory, the average memory access time approaches the access time of the cache.
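The claim that average access time approaches cache access time can be made concrete with the standard average memory access time relation, AMAT = hit time + miss rate × miss penalty. This is a textbook formula rather than one stated in this document, and the numbers in the example are illustrative only.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit_time + miss_rate * miss_penalty, all times in the same unit."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle cache hit, 100-cycle trip to main memory.
for rate in (0.05, 0.01, 0.0):
    print(rate, average_memory_access_time(1, rate, 100))
```

With these numbers the average is 6 cycles at a 5% miss rate and 2 cycles at 1%, converging on the 1-cycle hit time as the miss rate falls.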
When the processor needs a new instruction and/or data, it first looks in the cache. If the instruction and/or data is in the cache (referred to as a “cache hit”), the processor can obtain the instruction and/or data quickly and proceed with the computation. If the instruction and/or data is not in the cache (referred to as a “cache miss”), the processor must wait for the instruction and/or data to be loaded from the slower main memory. Thus a cache miss leads to a substantial reduction in processor performance.
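The hit/miss decision can be sketched with a tag-only model of a direct-mapped cache. The organization (direct-mapped), the 64 lines of 64 bytes, and the class name `DirectMappedCache` are illustrative assumptions; the text does not describe a particular cache organization.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (no data is stored)."""

    def __init__(self, num_lines=64, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line; None means empty

    def access(self, address):
        """Return True on a cache hit; on a miss, model the fill and return False."""
        block = address // self.line_size
        index = block % self.num_lines   # which line the block maps to
        tag = block // self.num_lines    # identifies which block occupies that line
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag           # block is loaded from main memory
        return False

cache = DirectMappedCache()
print([cache.access(a) for a in (0x1000, 0x1004, 0x2000, 0x1000)])
```

With these parameters 0x1004 hits because it shares a line with 0x1000, while 0x2000 maps to the same line with a different tag, evicting 0x1000 so that the final access misses again.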
Some processors execute ready instructions in the instruction stream ahead of earlier instructions that are not ready (these processors are referred to as out-of-order execution processors). A ready instruction is an instruction whose source data is already available. In computer systems implementing out-of-order execution, identifying the instructions on which a particular instruction depends is necessary, because those instructions must be executed before the particular instruction in order to maintain correctness.
Dependency tracking and marking mechanisms are known in the art and exist for the purpose of identifying instructions that produce results needed for the execution of a specific instruction that follows.
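A common way to mark dependencies is a last-writer table that maps each register to the most recent earlier instruction writing it. The sketch assumes registers named by strings and tracks only read-after-write dependencies (register renaming is assumed to remove write-after-read and write-after-write hazards); `Instr`, `mark_dependencies`, and `is_ready` are hypothetical names, not mechanisms from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Instr:
    seq: int                 # position in program order
    dest: Optional[str]      # destination register, or None
    srcs: List[str]          # source registers
    producers: List[int] = field(default_factory=list)

def mark_dependencies(program: List[Instr]) -> List[Instr]:
    """Record, per instruction, the seq of the latest earlier writer of each source.

    Sources with no earlier writer are assumed to be available already.
    """
    last_writer = {}  # register name -> seq of most recent writer
    for ins in program:
        ins.producers = [last_writer[r] for r in ins.srcs if r in last_writer]
        if ins.dest is not None:
            last_writer[ins.dest] = ins.seq
    return program

def is_ready(ins: Instr, completed: Set[int]) -> bool:
    """Ready means every producer has completed, so all source data is available."""
    return all(p in completed for p in ins.producers)

# r1 = ...; r2 = f(r1); r3 = g(r4); r1 = h(r2, r3)
prog = mark_dependencies([
    Instr(0, "r1", []),
    Instr(1, "r2", ["r1"]),
    Instr(2, "r3", ["r4"]),
    Instr(3, "r1", ["r2", "r3"]),
])
print([ins.producers for ins in prog])
print(is_ready(prog[1], set()), is_ready(prog[2], set()))
```

Here instruction 2 is ready while the older instruction 1 still waits on instruction 0, which is the situation an out-of-order processor exploits.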
Out-of-order implementations require the use of instruction scheduling. Instruction scheduling allows the processor to bypass hazards such as instruction dependencies. Instruction scheduling is a run-time technique that rearranges the execution order and functional resource allocation of the instructions from a computer program so the instructions execute in the fastest and most efficient order possible. While the rearranged stream of instructions is semantically equivalent to the original stream, the instruction scheduler arranges and overlaps the execution of instructions so as to reduce overall execution time. Current scheduling techniques usually choose instructions for execution based on simple criteria such as original program order and latency.
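A scheduler that uses original program order as its only selection criterion can be sketched as follows. Each cycle it issues up to `issue_width` ready instructions, oldest first, where an instruction becomes ready once each producer has issued and its latency has elapsed. The tuple layout, the issue width of 2, and the name `schedule_oldest_first` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# (seq, producer seqs, latency in cycles); this layout is an assumption.
Op = Tuple[int, List[int], int]

def schedule_oldest_first(ops: List[Op], issue_width: int = 2) -> Dict[int, int]:
    """Return {seq: issue cycle} under an oldest-ready-first policy."""
    seqs = {seq for seq, _, _ in ops}
    for seq, deps, _ in ops:
        if any(p >= seq or p not in seqs for p in deps):
            raise ValueError("each producer must be an older instruction in ops")
    finish: Dict[int, int] = {}   # seq -> cycle its result becomes available
    issued: Dict[int, int] = {}
    pending = sorted(ops, key=lambda op: op[0])  # program order
    cycle = 0
    while pending:
        # Ready: every producer has issued and its latency has elapsed.
        ready = [op for op in pending
                 if all(p in finish and finish[p] <= cycle for p in op[1])]
        chosen = ready[:issue_width]  # oldest ready instructions win
        for seq, _, latency in chosen:
            issued[seq] = cycle
            finish[seq] = cycle + latency
        chosen_seqs = {op[0] for op in chosen}
        pending = [op for op in pending if op[0] not in chosen_seqs]
        cycle += 1
    return issued

print(schedule_oldest_first([(0, [], 3), (1, [0], 1), (2, [], 1), (3, [], 1)]))
```

In this example instruction 1 waits three cycles for instruction 0 while the younger, independent instructions 2 and 3 issue ahead of it. Note that the policy knows nothing about whether an instruction is a branch likely to mispredict or a load likely to miss.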
These scheduling policies are not optimized in the sense that they do not maximize processor performance by advancing branches that are likely to mispredict or memory accesses that are likely to miss the cache.
For the foregoing reasons, there is a need to perform early resolution of low confidence branches and cache accesses.
The present invention is directed to a computer system and method for the early resolution of an instruction which is a critical instruction. An embodiment of the present invention includes a scoreboard having one or more reservation stations, each of which corresponds to one of the decoded instructions and has a priority field.
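This passage does not say how the priority field is assigned or consumed. As a hedged illustration only, the sketch below assumes a one-bit priority that is set for low-confidence branches and likely cache misses, and a selection rule that prefers higher priority and otherwise falls back to program order. `ReservationStation`, `assign_priority`, and `select` are hypothetical names, and how branch confidence or miss likelihood is estimated is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ReservationStation:
    seq: int           # program order of the decoded instruction it holds
    ready: bool        # True once all source operands are available
    priority: int = 0  # hypothetical encoding: higher value = more critical

def assign_priority(low_confidence_branch: bool, likely_cache_miss: bool) -> int:
    """Hypothetical policy: flag the instruction kinds the Background calls critical."""
    return 1 if (low_confidence_branch or likely_cache_miss) else 0

def select(scoreboard: List[ReservationStation]) -> Optional[ReservationStation]:
    """Pick the ready entry with the highest priority; break ties oldest-first."""
    ready = [rs for rs in scoreboard if rs.ready]
    if not ready:
        return None
    return min(ready, key=lambda rs: (-rs.priority, rs.seq))

scoreboard = [
    ReservationStation(0, ready=True),
    ReservationStation(1, ready=True, priority=assign_priority(True, False)),
    ReservationStation(2, ready=False, priority=1),
]
print(select(scoreboard).seq)
```

Under these assumptions the low-confidence branch in entry 1 is selected ahead of the older entry 0, so it can be resolved earlier; entry 2 is skipped because its operands are not yet available.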