1. Field of the Invention
The present invention relates to the field of microprocessor architecture. More particularly, the present invention relates to increasing instruction throughput by optimizing instructions executed by the microprocessor.
2. Art Background
As the computer revolution has progressed, microprocessor developers have sought to build chips with greater capability and faster performance. Initial efforts focused essentially on increasing the number of transistors on a single microprocessor integrated circuit. That effort continues, with today's microprocessors housing millions of transistors on a single chip. The increased transistor density that comes with further integration has also allowed processor clock speeds to be greatly increased.
In addition to extracting performance by overcoming physical limitations, microprocessor design has developed into an art form. Microprocessors are divided into discrete functional blocks through which instructions are propagated one stage at a time. This allows for pipelining of instructions, such that when one instruction has completed the first stage of processing and moves on to the second stage, a second instruction may begin the first stage. Thus, even where each instruction requires a number of clock cycles to complete all stages of processing, pipelining allows an instruction to complete on every clock cycle once the pipeline is full. This single-cycle throughput of a pipelined microprocessor greatly increases the overall performance of computer systems.
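The throughput benefit of pipelining described above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the example workload are hypothetical and chosen only for illustration.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the first stage every cycle.

    Assumes an ideal pipeline: one cycle per stage, no stalls.
    """
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles to pass through the pipeline;
    # each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5-stage pipeline
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

Under these assumptions, the pipelined cycle count approaches one cycle per instruction as the number of instructions grows, which is the single-cycle throughput referred to above.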
Other enhancements to microprocessor design include the development of superscalar microprocessors, which are capable of initiating more than one instruction at the initial stage of the pipeline per clock cycle. Likewise, in a superscalar microprocessor, more than one instruction frequently completes on a given clock cycle. Other development efforts have gone into the simplification of microprocessor instruction sets, yielding reduced instruction set computing (RISC) microprocessors, which exploit the fact that many simple instructions are executed more commonly than some complicated instructions. Eliminating the complicated instructions from the instruction set provides for a faster-executing pipeline. Complicated instructions are carried out by combinations of the simpler instructions.
In order for pipelined microprocessors to operate efficiently, an instruction fetching and sequencing mechanism at the head of the pipeline must continually provide the pipeline with a stream of instructions. However, conditional branch instructions within an instruction stream prevent the instruction fetching mechanism from fetching the correct instruction until the condition is resolved. Since the condition will not be resolved until further down the pipeline, the instruction fetching mechanism may be unable to fetch the proper instructions in the meantime.
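The cost of waiting for a branch condition can be approximated with a simple model. The sketch below is an illustration only, assuming that the fetch mechanism simply stalls after each conditional branch until the condition is resolved, that every stage takes one cycle, and that every branch is followed by at least one further instruction. The function name and the parameter `resolve_stage` (the 1-indexed pipeline stage at which the condition becomes known) are hypothetical.

```python
def cycles_with_fetch_stalls(
    num_instructions: int,
    num_stages: int,
    num_branches: int,
    resolve_stage: int,
) -> int:
    """Estimate total cycles when fetch stalls after every conditional branch.

    Simplified model (an assumption, not a description of any real processor):
    a branch fetched in cycle t reaches resolve_stage in cycle t + resolve_stage - 1,
    so the next fetch is delayed by resolve_stage - 1 cycles.
    """
    if num_instructions == 0:
        return 0
    if not 1 <= resolve_stage <= num_stages:
        raise ValueError("resolve_stage must lie within the pipeline")
    if num_branches < 0 or num_branches >= num_instructions:
        raise ValueError("each branch must be followed by another instruction")

    ideal = num_stages + (num_instructions - 1)  # stall-free pipelined cycle count
    bubbles_per_branch = resolve_stage - 1       # empty fetch slots per branch
    return ideal + num_branches * bubbles_per_branch


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions, 5 stages, 20 branches resolved in stage 3.
    print(cycles_with_fetch_stalls(100, 5, 20, 3))  # 144
```

In this model, the later in the pipeline the condition is resolved, the more fetch slots are lost per branch.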
One type of instruction that can slow an executing pipeline until a condition is resolved is the compare-and-branch instruction. A microprocessor executing a compare-and-branch instruction will first perform a comparison and then branch to a specified address as determined by the result of the compare operation. The compare-and-branch instruction is a complex instruction which, in previous RISC-style microprocessor architectures, required vectoring to the microprocessor's microcode to execute the compare followed by the branch. For optimal performance, this vectoring requires considerable hardware overhead associated with entering and exiting the processor's microcode. It also introduces additional latency into an executing pipeline each time a compare-and-branch instruction is executed.
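The behavior of a compare-and-branch instruction, expressed as a compare followed by a conditional branch, can be sketched as follows. This is an illustration only: the equality condition, the word-addressed program counter that advances by one, and all function names are assumptions made for the example and are not taken from any particular instruction set.

```python
def compare_equal(a: int, b: int) -> bool:
    """Simple compare operation: produce a condition from two operands.

    Equality is an assumed condition; real instruction sets offer several.
    """
    return a == b


def conditional_branch(pc: int, target: int, condition: bool) -> int:
    """Simple branch operation: return the address of the next instruction.

    Assumes a word-addressed program counter, so fall-through is pc + 1.
    """
    return target if condition else pc + 1


def compare_and_branch(pc: int, a: int, b: int, target: int) -> int:
    """Complex instruction modeled as the two simpler operations in sequence."""
    condition = compare_equal(a, b)                   # first, perform the comparison
    return conditional_branch(pc, target, condition)  # then, branch on its result


if __name__ == "__main__":
    print(compare_and_branch(pc=10, a=7, b=7, target=40))  # 40 (branch taken)
    print(compare_and_branch(pc=10, a=7, b=8, target=40))  # 11 (fall through)
```

The sketch shows only the functional equivalence between the complex instruction and its two constituents; it says nothing about how either form is sequenced in hardware or microcode.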
In addition to the compare-and-branch instruction, there are other pairs of complementary instructions which also require vectoring to microcode and incur additional latency, resulting in decreased performance for a microprocessor. It would be advantageous, and is therefore an object of the present invention, to provide a mechanism for executing appropriate sequences of complementary instructions, such as the constituents of the compare-and-branch instruction, in a more efficient manner without introducing complicated hardware or the additional latencies required for invoking a microprocessor's microcode.