1. Field of the Invention
The present invention relates generally to general purpose digital data processing systems and, more particularly, to such systems that employ pipelined execution of program instructions.
2. Description of the Prior Art
In most general purpose, stored program, digital computers, software is developed under the assumption that program instructions are executed in their entirety in a sequential fashion. This frees the software developer from the need to account for potential non-sequential operation of the hardware. However, most large scale modern machines are designed to take advantage of the overlapping of various functions. In its simplest form, such overlapping permits instruction processing of the N+1st instruction to be performed during operand processing of the Nth instruction. U.S. Pat. No. 4,890,225 issued to Ellis, Jr. et al. shows a rudimentary overlapped machine. To free the software developer from concerns about non-sequentiality, Ellis, Jr. et al. store the machine state during the complete execution of the Nth instruction. U.S. Pat. No. 4,924,376 issued to Ooi provides a technique for resource allocation in an overlapped environment.
A more general form of overlapping is termed a pipelined environment. In implementing such a machine, the designer dedicates certain hardware resources to the various repetitive tasks. The performance advantage in this dedication comes from employing these dedicated hardware elements simultaneously. Typically, this means that instruction decode, operand fetch, and arithmetic operations each have separate and dedicated hardware resources. Even though the Nth instruction is processed by each of these hardware resources sequentially, each separate hardware resource is deployed on a different instruction simultaneously. The N+1st instruction may be processed by the instruction fetch and decode hardware, while the Nth instruction is being processed by the operand fetch hardware and while the N-1st instruction is being processed by the arithmetic hardware. U.S. Pat. No. 4,855,904 issued to Daberkow, et al. describes a pipelined architecture.
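The overlap described above can be illustrated with a minimal sketch. It assumes an idealized three-stage pipeline in which every stage takes exactly one clock cycle and no exceptions or dependencies occur. The stage names follow the paragraph above, while the function names pipeline_schedule and sequential_cycles are hypothetical and are not drawn from any of the cited patents.

```python
# Idealized three-stage pipeline (an assumption for illustration only):
# each stage takes one clock cycle and no hazards occur.
STAGES = ["instruction fetch/decode", "operand fetch", "arithmetic"]


def pipeline_schedule(num_instructions):
    """Return one row per clock cycle.

    Each row lists, per stage, the index of the instruction occupying
    that stage during the cycle, or None if the stage is idle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            # An instruction enters stage 0 on the cycle equal to its index
            # and advances one stage per cycle thereafter.
            instr = cycle - stage_index
            row.append(instr if 0 <= instr < num_instructions else None)
        schedule.append(row)
    return schedule


def sequential_cycles(num_instructions):
    """Cycles needed if each instruction finishes all stages before the next starts."""
    return num_instructions * len(STAGES)


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(4)):
        occupancy = ", ".join(
            f"{stage}: {'idle' if instr is None else instr}"
            for stage, instr in zip(STAGES, row)
        )
        print(f"cycle {cycle}: {occupancy}")
```

Once the pipeline is full, each row holds three consecutive instructions, one per dedicated hardware resource, which corresponds to the N+1st, Nth, and N-1st instructions being processed simultaneously. Under these assumptions, four instructions complete in six cycles rather than the twelve required by strictly sequential execution.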
The problems associated with sequentiality experienced by software developers are magnified when considering microcode controlled machines operating in a pipelined mode. The performance advantages of the pipelined architecture can be readily dissipated by timing problems within the machine. U.S. Pat. No. 4,875,160 issued to Brown III discusses a number of pipeline based problems including conditional branching of microcode. The Brown III system accommodates pipeline exceptions by extending performance time for one or more clock cycles. U.S. Pat. No. 4,980,823 issued to Liu seeks to minimize the impact of branching on performance by prefetching of predicted data.
Other approaches to the problem include "de-piping", that is, simply forcing serial performance of all functions until the pipeline exception is accommodated. U.S. Pat. No. 5,014,196 issued to Hayashi et al. suggests this approach for certain types of pipeline problems.
Another way to provide protection for microcode branching is by using non-staged control. In this approach, each microcode instruction becomes a family of instructions which provide for the various permutations and combinations associated with the branch conditions. Each member of this instruction family controls all stages for a single clock cycle rather than only one stage per cycle for a number of clock cycles. U.S. Pat. No. 4,891,754 issued to Boreland suggests such an approach. Non-staged design tends to cause additional complexity in microcode design. Boreland approaches this problem by providing additional read only memory to store the combinations. U.S. Pat. No. 4,835,679 issued to Kida et al. and U.S. Pat. No. 4,872,109 issued to Horst et al. show that read only memory space can be saved by slowing the pipeline during conditional branching.
U.S. Pat. No. 5,040,107 issued to Duxbury et al. uses a look-ahead technique to operate the pipeline until a dependency is found. The dependency is resolved by aborting the second (i.e., dependent) instruction to preserve sequentiality, which results in a performance penalty.