1. Field of the Invention
The present invention generally relates to general purpose digital data processing systems and more particularly relates to such systems that employ pipelined execution of program instructions.
2. Description of the Prior Art
In most general purpose, stored program, digital computers, software is developed under the assumption that program instructions are executed in their entirety in a sequential fashion. This frees the software developer from the need to account for potential non-sequential operation of the hardware. However, most large scale modern machines are designed to take advantage of the overlapping of various functions. In its simplest form, such overlapping permits instruction processing of the N+1st instruction to be performed during operand processing of the Nth instruction. U.S. Pat. No. 4,890,225 issued to Ellis, Jr. et al. shows a rudimentary overlapped machine. To free the software developer from concerns about non-sequentiality, Ellis, Jr. et al. store the machine state during the complete execution of the Nth instruction. U.S. Pat. No. 4,924,376 issued to Ooi provides a technique for resource allocation in an overlapped environment.
A more general form of overlapping is termed a pipelined environment. In implementing such a machine, the designer dedicates certain hardware resources to the various repetitive tasks. The performance advantage in this dedication comes from employing these dedicated hardware elements simultaneously. Typically, this means that instruction decode, operand fetch, and arithmetic operations each have separate and dedicated hardware resources. Even though the Nth instruction is processed by each of these hardware resources sequentially, each separate hardware resource is deployed on a different instruction simultaneously. The N+1st instruction may be processed by the instruction fetch and decode hardware, while the Nth instruction is being processed by the operand fetch hardware and while the N-1st instruction is being processed by the arithmetic hardware. U.S. Pat. No. 4,855,904 issued to Daberkow, et al. describes a pipelined architecture.
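By way of illustration only, the overlap described above may be sketched as a cycle-by-cycle occupancy table. The following is a minimal sketch that assumes an idealized three stage pipeline in which every stage takes exactly one clock cycle and no exceptions occur. The stage names follow the text; the function name pipeline_schedule and the zero-based instruction numbering are hypothetical and are not taken from any of the cited patents.

```python
# Idealized three-stage pipeline: one clock cycle per stage, no stalls.
# Stage names follow the text; everything else is an illustrative assumption.
STAGES = ["decode", "operand_fetch", "arithmetic"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Instruction i occupies stage s during cycle i + s, so in a full pipeline
    each stage is working on a different instruction in the same cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            instr = cycle - s
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(4)):
        print(cycle, occupancy)
```

Under these assumptions, four instructions complete in six clock cycles rather than the twelve that strictly serial use of the three stages would require.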
The problems associated with sequentiality experienced by software developers are magnified when considering microcode controlled machines operating in a pipelined mode. The performance advantages of the pipelined architecture can be readily dissipated by timing problems within the machine. U.S. Pat. No. 4,875,160 issued to Brown III discusses a number of pipeline based problems including conditional branching of microcode. The Brown III system accommodates pipeline exceptions by extending performance time for one or more clock cycles. U.S. Pat. No. 4,980,823 issued to Liu seeks to minimize the impact of branching on performance by prefetching of predicted data.
Other approaches to the problem include "de-piping", that is, simply forcing serial performance of all functions until the pipeline exception is accommodated. U.S. Pat. No. 5,014,196 issued to Hayashi et al. suggests this approach for certain types of pipeline problems.
Another way to provide protection for microcode branching is by using non-staged control. In this approach, each microcode instruction becomes a family of instructions which provide for the various permutations and combinations associated with the branch conditions. Each member of this instruction family controls all stages for a single clock cycle rather than only one stage per cycle for a number of clock cycles. U.S. Pat. No. 4,891,754 issued to Boreland suggests such an approach. Non-staged design tends to cause additional complexity in microcode design. Boreland approaches this problem by providing additional read only memory to store the combinations. U.S. Pat. No. 4,835,679 issued to Kida et al. and U.S. Pat. No. 4,872,109 issued to Horst et al. show that read only memory space can be saved by slowing the pipeline during conditional branching.
For many of these prior art systems, a goal is to maintain a valid architectural state of a corresponding data processing system while at the same time maximizing the performance thereof. For example, an instruction following a conditional branch instruction may alter the architectural state of a data processing system before the data processing system determines if the branch will be taken. In this regard, it is known that most microprocessors and data processing systems have a number of software architectural registers, wherein the architectural state of a machine may be defined by a number of values stored in the software architectural registers. Typically, the software architectural registers are defined by a software specification of the microprocessor or data processing system. It is further known that the software architectural registers may be used to pass values from one instruction to another. Software architectural registers are often referred to as architectural registers, software visible registers, a working register set, or a General Register Set (GRS).
Under certain circumstances, the modification of the architectural state of the machine may become problematic when a dependency exists between instructions. An example of an instruction that modifies the architectural state is a Load A with address incrementation instruction. The Load A with address incrementation instruction may change the architectural state of the machine by incrementing the operand address, which may be stored in one of the software architectural registers, during the same pipeline stage that the operand address is generated. An advantage of this approach is that the incremented operand address may be used during address generation of a next succeeding instruction.
While this may increase the performance of operand address generation for a subsequent instruction, it is problematic in many systems because the architectural state of the machine may be improperly changed in view of a subsequent event. This may typically occur when there is some dependency between the corresponding instructions.
For example, it may not be known if a conditional branch instruction will in fact change the normal sequential execution of the instructions until the third stage (the arithmetic operation stage) of the pipeline. Accordingly, a subsequent instruction may modify the architectural state of the machine by writing an incremented address, for example, to the GRS before it is determined whether the branch instruction will in fact change the normal sequential execution of the instructions. If the condition of the conditional branch instruction is later determined to be satisfied, thereby indicating that the normal sequential operation of the instructions is to be changed, the architectural state change of the subsequent instruction may be improper.
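By way of illustration only, the timing of this hazard may be sketched as follows. This is a minimal sketch that assumes a three stage pipeline in which the instruction following the branch writes its incremented address to the GRS during its first stage, one clock cycle before the branch condition is resolved in the third stage. The register name X1, the starting address, and the function name simulate_branch_hazard are hypothetical.

```python
def simulate_branch_hazard(branch_taken, start_address=100, increment=1):
    """Model a GRS write that occurs before a preceding branch is resolved.

    Returns (GRS value after the early write, whether that write was proper,
    list of per-cycle event descriptions).
    """
    grs = {"X1": start_address}  # hypothetical register holding an address
    events = []

    # Cycle 1: the conditional branch occupies stage 1.
    events.append("cycle 1: branch in stage 1")

    # Cycle 2: the branch is in stage 2; the following instruction is in
    # stage 1 and already writes its incremented address to the GRS.
    grs["X1"] += increment
    events.append("cycle 2: successor wrote X1=%d" % grs["X1"])

    # Cycle 3: the branch reaches stage 3 and its condition is resolved.
    # If the branch is taken, the successor should never have executed,
    # so the write made in cycle 2 was improper.
    write_was_proper = not branch_taken
    events.append("cycle 3: branch resolved, taken=%s" % branch_taken)

    return grs["X1"], write_was_proper, events


if __name__ == "__main__":
    for taken in (False, True):
        value, proper, _ = simulate_branch_hazard(taken)
        print("taken=%s X1=%d write_was_proper=%s" % (taken, value, proper))
```

In both cases the register holds the incremented value by the time the branch is resolved; only when the branch is not taken is that value a valid part of the architectural state.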
U.S. Pat. No. 5,040,107 issued to Duxbury et al. addresses this problem by operating the pipeline until a dependency is found between instructions using a look-ahead technique. The dependency is resolved by aborting the second (i.e. dependent) instruction to preserve sequentiality, which results in a performance penalty. U.S. Pat. No. 5,363,490 issued to Alferness et al. suggests an improvement to this basic approach by utilizing a technique for conditionally aborting an instruction within an instruction processor when a dependency is found.
A related problem arises when the architectural state of the machine is changed before a corresponding fault or interrupt signal can be provided to the instruction processor. For example, and as indicated above, the Load A with address incrementation instruction may generate an operand address and may then increment the operand address during the first pipeline stage. This may allow the incremented operand address to be used during address generation of a next succeeding instruction.
A problem may arise when a corresponding fault or interrupt is not determined until the third pipeline stage (e.g. the arithmetic stage) of the instruction pipeline. For example, the Load A instruction discussed above may perform an address limits check to ensure the address generated for the corresponding operand is within a previously allocated address space. If the address of the operand is not within the allocated address space, an address limits fault may be provided to the instruction processor. Accordingly, the Load A with address incrementation instruction may change the architectural state of the machine by writing an incremented operand address to the GRS before the corresponding address limits fault is provided to the pipelined instruction processor. In order to continue processing, it may be necessary to restore the architectural state of the machine as it existed just prior to the execution of the Load A instruction. A substantial amount of microcode may be required to restore the proper architectural state of the machine. Further, a substantial amount of time may be required to execute the microcode restore algorithms.
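By way of illustration only, the sequence of an early GRS write, a late address limits fault, and a subsequent restore may be sketched as follows. This is a minimal sketch under stated assumptions: the register name X1, the limit values, and the names AddressLimitsFault, load_a_with_increment and execute_with_restore are all hypothetical, and the single saved value stands in for whatever restore work a real machine would perform in microcode. It is not the restore method of any cited patent.

```python
class AddressLimitsFault(Exception):
    """Hypothetical fault raised when an operand address is out of limits."""


def load_a_with_increment(grs, reg, increment, lower, upper):
    """Model a Load A with address incrementation across pipeline stages."""
    # Stage 1: generate the operand address and write the incremented
    # address back to the GRS immediately, so the next instruction can use it.
    operand_address = grs[reg]
    grs[reg] = operand_address + increment

    # Stage 3: the address limits check completes only now, after the
    # architectural state has already been changed.
    if not (lower <= operand_address <= upper):
        raise AddressLimitsFault(operand_address)
    return operand_address


def execute_with_restore(grs, reg, increment, lower, upper):
    """Run the instruction and undo the early GRS write if a fault occurs."""
    saved = grs[reg]  # assumed stand-in for a microcode restore routine
    try:
        return load_a_with_increment(grs, reg, increment, lower, upper)
    except AddressLimitsFault:
        grs[reg] = saved  # state as it existed just prior to the instruction
        return None


if __name__ == "__main__":
    grs = {"X1": 500}
    print(execute_with_restore(grs, "X1", 1, 0, 255), grs)
```

Calling load_a_with_increment directly with an out of limits address leaves the incremented value in the register when the fault is raised, which is the improper state change described above.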