A general purpose computer processes data by executing one or more of several predefined instructions in a particular sequence. An example of a computing machine is a hand held calculator. In this machine, the predefined instructions (the instruction set) may include only the arithmetic operations of addition, subtraction, multiplication and division. Data and the required sequence of instructions are input by the user one by one and an arithmetic calculation results.
The set of sequential instructions that a computer executes to produce a desired result is called a program. In general purpose machines with large instruction sets, the programs may be very large. Since computers execute the instructions much faster than users can input them, it is desirable to store the programs in electronic memories so that the computer can automatically read the instructions and thereby run at top speeds.
Most modern stored-program data processing systems are based on the Von Neumann model. The Von Neumann computer design is based upon three key concepts:
Data and instructions are stored in a single read-write memory.
The contents of this memory are addressable by location, without regard to the type of data contained in that location.
Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
The primary circuits of the Von Neumann computer can be broadly grouped into two parts: a memory and a Central Processing Unit (CPU). The memory holds the data and the instructions for the computer system. The CPU can be considered the brain of the system. It contains electronic logic that sequentially fetches and executes the stored instructions.
Data in most digital computers is represented in the form of binary numbers. Each location in memory is capable of storing a binary number (the maximum size of which depends upon the type of computer system). The program or set of sequential instructions that the CPU executes is stored in a particular region of memory. An instruction may occupy more than one location in memory. The first part of each instruction is called an opcode. The opcode is a unique binary number that tells the CPU which instruction it is. Most instructions have other parts that may contain operands (data to be processed) or operand specifiers. Operand specifiers inform the CPU where to find the operands that the instruction requires. These operands may be anywhere in memory or in certain temporary memory locations inside the CPU.
In general, the CPU performs the following operations to execute an instruction:
1. Fetch an instruction from memory.
2. Decode the fetched instruction to interpret the instruction.
3. Fetch from memory any operands (data on which the instruction operates) required by the instruction.
4. Perform the operation defined by the instruction.
5. Store the results of the operation in memory for future reference.
Different sets of hardware (called functional units) within the CPU carry out these operations. The functional units of a CPU may contain various registers (memory elements) and arithmetic and logic units (ALUs). The registers store temporary results and instruction operands (data on which an instruction operates). The ALU uses combinatorial logic to process the data present at its inputs. The output of the ALU depends upon the control signals provided to it, and is obtained from the input by performing an arithmetic operation or a logical (shifting or boolean) operation. The processing in the CPU is done by channeling data from operand registers through the ALU into result registers. The data may be channeled through the ALU many times for complex instructions.
Data is transferred between the basic elements of the CPU through common busses (sets of wires that carry related signals). The data transfers are dependent on the type of instruction currently being executed and are initiated by a central controller. The CPU controller sends a sequence of control signals to the various registers of the CPU, telling the registers when to put data on the common read bus (going to the inputs of the ALU) and when to get data off the common write bus (coming out of the ALU). The CPU controller also tells the ALU what operation to perform on the data from the input to the output. In this way, the controller of the CPU may initiate a sequence of data transfers starting with fetching the instruction from main memory, fetching corresponding data, passing the data between the ALU and the various temporary storage registers, and finally writing processed data back to main memory.
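The five-step instruction cycle described above (fetch, decode, operand fetch, execute, store) can be illustrated with a minimal sketch. The three opcodes, the four-location instruction layout, and the function name `run` are assumptions made purely for illustration and do not correspond to any particular machine.

```python
# Hypothetical opcodes for an illustrative machine; not from any real instruction set.
HALT, ADD, SUB = 0, 1, 2

def run(memory):
    """Execute the program stored at the start of `memory` until HALT.

    Each non-HALT instruction occupies four locations:
    opcode, source address A, source address B, destination address.
    Instructions and data share the same memory, as in the Von Neumann model.
    """
    pc = 0  # program counter: address of the next instruction
    while True:
        opcode = memory[pc]                        # 1. fetch the instruction
        if opcode == HALT:                         # 2. decode it
            return memory
        src_a, src_b, dst = memory[pc + 1:pc + 4]  # operand specifiers
        a, b = memory[src_a], memory[src_b]        # 3. fetch the operands
        if opcode == ADD:                          # 4. perform the operation
            result = a + b
        elif opcode == SUB:
            result = a - b
        else:
            raise ValueError(f"unknown opcode {opcode}")
        memory[dst] = result                       # 5. store the result
        pc += 4                                    # sequential execution

# Program: mem[11] = mem[9] + mem[10]; mem[12] = mem[11] - mem[9]; halt.
program = [ADD, 9, 10, 11, SUB, 11, 9, 12, HALT, 7, 5, 0, 0]
final = run(list(program))
```

In this sketch the data words (locations 9 through 12) sit in the same memory as the program, and nothing but the program counter distinguishes an instruction from an operand.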
The various implementations of a CPU controller fall under two main categories: hardwired and microprogrammed. Hardwired controllers use combinatorial logic and some state registers to produce a sequence of control signals. These control signals depend upon the type of instruction just fetched and the result of the execution of the previous instruction. The microprogrammed controller performs the same function but uses a ROM or RAM controlled state machine to produce the control signals from previous state and instruction inputs.
Hardwired controllers are tailored for a particular instruction set, and the logic used to implement them becomes increasingly complex as the complexity of the instruction set increases. Microprogrammed controllers are more general purpose devices in that changes in the contents of the control store can be used to change the microinstruction flow without changing the hardwired logic. While the hardwired controllers are fast, microprogrammed controllers provide more flexibility and ease of implementation.
In the simplest implementation of a microprogrammed CPU controller, each CPU instruction corresponds to a micro-flow stored in the control store. As used herein, a micro-flow refers to a micro-programmed subroutine. Each bit or decoded field of a micro-instruction corresponds to the level of a control signal. Sequencing through a series of such microinstructions thus produces a sequence of control signals. In a microprogrammed controller, each CPU instruction invokes at least one micro-flow (which may be just one micro-instruction long for small one cycle CPU instructions) to generate control signals which control ALU operations and data transfers on the CPU internal busses.
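As a rough sketch of this arrangement, a control store can be modeled as a table that maps each CPU instruction to its micro-flow, with each microinstruction carrying the control fields for one cycle. The field names (`read_regs`, `alu_op`, `write_reg`), the register names, and the two example instructions are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

class MicroInstruction(NamedTuple):
    read_regs: tuple  # registers told to drive the read bus (ALU inputs)
    alu_op: str       # operation the ALU is told to perform this cycle
    write_reg: str    # register told to latch the write bus (ALU output)

# Illustrative ALU operations selected by the alu_op control field.
ALU = {
    "add": lambda a, b: a + b,
    "inc": lambda a: a + 1,
}

# Hypothetical control store: one micro-flow per CPU instruction.
CONTROL_STORE = {
    # A simple instruction needs a micro-flow only one microinstruction long.
    "INC": [MicroInstruction(("R0",), "inc", "R0")],
    # A more complex instruction channels data through the ALU twice.
    "ADD3": [
        MicroInstruction(("R0", "R1"), "add", "T"),
        MicroInstruction(("T", "R2"), "add", "R0"),
    ],
}

def execute(opcode, regs):
    """Sequence through the micro-flow for `opcode`, one microinstruction per cycle."""
    for mi in CONTROL_STORE[opcode]:
        inputs = [regs[name] for name in mi.read_regs]
        regs[mi.write_reg] = ALU[mi.alu_op](*inputs)
    return regs

regs = execute("ADD3", {"R0": 1, "R1": 2, "R2": 3, "T": 0})
```

Changing the behavior of an instruction in this model means editing an entry in `CONTROL_STORE`, not the sequencing logic in `execute`, which mirrors the flexibility attributed to microprogrammed controllers.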
Computers are often classified into complex instruction set computers (CISCs) and reduced instruction set computers (RISCs) on the basis of the instruction sets that their CPUs support. CISCs commonly have a large instruction set with a large variety of instructions, while RISCs typically have a relatively small set of simple instructions. Since RISC CPUs have a few simple instructions, they can afford to use the fast hardwired controllers. CISC CPUs usually use microprogrammed controllers because of ease of implementation.
The simple configuration of data processing computers specified in the Von Neumann model of computation is frequently subject to enhancements in an effort to increase the computer's efficiency and usefulness. One such enhancement is the proven architectural modification of "pipelining", which can significantly increase computer performance by overlapping the execution of several instructions in the CPU, thus engaging each functional unit in productive work for a greater overall percentage of time. In a pipelined CPU, the multiple functional units concurrently perform the basic constituent segments of execution for a plurality of CPU instructions.
An example of a pipelined CPU is described by Sudhindra N. Mishra, in "The VAX 8800 Microarchitecture," Digital Technical Journal, Feb. 1987, pp. 20-33.
Since each functional unit can handle only one instruction at a time, it is necessary that all functional units in a pipeline advance the instructions that they are processing in a synchronized manner. Unlike the stations of an assembly line, however, the functional units in a pipelined computer may require variable amounts of time depending upon the instruction that they are currently processing. If one of the functional units takes a long time to perform its function on a particular instruction, all the functional units that follow in the pipeline must wait for it to finish before they can advance their respective instructions to the next phase of the pipeline. This delay for the purpose of maintaining synchronization is known as a pipeline "stall". Pipeline stalls can also occur if a particular instruction needs the results of a previous instruction in the pipeline which has not completed execution. The instruction that needs the results may stall the pipeline starting at the operand fetch unit, waiting for the previous instruction to pass through the pipeline and produce the operand that the stalled instruction requires.
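The cost of synchronized advancement can be estimated with a small model. The sketch below assumes a strictly lock-step pipeline in which instructions enter one per step and every step lasts as long as the slowest occupied stage; the function name `pipeline_cycles` and the timing figures are illustrative assumptions, and data-dependency stalls are not modeled.

```python
def pipeline_cycles(stage_times):
    """Total cycles for a lock-step pipeline.

    stage_times[i][s] is the number of cycles instruction i needs in stage s.
    All instructions advance together, so each step costs as many cycles as
    the slowest stage that is occupied during that step.
    """
    n_instr = len(stage_times)
    n_stages = len(stage_times[0])
    total = 0
    # During step t, instruction i occupies stage t - i (if that stage exists).
    for step in range(n_instr + n_stages - 1):
        busy = [
            stage_times[i][step - i]
            for i in range(n_instr)
            if 0 <= step - i < n_stages
        ]
        total += max(busy)
    return total

uniform = pipeline_cycles([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
stalled = pipeline_cycles([[1, 1, 1], [1, 4, 1], [1, 1, 1]])
```

With uniform one-cycle stages the three instructions finish in 5 cycles; when the second instruction needs 4 cycles in its middle stage, the other units idle for 3 cycles and the total rises to 8.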
In known RISC systems, most instructions use the various CPU functional units for equal amounts of time. Pipelining in RISCs can thus be accomplished by overlapping the execution of the simple CPU instructions, as described above. On the other hand, some CISC instructions can be quite complex, requiring numerous CPU register/ALU transfers and long periods of time to execute. Other CISC instructions may be relatively simple and require fewer transfers and much less time to execute. The disparity in functional unit usage among various CISC instructions would make a CISC instruction pipeline stall often and for relatively long periods of time. For this reason, the pipelining of CISC CPU instructions is more difficult.
CISC instructions of varying complexity may have correspondingly different sizes of microflows. Since each microinstruction provides the lowest-level control signals for one CPU cycle to all elements of the various functional units, in some CISC machines the execution of microinstructions is pipelined instead of the CPU instructions. This reduces stalling because the time of execution of each microinstruction is more nearly the same. In a microinstruction pipeline, each stage uses a few bits in the microinstruction that correspond to the functional unit of that stage. After each functional unit is done with the microinstruction that controlled its activity during a cycle, it passes this microinstruction to the next functional unit in the pipeline for the next cycle. The first functional unit gets a new microinstruction each cycle. In this way, the fundamental principle of pipelining--the overlapped instruction execution to utilize the various functional units in parallel--is realized.
A CPU instruction typically specifies an operation which requires a number of data transfers between the registers and ALU in the CPU and this sequence of transfers is carried out under control of the CPU controller. With a microprogrammed controller a single CPU instruction specifies the execution of one or more micro-flows each consisting of one or more microinstructions to be executed in order. In this way, a computer program consisting of a sequence of CPU instructions is converted by the CPU controller into a corresponding program of microinstructions that themselves must be executed in order.
In normal operation, a CPU processes instructions one at a time in the order that the instructions reside in the computer's memory. However, the CPU instruction set may include instructions that specify an alternate flow of program execution. Such instructions, called "Branch" instructions, indicate that the next instruction that the CPU should execute is an instruction other than the instruction that immediately follows the branch instruction. Branch instructions may be either "unconditional" or "conditional". An unconditional branch instruction specifies that the program execution should continue at a non-sequential location in program memory that is provided as part of the branch instruction. A conditional branch, however, specifies that the program execution should continue at one of a set of particular program locations. The determination of which instruction should be executed after a conditional branch instruction is made according to the current or previous state of the computer.
An example of a conditional branch instruction included in an ordinary instruction set is a "Branch on Equal to Zero" or BEQL instruction. A BEQL instruction specifies to the CPU that the program execution should divert to the non-sequential location specified within the BEQL instruction only if the result of the latest ALU operation is zero. If the latest result of an ALU operation is non-zero, program execution should continue with the instruction immediately following the BEQL instruction in program memory.
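The choice of the next instruction address for sequential, unconditional branch, and BEQL cases can be sketched as follows. The function name `next_pc`, its parameters, and the mnemonic "BR" for an unconditional branch are assumptions for illustration; only BEQL is named in the text above.

```python
def next_pc(pc, length, opcode, target, zero_flag):
    """Return the address of the next instruction to execute.

    pc        -- address of the current instruction
    length    -- number of memory locations the current instruction occupies
    opcode    -- "BR" (hypothetical unconditional branch), "BEQL", or anything else
    target    -- branch destination carried in the branch instruction
    zero_flag -- True if the latest ALU result was zero
    """
    if opcode == "BR":
        return target              # unconditional: always divert
    if opcode == "BEQL" and zero_flag:
        return target              # conditional: divert only on a zero result
    return pc + length             # otherwise continue sequentially

taken = next_pc(100, 2, "BEQL", 40, zero_flag=True)
not_taken = next_pc(100, 2, "BEQL", 40, zero_flag=False)
```

Here `taken` is 40 and `not_taken` is 102, the address immediately following the two-location BEQL instruction.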
Since a microprogrammed controller translates sequences of CPU instructions into sequences of microinstructions used to control the CPU, the control store should contain microflows which accomplish conditional branching. That is, the CPU controller must have the means to cause a non-sequential CPU instruction to be fetched and executed in the case that certain conditions are met.
If the execution of microinstructions is not pipelined, microprogrammed support of conditional branching is straightforward. One microinstruction would cause the ALU to perform an operation, and a subsequent microinstruction would use the result of that ALU operation to determine which CPU instruction should then be fetched, translated into a sequence of microinstructions and executed.
If the execution of microinstructions is pipelined, however, conditional branching can cause the same sorts of data dependency problems and pipeline stalling as in any pipelined execution of instructions. Since various phases of more than one microinstruction are executing at the same time in the pipeline, a result produced from one stage of execution of a microinstruction might not be available to a subsequent microinstruction in the pipeline soon enough to be used as conditional data in a branch statement. For this reason, the micro-flows in the control store must be written in such a way as to ensure that there is a delay between the microinstruction which produces a condition and the microinstruction which uses that condition to conditionally branch. This delay is termed "microbranch latency", and a number of techniques in the art exist for introducing this latency, when necessary, into instruction pipelines.
One method for introducing latency into the pipeline to delay the execution of a phase which requires data not yet available is to cause the functional units to execute "no-op" instructions. No-op instructions are processed in the pipeline in the ordinary manner, but direct the functional units in the pipeline to do nothing. By executing no-op instructions, the pipeline effectively idles, waiting for data without stalling. In high-performance systems, it is undesirable to produce a microbranch condition in one microinstruction and then insert non-productive no-op instructions to wait until the condition is available for use. One common technique for avoiding these no-op instructions is to use the intervening microinstructions during the period of microbranch latency to do useful work, or to produce other microbranch conditions to be used later.
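The two approaches described above, padding with no-ops and filling the latency period with useful work, can be compared in a short sketch. The function name `schedule`, the string representation of microinstructions, and the assumption that the filler microinstructions are independent of the branch condition are all illustrative.

```python
NOP = "nop"

def schedule(producer, branch, fillers, latency):
    """Order a micro-flow so `latency` microinstructions separate producer and branch.

    producer -- microinstruction that generates the microbranch condition
    branch   -- microinstruction that branches on that condition
    fillers  -- useful microinstructions assumed independent of the condition
    latency  -- required number of intervening microinstructions
    """
    # Prefer useful work; pad with no-ops only when there is not enough of it.
    padding = [NOP] * max(0, latency - len(fillers))
    return [producer] + list(fillers) + padding + [branch]

padded = schedule("cmp", "branch_if_zero", [], latency=2)
filled = schedule("cmp", "branch_if_zero", ["move_a", "move_b"], latency=2)
```

Both micro-flows satisfy a two-cycle latency and have the same length, but `padded` spends two cycles idle while `filled` uses them for the two move microinstructions.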
In a microprogrammed controller system which supports pipelined conditional microbranching, it is common to use the condition exactly once and at the earliest possible time. A condition that is used only once and at a fixed time after it is produced is called a "dynamic microbranch condition" because it is not stored for later use.
There are some rare situations, however, in which a microinstruction generates a condition and requires that the use of the condition be delayed by one or more cycles. With pipelined execution of microinstructions, the ALU and other functional units which may generate conditions are applied to a different microinstruction in each CPU cycle, causing the dynamic microbranch conditions to be updated once per cycle. Delaying the use of a microbranch condition, therefore, imposes the requirement that the condition generated by one microinstruction not be updated by subsequent microinstructions until that condition has been used. Otherwise, a conditional branch microinstruction could base its decision upon inappropriate condition data. Since delaying the use of condition information is rare, it is not efficient to dedicate a microinstruction bit to indicate where the condition generated by that microinstruction should be stored or latched for later use. Such a static microbranch condition implementation generally requires a wider microword than is required if dynamic microbranch conditions are used.
It is accordingly advantageous to implement dynamic microbranches while also providing the CPU with the flexibility to occasionally retain a previous dynamic microbranch condition, thus allowing a delay in the use of that condition.
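The desired behavior can be modeled, under stated assumptions, as a condition latch that is overwritten every cycle unless it is told to retain its previous value. The class name `MicrobranchCondition` and the `hold` argument are hypothetical; how such a hold would be signaled in hardware is not specified in the text above, and this sketch does not attempt to represent it.

```python
class MicrobranchCondition:
    """Illustrative model of a dynamic microbranch condition with optional retention."""

    def __init__(self):
        self.value = False

    def clock(self, alu_condition, hold=False):
        """Advance one cycle and return the condition visible to a branch.

        Normally the condition is overwritten by the current cycle's ALU
        result (dynamic behavior). When `hold` is True, the previous
        condition is retained so that its use can be delayed.
        """
        if not hold:
            self.value = alu_condition
        return self.value

cond = MicrobranchCondition()
history = [
    cond.clock(True),               # cycle 1: condition produced
    cond.clock(False, hold=True),   # cycle 2: newer result ignored, old one retained
    cond.clock(False),              # cycle 3: normal dynamic update resumes
]
```

In this run `history` is `[True, True, False]`: the condition produced in the first cycle survives the second cycle only because the update was suppressed.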