This invention relates to a digital computer, and more particularly to a pipelined CPU for a digital processor.
A general purpose computer processes data by executing one or more of several predefined instructions in a particular sequence. An example of a computing machine is a hand-held calculator. In this machine, the predefined instructions (the instruction set) may include only the arithmetic operations of addition, subtraction, multiplication and division. Data and the required sequence of instructions are input by the user one by one and an arithmetic calculation results.
The set of sequential instructions that a computer executes to produce a desired result is called a program. In general purpose machines with large instruction sets, the programs may be very large. Since computers execute the instructions much faster than users can input them, it is desirable to store the programs in electronic memories so that the computer can automatically read the instructions and thereby run at top speeds.
Most modern stored-program data processing systems are based on the Von Neumann model. The Von Neumann computer design is based upon three key concepts:
Data and instructions are stored in a single read-write memory.
The contents of this memory are addressable by location, without regard to the type of data contained in that location.
Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
The primary circuits of the Von Neumann computer can be broadly grouped into two parts: a memory and a central processing unit (CPU). The memory holds the data and the instructions for the computer system. The CPU can be considered the brain of the system. It contains electronic logic that sequentially fetches and executes the stored instructions.
Data in most digital computers is represented in the form of binary numbers. Each location in memory is capable of storing a binary number (the maximum size of which depends upon the type of computer system). The program or set of sequential instructions that the CPU executes is stored in a particular region of memory. An instruction may occupy more than one location in memory. The first part of each instruction is called an opcode. The opcode is a unique binary number that tells the CPU which instruction it is. Most instructions have other parts that may contain operands (data to be processed) or operand specifiers. Operand specifiers inform the CPU where to find the operands that the instruction requires. These operands may be anywhere in memory or in certain temporary memory locations inside the CPU.
In general, the CPU performs the following operations to execute an instruction:
1. Fetch an instruction from memory.
2. Decode the fetched instruction to interpret the instruction.
3. Fetch from memory any operands (data on which the instruction operates) required by the instruction.
4. Perform the operation defined by the instruction.
5. Store the results of the operation in memory for future reference.
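By way of illustration only, the fetch, decode, operand fetch, execute, and store operations may be sketched as a toy stored-program machine. The instruction format (an opcode followed by three memory addresses), the opcode names ADD, SUB and HALT, and the memory layout are hypothetical and do not correspond to any particular computer. For simplicity each instruction occupies a single memory location, although a real instruction may occupy several.

```python
# Toy stored-program machine: instructions and data share one memory.
# Hypothetical instruction format: (opcode, src_a, src_b, dest), where the
# last three fields are memory addresses acting as operand specifiers.
ADD, SUB, HALT = 0, 1, 2

def run(memory, pc=0):
    """Execute instructions starting at address pc until HALT is reached."""
    while True:
        instruction = memory[pc]            # 1. fetch the instruction
        opcode, a, b, dest = instruction    # 2. decode its fields
        if opcode == HALT:
            return memory
        x, y = memory[a], memory[b]         # 3. fetch the operands
        if opcode == ADD:                   # 4. perform the operation
            result = x + y
        elif opcode == SUB:
            result = x - y
        else:
            raise ValueError(f"unknown opcode {opcode}")
        memory[dest] = result               # 5. store the result
        pc += 1                             # proceed to the next instruction

memory = [
    (ADD, 4, 5, 6),    # address 0: memory[6] = memory[4] + memory[5]
    (SUB, 6, 4, 7),    # address 1: memory[7] = memory[6] - memory[4]
    (HALT, 0, 0, 0),   # address 2: stop
    None,              # address 3: unused
    10, 32, 0, 0,      # addresses 4-7: data
]
run(memory)
print(memory[6], memory[7])  # 42 32
```

Because the program and its data live in the same list, the sketch also reflects the single read-write memory of the Von Neumann model.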
Different sets of hardware (called functional units) within the CPU carry out these operations. The functional units of a CPU usually include various registers (memory elements) and an arithmetic and logic unit (ALU). The registers store temporary results and instruction operands (data on which an instruction operates). The ALU uses combinatorial logic to process the data present at its inputs. The output of the ALU depends upon the control signals provided to it, and is obtained from the input by performing an arithmetic operation or a logical (shifting or Boolean) operation. The processing in the CPU is done by channeling data from operand registers through the ALU into result registers. The data may be channeled through the ALU many times for complex instructions.
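The relationship between the registers and the ALU may be sketched as follows. This is a minimal model under stated assumptions: the operation names, the 8-bit width, the register names R0 to R2, and the multiply-by-repeated-addition routine are all hypothetical and serve only to show data being channeled through the ALU several times for a more complex operation.

```python
def alu(op, a, b, width=8):
    """Model of a combinatorial ALU: the output depends only on the two
    inputs and on the control signal `op` (hypothetical operation names)."""
    mask = (1 << width) - 1
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "SHL":
        result = a << b
    else:
        raise ValueError(f"unknown ALU operation {op}")
    return result & mask  # keep only `width` bits, as a fixed-width ALU would

def multiply(registers, src_a, src_b, dest):
    """A 'complex instruction': multiply by passing data through the ALU
    once per addition, from operand registers into the result register."""
    registers[dest] = 0
    for _ in range(registers[src_b]):
        registers[dest] = alu("ADD", registers[dest], registers[src_a])

registers = {"R0": 6, "R1": 7, "R2": 0}
multiply(registers, "R0", "R1", "R2")
print(registers["R2"])  # 42
```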
Data is transferred between the basic elements of the CPU through common busses (sets of wires that carry related signals). The data transfers are dependent on the type of instruction currently being executed and are initiated by a central controller. The CPU controller sends a sequence of control signals to the various registers of the CPU, telling the registers when to put data on the common read bus (going to the inputs of the ALU) and when to get data off the common write bus (coming out of the ALU). The CPU controller also tells the ALU what operation to perform on the data from the input to the output. In this way, the controller of the CPU may initiate a sequence of data transfers starting with fetching the instruction from main memory, fetching corresponding data, passing the data between the ALU and the various temporary storage registers, and finally writing processed data back to main memory.
The various implementations of a controller fall under two main categories: hardwired and microprogrammed. Hardwired controllers use combinatorial logic and some state registers to produce a sequence of control signals. These control signals depend upon the type of instruction just fetched and the result of the execution of the previous instruction. The microprogrammed controller performs the same function but uses a ROM or RAM controlled state machine to produce the control signals from previous state and instruction inputs.
Hardwired controllers are tailored for a particular instruction set, and the logic used to implement them becomes increasingly complex as the complexity of the instruction set increases. Microprogrammed controllers are more general purpose devices, in that changes in the contents of the control store can be used to change the microinstruction flow, without changing the hardwired logic. While the hardwired controllers are fast, microprogrammed controllers provide more flexibility and ease of implementation.
In the simplest implementation of a microprogrammed controller, each CPU instruction corresponds to a micro-flow stored in the control store. As used herein, a micro-flow refers to a microprogrammed subroutine. Each bit or decoded field of a microinstruction corresponds to the level of a control signal. Sequencing through a series of such microinstructions thus produces a sequence of control signals. In a microprogrammed controller, each CPU instruction invokes at least one micro-flow (which may be just one microinstruction long for small one cycle CPU instructions) to generate control signals which control ALU operations and data transfers on the CPU internal busses.
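A control store of this kind may be sketched as a table that maps each CPU instruction to its micro-flow. The sketch below is illustrative only: the four control signals, their bit positions, and the two CPU instructions INC and ADD_MEM are hypothetical and are not taken from any actual microcode.

```python
# Hypothetical control signals, one bit each in a microinstruction word.
REG_TO_READ_BUS = 0b0001    # a register drives the common read bus
ALU_ADD = 0b0010            # the ALU performs an addition
WRITE_BUS_TO_REG = 0b0100   # a register latches the common write bus
MEM_READ = 0b1000           # main memory is read

SIGNAL_NAMES = {
    REG_TO_READ_BUS: "REG_TO_READ_BUS",
    ALU_ADD: "ALU_ADD",
    WRITE_BUS_TO_REG: "WRITE_BUS_TO_REG",
    MEM_READ: "MEM_READ",
}

# Control store: each CPU instruction maps to a micro-flow, that is, a
# list of microinstructions issued on successive cycles.
CONTROL_STORE = {
    "INC": [REG_TO_READ_BUS | ALU_ADD | WRITE_BUS_TO_REG],  # one cycle long
    "ADD_MEM": [
        MEM_READ,
        REG_TO_READ_BUS | ALU_ADD,
        WRITE_BUS_TO_REG,
    ],
}

def control_signals(opcode):
    """Yield, cycle by cycle, the names of the signals asserted for opcode."""
    for microinstruction in CONTROL_STORE[opcode]:
        yield [name for bit, name in SIGNAL_NAMES.items()
               if microinstruction & bit]

for cycle, signals in enumerate(control_signals("ADD_MEM"), start=1):
    print(cycle, signals)
```

Changing the behavior of an instruction in this model requires only editing the table, which mirrors the flexibility attributed to microprogrammed controllers.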
Computers are often classified into complex instruction set computers (CISCs) and reduced instruction set computers (RISCs) on the basis of the instruction sets that their CPUs support. CISCs commonly have a large instruction set with a large variety of instructions, while RISCs typically have a relatively small set of simple instructions. Since RISC CPUs have only a few simple instructions, they can afford to use the fast hardwired controllers. CISC CPUs usually use microprogrammed controllers because of ease of implementation. Some CPUs may use a plurality of controllers, both hardwired and microprogrammed, to control various subsections of the CPU.
Since a machine operation may depend on the completion of a previous machine operation, the functional units operate on instructions sequentially. As a result, in a simple computer design, each functional unit is only being used for a fraction of the duration of the instruction execution.
The iterative fetch and execute scheme of the Von Neumann machine has been modified in many ways to produce faster computers. One such architectural modification is a technique known as pipelining. Pipelining significantly increases CPU performance by overlapping execution of several instructions in the CPU. In a pipelined architecture, different functional units process different instructions simultaneously.
An example of a pipelined CPU is described by Sudhindra N. Mishra in "The VAX 8800 Microarchitecture," Digital Technical Journal, Feb. 1987, pp. 20-33.
Pipeline processing is like an assembly line where assembly of many items happens simultaneously, but at any time each item is at a different stage of the assembly process. Pipelining allows overlapped execution of several instructions, thereby increasing the rate at which instructions are completed (the throughput), even though the time required to execute any single instruction is not shortened.
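The benefit of overlapped execution can be estimated with simple arithmetic. The sketch below assumes an idealized machine in which every stage takes exactly one cycle and no stalls occur; the function names and the example figures (100 instructions, 5 stages) are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """The first instruction needs n_stages cycles to pass through the
    pipeline; each later instruction completes one cycle after the one
    ahead of it."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

n, k = 100, 5
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
```

Under these assumptions each instruction still spends 5 cycles in the machine, but one instruction completes per cycle once the pipeline is full.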
Since each functional unit can handle only one instruction at a time, it is necessary that all functional units advance the instructions that they are processing in a synchronized manner. Unlike the assembly line analogy, however, functional units in a pipelined computer may require variable amounts of time depending upon the instruction they are processing. If one of the functional units takes a long time to perform its function on a particular instruction, all the functional units that follow in the pipeline must wait for it to finish before they can advance their respective instructions. This results in a pipeline stall. Pipeline stalls can also occur if a particular instruction needs the results of the previous instruction. The instruction that needs the results may stall the pipeline starting at the operand fetch unit, waiting for the previous instruction to pass through the pipeline and produce the operands that the stalled instruction requires.
Stalling introduces bubbles in the pipeline. A bubble represents a stage in the pipeline that cannot accomplish any useful work due to the lack of data from an earlier pipeline stage. As a bubble propagates through the pipeline it causes the corresponding functional units to become idle. In effect, a pipeline bubble is a lost opportunity to do useful work and results in lower processor throughput. This invention deals with a CPU pipeline implementation that compresses bubbles.
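The formation and propagation of a bubble may be sketched with a small simulation of a conventional pipeline. The model is a simplification under stated assumptions: only one stage (`slow_stage`) can stall, instruction names are unique, every other stage takes one cycle, and `None` stands for a bubble. All names are hypothetical.

```python
def run_lockstep(program, n_stages, slow_stage, extra_cycles):
    """Simulate a pipeline without bubble compression.

    program:      instruction names, in order (names must be unique).
    extra_cycles: {name: extra cycles that instruction needs in slow_stage}.
    Returns (cycles, trace); trace holds the stage contents in each cycle,
    first stage on the left, with None marking a bubble.
    """
    stages = [None] * n_stages
    pending = list(program)
    owed = dict(extra_cycles)
    cycles, trace = 0, []
    while True:
        blocker = stages[slow_stage]
        if blocker is not None and owed.get(blocker, 0) > 0:
            # Stall: stages up to slow_stage hold their instructions, later
            # stages keep advancing, and a bubble enters behind them.
            owed[blocker] -= 1
            head = stages[:slow_stage + 1]
            tail = stages[slow_stage + 1:]
            stages = head + ([None] + tail[:-1] if tail else [])
        else:
            # Normal cycle: every stage advances by one position.
            incoming = pending.pop(0) if pending else None
            stages = [incoming] + stages[:-1]
        if not pending and all(s is None for s in stages):
            return cycles, trace
        cycles += 1
        trace.append(tuple(stages))

cycles, trace = run_lockstep(["A", "B", "C"], n_stages=4,
                             slow_stage=1, extra_cycles={"B": 2})
print(cycles)  # 8
for row in trace:
    print(row)
```

With no stall the same three instructions drain in 6 cycles; the two extra cycles that B spends in the second stage appear as two bubbles that travel through the remaining stages behind A.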
In known RISC systems, most instructions use the various CPU functional units for equal amounts of time. Pipelining in RISCs can thus be accomplished by overlapping the execution of CPU instructions, as described above. On the other hand, some CISC instructions can be quite complex, requiring long periods of time to execute, while other CISC instructions may be relatively simple and require much less time to execute. The disparity in functional unit usage among various CISC instructions would make the CISC pipeline stall often and for relatively long periods of time. For this reason, the pipelining of CISC CPU instructions is more difficult.
Various CISC instructions may have different sizes of microflows. Since each microinstruction provides control signals for one cycle to all elements of the various functional units, in some CISC machines the microinstructions are pipelined instead of the CPU instructions (as commonly done in RISC machines). This reduces stalling because the time of execution of each microinstruction is the same. In a microinstruction pipeline, each stage uses a few bits in the microinstruction that correspond to the functional unit of that stage. After each functional unit has made use of the microinstruction that controlled its activity during a cycle, it passes this microinstruction to the next functional unit in the pipeline in the next cycle. The first functional unit gets a new microinstruction. In this way, the fundamental principle of pipelining--overlapped instruction execution to utilize various functional units in parallel--is realized. Even a microinstruction pipeline is not immune to bubbles. This invention provides a means for bubble compression in any kind of instruction pipeline.
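A microinstruction pipeline of this kind may be sketched as a shift register in which each stage reads only its own field of the microinstruction it currently holds. The three stage names and the field contents below are hypothetical and chosen only for illustration.

```python
# Hypothetical stages; each microinstruction carries one field per stage.
STAGES = ["operand_fetch", "execute", "write_back"]

def run_micro_pipeline(microinstructions):
    """Return, for each cycle, the field that each stage acts on
    (None where a stage holds no microinstruction)."""
    pipeline = [None] * len(STAGES)
    pending = list(microinstructions)
    log = []
    # Continue while work remains to be issued or is still moving toward
    # the last stage.
    while pending or any(m is not None for m in pipeline[:-1]):
        incoming = pending.pop(0) if pending else None
        pipeline = [incoming] + pipeline[:-1]   # pass each word to the next stage
        log.append([m[stage] if m is not None else None
                    for stage, m in zip(STAGES, pipeline)])
    return log

flow = [
    {"operand_fetch": "read R1", "execute": "add", "write_back": "write R3"},
    {"operand_fetch": "read R2", "execute": "shift", "write_back": "write R4"},
]
for cycle, fields in enumerate(run_micro_pipeline(flow), start=1):
    print(cycle, fields)
```

In the second cycle the first stage is already reading operands for the second microinstruction while the second stage executes the first, which is the overlap described above.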
A basic rule governing control of most pipelined processors is that all functional stages of the pipeline simultaneously advance their states to the next functional stage. This is necessary because each functional unit transmits its processed state to the following unit while it receives a new state from the preceding unit. Thus, in previous designs, if a bubble is introduced into the pipeline, it propagates through each successive pipeline stage as all stages are simultaneously advanced. It would therefore be advantageous to overwrite or compress bubbles so introduced to optimize system throughput.
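The potential gain from compressing a bubble may be sketched with a simulation that compares the two policies. This is only an illustration of the general idea under stated assumptions and is not a description of any particular circuit: a single stage (`slow_stage`) may stall, a `None` in the fetch stream means that no instruction was available in that cycle, instruction names are unique, and with `compress=True` the instructions behind a bubble are allowed to advance into it while the slow stage is stalled. All names are hypothetical.

```python
def run_pipeline(fetch_stream, n_stages, slow_stage, extra_cycles, compress):
    """Return the number of cycles needed to drain fetch_stream.

    fetch_stream: instruction names in order; None means a bubble enters.
    extra_cycles: {name: extra cycles that instruction needs in slow_stage}.
    compress:     if True, overwrite a bubble behind the stalled stage.
    """
    stages = [None] * n_stages
    pending = list(fetch_stream)
    owed = dict(extra_cycles)
    cycles = 0
    while True:
        blocker = stages[slow_stage]
        if blocker is not None and owed.get(blocker, 0) > 0:
            owed[blocker] -= 1
            head = stages[:slow_stage + 1]   # stages held by the stall
            tail = stages[slow_stage + 1:]   # stages that keep draining
            if compress and None in head:
                # Overwrite the bubble nearest the stalled stage: everything
                # behind it moves up one position and a new item enters.
                gap = max(i for i, s in enumerate(head) if s is None)
                incoming = pending.pop(0) if pending else None
                head = [incoming] + head[:gap] + head[gap + 1:]
            stages = head + ([None] + tail[:-1] if tail else [])
        else:
            incoming = pending.pop(0) if pending else None
            stages = [incoming] + stages[:-1]
        if not pending and all(s is None for s in stages):
            return cycles
        cycles += 1

stream = ["A", None, "B", "C"]   # a bubble follows A into the pipeline
args = dict(n_stages=4, slow_stage=2, extra_cycles={"A": 2})
print(run_pipeline(stream, compress=False, **args))  # 9
print(run_pipeline(stream, compress=True, **args))   # 8
```

In this example the bubble behind A is absorbed while A is stalled, so B and C finish one cycle earlier than they do when all stages must advance together.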