Pipeline Processors--RISC
Many modern microprocessors are designed using Reduced Instruction Set Computer (RISC) techniques. Such machines use a relatively simple instruction set but concentrate on executing those instructions very quickly. High speed operation is enhanced by use of a pipeline of about half a dozen stages.
Each instruction in sequence enters the pipeline and goes through various processing steps at each stage in the pipeline. In an early stage, for example, the instruction is decoded so that the actions of later pipeline stages for it become known. In another stage the data values required for the instruction are retrieved from the register file. In a later stage the arithmetic or logical operation required by the instruction is performed.
It is common in microprocessors to provide access to the register file at an early stage in the pipeline. This is done so that values from the register file required by an instruction can be delivered to the pipeline early in the process of performing an instruction. The speed of access to the register file is often a pacing item in the speed of the machine. Results computed later in the pipeline are returned to the register file for storage.
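The overlap of instructions in a pipeline can be illustrated with a minimal sketch. The stage names below are assumptions chosen for illustration (the text says only that there are about half a dozen stages), and the model assumes an ideal pipeline with no stalls, in which one instruction enters per cycle.

```python
# Hypothetical stage names; the text does not name the stages of any machine.
STAGES = ["fetch", "decode", "register_read", "execute", "write_back"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [(instruction_index, stage_name), ...]} for an ideal,
    stall-free pipeline in which one instruction enters per cycle."""
    schedule = {}
    for i in range(num_instructions):
        for s, name in enumerate(stages):
            cycle = i + s  # instruction i occupies stage s during cycle i + s
            schedule.setdefault(cycle, []).append((i, name))
    return schedule

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to drain the pipeline: fill time plus one per instruction."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1
```

Under these assumptions, four instructions finish in 8 cycles rather than the 20 that strictly sequential processing through five stages would need, and during cycle 3 four different instructions occupy four different stages.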
Bypass Paths
The process of moving all data through the register file has proven to be a bottleneck in the performance of microprocessors. Therefore, modern microprocessor designs use a complex set of "bypass" paths between stages in the pipeline to deliver values computed by one instruction to subsequent instructions. Thus, for example, if one instruction computes a value required as input to an instruction immediately following it, the value will be passed back one stage in the pipeline rather than circulating through the register file. Similarly, if the value is required by an instruction two instructions following, the value will be passed back two stages in the pipeline. A complex network of data paths is required to accommodate all possible needs of instructions for recently computed values.
Design and management of a complex set of bypass paths has become a major task in the design of microprocessors, for several reasons. First, because bypass paths pass by several stages in the pipeline, they require wires longer than those normally connecting adjacent stages. These long wires have greater delay than the shorter wires connecting adjacent stages, which may degrade performance of the machine. Considerable effort may be required to accommodate the peculiar timing constraints of bypass paths. Second, because bypass communications in the pipeline may pass several stages, the timing of the stages connected by a bypass path must be carefully controlled. If there is a small timing error between adjacent stages, it may accumulate over the several stages around which the bypass passes to such an extent that it causes difficulty in the bypass communication. Third, should one part of the pipeline stall, i.e. be unable to continue without receipt of some essential item, all other parts of the pipeline must also stall, because the bypass paths would otherwise risk loss of data. The logic network that detects a stall condition in any part of the pipeline and transmits it to all parts to keep them in step often limits the performance of the computer. Finally, machines that use a multiplicity of bypass paths require switches to deliver the various bypass data from the various paths to the proper part of the processor. These switches themselves introduce not only delay in the processor but also complexity in the design.
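The selection a bypass network performs for each operand can be sketched as follows. This is an illustrative model only: the function name, the register names, and the representation of in-flight results as a list are assumptions, and the sketch ignores all of the timing and stall issues described above.

```python
def read_operand(reg, register_file, in_flight):
    """Pick the source of an operand value.

    register_file: dict mapping register name -> value already written back.
    in_flight: list of (dest_register, value) pairs for results computed but
               not yet written back, ordered newest first (an assumption of
               this sketch).
    Returns (value, source), where source is "bypass" or "register_file".
    """
    for dest, value in in_flight:
        if dest == reg:
            # The most recently computed matching result takes priority.
            return value, "bypass"
    return register_file[reg], "register_file"
```

For instance, with a register file holding r1 = 10 and two in-flight results for r1 (99, then the older 50), a read of r1 yields 99 from the bypass. Each additional in-flight entry corresponds to another data path and another input on the selecting switch, which is where the design complexity arises.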
Out of Order Execution
One way to enhance speed in a computer is to execute instructions out of order. As soon as enough is known to perform some instruction, the computer can do it, even though "previous" instructions have not yet been done. Nevertheless, such machines must produce the same results as would be produced by sequential execution of the instructions in the order written. The term "out of order execution" has come into use to describe any mechanism that is able to complete instructions in an order different from their order as presented by the program. Out of order execution can speed execution of a variety of programs that include floating point arithmetic instructions, or complicated and thus relatively slow numeric operations such as trigonometric functions. While the arithmetic operations are underway, other parts of the computer may do other, generally simpler, instructions out of order, completing as much work as possible in time that might otherwise be wasted.
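A rough sketch of how completion order can differ from program order is given below. It assumes a simplified model that is not taken from any particular machine: one instruction becomes available per cycle, each starts as soon as its source values are ready, and conflicts over reuse of destination registers are ignored. The instruction names and latencies are hypothetical.

```python
def completion_order(program):
    """program: list of (name, dest, sources, latency) in program order.
    Returns instruction names in the order in which they complete."""
    ready_at = {}  # register -> cycle at which its newest value is available
    finish = []
    for index, (name, dest, sources, latency) in enumerate(program):
        # Start no earlier than the cycle the instruction arrives (its index)
        # and no earlier than the cycle all of its inputs are ready.
        start = max([index] + [ready_at.get(s, 0) for s in sources])
        done = start + latency
        ready_at[dest] = done
        finish.append((done, index, name))
    return [name for done, index, name in sorted(finish)]

# Hypothetical program: a slow divide followed by quick integer work.
EXAMPLE = [
    ("fdiv", "f1", ["f2", "f3"], 10),
    ("add",  "r1", ["r2", "r3"], 1),
    ("sub",  "r4", ["r1", "r5"], 1),
    ("fadd", "f4", ["f1", "f5"], 3),
]
```

With these assumed latencies, `completion_order(EXAMPLE)` gives add, sub, fdiv, fadd: the two integer instructions complete while the divide is still underway, whereas the floating point add must wait for the divide's result.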
Multiple Instruction Issue or Super-Scalar
Another way to enhance performance is called "multiple instruction issue", used in "super-scalar" machines. In a super-scalar machine, instructions are processed in groups rather than singly. Greater speed is achieved by using duplicate processing machinery in parallel rather than a single processing device sequentially.
It sometimes happens that the instructions in a group must interact. For example, the second instruction in a group may require a value computed by the first instruction in the group. Some computers provide communication paths between the parallel processing machinery to accommodate this kind of need. Other computers avoid this requirement by choosing to place in a group only instructions that have no mutual interaction. For example, some such computers can execute two instructions at a time, provided that one instruction requires only fixed point arithmetic and the other requires only floating point arithmetic.
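The grouping rule in the last example, pairing one fixed point instruction with one floating point instruction, might be sketched as below. The function and the "fixed"/"float" labels are assumptions made for illustration, and the sketch takes for granted, as the text does, that a fixed point and a floating point instruction do not interact.

```python
def form_groups(kinds):
    """kinds: list of "fixed" or "float", one entry per instruction in
    program order. Returns a list of tuples of instruction indices; a
    two-element tuple is a pair issued together."""
    groups = []
    i = 0
    while i < len(kinds):
        if i + 1 < len(kinds) and kinds[i] != kinds[i + 1]:
            groups.append((i, i + 1))  # one fixed, one float: issue together
            i += 2
        else:
            groups.append((i,))        # no suitable partner: issue alone
            i += 1
    return groups
```

For the sequence fixed, float, fixed, fixed, float this yields the groups (0, 1), (2,), and (3, 4), so five instructions issue in three steps instead of five.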
Speculative Execution
Another useful mechanism for increasing performance is speculative execution. Although instructions to be performed by a computer are usually stored in consecutive cells in memory, some instructions, called "branch" instructions, direct the computer to take instructions from an entirely different location. Some branch instructions, called "conditional branches", direct the computer either to continue executing instructions in sequence or to take instructions from some other sequence, depending on the value of some data element that is computed.
In a high performance machine, the mechanism that fetches instructions from memory may be fetching instructions well before they are actually executed. The instructions that are fetched and not yet executed lie in a pipeline between the fetch unit and the place where they are actually executed. When the instruction fetch mechanism reaches a conditional branch, it may not know for certain which of the two possible next instructions to fetch. Knowledge of which is the proper next instruction may wait until the data element being tested by the conditional branch is actually calculated. However, rather than waiting for this calculation, the instruction fetch units of many modern machines fetch instructions based on a guess of the outcome. Success rates of about 85% are achieved by relatively simple predictors known in the art. The fetch unit fetches instructions from the predicted location and issues them into the pipeline. Such instructions are called "speculative" because it is not certain that they should be executed at all. If the branch prediction is wrong, the speculatively issued instructions must be eliminated and all traces of their action reversed.
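One well-known simple prediction scheme is a two-bit saturating counter, sketched below for a single branch. It is offered only as an example of a simple predictor; the text does not say which predictors it has in mind, and the accuracy any predictor achieves depends on the behavior of the program being run.

```python
class TwoBitPredictor:
    """Two-bit saturating counter for one branch.
    Counter values 0 and 1 predict not taken; 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 2  # start at "weakly taken" (an arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move toward 3 on a taken branch, toward 0 otherwise, saturating.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes (list of bool) predicted correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

As a usage example, a loop-closing branch that is taken nine times and then falls through once, repeated ten times, can be written as `([True] * 9 + [False]) * 10`; on that hypothetical pattern the sketch mispredicts only the loop exits.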
Register Renaming
In a simple computer design the computed values are stored in a register file. Values required as input for an instruction are fetched from the register file and computed values are returned to it. In more complex designs intermediate values are sometimes stored in temporary holding locations in order to save the time that would otherwise be used to move them to or from the register file. The control system for such a computer records, for each temporary holding location, both the value it holds and the identity of the register to which that value belongs. In effect, each temporary holding location may from time to time be identified with a different register from the register file. This mechanism is commonly known as "register renaming".
Register renaming ordinarily requires special design consideration. A designer must decide which temporary holding registers can be renamed, and how the identity of their contents will be recorded. A wide variety of complex mechanisms has been developed for this purpose.
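A minimal sketch of one way to record the identity of renamed registers is a table mapping each register name to the temporary holding location that currently stands for it. The class, method, and the names "t0", "t1", and so on are hypothetical, and the sketch omits the release of temporaries once their values have been returned to the register file.

```python
class RenameTable:
    """Maps register names to temporary holding locations."""

    def __init__(self, num_temporaries):
        self.free = ["t%d" % n for n in range(num_temporaries)]
        self.mapping = {}  # register name -> temporary currently holding it

    def rename(self, dest, sources):
        """Rename one instruction. Returns (temporary_for_dest, renamed_sources).
        Sources with no live temporary are read from the register file and
        keep their original names. Raises IndexError if no temporary is free."""
        # Look up the sources first, so an instruction that reads and writes
        # the same register sees the old value.
        renamed_sources = [self.mapping.get(s, s) for s in sources]
        temp = self.free.pop(0)
        self.mapping[dest] = temp
        return temp, renamed_sources
```

In this sketch, two successive writes to the same register receive different temporaries, so an instruction that reads the first value is not disturbed when the second is computed.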
Multiple Memory Issue
Another method used to speed the operation of modern computers is called multiple memory issue. In a simple memory system, values may be drawn from the memory one at a time. Each access to memory must complete before another can begin. In such a system the rate at which information can be drawn from the memory is limited by the access time of the memory.
Some modern machines improve on this rate of memory access by including special circuits that can accommodate more than one outstanding memory request at a time. Such circuits must include storage for the details of each memory request as well as control circuits to introduce them to and pass them through the memory system without interfering with each other. In order to simplify the memory control circuits it is customary for the memory to return its responses in the same sequence that they were requested.
Even more sophisticated memory systems are capable of out of order reply from the memory. Some memory requests may be satisfied by access to a fast cache memory, while others require recourse to the main memory system of the computer. The most sophisticated computers put this difference in memory access time to use. They permit answers from memory that are available quickly to be used quickly even though previous requests of the memory are not yet complete. This is similar to out of order execution, but concerns the memory system rather than the arithmetic and logical parts of the computer.
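The difference between in-order and out of order reply can be sketched with a small model. The request tags and latencies are hypothetical, one request is assumed to issue per cycle, and for simplicity the in-order case allows several held-back replies to be delivered in the same cycle.

```python
def reply_order(requests, in_order):
    """requests: list of (tag, latency) pairs, issued one per cycle.
    Returns a list of (tag, delivery_cycle) in order of delivery."""
    delivered = []
    last = 0  # delivery cycle of the previous request (in-order mode only)
    for issue_cycle, (tag, latency) in enumerate(requests):
        ready = issue_cycle + latency
        if in_order:
            # A reply may not overtake the reply to an earlier request.
            ready = max(ready, last)
            last = ready
        delivered.append((ready, issue_cycle, tag))
    return [(tag, ready) for ready, issue_cycle, tag in sorted(delivered)]

# Hypothetical requests: A goes to main memory; B and C are satisfied by a cache.
REQUESTS = [("A", 20), ("B", 2), ("C", 2)]
```

With in-order reply, B and C are held until cycle 20 behind A. With out of order reply, B and C are delivered at cycles 3 and 4, and A follows at cycle 20.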
Preserving the Programmer's Model
Great care must be exercised in the design of computers capable of multiple issue, out of order execution, speculative execution, register renaming, and multiple or out of order memory access to ensure correct execution of the instruction set. The instruction sets now in common use presume sequential execution of the instructions and presume that all computed values are produced and recorded in the order of the instructions that produce them. If one wishes to make a machine capable of higher speeds, one must exercise great care to ensure that its operation is compatible with programs initially intended for simpler machines.
The programmer thinks of the program as a sequence of instructions to be performed in the sequence he defines. A computer that does out of order execution must be designed to produce the same results as would be obtained by sequential operation. Usually this is easy, because any operations actually performed out of order must be independent of other instructions. Preserving compatibility with sequential operation is hard, however, when an instruction executed out of order produces some kind of fault. For example, if a branching decision instruction has already been executed when an instruction before it in sequence produces a memory fault, the effect of the branching decision instruction must be undone. Similarly, suppose a floating point divide instruction is launched, and instructions after it in the program are performed before the divide completes. If the divisor of the divide instruction is zero, an overflow results and instructions after the divide that were performed out of order must be undone. Preserving compatibility between computers that can execute instructions out of order and computers that perform in sequence has proven to be difficult, requiring complex circuits for many special cases of instruction sequences.
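One widely used way to meet this requirement, offered here only as an illustration and not as something the text prescribes, is to let instructions execute in any order but make their results permanent strictly in program order. The sketch below assumes every instruction has already executed and has been marked as faulted or not; the function name and instruction names are hypothetical.

```python
def retire(results):
    """results: list of (name, faulted) pairs in program order.
    Returns (committed, faulting_instruction, squashed).

    Instructions before the first fault are committed. The faulting
    instruction is reported. Everything after it is squashed, that is,
    its effects are discarded even though it has already executed."""
    committed, squashed = [], []
    fault = None
    for name, faulted in results:
        if fault is None and faulted:
            fault = name
        elif fault is None:
            committed.append(name)
        else:
            squashed.append(name)
    return committed, fault, squashed
```

For the divide example above, `retire([("load", False), ("fdiv", True), ("add", False), ("branch", False)])` commits only the load, reports the divide as the faulting instruction, and squashes the add and the branch that ran ahead of it.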