This invention relates to the field of digital computers. More specifically, it relates to the design and implementation of digital processors which are fast and cost effective, having a high ratio of performance to cost.
As is known by those skilled in the art, a digital processor executes programs by drawing instructions from memory and decoding each instruction to determine the instruction type and the location in memory of the operand before executing the instruction. The majority of the instructions that a processor executes are of the type wherein the contents of a register (called the index or base register) are added to the contents of a specified field of the instruction (called the offset). This sum determines the location in the computer's memory of an operand (the second operand) which is used to modify the contents of a register containing the first operand.
A typical instruction may consist of anywhere from 8 to 64 bits of information. In the case of a 32-bit machine, the instruction may have a format wherein:
______________________________________
Bits
______________________________________
 1-8   = operation code
 9-12  = operand 1 register
13-16  = base register
17-32  = offset
______________________________________
The instructions are stored in the processor memory. The processor is designed to go to the memory and withdraw the instructions. As indicated, the first eight bits are the op code, which tells the processor what to do, for example, to add. The next four bits identify a register containing the first operand to be involved in the specified operation. The remaining four-bit and sixteen-bit fields together specify the location in memory of the second operand. Usually the contents of the designated base register are combined with the offset to determine the memory address from which the second operand is to be fetched. Thus, before the second operand is obtained, its address must be "prepared" by adding the offset to the contents of the base register.
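The decoding and address preparation just described can be sketched as follows. This is an illustration only, under stated assumptions: bit 1 is taken to be the most significant bit of the 32-bit word, the offset is treated as an unsigned value, and the names `decode`, `effective_address` and `registers` are hypothetical and do not come from the specification.

```python
def decode(word):
    """Split a 32-bit instruction word into the four fields of the format above.

    Assumption: bit 1 is the most significant bit of the word.
    """
    return {
        "opcode": (word >> 24) & 0xFF,   # bits 1-8
        "r1": (word >> 20) & 0xF,        # bits 9-12, operand 1 register
        "base": (word >> 16) & 0xF,      # bits 13-16, base register
        "offset": word & 0xFFFF,         # bits 17-32
    }


def effective_address(fields, registers):
    """Prepare the second operand's address: base register contents plus offset.

    Assumption: the offset is unsigned and no address wrap-around is modelled.
    """
    return registers[fields["base"]] + fields["offset"]


# Hypothetical example word: op code 0x5A, operand 1 in register 3,
# base register 7, offset 0x0010.
registers = [0] * 16
registers[7] = 0x2000
fields = decode(0x5A370010)
address = effective_address(fields, registers)
print(fields, hex(address))
```

With register 7 holding 0x2000, the prepared address in this example is 0x2010.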
The performance of a general purpose processor is largely determined by the speed with which it can perform the various steps, such as fetching an instruction, decoding the instruction, fetching the first operand, preparing the address for the second operand, fetching the second operand from memory, executing the instruction and restoring the registers with the results of the operation. Also of importance are the cost and amount of circuitry required to implement a processor to perform its desired functions.
The simplest way of executing a sequence of such instructions is serial operation. That is, each instruction is taken in sequence and all of the necessary steps performed before the next instruction is acted upon. This means, however, that many portions of the processing system sit idle for significant time periods. A more efficient method of operating a processor is to execute the processing steps in parallel. In other words, the sequence of steps for executing a second instruction can be started before the sequence for the preceding instruction is complete. This technique is called pipe lining and the concept is known in the art.
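The difference between serial and overlapped operation can be illustrated with a simple timing model. It is a sketch under idealized assumptions that are not taken from the specification: every step takes one equal time unit, a new instruction can be started every time unit, and nothing interrupts the flow. The function names are hypothetical.

```python
def serial_time(num_instructions, steps_per_instruction):
    """Each instruction completes all of its steps before the next one begins."""
    return num_instructions * steps_per_instruction


def overlapped_time(num_instructions, steps_per_instruction):
    """Idealized overlap: a new instruction starts every time unit.

    The first instruction needs all of its steps; each later instruction
    finishes one time unit after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return steps_per_instruction + (num_instructions - 1)


# Ten instructions of seven steps each (the seven steps listed earlier).
print(serial_time(10, 7))      # 70 time units
print(overlapped_time(10, 7))  # 16 time units
```

Real processors fall short of this ideal because steps differ in length and compete for the same circuitry.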
Although pipe lining techniques are known, the implementations have been extremely complex and consequently the use of this technique has been limited to expensive processors. In addition, the complexity of the logic tended to be counterproductive, slowing the execution speed of the individual steps in the sequence and thereby diminishing the improvement in performance obtained by pipe lining.
There are many approaches to the development and implementation of a pipe lined processor. In general, as the number of simultaneous activities increases, the cost of the processor increases due to the increased complexity of the control circuits required to keep the steps of each instruction in their proper order. For example, if the design should require that an address be prepared for one instruction while another instruction is being executed, two arithmetic logic units (ALUs) must be provided. The use of two ALUs requires two sets of registers and considerable surrounding logic to maintain order. Further, increasing the number of parallel steps increases the time lost when a branch is taken.
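The relationship between the number of parallel steps and the time lost on a taken branch can be illustrated with a rough model. The assumptions are ours, not the specification's: a taken branch is resolved only in the final step, all steps take one time unit, and every instruction started after the branch is discarded. The function name is hypothetical.

```python
def time_lost_to_branches(parallel_steps, taken_branches):
    """Time units wasted on discarded work in an idealized overlapped processor.

    Assumption: when a branch is taken, the (parallel_steps - 1) instructions
    started behind it have each wasted their progress and must be discarded.
    """
    return taken_branches * (parallel_steps - 1)


# The loss per taken branch grows with the number of parallel steps.
for parallel_steps in (2, 4, 7):
    print(parallel_steps, time_lost_to_branches(parallel_steps, taken_branches=10))
```

Under these assumptions, ten taken branches cost 10, 30 and 60 time units for two, four and seven parallel steps respectively.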
Obviously the most cost effective pipe lined processor is one which requires a minimum amount of control logic and which has all of the arithmetic and storage logic busy all of the time. In addition, the design should provide a minimum of lost time for branches and other operations which, of necessity, interrupt the pipe line operation of the system.
A major impediment to the efficient design of pipe lined computers is the fact that the instruction decode is normally followed by the address preparation. This sequence takes approximately the same time as the instruction fetch thus causing the operand fetch to occur just when it would be desirable to initiate the next instruction fetch for effective pipe line operation. In addition to fetch cycles conflicting, the address preparation and execution cycles also conflict.
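The two conflicts described above can be made concrete with a small scheduling sketch. It assumes a simplified model that is not part of the specification: four equal time slots per instruction, a single memory and a single ALU, with decode and address preparation together occupying one slot. The names `STEPS` and `conflicts` are hypothetical.

```python
from collections import defaultdict

# (step name, resource used), one time slot each -- an assumed simplification.
STEPS = [
    ("instruction fetch", "memory"),
    ("decode + address preparation", "ALU"),
    ("operand fetch", "memory"),
    ("execute", "ALU"),
]


def conflicts(num_instructions, start_interval):
    """Return the (slot, resource) pairs claimed by more than one instruction."""
    usage = defaultdict(list)
    for i in range(num_instructions):
        for s, (step, resource) in enumerate(STEPS):
            usage[(i * start_interval + s, resource)].append((i, step))
    return {key: users for key, users in usage.items() if len(users) > 1}


# Starting the second instruction as soon as the first finishes decode and
# address preparation (two slots later) produces both conflicts.
for (slot, resource), users in sorted(conflicts(2, start_interval=2).items()):
    print(slot, resource, users)

# Fully serial operation (four slots apart) produces none.
print(conflicts(2, start_interval=4))
```

In this model, the first instruction's operand fetch collides with the second instruction's instruction fetch at the memory, and the first instruction's execution collides with the second instruction's address preparation at the ALU.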
A prior solution to this problem has been the use of expensive cache (fast) memory to speed up the instruction and operand fetches so that the fetches do not conflict, together with the use of two ALUs so that address preparation does not have to wait for instruction execution or vice versa. All of this requires a considerable amount of control logic and storage to make certain that the actual execution of the instructions occurs in the proper sequence despite out of sequence fetches.