1. Field of the Invention
The present invention relates generally to the field of general purpose digital computers and to methods and apparatus for high speed instruction processing. More particularly, the invention relates to pipeline processing control of hardwired instructions.
A general purpose digital computer processes a group of instructions which are received, in sequence, from a storage unit of the computer. The instructions processed or executed by the processor of the computer can be organized depending on whether or not a particular instruction depends on microcode for its execution. Instructions that do not depend on microcode, called hardwired instructions, are executed solely through hardware in the computer and perform the most basic functions of the computer. One method of processing hardwired instructions is to execute them serially, one starting after a preceding instruction has finished. This normally wastes a significant amount of the available computer hardware because most of the hardware sits idle as the instruction is passed from one part of the computer to the next in its execution. Another method of processing recognizes that the processing of each instruction within a sequence involves several different stages. Several stages can be processed simultaneously if each stage, by itself, can be processed independently of the other stages within the processor. This results in the first stage of one instruction being executed by the processor immediately following the execution of the first stage of a previous instruction while, at the same time, the execution of the second stage of the previous instruction takes place. In general for a K stage pipeline, the Nth stage of an instruction is executed following the Nth stage of the previous instruction and the N+1 through the last stages of the previous K-N instructions are executed simultaneously with the Nth stage of the current instruction.
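The difference between serial and overlapped execution described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one machine cycle and no instruction ever stalls; the function names are hypothetical and are not drawn from any particular machine.

```python
def serial_cycles(num_instructions, num_stages):
    """Cycles needed when each instruction starts only after the previous one finishes."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Cycles needed when a new instruction enters the first stage every cycle.

    Assumption: one cycle per stage and no stalls (an idealized K stage pipeline).
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one finishes one cycle after.
    return num_stages + num_instructions - 1


def stage_schedule(num_instructions, num_stages):
    """schedule[i][s] is the 0-based cycle in which instruction i occupies stage s."""
    return [[i + s for s in range(num_stages)] for i in range(num_instructions)]


if __name__ == "__main__":
    print(serial_cycles(4, 5), pipelined_cycles(4, 5))
    for row in stage_schedule(3, 3):
        print(row)
```

In the schedule, the first stage of instruction 1 and the second stage of instruction 0 fall in the same cycle, which is the overlap the paragraph describes.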
These stages of hardwired instructions generally include, among others, routing the instruction to the proper device for reading and decoding the instruction, reading and decoding the instruction, obtaining any information required by the instruction for further processing, executing the instruction, and routing the results of the execution to the proper devices to act on the results. All the stages are performed in the same duration of time, so that although some stages may execute faster than others, the stage with the longest processing time sets the time duration for all the stages. Each stage is unique in that its execution only requires a part of the computer apparatus not used by the other stages. This means that, as an instruction moves through each stage, the other parts of the computer not associated with the individual stage are free to operate on other instructions. An instruction stage may depend on the output of other stages for its input, so that its execution may not be completely independent of other stages. However, once the inputs to a particular stage are available, the execution of the instruction with those inputs is independent of other stages. Therefore, it is possible for all the different stages to be executing simultaneously and, in turn, to process several instructions simultaneously instead of serially. This method wastes less of the available computer hardware and takes less time to process a sequence of instructions in order.
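The rule that the slowest stage sets the duration of every stage can be sketched as follows. The stage times used in the example are hypothetical values in arbitrary time units, chosen only for illustration.

```python
def stage_slot(stage_times):
    """Duration allotted to every stage: the longest individual stage time."""
    return max(stage_times)


def idle_time_per_stage(stage_times):
    """Time each stage's hardware sits idle within its allotted slot."""
    slot = stage_slot(stage_times)
    return [slot - t for t in stage_times]


if __name__ == "__main__":
    # Hypothetical times for routing, decode, operand fetch, execute, and result routing.
    times = [3, 5, 4, 6, 2]
    print(stage_slot(times))
    print(idle_time_per_stage(times))
```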
This segmented approach to processing instructions is referred to as pipelining and is described, for example, in an article by D. W. Anderson, F. J. Sparacio, and R. M. Tomasulo entitled "The IBM System/360 Model 91: Machine Philosophy and Instruction Handling", IBM Journal of Research and Development, Vol. 11, No. 1, pp. 8-24, January 1967. Since different sections of consecutive instructions are carried out simultaneously, the computer throughput is improved. The term "performance" is synonymous with the term "throughput"; it is measured in instructions-per-cycle, that is, the number of instructions completed in one machine cycle. The measurement is an average produced when a batch of instructions or a program is processed in the processor. It is the inverse of the average number of machine cycles it takes to complete each instruction of the batch or program. The smaller the number of machine cycles per program, the better the performance or throughput.
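The throughput measurement can be expressed as a short calculation. The sketch below assumes the total cycle count for a batch is already known; the function names and the example figures are hypothetical.

```python
def instructions_per_cycle(num_instructions, total_cycles):
    """Average number of instructions completed per machine cycle for a batch."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return num_instructions / total_cycles


def cycles_per_instruction(num_instructions, total_cycles):
    """Average machine cycles per instruction; the inverse of instructions-per-cycle."""
    if num_instructions <= 0:
        raise ValueError("num_instructions must be positive")
    return total_cycles / num_instructions


if __name__ == "__main__":
    # Hypothetical batch of 100 instructions on a 5 stage machine:
    # 500 cycles if executed serially, 104 cycles if ideally pipelined.
    print(instructions_per_cycle(100, 500))
    print(instructions_per_cycle(100, 104))
```

For a fixed batch, fewer total cycles gives a higher instructions-per-cycle figure, which matches the statement that fewer machine cycles per program means better throughput.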
The particular pipeline structure used for executing hardwired instructions in a computer is very dependent upon the way in which the hardware in the computer is designed to operate. Typically, higher performance machines use separate parts of the computer for specialized requirements, such as having one part of the computer do only a small piece of an instruction but do it very fast, or accessing memory in the computer by a specific calculation technique requiring separate or different instruction stages. This gives the computer higher throughput but also generally makes each individual instruction go through a larger number of stages. The increased number of stages, coupled with the fact that a large number of hardwired instructions do not require all the stages, means that even though the throughput is improved, there is still a significant amount of computer hardware not being used at any one time. This is because when a particular pipeline structure is used, the instruction must pass through all the stages of the pipeline, regardless of whether or not a given stage actually provides a function required by the instruction. This results in the processor standing idle for the full time associated with one stage of an instruction, because each stage is allocated the same amount of processor time regardless of the actual execution time of the stage. The idle time of the processor in one instruction ripples through the execution of a sequence of instructions because the simultaneous processing of instructions requires that the Nth stage of a current instruction not start execution until the Nth stage of the preceding instruction has completed execution. Therefore, if the idle time of the processor associated with one instruction delays the Nth stage of a preceding instruction, it will also delay the Nth stage of the current instruction, and so on throughout the processing of the sequence of instructions.
Delays in the execution of stages will continuously add up in this fashion until they become a significant factor in the performance of the processor.
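The ripple effect described above can be sketched with a small simulation. It assumes an in-order pipeline in which a stage of an instruction starts only when both the previous stage of the same instruction and the same stage of the preceding instruction have finished. It is a simplification that ignores back-pressure between stages, and the `extra_delay` parameter is a hypothetical way of injecting a delay for illustration.

```python
def completion_times(num_instructions, num_stages, extra_delay=None):
    """Cycle at which each instruction leaves the last stage.

    extra_delay maps (instruction_index, stage_index) to additional cycles
    spent in that stage beyond the nominal single cycle.
    """
    extra_delay = extra_delay or {}
    end = [[0] * num_stages for _ in range(num_instructions)]
    for i in range(num_instructions):
        for s in range(num_stages):
            ready_self = end[i][s - 1] if s > 0 else 0   # previous stage, same instruction
            ready_prev = end[i - 1][s] if i > 0 else 0   # same stage, preceding instruction
            start = max(ready_self, ready_prev)
            end[i][s] = start + 1 + extra_delay.get((i, s), 0)
    return [row[-1] for row in end]


if __name__ == "__main__":
    baseline = completion_times(4, 3)
    delayed = completion_times(4, 3, {(1, 1): 2})  # 2 extra cycles in stage 1 of instruction 1
    print(baseline)
    print(delayed)
    print([d - b for d, b in zip(delayed, baseline)])
```

In this model a delay in one stage of one instruction is inherited by every later instruction, while earlier instructions are unaffected.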
2. Prior Art
The prior art has attacked the idle processor time problem of pipelined processors in a variety of ways. One such attack involves a pipelined processor which divides the decoding of each instruction into an operation decode and an operand specifier decode. The processor then decodes an operation part and an operand part of an instruction in every decode stage, decoding subsequent operand parts of instructions when the current instruction does not require an operand decode. This method of processing instructions requires duplicate sets of hardware in order to fetch and buffer the two parts of the instruction for use. In addition, this method only contemplates saving time associated with fetching data from memory and does not address the problem of how to save time associated with executing the operation stages of the instructions. Another method of reducing idle processor time involves the sequential processing of a specific two-instruction combination, for loading an execution result into an address of main memory, which is then executed in fewer stages than would be required in a conventional pipeline structure. Although the specific two-instruction combination does appear repeatedly, there are many more combinations of instructions that waste execution stages. A particular solution to one such combination does not address the larger, general problem of how to remove wasted stages in many different instruction combinations.
Another prior method reduces idle processor time by allowing the execution of the instruction fetch and address preparation stages of the pipeline to overlap. Here, the second instruction fetch section of a two instruction sequence (each instruction including both fetch and preparation stages) will be executed faster because the processor will not need to wait as long for the second instruction fetch stage, of the two instruction sequence, to complete execution. This method of instruction processing requires additional state logic to control and keep track of what instructions are at what stage of processing because as fetch or address preparation stages overlap in execution, the processor may or may not have the operands necessary to perform the current instruction. The additional state logic hardware is an unnecessary and complex burden which also does not address the problem of processor utilization when the instruction does not need to have an address preparation stage.
Another prior art attempt to reduce idle processor time used two fixed pipeline structures, one for instruction processing and one for instruction execution. The system employs pipeline control circuitry to gate instructions through the different stages of each pipeline. The two pipelines are each three stages long, with one stage overlapping between the two pipelines. This means that there are five stages between the two pipelines, and therefore the combined pipeline structure displays the inefficiency of a single, fixed-structure pipeline.
Improving processor performance implies eliminating unnecessary machine cycles. If those cycles were merely bypassed altogether, processor performance would be enhanced. This, however, would produce two problems: 1) different instructions would have different pipeline lengths, and 2) some instructions would finish processing out of the order in which they started processing. The first problem requires determining which particular instructions will have stages bypassed and what conditions in the computer hardware will generate different pipeline lengths for different instructions. The second problem arises because shorter pipelength instructions take less time to execute than longer pipelength instructions, so that even if they start executing later than the longer pipelength instructions, they can still finish earlier. Processing instructions in a way that results in an out-of-order sequence requires a significant increase in the complexity and amount of hardware used in the computer in order to keep track of which instruction is at which stage of execution and what information each instruction needs at each stage. Such complexity is not justified in, or cannot be handled by, many systems.
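The out-of-order completion problem can be illustrated with a minimal sketch. It assumes that one instruction issues per cycle, that each instruction passes through only the stages it needs at one cycle per stage, and that there are no conflicts over shared hardware. The function names and the example stage counts are hypothetical.

```python
def completion_cycles(stage_counts):
    """Completion cycle of each instruction when instruction i issues at cycle i."""
    return [issue + stages for issue, stages in enumerate(stage_counts)]


def completion_order(stage_counts):
    """Instruction indices sorted by completion cycle (ties keep issue order)."""
    done = completion_cycles(stage_counts)
    return sorted(range(len(done)), key=lambda i: (done[i], i))


def finishes_out_of_order(stage_counts):
    """True if any instruction completes before an instruction issued ahead of it."""
    return completion_order(stage_counts) != list(range(len(stage_counts)))


if __name__ == "__main__":
    lengths = [5, 2, 5, 3]  # hypothetical per-instruction pipeline lengths
    print(completion_cycles(lengths))
    print(completion_order(lengths))
    print(finishes_out_of_order(lengths))
```

With uniform pipeline lengths the completion order always matches the issue order; it is only the mixture of short and long instructions that lets a later instruction overtake an earlier one.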