One form of architecture used in high-speed computers is pipelining. A pipelined central processor is organized as a series of stages where each stage performs a dedicated function, or task, much like a job station on a factory assembly line. While pipelining does not decrease the time required to execute an instruction, it does allow multiple instructions in various phases of execution to be processed simultaneously. For an n-stage pipeline, where one instruction enters the pipeline every cycle and a different instruction exits from the pipeline on every cycle, instructions appear to be executed at a rate of one instruction per cycle. The actual execution time of each instruction is n times the cycle time, as long as there are no dependencies between instructions in the program being executed.
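The timing relationship described above can be illustrated with a minimal sketch. It assumes an ideal pipeline with no dependencies or stalls; the function names are hypothetical and are not taken from any particular processor design.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to run independent instructions through an ideal pipeline.

    Assumption: one instruction enters per cycle and nothing ever stalls.
    The first instruction needs num_stages cycles to reach the end of the
    pipeline; each later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction must finish before the next starts."""
    return num_instructions * num_stages
```

Under these assumptions, 100 independent instructions in a 5-stage pipeline take 104 cycles rather than 500, although each individual instruction still spends 5 cycles in the pipeline.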
Quite frequently, however, execution of one instruction requires data generated by another instruction. Execution of the instruction requiring the data must be held up until the instruction generating the data is completed. Typically, dependencies arise in (1) instruction sequencing, for example, a conditional branch; (2) operand address formation, for example, loading a register used in forming an address; and (3) execute data, for example, calculating data that is used in a later operation. The delay associated with each of these dependencies depends on the type of dependency, the length of the pipeline, and the spacing between the two instructions in the program. The delay increases with longer pipelines and with closer spacing between the two dependent instructions. Several techniques have been developed to minimize the delay, including branch predictors and bypass paths which short-circuit the normal pipeline data flow.
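The effect of pipeline length, instruction spacing and bypass paths on the delay can be sketched with a simplified model. The model is an assumption made for illustration only: stages are numbered from 1, the producing instruction makes its result available at the end of stage produce_stage, and the consuming instruction needs the result on entering stage consume_stage. The function and parameter names are hypothetical.

```python
def stall_cycles(produce_stage: int, consume_stage: int, spacing: int) -> int:
    """Stall cycles seen by a dependent instruction in a simplified pipeline model.

    produce_stage: stage at whose end the producer's result becomes available
                   (the last stage without a bypass path, an earlier stage with one).
    consume_stage: stage at whose start the consumer needs the result.
    spacing:       distance between the two instructions in the program
                   (1 means the consumer immediately follows the producer).
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    # With the producer entering stage 1 in cycle 0, its result is ready at the
    # start of cycle produce_stage. Without stalls, the consumer would enter
    # consume_stage in cycle spacing + consume_stage - 1.
    return max(0, produce_stage - (spacing + consume_stage - 1))
```

In this model, a consumer that immediately follows its producer and needs the operand in stage 3 stalls for 2 cycles when the result is written in stage 5, for 5 cycles when the pipeline is lengthened so that the result is written in stage 8, and not at all when a bypass path makes the result available at the end of stage 3.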
Many current computer architectures contain instructions which require several processor cycles to execute. This type of architecture is commonly known as a "complex instruction set computer" (CISC). The term CISC arose as the popularity of another type of architecture, "reduced instruction set computers" (RISCs), grew. The predominance of single cycle instructions and the lack of complex pipeline interlock detection hardware are characteristic of the RISC machines.
Program execution in a data processor usually includes fetching programmed instructions from memory and then performing the operation indicated by that instruction. An instruction decoder portion of the processor hardware is dedicated to transforming the ones and zeros in an instruction into signals used to control operations performed by other parts of the processor.
To execute single-cycle instructions, the instruction decoder can be a simple lookup table addressed by the instruction. The table width is determined by the required number of control signals. The table can be implemented using logic gates or memory such as RAM or ROM.
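A lookup-table decoder of this kind can be sketched as follows. The 8-bit instruction format, the 2-bit opcode field, and the four control signals are hypothetical choices made for illustration; a real decoder would have one table column per control signal required by the processor.

```python
from typing import Dict, NamedTuple


class ControlSignals(NamedTuple):
    """One row of the decode table; its width is the number of control signals."""
    alu_op: str
    reg_write: bool
    mem_read: bool
    mem_write: bool


# Hypothetical decode table addressed by a 2-bit opcode.
DECODE_TABLE: Dict[int, ControlSignals] = {
    0b00: ControlSignals("add", True, False, False),    # register add
    0b01: ControlSignals("sub", True, False, False),    # register subtract
    0b10: ControlSignals("pass", True, True, False),    # load from memory
    0b11: ControlSignals("pass", False, False, True),   # store to memory
}


def decode(instruction: int) -> ControlSignals:
    """Extract the opcode (assumed to be the top 2 bits of 8) and look it up."""
    opcode = (instruction >> 6) & 0b11
    return DECODE_TABLE[opcode]
```

Because every instruction maps to exactly one table row, decoding takes a single lookup, which is why this scheme suits single-cycle instructions.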
The execution of multicycle instructions requires a control sequencer which can be implemented using either logic gates or memory. Computers with only a few multicycle instructions may utilize the logic approach. As the number of multicycle instructions increases, the memory-based design is usually chosen, since the logic approach becomes very cumbersome. The memory-based design is usually called microcode. Most current computer designs, with the exception of RISC machines, rely heavily upon microcode and microprograms to provide the desired processor functionality. The incorporation of microcode into a machine design requires a memory in which to store the microcode, usually called the control store, and a sequencer to provide flow control for the microcode.
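The relationship between the control store and the sequencer can be illustrated with a minimal sketch. The microword layout, the opcode names and the microprogram contents are all hypothetical; they are not drawn from any actual machine.

```python
from typing import Dict, List, NamedTuple


class MicroWord(NamedTuple):
    """One control-store entry: signals to assert plus flow-control fields."""
    signals: str     # stands in for the control signals asserted this cycle
    next_addr: int   # control-store address of the following microword
    last: bool       # True when this microword ends the instruction


# Hypothetical control store holding two microprograms.
CONTROL_STORE: List[MicroWord] = [
    MicroWord("read operand from memory", 1, False),  # address 0
    MicroWord("add one in ALU", 2, False),            # address 1
    MicroWord("write result to memory", 0, True),     # address 2
    MicroWord("add registers", 0, True),              # address 3
]

# Hypothetical dispatch table: opcode -> entry address in the control store.
ENTRY_POINT: Dict[str, int] = {"INC_MEM": 0, "ADD": 3}


def run_microprogram(opcode: str) -> List[str]:
    """Sequence through the control store, one microword per cycle."""
    addr = ENTRY_POINT[opcode]
    issued = []
    while True:
        word = CONTROL_STORE[addr]
        issued.append(word.signals)
        if word.last:
            return issued
        addr = word.next_addr
```

In this sketch the hypothetical INC_MEM instruction occupies the sequencer for three cycles, while ADD completes in one; adding another multicycle instruction only requires new control-store entries and a dispatch-table entry, not new logic.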
Pipelined computers typically include at least an instruction fetch and decode stage and an execute stage. Microcode has been incorporated into the execute stage for handling of multicycle instructions. When a multicycle instruction is decoded, the microcode in the execute stage sequences through the steps of the complex instruction, and the remainder of the pipeline is placed on hold. Disadvantages of the microcode approach include the additional hardware required for implementing the microcode and somewhat slower operation due to memory latency involved in execution of the microcode.
Other approaches to the execution of multicycle instructions have included the storage of millicode instructions in memory, as described by D. S. Coutant et al. in Hewlett Packard Journal, Jan. 1986, pp. 4-19. Still another approach, described by A. Bandyopadhyay et al. in "Combining Both Micro-code And Hardwired Control in RISC," Computer Architecture News, Sep. 1987, pp. 11-15, involves the incorporation of a bit into each instruction that indicates whether the instruction is single cycle or multicycle. When the bit indicates a multicycle instruction, microcode is called. The disadvantage of both of these approaches is a delay in operation of one or two machine cycles, since the instruction is not identified as a multicycle instruction until it has been decoded. Thus, the subsequent cycle must wait until the preceding instruction is decoded.
The use of a branch cache prediction mechanism for providing more efficient pipelined processor operation is described in U.S. Pat. No. 4,777,594 issued Oct. 11, 1988 and assigned to the assignee of the present application. Without a prediction mechanism, a pipelined processor may function slowly on a branch instruction. A branch instruction requires branching to an instruction out of the normal sequence. However, earlier pipeline stages have begun processing the next instruction in the normal sequence. When this occurs, flushing of the pipeline and restarting is required, and processing is delayed. The branch cache memory is associated with instruction fetching and predicts the next address after a branch instruction based on past operation. Thus, when a branch instruction is encountered in the program, the branch cache predicts the target address of the branch instruction, and the next instruction address in the normal sequence is replaced with the branch target address. As a result, pipeline operation proceeds without interruption.
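The general idea of predicting the next fetch address from past operation can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of the referenced patent: the cache is modeled as an unbounded table keyed by instruction address, instructions are assumed to have a fixed size, and the class and method names are hypothetical.

```python
from typing import Dict


class BranchCache:
    """Remembers, per instruction address, the last non-sequential next address."""

    def __init__(self, instruction_size: int = 4) -> None:
        self.instruction_size = instruction_size   # assumed fixed instruction size
        self.targets: Dict[int, int] = {}          # branch address -> last target

    def predict_next(self, pc: int) -> int:
        """Return the predicted next fetch address for the instruction at pc."""
        # On a hit, the sequential address is replaced with the cached target.
        return self.targets.get(pc, pc + self.instruction_size)

    def update(self, pc: int, actual_next: int) -> None:
        """Record the outcome once the instruction at pc has actually executed."""
        if actual_next != pc + self.instruction_size:
            self.targets[pc] = actual_next         # branch taken: remember target
        else:
            self.targets.pop(pc, None)             # fell through: forget entry
```

The first time a taken branch at a given address is encountered the prediction is the sequential address and the pipeline must be flushed; on later encounters the cached target is fetched directly, so the pipeline continues without interruption as long as the branch behaves as it did before.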
It is a general object of the present invention to provide improved digital processing apparatus.
It is another object of the present invention to provide a pipelined central processor capable of high speed instruction processing.
It is a further object of the present invention to provide a high performance, pipelined central processor capable of executing both single-cycle instructions and multicycle instructions.
It is yet another object of the present invention to provide a high performance, pipelined central processor that is simple in construction and low in cost.
It is still another object of the present invention to provide a high performance central processor that utilizes a cache memory for predicting multicycle instructions and for calling an instruction interpreter located in an instruction cache memory.