1. Field of the Invention
This invention relates to computer systems and, more particularly, to methods and apparatus for increasing the speed of operation of computer processors capable of providing pipelined operation.
2. Background of the Invention
The development of digital computers progressed through a series of stages, beginning with processors that were able to process only a few basic instructions and had to be programmed at the machine language level, and advancing to processors capable of handling very complicated instructions, with programs written in high level languages. One of the reasons for this development is that high level languages are easier for programmers to use; consequently, more programs are developed more rapidly. Another reason is that more advanced machines executed operations more rapidly.
There came a point, however, where the constant increase in the ability of computers to run more complicated instructions actually began to slow the operation of the computer relative to what engineers in the computer field felt was possible with machines operating with only a small number of basic instructions. These engineers began to design advanced machines for running a limited number of instructions, a so-called reduced instruction set, and were able to demonstrate that these machines did, in fact, operate more rapidly for some types of operations. Thus began the reduced instruction set computer, which has become known by its acronym, RISC.
The central processing unit of the typical RISC computer is very simple. In general, it fetches an instruction every clock cycle. In its simplest embodiment, all instructions except for load and store instructions act upon internal registers of the central processing unit. A load instruction is used to fetch data from external memory and place it in an internal register, and a store instruction is used to take the contents of an internal register and place them in external memory.
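The load/store division described above can be illustrated with a minimal sketch. The class name, the opcode names, the register count, and the memory addresses are all hypothetical and chosen only for illustration; they do not describe any particular processor.

```python
from dataclasses import dataclass, field


@dataclass
class LoadStoreMachine:
    """Toy load/store machine: only LOAD and STORE touch external memory."""
    registers: list = field(default_factory=lambda: [0] * 8)  # assumed 8 registers
    memory: dict = field(default_factory=dict)  # stands in for external memory
    memory_accesses: int = 0                    # count of external data accesses

    def execute(self, op, a, b, c=None):
        if op == "LOAD":      # LOAD rd, addr: external memory -> register
            self.registers[a] = self.memory.get(b, 0)
            self.memory_accesses += 1
        elif op == "STORE":   # STORE rs, addr: register -> external memory
            self.memory[b] = self.registers[a]
            self.memory_accesses += 1
        elif op == "ADD":     # ADD rd, rs1, rs2: operates on registers only
            self.registers[a] = self.registers[b] + self.registers[c]
        else:
            raise ValueError(f"unknown opcode: {op}")


def run_example():
    machine = LoadStoreMachine(memory={100: 7, 104: 5})
    program = [
        ("LOAD", 1, 100),   # r1 <- mem[100]
        ("LOAD", 2, 104),   # r2 <- mem[104]
        ("ADD", 3, 1, 2),   # r3 <- r1 + r2 (no external memory access)
        ("STORE", 3, 108),  # mem[108] <- r3
    ]
    for instruction in program:
        machine.execute(*instruction)
    return machine


if __name__ == "__main__":
    m = run_example()
    print(m.memory[108], m.memory_accesses)  # prints "12 3"
```

In this sketch only three of the four instructions reach external memory; the add works entirely on internal registers.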
One of the techniques utilized in RISC and other computers for obtaining higher speeds of operation is called pipelining. Processors utilized in computer systems to provide pipelined operations normally cycle through fetch, decode, execute, and write back steps of operation in executing each instruction. In a typical pipelined system, the individual instructions are overlapped so that one instruction completes execution in each clock cycle of the system. However, when the pipelined RISC computer is performing a load or store operation, a longer time is required because the typical system has only a single bus that carries both instructions and data between the processor and off-chip memory. A load or a store operation requires that both the instruction and the data be moved on the bus. Consequently, at least two cycles are normally required for such an operation.
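The two cycle cost of a load or store on a single shared bus can be made concrete with a rough cycle count. This is a simplified model under stated assumptions, not a description of any actual processor: the pipeline is assumed to be already full, every instruction is assumed to need one bus cycle for its fetch, and each load or store is assumed to need one additional bus cycle for its data. The function name and opcode names are hypothetical.

```python
def cycles_shared_bus(program):
    """Count cycles for a full pipeline on one bus shared by code and data.

    Assumed model: one bus cycle per instruction fetch, plus one more bus
    cycle for the data transfer of each LOAD or STORE.
    """
    cycles = 0
    for op in program:
        cycles += 1                    # instruction fetch occupies the bus
        if op in ("LOAD", "STORE"):
            cycles += 1                # data transfer occupies the same bus
    return cycles


if __name__ == "__main__":
    program = ["LOAD", "ADD", "STORE", "SUB"]
    print(cycles_shared_bus(program))  # prints 6: two instructions cost two cycles each
```

Under these assumptions a program containing no loads or stores would take exactly one cycle per instruction.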
This two cycle minimum time has been shortened in the prior art by employing the "Harvard" architecture, in which separate buses and memories are used for instructions and data. Using this architecture, the processor can continue executing instructions while a load or a store instruction is being performed, since there is a separate path for fetching instructions. However, with single chip integrated circuit implementations of RISC processors, a Harvard architecture requires either that the existing address and data pins to memory operate at twice the rate at which instructions are needed or that twice the number of pins be provided. In either case, twice the off-chip bandwidth is required.
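The trade-off described above can be sketched numerically. The following is an illustrative model only: the pin count of 32 is a hypothetical figure, bandwidth is expressed simply as pins multiplied by transfers per instruction cycle, and the function names are assumptions made for this example.

```python
def cycles_harvard(program):
    """With separate instruction and data paths (assumed model), a data
    transfer never delays an instruction fetch, so each instruction costs
    one cycle regardless of whether it is a load or store."""
    return len(program)


def off_chip_bandwidth(pins, transfers_per_cycle):
    """Bits moved off-chip per instruction cycle in this simple model."""
    return pins * transfers_per_cycle


if __name__ == "__main__":
    program = ["LOAD", "ADD", "STORE", "SUB"]
    print(cycles_harvard(program))            # prints 4

    baseline = off_chip_bandwidth(32, 1)      # single shared bus, hypothetical 32 pins
    double_rate = off_chip_bandwidth(32, 2)   # same pins run at twice the rate
    double_pins = off_chip_bandwidth(64, 1)   # twice the pins at the original rate
    print(double_rate // baseline, double_pins // baseline)  # prints "2 2"
```

Both options yield the same result in this model: the off-chip bandwidth is doubled relative to the single bus baseline.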
To avoid this significant change in off-chip bandwidth, some RISC chips provide an on-chip instruction cache so that an internal source of instructions is available and both data and instructions need not be on the bus at the same time. However, such an instruction cache usually requires a good deal of chip space and may not be feasible without a radical overhaul of the architecture of a chip.
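The effect of an on-chip instruction cache on bus contention can be illustrated with a rough sketch of a loop executed repeatedly. The model is deliberately simplified and rests on several assumptions: the cache is large enough to hold the whole loop body, a fetch that hits in the cache does not use the external bus, and an extra cycle is charged only when a load or store coincides with a fetch that misses. The function name and opcode names are hypothetical.

```python
def cycles_with_icache(loop_body, iterations):
    """Count cycles for a loop when fetches may be served from an on-chip cache.

    Assumed model: one cycle per instruction; one extra cycle when a LOAD or
    STORE needs the bus for data while its fetch also missed in the cache.
    """
    cached = set()   # addresses of instructions already held on-chip
    cycles = 0
    for _ in range(iterations):
        for address, op in enumerate(loop_body):
            cycles += 1
            miss = address not in cached
            cached.add(address)
            if miss and op in ("LOAD", "STORE"):
                cycles += 1   # fetch and data both needed the external bus
    return cycles


if __name__ == "__main__":
    body = ["LOAD", "ADD", "STORE", "SUB"]
    print(cycles_with_icache(body, 10))  # prints 42: 6 cycles once, then 4 per pass
```

In this model the contention penalty is paid only on the first pass through the loop; later passes fetch from the cache and leave the bus free for data.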
It is, therefore, very desirable that some means for eliminating the time cost of load and store instructions in pipelined systems, especially RISC systems, be provided. Moreover, this same problem may arise in any situation in which a pipelined processor of any sort must deal with an instruction that causes a stall due to bus contention. For example, an instruction that adds the contents of a memory location to a register might cause such a stall. Input/output controllers utilizing processors are often subject to such problems.