A principal design objective in computer technology has been to increase the data processing rate and to maximize data throughput. In early computer designs, the instructions, which were stored in main memory, were read from memory one at a time, and each instruction was executed before the next one was read. In typical applications, however, the memory operates at a much slower rate than the central processor that executes the instructions. Instruction processing is therefore slowed substantially if the processor must wait for each instruction to be fetched.
In response to this retrieval delay, instruction caches have been developed, in which a block of instructions is read from main memory and stored in a small, high-speed memory so that the instructions can be supplied quickly to the central processor. Such an instruction cache is shown in Peter Kogge, "The Architecture of Pipelined Computers," McGraw-Hill, 1981.
Although an instruction cache does increase the processing rate, its effectiveness depends on the execution of sequential instructions. Such systems can incur substantial delay when the computer branches away from the sequential instruction stream. When a branch is encountered, the branch target address must be calculated. Moreover, the change in the instruction path typically invalidates all of the instructions stored in the cache. When this occurs, an entire block of instructions must be fetched to refill the cache before execution can resume.
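To make the cost described above concrete, the following is a minimal toy model, not the apparatus of the invention. It assumes a buffer that holds one aligned block of consecutive instructions, a fixed refill cost for reading a block from main memory, and, following the paragraph above, that every taken branch invalidates the buffered block. The function name `fetch_cycles` and the parameters `block_size`, `hit_cycles`, and `refill_cycles` are hypothetical and chosen only for illustration.

```python
def fetch_cycles(trace, block_size=8, hit_cycles=1, refill_cycles=10):
    """Count fetch cycles for a trace of (address, is_taken_branch) pairs.

    Hypothetical toy model: the buffer holds one aligned block of
    `block_size` instructions; a block refill costs `refill_cycles`;
    a taken branch invalidates the buffer, as described in the text.
    """
    buffered_block = None  # index of the block currently held, or None if empty
    cycles = 0
    for addr, taken_branch in trace:
        block = addr // block_size
        if block != buffered_block:
            # Instruction not buffered: the whole block is re-read from slow main memory.
            cycles += refill_cycles
            buffered_block = block
        cycles += hit_cycles  # supply the instruction from the fast buffer
        if taken_branch:
            # Assumed behavior from the text: a branch invalidates the buffered instructions.
            buffered_block = None
    return cycles


if __name__ == "__main__":
    straight_line = [(a, False) for a in range(12)]
    small_loop = [(0, False), (1, False), (2, False), (3, True)] * 3
    print("12 sequential instructions:", fetch_cycles(straight_line))
    print("12 instructions in a 4-instruction loop:", fetch_cycles(small_loop))
```

Both traces execute twelve instructions, yet under these assumed costs the loop takes 42 cycles against 32 for straight-line code, because each taken branch forces a block refill even when the target lies in the block that was just discarded.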
The problem of branching in instruction caches has been discussed in the literature, for example in the book by Kogge referenced above.
Despite the work done on instruction caches and the branching problem, it remains substantially difficult to avoid losing processing time when a branch is encountered. The present invention provides a method and apparatus for handling branches so that most branches, both conditional and unconditional, execute in a single clock cycle.