This invention relates to pipelined computer central processors and their support logic structure. More particularly, this invention relates to a private cache that is associated with each processor and that incorporates a specially configured branch cache for increasing the average efficiency and speed of handling transfer instructions in the pipeline which may be subject to a transfer go condition.
As faster operation of computers has been sought, numerous hardware/firmware features have been employed to achieve that purpose. One widely incorporated feature directed to increasing the speed of operation is pipelining, in which the various stages of execution of a series of consecutive machine-level instructions are undertaken simultaneously. Thus, in a simple example, during a given time increment, a first stage of a fourth (in order of execution) instruction may be carried out while a second stage of a third instruction, a third stage of a second instruction and a fourth stage of a first instruction are all performed simultaneously.
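The overlap described in the simple example above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes one time increment and no stalls occur; the function name `pipeline_schedule` and its parameters are hypothetical and are not part of the invention.

```python
def pipeline_schedule(num_instructions, num_stages):
    """Return a list with one dict per time increment.

    Each dict maps a stage number (1-based) to the instruction number
    (1-based, in order of execution) occupying that stage.
    Assumption: ideal pipeline, one increment per stage, no stalls.
    """
    schedule = []
    total_increments = num_instructions + num_stages - 1
    for t in range(total_increments):
        active = {}
        for stage in range(1, num_stages + 1):
            # Instruction i enters stage s at time increment (i - 1) + (s - 1).
            instr = t - (stage - 1) + 1
            if 1 <= instr <= num_instructions:
                active[stage] = instr
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for t, active in enumerate(pipeline_schedule(4, 4), start=1):
        print(f"increment {t}: {active}")
```

In the fourth time increment of this sketch, stage 1 holds the fourth instruction, stage 2 the third, stage 3 the second and stage 4 the first, which is the situation described in the example.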
Pipelining dramatically increases the apparent speed of operation of a computer system. However, it is well known that the processing of a transfer (sometimes called a branch) instruction, when it is necessary to find a target (i.e., when the conditions calling for a transfer are met), temporarily slows down processing while the target instruction is found in the cache. Even when an instruction cache is provided, the target must be found and processed before it can be sent to the pipeline. The present invention is directed to significantly speeding up the average rate of servicing transfer operations.
The environment of the invention is within a data processing system having a pipelined processor and a cache which includes an instruction cache, instruction buffers for receiving instruction sub-blocks from the instruction cache and providing instructions to the pipelined processor, and a branch cache. The branch cache includes an instruction buffer adjunct for storing an information set for each of the sub-blocks which are currently resident in the instruction buffers. The information set includes a search address, a predicted transfer hit/miss, a projected location of a target in a sub-block and a predicted target address, and may include additional information. A branch cache directory stores instruction buffer addresses corresponding to current entries in the instruction buffer adjunct, and a target address RAM stores target addresses developed from prior searches of the branch cache. A delay pipe is used to selectively step an information set read from the instruction buffer adjunct in synchronism with a transfer instruction traversing the pipeline. A comparison, at a predetermined phase along the delay pipe, determines if the information set identifies, as currently resident in the instruction buffers, a target address that matches the target address in the transfer instruction traversing the pipeline. If there is a finding that the information set traversing the delay pipe identifies a target address in the instruction buffers that matches the target address in the transfer instruction traversing the pipeline, and there is an indication of TRA-GO from the pipeline, the instruction identified by the target address is sent to the pipeline from the instruction buffers rather than from the instruction cache, which is a faster operation. If there is no such finding, the instruction is sent to the pipeline from the instruction cache. Preferably, the sub-blocks stored in the instruction buffers are four instruction words in length.
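The selection between the instruction buffers and the instruction cache described above can be sketched in software as follows. This is only an illustrative model, not the hardware logic of the invention: the names `InfoSet` and `select_source`, the word-addressed integer addresses, and the test of residency by sub-block base address are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

SUB_BLOCK_WORDS = 4  # preferred sub-block length stated in the text


@dataclass
class InfoSet:
    """Hypothetical model of one information set in the instruction buffer adjunct."""
    search_address: int      # address used to look up this entry
    predicted_hit: bool      # predicted transfer hit/miss
    target_offset: int       # projected location of the target in a sub-block
    predicted_target: int    # predicted target address (word address, assumed)


def select_source(info: Optional[InfoSet],
                  actual_target: int,
                  tra_go: bool,
                  resident_sub_blocks: Set[int]) -> str:
    """Return which unit supplies the target instruction to the pipeline.

    Assumption: a sub-block is identified by the word address of its first
    word, and resident_sub_blocks holds the bases currently in the buffers.
    """
    if info is None:
        return "instruction_cache"
    base = info.predicted_target - (info.predicted_target % SUB_BLOCK_WORDS)
    resident = base in resident_sub_blocks
    # Comparison made at the predetermined phase of the delay pipe.
    match = (info.predicted_hit and resident
             and info.predicted_target == actual_target)
    if match and tra_go:
        return "instruction_buffers"   # the faster path
    return "instruction_cache"         # fallback path


if __name__ == "__main__":
    info = InfoSet(search_address=0x100, predicted_hit=True,
                   target_offset=2, predicted_target=0x206)
    print(select_source(info, 0x206, True, {0x204}))
```

The sketch sends the target from the instruction buffers only when both conditions named in the text hold, namely a matching resident target address and a TRA-GO indication; in every other case it falls back to the instruction cache.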