The present invention relates generally to the field of digital computer architecture and, in particular, to apparatus for processing instructions in high speed data processing systems.
One form of known digital computer, a pipelined system, concurrently processes a succession of instructions, each executed in part at each of a succession of stages. After an instruction has been processed at all of the stages, its execution is complete. With this processor configuration, as an instruction is passed from one stage to the next, it is replaced by the next instruction in the program. The stages together form a "pipeline" which at any time is simultaneously processing a succession of instructions flowing through the pipelined processor. A further description of the operation of pipelined processors can be had by reference to commonly assigned United States Patent Application entitled "Data Processing Apparatus and Method Employing Instruction Flow Prediction", Ser. No. 578,872, filed Feb. 27, 1984 (PCH-278), now U.S. Pat. No. 4,777,594, issued Oct. 11, 1988, the specification of which is incorporated herein.
When a digital computer encounters a branch instruction, an instruction indicating a possible change from the normally orderly sequential execution of instructions, it is wasteful of computer resources to wait for decoding of the instruction before proceeding with the next instruction to be fetched for execution, and yet branch instruction decoding would appear at first blush to be necessary in order to determine the branch outcome, that is, the target address of the next instruction. Consequently, pipelined systems commonly utilize branch prediction mechanisms to predict the outcome of branch instructions before the execution of the instruction, and to guide prefetching of instructions. If a prediction is successful the computer will function without a delay in processing time due to decoding of the branch instruction.
Accordingly, it is a known advantage to provide a mechanism to predict a change in program flow as a result of a branch instruction. It has been recognized, however, that there is a time penalty for an incorrect prediction of program flow. This time loss occurs, for example, when instructions issue along the incorrect path selected by the branch prediction mechanism, and/or conditionally issued instructions along the correct path are cancelled.
Several approaches to improving branch prediction are known in the art.
U.S. Pat. No. 3,559,183 to Sussenguth teaches the reduction of branch penalty through the use of registers to store branch history tables. The tables are accessed by instruction addresses which are cross-referenced with branch target addresses to indicate whether a taken branch was previously encountered at a specified instruction address and, if so, the target address of that branch on its previous execution. This target address is then used to redirect instruction prefetching because of the likelihood that the branch will repeat its past behavior.
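By way of illustration only, the general idea of a history table keyed by instruction address can be sketched in software as follows. This is a minimal sketch and not the register arrangement of the cited patent: the class name, the method names, and the policy of dropping an entry when a branch is observed not taken are all assumptions made for the example.

```python
class BranchHistoryTable:
    """Hypothetical software model of a branch history table.

    Maps the address of a previously taken branch to the target
    address observed on its most recent execution.
    """

    def __init__(self):
        self._targets = {}

    def record(self, instruction_address, taken, target_address=None):
        """Update the table after a branch has actually executed."""
        if taken:
            self._targets[instruction_address] = target_address
        else:
            # Assumed policy: forget a branch once it is seen not taken.
            self._targets.pop(instruction_address, None)

    def next_fetch_address(self, instruction_address, sequential_address):
        """Return the address from which to prefetch next.

        If a taken branch was previously recorded at this instruction
        address, prefetching is redirected to its last target; otherwise
        prefetching continues sequentially.
        """
        return self._targets.get(instruction_address, sequential_address)
```

In this sketch a table hit redirects prefetching on the expectation that the branch repeats its past behavior, while a miss leaves sequential prefetching undisturbed.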
U.S. Pat. No. 3,940,741 to Horikoshi teaches a memory for storing branch target addresses in corresponding relationship to the branch target instructions, the memory being referenced by the branch target address which is used for prediction purposes.
U.S. Pat. No. 4,200,927 to Hughes teaches a branch processing mechanism using a field of three branch status bits of an instruction queue register to store signals indicative of the type of instruction to be executed, e.g., no branch, branch on condition, or other conditional branch instruction.
U.S. Pat. No. 4,435,756 to Potash teaches the use of encodings in a fetched conditional branch instruction which predict the state of the branch condition to be tested, and a pre-fetch means which fetches the next instruction based on the predicted state.
U.S. Pat. No. 4,477,872 to Losq teaches a method of improving guess accuracy of the prediction of the branch instruction outcome, but not its target address, by utilizing at decode time a table which records the history of the outcome of the branch.
In yet another approach, the reduction of branch penalty is attempted through the use of branch cache memory structures in conjunction with prediction logic. These are utilized to permit expedited predictions of non-sequential program flow following a branch instruction, prior to determination that the instruction is capable of modifying program flow. A branch cache is a fast-access storage system, located within the central processor itself, which holds currently used branch information such as branch addresses. A prediction using such an approach does not require computation of the branch address before instruction prefetching can continue because target and branch addresses are locally stored in the branch cache. This information is used to make predictions based solely on previous instruction locations, thereby avoiding the wait for decoding of the current instruction before proceeding with prefetch of the next instruction.
An advantage of branch prediction using branch cache memory structures is the potential of substantially reducing delays associated with branching. There remain, however, possible delays due to incorrect prediction of branches. There are also possible delays associated with cache access time for branch targets.
To reduce cache access time, indexing of the branch cache structures has proven successful. Typically, a portion of the branch address of each entry stored in the branch cache structure is used as an index to that entry. For example, the least significant bits ("LSB"), the bits with the smallest numerical value at the right end of a word, can be used as the index and an entry can be stored at locations in the branch cache structure corresponding to that LSB.
A disadvantage of the indexing technique is that more than one entry can share the same LSB while having different higher order bits, which are known as the most significant bits ("MSB"). Absent means to narrow the selection to a single entry, a "collision" results. Consequently, to avoid collisions and improve prediction accuracy, a technique to validate or confirm the branch cache entry as a suitable prediction must be employed.
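The collision problem can be made concrete with a short sketch. The 4-bit index width and the helper names below are assumptions chosen only for illustration; a practical branch cache would use a wider index.

```python
INDEX_BITS = 4  # assumed index width, for illustration only


def lsb_index(branch_address, index_bits=INDEX_BITS):
    """Return the low-order bits of the address, used as the cache index."""
    return branch_address & ((1 << index_bits) - 1)


def msb_part(branch_address, index_bits=INDEX_BITS):
    """Return the remaining higher order bits of the address."""
    return branch_address >> index_bits


def collides(address_a, address_b, index_bits=INDEX_BITS):
    """True when two distinct addresses map to the same cache location."""
    return (address_a != address_b
            and lsb_index(address_a, index_bits) == lsb_index(address_b, index_bits))
```

With these assumptions, the addresses 0x1234 and 0x5674 both index location 4 although their higher order bits (0x123 and 0x567) differ, so the index alone cannot tell them apart.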
Exemplary of branch prediction using an indexing technique is U.S. Pat. No. 4,370,711 to Smith which teaches a branch predictor using random access memory ("RAM"). Recent branch history is used to predict the branch decision through the use of an index hashed from the instruction address, for example, the LSB, in combination with a count stored in the RAM at a corresponding hash address location. The count is incremented or decremented to indicate branch outcome. The count serves also to indirectly validate branch address selection by influencing the prediction according to the way a "majority" of the more recent decisions were made, by effectively "voting" among the branch instructions mapping to the same memory locations.
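A counter-based predictor of this general kind can be sketched as follows. The sketch is illustrative only and is not taken from the cited patent: the 2-bit saturating counter, the 4-bit index formed from the LSB, the initial count, and all names are assumptions.

```python
class CounterPredictor:
    """Hypothetical model of a RAM of saturating counts indexed by LSB."""

    def __init__(self, index_bits=4, counter_bits=2):
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        # Counts at or above the threshold predict "taken".
        self.threshold = (self.max_count + 1) // 2
        # Assumed initial state: just below the threshold.
        self.counts = [self.threshold - 1] * (1 << index_bits)

    def _index(self, instruction_address):
        # Simplest possible hash: keep only the low-order bits.
        return instruction_address & self.mask

    def predict(self, instruction_address):
        """Return True if the branch is predicted taken."""
        return self.counts[self._index(instruction_address)] >= self.threshold

    def update(self, instruction_address, taken):
        """Increment on a taken branch, decrement otherwise, saturating."""
        i = self._index(instruction_address)
        if taken:
            self.counts[i] = min(self.counts[i] + 1, self.max_count)
        else:
            self.counts[i] = max(self.counts[i] - 1, 0)
```

Because no tag is stored, every branch whose address hashes to a given location updates the same count, and the prediction follows the majority of the recent outcomes at that location.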
A more direct method of confirming the selection of the branch address to be predicted, and of avoiding collisions altogether, is to use the full MSB as a tag. This indexing technique encompasses the steps of locating a branch cache entry by matching the LSB of a fetched branch instruction address with a location address of the branch cache structure and then validating the match by comparing the MSB of the entry stored at the location with the MSB of the fetched instruction address. If the MSBs match, then the entire fetched branch instruction address corresponds to the branch cache entry and, generally speaking, a prediction using that entry can be made.
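The two steps just described, locating an entry by LSB and then validating it by MSB, can be sketched as follows. This is a minimal software model under stated assumptions: the class and method names are hypothetical, the 4-bit index is chosen only to keep the example small, and each location holds a single entry.

```python
class TaggedBranchCache:
    """Hypothetical branch cache indexed by LSB and validated by a full MSB tag."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        # Each location holds either None or a (tag, target_address) pair.
        self.entries = [None] * (1 << index_bits)

    def _split(self, address):
        """Split an address into (LSB index, MSB tag)."""
        return address & self.mask, address >> self.index_bits

    def store(self, branch_address, target_address):
        """Record a branch, replacing whatever shared its location."""
        index, tag = self._split(branch_address)
        self.entries[index] = (tag, target_address)

    def lookup(self, fetch_address):
        """Return the predicted target, or None if no validated entry exists."""
        index, tag = self._split(fetch_address)
        entry = self.entries[index]      # step 1: locate by LSB
        if entry is None:
            return None
        stored_tag, target_address = entry
        if stored_tag != tag:            # step 2: validate by MSB
            return None
        return target_address
```

In this sketch an address that shares the index of a stored branch but differs in its higher order bits produces no prediction rather than a wrong one, at the cost of storing the full tag with every entry.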
Unfortunately, the index and MSB tag technique just described uses the entire branch address which, with segment offsets, segment number and process identification code, can amount to a relatively large number of bits, for example, 46 bits. The memory size, pin-outs, logic devices and other hardware needed to implement this technique can be costly.
Accordingly, an object of the invention is to provide an improved branch prediction apparatus with a high rate of correct predictions, so as to reduce the time loss resulting from incorrect predictions.
A further object of the invention is to provide an improved branch cache entry qualifying technique to reduce branch cache accessing time and reduce the likelihood of collisions while requiring less hardware than in known arrangements.