1. Field of the Invention
The present invention relates to a data processor capable of efficiently supplying instructions.
2. Description of the Prior Art
Recent processors are capable of simultaneously executing several instructions. These processors need a device to supply a sufficient number of instructions thereto so that they can execute them without interruption. Such an instruction supplying device usually has an instruction fetch mechanism and a branch prediction mechanism. The instruction fetch mechanism fetches instructions block by block and keeps them in an instruction cache.
FIG. 1 shows an example of the instruction fetch mechanism according to the prior art. A CPU 101 reads instructions block by block out of a main memory 102 and stores them in an instruction cache 103. Each block of instructions consists of four words in the example, and blocks are numbered n, n+1, n+2, and the like. Transfer of instructions from the main memory 102 to the instruction cache 103 is carried out by, for example, 4-word burst transfer.
The cached instructions are supplied to pipelines of the CPU 101 block by block. Even a branch target instruction is transferred in a block from the main memory 102.
FIG. 2 shows examples of blocks of words representing instructions fetched from the main memory 102 to the instruction cache 103. A block supplied at time t contains a branch target instruction 4n+3. This instruction is not at the head of the block, and instructions 4n+0 to 4n+2 in the same block are not executed. A branch of this kind degrades instruction fetching efficiency because it reduces the number of instructions supplied to the CPU 101.
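The loss described above can be put into numbers. The following is a minimal sketch, assuming 4-word blocks as in the example and assuming that no further branch is taken inside the block. The function names are hypothetical and are used only for illustration.

```python
BLOCK_WORDS = 4  # block size taken from the example in the text


def useful_words(target_word, block_words=BLOCK_WORDS):
    """Number of words in the fetched block that are actually executed.

    Execution starts at the branch target and runs to the end of the block;
    the words ahead of the target in the same block are discarded.
    Assumption: no other taken branch lies inside the block.
    """
    offset = target_word % block_words  # position of the target in its block
    return block_words - offset


def fetch_efficiency(target_word, block_words=BLOCK_WORDS):
    """Fraction of the fetched block that is supplied to the CPU as useful work."""
    return useful_words(target_word, block_words) / block_words


n = 5                       # arbitrary block number chosen for the illustration
target = 4 * n + 3          # branch target 4n+3, as in FIG. 2
print(useful_words(target), fetch_efficiency(target))
```

With the target at 4n+3, only one of the four fetched words is executed, whereas a target at 4n+0 would use the whole block.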
To solve this problem, one idea is to fetch a block of instructions from the main memory 102 with the branch target instruction at the head of the block. This, however, is very difficult to achieve in practice because of the structure of the instruction cache.
A practical solution to the problem is to use branch prediction hardware to pick up a branch target instruction in advance. Branch prediction is carried out by holding historical data on previously executed branch instructions. The historical data is referred to for predicting whether or not a branch occurs, and according to the prediction, the next instruction is read in advance. Branch prediction is usually implemented with a branch target buffer.
The branch target buffer employs a branch instruction table, a target table, a branch prediction flag, and a comparator. The branch instruction table stores the upper addresses of branch instructions and is accessed with the lower bits of the address of a given instruction held in a program counter. The target table stores branch target addresses corresponding to the branch instructions stored in the branch instruction table. The comparator determines whether or not a given instruction will branch.
More precisely, the program counter stores the address of an instruction to be executed. If the address agrees with one of the entries in the branch instruction table, the instruction in question is a branch instruction. Then, a target address related to the branch instruction is read out of the target table and is set as a next value in the program counter. If the address agrees with none of the entries in the branch instruction table, the instruction in question is not a branch instruction, and the program counter is normally updated.
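The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the table is indexed with the lower bits of the program counter, each entry keeps the remaining address bits for the comparison, the program counter is word-addressed and normally advances by one, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Illustrative model of a branch target buffer (not an actual design)."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        size = 1 << index_bits
        self.branch_table = [None] * size    # branch instruction table
        self.target_table = [0] * size       # target table
        self.predict_taken = [False] * size  # branch prediction flag

    def _split(self, pc):
        """Split an address into (lower index bits, remaining bits)."""
        index = pc & ((1 << self.index_bits) - 1)
        tag = pc >> self.index_bits
        return index, tag

    def record(self, branch_pc, target, taken=True):
        """Register a branch that has been executed, with its target."""
        index, tag = self._split(branch_pc)
        self.branch_table[index] = tag
        self.target_table[index] = target
        self.predict_taken[index] = taken

    def next_pc(self, pc, step=1):
        """Return the next program counter value for the instruction at pc."""
        index, tag = self._split(pc)
        hit = self.branch_table[index] == tag  # role of the comparator
        if hit and self.predict_taken[index]:
            return self.target_table[index]    # predicted branch target
        return pc + step                       # normal update


btb = BranchTargetBuffer()
btb.record(branch_pc=0x13, target=0x40)
print(hex(btb.next_pc(0x13)), hex(btb.next_pc(0x14)))
```

An address that shares the same lower bits but differs in the remaining bits (0x23 in this model) does not match the stored entry, so the program counter is updated normally.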
FIG. 3 shows an extended branch target buffer according to the prior art. This technique stores not only branch target addresses but also the branch target instructions themselves in the branch target buffer 104. Namely, the buffer 104 stores a copy of an instruction cache 103, to supply instructions to CPU pipelines without interruption.
FIG. 4 shows blocks of instructions handled by the instruction cache 103 and branch target buffer 104. If a branch target of a given branch instruction is 4n+3, a block starting from 4n+3 followed by 4n+4 to 4n+6 is stored in the buffer 104. At time t-1, the block of 4n+3 to 4n+6 is read out of the buffer 104 and is supplied to CPU pipelines. At time t, the instructions up to 4n+6 have been executed, and therefore, 4n+7, etc., are read out of the instruction cache 103.
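The FIG. 4 example can be traced with a short sketch. It models only what is stated above: the buffer entry holds a full block starting at the branch target, and fetching resumes from the instruction cache at the word that follows that block. The function name is hypothetical, and how the cache aligns the blocks it supplies afterwards is not modeled.

```python
BLOCK_WORDS = 4  # block size taken from the example in the text


def buffered_block(target_word, block_words=BLOCK_WORDS):
    """Return (words held in the buffer entry, first word fetched afterwards).

    The entry starts at the branch target itself, so every word in it is
    useful; the instruction cache takes over at the word after the entry.
    """
    words = list(range(target_word, target_word + block_words))
    resume_word = target_word + block_words
    return words, resume_word


n = 2                               # arbitrary block number for the illustration
words, resume = buffered_block(4 * n + 3)
print(words, resume)                # 4n+3 to 4n+6 from the buffer, then 4n+7
```

Compared with fetching the target through the instruction cache alone, where only one word of the block containing 4n+3 is executed, the buffer entry supplies four executable words in the cycle that follows the branch.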
A disadvantage of this technique is that the branch target buffer 104 must have a large capacity to improve instruction supply performance. This increases the amount of hardware required.
That is, to improve the efficiency of supplying instructions, the branch target buffer 104 must store not only branch target addresses but also branch target instructions, or even several sequences of branch target instructions. This increases the capacity of the buffer 104, which degrades prediction performance and increases hardware costs.
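A rough estimate shows how quickly the capacity grows. The sketch below assumes 32-bit addresses and 32-bit instruction words; these widths, the function name, and the omission of the bits used for the branch instruction table and the prediction flag are illustrative assumptions and do not come from the prior art described above.

```python
def entry_bits(addr_bits=32, word_bits=32, block_words=4, sequences=0):
    """Payload bits per buffer entry.

    sequences is the number of instruction blocks stored with the target
    address: 0 for a buffer that holds addresses only, 1 for one block of
    branch target instructions, and more for several sequences.
    Assumption: 32-bit addresses and words, chosen only for illustration.
    """
    return addr_bits + sequences * block_words * word_bits


address_only = entry_bits(sequences=0)
one_block = entry_bits(sequences=1)
two_blocks = entry_bits(sequences=2)
print(address_only, one_block, two_blocks)
print(one_block / address_only)  # growth factor for one stored block
```

Under these assumptions, storing a single 4-word block with each target address makes every entry five times as large as an address-only entry, and each additional sequence adds another 128 bits.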