1. Field of the Invention
The present invention relates to various kinds of information processors having a delayed branch function, and more particularly to a variable-length VLIW (Very Long Instruction Word) information processor having a delayed branch function.
2. Description of the Related Art
As a processor's operating frequency increases, the number of pipeline stages increases, and the pipeline disruption caused by branch instructions grows accordingly. To reduce this branch penalty, branch prediction using a branch target buffer (BTB) has been employed, and a delayed branch scheme has also been adopted, in which the branch takes effect only after the instructions following the branch instruction (delay slot instructions) have been executed. When the number of steps SN of the delay slot instructions is variable, information on SN is included in the branch instruction code.
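The effect of a delayed branch on execution order can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the processor described here: the function name `delayed_branch_trace` and its parameters are hypothetical, the branch is assumed to be taken, and SN is passed in directly rather than decoded from a branch instruction code.

```python
from typing import List

def delayed_branch_trace(program: List[str], branch_idx: int,
                         target_idx: int, sn: int, count: int) -> List[str]:
    """Return the first `count` instructions executed from `branch_idx`.

    Simplified model (an assumption, not the patented design): the branch
    at `branch_idx` is taken, and the `sn` instructions that follow it
    (the delay slot instructions) execute before control moves to
    `target_idx`.
    """
    # The branch itself plus its SN delay slot instructions run in program order.
    trace = program[branch_idx:branch_idx + 1 + sn]
    # Only then does execution continue at the branch target.
    pc = target_idx
    while len(trace) < count and pc < len(program):
        trace.append(program[pc])
        pc += 1
    return trace[:count]


program = ["i0", "br", "d1", "d2", "i4", "i5", "t6", "t7"]
print(delayed_branch_trace(program, branch_idx=1, target_idx=6, sn=2, count=5))
# ['br', 'd1', 'd2', 't6', 't7']
```

Here `d1` and `d2` occupy the delay slot and execute even though the branch is taken, while `i4` and `i5` are skipped.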
Meanwhile, in VLIW processors, NOP (No Operation) instructions are inserted in advance as dummy instructions by a programmer or a compiler so that each VLIW logically forms a group of instructions that can be executed spatially in parallel. However, such programs have low versatility because the composition of a spatially-parallel executable VLIW depends on the characteristics of the particular VLIW processor, and the cache hit rate also decreases because the instruction cache holds a large number of NOP instructions.
To overcome these problems, the following variable-length VLIW processor has been developed. A packing flag indicating a boundary between VLIWs is included in each instruction code of a VLIW, so that VLIWs can vary in length. After the processor reads these variable-length VLIWs (compressed VLIWs) from the instruction cache, it detects their boundaries from the values of the packing flags. NOP instructions are then added to any VLIW shorter than the maximum length before the VLIW is provided to the instruction pipeline.
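The boundary detection and NOP padding just described can be sketched in Python as follows. This is an illustrative model only: the `InstrCode` record, the functions `split_vliws` and `pad_with_nops`, the mnemonics, and the maximum length of 4 are hypothetical, and the packing flag is assumed to be 1 on the last instruction code of each VLIW.

```python
from dataclasses import dataclass
from typing import List

NOP = "nop"

@dataclass(frozen=True)
class InstrCode:
    op: str   # instruction mnemonic (illustrative)
    pf: int   # packing flag; assumed here: 1 = last instruction of its VLIW

def split_vliws(stream: List[InstrCode]) -> List[List[str]]:
    """Detect VLIW boundaries in a compressed stream from the packing flags."""
    vliws: List[List[str]] = []
    current: List[str] = []
    for code in stream:
        current.append(code.op)
        if code.pf == 1:          # boundary: this code closes the current VLIW
            vliws.append(current)
            current = []
    if current:
        raise ValueError("stream ends in the middle of a VLIW")
    return vliws

def pad_with_nops(vliw: List[str], max_len: int) -> List[str]:
    """Fill a short VLIW with NOPs up to the maximum length before dispatch."""
    if len(vliw) > max_len:
        raise ValueError("VLIW longer than the maximum length")
    return vliw + [NOP] * (max_len - len(vliw))


stream = [InstrCode("add", 0), InstrCode("ld", 1), InstrCode("mul", 1),
          InstrCode("st", 0), InstrCode("sub", 0), InstrCode("br", 1)]
for vliw in split_vliws(stream):
    print(pad_with_nops(vliw, max_len=4))
# ['add', 'ld', 'nop', 'nop']
# ['mul', 'nop', 'nop', 'nop']
# ['st', 'sub', 'br', 'nop']
```

The NOPs exist only after expansion, so the instruction cache holds six instruction codes instead of twelve.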
In the case of variable-length VLIWs, even if the instruction cache and the BTB are accessed simultaneously with an instruction address and the BTB output indicates a branch instruction, the processor must still determine how far the series of instructions read from the instruction cache belongs to a single variable-length VLIW, decode that VLIW to identify which of its instructions is the branch instruction, and then read the information on the number of steps SN contained in the branch instruction code.
In such a variable-length VLIW processor, because each VLIW can be recognized as a single block by means of its packing flags, the number of steps SN can be made variable without incorporating information on SN into the branch instruction code. In this case, SN does not need to be read from the branch instruction code.
However, to determine the number of steps of the delay slot instructions, the processor must still determine how far the series of instructions read from the instruction cache belongs to one variable-length VLIW, and must read instructions from the instruction cache sequentially up to the instruction code whose packing flag PF (for example, PF=‘1’) designates the boundary of the delayed branch slot. If, instead, the delay slot instructions are read without determining the number of steps, on the assumption that it equals the maximum value, unnecessary pre-reading occurs, which delays the supply of instructions to the instruction pipeline.
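The trade-off between a sequential scan and a worst-case pre-read can be made concrete with a short Python sketch. It rests on an assumed model, not on the processor described here: the delay slot is taken to consist of the instruction codes following the branch up to and including the first one with PF=1, and the names `delay_slot_steps`, `wasted_prereads`, and `max_steps` are hypothetical.

```python
from typing import Sequence

def delay_slot_steps(pflags: Sequence[int], branch_idx: int) -> int:
    """Return SN by scanning packing flags after the branch instruction.

    Assumed model: the delay slot consists of the instruction codes that
    follow the branch up to and including the first one with PF == 1.
    The scan is inherently sequential: SN is known only once PF == 1 is seen.
    """
    for offset, pf in enumerate(pflags[branch_idx + 1:], start=1):
        if pf == 1:
            return offset
    raise ValueError("no delay-slot boundary (PF == 1) after the branch")

def wasted_prereads(sn: int, max_steps: int) -> int:
    """Instruction codes read needlessly if the fetcher assumes SN = max_steps."""
    if not 0 <= sn <= max_steps:
        raise ValueError("SN must lie between 0 and max_steps")
    return max_steps - sn


pflags = [1, 0, 0, 1, 0, 1]   # index 0 holds the branch instruction
sn = delay_slot_steps(pflags, branch_idx=0)
print(sn, wasted_prereads(sn, max_steps=7))   # 3 4
```

Either way the fetcher pays a cost: the scan cannot finish until the PF=1 code has been read, while the worst-case pre-read fetches `max_steps - SN` codes that are never needed.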
Therefore, regardless of whether information on the number of steps SN is included in the branch instruction code, a delay occurs in supplying instructions from the instruction cache to the instruction pipeline. Because processors other than VLIW processors also read instructions from the instruction cache and decode them in a similar manner, such problems may occur in various kinds of processors having a delayed branch function.