The present invention relates to a microprocessor and, more particularly, to an improvement in an instruction prefetch unit of a pipelined microprocessor.
An instruction prefetch function and a pipeline processing function are widely used to enhance the program execution efficiency of a microprocessor. However, when the microprocessor encounters a conditional branch instruction, its execution efficiency is often lowered remarkably. Specifically, the conditional branch instruction controls or changes the instruction stream in accordance with whether or not a branch condition designated by the conditional branch instruction coincides with the current processor execution state, which may be changed by the execution of an instruction preceding the conditional branch instruction. Therefore, whether the branch is taken is unknown until the processor execution state is settled. In other words, even if the conditional branch instruction has been decoded, its execution must wait until the processor execution state is settled. Before the execution of the conditional branch instruction, one or more instructions subsequent to that branch instruction are prefetched by the instruction prefetch operation. The instructions thus prefetched are executed as valid instructions when the branch is not taken. On the other hand, when the branch is taken, the prefetched instructions become invalid, and an instruction at the branch address (hereinafter called the "branch target instruction") has to be newly fetched. The pipelined operation is thereby brought to a halt.
To overcome this drawback, a construction has been proposed in which not only the instruction subsequent to the conditional branch instruction but also the branch target instruction is prefetched before it is determined whether the branch is taken, and one of them is selected for execution in accordance with whether the branch is taken. According to this construction, even if the branch is taken, the branch target instruction has already been prefetched, so that the program execution efficiency can be enhanced.
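The selection between the two prefetched streams can be pictured with a minimal software sketch. It is only an illustration of the proposed construction, not a description of any actual circuit; the class name `DualPrefetch`, its fields, and the method `resolve` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class DualPrefetch:
    """Hypothetical model: both candidate streams are prefetched in advance."""
    sequential: bytes  # bytes prefetched from the address following the branch
    target: bytes      # bytes prefetched from the branch address

    def resolve(self, branch_taken: bool) -> bytes:
        # Once the processor execution state is settled, one stream is
        # selected for execution and the other is discarded.
        return self.target if branch_taken else self.sequential


# Usage: both streams are already available when the branch is resolved.
buffers = DualPrefetch(sequential=b"\x01\x02\x03\x04", target=b"\xa1\xa2\xa3\xa4")
next_taken = buffers.resolve(branch_taken=True)
next_not_taken = buffers.resolve(branch_taken=False)
```

Because the target bytes are already held when the branch resolves, no new fetch is needed in the taken case, which is the efficiency gain described above.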
However, the following problem arises in a recent high performance microprocessor. Specifically, a microprocessor generally performs a data processing operation in units of words. The instruction prefetch operation is also performed in units of words. In a recent high performance microprocessor, the bit length of one word is expanded. For example, assuming that one word is constructed of 32 bits, the instruction data would be fetched in 4-byte units per prefetch operation. The fact that the instruction prefetch operation is performed in 4-byte units means that the contents of the two least significant bits of a memory access address are disregarded. That is, the prefetch operation for the branch target instruction is performed by disregarding the contents of the two least significant bits of the branch address. On the other hand, the byte length of respective instructions including the branch target instruction is not constant, but varies depending on the required data processing operation and/or the addressing mode of operand data. For this reason, the leading byte of an instruction does not always coincide with a word boundary. Thus, the leading byte of the branch target instruction is often different from the first byte of the four bytes actually fetched by the prefetch operation for the branch target instruction.
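The address arithmetic behind this mismatch can be shown with a short sketch, assuming the 32-bit (4-byte) word of the example above. The function names `prefetch_address` and `leading_byte_offset` and the sample address are hypothetical and serve only as illustration.

```python
WORD_BYTES = 4  # assumed: one 32-bit word per prefetch operation


def prefetch_address(branch_address: int) -> int:
    """Address actually used for the fetch: the two low bits are disregarded."""
    return branch_address & ~(WORD_BYTES - 1)


def leading_byte_offset(branch_address: int) -> int:
    """Position of the branch target's leading byte within the fetched word."""
    return branch_address & (WORD_BYTES - 1)


# Usage: a branch address that does not fall on a word boundary.
addr = 0x1002
fetched_from = prefetch_address(addr)   # word-aligned address of the fetch
offset = leading_byte_offset(addr)      # leading byte is not byte 0 of the word
```

For the sample address 0x1002, the fetch starts at 0x1000 and the leading byte of the branch target instruction is the third of the four fetched bytes (offset 2).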
An instruction generally includes an operation code (hereinafter called "OP-code") field and one or two operand fields. The OP-code field is decoded by an OP-code field decoder of an instruction decoder unit and then supplied to an execution unit. On the other hand, the operand field is decoded by an addressing field decoder of the decoder unit and then supplied to an operand access unit. Because the leading byte of the branch target instruction does not always coincide with the first byte of the four bytes actually fetched, it is necessary to detect whether the OP-code field and the operand field of the branch target instruction are supplied to the corresponding decoders of the instruction decoder unit. As a result, the start of decoding of the branch target instruction is delayed.
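The routing problem can be sketched as follows. This is a simplified illustration under loudly stated assumptions: the OP-code field is taken to be exactly one byte and to be immediately followed by operand bytes, neither of which is specified in the text, and the function name `route_to_decoders` is hypothetical.

```python
def route_to_decoders(fetched_word: bytes, offset: int) -> tuple[bytes, bytes]:
    """Split a fetched word into (OP-code field, operand bytes).

    Assumption for illustration only: a one-byte OP-code field followed
    directly by operand bytes. Bytes before `offset` belong to an earlier
    instruction and must not reach either decoder.
    """
    valid = fetched_word[offset:]   # bytes that belong to the branch target
    opcode_field = valid[:1]        # goes to the OP-code field decoder
    operand_bytes = valid[1:]       # goes to the addressing field decoder
    return opcode_field, operand_bytes


# Usage: the same fetched word, with and without word-boundary alignment.
word = bytes([0x11, 0x22, 0x33, 0x44])
aligned = route_to_decoders(word, 0)
misaligned = route_to_decoders(word, 2)
```

In the misaligned case only two of the four fetched bytes belong to the branch target instruction, and the byte destined for the OP-code field decoder is not the first fetched byte. The position must therefore be detected before decoding can begin.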