The present invention relates to an information processing apparatus and, more particularly, to an improvement in a microcomputer as an information processing apparatus which must fetch an instruction at a branch address in order to change an instruction stream.
As is well known in the art, while a microcomputer executes the instructions of a current instruction stream in order, it is often required to change the current instruction stream in response to a conditional or unconditional branch instruction (collectively called a "branch instruction") or an interrupt request. In such a case, the microcomputer has to perform a bus cycle to fetch an instruction at a branch address of a main memory, from which a new instruction stream starts. An instruction at a branch address will hereinafter be called a "branch target instruction".
In general, it takes several clock cycles for a microcomputer to perform a bus cycle for fetching each instruction from the main memory. Since the instructions in one instruction stream are stored at consecutive memory addresses, however, recent microcomputers are equipped with an instruction prefetch unit, which prefetches instructions from the main memory during periods when the execution unit in the microcomputer is executing an instruction that does not use the system bus coupled to the main memory. Accordingly, as long as instructions in the same instruction stream are executed, the bus cycles for fetching those instructions do not substantially lower the data processing speed and/or efficiency of the microcomputer.
However, when the microcomputer encounters a branch instruction or receives an interrupt request, it is necessary to perform a bus cycle for fetching a branch target instruction from a branch address which is different from the prefetch address. At this time, the instructions which have already been fetched in the microcomputer are made invalid. The execution unit has to wait for the branch target instruction until the bus cycle for fetching the same is completed. That is, the data processing operation is suspended temporarily each time a branch request responsive to the branch instruction or the interrupt request occurs. Data processing speed and/or efficiency are thereby lowered.
Some microcomputers include a branch prediction unit. This unit operates to detect the prefetching of the branch instruction, obtain a branch address, and then cause the prefetch unit to prefetch an instruction at that branch address (i.e., a branch target instruction) before the execution unit executes the branch instruction. When execution of the branch instruction results in a determination that the branch is to be taken, the execution unit receives and executes the branch target instruction which has already been fetched in the microcomputer. Thus, the bus cycle for fetching the branch target instruction is performed as an instruction prefetching operation and does not suspend the data processing operation of the microcomputer.
However, in the case of branch failure (i.e., a determination that the branch is not to be taken), the prefetched branch target instruction must be invalidated, and a bus cycle is required to fetch the instruction following the branch instruction. This bus cycle causes temporary suspension of the data processing operation of the microcomputer. Moreover, the branch prediction unit does not respond to a branch request arising from an interrupt request. A bus cycle for fetching a branch target instruction for an interrupt subroutine program, which is performed after suspending the current program execution, is required each time an interrupt request occurs.
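The cases described so far can be summarized in a small cost model. The following Python sketch is illustrative only: the function name, its parameters, and the three-clock bus cycle are assumptions made for the example, since the text says only that a bus cycle takes several clock cycles.

```python
# Illustrative cost model, not part of the described apparatus.
# BUS_CYCLE_CLOCKS is an assumed figure for one instruction-fetch bus cycle.
BUS_CYCLE_CLOCKS = 3


def stall_clocks(event, taken=False, has_predictor=False):
    """Return the clocks the execution unit waits for one event.

    event         -- "branch" or "interrupt" (hypothetical labels)
    taken         -- whether a branch is actually taken
    has_predictor -- whether a branch prediction unit prefetches the target
    """
    if event == "interrupt":
        # The prediction unit does not respond to interrupt requests,
        # so the target of the interrupt routine is always fetched late.
        return BUS_CYCLE_CLOCKS
    if event != "branch":
        raise ValueError(f"unknown event: {event!r}")
    if not has_predictor:
        # Sequential prefetch only: a taken branch invalidates the
        # prefetched instructions and waits for the branch target.
        return BUS_CYCLE_CLOCKS if taken else 0
    # With prediction the target is already prefetched: a taken branch
    # costs nothing, but a branch failure must refetch the instruction
    # that follows the branch.
    return 0 if taken else BUS_CYCLE_CLOCKS
```

Under these assumptions the prediction unit merely moves the penalty from taken branches to branch failures, and leaves the interrupt penalty unchanged.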
Some microcomputers further incorporate a cache memory for copying a string of instructions and/or data stored in the main memory. In this case, if a branch target instruction has been copied into the cache memory, that instruction is read out therefrom and transferred to the execution unit at high speed, in one clock cycle for example, in response to the branch request. However, the cache memory copies only instructions stored at addresses in the neighborhood of the address of the instruction currently being executed. The required branch target instruction is therefore not always stored in the cache memory.
FIG. 9 is a block diagram of a conventional microcomputer which includes an instruction cache. This structure incorporates a bus controller unit 620, an execution unit 630, an external data bus terminal 640, an internal bus 650, a cache controller unit 660, an instruction cache 670, and an instruction bus 680. The bus controller unit 620, which is connected to the internal bus 650, receives a fetch request signal 651 from the cache controller unit 660. The bus controller unit 620 controls data reading and writing to and from an on-chip or off-chip memory (not shown), and also controls input/output (I/O) operation. The bus controller unit 620 further controls fetching of instruction code stored in a memory (not shown). The execution unit 630, connected to the internal bus 650 and the instruction bus 680, outputs an instruction request signal 652 to the cache controller unit 660. The execution unit 630 also decodes the instruction code supplied from the instruction bus 680 and controls execution of logical and data transfer operations. The external data bus terminal 640, which is connected to the internal bus 650, is a 16-bit terminal. The internal bus 650 is 16 bits wide and, in addition to being connected to the external data bus terminal 640, is also connected to the bus controller unit 620, the execution unit 630, and the instruction cache 670.
The cache controller unit 660, which is connected to the bus controller unit 620, the execution unit 630, and the instruction cache 670, outputs a write signal 653 for writing an instruction code to the instruction cache 670. The cache controller unit 660 also commands the instruction cache 670 to output the instruction code in response to the instruction request signal 652 supplied from the execution unit 630. The instruction cache 670, connected to the internal bus 650, the cache controller unit 660, and the instruction bus 680, is responsive to the write signal 653 from the cache controller unit 660 to store an instruction code on the internal bus 650, together with the address from which the instruction code has been read. If the cache controller unit 660 does not issue the write signal 653, the instruction cache 670 outputs the stored instruction code on the instruction bus 680. The instruction bus 680, connected to the instruction cache 670 and the execution unit 630, sends the instruction code stored in the instruction cache 670 to the execution unit 630.
The microcomputer of FIG. 9 operates as follows. The execution unit 630 outputs the instruction request signal 652 to the cache controller unit 660, which compares the address information stored in the instruction cache 670 with an address generated in the bus controller unit 620. If the two coincide (a condition known generally as a "cache hit"), the instruction code is supplied from the instruction cache 670 to the instruction bus 680. Since the instruction cache generally consists of high speed memory, the execution unit 630 can read and execute an instruction code from the instruction bus 680 at every clock cycle when there is a cache hit.
If the address information stored in the instruction cache 670 is different from the instruction address (a condition known generally as a "cache miss"), the cache controller unit 660 recognizes that the requested instruction code does not exist in the instruction cache 670, and supplies the fetch request signal 651 to the bus controller unit 620. The bus controller unit 620 starts a bus cycle for reading an instruction code from the main memory (not shown) through the external data bus terminal 640 and the internal bus 650. The cache controller unit 660 outputs the write signal 653, so that the read instruction code and the address from which the code is read are written into the instruction cache 670, and the instruction code is applied simultaneously to the instruction bus 680. Until the bus cycle terminates, say after three clock periods, no instruction code is supplied to the execution unit 630, so that instruction execution is inhibited.
In preparation for the next instruction code read request, the cache controller unit 660 also starts a write cycle in which an instruction code, stored in the main memory at an address equal to the address information stored in the instruction cache 670, is read through the external data bus terminal 640 and the internal bus 650 and then written to the instruction cache 670.
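The hit and miss behavior described for FIG. 9 can be sketched in Python as follows. This is a minimal model under stated assumptions, not a description of the actual circuit: the class and function names are hypothetical, the cache is modeled as an unbounded address-to-code table, and the additional write cycle performed in preparation for the next request is not modeled. The one-clock hit and three-clock miss follow the figures given in the text.

```python
HIT_CLOCKS = 1   # on a cache hit, one instruction code per clock cycle
MISS_CLOCKS = 3  # on a cache miss, the bus cycle ends "say after three clock periods"


class InstructionCacheModel:
    """Toy model of the instruction cache 670 and cache controller unit 660."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> instruction code
        self.lines = {}                 # cached address -> instruction code
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Return (instruction code, clocks spent) for one request."""
        if address in self.lines:
            # Cache hit: the stored address coincides with the request.
            self.hits += 1
            return self.lines[address], HIT_CLOCKS
        # Cache miss: run a bus cycle, then store both the code and
        # the address it was read from (the role of write signal 653).
        self.misses += 1
        code = self.main_memory[address]
        self.lines[address] = code
        return code, MISS_CLOCKS


def run(cache, addresses):
    """Total clocks needed to fetch the given address sequence."""
    return sum(cache.fetch(address)[1] for address in addresses)


# Usage: a two-instruction loop executed twice (hypothetical contents).
memory = {address: f"op{address}" for address in range(0, 8, 2)}
cache = InstructionCacheModel(memory)
total_clocks = run(cache, [0, 2, 0, 2])
```

In this usage example the first pass through the loop misses on both addresses and the second pass hits on both, which is why a cache helps only when the requested addresses have been seen before.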
Two other deficiencies of this cache memory structure, in addition to those mentioned above, must be considered. First, the cache memory structure is relatively complicated. Second, cache operation is a statistical process, so that precious bus cycles must be spent on fetches whenever a cache miss occurs. The statistical aspect of cache operation in the handling of branch instructions becomes more apparent in view of the branch target prefetching operations mentioned earlier. When a branch instruction stored in a prefetch queue is decoded, the branch target address is calculated before the branch instruction is executed, so that the instruction code at the branch target address is prefetched. The various prefetches include the prefetch for an unconditional branch (i.e., a branch that is always taken), the prefetch for a conditional branch using a prediction bit, and the prefetch for a branch whose address is specified indirectly through an address stored in a register.
The usual capacity of a cache included in a large scale integration (LSI) chip is 32 pages of 16 bytes each. Although execution of instruction code at successive addresses, or a branch within a page, can be processed at high speed, branches outside the page occur often in the control field, where there are many branches, so that the cache must be updated frequently. The cache size could be increased to 64 pages of 32 bytes each, but this requires an increase in chip area, with an unacceptable attendant increase in product cost, particularly in the control field. Even if an instruction cache were provided external to the microcomputer chip, an additional bus dedicated to the cache and cache control terminals would be necessary, resulting in a larger LSI package size.
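The capacity figures above, and the distinction between a branch inside and outside a page, can be checked with a short Python sketch. The helper names are hypothetical, and the assumption that pages are aligned blocks of consecutive byte addresses is made only for the example.

```python
def cache_capacity_bytes(pages, page_bytes):
    """Total instruction storage of a cache organized as fixed-size pages."""
    return pages * page_bytes


def page_number(address, page_bytes=16):
    """Index of the aligned page containing a byte address (assumed layout)."""
    return address // page_bytes


def branch_stays_in_page(source, target, page_bytes=16):
    """True if a branch from source to target does not leave the page."""
    return page_number(source, page_bytes) == page_number(target, page_bytes)


usual = cache_capacity_bytes(32, 16)     # 512 bytes
enlarged = cache_capacity_bytes(64, 32)  # 2048 bytes
```

Under these assumptions the enlarged cache holds four times as many bytes as the usual one, which indicates the scale of the increase in chip area, while a branch of even a few tens of bytes can leave a 16-byte page.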
Still further, the prefetching of instructions at the branch target address requires an additional instruction decoder in the prefetch queue, and a buffer circuit for storing prefetched data. Yet another disadvantage is that the prefetch function can process only branch instructions, and not interrupts, which also occur frequently in the control field. Thus, even the more complicated hardware cannot handle all of the situations which can occur. Moreover, even if such hardware could handle all situations, the statistical nature of the operation of the cache-based circuitry can result in the waste of precious bus cycles, and in the need to flush the cache, when branches are not taken.