1. Field of the Invention
This invention relates to an information processing apparatus, and more particularly to an information processing apparatus in which instructions are executed by the pipeline method and which is provided with cache memories for instructions and data.
2. Description of the Prior Art
FIG. 5 illustrates a conventional pipeline type information processing apparatus which is provided with physical cache memories for instructions and data. In the pipeline type information processing apparatus, the execution of instructions is performed in parallel in four stages: FETCH, DECODE, EXECUTION and STORE stages.
In the FETCH stage, a logical address is translated into a physical address by an address translating unit 112, and an instruction cache 104 is accessed so that an instruction at an address (nPC) indicated by a program counter 111 is fetched into an instruction buffer 105.
In the DECODE stage, instructions in the instruction buffer 105 are converted by an instruction decoder 106 to pipeline control signals 107, which are in turn used to control an arithmetic and logic unit 116 and a data cache 113. In the case of operation instructions, the operands required in the execution of the operation instructions are read from general purpose registers 115, and then output to an internal bus 117. In the case of load and store instructions, the immediate information contained in the instruction code is converted to single-word data in an immediate code extension unit 108, and the address of the data to be loaded or stored is calculated by an address calculating unit 109. An example of an instruction format used in a 32-bit information processing apparatus is shown in FIG. 4. In this format, the 13-bit "imm" field indicates immediate information, and the "op" field indicates the type of instruction. Instruction set architectures often distinguish instructions which perform generally similar types of functions by two or three bits. In FIG. 4, "rs" and "rd" indicate the source register and the destination register, respectively. The address of the data to be loaded is calculated in the address calculating unit 109 as [rs + imm].
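The [rs + imm] address calculation described above can be sketched as follows. This is a minimal illustration only: the text states that the 13-bit immediate is converted to single-word data, but it does not say how, so sign extension is an assumption here, and the names `extend_imm13` and `effective_address` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit word


def extend_imm13(imm13: int) -> int:
    """Extend a 13-bit immediate to a 32-bit word.

    Assumption: the immediate is sign-extended (two's complement).
    """
    imm13 &= 0x1FFF              # keep only the 13-bit field
    if imm13 & 0x1000:           # bit 12 is the sign bit of the field
        imm13 -= 0x2000          # interpret as a negative value
    return imm13 & MASK32        # represent as an unsigned 32-bit word


def effective_address(rs_value: int, imm13: int) -> int:
    """Compute [rs + imm], wrapping to 32 bits."""
    return (rs_value + extend_imm13(imm13)) & MASK32
```

For example, under this assumption an immediate field of 0x1FFC stands for -4, so a base register value of 0x1000 gives the address 0x0FFC.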
In the EXECUTION stage, when an operation instruction is issued, each operation is performed in the arithmetic and logic unit 116, and the operation results are output to an internal bus 118. If the operation instruction is accompanied by a conditional judgment such as overflow, underflow, negative, zero, etc., the above operation results are checked in a conditional judgment section 119 in which condition codes indicating conditions such as overflow, underflow, negative and zero are generated. When a load instruction is issued, the data cache 113 is accessed, and the data is transferred to the correct bus by an aligner 114 and output to the internal bus 118.
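As an illustration of how condition codes of this kind can be derived from an operation result, the sketch below computes the negative, zero and overflow conditions for a 32-bit two's complement addition. It is not the circuit of the conditional judgment section 119. The function name and the dictionary layout are hypothetical, and the underflow condition is omitted because the text does not define it.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def add_with_condition_codes(a: int, b: int):
    """Add two 32-bit words and return (result, condition codes)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    a_neg = bool(a & SIGN_BIT)
    b_neg = bool(b & SIGN_BIT)
    r_neg = bool(result & SIGN_BIT)
    codes = {
        "negative": r_neg,                        # sign bit of the result
        "zero": result == 0,                      # all result bits are zero
        # Signed overflow: operands share a sign that the result lacks.
        "overflow": a_neg == b_neg and r_neg != a_neg,
    }
    return result, codes
```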
In the STORE stage, the data on the internal bus 118 is stored in the general purpose registers 115. In the case of a load instruction, the data is stored in the destination register "rd" indicated in the instruction code.
When the capacities of the caches 104 and 113 are sufficiently large and the miss-hit rate is low, there is almost no need to exchange instructions and data with the main memory 102 via the bus controller 103, and one instruction is processed every cycle by pipeline processing. FIG. 6 shows a timing chart of pipeline processing in which it is assumed that there is no miss-hit in the cache. In FIG. 6, the instruction address "n" indicated by nPC is output from the program counter 111 during the period from cycle t-2 to cycle t-1. In cycle t-1, a logical address of a portion of the instruction is translated into a physical address, and the instruction cache 104 is accessed. In cycle t, the instruction I[n] at the instruction address "n" is fetched into the instruction buffer 105. In this way, pipeline processing of the instruction I[n] is initiated from the FETCH stage.
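The overlap of the four stages under the no-miss-hit assumption can be sketched as a simple occupancy table. This is a simplified model, not a reproduction of FIG. 6: it assumes each stage takes exactly one cycle, numbers cycles from 0 rather than relative to t, and uses the hypothetical function name `pipeline_schedule`.

```python
STAGES = ("FETCH", "DECODE", "EXECUTION", "STORE")


def pipeline_schedule(num_instructions: int):
    """Return, for each cycle, a dict mapping stage name to instruction index.

    Assumes no cache miss-hit and one cycle per stage, so instruction i
    enters FETCH in cycle i and advances one stage per cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth        # instruction currently in this stage
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(5)):
        print(cycle, occupancy)
```

In this model, once the pipeline is full, all four stages are occupied in every cycle and one instruction completes per cycle.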
FIG. 3A shows an example of a source program supplied from a compiler to the information processing apparatus 101. In FIG. 3A, "ld" is the load instruction, "st" is the store instruction, "muld" is the double-word multiplication instruction, and "addd" is the double-word addition instruction. The "muld" instruction at address 00000010 shown in FIG. 3A performs double-word multiplication between the registers (r20, r21) and the registers (r10, r11), and outputs the results to the registers (r30, r31). As shown in this example, the operand data for a double-word operation instruction are often loaded into registers at two consecutive addresses before the double-word operation instruction is executed.
FIG. 10 shows another example of a source program supplied from the compiler to the information processing apparatus 101. In FIG. 10, "ldd" is a double-word load instruction, "std" is a double-word store instruction, "fmuld" is a double-word multiplication instruction, "faddd" is a double-word addition instruction, "add" is a single-word addition instruction, "addcc" is a single-word addition instruction for generating the negative condition code in the conditional judgment section 119 when the result is negative, and "bge" is a backward branch instruction for returning to address 00000000 when the condition code is positive. The instruction at address 00000028 is executed following the branch instruction at address 00000024 before the instruction at address 00000000 is executed.
The program shown in FIG. 10 is repeated until the operation result at address 00000018 becomes negative, and it forms a loop consisting of the 11 steps from address 00000000 to address 00000028. In this program, the addresses accessed by the load instructions at addresses 00000000 and 00000008 are changed each time the loop is repeated by the add instructions at addresses 00000010 and 0000001c, respectively, but the address accessed by the load instruction at address 00000004 does not change in the loop.
The above-mentioned configuration only has the function of executing the instruction code output from the compiler as it is. Even when the source program is inefficient, such as that shown in FIG. 3A, in which processing that could be executed by a single double-word load instruction "ldd" is performed by two single-word load instructions "ld", the program must be executed as is. This increases the number of program steps which must be executed in the information processing apparatus, resulting in increased processing time. This problem often occurs when executing an application program processed by a compiler with poor optimization performance.
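To make the cost of this inefficiency concrete, the sketch below counts how many pairs of single-word loads in an instruction sequence could each have been expressed as one double-word load. The instruction sequence is hypothetical and is not the program of FIG. 3A. The pairing rule (same base register, byte offsets 4 apart, consecutive destination registers, and a first load that does not overwrite the base register) is an assumption made for illustration.

```python
from typing import NamedTuple


class Load(NamedTuple):
    op: str       # "ld" (single word) or "ldd" (double word)
    base: int     # base register number (rs)
    offset: int   # byte offset (imm)
    dest: int     # destination register number (rd)


def count_mergeable_pairs(program):
    """Count adjacent "ld" pairs that one "ldd" could replace.

    Assumed pairing rule: same base register, second offset is first + 4,
    second destination is first + 1, and the first load does not clobber
    the base register used by the second.
    """
    pairs = 0
    i = 0
    while i + 1 < len(program):
        first, second = program[i], program[i + 1]
        if (first.op == "ld" and second.op == "ld"
                and first.base == second.base
                and first.dest != first.base
                and second.offset == first.offset + 4
                and second.dest == first.dest + 1):
            pairs += 1
            i += 2        # both loads are consumed by the pair
        else:
            i += 1
    return pairs


# Hypothetical sequence: two operand pairs loaded word by word.
example = [
    Load("ld", 1, 0, 10), Load("ld", 1, 4, 11),
    Load("ld", 2, 0, 20), Load("ld", 2, 4, 21),
]
```

Each mergeable pair found this way corresponds to one program step that the apparatus executes unnecessarily.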
Moreover, according to the above-mentioned configuration, in the loop program shown in FIG. 10, load instructions which need to be executed only once in the loop, such as that at address 00000004, are executed each time the loop is repeated. Therefore, the number of program steps which must be executed in the information processing apparatus is increased, thus increasing processing time.
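The effect on the step count can be illustrated with simple arithmetic. The sketch assumes 4-byte instructions (consistent with the 32-bit format and the 11 steps between addresses 00000000 and 00000028) and a single loop-invariant load. The function names and the iteration count in the comments are hypothetical.

```python
INSTRUCTION_BYTES = 4  # assumption: one 32-bit instruction per address step
LOOP_STEPS = (0x28 - 0x00) // INSTRUCTION_BYTES + 1  # 11 steps per iteration


def steps_as_written(iterations: int) -> int:
    """Steps executed when every instruction runs on every iteration."""
    return LOOP_STEPS * iterations


def steps_if_invariant_load_ran_once(iterations: int) -> int:
    """Steps executed if the one invariant load ran only a single time."""
    return 1 + (LOOP_STEPS - 1) * iterations
```

With 100 iterations, for instance, the loop as written executes 1100 steps, whereas 1001 would suffice if the invariant load were executed only once.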