1. Field of the Invention
The present invention relates to a memory system, and more specifically to a memory system including an instruction memory, for use in a RISC (reduced instruction set computer) type microprocessor having a highly pipelined architecture.
2. Description of Related Art
Rapid advances in VLSI (very large scale integrated circuit) technology and design techniques have resulted in a remarkable development of microprocessors, whose performance continues to rise and to approach that of a superminicomputer. One of the performance-enhancing technologies is the so-called RISC type microprocessor, which is characterized in that those instructions of a conventional computer's instruction set that have a high use frequency are realized in the form of hardware for the purpose of increasing the processing speed.
For example, N. P. Jouppi, "The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance", IEEE Transactions on Computers, Vol. 38, No. 12, December 1989, pp. 1645-1658, defines a superscalar system and a superpipelined system for elevating the performance of the RISC microprocessor, as follows:
Referring to FIG. 1A, there is shown a pipelined structure of a basic RISC processor, which has four stages called "Instruction Fetch" (IF), "Decode" (D), "Execute" (EX) and "Write Back" (WB), respectively. In the stage "IF", an instruction code is read from an instruction cache memory, and in the stage "D", the fetched instruction code is decoded, and necessary register files are read. In the stage "EX", an arithmetic or logic operation is performed on contents read out of the register files, and in the stage "WB", the result of arithmetic or logic operation is written back to a register file. The operation is advanced by one pipelined stage in each one clock cycle, so that one instruction can be executed in each one clock cycle.
The superscalar system is featured in that "N" processor units are provided so that "N" instructions can be simultaneously executed (where "N" is an integer not less than 2). FIG. 1B illustrates the superscalar system of N=2, so that two instructions are executed in each one clock cycle.
On the other hand, the superpipelined system is realized by subdividing the basic pipelined system shown in FIG. 1A by "M" (where "M" is an integer not less than 2) and shortening the period of each clock cycle to one-divided-by-"M", so that the instructions can be executed at a speed which is "M" times the speed of the basic pipelined system. FIG. 1C shows the superpipelined system of M=2. In the shown example, since the period of two clock cycles in the superpipelined system corresponds to one clock cycle of the basic pipelined system shown in FIG. 1A, although only one instruction can be executed in each one clock cycle, two instructions can be executed in the period of one clock cycle of the basic pipelined system.
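The three schemes described above can be compared with a small idealized timing model. The following is only a sketch under stated assumptions: it ignores hazards, branches and memory stalls, and the function name `completion_time` and its parameters `n` and `m` are hypothetical names chosen for this illustration. Time is measured in clock cycles of the basic pipelined system of FIG. 1A.

```python
from fractions import Fraction

BASE_STAGES = 4  # IF, D, EX, WB of the basic pipeline


def completion_time(i, n=1, m=1):
    """Time, in basic clock cycles, at which instruction i (0-based) leaves WB.

    n: number of processor units (superscalar degree), assumed ideal
    m: subdivision factor (superpipelining degree), assumed ideal
    """
    issue_slot = i // n                    # n instructions issue together
    issue_time = Fraction(issue_slot, m)   # each short cycle lasts 1/m
    latency = BASE_STAGES                  # 4*m stages, each 1/m long
    return issue_time + latency


# Time at which the 8th instruction (i = 7) completes under each scheme.
print(completion_time(7))         # basic pipeline: 11
print(completion_time(7, n=2))    # superscalar, N=2: 7
print(completion_time(7, m=2))    # superpipelined, M=2: 15/2
```

Under these assumptions both the N=2 and the M=2 arrangements issue two instructions per basic clock cycle; the superpipelined case finishes half a basic cycle later only because the second instruction of each pair issues half a cycle after the first.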
The superscalar microprocessor is disadvantageous in that the amount of hardware increases with the number of added processor units, and therefore, the chip size is correspondingly increased. In this connection, the superpipelined microprocessor is convenient in that it can be realized by addition of only a small amount of hardware, such as pipelining registers and some control logic circuits.
However, the superpipelined system has a problem in an incrementer for a program counter.
Referring to FIG. 2, there is illustrated a construction of a conventional memory system provided at the basic pipelined stage "IF" in a 32-bit RISC microprocessor in the prior art. As shown in FIG. 2, the conventional memory system includes a 30-bit program counter (PC) 101, a 30-bit incrementer 102 associated with the program counter 101, an instruction memory 103 of 1024 words × 32 bits receiving, as an address, least significant bits of the program counter 101, a 30-bit pipelining register 104 latching an output of the program counter 101, and a 32-bit pipelining register 105 latching an output of the instruction memory 103.
Now, operation of the shown conventional memory system will be described.
Since a word length of each instruction is 32 bits (4 bytes), an address for an instruction word has two least significant bits that are always "0". Therefore, as mentioned above, each of the program counter 101, the incrementer 102, the register 104 and a branch address AB supplied to the program counter 101 has the word length of 30 bits, obtained by cutting off the two least significant bits that are always "0".
Assuming that the program is being sequentially executed in order, the program counter 101 is incremented in steps of +4 by the output of the incrementer 102, since the two least significant bits of the address are always "0". By using the output (12 least significant bits) of the program counter 101 as the address, an instruction word is read from the instruction memory 103 and outputted to the register 105.
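The relationship between the 30-bit counter value and the full 32-bit address can be sketched as follows. This is a minimal illustration, assuming that the 30-bit value holds the upper 30 bits of the byte address, so that adding 1 to it advances the byte address by 4; the names `PC_BITS`, `increment` and `byte_address` are hypothetical and do not appear in FIG. 2.

```python
PC_BITS = 30
PC_MASK = (1 << PC_BITS) - 1  # keeps values within 30 bits


def increment(pc):
    """Model of the 30-bit incrementer: add 1, wrapping at 2**30."""
    return (pc + 1) & PC_MASK


def byte_address(pc):
    """Restore the two dropped least significant bits, which are always 0."""
    return pc << 2


pc = 0x100
next_pc = increment(pc)
# One step of the 30-bit counter corresponds to +4 in the byte address.
print(byte_address(next_pc) - byte_address(pc))  # 4
```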
On the other hand, when a branch instruction is executed, a branch destination address is generated on the basis of the registers 104 and 105 by action of hardware of the stage "D" (Decode) and its downstream stage(s). Ordinarily, the branch instruction uses program counter relative addressing. Namely, the branch destination address is calculated by adding an offset value included in the instruction code, namely, in the output of the register 105, to the value of the program counter 101, namely, the output of the register 104. When the branch is executed, the branch destination address is supplied to the program counter 101 as the branch address "AB".
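The program counter relative calculation can be sketched as follows. The encoding of the offset inside the instruction code is not specified here, so this illustration simply assumes a signed offset already expressed in instruction words; `branch_target`, `latched_pc` and `offset_words` are hypothetical names.

```python
PC_BITS = 30
PC_MASK = (1 << PC_BITS) - 1  # keeps values within 30 bits


def branch_target(latched_pc, offset_words):
    """Branch address AB: latched PC (register 104) plus a signed offset.

    offset_words is assumed to be a signed count of instruction words
    taken from the instruction code (register 105).
    """
    return (latched_pc + offset_words) & PC_MASK


print(hex(branch_target(0x100, 8)))    # forward branch: 0x108
print(hex(branch_target(0x100, -3)))   # backward branch: 0xfd
```

The point of the sketch is that the calculation needs the program counter value that accompanied the fetched instruction, which is why the register 104 is provided alongside the register 105.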
Here, consider that the basic pipelined structure shown in FIG. 2 is modified to the superpipelined structure of M=2. Modification of the instruction memory 103 into the superpipelined structure can be realized by a self-resetting circuit configured to detect arrival of data, to perform an operation for the data and to return to a standby condition after completion of the operation. This self-resetting circuit is disclosed in, for example, T. I. Chappell et al., "A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture", IEEE Journal of Solid-State Circuits, Vol. 26, No. 11, November 1991, pp. 1577-1585.
However, it is not possible to modify the incrementer 102 into the superpipelined structure. Even if a pipelining register is inserted into the incrementer 102 so as to realize the superpipelined structure, the incrementing can be executed only one time per two clock cycles, because each increment must use the result of the preceding increment. It is impossible for 30 bits to be incremented in one clock cycle, because an addition of 32 bits requires two clock cycles, the execution of the operation (EX) needing two stages.
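The limitation can be illustrated with a simplified behavioral model. It is only a sketch, assuming that the incrementer takes `latency` shortened clock cycles to produce a result and that the next increment cannot start until that result has been written back to the program counter; `run_incrementer` and its parameters are hypothetical names.

```python
def run_incrementer(cycles, latency, start=0):
    """Return the program counter value visible in each clock cycle.

    latency: clock cycles the incrementer needs for one increment.
    The next increment depends on the previous result, so increments
    cannot overlap even if the incrementer is internally pipelined.
    """
    pc = start
    remaining = latency      # cycles until the increment in flight completes
    history = []
    for _ in range(cycles):
        history.append(pc)
        remaining -= 1
        if remaining == 0:   # result ready: update PC, start next increment
            pc += 1
            remaining = latency
    return history


print(run_incrementer(6, latency=1))  # [0, 1, 2, 3, 4, 5]
print(run_incrementer(6, latency=2))  # [0, 0, 1, 1, 2, 2]
```

With a latency of two cycles, a new program counter value appears only every second cycle, so the instruction memory could not be supplied with a new address at each clock cycle.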
For example, JP-A-57-027477 discloses a memory which makes it possible to access consecutive addresses at a high speed without using an incrementer. More specifically, a nibble mode access of a DRAM (dynamic random access memory) is performed by causing a shift register to temporarily receive an output of a Y (column) decoder for the purpose of increasing the consecutive address access speed.
It is true that this system enables a high speed consecutive address access with no incrementer. However, for performing the branch instruction of the program counter relative addressing, the incremented value of the program counter is necessary.
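The general idea of selecting consecutive columns by shifting a decoded select signal, rather than by incrementing a binary address, can be sketched as follows. This is an illustration of the principle only and not a description of the circuit of JP-A-57-027477; the one-hot representation, the wrap-around behavior and the names `decode_one_hot` and `shift_select` are assumptions made for this example.

```python
def decode_one_hot(column):
    """Model of a column decoder: binary column number to one-hot select."""
    return 1 << column


def shift_select(select, width):
    """Rotate a one-hot select signal by one position within `width` bits."""
    mask = (1 << width) - 1
    return ((select << 1) | (select >> (width - 1))) & mask


WIDTH = 4
select = decode_one_hot(2)        # decoder output loaded into the shift register
for _ in range(3):
    print(format(select, "04b"))  # 0100, then 1000, then 0001
    select = shift_select(select, WIDTH)
```

Consecutive columns are selected by shifting alone, but no incremented binary address is produced anywhere in the process, which is the value that the program counter relative branch requires.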
As will be apparent, for realizing the superpipelined microprocessor, it is necessary to fulfill two conditions: the instruction memory must be accessible successively at each clock cycle, and the program counter value incremented at each clock cycle must be available for the branch instruction. However, there has been no means which can simultaneously fulfill both conditions.