1. Field of the Invention
The present invention relates to a microprocessor and a method thereof.
2. Description of the Related Art
When classifying microprocessors by general design concepts, they can be divided into, for example, a reduced instruction set computer (RISC) type and a complex instruction set computer (CISC) type.
Note that the program execution time, which decides the performance of the microprocessor per se, can be expressed by the following formula (1):
[Formula 1]
Program execution time = number of executed instructions (IC) × average number of clock cycles required per instruction (CPI) × clock cycle time (CCT)   (1)
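As a worked illustration of formula (1), the short Python sketch below evaluates the product for two hypothetical instruction mixes. The numeric values and the names `execution_time`, `mix_a`, and `mix_b` are assumptions chosen only for illustration and do not come from any measured processor.

```python
def execution_time(ic, cpi, cct_ns):
    """Formula (1): program execution time = IC x CPI x CCT (result in ns)."""
    return ic * cpi * cct_ns

# Hypothetical figures, for illustration only.
mix_a = execution_time(ic=1000, cpi=1.0, cct_ns=10)  # more instructions, CPI near 1
mix_b = execution_time(ic=600, cpi=2.0, cct_ns=10)   # fewer instructions, higher CPI

print(mix_a, mix_b)
```

With these assumed values, the mix with the larger instruction count still finishes sooner because its CPI is lower, which shows that the three factors trade off against one another.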
An RISC type microprocessor is based on the design concept of making the CPI in the above formula (1) as close to 1 as possible by using instruction pipeline processing.
Therefore, in an RISC type microprocessor, instructions are made a fixed single length and a register-register format (load/store type architecture: only a register operand can be a source operand of a processing instruction) is used as the instruction format, from the viewpoint of simplifying the functions of instructions so that they are suitable for instruction pipeline processing.
Also, an RISC type microprocessor performs static code scheduling by a compiler so as to prevent delays in the instruction pipeline processing.
On the other hand, a CISC type microprocessor is based on the design concept of improving the level of functions of instructions so as to reduce the IC of the above formula (1).
Accordingly, in a CISC type microprocessor, instructions are made a plurality of fixed lengths or variable lengths, and the instruction format includes a mixture of the register-memory format and memory-memory format (where a memory operand is also possible for a source operand of a processing instruction). Namely, direct processing between a register and memory is made possible.
When data on a memory is processed by an arithmetic and logic unit (ALU), an RISC type microprocessor requires at least two instructions, a load instruction and a store instruction, for accessing the memory.
On the other hand, a CISC type microprocessor does not require any instruction only for accessing the memory.
In a CISC type microprocessor, a large bit field is required in an instruction for designating a memory address. Therefore, as mentioned above, a variable length instruction is used in many cases.
However, the decoding circuit tends to become complicated and large in size when using variable length instructions. Therefore, at present, in a CISC type microprocessor, the program execution time is shortened by using a superscalar technique or an out-of-order technique to speed up the processing of data in the memory.
Below, an explanation will be given of the method of accessing a memory in conventional RISC type and CISC type microprocessors.
FIG. 20 is a view for explaining general-purpose registers in conventional RISC type and CISC type microprocessors.
As shown in FIG. 20, a conventional microprocessor is provided with, for example, 16 general-purpose registers. Assume that these 16 general-purpose registers are referred to as r0 to r15.
When these registers are mounted in a processor architecture comprised of a set of 3-operand processing instructions, three ports in total, that is, two read ports and one write port, are necessary.
With a 3-operand processing instruction, as shown in FIG. 21, it is possible to designate three register designators of an ALU processing instruction.
Note that, in FIG. 21, a comment on the instruction written on the left side of the semicolon is given on the right side of the semicolon.
The instruction shown in FIG. 21 is an instruction to execute “r2−r3+r4”. The registers r0 to r15 are general-purpose ones and are used for temporarily holding values.
In a processor using a load/store type architecture, a load/store instruction is executed for a general-purpose register in order to realize a load/store operation on a memory. There are no instructions by which data in the memory is input directly to the ALU. This is often seen in RISC type processors.
As shown in FIG. 22, when processing data in the memory, it is necessary to execute a load instruction “lw r3, 0(r10)” once.
On the other hand, in some CISC processors, it is possible to designate data on a memory as an operand of an ALU processing instruction. In this case, however, a general-purpose register is not used. A memory buffer is used directly.
Below, an explanation will be given of pipeline processing in a conventional RISC type processor.
In an RISC type processor, a five-stage or eight-stage pipeline structure is often used.
For example, the “R3000” (product name) of the MIPS Co., as shown in FIG. 23, uses a five-stage pipeline comprising an instruction fetch (IF) stage, an instruction decode (DEC) stage, an arithmetic and logic unit (ALU) stage, a memory (MEM) stage, and a write back (WB) stage.
This processor fetches (reads) an instruction in the first IF stage and decodes the instruction in the second DEC stage. Note that if the instruction designates a general-purpose register as a source register, the processor decodes the instruction in the DEC stage, then reads the data from the general-purpose register.
Next, at the third ALU stage, it executes an ALU processing instruction. Note that when the fetched instruction is not an ALU instruction, nothing is done in the ALU stage and the data is output to the ALU output port as it is.
Next, in the fourth MEM stage, when the fetched instruction is a memory access instruction, the processor outputs a memory address used for memory access to a memory unit for accessing the memory.
Next, in the fifth WB stage, for an instruction which designates a general-purpose register as a destination register, the processor writes back the result of the ALU processing in the general-purpose register. If the instruction is a memory read instruction (load instruction), it receives a value from the memory unit and writes it in the general-purpose register.
As shown in FIG. 23, a processor using a five-stage pipeline performs, for example, in a clock cycle X, the WB stage of a code C1, the MEM stage of a code C2, the ALU stage of a code C3, the DEC stage of a code C4, and the IF stage of a code C5 by multiplexing.
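The multiplexing described above can be sketched with a small Python model. It assumes an ideal five-stage pipeline with no stalls in which code Ck is fetched in cycle k; the function and variable names are hypothetical and are not taken from FIG. 23.

```python
STAGES = ["IF", "DEC", "ALU", "MEM", "WB"]

def stages_in_cycle(cycle, num_codes):
    """Return {code: stage} for an ideal pipeline where code Ck enters IF in cycle k."""
    active = {}
    for k in range(1, num_codes + 1):
        idx = cycle - k  # number of stages code Ck has already passed through
        if 0 <= idx < len(STAGES):
            active[f"C{k}"] = STAGES[idx]
    return active

# With five codes in flight, cycle 5 plays the role of clock cycle X in the text.
print(stages_in_cycle(5, 5))
```

Under these assumptions, cycle 5 shows C1 in WB, C2 in MEM, C3 in ALU, C4 in DEC, and C5 in IF, which is the overlap the text describes for clock cycle X.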
However, as explained above, since an RISC type processor uses a load/store type instruction set architecture, the ALU processing instructions and the load/store instructions exist separately and independently of each other.
Accordingly, in order to multiplex the desired instruction with these instructions, for example, use of the five-stage pipeline structure shown in FIG. 23 is convenient. Namely, the memory access instruction and other instructions can be executed simultaneously. Since it is assumed that there is only one system (1 set) of paths for memory access, it is impossible to execute memory read and memory write operations at the same MEM stage simultaneously.
Also, when treating a memory access instruction and the other instructions as independent, unused pipeline stages end up occurring. For example, in a transfer instruction between registers, the function of the MEM stage is not used. Also, in a memory access instruction, the function of the ALU stage is not used. Note that the address generating processing for memory access is performed in units other than the ALU.
In the five-stage pipeline processing shown in FIG. 23, when data on the memory is processed by the ALU, the program is written, for example, as shown in FIG. 24.
In the program shown in FIG. 24, first, the processor loads the data at the memory address indicated by the register r10 to the register r2 by the instruction “lw r2, 0(r10)”. Next, it adds the values in the registers r2 and r9 and inserts the result in the register r3 by the instruction “addu r3, r2, r9”. Next, it stores (writes back) the value in the register r3 in the memory address indicated by the register r11 by the instruction “sw r3, 0(r11)”. These operations are written by three instructions. Since each instruction requires at least one clock cycle for execution, three cycles are required to execute the three instructions. In actuality, one more cycle is required because the data read (loaded) from the memory cannot be referred to by the immediately succeeding instruction.
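The cycle count above can be reproduced with a rough Python sketch. It assumes one issue cycle per instruction and a single extra cycle whenever an instruction reads a register loaded by the immediately preceding lw (load word) instruction; the tuple encoding of the instructions and the name `issue_cycles` are hypothetical.

```python
# Each instruction: (mnemonic, destination register or None, source registers)
program = [
    ("lw",   "r2", ["r10"]),        # r2 <- mem[r10]
    ("addu", "r3", ["r2", "r9"]),   # r3 <- r2 + r9
    ("sw",   None, ["r3", "r11"]),  # mem[r11] <- r3
]

def issue_cycles(prog):
    """Count issue cycles, adding one cycle after a load whose result is used at once."""
    cycles = 0
    prev = None
    for op, dest, srcs in prog:
        if prev is not None and prev[0] == "lw" and prev[1] in srcs:
            cycles += 1  # load-use delay: the loaded data is not yet available
        cycles += 1
        prev = (op, dest, srcs)
    return cycles

print(issue_cycles(program))  # 4
```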
However, in video processing, audio processing, and other media processing, it is necessary to repeatedly perform predetermined ALU processing on data in a space of consecutive memory addresses. In this case, as shown in FIG. 25, the instructions “addi r10, 4” and “addi r11, 4” for updating the memory addresses must be further added to the program shown in FIG. 24. As a result, there is the problem that at least five clock cycles are needed to execute the program shown in FIG. 25 and the processing time becomes longer.
Note that, in FIG. 25, the starting address of the source data of the addition processing on the memory is set using the register r10, while the starting address of the destination data is set using the register r11.
Further, in the above-mentioned conventional five-stage pipeline processing in a microprocessor, since the memory access is executed at the MEM stage and there is only one system of paths for memory access provided, it is not possible to simultaneously execute a memory read operation and a memory write operation. Therefore, it is necessary to write a program in which the memory read instruction and the memory write instruction are written independently. This has been an obstacle when trying to shorten the processing time.
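To make the per-element cost concrete, the following Python sketch emulates the five-instruction sequence over a small byte-addressed memory. The memory contents, the constant held in r9, and the helper name `run_loop` are assumptions made for illustration; FIG. 25 itself is not reproduced here.

```python
def run_loop(mem, src, dst, addend, count):
    """Emulate lw / addu / sw / addi / addi once per element; return instructions executed."""
    r10, r11, r9 = src, dst, addend
    executed = 0
    for _ in range(count):
        r2 = mem[r10]   # lw   r2, 0(r10)
        r3 = r2 + r9    # addu r3, r2, r9
        mem[r11] = r3   # sw   r3, 0(r11)
        r10 += 4        # addi r10, 4
        r11 += 4        # addi r11, 4
        executed += 5
    return executed

# Hypothetical memory holding four source words at addresses 0x00 to 0x0C.
memory = {0x00: 1, 0x04: 2, 0x08: 3, 0x0C: 4}
n = run_loop(memory, src=0x00, dst=0x10, addend=10, count=4)
print(n, [memory[a] for a in (0x10, 0x14, 0x18, 0x1C)])
```

Two of the five instructions per element only advance the address registers, which is the overhead the text identifies.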
An object of the present invention is to provide a microprocessor which is improved in its processing capability, and a method for the same, which can effectively perform processing accompanied with a predetermined pattern of consecutive accesses to a memory address space.
According to the first aspect of the present invention, there is provided a microprocessor, comprising: an internal memory for storing data to be processed; a data pointer register for storing an address on said internal memory and automatically updating a stored address in accordance with a predetermined pattern when there is a predetermined pattern in access to said internal memory; a decoding means for decoding instructions; a plurality of general-purpose registers including a data register for storing data read from an address on said internal memory stored in said data pointer register in accordance with a request for reading out data stored in said internal memory and for writing stored data at an address on said internal memory stored in said data pointer register in accordance with a request for writing data to said internal memory; and a processing means for performing processing by using data stored in said general-purpose registers and for writing the result of processing in said general-purpose registers in accordance with the result of decoding of said decoding means.
Preferably, the data register reads the data from an updated address on said memory immediately after an address stored in said data pointer register is updated.
Preferably, the data register terminates the function of reading the data from an updated address on said memory immediately after an address stored in said data pointer register is updated when continuously and repeatedly writing data in said internal memory.
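One way to picture the data pointer register and data register described above is the following Python sketch. It is only a behavioral model under stated assumptions: the class name `DataPort`, the fixed-stride update pattern, and the `prefetch` flag standing in for the read-after-update function are all hypothetical and are not defined by the text.

```python
class DataPort:
    """Behavioral model: a data register backed by an auto-updating data pointer."""

    def __init__(self, memory, pointer, stride=1, prefetch=True):
        self.memory = memory      # internal memory, modeled as a list
        self.pointer = pointer    # data pointer register
        self.stride = stride      # assumed update pattern: fixed stride
        self.prefetch = prefetch  # read from the updated address after each update
        self.data = memory[pointer] if prefetch else None  # data register

    def _advance(self):
        self.pointer += self.stride
        if self.prefetch and 0 <= self.pointer < len(self.memory):
            self.data = self.memory[self.pointer]

    def read(self):
        """Return the data register contents, then update the pointer."""
        value = self.data
        self._advance()
        return value

    def write(self, value):
        """Store a value at the pointed address, then update the pointer."""
        self.data = value
        self.memory[self.pointer] = value
        self._advance()

mem = [1, 2, 3, 4, 0, 0, 0, 0]
src = DataPort(mem, pointer=0)                  # read stream with prefetch
dst = DataPort(mem, pointer=4, prefetch=False)  # write stream, prefetch disabled
for _ in range(4):
    dst.write(src.read() + 10)
print(mem)
```

In this model the write stream disables prefetch so that consecutive writes are not followed by an unneeded read from the updated address, which mirrors the terminated read function for continuous writing.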
Preferably, the plurality of general-purpose registers include a plurality of data registers and there are a plurality of data pointer registers corresponding to these plurality of data registers.
Preferably, the memory address space of said internal memory is divided into a plurality of banks and each of the plurality of data pointer registers stores only addresses in a corresponding bank among said plurality of banks.
The microprocessor of the first aspect of the present invention further comprises a switching means for switching interconnections among said data register, said plurality of general-purpose registers other than said data register, and said processing means.
Preferably, a mode of connection between said data register and said processing means and a mode of connection between said plurality of general-purpose registers other than said data register and said processing means are equivalent.
Further, the microprocessor of a second aspect of the present invention comprises: an instruction memory for storing a plurality of instructions; a program counter for designating an address on said instruction memory where an instruction to be next executed is stored; a decoding means for decoding instructions; an internal memory for storing data to be processed; a first data pointer register and a second data pointer register for storing addresses on said internal memory; a plurality of general-purpose registers including a first data register for storing data read from an address on said internal memory stored in said first data pointer register in accordance with a request for reading out data stored in said internal memory and a second data register for writing stored data at an address on said internal memory stored in said second data pointer register in accordance with a request for writing data to said internal memory; and a processing means for performing processing by using data stored in said general-purpose registers and for writing the result of processing in said general-purpose registers in accordance with the result of decoding of said decoding means, and performing four-stage pipeline processing comprising, multiplexed: instruction fetching processing for reading an instruction from an address on said instruction memory designated by said program counter; decoding processing for decoding said fetched instruction and transferring data from said general-purpose registers to said processing means in accordance with need; processing by said processing means; and write back processing for writing the result of processing of said processing means in said general-purpose registers in accordance with need.
Preferably, the memory address space of said internal memory is divided into a plurality of banks, said first data pointer register and said second data pointer register access different banks of said internal memory, said decoding processing transfers data from said first data register to said processing means, and said write back processing transfers the results of said processing to said second data register.
Preferably, the first data pointer register and said second data pointer register automatically update an address stored in said first data pointer register and said second data pointer register in accordance with a predetermined pattern when there is a predetermined pattern in access to said internal memory.
Further, in the microprocessor of a second aspect of the present invention, processing for reading data from an address on said internal memory stored in said first data pointer register and storing it in said first data register and processing for writing the data written in said second data register at an address on said internal memory stored in said second data pointer register are performed in parallel with said four-stage pipeline processing.
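The effect of overlapping memory access with the four-stage pipeline can be estimated with a rough Python sketch. It assumes an ideal pipeline with no stalls, five instructions per element for the conventional program, and a single processing instruction per element when the reads and writes proceed in parallel; these counts, the stage name "EXE", and the function names are assumptions for illustration, not figures stated in the text.

```python
STAGES = ("IF", "DEC", "EXE", "WB")  # "EXE" is a placeholder name for the processing stage

def pipeline_cycles(num_instructions, num_stages=len(STAGES)):
    """Ideal pipeline with no stalls: fill latency plus one cycle per instruction."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1

elements = 100
# Assumption: explicit load, add, store, and two pointer updates per element.
conventional = pipeline_cycles(elements * 5, num_stages=5)
# Assumption: one processing instruction per element, memory traffic handled in parallel.
parallel = pipeline_cycles(elements * 1)
print(conventional, parallel)
```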
Further, according to a third aspect of the present invention, there is provided a microprocessor, comprising: an internal memory for storing data to be processed; a data pointer register for storing an address on said internal memory; a decoding means for decoding instructions; a data register for storing data read from an address on said internal memory stored in said data pointer register in accordance with a request for reading out data stored in said internal memory and for writing stored data at an address on said internal memory stored in said data pointer register in accordance with a request for writing data to said internal memory; a plurality of general-purpose registers; and a processing means for performing processing by using data stored in at least one of said data register and said general-purpose registers and writing the result of processing in said data register or said general-purpose registers based on the result of decoding of said decoding means.
Further, the processing method of the present invention comprises performing four-stage pipeline processing comprising, multiplexed, instruction fetching processing for reading an instruction from an address on an instruction memory designated by a program counter; decoding processing for decoding said fetched instruction; processing performed using data stored in a first data register based on the result of said decoding processing; and write back processing for writing the result of processing of said processing in a second data register and performing: processing for reading data from an address on an internal memory stored in a first data pointer register and storing it in said first data register and processing for writing back the data written in said second data register at an address on said internal memory stored in a second data pointer register in parallel with said four-stage pipeline processing.