1. Field of the Invention
The present invention relates to a microprocessor and to an arithmetic logic unit (ALU) used in the microprocessor. More specifically, the present invention relates to a microprocessor and an arithmetic logic unit whose operation can be configured by the user for efficient execution of a specific application.
2. Description of the Background Art
A RISC (Reduced Instruction Set Computer) or a DSP (Digital Signal Processor) contains an ALU. FIG. 7 schematically shows the structure of a conventional DSP. Referring to FIG. 7, DSP 100 includes an instruction memory 110; a program counter (PC) 114 designating an address in instruction memory 110; an instruction decoder 112 for reading and decoding the instruction at the address of instruction memory 110 designated by PC 114; an incrementor 116 for incrementing the output of PC 114 by one; an adder 118 for adding a relative branch address output from instruction decoder 112 to the output of PC 114; and a multiplexer (MUX) 120 for selecting one of the output of incrementor 116, the output of adder 118 and a constant "0", and setting the selected value in PC 114. The constant "0" is set in PC 114 at reset. PC 114, incrementor 116, adder 118 and MUX 120 control the order in which instructions are read from instruction memory 110 in accordance with the results of processing by ALU 124.
DSP 100 further includes an ALU 124 for performing the operation designated by an instruction code Op applied from instruction decoder 112 on two input operands and outputting the result; a register file 126 for storing the output of ALU 124 at an address R output from instruction decoder 112, or for outputting data from address R; a RAM 122 receiving the output of register file 126 and storing it at a prescribed address; and two MUXs 128 and 130, each receiving the outputs of RAM 122 and register file 126, selecting one of them under the control of instruction decoder 112, and applying the selected one to ALU 124 as operand data.
The operation of DSP 100 shown in FIG. 7 will be briefly described below. First, MUX 120 selects "0" and sets it in PC 114, and the instruction read from address 0 of instruction memory 110 is decoded by instruction decoder 112; DSP 100 thus starts operation. Instruction decoder 112 decodes each instruction and outputs instruction code Op, which is applied to ALU 124, and one or more addresses R, which are applied to register file 126. When the instruction is a branch instruction (only relative branches are considered here), instruction decoder 112 applies the relative branch address to adder 118.
Register file 126 outputs data from address R applied from instruction decoder 112, and RAM 122 outputs data from an address designated by instruction decoder 112. Each of MUXs 128 and 130 selects either the output of register file 126 or the output of RAM 122 under the control of instruction decoder 112 and applies the selected one to ALU 124. ALU 124 performs the processing designated by instruction code Op from instruction decoder 112 on the two operands and outputs the result to register file 126, which stores it at an address designated by instruction decoder 112.
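The operand-selection and execute path just described can be sketched in Python as follows. This is a minimal behavioral model under stated assumptions: the 8-bit width, the opcode numbering in the hypothetical `ALU_OPS` table and the function name `execute` are illustrative only, since the actual codes of ALU 124 are not given here.

```python
from typing import Callable, Dict, List

N_BITS = 8                      # assumed operand width n
MASK = (1 << N_BITS) - 1

# Hypothetical opcode table; the real instruction codes of ALU 124 are not specified.
ALU_OPS: Dict[int, Callable[[int, int], int]] = {
    0: lambda a, b: (a + b) & MASK,   # add, wrapping modulo 2**n
    1: lambda a, b: (a - b) & MASK,   # subtract, wrapping modulo 2**n
    2: lambda a, b: a & b,            # bitwise AND
    3: lambda a, b: a | b,            # bitwise OR
}


def execute(op: int, r_a: int, r_b: int, r_dst: int,
            a_from_ram: bool, b_from_ram: bool, ram_addr: int,
            regs: List[int], ram: List[int]) -> None:
    """One datapath step: MUXs 128/130 pick the operands, ALU 124 computes,
    and register file 126 stores the result at address r_dst."""
    a = ram[ram_addr] if a_from_ram else regs[r_a]   # MUX 128
    b = ram[ram_addr] if b_from_ram else regs[r_b]   # MUX 130
    regs[r_dst] = ALU_OPS[op](a, b)                  # ALU 124 -> register file 126
```

For example, with `regs = [0, 3, 0, 0]` and `ram = [5, 0]`, calling `execute(0, 1, 0, 2, False, True, 0, regs, ram)` adds register 1 and RAM word 0 and leaves 8 in register 2.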
Incrementor 116 increments the output of PC 114 by one and applies the result to MUX 120. Under the control of instruction decoder 112, MUX 120 selects the output of either incrementor 116 or adder 118 and sets it in PC 114. MUX 120 normally selects the output of incrementor 116, and selects the output of adder 118 only when the instruction is a branch instruction.
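The next-address selection performed by MUX 120 can be summarized in a few lines. In this sketch the function name `next_pc` is hypothetical, and giving reset priority over a branch is an assumption made for illustration.

```python
def next_pc(pc: int, reset: bool, is_branch: bool, rel_offset: int = 0) -> int:
    """Model of MUX 120 choosing the value to be set in PC 114."""
    if reset:
        return 0                  # constant "0" selected at reset (assumed highest priority)
    if is_branch:
        return pc + rel_offset    # adder 118: PC output plus relative branch address
    return pc + 1                 # incrementor 116: PC output plus one
```

Sequential execution therefore advances one address at a time, and only a relative branch makes PC 114 jump.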
FIG. 8 shows an example of the input/output bit widths of ALU 124 and the bit width of its instruction code. It is assumed that the instruction code applied to ALU 124 has i bits, that the first and second operand data each have n bits, and that the result data has n bits.
In an arithmetic logic unit such as ALU 124, as well as in an arithmetic logic unit implemented as a single-chip LSI (Large Scale Integrated Circuit), the entire instruction set is determined in advance; an example is shown in FIG. 9.
When an instruction code has i bits, there can be 2^i different instruction codes, from 0 to 2^i - 1, as shown in FIG. 9. The processing corresponding to each instruction code is fixed. All of these processing contents are determined in advance by the supplier of the ALU and cannot be changed by the user, since ALU 124 is realized by hard-wired logic and ROMs whose logic is not user-modifiable.
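A fixed instruction set of this kind behaves like an immutable lookup table indexed by the i-bit code. The sketch below assumes i = 3, and the operation names in `FIXED_OPS` are hypothetical placeholders rather than the contents of FIG. 9.

```python
I_BITS = 3  # assumed instruction-code width i, for illustration only

# Hypothetical fixed decode table: each of the 2**i codes is bound to one
# operation by the supplier. A tuple is immutable, mirroring hard-wired logic/ROM.
FIXED_OPS = ("ADD", "SUB", "AND", "OR", "XOR", "SHL", "SHR", "PASS_A")
assert len(FIXED_OPS) == 2 ** I_BITS  # codes 0 .. 2**i - 1


def decode(code: int) -> str:
    """Return the predetermined operation for an i-bit instruction code."""
    if not 0 <= code < 2 ** I_BITS:
        raise ValueError("instruction code out of range")
    return FIXED_OPS[code]
```

Nothing in this model lets a user add an entry or rebind a code, which is the limitation the passage describes.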
However, the instructions of an arithmetic logic unit with a predetermined instruction set are designed for general purposes, and are therefore not always efficient in a specific application. It is known that the performance of an arithmetic logic unit in a specific application can be improved remarkably by providing an instruction dedicated to that application and executing it efficiently. One example is underflow/overflow handling in integer arithmetic.
Consider, as an example, signed 8-bit addition in an 8-bit ALU. Signed 8-bit data is in two's complement representation and takes values in the range -128 to 127. Here, "overflow" refers to a result of addition exceeding 127, and "underflow" refers to a result smaller than -128. Consider the addition 7F + 7F in hexadecimal notation. Ordinary addition gives FE (hexadecimal). However, since a conventional ALU treats this result as a signed integer, it is interpreted as -2, a value that is meaningless as the result of the operation.
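The wrap-around can be reproduced directly. This sketch models plain 8-bit addition as discarding the carry out of bit 7 and then reinterprets the bit pattern as two's complement; the helper names are hypothetical.

```python
def to_signed8(x: int) -> int:
    """Interpret the low 8 bits of x as a two's-complement value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def wrap_add8(a: int, b: int) -> int:
    """Plain 8-bit addition: the carry out of bit 7 is discarded."""
    return (a + b) & 0xFF


raw = wrap_add8(0x7F, 0x7F)
print(hex(raw), to_signed8(raw))   # 0xfe -2
```

The true sum, 254, lies outside the signed range, so the stored pattern FE is read back as -2.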
This problem lowers processing efficiency particularly in fields handling image and video data. Because such fields handle very large amounts of data, the number of bits per data item is kept as small as possible and processing is performed with the minimum data length, which makes the underflow/overflow problem described above likely. In other fields, a sufficient number of bits is generally allotted to data to reduce the possibility of underflow/overflow, so the problem is less likely to arise.
In fields handling image and video data, the underflow/overflow problem has been solved by programming. For example, in the signed 8-bit addition described above, the result is clamped: when the result would be below -128, -128 is output, and when it would exceed 127, 127 is output.
For the example above, this gives:

7F (hexadecimal) + 7F (hexadecimal) = 7F (hexadecimal)
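A minimal sketch of the clamped ("saturating") addition, operating on raw 8-bit patterns as an ALU would, follows. The overflow test used here (both operands share a sign and the result's sign differs) is the standard two's-complement check, chosen for illustration; it is not taken from FIG. 9, and the name `sat_add8` is hypothetical.

```python
def sat_add8(a: int, b: int) -> int:
    """Add two 8-bit two's-complement bit patterns, clamping to 0x7F / 0x80."""
    a &= 0xFF
    b &= 0xFF
    s = (a + b) & 0xFF
    # Overflow/underflow occurred iff both operands have the same sign bit
    # and the sign bit of the wrapped result differs from it.
    if (a ^ s) & (b ^ s) & 0x80:
        return 0x80 if a & 0x80 else 0x7F   # underflow -> -128, overflow -> 127
    return s
```

With this rule, `sat_add8(0x7F, 0x7F)` returns 0x7F, matching the equation above, while in-range sums such as 0xFF + 0x01 (that is, -1 + 1) pass through unchanged.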
In a conventional ALU, this clamping is realized with a plurality of instruction steps, which is one reason why conventional ALUs are not efficient enough for processing in this field. If an addition instruction with built-in underflow/overflow handling were provided in advance, and the hard-wired logic of the ALU were designed to execute it in the same manner as other instructions, the actual processing efficiency in fields handling image and video data could clearly be improved significantly.
Consider, for example, the simple IIR (Infinite Impulse Response) filter shown in FIG. 10. The processing of FIG. 10 multiplies the input by α, adds the result to the output of the previous cycle (Z^-1) multiplied by β, and outputs the sum. When this processing is executed by a conventional ALU, or by a microprocessor containing such an ALU, the following sequential steps are necessary.
(1) The input is multiplied by α and stored in a register.
(2) The value stored as the output of the last cycle (Z^-1) is multiplied by β.
(3) The stored input multiplied by α is added to the last-cycle output multiplied by β, and the sum is stored.
(4) The stored value is output as the result of the operation.
(5) The output value is stored as the value of the last cycle.
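Steps (1) to (5) amount to the first-order recursion y[n] = α·x[n] + β·y[n-1]. The Python sketch below writes that recursion step by step so that each line corresponds to one numbered step. The function name is hypothetical, and the zero initial value of the stored last-cycle output is an assumption.

```python
from typing import Iterable, List


def iir_sequential(samples: Iterable[float], alpha: float, beta: float) -> List[float]:
    """First-order IIR filter y[n] = alpha*x[n] + beta*y[n-1], written as the
    five sequential steps (1)-(5)."""
    z1 = 0.0            # stored output of the last cycle (Z^-1); assumed zero at start
    outputs: List[float] = []
    for x in samples:
        reg = alpha * x        # (1) input times alpha, stored in a register
        fb = beta * z1         # (2) last-cycle output times beta
        acc = reg + fb         # (3) sum of the two products is stored
        outputs.append(acc)    # (4) stored value output as the result
        z1 = acc               # (5) output kept as the last-cycle value
    return outputs
```

For an impulse input `[1, 0, 0]` with α = β = 0.5 this yields `[0.5, 0.25, 0.125]`, the geometric decay expected of a single-pole recursive filter.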
Since these five steps are carried out sequentially, a conventional DSP needs five steps (five DSP cycles) to realize the IIR filter. With a DSP clocked at 50 MHz, the maximum processing rate of the IIR filter is therefore 10 MHz, and no higher rate is attainable. An IIR filter processing rate of 50 MHz would require a DSP operating at 250 MHz, so such a filter is difficult to realize with a conventional DSP.
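The rate figures follow from dividing or multiplying by the number of sequential steps. The sketch below assumes, as the passage does, one DSP cycle per step; the function names are hypothetical.

```python
def max_filter_rate_mhz(clock_mhz: float, steps_per_sample: int) -> float:
    """Output rate achievable when each filter output needs steps_per_sample cycles."""
    return clock_mhz / steps_per_sample


def required_clock_mhz(target_rate_mhz: float, steps_per_sample: int) -> float:
    """Clock frequency needed to reach target_rate_mhz with steps_per_sample cycles per output."""
    return target_rate_mhz * steps_per_sample


print(max_filter_rate_mhz(50, 5))   # 10.0
print(required_clock_mhz(50, 5))    # 250
```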
Further, filtering with IIR filters and the like calls for a wide variety of filter characteristics, and a specific application sometimes requires filters with different characteristics simultaneously. Implementing the IIR filter in hardware (hard-wired logic) allows somewhat faster operation, but the hardware logic must then be changed for each user request, and a different logic must be prepared whenever the application changes.
Some conventional CISCs (Complex Instruction Set Computers) have a control storage implemented by RAM. In such a CISC, rewriting the contents of the control storage makes it possible to describe relatively complicated processing in one step, as a macro instruction applied to the CISC. However, since the ALU itself is fixed, executing the macro instruction still means executing a plurality of microinstruction steps stored in the control storage, so the macro instruction cannot be executed as fast as equivalent hard-wired logic running at the same clock frequency.
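The cost of such a macro instruction can be illustrated with a toy writable control store. In this sketch the store contents, the macro name `IIR_STEP` and the function `macro_cycles` are hypothetical, and one microinstruction per cycle on the fixed ALU is assumed.

```python
from typing import Dict, List

# Hypothetical RAM-based control store: each macro opcode maps to the
# microinstructions it expands into (contents are illustrative only).
control_store: Dict[str, List[str]] = {
    "IIR_STEP": ["MUL alpha", "MUL beta", "ADD", "OUT", "STORE z1"],
}


def macro_cycles(macro: str, store: Dict[str, List[str]]) -> int:
    """Cycles needed to run a macro instruction, assuming the fixed ALU
    executes one microinstruction per cycle."""
    return len(store[macro])
```

Rewriting `control_store` changes what the macro does, but `macro_cycles("IIR_STEP", control_store)` is still 5: the user gains flexibility, not single-cycle execution.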