1. Field of the Invention
The present invention relates to a data processor for executing various instructions to process data.
2. Description of the Prior Art
Many data processors are of the control-driven type, in which instructions stored in a memory are sequentially fetched and decoded for execution. Microprocessors are examples of such processors.
FIG. 1 is a block diagram of a RISC (reduced instruction set computer) type processor. This processor comprises a memory 1, an instruction fetcher 2, an instruction decoder 3, a register section 4 having a plurality of registers, an execution unit 5 and a data access section 6. In such an arrangement, computation, data transfer (load/store) and other instructions are sequentially fetched and decoded, and the execution unit 5 then uses the decoded instructions to execute operations according to the instruction codes.
In the prior art, the execution unit 5 of FIG. 1 includes a basic arithmetic logic unit (ALU) for executing addition, subtraction, logical OR, logical AND and other operations, and may additionally have, for example, a parallel multiplier for executing multiplication at high speed. The execution unit 5 may also include two types of ALU, one for integer arithmetic and one for floating-point arithmetic. When data is to be processed by the various execution sub-units forming the execution unit 5, an operation code corresponding to each sub-unit is generally provided and written in the program, and these operation codes are then read by the processor. More particularly, extended operation codes such as MUL (multiplication), DIV (division), MAC (sum of products), FADD (floating-point addition) and FSUB (floating-point subtraction) may be provided in addition to basic operation codes such as ADD (integer addition), SUB (integer subtraction), AND (logical AND) and OR (logical OR), and all of these operation codes are written in the program.
However, such a processor can perform a given operation only when it executes an instruction for providing data required by the operation to an operation data storage location (e.g., a register) and another instruction for starting the actual operation. For example, if the RISC processor is to perform repeated multiplications for operand data stored in the memory, the following program may be considered:
    LD   R2, (R0)    (R2 ← mem(R0))
    LD   R3, (R1)    (R3 ← mem(R1))
    MUL  R2, R3      (R2 ← R2 × R3)
In such a program, two load instructions for transferring data to two registers (R2 and R3) and a multiplication instruction are necessary. One multiplication requires the execution of at least three instructions.
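As a rough illustration of the instruction count, the following sketch interprets the three-instruction program above on a toy register machine. The interpreter, its tuple encoding and the register and memory contents are hypothetical and are not taken from any particular RISC processor; the sketch only shows that one product costs three executed instructions.

```python
# Hypothetical toy model of a load/store processor like that of FIG. 1.

def run(program, memory, regs):
    """Execute (opcode, a, b) tuples in order; return the number executed."""
    executed = 0
    for op, a, b in program:
        if op == "LD":        # LD Ra, (Rb): Ra <- mem(Rb)
            regs[a] = memory[regs[b]]
        elif op == "MUL":     # MUL Ra, Rb: Ra <- Ra * Rb
            regs[a] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode {op}")
        executed += 1
    return executed

memory = {100: 6, 200: 7}                       # operand data in memory
regs = {"R0": 100, "R1": 200, "R2": 0, "R3": 0}  # R0, R1 hold the addresses
program = [("LD", "R2", "R0"), ("LD", "R3", "R1"), ("MUL", "R2", "R3")]
count = run(program, memory, regs)
print(regs["R2"], count)  # 42 3
```

Each further product repeats the same pattern, so N products cost 3N instructions in this model.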
This raises the following problems:
(Problem 1) Limitation of Processing Performance
Even if the processor includes a high-speed computing element that executes the operation within one instruction cycle, the overall throughput is limited to one operation every three instruction cycles. This hampers any improvement in performance.
(Problem 2) Program Size
The overall program size is large because the number of instructions necessary to perform the operation is large.
(Problem 3) Instruction Code Length
When a new operation is to be added, an operation code indicating how it is to be processed must be added to the instruction code. Each such extension directly increases the instruction code length, which makes Problem 2 more significant. The hardware of the processor also increases in scale.
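Problem 3 can be made concrete with the usual binary encoding argument: a fixed-width opcode field of b bits distinguishes at most 2^b operations, so adding operations eventually forces a wider field. The sketch below assumes a plain binary opcode with no escape or prefix codes; the operation names in the usage lines are the example operation codes listed earlier (four basic and five extended).

```python
def opcode_bits(num_operations: int) -> int:
    """Minimum width of a plain binary opcode field for num_operations codes."""
    if num_operations < 1:
        raise ValueError("need at least one operation")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, with no float rounding
    return max(1, (num_operations - 1).bit_length())

basic = ["ADD", "SUB", "AND", "OR"]
extended = ["MUL", "DIV", "MAC", "FADD", "FSUB"]
print(opcode_bits(len(basic)))             # 2
print(opcode_bits(len(basic + extended)))  # 4
```

Under this assumption, growing from four to nine operations widens the opcode field from 2 to 4 bits in every instruction.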
(Problem 4) Difficulty in Compactly Designing Instructions
Operations other than the basic operations may be added or extended in the future, and encoding space for them must be reserved in the processor in advance. It is therefore difficult to design the instruction code compactly.
(Problem 5) Future Extension
When all the future additions and extensions of the operations are considered, the design itself becomes difficult. The number of additional operations is consequently limited. It is thus very difficult to design a flexible processor facilitating extension.
To overcome some of these problems, NIHON DENKI CORP. has proposed the μPD77240 DSP, whose features are described in "User's Manual μPD77240" on page 66.
(1) This DSP has a circuit for performing the floating-point multiplication (FMPY) in addition to the conventional ALU. The ALU starts the operation based on an explicit operation instruction (ADD, SUB or the like) while the FMPY circuit automatically starts the operation for data transferred by a data transfer instruction.
(2) In the FMPY, data transferred to two multiplication input registers K and L are multiplied together. The result is outputted to the output bus of the FMPY after one instruction cycle and written into a multiplication output register M after two instruction cycles.
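The timing in (1) and (2) can be modelled as a two-stage pipeline. The class below is a simplified, hypothetical cycle model written for illustration, not a description of the actual μPD77240 circuit: a transfer only writes K or L, every clock computes K × L, the product is on the output bus one cycle later and is latched into M one cycle after that.

```python
class FmpyModel:
    """Hypothetical cycle model of the FMPY: no MUL instruction, only transfers."""

    def __init__(self):
        self.k = 0.0    # multiplication input register K
        self.l = 0.0    # multiplication input register L
        self.bus = 0.0  # FMPY output bus
        self.m = 0.0    # multiplication output register M

    def transfer(self, k=None, l=None):
        # A data transfer instruction writing K and/or L starts the multiplication.
        if k is not None:
            self.k = k
        if l is not None:
            self.l = l

    def clock(self):
        # End of one instruction cycle: last cycle's bus value is latched into M,
        # and the product of the current K and L appears on the bus.
        self.m = self.bus
        self.bus = self.k * self.l

fmpy = FmpyModel()
fmpy.transfer(k=1.5, l=4.0)      # one transfer instruction, no multiply opcode
fmpy.clock()
after_one = (fmpy.bus, fmpy.m)   # (6.0, 0.0): product on the bus only
fmpy.clock()
after_two = fmpy.m               # 6.0: product now in M
```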
As shown, the multiplication can be started by a data transfer instruction alone, without a separate multiplication instruction. This overcomes Problems 1 and 2 above to some extent.
However, the DSP still includes the following problems.
1. As to Problems 1 and 2
FIG. 2 shows the field structure of specific instructions in the DSP. This format belongs to what the manual calls the "operation instruction" (page 14). The fields are summarized as follows:
1. The instruction code length is 32 bits.
2. OP indicates the type of operations.
3. CNT specifies a change to the internal state of the DSP, for example, incrementing an address register.
4. Q and P represent first and second operands, respectively.
5. SRC and DST indicate source and destination for the transfer instruction, respectively.
More particularly, three different kinds of instruction (operation, transfer and internal state manipulation, hereinafter referred to as "individual instructions") are usually described together in one fixed 32-bit instruction (see the bottom of the drawing). This is because many DSPs follow the architecture of the microprogrammed computer. As a result, the total number of steps can be reduced at the expense of a longer instruction code. When the numbers of bits required by the individual instructions are combined appropriately, all of them fit completely within the 32-bit instruction. Such an architecture may overcome Problems 1 and 2.
When actual programming is considered, however, the technique does little to improve Problems 1 and 2. To maximize efficiency, three parallel-describable individual instructions would always have to be available for every instruction word, which is impossible in practice. If only several transfer instructions are to be executed consecutively, for example, each transfer instruction still occupies a full 32-bit instruction code even though only the SRC and DST fields (10 bits in total) are used. During such a period, the memory usage efficiency drops to about one-third, and Problems 1 and 2 arise again.
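The "about one-third" figure follows from the field widths quoted above. The helper below is a hypothetical sketch that divides the meaningful bits by the bits stored in fixed 32-bit words; it assumes each transfer uses only the 10 bits of SRC and DST and ignores any other field the real encoding might require.

```python
WORD_BITS = 32  # fixed instruction length of the DSP

def packing_efficiency(used_bits_per_word, word_bits=WORD_BITS):
    """Fraction of stored instruction bits that carry meaningful fields."""
    stored = len(used_bits_per_word) * word_bits
    return sum(used_bits_per_word) / stored

transfers_only = [10] * 8  # eight consecutive transfers, SRC + DST = 10 bits each
fully_packed = [32] * 8    # ideal case: operation, transfer and CNT fill every word
print(packing_efficiency(transfers_only))  # 0.3125
print(packing_efficiency(fully_packed))    # 1.0
```

Ten useful bits out of 32 gives 0.3125, which is the "about one-third" efficiency stated above.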
2. As to Problems 3 to 5
In this DSP, the FMPY can only perform multiplication. If any other operation is to be added to the FMPY in the future, an operation selection mechanism must also be incorporated in a suitable manner. In the structure of FIG. 2, the selection could be made by adding a new individual instruction to the OP field. However, this is hardly acceptable because it conflicts with the basic design concept of the FMPY, in which multiplication is executed solely through the individual instructions for data transfer. It would also directly raise Problems 3 to 5. It should be noted that the DSP was not originally designed to overcome Problems 3 to 5.
The relationship between Problems 1 to 5 and the DSP has now been described. When a RISC processor, which is one of the main applications of the present invention, is considered, the DSP raises the following further problems.
[Problem 6] Difficulty in Programming
In the FMPY, the multiplication is performed on registers K and L at all times, and the result of the multiplication started two cycles earlier appears in register M. Since this result is updated every cycle, whether or not it is read from outside, the programmer must pay careful attention to timing when reading a desired result from register M. Moreover, if a high-level language is to be used, it is extremely difficult to develop a compiler that reads the result out at the timing the programmer intends.
Even if the timing problem is overcome, another problem arises when an interruption occurs before the result of the computation has been read out. Because register M keeps being updated, the multiplication result is lost by the time the interruption has been processed. Another conceivable method is to disable interruptions before the multiplication, but this is impractical because (1) the program becomes complicated; (2) whether to disable and re-enable interruptions must be decided at the system level; and (3) the immediacy intended by the FMPY is lost.
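The read-out timing issue of Problem 6 can be seen in a small trace. The function below is a hypothetical model that assumes the K and L values present in cycle t yield a product readable from M in cycle t + 2, and that M is overwritten every cycle; it ignores the output bus and interruption handling. A read that comes late, for instance because an interruption intervened, sees a later product instead.

```python
def m_register_trace(k_l_per_cycle):
    """Value readable from M in each cycle (0 until the first product arrives)."""
    products = [k * l for k, l in k_l_per_cycle]
    return [products[t - 2] if t >= 2 else 0 for t in range(len(products))]

# Operands (2, 3) are transferred in cycle 0, then new operands follow.
trace = m_register_trace([(2, 3), (4, 5), (6, 7), (6, 7)])
print(trace)     # [0, 0, 6, 20]
print(trace[2])  # 6: the only cycle in which 2 x 3 can be read
print(trace[3])  # 20: one cycle late, the result has been overwritten
```

A compiler targeting such a unit would have to schedule every read of M at exactly the right cycle, which is the difficulty described above.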
[Problem 7] Restriction due to Architecture
The DSP has only three registers, K, L and M, which form a set with the FMPY. If a new operation (e.g., addition) is to be added, additional registers for that operation (e.g., K2, L2 and M2) must be added to the DSP. Since a RISC processor requires various operations such as addition, subtraction, multiplication, division and trigonometric functions for scientific and technical computations such as image processing and computer graphics (CG), the hardware scale will increase significantly if a set of 32-bit registers is added for each of these operations.
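A back-of-the-envelope count shows how quickly dedicated register sets add up. The sketch assumes, as a simplification, one set of three 32-bit registers (two inputs like K and L, one output like M) per operation, and counts only register storage bits, not the associated buses or control logic; the operation list mirrors the examples named in the text.

```python
REGISTER_BITS = 32     # width of each register
REGISTERS_PER_SET = 3  # two inputs (like K, L) and one output (like M)

def dedicated_register_bits(operations):
    """Storage bits if every operation gets its own input/output register set."""
    return len(operations) * REGISTERS_PER_SET * REGISTER_BITS

ops = ["addition", "subtraction", "multiplication", "division", "trigonometric"]
print(dedicated_register_bits(ops))                 # 480
print(dedicated_register_bits(["multiplication"]))  # 96 (a K, L, M set alone)
```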
[Problem 8] Hardware Volume
The DSP achieves high speed by transferring data to registers K and L simultaneously, so it uses a dual 32-bit data bus structure at the inputs of these registers. This complicates the hardware and increases the circuit scale. From the viewpoints of compact design and manufacturing cost, it is desirable to improve such an arrangement.