1. Field of the Invention
The present invention relates to a program execution method and a program execution device using said method. In particular, the present invention relates to a program execution method and device using a general-purpose register for executing a program.
2. Description of the Related Arts
Most program execution devices, such as microprocessors, are control-driven computers which sequentially fetch, decode and execute instructions stored in a memory.
FIG. 4 shows a configuration of a conventional RISC (Reduced Instruction Set Computer) microprocessor which comprises a memory 1, an instruction fetch unit 2, an instruction decoder 3, a group of registers 4, a computing unit 5 and a data access unit 6. In this configuration, instructions such as computing instructions and data transfer (load/store) instructions are fetched from the memory 1 and decoded; the computing unit 5 then carries out the arithmetic computation indicated by the instruction code.
Conventionally, in addition to an arithmetic and logic unit (ALU) for carrying out basic operations such as addition, subtraction, logical OR and logical AND, the computing unit 5 shown in FIG. 4 further comprises, for instance, a multiplier connected in parallel thereto for carrying out multiplication at high speed. A computing unit in which an ALU for carrying out integer computations and an ALU for carrying out floating-point computations are provided in parallel has also been used. When processing data using a computing unit 5 comprising these types of computing devices, the general method has been to provide a computing instruction corresponding to each computation carried out by the computing device. This computing instruction is described in the program and read into the processor. In other words, in addition to basic computations such as ADD (Integer Addition) and OR (Logical OR), expanded computing instructions such as MUL (Multiplication), MAC (Multiplication and Addition Calculation) and FSUB (Floating-Point Subtraction) can be described in a program and executed.
However, in this type of processor, a designated computation cannot be carried out without first executing an instruction which prepares the data required for the computation by storing the data in a computation data storage location (such as a register), and then executing a separate instruction which starts and carries out the computation. For example, shown below is a program envisaged when carrying out repeated multiplication of operand data stored in a memory using a RISC processor. Here, since 2 load instructions are required in order to transfer data to 2 registers (R2, R3), at least 3 instructions must be executed for a single multiplication.
LD R2 (R0)    (R2 ← memory (R0))
LD R3 (R1)    (R3 ← memory (R1))
MUL R2 R3     (R2 ← R2 × R3)
Consequently, even if a high-speed computing device capable of carrying out a computation in 1 instruction cycle is installed, the overall processing rate remains at 3 instruction cycles per computation, thereby hindering performance improvement.
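The instruction count described above can be illustrated with a minimal sketch. It assumes a toy model in which every instruction takes 1 instruction cycle; the function name `run_risc_multiply`, the dictionary-based register file and the uncounted address setup are hypothetical and serve only to illustrate the listing.

```python
def run_risc_multiply(memory, address_pairs):
    """Run the LD, LD, MUL sequence once per operand pair.

    Returns the list of products and the number of instructions executed.
    Assumption: each instruction costs 1 cycle; setting up R0 and R1
    is not counted, since the listing above does not show it.
    """
    regs = {"R0": 0, "R1": 0, "R2": 0, "R3": 0}
    executed = 0
    products = []
    for addr_a, addr_b in address_pairs:
        regs["R0"], regs["R1"] = addr_a, addr_b   # address setup (not counted)
        regs["R2"] = memory[regs["R0"]]           # LD R2 (R0)
        executed += 1
        regs["R3"] = memory[regs["R1"]]           # LD R3 (R1)
        executed += 1
        regs["R2"] = regs["R2"] * regs["R3"]      # MUL R2 R3
        executed += 1
        products.append(regs["R2"])
    return products, executed


if __name__ == "__main__":
    memory = [2, 3, 4, 5]
    products, executed = run_risc_multiply(memory, [(0, 1), (2, 3)])
    print(products, executed, executed / len(products))
```

In this model the ratio of executed instructions to products is always 3, regardless of how fast the multiplier itself is.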
The μPD77240 (Trademark) DSP manufactured by NEC Corp. is an example of technology intended to solve the above problem. The features of this DSP as detailed on page 66 of "μPD77240 User Manual" (September 1991) are as follows.
1. In addition to a conventional ALU, the DSP has a circuit for carrying out floating-point multiplication (FMPY). The ALU starts computation in response to an explicit computation instruction (such as ADD or SUB), whereas the FMPY automatically starts computation on transferred data in response to a data transfer instruction.
2. In the FMPY, the data transferred to the multiplication input registers K and L are multiplied in each instruction cycle. The multiplication result is output from the output bus of the FMPY one instruction cycle later and written in the multiplication output register M two instruction cycles later.
This DSP attempts to solve the problems mentioned above by starting multiplication in response to a data transfer instruction alone, without a separate computation instruction.
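The transfer-triggered behavior described in features 1 and 2 can be sketched as a small cycle-level model. This is an illustration only and not the actual μPD77240 design: the class name `TransferTriggeredMultiplier`, the `cycle` method and the exact end-of-cycle update order are assumptions chosen so that a product appears on the output bus one cycle after the transfer and in register M two cycles after it.

```python
class TransferTriggeredMultiplier:
    """Toy model of a multiplier started by data transfers to K and L."""

    def __init__(self):
        self.k = 0.0      # multiplication input register K
        self.l = 0.0      # multiplication input register L
        self.bus = None   # value on the multiplier output bus
        self.m = None     # multiplication output register M

    def cycle(self, k=None, l=None):
        """Advance one instruction cycle, optionally transferring to K and/or L.

        No computation instruction is issued: K * L is formed every cycle.
        Stages are updated oldest first so each value advances one stage.
        """
        self.m = self.bus            # bus value is latched into M
        self.bus = self.k * self.l   # product of the previously held inputs
        if k is not None:
            self.k = k               # data transfer to K
        if l is not None:
            self.l = l               # data transfer to L


if __name__ == "__main__":
    fmpy = TransferTriggeredMultiplier()
    fmpy.cycle(k=2.0, l=3.0)   # transfer cycle
    fmpy.cycle()               # one cycle later: product on the bus
    print(fmpy.bus)
    fmpy.cycle()               # two cycles later: product in M
    print(fmpy.m)
```

If a new operand pair is transferred in every cycle, this model delivers one product per cycle after the initial two-cycle latency, in contrast to the 3 instructions per product of the load/load/multiply sequence.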
However, in this DSP, the FMPY is only capable of floating-point multiplication. When other computations are desired, such as high-speed division, it is necessary to provide a dividing circuit having the same configuration as the FMPY. This dividing circuit also requires 2 input registers and 1 output register, consequently increasing the overall hardware size. With a RISC processor, all types of arithmetic computations, such as addition, subtraction, multiplication, division and trigonometric functions, are required in order to carry out the scientific and technical calculations used in fields such as image processing and computer graphics (CG). If additional 32-bit registers have to be provided for each computation, large-scale hardware will inevitably be required.
One method of solving this problem is to reduce the total number of registers by sharing the input registers K and L among multiple computations. In this case, however, it is necessary to specify which computation is to be carried out on the input data. Such a specification cannot be made using a normal data transfer instruction, and specifying the computation explicitly would be contrary to the design principle of the FMPY, in which computation is started by data transfer alone.
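The hardware trade-off discussed above can be made concrete with a short calculation. The figures of 2 input registers, 1 output register and 32 bits per register come from the description above; the number of computing circuits (5) and the assumption that each circuit keeps its own output register when K and L are shared are illustrative only.

```python
INPUT_REGISTERS_PER_UNIT = 2    # e.g. K and L
OUTPUT_REGISTERS_PER_UNIT = 1   # e.g. M
REGISTER_BITS = 32


def dedicated_register_bits(num_units):
    """Register bits when every computing circuit has its own K, L and M."""
    per_unit = INPUT_REGISTERS_PER_UNIT + OUTPUT_REGISTERS_PER_UNIT
    return num_units * per_unit * REGISTER_BITS


def shared_input_register_bits(num_units):
    """Register bits when all circuits share one pair of input registers.

    Assumption: each circuit still keeps its own output register.
    """
    registers = INPUT_REGISTERS_PER_UNIT + num_units * OUTPUT_REGISTERS_PER_UNIT
    return registers * REGISTER_BITS


if __name__ == "__main__":
    units = 5  # hypothetical: add, subtract, multiply, divide, trigonometric
    print(dedicated_register_bits(units), shared_input_register_bits(units))
```

Sharing the input registers reduces the register cost, but a transfer to K or L then no longer identifies which circuit should operate on the data, which is the difficulty noted above.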