This application is based on Application No. 10-083369 filed in Japan, the content of which is hereby incorporated by reference.
1. Field of the Invention
The present invention relates to a processor that executes a plurality of instructions in parallel and to a program conversion apparatus for the same.
2. Description of the Related Art
In recent years, VLIW (Very Long Instruction Word) processors have been developed with the aim of achieving high-speed processing. These processors use long-word instructions composed of a plurality of instructions to execute a number of instructions in parallel.
Japanese Laid-Open Patent No. 5-11979 discloses an example of this kind of technique. FIG. 1 is a block diagram of a processor disclosed in this document.
The processor of FIG. 1 includes a register file 1, an external memory 2, an instruction register 3 having four instruction slots, an input switching circuit 4, a transfer unit 5, an integer calculation unit 6, a transfer unit 7, an integer calculation unit 8, an integer calculation unit 9, a floating-point unit 10, a branch unit 11, an output switching circuit 12 and a register file or external memory 13.
The instruction register 3 stores four instructions, which make up one long-word instruction, in its four internal instruction slots (hereafter referred to as 'slots'). Here, the instruction in each of the first and second slots is either an integer calculating instruction or a data transfer instruction (also referred to as a load/store instruction). The instruction in the third slot is a floating-point calculating instruction or an integer calculating instruction and that in the fourth slot is a branch instruction. The arrangement of instructions in one long-word instruction is performed in advance by a compiler.
The transfer unit 5 and the integer calculation unit 6 are aligned with the first slot, and execute the data transfer and integer calculating instructions respectively.
The transfer unit 7 and the integer calculation unit 8 are aligned with the second slot, and execute the data transfer and integer calculating instructions respectively.
The integer calculation unit 9 and the floating-point unit 10 are aligned with the third slot, and execute the integer calculation and floating-point instructions respectively.
The branch unit 11 is aligned with the fourth slot and executes branch instructions.
Here, the transfer units 5 and 7, the integer calculation units 6, 8 and 9, the floating-point unit 10 and the branch unit 11 are generally referred to as functional units.
The input switching circuit 4 inputs source data read from the register file 1 or the external memory 2 into the required functional units.
The output switching circuit 12 outputs the results of calculations by the utilized functional units to the register file or external memory 13.
A processor constructed as above decodes and executes instructions stored in the four slots in parallel. Assume, for example, that an 'add' instruction for adding register data is stored in the first slot. The processor inputs two pieces of register data from the register file 1 into the integer calculation unit 6 via the input switching circuit 4. The two pieces of register data are then added by the integer calculation unit 6 and the result is stored in the register file 13 via the output switching circuit 12. Instructions in the second, third and fourth slots are also decoded and executed in parallel with this instruction.
However, in this kind of conventional processor certain functional units are left idling when instructions are executed. When an integer calculating instruction is executed by the third slot, for example, the floating-point unit is left idling.
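The idling described above can be illustrated with a minimal sketch. The slot-to-unit layout follows the description of FIG. 1; the dictionary encoding of a long-word instruction and the function name idle_units are assumptions made only for this illustration.

```python
# Functional units aligned with each slot, as described for FIG. 1.
SLOT_UNITS = {
    1: {"transfer": "transfer unit 5", "integer": "integer calculation unit 6"},
    2: {"transfer": "transfer unit 7", "integer": "integer calculation unit 8"},
    3: {"integer": "integer calculation unit 9", "float": "floating-point unit 10"},
    4: {"branch": "branch unit 11"},
}

def idle_units(long_word):
    """Return the functional units left unused by one long-word instruction.

    long_word maps a slot number to the kind of instruction it holds
    (a hypothetical encoding used only in this sketch).
    """
    idle = []
    for slot, units in SLOT_UNITS.items():
        kind = long_word.get(slot)
        if kind is not None and kind not in units:
            raise ValueError(f"slot {slot} cannot execute a {kind!r} instruction")
        # Every unit of the slot other than the one in use is idle.
        idle.extend(name for k, name in units.items() if k != kind)
    return idle

example = {1: "integer", 2: "transfer", 3: "integer", 4: "branch"}
print(idle_units(example))
# ['transfer unit 5', 'integer calculation unit 8', 'floating-point unit 10']
```

Even when all four slots hold an instruction, three of the seven functional units are unused in this example.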
An object of the present invention is to provide a processor that utilizes idling functional units, thus improving processing performance.
A second object is to provide a processor that executes at a high speed the product-sum operations frequently used in current multimedia processing.
A processor that achieves the above objects includes first and second decoding units, first and second executing units corresponding to the first and second decoding units, and a selecting unit. The first and second decoding units decode instructions and generate decode results denoting their content. If the first decoding unit decodes a special instruction, it generates first-part and second-part decode results denoting a first-type calculation and a second-type calculation, respectively. The executing units execute instructions in parallel, each according to a decode result from the corresponding decoding unit. If the first decoding unit decodes the special instruction, the selecting unit selects the second-part decode result, and if the first decoding unit decodes an instruction other than the special instruction, the selecting unit selects the decode result from the second decoding unit.
The second executing unit includes a first functional unit, which executes instructions according to the decode result selected by the selecting unit, and a second functional unit, which executes instructions according to the decode result of the second decoding unit. If the special instruction is decoded, the first executing unit performs a first-type calculation, the first functional unit performs a second-type calculation and the second functional unit executes an instruction decoded by the second decoding unit.
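The routing of decode results described above may be sketched as follows. This is a simplified model only: the names Decode1, select and dispatch are hypothetical, decode results are represented as plain strings, and whether a given functional unit actually acts on the decode result it receives is outside the sketch.

```python
from typing import NamedTuple, Optional

class Decode1(NamedTuple):
    """Output of the first decoding unit (hypothetical representation)."""
    first_part: str             # calculation for the first executing unit
    second_part: Optional[str]  # present only when the special instruction is decoded

def select(decode1: Decode1, decode2: str) -> str:
    """Selecting unit: choose the decode result that drives the first functional unit."""
    if decode1.second_part is not None:
        return decode1.second_part  # special instruction decoded
    return decode2                  # any other instruction

def dispatch(decode1: Decode1, decode2: str) -> dict:
    """Show which decode result reaches each unit for one long-word instruction."""
    return {
        "first executing unit": decode1.first_part,
        "first functional unit": select(decode1, decode2),
        "second functional unit": decode2,
    }

# Special instruction in the first slot, multiply instruction in the second slot.
print(dispatch(Decode1("add", "sub"), "mul"))
```

With an ordinary instruction in the first slot, second_part is None and the first functional unit receives the decode result of the second decoding unit instead.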
Here, the special instruction may include an operation code denoting the first-type calculation and the second-type calculation, and first and second operands. The first executing unit performs the first-type calculation on the first and second operands, and stores a calculation result in the first operand. Meanwhile, the second executing unit performs the second-type calculation on the first and second operands, and stores a calculation result in the second operand.
This structure enables a first-type calculation and a second-type calculation to be executed by the first and second executing units according to a special instruction in one instruction slot. This allows idling functional units to be used, thus increasing processing performance.
Here, the first executing unit may include an adder/subtracter, the first functional unit be an adder/subtracter and the special instruction denote addition as the first-type calculation and subtraction as the second-type calculation.
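The effect of such a special instruction on its two operands can be sketched as below. The function name exec_special and the register names are hypothetical. The sketch assumes that both calculations read the original operand values before either result is written, and that the subtraction is computed as first operand minus second operand; the text above does not fix the order of subtraction.

```python
def exec_special(regs, op1, op2):
    """Model one special instruction: addition into op1, subtraction into op2."""
    a, b = regs[op1], regs[op2]  # both units read the original operand values
    regs[op1] = a + b            # first-type calculation (addition), stored in the first operand
    regs[op2] = a - b            # second-type calculation (subtraction, assumed a - b), stored in the second operand
    return regs

regs = exec_special({"R0": 7, "R1": 3}, "R0", "R1")
print(regs)  # {'R0': 10, 'R1': 4}
```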
This structure enables an instruction other than the special instruction to be executed in parallel with the addition and subtraction denoted by the special instruction, so that the processing performance of the processor can be further increased.
Here, the second functional unit may be a multiplier and the instruction decoded by the second decoding unit may be a multiply instruction.
This structure enables addition, subtraction and multiplication to be executed in parallel, so that product-sum calculations extensively used in modern multimedia processing can be executed efficiently.
Furthermore, a program conversion apparatus that achieves the above objects is one that changes a source program to an object program for a target processor executing long-word instructions. This program conversion apparatus includes a retrieving unit, a generating unit and an arranging unit. The retrieving unit retrieves a pair of instructions denoting a first-type calculation of two variables and a second-type calculation of the same two variables from a source program. The generating unit generates a special instruction corresponding to the retrieved pair. This special instruction includes an operation code denoting the first-type calculation and the second-type calculation, and two operands representing the two variables. The arranging unit arranges the generated special instruction into a long-word instruction.
This structure generates an object program, composed of a plurality of long-word instructions. Special instructions supported by the target processor are embedded in certain of the plurality of long-word instructions.
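The retrieving and generating steps can be sketched as follows, under several assumptions: the source program is modelled as a list of three-address instructions, the names Instr, Special, find_pairs and generate_special are hypothetical, and the operation code "addsub" is an invented mnemonic. Only an addition followed by a subtraction on the same two variables is detected, and the mapping of the original destinations onto the two operands of the special instruction is left out.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dst: str
    src1: str
    src2: str

class Special(NamedTuple):
    opcode: str
    op1: str
    op2: str

def find_pairs(program):
    """Retrieving step: index pairs (i, j) of an add and a later sub on the same two variables."""
    pairs, used_subs = [], set()
    for i, ins in enumerate(program):
        if ins.op != "add" or ins.dst in (ins.src1, ins.src2):
            continue
        for j in range(i + 1, len(program)):
            other = program[j]
            if (j not in used_subs and other.op == "sub"
                    and (other.src1, other.src2) == (ins.src1, ins.src2)):
                pairs.append((i, j))
                used_subs.add(j)
                break
            if other.dst in (ins.src1, ins.src2):
                break  # a source variable is redefined, so the values no longer match
    return pairs

def generate_special(program, pair):
    """Generating step: one special instruction carrying both calculations and both variables."""
    add = program[pair[0]]
    return Special("addsub", add.src1, add.src2)

program = [
    Instr("add", "t1", "a", "b"),
    Instr("mul", "t3", "c", "d"),
    Instr("sub", "t2", "a", "b"),
]
pairs = find_pairs(program)
print(pairs, generate_special(program, pairs[0]))
```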
Here, the first instruction denotes addition, and the second instruction denotes subtraction. The target processor includes a first instruction execution unit having a first calculation unit, and a second instruction execution unit having a second calculation unit and a multiplication unit. The arranging unit retrieves a multiply instruction that has no dependency relationship with the special instruction generated by the generating unit, and arranges the special instruction and the multiply instruction in one long-word instruction.
This structure enables addition, subtraction and multiplication to be performed in parallel by placing two instructions (a special instruction and a multiply instruction) side by side in one long-word instruction. This makes the program conversion apparatus suitable as a compiler for programs that perform product-sum calculations.
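The arranging step can be sketched as below. Instructions are modelled as plain tuples, the mnemonic "addsub" and the "nop" filler are assumptions of this sketch, and the function names independent and arrange are hypothetical. The dependency test is deliberately conservative: a multiply instruction is accepted only if it neither reads nor writes either operand of the special instruction. Dependencies between the multiply instruction and other instructions of the program are not checked here.

```python
def independent(special, mul):
    """True if the multiply touches neither operand of the special instruction."""
    touched = {special[1], special[2]}  # the special instruction reads and writes both operands
    return touched.isdisjoint(mul[1:])  # mul = ("mul", dst, src1, src2)

def arrange(special, candidates):
    """Pair the special instruction with the first independent multiply instruction."""
    for ins in candidates:
        if ins[0] == "mul" and independent(special, ins):
            return (special, ins)
    return (special, ("nop",))  # no suitable multiply found (the filler is an assumption)

special = ("addsub", "r0", "r1")
candidates = [("mul", "r2", "r0", "r3"), ("mul", "r4", "r5", "r6")]
print(arrange(special, candidates))
# (('addsub', 'r0', 'r1'), ('mul', 'r4', 'r5', 'r6'))
```

The first candidate is rejected because it reads r0, which the special instruction overwrites; the second shares no register with the special instruction and is placed in the same long-word instruction.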