1. Field of the Invention
The present invention relates to a processor for expanding and executing VLIW (very long instruction word) instructions (hereinafter also called long instructions).
2. Description of the Related Art
The performance of a computer is determined by the period of one machine cycle and by the CPI (cycles per instruction), which indicates the number of machine cycles required for executing one instruction. To improve computer performance, it is essential to shorten both the machine cycle period and the CPI. One scheme for shortening the CPI executes a number of instructions in parallel during one machine cycle. A typical example of this scheme is the VLIW scheme (refer to David A. Patterson and John L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, Inc., 1990).
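The relationship between the machine cycle period, the CPI, and the execution time can be illustrated by the following minimal sketch. The instruction count, the function name `execution_time_ns`, and all numerical values are assumptions introduced only for illustration and do not appear in the cited literature.

```python
def execution_time_ns(instruction_count, cpi, cycle_ns):
    """Execution time = instruction count x cycles per instruction x cycle period."""
    return instruction_count * cpi * cycle_ns

# Hypothetical figures: the same program and cycle period, different CPI.
baseline = execution_time_ns(1_000_000, 2.0, 10.0)   # one instruction every two cycles
parallel = execution_time_ns(1_000_000, 0.5, 10.0)   # two instructions per cycle on average
speedup = baseline / parallel
print(speedup)
```

With the cycle period held constant, reducing the CPI by a given factor shortens the execution time by the same factor.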
In the VLIW scheme, a long instruction containing a plurality of instruction fields is used, and each instruction field controls a function unit such as a calculation unit or a memory unit. One instruction can therefore control a plurality of function units. In order to simplify the instruction issuing circuit, each instruction field of a VLIW instruction is assigned a particular operation or instruction (hereinafter called a small operation or small instruction). The plurality of small instructions of one VLIW instruction can control, at the same time, a plurality of types of function units assigned to the small instructions. Each small instruction is constituted of an operation code (hereinafter called an OP code) representing the type of arithmetic operation and an operand representing the subject of the arithmetic operation. In the VLIW scheme, when a program is compiled into VLIW instructions, the dependency relationships between the small instructions of the program are taken into consideration to schedule the execution order of the small instructions and to distribute them into a plurality of VLIW instructions, so that each VLIW instruction contains as many concurrently executable small instructions as possible. As a result, a number of small instructions in each VLIW instruction can be executed in parallel, and a computer executing such instructions does not require a complicated instruction issuing circuit. It is therefore easy to shorten the machine cycle period, to increase the number of instructions issued at the same time (hereinafter called the instruction parallel degree), and to reduce the number of cycles per instruction (CPI). This technique has drawn attention as a means of improving the performance of a computer.
In the VLIW scheme, each VLIW instruction contains instruction fields corresponding to the function units. If there is a function unit not used by a VLIW instruction, the instruction field corresponding to this function unit is assigned a NOP (no operation) instruction. Depending on the kind of program, a large number of NOP instructions are embedded in a large number of VLIW instructions. As NOP instructions are embedded in many instruction fields of the VLIW instructions, the number of VLIW instructions constituting the program increases. Therefore, a large amount of main storage and instruction cache capacity is consumed in storing these VLIW instructions.
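The NOP padding described above can be illustrated by the following minimal sketch. The four-field machine, the mnemonics, and the name `nop_fraction` are hypothetical and are introduced only for illustration.

```python
NOP = "NOP"

# Hypothetical machine with four instruction fields: (ALU0, ALU1, MEM, BRANCH).
# Each tuple is one VLIW instruction; unused fields are padded with NOP.
program = [
    ("add r1,r2,r3", NOP, "ld r4,0(r5)", NOP),
    (NOP, NOP, NOP, NOP),
    (NOP, NOP, NOP, NOP),
    ("mul r6,r1,r4", NOP, NOP, NOP),
    (NOP, "sub r7,r6,r1", "st r7,0(r5)", "br loop"),
]

def nop_fraction(words):
    """Return the fraction of instruction fields occupied by NOP instructions."""
    slots = [slot for word in words for slot in word]
    return slots.count(NOP) / len(slots)

print(nop_fraction(program))
```

In this assumed program, 14 of the 20 instruction fields hold NOP instructions, and all of them occupy storage.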
Several proposals have been presented in order to reduce NOP instructions. For example, in the study reports of Information Processing Society of Japan, Vol. 93-ARC-102, pp. 17-24 (hereinafter called the first reference document), one or a plurality of consecutive invalid VLIW instructions, each containing only NOP instructions, are removed. To this end, a field for storing the number of delay cycles corresponding to the one or plurality of invalid VLIW instructions is provided in the valid VLIW instruction to be executed immediately before them. After the preceding valid VLIW instruction is executed, the succeeding valid VLIW instruction is executed after the lapse of the delay cycles. Since this technique reduces the number of VLIW instructions, it can be considered a method of compressing VLIW instructions in time. This conventional technique also proposes a method of improving the use efficiency of function units by a multi-thread process which, at a series of invalid VLIW instructions having all fields filled with NOP instructions, switches to another instruction series.
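The delay-cycle scheme of the first reference document can be sketched as follows. This is a simplified model under stated assumptions, not the encoding of the reference itself: the four-field program, the mnemonics, and the names `compress_in_time` and `expand` are hypothetical, and the delay field is modeled as an integer paired with each retained VLIW instruction.

```python
NOP = "NOP"

program = [
    ("add r1,r2,r3", NOP, "ld r4,0(r5)", NOP),
    (NOP, NOP, NOP, NOP),
    (NOP, NOP, NOP, NOP),
    ("mul r6,r1,r4", NOP, NOP, NOP),
    (NOP, "sub r7,r6,r1", "st r7,0(r5)", "br loop"),
]

def compress_in_time(words):
    """Remove all-NOP words and count them in the delay field of the preceding word."""
    compressed = []  # list of (fields, delay_cycles)
    for word in words:
        if compressed and all(slot == NOP for slot in word):
            fields, delay = compressed[-1]
            compressed[-1] = (fields, delay + 1)
        else:
            # A leading all-NOP word has no predecessor and is kept as-is.
            compressed.append((word, 0))
    return compressed

def expand(compressed, width):
    """Re-insert the removed all-NOP words after each valid word."""
    words = []
    for fields, delay in compressed:
        words.append(fields)
        words.extend([(NOP,) * width] * delay)
    return words

compressed = compress_in_time(program)
print(len(program), "->", len(compressed))
```

In this assumed program the five VLIW instructions are reduced to three, while NOP instructions inside the retained VLIW instructions remain untouched.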
Another method of compressing VLIW instructions in time has been proposed in the study reports of Information Processing Society of Japan, Vol. 94-ARC-107, pp. 113-120 (hereinafter called the second reference document) and in the papers of "Parallel Processing Symposium JSPP '92", pp. 265-272 (hereinafter called the third reference document). In the disclosed technique, if a small instruction in a VLIW instruction is a NOP instruction, this NOP instruction itself is deleted. More specifically, each valid small instruction of a VLIW instruction is provided with a field for storing a number of NOP instructions (hereinafter also called a NOP number). The number of NOP instructions to be executed by the function unit assigned to the valid small instruction is stored in this field, and the one or plurality of NOP instructions contained, before the valid small instruction, in one or a plurality of consecutive VLIW instructions are deleted. Namely, after the valid small instruction contained in a preceding VLIW instruction is executed by the function unit, the execution of the next valid small instruction is delayed by the number of cycles determined by the NOP number. With this method, preceding NOP instructions can be deleted for each instruction field. Therefore, the total numbers of NOP instructions and VLIW instructions can be reduced more than with the technique of the first reference document. Furthermore, since only the number of deleted NOP instructions is stored in place of the NOP instructions themselves, the length of each VLIW instruction does not increase much. Therefore, the size of a program constituted of such VLIW instructions can be reduced considerably as compared with a system not adopting this technique.
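The per-field NOP number of the second and third reference documents can be sketched as follows. This is a simplified model under stated assumptions: the four-field program, the mnemonics, and the names `compress_field`, `expand_field`, and `compressed_word_count` are hypothetical, each instruction field is treated as an independent stream, and NOP instructions trailing the last valid small instruction of a field are simply dropped and restored by padding.

```python
NOP = "NOP"

program = [
    ("add r1,r2,r3", NOP, "ld r4,0(r5)", NOP),
    (NOP, NOP, NOP, NOP),
    (NOP, NOP, NOP, NOP),
    ("mul r6,r1,r4", NOP, NOP, NOP),
    (NOP, "sub r7,r6,r1", "st r7,0(r5)", "br loop"),
]

def compress_field(column):
    """Replace each run of NOPs by a NOP number attached to the next valid small instruction."""
    stream, pending = [], 0
    for op in column:
        if op == NOP:
            pending += 1
        else:
            stream.append((pending, op))
            pending = 0
    return stream  # trailing NOPs are not recorded in this sketch

def expand_field(stream, length):
    """Rebuild one field: delay by the NOP number, then issue the small instruction."""
    column = []
    for nop_number, op in stream:
        column.extend([NOP] * nop_number)
        column.append(op)
    column.extend([NOP] * (length - len(column)))  # pad dropped trailing NOPs
    return column

def compressed_word_count(streams):
    """VLIW instructions needed when the per-field streams are packed side by side."""
    return max((len(s) for s in streams), default=0)

streams = [compress_field(col) for col in zip(*program)]
restored = list(zip(*[expand_field(s, len(program)) for s in streams]))
print(len(program), "->", compressed_word_count(streams))
```

In this assumed program the five VLIW instructions pack into two, because NOP instructions are removed field by field rather than only in VLIW instructions that are entirely empty.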
According to the technique disclosed in JP-A-7-105003 (hereinafter called the fourth reference document), a series of VLIW instructions containing other instructions in addition to NOP instructions is compressed and stored in a main storage or the like. The series of stored VLIW instructions is expanded and executed. Specifically, VLIW instructions in a program having different structures are each replaced by a code sequence with a variable length, and VLIW instructions having the same structure are each replaced by a predetermined code sequence. The compressed program made of a plurality of such code sequences is stored in a main storage. A plurality of non-compressed VLIW instructions corresponding to the plurality of code sequences of the compressed program are stored in an instruction decode memory provided separately from the main storage. In executing the compressed program, the non-compressed VLIW instruction corresponding to each code sequence of the compressed program is read from the instruction decode memory and executed. In determining the code sequence, VLIW instructions are regarded as having the same structure only if they have the same OP code and the same operand value in each small instruction. With this technique, each VLIW instruction is replaced by a code sequence having a shorter length, so that this technique can be considered a method of compressing VLIW instructions in space.
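The use of an instruction decode memory can be sketched as follows. This is a simplified model under stated assumptions, not the encoding of the fourth reference document: the code sequences are modeled as small integer indices rather than variable-length bit strings, and the program, the mnemonics, and the names `build_decode_memory` and `fetch_and_expand` are hypothetical.

```python
def build_decode_memory(words):
    """Assign one code per distinct VLIW instruction and collect the decode memory."""
    decode_memory = []  # index (code) -> non-compressed VLIW instruction
    code_of = {}        # non-compressed VLIW instruction -> code
    codes = []          # compressed program, as stored in main storage
    for word in words:
        if word not in code_of:
            code_of[word] = len(decode_memory)
            decode_memory.append(word)
        codes.append(code_of[word])
    return codes, decode_memory

def fetch_and_expand(codes, decode_memory):
    """Read the non-compressed VLIW instruction for each code of the compressed program."""
    return [decode_memory[code] for code in codes]

# Hypothetical program: the same VLIW instruction occurs three times.
loop_body = ("add r1,r1,r2", "ld r3,0(r1)", "NOP", "NOP")
loop_exit = ("NOP", "NOP", "st r3,0(r4)", "br done")
program = [loop_body, loop_body, loop_body, loop_exit]

codes, decode_memory = build_decode_memory(program)
print(codes, len(decode_memory))
```

Only VLIW instructions that are identical in every OP code and operand share a code here, which corresponds to the same-structure condition described above.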
As compared with the conventional techniques described in the first to third reference documents, the conventional technique described in the fourth reference document is expected to generate a smaller program, because a series of VLIW instructions containing other instructions in addition to NOP instructions is compressed. However, the technique described in the fourth reference document discriminates the structure of a VLIW instruction by considering even the operand field of each small instruction, which is used for designating a register and the like. Therefore, the number of VLIW instructions in a program judged as having the same structure is not very large, and the compression factor may consequently not be very high.
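The effect of including the operand field in the comparison can be illustrated by counting distinct VLIW instructions in a small example. The two-field program, the mnemonics, and the helper `opcode_only` are hypothetical and serve only to illustrate the limitation stated above; the OP-code-only count is shown purely for comparison.

```python
def opcode_only(word):
    """Keep only the OP code of each small instruction, discarding the operands."""
    return tuple(slot.split()[0] for slot in word)

# Hypothetical program: three words share OP codes, but one differs in its registers.
words = [
    ("add r1,r2,r3", "ld r4,0(r5)"),
    ("add r6,r7,r8", "ld r9,0(r5)"),
    ("add r1,r2,r3", "ld r4,0(r5)"),
    ("sub r1,r2,r3", "NOP"),
]

distinct_with_operands = len(set(words))
distinct_by_opcode = len({opcode_only(w) for w in words})
print(distinct_with_operands, distinct_by_opcode)
```

When operands are compared, the second word does not match the first even though both perform the same operations, so fewer VLIW instructions are judged as having the same structure.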