a. Field of the Invention
The invention relates to VLIW (Very Long Instruction Word) processors and in particular to instruction formats for such processors and apparatus for processing such instruction formats.
b. Background of the Invention
VLIW processors have instruction words including a plurality of issue slots. The processors also include a plurality of functional units. Each functional unit is for executing a set of operations of a given type. Each functional unit is RISC-like in that it can begin an operation in each machine cycle in a pipelined manner. Each issue slot is for holding a respective operation. All of the operations in the same instruction word are to be begun in parallel on the functional units in a single cycle of the processor. Thus the VLIW implements fine-grained parallelism.
Thus, typically an instruction on a VLIW machine includes a plurality of operations. On conventional machines, each operation might be referred to as a separate instruction. However, in the VLIW machine, each instruction is composed of operations or no-ops (dummy operations).
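By way of illustration only, the relationship between issue slots, operations and no-ops described above can be sketched as follows. The slot count of five, the `NOP` marker and the function name `make_instruction` are assumptions made for this sketch and are not taken from any particular processor.

```python
# Illustrative sketch only: an instruction word as a fixed number of issue
# slots, each holding one operation or a no-op (dummy operation).
NOP = "nop"  # hypothetical mnemonic for a dummy operation


def make_instruction(ops, num_slots=5):
    """Build one instruction word.

    ops maps an issue slot index to an operation mnemonic; every slot
    that is not mentioned is filled with a no-op.
    """
    for index in ops:
        if index < 0 or index >= num_slots:
            raise ValueError(f"issue slot {index} out of range")
    return [ops.get(index, NOP) for index in range(num_slots)]


if __name__ == "__main__":
    # Two useful operations; the remaining three slots carry no-ops.
    print(make_instruction({0: "add", 3: "load"}))
```

In this sketch every instruction word occupies the full five slots whether or not the slots hold useful operations.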
Like conventional processors, VLIW processors use a memory device, such as a disk drive, to store instruction streams for execution on the processor. A VLIW processor can also use caches, like conventional processors, to store pieces of the instruction streams with high bandwidth accessibility to the processor.
The instruction in the VLIW machine is built up by a programmer or compiler out of these operations. Thus the scheduling in the VLIW processor is software-controlled.
The VLIW processor can be compared with other types of parallel processors such as vector processors and superscalar processors as follows. Vector processors have single operations which are performed on multiple data items simultaneously. Superscalar processors implement fine-grained parallelism, like the VLIW processors, but unlike the VLIW processor, the superscalar processor schedules operations in hardware.
Because of the long instruction words, the VLIW processor has aggravated problems with cache use. In particular, large code size causes cache misses, i.e. situations where needed instructions are not in cache. Large code size also requires a higher main memory bandwidth to transfer code from the main memory to the cache.
Large code size can be aggravated by the following factors.
In order to fine-tune programs for optimal running, techniques such as grafting, loop unrolling, and procedure inlining are used. These techniques increase code size.
Not all issue slots are used in each instruction. A good optimizing compiler can reduce the number of unused issue slots; however, a certain number of no-ops (dummy operations) will continue to be present in most instruction streams.
In order to optimize use of the functional units, operations on conditional branches are typically begun prior to expiration of the branch delay, i.e. before it is known which branch is going to be taken. To resolve which results are actually to be used, guard bits are included with the instructions.
Larger register files, preferably used on newer processor types, require longer addresses, which have to be included with operations.
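The effect of the last three factors on the width of an uncompressed instruction can be estimated with simple arithmetic. The following sketch is illustrative only; the field widths, the number of operands per operation, and the function names are hypothetical values chosen for the example, not figures from any actual processor.

```python
# Illustrative arithmetic only; all field widths below are hypothetical.

def register_address_bits(num_registers):
    """Bits needed to address one register out of num_registers."""
    return (num_registers - 1).bit_length()


def uncompressed_instruction_bits(num_slots, opcode_bits, num_registers,
                                  num_operands=3, guard_bits=1):
    """Width of an instruction in which every issue slot is fully encoded,
    including slots that hold only a no-op."""
    per_operation = (opcode_bits + guard_bits
                     + num_operands * register_address_bits(num_registers))
    return num_slots * per_operation


if __name__ == "__main__":
    for registers in (32, 128):
        bits = uncompressed_instruction_bits(5, 8, registers)
        print(registers, "registers:", bits, "bits per instruction")
```

Under these assumed widths, enlarging the register file from 32 to 128 registers lengthens each register address from 5 to 7 bits and the instruction from 120 to 150 bits, and that width is spent on every issue slot, used or not.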
A scheme for compression of VLIW instructions has been proposed in U.S. Pat. Nos. 5,179,680 and 5,057,837. This compression scheme uses a mask word to eliminate unused operations from an instruction word, but it leaves the remaining operations at full length, so that room remains for further compression of the instruction.
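The general idea of eliminating unused operations by means of a mask word can be sketched as follows. This is a generic illustration of mask-based elimination, not a reproduction of the formats of the cited patents; the bit ordering of the mask and the function names are assumptions.

```python
# Generic sketch of mask-word compression; not the cited patents' format.
NOP = None  # an unused issue slot


def compress_with_mask(slots):
    """Return (mask, kept): bit i of mask is set when slot i holds a real
    operation, and kept lists those operations in slot order."""
    mask = 0
    kept = []
    for index, op in enumerate(slots):
        if op is not NOP:
            mask |= 1 << index
            kept.append(op)
    return mask, kept


def expand_with_mask(mask, kept, num_slots):
    """Rebuild the full instruction word, reinserting no-ops."""
    remaining = iter(kept)
    return [next(remaining) if mask & (1 << index) else NOP
            for index in range(num_slots)]


if __name__ == "__main__":
    word = ["add", NOP, NOP, "load", NOP]
    mask, kept = compress_with_mask(word)
    print(bin(mask), kept)
    print(expand_with_mask(mask, kept, len(word)) == word)
```

In this sketch only the no-ops are removed; each operation that is kept still occupies its full uncompressed width.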
It is an object of the invention to reduce code size in a VLIW processor.
This object is met by using a compression scheme in which, within an instruction having a plurality of operations, each operation is compressed. Compression includes assigning a compressed operation length to the operation. The compression includes choosing one of a plurality of finite lengths. The finite lengths include at least one non-zero length. Which length is chosen depends on a feature of the operation. Branch targets are not compressed. For each instruction, information about compression format is stored in a previous instruction.
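A minimal sketch of such a scheme, under stated assumptions, is given below. The set of lengths (0, 16, 24 and 32 bits), the feature used to select a length (the presence and size of an immediate operand), and all names are hypothetical and chosen only for illustration. The sketch does not model uncompressed branch targets, and it simply assumes that the format of the first instruction is known in advance.

```python
# Hypothetical sketch: per-operation length selection, with each
# instruction carrying the format of the instruction that follows it.
LENGTHS = (0, 16, 24, 32)  # assumed finite set of compressed lengths, in bits


def choose_length(op):
    """Pick a compressed length from a feature of the operation.

    op is None for an unused slot, else a dict with an optional 'imm' key.
    """
    if op is None:
        return 0                  # unused slot takes no space
    imm = op.get("imm")
    if imm is None:
        return 16                 # no immediate operand
    if -128 <= imm < 128:
        return 24                 # short immediate
    return 32                     # long immediate


def format_field(instruction):
    """One length code per issue slot (an index into LENGTHS)."""
    return [LENGTHS.index(choose_length(op)) for op in instruction]


def attach_formats(stream):
    """Store with each instruction the format of the next instruction."""
    result = []
    for index, instruction in enumerate(stream):
        has_next = index + 1 < len(stream)
        next_format = format_field(stream[index + 1]) if has_next else None
        result.append({"ops": instruction, "next_format": next_format})
    return result


if __name__ == "__main__":
    stream = [
        [{"name": "add"}, None],
        [{"name": "addi", "imm": 5}, {"name": "li", "imm": 1000}],
    ]
    for entry in attach_formats(stream):
        print(entry["next_format"])
```

Storing the format one instruction ahead means that, in such a sketch, a decoder would already know the lengths of the operations in an instruction by the time that instruction arrives.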
Further information about technical background to this application
The following prior applications are incorporated herein by reference:
U.S. application Ser. No. 07/998,080, filed Dec. 29, 1992 (PHA 21,777), now abandoned, which shows a VLIW processor architecture for implementing fine-grained parallelism;
U.S. application Ser. No. 07/142,648 filed Oct. 25, 1993 (PHA 1205), now U.S. Pat. No. 5,450,556, which shows use of guard bits; and
U.S. application Ser. No. 08/366,958 filed Dec. 30, 1994 (PHA 21,932), now U.S. Pat. No. 6,370,623, which shows a register file for use with VLIW architecture.
Bibliography of program compression techniques:
J. Wang et al., “The Feasibility of Using Compression to Increase Memory System Performance”, Proc. 2nd Int. Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, pp. 107-113 (Durham, N.C., USA 1994);
H. Schröder et al., “Program compression on the instruction systolic array”, Parallel Computing, vol. 17, no. 2-3, June 1991, pp. 207-219;
A. Wolfe et al., “Executing Compressed Programs on an Embedded RISC Architecture”, J. Computer and Software Engineering, vol. 2, no. 3, pp. 315-27 (1994);
M. Kozuch et al., “Compression of Embedded Systems Programs”, Proc. 1994 IEEE Int. Conf. on Computer Design: VLSI in Computers and Processors (Oct. 10-12, 1994, Cambridge, Mass., USA), pp. 270-7.
Typically the approach adopted in these documents has been to attempt to compress a program as a whole or blocks of program code. Moreover, these approaches typically require some table of instruction locations or of locations of blocks of instructions.