VLIW, Very Long Instruction Word, processors are a class of parallel processors that perform multiple operations synchronously by executing very long instruction words, each of which consists of multiple fields, one field per operation. One of the major problems with VLIW processing occurs when the executable code uses an instruction set which cannot be handled by the VLIW processor. This typically happens because the executable code was previously compiled by a VLIW compiler for a different VLIW processor, and is therefore incompatible with the specific VLIW processor employed to do the task. Such compatibility problems between executable code and VLIW processors arise most often in the case of operating system libraries, utility libraries and frequently used application programs which have been previously compiled.
By way of example, editing utilities commonly used for every type of processing are usually compiled without regard to the specific processor on which the editing utility is to be run. For instance, a typical VLIW processor may have a parallel operations capability of 4 or 8. This means that the VLIW processor can process 4 or 8 operations in parallel. However, the compiled executable code may require parallel processing of 6 or 12 operations. It will be appreciated that there is no necessary match between the capability of the VLIW processor and the requirements of the executable code. There is therefore a need to accommodate such a mismatch.
By way of background, VLIW processors are contrasted with other classes of parallel processors, among which the SIMD, Single Instruction Stream--Multiple Data Streams, and MIMD, Multiple Instruction Streams--Multiple Data Streams, classes are well known. In an SIMD processor, an instruction is broadcast to all the processing units, which perform the same operation specified by the instruction in parallel in a synchronous manner. In an MIMD processor, the constituent CPUs, or Central Processing Units, which autonomously execute instructions, perform operations in parallel in an asynchronous manner.
VLIW processors provide more flexible parallel processing than SIMD processors, because they can perform different operations on different data in parallel, and more efficient synchronous parallel processing than MIMD processors, because operations that can be performed in parallel are encoded in very long instruction words.
Unlike conventional processors, whose instructions encode one operation, VLIW processors execute very long instruction words, each containing several operation fields. Instructions for VLIW processors are produced by a VLIW compiler, which reads a program written in a high-level programming language like C, translates high-level constructs into primitive operations supported by the processor, checks the dependencies among operations to find operations that can be performed in parallel, and encodes operations into VLIW instructions. If the compiler cannot find enough parallel operations to fill up a VLIW instruction, it has to encode NOP, i.e., no operation, in some of its fields.
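By way of illustration only, the packing step described above can be sketched as follows. This is a simplified greedy scheme, not the method of any particular VLIW compiler. The function name `pack_vliw`, the dependency table and the field count of 4 are assumptions made for the example, and the sketch ignores the kinds of functional units behind each field.

```python
NOP = "nop"

def pack_vliw(ops, deps, width):
    """Greedily pack operations into instruction words of `width` fields.

    ops   -- operation names in program order (hypothetical)
    deps  -- maps an operation to the set of operations it depends on
    width -- number of operation fields per instruction word (assumed)
    """
    done = set()            # operations issued in earlier words
    remaining = list(ops)
    words = []
    while remaining:
        # An operation is ready once everything it depends on
        # was issued in an earlier instruction word.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("unsatisfiable dependency")
        issued = ready[:width]
        # Fields that cannot be filled are encoded as NOP.
        words.append(issued + [NOP] * (width - len(issued)))
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return words

# Hypothetical program: c needs a and b, d needs c, e needs a.
ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "d": {"c"}, "e": {"a"}}
words = pack_vliw(ops, deps, 4)
for w in words:
    print(w)
```

In this example five operations occupy three instruction words of four fields each, so seven of the twelve fields are NOP.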
The number of NOP fields greatly affects the efficiency of the VLIW processor as well as the size of executable code. Although programs that are abundant in parallelism are compiled into compact executable codes containing a small number of NOP fields, programs that are not abundant in parallelism are compiled into large executable codes containing a large number of NOP fields. Note that code efficiency is also affected significantly by the compiler's capability to detect parallelism in programs.
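The effect of NOP fields on code size can be made concrete with a small calculation. The following is a sketch only; the two instruction sequences and the field count of 4 are invented for the example and do not come from any real program.

```python
def nop_fraction(words, nop="nop"):
    """Return the fraction of operation fields that hold NOP."""
    total = sum(len(w) for w in words)
    nops = sum(f == nop for w in words for f in w)
    return nops / total

# Hypothetical code with abundant parallelism: 7 operations in 2 words.
parallel_code = [
    ["add", "mul", "load", "store"],
    ["sub", "add", "load", "nop"],
]

# Hypothetical code with no parallelism: 3 operations in 3 words.
serial_code = [
    ["add", "nop", "nop", "nop"],
    ["mul", "nop", "nop", "nop"],
    ["store", "nop", "nop", "nop"],
]

print(nop_fraction(parallel_code))  # 0.125
print(nop_fraction(serial_code))    # 0.75
```

In the serial case three quarters of the encoded fields do no work, which is the code-size penalty described above.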
The fields of the VLIW instructions correspond to the functional units implemented in the VLIW processor. The functional units are not necessarily of the same kind. For example, integer units and floating-point units are different in operation. Once a VLIW processor is implemented with a certain number of functional units, it is very difficult to change their number and their functionality later to follow advances in computer and semiconductor technologies. Any changes to the number and functionality of functional units require the complete recompilation of all the programs, clearly a disadvantage.
Since the VLIW instruction format directly reflects the internal structure of the VLIW processor in terms of functional units, it is very difficult and very inefficient, if possible at all, to use the same VLIW instruction set for VLIW processors which have different structures. It is therefore almost impossible to keep object-level code compatibility among different VLIW processor implementations, which may be required for different applications. In this sense, the conventional VLIW processor architecture is not scalable.
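The dependence of the instruction format on the functional units can be illustrated with a toy model. The sketch below assumes that each field of an instruction word is wired to exactly one functional unit and that the word length equals the number of units. The processor layouts, the `UNIT_OF` table and the function name `runs_unmodified` are hypothetical.

```python
# Hypothetical mapping from operation to the kind of unit that executes it.
UNIT_OF = {"add": "int", "sub": "int", "fmul": "fp", "fadd": "fp", "load": "mem"}

def runs_unmodified(word, slots):
    """Check whether `word` can execute as encoded on a processor whose
    i-th field feeds the functional unit kind `slots[i]`."""
    if len(word) != len(slots):
        return False  # different field count: the word does not even decode
    return all(op == "nop" or UNIT_OF[op] == unit
               for op, unit in zip(word, slots))

proc_a = ("int", "int", "fp", "mem")               # the compilation target
proc_b = ("int", "fp", "fp", "mem")                # same width, other unit mix
proc_c = ("int", "int", "fp", "fp", "mem", "mem")  # wider implementation

word = ["add", "sub", "fmul", "load"]  # compiled for proc_a

print(runs_unmodified(word, proc_a))  # True
print(runs_unmodified(word, proc_b))  # False
print(runs_unmodified(word, proc_c))  # False
```

Under these assumptions, code compiled for one layout fails on both a processor with a different mix of units and a processor with more units, which is why recompilation is needed.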
The problems with conventional VLIW processors may be summarized as follows. First, a VLIW processor cannot efficiently support a wide range of programs which differ in the amount of parallelism. Second, the VLIW compiler tends to be large and very complicated because it has to use sophisticated but time-consuming techniques to produce quality code which contains a small number of NOP fields in the VLIW instructions. Third, any changes to the functional units in the VLIW processor require the complete recompilation of all the programs. Finally, it is very difficult and inefficient, if possible at all, to use the same VLIW instruction set for different VLIW processor implementations.