1. Field of the Invention
The present invention generally relates to parallel execution of primitive instructions in data processors and, more particularly, to a mechanism for the representation of very long instruction word (VLIW) programs in such a way that the programs do not reflect the organization (i.e., implementation) of the processor where they are executed.
2. Background Description
A Very Long Instruction Word (VLIW) is an instruction that contains more than one basic (i.e., primitive) instruction. A VLIW processor is a suitable alternative for exploiting instruction-level parallelism in programs; that is, for executing more than one primitive instruction at a time. Such a processor fetches a very long instruction word from the instruction cache and dispatches the primitive instructions contained in the VLIW to multiple functional units for parallel execution. Compilers exploit these capabilities by generating code in which independent primitive instructions executable in parallel are grouped together. The VLIW processor has relatively simple control logic because, unlike a superscalar processor, it performs no dynamic scheduling or reordering of operations.
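The grouping of independent primitive instructions described above can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from any particular compiler: each primitive instruction writes one register and reads zero or more registers, all functional units are interchangeable, and every operation takes one cycle. The names `Op`, `depends_on` and `pack_vliws` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A primitive instruction: one destination register, some source registers."""
    name: str
    dest: str
    srcs: tuple = ()


def depends_on(later: Op, earlier: Op) -> bool:
    """True if `later` must be placed in a later VLIW than `earlier`.

    Assumption: all register hazards are treated conservatively.
    """
    return (earlier.dest in later.srcs       # read after write
            or later.dest == earlier.dest    # write after write
            or later.dest in earlier.srcs)   # write after read


def pack_vliws(ops, width):
    """Greedily pack ops (given in program order) into VLIWs of at most `width` ops."""
    vliws = []
    placed = {}  # index of op -> index of the VLIW that holds it
    for i, op in enumerate(ops):
        # The op may go no earlier than one VLIW after any op it depends on.
        earliest = 0
        for j in range(i):
            if depends_on(op, ops[j]):
                earliest = max(earliest, placed[j] + 1)
        # Skip VLIWs that are already full.
        slot = earliest
        while slot < len(vliws) and len(vliws[slot]) >= width:
            slot += 1
        if slot == len(vliws):
            vliws.append([])
        vliws[slot].append(op)
        placed[i] = slot
    return vliws


if __name__ == "__main__":
    program = [
        Op("load_a", "r1", ("r0",)),
        Op("load_b", "r2", ("r0",)),
        Op("add", "r3", ("r1", "r2")),
        Op("mul", "r4", ("r1", "r1")),
        Op("sub", "r5", ("r3", "r4")),
    ]
    for width in (1, 2):
        packed = pack_vliws(program, width)
        print(width, [[op.name for op in vliw] for vliw in packed])
```

Under these assumptions, the same five primitive instructions occupy five VLIWs when only one operation fits per word and three VLIWs when two fit, which is the sense in which the generated code reflects the parallel execution capabilities of the target.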
An apparent limitation of VLIW processors is the lack of object-code compatibility with the object code used by sequential (i.e., scalar and superscalar) processors, because such code has not been parallelized for VLIW. Conversely, the code used by a VLIW processor apparently cannot be used by a scalar or superscalar processor, because the parallel code uses features that exist only in VLIW implementations. A further apparent limitation is the lack of object-code compatibility among VLIW implementations having varying degrees of parallel execution capabilities, because the code reflects the detailed structure (e.g., parallel execution capabilities) of one specific implementation, which differs from the others. As a result, the VLIW approach appears unable to enhance an existing family of scalar and superscalar processors, which has led to the perception that VLIW processors are of limited suitability for adoption.
The perceived limitations described above are actually a consequence of how the VLIW concept has been implemented in the past. See, for example, R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth and P. K. Rodman, "A VLIW architecture for a trace scheduling compiler", IEEE Transactions on Computers, Vol. C-37, No. 8, pp. 967-979, 1988; G. R. Beck, D. W. L. Yen and T. L. Anderson, "The Cydra 5 mini-supercomputer: architecture and implementation", The Journal of Supercomputing, Vol. 7, No. 1/2, pp. 143-180, 1993; and A. E. Charlesworth, "An approach to scientific array processing: the architectural design of the AP-120B/FPS-164 family", IEEE Computer, Vol. 14, No. 9, pp. 18-27, 1981. Processors such as those reported in these articles have made features of the implementation visible to the compiler/programmer, including the number, types and location of the functional units, under the assumption that the compiler could better exploit the hardware if it had good knowledge of its features and limitations. VLIW programs have been represented as sets of VLIWs which specify exactly the operations performed in each functional unit on a cycle-by-cycle basis, as determined by the compiler (this is known as static scheduling). This is drastically different from the approach used in conventional scalar and superscalar processors, which analyze and decide at run time which operations are executed in each cycle (known as dynamic scheduling), so that the detailed features of the processor need not be known by the compiler. In other words, the separation between architecture and implementation that is common practice in the design of scalar and superscalar processors has been sacrificed in VLIW implementations, so that the compiler/programmer can better exploit the capabilities of the hardware.
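The dependence of a statically scheduled program on one implementation can be made concrete with a short sketch. The machine descriptions, the program encoding and the function `fits` below are hypothetical and serve only as illustration; they are not drawn from the cited processors. Each VLIW lists, for one cycle, the type of functional unit that every operation occupies.

```python
from collections import Counter

# Hypothetical implementations: number of functional units of each type.
MACHINE_A = {"alu": 2, "mem": 1, "branch": 1}
MACHINE_B = {"alu": 1, "mem": 1, "branch": 1}

# A statically scheduled program compiled for MACHINE_A.
# One inner list per cycle; each entry is (unit type, operation).
PROGRAM_FOR_A = [
    [("alu", "add r3, r1, r2"), ("alu", "sub r4, r1, r2"), ("mem", "load r5, 0(r6)")],
    [("alu", "mul r7, r3, r4"), ("branch", "beq r7, r0, L1")],
]


def fits(program, machine):
    """True if no VLIW needs more units of any type than `machine` provides."""
    for vliw in program:
        needed = Counter(unit for unit, _ in vliw)
        if any(count > machine.get(unit, 0) for unit, count in needed.items()):
            return False
    return True


if __name__ == "__main__":
    print(fits(PROGRAM_FOR_A, MACHINE_A))  # the machine it was scheduled for
    print(fits(PROGRAM_FOR_A, MACHINE_B))  # a narrower implementation
```

In this sketch the first VLIW requires two ALUs in the same cycle, so the program cannot run unchanged on the narrower machine; this is the kind of incompatibility that arises when the cycle-by-cycle unit assignment is part of the program representation.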
Although the benefits of exposing the details of the implementation to the compiler/programmer are clear, this practice has led to the perception that such exposure is a requirement for a VLIW processor. Thus, there is a need for a mechanism that represents a VLIW program without depending on the specific aspects of an implementation, so that the perceived requirement is removed.