1. Field of the Invention
The present invention generally relates to parallel execution of primitive instructions in data processors and, more particularly, to a mechanism for the representation of very long instruction word (VLIW) programs in such a way that the programs do not reflect the organization (i.e., implementation) of the processor where they are executed.
2. Background Description
Very long instruction word processors are a suitable alternative for exploiting instruction-level parallelism in programs; that is, for executing more than one basic (i.e., primitive) instruction at a time. These processors contain multiple functional units, fetch from the instruction cache a very long instruction word (VLIW) containing several primitive instructions, and dispatch the entire VLIW for parallel execution. These capabilities are exploited by compilers which generate code in which independent primitive instructions executable in parallel have been grouped together. The processor has relatively simple control logic because it performs neither dynamic scheduling nor reordering of operations, as is done in superscalar processors.
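By way of illustration only, the grouping of independent primitive instructions performed by such a compiler may be sketched as follows. The sketch is not taken from any of the processors or compilers discussed herein; the names `Op`, `depends`, `pack_vliws` and `OPS`, the register names, and the conservative rule that any register dependence forces a later VLIW are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical primitive instruction: one destination, several sources."""
    name: str
    dest: str
    srcs: tuple


def depends(later, earlier):
    """True if `later` must not share a VLIW with, or precede, `earlier`.

    Conservative assumption: read-after-write, write-after-write and
    write-after-read dependences all force a later VLIW.
    """
    return (earlier.dest in later.srcs      # read after write
            or earlier.dest == later.dest   # write after write
            or later.dest in earlier.srcs)  # write after read


def pack_vliws(ops, width):
    """Greedily pack ops, in program order, into VLIWs of at most `width` ops."""
    words = []   # each entry is the list of ops in one VLIW
    placed = []  # (op, index of the VLIW holding it)
    for op in ops:
        # Earliest VLIW that follows every op this one depends on.
        earliest = 0
        for prev, idx in placed:
            if depends(op, prev):
                earliest = max(earliest, idx + 1)
        # Skip VLIWs that are already full.
        w = earliest
        while w < len(words) and len(words[w]) >= width:
            w += 1
        if w == len(words):
            words.append([])
        words[w].append(op)
        placed.append((op, w))
    return words


# Hypothetical five-instruction fragment.
OPS = [
    Op("ld1", "r1", ("r10",)),
    Op("ld2", "r2", ("r11",)),
    Op("add", "r3", ("r1", "r2")),
    Op("mul", "r4", ("r1", "r1")),
    Op("sub", "r5", ("r3", "r4")),
]

if __name__ == "__main__":
    for width in (1, 2):
        words = pack_vliws(OPS, width)
        print(width, [[op.name for op in word] for word in words])
```

With a width of two, the fragment packs into three VLIWs (the two loads, then the add and the multiply, then the subtract); with a width of one it requires five. The packed code therefore depends on the width assumed by the compiler.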
An apparent limitation of VLIW processors is the lack of object code compatibility with the object code used by sequential (i.e., scalar and superscalar) processors, because such code has not been parallelized for VLIW. Conversely, the code used by a VLIW processor apparently cannot be used by a scalar or superscalar processor, because the parallel code uses features that exist only in VLIW implementations. A further apparent limitation is the lack of object code compatibility among VLIW implementations having varying degrees of parallel execution capabilities, because the code reflects the detailed structure (e.g., parallel execution capabilities) of one specific implementation, which is different from the others. As a result, the VLIW approach appears unable to enhance an existing family of scalar and superscalar processors, which has led to the perception that VLIW processors are limited in their suitability for adoption.
The perceived limitations described above are actually a consequence of how the VLIW concept has been implemented in the past. See, for example, R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth and P. K. Rodman, "A VLIW architecture for a trace scheduling compiler", IEEE Transactions on Computers, Vol. C-37, No. 8, pp. 967-979, 1988; G. R. Beck, D. W. L. Yen and T. L. Anderson, "The Cydra 5 mini-supercomputer: architecture and implementation", The Journal of Supercomputing, Vol. 7, No. 1/2, pp. 143-180, 1993; and A. E. Charlesworth, "An approach to scientific array processing: the architectural design of the AP-120B/FPS-164 family", IEEE Computer, Vol. 14, No. 9, pp. 18-27, 1981. Processors such as those reported in these articles have made features of the implementation visible to the compiler/programmer, including the number, types and location of the functional units, under the assumption that the compiler could better exploit the hardware if it had good knowledge of its features and limitations. VLIW programs have been represented as sets of VLIWs which specify exactly the operations performed in each functional unit on a cycle-by-cycle basis, as determined by the compiler (known as static scheduling). This is drastically different from the approach used in conventional scalar and superscalar processors, which analyze the code and decide at run time which operations are executed in each cycle (known as dynamic scheduling), so that the detailed features of the processor need not be known by the compiler. In other words, the separation between architecture and implementation that is common practice in the design of scalar and superscalar processors has been sacrificed in VLIW implementations, so that the compiler/programmer can better exploit the capabilities of the hardware.
Although the benefits of exposing the details of the implementation to the compiler/programmer are clear, this has led to the perception that such exposure is a requirement for a VLIW processor. Furthermore, there have been very few proposals on how to describe or represent a VLIW program without depending on the specific aspects of an implementation, so that the perceived requirement has been sustained.