The present invention relates generally to improvements in very long instruction word (VLIW) processing, and more particularly to advantageous methods and apparatus for instruction addressing in indirect VLIW (iVLIW) processors.
In signal processing applications, a high percentage of the algorithms use loops, which usually have high iteration counts and consist of relatively few instructions. Inside these loops, dramatic performance gains can usually be made by providing multiple functional units and executing instructions in parallel. A VLIW architecture provides a way to achieve these gains.
In typical VLIW processors, a wide memory is provided for storing the VLIWs. This memory is accessed for each instruction fetched, and its output is fed directly to the decode logic to control multiple execution units in parallel. An inefficiency results because sequential code does not make efficient use of the long instruction word, so the very wide instruction memory is underutilized. In addition, treating the traditional VLIW memory as the central instruction memory for an array of processing elements would not work, because the wide VLIW bus would have to be distributed throughout the array, causing path timing and area problems.
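The underutilization described above can be illustrated with a small calculation. The sketch below is illustrative only: the five-slot word width, the use of "nop" padding for empty slots, and the name slot_utilization are assumptions made for the example and are not taken from the present description.

```python
# Illustrative sketch: fraction of VLIW memory slots doing useful work.
# SLOTS and the "nop" padding convention are assumptions for this example.

SLOTS = 5  # assumed number of instruction slots per VLIW


def slot_utilization(words):
    """Return the fraction of slots that hold a non-nop operation."""
    total = sum(len(word) for word in words)
    useful = sum(1 for word in words for op in word if op != "nop")
    return useful / total


# Sequential code: one useful operation per wide word, the rest padded.
sequential_code = [["add"] + ["nop"] * (SLOTS - 1) for _ in range(8)]

# A loop body in which every slot of the word is occupied.
loop_body = [["load", "add", "mul", "shift", "store"]]

print(slot_utilization(sequential_code))  # 8 useful slots out of 40
print(slot_utilization(loop_body))        # 5 useful slots out of 5
```

Under these assumptions, the sequential portion of a program occupies wide memory words while using only one slot in five, which is the inefficiency that motivates storing sequential code as short instructions instead.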
An embodiment of a manifold array instruction set in accordance with the present invention provides for indirect VLIWs, as described more fully below. A VLIW is selected by reference rather than by loading its constituent instructions as part of a single instruction stream. Separating short instruction word (SIW) selection, which governs program flow, from VLIW selection allows both sequential code, in the form of a sequence of SIWs, and parallel operations, in the form of VLIWs, to be encoded efficiently. The indirect nature of VLIW access in accordance with the present invention allows for great flexibility in both VLIW execution control and the efficiency of VLIW memory usage. The invention described herein provides a programmer with a degree of flexibility in VLIW execution and loading that closely parallels the flexibility available for data access. This flexibility is provided by supplying the programmer with a set of addressing modes for VLIW access that are similar to data memory addressing modes. Some of these addressing modes provide a synchronous multiple instruction multiple data (MIMD) mechanism in which a different VLIW is selected in each processing element (PE), in parallel and in synchronism. Other addressing modes support automatic incrementing of the VLIW memory address, providing hardware support for selecting different VLIWs in an ordered sequence.
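The selection-by-reference scheme and the addressing modes summarized above can be sketched in a few lines of Python. This is a minimal behavioral model under stated assumptions, not the claimed implementation: the names ProcessingElement, vliw_mem, base, EXEC_VLIW and run are hypothetical, as is the choice of a base-plus-offset mode with optional post-increment and wraparound.

```python
# Behavioral sketch of indirect VLIW selection (all names are hypothetical).
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    vliw_mem: list                 # local VLIW memory, one VLIW per entry
    base: int = 0                  # assumed per-PE base register for VLIW access
    executed: list = field(default_factory=list)

    def execute_vliw(self, offset, post_increment=False):
        """Select a VLIW by reference: address = base + offset."""
        addr = (self.base + offset) % len(self.vliw_mem)
        self.executed.append(self.vliw_mem[addr])
        if post_increment:
            # Automatic increment, so the next reference selects the next VLIW.
            self.base = (self.base + 1) % len(self.vliw_mem)
        return addr


def run(program, pes):
    """Issue one SIW stream to all PEs and record what each step selected."""
    trace = []
    for op in program:
        if op[0] == "SIW":
            # Sequential code: an ordinary short instruction.
            trace.append(("SIW", op[1]))
        elif op[0] == "EXEC_VLIW":
            # One short instruction triggers a VLIW in every PE in lockstep.
            _, offset, post_inc = op
            addrs = [pe.execute_vliw(offset, post_inc) for pe in pes]
            trace.append(("VLIW", addrs))
    return trace


pe0 = ProcessingElement(vliw_mem=["A0", "A1", "A2", "A3"], base=0)
pe1 = ProcessingElement(vliw_mem=["B0", "B1", "B2", "B3"], base=2)

program = [
    ("SIW", "load"),
    ("EXEC_VLIW", 0, True),
    ("EXEC_VLIW", 0, True),
    ("SIW", "store"),
]

trace = run(program, [pe0, pe1])
print(trace)
print(pe0.executed, pe1.executed)
```

In this model the same short instruction selects a different VLIW in each PE because each PE holds its own base value, which loosely mirrors the synchronous MIMD behavior described above. Repeating the identical instruction steps through consecutive VLIW memory entries, which mirrors the automatic incrementing modes.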