This invention relates to very long instruction word (VLIW) computing architectures, and more particularly to methods and apparatus for reducing storage requirements of VLIW instructions.
Multimedia computing applications such as image processing are more efficiently implemented using parallel structures for handling multiple streams of data. VLIW processors, such as the TMS320C6x manufactured by Texas Instruments of Dallas, Tex. and the MAP1000 manufactured by Hitachi Ltd. of Tokyo, Japan and Equator Technologies of Campbell, Calif., support a large degree of parallelism of both data streams and instruction streams to implement parallel or pipelined execution of program instructions.
A VLIW processor includes one or more homogeneous processing blocks referred to as clusters. Each cluster includes the same number of functional processing units. A VLIW instruction includes multiple subinstruction fields. The size of the VLIW instruction grows linearly with the number of parallel operations being defined concurrently in the subinstruction fields. The subinstructions present in an instruction are distributed among the functional processing units for parallel execution.
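The linear growth of instruction size and the distribution of subinstructions among functional processing units can be illustrated with a minimal sketch. The 32-bit subinstruction field width, the positional assignment of subinstructions to units, and the names `SUBINSTRUCTION_BITS`, `instruction_width_bits`, and `dispatch` are assumptions made only for illustration; they are not taken from any particular processor.

```python
# Illustrative sketch only; field width and dispatch rule are assumptions.
SUBINSTRUCTION_BITS = 32  # assumed width of one subinstruction field


def instruction_width_bits(num_subinstructions, field_bits=SUBINSTRUCTION_BITS):
    """Instruction width grows linearly with the number of subinstruction fields."""
    return num_subinstructions * field_bits


def dispatch(subinstructions, num_clusters, units_per_cluster):
    """Map each subinstruction, by position, to a (cluster, unit) pair.

    Every cluster is assumed to hold the same number of functional units.
    """
    capacity = num_clusters * units_per_cluster
    if len(subinstructions) > capacity:
        raise ValueError("more subinstructions than functional units")
    return {
        (i // units_per_cluster, i % units_per_cluster): op
        for i, op in enumerate(subinstructions)
    }


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "subinstructions ->", instruction_width_bits(n), "bits")
    print(dispatch(["add", "mul", "load", "store"], num_clusters=2, units_per_cluster=2))
```

Under the assumed 32-bit field width, eight subinstruction fields yield a 256-bit instruction and sixteen yield a 512-bit instruction.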
Conventional VLIW processors typically execute fewer than ten operations per instruction. The number of concurrent executions is likely to increase substantially in future media processors, with instructions likely to be 256 or 512 bits wide. As the size of the instruction increases, however, the burden on the data flow and memory structures increases correspondingly. To provide enough instruction fetch bandwidth, the VLIW instructions typically are fetched first from external memory and stored in an on-chip instruction cache before being executed. Thrashing of the cache (i.e., cyclic misses) during, for example, a tight processing loop is very undesirable because it degrades performance. Accordingly, it is increasingly desirable to manage the instruction cache effectively to sustain a desired high processing throughput.
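The cyclic-miss behavior referred to above can be illustrated with a small simulation. The sketch assumes a fully associative instruction cache with least-recently-used replacement and one instruction per cache line; these choices, and the name `count_misses`, are illustrative assumptions rather than a description of any particular processor's cache.

```python
from collections import OrderedDict


def count_misses(loop_length, cache_lines, iterations):
    """Count instruction cache misses for a loop executed repeatedly.

    Assumes a fully associative cache with LRU replacement and one
    instruction per cache line (illustrative assumptions only).
    """
    cache = OrderedDict()  # keys are instruction addresses, ordered by recency
    misses = 0
    for _ in range(iterations):
        for addr in range(loop_length):
            if addr in cache:
                cache.move_to_end(addr)        # hit: mark as most recently used
            else:
                misses += 1                    # miss: fetch from external memory
                if len(cache) >= cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
                cache[addr] = True
    return misses


if __name__ == "__main__":
    # Loop fits in the cache: only the first pass misses.
    print(count_misses(loop_length=4, cache_lines=4, iterations=10))
    # Loop is one instruction too long: every fetch misses.
    print(count_misses(loop_length=5, cache_lines=4, iterations=10))
```

Under these assumptions, a loop that fits in the cache misses only on its first pass, whereas a loop that exceeds the cache capacity by a single instruction misses on every fetch of every pass. Reducing the storage occupied by each instruction therefore allows a longer loop to remain resident.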
At the same time, the need for a larger instruction cache increases as the clock frequency of the processor increases, as wider VLIW architectures are adopted, and as more complex algorithms are developed. Accordingly, there is a need for methods of efficiently handling and caching VLIW instructions.