Very long instruction word (VLIW) techniques can be used to execute multiple instructions concurrently in a processor, thereby increasing processor performance. When a program is compiled for a VLIW processor, multiple instructions of the program are combined into a single very long instruction word. During execution of the program, a very long instruction word is fetched from memory and decoded, and each of the instructions within the very long instruction word is input to one of multiple functional units of the processor, where it is executed. Because each of the instructions within a very long instruction word can be input to a different functional unit, the instructions within the very long instruction word can be executed concurrently.
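The fetch, decode, and dispatch flow described above can be illustrated with a minimal software sketch. It is not a model of any particular processor: the names `Op` and `execute_vliw_word`, the two-operand instruction format, and the flat register list are all assumptions introduced here for illustration. Concurrency is approximated by having every slot read its operands from the register state that existed before the word was issued.

```python
from dataclasses import dataclass
from operator import add, mul
from typing import Callable, List


@dataclass(frozen=True)
class Op:
    """One instruction slot of a very long instruction word (hypothetical format)."""
    dest: int                        # index of the destination register
    fn: Callable[[int, int], int]    # stands in for the functional unit's operation
    src_a: int                       # index of the first source register
    src_b: int                       # index of the second source register


def execute_vliw_word(word: List[Op], regs: List[int]) -> List[int]:
    """Execute every slot of one word against the same starting register state."""
    # Each slot goes to its own functional unit; all operands are read
    # before any result is written, which models concurrent execution.
    results = [(op.dest, op.fn(regs[op.src_a], regs[op.src_b])) for op in word]
    new_regs = list(regs)
    for dest, value in results:
        new_regs[dest] = value
    return new_regs


# One word holding two instructions: r1 = r1 + r2 and r2 = r1 * r2.
before = [0, 2, 3, 4]
word = [Op(dest=1, fn=add, src_a=1, src_b=2),
        Op(dest=2, fn=mul, src_a=1, src_b=2)]
after = execute_vliw_word(word, before)
print(after)  # [0, 5, 6, 4]
```

In this sketch the multiply slot sees the original value of register 1 rather than the sum written by the add slot, which is the behavior expected when both instructions of a word execute at the same time.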
Although this concurrent execution of multiple instructions can improve processor performance, the approach has drawbacks. One such drawback is the bandwidth required in the processor due to the number of bits in a very long instruction word, also referred to as the width of the very long instruction word. Because a very long instruction word can include multiple instructions, it may contain a large number of bits. This large number of bits typically requires a relatively wide instruction memory in order to accommodate the width of the very long instruction words. A wide instruction path from the instruction memory to the decode unit is likewise required. These widths increase the cost of VLIW processors and increase the physical space used within VLIW processors to route data.
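The way the width grows with the number of combined instructions can be shown with simple arithmetic. The sketch below assumes fixed-width instruction slots and no compression of the word. The function names and the example figures (32-bit instructions, four or eight slots) are illustrative assumptions and do not come from any specific processor.

```python
def vliw_word_width_bits(num_slots: int, bits_per_instruction: int) -> int:
    """Width of one very long instruction word, assuming equal-width slots."""
    return num_slots * bits_per_instruction


def fetch_bytes_per_word(word_width_bits: int) -> int:
    """Bytes the instruction memory and instruction path must supply per word."""
    return (word_width_bits + 7) // 8  # round up to whole bytes


# Illustrative figures only: 32-bit instructions packed 4 or 8 per word.
width_4 = vliw_word_width_bits(4, 32)
width_8 = vliw_word_width_bits(8, 32)
print(width_4, fetch_bytes_per_word(width_4))  # 128 16
print(width_8, fetch_bytes_per_word(width_8))  # 256 32
```

Under these assumptions, doubling the number of instructions per word doubles the width that both the instruction memory and the path to the decode unit must accommodate.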