1. Field of the Invention
This invention relates to the field of data processing. More particularly, this invention relates to data processing systems having a plurality of data path elements operable independently to perform in parallel respective data processing operations specified by a program instruction, such as so-called very long instruction word (VLIW) processors, and to measures to reduce program code size for such systems.
2. Description of the Prior Art
The known TMS320C6xx processor produced by Texas Instruments is designed for high-speed operation (e.g. 1 GHz) and consequently contains a simple instruction decoder. This processor uses 32-bit instructions. Instructions are loaded from a memory in a 256-bit fetch packet containing eight 32-bit instructions. Each instruction contains a bit (the P bit) that indicates whether the next instruction in the fetch packet can be executed in the same clock cycle. The instructions that execute in the same clock cycle are called an execute packet. Since an execute packet cannot cross a fetch packet boundary, the P bit of the last instruction in the fetch packet must be cleared. If a functional unit within the processor is not addressed by an instruction within the execute packet, then it performs a default operation, such as a NOP.
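The grouping of a fetch packet into execute packets by means of the P bit can be illustrated by the following minimal sketch. It is not taken from the processor documentation: the function name `split_execute_packets` is hypothetical, and the position of the P bit (assumed here to be the least significant bit of each 32-bit instruction) is an assumption made for illustration only.

```python
# Assumption for illustration: the P bit is the least significant bit
# of each 32-bit instruction word.
P_BIT_MASK = 0x1
INSTRUCTIONS_PER_FETCH_PACKET = 8  # 256 bits / 32 bits


def split_execute_packets(fetch_packet):
    """Group the instructions of one fetch packet into execute packets.

    A set P bit means the next instruction executes in the same clock
    cycle; a cleared P bit ends the current execute packet.
    """
    if len(fetch_packet) != INSTRUCTIONS_PER_FETCH_PACKET:
        raise ValueError("a fetch packet holds exactly eight instructions")
    if fetch_packet[-1] & P_BIT_MASK:
        # An execute packet cannot cross a fetch packet boundary.
        raise ValueError("P bit of the last instruction must be cleared")

    execute_packets = []
    current = []
    for instruction in fetch_packet:
        current.append(instruction)
        if not (instruction & P_BIT_MASK):
            execute_packets.append(current)
            current = []
    return execute_packets


# Example: three execute packets of sizes 3, 1 and 4.
example = [0x11, 0x21, 0x30, 0x40, 0x51, 0x61, 0x71, 0x80]
packets = split_execute_packets(example)
```

In this sketch a fetch packet whose instructions all have the P bit cleared yields eight single-instruction execute packets, that is, fully serial execution.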
The SC140 processor produced by StarCore builds its instruction words up out of 16-bit words. Most instructions consist of a single 16-bit instruction word. Some instructions need two instruction words. An instruction prefix word (16 or 32 bits) can be specified. This prefix is used to extend the number of addressable register fields, to conditionally execute instructions (guarded execution), or to specify the number of instructions to be executed in one clock cycle. If no prefix word is used, then the instructions are linked together using a bit in the instructions in a similar way to the TMS320C6xx processor discussed above.
Within the SC140 processor, instructions are fetched from the memory in 128-bit units (eight 16-bit words). Up to six functional units can be controlled in one clock cycle. The instructions that execute in one clock cycle can span a 128-bit boundary. An instruction alignment circuit performs the necessary alignment operations when the instructions span such a boundary.
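The condition under which alignment is required can be illustrated by the following minimal sketch, which determines the 128-bit fetch units touched by a group of instruction words that execute in one clock cycle. The function name `fetch_units_spanned` and the word-indexed addressing are hypothetical and serve only to illustrate the boundary arithmetic; they do not describe the actual alignment circuit.

```python
WORDS_PER_FETCH_UNIT = 8  # 128 bits / 16-bit instruction words


def fetch_units_spanned(start_word, n_words):
    """Return the indices of the 128-bit fetch units touched by a group
    of n_words consecutive 16-bit words starting at start_word."""
    if start_word < 0 or n_words <= 0:
        raise ValueError("start_word must be >= 0 and n_words must be > 0")
    first = start_word // WORDS_PER_FETCH_UNIT
    last = (start_word + n_words - 1) // WORDS_PER_FETCH_UNIT
    return list(range(first, last + 1))


# A group of four words starting at word 6 occupies words 6..9 and
# therefore spans the boundary between fetch units 0 and 1.
spanning = fetch_units_spanned(6, 4)
# A group of eight words starting at word 0 fits in fetch unit 0.
aligned = fetch_units_spanned(0, 8)
```

When the returned list holds more than one index, the group spans a 128-bit boundary and words from two fetch units must be combined before issue.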
The Thumb enabled scalar processors produced by ARM Limited are able to execute either 32-bit ARM code or 16-bit Thumb code. The Thumb instruction set does not provide all instructions that can be specified within the ARM instruction set.
VLIW processors such as the TMS320C6xx and SC140 processors are advantageous in providing for highly parallel execution of data processing operations. However, as the complexity of processing operations to be performed steadily increases, the high program memory storage requirements associated with these VLIW processors become a significant disadvantage.