Programmable digital data processors use instructions, which are stored in memory, to tell the processor how to perform a specific task. Instructions typically include an operation code (opcode), which tells the processor what operation to perform, and operand specifiers, which indicate the location of input and output data used by the operation. The instructions to be performed by the processor are often stored in program memory and the data to be used by the instructions is often stored in data memory. Typical operations include loading data from memory, storing data to memory, performing arithmetic and logic operations, and branching to a different location in a program.
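As a rough illustration of how an opcode and operand specifiers can share one instruction word, the following sketch packs and unpacks a 16-bit instruction. The field layout (a 4-bit opcode followed by three 4-bit register specifiers), the opcode values, and the function names `encode` and `decode` are assumptions made for this example only and do not describe any particular processor.

```python
# Hypothetical 16-bit format (assumed for illustration only):
#   bits 15..12 opcode | bits 11..8 dest | bits 7..4 src1 | bits 3..0 src2
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "ADD": 0x3, "BRANCH": 0x4}  # illustrative values
OPCODE_NAMES = {value: name for name, value in OPCODES.items()}


def encode(op, dest, src1, src2):
    """Pack an opcode and three 4-bit register specifiers into one 16-bit word."""
    for field in (dest, src1, src2):
        if not 0 <= field <= 0xF:
            raise ValueError("register specifier must fit in 4 bits")
    return (OPCODES[op] << 12) | (dest << 8) | (src1 << 4) | src2


def decode(word):
    """Split a 16-bit word back into (opcode name, dest, src1, src2)."""
    opcode = (word >> 12) & 0xF
    return (OPCODE_NAMES[opcode], (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)


word = encode("ADD", 1, 2, 3)   # ADD R1, R2, R3 in the assumed format
fields = decode(word)
```

Under these assumptions the opcode occupies the top four bits, so a decoder can select the operation before it examines the operand specifiers.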
The amount of program memory used to implement a specific task or set of tasks is referred to as the code size. The code size depends on the size of the individual instructions, the complexity of the instructions, the complexity of the task or set of tasks, and other factors. In modern processors, instructions typically have a fixed size, since this allows the instructions to be efficiently fetched from memory, decoded, and executed. Because of how memory systems are designed, the instruction size in bits is often restricted to a power of two (e.g., 16 bits, 32 bits or 64 bits).
Small code size is an important goal in the design of low-power embedded processors, such as digital signal processors, multimedia processors and graphics processors. Thus, these types of architectures often feature compact instructions that are fairly powerful. For example, in a traditional embedded processor architecture, a 16-bit multiply-accumulate instruction might be used to specify that the values in two registers, RC and RB, should be multiplied together, and added to the value in an accumulator register, RA, with the result being stored back to the accumulator register RA.
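The multiply-accumulate operation described above can be sketched in a few lines. The register file is modeled here as a plain dictionary, and the function name `multiply_accumulate` is a hypothetical helper; real hardware would also fix the operand widths and overflow behavior, which this sketch ignores.

```python
def multiply_accumulate(regs, ra, rb, rc):
    """Model RA <- RA + (RB * RC), updating the register file in place."""
    regs[ra] = regs[ra] + regs[rb] * regs[rc]
    return regs[ra]


# Register contents are arbitrary example values.
regs = {"RA": 10, "RB": 3, "RC": 4}
multiply_accumulate(regs, "RA", "RB", "RC")
accumulator = regs["RA"]
```

Without a combined instruction, the same effect would typically take a separate multiply into a temporary register followed by an add, which is one reason a single compact multiply-accumulate instruction helps code size.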
To achieve small code size, many processors implement Single Instruction Multiple Data (SIMD) processing techniques. With SIMD processing, a single instruction is used to perform the same operation on multiple data operands. SIMD processing is especially useful when performing the same operation on multiple vector or matrix elements.
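A minimal sketch of the SIMD idea follows: one operation is applied to every element of two operand vectors. The 16-bit lane width, the wraparound on overflow, and the name `simd_add` are assumptions chosen for the example; actual SIMD instruction sets differ in lane count, width, and saturation behavior.

```python
LANE_MASK = 0xFFFF  # assumed 16-bit lanes with wraparound on overflow


def simd_add(a, b):
    """Apply the same add to every lane, as a single SIMD instruction would."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


# One modeled instruction produces four results; a scalar design would need
# four separate add instructions for the same work.
result = simd_add([1, 2, 3, 0xFFFF], [10, 20, 30, 1])
```

In this model the program stores one instruction rather than one per element, which is the code-size benefit the paragraph above refers to.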
All programmable processors use some type of instruction format. Conventional instruction formats are described in, for example, John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach,” Third Edition, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 2003.
In order to achieve high performance, modern embedded processors for applications such as digital signal processing, multimedia and graphics often have Very Long Instruction Word (VLIW) architectures. Examples are described in J. A. Fisher, “Very Long Instruction Word Architectures and ELI-512,” Proceedings of the Tenth Symposium on Computer Architecture, pp. 140-150, June 1983; R. Colwell et al., “A VLIW Architecture for a Trace Scheduling Compiler,” IEEE Transactions on Computers, pp. 967-979, August 1988; and N. Seshan, “High VelociTI Processing: Texas Instruments VLIW DSP Architecture,” IEEE Signal Processing Magazine, Vol. 15, No. 2, pp. 86-101, March 1998. With these architectures, a single VLIW specifies multiple operations that can execute in parallel. For example, a 256-bit VLIW might have eight operation fields, each of which is specified using 32 bits. Although VLIW architectures typically offer improved performance over architectures that perform only a single operation each cycle, they may have much larger code size, since operation fields that cannot be utilized in a given cycle are filled with no operation (NOP) instructions.
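The effect of NOP padding on code size can be made concrete with the 256-bit, eight-field example above. In the sketch below, the three-cycle schedule, the operation names, and the helper `pack_vliw` are hypothetical and serve only to show the arithmetic.

```python
SLOTS = 8        # operation fields per VLIW, from the example above
SLOT_BITS = 32   # bits per operation field, from the example above


def pack_vliw(cycle_ops):
    """Fill unused operation fields of one VLIW with NOPs."""
    if len(cycle_ops) > SLOTS:
        raise ValueError("too many operations for one VLIW")
    return list(cycle_ops) + ["NOP"] * (SLOTS - len(cycle_ops))


# Hypothetical schedule: the operations that can issue in each of three cycles.
schedule = [["LOAD", "LOAD", "MUL"], ["ADD"], ["STORE", "BRANCH"]]
words = [pack_vliw(ops) for ops in schedule]

total_bits = len(words) * SLOTS * SLOT_BITS                # program memory occupied
useful_bits = sum(len(ops) for ops in schedule) * SLOT_BITS
nop_count = sum(word.count("NOP") for word in words)
```

For this made-up schedule, six useful operations occupy 768 bits of program memory, of which only 192 bits encode real work; the remaining eighteen fields are NOPs.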
More recently, Explicitly Parallel Instruction Computing (EPIC) architectures have been proposed. See, e.g., M. Smotherman, “Understanding EPIC Architectures and Implementations,” ACM Southeast Conference, 2002, and M. Schlansker and B. Rau, “EPIC: Explicitly Parallel Instruction Computing,” IEEE Computer, pp. 37-45, February 2000. These architectures often contain additional bits in the instruction, which indicate which operations within the instruction can execute in parallel, or whether multiple instructions can execute in parallel. Although these architectures often have more compact code than VLIW processors, they add complexity to the processor hardware.
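One way such additional bits could work is sketched below, assuming a single "stop" flag per operation that marks the end of a group of operations allowed to execute in parallel. This encoding, the 32-bit operation size, and the function `parallel_groups` are assumptions for illustration; the architectures cited above may encode parallelism differently.

```python
OP_BITS = 32    # assumed size of one operation
STOP_BITS = 1   # assumed extra bit marking the end of a parallel group


def parallel_groups(ops):
    """Split a stream of (name, stop) pairs into groups that may run in parallel."""
    groups, current = [], []
    for name, stop in ops:
        current.append(name)
        if stop:
            groups.append(current)
            current = []
    if current:  # trailing operations without a final stop flag
        groups.append(current)
    return groups


# Hypothetical operation stream; stop=1 closes the current group.
ops = [("LOAD", 0), ("LOAD", 0), ("MUL", 1), ("ADD", 1), ("STORE", 0), ("BRANCH", 1)]
groups = parallel_groups(ops)
encoded_bits = len(ops) * (OP_BITS + STOP_BITS)
```

Under these assumptions no NOPs are stored, so the six operations need 198 bits, but the hardware must scan the stop flags to find group boundaries, which is the kind of added complexity the paragraph above mentions.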
Accordingly, a need exists for an improved approach to achieving small code size, particularly in low-power embedded processors, which avoids the problems associated with the above-described VLIW and EPIC approaches.