A number of data processor instruction set architectures (ISAs) operate with fixed length instructions. For example, several Reduced Instruction Set Computer (RISC) architecture data processors, such as one known as the PowerPC™ (PowerPC is a trademark of the International Business Machines Corporation), feature instruction words that have a fixed width of 32 bits. Another conventional architecture, known as IA-64 EPIC (Explicitly Parallel Instruction Computer), uses a fixed format of three instructions per 128 bits, and a 32-bit Modifier field (the first word in every quadword) that provides up to 10 additional instruction bits for each of the next three instructions of the quadword.
As instruction pipelines become deeper and memory latencies become longer, more instructions must be in flight (executing) at once in order to keep data processor execution units well utilized. However, in order to increase the number of non-memory operations in flight, it is generally necessary to increase the number of registers in the data processor, so that independent instructions may read their inputs and write their outputs without interfering with the execution of other instructions. Unfortunately, in most RISC architectures there is not sufficient space in a 32-bit instruction word for the operand fields to specify more than 32 registers, i.e., 5 bits per operand, with most operations requiring three operands and some requiring two or four operands.
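The bit budget described above can be illustrated with a short calculation. The following is only a sketch: the function names are hypothetical, and the register counts of 64 and 128 are assumed values chosen for illustration, not figures taken from any particular architecture.

```python
def operand_bits(num_registers: int) -> int:
    """Bits needed in one operand field to name any of num_registers registers."""
    return (num_registers - 1).bit_length()


def bits_left_for_opcode(word_bits: int, num_registers: int, num_operands: int) -> int:
    """Bits remaining in the instruction word after all operand fields are allotted."""
    return word_bits - num_operands * operand_bits(num_registers)


# Assumed register counts; 32 matches the text, 64 and 128 are illustrative only.
for regs in (32, 64, 128):
    for operands in (2, 3, 4):
        left = bits_left_for_opcode(32, regs, operands)
        print(f"{regs:3d} registers, {operands} operands: {left:2d} bits left")
```

Under these assumptions, a three-operand instruction with 32 registers leaves 17 bits for everything other than register operands, while a four-operand instruction with 128 registers would leave only 4.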
In addition, as the conventional fixed-width data processor architectures age, new applications become important, and these new applications may require new types of instructions to run efficiently. For example, in the last few years multimedia vector extensions have been made to several ISAs, for example SSE-2 for the IA-32 architecture and VMX (also known as AltiVec™, a trademark of Motorola, Inc., or as Velocity Engine™, a trademark of Apple Computer, Inc.) for the PowerPC™ architecture. However, with only a fixed number of bits in an instruction word, it has become increasingly difficult or impossible to add new instructions/opcodes to many architectures.
Several techniques for extending instruction word length have been proposed and used in the prior art. For example, Complex Instruction Set Computer (CISC) architectures generally allow the use of a variable length instruction. However, variable instruction lengths have at least three significant drawbacks.
A first drawback to the use of variable length instructions is that they complicate the decoding of instructions, as the instruction length is generally not known until at least a part of the instruction has been read, and because the positions of all operands within an instruction are likewise not generally known until at least part of the instruction is read.
A second drawback to the use of variable length instructions is that variable length instructions may cross a memory page boundary. In modern data processors having address translation this means that both the lower order and higher order parts of the instruction address must be checked to ensure that they have a valid mapping from the effective address space given by the instruction pointer to the physical address space of the machine, with an appropriate exception being signaled if one or both parts of the instruction address do not have a valid mapping. It is noted that page crossings cannot occur if: (1) instructions have a fixed width of 32 bits (or equivalently 4 bytes, or any number of bytes that is a power of 2); and (2) instruction addresses are aligned on a “natural” byte boundary corresponding to the width of the instruction, e.g., 4-byte instructions on 4-byte boundaries.
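The page-crossing condition may be checked with a brief sketch such as the following. The 4096-byte page size and the function names are assumptions made for illustration; the argument relies on the page size being a power of two that is at least as large as the instruction width.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (a power of two)


def crosses_page(address: int, length: int, page_size: int = PAGE_SIZE) -> bool:
    """True if the bytes [address, address + length) lie on two different pages."""
    first_page = address // page_size
    last_page = (address + length - 1) // page_size
    return first_page != last_page


def any_aligned_crossing(width: int, page_size: int = PAGE_SIZE) -> bool:
    """Test every naturally aligned address across two consecutive pages."""
    return any(
        crosses_page(addr, width, page_size)
        for addr in range(0, 2 * page_size, width)
    )


# Fixed-width, naturally aligned instructions never straddle a page boundary.
print(any_aligned_crossing(4))
# A hypothetical 6-byte variable-length instruction near the end of a page does.
print(crosses_page(4093, 6))
```

The exhaustive check over two pages suffices because the pattern of aligned addresses repeats identically in every page.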
A third drawback to the use of variable length instructions is that instructions of variable width are not compatible with the existing code for fixed width data processor architectures.
The use of a fixed width 64-bit instruction word (or other higher powers of two) would avoid the first two problems, but not the third. However, the use of 64-bit instructions introduces the further difficulty that the additional 32 bits beyond the current 32-bit instruction words are far more than what is needed to specify the number of additional registers required by deeper instruction pipelines, or the number of additional opcodes likely to be needed in the foreseeable future. The use of excess instruction bits wastes space in main memory and in instruction caches, thereby slowing the performance of the data processor.
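The mismatch between the bits gained and the bits needed can be made concrete with a rough calculation. This is a sketch only: the target of 128 registers, the instruction count, and the function names are assumed for illustration and do not come from any specific design.

```python
def extra_operand_bits(old_regs: int, new_regs: int, num_operands: int) -> int:
    """Additional bits needed across all operand fields to address new_regs registers."""
    old_field = (old_regs - 1).bit_length()
    new_field = (new_regs - 1).bit_length()
    return num_operands * (new_field - old_field)


def code_bytes(num_instructions: int, bits_per_instruction: int) -> int:
    """Memory footprint, in bytes, of a program with fixed-width instructions."""
    return num_instructions * bits_per_instruction // 8


# Assumed example: growing from 32 to 128 registers with four operands.
needed = extra_operand_bits(32, 128, 4)
print(f"extra operand bits needed: {needed} (of 32 added by a 64-bit word)")

# Footprint of a hypothetical 1000-instruction program at each width.
print(code_bytes(1000, 32), code_bytes(1000, 64))
```

Under these assumptions, even a four-operand instruction needs only 8 more operand bits, while doubling the word width doubles the code footprint.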
The above-mentioned IA-64 architecture packs three instructions into 16 bytes (128 bits), for an average of 42.67 bits per instruction. While this type of instruction encoding avoids problems with page and cache line crossing, it also exhibits several problems, both on its own, and as a technique for extending other fixed instruction width ISAs.
First, unless significant implementation difficulty is incurred (likely slowing the execution speed and requiring significantly more integrated circuit die area), this technique allows branches to go only to the first of the three instructions, whereas most other architectures allow branches to any instruction.
Second, this technique also “wastes” bits for specifying the interaction between instructions. For example, “stop bits” are used to indicate if all three instructions can be executed in parallel, or whether they must be executed sequentially, or whether some combination of the two is possible.
Third, the three instruction packing technique also forces additional complexity in the implementation in order to deal with three instructions at once.
Finally, the three instruction packing format for IA-64 has no requirement to be compatible with existing 32-bit instruction sets. As a result, there is no obvious mechanism to achieve compatibility with other fixed width instruction encodings, such as the conventional 32-bit RISC encodings.
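The per-instruction average and the branch-target restriction of the three instruction packing format can be restated numerically. The following is a sketch under the figures given above (three instructions per 128 bits, branches only to the first instruction of a group); the function names are hypothetical.

```python
from fractions import Fraction


def average_bits_per_instruction(bundle_bits: int, slots: int) -> Fraction:
    """Average encoding cost of one instruction in a packed group."""
    return Fraction(bundle_bits, slots)


def branch_target_fraction(slots: int) -> Fraction:
    """Fraction of instructions reachable by a branch when only the
    first instruction of each group may be targeted."""
    return Fraction(1, slots)


avg = average_bits_per_instruction(128, 3)
print(f"average bits per instruction: {float(avg):.2f}")
print(f"branch-targetable instructions: {branch_target_fraction(3)}")
```

By comparison, a conventional 32-bit fixed-width encoding costs 32 bits per instruction and permits a branch to any instruction.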
Prior to this invention, the problems that were inherent in the prior art instruction word extension approaches were not adequately addressed or solved.