1. Field of the Invention
The present invention is directed to microprocessor systems; more particularly, the invention is directed to high-performance reduced instruction set computer (RISC) architecture processors that make highly efficient use of instruction width.
2. Description of Related Art
The design of processor instruction sets is a well-established art. Most instruction set features are not new in themselves. However, individual features can be combined in new and unique ways that advance the state of the art. In particular, when instruction set design is optimized for a different use than prior instruction sets, significant improvements may result when a processor implementing that instruction set is used in the target application.
Instruction set design needs to balance many competing goals, including the size of the machine code required to encode various algorithms; the extensibility and adaptability of the instruction set for new algorithms and applications; the performance and power consumption of processors that implement the instruction set on such algorithms; the cost of processors that implement the instruction set; the suitability of the instruction set for multiple processor implementations over time; the complexity of design of processors that implement the instruction set; and the suitability of the instruction set as a target for compilation from high-level programming languages.
The instruction set has one direct and two indirect influences on processor performance. The instruction set directly determines IE, the number of instructions required to implement a given algorithm, although the suitability of the instruction set as a target for compilation is a factor here as well. The other components of processor performance are the clock period, CP, and the average number of clocks per instruction, CPI. These are primarily attributes of the implementation of the instruction set, but instruction set features may affect the ability of the implementor to meet clock period and clocks-per-instruction goals simultaneously. For example, an encoding choice might mandate additional logic in series with the rest of instruction execution, which an implementor would address either by increasing the clock period or by adding a pipeline stage, which usually increases the clocks per instruction.
In the 1980s and 1990s, a new style of instruction set architecture, called RISC, emerged. It grew out of the recognition of the tradeoff described above, namely that
$$T = IE \times CPI \times CP$$
where T is the program execution time in seconds and the other variables are as described above. RISC instruction sets allowed implementors to improve CPI and CP significantly without increasing IE by much. RISC instruction sets improved the performance of processors, lowered design complexity, allowed lower-cost processor implementations at a given performance level, and were well suited to compilation from high-level programming languages.
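The tradeoff in the execution-time equation can be made concrete with a small worked sketch. The figures below are purely illustrative assumptions, not measurements of any real processor; the names `Profile` and `execution_time` are hypothetical. The sketch shows how a design that executes somewhat more instructions can still finish sooner when CPI and CP drop substantially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical figures for one program running on one processor design."""
    ie: float   # IE: number of instructions executed
    cpi: float  # CPI: average clocks per instruction
    cp: float   # CP: clock period in seconds


def execution_time(p: Profile) -> float:
    """Program execution time T = IE * CPI * CP, in seconds."""
    return p.ie * p.cpi * p.cp


# Illustrative numbers only; they are not measurements of any real processor.
microcoded = Profile(ie=1.0e6, cpi=4.0, cp=20e-9)  # fewer, more complex instructions
risc = Profile(ie=1.2e6, cpi=1.2, cp=10e-9)        # 20% more instructions, lower CPI and CP

for name, p in (("microcoded", microcoded), ("RISC", risc)):
    print(f"{name}: {execution_time(p) * 1e3:.2f} ms")
```

Under these assumed numbers IE grows by 20%, yet T falls from about 80 ms to about 14.4 ms because the product of CPI and CP shrinks far more than IE grows.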
The processor architecture community has never agreed on a completely satisfactory definition of RISC, but it has generally included most of the following attributes: fixed size instruction words; arithmetic and other computation operations are performed on operands read from a general register file with 16 or more registers and results are written to the same register file; fixed positions in the instruction word for source register fields so that register file access can occur in parallel with instruction decode; memory access is primarily done via loads from memory to registers, and stores to memory from registers (as opposed to having memory operands in computational instructions); a small number (often 1, usually less than 4) of methods for computing memory addresses; avoidance of features that would make pipelined execution of instructions difficult (e.g., use of a hardware resource more than once by a given instruction); and avoidance of features that require microcode or its equivalent. Not all processors considered to be RISCs contain all of the above elements, but all contain most of the above.
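The attribute of fixed source-register field positions can be illustrated with a short decoding sketch. It assumes a hypothetical 32-bit layout loosely resembling the classic MIPS immediate format (a 6-bit opcode, two 5-bit register fields, and a 16-bit low field); the functions `encode` and `source_fields` are illustrative names, not part of any real toolchain. Because the register fields sit at the same bit positions for every opcode, they can be extracted with fixed shifts and masks before the opcode is decoded.

```python
def encode(opcode: int, rs: int, rt: int, low16: int) -> int:
    """Pack a hypothetical 32-bit word: opcode[31:26] | rs[25:21] | rt[20:16] | low16[15:0]."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (low16 & 0xFFFF)


def source_fields(word: int) -> tuple[int, int]:
    """Extract both register specifiers with fixed shifts and masks.

    The opcode is never consulted: because the fields sit at the same bit
    positions in every format, hardware can start indexing the register file
    in parallel with opcode decode, even if a given instruction later turns
    out not to use one of the values read.
    """
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return rs, rt


# Two different (hypothetical) opcodes; the register fields decode identically.
word_a = encode(opcode=0x00, rs=3, rt=7, low16=0x1820)
word_b = encode(opcode=0x23, rs=3, rt=7, low16=0x0010)
print(source_fields(word_a), source_fields(word_b))
```

The design choice being illustrated is that field extraction is opcode-independent wiring, so register file access need not wait for decode.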
The early RISC instruction sets were not, however, particularly efficient at producing compact machine code. In particular, RISC instruction sets usually required more bits to encode an application than pre-RISC instruction sets did. The size of the machine code for an application often matters more to the total solution cost than the cost of the processor itself, because larger code requires larger memories to hold the application. RISC is still acceptable in many applications where performance is most important, but instruction sets that offer the advantages of RISC with reduced code size would be useful in many other processor applications.
Some early processor instruction sets (IBM 7090, CDC 6600, DEC PDP6, GE 635) shared some characteristics of RISC because, like RISC, they were designed to be executed directly by hardware without microcode. Most of these instruction sets are not very suitable for modern high-level languages and applications because of features such as word (as opposed to byte) addressing, limited address space, and peculiar combinations of operations. Most were in fact intended for assembly language programming. Several were also based on a 36-bit data word and instruction width, and 36-bit instructions are not very good for code density. Several were based on an accumulator-and-memory paradigm for computation, which limits performance. None had the desired characteristics, although some of the individual features of this invention can be traced to these generations of machines.
The use of microcode to implement processors made more complicated instruction sets feasible (IBM 360, DEC PDP11, DEC VAX, Intel x86, LLNL S-1, Motorola 68000). The next generation of processors therefore had complex instruction sets with good code density, partly due to complex variable-length instruction encodings. However, microcoded processors and their complex instruction sets were often not well suited to high performance. Complex instructions were implemented by iterating a micro-engine rather than by direct execution in a hardware pipeline, which increased CPI.
Various styles of instruction set design emerged in this era, with a trend away from one or two accumulators toward either general register architectures or stack architectures. The implementation cost of registers or stacks had become low enough that instruction sets could adopt these advantageous styles.
As mentioned above, although RISC brought a significant improvement in performance, it was a setback for code density. Most RISC instruction sets are based on fixed-length 32-bit instructions, and 32 bits turns out to be more than most instructions need. Moreover, some form of variable-length encoding is necessary to achieve the best code density. Stack architectures faded away at this point because of their low performance, despite their code size advantage, which shows how important it is that an instruction set achieve both performance and code size goals.
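The code density argument for variable-length encoding can be quantified by computing the average encoded width of an instruction mix. The sketch below assumes a hypothetical mix in which 70% of instructions fit in 16 bits and the rest need 32 bits, and it assumes the instruction count is the same under both encodings; the function names are illustrative only.

```python
def average_bits(mix: dict[int, float]) -> float:
    """Average encoded width in bits for a mix given as {width_in_bits: fraction}."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(width * fraction for width, fraction in mix.items())


def code_size_bytes(n_instructions: int, mix: dict[int, float]) -> float:
    """Static code size in bytes for n_instructions drawn from the given mix."""
    return n_instructions * average_bits(mix) / 8


fixed_32 = {32: 1.0}           # every instruction uses 32 bits
variable = {16: 0.7, 32: 0.3}  # hypothetical: 70% of instructions fit in 16 bits

for name, mix in (("fixed 32-bit", fixed_32), ("variable 16/32", variable)):
    print(f"{name}: {average_bits(mix):.1f} bits/insn, "
          f"{code_size_bytes(10_000, mix):,.0f} bytes for 10,000 instructions")
```

With this assumed mix the average width drops from 32 to about 20.8 bits, roughly 65% of the fixed-length code size, which is the kind of saving a single fixed width cannot reach.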
To compensate for the code size disadvantage of RISC, several processor designers introduced compact encodings of their instruction sets. ARM's Thumb and MIPS' MIPS16 are examples. Both use predominantly 16-bit instructions with a small number of 32-bit instructions. The 16-bit encodings, which provide smaller code by halving the number of bits per instruction, yield poor performance because they have only 8 registers (which increases IE), use implied source register operands (which increases CP or CPI), limit the range of constants in the instruction word (which increases IE), and restrict the number of distinct register operands (two or fewer for most instructions, which increases IE).
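The reason halving the instruction width does not halve code size, and can lengthen run time, follows from combining the width reduction with the growth in instruction count. The sketch below is a simplified model under stated assumptions: the `expansion` factor is hypothetical, it is applied equally to the static instruction count (code size) and the dynamic count IE (run time), and CPI and CP are held fixed even though implied operands may in practice raise them as well.

```python
def compact_vs_base(expansion: float, compact_bits: float = 16.0,
                    base_bits: float = 32.0) -> tuple[float, float]:
    """Return (code_size_ratio, run_time_ratio) of a compact encoding vs a base one.

    expansion: hypothetical factor by which the instruction count grows when a
    program is re-encoded with fewer registers, shorter constants and fewer
    register operands. It is applied to both the static count (code size) and
    the dynamic count IE (run time). CPI and CP are held fixed, so by
    T = IE * CPI * CP the run-time ratio is simply the expansion factor.
    """
    code_size_ratio = expansion * compact_bits / base_bits
    run_time_ratio = expansion
    return code_size_ratio, run_time_ratio


for expansion in (1.0, 1.15, 1.3):
    size_ratio, time_ratio = compact_vs_base(expansion)
    print(f"expansion {expansion:.2f}: code size x{size_ratio:.3f}, run time x{time_ratio:.2f}")
```

With an assumed 30% expansion, the compact encoding saves only 35% of code size while running 30% longer. Passing a `compact_bits` value slightly above 16 would approximate a mix that includes a few 32-bit instructions.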
The Hitachi SH instruction set is RISC-like and targeted code size as an objective. It started as a 16-bit instruction set, but 32-bit instructions were later found necessary and added. It has 16 registers, but still has at most two register fields per instruction (which increases IE), and it has severely limited branch offsets.
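Why a limit of two register fields per instruction increases IE can be seen by lowering three-operand operations into a two-operand form, in which the destination must also serve as the first source. The sketch below is a generic illustration, not a model of the SH instruction set itself: the opcode names, register names, and the scratch register `t0` are all hypothetical, and the scratch register is assumed to be free.

```python
from typing import NamedTuple

COMMUTATIVE = {"add", "and", "or", "xor", "mul"}
SCRATCH = "t0"  # hypothetical scratch register, assumed free for the lowering


class Op(NamedTuple):
    """A three-register-field operation: dst = src1 <name> src2."""
    name: str
    dst: str
    src1: str
    src2: str


def to_two_address(ops: list[Op]) -> list[tuple[str, ...]]:
    """Lower to a two-register-field form in which the destination is also the first source."""
    out: list[tuple[str, ...]] = []
    for op in ops:
        if op.dst == op.src1:
            # Already two-address: dst = dst <op> src2.
            out.append((op.name, op.dst, op.src2))
        elif op.dst == op.src2 and op.name in COMMUTATIVE:
            # Swap operands of a commutative op: dst = dst <op> src1.
            out.append((op.name, op.dst, op.src1))
        elif op.dst == op.src2:
            # Non-commutative and dst aliases src2: compute in scratch, then copy back.
            out += [("mov", SCRATCH, op.src1), (op.name, SCRATCH, op.src2), ("mov", op.dst, SCRATCH)]
        else:
            # Copy src1 into dst first so that src1 itself survives.
            out += [("mov", op.dst, op.src1), (op.name, op.dst, op.src2)]
    return out


# x = a + b and y = a - b, with a and b still needed afterwards.
program = [Op("add", "x", "a", "b"), Op("sub", "y", "a", "b")]
lowered = to_two_address(program)
print(len(program), "->", len(lowered))
for insn in lowered:
    print(*insn)
```

In this example two three-operand instructions become four two-operand instructions, because each result must be placed in a fresh register while both sources remain live; the extra moves are the IE cost the text refers to.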
What is needed is an instruction set design that gives the performance and other advantages of RISC, and yet provides small, cost-effective machine code. To facilitate high-performance implementations without excessive complexity, the instruction set should be directly executable without microcode by a simple, short pipeline. There should be a sufficient number of general registers to achieve good performance and to be a suitable target for optimizing compilers. Other techniques may be used to further reduce code size.