1. Field of the Invention
This invention relates to the field of computer architecture and compiler design. More specifically, the invention is a method and apparatus for generating instructions capable of addressing an additional set of registers on a central processing unit (CPU).
2. Background
There is an ongoing effort to make computer programs as efficient as possible in their execution time. This has driven the development of multiple issue computer processors. The multiple issue computer architecture improves execution efficiency by processing multiple instructions in parallel each clock cycle rather than only a single instruction each clock cycle. Currently, multiple issue processors issue from two to four instructions during each clock cycle. Some examples of multiple issue processor designs are Sun's UltraSPARC™ (4 issue), IBM's PowerPC™ series (2-4 issue), MIPS' R10000™ and Intel's Pentium-Pro™ (3 issue). (These processor names are the trademarks respectively of Sun Microsystems, Inc., International Business Machines (IBM) Corp., MIPS Technologies, Inc. and Intel Corp.) Typically, these processors contain multiple I/O memory ports, integer adders, floating point adders, multipliers and other functional units which enable them to execute multiple instructions simultaneously each machine cycle.
A compiler takes the source code of a computer program and generates an executable module containing machine specific instructions designed to execute on a particular processor. Most modern compilers also have an optimizer designed to schedule instructions and make use of available hardware resources in the most efficient manner possible. A multiple issue processor typically requires an optimizing compiler to maximize the issuance of multiple instructions each machine cycle. FIG. 1 illustrates a conventional optimizing compiler 100 used for a multiple issue processor. The compiler 100 typically contains a front end 102, a code generation section 104, an optimizer section 106 and a backend section 108. First, the source code for a computer program is generated by a user and provided to the front end 102 of the compiler where various pre-processing functions are performed. Next, the code is provided to the code generation section 104 which generates a set of instructions expressed in an intermediate code which is semantically equivalent to the source code. Typically, the intermediate code is expressed in a machine-independent format. The code optimizer 106 accepts the intermediate instruction set and performs various transformations to schedule the instruction set in a faster and more efficient manner. Finally, the backend 108 accepts the optimized intermediate code and generates a target executable program 110 which includes a set of machine instructions in binary format which can be executed on a specific target machine (e.g. SPARC, Intel, PowerPC, MIPS etc.). Each machine instruction includes an operation code (opcode) portion and an operand portion containing one or more operands. The opcode portion of the machine instruction instructs the target machine to execute specific functions. The operand portion of the instruction is used to access data stored in the registers during execution.
Efficient execution of the optimized target computer program is limited, in part, by the availability of registers. These registers are located on board the processor and can store and retrieve data faster than memory or any other storage subsystem. During compilation the backend 108 allocates the registers to instructions which store data and operands in registers. Accordingly, an executable computer program would run most efficiently if the backend could allocate registers to all instructions.
Register pressure is a condition which occurs when the mixture of instructions provided to the processor demands more registers than are immediately available. When registers are unavailable, the instructions must store data in storage media having slower read and write access times, such as memory or disk, thus creating register "spill over". Significant register "spill over" can cause even a highly optimized target executable program to run less efficiently as the latency in retrieving data from these slower storage media becomes the processing bottleneck.
For example, assume a multiple issue processor 200 in FIG. 2A has 32 registers in a register set 208 and processor elements 210 are capable of issuing eight instructions each machine cycle. Accordingly, an instruction group 214 in FIG. 2B contains eight instructions (N=8) which can be executed simultaneously on multiple issue processor 200. Further assume that each instruction retains only two registers for a period of four machine cycles to complete execution. Consequently, the eight issue processor 200 must allocate sixteen registers each machine cycle. This translates to 64 registers every four machine cycles. Unfortunately, all 32 of the available registers on the processor are allocated after only the first two machine cycles. Any instruction issued after the first two machine cycles must "spill over" any operands or data into slower storage such as memory or disk until registers become available once again. Performance on this processor becomes limited by the access times of the numerous storage devices on the system.
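By way of illustration only, the arithmetic of the foregoing example can be sketched in a few lines of Python. The function name, its parameters and the simplified spill model (an instruction that cannot obtain registers simply spills and does not claim registers later) are assumptions introduced for this sketch and are not taken from the figures.

```python
def simulate_register_demand(num_registers=32, issue_width=8,
                             regs_per_instr=2, hold_cycles=4, cycles=4):
    """Return a list of (cycle, registers_in_use, registers_spilled) tuples.

    Hypothetical model: every cycle the processor issues `issue_width`
    instructions, each holding `regs_per_instr` registers for
    `hold_cycles` machine cycles.
    """
    live = []      # (release_cycle, count) for registers currently held
    history = []
    for cycle in range(1, cycles + 1):
        # Release registers whose hold period has expired.
        live = [(release, n) for release, n in live if release > cycle]
        in_use = sum(n for _, n in live)

        demand = issue_width * regs_per_instr        # 16 in the example
        granted = min(demand, num_registers - in_use)
        spilled = demand - granted                   # forced to slower storage

        if granted:
            live.append((cycle + hold_cycles, granted))
        history.append((cycle, in_use + granted, spilled))
    return history


for cycle, in_use, spilled in simulate_register_demand():
    print(f"cycle {cycle}: {in_use} registers in use, {spilled} spilled")
```

Under these assumptions the register set is full after the second machine cycle, and each of the third and fourth cycles must spill all sixteen of the registers it requests.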
Register pressure may be reduced in several steps. First, a second or extended set of registers is added to multiple issue processor 200. For example, in the previous example a second set of 32 registers can be added to the first set of registers to make 64 registers available on the processor. However, adding the registers is not the most difficult aspect of reducing register pressure on a processor. In fact, those skilled in the art of chip fabrication can add a second register set on a processor relatively easily using numerous techniques well known in the art.
Addressing the second set of registers is the more difficult problem. Unfortunately, conventional instruction layout formats 216 are unable to address the second set of registers. For example, in FIG. 2B only five bits are reserved in the DR register address field 216B and five bits are reserved in the SR register address field 216C. A five bit register address field can only address the 32 registers in the first set of registers. More than five bits are required to address the second or extended set of registers on a processor. Accordingly, there is a need to devise a method of addressing a second set of registers on multiple issue processor 200.
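The limitation of a five bit register address field can be illustrated with a short Python sketch. The five bit width of the DR and SR fields comes from the description above; the helper names and the bit positions used in `pack_fields` are hypothetical and are chosen only for illustration, since the exact layout of FIG. 2B is not reproduced here.

```python
REG_FIELD_BITS = 5  # width of the DR and SR register address fields


def bits_needed(num_registers):
    """Minimum number of address bits needed to select one of num_registers."""
    return max(1, (num_registers - 1).bit_length())


def encode_register(reg, field_bits=REG_FIELD_BITS):
    """Return reg unchanged if it fits in the field, otherwise raise ValueError."""
    if not 0 <= reg < (1 << field_bits):
        raise ValueError(
            f"register {reg} does not fit in a {field_bits}-bit field")
    return reg


def pack_fields(opcode, dr, sr):
    """Pack fields using an assumed layout: [opcode | DR (5 bits) | SR (5 bits)]."""
    return ((opcode << (2 * REG_FIELD_BITS))
            | (encode_register(dr) << REG_FIELD_BITS)
            | encode_register(sr))


print(bits_needed(32))   # 5 bits suffice for the first register set
print(bits_needed(64))   # 6 bits are needed once 64 registers exist
```

In this sketch, `pack_fields(1, 31, 0)` succeeds because register 31 is the highest register a five bit field can name, whereas `pack_fields(1, 32, 0)` raises `ValueError`, since register 32 belongs to the extended set.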
In the past, a second set of registers on a processor was addressed by adding additional bits to the length of each instruction in the instruction set. These wider instructions used the additional bits to address the registers in the second set of registers. The normal progression is to increase the bit length of the instruction from 32 to 64 bits. For example, assume 3 additional bits are required to address additional registers located on a processor which executes 32 bit instructions. Traditionally, these 3 additional bits would be accommodated by increasing the instruction length to 64 bits.
Unfortunately, increasing the bit width of an instruction has several disadvantages. First, processors with wider instruction words are typically incompatible with legacy software compiled for older processors with shorter instruction words. Generally, processors with wider (e.g. 64 bit) instruction words are unable to execute software compiled for narrower instruction words (e.g. 32 bit) and vice-versa. As a result, software developers must recompile each software program for each type of computer. Users and software developers cannot enjoy the luxury of using a single executable program across a family of computers. Furthermore, it is more difficult for hardware manufacturers to sell new computers if they cannot execute a user's existing software applications.
Second, the increased instruction width is also undesirable because computer programs with wider instruction words require twice the amount of storage in memory and disk. For example, a binary 64-bit instruction word takes up approximately twice as much storage as the corresponding 32-bit instruction word. It would be desirable to address an extended number of registers on a processor without having twice the storage and memory requirements.
Third, increasing the width of the instruction is undesirable because it may increase the cost of the computer system. Increasing the width of an instruction also requires that various bus widths within the computer increase. A computer system with wider buses runs more efficiently because wider instruction words can be fetched and processed in fewer cycles than on a narrower bus architecture. However, this will cost the computer manufacturer a great deal of money in redesigning the processor, buses and all the related computer cards and peripherals to accept the wider instruction words. Ultimately, these higher costs will be passed along to the consumer. It would be advantageous to have an instruction set address an additional set of registers without having to redesign various components within the computer system.
The present invention provides techniques to address additional registers without the previously mentioned disadvantages. The present invention uses an elegant method and apparatus to address an extended set of registers on a processor without increasing the width of the instruction.