This patent application incorporates a sixty-one (61) page appendix entitled “APPENDIX A” and referred to hereafter as “Appendix A.”
Some RISC (Reduced Instruction Set Computer) microprocessors have FPUs (Floating Point Units). A floating point unit is a circuit for executing floating point computations. RISC is a computer architecture that uses relatively simple, fixed size instructions to reduce the complexity of the microprocessor. Most instructions in a RISC architecture operate on operands available in general purpose registers and store the result in a register. These registers are loaded from memory, and typically register contents are reused during execution of a program. Most RISC architectures have 16 or more general purpose registers.
Typical RISC microprocessors have the capability to pipeline instruction execution. There are a number of problems in coordinating the activities of multiple function units (e.g., an integer pipeline of a CPU and a floating point pipeline). If the two units in such a machine share any resources, then synchronizing the activities of the two pipelines plays a major role in solving these problems.
Another problem is maintaining precise exception semantics. Handling exceptions or interrupts precisely on a pipelined or multi-function unit architecture implies that, when an exception or interrupt occurs, it should be possible to save a machine state that is precisely the same as the state that would result from executing the program on a completely sequential version of the architecture. Even if instructions are issued to the function units in strict program order, instructions may complete (or update state) out of order because execution times differ between instructions in different function units. Several effective means of implementing precise interrupts in pipelined processors have been discussed in the article “Implementing Precise Interrupts in Pipelined Processors,” IEEE Transactions on Computers, pp. 562-573, May 1988. Most modern pipelined multifunction unit processors implement variations of the techniques presented in this reference.
Some of these techniques require additional register files and, significantly, complex control logic. Typically, synchronization of resource sharing requires tag-matching hardware at the inputs of function units as well as more complex internal data buses connecting the shared resources. Other techniques use register scoreboarding to identify and resolve register resource conflicts. These techniques, in essence, require additional die area and are not suitable for inexpensive processors meant for embedded applications.
Floating point instructions in typical RISC architectures have a length of at least thirty-two bits. An example of such a RISC microprocessor is a Power PC. Power PCs were introduced by IBM and Motorola. Similarly, MIPS, another RISC-based microprocessor, also requires thirty-two bits for each floating point instruction. MIPS microprocessors are made by MIPS Computer Systems, Inc., of Sunnyvale, Calif.
FIG. 17 illustrates a typical 32-bit length floating point instruction 1710 for the Power PC. Seventeen bits of instruction 1710 are dedicated to the operation code 1714. Fifteen bits 1718 of the floating point instruction 1710 are used to address registers. The operation code 1714 of the floating point instruction 1710 operates on the contents of registers addressed using the fifteen bits 1718 to perform the floating point instruction 1710.
One reason that RISC architectures typically require floating point instructions at least thirty-two bits long is that such instructions typically use three operands, with registers selected from a bank of thirty-two floating point registers. Addressing one of thirty-two registers requires five bits. So, selecting each of the three operands from a bank of thirty-two registers already requires fifteen bits. Additional bits are required for the operation code 1714.
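The bit budget described above can be checked with a short calculation. The following is a minimal sketch in Python; the function names are illustrative and do not come from the application, and the sketch assumes that every operand field has the same width.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to select one register out of num_registers."""
    if num_registers < 1:
        raise ValueError("num_registers must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (num_registers - 1).bit_length()


def operand_bits(num_registers: int, num_operands: int) -> int:
    """Total bits spent on register addressing (equal-width fields assumed)."""
    return num_operands * register_field_bits(num_registers)


def opcode_bits(instruction_bits: int, num_registers: int, num_operands: int) -> int:
    """Bits left over for the operation code after register addressing."""
    remaining = instruction_bits - operand_bits(num_registers, num_operands)
    if remaining < 0:
        raise ValueError("operands do not fit in the instruction")
    return remaining


if __name__ == "__main__":
    # Three operands drawn from a bank of thirty-two registers.
    print(register_field_bits(32))   # bits per register field
    print(operand_bits(32, 3))       # bits for all three operands
    print(opcode_bits(32, 32, 3))    # bits left for the operation code
```

With thirty-two registers and three operands, the register fields consume fifteen bits of a 32-bit instruction and seventeen bits remain, which matches the split shown for instruction 1710 in FIG. 17. The same functions can be used to explore other register counts and operand counts.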
There is a related issue of transfer of data between registers of the FPU and registers of a CPU (Central Processing Unit) of the RISC microprocessor. An example of a register is an array of latches. Typically, a floating point unit has registers for storing data in floating point format. Similarly, a CPU has integer registers for storing data in integer format. Transfers of data between integer and floating point registers usually occur via the memory unit, such as cache memory of the RISC microprocessor. For instance, when the FPU needs to transfer data to the CPU, the FPU first transfers data from a floating point register to the cache memory. Second, the CPU retrieves this data stored in the cache memory for storage in the CPU register. However, access to cache memory for data storage or retrieval is relatively slow compared to data access for storage or retrieval from a register. Moreover, the capability to access memory requires die area for the memory access circuits for the FPU and the CPU. But die area is at a premium in, for example, embedded applications. Embedded applications are those where, for instance, a processor is dedicated to a particular function, such as a game. Some more complex RISC processors dedicate a direct path for data transfer between the CPU and the FPU registers. However, this additional path requires an increase in die area.
A processor uses a floating point pipeline to execute floating point operations and an integer pipeline to execute integer and memory addressing operations. The floating point pipeline is synchronized with the processor pipeline. Principally, synchronization of the FPU pipeline and the CPU pipeline is achieved by having stalls and freezes on either one of these pipelines effect stalls and freezes on both pipes.
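The lockstep behavior described above can be illustrated with a small cycle-level model. This is only a sketch under simplifying assumptions: each pipeline is reduced to a progress counter, stall conditions are supplied as sets of cycle numbers, and the name `run_lockstep` is hypothetical rather than taken from the application.

```python
def run_lockstep(cpu_stall_cycles, fpu_stall_cycles, total_cycles):
    """Simulate two pipelines in which a stall on either one stalls both.

    Returns a list of (cycle, stalled, cpu_progress, fpu_progress) tuples.
    """
    cpu_progress = 0
    fpu_progress = 0
    trace = []
    for cycle in range(total_cycles):
        cpu_stall = cycle in cpu_stall_cycles
        fpu_stall = cycle in fpu_stall_cycles
        # A stall raised by either pipeline is applied to both pipelines.
        stalled = cpu_stall or fpu_stall
        if not stalled:
            cpu_progress += 1
            fpu_progress += 1
        trace.append((cycle, stalled, cpu_progress, fpu_progress))
    return trace


if __name__ == "__main__":
    # The CPU pipeline stalls in cycle 1, the FPU pipeline in cycle 3.
    for row in run_lockstep({1}, {3}, 5):
        print(row)
```

Because both counters advance only when neither pipeline stalls, they never drift apart, which is the property that keeps instruction completion in step between the two pipes in this simplified model.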
This invention further relates generally to a 32-bit RISC architecture with a 16-bit fixed length floating point instruction set. Reducing the floating point instruction length to only sixteen bits saves memory space for storage of a computer program. For example, reducing the floating point instruction length from thirty-two bits to sixteen bits cuts the memory required for storing these instructions by half. Reducing instruction size reduces the cache miss rate, because more instructions can be stored in the cache memory. Furthermore, reducing the floating point instruction length improves the instruction fetch latency. The 16-bit instructions are fetched in 32-bit blocks. Consequently, a single fetch from memory can obtain two instructions, whereas for 32-bit instructions it is possible to fetch only one instruction per memory access. Reducing the floating point instruction length permits reduction in the size of the memory required for storing the floating point instructions, thus reducing the die area used for on-chip cache memory.
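The fetch and storage arithmetic above can be sketched as follows. The sketch assumes straight-line, aligned code with no branches, and it assumes that the upper half of a 32-bit fetch block holds the first instruction; the application does not specify that ordering, and all function names here are illustrative.

```python
def split_fetch_block(block: int):
    """Split one 32-bit fetch block into two 16-bit instructions.

    Assumption: the upper 16 bits hold the first instruction.
    """
    if not 0 <= block <= 0xFFFFFFFF:
        raise ValueError("block must fit in 32 bits")
    return (block >> 16) & 0xFFFF, block & 0xFFFF


def fetches_needed(num_instructions: int, instruction_bits: int, fetch_bits: int = 32) -> int:
    """Number of fetches needed for straight-line code (ceiling division)."""
    per_fetch = fetch_bits // instruction_bits
    return -(-num_instructions // per_fetch)


def code_bytes(num_instructions: int, instruction_bits: int) -> int:
    """Storage needed for the instructions, in bytes."""
    return num_instructions * instruction_bits // 8


if __name__ == "__main__":
    first, second = split_fetch_block(0x12345678)
    print(hex(first), hex(second))
    print(fetches_needed(100, 16), fetches_needed(100, 32))
    print(code_bytes(100, 16), code_bytes(100, 32))
```

For one hundred instructions, the 16-bit encoding needs half as many fetches and half as many bytes as the 32-bit encoding under these assumptions.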
For efficiency, this embodiment may not support all of the exceptions of the IEEE floating point standard No. 754. Also, not all of the IEEE rounding modes are necessarily supported. Similarly, if the result of a value-generating floating point operation is a denormalized number, it is flushed to zero. These deviations from the IEEE floating point standard save die area and execution cycle time.
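The flush-to-zero behavior for denormalized results can be illustrated in software. The following sketch assumes IEEE 754 single-precision values and assumes that the sign of the flushed result is preserved; neither detail is stated in the text above, and the helper names are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_denormal32(bits: int) -> bool:
    """A denormalized number has a zero exponent field and a nonzero fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


def flush_to_zero32(x: float) -> float:
    """Replace a denormalized single-precision result with zero.

    Assumption: the sign bit is kept, so a negative denormal becomes -0.0.
    """
    bits = float32_bits(x)
    if is_denormal32(bits):
        bits &= 0x80000000  # clear exponent and fraction, keep the sign
    return bits_to_float32(bits)


if __name__ == "__main__":
    print(flush_to_zero32(1e-40))   # denormal in single precision
    print(flush_to_zero32(1.0))     # normal value, returned unchanged
```

Hardware that flushes denormals in this way avoids the extra normalization logic that gradual underflow would otherwise require, which is consistent with the die area and cycle time savings described above.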