1. Field of the Invention
The present invention relates to a processor architecture, and more particularly to a processor architecture combining floating point functional units and non-floating point functional units.
2. Description of the Related Art
Processors generally process a single instruction of an instruction set in several steps. Early processors performed these steps serially. Advances in technology led to pipelined processors, which may be called scalar processors and which perform different steps of many instructions concurrently. A "superscalar" processor is also implemented using a pipelined structure, but further improves performance by executing multiple scalar instructions concurrently.
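As a rough illustration of why overlapping steps helps, assume each instruction takes k steps of one cycle each and that no conflicts or dependencies ever stall execution (an idealization; the symbols k, n, and w are introduced here for illustration only). Executing n instructions then takes approximately:

```latex
\begin{align*}
T_{\text{serial}}      &= n k \\
T_{\text{pipelined}}   &= k + (n - 1) \\
T_{\text{superscalar}} &= k + \left\lceil n / w \right\rceil - 1
  \quad \text{(up to } w \text{ instructions entering the pipeline per cycle)}
\end{align*}
```

For example, with k = 5 and n = 100 these give 500, 104, and (for w = 2) 54 cycles. The conflicts and dependencies discussed below are what keep real processors from reaching these bounds.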
In a superscalar processor, instruction conflicts and dependency conditions arise in which an issued instruction cannot be executed because necessary data or resources are not available. For example, an issued instruction cannot execute when its input operands are dependent upon data calculated by other instructions that have not yet completed execution.
Superscalar processor performance is improved by speculative execution of branch instructions and by continuing to decode instructions regardless of whether they can be executed immediately. Decoupling instruction decoding from instruction execution requires a buffer between the processor's instruction decoder and the circuits, called functional units, that execute instructions.
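The decoupling described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the mechanism of the invention: the class `DecoupledPipeline`, the register names, and the one-cycle result latency are hypothetical, and the model ignores branch speculation, limits on the number of functional units, and write-after-read or write-after-write ordering. The decoder appends every instruction to the buffer without stalling; each cycle, the functional units take whichever buffered instructions no longer depend on an unfinished older instruction, so a stalled instruction does not block independent ones behind it.

```python
from collections import deque


class DecoupledPipeline:
    """Toy model of a buffer decoupling instruction decode from execution.

    Hypothetical illustration: the buffer holds decoded instructions in
    program order. An instruction may execute once no older buffered
    instruction still has to write one of its source registers
    (read-after-write). All such instructions execute in the same cycle
    (no limit on functional units), results are visible from the next
    cycle, and write-after-read/write-after-write ordering is not modeled.
    """

    def __init__(self):
        self.buffer = deque()  # decoded instructions awaiting execution

    def decode(self, name, dest, srcs):
        # Decoding never stalls: the instruction just waits in the buffer.
        self.buffer.append((name, dest, tuple(srcs)))

    def execute_cycle(self):
        """Execute every buffered instruction whose source operands are
        available, possibly out of program order; return their names."""
        issued, waiting = [], deque()
        older_dests = set()  # registers still to be written by older instructions
        for name, dest, srcs in self.buffer:  # iterate in program order
            if older_dests.isdisjoint(srcs):
                issued.append(name)
            else:
                waiting.append((name, dest, srcs))
            # Issued this cycle or still waiting, this result is not yet
            # visible to younger instructions.
            older_dests.add(dest)
        self.buffer = waiting
        return issued


p = DecoupledPipeline()
p.decode("load", "r3", ["r1"])       # operands available
p.decode("add", "r5", ["r3", "r4"])  # needs r3 from "load"
p.decode("mul", "r7", ["r6", "r2"])  # independent of both
print(p.execute_cycle())  # ['load', 'mul']
print(p.execute_cycle())  # ['add']
```

In a full design, the write-after-read and write-after-write cases this sketch ignores are typically handled by register renaming or a reorder buffer.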
Floating point functionality has been available in nonsuperscalar computers and processors for many years. Microprocessors typically perform floating point and integer instructions by activating separate floating point and integer circuits. A standard for floating point arithmetic has been published by the Institute of Electrical and Electronics Engineers in "IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE Standard 754-1985, IEEE Inc., 1985. This standard is widely accepted, and it is advantageous for a processor to support its optional extended floating point format.
Some computers employ separate main processor and coprocessor chips. The main processor reads and writes to a floating point register stack to effect floating point operations. For example, an 80386 main processor, which is a scalar microprocessor, and an 80387 math coprocessor are available from various manufacturers. The math coprocessor controls floating point operations initiated upon a request from the main processor. The main processor accesses a register stack, which includes eight registers for holding up to eight floating point values stored in double extended format. A 32-bit single precision or 64-bit double precision value is loaded from memory and expanded to the 80-bit double extended format. Conversely, a double extended value is shortened and rounded to a single or double precision value as it is stored in memory.
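The load-time expansion to the 80-bit double extended format can be sketched at the bit level. This sketch assumes the x87 layout of that format (1 sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit); the function names are hypothetical. Because the extended format has a wider exponent range and a longer significand, the expansion is exact: even single and double precision subnormals become normalized extended values. The reverse direction, which must round, is not shown.

```python
import struct

# (exponent bits, fraction bits) of the IEEE 754 single and double formats
SINGLE = (8, 23)
DOUBLE = (11, 52)
EXT_BIAS = 16383  # exponent bias of the 80-bit double extended format


def to_extended(bits, fmt):
    """Expand a single or double precision bit pattern to the 80-bit double
    extended fields: (sign, 15-bit biased exponent, 64-bit significand whose
    top bit is the explicit integer bit). The expansion is exact."""
    exp_bits, frac_bits = fmt
    bias = (1 << (exp_bits - 1)) - 1
    sign = bits >> (exp_bits + frac_bits)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    shift = 63 - frac_bits  # left-align the fraction just below bit 63

    if exp == (1 << exp_bits) - 1:
        # Infinity (frac == 0) or NaN: maximum exponent, payload copied as-is
        # (exception signaling and NaN quieting are not modeled here).
        return sign, 0x7FFF, (1 << 63) | (frac << shift)
    if exp == 0:
        if frac == 0:
            return sign, 0, 0  # signed zero
        # Subnormal source: the wider extended exponent range allows it to
        # be normalized, shifting until the integer bit (bit 63) is set.
        sig, e = frac << shift, 1 - bias
        while not sig >> 63:
            sig <<= 1
            e -= 1
        return sign, e + EXT_BIAS, sig
    # Normal number: rebias the exponent and make the hidden 1 explicit.
    return sign, exp - bias + EXT_BIAS, (1 << 63) | (frac << shift)


def single_to_extended(x):
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return to_extended(bits, SINGLE)


def double_to_extended(x):
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return to_extended(bits, DOUBLE)


def pack80(sign, exp, sig):
    """Assemble the three fields into one 80-bit integer."""
    return (sign << 79) | (exp << 64) | sig


print(hex(pack80(*single_to_extended(1.0))))   # 0x3fff8000000000000000
print(hex(pack80(*double_to_extended(-2.5))))  # 0xc000a000000000000000
```

A value exactly representable in both source formats, such as 1.0 or -2.5, yields the same extended encoding whether it was loaded as single or double precision.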
A Pentium™ microprocessor, available from Intel Corporation of Santa Clara, Calif., is a superscalar processor which executes mixed floating point and integer instructions by controlling the operation of two instruction pipelines. One of the pipelines executes all integer and floating point instructions. The second pipeline executes simple integer instructions and a floating point exchange instruction.
It is desirable to incorporate a floating point functional unit with several integer functional units in a superscalar processor. W. M. Johnson in Superscalar Processor Design, Englewood Cliffs, N.J., Prentice Hall, 1991, p. 45, provides two sets of processor functional blocks, an integer set structured on 32-bit units and busses and a floating point set organized into 80-bit structures. In a superscalar processor, the floating point and integer sets each require separate register files, reorder buffers and operand and result busses. Floating point instructions are dispatched by an instruction decoder within the floating point set. A separate instruction decoder is provided in the integer set of units. The Johnson approach supports floating point arithmetic in a processor which incorporates a superscalar architecture, decoupling of instruction decoding and instruction execution, and branch prediction. The considerable performance advantages of this approach are achieved at the expense of duplicating resources. Moreover, some reduction in performance arises from coordination of operations between the integer and floating point sets of functional blocks.