To represent a large dynamic range of numbers with relatively few bits, floating point representation can be used to explicitly encode a scale factor in each number. A floating point number includes a mantissa, an exponent, and a sign bit that indicates the sign of the mantissa. In contrast, integer instructions, and other non-floating point instructions, typically do not include exponent bits. Examples of floating point numbers include (1) single precision floating point real numbers, (2) double precision floating point real numbers, and (3) extended precision floating point real numbers.
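The three fields named above can be illustrated with a short sketch. It assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored fraction bits), which the text does not itself specify; the function name `decode_single` is hypothetical.

```python
import struct

def decode_single(x):
    """Split a value, rounded to IEEE 754 single precision, into its three fields.

    Assumed layout: 1 sign bit, 8 biased exponent bits, 23 stored fraction bits.
    """
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 means the mantissa is negative
    exponent = (bits >> 23) & 0xFF    # biased exponent (bias 127)
    fraction = bits & 0x7FFFFF        # stored part of the mantissa
    return sign, exponent, fraction

# -6.5 equals -1.625 * 2**2, so the biased exponent is 2 + 127 = 129.
sign, exponent, fraction = decode_single(-6.5)
```

In this layout the scale factor travels with each number as the exponent field, which is what gives the representation its large dynamic range.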
A computer instruction written in a floating point format typically requires more processor clock cycles to complete than a corresponding instruction written in an integer or non-floating point format. For example, instructions requiring the addition, subtraction, multiplication, or division of floating point numbers each require the execution of an algorithm with several steps. One of the steps is the normalization of the result. A non-zero floating point number is normalized if the left-most bit of the mantissa is non-zero. The normalized representation of zero is all zeroes. A denormalized number is a number not in the normalized format.
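The normalization step can be sketched as a shift loop. This is a minimal illustration only: it assumes the mantissa is held as an unsigned integer of a fixed width (8 bits here) and that the value represented is mantissa times two to the power of the exponent. The function name `normalize` and the 8-bit width are hypothetical, and exponent range limits are ignored.

```python
def normalize(mantissa, exponent, width=8):
    """Shift a non-zero mantissa left until its left-most bit is 1.

    Each left shift doubles the mantissa, so the exponent is decremented
    to keep the represented value mantissa * 2**exponent unchanged.
    """
    if mantissa == 0:
        return 0, 0  # the normalized representation of zero is all zeroes
    top_bit = 1 << (width - 1)
    while (mantissa & top_bit) == 0:
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent

# 0b00010110 needs three shifts before its left-most bit is set.
m, e = normalize(0b00010110, 5)
```

The loop runs once per leading zero bit, which hints at why normalization adds to the time a floating point instruction takes.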
A computer can have integer instructions and floating point instructions intermixed. For example, a series of integer instructions can follow a floating point instruction. As discussed above, floating point instructions typically take longer to execute than integer instructions. For example, an integer ADD instruction typically takes one clock cycle. On the other hand, a floating point ADD instruction typically takes 8 to 10 clock cycles to complete. An integer LOAD instruction typically takes one clock cycle to complete. On the other hand, a floating point LOAD instruction typically takes multiple clock cycles to complete. Moreover, typically 25 to 30 percent of all instructions in a work station environment are floating point instructions.
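The figures above can be combined into a rough estimate of how much floating point instructions contribute to total execution time. The sketch assumes every non-floating point instruction takes exactly one clock cycle, which the text states only for the integer ADD and LOAD examples; the function names are hypothetical.

```python
def average_cycles(fp_fraction, fp_cycles, other_cycles=1):
    """Mean clock cycles per instruction for a mix of floating point and other instructions."""
    return (1 - fp_fraction) * other_cycles + fp_fraction * fp_cycles

def fp_time_share(fp_fraction, fp_cycles, other_cycles=1):
    """Fraction of all clock cycles that is spent on floating point instructions."""
    return fp_fraction * fp_cycles / average_cycles(fp_fraction, fp_cycles, other_cycles)

# Low end: 25 percent of instructions at 8 cycles; high end: 30 percent at 10 cycles.
low = average_cycles(0.25, 8)
high = average_cycles(0.30, 10)
```

Under these assumptions the average lies between roughly 2.75 and 3.7 cycles per instruction, and floating point instructions account for roughly 73 to 81 percent of all clock cycles even though they are a minority of the instruction mix.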
In one past approach, floating point instructions are handled by a separate chip such as the 80387 80-bit CHMOS III Numeric Processor Extension sold by Intel Corporation of Santa Clara, Calif. Integer instructions, however, are handled by a separate main microprocessor, such as the 80386 32-bit CHMOS microprocessor sold by Intel Corporation.
The 80387 is a co-processor. The 80386 microprocessor decodes a floating point instruction and passes to the 80387 all the relevant information from the floating point instruction needed by the 80387 to execute the floating point instruction. Once that information is passed from the 80386 to the 80387, the 80386 can proceed to execute any subsequent integer instructions until the 80386 reaches the next floating point instruction. The 80387 has its own control read-only memory ("ROM") and control logic, and the 80386 in turn has its own control ROM and control logic.
Thus, with the prior two-chip 80387 and 80386 approach, floating point instructions are executed in parallel with non-floating point instructions. The passing of relevant floating point information between the 80386 and the 80387, however, imposes a significant penalty on overall system performance in the form of interface overhead.
In some other prior approaches, a floating point unit is placed on the same chip as the microprocessor. This removes the interface overhead that would otherwise occur if the floating point unit were on a separate chip. In those past approaches that put the floating point unit on the same chip as the microprocessor, however, the floating point instructions are not executed in parallel with the non-floating point instructions. Instead, all instructions are executed sequentially. In other words, the microcomputer waits for the execution of a floating point instruction to complete before moving on to the next instruction. Therefore, although some performance is gained by removing interface overhead, some performance is lost because of the lack of parallelism.
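The trade-off between interface overhead and lack of parallelism can be made concrete with a toy cycle-count model. Everything in the sketch is an assumption made for illustration: the instruction stream, the per-instruction handoff cost, and the rule that the main processor waits for the coprocessor to be free before passing it the next floating point instruction. It does not describe the actual timing of any particular chip.

```python
def sequential_cycles(stream):
    """On-chip, non-parallel case: each instruction finishes before the next starts."""
    return sum(cycles for _, cycles in stream)

def coprocessor_cycles(stream, handoff):
    """Two-chip case: the main processor spends `handoff` cycles passing each
    floating point instruction across, then continues with integer work while
    the coprocessor executes. `handoff` is a hypothetical parameter."""
    cpu_time = 0   # cycle at which the main processor is next free
    fpu_free = 0   # cycle at which the coprocessor is next free
    for kind, cycles in stream:
        if kind == "fp":
            # Wait for the coprocessor if it is busy, then pay the interface cost.
            cpu_time = max(cpu_time, fpu_free) + handoff
            fpu_free = cpu_time + cycles
        else:
            cpu_time += cycles
    return max(cpu_time, fpu_free)

# Hypothetical stream: an 8-cycle floating point instruction followed by four
# 1-cycle integer instructions, repeated twice.
stream = ([("fp", 8)] + [("int", 1)] * 4) * 2

on_chip_sequential = sequential_cycles(stream)
two_chip_parallel = coprocessor_cycles(stream, handoff=3)
parallel_no_overhead = coprocessor_cycles(stream, handoff=0)
```

With these made-up numbers the sequential case takes 24 cycles, the two-chip case with a 3-cycle handoff takes 22, and a parallel case with no handoff cost would take 16, which illustrates why combining on-chip placement with parallel execution is attractive.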
Furthermore, although the floating point unit is on the microprocessor chip in those past non-parallel approaches, the floating point unit nevertheless has its own control ROM and control logic, and the microprocessor in turn has its own separate control ROM and control logic.
If the floating point unit could be placed on the microprocessor chip in such a way that floating point instructions could be executed in parallel with integer instructions, there would be a gain in performance. One way to do this might be to introduce parallelism yet have two separate microcoded control ROMs--namely, a floating point control ROM and an integer control ROM.
One disadvantage of this multiple control ROM approach is that it would require the duplication of the control logic--there would need to be control logic at the periphery of each control ROM.
Another disadvantage of the multiple control ROM approach is that it would add to the complexity of the "who is in charge" decision.
A further disadvantage of this multiple control ROM approach is that it would require additional hardware to allow the sharing of resources, and such hardware would be complex and take additional space in silicon. For example, if a floating point execution unit and an integer execution unit were to operate at once, those units might need to share the same bus or the same addressing unit. Complex circuitry would be required to oversee such sharing of resources.
Another consideration with respect to floating point units is that exception conditions must be handled somehow, regardless of whether a parallel or non-parallel approach is used. Although the prior non-parallel instruction execution method imposed a performance penalty, the handling of exception conditions is nevertheless a straightforward task if there is no parallelism. Exceptions are handled as soon as they arise if there is no parallelism.
Examples of those exceptions are invalid operation, denormalized operand, zero divisor, overflow, underflow, and inexact result in terms of precision. Microcode is used to assist the hardware in handling the exceptions. The exceptions are divided into two types of problems: (1) pre-execution assist and (2) post-execution fault. With pre-execution assist, the microcode corrects the problem before execution is finished. With post-execution faults, the problems are corrected after instruction execution.
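Several of the exception conditions listed above can be observed in software. The sketch below uses Python's standard `decimal` module as a stand-in; its signals are a software analogue of these conditions and not the hardware or microcode mechanism described here. The helper name `raised_flags` and the small precision and exponent limits are chosen only for illustration, and the sketch does not attempt to classify any condition as a pre-execution assist or a post-execution fault.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

def raised_flags(operation):
    """Run `operation` in a fresh context with all traps disabled and
    return the set of signals whose flags were set."""
    # Small precision and exponent range so overflow and underflow are easy to reach.
    ctx = Context(prec=6, Emax=99, Emin=-99, traps=[])
    operation(ctx)
    return {signal for signal, was_set in ctx.flags.items() if was_set}

zero_divisor = raised_flags(lambda c: c.divide(Decimal(1), Decimal(0)))
invalid = raised_flags(lambda c: c.divide(Decimal(0), Decimal(0)))
inexact = raised_flags(lambda c: c.divide(Decimal(1), Decimal(3)))
overflow = raised_flags(lambda c: c.multiply(Decimal("1e60"), Decimal("1e60")))
underflow = raised_flags(lambda c: c.multiply(Decimal("1e-60"), Decimal("1e-60")))
```

Because the traps are disabled, each operation returns a default result and records the condition in a flag, so the condition can be inspected after the operation completes instead of interrupting it.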