The demand for ever-faster computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. Microprocessor speeds have been increased in a number of ways, including increasing the speed of the clock that drives the processor, reducing the number of clock cycles required to perform a given instruction, implementing pipeline architectures, and increasing the efficiency with which internal operations are performed. This last approach usually involves reducing the number of steps required to perform an internal operation.
Efficiency is especially important in mathematical calculations, particularly floating point calculations. Some mathematical operations, such as multiplication and division, cause significant delays during program execution. A pipelined floating point unit (FPU) may be particularly susceptible to long delays during the execution of certain sequences of instructions. For example, a floating point “load” instruction may occur in a pipelined FPU immediately after, or shortly after, a floating point “store” instruction to the same memory location. This is sometimes referred to as a “read-after-write” (RAW) hazard. The write (or store) operation may have a long latency before the processor “commits” the write data to system memory. If the read (or load) operation is issued before the write operation is complete, the read must wait for the write to complete before the committed data can be read back from memory, and may therefore suffer a significant delay.
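By way of illustration only, the read-after-write delay described above may be sketched with a simplified model. The model assumes that a store commits a fixed number of cycles after it is issued and that a later load from the same address cannot complete until that commit occurs. The names Op and load_stall_cycles and the latency value of 10 cycles are hypothetical, are chosen only for this example, and do not describe any particular processor.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str      # "store" or "load"
    address: int   # memory address touched by the operation
    issue: int     # cycle at which the operation is issued


def load_stall_cycles(ops, commit_latency):
    """Return the stall, in cycles, suffered by each load in `ops`.

    Assumed toy model: a store issued at cycle c commits to memory at
    cycle c + commit_latency, and a later load from the same address
    cannot complete until that commit has happened. `ops` is expected
    to be listed in issue order.
    """
    commit_time = {}   # address -> cycle at which the latest store commits
    stalls = []
    for op in ops:
        if op.kind == "store":
            commit_time[op.address] = op.issue + commit_latency
        elif op.kind == "load":
            # With no pending store to this address, the load is ready at once.
            ready = commit_time.get(op.address, op.issue)
            stalls.append(max(0, ready - op.issue))
    return stalls


ops = [
    Op("store", 0x100, issue=0),
    Op("load", 0x100, issue=1),   # immediately after the store
    Op("load", 0x200, issue=2),   # unrelated address, no hazard
    Op("load", 0x100, issue=12),  # issued after the store has committed
]
print(load_stall_cycles(ops, commit_latency=10))  # prints [9, 0, 0]
```

In this sketch only the load issued one cycle after the store to the same address is delayed. The load from a different address and the load issued after the store has committed proceed without stalling, which corresponds to the case in which no hazard arises.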
Therefore, there is a need in the art for an improved microprocessor that executes mathematical operations more rapidly. In particular, there is a need for an improved floating point unit that executes floating point operations as rapidly as possible. More particularly, there is a need in the art for a floating point unit that minimizes delays caused by writing data to memory.