The present invention relates to execution of computer instructions.
An important goal in computer design is increasing throughput, that is, the number of computer instructions completed per clock cycle. Another important goal is reducing instruction execution latency.
In the past, throughput and latency have been improved by parallelism, that is, by making instruction execution units perform different operations in parallel. For example, in some floating point multipliers, generation of the exponent of the result is done in parallel with multiplication of the significands of the operands. It is desirable to further increase parallelism in order to improve throughput and latency.
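The independence of the two computations mentioned above can be illustrated in software. The following is a minimal sketch only, not a description of any particular multiplier: the function name `multiply_split` is hypothetical, and the sketch uses Python's `math.frexp` and `math.ldexp` to separate and recombine the significand and the exponent. It ignores the rounding and exception details that a hardware multiplier must handle.

```python
import math

def multiply_split(a: float, b: float) -> float:
    """Multiply two floats by handling exponents and significands separately.

    Hypothetical illustration: each operand x is written as m * 2**e with
    0.5 <= |m| < 1 (or m == 0), as returned by math.frexp.
    """
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)

    # Exponent path: needs only the operand exponents.
    exponent = ea + eb

    # Significand path: needs only the operand significands.
    significand = ma * mb

    # Neither path reads the other's result, so hardware can run them in parallel.
    return math.ldexp(significand, exponent)
```

Because the exponent path and the significand path share no intermediate values, they can be evaluated concurrently and combined only at the final step.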
In the past, multiplication involving denormalized numbers required special processing, which was done by software. Software processing reduced the multiplication speed. To increase the speed, some multipliers replaced denormalized numbers with zero. This, however, resulted in a loss of precision. Therefore, it is desirable to process denormalized numbers fully in hardware to achieve a high speed without a loss of precision.
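The loss of precision caused by replacing denormalized numbers with zero can be seen in a short numerical sketch. The helper `flush_to_zero` below is hypothetical and merely emulates the replacement in software; it assumes IEEE 754 double-precision arithmetic, as used by Python floats on common platforms, where `sys.float_info.min` is the smallest positive normalized value.

```python
import sys

def flush_to_zero(x: float) -> float:
    """Emulate a multiplier that replaces denormalized values with zero (hypothetical helper)."""
    return 0.0 if 0.0 < abs(x) < sys.float_info.min else x

a = 2.0 ** -1000      # normalized operand
b = 2.0 ** -60        # normalized operand

full = a * b          # 2**-1060: below the normalized range, kept as a denormalized number
flushed = flush_to_zero(full)

# Scaling the product back up recovers the operand only if the denormalized value was kept.
recovered_full = full * 2.0 ** 60
recovered_flushed = flushed * 2.0 ** 60
```

With full processing, `recovered_full` equals `a`; with the replacement by zero, `recovered_flushed` is zero and the information carried by the product is lost.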