1. Field of the Invention
This invention relates to processors and, more particularly, to execution of floating-point arithmetic instructions.
2. Description of the Related Art
In many processor implementations that include support for floating-point arithmetic, different types of floating-point instructions are provided with distinct and dedicated resources for execution, often in distinct execution pipelines. For example, common algorithms for evaluating divide and square-root instructions are iterative in nature and typically do not overlap well with other instructions such as addition and multiplication, particularly when the latter functions are pipelined. Consequently, divide and square-root instructions may be implemented in one execution unit according to one pipeline, while other instructions may be implemented in another execution unit according to a different pipeline. However, completely segregating instruction implementation in this manner may increase implementation area, because each execution unit must be provided with its own independent resources, such as dedicated mantissa and exponent computation resources.
Additionally, in some embodiments, one set of separately implemented instructions may execute with longer latency than another. For example, some iterative division algorithms may produce only one or two quotient bits per execution cycle and are difficult to parallelize, in contrast to operations such as multiplication. Depending on how frequently such longer-latency instructions occur, incurring the full execution latency of an instruction in cases where that latency is arithmetically unnecessary may degrade overall processor performance.
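As an illustration of why such division algorithms are serial, the following minimal sketch models a radix-2 restoring divider on unsigned integers. It is not taken from any particular processor; the function name restoring_divide, its parameters, and the step counter are hypothetical and chosen only for this example. Each loop iteration stands in for one execution cycle and yields exactly one quotient bit, and each iteration needs the partial remainder left by the previous one.

```python
def restoring_divide(dividend: int, divisor: int, width: int):
    """Radix-2 restoring division on unsigned integers (illustrative model).

    Returns (quotient, remainder, steps), where steps counts loop
    iterations. Each iteration models one cycle and produces one
    quotient bit. All names here are assumptions made for this sketch.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend must fit in 'width' bits")

    remainder = 0
    quotient = 0
    steps = 0
    # Walk the dividend from most significant bit to least significant bit.
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: keep the result only if it is non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps


if __name__ == "__main__":
    q, r, n = restoring_divide(100, 7, 8)
    print(q, r, n)
```

In this model an 8-bit quotient always takes 8 iterations, whatever the operand values, because the trial subtraction in one iteration cannot begin until the preceding iteration has settled the partial remainder. The partial products of a multiplication, by contrast, can be generated independently of one another and then summed.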