1. Technical Field
The present invention relates to a data processing apparatus and method for performing a reciprocal operation on an input value to produce a result value.
2. Background
There are a number of data processing applications that frequently require reciprocal operations, a reciprocal operation being an operation of the form 1/Fn(d), where d is the input value. Two such reciprocal operations that are often required are computing the reciprocal of the input value, i.e. 1/d, and computing the reciprocal square root of the input value, i.e. 1/√d. These two particular reciprocal operations are often used, for example, in graphics processing applications.
Dedicated hardware may be developed to perform such reciprocal operations, but it is typically desirable to keep the data processing apparatus as small as possible and to re-use hardware logic wherever possible.
A known technique for determining the results of complex functions such as reciprocal and reciprocal square root functions, which does not require dedicated hardware, employs iterative execution of a computation in order to converge on the result value. One particular such iterative process is commonly referred to as the Newton-Raphson method. In accordance with the Newton-Raphson method, an initial estimate of the result value is made, and then a refinement step is iteratively executed in order to converge on the actual result value.
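A minimal sketch of the Newton-Raphson refinement for the two reciprocal operations named above follows. It assumes the standard update formulas x_{n+1} = x_n(2 - d·x_n) for 1/d and x_{n+1} = x_n(3 - d·x_n²)/2 for 1/√d. The function names are illustrative, and the initial estimates are supplied by hand rather than produced by any particular estimate hardware.

```python
def recip_newton(d: float, x0: float, iterations: int) -> float:
    """Refine an estimate x0 of 1/d using x_{n+1} = x_n * (2 - d * x_n)."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def rsqrt_newton(d: float, x0: float, iterations: int) -> float:
    """Refine an estimate x0 of 1/sqrt(d) using x_{n+1} = x_n * (3 - d * x_n**2) / 2."""
    x = x0
    for _ in range(iterations):
        x = x * (3.0 - d * x * x) / 2.0
    return x


print(recip_newton(3.0, 0.3, 5))   # close to 1/3
print(rsqrt_newton(4.0, 0.4, 5))   # close to 0.5
```

Convergence is quadratic: each iteration roughly doubles the number of correct bits, so only a few refinement steps are needed when the initial estimate is reasonable.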
The Motorola AltiVec technology uses such a Newton-Raphson refinement technique for evaluating reciprocal and reciprocal square root functions. In the approach taken by the Motorola AltiVec technology, a number of instructions are issued to load the required constant values into registers and then to determine an initial estimate of the result value, after which a sequence of multiply-accumulate instructions is issued to perform the refinement step.
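The refinement step can be expressed entirely as multiply-accumulate style operations once the required constants are held in registers. The Python sketch below illustrates this decomposition under stated assumptions; it is not the AltiVec instruction set. The helper names `nmsub` and `madd` are hypothetical, and Python evaluates `a * b + c` with two roundings rather than the single rounding of a fused operation.

```python
def madd(a: float, b: float, c: float) -> float:
    """Hypothetical multiply-add: a * b + c (two roundings in Python, not fused)."""
    return a * b + c


def nmsub(a: float, b: float, c: float) -> float:
    """Hypothetical negative multiply-subtract: c - a * b."""
    return c - a * b


def recip_refine_mac(d: float, x: float, one: float = 1.0) -> float:
    e = nmsub(d, x, one)     # e = 1 - d*x; needs the constant 1.0 in a register
    return madd(x, e, x)     # x' = x + x*e = x*(2 - d*x)


def rsqrt_refine_mac(d: float, x: float, one: float = 1.0, half: float = 0.5) -> float:
    t = x * x                # t = x^2
    e = nmsub(d, t, one)     # e = 1 - d*x^2; needs the constant 1.0
    h = half * x             # h = x/2; needs the constant 0.5
    return madd(h, e, x)     # x' = x + (x/2)*e = x*(3 - d*x^2)/2
```

One refinement of the reciprocal estimate 0.3 for d = 3.0 gives approximately 0.33, and one refinement of the reciprocal square root estimate 0.4 for d = 4.0 gives approximately 0.472.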
In a data processing apparatus it is typically desirable to reduce power consumption whilst also increasing speed of operation. With regard to the handling of reciprocal operations such as those discussed above, it would be desirable to increase the density of the code required to implement the reciprocal operation and to improve the efficiency with which registers are used, thereby reducing power consumption and improving speed of operation. Register efficiency is particularly affected because any constant value that has been loaded into a working register will typically be overwritten each time the refinement step is executed; accordingly, if the refinement step is to be repeated, the required constant must be loaded into a working register again.
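The register-reuse problem can be made concrete with a toy model. The sketch below is hypothetical: it assumes a three-entry register file and assumes that the negative multiply-subtract writes its result over the working register holding the constant 1.0, as the text above describes. Under that assumption the constant has to be reloaded on every pass through the refinement step, so the number of load instructions grows with the number of iterations.

```python
def refine_reciprocal_counting_loads(d: float, x0: float, iterations: int):
    """Toy register file: returns (refined estimate of 1/d, number of constant loads)."""
    regs = {"r0": d, "r1": x0, "r2": 0.0}   # r0 = d, r1 = estimate x, r2 = working register
    loads = 0
    for _ in range(iterations):
        regs["r2"] = 1.0                     # load the constant 1.0 into the working register
        loads += 1
        # e = 1 - d*x is written back over the constant, which is therefore lost
        regs["r2"] = regs["r2"] - regs["r0"] * regs["r1"]
        # x' = x + x*e
        regs["r1"] = regs["r1"] * regs["r2"] + regs["r1"]
    return regs["r1"], loads


estimate, loads = refine_reciprocal_counting_loads(1.5, 0.6, 3)
print(estimate, loads)  # estimate close to 1/1.5; one constant load per refinement step
```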
U.S. Pat. No. 6,115,733 describes a technique for calculating reciprocals and reciprocal square roots. The Newton-Raphson approach is again used, and the refinement step includes multiplying two values to produce a product, which is then subtracted from a constant. This would typically be implemented by some form of multiply-accumulate operation, with the required constant first having been loaded into a working register. However, in accordance with the technique of U.S. Pat. No. 6,115,733, no such multiply-accumulate operation is performed. Instead, an estimate of its result is generated by performing only the required multiplication and then inverting the product. This avoids the need to load the required constant into a register, and hence makes more efficient use of the register file, and also removes the need for a load instruction to load the constant into a register.
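The approximation described above can be illustrated in fixed point under an assumption not stated in the text: that "inverting" means taking the bitwise one's complement of the product, held in a format with one integer bit and F fraction bits. In such a format the complement of p equals (2 - 2^-F) - p, so it approximates 2 - p to within exactly one unit in the last place without any constant being loaded. The format width and all helper names below are hypothetical.

```python
F = 16                      # hypothetical number of fraction bits
W = F + 1                   # one integer bit: representable range [0, 2)
MASK = (1 << W) - 1


def to_fixed(v: float) -> int:
    return round(v * (1 << F))


def from_fixed(q: int) -> float:
    return q / (1 << F)


def two_minus_exact(p_fixed: int) -> int:
    """True computation 2 - p, as a multiply-accumulate against the constant 2 would give."""
    return (2 << F) - p_fixed


def two_minus_by_inversion(p_fixed: int) -> int:
    """Approximation of 2 - p by bitwise inversion of the product; no constant needed."""
    return ~p_fixed & MASK


d, x = 1.5, 0.66                           # d and an estimate of 1/d, both in [0, 2)
p = to_fixed(d) * to_fixed(x) >> F         # product d*x in the same fixed-point format
exact = two_minus_exact(p)
approx = two_minus_by_inversion(p)
print(from_fixed(exact), from_fixed(approx), exact - approx)
```

The one-unit discrepancy between `exact` and `approx` is why the result is only an approximation of the true multiply-accumulate.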
Hence, the technique of U.S. Pat. No. 6,115,733 provides some improvement in code density and makes more efficient use of the working registers. However, to obtain this improvement, it replaces the computation required for a portion of the refinement step with a different computation that produces only an approximation of the result that would have been produced had the true computation been performed.
Accordingly, it would be desirable to develop a technique that enables such improvements in code density and register efficiency to be achieved, but without the need to replace the required computation with a different computation that merely approximates the result that the true computation performed as part of the refinement step would have produced.