There are many situations in which a computer system needs to perform a reciprocal square root operation. For example, reciprocal square root operations are often performed in numerical analysis, complex number computations, statistical analysis, computer graphics, and signal processing. A computer system may perform a reciprocal square root operation using a converging approximation technique, which may use a quadratic convergence algorithm such as a Newton-Raphson technique or a Goldschmidt technique. In particular, the converging approximation technique may converge towards a result (e.g. a floating point result) from below, such that a proposed result provided by the converging approximation technique is never too large, i.e. it is either correct or too small. The proposed result can be rounded, in accordance with a rounding mode, to provide a rounded proposed result. The rounding mode may, for example, be a round away from zero mode, a round towards zero mode or a round to nearest mode.
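As an illustration of convergence from below, the following is a minimal sketch of a Newton-Raphson iteration for the reciprocal square root, evaluated in exact rational arithmetic. The function name `rsqrt_newton`, the input value 2 and the starting guess 1/2 are assumptions chosen for the example; a hardware implementation works at finite precision and may need additional care to keep every iterate from being too large.

```python
from fractions import Fraction

def rsqrt_newton(b, y0, iterations):
    """Return the successive Newton-Raphson iterates for 1/sqrt(b)."""
    y = Fraction(y0)
    iterates = []
    for _ in range(iterations):
        # Newton-Raphson step for f(y) = 1/y**2 - b, evaluated exactly.
        y = y * (3 - b * y * y) / 2
        iterates.append(y)
    return iterates

b = Fraction(2)                      # assumed example input
iterates = rsqrt_newton(b, Fraction(1, 2), 5)
for y in iterates:
    # y*y*b < 1 means y lies below 1/sqrt(b), i.e. it is not too large.
    print(float(y), y * y * b < 1)
```

With exact arithmetic each step maps t = y·√b to t(3 − t²)/2, which never exceeds 1 for 0 < t < √3, so the iterates approach the result from below.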
For example, the rounded result may have k bits, and the rounding mode may be a round towards zero mode. One way of obtaining a result that is always correct would be to obtain an infinitely precise result and then truncate it to k bits of precision. However, in real computer systems an infinitely precise result is often not obtainable, so an approximation of the result is computed to at least k+1 bits and that approximation is then truncated to obtain a result with k bits. In this way, the correctly rounded result is either the obtained k-bit result, or the obtained k-bit result plus one in the least significant bit.
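The two possible outcomes can be seen in a small sketch. It models a k-bit result as a value with k fractional bits, and uses made-up values for the "infinitely precise" result and its approximation; the helper `truncate` and all numbers are assumptions for illustration only.

```python
from fractions import Fraction
from math import floor

def truncate(x, k):
    """Truncate a non-negative value x to k fractional bits."""
    scale = 1 << k
    return Fraction(floor(x * scale), scale)

k = 8
u = Fraction(1, 1 << k)              # one unit in the last of the k bits

# Hypothetical stand-in for the infinitely precise result.
exact = Fraction(171, 256) + Fraction(1, 4096)
# Hypothetical approximation: never too large, accurate to better than k+1 bits.
approx = exact - Fraction(1, 1024)

correct = truncate(exact, k)
obtained = truncate(approx, k)
print(obtained, correct, correct - obtained == u)
```

Here the approximation falls just below a k-bit boundary that the exact value lies above, so the obtained result is one unit too small. An approximation with a smaller error would give the correct result directly.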
A converging approximation technique (such as the Newton-Raphson technique) receives an input value $b$ and can approximate a value for $1/\sqrt{b}$. The rounded k-bit result of the approximation is denoted $r$. A check procedure could be carried out to determine whether the correctly rounded result is the obtained k-bit result $r$, or the obtained k-bit result plus one in the least significant bit (denoted $r+u$, where $u$ is an increment in the least significant bit position of $r$).
In a round towards zero mode, the converging approximation technique determines a result for $1/\sqrt{b}$ which has more than k bits of accuracy and then truncates that result to determine $r$. Due to the nature of the converging approximation technique, in the round towards zero mode it is known that $r \le 1/\sqrt{b}$. In the round towards zero mode, if $r+u > 1/\sqrt{b}$ then $r$ is the correctly rounded result, whereas if $r+u \le 1/\sqrt{b}$ then $r+u$ is the correctly rounded result. Therefore, in the check procedure, an error value $e$ is considered whereby $1/b = (r+u-e)^2$, wherein due to the nature of the converging approximation technique, $e$ could be positive or negative and $|e| \le u$. If $e$ is positive then $r$ is the correctly rounded result, whereas if $e$ is negative (or zero) then $r+u$ is the correctly rounded result. In the round towards zero mode, a check parameter, $g$, is defined as:
$$g = (r+u)^2 b - 1. \tag{1}$$
It can be shown that g=e(2r+2u−e)b, such that g has the same sign as e (since b is positive and so is (2r+2u−e)). A computation of g in accordance with equation 1 would involve a multiplication of three values: (r+u), (r+u) and b. As described below, a multiplication of three values is not trivial to compute accurately in typical hardware.
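The factorisation of g stated above follows from a difference of squares. The short derivation below uses only the definitions already given for the round towards zero mode, namely equation 1 and the relation 1/b = (r+u−e)².

```latex
\begin{align*}
g &= (r+u)^2 b - 1 \\
  &= (r+u)^2 b - (r+u-e)^2 b
     && \text{since } (r+u-e)^2 b = 1 \\
  &= \bigl[(r+u) - (r+u-e)\bigr]\,\bigl[(r+u) + (r+u-e)\bigr]\, b \\
  &= e\,(2r + 2u - e)\, b .
\end{align*}
```

Because |e| ≤ u and r is positive, the factor (2r+2u−e) is positive, so the sign of g is the sign of e.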
In a round away from zero mode, the converging approximation technique determines a result for $1/\sqrt{b}$ which has more than k bits of accuracy and then truncates that result and adds one unit of least precision (ulp) to determine $r$. In the round away from zero mode, if $r > 1/\sqrt{b}$ then $r$ is the correctly rounded result, whereas if $r \le 1/\sqrt{b}$ then $r+u$ is the correctly rounded result. Therefore, in the check procedure, an error value $e$ is considered whereby $1/b = (r-e)^2$, wherein due to the nature of the converging approximation technique, $e$ could be positive or negative and $|e| \le u$. If $e$ is positive then $r$ is the correctly rounded result, whereas if $e$ is negative (or zero) then $r+u$ is the correctly rounded result. In the round away from zero mode, a check parameter, $g$, is defined as:
$$g = r^2 b - 1. \tag{2}$$
It can be shown that g=e(2r−e)b, such that g has the same sign as e (since b is positive and so is (2r−e)). A computation of g in accordance with equation 2 would involve a multiplication of three values: r, r and b. As described below, a multiplication of three values is not trivial to compute accurately in typical hardware.
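To show how the sign of g selects the result in the round away from zero mode, here is a minimal sketch that evaluates equation 2 exactly with rational arithmetic. This sidesteps the three-value multiplication problem rather than solving it. The function name `check_round_away`, the 8 fractional bits and the input value 2 are assumptions for the example.

```python
from fractions import Fraction

def check_round_away(r, u, b):
    """Pick r or r+u from the sign of g = r*r*b - 1 (equation 2)."""
    g = r * r * b - 1        # evaluated exactly, no intermediate truncation
    return r if g > 0 else r + u

u = Fraction(1, 256)         # assumed: results carry 8 fractional bits
b = Fraction(2)              # 1/sqrt(2) is about 181.02/256

# Both candidates lead to 182/256, the value rounded away from zero.
print(check_round_away(Fraction(182, 256), u, b))
print(check_round_away(Fraction(181, 256), u, b))
```

The candidate 181/256 is slightly below 1/√2, so g is negative and one unit is added. The candidate 182/256 is above it, so g is positive and the candidate is kept.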
In a round to nearest mode, the converging approximation technique determines a result for $1/\sqrt{b}$ which has more than k bits of accuracy and then adds one half unit of least precision ($u/2$) and then truncates that result to determine $r$. In the round to nearest mode, if $r+u/2 > 1/\sqrt{b}$ then $r$ is the correctly rounded result, whereas if $r+u/2 < 1/\sqrt{b}$ then $r+u$ is the correctly rounded result. Therefore, in the check procedure, an error value $e$ is considered whereby $1/b = (r+u/2-e)^2$, wherein due to the nature of the converging approximation technique, $e$ could be positive or negative and $|e| \le u$. If $e$ is positive then $r$ is the correctly rounded result, whereas if $e$ is negative then $r+u$ is the correctly rounded result. In the round to nearest mode, a check parameter, $g$, is defined as:
$$g = (r+u/2)^2 b - 1. \tag{3}$$
It can be shown that g=e(2r+u−e)b, such that g has the same sign as e (since b is positive and so is (2r+u−e)). A computation of g in accordance with equation 3 would involve a multiplication of three values: (r+u/2), (r+u/2) and b. As described below, a multiplication of three values is not trivial to compute accurately in typical hardware.
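For the round to nearest mode, the comparison point is the midpoint r+u/2. The sketch below evaluates equation 3 exactly with rational arithmetic. The function name `check_round_nearest`, the 8 fractional bits and the input value 3 are assumptions for the example, and the exact evaluation is not something the hardware described here can do cheaply.

```python
from fractions import Fraction

def check_round_nearest(r, u, b):
    """Pick r or r+u from the sign of g = (r + u/2)**2 * b - 1 (equation 3)."""
    midpoint = r + u / 2
    g = midpoint * midpoint * b - 1   # evaluated exactly
    return r if g > 0 else r + u

u = Fraction(1, 256)         # assumed: results carry 8 fractional bits
b = Fraction(3)              # 1/sqrt(3) is about 147.80/256

# Both candidates lead to 148/256, the nearest 8-bit value.
print(check_round_nearest(Fraction(147, 256), u, b))
print(check_round_nearest(Fraction(148, 256), u, b))
```

For the candidate 147/256 the midpoint 147.5/256 lies below 1/√3, so g is negative and the result is incremented. For 148/256 the midpoint lies above it and the candidate is kept.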
A standard hardware multiply unit is configured to receive inputs containing up to a given number of bits (i.e. up to k bits) and to provide an output having ≤2k bits. Multiplying that 2k-bit output by another input containing ≤k bits would require a multiply unit which could receive such inputs and provide a result having ≤3k bits. However, such an increase in the size of the multiply unit is usually not justifiable in typical hardware environments due to the increase in area and heat generation. Another approach could be to truncate the first 2k-bit output such that it has only k bits and then perform the second multiplication on the truncated value. However, some accuracy will be lost by the truncation. In the reciprocal square root check procedure the result of the multiplication of the three values will be very close to 1, and it is compared with 1 in order to determine the sign of g. The inaccuracies introduced by truncating the result of the first multiplication will therefore render the check procedure too unreliable to be of use, so reducing the number of bits in the first multiplication is not an option. Therefore, typically, no check procedure is carried out on the result of the converging approximation technique used to determine a reciprocal square root. The reciprocal square root operation is typically referred to as a “reciprocal square root approximation” because, with no check procedure performed, the result may be inaccurate by one unit of least precision, i.e. by u.
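The effect of truncating the first product can be reproduced in a toy model. The sketch below treats k-bit values as numbers with k = 8 fractional bits and compares the exact check parameter of equation 1 with one computed from a truncated square. The helper names and the chosen values (input 3, candidate 147/256) are assumptions for illustration, not a model of any particular multiplier.

```python
from fractions import Fraction
from math import floor

K = 8
U = Fraction(1, 1 << K)      # one unit of least precision in this toy model

def truncate(x, k=K):
    """Truncate a non-negative value x to k fractional bits."""
    scale = 1 << k
    return Fraction(floor(x * scale), scale)

def g_exact(c, b):
    """Check parameter c*c*b - 1 with the full-width product."""
    return c * c * b - 1

def g_truncated(c, b):
    """Same check, but the first product is cut back to k bits."""
    square = truncate(c * c)
    return square * b - 1

b = Fraction(3)              # 1/sqrt(3) is about 147.80/256
r = Fraction(147, 256)       # correctly truncated candidate
c = r + U                    # comparison point for round towards zero

print(g_exact(c, b) > 0, g_truncated(c, b) > 0)
```

The exact g is positive, so r is already correct. The truncated computation gives a negative g and would wrongly select r+u.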