The invention pertains to digital data processing and, more particularly, to improved methods and apparatus for generating square roots.
Many scientific and engineering computations involve square root calculations. These may be performed, for example, thousands or millions of times during the analysis of radar signals, the processing of speech or video signals, or the solution of other engineering and scientific problems.
Traditionally, such computationally-intensive applications have been serviced by workstations or other computing devices employing general purpose microprocessors that work in concert with DSPs (digital signal processors), custom ASICs (application specific integrated circuits) or other off-chip devices. The microprocessors perform instruction execution and provide overall control. The off-chip devices perform specialized mathematical computations such as generating square roots. More recently, array or vector processors have been developed that provide instruction execution, control and vector computation functions on a single chip.
In one common type of array processor, the SIMD (single instruction, multiple data) processor, a single instruction operates on a vector register that holds multiple values. This is in contrast to a conventional processor, in which each instruction operates on only a single data value. One class of recent SIMD processors employs the AltiVec™ technology of Motorola, which is capable of concurrently operating on vectors with 4, 8 or 16 values.
Processors such as the AltiVec™ do not provide for the direct calculation of vector square roots. Instead, they provide only reciprocal square root operations. In order to determine the square roots of multiple values, it is thus necessary to load those values into a vector register, to invoke the AltiVec™ vector reciprocal square root instruction (vrsqrtefp), and to calculate the reciprocal of each of the values in the result vector.
Notwithstanding the benefits of array processors such as the AltiVec™, the rapid determination of square roots, e.g., for multiple vectors or large sets of numbers, remains problematic. Where there is a risk that one of the operands may be zero, for example, reciprocal square root operations can produce unpredictable results and, in any event, necessitate that computer programs utilizing those operations execute extra instructions (typically conditional branches) to ensure proper processing. These extra instructions can add significantly to overall processing time, especially in applications requiring generation of thousands or millions of square roots.
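By way of non-limiting illustration, the zero-operand hazard described above can be sketched in a few lines of Python. The function names are hypothetical, `rsqrt_estimate` is merely a software stand-in for a hardware reciprocal square root instruction, and the stand-in is assumed to return positive infinity for a zero operand. The sketch forms the square root as the product of the operand and its reciprocal square root, which is one of several possible formulations and is chosen here only because it makes the hazard easy to see.

```python
import math


def rsqrt_estimate(x: float) -> float:
    """Hypothetical stand-in for a hardware reciprocal square root operation.

    Assumption: a zero operand yields +infinity, as 1/sqrt(+0) does in
    IEEE 754 arithmetic.
    """
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def sqrt_by_product(x: float) -> float:
    """sqrt(x) computed as x * (1/sqrt(x)), with no special handling of zero."""
    return x * rsqrt_estimate(x)


def sqrt_with_branch(x: float) -> float:
    """The same computation guarded by the extra test that zero operands require."""
    if x == 0.0:
        return 0.0
    return x * rsqrt_estimate(x)


if __name__ == "__main__":
    print(sqrt_by_product(4.0))               # 2.0
    print(math.isnan(sqrt_by_product(0.0)))   # True: 0 * infinity is not a number
    print(sqrt_with_branch(0.0))              # 0.0
```

The guarded variant returns the expected result for a zero operand, but only at the cost of a test and branch on every value processed.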
In view of the foregoing, an object of the invention is to provide improved methods and apparatus for digital data processing.
A more particular object is to provide such methods and apparatus as permit the rapid generation of square roots.
Another object is to provide such methods as permit the generation of square roots of multiple vectors or other large sets of operands.
Yet another object is to provide such methods and apparatus as can be utilized with existing and future processors, including vector processors.
These and other objects are attained by the invention that provides, in some aspects, novel methods utilizing inter alia processor bit-manipulation operations, as well as floating-point or other arithmetic operations, to generate square roots of vector and non-vector operands. The methods have application, by way of particular example, with processor instruction sets that provide for reciprocal square root operations.
A method according to the foregoing aspects of the invention utilizes a bit-manipulation operation to halve an intermediate value, generated by a processor reciprocal square root operation, during a multistep process for determining square roots. Such a method can also multiply an original operand (whose square root is being determined) with such an intermediate value, or with a halved or other value derived therefrom. Use of such bit-manipulation and multiplication operations avoids risks that might otherwise be associated with performing floating-point or other arithmetic operations on the resultant value of a reciprocal square root operation whose original operand was zero. It likewise avoids the need to perform additional steps, e.g., conditional branching, to obviate such risks.
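A minimal sketch of such a halving operation follows, offered as an illustration under stated assumptions rather than as the claimed implementation. It uses Python's 64-bit floating-point values and assumes that the halving is realized by subtracting one from the exponent field of the IEEE 754 bit pattern (bit 52 upward in the 64-bit format; a 32-bit format would use bit 23). The name `halve_by_bits` is hypothetical.

```python
import math
import struct


def halve_by_bits(y: float) -> float:
    """Halve y by integer arithmetic on its IEEE 754 binary64 bit pattern.

    Assumption of this sketch: y is a positive normal number or +infinity,
    which is what a reciprocal square root of a non-negative finite operand
    is expected to be. Zero and subnormal inputs are not handled.
    """
    bits = struct.unpack("<Q", struct.pack("<d", y))[0]
    bits -= 1 << 52  # subtract one from the exponent field
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    print(halve_by_bits(3.0))                     # 1.5

    # Floating-point halving leaves infinity unchanged, so a later
    # multiplication by a zero operand yields "not a number".
    print(math.isnan((math.inf * 0.5) * 0.0))     # True

    # The bit-manipulated value is large but finite, so the same
    # multiplication by a zero operand yields zero.
    print(math.isfinite(halve_by_bits(math.inf))) # True
    print(halve_by_bits(math.inf) * 0.0)          # 0.0
```

Because the bit pattern of infinity has an all-ones exponent field and a zero fraction, decrementing the exponent produces the largest power of two representable in the format, and the subsequent multiplication by the original (zero) operand is well defined without any test or branch.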
Still further related aspects of the invention provide methods for determination of a square root utilizing a sequence of steps that include one or more of the following instructions: (1) invoking a processor reciprocal square root operation on an operand x; (2) halving the resultant value generated by that processor operation, preferably, via a bit-manipulation operation on the value's mantissa; (3) multiplying the halved value by the original operand, x; (4) estimating the square root of x by doubling the value generated in the prior step; (5) estimating an error value by subtracting the multiplied results of the two prior steps from one-half; and (6) re-estimating the square root, with still greater precision, by multiplying the prior estimate by a value equal to one plus the estimated error value.
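The six steps enumerated above can be sketched as follows, by way of example only. All function names are hypothetical. The stand-in `rsqrt_estimate` deliberately rounds its result to roughly twelve significant bits so that the effect of the refinement is visible, and is assumed to return positive infinity for a zero operand. The halving is assumed to be a decrement of the IEEE 754 exponent field. Step (5) is read here as one-half minus the product of the halved reciprocal square root and the current square root estimate, since that reading yields a small error term; this is an interpretation of the sketch, not a statement of the claimed method.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Hypothetical stand-in for a coarse hardware reciprocal square root."""
    if x == 0.0:
        return math.inf  # assumed hardware behaviour for a zero operand
    mantissa, exponent = math.frexp(1.0 / math.sqrt(x))
    # Keep about 12 significant bits to mimic a low-precision estimate.
    return math.ldexp(round(mantissa * 4096) / 4096, exponent)


def halve_by_bits(y: float) -> float:
    """Halve a positive normal value or +infinity via its exponent field."""
    bits = struct.unpack("<Q", struct.pack("<d", y))[0]
    return struct.unpack("<d", struct.pack("<Q", bits - (1 << 52)))[0]


def sqrt_first_estimate(x: float) -> float:
    """Steps (1) through (4): square root estimate before refinement."""
    h = halve_by_bits(rsqrt_estimate(x))
    t = h * x
    return t + t


def sqrt_refined(x: float) -> float:
    """Steps (1) through (6), with no test or branch on the operand."""
    y = rsqrt_estimate(x)   # (1) reciprocal square root estimate
    h = halve_by_bits(y)    # (2) halve it by bit manipulation
    t = h * x               # (3) approximately sqrt(x) / 2
    s = t + t               # (4) first estimate of sqrt(x)
    e = 0.5 - h * s         # (5) error term, as interpreted in this sketch
    return s * (1.0 + e)    # (6) refined estimate


if __name__ == "__main__":
    for x in (0.0, 2.0, 10.0, 12345.678):
        exact = math.sqrt(x)
        print(x, sqrt_first_estimate(x) - exact, sqrt_refined(x) - exact)
```

For a zero operand the halved intermediate value is finite, so steps (3) through (6) produce zero without special handling. For non-zero operands the refinement in steps (5) and (6) amounts to one Newton-Raphson style correction, which roughly squares the relative error of the initial estimate.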
Methods according to the foregoing aspects are advantageous over the prior art.
Among other reasons, such methods permit the rapid generation of square roots without requiring branching, testing or other instructions that might otherwise consume needless processor resources. The methods provide such benefits while, at the same time, reducing the error otherwise associated with the processor instruction set.
Other aspects of the invention provide novel methods of simultaneously generating square roots of large groups of numbers using a vector processor. Methods according to these aspects interleave vector instructions for processing several vectors, taking advantage of necessary delays in the vector processing pipeline architecture to speed up overall processing.
In one such aspect of the invention, multiple sequences of the type described above are interleaved in an "inner loop" of a process for generating vector square roots, thus enabling the determination of tens, hundreds, or thousands of such values in minimal time and with minimal computational resources.
In another such aspect, non-vector instructions such as loading data from memory to vector registers and storing data from vector registers to memory are scheduled to occur during the inherent latency period required for execution of vector instructions, thus eliminating any additional time to attend to "administration" of the inner loop.
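The ordering of operations in such an interleaved inner loop can be suggested, in a purely schematic way, by the following sketch. Python has no vector pipeline, so the sketch shows only the structure: each step is issued for every vector in a group before the next step is issued for any of them, so that consecutive operations are independent of one another. The names, the group size, and the representation of a vector as a list of four values are assumptions of the sketch, as is the reading of the error step as one-half minus the product of the halved reciprocal square root and the current estimate.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Hypothetical stand-in for a hardware reciprocal square root."""
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def halve_by_bits(y: float) -> float:
    """Halve a positive normal value or +infinity via its exponent field."""
    bits = struct.unpack("<Q", struct.pack("<d", y))[0]
    return struct.unpack("<d", struct.pack("<Q", bits - (1 << 52)))[0]


def lanes(op, *vectors):
    """Apply op lane by lane, standing in for a single vector instruction."""
    return [op(*values) for values in zip(*vectors)]


def sqrt_group_interleaved(xs):
    """Square roots of a group of vectors, issued step by step across the group."""
    ys = [lanes(rsqrt_estimate, x) for x in xs]
    hs = [lanes(halve_by_bits, y) for y in ys]
    ts = [lanes(lambda a, b: a * b, h, x) for h, x in zip(hs, xs)]
    ss = [lanes(lambda a: a + a, t) for t in ts]
    es = [lanes(lambda a, b: 0.5 - a * b, h, s) for h, s in zip(hs, ss)]
    return [lanes(lambda a, b: a * (1.0 + b), s, e) for s, e in zip(ss, es)]


if __name__ == "__main__":
    group = [
        [0.0, 1.0, 4.0, 16.0],
        [2.0, 3.0, 5.0, 7.0],
        [100.0, 0.25, 0.0, 1.0e6],
    ]
    for roots in sqrt_group_interleaved(group):
        print(roots)
```

On a pipelined vector processor, issuing the same step for several independent vectors in succession allows each instruction to start while earlier ones are still completing; the sketch reproduces that ordering but not the timing benefit.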
These and other aspects of the invention have the advantage of permitting rapid square root determinations without excessive use of system resources.