Many of today's high-performance digital computing systems incorporate some form of digital numeric processor capable of division and square root operations. Such calculations are generally performed by an iterative convergent process such as that depicted in FIG. 1.
FIG. 1 shows a generalized technique 100 for performing an iterative convergent calculation comprising an input multiplexer 110, and computing blocks 120 and 130, each having two inputs and one output. Multiplexer 110 selects between two input values: an initial value (or "seed") 140 and a feedback value 180 (which is the output of computing block 130) and presents the selected input value as its output value 190. (Multiplexer 110 is a conceptual device in this representation, and merely represents the ability to select between two values, which may be accomplished by any suitable technique).
Computing block 120 operates on its two input values "X" 190 and "P2" 160 and produces an output value "Y" 170 according to the expression Y=F(X,P2). "P2" 160 is representative of one or more external parameter values. Similarly, computing block 130 operates on its input values "Y" 170 and "P1" 150, producing the output value "Z" 180 according to the expression Z=G(Y,P1). "P1" 150 is likewise representative of one or more external parameter values. This computing apparatus 100, as shown, forms an iterative calculation loop along the path defined by multiplexer 110, value "X" 190, computing block 120, value "Y" 170, computing block 130, and value "Z" 180.
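The loop of FIG. 1 may be illustrated in software by a minimal sketch such as the following. It is not representative of any particular hardware implementation; the function name `iterate` and its argument names are hypothetical, chosen only to mirror the reference numerals above, and a fixed iteration count is assumed for simplicity.

```python
def iterate(seed, f, g, p1, p2, iterations):
    """Run the loop X -> Y = F(X, P2) -> Z = G(Y, P1) a fixed number of times.

    seed, p1 and p2 stand in for values 140, 150 and 160 of FIG. 1;
    f and g stand in for computing blocks 120 and 130.
    """
    x = seed                  # first pass: multiplexer 110 selects the seed 140
    for _ in range(iterations):
        y = f(x, p2)          # computing block 120: Y = F(X, P2)
        z = g(y, p1)          # computing block 130: Z = G(Y, P1)
        x = z                 # later passes: multiplexer 110 selects feedback 180
    return x
```

Any pair of two-input functions may be supplied for `f` and `g`; which pair is used determines what the loop computes.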
As previously stated, FIG. 1 represents a generalization of simple iterative calculation techniques, and is not necessarily physically representative of any particular hardware implementation of these techniques. Numerous implementations of numeric processors capable of performing iterative calculations according to the generalized technique of FIG. 1 are well known in the prior art, and may be readily implemented by one skilled in the art.
Consider, for example, the problem of computing a square root by iteration, to which the technique of FIG. 1 may be applied. Several commonly used algorithms are suitable for this purpose, including Newton's method, the Newton-Raphson method, and the Goldschmidt algorithm, all of which are iterative convergent techniques that may be represented by the technique of FIG. 1. The simplest (though least efficient computationally) of these is Newton's method. The Newton-Raphson method and Goldschmidt algorithm are similar techniques which converge faster (in fewer iterations and/or with fewer calculations per iteration). Newton's method, however, is easiest to describe and is exemplary of these iterative methods.
Newton's method, as applied to the taking of square roots, may be described as follows:

$$G_n = \frac{1}{2}\left(G_o + \frac{Q}{G_o}\right)$$

where "Q" is the value whose square root is to be taken, $G_o$ is the last guess (old guess) at the square root, and $G_n$ is the next guess at the square root. Each successive guess $G_n$ more closely approximates the square root of "Q". In order to apply this method to the apparatus of FIG. 1, "P1" 150 is set equal to the value of the number "Q" whose square root is to be taken, computing block 130 is set up such that:

$$G(Y, P1) = \frac{1}{2}\left(Y + \frac{P1}{Y}\right)$$

and computing block 120 is set up such that:

$$F(X, P2) = X$$

effectively eliminating computing block 120 (it is not needed for this particular calculation).
Note that for this type of calculation, an initial guess or "seed" 140 is required. For the first iteration, multiplexer 110 is set to use the "seed" 140 as its output 190. For all other iterations, multiplexer 110 is set to use the result of the last iteration 180 "Z" as its output 190 "X". Each iteration brings the computed value of the next guess 180 "Z" closer to the actual square root of the value "P1" 150. (Since the multiplexer is set to use "Z" 180 as its output "X" 190, and since computing block 120 is set to simply copy that signal to its output value "Y" 170, "Y" 170 also represents the result of the iterative calculation.) The closer the initial "seed" value 140 is to the square root of "P1" 150, the fewer iterations it takes to converge.
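By way of illustration only, the square root iteration described above may be sketched as follows, assuming the usual Newton update for square roots, in which the next guess is the average of the old guess and Q divided by the old guess. The function name `newton_sqrt` is hypothetical, floating-point arithmetic is assumed in place of any particular hardware number format, and the seed is assumed to be positive and nonzero.

```python
def newton_sqrt(q, seed, iterations):
    """Approximate sqrt(q) with a fixed number of Newton iterations.

    q plays the role of "P1" 150, seed the role of value 140, and the
    returned guess the role of "Z" 180 after the final pass.
    """
    guess = seed                            # first pass uses the seed
    for _ in range(iterations):
        guess = 0.5 * (guess + q / guess)   # G_n = (G_o + Q / G_o) / 2
    return guess


if __name__ == "__main__":
    # A seed nearer the true root leaves a smaller error after the same
    # number of iterations.
    for seed in (1.0, 1.9):
        print(seed, abs(newton_sqrt(4.0, seed, 2) - 2.0))
```

The comparison in the usage lines reflects the point made above: the closer the seed is to the square root, the fewer iterations are needed to reach a given accuracy.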
Typically, "convergence" is determined by one of a few methods, including:
1) Successive guesses $G_n$ are compared until the difference between them falls within a pre-specified tolerance range, at which point iterations are terminated and the last guess is taken as the result of the iterative computation;
2) Successive guesses $G_n$ are squared in an attempt to recreate the input operand "Q" on "P1" until $Q - G_n^2$ falls within a certain tolerance range, at which point iterations are terminated and the last guess is taken as the result of the computation; or
3) A fixed number of iterations is executed, where the number of iterations is chosen according to a worst-case analysis of the number of iterations required to provide a result of sufficient accuracy.
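The three termination tests listed above may be sketched as follows for the square root case. This is an illustrative sketch only: the function name `newton_sqrt_until`, the criterion labels "delta", "residual" and "fixed", and the default tolerance are assumptions made for the example, and the usual Newton update for square roots is assumed.

```python
def newton_sqrt_until(q, seed, tol=1e-9, criterion="delta", max_iter=50):
    """Iterate toward sqrt(q); return (result, iterations performed).

    criterion = "delta":    stop when successive guesses differ by <= tol
    criterion = "residual": stop when |q - guess**2| <= tol
    criterion = "fixed":    always run exactly max_iter iterations
    """
    guess = seed
    for n in range(1, max_iter + 1):
        next_guess = 0.5 * (guess + q / guess)
        if criterion == "delta" and abs(next_guess - guess) <= tol:
            return next_guess, n
        if criterion == "residual" and abs(q - next_guess * next_guess) <= tol:
            return next_guess, n
        guess = next_guess
    # Reached for "fixed", or when a tolerance test never succeeded.
    return guess, max_iter
```

The first two tests cost an extra comparison (and, for the second, an extra multiplication) per iteration, whereas the fixed count requires no test but must be sized for the worst-case seed.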
For any such iterative technique, an initial seed value is required. Ultimately, the better the seed selection process, the fewer the number of iterations required for the computation to converge. Generally, a seed value that starts the iterative computation process somewhere "in the ballpark" of the ultimate result will provide good performance. One commonly used technique for selecting seed values is shown in FIG. 2. Apparatus 200 for selecting a seed value 230 comprises a Read-Only Memory (ROM) 210 connected such that selected bits 220a (the most significant bits, or MSBs) of an input operand 220 are presented to the address inputs of the ROM. A set of seed values is stored in the ROM, each at the address corresponding to the operand magnitude for which it is intended. The addressed seed value 230 is then used as the initial value for the iterative computation.
Typically, due to space and cost considerations, the ROM is a small one. Take, for example, the case where a 19-bit input operand is used in conjunction with a 256 by 8-bit ROM. If such a ROM is used, the 8 most significant bits of the input operand are used as the address input of the ROM. The output of the ROM provides only 8 bits, so this 8-bit seed value is typically used as the most significant 8 bits of a 19-bit seed value, the remainder of which is padded with zeroes or ones. In this manner, one of 256 seed values is supplied according to the magnitude of the input operand.
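The 19-bit operand and 256 by 8-bit ROM example may be sketched as follows. The bit widths are taken from the example above; everything else is an assumption made for illustration, in particular the name `lookup_seed`, the treatment of the operand as an unsigned fraction in the range 0 to 1, and the hypothetical ROM contents (the 8-bit truncated square root of the midpoint of each address interval).

```python
import math

OPERAND_BITS = 19                      # width of input operand 220
ADDR_BITS = 8                          # MSBs 220a used as the ROM address
SEED_BITS = 8                          # width of each stored seed
PAD_BITS = OPERAND_BITS - SEED_BITS    # low-order bits to be padded (11)

# Hypothetical ROM contents: operand treated as a fraction in [0, 1),
# seed = square root of each address interval's midpoint, cut to 8 bits.
ROM = [min(255, int(math.sqrt((a + 0.5) / 256) * 256)) for a in range(256)]


def lookup_seed(operand, pad_with_ones=False):
    """Return a 19-bit seed selected by the 8 MSBs of a 19-bit operand."""
    if not 0 <= operand < (1 << OPERAND_BITS):
        raise ValueError("operand must fit in 19 bits")
    address = operand >> (OPERAND_BITS - ADDR_BITS)   # keep the 8 MSBs
    pad = (1 << PAD_BITS) - 1 if pad_with_ones else 0
    return (ROM[address] << PAD_BITS) | pad           # seed in the top 8 bits
```

Every operand sharing the same 8 most significant bits receives the same seed, which is why the seed resolution in this arrangement is uniform across the operand range.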
One method of improving the initial selection of seed values is simply to provide a larger ROM which stores more seed values and/or more bits per seed value. However, this technique is generally not practical, due to the large size of the ROM which would be required. (In the extreme, a single ROM containing full precision results for each operand value could be provided, eliminating the need for iterative calculations at all, but this is extremely disadvantageous from a cost/size point of view.)
However, the results of square root operations are not linearly proportional to the value of the input operand, but are logarithmically related to the magnitude of the input operand. Ideally, a technique for selecting seed values for square root computations would provide for finer resolution where successive values are closely spaced, and coarser resolution where successive values are widely spaced. On the other hand, for some calculations, linear spacing of seed values is desirable. In these cases, variable resolution is disadvantageous.
It is known in the prior art to expand the dynamic range of seed values by taking a raw seed value and squaring it, using the same multiplier circuitry which forms a part of the iterative computation mechanism, to create an expanded seed value. This technique, however, ties up the iterative computation mechanism for additional cycles, thereby negating at least some of the savings in iterations which would be realized by improving the dynamic range of the seed value.
An ideal seed selection technique would allow for linear or variable resolution seed selection, while minimizing the amount of ROM and/or total circuitry required to accomplish this. Unfortunately, prior art seed selection techniques do not provide this capability.