Division determines how many times one value must be added to itself to equal another. The fact that some values do not add an integral number of times to equal a specific value does not alter the concept. In general, the quotient of two values is the same irrespective of the system of units in which it is expressed. Similarly, if a number is expressed in a certain radix system (such as binary) and is then represented as a left-justified number times a power of the radix, the number loses no generality. This is a typical format for representing numbers in a computing system.
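As an illustration of the left-justified representation described above, the following minimal sketch uses Python's standard `math.frexp` to split each operand into a mantissa and a power of two, divides the mantissas, and restores the scale afterwards. The function name `divide_normalized` is hypothetical and chosen only for this example; the sketch assumes binary floating-point operands and does not describe any particular hardware.

```python
import math

def divide_normalized(a, b):
    """Divide a by b using normalized (mantissa, exponent) forms.

    math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
    i.e. a left-justified mantissa times a power of the radix (2).
    """
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)
    # The quotient of the mantissas is rescaled by the exponent difference.
    return math.ldexp(ma / mb, ea - eb)

print(divide_normalized(10.0, 4.0))
```

The result matches ordinary division because scaling by a power of the radix changes only the exponent, not the digits of the quotient.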
In many conventional systems, division is achieved by a clocked sequence of steps. This approach is intuitive because the steps resemble the long division taught to school children. The process is iterative, meaning that the division is performed through trial and error. A specific structure tries to subtract the divisor from the dividend and reports its success or failure to a register that accumulates the quotient. Such an iterative system may take many steps to divide high-precision numbers.
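The trial-subtraction process described above can be sketched in software as a shift-and-subtract (restoring) division on unsigned integers. This is an illustrative model only, not a description of any specific circuit; the function name `restoring_divide` and the `bits` parameter are assumptions made for the example. Each loop pass stands in for one clocked step.

```python
def restoring_divide(dividend, divisor, bits=8):
    """Shift-and-subtract division of unsigned integers.

    Returns (quotient, remainder). One loop pass models one clocked step.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given bit width")

    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: success records a 1 in the quotient register.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(restoring_divide(100, 7))  # (14, 2)
```

The number of passes equals the operand width, which is why such a scheme takes many steps for high-precision operands.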
Additionally, many conventional systems use iterative approximation algorithms for division and for other special mathematical functions, such as the square root, reciprocal square root, and reciprocal. Newton-Raphson iterative algorithms are used in many cases since they are often faster than other algorithms. Nevertheless, iterative algorithms present other problems. For example, they can introduce rounding errors, and for division operations they provide no remainder. Furthermore, iterative algorithms still have performance limitations: depending upon the precision required, the complete iterative process can take a considerable number of processor cycles. The corresponding delay may affect some procedures, particularly those involving multimedia presentations. Some multimedia extensions to processor architectures also specify reciprocal and reciprocal square root instructions that require increased (12-bit) precision. Generating 12 correct bits using conventional table lookup techniques requires a table of 2048 entries, which adds hardware cost.
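For reference, a Newton-Raphson reciprocal can be sketched as follows. The recurrence x ← x(2 − d·x) is the standard Newton-Raphson step for 1/d; the linear starting estimate 48/17 − (32/17)·m for a mantissa m in [0.5, 1) is one commonly used choice and is an assumption of this sketch, as are the function name `newton_reciprocal` and the default iteration count.

```python
import math

def newton_reciprocal(d, iterations=4):
    """Approximate 1/d with Newton-Raphson iterations on the mantissa."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Normalize |d| = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    # Linear starting estimate for 1/m (an assumed, commonly used choice).
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Each step roughly squares the relative error of x.
        x = x * (2.0 - m * x)
    # Undo the normalization and restore the sign.
    return math.copysign(math.ldexp(x, -e), d)

print(newton_reciprocal(3.0))
```

A quotient formed as `a * newton_reciprocal(b)` is an approximation only: every pass costs two multiplications, the final product is rounded, and no remainder is produced, which are the limitations noted above.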
Another alternative involves the use of a linear approximation, as described in U.S. Pat. No. 5,563,818. However, such implementations require large lookup tables and larger multipliers than the implementation that will be described. A system that performs a quadratic approximation is shown in U.S. Pat. No. 5,245,564. This technique is an extension of linear approximation, providing three times the precision of a table lookup for the same number of words at the cost of an additional multiply and add, and accordingly a longer execution time. U.S. Pat. No. 6,240,338 describes another alternative that uses a memory containing estimated reciprocal terms and a second memory containing reciprocal error terms.
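To make the table-plus-multiplier trade-off concrete, the following is a generic piecewise-linear reciprocal sketch. It is not the method of any of the cited patents; the table size (`TABLE_BITS = 6`), the chord-based slopes, and all names are assumptions chosen for illustration. Each table entry stores a base value and a slope, and one multiply and one add refine the looked-up value.

```python
TABLE_BITS = 6                 # assumed table size: 64 segments
SEGMENTS = 1 << TABLE_BITS
WIDTH = 1.0 / SEGMENTS         # segment width over the interval [1, 2)

# Each entry holds (1/x0, slope of the chord across the segment).
TABLE = []
for i in range(SEGMENTS):
    x0 = 1.0 + i * WIDTH
    x1 = x0 + WIDTH
    TABLE.append((1.0 / x0, (1.0 / x1 - 1.0 / x0) / WIDTH))

def linear_reciprocal(m):
    """Approximate 1/m for a mantissa m in [1, 2) by table lookup
    followed by one multiply and one add."""
    if not 1.0 <= m < 2.0:
        raise ValueError("mantissa must lie in [1, 2)")
    i = int((m - 1.0) * SEGMENTS)      # leading mantissa bits select the entry
    base, slope = TABLE[i]
    return base + slope * (m - (1.0 + i * WIDTH))

print(linear_reciprocal(1.3), 1.0 / 1.3)
```

In this sketch, halving the segment width cuts the worst-case error by roughly a factor of four but doubles the table, which illustrates why higher precision pushes such schemes toward larger tables or wider multipliers.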
Many conventional systems retrieve the inverse by table lookup, with enhancement by obtaining additional bits through gating. These methods do not truly or directly compute the quotient. Instead, they use large amounts of storage and addressing to look up the values from which the quotient is determined. They gain no advantage from algorithms or other computational benefits. Other processes have obtained the reciprocal with some enhancement or decrease in table size, but they still use normalization, de-normalization, shifts, and other steps with clocks or iterative techniques that must be integrated into the overhead of the application.
As such, there is a need for improved methods and apparatus for rapidly calculating quotients and reciprocals with a high degree of accuracy. Furthermore, there is a need for methods and apparatus that can perform such calculations without iteration steps (i.e., trial and error) and without clocks or similar mechanisms used in typical computing devices. Still further, there is a need for efficient processors for performing these calculations, so that the semiconductor real estate required for the apparatus is less than or equal to that required by standard technology.