1. Field of the Invention
The present invention relates to a method for performing a division operation in a system, and, more particularly, to a method for performing a division operation in a system having a processor that does not have a native division function.
2. Description of the Related Art
Electronic systems, such as those utilizing embedded scanning, image processing, and printing algorithms, commonly use mathematical functions such as division. As specific examples, division operations are commonly used for calculating averages in scanner calibration algorithms, for printhead dot count calculations, and for printhead carrier velocity determinations.
Existing division algorithms for digital designs generally fall into two categories: slow division and fast division. Slow division algorithms, such as restoring, non-restoring, and SRT (Sweeney, Robertson and Tocher) division, produce one digit of the final quotient per iteration. With a slow algorithm that produces one bit per iteration (radix 2), calculating a 32-bit quotient requires 32 iterations. Fast division algorithms, such as Newton-Raphson and Goldschmidt division, converge on the final quotient quadratically, so that the number of correct quotient bits roughly doubles with each iteration. With a fast algorithm, calculating a 32-bit quotient may require six or fewer iterations.
The tradeoff with performance is complexity. Slow division algorithms can typically be implemented with less complex logic than fast algorithms. Slow algorithms require only simple shift and subtract operations for each iteration, and can be implemented in hardware at a low gate cost.
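By way of illustration only, the shift and subtract structure of a slow algorithm can be sketched in software as a radix-2 restoring division. The function name `restoring_divide` and the `width` parameter are hypothetical names chosen for this sketch; it assumes unsigned operands and is not taken from any particular hardware design.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 32) -> tuple:
    """Radix-2 restoring division of unsigned integers (illustrative sketch).

    Produces one quotient bit per iteration, so a `width`-bit quotient
    takes exactly `width` iterations. Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    limit = 1 << width
    if not (0 <= dividend < limit and 0 < divisor < limit):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Subtract: trial subtraction of the divisor.
        trial = remainder - divisor
        quotient <<= 1
        if trial >= 0:
            # Subtraction succeeded: keep the result, quotient bit is 1.
            remainder = trial
            quotient |= 1
        # Otherwise the partial remainder is "restored" (left unchanged)
        # and the quotient bit stays 0.
    return quotient, remainder
```

For example, `restoring_divide(100, 7)` runs through 32 iterations and returns a quotient of 14 with a remainder of 2. Each iteration needs only a shift, a subtraction, and a sign test, which is consistent with the low gate cost noted above.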
In contrast, faster algorithms generally require more complex logic to achieve better performance. For example, the Newton-Raphson method converges on the reciprocal of the divisor, which is then multiplied by the dividend to obtain the quotient. Each iteration requires a multiply operation, a subtract operation, and a second multiply operation. The required size of the multipliers increases on successive iterations as the product size increases, so this type of implementation may quickly become impractical due to the increased gate cost or timing requirements of a large multiplier circuit. Convergence algorithms such as this often attempt to decrease the number of required iterations by obtaining a better initial approximation value from a lookup table (LUT), which consequently decreases the multiplier complexity. However, implementing these LUTs in a hardware design can significantly increase cost.
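As a rough sketch of the multiply, subtract, multiply structure described above, the Newton-Raphson reciprocal iteration x ← x · (2 − d · x) can be written as follows. The function name `newton_raphson_reciprocal`, the assumption that the divisor has been pre-scaled into the range [0.5, 1), and the crude initial estimate of 1.0 (used here in place of a LUT) are all choices made for this illustration. Floating-point arithmetic is used for brevity, so the sketch does not model the growing multiplier widths of a hardware implementation.

```python
def newton_raphson_reciprocal(d: float, iterations: int = 5) -> float:
    """Approximate 1/d by Newton-Raphson iteration (illustrative sketch).

    Assumes the divisor d has already been scaled into [0.5, 1).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be pre-scaled into [0.5, 1)")

    # Crude initial estimate (an assumption of this sketch, standing in
    # for a LUT). The relative error 1 - d*x starts at no more than 0.5.
    x = 1.0
    for _ in range(iterations):
        product = d * x             # first multiply
        correction = 2.0 - product  # subtract
        x = x * correction          # second multiply
        # The relative error 1 - d*x is squared by each pass, so the
        # number of correct bits roughly doubles per iteration.
    return x
```

Under these assumptions the worst case is d = 0.5, where the relative error falls from 2^-1 to 2^-2, 2^-4, 2^-8, 2^-16, and 2^-32 over five iterations. A quotient would then be obtained by multiplying the dividend by the returned reciprocal. A more accurate initial estimate would reduce the iteration count, which is the motivation for the LUT mentioned above.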
What is needed in the art is a division algorithm with a simpler hardware implementation than a typical fast algorithm, such as the Newton-Raphson method, and that offers better performance by requiring fewer iterations than a typical slow algorithm.