1. Field of the Invention
This invention relates to computer systems, and more particularly to arithmetic circuits in computer processors.
2. Description of the Related Art
In computing systems, processor performance may have a significant impact on the overall performance of the system. One important component of a processor's performance is the speed with which it performs arithmetic operations. If a processor exhibits poor arithmetic performance, then overall performance of the processor is likely to be relatively poor as well. While some arithmetic operations may be implemented in software or microcode, others may be implemented in hardware. Typically, arithmetic operations implemented in hardware are faster than those implemented in software or microcode.
One arithmetic operation that is frequently studied and targeted for improvement is division. Integer divides are often implemented by microcode routines using adds and shifts. However, such approaches generally require one clock cycle for every bit in the dividend. For example, in a standard divider, dividing a 128-bit dividend by a 64-bit divisor would require approximately 128 cycles to complete. Even if the divider were configured to operate on n bits per cycle, the latency would still be approximately 128/n cycles, regardless of the value of the dividend. Consequently, such approaches tend to be relatively slow.
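By way of illustration only, the fixed latency of the shift-based approach described above may be modeled in software. The following Python sketch is not taken from any particular divider; the function name shift_subtract_divide and the parameters dividend_bits and bits_per_cycle are hypothetical and chosen for this example. It assumes unsigned operands and a restoring (shift-and-subtract) scheme, and it counts one modeled cycle for every group of bits_per_cycle dividend bits.

```python
def shift_subtract_divide(dividend, divisor, dividend_bits=128, bits_per_cycle=1):
    """Model a bit-serial restoring divider (illustrative sketch only).

    Returns (quotient, remainder, cycles). All names are hypothetical.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if bits_per_cycle < 1:
        raise ValueError("bits_per_cycle must be at least 1")
    if dividend < 0 or dividend >> dividend_bits:
        raise ValueError("dividend must fit in dividend_bits as an unsigned value")

    quotient = 0
    remainder = 0
    cycles = 0
    bit = dividend_bits - 1  # start at the most significant dividend bit

    while bit >= 0:
        # One modeled clock cycle retires up to bits_per_cycle dividend bits.
        for _ in range(min(bits_per_cycle, bit + 1)):
            # Shift the next dividend bit into the partial remainder.
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            # Subtract the divisor when it fits and record a quotient bit of 1.
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
            bit -= 1
        cycles += 1

    return quotient, remainder, cycles
```

In this model, shift_subtract_divide(100, 7) returns a quotient of 14 and a remainder of 2 after 128 modeled cycles, the same cycle count a full-width 128-bit dividend would incur. Setting bits_per_cycle to 4 reduces the count to 32, but the count remains independent of the magnitude of the dividend.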
Accordingly, an efficient method and mechanism for performing division is desired.