1. Field of the Invention
This invention relates in general to the field of data processing in computers, and more particularly to an apparatus and method for performing integer division.
2. Description of the Related Art
Software programs that execute on a microprocessor consist of macro instructions that together direct the microprocessor to perform a function. Each macro instruction directs the microprocessor to perform a specific operation that is part of the function, such as loading data from memory, storing data in a register, or adding the contents of two registers.
A macro instruction may prescribe a simple operation, such as moving the contents of one register location to another register location. In contrast, a different macro instruction may prescribe a complex operation, such as deriving the cosine of a floating point number. Compared to the manipulation of integer data, the manipulation of floating point data by the microprocessor is complex and time consuming. Movement of integer data requires only a few cycles of a microprocessor clock; derivation of a cosine requires hundreds of machine cycles. Because floating point operations are inherently more complex than integer operations, typical microprocessors employ a dedicated floating point unit to improve the speed and efficiency of floating point calculations. The dedicated floating point unit may be part of the same mechanical package as the remainder of the microprocessor or it may reside in a separate mechanical package.
Within an x86-compatible microprocessor, a floating point macro instruction is decoded into a sequence of floating point micro instructions that direct the microprocessor to execute a floating point operation. The sequence of floating point micro instructions is passed to the floating point unit. The floating point unit executes the sequence of floating point micro instructions and provides a result of the floating point operation in a result register. Likewise, an integer macro instruction is decoded into a sequence of integer micro instructions that direct the microprocessor to execute an integer operation. The sequence of integer micro instructions is passed to the integer unit. The integer unit executes the sequence of integer micro instructions and provides a result of the integer operation in a result register.
In recent years, advances in integrated circuit design and manufacturing technologies have allowed designers to add ever more functionality to a microprocessor. More functions are added by simply adding more logic circuits to a device. Additional functions, in this sense, often result in a greater power requirement, and device reliability tends to fall as the power requirement rises. Consequently, microprocessor designers are now searching for alternative ways to add functions to a device. Designers now use existing logic to perform new functions, or they eliminate redundant logic and redistribute existing functions to remaining logic. One example of redistribution is seen in the implementation of logic to perform integer division.
Early microprocessors maintained separate dividers. A divider in an integer unit was dedicated to performing integer division and another divider in a floating point unit was used to perform floating point division. Recognizing that these dividers were essentially equivalent logic circuits, designers of present-day microprocessors have eliminated the divider in the integer unit and have redistributed the integer division function so that it is now executed in the floating point unit.
Redistribution of the integer division function to a floating point divider, however, results in slower execution times for integer division instructions. This is because a typical floating point divider is designed to operate on fixed-length operands. More specifically, the typical floating point divider expects to receive 64-bit operands, requires approximately 64 cycles of the microprocessor clock to perform division, and provides a 64-bit result. Integer division in an x86-compatible microprocessor, however, allows operands much smaller than 64 bits.
Additionally, for integer division, x86 architectural provisions restrict the size of a divisor and a quotient to be one-half the size of a dividend. As a result, to perform x86-compatible integer division in a 64-bit floating point divider requires initial alignment of the divisor and dividend in the divider and execution of a number of unnecessary divide cycles in the divider. A division which would result in a quotient that exceeds the size restriction causes a divide overflow condition. Hence, in addition to having to execute unnecessary divide cycles in the divider, floating point execution logic is also required to execute an operation to detect the divide overflow condition, the operation either immediately preceding or immediately following the integer division.
A consequence of using an existing 64-bit floating point divider to perform integer division is that an integer divide instruction, regardless of its operand sizes, requires 64 cycles to execute, plus a number of cycles to detect a divide overflow. For operands smaller than 64 bits, a majority of the divide cycles are not required. Moreover, the cycles associated with detection of divide overflow are not essential to the division operation itself.
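The cost of a fixed-length divider can be sketched as follows. The sketch assumes a simple radix-2 restoring (shift-and-subtract) divider that produces one quotient bit per iteration; the internal algorithm of the floating point divider is not described here, so this is an illustrative assumption, and the name `restoring_divide` is hypothetical. Each loop iteration stands in for one divide cycle.

```python
def restoring_divide(dividend: int, divisor: int, quotient_bits: int):
    """Radix-2 restoring division, one quotient bit per iteration.

    Treats the dividend as 2 * quotient_bits wide and returns
    (quotient, remainder, cycles), where cycles counts loop iterations.
    Requires that the upper part of the dividend be less than the
    divisor, so that the quotient fits in quotient_bits.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder = dividend >> quotient_bits          # upper part of dividend
    low = dividend & ((1 << quotient_bits) - 1)    # lower part of dividend
    if remainder >= divisor:
        raise ValueError("quotient would not fit in quotient_bits")

    quotient = 0
    cycles = 0
    for i in range(quotient_bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((low >> i) & 1)
        quotient <<= 1
        # Subtract the divisor when it fits and record a 1 quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles
```

Under these assumptions, `restoring_divide(1000, 7, 16)` and `restoring_divide(1000, 7, 64)` produce the same quotient and remainder, but the second call runs 64 iterations instead of 16. The additional 48 iterations generate only leading zero quotient bits, which corresponds to the unnecessary divide cycles described above.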
Attempts to increase the execution speed of an integer divide instruction are prevalent in the art. The Pentium® processor uses a technique called SRT division. Although SRT division improves execution speed, its implementation requires the addition of complex and costly logic to a microprocessor. Other microprocessors have added hardware that is dedicated to detecting a divide overflow. As with the Pentium, dedicated hardware for overflow detection shortens the execution time of an integer divide instruction, but again, additional logic circuits are required. Both of these attempts to increase speed are a return to a design doctrine where power, design complexity, and cost are allowed to grow. One skilled in the art will appreciate that performance improvement for a given instruction in a microprocessor, where the improvement is gained without a requirement for additional hardware, is highly desirable.
Therefore, what is needed is an apparatus for performing integer division in a microprocessor that does not require complex additional hardware to execute the division or to detect a divide overflow.
In addition, what is needed is a microprocessor that executes an integer divide instruction in a specified number of divide cycles.
Furthermore, what is needed is a method for performing integer division in a microprocessor that requires neither additional clock cycles nor dedicated hardware to detect a divide overflow.