One or more aspects relate in general to data processing systems and, in particular, to performing an arithmetic operation in a data processing unit by, for instance, iterative digit accumulation.
Fixed-point operations, such as integer divide operations, generally require many cycles to achieve the output precision specified by a computing architecture. As a result, many different algorithms have emerged that take advantage of different dataflow architectures to increase the performance and throughput of these “slow” instructions.
Dividers are used in microprocessors and data-processing devices to perform arithmetic division. Because floating-point numbers and integers are represented differently in binary, and because the operations differ as a result, separate floating-point and integer dividers are generally provided. Typically, floating-point division is considered to be more important for high-demand applications, such as graphics and multimedia applications. In addition, integer division is performed less frequently than other mathematical operations. Consequently, many manufacturers save die real estate by providing only the most basic single-bit-per-cycle (radix-2) integer divider, which reduces performance. While combined floating-point and integer dividers are known, they generally have not provided significant performance or space-efficiency improvements over separate dividers.
Integer division and floating-point division are commonly performed using one of a variety of well-known subtractive algorithms. Each subtractive algorithm includes a sequence of shift, subtract, and compare operations. Known subtractive algorithms include the restoring, the non-restoring, and the Sweeney, Robertson, and Tocher (SRT) division algorithms.
Subtractive division works similarly to standard long division. Each digit of the dividend, starting with the most significant digit, is compared to the divisor, and a digit of the quotient is computed. In computers, this is accomplished by the typical one-bit-per-cycle (radix-2) integer divider by aligning the most significant bit of the dividend with the least significant bit of the divisor, subtracting the aligned digits, shifting the partial remainder to the left, subtracting, shifting again, and so on. For a 64-bit number, the minimum number of cycles is 64, plus several cycles for setting up the computation. The entire process is performed even when the numbers have significantly fewer digits or when the dividend is smaller than the divisor (a case that always yields a quotient of zero in integer division). Thus, even radix-4 and radix-8 integer dividers, which process multiple bits per cycle, can be very inefficient.
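By way of illustration only, the shift, compare, and subtract sequence described above can be sketched in software as a radix-2 restoring division of unsigned integers. The function name `restoring_divide` and the `width` parameter are hypothetical and are not taken from any cited document; a hardware divider would perform a trial subtraction and restore the partial remainder on a negative result, which the comparison below is assumed to model.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Radix-2 restoring division of unsigned integers (illustrative sketch).

    Produces one quotient bit per loop iteration and always runs `width`
    iterations, regardless of how small the operands are.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    limit = 1 << width
    if not (0 <= dividend < limit and 0 < divisor < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift the partial remainder left and bring in the next dividend bit,
        # starting with the most significant bit.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Compare step: equivalent to a trial subtraction that is restored
        # when the result would be negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(restoring_divide(100, 7, width=8))  # (14, 2)
```

In this sketch the loop count depends only on `width`, so dividing 3 by 10 costs as many iterations as dividing two full-width operands, which is the inefficiency noted above.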
GB 2 421 327 A, which is hereby incorporated herein by reference in its entirety, discloses a method for dividing integers comprising: counting the number of leading sign bits of both the dividend and the divisor (e.g., the number of leading zeroes before the most significant one in a positive number, or the number of leading ones before the most significant zero in a two's complement negative number); calculating the number of digits in the quotient by subtracting the number of leading sign bits in the dividend from the number of leading sign bits in the divisor and adding one; normalizing both the dividend and the divisor (e.g., by left shifting); and then calculating the digits in the quotient by using a subtractive divider, such as one using a non-restoring SRT algorithm.
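As a minimal sketch of the first two steps of that method, and not the implementation disclosed in the cited document, the leading sign bit count and the resulting quotient digit count might be computed as follows. The function names, the `width` parameter, and the clamping of a negative count to zero are assumptions made for illustration; the normalization and SRT steps are not shown.

```python
def leading_sign_bits(value: int, width: int = 32) -> int:
    """Count the leading bits equal to the sign bit of a two's complement value.

    For a positive value this is the number of leading zeroes; for a negative
    value it is the number of leading ones.
    """
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in `width` signed bits")
    bits = value & ((1 << width) - 1)      # two's complement bit pattern
    sign = (bits >> (width - 1)) & 1
    count = 0
    for position in range(width - 1, -1, -1):
        if (bits >> position) & 1 != sign:
            break
        count += 1
    return count


def quotient_digit_count(dividend: int, divisor: int, width: int = 32) -> int:
    """Estimate the number of quotient digits from the leading sign bit counts.

    Clamping a negative result to zero is an assumption of this sketch.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    count = (leading_sign_bits(divisor, width)
             - leading_sign_bits(dividend, width) + 1)
    return max(count, 0)


print(leading_sign_bits(16, 8))        # 3  (0b00010000)
print(quotient_digit_count(16, 2, 8))  # 4  (16 // 2 == 0b1000)
print(quotient_digit_count(3, 16, 8))  # 0  (dividend smaller than divisor)
```

For positive operands the count is an upper bound on the length of the quotient rather than an exact value: dividing 16 by 3 in 8 bits also gives a count of 4, while the quotient 5 occupies three bits.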