The present invention relates to digital processing in general and in particular to implementation of the mathematical division function. Often referred to as the division step instruction, this function is conventionally implemented in digital processors to perform integer division using dedicated hardware within the processor.
A conventional method and circuit for performing the so-called restoring division step operation for unsigned division can be understood from FIG. 1 of the accompanying drawings. The restoring division step is executed n times iteratively in order to perform an n-bit unsigned division. The n-bit storage elements R, Q and D are used to hold intermediate results after each iteration. Before the first iteration, R must be initialised to zero while Q and D must be loaded with the dividend and the divisor respectively. It is known, as indicated in FIG. 1, to use a wide datapath Arithmetic Logic Unit (ALU) so that the pair R and Q can be treated as one operand. This use of a wide datapath ALU is common in digital signal processors because it can support extended precision arithmetic. A single extended width accumulator 10 can be used to hold both operands R and Q. This combined operand can conveniently be referred to as (R, Q).
As will be understood from FIG. 1, each time the division step instruction is executed, D is first shifted left by (nxe2x88x921) bits, zero appended and then subtracted from the R, Q pair. If the difference (T) is negative, the value of (R, Q) is shifted left by one and zero appended (block 12) before being loaded back to the registers R and Q. Otherwise, T is left-shifted by 1 and one appended (block 14) to produce the value (2T+1) and then loaded into (R, Q). After n iterations, the quotient and the remainder of the division will be found in Q and R respectively.
The arrangement shown in FIG. 1 makes use of existing functional units such as the barrel shifter (block 12) and the ALU. The (R, Q) pair can be a single wide accumulator that supports extended precision. It""s input multiplexer (block 16) is provided to support parallel functional units. The shifter operating on D is needed as a bus alignment function to support increased precision internal datapath. The only extra hardware is the shifting function (block 14) on the ALU output.
A significant disadvantage of the arrangement shown in FIG. 1 is the extra delay incurred by the optional shifting of the ALU output. The shifting function can be implemented so as to produce a shift only for the division step instruction and so as to output the normal ALU output for all other instructions. As a consequence, the shifter delay is added directly to the ALU path. As an alternative, the shifting can be hardwired and produced as an extra input to the accumulator input multiplexer 16. The problem is then the extra delay caused by increased complexity of the multiplexer. As an example, a 2-input multiplexer is often much faster than a 3-input multiplexer of the same technology. With either of the known arrangements, extra delay is added to the ALU path. Since, particularly in a well balanced high performance processor, the ALU path is often the critical path within the processor, overall processor performance can be compromised.
According to a first aspect of the present invention there is provided a digital processor capable of performing mathematical n-bit division using n iterative steps, comprising: three storage elements one of which, the dividend element, is loaded with the dividend at the commencement of division; an arithmetic unit, a unit for left shifting and zero appending output from the dividend element which is connected to an output of the dividend element and which is connected to provide an input to the arithmetic unit; wherein the arithmetic unit is connected to supply an output to the two other storage elements an output from which is connected to be fed back to another input of the arithmetic unit, characterised by a left shift and append unit connected in the feedback from the two storage units to the arithmetic unit.
According to a second aspect of the present invention, in a digital processor for performing mathematical division using and having an arithmetic unit, a quotient storage element and a remainder storage element with feedback from the two storage elements to the arithmetic unit, there is provided a method of mitigating performance degradation characterised by the step of connecting a left shift and append unit in the feedback from the two storage elements to the arithmetic unit.