1. Field of the Invention
The present invention relates to an apparatus and method for performing operations implemented by iterative execution of a recurrence equation.
2. Description of the Background
There are a number of operations which can be implemented by iterative execution of a recurrence equation. In each iteration, execution of the recurrence equation causes a predetermined number of bits of the result of the operation to be generated, along with the generation of a residual. The residual generated in a previous iteration is used as an input for the current iteration. Hence, after each iteration, the result is updated to take account of the new result bits generated, and a new residual is output.
Typically, the recurrence equation is such that, as the number of iterations increases, the absolute value of the residual decreases. Accordingly, continued iterative execution of the recurrence equation will eventually reduce the residual to zero. Depending on the implementation, it is often considered unnecessary to continue iterating until the residual reaches zero. Instead, it is sufficient either to perform a predetermined number of iterations that would typically ensure that the residual is less than or equal to some predetermined value, or to continue iterating until it is actually determined that the residual is less than or equal to a predetermined value.
It will be appreciated that there are a variety of operations which can be implemented by iterative execution of a recurrence equation. However, two common examples are division operations and square root operations. U.S. Pat. No. 4,939,686 describes a Radix 4 shared division and square root circuit.
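For reference, the textbook form of these recurrences is sketched below in notation of our own choosing, which may differ from the scaling used in any particular circuit: r is the radix, x the dividend, d > 0 the divisor, w[j] the residual after j iterations (scaled by r^j), q_j and s_j the quotient and root digits drawn from {-a, ..., a}, and Q[j] and S[j] the results accumulated so far.

```latex
\begin{aligned}
w[0] &= x, \qquad w[j+1] = r\,w[j] - q_{j+1}\,d,\\
Q[j] &= \sum_{i=1}^{j} q_i\,r^{-i}, \qquad w[j] = r^{j}\bigl(x - Q[j]\,d\bigr),\\
\Bigl|\frac{x}{d} - Q[j]\Bigr| &= \frac{|w[j]|}{d}\,r^{-j} \le \rho\,r^{-j}
  \quad\text{whenever } |w[j]| \le \rho\,d,\ \ \rho = \frac{a}{r-1},\\[4pt]
w[j+1] &= r\,w[j] - 2\,S[j]\,s_{j+1} - s_{j+1}^{2}\,r^{-(j+1)},
  \qquad w[j] = r^{j}\bigl(x - S[j]^{2}\bigr) \quad\text{(square root)}.
\end{aligned}
```

With r = 4 and a = 2 (digits -2 to +2), ρ = 2/3: each iteration contributes two quotient bits, and the unscaled remainder x - Q[j]d = r^{-j} w[j] shrinks by a factor of four per iteration, which is why a predetermined iteration count can guarantee a given accuracy.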
When performing such operations within a data processing apparatus, it is typical to provide a primary iterative cycle containing the necessary logic to iteratively execute the recurrence equation. Considering the example of pipelined processors, this primary iterative cycle would typically be located within a particular pipelined stage of a particular pipeline. For example, in a pipelined processor containing multiple pipelines, a particular pipeline may be provided for performing operations such as division operations and square root operations, with that pipeline including a primary iterative cycle at a particular pipelined stage.
As mentioned earlier, the residual generated in a particular iteration is used as the input for the next iteration. Typically, the residual is kept in a redundant form (for example a carry-save format) to reduce the time required to compute it in each iteration. However, in at least one of the processing steps performed in each iteration, a certain number of bits of the residual are used to select the operation to be performed in that iteration. These bits may need to be known in non-redundant form, in which case some logic, typically a carry-propagate adder structure, must be included to convert those bits from the redundant form into the non-redundant form. This conversion has been found to lie in the critical path of the iteration cycle, and hence limits the execution speed of the data processing apparatus.
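The cost of working from only some bits of a redundant residual can be seen in a few lines of Python. This is a hypothetical model rather than the circuit itself: Python integers stand in for the save and carry vectors, and the function names are illustrative. It compares a narrow add of only the upper bits of each vector, which is what a small carry-propagate adder sees, against the true upper bits of the full sum.

```python
def upper_bits_estimate(save_vec: int, carry_vec: int, k: int) -> int:
    """Narrow carry-propagate add: sum only the bits at or above position k.

    The low k bits of each vector are dropped before the addition, as they
    would be if only the top bits were fed to a small adder.
    """
    return (save_vec >> k) + (carry_vec >> k)


def upper_bits_exact(save_vec: int, carry_vec: int, k: int) -> int:
    """Full carry-propagate add, then keep the bits at or above position k."""
    return (save_vec + carry_vec) >> k
```

Because the discarded low parts of the two vectors sum to less than 2^(k+1), the narrow estimate is never too large and falls short by at most one unit in its last place. For example, with vectors 0b1011 (11) and 0b0110 (6) and k = 2, the estimate is 2 + 1 = 3, while the true upper bits of 17 give 4. Selection logic driven by such an estimate would need to tolerate this error.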
This particular problem can be illustrated by way of example with reference to FIG. 1, which shows a typical primary iterative cycle that may be used when performing a division operation. A division operation divides a dividend by a divisor. The dividend is routed to register 130, whilst the divisor is routed to register 110. In the first iteration, register 120 will be empty. Registers 120 and 130 are each arranged to store n+m bits.
The most significant n bits of registers 120 and 130 are passed to next quotient digit select logic 140, whilst all bits of registers 120 and 130 are passed to carry-save adders 160. The logic provided within the next quotient digit select logic 140 is illustrated schematically in FIG. 2. As can be seen, an n-bit carry-propagate adder 200 receives the n bits from each of registers 120 and 130 and produces an n-bit output in non-redundant form. As mentioned above, in the first iteration register 120 will be empty, whilst register 130 will contain the dividend, which is already in non-redundant form; accordingly, adder 200 outputs the upper n bits of the dividend. The output of adder 200 is passed to a next quotient digit lookup table 210, which determines, based on the divisor stored in register 110 and the n bits received from adder 200, a next quotient digit that is output as a control signal to multiplexer 150.
The next quotient digit specifies a multiplication factor to be applied to the divisor stored in register 110 in order to generate an update vector for outputting over path 155 to the carry-save adders 160. In the example illustrated in FIG. 1, the next quotient digit can have five possible values, namely −2, −1, 0, +1 and +2, and the multiplexer 150 is arranged to receive as inputs the values −2D, −1D, 0, +1D and +2D (where D denotes the divisor as stored in register 110). Accordingly, the next quotient digit output from logic 140 is used by the multiplexer 150 as a select signal to select the appropriate update vector to output over path 155 to the carry-save adders 160.
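The selection and multiplexing steps can be modelled as below. This is a sketch under stated assumptions: the real lookup table 210 works from n truncated bits, whereas the hypothetical `select_digit` applies an exact nearest-digit rule to the full residual; and `update_vector` assumes that digit q selects the multiple −q·D (that is, q·D is subtracted), a sign convention the text does not spell out.

```python
from fractions import Fraction

DIGIT_SET = (-2, -1, 0, 1, 2)


def select_digit(w: int, d: int) -> int:
    """Hypothetical stand-in for lookup table 210.

    Picks the digit in DIGIT_SET nearest to 4*w/d, saturating at +/-2.
    A real table uses only a few truncated bits of w and d.
    """
    return max(-2, min(2, round(Fraction(4 * w, d))))


def update_vector(q: int, d: int) -> int:
    """Model of multiplexer 150, assuming digit q selects the multiple -q*D."""
    candidates = {digit: -digit * d for digit in DIGIT_SET}  # precomputed multiples
    return candidates[q]


def next_residual(w: int, d: int) -> int:
    """One radix-4 step: w[j+1] = 4*w[j] - q*d."""
    return 4 * w + update_vector(select_digit(w, d), d)
```

Provided the residual starts within ±2D/3, this rule keeps it there: when 4w/D lies within ±5/2 the rounding error is at most 1/2, and beyond that the digit saturates at ±2, leaving a remainder of at most 2/3 of D.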
The carry-save adders 160 are arranged to generate a residual, which in the context of division operations will hereafter be referred to as a partial remainder, from the contents of registers 120 and 130 and the update vector received over path 155. The partial remainder, generated in redundant format, is then routed back to registers 120 and 130, with register 130 storing the carry bits of the partial remainder and register 120 storing the save bits. The partial remainder output by the carry-save adders 160 is also stored in redundant format in register 170.
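A carry-save adder avoids carry propagation by reducing three operands to two. The sketch below models that 3:2 reduction for a hypothetical 16-bit register width; in the arrangement of FIG. 1 the three inputs would be the save and carry vectors (scaled by the radix, which we assume is done by wiring) and the update vector from path 155.

```python
WIDTH = 16                  # hypothetical register width (n + m bits)
MASK = (1 << WIDTH) - 1


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save addition of three WIDTH-bit vectors.

    Each bit position is an independent full adder, so no carry ripples
    across the word. Returns (save, carry) such that
    save + carry == a + b + c modulo 2**WIDTH.
    """
    save = (a ^ b ^ c) & MASK                             # per-bit sum
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK   # per-bit majority, moved up one place
    return save, carry
```

Each output bit depends only on the input bits at its own position (and, for the carry, the position below), so the delay is that of a single full adder whatever the width. The price is that the exact value of the residual is never formed, which is what forces the conversion step in the next quotient digit select logic.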
As can be seen from FIG. 1, the next quotient digit output from logic 140 is not only input to the multiplexer 150, but is also used to update the quotient value stored within the register 100, with that updated quotient value then being output to the register 190.
Accordingly, it can be seen that during each iteration, a next quotient digit is generated based on the most significant n bits of the partial remainder stored in the registers 120, 130, an update vector is then generated from that next quotient digit, and a new partial remainder is generated based on the previous partial remainder as stored in the registers 120, 130 and the update vector. In addition, the quotient is updated based on the next quotient digit.
At some point, for example after a predetermined number of iterations have been performed, the division operation will be deemed to be complete, at which point the result of the division operation will be given by the updated quotient stored within the register 190, and any final partial remainder stored within the register 170. Since the partial remainder stored within the register 170 will be in redundant format, carry-propagate adder 180 is provided for converting that final partial remainder into non-redundant form.
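Putting the pieces together, the loop below is a minimal end-to-end sketch of the iteration described above, under several assumptions: integer operands with d > 0 and |x| ≤ 2d/3, a hypothetical 64-bit register width, an exact nearest-digit rule in place of the truncated lookup table, and a full conversion of the residual before selection (so it does not model the n-bit approximation). The quotient is accumulated as signed radix-4 digits, and the final redundant remainder is resolved by a single carry-propagate addition, playing the role of adder 180.

```python
from fractions import Fraction

WIDTH = 64                       # hypothetical width of registers 120/130
MASK = (1 << WIDTH) - 1


def to_signed(v: int) -> int:
    """Read a WIDTH-bit pattern as a two's-complement integer."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 reduction; save + carry == a + b + c modulo 2**WIDTH."""
    save = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return save, carry


def select_digit(w: int, d: int) -> int:
    """Exact nearest-digit rule; a stand-in for the truncated lookup table."""
    return max(-2, min(2, round(Fraction(4 * w, d))))


def srt4_divide(x: int, d: int, iterations: int) -> tuple[int, int]:
    """Radix-4 division keeping the partial remainder in carry-save form.

    Requires d > 0 and |x| <= 2*d/3. Returns (quotient, remainder) with
    x * 4**iterations == quotient * d + remainder and |remainder| <= 2*d/3.
    """
    carry_vec = x & MASK   # register 130 initially holds the dividend
    save_vec = 0           # register 120 starts empty
    quotient = 0           # register 100
    for _ in range(iterations):
        # Simplification: resolve the whole residual, not just its top n bits.
        w = to_signed(save_vec + carry_vec)
        q = select_digit(w, d)
        update = (-q * d) & MASK                   # multiplexer 150 output
        save_vec, carry_vec = carry_save_add(
            (save_vec << 2) & MASK, (carry_vec << 2) & MASK, update)
        quotient = 4 * quotient + q                # append the signed digit
    remainder = to_signed(save_vec + carry_vec)    # carry-propagate adder 180
    return quotient, remainder
```

For instance, `srt4_divide(1, 3, 6)` returns `(1365, 1)`: every selected digit is 1, and 1 · 4^6 = 1365 · 3 + 1.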
With the logic arranged as shown in FIG. 1, the following critical path is observed:
1. The upper bits of the partial remainder carry and save vectors are summed to a non-redundant form in carry-propagate adder 200;
2. A determination is made within the next quotient digit lookup table 210 of the next quotient digit based on the non-redundant upper partial remainder bits;
3. Selection is made via the multiplexer 150 of the update vector determined from the next quotient digit;
4. The update vector and the partial remainder carry and save vectors are summed in a redundant form within the carry-save adders 160;
5. The new partial remainder carry and save vectors are written to their respective registers 130, 120.
In parallel with step 4 above, the quotient is updated with the determined next quotient digit and written to the quotient register 100.
A timing diagram of this critical path is shown in FIG. 9, illustrating the sequence of the above five steps preceded by an initial register read step.
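The composition of the critical path can be expressed as a small timing model. All delay values below are hypothetical placeholders, not figures taken from FIG. 9; the point is only that the iteration time is the sum of the serial steps, and that the quotient update, running in parallel with step 4, does not lengthen it unless it takes longer than steps 4 and 5 together.

```python
# Hypothetical delays in arbitrary units, for illustration only;
# the source does not give figures for any of these steps.
CRITICAL_PATH = [
    ("read registers 120/130", 1.0),
    ("step 1: carry-propagate adder 200", 3.0),
    ("step 2: lookup table 210", 2.0),
    ("step 3: multiplexer 150", 1.0),
    ("step 4: carry-save adders 160", 1.5),
    ("step 5: write registers 120/130", 1.0),
]
QUOTIENT_UPDATE = 1.5  # update the quotient and write register 100


def iteration_time(path, quotient_update, step4_index=4):
    """Return the iteration time as the slower of the two parallel paths.

    The residual path runs through every entry of `path` in series. The
    quotient path starts together with step 4 (entry `step4_index`) and
    takes `quotient_update` units.
    """
    residual_path = sum(delay for _, delay in path)
    step4_start = sum(delay for _, delay in path[:step4_index])
    return max(residual_path, step4_start + quotient_update)
```

With these placeholder numbers one iteration takes 9.5 units, of which adder 200 accounts for 3; removing that step alone would bring the iteration down to 6.5 units.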
It would be desirable to reduce the time taken for the above critical path, thereby facilitating an increase in processing speed of the data processing apparatus.