1. Field of the Invention
The present invention relates in general to an algorithm for division of a long polynomial expression in a finite field and a hardware architecture for the same, and more particularly to a method and apparatus for dividing a long polynomial expression in a finite field, in which a group-based parallel processing operation is performed on the basis of a lookahead technique and a partial division process, so that no inter-symbol multiplication is required in the finite field, resulting in the production of a relatively large throughput per unit time as compared with the conventional one.
2. Description of the Prior Art
Generally, long polynomial expression division methods are essential to a variety of applications, such as error correction codes and data coding methods, in various fields of electronics including computers, communications, magneto-optical disks, control systems, etc. A conventional long polynomial expression division method is shown in FIG. 1 herein. As shown in this drawing, the conventional long polynomial expression division method is implemented using a linear feedback shift register which performs a symbol-based serial process.
However, the above-mentioned conventional method has a disadvantage in that the symbol-based serial process cannot be performed at high speed, because its processing time depends entirely on the degree of the dividend polynomial.
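The serial behavior described above can be illustrated with a minimal Python sketch of a linear feedback shift register divider over the binary field. The function name `lfsr_divide` and the bit-list representation are assumptions made for illustration and do not come from FIG. 1; the point is only that one dividend symbol is consumed per clock, so the number of cycles grows with the degree of the dividend.

```python
def lfsr_divide(dividend_bits, divisor_bits):
    """Serial GF(2) polynomial division, one dividend symbol per clock.

    Both arguments list coefficients highest order first; the divisor
    is assumed to be monic (leading coefficient 1) with degree >= 1.
    Returns (quotient_bits, remainder_bits).
    """
    k = len(divisor_bits) - 1      # divisor degree = number of register stages
    taps = divisor_bits[1:]        # feedback taps (coefficients below the leading 1)
    reg = [0] * k                  # reg[0] is the highest-order stage
    feedback_history = []
    for bit in dividend_bits:      # one loop iteration models one clock cycle
        feedback = reg[0]          # global feedback signal
        feedback_history.append(feedback)
        shifted = reg[1:] + [bit]  # shift by one symbol, bring in the next one
        reg = [s ^ (feedback & t) for s, t in zip(shifted, taps)]
    # The first k feedback values are always zero while the register fills.
    return feedback_history[k:], reg


# x^6 divided by x^3 + x + 1 over GF(2)
print(lfsr_divide([1, 0, 0, 0, 0, 0, 0], [1, 0, 1, 1]))
# -> ([1, 0, 1, 1], [1, 0, 1])
```

In this sketch a dividend with n coefficients always takes n iterations, which mirrors the throughput limitation attributed to the serial architecture.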
As the demand for high speed in high-capacity video compression and for low power in portable information equipment increases, the conventional hardware architecture employing the linear feedback shift register has shown several limitations, as follows.
Firstly, the throughput is limited by the degree of the dividend polynomial, which makes high-speed processing impossible. Secondly, the presence of a global feedback signal imposes severe constraints on the switching speed and necessitates the use of a global clock.
Thirdly, the high-speed condition and the low-power consumption condition cannot be satisfied concurrently. Fourthly, the feedback signal limits the degree of parallelism that can be exploited for low-power consumption. Finally, the complete linear feedback shift register, together with the serial buffer registers that provide its inputs and receive its outputs, must be clocked on every clock cycle regardless of whether their contents change.
Therefore, high-speed/low-power applications require a new polynomial expression division algorithm and an associated architecture which do not suffer from the above-mentioned limitations.
Accordingly, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a method and apparatus for dividing a long polynomial expression in a finite field, in which a group-based parallel processing operation is performed on the basis of a lookahead technique and a partial division process, so that no inter-symbol multiplication is required in the finite field, resulting in a relatively large throughput per unit time as compared with the conventional approach.
In accordance with one aspect of the present invention, there is provided a method for dividing a long polynomial expression in a finite field, comprising the first step of grouping elements in a dividend polynomial into a plurality of groups; the second step of combining the groups according to a superposition of the finite field; and the third step of performing a group-based parallel processing operation with respect to the combined results on the basis of a lookahead technique and a partial-division process to sequentially remove the groups up to the last one for inter-symbol division in the finite field.
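The three steps of the method can be sketched in Python for the binary field as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: polynomials are packed into integers, the names `gf2_divmod` and `grouped_divmod` and the group size `m` are hypothetical, and the partial division of each group is done here by an ordinary software routine where hardware would use a fixed combinational network.

```python
def gf2_divmod(a, g):
    """Reference schoolbook division of GF(2) polynomials packed into ints."""
    q = 0
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        shift = a.bit_length() - 1 - dg
        q ^= 1 << shift            # record one quotient term
        a ^= g << shift            # subtract (XOR) the shifted divisor
    return q, a


def grouped_divmod(symbols, g, m):
    """Divide a dividend (bits, highest order first) by g, m symbols per step."""
    # Step 1: group the dividend symbols, zero-padding the high-order end.
    pad = (-len(symbols)) % m
    bits = [0] * pad + list(symbols)
    groups = [bits[i:i + m] for i in range(0, len(bits), m)]

    quotient, rem = 0, 0
    for grp in groups:
        word = int("".join(map(str, grp)), 2)
        # Step 2: superposition. Addition in GF(2) is XOR, so the previous
        # partial remainder (shifted by one group) is added to the current group.
        combined = (rem << m) ^ word
        # Step 3: partial division of the combined group.
        partial_q, rem = gf2_divmod(combined, g)
        quotient = (quotient << m) ^ partial_q
    return quotient, rem


# x^6 divided by x^3 + x + 1, three symbols per step
print(grouped_divmod([1, 0, 0, 0, 0, 0, 0], 0b1011, 3))
# -> (11, 5), i.e. quotient x^3 + x + 1 and remainder x^2 + 1
```

Because each iteration removes one whole group, the loop runs roughly n/m times for a dividend of n symbols instead of n times.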
In accordance with another aspect of the present invention, there is provided an apparatus for dividing a long polynomial expression in a finite field, which performs a group-based parallel processing operation on the basis of a lookahead technique and a partial-division process to sequentially remove groups in a dividend polynomial up to the last one for inter-symbol division in the finite field, comprising first group storage means for storing the first one of the groups, the first group storage means including k+1 first symbol registers, each of the first symbol registers including a D flip-flop on the basis of the fact that one symbol is composed of one bit in a binary field; X intermediate group storage means, each of the intermediate group storage means including k first symbol adders for adding partial-remainders from the previous and current groups, k second symbol registers for storing outputs of the first symbol adders, respectively, and a third symbol register for storing a lowest-order symbol from the current group; remainder generation means for adding partial-remainders from the previous and last groups to generate the overall remainder, the remainder generation means including k second symbol adders for adding the partial-remainders from the previous and current groups, and k fourth symbol registers for storing outputs of the second symbol adders, respectively; X+1 partial-quotient generation means connected respectively to the first group storage means and the X intermediate group storage means for generating partial-quotients in response to output data from the first group storage means and intermediate group storage means; and X+1 partial-remainder generation means connected respectively to the first group storage means and the X intermediate group storage means for generating partial-remainders in response to input data to the X+1 partial-quotient generation means, transferring the generated partial-remainders respectively to the intermediate group storage means which are arrayed on the same lines as those thereof, and transferring the lowest-order one of the partial-remainders to the remainder generation means.
In accordance with yet another aspect of the present invention, there is provided an apparatus for dividing a long polynomial expression in a finite field, which performs a group-based parallel processing operation on the basis of a lookahead technique and a partial-division process to sequentially remove groups in a dividend polynomial up to the last one for inter-symbol division in the finite field, comprising intermediate group storage means including k symbol adders for adding partial-remainders from the previous and current groups, k first symbol registers for storing outputs of the symbol adders, respectively, and a second symbol register for storing a lowest-order symbol from the current group; partial-quotient generation means connected to the intermediate group storage means for generating partial-quotients in response to output data from the intermediate group storage means; and partial-remainder generation means connected to the intermediate group storage means for generating partial-remainders in response to input data to the partial-quotient generation means and feeding the generated partial-remainders back to the intermediate group storage means.
First, the technical concept of the present invention will be described briefly. The present invention proposes a division architecture capable of performing a group-based parallel processing operation on the basis of a technique called "lookahead of partial-remainder (LAPR)". Because the group-based parallel processing operation is performed on the basis of a lookahead technique and a partial division process, no inter-symbol multiplication is required in the finite field, and the throughput per unit time is greatly increased as compared with the conventional approach. As a result, the clock frequency being used can be lowered, which enables a trade-off between high operation speed and low power consumption. The use of the lowered clock frequency also allows the supply voltage to be reduced, so that a further amount of power is saved.
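One common way to obtain a multiplication-free lookahead in the binary field is to precompute the remainder contributed by each symbol position of a group, so that a partial remainder becomes a plain XOR of fixed values. The Python sketch below illustrates that idea only; whether LAPR is realized this way is an assumption, and the names `lookahead_table` and `partial_remainder` are hypothetical.

```python
def lookahead_table(g, width):
    """Return [x**j mod g for j in range(width)] with polynomials packed in ints.

    Assumes g has degree >= 1 over GF(2).
    """
    k = g.bit_length() - 1
    table = []
    r = 1                      # x**0
    for _ in range(width):
        table.append(r)
        r <<= 1                # multiply by x
        if (r >> k) & 1:       # degree reached k: reduce once by g
            r ^= g
    return table


def partial_remainder(word, table):
    """Remainder of `word` modulo g using only XORs of precomputed entries."""
    rem = 0
    j = 0
    while word:
        if word & 1:
            rem ^= table[j]    # superpose the contribution of symbol position j
        word >>= 1
        j += 1
    return rem


table = lookahead_table(0b1011, 7)          # divisor x^3 + x + 1
print(partial_remainder(0b1000000, table))  # x^6 mod g -> 5 (x^2 + 1)
```

In hardware terms each table entry corresponds to a fixed set of XOR connections, so all symbols of a group can be reduced in the same clock cycle without any symbol multiplier.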
The ability to reduce power consumption is based on the fact that there can be a trade-off between silicon area and power consumption. Because a trade-off mechanism such as a parallel architecture, a pipelined architecture, etc. can be provided, a low-frequency clock can be used and the associated low-voltage operation can be performed, while the throughput is maintained at a desired level.
However, although this approach can obtain the minimum power at a given performance level, it has difficulty obtaining high performance. As a result, it is limited in attaining both of the two objects, high performance and low power consumption.
Therefore, given that low power consumption and high operation speed can be obtained at the same time when an algorithm is specifically tuned to a given function to increase the operation speed, the present invention provides a new long polynomial expression division method and apparatus, which will be described later in detail with reference to the accompanying drawings.
In other words, the above-mentioned low power consumption strategies are based on the fact that the power consumption of a CMOS digital system is proportional to the square of the supply voltage, to the clock frequency being used, and to the total capacitance. Namely, a point to be considered in the algorithm and architecture level design for the reduction of power is to increase the throughput per unit time as far as possible and to lower the clock frequency to a level corresponding to the increased throughput. Lowering the clock frequency relaxes the timing constraints, which provides room to reduce the supply voltage and thereby yields an additional power saving.
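The proportionality stated above is the usual CMOS dynamic power relation, P = C · V² · f. The short Python sketch below applies it to a purely hypothetical design point to show how a parallel architecture can trade area for power; every number in it is an illustrative assumption and not a measured result.

```python
def dynamic_power(capacitance, supply_voltage, clock_frequency):
    """CMOS dynamic power model: P = C * V^2 * f (any consistent units)."""
    return capacitance * supply_voltage ** 2 * clock_frequency


# Hypothetical serial baseline, in normalized units.
serial = dynamic_power(capacitance=1.0, supply_voltage=3.3, clock_frequency=1.0)

# Hypothetical 4-way parallel design with the same throughput: four times the
# switched capacitance, a quarter of the clock rate, and (assumed) half the
# supply voltage made possible by the relaxed timing.
parallel = dynamic_power(capacitance=4.0, supply_voltage=1.65, clock_frequency=0.25)

ratio = parallel / serial
print(f"parallel / serial power ratio: {ratio:.2f}")
```

Under these assumed figures the capacitance increase and the frequency decrease cancel, and the saving comes entirely from the quadratic dependence on the supply voltage.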