1. Field of the Invention
The present invention relates to electronic circuits, and more specifically to arithmetic circuits for use with the residue number system.
2. Description of Related Art
Power consumption is now becoming a more important consideration in integrated circuit design. This has compelled circuit designers to consider reducing power consumption through changes in many different levels of the design process, such as the system, technology, algorithm, physical, and circuit levels. For example, system level approaches include power supply voltage scaling, clock gating, and subsystem sleep (or power down) modes. Technology level techniques include using dynamic threshold MOSFETs, and algorithm level techniques include using alternate number systems and state encoding. Further, physical level methods include transistor reordering, and circuit level methods include self-timed asynchronous approaches and glitch reduction. The ultra-low power circuits of the future will have to employ several of these approaches because none alone can achieve the power reduction goals for the next decade.
While all of the techniques described above advantageously reduce power consumption, many of them have the deleterious side effect of reducing the speed of the circuit. For example, supply voltage scaling lengthens the system clock period if other factors such as technology and drive strength are kept the same. For this reason, designers now consider the delay-power (DP) product of a circuit as the important factor in low power circuit design. One system level design approach that is currently being investigated because of its potential for significant DP product reduction is the use of a One-Hot Residue Number System (OHRNS). For example, the OHRNS is being considered for use in the adaptive FIR (finite impulse response) filter and Viterbi detector of the Project Orion read channel.
The Residue Number System (RNS) is an integer number system in which the basic operations of addition, subtraction, and multiplication can be performed quickly because there are no carries, borrows, or partial products. This allows the basic operations to be performed in a single combinational step, digit-on-digit, using simple arithmetic units operating in parallel. However, other operations such as magnitude comparison, scaling (the RNS equivalent of right shifting), base extension (the RNS equivalent of increasing the bit width), and division are slower and more complicated to implement. Thus, RNS is most widely used in applications in which the basic operations predominate such as digital signal processing (DSP).
The RNS representation of an integer X is a sequence of digits, with each digit being the residue of X modulo a specially chosen integer modulus. In other words, X is represented as the vector of its residues modulo a fixed set of integer moduli. In order to make the RNS representation of each integer unique for all nonnegative values less than the product M of the moduli, the moduli are chosen to be pairwise relatively prime (i.e., the smallest single number into which all divide evenly is equal to the product of the moduli). Letting mi denote the ith modulus, the RNS representation of X is given by X⇄(x1, x2, . . . , xn), where xi=X modulo mi and is known as the ith residue digit of the RNS representation of X. Table 1 shows the representation of the integers 0 to 2430 in an RNS in which m1=11, m2=13, and m3=17 (“an 11, 13, 17 RNS representation”).
TABLE 1

Integer X    RNS digit x11    RNS digit x13    RNS digit x17
2430         10               12               16
2429         9                11               15
. . .
19           8                6                2
18           7                5                1
17           6                4                0
16           5                3                16
15           4                2                15
14           3                1                14
13           2                0                13
12           1                12               12
11           0                11               11
10           10               10               10
9            9                9                9
8            8                8                8
7            7                7                7
6            6                6                6
5            5                5                5
4            4                4                4
3            3                3                3
2            2                2                2
1            1                1                1
0            0                0                0
As an example, for the natural number 19, the x11 digit is 19 mod(11)=8 (i.e., 19÷11=1 remainder 8), the x13 digit is 19 mod(13)=6, and the x17 digit is 19 mod(17)=2. Each RNS digit is determined without reference to any other RNS digit, and no RNS representation repeats in the range from 0 to 2430. Negative integers can be represented by limiting the represented range to an equal (or substantially equal) number of positive and negative numbers. The representation of the range from −1215 to 1215 in the 11, 13, 17 RNS representation is shown in Table 2. No separate sign is associated with the RNS representation, and the sign of the represented integer cannot be determined from any less than all of its RNS digits.
TABLE 2

Integer X    RNS digit x11    RNS digit x13    RNS digit x17
1215         5                6                8
1214         4                5                7
. . .
7            7                7                7
6            6                6                6
5            5                5                5
4            4                4                4
3            3                3                3
2            2                2                2
1            1                1                1
0            0                0                0
−1           10               12               16
−2           9                11               15
−3           8                10               14
−4           7                9                13
−5           6                8                12
−6           5                7                11
−7           4                6                10
. . .
−1214        7                8                10
−1215        6                7                9
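The digit computation behind Tables 1 and 2 can be sketched in a few lines of Python. This is only an illustrative software sketch, not a description of any circuit; the names MODULI and to_rns are hypothetical, and the handling of negative integers relies on Python's % operator returning a nonnegative result for a positive modulus.

```python
from math import prod

MODULI = (11, 13, 17)  # the moduli of the 11, 13, 17 RNS representation


def to_rns(x, moduli=MODULI):
    """Return the residue digits of integer x, one digit per modulus."""
    # For a positive modulus, Python's % is always nonnegative, so a
    # negative x receives the same digits as x + M (M = product of moduli).
    return tuple(x % m for m in moduli)


M = prod(MODULI)       # 2431
print(to_rns(19))      # (8, 6, 2)
print(to_rns(2430))    # (10, 12, 16)
print(to_rns(-1))      # (10, 12, 16), the same digits as 2430
print(to_rns(-1215))   # (6, 7, 9)

# Every integer in 0..M-1 has a distinct digit vector.
assert len({to_rns(x) for x in range(M)}) == M
```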
In the RNS, the basic operations of addition, subtraction, and multiplication are performed in digit-parallel fashion, modulo mi. Thus, if operands X and Y have RNS representations of X⇄(x1, x2, . . . , xn) and Y⇄(y1, y2, . . . , yn), the result Z has an RNS representation of Z⇄(x1°y1, x2°y2, . . . , xn°yn), where “xi°yi” represents any of the basic operations performed on the two RNS digits modulo mi. More specifically, the corresponding RNS digits of the two numbers are added, subtracted, or multiplied, and then the proper modulo operation is performed on each to produce the RNS digits of the result.
For example, in the 11, 13, 17 RNS representation of Table 1, 4+15 gives (4, 4, 4)+(4, 2, 15) or ((4+4) mod(11), (4+2) mod(13), (4+15) mod(17)), which equals (8, 6, 2) or 19. Similarly, 19−15 gives ((8−4) mod(11), (6−2) mod(13), (2−15) mod(17)), which equals (4, 4, 4) or 4, and 6×3 gives ((6×3) mod(11), (6×3) mod(13), (6×3) mod(17)), which equals (7, 5, 1) or 18. Because all individual operations are performed on each RNS digit independently and without reference to any other RNS digit, the operations can be performed completely in parallel. Thus, each of the basic operations can be performed quickly and efficiently, especially when all of the moduli are relatively small integers.
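The three worked examples above can be reproduced with a short Python sketch. The helper names to_rns and rns_op are hypothetical and chosen only for illustration; the loop over digits stands in for what would be independent, parallel arithmetic units in hardware.

```python
import operator

MODULI = (11, 13, 17)


def to_rns(x):
    """Residue digits of x in the 11, 13, 17 RNS representation."""
    return tuple(x % m for m in MODULI)


def rns_op(op, xs, ys):
    """Apply a basic operation digit by digit, each modulo its own modulus."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, MODULI))


print(rns_op(operator.add, to_rns(4), to_rns(15)))   # (8, 6, 2), i.e. 19
print(rns_op(operator.sub, to_rns(19), to_rns(15)))  # (4, 4, 4), i.e. 4
print(rns_op(operator.mul, to_rns(6), to_rns(3)))    # (7, 5, 1), i.e. 18
```

No digit depends on any other digit, which is why no carries or borrows appear anywhere in rns_op.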
In electronic circuit implementations, addition is the fundamental RNS operation and subtraction is performed by adding the additive inverse of the subtrahend. Multiplication is also performed using addition, as will now be explained. Any prime modulus p has at least one primitive root, which is an integer α of order p−1 under multiplication. In other words, the primitive root is an integer α whose successive powers, taken modulo p, are the nonzero integers modulo p (i.e., for any 0<X<p, X=α^k modulo p for some 0≤k≤p−2). In such a case, X is said to have an index of k, modulo p.
Given the primitive root, multiplication modulo p can be performed by adding the indices modulo p−1. This is analogous to using logarithms in the binary number system. For example, α=2 is a primitive root modulo 13 because the integers 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, 2^9, 2^10, and 2^11 modulo 13 are equal to 1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, and 7, respectively. Thus, if X=5 (2^9 modulo 13) and Y=7 (2^11 modulo 13), X×Y=35, which is congruent to 9 (2^8 modulo 13). Thus, the index of the product modulo p (8) of two RNS digits can be determined by adding the indices of the two RNS digits (9 and 11), modulo p−1 (i.e., (9+11) mod(12)=8).
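As a rough illustration of index arithmetic, the following Python sketch builds the index table for p=13 and α=2 and multiplies by adding indices modulo p−1. The names ANTI_INDEX, INDEX, and mul_by_index are hypothetical. Because zero has no index, the sketch treats a zero-valued operand as a separate case, which is an assumption of this example.

```python
P = 13      # prime modulus
ALPHA = 2   # primitive root modulo 13

# ANTI_INDEX[k] is ALPHA**k modulo P; INDEX is the inverse mapping.
ANTI_INDEX = [pow(ALPHA, k, P) for k in range(P - 1)]
INDEX = {value: k for k, value in enumerate(ANTI_INDEX)}


def mul_by_index(x, y):
    """Multiply two digits modulo P by adding their indices modulo P - 1."""
    if x == 0 or y == 0:
        return 0  # zero has no index, so it is handled separately
    return ANTI_INDEX[(INDEX[x] + INDEX[y]) % (P - 1)]


print(ANTI_INDEX)           # [1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7]
print(INDEX[5], INDEX[7])   # 9 11
print(mul_by_index(5, 7))   # 9, since 35 mod 13 = 9 and (9 + 11) mod 12 = 8
```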
Scaling is the RNS operation that corresponds to radix division in the binary number system (i.e., right-shifting with truncation or integer division). In the RNS, the radices are the moduli, and the scaling operation can be performed on any single modulus. Further, scaling can be performed on a combination of moduli, which corresponds to shifting by more than one bit position in the binary number system, by repeating the single modulus scaling operation. Scaling is performed using properties of division under certain limitations, as explained below.
Division (Q=N/D) can be performed with the same speed and simplicity as the three basic operations if it is known beforehand that the quotient Q is an integer and the divisor D has no zero-valued RNS digits. In such a case, the quotient Q can be determined by multiplying the dividend N with the multiplicative inverse D^(−1) of the divisor (Q=N×D^(−1)). Every nonzero integer modulo a prime p has a multiplicative inverse. In particular, the index of the multiplicative inverse is the additive inverse of the integer's index, taken modulo p−1. If X has no zero-valued RNS digit, the multiplicative inverse of X, taken modulo M, is the vector of its inverted digits. Thus, the multiplicative inversion operation can also be independently performed on each RNS digit in parallel to quickly and efficiently obtain the result.
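A minimal Python sketch of exact division by inverse multiplication follows. The helper names are hypothetical, the operands 84 and 7 are chosen here only as an example, and the three-argument pow with a negative exponent assumes Python 3.8 or later.

```python
MODULI = (11, 13, 17)


def to_rns(x):
    return tuple(x % m for m in MODULI)


def rns_divide(n_digits, d_digits):
    """Exact division: multiply each digit of N by the inverse of D's digit.

    Only valid when the true quotient is an integer and D has no zero digit.
    """
    if 0 in d_digits:
        raise ValueError("divisor has a zero-valued RNS digit")
    return tuple((n * pow(d, -1, m)) % m
                 for n, d, m in zip(n_digits, d_digits, MODULI))


print(rns_divide(to_rns(84), to_rns(7)))  # (1, 12, 12), i.e. 12

# Index view for p = 13, alpha = 2: the digit 5 has index 9, so its inverse
# has index (-9) mod 12 = 3, and 2**3 mod 13 = 8 (indeed 5 * 8 = 40 = 1 mod 13).
inverse_of_5 = pow(2, (-9) % 12, 13)
print(inverse_of_5)                       # 8
```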
Using these RNS properties, the scaling operation is performed by converting the dividend to a multiple of the radix and then performing radix division through inverse multiplication. The conversion is first performed by subtracting the residue of the modulus used for scaling, and then the division is performed by multiplying the converted dividend with the multiplicative inverse of the modulus. In other words, the ith RNS digit of X (xi) is subtracted from X in order to round X to the next smaller multiple of mi, and then the result is multiplied with mi^(−1) in each modulus except the ith to perform radix division. Thus, X scaled by modulus mi is given by the following equation.

$$\lfloor X/m_i \rfloor = \left(m_i^{-1}(x_1 - x_i),\ m_i^{-1}(x_2 - x_i),\ \ldots,\ m_i^{-1}(x_n - x_i)\right) \qquad (1)$$
For example, in the 11, 13, 17 RNS representation, 47 scaled by modulus 11, ⌊47/11⌋, gives ((11^(−1)(3−3)) mod(11), (11^(−1)(8−3)) mod(13), (11^(−1)(13−3)) mod(17)), which equals (*, 4, 4) or 4. The division operations are guaranteed to be correct because the prior subtractions ensure that the quotient is integral. However, the multiplication is not performed in the ith modulus because mi^(−1) does not exist modulo mi. Therefore, the result of the scaling operation exists only in moduli other than the ith modulus, and the ith RNS digit is truncated as expected. While scaling does require the conversion of RNS digits from one modulus to another, this can be performed in a simple manner as described below. Further, the subtraction and multiplication operations can be independently performed on each RNS digit in parallel. If needed, the truncated RNS digit can later be restored by performing a base extension operation (not described herein).
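The scaling rule of equation (1) can be checked numerically with the sketch below. The function name scale and the use of None to mark the truncated digit are assumptions of this example, and pow with a negative exponent assumes Python 3.8 or later.

```python
MODULI = (11, 13, 17)


def to_rns(x):
    return tuple(x % m for m in MODULI)


def scale(digits, i):
    """Scale by modulus MODULI[i]; the ith digit is truncated (None)."""
    m_i, x_i = MODULI[i], digits[i]
    result = []
    for j, (x_j, m_j) in enumerate(zip(digits, MODULI)):
        if j == i:
            result.append(None)  # the inverse of m_i does not exist modulo m_i
        else:
            # Subtract x_i to reach a multiple of m_i, then multiply by the
            # multiplicative inverse of m_i, all modulo m_j.
            result.append((pow(m_i, -1, m_j) * (x_j - x_i)) % m_j)
    return tuple(result)


print(to_rns(47))            # (3, 8, 13)
print(scale(to_rns(47), 0))  # (None, 4, 4), i.e. floor(47 / 11) = 4
```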
In electronic circuit implementations, the RNS digits can be encoded in various ways. In conventional binary encoding, each RNS digit is converted to a binary number that is represented by the states of one or more lines, each of which is in one of two states to represent a binary digit of “0” or “1”. There is also the “one-hot” encoding scheme in which each possible value of an RNS digit is associated with a separate two-state line. For example, in the 11, 13, 17 RNS representation, 11 lines are used to represent the first RNS digit, 13 lines are used to represent the second RNS digit, and 17 lines are used to represent the third RNS digit. When an RNS digit has a given value, the line associated with that value is high and all of the other lines are low. Thus, only one line of a digit is high (or hot) at any given time.
The use of the one-hot encoding scheme with the RNS produces such compelling advantages in electronic circuit implementations that such a system is identified as the “One-Hot Residue Number System” (OHRNS). While the OHRNS is really the same RNS with the same arithmetic properties, the advantages of using the OHRNS include basic operation implementation using barrel shifters with their superior delay-power products and operand-independent delays, simple and regular layout of arithmetic circuits, and zero-cost implementation through signal transposition of inverse calculation, index calculation, and residue conversion. When any RNS digit changes in value, at most two lines change state. This is the minimal possible activity factor and yields low power dissipation. Because in OHRNS implementations signal activity factors are near minimal and fewer critical path transistors are present, such systems have lower delay-power products. FIG. 1 shows the states of the lines of the RNS digits for representing integer 15 in an 11, 13, 17 OHRNS implementation.
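As a small software model of the encoding, the sketch below builds the one-hot lines for integer 15 in the 11, 13, 17 representation and counts how many lines change state between two digit values. The names one_hot and toggles are hypothetical, and a Python list of 0/1 values stands in for the signal lines.

```python
MODULI = (11, 13, 17)


def one_hot(value, modulus):
    """Return `modulus` lines; only the line numbered `value` is high."""
    return [1 if k == value else 0 for k in range(modulus)]


def toggles(before, after):
    """Count the lines whose state differs between two encodings."""
    return sum(a != b for a, b in zip(before, after))


# Integer 15 has digits (4, 2, 15), so lines 4, 2, and 15 are hot.
encoded = [one_hot(15 % m, m) for m in MODULI]
print([lines.index(1) for lines in encoded])       # [4, 2, 15]
print(toggles(one_hot(4, 11), one_hot(9, 11)))     # 2
```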
With one-hot encoding of the RNS digits, addition can be performed through a cyclic shift (i.e., rotation). In particular, one of the operands is rotated by an amount equal to the value of the other operand. While such a rotation can be implemented using several different types of circuits, barrel shifters allow all possible rotations of the first operand to be computed in parallel. The second operand determines which of the rotations is output from the barrel shifter as the result. A conventional OHRNS modulo mi adder is shown in FIG. 2(a). The adder 10 includes a modulo mi barrel shifter 12 that performs the addition, and a static pipeline register 14 that stores the result for downstream processing. FIG. 2(b) shows the internal structure of the barrel shifter. As shown, NMOS pass transistors 16 are used instead of transmission gates to yield higher speed and lower power dissipation due to smaller input and output capacitive loadings (i.e., because there are half as many NMOS sources/drains per input/output line as when transmission gates are used). Additionally, the use of pass transistors lowers the area of the barrel shifter by at least half.
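Addition by rotation can be modeled in software as follows. This is a behavioral sketch only, not a description of the pass-transistor circuit; the names rotate and one_hot_add are hypothetical, and the list of all rotations merely mimics a barrel shifter computing every rotation in parallel.

```python
def one_hot(value, modulus):
    return [1 if k == value else 0 for k in range(modulus)]


def rotate(lines, amount):
    """Cyclic shift: the signal on line k moves to line (k + amount) mod m."""
    m = len(lines)
    return [lines[(k - amount) % m] for k in range(m)]


def one_hot_add(a, b):
    """Form every rotation of a, then let b's hot line select one of them."""
    rotations = [rotate(a, shift) for shift in range(len(a))]
    return rotations[b.index(1)]


total = one_hot_add(one_hot(8, 11), one_hot(7, 11))
print(total.index(1))  # 4, since (8 + 7) mod 11 = 4
```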
Further, in the OHRNS, subtraction can be performed by adding the additive inverse of the subtrahend, and the additive inverse can be computed by a simple one-to-one mapping using signal transposition. FIG. 3 shows a conventional OHRNS modulo mi subtractor. As shown, the subtractor 20 is identical to the adder 10 of FIG. 2(a) except for the use of signal transposition 22 on the subtrahend input to the barrel shifter 12. The signal transposition 22 computes the additive inverse quickly and simply through a one-to-one mapping, as described below.
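The additive-inverse transposition can be modeled as a fixed reordering of the lines, as in the sketch below. The names negate and one_hot_sub are hypothetical; the reordering itself involves no arithmetic, which is the point of the wire permutation.

```python
def one_hot(value, modulus):
    return [1 if k == value else 0 for k in range(modulus)]


def rotate(lines, amount):
    """Cyclic shift: the signal on line k moves to line (k + amount) mod m."""
    m = len(lines)
    return [lines[(k - amount) % m] for k in range(m)]


def negate(lines):
    """Wire permutation: input line k is routed to output line (-k) mod m."""
    m = len(lines)
    return [lines[(-k) % m] for k in range(m)]


def one_hot_sub(a, b):
    """Subtract by rotating a by the additive inverse of b."""
    return rotate(a, negate(b).index(1))


print(negate(one_hot(15, 17)).index(1))                     # 2
print(one_hot_sub(one_hot(2, 17), one_hot(15, 17)).index(1))  # 4
```

The second line reproduces the modulo 17 digit of the earlier example 19−15, where (2−15) mod 17 = 4.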
Multiplication in the OHRNS can also be performed with barrel shifters by using indices. Indices and their inverse mappings, which are known as anti-indices, are the RNS equivalents of logarithms and antilogarithms, as explained above. The computation of indices and anti-indices in any modulus can be performed quickly and simply through a one-to-one mapping. In particular, such mappings in the OHRNS are implemented by merely permuting the signal lines of the RNS digit. In other words, indices and anti-indices can be computed through signal transpositions or wire permutations that require no active circuitry and introduce little or no delay. An exemplary signal transposition is shown in FIG. 4.
FIG. 5 shows a conventional OHRNS modulo mi multiplier that uses wire transpositions to compute indices and anti-indices. More specifically, the multiplier 30 uses signal transpositions 34, 36, and 38 on the input and output lines to compute the indices and anti-indices, and a barrel shifter 32 to add the indices. A small amount of combinational logic 39 is used to handle the special case in which at least one of the operands is zero-valued. The separate handling of this special case allows the barrel shifter 32 to perform addition modulo (mi−1), rather than modulo mi. As in the adder 10 of FIG. 2(a), a static pipeline register 14 stores the resulting product for downstream processing.
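The data flow of such a multiplier can be modeled behaviorally as below, using p=13 and α=2 as an assumed example modulus and primitive root. All function names are hypothetical, and the sketch does not claim to mirror the exact wiring of FIG. 5; it only shows index transposition, a rotation over p−1 lines, anti-index transposition, and separate zero handling.

```python
P = 13
ALPHA = 2
ANTI_INDEX = [pow(ALPHA, k, P) for k in range(P - 1)]  # index k -> value


def one_hot(value, modulus):
    return [1 if k == value else 0 for k in range(modulus)]


def to_index_lines(lines):
    """Permute the P - 1 nonzero value lines into index order."""
    return [lines[ANTI_INDEX[k]] for k in range(P - 1)]


def from_index_lines(index_lines):
    """Permute index lines back to value lines; the zero line stays low."""
    out = [0] * P
    for k, signal in enumerate(index_lines):
        out[ANTI_INDEX[k]] = signal
    return out


def one_hot_multiply(a, b):
    if a[0] or b[0]:
        return one_hot(0, P)  # special case: at least one operand is zero
    ia, ib = to_index_lines(a), to_index_lines(b)
    shift, n = ib.index(1), P - 1
    summed = [ia[(k - shift) % n] for k in range(n)]  # add indices mod P - 1
    return from_index_lines(summed)


print(one_hot_multiply(one_hot(5, P), one_hot(7, P)).index(1))  # 9
```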
If one of the multiplicands is a constant, the OHRNS multiplier does not have to perform index calculation and addition. Instead, the product can be computed by simply using a single signal transposition that performs a one-to-one mapping of the input lines for the variable multiplicand to the proper output lines. This unique feature of the OHRNS allows constant multiplication to be performed without using any active circuitry, and thus very quickly and with little or no power consumption.
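Constant multiplication as a pure wire permutation can be sketched as follows. The name constant_multiplier is hypothetical, and the sketch assumes the constant is relatively prime to the modulus so that the mapping is one-to-one (for a prime modulus, any nonzero constant qualifies).

```python
from math import gcd


def one_hot(value, modulus):
    return [1 if k == value else 0 for k in range(modulus)]


def constant_multiplier(c, m):
    """Return a function that routes input line k to output line (c*k) mod m."""
    if gcd(c, m) != 1:
        raise ValueError("mapping is one-to-one only if gcd(c, m) == 1")
    destination = [(c * k) % m for k in range(m)]  # fixed at "design time"

    def apply(lines):
        out = [0] * m
        for k, dest in enumerate(destination):
            out[dest] = lines[k]
        return out

    return apply


times3_mod11 = constant_multiplier(3, 11)
print(times3_mod11(one_hot(6, 11)).index(1))  # 7, since 18 mod 11 = 7
```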
Residue conversion in the OHRNS can also be performed quickly and efficiently. Conversion of an RNS digit to a larger modulus, or “zero-filling”, can be performed by juxtaposing the input lines with additional low level signal lines, the number of which is equal to the difference in the moduli. On the other hand, conversion of an RNS digit to a smaller modulus, or “residue folding”, can be performed through a many-to-one mapping that requires some active circuitry. In particular, all source modulus values that are congruent modulo the target modulus are mapped to the same target modulus value. FIG. 6 shows a conventional OHRNS folding circuit 40 that uses OR gates 42 to combine source modulus values that are congruent modulo the target modulus.
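Both conversions can be modeled in a few lines of Python. The names zero_fill and fold are hypothetical; Python's any() plays the role of an OR gate over the source lines that share a residue.

```python
def one_hot(value, modulus):
    return [1 if k == value else 0 for k in range(modulus)]


def zero_fill(lines, target):
    """Convert to a larger modulus by appending lines that are always low."""
    return lines + [0] * (target - len(lines))


def fold(lines, target):
    """Convert to a smaller modulus by OR-ing congruent source lines."""
    source = len(lines)
    return [int(any(lines[k] for k in range(r, source, target)))
            for r in range(target)]


print(zero_fill(one_hot(4, 11), 13).index(1))  # 4, now on 13 lines
print(fold(one_hot(15, 17), 11).index(1))      # 4, since 15 mod 11 = 4
```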
FIG. 7 shows a conventional OHRNS mi,j scaling unit. Such a scaling unit 50 is used to perform a scaling by modulus mi of the RNS digit of every modulus mj except modulus mi in accordance with equation (1). A signal transposition 54 is performed on the second input to compute the additive inverse of the modulo mi operand, and residue conversion 54 must also be performed to convert that operand from modulus mi to modulus mj. If mi>mj, residue folding can be performed using OR gates, as described above. Another signal transposition 56 is performed on the output of the adder 52 to multiply the result of the subtraction by mi^(−1), and a static register 58 stores the result for downstream processing.