1. Field of the Invention
This invention relates generally to the field of semiconductor microprocessors, and, more particularly, to the way that overflows are detected when decoded bit-vectors are added together.
2. Description of the Related Art
Decoded bit-vectors appear frequently in high-speed logic design. Decoded bit-vectors (also known as "one-hot encoded bit-vectors") are needed for multiplexer (MUX) control, generally simplify control logic and can be added together faster than vectors in encoded form. For example, fast SHIFT operations may be readily used with decoded bit-vectors. A function frequently needed when decoded bit-vectors are added together is the detection of an overflow. Indeed, whether an overflow occurs or not is sometimes more interesting than the numerical value of the sum of the decoded bit-vectors.
An (N+1)-bit decoded bit-vector A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.k+1 a.sub.k a.sub.k-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 may represent any number from 0 to N. Only one of the bits A[i]=a.sub.i for any i between 0 and N is different from zero (0) and is set equal to one (1). The number represented by the decoded bit-vector is equal to the number of zeros (0's) to the right of the non-zero bit. For example, the number n is represented by n=A=a.sub.N a.sub.N-1 a.sub.N-2 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.2 a.sub.1 a.sub.0 where A[k]=a.sub.k =0 for k.notident.n, and A[n]=a.sub.n =1. More particularly, for N =9, the 10-bit decoded bit-vector representing 4 is 0000010000 and the 10-bit decoded bit-vector representing 3 is 0000001000, for example.
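The representation described above can be illustrated with a minimal sketch. The helper names `encode` and `decode` and the use of MSB-first character strings are assumptions made for illustration only; they do not appear in the original description.

```python
def encode(n: int, width: int) -> str:
    """Return the decoded (one-hot) bit-vector for n as an MSB-first string."""
    if not 0 <= n < width:
        raise ValueError("n must lie between 0 and width - 1")
    # n zeros sit to the right of the single 1 bit.
    return "0" * (width - 1 - n) + "1" + "0" * n


def decode(bits: str) -> int:
    """Return the number of zeros to the right of the single 1 bit."""
    if bits.count("1") != 1 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 0s with exactly one 1")
    return len(bits) - 1 - bits.index("1")


print(encode(4, 10))  # 0000010000
print(encode(3, 10))  # 0000001000
```

With N = 9 the width is N+1 = 10, so the two printed strings match the vectors given for 4 and 3.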
Adding two decoded bit-vectors may be effected by a shift left operation. A first one of the two decoded bit-vectors may be input into a shifter and then the 1 of that input decoded bit-vector may be shifted left by the number of zeros (0's) to the right of the 1 of the second one of the two decoded bit-vectors. For example, adding the 10-bit decoded bit-vector representing 4 (0000010000) to the 10-bit decoded bit-vector representing 3 (0000001000) may be effected by inputting 0000010000 (4) into a shifter and then shifting the 1 of 0000010000 (4) three places to the left, yielding the 10-bit decoded bit-vector result 0010000000 (7).
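The shift-left addition just described may be sketched as follows, assuming each decoded bit-vector is held in a Python integer with exactly one bit set. The function name `add_by_shift` is hypothetical.

```python
def add_by_shift(x: int, y: int) -> int:
    """Add two one-hot integers by shifting x left by the value y encodes."""
    # For a one-hot y, bit_length() - 1 is the number of zeros
    # to the right of its single 1 bit.
    shift = y.bit_length() - 1
    return x << shift


x = 0b0000010000  # decoded bit-vector for 4
y = 0b0000001000  # decoded bit-vector for 3
print(format(add_by_shift(x, y), "010b"))  # 0010000000, i.e. 7
```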
FIG. 1 illustrates a conventional implementation of 10-bit decoded bit-vector addition using MUXes with ten 2-AND gates and a 10-OR (as shown explicitly in MUX 100) that is 10 bits wide (ten 10-ORs in parallel, for example, as shown with MUXes 100-190). As shown in FIG. 1, the first one of the two 10-bit decoded bit-vectors X (0000010000, corresponding to 4) is input into each of ten MUXes 100-190 with the least significant bit (LSB, 0 here) at the top and with the most significant bit (MSB, 0 here) at the bottom. Then, in leftmost MUX 100, whose output will be the MSB of the result, the second one of the two 10-bit decoded bit-vectors Y (0000001000, corresponding to 3) is input into MUX 100 with the MSB (0 here) at the top and with the LSB (0 here) at the bottom. Consequently, the LSB of X is 2-ANDed with the MSB of Y, the MSB of X is 2-ANDed with the LSB of Y and all the respective intervening bits are similarly 2-ANDed together, as shown in FIG. 1.
As further shown in FIG. 1, Y (0000001000, corresponding to 3) is input into MUX 110, whose output will be the next to MSB of the result, with the 1 bit shifted upward one place (giving 0000010000, corresponding to 4, reading from MSB to LSB). Similarly, Y (0000001000, corresponding to 3) is input into MUX 120 with the 1 bit shifted upward two places (giving 0000100000, corresponding to 5), into MUX 130 with the 1 bit shifted upward three places (giving 0001000000, corresponding to 6), into MUX 140 with the 1 bit shifted upward four places (giving 0010000000, corresponding to 7), into MUX 150 with the 1 bit shifted upward five places (giving 0100000000, corresponding to 8) and into MUX 160 with the 1 bit shifted upward six places (giving 1000000000, corresponding to 9). The outputs of each of the ten 2-AND gates in each of MUXes 100-160 are ORed together to yield the seven MSBs (0010000) of the result X+Y.
Moreover, as shown in FIG. 1, Y (0000001000, corresponding to 3) is input into MUX 170, whose output will be the third LSB of the result, with the 1 bit shifted downward three places (giving 0000000001, corresponding to 0, reading from MSB to LSB). Similarly, Y (0000001000, corresponding to 3) is input into MUX 180 with the 1 bit shifted downward two places (giving 0000000010, corresponding to 1) and into MUX 190 with the 1 bit shifted downward one place (giving 0000000100, corresponding to 2). The outputs of each of the ten 2-AND gates in each of MUXes 170-190 are ORed together to yield the three LSBs (000) of the result X+Y. Putting together the seven MSBs (0010000) and the three LSBs (000) gives the final result X+Y=0010000000, corresponding to 7=4+3.
Taking another example, as shown in the conventional design of FIG. 2, the first one of the two 10-bit decoded bit-vectors X (0000010000, corresponding to 4) is input into each of ten MUXes 200-290 with the least significant bit (LSB, 0 here) at the top and with the most significant bit (MSB, 0 here) at the bottom. Then, in leftmost MUX 200, whose output will be the MSB of the result, the second one of the two 10-bit decoded bit-vectors Z (0000100000, corresponding to 5) is input into MUX 200 with the MSB (0 here) at the top and with the LSB (0 here) at the bottom. Consequently, the LSB of X is 2-ANDed with the MSB of Z, the MSB of X is 2-ANDed with the LSB of Z and all the respective intervening bits are similarly 2-ANDed together, as shown in FIG. 2.
As further shown in FIG. 2, Z (0000100000, corresponding to 5) is input into MUX 210, whose output will be the next to MSB of the result, with the 1 bit shifted upward one place (giving 0001000000, corresponding to 6, reading from MSB to LSB). Similarly, Z (0000100000, corresponding to 5) is input into MUX 220 with the 1 bit shifted upward two places (giving 0010000000, corresponding to 7), into MUX 230 with the 1 bit shifted upward three places (giving 0100000000, corresponding to 8) and into MUX 240 with the 1 bit shifted upward four places (giving 1000000000, corresponding to 9). The outputs of each of the ten 2-AND gates in each of MUXes 200-240 are ORed together to yield the five MSBs (10000) of the result X+Z.
Moreover, as shown in FIG. 2, Z (0000100000, corresponding to 5) is input into MUX 250, whose output will be the fifth LSB of the result, with the 1 bit shifted downward five places (giving 0000000001, corresponding to 0, reading from MSB to LSB). Similarly, Z (0000100000, corresponding to 5) is input into MUX 260 with the 1 bit shifted downward four places (giving 0000000010, corresponding to 1), into MUX 270 with the 1 bit shifted downward three places (giving 0000000100, corresponding to 2), into MUX 280 with the 1 bit shifted downward two places (giving 0000001000, corresponding to 3) and into MUX 290 with the 1 bit shifted downward one place (giving 0000010000, corresponding to 4). The outputs of each of the ten 2-AND gates in each of MUXes 250-290 are ORed together to yield the five LSBs (00000) of the result X+Z. Putting together the five MSBs (10000) and the five LSBs (00000) gives the final result X+Z=1000000000, corresponding to 9=4+5.
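The AND/OR MUX arrangement of FIGS. 1 and 2 can be modeled in software as a rough sketch. The function name `mux_add` and the modular index used to express the upward and downward shifts of the second operand are assumptions inferred from the description, not a transcription of the figures.

```python
def mux_add(a: str, b: str) -> str:
    """Model the AND/OR MUX array on MSB-first one-hot strings of equal width."""
    width = len(a)
    a_bits = [int(ch) for ch in reversed(a)]  # a_bits[i] is a_i
    b_bits = [int(ch) for ch in reversed(b)]  # b_bits[i] is b_i
    out = []
    for k in range(width):  # one MUX per output bit k
        # Each MUX holds `width` 2-AND gates feeding one wide OR.
        # The index on b wraps around, which models the shifted inputs.
        terms = [a_bits[i] & b_bits[(k - i) % width] for i in range(width)]
        out.append(int(any(terms)))
    return "".join(str(bit) for bit in reversed(out))


print(mux_add("0000010000", "0000001000"))  # 0010000000 (4 + 3 = 7)
print(mux_add("0000010000", "0000100000"))  # 1000000000 (4 + 5 = 9)
```

Both calls reproduce the results of the two worked examples, in which no overflow occurs.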
When two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 (corresponding to n, so that a.sub.n =1 and a.sub.i =0 for i.notident.n) and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 (corresponding to m, so that b.sub.m =1 and b.sub.i =0 for i.notident.m) add up to a number n+m less than or equal to N, then N-m+1 MUXes similar to MUXes 100-160 of FIG. 1 and MUXes 200-240 of FIG. 2 may be used to generate the N-m+1 MSBs of the result A+B, and m MUXes similar to MUXes 170-190 of FIG. 1 and MUXes 250-290 of FIG. 2 may be used to generate the m LSBs of the result A+B, with A input into all of the MUXes with the LSB a.sub.0 at the top and the MSB a.sub.N at the bottom. Alternatively, N-n+1 MUXes similar to MUXes 100-160 of FIG. 1 and MUXes 200-240 of FIG. 2 may be used to generate the N-n+1 MSBs of the result A+B, and n MUXes similar to MUXes 170-190 of FIG. 1 and MUXes 250-290 of FIG. 2 may be used to generate the n LSBs of the result A+B, with B input into all of the MUXes with the LSB b.sub.0 at the top and the MSB b.sub.N at the bottom.
In particular, when one of the two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 corresponds to n=0, so that a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N, and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 corresponds to m, so that b.sub.m =1 and b.sub.i =0 for i.notident.m, then adding A+B always yields a number 0+m=m that is less than or equal to N. The leftmost MUX gives the MSB of the result, which is OR(AND(b.sub.N, a.sub.0), AND(b.sub.N-1, a.sub.1), . . . , AND(b.sub.2, a.sub.N-2), AND(b.sub.1, a.sub.N-1), AND(b.sub.0, a.sub.N))=b.sub.N, since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N.
Here, OR(x, y, . . . , z) is the inclusive logical OR operation, and AND (r, s) is the logical AND operation. Where at most one of x, y, . . . , z is equal to 1, with all the rest of x, y, . . . , z equal to 0, the output of OR(x, y, . . . , z) is equivalent to the sum x+y+ . . . +z. Where r and s are each either 1 or 0, the output of AND(r, s) is equivalent to the product (rs).
The next to leftmost MUX gives the next to leftmost MSB of the result, which is OR(AND(b.sub.N-1, a.sub.0), AND(b.sub.N-2, a.sub.1), . . . , AND(b.sub.1, a.sub.N-2), AND(b.sub.0, a.sub.N-1), AND(b.sub.N, a.sub.N))=b.sub.N-1, again since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N. The next to the next to leftmost MUX gives the next to the next to leftmost MSB of the result, which is OR(AND(b.sub.N-2, a.sub.0), AND(b.sub.N-3, a.sub.1), . . . , AND(b.sub.0, a.sub.N-2), AND(b.sub.N, a.sub.N-1), AND(b.sub.N-1, a.sub.N))=b.sub.N-2, again since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N.
Similarly, the rightmost MUX gives the LSB of the result, which is OR(AND(b.sub.0, a.sub.0), AND(b.sub.N, a.sub.1), . . . , AND(b.sub.3, a.sub.N-2), AND(b.sub.2, a.sub.N-1), AND(b.sub.1, a.sub.N))=b.sub.0, since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N. The next to rightmost MUX gives the next to rightmost LSB of the result, which is OR(AND(b.sub.1, a.sub.0), AND(b.sub.0, a.sub.1), . . . , AND(b.sub.4, a.sub.N-2), AND(b.sub.3, a.sub.N-1), AND(b.sub.2, a.sub.N))=b.sub.1, again since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N. The next to the next to rightmost MUX gives the next to the next to rightmost LSB of the result, which is OR(AND(b.sub.2, a.sub.0), AND(b.sub.1, a.sub.1), . . . , AND(b.sub.5, a.sub.N-2), AND(b.sub.4, a.sub.N-1), AND(b.sub.3, a.sub.N))=b.sub.2, again since a.sub.0 =1 and a.sub.i =0 for i=1, 2, . . . , N. The net result is B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0, which is the expected result of adding 0+B.
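The MUX outputs enumerated above for the case n=0 follow one pattern, which may be written compactly as shown below. The symbol c.sub.k for the output of the MUX producing result bit k, and the modular index on b, are notational assumptions inferred from the listed AND terms.

```latex
c_k = \bigvee_{i=0}^{N} \left( a_i \wedge b_{(k-i) \bmod (N+1)} \right),
\qquad k = 0, 1, \ldots, N

% With a_0 = 1 and a_i = 0 for i = 1, 2, \ldots, N, only the i = 0 term survives:
c_k = a_0 \wedge b_k = b_k
```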
Overflow occurs when two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 (n) and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 (m) add up to a number n+m larger than N. As discussed above, adding two decoded bit-vectors may be effected by a shift left operation. The first one of the two decoded bit-vectors may be input into a shifter and then the 1 of that input decoded bit-vector may be shifted left by the number of zeros (0's) to the right of the 1 of the second of the two decoded bit-vectors. For example, adding the 10-bit decoded bit-vector representing 7 (0010000000) to the 10-bit decoded bit-vector representing 5 (0000100000) may be effected by inputting 0000100000 (5) into a shifter and then shifting the 1 of 0000100000 (5) seven places to the left, yielding the 19-bit decoded bit-vector result 0000001000000000000 (12), where the 1 appears in the 9 overflow bits. Since two (N+1)-bit decoded bit-vectors A (n) and B (m) always add up to a number n+m that is less than or equal to 2N, the result may always be represented by a (2N+1)-bit decoded bit-vector C=c.sub.2N c.sub.2N-1 c.sub.2N-2 . . . c.sub.n+m+1 c.sub.n+m c.sub.n+m-1 . . . c.sub.2 c.sub.1 c.sub.0 (n+m). The N leftmost MSBs are the overflow bits.
A conventional approach to overflow detection is to add two decoded bit-vectors together and detect whether a 1 appears in the overflow bits. For example, in the addition of two (N+1)-bit decoded bit-vectors A (n) and B (m) that results in C (n+m), the N leftmost MSBs (the overflow bits) may be ORed together to give OR(c.sub.2N,c.sub.2N-1,c.sub.2N-2, . . . ,c.sub.N+3,c.sub.N+2,c.sub.N+1). Alternatively, another conventional approach to overflow detection is to add two decoded bit-vectors together and detect whether a 1 appears in the non-overflow bits. If a 1 appears as an output of MUXes with non-shifted or upward-shifted inputs, then there is no overflow. For example, the N+1 rightmost LSBs of the resulting (2N+1)-bit decoded bit-vector C (n+m) may be ORed together to give OR(c.sub.N,c.sub.N-1,c.sub.N-2, . . . ,c.sub.2,c.sub.1,c.sub.0). In the two examples given above (7=4+3 and 9=4+5), 1's appeared in the outputs of MUXes with non-shifted or upward-shifted inputs (MUX 120 and MUX 200, respectively) and, indeed, there was no overflow.
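The two conventional detection approaches described above may be sketched as follows, assuming the (2N+1)-bit sum C is first formed by a shift-left addition on Python integers. The function names are hypothetical.

```python
def wide_sum(n: int, m: int, N: int) -> int:
    """Form the (2N+1)-bit decoded sum C of the decoded vectors for n and m."""
    if not (0 <= n <= N and 0 <= m <= N):
        raise ValueError("n and m must lie between 0 and N")
    return (1 << n) << m


def overflow_by_high_bits(n: int, m: int, N: int) -> bool:
    """OR together the N overflow bits c_2N ... c_(N+1)."""
    overflow_mask = ((1 << N) - 1) << (N + 1)
    return (wide_sum(n, m, N) & overflow_mask) != 0


def overflow_by_low_bits(n: int, m: int, N: int) -> bool:
    """OR together the N+1 non-overflow bits c_N ... c_0; no 1 means overflow."""
    low_mask = (1 << (N + 1)) - 1
    return (wide_sum(n, m, N) & low_mask) == 0


print(overflow_by_high_bits(4, 3, 9))  # False
print(overflow_by_high_bits(7, 5, 9))  # True
```

Because C contains exactly one 1 bit, the two tests always agree.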
FIG. 3 illustrates a conventional implementation of 10-bit decoded bit-vector addition using MUXes with 2-AND gates and a 10-OR that is 10 bits wide (ten 10-ORs in parallel, for example) in a situation where overflow occurs. As shown in FIG. 3, the first one of the two 10-bit decoded bit-vectors R (0010000000, corresponding to 7) is input into each of ten MUXes 300-390 with the least significant bit (LSB, 0 here) at the top and with the most significant bit (MSB, 0 here) at the bottom. Then, in leftmost MUX 300, whose output would be the MSB of the result if there were no overflow, the second one of the two 10-bit decoded bit-vectors S (0000100000, corresponding to 5) is input into MUX 300 with the MSB (0 here) at the top and with the LSB (0 here) at the bottom. Consequently, the LSB of R is 2-ANDed with the MSB of S, the MSB of R is 2-ANDed with the LSB of S and all the respective intervening bits are similarly 2-ANDed together, as shown in FIG. 3.
As further shown in FIG. 3, S (0000100000, corresponding to 5) is input into MUX 310, whose output would be the MSB of the result if there were overflow (unless R=S=9), with the 1 bit shifted upward one place (giving 0001000000, corresponding to 6, reading from MSB to LSB). Similarly, S (0000100000, corresponding to 5) is input into MUX 320 with the 1 bit shifted upward two places (giving 0010000000, corresponding to 7), into MUX 330 with the 1 bit shifted upward three places (giving 0100000000, corresponding to 8) and into MUX 340 with the 1 bit shifted upward four places (giving 1000000000, corresponding to 9). The outputs of each of the ten 2-AND gates in each of MUXes 310-340 are ORed together to yield the four MSBs (0000) of the result R+S. Since a 1 does not appear as an output of MUXes 300-340 with non-shifted or upward-shifted inputs, there is an overflow and the ten LSBs of the 19-bit result R+S are all zeroes (0000000000).
Moreover, as shown in FIG. 3, S (0000100000, corresponding to 5) is input into MUX 350, whose output will be the fifth LSB of the 9 overflow bits of the result (unless R=S=9), with the 1 bit shifted downward five places (giving 0000000001, corresponding to 0, reading from MSB to LSB). Similarly, S (0000100000, corresponding to 5) is input into MUX 360 with the 1 bit shifted downward four places (giving 0000000010, corresponding to 1), into MUX 370 with the 1 bit shifted downward three places (giving 0000000100, corresponding to 2), into MUX 380 with the 1 bit shifted downward two places (giving 0000001000, corresponding to 3) and into MUX 390 with the 1 bit shifted downward one place (giving 0000010000, corresponding to 4). The outputs of each of the ten 2-AND gates in each of MUXes 350-390 are ORed together to yield the five LSBs (00100) of the 9 overflow bits of the result R+S. Putting together the four MSBs (0000) and the five LSBs (00100) of the 9 overflow bits with the ten LSBs (0000000000) of the 10 non-overflow bits gives the final 19-bit result R+S=0000001000000000000, corresponding to 12=7+5.
Taking another example, as shown in the conventional design of FIG. 4, the first one of the two 10-bit decoded bit-vectors T (1000000000, corresponding to 9) is input into each of ten MUXes 400-490 with the least significant bit (LSB, 0 here) at the top and with the most significant bit (MSB, 1 here) at the bottom. Then, in leftmost MUX 400, whose output would be the MSB of the result if there were no overflow, the second one of the two 10-bit decoded bit-vectors U (1000000000, also corresponding to 9) is input into MUX 400 with the MSB (1 here) at the top and with the LSB (0 here) at the bottom. Consequently, the LSB of T is 2-ANDed with the MSB of U, the MSB of T is 2-ANDed with the LSB of U and all the respective intervening bits are similarly 2-ANDed together, as shown in FIG. 4. Since a 1 does not appear as an output of MUX 400 with a non-shifted input (as shown in FIG. 4, there is no MUX with an upward-shifted input), there is an overflow and the ten LSBs of the 19-bit result T+U are all zeroes (0000000000).
As further shown in FIG. 4, U (1000000000, corresponding to 9) is input into MUX 410, whose output will be the MSB of the result since there is an overflow, with the 1 bit shifted downward nine places (giving 0000000001, corresponding to 0, reading from MSB to LSB). Similarly, U (1000000000, corresponding to 9) is input into MUX 420 with the 1 bit shifted downward eight places (giving 0000000010, corresponding to 1), into MUX 430 with the 1 bit shifted downward seven places (giving 0000000100, corresponding to 2) and into MUX 440 with the 1 bit shifted downward six places (giving 0000001000, corresponding to 3).
Moreover, as shown in FIG. 4, U (1000000000, corresponding to 9) is input into MUX 450, whose output will be the fifth LSB of the 9 overflow bits of the result, with the 1 bit shifted downward five places (giving 0000010000, corresponding to 4), into MUX 460 with the 1 bit shifted downward four places (giving 0000100000, corresponding to 5), into MUX 470 with the 1 bit shifted downward three places (giving 0001000000, corresponding to 6), into MUX 480 with the 1 bit shifted downward two places (giving 0010000000, corresponding to 7) and into MUX 490 with the 1 bit shifted downward one place (giving 0100000000, corresponding to 8). The outputs of each of the ten 2-AND gates in each of MUXes 410-490 are ORed together to yield all 9 overflow bits (100000000) of the result T+U. Putting together the 9 overflow bits (100000000) with the ten LSBs (0000000000) of the 10 non-overflow bits gives the final 19-bit result T+U=1000000000000000000, corresponding to 18=9+9.
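The way the ten MUX outputs map onto the 19-bit result in the overflow examples of FIGS. 3 and 4 can be sketched as follows. This is one reading of the figures as described, not a definitive model: it assumes the MUX producing bit k outputs a 1 exactly when k equals (n+m) modulo (N+1), and that MUXes for bits k less than m are the ones with downward-shifted inputs. The function name `wide_result` is hypothetical.

```python
def wide_result(n: int, m: int, N: int = 9) -> str:
    """Return the (2N+1)-bit decoded sum as an MSB-first string."""
    width = N + 1
    k = (n + m) % width  # index of the MUX whose output is 1
    # A 1 from a MUX with downward-shifted inputs (k < m) signals overflow.
    overflow = k < m
    # On overflow, MUX k supplies overflow bit c_(N+1+k); otherwise bit c_k.
    position = k + width if overflow else k
    total = 2 * N + 1
    return "0" * (total - 1 - position) + "1" + "0" * position


print(wide_result(7, 5))  # 0000001000000000000 (12)
print(wide_result(9, 9))  # 1000000000000000000 (18)
```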
When two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 (corresponding to n, so that a.sub.n =1 and a.sub.i =0 for i.notident.n) and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 (corresponding to m, so that b.sub.m =1 and b.sub.i =0 for i.notident.m) add up to a number n+m greater than N, then N-m MUXes similar to MUXes 310-340 of FIG. 3 may be used to generate the N-m MSBs of the result A+B, and m MUXes similar to MUXes 350-390 of FIG. 3 and MUXes 410-490 of FIG. 4 may be used to generate the m LSBs of the N overflow bits of the result A+B, with A input into all of the MUXes with the LSB a.sub.0 at the top and the MSB a.sub.N at the bottom. Alternatively, N-n MUXes similar to MUXes 310-340 of FIG. 3 may be used to generate the N-n MSBs of the result A+B, and n MUXes similar to MUXes 350-390 of FIG. 3 and MUXes 410-490 of FIG. 4 may be used to generate the n LSBs of the N overflow bits of the result A+B, with B input into all of the MUXes with the LSB b.sub.0 at the top and the MSB b.sub.N at the bottom.
In particular, when one of the two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 corresponds to n=N, so that a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1, and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 corresponds to m>0, so that b.sub.0 =0, then adding A+B always yields a number N+m that is greater than N. The leftmost MUX gives OR(AND(b.sub.N, a.sub.0), AND(b.sub.N-1, a.sub.1), . . . , AND(b.sub.2, a.sub.N-2), AND(b.sub.1, a.sub.N-1), AND(b.sub.0, a.sub.N))=0, since a.sub.N =1, b.sub.0 =0 and a.sub.i =0 for i=0, 1, 2, . . . , N-1.
The next to leftmost MUX gives the MSB of the result, which is c.sub.2N =OR(AND(b.sub.N-1, a.sub.0), AND(b.sub.N-2, a.sub.1), . . . , AND(b.sub.1, a.sub.N-2), AND(b.sub.0, a.sub.N-1), AND(b.sub.N, a.sub.N))=b.sub.N, again since a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1. The next to the next to leftmost MUX gives the next to leftmost MSB of the result, which is c.sub.2N-1 =OR(AND(b.sub.N-2, a.sub.0), AND(b.sub.N-3, a.sub.1), . . . , AND(b.sub.0, a.sub.N-2), AND(b.sub.N, a.sub.N-1), AND(b.sub.N-1, a.sub.N))=b.sub.N-1, again since a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1.
Similarly, the rightmost MUX gives the LSB of the N overflow bits of the result, which is c.sub.N+1 =OR(AND(b.sub.0, a.sub.0), AND(b.sub.N, a.sub.1), . . . , AND(b.sub.3, a.sub.N-2), AND(b.sub.2, a.sub.N-1), AND(b.sub.1, a.sub.N))=b.sub.1, since a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1. The next to rightmost MUX gives the next to rightmost LSB of the N overflow bits of the result, which is c.sub.N+2 =OR(AND(b.sub.1, a.sub.0), AND(b.sub.0, a.sub.1), . . . , AND(b.sub.4, a.sub.N-2), AND(b.sub.3, a.sub.N-1), AND(b.sub.2, a.sub.N))=b.sub.2, again since a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1. The next to the next to rightmost MUX gives the next to the next to rightmost LSB of the N overflow bits of the result, which is c.sub.N+3 =OR(AND(b.sub.2, a.sub.0), AND(b.sub.1, a.sub.1), . . . , AND(b.sub.5, a.sub.N-2), AND(b.sub.4, a.sub.N-1), AND(b.sub.3, a.sub.N))=b.sub.3, again since a.sub.N =1 and a.sub.i =0 for i=0, 1, 2, . . . , N-1. Generally, c.sub.N+j =b.sub.j for j=1, 2, . . . , N.
The net result is the (2N+1)-bit decoded bit-vector C=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 c.sub.N c.sub.N-1 c.sub.N-2 . . . c.sub.2 c.sub.1 c.sub.0, where c.sub.i =0 for i=0, 1, 2, . . . , N, which is the expected result of adding N+B. The N+1 LSBs c.sub.N c.sub.N-1 c.sub.N-2 . . . c.sub.2 c.sub.1 c.sub.0 are the N+1 non-overflow bits (all 0's). The N MSBs b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.3 b.sub.2 b.sub.1 =c.sub.2N c.sub.2N-1 c.sub.2N-2 c.sub.2N-3 . . . c.sub.N+3 c.sub.N+2 c.sub.N+1 are the N overflow bits (the N MSBs of B) that, when ORed together, give OR(c.sub.2N,c.sub.2N-1,c.sub.2N-2, . . . ,c.sub.N+3,c.sub.N+2,c.sub.N+1)=OR(b.sub.N, b.sub.N-1, b.sub.N-2, . . . , b.sub.3, b.sub.2, b.sub.1)=1, since b.sub.0 =0 and b.sub.m =1 for exactly one m between 1 and N, conventionally indicating an overflow.
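The special case n=N discussed above, in which the N overflow bits of C reproduce the N MSBs of B, can be checked exhaustively for a given N with a short sketch. It assumes the sum is formed by a shift-left addition on Python integers; the function name `check_top_operand` is hypothetical.

```python
def check_top_operand(N: int) -> bool:
    """Check that c_(N+j) == b_j for j = 1..N whenever n = N and m > 0."""
    a = 1 << N  # decoded bit-vector for n = N
    for m in range(1, N + 1):
        b = 1 << m  # decoded bit-vector for m, so b_0 = 0
        c = a << m  # (2N+1)-bit decoded sum, 1 at bit N + m
        # The N+1 non-overflow bits c_N ... c_0 must all be 0.
        if c & ((1 << (N + 1)) - 1):
            return False
        # Each overflow bit c_(N+j) must equal b_j.
        for j in range(1, N + 1):
            if (c >> (N + j)) & 1 != (b >> j) & 1:
                return False
    return True


print(check_top_operand(9))  # True
```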
The time delay for this conventional overflow detection scheme involving adding together two (N+1)-bit decoded bit-vectors using MUXes with 2-AND gates and an (N+1)-OR that is N+1 bits wide (N+1 (N+1)-ORs in parallel, for example) and detecting whether a 1 appears in the overflow bits using an N-OR that is 1 bit wide (one N-OR, for example) may be estimated as follows. The time t.sub.2-AND for using the 2-AND gates in the MUXes may be added to the time t.sub.(N+1)-OR [(N+1)-bits wide] for using the N+1 (N+1)-ORs in the MUXes and to the time t.sub.N-OR [1-bit wide] for using the one N-OR, giving altogether t.sub.conventional-simultaneous =t.sub.2-AND +t.sub.(N+1)-OR [(N+1)-bits wide]+t.sub.N-OR [1-bit wide].
The two (N+1)-bit decoded bit-vectors A=a.sub.N a.sub.N-1 a.sub.N-2 a.sub.N-3 . . . a.sub.n+1 a.sub.n a.sub.n-1 . . . a.sub.3 a.sub.2 a.sub.1 a.sub.0 (corresponding to n, so that a.sub.n =1 and a.sub.i =0 for i.notident.n) and B=b.sub.N b.sub.N-1 b.sub.N-2 b.sub.N-3 . . . b.sub.m+1 b.sub.m b.sub.m-1 . . . b.sub.3 b.sub.2 b.sub.1 b.sub.0 (corresponding to m, so that b.sub.m =1 and b.sub.i =0 for i.notident.m) must also arrive simultaneously for this conventional overflow detection scheme to work since the first step involves adding A (n) and B (m) together using MUXes with 2-AND gates. If there is signal skew, where A (n) arrives earlier to the MUXes with 2-AND gates than B (m), which arrives at a time t.sub.skew later than A (n), then the total time delay becomes t.sub.conventional-skewed =t.sub.skew +t.sub.conventional-simultaneous =t.sub.skew +t.sub.2-AND +t.sub.(N+1)-OR [(N+1)-bits wide]+t.sub.N-OR [1-bit wide].
Using N+1 transmission gates (T-gates) to implement MUXes in the conventional approach to overflow detection is a conventional technique for accommodating signal skew. The earlier arriving decoded bit-vector, A (n) for example, may be used to control the shift MUX that will be used to shift the later arriving decoded bit-vector, B (m) for example. The respective T-gate would already be opened or closed depending on the bits of A (n) (the T-gate control) by the time the bits of B (m) (the T-gate data) arrive. The time delay for the N+1 T-gate MUXes would then be just the data-to-out time t.sub.data-to-out [(N+1)-bits wide] instead of the time delay (t.sub.2-AND +t.sub.(N+1)-OR [(N+1)-bits wide]) through N+1 regular AND/OR MUXes. The total T-gate time delay would then be the time t.sub.conventional-skewed-T-gate =t.sub.skew +t.sub.data-to-out [(N+1)-bits wide]+t.sub.N-OR [1-bit wide]. However, if the input size N is large, the N+1 T-gate MUXes implementation would introduce a large amount of diffusion capacitance at the output of each T-gate. This would increase the data-to-out time t.sub.data-to-out [(N+1)-bits wide] so that the data-to-out time t.sub.data-to-out [(N+1)-bits wide]>>t.sub.2-AND.
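For side-by-side comparison, the three delay estimates stated above for the conventional schemes may be collected as follows. The bracketed width annotations are omitted for brevity; no symbols beyond those already defined are introduced.

```latex
\begin{aligned}
t_{\text{conventional-simultaneous}} &= t_{2\text{-AND}} + t_{(N+1)\text{-OR}} + t_{N\text{-OR}} \\
t_{\text{conventional-skewed}} &= t_{\text{skew}} + t_{2\text{-AND}} + t_{(N+1)\text{-OR}} + t_{N\text{-OR}} \\
t_{\text{conventional-skewed-T-gate}} &= t_{\text{skew}} + t_{\text{data-to-out}} + t_{N\text{-OR}}
\end{aligned}
```

In all three estimates the final N-OR delay is incurred only after the addition completes, and in both skewed cases the skew delay adds directly to the total.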
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.