1. Field of the Invention
The present invention relates to a method of operation of an arithmetic and logic unit capable of performing parallel processing, a storage medium storing this method as a computer program, and an arithmetic and logic unit. More particularly, it relates to a method of operation of an arithmetic and logic unit, a storage medium, and an arithmetic and logic unit which introduce the method and concept of converting a problem including a series of decisions having an order dependency, which cannot be processed in parallel as it is, to an indeterminate code binary tree which can be processed in parallel, so as to simplify the circuit configuration and enable higher speed operational processing.
Further, the present invention relates to a method of operation of an arithmetic and logic unit for processing an operation such as comparison, addition, subtraction, and an operation for obtaining the absolute value of a difference of any R-nary number or binary number, a storage medium storing the method as a computer program, and an arithmetic and logic unit. More particularly, it relates to a method of operation of an arithmetic and logic unit, a storage medium, and an arithmetic and logic unit which introduce the method and concept of converting a problem including a series of decisions having an order dependency, which cannot be processed in parallel as it is, to an indeterminate code binary tree which can be processed in parallel, so as to simplify the circuit configuration and enable higher speed operational processing.
2. Description of the Related Art
In the first half of the present specification, disclosure will be made of the method and concept of converting a problem including a series of decisions having an order dependency which cannot be processed in parallel as it is to an indeterminate code binary tree which can be processed in parallel. As the arithmetic and logic unit to which the method and concept are applied, here, a priority encoder is mentioned. As an example of a conventional priority encoder, FIG. 65 shows a truth table of a priority encoder (first prior art) in which the input is an 8-bit input from a bit 0 to a bit 7 and the output is a 4-bit output consisting of 1 bit of validity output and 3 bits of numerical output.
Here, the "priority encoder" is a circuit or function for returning, as a binary numeral, the position (digit place) of the bit which first becomes "1" when viewing the input data bit by bit from the MSB (left end) to the LSB (right end). In the case of an 8-bit input, the returned numerals are "7" to "0". Three bits suffice for the numerical output. However, in order to differentiate between a case where all of the input bits are "0" (invalid case) and a case where even one "1" is contained in the input bits (valid case), a validity bit is further added, so a 4-bit output in total is obtained.
For example, when the inputs are "00110101", "00111111", and "00100000", since the fifth bit is the first "1" in all of the cases, the outputs become "1101" ("valid", and then "5"). Note that, here, it is assumed that the nearer the position to the MSB (left end), the higher the priority order. However, there is essentially no difference even in the opposite case. For simplification of the discussion, in the present specification, it is assumed that the nearer the position to the MSB, the higher the priority order.
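As an illustration only, the behavior of the 8-bit priority encoder described above may be sketched in Python as follows. The function name `priority_encode` and the numerical output "000" returned for the invalid (all-zero) case are assumptions of this sketch, since the truth table of FIG. 65 is not reproduced in the text.

```python
def priority_encode(bits: str) -> str:
    """Return the validity bit followed by the 3-bit position of the first '1' seen from the MSB.

    `bits` is an 8-character string such as "00110101" (left end = MSB = bit 7).
    """
    width = len(bits)
    for offset, bit in enumerate(bits):        # offset 0 is the MSB (left end)
        if bit == "1":
            position = width - 1 - offset      # digit place counted from the LSB
            return "1" + format(position, "03b")
    return "0000"                              # invalid case; numerical bits "000" are an assumption


for word in ("00110101", "00111111", "00100000"):
    print(word, "->", priority_encode(word))
```

The loop examines the bits one after another, which is exactly the order dependency that prevents parallel processing in this naive form.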
Such a function of a priority encoder becomes necessary when selecting one option from among a plurality of options, i.e., when performing so-called arbitration. For example, when a plurality of bus-connected functional units try to simultaneously output to the bus, the priority encoder performs arbitration.
FIG. 66 is a view of the configuration of a digital device for explaining the arbitration of a single bus SB by the priority encoder MPE. In the digital device of the figure, eight functional units MU0 to MU7 are connected via switching means SW0 to SW7 to a single bus SB. Operational control of the switching means SW0 to SW7 is carried out based on the output of the priority encoder MPE to control the bus connection of the eight functional units MU0 to MU7. Namely, the priority encoder MPE receives as its input bus use requests RQ0 to RQ7 of the functional units and outputs the number of the functional unit allowed to use the bus. The functional unit numbers output by the priority encoder MPE are interpreted by a decoder MDC and become the bus connection control signals CBC0 to CBC7 for controlling the switching means SW0 to SW7.
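The arbitration of FIG. 66 may be sketched, under stated assumptions, as follows. The function name `arbitrate` is hypothetical, and it is assumed here that the request RQ7 has the highest priority, following the MSB-first convention adopted above. The priority encoder MPE and the decoder MDC are modeled together as one function.

```python
def arbitrate(requests: list[int]) -> list[int]:
    """Model of the encoder MPE followed by the decoder MDC.

    requests[i] is RQi (1 = functional unit MUi requests the bus).
    The returned list holds CBC0..CBC7; at most one entry is 1.
    """
    grants = [0] * len(requests)
    for unit in range(len(requests) - 1, -1, -1):   # assumed: highest unit number wins
        if requests[unit]:
            grants[unit] = 1                         # only this unit's switch is closed
            break
    return grants


print(arbitrate([1, 0, 1, 0, 0, 1, 0, 0]))
```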
Further, FIG. 67 shows a view of the configuration of an 8-bit input priority encoder used as a partial circuit of the "Priority Detection Use Counter Device" disclosed in Japanese Unexamined Patent Publication (Kokai) No. 8-147142. Note that in the publication of this prior art, the encoder is disclosed as one for counting, at a high speed, the number of "0's" arranged from the head of the input bit train. While that function and the priority encoder under discussion here are not precisely equivalent, they become equivalent if the numerical output is inverted.
The 8-bit priority encoder of this prior art is configured by creating a model by a binary tree having a height m of 3. The nodes ND01, ND23, ND45, and ND67 at a depth 3 of the binary tree respectively receive as their inputs 2 bits out of the input data and detect when the bit trains become "01" and "00". They are each provided with an AND logic gate circuit and an OR logic gate circuit. Further, the nodes ND03 and ND47 at a depth 2 of the binary tree are respectively provided with a selector, an AND logic gate circuit, and an OR logic gate circuit. The selector selectively identifies "00" and "01" of the bit train in accordance with the output of the AND logic gate circuit, the AND logic gate circuit detects when all bits of one of the inputs are "0" and all bits of the other of the inputs are not "0", and the OR logic gate circuit detects when all bits become "0", respectively. Further, the node ND07 at a depth 1 of the binary tree is provided with two selectors, an AND logic circuit, and an OR logic gate circuit and functions in the same way as the node at the depth 2.
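The idea of combining partial results in a binary tree may be sketched as follows. This is a generic divide-and-conquer illustration and not the gate arrangement of the cited publication: the function name `encode_tree` and the (valid, position) pair passed between nodes are assumptions of the sketch. Each node selects the result of its upper half when that half contains a "1", and otherwise passes on the result of its lower half.

```python
def encode_tree(bits: str) -> tuple[bool, int]:
    """Return (valid, position), with position counted from the right end of `bits`.

    The length of `bits` is assumed to be a power of two.
    """
    if len(bits) == 1:                               # leaf: a single input bit
        return bits == "1", 0
    half = len(bits) // 2
    hi_valid, hi_pos = encode_tree(bits[:half])      # upper (higher priority) half
    lo_valid, lo_pos = encode_tree(bits[half:])      # lower half
    if hi_valid:                                     # selector: the upper half wins
        return True, hi_pos + (len(bits) - half)
    return lo_valid, lo_pos


print(encode_tree("00110101"))
```

The two recursive calls are independent of each other, so in hardware the two halves can be evaluated at the same time and the depth grows with the logarithm of the input width.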
Below, explanation will be made of terms to be used. In the latter half of the present specification, disclosure is made of the method and concept of converting a problem including a series of decisions having an order dependency which cannot be processed in parallel as it is to a binary tree which can be processed in parallel. As the arithmetic and logic unit to which this method and concept are applied, here, a comparator, adder, subtractor, operation unit for obtaining an absolute value of a difference, etc. realized by forming the logical circuit in the form of a binary tree are mentioned. Therefore, first, the terms such as "binary tree" frequently used in the present specification will be precisely defined.
According to the "Iwanami Joho Kagaku Jiten (Iwanami Information Science Dictionary)", an ordered tree having no more than two branches at each node is referred to as a "binary tree" ("Iwanami Joho Kagaku Jiten", Iwanami Shoten, 1990, p. 550). Further, particularly, a binary tree having a small biasing with a height of the tree substantially equal to the logarithm ($\log_2 N$) of the number N of nodes is referred to as a "balanced binary tree" (p. 683 of the same reference). Further, as a special case of a balanced binary tree, a binary tree having a height of h where there are $2^i$ nodes at each depth i ($0 \le i < h$) and where the nodes of the depth h are arranged so as to be filled from the left is referred to as a "complete binary tree" (p. 550 of the same reference).
A balanced binary tree has the property that the height of the tree is substantially equal to the logarithm of the number of leaves and all leaves exist at substantially the same depth. Further, in a complete binary tree, the difference of height between any two leaves is 1 or less. This property is important for realizing a high speed arithmetic and logic unit. In an arithmetic and logic unit in which the logical circuit is formed as a binary tree, the leaves of the tree are defined as the input, and the root of the tree is defined as the output, the height of the tree indicates the number of steps of a critical path.
In a logical circuit of a complete binary tree form, the output (root) is reached from all inputs (leaves) by substantially the same number of steps. Accordingly, there is no critical path which is conspicuously longer in comparison with the number of steps of other routes reaching the output from the input. A method is known of using this property to configure a logical circuit of a complete binary tree form so as to realize a high speed arithmetic and logic unit.
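The relation between the height of the tree and the number of steps of the critical path can be illustrated by a small calculation. The sketch below assumes that every node is a two-input stage; the function names `tree_levels` and `chain_levels` are hypothetical.

```python
def tree_levels(num_inputs: int) -> int:
    """Two-input stages on the longest path when the inputs are combined as a balanced binary tree."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return (num_inputs - 1).bit_length()


def chain_levels(num_inputs: int) -> int:
    """Two-input stages on the longest path when the inputs are combined one after another."""
    return max(num_inputs - 1, 0)


for n in (8, 16, 64):
    print(n, tree_levels(n), chain_levels(n))
```

For 8 inputs, the tree needs 3 stages on its longest path, while the sequential chain needs 7.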
Note that the term of "binary tree" which is frequently used in the present specification means a "complete binary tree" in so far as no special mention is made to the contrary. A binary tree that is not a balanced binary tree and has heights of leaves different from each other will be referred to as a "binary tree having biasing".
Logical circuits of a binary tree form of the related art will be explained below.
First, as a second prior art of the arithmetic and logic unit using a conventional logical circuit of a binary tree form, a "Binary Look-Ahead Carry Method and Device of Same" disclosed in Japanese Unexamined Patent Publication (Kokai) No. 6-28158 may be mentioned. In this publication, disclosure is made of a general binary look-ahead carry (BLC) adder using an improved BLC system and a block carry look ahead (BCLA) adder using an improved BCLA system.
FIG. 68 is a view of a configuration concretely showing a carry generation part (BLC array) of individual bits in the 8-bit BLC adder according to the improved system of the BLC addition disclosed in Japanese Unexamined Patent Publication (Kokai) No. 6-28158.
Here, an addend X and another addend Y to be input to the 8-bit BLC adder are defined as:
X=X7 X6 . . . X1 X0, and Y=Y7 Y6 . . . Y1 Y0,
and X7 and Y7 are defined as codes. Further, Ci-1 (i=0 to 7) is defined as the carry input to the bit i, a carry input to the least significant bit (bit 0) is defined as C-1, and the carry output of the 8-bit BLC adder is defined as C7. Further, a carry generation function indicating that a carry is generated at the i digit is defined as Gi, and a carry propagation function indicating that the carry Ci-1 coming from the lower significant digits is propagated through the i digit is defined as Pi.
In the 8-bit BLC adder of FIG. 68, the addend X and the other addend Y are input to a carry generation and propagation function generation unit 502, and eight pairs of carry generation functions Gi and carry propagation functions Pi are generated. Further, the carry input Cin regarded as the carry generation function of the -1 bit is added, and nine pairs of carry generation functions and carry propagation functions G0, P0 to G7, P7 and Cin in total are input to the carry generation unit 501. Note that, at this time, the carry propagation function of the -1 bit is "0". Carries C-1 to C7 output from the carry generation unit 501 are input to a sum generation unit 500 together with the carry propagation functions P0 to P7 of the bits, and sums S0 to S7 are generated.
The carry generation and propagation function generation unit 502 is provided with eight input cells IC0 to IC7, which receive as their inputs the addends Xi and the other addends Yi and generate the carry generation functions Gi and the carry propagation functions Pi in parallel, and with one dummy cell ICD. FIG. 69A is a circuit diagram of an input cell of the present prior art. In the figure, the input cell IC is provided with an AND gate circuit GA51 for taking the AND logic of the addend Xi and the other addend Yi and outputting the carry generation function Gi and an OR gate circuit GO51 for taking the OR logic of the addend Xi and the other addend Yi and outputting the carry propagation function Pi. Note that the dummy cell ICD is a buffer cell with an improved driving ability.
Further, the carry generation unit 501 is configured as a 9 × 4 cell matrix. A first column is configured with the -1-th bit defined as a dummy cell CCD1-1 and a zero-th bit to seventh bit defined as carry generation cells CC10 to CC17; a second column is configured with the -1-th bit and the zero-th bit defined as dummy cells CCD2-1 and CCD20 and the first bit to the seventh bit defined as carry generation cells CC21 to CC27; a third column is configured with the -1-th bit to the second bit defined as dummy cells CCD3-1 to CCD32 and the third bit to the seventh bit defined as carry generation cells CC33 to CC37; and, further, a fourth column is configured with the -1-th bit to the sixth bit defined as dummy cells CCD4-1 to CCD46 and the seventh bit defined as a carry generation cell CC47. FIG. 69B is a circuit diagram of a carry generation cell of the present prior art. In the figure, the carry generation cell CC is provided with a composite logical gate circuit GC51 for taking the OR logic between the carry generation function Gi and the value obtained by taking the AND logic of the carry propagation function Pi and the carry generation function Gj, and with an AND gate circuit GA52 for taking the AND logic of the carry propagation functions Pi and Pj.
Further, the sum generation unit 500 is provided with eight EXOR gate circuits GX5i arranged in parallel, each taking the exclusive OR logic of the carry Ci-1 output from the carry generation unit 501 and the carry propagation function Pi and outputting the sum Si, as shown in FIG. 69C.
The 8-bit BLC adder of the present prior art is intended to further simplify the configuration of the carry generation unit 501 and reduce the delay time and power consumption by regarding the carry input Cin as the carry generation function lower than the least significant bit by one bit in this way.
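The carry generation described above may be sketched as follows. This is a software illustration under stated assumptions and not the circuit of the cited publication: the function name `blc_add` is hypothetical, the carry propagation function is taken as the OR of the inputs as stated for the input cell, and the sum bit is formed here from Xi XOR Yi XOR Ci-1, which is an assumption of the sketch. The carry input Cin is treated as the carry generation function of the -1 bit, whose carry propagation function is "0".

```python
def blc_add(x: int, y: int, cin: int = 0, width: int = 8) -> tuple[int, int]:
    """Add two `width`-bit numbers with a parallel carry computation; returns (sum, carry_out)."""
    xs = [(x >> i) & 1 for i in range(width)]
    ys = [(y >> i) & 1 for i in range(width)]
    # Index 0 is the "-1 bit": Cin is its generation function, its propagation function is 0.
    g = [cin] + [a & b for a, b in zip(xs, ys)]
    p = [0] + [a | b for a, b in zip(xs, ys)]
    step = 1
    while step < len(g):                     # four columns for the nine rows of the 8-bit case
        g_next, p_next = g[:], p[:]
        for i in range(step, len(g)):        # rows below `step` act as dummy cells
            g_next[i] = g[i] | (p[i] & g[i - step])
            p_next[i] = p[i] & p[i - step]
        g, p = g_next, p_next
        step *= 2
    # g[i] is now the carry into bit i; g[width] is the carry output of the adder.
    total = sum((xs[i] ^ ys[i] ^ g[i]) << i for i in range(width))
    return total, g[width]


print(blc_add(200, 100))
```

Within one column, all rows are computed from the values of the previous column only, so each column corresponds to one parallel step.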
Next, an explanation will be made of an N-bit code-bearing binary comparator as a third prior art of an arithmetic and logic unit using a conventional binary tree-like logical circuit. FIG. 70 is a view of the configuration of an N-bit code-bearing binary comparator receiving as its input the code-bearing binary numbers X and Y:
X=Xn Xn-1 . . . X1 X0, Y=Yn Yn-1 . . . Y1 Y0
and outputting a final conclusion (Greater Than, Less Than, Equal).
An N-bit code-bearing binary comparator is mainly provided with two constituent elements as shown in the figure. One is an (N-1) bit code-less binary comparator 600 receiving as its input the code-less binary numbers X' and Y':
X'=Xn-1 . . . X1 X0, Y'=Yn-1 . . . Y1 Y0
and outputting a working conclusion (Greater Than, Less Than, Equal), and the other one is a code judgement circuit 601 for outputting a final conclusion from the working conclusion of the (N-1) bit code-less binary comparator and the code bits Xn, Yn of the code-bearing binary numbers X and Y. Note that, in the code-bearing binary numbers X and Y, the MSB's (Most Significant Bits) Xn and Yn thereof indicate the code. When MSB=0, it means a positive number, while when MSB=1, it means a negative number.
In this way, in a code-bearing binary comparator of the present prior art, the input bits other than the MSB are input to the (N-1) bit code-less binary comparator 600, where a working comparison result is first obtained. This working comparison result and the MSB of the code-bearing binary numbers X and Y are input to the code judgement circuit 601, where a final comparison result is obtained.
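Further, the contents of the judgement performed in the code judgement circuit 601 are as follows:
(a) Case where the MSB's of the code-bearing binary numbers X and Y are different:
One of the code-bearing binary numbers X and Y is positive and one is negative. Accordingly, the circuit outputs a conclusion of magnitude regardless of the working comparison result.
(b) Case where the MSB's of the code-bearing binary numbers X and Y are the same:
Both of the code-bearing binary numbers X and Y are positive or negative. The conclusion depends upon the working comparison result found by the inputs from the (N-1)-th bit to the zero-th bit. Accordingly, the circuit outputs the working comparison result as the final conclusion.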
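The two-stage structure of this comparator may be sketched as follows. The function names are hypothetical, and it is assumed that the code-bearing numbers are in two's complement representation, which is what makes it possible to use the working comparison result unchanged when both code bits are the same.

```python
def compare_unsigned(a: int, b: int) -> str:
    """Working conclusion of the code-less comparator 600."""
    return "GT" if a > b else "LT" if a < b else "EQ"


def compare_signed(x: int, y: int, width: int = 8) -> str:
    """x and y are `width`-bit patterns whose MSB is the code bit (assumed two's complement)."""
    sign_x = (x >> (width - 1)) & 1
    sign_y = (y >> (width - 1)) & 1
    mask = (1 << (width - 1)) - 1
    working = compare_unsigned(x & mask, y & mask)   # bits other than the MSB
    if sign_x != sign_y:                             # code bits differ: the negative one is smaller
        return "LT" if sign_x else "GT"
    return working                                   # code bits equal: working result is final


print(compare_signed(0b11111111, 0b00000001))
```

The final judgement can start only after the working comparison has finished, which is the cascade connection referred to in the discussion of the delay.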
Next, an explanation will be made of a subtractor according to a fourth prior art of an arithmetic and logic unit using a conventional logical circuit of a binary tree form. The methods used in the conventional subtractor are roughly divided into two by the basic logical equation. The first method is based on the definition of subtraction of a 1-bit binary number. The second method is to add the complement of 2 of Y to X in the subtraction of X-Y.
First, in the first method of the subtractor, the subtraction by two 1-bit binary numbers (X, Y) and an external borrow input Bin is defined as follows. Note that, in the present specification, the OR logic is denoted by a "+" operator, the AND logic is denoted by a "$\cdot$" operator, negative logic is denoted by an overbar, and the exclusive OR logic is denoted by a "$\oplus$" operator, respectively:
Difference $D = X - Y - B_{in} = X \oplus Y \oplus B_{in} \pmod 2$
External borrow output $B_{out} = \overline{X} \cdot Y + Y \cdot B_{in} + B_{in} \cdot \overline{X} \qquad (1)$
Further, the truth table of Equation (1) becomes as shown in FIG. 71A.
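Equation (1) can be checked against ordinary arithmetic for all eight input combinations. In the sketch below, the negated terms of the borrow equation are read as NOT X, and the function name `full_subtract` is hypothetical.

```python
def full_subtract(x: int, y: int, bin_: int) -> tuple[int, int]:
    """1-bit subtraction X - Y - Bin; returns (difference, borrow_out)."""
    d = x ^ y ^ bin_                                   # difference, modulo 2
    not_x = x ^ 1
    bout = (not_x & y) | (y & bin_) | (bin_ & not_x)   # borrow output of Equation (1)
    return d, bout


for x in (0, 1):
    for y in (0, 1):
        for bin_ in (0, 1):
            value = x - y - bin_                       # ranges from -2 to 1
            d, bout = full_subtract(x, y, bin_)
            assert d == value % 2
            assert bout == (1 if value < 0 else 0)
            print(x, y, bin_, "->", d, bout)
```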
On the other hand, the second method of the subtractor takes note of the fact that an operation of "X-Y" is equivalent to an operation such as "X+(Complement of 2 of Y)" and uses an adder to obtain a subtractor. The complement of 2 of Y is obtained by inverting all bits of Y and adding "1". In order to add "1", it is sufficient to make the external carry input Cin of the adder "1". Namely, Cin=Bin. Accordingly, in the case of a 1-bit binary number, the definition of subtraction by the logical equation becomes as follows:
Difference $D = X \oplus \overline{Y} \oplus \overline{C_{in}} = X \oplus Y \oplus C_{in}$
External borrow output $\overline{B_{out}} = X \cdot \overline{Y} + \overline{Y} \cdot \overline{C_{in}} + \overline{C_{in}} \cdot X \qquad (2)$
Further, the truth table of Equation (2) becomes as shown in FIG. 71B. Further, the example of the configuration realized by the adder 500 becomes as shown in FIG. 72.
Comparing Equation (1) and Equation (2), the logical equations of the difference D are exactly the same, while the logical equations of the external borrow output Bout represent the same logical function by the positive logic output in Equation (1) and by the negative logic output in Equation (2). Accordingly, where realization by a logical circuit is being considered, there is no difference between Equation (1) and Equation (2) in terms of the number of gates and delay. For this reason, subtractors using the definition of Equation (1), in which no particular advantages are found, have seldom been used. The already well known method of using an adder to realize a subtractor, as with the subtractor of the definition of Equation (2), is easy and has been widely used.
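The equivalence of the two methods can be confirmed exhaustively for the 1-bit case. The sketch below assumes that the adder receives X, the inverted Y, and the inverted borrow input, and that the carry output of the adder is the inverted borrow output. The function names are hypothetical.

```python
def full_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Ordinary 1-bit full adder; returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def subtract_via_adder(x: int, y: int, bin_: int) -> tuple[int, int]:
    """X - Y - Bin realized with an adder; returns (difference, borrow_out)."""
    s, cout = full_add(x, y ^ 1, bin_ ^ 1)   # inverters on the Y input and on the borrow input
    return s, cout ^ 1                       # the carry output is the negative logic borrow


for x in (0, 1):
    for y in (0, 1):
        for bin_ in (0, 1):
            value = x - y - bin_
            assert subtract_via_adder(x, y, bin_) == (value % 2, 1 if value < 0 else 0)
print("adder-based subtraction agrees with direct subtraction")
```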
Next, an explanation will be made of an arithmetic and logic unit for performing an operation for obtaining an absolute value of a difference according to a fifth prior art of an arithmetic and logic unit using a conventional logic circuit of a binary tree form. As the method of finding the absolute value $|X-Y|$ of the difference (X-Y) of any N-nary non-negative numbers X and Y, not limited to only binary numbers, the following methods are known:
The first method of an operation for obtaining an absolute value of a difference is to compare X and Y and subtract the smaller one from the larger one. Further, the second method is to simultaneously perform the subtraction of (X-Y) and the subtraction of (Y-X) and define the one giving a positive result as the absolute value. Further, the third method is to perform the subtraction of (X-Y) and define this as the absolute value when the result of subtraction is positive and to invert the code and define this as the absolute value when the result of subtraction is negative.
The methods which are usually adopted for an operation for obtaining an absolute value of a difference of binary numbers are the second and third methods. FIG. 73 is a view of the configuration of a unit for performing an operation for obtaining an absolute value of a difference according to the second method. The subtraction of (X-Y) and the subtraction of (Y-X) are simultaneously calculated by two subtractors 511 and 512, and the positive result is selected as the absolute value by the selector 513 based on the code of the result of subtraction of (X-Y).
Further, FIG. 74 is a view of the configuration of a unit for performing an operation for obtaining an absolute value of a difference according to the third method. It is configured by one subtractor 521, a 2-complementer 522 including one incrementer (+1 operation unit), and one selector 523 having a required bit width. The subtraction of (X-Y) is carried out by the subtractor 521, while the complement of 2 of the result of the subtraction is produced by the 2-complementer 522. Based on the code of the result of the operation by the subtractor 521, the selector 523 defines the output of the difference of the subtractor 521 as the absolute value when the result of subtraction is positive and defines the output of the complement of 2 of the 2-complementer 522 as the absolute value when the result of subtraction is negative. Note that, when the result of subtraction is negative, the complement of 2 of the result of subtraction is produced so as to invert the code. The complement of 2 is obtained by inverting all bits of the data and then adding 1 by the incrementer.
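The second and third methods may be sketched as follows for non-negative inputs of a fixed bit width. The function names are hypothetical, and the code (sign) of the result of (X-Y) is modeled here simply by the comparison `x < y`, which stands for the borrow output of the subtractor.

```python
def abs_diff_two_subtractors(x: int, y: int, width: int = 8) -> int:
    """Second method: compute X-Y and Y-X side by side and select the non-negative result."""
    mask = (1 << width) - 1
    d_xy = (x - y) & mask            # subtractor 511
    d_yx = (y - x) & mask            # subtractor 512
    return d_yx if x < y else d_xy   # selector 513


def abs_diff_complement(x: int, y: int, width: int = 8) -> int:
    """Third method: compute X-Y, then take the complement of 2 when the result is negative."""
    mask = (1 << width) - 1
    diff = (x - y) & mask                      # subtractor 521
    complemented = ((diff ^ mask) + 1) & mask  # invert all bits, then +1 (2-complementer 522)
    return complemented if x < y else diff     # selector 523


print(abs_diff_two_subtractors(3, 10), abs_diff_complement(3, 10))
```

In the second function the complement is computed from `diff`, so it cannot begin until the subtraction has finished, whereas the two subtractions of the first function are independent.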
Next, the disadvantages of the above prior art will be considered, beginning with the first prior art (priority encoder). The integration density of semiconductor integrated circuits has been improved year by year, and at present it has become possible to include many functional units. In order to realize a high level of parallel processing in the future as well, it is certain that the number of the functional units will increase along with improvements made in the integration density. Against this background, it can be said that a priority encoder having an arbitration function is a key device, and there is the problem that a high speed is particularly important in the realization thereof.
Next, consideration will be given to the second prior art (8-bit BLC adder). In an arithmetic and logic unit using a logic circuit in the form of a binary tree such as a BLC adder, roughly, an operator "@" having a similar meaning as: ##EQU1## is defined. Namely, in the 8-bit BLC adder of the second prior art, the operator "@" is realized by a logical circuit. A binary tree using this as an element is used. A concrete logical circuit realizing the operator "@" becomes as shown in FIG. 69B.
However, this operator "@", as apparent also from FIG. 69B, includes the logical function (Gi+(Pi*Gj)), that is, the logical function of the composite logical gate circuit GC51 (AND-OR gate circuit). Recently, as a method of increasing the speed in an arithmetic and logic unit, the method of constructing a logic by a transmission gate has been attracting attention. For example, in the reference (Makino, Suzuki, Morinaka, et al, "286 MHz and 64 Bit Floating Decimal Multiplier Having Function Suitable to CG" in Japan Association for Information and Communication Technical Research Report ICD95-146, pp. 13-20, 1995), it is mentioned that the speed of a logical device constructed by a transmission gate is higher than that of a composite logical gate circuit. Further, a composite logical gate circuit has a small load driving ability and is weak against increases in the wiring capacity. Accordingly, there was a problem that an arithmetic and logic unit using a conventional logical circuit of a binary tree form was not suitable for an arithmetic and logic unit constructed by standard cells etc.
Further, particularly, the 8-bit BLC adder of the second prior art is a so-called BLC array which arranges logical circuits of the operator "@" in an array and generates carry signals to the individual bits. It regards the external carry input Cin as the input of a -1 bit and provides the -1-th row also in the BLC array, therefore there was a problem that the number of gates was large and the surface area also became large.
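The reason such an operator can be arranged as a binary tree is that it is associative, which can be confirmed exhaustively. The sketch below assumes that the operator combines two (G, P) pairs into (Gi + Pi·Gj, Pi·Pj), the form suggested by the description of the carry generation cell; the defining equation itself is not reproduced in the text, and the function name `combine` is hypothetical.

```python
from itertools import product


def combine(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Assumed form of the operator: (Gi, Pi) @ (Gj, Pj) = (Gi + Pi*Gj, Pi*Pj)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


pairs = list(product((0, 1), repeat=2))          # every possible (G, P) pair
associative = all(
    combine(combine(a, b), c) == combine(a, combine(b, c))
    for a, b, c in product(pairs, repeat=3)
)
print("associative:", associative)
```

Because the grouping does not affect the result, the pairs of adjacent bits can be combined first, then pairs of pairs, and so on, instead of strictly from the least significant bit upward.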
Further, in an arithmetic and logic unit using a logical circuit of a binary tree form of the above third prior art (N-bit code-bearing binary comparator), the input bits other than the MSB are input to an (N-1) bit code-less binary comparator 600, where a working comparison result is obtained, and then the working comparison result and the MSB of the code-bearing binary numbers X and Y are input to the code judgement circuit 601, where the final comparison result is obtained. However, the logical circuit of the input stage in the (N-1) bit code-less binary comparator 600 has a large number of gates and is complex. Further, since the code judgement circuit is cascade connected to the latter stage of the code-less binary comparator, there was a problem that the delay was increased by exactly the amount of the code judgement circuit 601.
Next, the fourth prior art (subtractor) will be considered. The problems of a conventional subtractor using a complement of 2 may be roughly divided into two. One is the matter concerning the adder forming the core of the subtractor. The other is the matter occurring when converting the adder to a subtractor. The former concerns the adder per se and is therefore omitted here. Consideration will be given to the latter, i.e., the matter occurring when converting the adder to a subtractor.
There are a variety of methods of configuring the adder such as CLA, BCLA, CSelectAdder, CSkipAdder, etc. However, the method of giving the complement of 2 does not depend upon such a method of configuration. When converting an adder to a subtractor, as shown in FIG. 72, it is sufficient to apply all of the bits of Y, inverted, to the Y input of the adder 500 and the borrow input Bin, inverted, to the carry input Cin of the adder 500, respectively. Namely, the difference between an adder and a subtractor resides only in whether or not an inverter (negative logic gate circuit) is attached to the input Y and the input Cin. Accordingly, in a conventional subtractor, there was a problem that both the number of gates and the signal propagation delay were increased due to the addition of the inverters to the adder.
Next, when considering the fifth prior art (arithmetic and logic unit performing operation for obtaining absolute value of difference), among the above three methods, first, the operation unit for obtaining an absolute value using the second method is configured by two subtractors 511 and 512 and one selector having a required bit width as shown in FIG. 73. The superior point of the operation unit for obtaining an absolute value according to this second method is that the operation speed is high. Since the subtractions are carried out in parallel, when CLA and BCLA type subtractors are used, the operation time is proportional to the logarithm ($\log_2 N$) of the data bit width N. However, there is the defect that two subtractors are necessary, so the number of gates is large.
Further, the operation unit for obtaining an absolute value according to the third method is configured by one subtractor 521, one 2-complementer 522, and one selector 523 having a required bit width as shown in FIG. 74. The superior point of the operation unit for obtaining an absolute value of a difference according to this third method is that the amount of hardware is small. The incrementer can be realized by less than half the number of transistors in comparison with the CLA and BCLA type subtractors. The amount of hardware is clearly smaller in comparison with the second method needing two subtractors. However, there is the defect that the operation unit for obtaining an absolute value of a difference according to the third method has a large operation delay and is slow. That is, the delay of the CLA and BCLA type subtractors is proportional to the logarithm ($\log_2 N$) of the data bit width N, and the delay of the high speed CLA and BCLA type incrementers is also proportional to $\log_2 N$. Therefore, since the result of subtraction is input to the incrementer, the total delay becomes proportional to $2 \times \log_2 N$. Accordingly, the signal propagation delay of the operation unit for obtaining an absolute value of a difference using the third method clearly becomes larger than that of the operation unit using the second method.