The present invention relates to an arithmetic logic unit and arithmetic/logical operation method for realizing high-speed multiply, multiply-and-accumulate and other operations that are frequently used in signal processing.
In the past, multi-media data was processed using a microprocessor and a dedicated LSI in combination. However, thanks to the recent remarkable improvement in microprocessor performance, it is now possible for a microprocessor to execute some types of multi-media data processing by itself. The development of a register-divided operation method was one of the factors contributing to this performance improvement. Also, in fields such as image processing and audio processing, an operation method of deriving a single accumulated data word from input array elements is often used.
FIG. 12 illustrates a configuration for a known arithmetic logic unit performing a multiply-and-accumulate operation using a divided register.
In FIG. 12, a register 105 stores 32-bit accumulated data words ZU and ZL as its high- and low-order 32 bits, respectively. A multiplier 101 receives and multiplies together the high-order 16 bits of an input 32-bit data word X (hereinafter, referred to as "XU") and the high-order 16 bits of another input 32-bit data word Y (hereinafter, referred to as "YU") and outputs a 32-bit product. A multiplier 102 receives and multiplies together the low-order 16 bits of the input data word X (hereinafter, referred to as "XL") and the low-order 16 bits of the input data word Y (hereinafter, referred to as "YL") and outputs a 32-bit product. An adder 103 adds up the output data of the multiplier 101 and the data word ZU retained as high-order 32 bits in the register 105 and outputs a 32-bit sum. An adder 104 adds up the output data of the multiplier 102 and the data word ZL retained as low-order 32 bits in the register 105 and outputs a 32-bit sum. The output data of the adder 103 is stored as the high-order 32 bits in the register 105, while the output data of the adder 104 is stored as the low-order 32 bits in the register 105.
In the arithmetic logic unit with such a configuration, the multiplier 101 performs the multiplication XU·YU, the adder 103 adds up the product obtained by the multiplier 101 and ZU that has been stored in the high-order 32 bits in the register 105, and the register 105 stores the result of the multiply-and-accumulate operation XU·YU+ZU, which is the output of the adder 103, as its high-order 32 bits.
In the same way, the multiplier 102 performs the multiplication XL·YL, the adder 104 adds up the product obtained by the multiplier 102 and ZL that has been stored as low-order 32 bits in the register 105, and the register 105 stores the result of the multiply-and-accumulate operation XL·YL+ZL, which is the output of the adder 104, as its low-order 32 bits.
Suppose the multiply-and-accumulate operation is performed N times with the array elements shown in FIG. 13 provided as the input data words X and Y to the arithmetic logic unit and with i, or the number of times the data words are input, changed from 0 through N−1. In that case, (x0·y0+x2·y2+ . . . +x2n−2·y2n−2) will be stored as the result of the operation in the high-order 32 bits of the register 105, while (x1·y1+x3·y3+ . . . +x2n−1·y2n−1) will be stored as the result of the operation in the low-order 32 bits of the register 105.
Problems to be Solved
However, the conventional arithmetic logic unit must perform the multiply-and-accumulate operation N times and then add together (x0·y0+x2·y2+ . . . +x2n−2·y2n−2) stored in the high-order 32 bits in the register 105 and (x1·y1+x3·y3+ . . . +x2n−1·y2n−1) stored in the low-order 32 bits in the register 105 to obtain (x0·y0+x1·y1+x2·y2+ . . . +x2n−2·y2n−2+x2n−1·y2n−1).
To carry out this addition, the high-order 32 bits of the data stored in the register 105 must be transferred to another register, and the low-order 32 bits must be transferred to still another register (or to the same register as that receiving the high-order 32 bits). The transferred data must then be added together.
As can be seen, to obtain a single accumulation result from multiple divided input data words, the conventional arithmetic logic unit needs to perform not only the multiply-and-accumulate operation but also data transfer and addition, which increases the number of processing cycles.
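The conventional flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of FIG. 12: it assumes unsigned 16-bit elements and 32-bit wrap-around in each half of the register (the text does not state signedness or overflow behavior), and the names `conventional_mac` and `combine_halves` are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def conventional_mac(xs, ys):
    """Model of the divided-register MAC (assumed unsigned, wrapping at 32 bits).

    xs, ys: sequences of 32-bit words, each packing two 16-bit elements.
    Returns the 64-bit content of register 105 (ZU in the high half, ZL in the low half).
    """
    reg = 0
    for x, y in zip(xs, ys):
        xu, xl = (x >> 16) & MASK16, x & MASK16
        yu, yl = (y >> 16) & MASK16, y & MASK16
        zu = ((reg >> 32) + xu * yu) & MASK32    # multiplier 101 and adder 103
        zl = ((reg & MASK32) + xl * yl) & MASK32  # multiplier 102 and adder 104
        reg = (zu << 32) | zl
    return reg


def combine_halves(reg):
    """The extra work the text objects to: two transfers and one more addition."""
    hi = reg >> 32      # transfer the high-order 32 bits
    lo = reg & MASK32   # transfer the low-order 32 bits
    return (hi + lo) & MASK32


xs = [(1 << 16) | 2, (3 << 16) | 4]
ys = [(5 << 16) | 6, (7 << 16) | 8]
reg = conventional_mac(xs, ys)
total = combine_halves(reg)
```

In this model the loop alone never produces the single accumulated value; `combine_halves` stands for the additional transfer and addition steps.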
An object of the present invention is to provide an arithmetic logic unit that can obtain a single accumulation result from multiple divided input data words without performing the data transfer and addition.
To solve this problem, an inventive arithmetic logic unit according to the present invention receives (n×M)-bit data words X and Y and outputs a single independent data word Z, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The arithmetic logic unit includes: 1st through nth multipliers, each multiplying together associated data units with the same digit position of the data words X and Y; 1st through nth shifters, each being able to perform bit shifting on an output of associated one of the 1st through nth multipliers; and an adder for adding up outputs of the 1st through nth shifters. If a sum of the outputs of the 1st through nth multipliers is obtained as the data word Z, the 1st through nth shifters perform no bit shifting. But if the outputs of the 1st through nth multipliers are obtained separately for the data word Z, the 1st through nth shifters perform a bit-shifting control in such a manner that the outputs of the 1st through nth multipliers are shifted to respective digit positions not overlapping each other.
In such a configuration, a multiply-and-accumulate operation can be performed with the number of steps reduced. Also, by switching the modes of control performed by the shifter, multiple lines of multiplication can be performed in parallel.
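The two shifter modes can be illustrated with a minimal Python sketch. It assumes unsigned M-bit units and, for the separate mode, a shift of k·2M bits for the kth unit (counting from the least significant unit, k = 0); the text only requires that the shifted products not overlap, so this particular shift amount is an assumption. The names `split_units` and `multiply_units` are hypothetical.

```python
def split_units(word, n, m):
    """Split an (n*m)-bit word into n unsigned m-bit units, least significant first."""
    mask = (1 << m) - 1
    return [(word >> (k * m)) & mask for k in range(n)]


def multiply_units(x, y, n, m, separate):
    """Multiply units at the same digit position, shift each product, then add.

    separate=False: no shifting, so the adder yields the sum of all n products.
    separate=True:  product k is shifted by k*2*m bits (assumed amount), so the
                    adder output holds the n products side by side.
    """
    products = [a * b for a, b in zip(split_units(x, n, m), split_units(y, n, m))]
    shifts = [k * 2 * m if separate else 0 for k in range(n)]
    return sum(p << s for p, s in zip(products, shifts))


x = (3 << 16) | 4
y = (5 << 16) | 6
summed = multiply_units(x, y, n=2, m=16, separate=False)
packed = multiply_units(x, y, n=2, m=16, separate=True)
```

Because the product of two unsigned M-bit units fits in 2M bits, shifting by multiples of 2M keeps the products from overlapping in this model.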
To solve the above problem, an inventive arithmetic logic unit according to the present invention receives (n×M)-bit data words X and Y and outputs a single independent data word Z, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The arithmetic logic unit includes: a register for storing the data word Z; 1st through nth multipliers, each multiplying together associated data units with the same digit position of the data words X and Y; and an adder for adding up outputs of the 1st through nth multipliers and an output of the register and inputting the sum to the register. The arithmetic logic unit performs a sum-of-products operation with the data words X and Y input for multiple cycles.
In such a configuration, even though an increased number of inputs should be provided to a multi-input adder, the increase in circuit size of the adder can be relatively small. Thus, a multiply-and-accumulate operation is realizable with the increase in circuit size minimized.
To solve the above problem, an inventive arithmetic logic unit according to the present invention receives (n×M)-bit data words X and Y and outputs a single independent data word Z, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The arithmetic logic unit includes: a register for storing the data word Z; 1st through nth multipliers, each multiplying together associated data units with the same digit position of the data words X and Y; 1st through nth shifters, each being able to perform bit shifting on an output of associated one of the 1st through nth multipliers; and an adder for adding up outputs of the 1st through nth shifters and an output of the register and inputting the sum to the register. In performing a sum-of-products operation with the data words X and Y input for multiple cycles, if a cumulative sum of products of the 1st through nth multipliers is obtained as the data word Z, the 1st through nth shifters perform no bit shifting. But if the sums of products of the 1st through nth multipliers are obtained separately for the data word Z, the 1st through nth shifters perform such a control that the outputs of the 1st through nth multipliers are shifted to respective digit positions not overlapping each other.
In such a configuration, a sum-of-products operation can be performed with the number of steps reduced. Also, by switching the modes of control performed by the shifter, multiple lines of sum-of-products operations can be performed in parallel.
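A minimal Python sketch of this multi-cycle operation follows. It is a behavioral model under stated assumptions: unsigned units, an unbounded register, a separate-mode shift of k·2M bits for unit k, and lane sums small enough to stay within 2M bits (the sketch does not model how hardware would keep carries from crossing lanes). The name `mac_cycles` is hypothetical.

```python
def mac_cycles(xs, ys, n, m, separate):
    """Accumulate shifted unit products into a register over multiple cycles.

    separate=False: the register ends up holding one cumulative sum of all products.
    separate=True:  the register holds n independent sums side by side
                    (assumes each lane sum fits in 2*m bits).
    """
    mask = (1 << m) - 1
    z = 0  # register storing the data word Z
    for x, y in zip(xs, ys):  # one pair of input words per cycle
        shifted = []
        for k in range(n):
            product = ((x >> (k * m)) & mask) * ((y >> (k * m)) & mask)
            shift = k * 2 * m if separate else 0
            shifted.append(product << shift)
        z = sum(shifted) + z  # adder: shifter outputs plus the register output
    return z


xs = [(1 << 16) | 2, (3 << 16) | 4]
ys = [(5 << 16) | 6, (7 << 16) | 8]
single = mac_cycles(xs, ys, n=2, m=16, separate=False)
lanes = mac_cycles(xs, ys, n=2, m=16, separate=True)
```

With no shifting, the single accumulated result is available directly in the register after the last cycle, with no further transfer or addition in this model.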
To solve the above problem, an inventive arithmetic logic unit according to the present invention receives (n×M)-bit data words X and Y and outputs a data word Z, where X and Y are each composed of a number n of M-bit data units. The arithmetic logic unit is characterized by including: a register for storing the data word Z; a selector for selecting one of the number n of data units of which the data word Y is made up; 1st through nth multipliers, each selecting one of the number n of data units, of which the data word X is made up, and multiplying together the data unit selected and an output of the selector, the data units selected by the multipliers not overlapping each other; 1st through nth shifters, each being able to perform bit shifting on an output of associated one of the 1st through nth multipliers; and an adder for adding up outputs of the 1st through nth shifters and an output of the register and inputting the sum to the register. The arithmetic logic unit is also characterized in that in a pth cycle, the selector selects a pth least significant one of the data units and a qth least significant one of the shifters performs a bit shifting control by (p+q−2)M bits.
In this configuration, even a multiplicand with a bit number equal to or greater than the number of bits input to the multipliers included in the arithmetic logic unit can be multiplied.
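The shift amount (p+q−2)M follows from writing X as the sum of its units Xq·2^((q−1)M) and Y as the sum of its units Yp·2^((p−1)M): each partial product Xq·Yp then carries the weight 2^((p+q−2)M). A minimal Python sketch of this schedule is given below; it assumes unsigned operands and that the qth multiplier takes the qth least significant unit of X, and the name `wide_multiply` is hypothetical.

```python
def wide_multiply(x, y, n, m):
    """Multiply two (n*m)-bit unsigned words using only m-bit by m-bit products."""
    mask = (1 << m) - 1
    z = 0  # register storing the data word Z
    for p in range(1, n + 1):  # p-th cycle
        y_unit = (y >> ((p - 1) * m)) & mask  # selector: p-th least significant unit of Y
        cycle_sum = 0
        for q in range(1, n + 1):  # q-th multiplier and q-th shifter
            x_unit = (x >> ((q - 1) * m)) & mask
            cycle_sum += (x_unit * y_unit) << ((p + q - 2) * m)
        z += cycle_sum  # adder: shifter outputs plus the register output
    return z


x, y = 0x12345678, 0x9ABCDEF0
z = wide_multiply(x, y, n=2, m=16)
```

After n cycles the register holds the full product of the two wide operands, even though every individual multiplication is only M bits by M bits.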
To solve the above problem, an inventive arithmetic logic unit according to the present invention receives 2M-bit data words X and Y and outputs a 4M-bit data word Z. The arithmetic logic unit includes: a first register for storing bit-by-bit carries C resulting from additions; a second register for storing bit-by-bit sums S resulting from the additions; a third register for storing the data word Z; a first decoder for receiving and decoding high-order M bits of the data word X; a second decoder for receiving and decoding low-order M bits of the data word X; first and second selectors, each selecting either high- or low-order M bits of the data word Y; a first partial product generator for receiving output data of the first decoder and the first selector and generating partial products for a multiply-and-accumulate operation; a second partial product generator for receiving output data of the second decoder and the second selector and generating partial products for the multiply-and-accumulate operation; a first full adder for adding up the partial products generated by the first partial product generator; a second full adder for adding up the partial products generated by the second partial product generator; a data extender/shifter that receives output data of the first and second full adders and can perform data extension and data shifting on the data; a carry-propagation adder for receiving, and performing a carry-propagation addition on, the bit-by-bit carries C and the bit-by-bit sums S that have been stored in the first and second registers and outputting the result to the third register; a third selector for selectively outputting either the data stored in the third register or zero data; a fourth selector for selectively outputting either the output data of the carry-propagation adder or zero data; and a third full adder for receiving, and performing a full addition on, the output data of the data extender/shifter and the output data of the third and fourth selectors and for inputting the bit-by-bit carries C and the bit-by-bit sums S to the first and second registers, respectively.
In such a configuration, a sum-of-products operation can be performed with the number of steps and the circuit size both reduced. Also, by switching the modes of control performed by the shifter, multiple lines of sum-of-products operations can be performed in parallel or a multiplicand with a bit number equal to or greater than the number of bits input to the multipliers can be multiplied with the circuit size reduced.
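The role of the bit-by-bit sums S and carries C can be illustrated with a generic carry-save accumulation in Python. This is a simplified sketch of the general technique only, not a model of the decoders, partial product generators, or selectors listed above: it assumes non-negative integers and folds one term per step into the (S, C) pair, resolving carries with a single carry-propagating addition at the end. The names `carry_save_add` and `accumulate_carry_save` are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (s, carry) such that s + carry == a + b + c."""
    s = a ^ b ^ c                               # bit-by-bit sums
    carry = ((a & b) | (a & c) | (b & c)) << 1  # bit-by-bit carries, weighted one bit up
    return s, carry


def accumulate_carry_save(terms):
    """Keep the running total in redundant (S, C) form; propagate carries once."""
    s, c = 0, 0  # stand-ins for the registers holding S and C
    for t in terms:
        s, c = carry_save_add(s, c, t)  # no carry propagation inside the loop
    return s + c  # single carry-propagation addition


total = accumulate_carry_save([15, 24, 31])
```

Keeping the accumulator in redundant form means the slow carry-propagating addition is needed only when the final value of Z is read out, which is the usual motivation for this arrangement.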
A solution worked out by the present invention is an arithmetic/logical operation method for calculating a single independent data word Z from input (n×M)-bit data words X and Y, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The method includes the steps of: multiplying together associated data units with the same digit position of the data words X and Y, thereby obtaining respective products; shifting bits of the products obtained in the multiplying step; and adding up values obtained in the shifting step, thereby obtaining a sum. If a sum of the number n of products obtained in the multiplying step is calculated as the data word Z, no bit shifting is performed in the shifting step. But if the number n of products are obtained separately in the multiplying step for the data word Z, bit-shifting is performed in the shifting step in such a manner that the number n of products are shifted to respective digit positions not overlapping each other.
According to the present invention, a multiply-and-accumulate operation can be performed with the number of steps reduced.
A solution worked out by the present invention is an arithmetic/logical operation method for calculating a single independent data word Z from input (n×M)-bit data words X and Y, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The method includes the steps of: multiplying together associated data units with the same digit position of the data words X and Y, thereby obtaining respective products; adding up the number n of products obtained in the multiplying step, thereby obtaining a sum; and performing a sum-of-products operation on the sums obtained in the adding step with the data words X and Y input for multiple cycles.
According to the present invention, a multiply-and-accumulate operation can be performed with the increase in circuit size minimized.
A solution worked out by the present invention is an arithmetic/logical operation method for calculating a single independent data word Z from input (n×M)-bit data words X and Y, where X and Y are each composed of a number n of M-bit data units that are independent of each other. The method includes the steps of: multiplying together associated data units with the same digit position of the data words X and Y, thereby obtaining respective products; shifting bits of the products obtained in the multiplying step; adding up values obtained in the shifting step, thereby obtaining a sum; and performing a sum-of-products operation on the sums obtained in the adding step with the data words X and Y input for multiple cycles. If a cumulative sum of the number n of products, which have been generated in the multiplying step, is obtained as the data word Z, no bit shifting is performed in the shifting step. But if the sums of the number n of products, which have been generated in the multiplying step, are obtained separately for the data word Z, bits of the number n of products are shifted to respective digit positions not overlapping each other in the shifting step.