The present invention relates to a signal processor and a multiply-accumulate unit with a rounding function for use in such a signal processor.
Signal processors read data from memory and process the read data in various ways, e.g., by addition, subtraction, logical operation, and multiplication. The processing capability of signal processors is greatly increased by incorporating a multiply-accumulate unit which can execute, in one processor cycle, the multiply-accumulate operations that frequently appear in signal processing programs for image processing, sound processing, and the like.
FIG. 1 of the accompanying drawings shows a conventional signal processor having several execution units, registers, and memory. The signal processor shown in FIG. 1 is introduced in “IEEE VLSI SIGNAL PROCESSING, VI”, pp. 93-101, 1993.
As shown in FIG. 1, the conventional signal processor has eight 40-bit registers (hereinafter referred to as “registers 50”), MAC (multiply-accumulate) unit 52, MUX (multiplexer) 53, ALU (arithmetic and logical unit) 54, BSFT (barrel shift unit) 55, X memory 57x, and Y memory 57y. X memory 57x and Y memory 57y are hereinafter referred to as memory 57x and memory 57y, respectively.
Memory 57x and memory 57y are connected to registers 50 by respective data buses 58x, 58y. MAC unit 52, ALU 54, MUX 53, and BSFT 55 are connected to output lines 51a, 51b, and 51c from registers 50.
MAC unit 52 carries out multiply-accumulate operations. ALU 54 carries out an arithmetic or logical operation using an immediate value imm. selected by MUX 53 or a value from registers 50. BSFT 55 carries out an arithmetic or logical shift using an immediate value imm. selected by MUX 53 or a value from registers 50.
Multiply-accumulate operations that frequently appear in signal processing programs are operations to perform a multiplication and an accumulation according to the following equation (1):
A = A + B × C    (1)
Specifically, the product of multiplicand B and multiplier C is added to addend A on the right side of equation (1), and the sum is assigned to A on the left side of equation (1). In most cases, addend A on the right side is the result of a preceding multiply-accumulate operation, although in some cases it may be read from memory. Operations represented by equation (1) in which the symbol “+” on the right side is replaced with the symbol “−” are also referred to as multiply-accumulate operations.
In general multiply-accumulate units which deal with fixed-point numerical data, multiplicand B and multiplier C on the right side of equation (1) are usually 16 bits wide for practical and economic reasons. Since the product of multiplicand B and multiplier C is 32 bits wide at maximum, each of addend A on the right side of equation (1) and the sum A on the left side of equation (1) needs to be 32 bits wide or more.
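As an illustrative sketch only, equation (1) and the operand widths discussed above can be modeled in Python. The names `to_signed` and `mac`, the two's-complement interpretation of the operands, and the 40-bit accumulator width (borrowed from the 40-bit registers of FIG. 1) are assumptions made for this example and are not taken from any particular hardware description.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac(a, b, c, subtract=False, acc_bits=40):
    """Model A = A + B*C (or A = A - B*C) with 16-bit B and C.

    acc_bits=40 is an assumption for illustration; results that do not
    fit are simply wrapped to acc_bits bits.
    """
    product = to_signed(b, 16) * to_signed(c, 16)  # at most 32 bits wide
    total = a - product if subtract else a + product
    return to_signed(total, acc_bits)
```

For instance, `mac(10, 3, 4)` evaluates 10 + 3 × 4, and the largest-magnitude signed product, (−32768) × (−32768) = 2^30, still fits in 32 bits, which is why the addend and the sum need at least that width.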
For the above reason, general signal processors have registers of 32 bits or more to hold the results of multiply-accumulate operations. When two 16-bit data are held in one register of such a signal processor, they are placed in the 31st-16th bits and the 15th-0th bits of the register, respectively.
Let us describe the multiply-accumulate operation according to the equation (1) which is carried out by the conventional signal processor shown in FIG. 1 with reference to FIG. 2 of the accompanying drawings. FIG. 2 shows a sequence to perform a multiply-accumulate operation with registers 50 and MAC unit 52 of the conventional signal processor shown in FIG. 1.
Multiplicand B, multiplier C, and addend A on the right side of the equation (1) are read from memory connected to the signal processor into register 502, register 503, and register 501, respectively.
Multiplicand B and multiplier C may be placed in either 31st-16th bits or 15th-0th bits of registers 502, 503. It is assumed here that multiplicand B is placed in 31st-16th bits of register 502 and multiplier C is placed in 15th-0th bits of register 503. Addend A is placed in all the bits of register 501. In FIG. 2, numerals shown beneath registers 501, 502, 503 indicate bit positions therein.
Then, addend A is stored in ACC (accumulator) 523 of MAC unit 52. Multiplicand B and multiplier C are supplied to multiply unit 521 in MAC unit 52, which calculates the product of multiplicand B and multiplier C. The calculated product of multiplicand B and multiplier C is added to addend A from ACC 523 by adder/subtractor (±) 522. The sum produced by adder/subtractor 522 is temporarily stored in ACC 523, and written back via output line 56 into register 501 which has stored addend A.
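The sequence of FIG. 2 may be sketched as follows, assuming that register contents are modeled as unsigned 40-bit patterns, that operands are two's-complement values, and that an out-of-range sum simply wraps. The function name `conventional_mac` and these conventions are hypothetical and serve only to make the data flow concrete.

```python
MASK40 = (1 << 40) - 1  # registers 50 are 40 bits wide


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def conventional_mac(r501, r502, r503):
    """Return the new 40-bit content of register 501 after A = A + B*C."""
    b = to_signed(r502 >> 16, 16)  # multiplicand B: 31st-16th bits of register 502
    c = to_signed(r503, 16)        # multiplier C: 15th-0th bits of register 503
    acc = to_signed(r501, 40)      # addend A is loaded into ACC 523
    acc += b * c                   # multiply unit 521, then adder/subtractor 522
    return acc & MASK40            # written back to register 501 (wrapping assumed)
```

With A = 10 in register 501, B = 3 in the upper half of register 502, and C = 4 in the lower half of register 503, the model writes 22 back into register 501.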
Now, let us consider a process of reading an addend as 16-bit data from memory, performing a certain multiply-accumulate operation on the addend, and saving the result as 16-bit data in memory on the conventional signal processor shown in FIG. 1. In the process, all input and output data are 16 bits wide regardless of interim data sizes.
The above process occurs when the multiplicand or multiplier in a multiply-accumulate operation is used as the addend in another multiply-accumulate operation. In this process, since the addend, the multiplicand, and the multiplier are all 16 bits wide, the result may overflow depending on their values. However, the multiply-accumulate operation can be performed without an overflow if the addend, the multiplicand, and the multiplier are kept within a suitable range.
Let us describe the multiply-accumulate operation in the above process on the conventional signal processor shown in FIG. 1 with reference to FIG. 3 of the accompanying drawings. In FIG. 3, multiplicand B, multiplier C, and addend A are read from memory connected to the signal processor into 31st-16th bits of register 502, 15th-0th bits of register 503, and 31st-16th bits of register 501, respectively.
When addend A expressed as fixed-point 16-bit data is read into register 501, the sign of addend A is inserted into 39th-32nd bits, addend A into 31st-16th bits, and “0” into 15th-0th bits. A state in which the data are stored in registers 501, 502, 503 is referred to as state 50n. A state of the registers after the multiply-accumulate operation is referred to as state 50n1. A state of the registers after the result is rounded off is referred to as state 50n2.
In state 50n1 which follows state 50n, the result A+B×C is stored in register 501. In state 50n2, the result of the multiply-accumulate operation, which is 40-bit wide, is rounded off into 16 bits by ALU 54, and the rounded result is stored in register 501. Finally, the rounded result is stored in memory.
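The three states 50n, 50n1, and 50n2 can be illustrated with the following sketch. The rounding mode is not specified in the description above; this example assumes rounding to nearest by adding 0x8000 and keeping the 31st-16th bits, without saturation. The function names and the treatment of operands as plain two's-complement integers are likewise assumptions.

```python
MASK40 = (1 << 40) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def load_addend(a16):
    """Sign in 39th-32nd bits, addend in 31st-16th bits, 0 in 15th-0th bits."""
    return (to_signed(a16, 16) << 16) & MASK40


def round_to_16(reg40):
    """Assumed rounding: add half an LSB of the kept field, keep bits 31-16."""
    return ((to_signed(reg40, 40) + 0x8000) >> 16) & 0xFFFF


def mac_then_round(a16, b16, c16):
    """Return (state 50n, state 50n1, state 50n2) for register 501."""
    state_50n = load_addend(a16)
    product = to_signed(b16, 16) * to_signed(c16, 16)
    state_50n1 = (to_signed(state_50n, 40) + product) & MASK40  # MAC unit 52
    state_50n2 = round_to_16(state_50n1)                        # extra step on ALU 54
    return state_50n, state_50n1, state_50n2
```

In this model the rounding is a separate step after the multiply-accumulate operation, which corresponds to the additional ALU operation required on the conventional processor.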
There are two problems in the above processing sequence. The first problem is that the data size of addend A read from memory and the data size of an addend required by MAC unit 52 are different from each other. Since addend A is 16-bit data, it has to be expanded into 40-bit data for multiply-accumulate operations. Therefore, two 16-bit addends cannot be placed in one register.
The second problem is that the data size of the result of the calculation performed by the MAC unit and the data size of the result when it is stored in memory are different from each other. Because the MAC unit of the conventional signal processor outputs a 40-bit result, when it is to be stored as 16-bit data into memory, the 40 bits need to be rounded off into 16 bits. Consequently, a rounding process has to be carried out in addition to the multiply-accumulate operation.
If the bus size between memory and the register is increased to 32 bits in order to improve performance of the conventional signal processor, then two 16-bit data can simultaneously be read through each data bus.
Let us analyze multiply-accumulate operations with 16-bit input and output data on such a signal processor. Since each of the multiplicand and the multiplier is expressed as 16-bit data, the signal processor can simultaneously read both the multiplicand and the multiplier by exploiting its 32-bit data transfer capability. The two 16-bit data thus read are stored respectively in 31st-16th bits and 15th-0th bits of a register.
Similarly, two addends may simultaneously be read into a register by exploiting the 32-bit data transfer capability. However, the reading process does not work well because these two addends are placed in one register in spite of the fact that each of these addends must be placed in 31st-16th bits of an individual register, and that 15th-0th bits of the register must be filled with “0” for following multiply-accumulate operations.
Specifically, if two addends are stored in 31st-16th bits and 15th-0th bits of a register, then no correct operation can be performed. Consequently, addends need to be read, one at a time, into a register.
If two addends are read into one register, then they have to be moved into distinct registers by means of register-to-register transfer or shift operations. In this case, even though the number of load instructions to read two addends from memory may be reduced to half, the total number of instructions in terms of reading two addends cannot be reduced to half because extra data transfer instructions are required to separate two addends within one register. This means that the 32-bit data transfer capability between registers and memory cannot substantially be exploited.
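The extra separation work described above might look like the following sketch, in which one 32-bit load is followed by two register-level operations that place each addend in the 31st-16th bits of its own 40-bit register. The function name `split_packed_addends` and the particular shift and sign-extension steps are hypothetical; an actual instruction sequence depends on the processor.

```python
MASK40 = (1 << 40) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def split_packed_addends(reg32):
    """Split two packed 16-bit addends into two 40-bit register images.

    Each image holds the sign in bits 39-32, the addend in bits 31-16,
    and zeros in bits 15-0, as the conventional MAC unit requires.
    """
    upper = (to_signed(reg32 >> 16, 16) << 16) & MASK40  # extra operation 1
    lower = (to_signed(reg32, 16) << 16) & MASK40        # extra operation 2
    return upper, lower
```

One load is thus followed by two further operations, so the instruction count for obtaining two usable addends is not halved even though the number of load instructions is.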
As described above, the conventional signal processor suffers from some problems with regard to the handling of 16-bit addends. One of the problems is that excess resources are occupied in multiply-accumulate operations where all inputs and outputs are 16 bits wide. A 16-bit addend must be expanded in an appropriate manner to the width of the register when it is placed into the register, in order to match the data size required by the MAC unit.
Furthermore, since results of multiply-accumulate operations have the same data size as the size of registers, they need to be rounded off into 16-bit data in order to be stored into memory. This problem causes another problem in which the efficiency of data transfer between memory and the register cannot be increased.
For example, even if data bus widths between registers and memory in FIG. 1 are doubled to 32 bits so as to be able to read one 32-bit data or two 16-bit data through each data bus, it is indispensable to transfer addend data between registers in order to read two 16-bit addends simultaneously into registers through the doubled data bus and to efficiently process those 16-bit addends. Consequently, the efficiency of data transfer between registers and memory, which is required until the operations are carried out, cannot be increased.
An object of the present invention, therefore, is to provide a signal processor which can solve the above problems and efficiently handle 16-bit addends, or to provide a multiply-accumulate unit with a rounding function for use in such a signal processor. A more specific object of the present invention is to provide a signal processor which is able to execute efficiently 16-bit multiply-accumulate operations taking into account the position of an addend in a register, or to provide a multiply-accumulate unit with a rounding function for use in such a signal processor.
A signal processor based on the present invention includes a multiply-accumulate unit with a rounding function, which performs a multiply-accumulate operation on an addend, a multiplicand, and a multiplier. The signal processor has a number of registers connected to the multiply-accumulate unit with the rounding function. The multiply-accumulate unit with the rounding function comprises selecting inputting means for entering an addend supplied selectively from different positions in one of said registers, rounding means for performing a rounding process to convert data of a larger data size into data of a smaller data size on the result of the multiply-accumulate operation where the addend is selectively entered by said selecting inputting means, and selection outputting means for outputting the result of the multiply-accumulate operation rounded by said rounding means selectively to different positions in one of said registers.
A multiply-accumulate unit with a rounding function based on the present invention includes a multiply-accumulate unit for performing a multiply-accumulate operation on an addend, a multiplicand, and a multiplier. The multiply-accumulate unit with the rounding function works together with a number of registers connected to the unit, and comprises selecting inputting means for entering an addend supplied selectively from different positions in one of said registers which is connected externally thereto, rounding means for performing a rounding process to convert data of a larger data size into data of a smaller data size on the result of the multiply-accumulate operation based on the addend selectively entered by said selecting inputting means, and selection outputting means for outputting the result of the multiply-accumulate operation rounded by said rounding means selectively to different positions in one of said registers.
Specifically, the multiply-accumulate unit with the rounding function based on the present invention has a selection inputting and expanding means, a rounding and selection outputting means, and a multiply-accumulate unit. The operation mode of the multiply-accumulate unit with the rounding function is controlled by two control signals, Round and Position. If control signal Round is “0”, then the multiply-accumulate unit with the rounding function based on the present invention operates as the conventional multiply-accumulate unit.
If control signal Round is “1”, the multiply-accumulate unit with the rounding function based on the present invention operates differently depending on control signal Position. In this case, the selection inputting and expanding means expands an addend into data having the data size required by the multiply-accumulate unit, based on control signal Position, which indicates the position of the 16-bit addend in the externally connected register.
The rounding and selection outputting means rounds off the result of the multiply-accumulate operation whose data size corresponds to the data size of the register into 16-bit data, and then outputs the rounded data to the position of the addend in the register which is indicated by control signal Position.
With the above arrangement, it is possible to perform multiply-accumulate operations with the rounding operation on each 16-bit addend placed in 31st-16th bits or 15th-0th bits of registers, which are 32-bit wide or more and are connected to the multiply-accumulate unit with the rounding function, without affecting other bits.
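The behavior under control signals Round and Position can be sketched as follows. The encoding of Position (1 for the 31st-16th bits, 0 for the 15th-0th bits), the rounding mode (add 0x8000, keep bits 31-16, no saturation), the 40-bit register width, and all names are assumptions for illustration only; the description above does not fix them.

```python
MASK40 = (1 << 40) - 1
UPPER, LOWER = 1, 0  # hypothetical encoding of control signal Position


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac_with_rounding(reg, b16, c16, round_sig, position=UPPER):
    """Return the new register content after a multiply-accumulate operation."""
    product = to_signed(b16, 16) * to_signed(c16, 16)
    if round_sig == 0:
        # Round = 0: behave as the conventional multiply-accumulate unit.
        return (to_signed(reg, 40) + product) & MASK40
    shift = 16 if position == UPPER else 0
    # Selection inputting and expanding: pick the 16-bit addend, widen it.
    addend16 = (reg >> shift) & 0xFFFF
    acc = (to_signed(addend16, 16) << 16) + product
    # Rounding and selection outputting: round to 16 bits (assumed mode)
    # and write back to the addend's own position, leaving other bits intact.
    rounded = ((acc + 0x8000) >> 16) & 0xFFFF
    return (reg & ~(0xFFFF << shift) & MASK40) | (rounded << shift)
```

Under these assumptions, two addends packed in one register can be updated one after the other by calling the function once with `UPPER` and once with `LOWER`, with no intervening register-to-register transfer.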