This invention relates generally to a processor and is more particularly concerned with a microprocessor having a multimedia multiply-subtractor/adder which assures critical processing precision by properly combining multiplications, additions and subtractions in efficient execution of mass multimedia processing.
In the conventional microprocessor, multiplication of numerical data by numerical data is generally carried out. In conventional multiplication, either an unsigned multiplicand is multiplied by an unsigned multiplier or a signed multiplicand is multiplied by a signed multiplier. That is, the multiplicand and the multiplier are generally of the same type.
With data handled by multipliers and processors becoming more diversified, there has been proposed a processor including an embedded piece of hardware capable of multiplying a signed multiplicand by a signed multiplier or an unsigned multiplicand by an unsigned multiplier. In addition, there has been proposed a multiplier capable of carrying out four types of multiplication as disclosed in Japanese Patent Prepublication Nos. Sho 63-623 and Sho 64-88831. Two of the four types are the conventional multiplication of an unsigned multiplicand by an unsigned multiplier and the conventional multiplication of a signed multiplicand by a signed multiplier. The remaining two of the four types are multiplication of an unsigned multiplicand by a signed multiplier and multiplication of a signed multiplicand by an unsigned multiplier.
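The four operand combinations can be illustrated by a minimal software sketch. The function names `as_signed` and `multiply` and the 16-bit default width are assumptions made for illustration only and do not come from the cited publications.

```python
def as_signed(bits, n=16):
    """Interpret an n-bit pattern as a two's-complement signed integer."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits


def multiply(x_bits, y_bits, x_signed, y_signed, n=16):
    """Multiply two n-bit patterns; each operand is treated as signed or unsigned."""
    mask = (1 << n) - 1
    x = as_signed(x_bits, n) if x_signed else x_bits & mask
    y = as_signed(y_bits, n) if y_signed else y_bits & mask
    return x * y


# The same two bit patterns yield a different product for each of the four types.
for x_signed in (False, True):
    for y_signed in (False, True):
        print(x_signed, y_signed, multiply(0xFFFF, 0x8000, x_signed, y_signed))
```

The sketch shows why the operand type must be known to the multiplier: the bit patterns alone do not determine the product.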
In the functional aspect of a processor, it is necessary to support a multiply-addition instruction in order to enhance the signal processing performance and the multimedia processing performance of the processor. In recent years, the number of processors incorporating a dedicated processing unit having the multiply-addition function has been increasing. In addition, in order to make the processing configuration suitable for multimedia processing, the number of processed bits is optimized, and there has been adopted a parallel processing mechanism called SIMD (Single Instruction stream-Multiple Data stream) wherein all input/output bits of a processor are divided into a plurality of blocks each having a size of n bits where n does not exceed ½ the number of input/output bits. An example of the processor adopting the SIMD mechanism is Intel's MMX Pentium processor.
In the multiplier or the processor described above, however, the inventor has discovered a number of problems.
In order to identify the characteristic of a multimedia processing function to which the present invention is applied, a multiply-addition processing algorithm of a discrete cosine transform used in picture processing is considered as an example. Since the processing is picture processing, the computation formula is 2-dimensional and the multiply term is a double product such as (X(i, j)·B(i))·C(j) where i and j are subscripts of the addition in the two dimensions respectively, X(i, j) is a variable or a picture-data value and B(i) and C(j) are cosine constants. Normally, 2-dimensional multiply-addition is split into two 1-dimensional operations. That is, first of all, multiply-addition of D(j)=X(i, j)·B(i) is carried out with respect to i. Then, multiply-addition of Y=D(j)·C(j) is carried out with respect to j.
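The split into two 1-dimensional operations can be checked with a short sketch. The function names and the small integer test data are hypothetical; actual cosine constants would be fractional values.

```python
def multiply_add_2d(X, B, C):
    """Direct form: sum over i and j of (X[i][j] * B[i]) * C[j]."""
    return sum(X[i][j] * B[i] * C[j]
               for i in range(len(B))
               for j in range(len(C)))


def multiply_add_separable(X, B, C):
    """Two 1-dimensional passes.

    First pass:  D[j] = sum over i of X[i][j] * B[i]
    Second pass: Y    = sum over j of D[j] * C[j]
    """
    D = [sum(X[i][j] * B[i] for i in range(len(B))) for j in range(len(C))]
    return sum(D[j] * C[j] for j in range(len(C)))


X = [[1, 2], [3, 4]]   # picture data (illustrative values)
B = [5, 6]             # stand-ins for the cosine constants B(i)
C = [7, 8]             # stand-ins for the cosine constants C(j)
print(multiply_add_2d(X, B, C), multiply_add_separable(X, B, C))
```

Both forms give the same result in exact arithmetic. The precision issue discussed next arises because D(j) must be reduced in width before it is used in the second pass.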
In these operations, the following problem is raised. In general, a product X×Y of a multiplication of an n-bit multiplier by an n-bit multiplicand is 2n bits in size as shown in FIG. 2. For n=16, for example, the product of a multiplication of a 16-bit multiplier by a 16-bit multiplicand is 32 bits in size. Since the processing is 2-dimensional, however, the product must be multiplied by a 1-dimensional multiply-addition result once again. In this case, since the product obtained as a result of the first multiply-addition is 32 bits in size, in the second multiply-addition, the 32-bit result must be multiplied by a 16-bit multiplier. In this case, since the size of the multiplicand is different from the size of the multiplier, the same multiplier cannot be used. It is thus desirable to reduce the result of the first multiply-addition to 16 bits so that, in the second multiply-addition, the 16-bit result is multiplied by a 16-bit multiplier to give a 32-bit product which is also reduced to a 16-bit final result. It is thus necessary to approximate the 32-bit product of a 16-bit multiplicand and a 16-bit multiplier by a 16-bit number.
Consider the following case. As shown in FIG. 2, data 10 is a number having a sign 11. A constant 20 is also a number having a sign 21. Used as a multiplicand and a multiplier with a uniform format, the numerical data 10 and the constant 20 are subjected to a multiply-addition with a size of 16 bits × 16 bits to give a 32-bit product 30. Then, the 32-bit product 30 is approximated by a number with a size of 16 bits obtained as a result of extraction of the 16 high-order bits from the product 30. The multiplication result 30 has 2 sign bits, namely, bits 31 and 32. Strictly speaking, the sign bit s is shifted to the high-order bit of the 2 sign bits, namely, bit 32. The 16-bit approximation number, however, must be a signed number having a precision of 15 bits. In order to solve this problem, the multiplication result 30 is shifted to the left by 1 bit to discard the extra sign bit, that is, bit 32. That is, in order to express the final cumulative result 40 by an approximation number with a size of 16 bits, the multiplication result 30 is shifted to the left by 1 bit with its precision being maintained and stored in a cumulative register. The critical precision is considered to be insufficient unless the operations described above are carried out in the application of the SIMD technique to picture processing.
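The effect of the 1-bit left shift can be sketched in fixed-point terms. The sketch assumes that both operands are 16-bit signed fractions with 15 fractional bits (commonly called Q15); this format and the function name are assumptions, not terms used in the description above. Saturation of the single overflowing case (−1 × −1) is not modeled.

```python
def q15_mul_signed(x, c):
    """Multiply two signed 16-bit Q15 fractions and return a 16-bit Q15 result.

    x and c are Python integers in the range -32768..32767, each
    representing the real value (integer / 2**15).
    """
    product = x * c          # 32-bit result with 30 fractional bits and 2 sign bits
    shifted = product << 1   # discard the redundant sign bit: 31 fractional bits
    return shifted >> 16     # keep the 16 high-order bits: 15 fractional bits


half = 1 << 14               # 0.5 in Q15
print(q15_mul_signed(half, half))   # 0.25 in Q15
print((half * half) >> 16)          # without the shift only 14 fractional bits remain
```

Without the shift, the extracted 16 bits carry one redundant sign bit and one fewer fractional bit, which is the loss of precision described above.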
As will be appreciated from the above description, in order to assure the precision of multiplication of signed numbers as part of expansion of the conventional multiply-addition function, a function to shift a multiplication result to the left by 1 bit if necessary and to add the left-shifted multiplication result to a cumulative result obtained so far is required in multimedia processing. For this reason, there has been proposed a processing instruction whereby, in fixed-point processing of signed numbers, a multiplication result is shifted to the left by 1 bit and the position of the fixed point is restored. For details, refer to U.S. Pat. No. 5,754,456.
In the case of a constant that can have only a positive value, there is raised a problem that the precision of the absolute value is degraded by 1 bit. With the method described above, this problem is unsolved. In addition, since this method requires a shift operation, its implementation is difficult and the latency increases. With the rising operating frequency of recent processors, it is becoming more difficult to implement a processing unit with a complex function having a latency within one machine cycle. A latency of 2 to 3 or even more machine cycles may be required. As a result, multiply-addition for cumulatively adding results becomes more and more difficult to implement by using a processor with a low throughput. An example of such multiply-addition is:
A←A+X[1]×Y[1],
A←A+X[2]×Y[2],
...
A←A+X[n]×Y[n]
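The cumulative operation above can be written as a simple loop. The function name `multiply_accumulate` is hypothetical. Each iteration reads the value of A produced by the preceding iteration, so the latency of one multiply-addition directly limits how quickly consecutive operations can be issued.

```python
def multiply_accumulate(xs, ys, acc=0):
    """Compute A <- A + X[k] * Y[k] for k = 1..n and return the final A."""
    for x, y in zip(xs, ys):
        acc = acc + x * y   # depends on the acc written by the previous step
    return acc


print(multiply_accumulate([1, 2, 3], [4, 5, 6]))
```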
Several problems raised in actual applications are also revealed in a document describing the four combination types of multiplication, that is, the two conventional combination types of multiplication, namely, the multiplication of signed numbers and the multiplication of unsigned numbers, and the two new combination types of multiplication, namely, the multiplication of a signed multiplicand by an unsigned multiplier and the multiplication of an unsigned multiplicand by a signed multiplier. If information indicating whether or not a number has a sign is included in the numerical data of the number, for example, the degree of precision to express the numerical value of the number is decreased. If information indicating which type of multiplication is to be taken is included in a multiply instruction, on the other hand, the instruction requires a field of 2 bits for describing such information. For a processor with a limited number of instruction definition fields, addition of such a field will raise a problem of an unavoidable need to replace another usable instruction with a multiplication instruction including such information.
It is thus an object of the present invention addressing the problems described above to provide a multimedia multiply-adder having an instruction which allows a signed number to be multiplied by an unsigned number at a high speed.
It is another object of the present invention to provide a multimedia processor that allows a multiply-addition to be carried out at a high speed.
Other objects and novel characteristics of the present invention will become apparent from the description in the specification and the accompanying drawings.
Several aspects of the invention disclosed in the present application are outlined below.
In order to implement fixed-point processing without degrading the critical processing precision in a multimedia multiply-adder, the present invention provides a means for implementing a technique of multiplying numbers of two different types, that is, multiplying a signed number by an unsigned number. In the case of a multimedia application, either the multiplicand X or the multiplier Y in a multiplication of X×Y is a constant or a coefficient from the beginning. Referring to FIG. 2, since a constant can be treated as a positive or unsigned number, in the multiplication of numerical data 10 having a sign 11 by a constant 20 having no sign it is not necessary to shift the position of the sign bit 31 of the result 30. Thus, a 1-bit left-shift register like one shown in FIG. 2 is not required. In addition, since all bits of the constant 20 shown in FIG. 1 are valid, the sign bit 21 for making the constant 20 of FIG. 2 a signed number is not required. Thus, the precision degradation due to an insufficiency of 1 bit is avoided.
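The benefit of an unsigned constant can be sketched as follows. The sketch assumes signed data with 15 fractional bits and an unsigned constant with 16 fractional bits; these formats, the function name and the example constant are assumptions chosen for illustration.

```python
import math


def q15_mul_unsigned_const(x, c):
    """Multiply signed 16-bit data x (15 fractional bits) by an unsigned
    16-bit constant c (16 fractional bits, value = c / 2**16).

    The 32-bit product has a single sign bit and 31 fractional bits, so the
    16 high-order bits already form a signed result with 15 fractional bits.
    No left shift is needed.
    """
    return (x * c) >> 16


# Quantizing the same positive constant with and without a sign bit.
target = math.sqrt(0.5)
c_signed = round(target * 2**15)     # 15 magnitude bits plus a sign bit
c_unsigned = round(target * 2**16)   # all 16 bits carry magnitude
err_signed = abs(c_signed / 2**15 - target)
err_unsigned = abs(c_unsigned / 2**16 - target)
print(err_signed, err_unsigned)

half = 1 << 14                       # 0.5 as signed data
print(q15_mul_unsigned_const(half, 1 << 15))   # 0.5 * 0.5
```

For this constant the unsigned representation has the smaller quantization error, which corresponds to the extra bit of precision described above.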
Therefore, in the execution of an operation A←A±X×Y in the multimedia multiply-adder, instead of carrying out a multiplication X×Y and then cumulatively adding the product obtained as a result of the multiplication to the cumulative sum A as is the case with the conventional technique, the multimedia multiply-adder is provided with a means for starting an addition or a subtraction at the same time as the execution of the multiplication X×Y in order to carry out the operation A←A±X×Y at a high speed. To be more specific, the multimedia multiply-adder is provided with a counter for counting the number of "1" digits in the cumulative term and the partial-product term of the multiplication so that the multimedia multiply-adder can be implemented by as few gate stages as possible.
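The idea of folding the cumulative term into the partial-product array can be modeled in software as below. This is only a behavioral sketch under simplifying assumptions: operands are unsigned, only the addition case is handled, and the per-column count is computed with ordinary integer arithmetic. It does not represent the gate-level structure of the counter, and the function name is hypothetical.

```python
def fused_multiply_add(a, x, y, n=16):
    """Return (a + x * y) modulo 2**(2n) for unsigned a (2n bits), x and y (n bits).

    The cumulative term a is treated as one more row alongside the
    partial-product rows, and the "1" digits of all rows are counted
    column by column.
    """
    rows = [a]                               # cumulative term as the first row
    for k in range(n):
        if (y >> k) & 1:
            rows.append(x << k)              # partial product for bit k of y

    result = 0
    carry = 0
    for col in range(2 * n):
        ones = carry + sum((row >> col) & 1 for row in rows)
        result |= (ones & 1) << col          # low bit of the count is the result bit
        carry = ones >> 1                    # remainder carries into the next column
    return result                            # carry out of the top column is dropped


print(fused_multiply_add(1000, 123, 456))
```

Because the cumulative term enters the same column counts as the partial products, no separate addition stage follows the multiplication in this model.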
In addition, the multimedia multiply-adder is also provided with a means for continuing the multiply-addition/subtraction to the next machine cycle by using a multiply-addition/subtraction result with a prior-carry-save state maintained as it is before propagation of a current carry in order to allow continuous processing to be carried out at a throughput of one. With a plurality of instructions issued consecutively one after another, data dependence among the instructions is analyzed before carrying out the next operation by using a processing result obtained in the preceding operation. In this case, a processing result stored in a latch with a carry saved is passed on to the next operation.
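Passing a result on in carry-save form can be sketched with a 3-input to 2-output compression step. The 32-bit width, the names `carry_save_add` and `accumulate_carry_save`, and the direct computation of each product are assumptions made to keep the sketch short; in hardware the product itself would normally also remain in carry-save form.

```python
MASK = (1 << 32) - 1   # assumed 32-bit cumulative register


def carry_save_add(s, c, v):
    """Reduce three words to a sum word and a carry word.

    Each bit position is handled independently, so no carry ripples
    across the word in this step.
    """
    new_s = s ^ c ^ v
    new_c = ((s & c) | (s & v) | (c & v)) << 1
    return new_s & MASK, new_c & MASK


def accumulate_carry_save(xs, ys):
    """Accumulate X[k] * Y[k] while keeping the running total as (sum, carry)."""
    s, c = 0, 0
    for x, y in zip(xs, ys):
        s, c = carry_save_add(s, c, (x * y) & MASK)   # saved pair feeds the next step
    return (s + c) & MASK   # one carry-propagating addition at the very end


print(accumulate_carry_save([1, 2, 3], [4, 5, 6]))
```

The pair (s, c) plays the role of the latched result with the carry saved: the next operation can begin from it without waiting for carry propagation.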
Adoption of an SIMD configuration is considered to be effective in order to carry out picture processing with a high degree of efficiency. In the SIMD configuration, a register 310 is divided into 4 fields 10, 11, 12 and 13 each having a size of n bits as shown in FIG. 3. By the same token, a register 320 is divided into 4 fields 20, 21, 22 and 23 each having a size of n bits. Thus, a multiply-addition (2n+n×n→2n bits) or (n+n×n→n bits) can be carried out concurrently by using the 4 fields of each of the registers 310 and 320 at the same time.
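A 4-field multiply-addition of the (2n+n×n→2n bits) kind can be modeled as follows. The helper names, the ordering of fields from the low-order end of the register, and the use of unsigned fields are assumptions for illustration; FIG. 3 is not reproduced here.

```python
def split_fields(reg, width, lanes=4):
    """Split a packed register into `lanes` fields of `width` bits, low field first."""
    mask = (1 << width) - 1
    return [(reg >> (width * k)) & mask for k in range(lanes)]


def pack_fields(fields, width):
    """Pack fields of `width` bits into one register, low field first."""
    mask = (1 << width) - 1
    reg = 0
    for k, value in enumerate(fields):
        reg |= (value & mask) << (width * k)
    return reg


def simd_multiply_add(acc, reg_x, reg_y, n=16, lanes=4):
    """Per field: 2n-bit accumulator + n-bit x * n-bit y -> 2n-bit result (wraps)."""
    a = split_fields(acc, 2 * n, lanes)
    x = split_fields(reg_x, n, lanes)
    y = split_fields(reg_y, n, lanes)
    return pack_fields([ai + xi * yi for ai, xi, yi in zip(a, x, y)], 2 * n)


reg_x = pack_fields([1, 2, 3, 4], 16)
reg_y = pack_fields([5, 6, 7, 8], 16)
acc = pack_fields([10, 20, 30, 40], 32)
print(split_fields(simd_multiply_add(acc, reg_x, reg_y), 32))
```

Each field is processed independently of the others, so one instruction performs four multiply-additions.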
As another technique adopted in a multiply-addition of picture processing, a pair of an addition and a subtraction referred to as butterfly processing is often carried out on a multiplication result. Thus, an instruction for carrying out an operation A←A±X×Y with a 1-bit left shift is convenient. Referred to as a multimedia multiply-adder, a processing unit capable of executing this processing instruction is taken into consideration.
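The addition and subtraction pair of butterfly processing can be sketched as below. The function name and the optional `shift_left` flag are hypothetical; register widths and overflow handling are not modeled.

```python
def butterfly(a, x, y, shift_left=False):
    """Return the pair (A + X*Y, A - X*Y), optionally shifting the product left by 1 bit."""
    product = x * y
    if shift_left:
        product <<= 1   # 1-bit left shift applied to the multiplication result
    return a + product, a - product


print(butterfly(100, 3, 4))
print(butterfly(100, 3, 4, shift_left=True))
```

Both outputs use the same product, which is why a single instruction supporting both the addition and the subtraction is convenient.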