1. Field of the Invention
The present invention relates to a data processing apparatus and method for applying floating-point operations to first, second and third operands.
2. Description of the Prior Art
It is common for data processing apparatus to be required to perform various floating-point computations on data. It has been found that general-purpose processors are not well suited to the performance of floating-point computations, and this has led to the development of specialised floating-point units (FPUs) to handle such computations.
One particular floating-point computation which is commonly required is a multiply-accumulate operation A+(B*C), whereby two numbers are multiplied together, and the product is then added to a third number. Although a multiply-accumulate operation can be performed by executing a multiplication instruction followed by a separate accumulate instruction, such an approach is relatively slow. Hence, there has been a great deal of interest in developing FPUs arranged specifically to perform multiply-accumulate operations with increased speed.
Examples of FPUs designed specifically to increase the speed of a multiply-accumulate operation and/or reduce circuit complexity can be found in U.S. Pat. No. 4,969,118, U.S. Pat. No. 5,241,493, U.S. Pat. No. 5,375,078, U.S. Pat. No. 5,530,663, EP-A-0,645,699, U.S. Pat. No. 4,866,652 and U.S. Pat. No. 4,841,467. As an alternative to such dedicated multiply-accumulate designs, the multiplier and adder can be retained as separate logic units.
Another similar floating-point computation often used is a multiply-subtract operation −A+(B*C), which typically can be performed using the same multiply-accumulate logic, but negating the value A before it is input to the adder.
In addition to performing the above multiply-accumulate and multiply-subtract operations, it is desirable to also be able to produce negated versions of the multiply-accumulate and multiply-subtract operations, since such operations are useful in complex multiply routines, Fast Fourier Transform (FFT) and filter routines, and general computations used by a compiler.
Intel describe an instruction for performing a negated version of the multiply-subtract operation in the IA-64 Application Developer's Architecture Guide, Rev 1.0, page 7-59. Here a “floating-point negative multiply add” instruction is defined, where the product of two floating-point register values is computed to infinite precision, negated, and then the value in a third floating-point register is added to the product, again in infinite precision. Rounding is then performed on the final value. Hence, this Intel instruction evaluates the expression A−(B*C).
A standard was produced in 1985 to ensure a consistent approach in the way in which floating-point computations are handled by various data processing apparatus, this standard being called the “IEEE Standard for Binary Floating-Point Arithmetic”, ANSI/IEEE Std 754-1985, The Institute of Electrical and Electronics Engineers, Inc., New York, 10017 (hereafter referred to as the IEEE 754-1985 standard). This standard defined, amongst other things, that a multiplication operation should finish with a rounding operation, and similarly that an add, or accumulate, operation should finish with a rounding operation. The IEEE 754-1985 standard further provided a definition of a number of rounding operations which would be considered to be compliant with the IEEE 754-1985 standard.
In accordance with the above Intel technique, a ‘fused’ multiply-accumulate circuit is used, which results in efficient processing of the above instruction, but means that the result of the multiplication is not independently determined prior to the accumulate operation. Further, the multiplication is performed to an internal precision which contains all of the bits from the multiplication (for an n×n bit multiplication the result is 2n bits) and the accumulation is then performed using all of the multiply bits. Due to this approach, no rounding is performed on the result of the multiplication before that result is used in the subsequent accumulation. Hence, it is apparent that this technique is not compliant with the IEEE 754-1985 standard since that standard defines that a rounding operation should be performed on the result of the multiplication.
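By way of illustration only, the effect of omitting the intermediate rounding can be sketched in Python. The sketch assumes that Python floats are IEEE double-precision values operating in round-to-nearest mode, and it models a fused unit by computing the exact product and sum with rational arithmetic before a single final rounding. The function names and operand values are hypothetical and are chosen so that the product lies exactly halfway between two representable values.

```python
from fractions import Fraction


def mac_separate(a: float, b: float, c: float) -> float:
    """A+(B*C) with the product rounded before the addition (two roundings)."""
    product = b * c        # rounded to double precision here
    return a + product     # rounded a second time here


def mac_fused_model(a: float, b: float, c: float) -> float:
    """Model of a fused unit: exact product and sum, then one rounding."""
    exact = Fraction(a) + Fraction(b) * Fraction(c)   # no intermediate rounding
    return float(exact)    # single rounding at the very end


# Hypothetical operands: B*C is exactly 1 - 2**-54, which rounds to 1.0.
a, b, c = -1.0, 1.0 + 2.0**-27, 1.0 - 2.0**-27
separate = mac_separate(a, b, c)
fused = mac_fused_model(a, b, c)
print("separately rounded:", separate)
print("fused model       :", fused)
```

Under these assumptions the two approaches return different values for the same operands, which is the discrepancy the preceding paragraph attributes to the missing rounding of the multiplication result.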
Another way of performing negated versions of the multiply-accumulate and multiply-subtract operations would be to perform the multiply-accumulate and multiply-subtract operations as usual and then negate the final result output by the FPU. In this way the negated operations −(A+(B*C)) and −(−A+(B*C)) can be performed, and these operations will produce the desired algebraic results.
Indeed, the IBM Power architecture specifies multiply-add with negate functions of the above type. According to the PowerPC 601 RISC Microprocessor User's Manual, (IBM) 52G7484 (MPR601UMU-02) or (MOT) MPC601UM/AD, REV 1, pp. 10-76 to 10-79, the IBM Power architecture specifies the following four instructions:
fnmaddx: frD=−([(frA)*(frC)]+(frB))
fnmaddsx: (same, but with single precision data)
fnmsubx: frD=−([(frA)*(frC)]−(frB))
fnmsubsx: (same, but with single precision data)
Like Intel's approach, IBM used a fused multiply-accumulate unit, and so also cannot guarantee results which are compliant with the IEEE 754-1985 standard.
The MIPS IV architecture also defines two instructions which perform multiply-add negate functions similar to the IBM instructions, the instructions being as follows:
NMADD.fmt: fd=−((fs*ft)+fr)
NMSUB.fmt: fd=−((fs*ft)−fr)
The description on pages B-78 to B-79 of the “MIPS IV Instruction Set”, Revision 3.2, September 1995, by Charles Price, reads as follows: “The value in fs is multiplied by the value in FPR ft to produce an intermediate product. The value in FPR fr is added to/subtracted from the product. The result sum is calculated to infinite precision, rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and placed in FPR fd.”
Since the MIPS architecture retains the multiplier and adder as separate logic units, when performing a multiply-accumulate operation, rounding is applied to the output of the multiplier unit, and this output is then input to the adder logic unit, with the result of the adder logic unit also being rounded. Hence, this enables an IEEE 754-1985 compliant result to be achieved for the expressions −(−A+(B*C)) and −(A+(B*C)).
Further, it will be appreciated that the formula −(−A+(B*C)) is mathematically equivalent to the formula A−(B*C), and similarly the formula −(A+(B*C)) is mathematically equivalent to the formula −A−(B*C), and so instructions of the type as used by MIPS can be used to produce the correct mathematical results for the expressions A−(B*C) or −A−(B*C).
Viewed from a first aspect, the present invention provides a data processing apparatus for applying a floating-point multiply-accumulate operation to first, second and third operands, comprising: a multiplier for multiplying the second and third operands and applying rounding to produce a rounded multiplication result; an adder for adding the rounded multiplication result to the first operand to generate a final result and for applying rounding to generate a rounded final result; and control logic, responsive to a first single instruction, to control the multiplier and adder to cause the rounded final result generated by the adder to be equivalent to the subtraction of the rounded multiplication result from the first operand.
The present invention realises that although the above-mentioned prior art instructions for performing the computation −(−A+(B*C)) enable the correct mathematical results to be obtained for the formula A−(B*C), those prior art instructions will not produce the correctly signed results for that formula in certain specific situations, in particular where the result is a zero value. This will be explained in more detail below.
The IEEE 754-1985 standard provides, amongst other things, a definition of a number of rounding modes which would be considered to be compliant with the IEEE 754-1985 standard, and defines how a zero result of a sum operation performed on oppositely-signed numbers, or a difference operation performed on like-signed numbers, should be represented in those particular rounding modes.
Such a zero result may occur because the numbers being added or subtracted are of equal magnitude or because the numbers are zero. Irrespective of this, the IEEE 754-1985 standard defines that in the rounding modes “round to nearest” (RN), “round to zero” or “chop” (RZ) and “round to plus infinity” (RP), a zero result of a sum operation performed on oppositely-signed numbers, or a difference operation performed on like-signed numbers, should be represented as a positive zero, whilst in the rounding mode “round to minus infinity” (RM), such a zero should be represented as a negative zero.
Given the above, it will be appreciated that the following signed zero results occur when A and B*C are equal in magnitude and sign:
Rounding mode    A−(B*C)    −(−A+(B*C))
RN, RZ, RP       +0         −0
RM               −0         +0
Thus, whilst mathematically the earlier mentioned MIPS IV instructions were sufficient to evaluate the correct mathematical result for the formula A−(B*C), since +0 and −0 represent the same number, the present invention has realised that the results obtained using those instructions will not produce the correctly signed results for that formula in the above described instances.
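The signed-zero behaviour described above can be reproduced, purely as an illustrative sketch, with Python's decimal module. This is an assumption made for convenience: the decimal module implements decimal rather than binary floating-point arithmetic, but it exposes the four rounding directions and applies the same sign rule to exact zero sums and differences. The mode labels RN, RZ, RP and RM follow the text, whereas the function name and operand values are hypothetical.

```python
from decimal import (Decimal, localcontext, ROUND_HALF_EVEN, ROUND_DOWN,
                     ROUND_CEILING, ROUND_FLOOR)

# Mapping from the rounding-mode labels used in the text to decimal constants.
MODES = {"RN": ROUND_HALF_EVEN, "RZ": ROUND_DOWN,
         "RP": ROUND_CEILING, "RM": ROUND_FLOOR}


def zero_signs(mode_name: str) -> tuple:
    """Return the zero signs of A-(B*C) and -(-A+(B*C)) when A equals B*C."""
    a, b, c = Decimal(6), Decimal(2), Decimal(3)   # hypothetical operands
    with localcontext() as ctx:
        ctx.rounding = MODES[mode_name]
        product = b * c
        direct = a - product                        # A-(B*C)
        # copy_negate() flips only the sign, like negating a sign bit.
        negated = (a.copy_negate() + product).copy_negate()   # -(-A+(B*C))

    def sign(z):
        return "-0" if z.is_signed() else "+0"

    return sign(direct), sign(negated)


table = {name: zero_signs(name) for name in MODES}
for name, (direct, negated) in table.items():
    print(f"{name}: A-(B*C) = {direct}   -(-A+(B*C)) = {negated}")
```

In every rounding mode the two expressions yield zeros of opposite sign, so negating the result of a multiply-subtract operation cannot stand in for a direct evaluation of A−(B*C) where the sign of zero matters.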
Further, it has been realised that certain real-world systems care about the difference. The IEEE 754-1985 standard specifies different bit pattern representations for +0 and −0, and somewhat different behaviour with regard to the results of subsequent arithmetic operations on them. This has an effect on some systems, such as those using the Java language, which has strict “bit-exact” results requirements. Accordingly, for such systems it is not acceptable to use instructions which evaluate −(−A+(B*C)) to produce results for expressions of the form A−(B*C).
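As a minimal sketch of why the distinction is observable, the following Python fragment evaluates both expressions with native floats, assuming these are IEEE double-precision values in round-to-nearest mode. The helper name bits and the operand values are hypothetical. The two results compare equal, yet their bit patterns differ and a subsequent operation that is sensitive to the sign of zero returns different values.

```python
import math
import struct


def bits(x: float) -> str:
    """Hexadecimal form of the 64-bit big-endian encoding of x."""
    return struct.pack(">d", x).hex()


a, b, c = 6.0, 2.0, 3.0            # hypothetical operands with A equal to B*C
direct = a - (b * c)               # A-(B*C)
negated = -(-a + (b * c))          # -(-A+(B*C))

print("equal as numbers:", direct == negated)
print("bit patterns    :", bits(direct), bits(negated))
# atan2 distinguishes the sign of a zero second argument.
print("atan2(0, x)     :", math.atan2(0.0, direct), math.atan2(0.0, negated))
```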
Another area where the difference is important is when re-using old code. When porting old code that evaluated expressions such as A−(B*C) by executing a separate multiply instruction followed by rounding, and then executing a separate subtract instruction followed by rounding, it is clearly important that any new single instruction for evaluating A−(B*C) produces the same result.
In accordance with the present invention, the above problems are addressed by the multiply-accumulate logic being arranged to provide fast execution of a first single instruction to generate a rounded result equivalent to the subtraction of the rounded multiplication result from the first operand, thus producing a result which is compliant with that required by the IEEE 754-1985 standard when evaluating expressions of the form A−(B*C).
Further, since expressions of the form A−(B*C) are more common in practice than expressions of the form −(−A+(B*C)), except in some specialised application areas, it has been found that the first single instruction of the present invention, which provides the IEEE 754-1985 standard compliant zero result when evaluating expressions of the form A−(B*C), is more useful than the earlier mentioned prior art instructions in some application areas.
In preferred embodiments, this first instruction is referred to as an FNMAC instruction, and can be considered as a negate-multiply accumulate instruction.
It will be appreciated that the control logic will provide general control of the multiplier and adder to ensure appropriate operation for a particular instruction, and hence would typically control the inputs to the multiplier and adder and the timing of the operations performed by the multiplier and adder. In preferred embodiments, the control logic includes primitive determination logic responsive to the first single instruction and the sign values of the first, second and third operands to determine whether to cause the adder to perform a like-signed addition (LSA) operation or an unlike-signed addition (USA) operation, and to generate a control signal dependent on the determination.
More particularly, in preferred embodiments, the primitive determination logic is arranged, in response to the first single instruction, to determine from the sign values of the second and third operands the sign value of the rounded multiplication result and to determine that a USA operation should be performed by the adder if the sign of the first operand and the sign of the rounded multiplication result are the same, and that an LSA operation should be performed by the adder if the sign of the first operand and the sign of the rounded multiplication result are not the same.
Preferably, the adder circuitry includes a multiplexer for selecting either the first operand or the negated first operand to be added to the rounded multiplication result, and the control signal is supplied to the multiplexer to cause the negated first operand to be selected for USA operations, and the first operand to be selected for LSA operations.
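The selection between LSA and USA operations for the first single instruction can be sketched as follows. This is a hypothetical software model rather than the circuit itself: sign values are assumed to be single bits (0 for positive, 1 for negative), the sign of the product is assumed to be the exclusive-OR of the signs of the second and third operands, and the function and field names are invented for illustration.

```python
def fnmac_primitive(sign_a: int, sign_b: int, sign_c: int) -> dict:
    """Model the primitive determination for A-(B*C) from the operand signs."""
    sign_product = sign_b ^ sign_c           # sign of the rounded product
    # Same signs: A-(B*C) subtracts magnitudes, so an unlike-signed addition.
    primitive = "USA" if sign_a == sign_product else "LSA"
    return {
        "sign_product": sign_product,
        "primitive": primitive,
        # Control signal for the multiplexer: negated A for USA, A for LSA.
        "select_negated_a": primitive == "USA",
    }


# Print the control outputs for all eight sign combinations.
for sa in (0, 1):
    for sb in (0, 1):
        for sc in (0, 1):
            print(sa, sb, sc, fnmac_primitive(sa, sb, sc))
```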
In preferred embodiments, the apparatus further comprises sign calculation logic incorporating the primitive determination logic and further arranged to generate an initial sign value for the final result. The sign calculation logic is preferably arranged to be responsive to the first single instruction to select the sign of the first operand as the initial sign value.
In accordance with preferred embodiments, if a sum value generated by the adder is negative, it is inverted prior to generation of the final result, and the apparatus further comprises sign adjust logic arranged to invert the initial sign value if a USA operation was performed and the sum value is positive, but to leave the initial sign unadjusted if the sum value is negative.
Further, if the final result is zero, the sign adjust logic is arranged in accordance with preferred embodiments of the present invention to override the initial sign value with a sign value indicated by a predetermined rounding mode. This ensures compliance with the IEEE 754-1985 standard, in particular with regard to how that standard defines how a zero result of a sum or difference operation should be represented in particular rounding modes.
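The sign handling described above can be modelled, as a sketch under stated assumptions, using exact integer magnitudes in place of floating-point significands. The model assumes non-zero operand magnitudes, so that a zero result can only arise from a USA operation on equal magnitudes, and it ignores rounding of the magnitude itself. The function name, the argument names and the string labels for rounding modes are hypothetical.

```python
def fnmac_sign_model(sign_a: int, mag_a: int, sign_p: int, mag_p: int,
                     rounding_mode: str = "RN") -> tuple:
    """Return (sign, magnitude) of A-P, where P is the rounded product.

    Signs are 0 (positive) or 1 (negative); magnitudes are positive integers.
    """
    initial_sign = sign_a                    # initial sign is the sign of A
    usa = sign_a == sign_p                   # like signs give a USA operation
    # For USA the multiplexer selects the negated first operand.
    total = mag_p - mag_a if usa else mag_p + mag_a

    if total < 0:
        magnitude = -total                   # a negative sum is inverted
        sign = initial_sign                  # initial sign left unadjusted
    else:
        magnitude = total
        sign = initial_sign ^ 1 if usa else initial_sign

    if magnitude == 0:
        # Zero result: the rounding mode determines the sign.
        sign = 1 if rounding_mode == "RM" else 0
    return sign, magnitude


print(fnmac_sign_model(0, 5, 0, 3))          # models  5 - 3
print(fnmac_sign_model(0, 3, 0, 5))          # models  3 - 5
print(fnmac_sign_model(0, 6, 0, 6, "RM"))    # models  6 - 6 in RM
```

For non-zero results the model agrees with ordinary signed subtraction, and for a zero result the sign depends only on the rounding mode, as the text requires.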
Preferably, the apparatus further comprises rounding logic for performing a rounding operation on the final result as specified by the predetermined rounding mode in order to generate the rounded result.
In preferred embodiments, the control logic is responsive to a second single instruction to control the multiplier and adder to cause the rounded final result generated by the adder to be equivalent to the subtraction of the rounded multiplication result from the negated first operand.
Again, the present invention realises that although the above-mentioned prior art instructions for performing the computation −(A+(B*C)) enable the correct mathematical results to be obtained for the formula −A−(B*C), those prior art instructions will not produce the correctly signed results for that formula in certain specific situations, in particular where the result is a zero value. However, in accordance with preferred embodiments of the present invention, the multiply-accumulate logic can be arranged to provide fast execution of a second single instruction to generate a rounded result equivalent to the subtraction of the rounded multiplication result from the negated first operand, and thus produce a result which is compliant with that required by the IEEE 754-1985 standard when evaluating expressions of the form −A−(B*C).
In preferred embodiments, this second instruction is referred to as an FNMSC instruction, and can be considered as a negate-multiply subtract instruction.
Viewed from a second aspect, the present invention provides a data processing apparatus for applying a floating-point multiply-subtract operation to first, second and third operands, comprising: a multiplier for multiplying the second and third operands and applying rounding to produce a rounded multiplication result; an adder for adding the rounded multiplication result to the negated first operand to generate a final result and for applying rounding to generate a rounded final result; and control logic, responsive to a first single instruction, to control the multiplier and adder to cause the rounded final result generated by the adder to be equivalent to the subtraction of the rounded multiplication result from the negated first operand.
In accordance with the second aspect of the present invention, multiply-accumulate logic can be arranged to provide fast execution of a first single instruction to generate a rounded result equivalent to the subtraction of the rounded multiplication result from the negated first operand, and thus produce a result which is compliant with that required by the IEEE 754-1985 standard when evaluating expressions of the form −A−(B*C).
In accordance with this second aspect, the first instruction is in preferred embodiments referred to as an FNMSC instruction, and can be considered as a negate-multiply subtract instruction.
Responsive to this first single instruction, the primitive determination logic of preferred embodiments is arranged to determine that an LSA operation should be performed by the adder if the sign of the first operand and the sign of the rounded multiplication result are the same, and that a USA operation should be performed by the adder if the sign of the first operand and the sign of the rounded multiplication result are not the same.
Further, in accordance with this aspect of the present invention, the sign calculation logic of preferred embodiments is arranged to be responsive to the first single instruction to select the initial sign value to be equal to the negated sign of the first operand.
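The corresponding control decisions for this aspect can be sketched in the same hypothetical style. Sign values are again assumed to be single bits (0 for positive, 1 for negative), the sign of the product is assumed to be the exclusive-OR of the signs of the second and third operands, and the function name is invented for illustration.

```python
def fnmsc_controls(sign_a: int, sign_b: int, sign_c: int) -> tuple:
    """Return (primitive, initial_sign) for -A-(B*C) from the operand signs."""
    sign_product = sign_b ^ sign_c           # sign of the rounded product
    # Same signs: -A and -(B*C) also share a sign, so magnitudes are added.
    primitive = "LSA" if sign_a == sign_product else "USA"
    initial_sign = sign_a ^ 1                # negated sign of the first operand
    return primitive, initial_sign


# Print the control outputs for all eight sign combinations.
for sa in (0, 1):
    for sb in (0, 1):
        for sc in (0, 1):
            print(sa, sb, sc, fnmsc_controls(sa, sb, sc))
```

Relative to the negate-multiply accumulate case, both the LSA/USA decision and the initial sign are inverted, which reflects the negation of the first operand.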
Viewed from a third aspect, the present invention provides a method of applying within a data processing apparatus a floating-point multiply-accumulate operation to first, second and third operands, comprising the steps of: arranging a multiplier to multiply the second and third operands and apply rounding to produce a rounded multiplication result; arranging an adder to add the rounded multiplication result to the first operand to generate a final result and to apply rounding to generate a rounded final result; and responsive to a first single instruction, controlling the multiplier and adder to cause the rounded final result generated by the adder to be equivalent to the subtraction of the rounded multiplication result from the first operand.
Viewed from a fourth aspect, the present invention provides a method of applying within a data processing apparatus a floating-point multiply-subtract operation to first, second and third operands, comprising the steps of: arranging a multiplier to multiply the second and third operands and apply rounding to produce a rounded multiplication result; arranging an adder to add the rounded multiplication result to the negated first operand to generate a final result and to apply rounding to generate a rounded final result; and responsive to a first single instruction, controlling the multiplier and adder to cause the rounded final result generated by the adder to be equivalent to the subtraction of the rounded multiplication result from the negated first operand.