The present invention relates to a method and apparatus for performing floating point arithmetic operations in a data processing system. More particularly, the invention relates to an apparatus, and a method for implementing the apparatus, for performing the subtraction of exponents which is required by the arithmetic operations of addition and subtraction for floating point numbers. Subtraction of the smaller exponent from the larger exponent is used to determine the number of places the binary point must be shifted left in the fraction portion of the smaller floating point number before the fractions of the two floating point numbers are added.
The use of floating point arithmetic operations in a data processing system has been common practice almost since the inception of computer technology. The development of floating point arithmetic hardware has taken many forms, usually with the objectives of simplifying the hardware construction or enhancing the speed of the arithmetic processing operation. The four arithmetic operations of add, subtract, multiply and divide have usually been accomplished by using specialized subsets of processes involving addition and subtraction. For example, multiplication operations have in many cases been performed by repeated addition processes, and division has been accomplished by a process of repeated subtraction. The efforts made to speed up these processing operations have focused on enhancements and simplifications of hardware circuit design, particularly the adder circuit, which ultimately limits the maximum processing speed of all arithmetic operations. In the case of division, efforts have been made to increase the speed of operation by calculating partial quotients, or by simultaneously predicting multiple quotient bits, to reduce the number of addition or subtraction iterations required for the divide calculation.
An American national standard has been developed in order to provide a uniform system of rules for governing the implementation of floating point arithmetic systems. This standard is identified as ANSI/IEEE Standard No. 754-1985, and is incorporated by reference herein. In the design of floating point arithmetic systems and algorithms, it is a principal objective to achieve results which are consistent with this standard, to enable users of such systems and algorithms to achieve conformity in the calculations and solutions to problems even though the problems are solved using different computer systems. The standard specifies basic and extended floating point number formats, arithmetic operations, conversions between integer and floating point formats, conversions between different floating point formats, conversions between basic format floating point numbers and decimal strings, and the handling of certain floating point exceptions.
Most commonly, floating point arithmetic operations are accomplished in either single precision or double precision format as defined by the IEEE Standard. Both of these formats utilize a sign, exponent and fraction field, where the respective fields occupy predefined portions of the floating point number. In the case of a 32-bit single precision number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 8-bit field occupying the next-most significant bit positions; the fraction field is a 23-bit field occupying the least significant bit positions. In the case of a 64-bit double precision floating point number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 11-bit field occupying the next-most significant bit positions; the fraction field is a 52-bit field occupying the least significant bit positions. Other formats for the exponent field and the fraction field are available, and more may be developed based on the needs of the application.
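The field layouts described above can be illustrated with a short software sketch. The following Python fragment is offered only as an illustration and is not part of the apparatus; the function names `decode_single` and `decode_double` are hypothetical, and the exponent values returned are the stored (biased) exponents, with biases of 127 and 1023 as defined by the IEEE Standard.

```python
import struct


def decode_single(x: float):
    """Split x, rounded to single precision, into (sign, exponent, fraction)."""
    # Reinterpret the 32-bit pattern as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit, most significant position
    exponent = (bits >> 23) & 0xFF    # 8-bit stored (biased) exponent
    fraction = bits & 0x7FFFFF        # 23 least significant bits
    return sign, exponent, fraction


def decode_double(x: float):
    """Split a double precision x into (sign, exponent, fraction)."""
    # Reinterpret the 64-bit pattern as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # 1 bit, most significant position
    exponent = (bits >> 52) & 0x7FF       # 11-bit stored (biased) exponent
    fraction = bits & ((1 << 52) - 1)     # 52 least significant bits
    return sign, exponent, fraction
```

For instance, the value 1.0 has a zero fraction field and a stored exponent equal to the bias in either format.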
In the past, the difference between two exponents was found using adders which had the same width as the exponent field of the particular floating point format. For example, finding the difference between the exponents of two floating point numbers in double precision format required adders eleven bits wide. These adders typically are on the critical path, meaning that the time spent subtracting one exponent from another directly impacts the overall speed of the entire arithmetic operation being performed. As a result, time saved in finding the difference between exponents speeds up the entire arithmetic operation.
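The conventional full-width approach described above can be modeled in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular prior circuit: the names `exponent_difference` and `align_fractions` are hypothetical, the significands are assumed to be unsigned integers that already include any implied leading bit, and bits shifted out during alignment are simply discarded, whereas practical hardware would retain guard and sticky information for rounding.

```python
def exponent_difference(exp_a: int, exp_b: int, width: int = 11):
    """Model a full-width subtract: return (|exp_a - exp_b|, exp_a >= exp_b)."""
    mask = (1 << width) - 1
    # An adder computes a - b as a + (bitwise NOT b) + 1 in two's complement.
    total = exp_a + ((~exp_b) & mask) + 1
    a_not_smaller = bool(total >> width)   # carry out set means exp_a >= exp_b
    diff = total & mask
    if not a_not_smaller:
        diff = (-diff) & mask              # negate to obtain the magnitude
    return diff, a_not_smaller


def align_fractions(exp_a: int, sig_a: int, exp_b: int, sig_b: int):
    """Shift the significand of the smaller operand right by the exponent gap."""
    shift, a_not_smaller = exponent_difference(exp_a, exp_b)
    if a_not_smaller:
        return exp_a, sig_a, sig_b >> shift   # operand b is the smaller one
    return exp_b, sig_a >> shift, sig_b       # operand a is the smaller one
```

With stored exponents 1026 and 1023, for example, the difference is 3, so the significand of the second operand is shifted three places before the two significands are combined.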
One of the factors that slows down an adder is the propagation of a carry from one bit position to the next across the width of the adder. The time necessary for a carry to propagate across a wide adder is longer than the time necessary for a carry to propagate across a narrow adder. One way to increase the speed of determining the difference between two exponents would be to use an adder which is narrower than the exponent width designated by the particular floating point format.
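The dependence of carry propagation on adder width can be seen in a simple bit-serial model. The sketch below assumes a plain ripple-carry structure, which is an illustrative choice and not necessarily the adder used in any given design; the function name `ripple_add` is hypothetical, and the length of the longest carry chain serves only as a rough proxy for delay.

```python
def ripple_add(a: int, b: int, width: int, carry_in: int = 0):
    """Add two width-bit values; return (sum, carry_out, longest_carry_chain)."""
    carry = carry_in
    result = 0
    chain = 0      # stages the current carry has passed through so far
    longest = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        propagate = abit ^ bbit    # this stage passes an incoming carry along
        generate = abit & bbit     # this stage creates a carry by itself
        result |= (propagate ^ carry) << i
        if generate:
            chain = 1              # a new carry starts at this stage
        elif propagate and carry:
            chain += 1             # an existing carry ripples one stage further
        else:
            chain = 0              # no carry leaves this stage
        carry = generate | (propagate & carry)
        longest = max(longest, chain)
    return result, carry, longest
```

In this model the worst case for an eleven-bit adder, such as adding 1 to 2047, sends a carry through all eleven stages, while the corresponding worst case for a six-bit adder passes through only six.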
It is a principal object of the present invention to provide an apparatus and method capable of use with any desired format for floating point arithmetic.
It is a further object of the present invention to provide an apparatus and method for performing certain floating point arithmetic operations in a shorter time period than previously attainable.
It is a further object of the present invention to provide an apparatus and method for adding and subtracting two floating point numbers. More specifically, it is an object of the present invention to provide an apparatus and method for subtracting the exponent field of one number from the exponent field of another number to determine the shift of the binary point in one of the numbers, to allow addition or subtraction of the numbers while achieving a desirable reduction in processing time.