1. Field of the Invention
The present invention relates to computational and calculation functional units of computers, controllers and processors. More specifically, the present invention relates to functional units that perform division operations.
2. Description of the Related Art
Computer systems have evolved into versatile systems with a vast range of utility including demanding applications such as multimedia, high-bandwidth network communications, signal processing, and the like. Accordingly, general-purpose computers are called upon to rapidly handle large volumes of data. Much of the data handling, particularly for video playback, voice recognition, speech processing, three-dimensional graphics, and the like, involves computations that must be executed quickly and with a short latency.
One technique for executing computations rapidly while handling the large data volumes is to include multiple computation paths in a processor. Each of the computation paths includes hardware for performing computations so that multiple computations may be performed in parallel. However, including multiple computation units greatly increases the size of the integrated circuits implementing the processor. What are needed in a computation functional unit are computation techniques and integrated circuits that operate at high speed while consuming only a small amount of integrated circuit area.
Execution time in processors and computers is naturally improved through high-speed data computations. The computer industry therefore constantly strives to improve the speed of execution units that process mathematical functions. Computational operations are typically performed through iterative processing techniques, look-up of information in large-capacity tables, or a combination of table accesses and iterative processing. In conventional systems, a mathematical function of one or more variables is executed by using a part of a value relating to a particular variable as an address to retrieve either an initial value of a function or a numeric value used in the computation from a large-capacity table information storage unit. A high-speed computation is executed by operations using the retrieved value. Table look-up techniques advantageously increase the execution speed of computational functional units. However, the increase in speed gained through table accessing is achieved at the expense of a large consumption of integrated circuit area and power.
A division instruction is highly burdensome and difficult to implement in silicon, typically utilizing many clock cycles and consuming a large integrated circuit area.
What is needed is a method for implementing division in a computing circuit that is simple, fast, and reduces the amount of computation circuitry.
A computation unit computes a division operation Y/X by determining the value of a divisor reciprocal 1/X and multiplying the reciprocal by a numerator Y. The reciprocal 1/X value is determined using a quadratic approximation having a form:
Ax^2 + Bx + C,
where coefficients A, B, and C are constants that are stored in a storage or memory such as a read-only memory (ROM). The bit length of the coefficients determines the error in a final result. Storage size is reduced through use of "least mean square error" techniques in the determination of the coefficients that are stored in the coefficient storage. During the generation of partial products x^2, Ax^2, and Bx, the process of rounding is eliminated, thereby reducing the computational logic needed to implement the division functionality.
A method of computing a floating point division operation uses a piece-wise quadratic approximation to determine a value 1/X where X is a floating point number having a numerical format including a sign bit, an exponent field, and a mantissa field. A floating point division Y/X is executed by computing the value 1/X and multiplying the result by a value Y. The value 1/X is computed in a computing device using a piece-wise quadratic approximation in the form:
1/X = Ax^2 + Bx + C.
The value x is defined as a plurality of lower order bits of the mantissa. Coefficients A, B, and C are derived for the division operation to reduce the least mean square error using a least squares approximation of a plurality of equally-spaced points within an interval. In one embodiment, an interval includes 256 equally-spaced points. The coefficients are stored in a storage and accessed during execution of the division computation instruction.
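The piece-wise quadratic scheme described above can be illustrated with a minimal software sketch. It is not the claimed hardware: it works in double-precision arithmetic rather than with fixed-width ROM coefficients, and the names INDEX_BITS, fit_interval, reciprocal, and divide are hypothetical. The sketch assumes 32 intervals across the mantissa range [1, 2) and assumes that x is scaled to [0, 1) within each interval; only the 256 equally-spaced fitting points per interval come from the text.

```python
import math

INDEX_BITS = 5   # assumption: 2**5 = 32 intervals over the mantissa range [1, 2)
POINTS = 256     # equally spaced sample points per interval, as in the text


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = a[r][3] - sum(a[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / a[r][r]
    return sol


def fit_interval(i):
    """Least-squares A, B, C so that 1/m ~= A*x^2 + B*x + C on interval i.

    Here m = 1 + (i + x) / 2**INDEX_BITS with x in [0, 1).
    """
    n = 1 << INDEX_BITS
    xs = [k / POINTS for k in range(POINTS)]
    ys = [1.0 / (1.0 + (i + x) / n) for x in xs]
    s = [sum(x ** p for x in xs) for p in range(5)]               # sums of x^p
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Normal equations, unknowns ordered (C, B, A).
    c, b, a = solve3([[s[0], s[1], s[2]],
                      [s[1], s[2], s[3]],
                      [s[2], s[3], s[4]]], t)
    return a, b, c


# Stands in for the coefficient storage (ROM) mentioned in the text.
TABLE = [fit_interval(i) for i in range(1 << INDEX_BITS)]


def reciprocal(value):
    """Approximate 1/value for a nonzero finite float."""
    if value == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    frac, exp = math.frexp(abs(value))        # abs(value) = frac * 2**exp
    mant = 2.0 * frac                          # mantissa in [1, 2)
    scaled = (mant - 1.0) * (1 << INDEX_BITS)
    i = int(scaled)                            # high-order bits select the interval
    x = scaled - i                             # low-order bits, scaled to [0, 1)
    a, b, c = TABLE[i]
    r = (a * x + b) * x + c                    # A*x^2 + B*x + C
    return math.copysign(math.ldexp(r, 1 - exp), value)


def divide(y, x):
    """Y/X computed as Y multiplied by the approximated 1/X."""
    return y * reciprocal(x)
```

With these assumed parameters the approximation is accurate to a few parts per million; a hardware implementation would choose the interval count and coefficient bit lengths to meet the required result precision.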
In some embodiments, a lookup table in storage is indexed using the leading or higher order bits of the mantissa. Since the most significant bit of the mantissa is always 1, some embodiments use a plurality of higher order bits but not including the most significant bit to index into the lookup table storage.
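As an illustration of the indexing described above, the following sketch splits a floating point value into a sign, an exponent, a table index, and the remaining low-order bits x. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 stored mantissa bits with an implicit leading 1), which the text does not specify, and an assumed index width of 5 bits. The names INDEX_BITS and split_fields are hypothetical.

```python
import struct

INDEX_BITS = 5        # assumption: table index width is not given in the text
FRACTION_BITS = 23    # assumption: IEEE 754 single-precision stored mantissa bits


def split_fields(value):
    """Return (sign, exponent, index, x) for a value encoded as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> FRACTION_BITS) & 0xFF
    # The most significant mantissa bit is always 1 and is not stored,
    # so the stored fraction already excludes it.
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    low_bits = FRACTION_BITS - INDEX_BITS
    index = fraction >> low_bits               # higher order bits: table index
    x = fraction & ((1 << low_bits) - 1)       # lower order bits: polynomial input
    return sign, exponent, index, x
```

For example, 1.5 has the binary mantissa 1.1, so its stored fraction has only the top bit set and the value falls in the interval at the midpoint of the table.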
The method produces a "pre-rounded" Y/X result that is rounded to the nearest value. The pre-rounded result is truncated at a round bit position and incremented at the round bit position to generate an incremented quotient that is within one LSB of a correct solution. The incremented quotient multiplied by the divisor is compared with the dividend by subtraction. If the remainder is negative, then the pre-rounded result is more than half an LSB below the correct value and is incremented. If the remainder is positive, then the pre-rounded result is less than half an LSB below the correct value and is not incremented. If the remainder is zero, the result is selected based on the LSB of the pre-rounded result.
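The rounding decision above can be sketched with plain integers, where one LSB equals 1. This is one possible reading of the text, not a definitive implementation. It assumes that the remainder is formed as (incremented quotient x divisor) - dividend, that the truncated quotient equals the floor of the true quotient, and that a zero remainder is resolved by rounding to the even value. The function name round_quotient is hypothetical.

```python
def round_quotient(dividend, divisor, truncated):
    """Choose the rounded quotient from a truncated estimate.

    Assumes positive integers and truncated == dividend // divisor.
    All quantities are doubled so that the half-LSB increment stays an integer.
    """
    # Truncated quotient incremented at the round bit (half an LSB), times 2.
    incremented = 2 * truncated + 1
    # Remainder = incremented quotient * divisor - dividend, times 2.
    remainder = incremented * divisor - 2 * dividend
    if remainder < 0:
        # True quotient lies more than half an LSB above the truncated value.
        return truncated + 1
    if remainder > 0:
        # True quotient lies less than half an LSB above the truncated value.
        return truncated
    # Exact tie: assumed round-to-even, decided by the LSB of the truncated value.
    return truncated + (truncated & 1)
```

The comparison needs only one multiplication and one subtraction, and only the sign and zero status of the remainder are examined.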