1. Field of the Invention
The present invention relates to computational and calculation functional units of computers, controllers and processors. More specifically, the present invention relates to functional units that efficiently perform various operations including multiplication, division, reciprocal, square root, and power computation operations.
2. Description of the Related Art
Computer systems have evolved into versatile systems with a vast range of utility including demanding applications such as multimedia, high-bandwidth network communications, signal processing, and the like. Accordingly, general-purpose computers are called upon to rapidly handle large volumes of data. Much of the data handling, particularly for video playback, voice recognition, speech processing, three-dimensional graphics, and the like, involves computations that must be executed quickly and with a short latency.
One technique for executing computations rapidly while handling the large data volumes is to include multiple computation paths in a processor. Each of the data paths includes hardware for performing computations so that multiple computations may be performed in parallel. However, including multiple computation units greatly increases the size of the integrated circuits implementing the processor. What are needed in a computation functional unit are computation techniques and computation integrated circuits that operate with high speed while consuming only a small amount of integrated circuit area.
Execution time in processors and computers is naturally improved through high-speed data computations; therefore, the computer industry constantly strives to improve the speed of execution units that process mathematical functions. Computational operations are typically performed through iterative processing techniques, look-up of information in large-capacity tables, or a combination of table accesses and iterative processing. In conventional systems, a mathematical function of one or more variables is executed by using a part of a value relating to a particular variable as an address to retrieve either an initial value of a function or a numeric value used in the computation from a large-capacity table information storage unit. A high-speed computation is executed by operations using the retrieved value. Table look-up techniques advantageously increase the execution speed of computational functional units. However, the increase in speed gained through table accessing is achieved at the expense of a large consumption of integrated circuit area and power.
Many processors execute common arithmetic operations including multiplication, division, reciprocal, square root, and power computations that conventionally execute relatively slowly, consume a large silicon area, and have a high power drain. A division instruction is particularly burdensome and difficult to implement in silicon, typically utilizing many clock cycles and consuming a large integrated circuit area.
What is needed is a method for implementing computations in a computing circuit that is simple, fast, and reduces the amount of computation circuitry.
A computation unit employs a logarithmic number system that uses a logarithmic numerical representation that differs from an IEEE standard representation to improve the efficiency of computation, both by reducing the time expended in performing the computation and by reducing the size of the integrated circuit that performs the computation. A standard IEEE numerical representation for a number N is shown in equation (1) as follows:
$$N = (-1)^{s} \, 2^{(E-127)} \, (1.M). \qquad (1)$$
In contrast, an illustrative computation unit employs a different numerical representation for a number N shown in equation (2):
$$N = (-1)^{0} \, 2^{(E-127)} \, (1.M). \qquad (2)$$
The illustrative computation unit employs a numerical representation that is similar to the IEEE format except that the sign term is omitted. Thus only positive numbers are represented. The value of the mantissa is defined as a fractional number between zero and one.
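The representation of equation (2) can be illustrated with a minimal sketch that splits a positive single-precision value into its biased exponent E and fractional mantissa M. The function names `decompose` and `compose` are hypothetical, and the sketch assumes normalized inputs (zero, subnormals, infinities, and NaNs are not handled).

```python
import struct

def decompose(value: float):
    """Split a positive, normalized single-precision value into (E, M).

    E is the biased 8-bit exponent and M is the 23-bit fraction scaled
    into [0, 1), so that value == 2**(E - 127) * (1 + M).
    """
    if value <= 0.0:
        raise ValueError("only positive numbers are representable")
    # Reinterpret the single-precision encoding as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent = (bits >> 23) & 0xFF          # biased exponent E
    mantissa = (bits & 0x7FFFFF) / 2 ** 23  # fraction M in [0, 1)
    return exponent, mantissa

def compose(exponent: int, mantissa: float) -> float:
    """Rebuild the value from (E, M) following equation (2)."""
    return 2.0 ** (exponent - 127) * (1.0 + mantissa)
```

For example, 6.0 equals 1.5 times 2 squared, so `decompose(6.0)` yields E = 129 and M = 0.5.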
The numerical representation shown in equation (2) describes a useful number system domain for multiplication, division, reciprocal, square root, and power computations using the multiplication, division, and square root computation techniques described by the following equations (3) to (5), respectively:
$$A \cdot B = \text{Anti-log}(\log(A) + \log(B)), \qquad (3)$$
$$A / B = \text{Anti-log}(\log(A) - \log(B)), \qquad (4)$$
$$B^{1/2} = \text{Anti-log}(\log(B) / 2). \qquad (5)$$
In equation (5), the division operation that divides log(B) by two is typically and advantageously performed using a right shift operation to reduce computational complexity.
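Equations (3) to (5) can be sketched in software as follows. This is only an illustration under stated assumptions: logarithms are taken to base 2 and held as fixed-point integers with a hypothetical `FRAC_BITS` of 23 fractional bits, and the library function `math.log2` stands in for the logarithm hardware. The helper names are not taken from the source.

```python
import math

FRAC_BITS = 23  # assumed number of fractional bits in the fixed-point log

def to_fixed_log(x: float) -> int:
    """Base-2 logarithm of x as a fixed-point integer."""
    return round(math.log2(x) * (1 << FRAC_BITS))

def from_fixed_log(log_value: int) -> float:
    """Anti-logarithm (2**y) of a fixed-point logarithm."""
    return 2.0 ** (log_value / (1 << FRAC_BITS))

def lns_mul(a: float, b: float) -> float:
    # Equation (3): a multiply becomes an add of logarithms.
    return from_fixed_log(to_fixed_log(a) + to_fixed_log(b))

def lns_div(a: float, b: float) -> float:
    # Equation (4): a divide becomes a subtract of logarithms.
    return from_fixed_log(to_fixed_log(a) - to_fixed_log(b))

def lns_sqrt(b: float) -> float:
    # Equation (5): halving the logarithm is a one-bit right shift.
    return from_fixed_log(to_fixed_log(b) >> 1)
```

Because the logarithm is an integer here, the halving in `lns_sqrt` is a single arithmetic shift rather than a division.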
The computation unit includes a logarithm computation block, an anti-logarithm computation block, and a division computation block. In various embodiments, the multiple computation blocks may be implemented as separate blocks or as common or shared blocks. In various embodiments, the multiple computation blocks may be constructed in any suitable form such as hardware logic, sequencers with microcode, programmable logic arrays, and the like.
An illustrative logarithm computation block performs a logarithm computation of the form of equation (6), as follows:
$$\log(1+X) = X + E(X), \qquad (6)$$
where a logarithm table value E(X) is accessed from a storage or memory including a plurality of storage elements. The logarithm table includes a plurality of data values that relate to segments of an error curve that is represented by a seed value and a slope such that the error curve fits the least squares regression line given by an approximation equation (7), as follows:
$$E(X) = ax + b, \qquad (7)$$
where the value a is the lower order 13-bit slope value stored in the exponential value table E(X). The values a have a range from 0 to $1 - 2^{-23}$.
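Equations (6) and (7) can be sketched as a segmented table of least-squares lines. The sketch below is illustrative only: it assumes base-2 logarithms, a hypothetical table of 32 segments, full-precision floating-point slopes and seeds rather than 13-bit stored values, and it takes x in equation (7) to be the offset of X from the start of its segment. The names `fit_line`, `build_log_table`, and `log2_1p` are not from the source.

```python
import math

SEGMENTS = 32  # assumed number of table entries
SAMPLES = 64   # sample points used to fit each segment

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def build_log_table():
    """One (slope a, seed b) pair per segment of E(X) = log2(1+X) - X."""
    width = 1.0 / SEGMENTS
    table = []
    for k in range(SEGMENTS):
        base = k * width
        xs = [width * i / (SAMPLES - 1) for i in range(SAMPLES)]
        es = [math.log2(1.0 + base + x) - (base + x) for x in xs]
        table.append(fit_line(xs, es))
    return table

LOG_TABLE = build_log_table()

def log2_1p(X: float) -> float:
    """Approximate log2(1 + X) for X in [0, 1) as X + (a*x + b)."""
    k = min(int(X * SEGMENTS), SEGMENTS - 1)  # segment index
    x = X - k / SEGMENTS                      # offset inside the segment
    a, b = LOG_TABLE[k]
    return X + a * x + b
```

With these assumed parameters the approximation stays within roughly 1e-4 of `math.log2(1 + X)` across the interval; a larger table or wider stored values would tighten that bound.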
An illustrative anti-logarithm computation block performs an anti-logarithmic computation of the form of equation (8), as follows:
$$\text{Anti-log}(Y) = 2^{Y} = 1 + Y + E(Y), \qquad (8)$$
where an anti-logarithm table value E(Y) is accessed from a storage or memory including a plurality of storage elements. The anti-logarithm table includes a plurality of data values that relate to segments of an error curve that is represented by a seed value and a slope such that the error curve fits the least squares regression line given by an approximation equation (9), as follows:
$$E(Y) = cy + d, \qquad (9)$$
where the value c is the lower order 13-bit slope value stored in the exponential value table E(Y). The values c have a range from 0 to $1 - 2^{-23}$.
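Equations (8) and (9) admit the same kind of sketch for the anti-logarithm. As before, this is an assumption-laden illustration rather than the stored table itself: 32 segments, floating-point slopes and seeds, and y in equation (9) taken as the offset of Y from the start of its segment. The names `build_antilog_table` and `antilog2` are hypothetical.

```python
SEGMENTS = 32  # assumed number of table entries
SAMPLES = 64   # sample points used to fit each segment

def build_antilog_table():
    """One (slope c, seed d) pair per segment of E(Y) = 2**Y - 1 - Y."""
    width = 1.0 / SEGMENTS
    table = []
    for k in range(SEGMENTS):
        base = k * width
        ys = [width * i / (SAMPLES - 1) for i in range(SAMPLES)]
        es = [2.0 ** (base + y) - 1.0 - (base + y) for y in ys]
        mean_y = sum(ys) / SAMPLES
        mean_e = sum(es) / SAMPLES
        # Ordinary least-squares slope and intercept for this segment.
        c = (sum((y - mean_y) * (e - mean_e) for y, e in zip(ys, es))
             / sum((y - mean_y) ** 2 for y in ys))
        table.append((c, mean_e - c * mean_y))
    return table

ANTILOG_TABLE = build_antilog_table()

def antilog2(Y: float) -> float:
    """Approximate 2**Y for Y in [0, 1) as 1 + Y + (c*y + d)."""
    k = min(int(Y * SEGMENTS), SEGMENTS - 1)  # segment index
    y = Y - k / SEGMENTS                      # offset inside the segment
    c, d = ANTILOG_TABLE[k]
    return 1.0 + Y + c * y + d
```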
An illustrative division computation block performs a division computation of the form of equation (10), as follows:
$$X / Y = \text{Anti-log}(\log(X) - \log(Y)), \qquad (10)$$
where the X and Y are floating point numbers. A plurality of bits encode the floating point numbers X and Y. The division computation is a multiple cycle operation. In a cycle of the operation, a portion of the most significant bits (MSB) of the X value are used to address into a logarithm storage. A portion of the least significant bits (LSB) of the X value are recoded using a Booth recoder.
In a subsequent cycle, the Booth recoded LSB of X are multiplied by a slope value (a) read from the logarithm storage and a seed value (b), also accessed from the logarithm storage at the addressed location, is added to the product according to equation (7) and the sum is saved as log(X). Also in the same cycle, a portion of the most significant bits (MSB) of the Y value are used to address into the logarithm storage and a portion of the least significant bits (LSB) of the Y value are recoded using the Booth recoder.
In a further subsequent cycle, the Booth recoded LSB of Y are multiplied by a slope value (a) read from the logarithm storage and a seed value (b), also accessed from the logarithm storage at the addressed location, is added to the product according to equation (7) to form a sum log(Y). The sum log(Y) is subtracted from the previously saved sum log(X) to form a result log(X) − log(Y), consistent with the division computation of the form log(X) minus log(Y).
In another cycle, a portion of the most significant bits (MSB) of the result are used as a pointer to address an anti-logarithm storage and a portion of the least significant bits (LSB) of the result are recoded using the Booth recoder.
In a subsequent cycle, the Booth recoded LSB of the result are multiplied by a slope value (c) read from the anti-logarithm storage and a seed value (d), also accessed from the anti-logarithm storage at the addressed location, is added to the product according to equation (9) and the sum is added to the saved result log(X) − log(Y) to form a division result. The exponent of Y is subtracted from the exponent of X in the same cycle.
In another cycle, the division result is rounded and any necessary adjustment of the exponent is made.
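The cycle sequence described above can be modeled, very loosely, in software. The following sketch makes several assumptions that are not in the description: a hypothetical 5-bit table index (`INDEX_BITS`), floating-point slopes and seeds fitted by least squares, an ordinary multiply in place of the Booth recoded multiplier, and no final rounding step. Inputs are assumed to be positive, normalized single-precision values.

```python
import math
import struct

INDEX_BITS = 5                       # assumed MSB count used as table address
FRAC_BITS = 23                       # single-precision fraction width
LSB_BITS = FRAC_BITS - INDEX_BITS    # remaining low-order bits

def build_table(err):
    """Least-squares (slope, seed) per segment of the error curve err(t)."""
    n = 64
    width = 1.0 / (1 << INDEX_BITS)
    table = []
    for k in range(1 << INDEX_BITS):
        xs = [width * i / (n - 1) for i in range(n)]
        ys = [err(k * width + x) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        table.append((slope, my - slope * mx))
    return table

LOG_TABLE = build_table(lambda t: math.log2(1.0 + t) - t)
ANTILOG_TABLE = build_table(lambda t: 2.0 ** t - 1.0 - t)

def fields(value):
    """Return (biased exponent, 23-bit fraction) of a positive float32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

def lookup(table, frac_bits):
    """MSBs address the table; LSBs scale the slope; the seed is added.

    Returns frac + E(frac), where frac is frac_bits scaled into [0, 1).
    """
    index = frac_bits >> LSB_BITS
    low = (frac_bits & ((1 << LSB_BITS) - 1)) / (1 << FRAC_BITS)
    slope, seed = table[index]
    return frac_bits / (1 << FRAC_BITS) + slope * low + seed

def divide(x, y):
    ex, mx = fields(x)
    ey, my = fields(y)
    log_x = lookup(LOG_TABLE, mx)    # log of X mantissa, saved
    log_y = lookup(LOG_TABLE, my)    # log of Y mantissa
    diff = log_x - log_y             # log(X) - log(Y)
    exponent = ex - ey               # exponent of Y subtracted from X
    if diff < 0.0:                   # renormalize into [0, 1)
        diff += 1.0
        exponent -= 1
    frac_bits = min(int(diff * (1 << FRAC_BITS)), (1 << FRAC_BITS) - 1)
    mantissa = 1.0 + lookup(ANTILOG_TABLE, frac_bits)  # equation (8)
    return mantissa * 2.0 ** exponent
```

With these assumed table sizes, `divide(1.0, 3.0)` agrees with 1/3 to roughly three or four significant digits; the accuracy of an actual unit depends on the table depth and stored bit widths, which this sketch does not reproduce.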