1. Field of the Invention
The invention relates to general purpose arithmetic logic units (ALUs), and in particular to an ALU utilizing a residue number system in performing arithmetic operations.
2. Related Art
The binary number system is the most widely used number system for implementing digital logic, arithmetic logic units (ALUs) and central processing units (CPUs). Binary based computers can be used to solve and process mathematical problems, where such calculations are performed in the binary number system. Moreover, an enhanced binary arithmetic unit, called a floating point unit, extends the binary computer's ability to solve mathematical problems of interest, and has become the standard for most arithmetic processing in science and industry.
However, certain problems exist which are not easily processed using binary computers and floating point units. One such class of problems involves manipulating and processing very large numbers. One example is plotting the Mandelbrot fractal at very high magnification. In order to plot the Mandelbrot fractal at high magnifications, a very long data word is required. Ideally, the Mandelbrot fractal plotting problem necessitates a computer with an extendable word size.
The main issue is that any real computer must be finite in size, and consequently the computer word size must be fixed at some limit. However, closer analysis reveals other contributing problems. One such problem is the propagation of “carry” bits during certain operations, such as addition and multiplication. Carry propagation often limits the speed at which an ALU can operate, since the wider the data word, the longer the path over which carry bits must propagate. Computer engineers have reduced the effect of carry by developing carry look-ahead circuitry, thereby minimizing, but not eliminating, its impact.
However, even the solution of implementing look-ahead carry circuits introduces its own limitations. One limitation is that look-ahead carry circuits are generally dedicated to the ALU in which they are embedded, and are generally optimized for a given data width. This works well as long as the CPU word size is adequate for the problems of interest. However, once a problem is presented which requires a larger data width, the CPU is no longer capable of using its native data and instruction formats for direct processing of the larger data width.
In this case, computer software is often used to perform calculations on larger data widths by breaking up the data into smaller data widths. The smaller data widths are then processed by the CPU's native instruction set. In the prior art, software libraries have been written specifically for this purpose. Such libraries are often referred to as “arbitrary precision” math libraries. Specific examples include the arbitrary precision library from the GNU organization, and the high precision arithmetic library by Ivano Primi.
However, software approaches to processing very large data widths have significant performance problems, especially as the processed data width increases. The problem is that software processing techniques tend to treat the smaller data widths as digits, and digit by digit processing leads to a polynomial increase in execution time as the number of digits increases. In one example, an arbitrary precision software routine may take four times as much time to execute when the data width is doubled. When using arbitrary precision software solutions, the amount of processing time often becomes impractical.
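The fourfold figure above is what a digit by digit (schoolbook) multiplication would be expected to produce. The following is a minimal sketch in Python, not taken from any particular arbitrary precision library; the function names and the 32 bit digit size are assumptions chosen for illustration. It counts the digit multiplications performed when the operand width is doubled.

```python
BASE = 2 ** 32  # assumed digit size: one 32 bit machine word per digit


def schoolbook_multiply(a_digits, b_digits, base=BASE):
    """Multiply two little-endian digit lists.

    Returns (product_digits, number_of_digit_multiplications).
    """
    result = [0] * (len(a_digits) + len(b_digits))
    multiplies = 0
    for i, a in enumerate(a_digits):
        carry = 0
        for j, b in enumerate(b_digits):
            total = result[i + j] + a * b + carry
            result[i + j] = total % base
            carry = total // base  # carry ripples to the next digit
            multiplies += 1
        result[i + len(b_digits)] += carry
    return result, multiplies


def from_digits(digits, base=BASE):
    """Rebuild the integer value of a little-endian digit list."""
    return sum(d * base ** i for i, d in enumerate(digits))


narrow = [BASE - 1] * 4  # a 128 bit operand
wide = [BASE - 1] * 8    # a 256 bit operand

_, narrow_count = schoolbook_multiply(narrow, narrow)
_, wide_count = schoolbook_multiply(wide, wide)
print(narrow_count, wide_count)  # 16 64
```

Doubling the number of digits from 4 to 8 raises the count of digit multiplications from 16 to 64, which is the quadratic growth described above. Production libraries may use faster algorithms for large operands, so this sketch reflects only the simple digit by digit case.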
One possible solution is to build a computer which is not based on binary arithmetic, and which does not require carry propagation logic. One candidate number system is the residue number system (RNS). Residue number addition, subtraction and multiplication do not require carry, and therefore do not require carry logic. Consequently, RNS addition, subtraction and multiplication can be very fast, regardless of the word size of the ALU. These facts have generated some interest in RNS based digital systems in the prior art; unfortunately, prior art RNS based systems are only partially realized, and have failed to match the general applicability of binary based systems in essentially every instance. This fact is evident from the lack of practical RNS based systems in the current state of the art.
The reasons for the failure of RNS based systems to displace binary systems are many. Fundamental logic operations, such as comparison and sign extension, are more complex in RNS systems than in traditional binary systems, and require more logic circuitry and execution time. Many experts assume that the difficulty of RNS comparison, RNS to binary conversion, and RNS sign and digit extension makes RNS based processors and ALUs impractical for general purpose processing.
In addition to the problems noted above, the lack of a practical RNS integer divide further restricts the applicability of RNS based systems of the prior art. Also, the lack of general purpose fractional number processing has severely restricted the usefulness of RNS based digital systems of the prior art. In summary, prior art RNS systems cannot process numbers in a general purpose manner, and this has relegated such systems to little more than research subjects.
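The carry free property of RNS addition and multiplication can be illustrated with a short Python sketch. The moduli below are an arbitrary pairwise coprime set chosen for illustration, and the function names are hypothetical; nothing here is specific to any particular RNS hardware design.

```python
from math import prod

# Assumed example moduli: pairwise coprime, giving a range of 15015 values.
MODULI = (7, 11, 13, 15)
RANGE = prod(MODULI)


def to_rns(n):
    """Represent an integer as one residue digit per modulus."""
    return tuple(n % m for m in MODULI)


def rns_add(x, y):
    """Add digit by digit; no digit depends on any other digit."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def rns_mul(x, y):
    """Multiply digit by digit; again no carry passes between digits."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


a, b = 123, 45
print(rns_add(to_rns(a), to_rns(b)) == to_rns(a + b))  # True
print(rns_mul(to_rns(a), to_rns(b)) == to_rns(a * b))  # True
```

Each digit position is computed independently of the others, so in hardware all positions could in principle be evaluated in parallel. The results are valid as long as the true sum or product stays below the range of the moduli set (15015 in this sketch).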
Some Needs of the Present Invention
The method and apparatus disclosed herein provide a general purpose RNS arithmetic logic unit (ALU). The new RNS ALU addresses the many issues confronted and exposed in the prior art. The RNS ALU of the present invention is extensible, and provides a solution to the time complexity problem involving arithmetic processing of very wide data. For very long data widths, the RNS ALU may outperform many prior art binary systems.
In terms of general purpose processing, the RNS ALU provides performance advantages over very wide width binary systems, even if such binary systems exhibit a run time that is linear with respect to increasing bits (resolution). The reason is the RNS ALU can complete many operations in near constant time, such as adding, subtracting, and multiplying integers. The RNS ALU can also add and subtract fractional values in constant time, as well as multiply integers by fractions in near constant time. Therefore, if the problem of interest can take advantage of such single clock operations, the RNS ALU may provide results faster than an equivalent binary system, which must handle carry for all arithmetic operations of all data formats.
It is anticipated that the RNS ALU of the present invention will find application in problems involving very large numbers, such as encryption and decryption. Other example applications are found in research, such as prime number searching and fractal analysis. Often, these applications involve very long word lengths, including binary word widths greater than 1024 bits. When dealing with very long word widths, numbers are broken down into smaller chunks for processing, and therefore arithmetic operations are processed digit by digit. In this context, the RNS ALU can effectively compete with binary systems, since RNS operations do not require carry.
The method and apparatus of the present invention is also applicable to fractal analysis. For example, consider the case of the analysis of the Mandelbrot set, or Mandelbrot fractal. In order to observe the fractal at increasingly greater magnification, the processing system requires increasingly greater numeric resolution. If one uses a standard binary floating point unit, there comes a point during magnification of the fractal image for which the floating point unit will be unable to render the fractal. In this case, a larger word size is needed, as well as the required operations of fractional multiplication, addition and compare on the larger word size.
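The resolution limit described above can be shown with a small Python sketch. It assumes a standard 64 bit binary floating point format (Python's float) and uses exact rational arithmetic as a stand-in for a wider word; the center coordinate, pixel spacing, and function name are arbitrary choices made for illustration.

```python
from fractions import Fraction


def distinct_columns(center, step, columns=8):
    """Count how many distinct pixel coordinates survive near `center`."""
    return len({center + k * step for k in range(columns)})


# Pixel spacing of 2**-60 around the coordinate 0.25.
float_count = distinct_columns(0.25, 2.0 ** -60)
exact_count = distinct_columns(Fraction(1, 4), Fraction(1, 2 ** 60))

print(float_count)  # 1: every column collapses onto the same float value
print(exact_count)  # 8: a wider representation keeps the columns apart
```

At this magnification all eight floating point coordinates round to the same value, so adjacent pixels would be computed from identical inputs and the image could not be rendered.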
The method and apparatus of the present invention can be used to create a very wide word ALU. The ALU will support fractional multiplication and addition of very long word values at theoretically greater speed than would be the case if a conventional binary floating point unit were extended to support the same word size.
The method of the present invention provides an ALU apparatus with superior fractional representation. The fractional representation of the RNS ALU provides many more denominators than does a binary representation covering approximately the same range. This allows many commonly used ratios to be represented more accurately. This high precision of the RNS ALU competes favorably with the precision of many binary formats, including extended precision floating point (when comparisons are made between ALUs of approximately the same effective word width).
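The point about denominators can be illustrated with a simplified model, which is an assumption and not a description of the fractional format of the invention. Suppose a fraction is stored as k/R for a fixed fractional range R. Then 1/d is exactly representable only when d divides R. The Python sketch below compares a hypothetical RNS style range built from the moduli 2, 3, 5 and 7 (R = 210) with an 8 bit binary fraction (R = 256), which covers a similar range.

```python
def exact_denominators(fraction_range, limit=10):
    """List the denominators d in 2..limit for which 1/d equals k/fraction_range
    for some integer k, i.e. the d that divide fraction_range."""
    return [d for d in range(2, limit + 1) if fraction_range % d == 0]


rns_style = exact_denominators(2 * 3 * 5 * 7)  # assumed range R = 210
binary = exact_denominators(2 ** 8)            # binary range R = 256

print(rns_style)  # [2, 3, 5, 6, 7, 10]
print(binary)     # [2, 4, 8]
```

Under this model the product of several distinct moduli represents ratios such as 1/3, 1/5 and 1/10 exactly, while the binary fraction of similar size is limited to powers of two.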
In addition, the RNS ALU of the enclosed invention is very fast. For example, the theoretical performance of the RNS fractional multiply of the enclosed invention is approximately linear with respect to the width, in equivalent binary bits, of the data processed. This relation accounts for the increase in memory table lookup time as the binary width of the most significant digits increases. In practice, the performance of the RNS fractional multiply is closer to n/log(P), where n is the effective word width in bits, and P is the equivalent number of RNS digits.
Interestingly, if look-up table speed is assumed to be fixed, and other basic assumptions are made, the theoretical time for RNS fractional multiply is better than linear. This assumption is particularly valid within intervals for which a given (binary) look-up table supports a plurality of digit modules; for example, a look-up table supporting 8 bit wide operands supports up to 54 RNS digits, whereas a lookup table supporting 9 bit operands supports up to 97 RNS digits. The difference in supported digits is 97−54=43 digits. Therefore, assuming 9 bit look-up tables (LUT) are employed, up to 43 digits worth of number extension is possible without any increase in LUT size or speed. It should be noted this analysis compares “equivalent binary width”, and not RNS digit length. When using conventional memory to support look-up tables, higher density memory is also faster; therefore, the assumption of a fixed delay look-up table holds as long as this technology trend and the system memory requirements match.
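The digit counts quoted above, 54 and 97, coincide with the number of primes below 2^8 and 2^9. The following Python sketch assumes, as one reading of those figures, that each RNS digit uses a distinct prime modulus small enough to fit the LUT operand width. The function names are illustrative only.

```python
from math import prod


def primes_below(limit):
    """Return all primes p < limit using a sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def equivalent_bits(moduli):
    """Bit length of the product of the moduli (the 'equivalent binary width')."""
    return prod(moduli).bit_length()


digits_8bit = primes_below(2 ** 8)
digits_9bit = primes_below(2 ** 9)

print(len(digits_8bit))                     # 54
print(len(digits_9bit))                     # 97
print(len(digits_9bit) - len(digits_8bit))  # 43
print(equivalent_bits(digits_8bit), equivalent_bits(digits_9bit))
```

The last line prints the equivalent binary width of each moduli set under the same prime modulus assumption, which is the quantity the analysis above compares, as opposed to the RNS digit count.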
In terms of RNS digit length, the time complexity analysis for fractional multiply versus RNS digit length is linear, again assuming a fixed LUT speed.
The performance of the RNS ALU compares favorably with binary processing systems, which may exhibit a polynomial increase in processing time with respect to an increasing bit width of the data. For the multiply and divide operations, the RNS ALU will typically exceed the performance of a binary ALU of similar width beyond some given data width. The point of crossover is to be determined based on actual implementations and technologies. For many types of arithmetic calculations, and in many cases, the RNS ALU will significantly outperform an equivalently sized binary ALU. For integer operations of addition, subtraction and multiplication, the RNS ALU theoretically outperforms the binary ALU at any bit width. In practice, the actual performance depends on many other real world factors, such as implementation technology and circuit topology.
Additionally, the sliding point operation of the RNS fractional multiplication supports a novel implementation of Goldschmidt division and Newton-Raphson reciprocal. The Newton-Raphson reciprocal algorithm provides quadratic convergence, and is ideally suited for systems requiring fast division of fractional quantities. Using the fractional multiplication method to implement either the Goldschmidt or the Newton-Raphson technique provides a very fast division for fractional RNS values. (It should be noted the RNS integer division method of the present invention may also be used to achieve fractional division without using Newton-Raphson or Goldschmidt.)
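The quadratic convergence of the Newton-Raphson reciprocal can be seen in a short Python sketch. It uses exact rational arithmetic rather than RNS fractional multiplication, so it illustrates only the textbook iteration x' = x(2 - d·x) and not the implementation of the invention; the divisor, starting estimate, and function name are arbitrary assumptions.

```python
from fractions import Fraction


def newton_reciprocal(d, x0, iterations):
    """Refine an estimate x0 of 1/d using only multiplies and subtracts.

    Returns the final estimate and the relative error |1 - d*x| after each step.
    """
    x = Fraction(x0)
    errors = []
    for _ in range(iterations):
        x = x * (2 - d * x)  # Newton-Raphson step for f(x) = 1/x - d
        errors.append(abs(1 - d * x))
    return x, errors


estimate, errors = newton_reciprocal(3, Fraction(1, 4), 3)
print(errors)  # [Fraction(1, 16), Fraction(1, 256), Fraction(1, 65536)]
```

Starting from a relative error of 1/4, each step squares the error (1/16, then 1/256, then 1/65536), so the number of correct bits roughly doubles per iteration. Each step costs two multiplications, which is why a fast fractional multiply translates directly into a fast divide.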
The analysis and discussion above do not include the time to convert results back to binary, and this is partially justified. Some problems suitable for the method and apparatus of the present invention will require many iterative calculations to be performed. Using the apparatus and methods of the present invention, this will be accomplished entirely in RNS format. Once the final arithmetic result is ready, it is converted to binary. If the conversion time of the final result can be neglected, then the RNS multiplier's better than linear performance with respect to the number of binary digits may be realized. Furthermore, in the case of the Mandelbrot fractal problem, the results of repetitive calculation may only be a “yes” or “no” answer, which does not require conversion back to binary. In yet another case, if allowable, RNS results may be truncated, and converted with less resolution to shorten conversion time.
However, many arithmetic problems, such as calculations involving matrices, will not require repetitive calculations on one set of values. In this case, the speed of converting RNS results back to binary is more significant. Fortunately, the method of the present invention includes a new and unique apparatus for high speed conversion of RNS values to binary. The performance of the RNS to binary conversion is approximately linear with respect to RNS digits, given the assumption that LUT access time is fixed. Using the methods of the present invention, conversion of RNS to binary is on the order of the time required to perform a fractional RNS multiplication, and is therefore practical. Moreover, the conversion apparatus and method is extensible, and does not suffer from increasing carry propagation delay as data width is increased. Equally important is the fact that the novel conversion apparatus is extendable to a pipelined architecture, capable of performing a conversion every clock cycle.
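For readers unfamiliar with what RNS to binary conversion involves, the following Python sketch shows the textbook reconstruction based on the Chinese Remainder Theorem. This is a generic reference method and is not the conversion apparatus of the invention; the moduli and function names are assumptions chosen for illustration, and the modular inverse call pow(x, -1, m) requires Python 3.8 or later.

```python
from math import prod

# Assumed example moduli: pairwise coprime, giving a range of 15015 values.
MODULI = (7, 11, 13, 15)


def to_rns(n, moduli=MODULI):
    """Represent an integer as one residue digit per modulus."""
    return tuple(n % m for m in moduli)


def rns_to_int(residues, moduli=MODULI):
    """Recover the integer from its residues (Chinese Remainder Theorem)."""
    full_range = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = full_range // m       # product of all the other moduli
        inverse = pow(partial, -1, m)   # modular inverse of partial mod m
        total += r * partial * inverse
    return total % full_range


value = 5535
print(rns_to_int(to_rns(value)) == value)  # True
```

Note that this textbook form needs a wide accumulation and a final reduction modulo the full range, which is one reason conversion has traditionally been regarded as a costly step in RNS systems.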
Another need and advantage of the disclosed invention is its potential application to other forms of computational processing. For example, optical computers may benefit from digit by digit isolation due to their large size; therefore, the method of the present invention is ideal. Additionally, new technologies, such as optical computing and quantum computing, can use the method of the present invention to perform digital arithmetic operations using hardware which has more states than Boolean logic, i.e., more than two states.
RNS systems have numerous embodiments and alternate methods that can be employed and exploited; therefore, it is anticipated that the ALU of the present invention will serve as a new fundamental baseline, and will be further modified and enhanced in the future.