1. Field of the Invention
The present invention relates generally to performing floating-point operations in a central processing unit (CPU) of a computing device, and more particularly, to an improved floating-point unit for more efficiently performing multiple multiply-add operations at the rate of one per cycle.
2. Description of the Prior Art
Many compute-intensive applications today use extended-precision fixed-point arithmetic. These include conversion between binary and decimal, and public-key algorithms such as Diffie-Hellman, DSA, ElGamal, and (most importantly) RSA. Public-key-algorithm (PKA) cryptography in particular has become an essential part of the Internet. The most compute-intensive part of PKA is a modular exponentiation using very large integers, typically 1024 bits, 2048 bits, or even larger. This computation is executed in software using multiple-precision arithmetic. For example, a typical 1024-bit RSA exponentiation requires about 200,000 64-bit multiplies and twice that many 64-bit adds. The computing time for this on a workstation or a personal computer is not normally significant, as this occurs only once per secure-socket-layer (SSL) transaction. However, at the server, where many sessions can be in progress at the same time, this computation tends to be the limiting factor for the number of SSL transactions that can be performed.
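To illustrate why multiple-precision arithmetic is dominated by 64-bit multiplies and adds, the following is a minimal Python sketch of a schoolbook multiple-precision multiplication. It is an illustration only and is not taken from any particular software implementation; the names MASK64, to_limbs, from_limbs, and mp_multiply are hypothetical. Each inner step performs one 64-bit by 64-bit multiply and two adds (the accumulator and the carry), which matches the two-adds-per-multiply ratio noted above.

```python
MASK64 = (1 << 64) - 1  # a "limb" is one 64-bit word of a large integer


def to_limbs(x, n):
    """Split a non-negative integer into n 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs, least significant first."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def mp_multiply(a, b):
    """Schoolbook product of two limb arrays.

    Returns (product_limbs, multiply_count).
    """
    result = [0] * (len(a) + len(b))
    multiplies = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # One 64x64 multiply and two adds; the sum always fits in 128 bits.
            t = ai * bj + result[i + j] + carry
            multiplies += 1
            result[i + j] = t & MASK64   # low 64 bits stay in place
            carry = t >> 64              # high 64 bits propagate to the next limb
        result[i + len(b)] = carry
    return result, multiplies
```

Under these assumptions, multiplying two 1024-bit operands (16 limbs each) takes 16 x 16 = 256 limb multiplies, and a modular exponentiation repeats such products, together with modular reductions, many hundreds of times.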
The software on the IBM eServer zSeries® (z/OS) available from assignee International Business Machines, Inc., implements 64-bit fixed-point instructions to perform this operation. Fixed-point multiply on the zSeries is relatively slow and is not pipelined; a 64-bit multiply typically takes more than 20 cycles. Additionally, there are not enough fixed-point registers to hold the intermediate results.
One solution is to implement special cryptographic accelerators. With current technology, it takes several accelerators (usually more than 10) to provide the performance required by one mainframe server. Current technology trends indicate that server performance is increasing faster than accelerator performance, so this imbalance will continue to worsen. Additionally, these accelerators run asynchronously to the CPU, so the CPU also incurs a significant performance overhead in interfacing with the accelerator.
Moreover, most current floating-point improvements are primarily concerned with performance (not function), especially as it applies to denormalized operands. In the application for which MAA is intended, denormalized operands do not occur. (Denormalized operands are very tiny values; unnormalized operands can have values in the normal range, but with leftmost zeros in the fraction.) For example, U.S. Pat. Nos. 5,943,249 and 6,732,134 describe processors for performing floating-point operations, but are concerned with denormalized operands and not normal values. U.S. Pat. Nos. 6,256,655 and 6,904,446 describe floating-point processing that meets criteria for preserving the integrity of the result (e.g., the fraction is affected by the alignment of the input fractions).
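The distinction between denormalized and unnormalized operands can be made concrete with a toy model. The Python sketch below is an illustration only: the 8-bit fraction width, the minimum exponent, and the names FRAC_BITS, E_MIN, value, and has_leading_zeros are all hypothetical and do not correspond to any actual machine format.

```python
from fractions import Fraction

FRAC_BITS = 8   # hypothetical fraction width, chosen only for illustration
E_MIN = -4      # hypothetical smallest exponent of the toy format


def value(fraction, exponent):
    """Value of a toy operand: (fraction / 2**FRAC_BITS) * 2**exponent."""
    return Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** exponent


def has_leading_zeros(fraction):
    """True when the leftmost bit of the fraction is zero."""
    return (fraction >> (FRAC_BITS - 1)) == 0


# Smallest value with a fully normalized fraction at the minimum exponent.
smallest_normal = value(0b10000000, E_MIN)

normalized = (0b11000000, 3)        # 3/4 * 2**3
unnormalized = (0b00110000, 5)      # 3/16 * 2**5: same value, leading zeros
denormalized = (0b00110000, E_MIN)  # leading zeros at the minimum exponent
```

In this model the normalized and unnormalized operands both represent the same in-range value, whereas the denormalized operand falls below the smallest normal value.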
It would be highly desirable to provide an improved floating-point unit that efficiently processes multiple-precision fixed-point operands.