The present invention relates to data processing, and more specifically, to mixed precision estimate instruction computing.
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard established by the Institute of Electrical and Electronics Engineers (IEEE) and the most widely used standard for floating-point computation. The current version is IEEE Standard for Floating-Point Arithmetic 754-2008, which was published in August 2008, and is herein incorporated by reference in its entirety. Many computer languages allow or require that some or all arithmetic be carried out using IEEE 754 formats and operations.
The IEEE 754-2008 standard defines the following: arithmetic formats, which are sets of binary and decimal floating-point data consisting of finite numbers (including signed zeros and subnormal numbers), infinities, and special “not a number” values (NaNs); interchange formats, which are encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form; rounding algorithms, which are methods to be used for rounding numbers during arithmetic and conversions; operations, which are arithmetic and other operations on arithmetic formats; and exception handling, which provides indications of exceptional conditions (such as division by zero, overflow, etc.).
Under exception handling, the standard defines five exceptions, each of which has a corresponding status flag that is raised when the exception occurs. The five possible exceptions are: invalid operation (e.g., square root of a negative number); division by zero; overflow (a result is too large to be represented correctly); underflow (a result is very small, i.e., outside the normal range, and is inexact); and inexact (a result cannot be represented exactly and is therefore rounded).
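By way of illustration only, the five status flags may be observed with the Python standard library `decimal` module, whose signals correspond to the five exceptions listed above. The following is a minimal sketch under stated assumptions: the helper name `raised_flags` is hypothetical, the context parameters (7 digits, exponent range -95 to 96) are chosen to resemble a 32-bit decimal format, and traps are disabled so that each exception sets its flag instead of interrupting the computation. It does not represent any particular hardware implementation.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

# The five IEEE 754 exceptions, as named by the decimal module.
SIGNALS = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)


def raised_flags(operation):
    """Run operation(ctx) in a fresh context and list the flags it raised.

    Hypothetical helper: traps=[] disables trapping, so an exception only
    sets its status flag and a default result is returned.
    """
    ctx = Context(prec=7, Emax=96, Emin=-95, traps=[])
    operation(ctx)
    return [s.__name__ for s in SIGNALS if ctx.flags[s]]


invalid = raised_flags(lambda c: c.sqrt(Decimal(-1)))
div_zero = raised_flags(lambda c: c.divide(Decimal(1), Decimal(0)))
overflow = raised_flags(lambda c: c.multiply(Decimal("1E96"), Decimal("1E10")))
underflow = raised_flags(
    lambda c: c.multiply(Decimal("1.234567E-95"), Decimal("1E-5")))
inexact = raised_flags(lambda c: c.divide(Decimal(1), Decimal(3)))

for name, flags in [("sqrt(-1)", invalid), ("1/0", div_zero),
                    ("1E96*1E10", overflow), ("tiny product", underflow),
                    ("1/3", inexact)]:
    print(name, flags)
```

In this sketch the overflow and underflow cases also raise the inexact flag, since the delivered result differs from the exact mathematical result.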
Single precision floating point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating radix point. In IEEE 754-2008, the 32-bit base 2 format is officially referred to as binary32.
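As an illustrative sketch, the 32-bit encoding may be inspected with the Python standard library `struct` module. The function name `decode_binary32` is hypothetical. The field widths used below (1 sign bit, 8 exponent bits, 23 fraction bits) are those of the binary32 interchange format and are not recited in the passage above.

```python
import struct


def decode_binary32(value):
    """Split a number, rounded to binary32, into its three bit fields."""
    # Pack as a big-endian 32-bit float, then reread the 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


size_in_bytes = struct.calcsize(">f")
one = decode_binary32(1.0)
neg = decode_binary32(-2.5)
print(size_in_bytes, one, neg)
```

Here 1.0 has a biased exponent of 127 (an unbiased exponent of zero) and a zero fraction, while -2.5 has its sign bit set.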
In computing, double precision floating point is a computer number format that occupies 64 bits (8 bytes) in computer memory. Computers with 32-bit storage locations use two adjacent storage locations to store a 64-bit double-precision number (a single storage location can hold a single-precision number). A double precision number, sometimes simply called a double, may be defined to be an integer, fixed point, or floating point (in which case it is often referred to as FP64). In IEEE 754-2008, the 64-bit base 2 format is officially referred to as binary64, and the 64-bit base 10 format is referred to as decimal64.
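The relationship between the two formats may be sketched as follows, again using the Python standard library `struct` module. The function names `binary64_words` and `round_to_binary32` are hypothetical and serve only to illustrate that a double-precision value spans two 32-bit words and that narrowing it to single precision may change its value.

```python
import struct


def binary64_words(value):
    """Return the two 32-bit words (high, low) that hold a 64-bit double."""
    return struct.unpack(">II", struct.pack(">d", value))


def round_to_binary32(value):
    """Round a double to the nearest binary32 value, returned as a double."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


double_size = struct.calcsize(">d")
high, low = binary64_words(1.0)
half_kept = round_to_binary32(0.5) == 0.5    # 0.5 is exact in both formats
tenth_kept = round_to_binary32(0.1) == 0.1   # 0.1 is rounded differently

print(double_size, hex(high), hex(low), half_kept, tenth_kept)
```

A value such as 0.5 survives the narrowing unchanged, whereas 0.1 does not, because its single-precision and double-precision roundings differ.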