Implementations of the claimed invention generally may relate to formats and/or data structures for representing floating point numbers, and logic associated therewith.
In some cases, single-precision floating point numbers may be contained in a 32-bit doubleword, taking the format, for example, as defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic. FIG. 1A illustrates such a conventional floating point format 110. As may be seen, format 110 may include one sign bit, eight exponent bits, and 23 fraction bits, for a total of 32 bits. In such format 110, the maximal representable number is $(2 - 2^{-23}) \times 2^{127}$ and the minimal number is $-(2 - 2^{-23}) \times 2^{127}$. The smallest-magnitude negative number that may be represented by format 110 is $-2^{-149}$ and the smallest positive number that may be represented is $2^{-149}$. In format 110, the value 0.0 has no fractional parts.
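By way of illustration only, the field layout and range limits of format 110 described above may be checked with a short sketch such as the following. The names `decode_fields`, `bits_to_float`, `MAX_FINITE`, and `MIN_SUBNORMAL` are hypothetical and are introduced here solely for the example; the sketch assumes the standard IEEE 754 single-precision encoding and is not part of any claimed implementation.

```python
import struct


def decode_fields(x: float):
    """Split a value, rounded to IEEE 754 single precision, into its
    sign (1 bit), exponent (8 bits), and fraction (23 bits) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30..23, biased by 127
    fraction = bits & 0x7FFFFF        # bits 22..0
    return sign, exponent, fraction


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Largest finite value: exponent field 0xFE, fraction field all ones.
MAX_FINITE = bits_to_float(0x7F7FFFFF)

# Smallest positive (denormalized) value: only the lowest fraction bit set.
MIN_SUBNORMAL = bits_to_float(0x00000001)
```

In this sketch, `decode_fields(1.0)` yields a sign of 0, a biased exponent of 127, and a fraction of 0, while `decode_fields(0.0)` yields all-zero fields, consistent with the value 0.0 having no fractional parts.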
In many applications, the high precision of format 110 may not be required to describe certain sets or classes of numerical data. One such example is an immediate constant (“immediate” being defined and generally understood as an operand within an instruction) used in certain single instruction, multiple data (SIMD) instruction set architectures (ISAs). Hence, there is a need to use fewer bits to represent a floating point value.