Half-precision (16-bit) floating-point numbers are commonly used in computing where floating-point range and precision matter less than memory footprint. Common applications include graphics and imaging (e.g. the OpenEXR specification, CoreImage, Aperture, etc.) and a limited set of scientific applications. Typically, data is stored in memory in the half-precision floating-point format (e.g. as specified by the IEEE 754-2008 standard) and converted to the single-precision floating-point format before arithmetic operations are performed on the data. The half-precision format is sufficiently widely used that some devices, such as GPUs (Graphics Processing Units) and mobile phones, support hardware conversions between half precision and single precision. Some devices are even capable of performing arithmetic directly on the half-precision format.
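The half-to-single direction of such a software conversion can be sketched as follows, assuming the standard binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits); the function name and structure are illustrative, not taken from any particular implementation:

```python
def half_to_float(h: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 bit pattern into a Python float.

    A minimal sketch; production converters typically use table lookups
    or hardware instructions instead of branching like this.
    """
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF

    if exp == 0:
        if frac == 0:
            value = 0.0                       # signed zero
        else:
            value = frac * 2.0 ** -24         # subnormal: (frac / 2^10) * 2^-14
    elif exp == 0x1F:
        value = float('inf') if frac == 0 else float('nan')
    else:
        value = (1 + frac / 1024.0) * 2.0 ** (exp - 15)  # normal number

    return -value if sign else value
```

Because every binary16 value is exactly representable in binary32, this direction never needs to round, which is why it is the easier half of the problem.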
However, a large class of devices, chiefly desktop CPUs (Central Processing Units), provides no hardware to convert between the single-precision and half-precision floating-point formats. Achieving a correct software conversion from single precision to half precision is especially onerous, because IEEE 754 requires that the conversion round according to the current rounding mode, which by default is usually round to nearest, ties to even. For example, in image processing (or other computations), converting data from single precision to half precision without proper rounding, such as always rounding toward zero, causes the data to drift gradually toward zero; an image based on such data would gradually get darker.
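The rounding requirement can be illustrated with a minimal single-to-half converter. The sketch below handles only normal, in-range inputs (overflow, underflow to subnormals, and NaNs are omitted, and the names are illustrative); passing `round_to_nearest=False` truncates instead, which is the biased round-toward-zero behavior described above:

```python
import struct

def _float_bits(f: float) -> int:
    """Return the 32-bit IEEE 754 binary32 pattern of f."""
    return struct.unpack('<I', struct.pack('<f', f))[0]

def float_to_half(f: float, round_to_nearest: bool = True) -> int:
    """Convert a single-precision value to a binary16 bit pattern.

    Sketch for normal, in-range inputs only: a complete converter must
    also handle overflow to infinity, underflow to subnormals, and NaNs.
    """
    bits = _float_bits(f)
    sign = (bits >> 16) & 0x8000
    exp = ((bits >> 23) & 0xFF) - 127 + 15   # rebias exponent (127 -> 15)
    frac = (bits >> 13) & 0x3FF              # keep top 10 fraction bits
    half = sign | (exp << 10) | frac         # truncation: round toward zero
    if round_to_nearest:
        discarded = bits & 0x1FFF            # the 13 bits dropped above
        if discarded > 0x1000 or (discarded == 0x1000 and (half & 1)):
            half += 1                        # round up; a carry out of the
                                             # fraction bumps the exponent
    return half
```

For a value such as 0.3, truncation yields 0x34CC while round-to-nearest-even yields 0x34CD; applied to millions of pixels, truncation systematically shrinks magnitudes.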
In addition, due to the limited range of the half-precision format, correct handling of subnormal conversion results (i.e., subnormal floating-point values as defined by the IEEE 754 standard), whose stored representation differs from that of normal numbers, further complicates the conversion. Thus, a large number of instructions may be required to perform the conversion, causing a significant bottleneck in application performance.
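The subnormal complication arises because values below the smallest normal half (2^-14) are stored with a zero exponent field and no implicit leading 1. A hypothetical helper illustrating just that encoding (it assumes 0 < f < 2^-14 and truncates the shifted-out bits, which a correct converter would also have to round):

```python
def encode_small_half(f: float) -> int:
    """Encode a positive value below 2^-14 as a subnormal binary16 (sketch).

    Subnormal halves represent frac * 2^-24 for frac in 1..1023, with the
    exponent field held at zero; this truncating quantization is for
    illustration only and omits the rounding a full converter needs.
    """
    frac = int(f / 2.0 ** -24)   # quantize to multiples of 2^-24
    return frac & 0x3FF          # exponent field stays 0 for subnormals
```

The variable shift implied by this quantization is one reason a branch-free software converter needs noticeably more instructions than the normal-number path alone.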
Therefore, traditional approaches for converting floating-point representations between different precisions tend to be slow, inexact, or erroneous.