Modern processors often include instructions for operations that are computationally intensive but offer a high degree of data parallelism, which an efficient implementation can exploit using various data storage devices such as single instruction, multiple data (SIMD) vector registers. The central processing unit (CPU) may then provide parallel hardware to support vector processing. Vectorization can thus improve the power-efficient performance of a processor.
The IEEE 754 Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE); an update was published in August 2008. IEEE 754 sets forth standards at least for floating-point number formats, required operations, and recommended operations. IEEE 754 defines formats for various precision levels, including half precision (2 bytes), single precision (4 bytes), double precision (8 bytes), and quadruple precision (16 bytes).
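As a rough illustration of these storage sizes, the following Python sketch uses the standard library's struct module, whose format codes 'e', 'f', and 'd' correspond to the half, single, and double precision formats. The helper names storage_bytes and round_trip are hypothetical, introduced only for this example; quadruple precision is omitted because struct has no format code for it.

```python
import struct

# struct format codes: 'e' = half, 'f' = single, 'd' = double precision.
# Quadruple precision has no struct code, so it is left out of this sketch.
FORMATS = {"half": "e", "single": "f", "double": "d"}


def storage_bytes(name):
    """Return the number of bytes one value of the named format occupies."""
    return struct.calcsize("<" + FORMATS[name])


def round_trip(value, name):
    """Store a Python float in the named format and read it back."""
    code = "<" + FORMATS[name]
    return struct.unpack(code, struct.pack(code, value))[0]


if __name__ == "__main__":
    for name in FORMATS:
        stored = round_trip(0.1, name)
        print(name, storage_bytes(name), "bytes, 0.1 stored as", repr(stored))
```

Storing the same value in a narrower format leaves a larger rounding error, which is the accuracy cost that motivates wider formats.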
For efficiency reasons, processor instruction set architectures often limit register sizes and floating-point computations to a fixed number of bits. As a result, applications that require increased accuracy and higher precision may seek to double the number of bits used for floating-point representations. Doing so, however, can reduce the number of calculations that can be performed per cycle, can increase the instruction count, can decrease the number of data elements that can be stored in a data cache, and can require more memory operations.
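The cost of doubling the element width can be made concrete with simple arithmetic. The sketch below assumes a 128-bit vector register and a 64-byte cache line; both widths, and the function names, are illustrative assumptions rather than values taken from any particular architecture.

```python
import math

# Illustrative widths only; both are assumptions for this example.
VECTOR_REGISTER_BITS = 128
CACHE_LINE_BITS = 64 * 8


def elements_per_container(container_bits, element_bits):
    """Return how many whole elements of element_bits fit in container_bits."""
    return container_bits // element_bits


def vector_ops_needed(element_count, element_bits,
                      register_bits=VECTOR_REGISTER_BITS):
    """Return the vector instructions needed to process each element once."""
    lanes = elements_per_container(register_bits, element_bits)
    return math.ceil(element_count / lanes)


if __name__ == "__main__":
    for name, bits in (("single", 32), ("double", 64)):
        print(
            name,
            "lanes per register:",
            elements_per_container(VECTOR_REGISTER_BITS, bits),
            "elements per cache line:",
            elements_per_container(CACHE_LINE_BITS, bits),
            "vector ops for 1024 elements:",
            vector_ops_needed(1024, bits),
        )
```

Under these assumptions, moving from single to double precision halves the elements per register and per cache line, and doubles the number of vector instructions needed for the same element count.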
What is needed is a way to vectorize in conformance with IEEE 754 using high-precision data when necessary for accuracy, and using low-precision data otherwise.