Elementary functions are mathematical functions, such as square root, logarithm, and exponential, that are widely used in high performance computing (HPC) applications, scientific computing, and financial applications. The speed of elementary function evaluation often has a significant effect on the overall performance of such applications, making accelerated elementary function libraries an important factor in achieving high performance on hardware.
Elementary function libraries, such as IBM MASS (Mathematical Acceleration SubSystem), are often called from performance-critical code sections and hence contribute greatly to the efficiency of numerical applications. Not surprisingly, such functions are heavily optimized by both the software developer and the compiler, and processor manufacturers provide detailed performance results that potential users can use to estimate the performance of new processors on existing numerical workloads.
Changes in processor design require such libraries to be re-tuned. For example, hardware pipelining and superscalar dispatch favor implementations that use more instructions and have longer total latency, but that distribute computation across different execution units and present the compiler with more opportunities for parallel execution. Additionally, Single-Instruction-Multiple-Data (SIMD) parallelism and large penalties for data-dependent, unpredictable branches favor implementations that handle all cases in a branchless loop body over implementations with a fast path for common cases and slower paths for uncommon (e.g., exceptional) cases. The present disclosure provides enhanced performance on these architectures through elementary function algorithms, together with hardware instructions that accelerate such algorithms and simplify their use.