To speed up arithmetic processing, most modern processors implement a fused multiply-add operation (abbreviated in the following as FMA), which combines a multiplication, e.g., A×C, and an addition, e.g., +B, for execution as a single instruction, e.g., A×C+B, where A and C are the operands of the product A×C and B is the addend that is summed with that product. By performing both operations in a single instruction, the FMA operation reduces overall execution time.
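By way of illustration only, the following minimal sketch models the effect of combining the two operations. It assumes that each multiplication, addition, or fused multiply-add counts as one instruction, and it evaluates a polynomial by Horner's rule, where every step has the form A×C+B. The function names `fma`, `horner_separate`, and `horner_fused` are hypothetical, and the Python function `fma` merely stands in for a hardware instruction; the sketch counts instructions and does not measure execution time.

```python
def fma(a, c, b):
    """Software stand-in for a fused multiply-add: a*c + b as one operation."""
    return a * c + b


def horner_separate(coeffs, x):
    """Evaluate a polynomial with separate multiply and add instructions.

    coeffs are ordered from the highest power down to the constant term.
    Returns (value, number_of_modelled_instructions).
    """
    acc, ops = 0, 0
    for k in coeffs:
        acc = acc * x   # one multiply instruction
        ops += 1
        acc = acc + k   # one add instruction
        ops += 1
    return acc, ops


def horner_fused(coeffs, x):
    """Evaluate the same polynomial with one FMA per coefficient."""
    acc, ops = 0, 0
    for k in coeffs:
        acc = fma(acc, x, k)   # multiply and add in a single modelled instruction
        ops += 1
    return acc, ops


# 2*x**2 + 3*x + 5 evaluated at x = 4
value_sep, ops_sep = horner_separate([2, 3, 5], 4)
value_fma, ops_fma = horner_fused([2, 3, 5], 4)
```

Under this simplified model, both variants produce the same value for integer inputs, while the fused variant issues half as many instructions.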
A number of widely used cryptographic algorithms are based on one or more long integer multiplications (e.g., on 256-bit or 2048-bit operands). Such algorithms are used, for example, when establishing a secure connection or in blockchains. For performance reasons, these algorithms should be as fast and efficient as possible. Newer algorithms, such as elliptic curve cryptography (ECC) and algorithms intended to resist attacks by quantum computers, are emerging; they may likewise require high throughput and are expected to become standard and widely used. These newer algorithms do not rely solely on basic long multiplications, but characteristically also involve shifts of intermediate results. An efficient and fast implementation of multiplication operations is therefore needed for these algorithms as well.
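As a non-limiting sketch of why long integer multiplication maps naturally onto multiply-add steps, the following example multiplies two long integers that are split into 64-bit words ("limbs"). It uses the schoolbook method, chosen here purely for illustration and not taken from the algorithms mentioned above. The limb width and the names `LIMB_BITS`, `to_limbs`, `from_limbs`, and `long_multiply` are assumptions. Each inner step is an integer multiply-add of the form A×C+B followed by a shift of the intermediate result to extract the carry.

```python
LIMB_BITS = 64                    # assumed word size of the multiplier
MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into `count` limbs, least significant first."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Recombine limbs (least significant first) into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def long_multiply(a_limbs, b_limbs):
    """Schoolbook product of two limb arrays, built from multiply-add steps."""
    result = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            # Multiply-add: a*b plus the partial result and the incoming carry.
            # The sum always fits in 2 * LIMB_BITS bits.
            t = a * b + result[i + j] + carry
            result[i + j] = t & MASK    # low word stays in place
            carry = t >> LIMB_BITS      # shifted high word becomes the carry
        result[i + len(b_limbs)] = carry
    return result


# Two 256-bit operands, each held in four 64-bit limbs.
x = 2**256 - 189
y = 2**255 - 19
product = from_limbs(long_multiply(to_limbs(x, 4), to_limbs(y, 4)))
```

In this sketch a 256-bit by 256-bit multiplication requires sixteen word-level multiply-add steps, which indicates how strongly the throughput of the multiply-add operation affects the overall cost.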