Cryptographic algorithms such as RSA-2048, and others that underlie Secure Sockets Layer (SSL) connections, Transport Layer Security (TLS) connections, and the like, create heavy computational loads for the computing devices that support them (e.g., servers).
Conventional software implementations of RSA are “scalar code”: they use arithmetic logic unit (ALU) instructions (e.g., ADD/ADC/MUL). Improvements in the performance of ADD/ADC/MUL have made scalar implementations more efficient on modern processors. Still, single instruction multiple data (SIMD) or “vector” architectures can offer improvements over scalar code. SIMD is an architecture in which a single instruction computes a function of several inputs simultaneously. These inputs are called “elements” and reside in registers that each hold several elements. Early SIMD architectures used instructions (e.g., MMX) that operate on 64-bit SIMD registers. Later architectures (e.g., SSE) introduced 128-bit registers. Still others (e.g., Advanced Vector Extensions (AVX), AVX2, and the like) extend the SSE architecture in several respects, for example by introducing non-destructive destination operands and floating-point operations on 256-bit registers.
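As an illustration of the SIMD model described above, the following Python sketch treats a 256-bit register as a list of four 64-bit unsigned elements and models a packed add (similar in spirit to a 256-bit lane-wise integer add such as AVX2's VPADDQ) as one operation over all lanes. This is a software model under stated assumptions, not vector code; the names `simd_add` and `scalar_add` are hypothetical.

```python
# Illustrative model only: a 256-bit "register" is a Python list of four
# 64-bit unsigned lanes. All names here are hypothetical, not a real API.
LANE_BITS = 64
LANE_MASK = (1 << LANE_BITS) - 1
LANES = 256 // LANE_BITS  # four 64-bit elements per 256-bit register


def scalar_add(a: int, b: int) -> int:
    """One scalar ADD: a single 64-bit sum, wrapping modulo 2**64."""
    return (a + b) & LANE_MASK


def simd_add(va: list[int], vb: list[int]) -> list[int]:
    """One modeled vector add: every lane is summed independently.

    Each lane wraps modulo 2**64 and no carry crosses from one lane into
    the next, which is what makes the elements independent.
    """
    assert len(va) == len(vb) == LANES
    return [(x + y) & LANE_MASK for x, y in zip(va, vb)]


a = [1, 2, 3, LANE_MASK]
b = [10, 20, 30, 1]
print(simd_add(a, b))                             # one "instruction"
print([scalar_add(x, y) for x, y in zip(a, b)])   # four scalar instructions
```

Both lines print `[11, 22, 33, 0]`: the vector form produces the same per-element results as the scalar loop, but as a single modeled operation. Note that the top lane wraps to 0 without affecting any other lane.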
Many methods (e.g., the discrete cosine transform (DCT) used in media processing) operate on multiple independent elements and are therefore inherently suitable for SIMD architectures. However, big-number (e.g., multi-digit) arithmetic, which is important in RSA and many other applications, is not naturally suited to vector architectures because, for example, the digits of a multi-digit number are not independent: carries propagate from one digit to the next during arithmetic operations such as addition and multiplication.
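To make the carry dependency concrete, the following Python sketch represents big numbers as little-endian lists of 64-bit limbs (a representation assumed here for illustration). `add_with_carry` mimics a scalar ADD/ADC chain, in which each limb must wait for the carry out of the limb below it; `lanewise_add` mimics a naive vector add that treats the limbs as independent elements and therefore drops the carries. All function names are hypothetical.

```python
# Big numbers as little-endian lists of 64-bit "digits" (limbs).
# Hypothetical helper names; this is an illustration, not an RSA library.
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` 64-bit limbs (LSB first)."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from little-endian 64-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_with_carry(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Scalar ADD/ADC-style addition: each limb depends on the previous carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry          # like ADC: two limbs plus the carry-in
        out.append(s & LIMB_MASK)  # low 64 bits stay in this limb
        carry = s >> LIMB_BITS     # carry-out (0 or 1) feeds the next limb
    return out, carry


def lanewise_add(a: list[int], b: list[int]) -> list[int]:
    """Naive SIMD-style addition: lanes are independent, so carries are lost."""
    return [(x + y) & LIMB_MASK for x, y in zip(a, b)]


x = to_limbs(2**128 - 1, 3)  # [LIMB_MASK, LIMB_MASK, 0]
y = to_limbs(1, 3)           # [1, 0, 0]
correct, carry_out = add_with_carry(x, y)
print(from_limbs(correct) == 2**128)               # carry rippled across two limbs
print(from_limbs(lanewise_add(x, y)) == 2**128)    # carries dropped
```

The first check prints `True` and the second prints `False`: adding 1 to 2**128 - 1 produces a carry that ripples through two limbs, which the sequential ADC-style loop handles but the independent lane-wise add discards (it yields 2**128 - 2**64). Multiplication has the same problem, since every partial product contributes carries to higher limbs.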