Some computer architectures support execution of SIMD (Single Instruction, Multiple Data) instructions. Each such instruction performs one operation on multiple sets of data. During execution, multiple simultaneous instances of that one operation process different data. As an example, one instruction might perform several independent subtractions, or several independent instances of some other mathematical operation, but every instance performs the same operation.
The Pentium® 4 microprocessor from Intel Corporation is one example of a computer architecture that is capable of executing SIMD instructions. The microprocessor utilizes 144 instructions, known as SSE2 (Streaming SIMD Extensions 2) instructions, to operate on data stored in registers. The microprocessor has eight 128-bit integer registers, each of which can hold multiple sets of data, such as two 64-bit integers, four 32-bit integers, eight 16-bit integers, or sixteen 8-bit integers. Each individual SSE2 instruction performs one operation on the multiple data sets contained in a register, possibly with inputs from another register. The SSE2 instructions reduce the overall number of instructions used to execute a particular program task, thereby increasing overall performance.
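The register layout described above can be illustrated with a small software model. The following is a minimal sketch in Python, not actual SSE2 code: it assumes a 128-bit register is represented as a single Python integer holding four 32-bit lanes, and the names `pack`, `unpack`, and `packed_sub` are hypothetical helpers introduced only for this illustration.

```python
# Software model of one 128-bit register holding four 32-bit integers.
# All names here are illustrative; this is not the SSE2 instruction set.
LANE_BITS = 32
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 32-bit integers into one 128-bit value, lane 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit value back into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_sub(a, b):
    """Model of a single packed subtraction: the same operation is applied
    to every lane independently, each lane wrapping modulo 2**32."""
    return pack([(x - y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([10, 20, 30, 40])
b = pack([1, 2, 3, 50])
result = unpack(packed_sub(a, b))
```

In this model a single call to `packed_sub` stands in for one instruction that performs four independent subtractions. A borrow in one lane never propagates into its neighbor, which is why the last lane wraps around instead of affecting the others.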
Today, microprocessors are called upon to perform many multiplication and division operations almost instantaneously. Security is one area in which computers are called upon to perform many rigorous operations. Security functions employ complex cryptographic algorithms that often use exponentiation operations (e.g., the RSA algorithm) involving many modular multiplication operations. It is a continuing goal to find ways to reduce computations in complex algorithms and thereby improve performance.
Montgomery multiplication is a well-known algorithm for modular multiplication that avoids division. It achieves this by reducing double-length products from the right (the least significant end) rather than from the left.
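Reduction from the right can be sketched as follows. This is a minimal illustration of the textbook Montgomery reduction, assuming an odd modulus `n` and a radix R = 2**k with R > n; it is not a SIMD implementation, and the function names are hypothetical. The modular inverse is computed with the three-argument `pow`, which accepts a negative exponent in Python 3.8 and later.

```python
def montgomery_setup(n, k):
    """Return n_prime = -n^(-1) mod 2**k for an odd modulus n < 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    return (-pow(n, -1, r)) % r  # requires Python 3.8+


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < R * n.

    A multiple of n is added so that the low k bits of t become zero,
    and those bits are then shifted out. No division by n is performed.
    """
    r_mask = (1 << k) - 1
    m = ((t & r_mask) * n_prime) & r_mask  # chosen so t + m*n is divisible by R
    u = (t + m * n) >> k                   # exact shift replaces the division
    return u - n if u >= n else u          # one conditional subtraction


def to_mont(a, n, k):
    """Convert a to Montgomery form a * R mod n (done once per operand)."""
    return (a << k) % n


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values in Montgomery form; the result stays in that form."""
    return redc(a_bar * b_bar, n, k, n_prime)


# Example with a small modulus: compute 5 * 7 mod 97 using R = 2**8.
n, k = 97, 8
n_prime = montgomery_setup(n, k)
product_bar = mont_mul(to_mont(5, n, k), to_mont(7, n, k), n, k, n_prime)
product = redc(product_bar, n, k, n_prime)  # convert back out of Montgomery form
```

The conversions into and out of Montgomery form still use an ordinary modular reduction, so the technique pays off when many multiplications share one modulus, as in modular exponentiation.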