FIG. 1 schematically illustrates a set top box including a video processor integrated circuit 1 (STB) having Set Top Box and decoder functions. Circuit 1 can receive digital terrestrial (antenna 2), satellite (antenna 3) or cable 4 broadcast signals from receiving interfaces 12, 13, 14 (RX), respectively. Circuit 1 can also be coupled to a DVD interface 15 (DVD IF) to process digital video signals from a DVD 5, or to any other video interface. Circuit 1 processes (for example, demultiplexes, decrypts, decodes, etc.) video streams with associated audio channels. Video is output to adapted displays (for example a TV monitor 6) or recorders (for example a video cassette recorder 7 (VCR)). Circuit 1 also usually comprises composite outputs and audio outputs (not shown). Various interfaces, memories 16 and other storage means (for example mass storage elements like a Hard Disk Drive 17 (HDD)) can be embedded in the same device 1′ as the circuit 1. FIG. 1 is an example, and alternative or additional functionality can be provided.
Internally, circuit 1 comprises the required functions (software and/or hardware implemented) for processing the appropriate signals. Among others, circuit 1 usually comprises a central processing unit 18 (CPU), internal storage elements 16 (MEM) such as RAM and/or registers, and a hardware computation unit 19 (HWFCT). Among the functions of circuit 1, the disclosure more particularly relates to the computation of cryptographic operations.
Cryptographic operations usually process large numbers (several hundreds or thousands of bits) and are time consuming. Simplification methods are usually implemented to save time and/or space. In many cryptosystems, operations are executed modulo a prime number or the product of two primes. A known method for computing modular multiplications and squarings is the so-called Montgomery multiplication method. Modular squares are required, for example, during an exponentiation computation by the method called square and multiply.
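By way of illustration only, the role of modular squares in an exponentiation can be sketched as follows. This is a minimal left-to-right square-and-multiply sketch; the function name square_and_multiply and its parameter names are hypothetical and do not come from the disclosure, and plain modular arithmetic is used in place of Montgomery multiplications.

```python
def square_and_multiply(base, exponent, n):
    """Compute base**exponent (mod n) by scanning the exponent bits.

    Hypothetical helper for illustration: one modular square per exponent
    bit, plus one modular multiplication for each bit equal to 1.
    """
    if n <= 0 or exponent < 0:
        raise ValueError("n must be positive and exponent non-negative")
    result = 1 % n
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % n     # modular square, every bit
        if bit == "1":
            result = (result * base) % n   # modular multiply, bits set to 1
    return result
```

Each exponent bit costs one modular square, which is why the speed of the underlying modular multiplication dominates the overall computation time.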
A Montgomery multiplication, noted MM(A, B, n), of a first operand A by a second operand B, both lower than n, computes A*B*R⁻¹ (mod n), with R and n such that gcd(n, R)=1 and n is odd (not divisible by two), where “mod” designates “modulo” and “gcd” designates “greatest common divisor”. The Montgomery algorithm requires the operands to be expressed in the Montgomery domain (residual representation), noted [X]n for an integer X comprised in the interval [0, n−1]. Obtaining the residual representation of an integer X requires a transformation which corresponds to a computation of [X]n such that [X]n=MM(X, R², n)=X*R²*R⁻¹ (mod n)=X*R (mod n). Hence, a Montgomery multiplication requires a pre-computation of a parameter R² (mod n). Parameter R² (mod n) can be computed as R² (mod n)=[R (mod n)*R] (mod n).
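The relations above can be checked numerically with a short sketch. It is a reference model only: the function names are hypothetical, and MM is evaluated directly from its definition A*B*R⁻¹ (mod n) with a modular inverse, not with the division-free reduction that an actual Montgomery implementation would use. The inverse is obtained with pow(r, -1, n), which assumes Python 3.8 or later.

```python
from math import gcd


def montgomery_multiply(a, b, n, r):
    """Reference model of MM(a, b, n) = a * b * r^-1 (mod n)."""
    if n % 2 == 0 or gcd(n, r) != 1:
        raise ValueError("n must be odd and coprime with r")
    r_inv = pow(r, -1, n)            # modular inverse of r (Python 3.8+)
    return (a * b * r_inv) % n


def r_squared_mod_n(n, r):
    """R^2 (mod n) computed as [R (mod n) * R] (mod n)."""
    return ((r % n) * r) % n


def to_montgomery(x, n, r):
    """Residual representation [x]n = MM(x, R^2, n) = x * R (mod n)."""
    return montgomery_multiply(x, r_squared_mod_n(n, r), n, r)


def from_montgomery(x_m, n, r):
    """Leave the Montgomery domain: MM([x]n, 1, n) = x (mod n)."""
    return montgomery_multiply(x_m, 1, n, r)
```

For example, with the illustrative values n=97 and R=128, converting 5 and 7 with to_montgomery, multiplying the results with montgomery_multiply and converting back with from_montgomery yields 35, that is, 5*7 (mod 97).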
FIG. 2 is a block diagram illustrating a known example of implementation of a computation of value R² (mod n).
A variable Z containing the intermediate result of the computation is initialized (block 21) with the difference R−n (R minus n). Then, an iterative loop is implemented for each of the k bits of the binary representation of n. For example, an index i is initialized with value 1 (block 21) and a loop computation is performed for each bit of n, up to and including the kth bit. In each iteration of the loop (output N of block 22, i=k+1 ?), variable Z is first doubled (block 23, Z=2*Z). Then, variable Z is compared to n (block 24, Z>n ?). While Z>n (output Y of block 24), variable Z is reduced by n (block 25, Z=Z−n). When Z is lower than modulus n (output N of block 24), the index i is incremented (block 26, i=i+1) for the next iteration. When the k iterations have been processed (output Y of block 22), the variable Z contains the result R² (mod n).
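The loop of FIG. 2 can be sketched as follows. The sketch assumes, as the loop over the k bits of n suggests but the text does not state explicitly, that R=2^k where k is the bit length of n; the function name r2_mod_n_doubling is hypothetical. Comments refer to the blocks of FIG. 2.

```python
def r2_mod_n_doubling(n):
    """Compute R^2 (mod n) by k successive modular doublings.

    Assumption for this sketch: R = 2**k, with k the bit length of the
    odd modulus n, so that R - n equals R (mod n).
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    k = n.bit_length()
    r = 1 << k
    z = r - n                  # block 21: Z = R - n
    i = 1                      # block 21: i = 1
    while i != k + 1:          # block 22: i = k+1 ?
        z = 2 * z              # block 23: Z = 2*Z
        while z > n:           # block 24: Z > n ?
            z = z - n          # block 25: Z = Z - n
        i = i + 1              # block 26: i = i+1
    return z                   # Z = 2**k * R (mod n) = R^2 (mod n)
```

Each iteration performs one doubling, one or two comparisons and at most one subtraction on k-bit values, so the cost grows with the square of the modulus size, which is consistent with the computation time concern raised for this pre-computation.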
The pre-computation of the parameter R² (mod n) depends (at least indirectly) on the operands of the algorithm (for example on a public key of a cryptographic algorithm).
A Montgomery multiplication can be used, for example, to check a digital signature of a boot code of an embedded device. When the device has to boot, it first loads the public key of the user from an external memory. This key is signed with an internally stored public key of the manufacturer of the circuit and compared to an embedded signature before allowing the circuit to boot. The verified user public key is then used to check the authenticity of the boot code. In such an application, the computation time is critical since the device cannot boot until the signature or signatures have been checked.
The number of potential R values to be processed by a cryptosystem keeps growing. Further, memory space limitations do not usually allow (especially in integrated circuits) the storage of a large number of pre-computed parameters. Furthermore, such a pre-computation may be undesirable for privacy reasons.
Hence, there is a need to speed up the computation of a parameter such as R² (mod n).