1. Field of the Invention
The present invention generally relates to a method and an apparatus for generating random data, in particular to a method and an apparatus for post-processing raw random binary sequences to obtain purely random and/or pseudo-random numbers.
2. Background of the Invention
In many applications in the field of computers and other electronic devices there is a need for a physical source of true random numbers. Such applications include computer simulations of various probabilistic algorithms and processes, such as Monte Carlo numerical analysis; computer games; and cryptographic algorithms and protocols whose security relies on the ability to generate unpredictable secret keys. High-speed truly random sequences are also needed for setting up countermeasures against so-called side-channel attacks on specific electronic devices, particularly microelectronic devices implementing security schemes, such as integrated chip cards; such countermeasures include, for example, random masking of cryptographic functions, as well as generation of secret keys for the encryption of internal links and memories in such devices.
The output of a Random Number Generator (RNG) is typically a binary sequence that, in principle, has to be unpredictable in the sense of information theory. Equivalently stated, it should be possible to statistically model the RNG output as a purely random sequence, i.e., a sequence of mutually independent, uniformly distributed binary random variables (bits), with maximal possible entropy per bit. In particular, it should be computationally infeasible to distinguish the RNG output sequence from a purely random sequence or, equivalently, it should be computationally infeasible to predict the RNG output sequence.
Several techniques are known in the art for generating true random numbers. For practical reasons, RNGs implemented in solid-state semiconductor technology are preferable, because they can more easily be incorporated in Integrated Circuits (ICs), particularly digital ICs. A conventional type of hardware-based RNG exploits thermal noise in resistors and/or shot noise in PN-junctions; unfortunately, these RNGs include analog elements, and are therefore difficult to incorporate in digital ICs. RNGs that can be implemented by digital integrated circuits only are therefore preferred; such RNGs are usually based on the phase jitter of free-running oscillators, implemented as ring oscillators (a ring oscillator being a structure consisting of an odd number of inverter logic gates connected in a circular cascade to form a ring). Another class of digital RNGs exploits the meta-stability of SR (Set-Reset) latches and of edge-triggered flip-flops based on SR latches, such as D-type flip-flops.
Typically, an RNG can be represented as essentially consisting of two parts, namely, a physical source of randomness, which produces a raw random binary sequence which is random, but not purely random, and a post-processing part, which produces the final output sequence by processing elements of the raw random binary sequence. This is disclosed for example in US 2003/0014452, and such an architecture is for example also recognizable in the circuits described in U.S. Pat. No. 4,641,102, U.S. Pat. No. 5,570,307, U.S. Pat. No. 6,240,432, US 2002/0156819, and US 2002/0186086.
The raw random binary sequence may have a high speed, but it typically has statistical weaknesses: individual bits in the sequence show a slight bias, i.e., the probabilities of occurrence of the two binary output values, “0” and “1”, are not equal, and/or bits in the sequence that are relatively close to each other in time are correlated to a certain degree.
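The two weaknesses just named, bias and short-range correlation, can be quantified with a few lines of code. The following is a minimal illustrative sketch only; the function names `bit_entropy`, `estimate_bias` and `lag1_agreement` are hypothetical and do not come from any cited document. It computes the Shannon entropy of a single biased bit, the empirical bias of a bit sequence, and the fraction of adjacent bit pairs that agree (about 0.5 for an unbiased, uncorrelated sequence).

```python
import math


def bit_entropy(p_one: float) -> float:
    """Shannon entropy, in bits, of one bit that equals 1 with probability p_one."""
    if p_one <= 0.0 or p_one >= 1.0:
        return 0.0
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))


def estimate_bias(bits):
    """Empirical bias: observed fraction of ones minus 1/2."""
    return sum(bits) / len(bits) - 0.5


def lag1_agreement(bits):
    """Fraction of adjacent bit pairs that are equal.

    Values far from 0.5 hint at correlation between neighbouring bits
    (or at a strong bias).
    """
    pairs = list(zip(bits, bits[1:]))
    return sum(1 for a, b in pairs if a == b) / len(pairs)
```

For instance, `bit_entropy(0.5)` equals exactly one bit, whereas any biased bit conveys less than one bit of entropy, which is why information-theoretic post-processing has to reduce the speed.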
The main objective of the post-processing part is to eliminate such statistical weaknesses from the raw random binary sequence, in the information-theoretic or at least in the computational sense. Eliminating statistical weaknesses in the information-theoretic sense means that each output bit should convey approximately one bit of information or entropy, a result that can be achieved only by reducing the speed of the raw random binary sequence. Such post-processing is sometimes referred to as randomness extraction. Eliminating statistical weaknesses in the computational sense means that the output sequence need not be purely random, but it should be computationally infeasible or at least difficult to distinguish the output sequence from a purely random sequence. In this case, the speed need not be reduced.
Another important objective of post-processing is to provide robustness of the statistical properties of the RNG output sequence with respect to changes in operating conditions, such as voltage, temperature, and other environmental conditions, and also with respect to physical attacks against the physical source of randomness, particularly in case the raw random binary sequence is heavily biased.
Most of the techniques proposed so far for post-processing of raw random binary sequences are essentially linear in one way or the other. More precisely, the known techniques are typically based on sequential linear transformations implemented by synchronously clocked Linear Feedback Shift Registers (LFSRs) with additive inputs (which means that some of the LFSR internal signals are combined in XOR, i.e., eXclusive OR, with the raw binary sequence as an input signal), or on block linear transformations applied to the input raw random binary sequence to be post-processed. The Applicant observes that if the speeds of the LFSR clock signal and the input signal are the same, then this method is essentially known in digital communications as data scrambling and serves to improve the statistics of the input signal. A basic block transformation known in the art is the so-called von Neumann extractor, which is capable of removing the bias from a raw random binary sequence, provided that the bits in the input raw random binary sequence are statistically independent. More precisely, the input bits are divided into non-overlapping pairs, and for each pair, an output bit is produced only if the two input bits in the pair are not equal, in which case it is equal to the first (or the second) input bit. Equivalently, the XOR operation is applied to individual input bit pairs: if the result is equal to zero, then the pair is discarded, and if it is equal to one, then the first (or the second) input bit is taken to the output. Consequently, even if the raw random binary sequence is purely random, the speed is reduced four times on average.
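The pair-wise extractor described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `von_neumann_extract` and `biased_bits` are hypothetical, the first bit of each unequal pair is the one taken to the output, and the software source of biased but independent bits merely stands in for a physical source.

```python
import random


def von_neumann_extract(bits):
    """Pair-wise debiasing: keep the first bit of each non-overlapping
    pair whose two bits differ, discard pairs whose bits are equal."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first ^ second:  # XOR equal to one: the pair is kept
            out.append(first)
    return out


def biased_bits(n, p_one, seed=0):
    """Hypothetical stand-in for a raw source: n independent bits,
    each equal to 1 with probability p_one."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(n)]
```

For example, `von_neumann_extract([0, 1, 1, 1, 1, 0, 0, 0])` keeps the pairs (0, 1) and (1, 0) and returns `[0, 1]`. For independent input bits with probability p of a one, a pair is kept with probability 2p(1 - p), so an unbiased input yields on average one output bit per four input bits, consistent with the fourfold speed reduction stated above.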
Randomness extraction techniques can be based on linear transformations only. However, these techniques may become insecure if the speed reduction is not sufficiently large.
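A sequential linear transformation of the kind referred to above, i.e., an LFSR with an additive input combined with a speed reduction, might look like the following sketch. All parameters (register length, tap positions, decimation factor) and the name `lfsr_postprocess` are hypothetical choices made for illustration only; they are not taken from any cited document and the taps are not claimed to give a maximal-length register.

```python
def lfsr_postprocess(raw_bits, taps=(0, 2, 3, 5), length=8, decimation=4):
    """LFSR with additive input: each raw bit is XORed into the feedback.

    One output bit is taken every `decimation` clock cycles, which
    reduces the speed by that factor. The register starts at all zeros.
    """
    state = [0] * length
    out = []
    for t, raw in enumerate(raw_bits):
        feedback = raw
        for tap in taps:
            feedback ^= state[tap]
        state = [feedback] + state[:-1]  # shift by one stage
        if (t + 1) % decimation == 0:
            out.append(state[-1])
    return out
```

Because only XOR operations are involved and the register starts from the all-zero state, post-processing the XOR of two input sequences gives the XOR of their individual outputs. This linearity is the property that can make such schemes insecure when the speed reduction is too small.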
The usage of general block linear transformations is suggested and analyzed in B. Barak et al., “True Random Number Generators Secure in a Changing Environment,” Cryptographic Hardware and Embedded Systems - CHES 2003, Lecture Notes in Computer Science, vol. 2779, pp. 166-180, 2003, which proposes a technique for digital post-processing of random data called randomness extraction that is based on a randomly chosen and fixed linear function, applied to blocks of input data, taken from any random number generator, to produce reduced-size blocks of output data. By using some known results on the so-called universal hash functions, it is proven that under certain conditions almost all such functions give rise to nearly uniform distributions of the corresponding blocks of output data. However, the Applicant argues that these mathematical results are not very meaningful for a fixed linear function, as there may exist families of linear functions that do not yield nearly uniform distributions of output blocks. Further, a nearly uniform distribution of output blocks does not imply a nearly uniform distribution of each (linear) transformation of output blocks. In addition, this sort of randomness extraction is not sufficiently robust, namely, it does not provide computational unpredictability of output data if the entropy of input data is significantly reduced by the adversary's action on the underlying random number generator. Moreover, as the blocks have to be long, the implementation cost in terms of the gate count is considerable.
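A block linear transformation of this general kind can be sketched as a fixed binary matrix applied over GF(2) to each input block. The sketch below is illustrative only and is not the construction of the cited paper: the names `random_binary_matrix` and `linear_extract` are hypothetical, the block sizes are arbitrary, and Python's `random` module is used merely to pick a fixed matrix once.

```python
import random


def random_binary_matrix(rows, cols, seed):
    """Choose a fixed rows-by-cols binary matrix once (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.randrange(2) for _ in range(cols)] for _ in range(rows)]


def linear_extract(bits, matrix):
    """Map each block of len(matrix[0]) input bits to len(matrix) output
    bits by a matrix-vector product over GF(2) (AND, then XOR).
    A trailing incomplete block is dropped."""
    cols = len(matrix[0])
    out = []
    for start in range(0, len(bits) - cols + 1, cols):
        block = bits[start:start + cols]
        for row in matrix:
            acc = 0
            for m, b in zip(row, block):
                acc ^= m & b
            out.append(acc)
    return out
```

With, say, an 8-by-32 matrix, every 32 input bits produce 8 output bits, i.e., a fourfold speed reduction. In hardware, each output bit costs one XOR tree over the selected input bits, which illustrates why long blocks lead to a considerable gate count.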
Most physical randomness sources are capable of generating raw random binary sequences that possibly have a high speed, but whose entropy rate (entropy content per bit) is not high. A general, well-known technique for producing a high-speed, but pseudo-random, sequence calls for applying a randomness extractor to the raw random binary sequence, thereby reducing its speed, and then using the obtained sequence to produce a random seed for a Pseudo-Random Number Generator (PRNG); an example of this general technique is provided in the already cited reference US 2003/0014452. The PRNG is clocked at a high speed to produce the final output sequence. The resulting output sequence is thus a pseudo-random sequence, generated from a truly random seed. Such a PRNG should be computationally secure (i.e., its output sequence should be computationally unpredictable), and this makes the PRNG relatively complex to implement. For example, it can be based on cryptographic hash functions.
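The seed-then-expand arrangement described above can be sketched as follows, assuming that an extractor has already produced a short sequence of seed bits. The construction shown, SHA-256 applied to the seed concatenated with a counter, is a simplified illustration chosen for brevity; it is not the scheme of the cited reference and not a vetted generator design, and the names `bits_to_bytes` and `hash_prng` are hypothetical.

```python
import hashlib


def bits_to_bytes(bits):
    """Pack bits (most significant bit first) into bytes; leftover bits
    that do not fill a whole byte are dropped."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def hash_prng(seed: bytes, n_bytes: int) -> bytes:
    """Expand a seed into n_bytes of pseudo-random output by hashing
    seed || counter with SHA-256 (illustrative construction only)."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])
```

A call such as `hash_prng(bits_to_bytes(seed_bits), 1024)` then yields output at a rate limited only by the hash computation, while the slow extractor is needed only when a fresh seed is required.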
In U.S. Pat. No. 6,581,078, a physical noise source produces digital signals that are combined with signals produced by a PRNG, through an additional memory and using an XOR gate. The combined signals are sent to the input of the PRNG. The resulting signals are allegedly unpredictable while exhibiting the intended statistical characteristics.
The Applicant observes that neither the speed of the PRNG nor the requirements for the PRNG, except for the output statistics, are specified in that document. Moreover, the PRNG suggested to be used is based on a linear congruence and is therefore linear and hence insecure.
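The weakness of a PRNG based on a linear congruence can be illustrated on a stand-alone generator. The sketch below does not model the circuit of the cited patent; it assumes a generator x[n+1] = (a*x[n] + c) mod m with a known prime modulus m that outputs its full state, and the parameter values and the names `lcg` and `recover_lcg` are hypothetical. Under these assumptions, three consecutive outputs suffice to recover the multiplier and the increment, and hence to predict all later outputs. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
M = 2**31 - 1  # prime modulus, assumed known to the observer


def lcg(seed, a, c, m=M):
    """Linear congruential generator yielding its full state each step."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def recover_lcg(x0, x1, x2, m=M):
    """Recover (a, c) from three consecutive outputs.

    Uses x2 - x1 = a * (x1 - x0) mod m; requires x1 != x0 so that
    the difference is invertible modulo the prime m.
    """
    a = (x2 - x1) * pow((x1 - x0) % m, -1, m) % m
    c = (x1 - a * x0) % m
    return a, c
```

Once `a` and `c` are recovered, the next output is simply `(a * x2 + c) % M`, so the sequence offers no computational unpredictability.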