1. Field of the Invention
The present invention is related to cryptography, and, more particularly, to guaranteed recovery of a cryptographic key where a plaintext-cyphertext pair is known.
2. Description of the Related Art
One problem frequently encountered in the field of cryptography is discovering a cryptographic key in a situation where the cryptographic algorithm itself is known a priori. For example, consider the situation illustrated schematically in FIG. 1. FIG. 1 illustrates how plaintext is converted to cyphertext through the use of a cryptographic algorithm and a key. There are many examples of such cryptographic algorithms, such as RC4, RC5, RC6, AES, DES, Blowfish, and so on. These algorithms all require a key, and giving an incorrect key as an input to the cryptographic function will produce “garbage” as an output.
The key is a sequence of bits, whose length is chosen depending on the security desired for the particular application. For example, in the 1990s, 40-bit keys were common, particularly due to U.S. export control restrictions. Currently, 64-bit keys and 80-bit keys are frequently used. 128-bit keys are also beginning to be used, and, at the present time, are generally considered to be virtually unbreakable using any of the known methods.
Generally, the selection of the length of the key affects not just the security, but also the efficiency of the algorithm. The longer the key, the more secure the encryption, but also, the longer the process of encrypting any particular plaintext. Thus, it is generally the practice to select the length of a key that would be resistant to a brute force attack (or other known forms of attack) in any realistic time frame. For example, if an 80-bit key would require on the order of 1 million years to identify the correct key using a known method, there is no point in using longer keys, where the attack would take 1 billion years, for the self-evident reason that there is no information that needs to be protected for that period of time.
There are several known methods for attacking cryptographic algorithms. In general, the problem is posed as follows: a plaintext-cyphertext pair is known, and the cryptographic algorithm used to produce the cyphertext from the plaintext is also known, but the cryptographic key used in the cryptographic algorithm is unknown. One situation where this can happen is where a message is intercepted both in its cyphertext form, and in its plaintext form. Another situation where a plaintext-cyphertext pair is available is where a file is encrypted, and some information is known about the file—for example, many files, when stored in known formats (such as Microsoft Word, Excel, Adobe Acrobat, and so on) contain certain header and other file identification information, which is always found at a specific location in the file, and is always the same. As another example, it may be possible to run a string of all zeros through the cryptographic algorithm, generating a cyphertext (even in a situation where the key is not known—for example, where a communication system is at any given point not transmitting anything useful, but is simply sending zeros through the communications channel, in order to maintain lock with the receiver). The relevant point is that for purposes of cryptographic attack, there are three things that are known a priori—the cryptographic algorithm, at least one example of a plaintext, and at least one corresponding example of a cyphertext.
The mathematical question is therefore “what is the key that was used to generate the cyphertext?” Once the key is known, any cyphertext generated by that algorithm using that key can always be very rapidly decrypted.
There are three basic approaches to identifying the key that are known in the conventional art. One approach is the brute force method, where every possible key is run through the cryptographic algorithm, sequentially, one after another, until the right key is found. The amount of time that such an approach would take depends on the number of keys that need to be tested, and on the one-way cryptographic function itself. For keys of length N, there are a total of 2^N possible keys, all of which may need to be tested sequentially in the worst case.
Also, it is worth noting that the process of inputting a key into a cryptographic function and testing the result for correctness has been optimized to a point where no further improvements are likely. Thus, for relatively short keys, such as 16-bit keys or 29-bit keys, this is a manageable problem. For longer keys, such as 56-bit keys or 64-bit keys, this is a problem that, given the current state of computer hardware, is at the edge of the capabilities of the hardware, if the result needs to be known in any reasonable amount of time. Longer keys, such as 80-bit or 128-bit keys, present an insurmountable problem for the brute force approach, given the current state of the computer hardware (and will likely remain so for the foreseeable future).
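By way of rough illustration only, the growth of the brute force search time with key length can be estimated as shown below. The testing rate of one billion keys per second is a hypothetical figure chosen for the example and is not taken from any particular hardware. The function name worst_case_years is likewise illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def worst_case_years(key_bits: int, keys_per_second: float) -> float:
    """Years needed to test all 2^key_bits keys at a fixed testing rate."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


# Hypothetical rate, assumed for illustration only.
ASSUMED_RATE = 1e9

for bits in (40, 56, 64, 80, 128):
    print(bits, "bits:", worst_case_years(bits, ASSUMED_RATE), "years")
```

Whatever rate is assumed, each additional key bit doubles the worst-case search time, which is why the longer key lengths fall outside any realistic time frame.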
The brute force approach, however, has one major advantage: it is guaranteed to produce a result at some point in time. That point may be reached relatively quickly, or may lie in the distant future, but sooner or later one of the keys tested will be the right key.
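The brute force method described above can be sketched as follows. This is a minimal illustration under stated assumptions: toy_encrypt is a hypothetical stand-in cipher (a SHA-256-derived keystream XORed with the plaintext), not any of the algorithms named earlier, and the 16-bit key length is chosen only so that the search finishes quickly.

```python
import hashlib
from typing import Optional

KEY_BITS = 16  # assumed toy key length


def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Hypothetical stand-in cipher: XOR the plaintext (up to 32 bytes)
    with a keystream derived from the key via SHA-256."""
    stream = hashlib.sha256(key.to_bytes(4, "big")).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def brute_force(plaintext: bytes, cyphertext: bytes,
                key_bits: int = KEY_BITS) -> Optional[int]:
    """Test every key in sequence until one reproduces the cyphertext."""
    for key in range(2 ** key_bits):
        if toy_encrypt(key, plaintext) == cyphertext:
            return key
    return None  # reached only if the pair was not produced by this cipher


# Usage: a known plaintext-cyphertext pair produced with an "unknown" key.
known_plaintext = b"\x00" * 16
intercepted = toy_encrypt(0xBEEF, known_plaintext)
recovered = brute_force(known_plaintext, intercepted)
print(hex(recovered))
```

Because the loop visits every key in the key space, it cannot miss the correct key; the only variable is how long the loop runs.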
Another approach is the use of cryptographic tables, where a known plaintext (typically a string of all zeros, or, for frequently used document and file formats, a portion of the file representing header information, or file format information, which is always the same) is converted to cyphertext using each possible key. Thus, a large table is generated, with plaintext-cyphertext pairs that correspond to each possible key. As a further optimization, the table can then be sorted by cyphertext. When the key for a particular newly-received cyphertext needs to be identified, all that needs to be done is to locate the newly received cyphertext in the table (a process that even for very large tables does not take very long, on the order of seconds or at most a few minutes), and the key is then identified.
This process also has a major advantage from a mathematical perspective—it is always guaranteed to produce a result, since the table contains all the possible keys, and/or derivatively, all the possible cyphertexts for a particular plaintext (note that for each different plaintext, a separate table needs to be generated, which is why “standard” plaintexts are frequently used, either using commonly found headers and other information of that nature in files of known format, or strings of all zeros).
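A minimal sketch of the table approach is given below, again under stated assumptions: toy_encrypt is a hypothetical stand-in cipher, the key length is a toy 16 bits, and the “standard” plaintext is a string of all zeros. The table is sorted by cyphertext and searched by binary search using Python's bisect module.

```python
import bisect
import hashlib

KEY_BITS = 16                       # assumed toy key length
STANDARD_PLAINTEXT = b"\x00" * 16   # assumed "standard" plaintext


def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Hypothetical stand-in cipher: XOR the plaintext (up to 32 bytes)
    with a keystream derived from the key via SHA-256."""
    stream = hashlib.sha256(key.to_bytes(4, "big")).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def build_table(plaintext: bytes, key_bits: int = KEY_BITS):
    """One (cyphertext, key) row per possible key, sorted by cyphertext."""
    rows = [(toy_encrypt(k, plaintext), k) for k in range(2 ** key_bits)]
    rows.sort()
    return rows


def lookup(rows, cyphertext: bytes):
    """Binary-search the sorted table; return every key that matches."""
    i = bisect.bisect_left(rows, (cyphertext, -1))
    keys = []
    while i < len(rows) and rows[i][0] == cyphertext:
        keys.append(rows[i][1])
        i += 1
    return keys


table = build_table(STANDARD_PLAINTEXT)        # slow, done once in advance
target = toy_encrypt(12345, STANDARD_PLAINTEXT)
print(lookup(table, target))                   # fast, done per cyphertext
```

The table is valid only for the plaintext it was built from, which is why a separate table is needed for each different plaintext.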
The two methods described above, therefore, represent the two opposite ends of the spectrum. In the first method, the brute force testing of each possible key requires a very long time, but does not require any “preparatory work” on the part of the attacker. In the second case, the process of identifying the key is very fast (in essence, trivial in comparison to other methods, on the order of seconds). However, the process of generating a table can be very time consuming: as of 2007, no such table exists for 64-bit keys, and several years of continuous computation would be necessary to generate such a table. Also, such tables can be very large: a table for 40-bit keys is on the order of 5 terabytes, and a table for 64-bit keys would be 134 million terabytes in size, a clearly infeasible amount of storage today. Although computer hardware is improving, attempting to do the same thing for 80-bit keys is at the present time a virtually insurmountable challenge, and attempting to create such a table for 128-bit keys is, at the present time, a practical impossibility.
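The storage figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that each table entry stores only the key itself (5 bytes for a 40-bit key, 8 bytes for a 64-bit key) and that a terabyte is taken as 2^40 bytes. Both are assumptions made for this illustration; the actual entry layout is not specified here.

```python
BYTES_PER_TERABYTE = 2 ** 40  # assumption: binary terabyte


def table_size_terabytes(key_bits: int) -> float:
    """Size of a table with one entry per key, assuming each entry
    holds only the key (key_bits / 8 bytes)."""
    bytes_per_entry = key_bits // 8
    return (2 ** key_bits) * bytes_per_entry / BYTES_PER_TERABYTE


print(table_size_terabytes(40))  # 5.0
print(table_size_terabytes(64))  # 134217728.0, i.e. about 134 million
```

Under the same assumptions, every additional 8 bits of key length multiplies the number of entries by 256, so the table size grows far faster than storage capacity can be expected to.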
A third conventional method is typically referred to as “Rainbow Tables,” and represents a compromise, or a trade-off, between the two extremes. See Making a Faster Cryptanalytical Time-Memory Trade-Off, Philippe Oechslin, Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, Calif., USA, Aug. 17-21, 2003, Proceedings, Lecture Notes in Computer Science 2729, Springer (2003) (which builds on the 1980 work of Martin Hellman, which is sometimes referred to as “classical tables”). In the Rainbow Table approach, an algorithm is used to generate a relatively large table (but much smaller than the tables in the second method, described above, being on the order of a few gigabytes for 40-bit keys), which corresponds to most, but not all, of the keys. Typically such a table, as shown somewhat simplistically in FIG. 2, can be represented by groups of keys, beginning with one start key, and ending with one end key, and with a certain number of keys in between (a typical number would be on the order of 10,000 keys between start key K1 and end key K1). The number is always the same, and therefore the number of keys between start key K2 and end key K2 is the same as the number of keys between start key K3 and end key K3 and so on. Only the start and end keys need to be stored, but not the intermediate keys in the chains, which are generated “on the fly.”
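The chain structure described above can be sketched as follows. This is a simplified illustration, not the construction of the cited paper in full: toy_encrypt is a hypothetical stand-in cipher, reduce_to_key is an assumed step-dependent reduction that maps a cyphertext back to a key, and the 12-bit key length and chain length of 16 are toy values (far shorter than the chains of roughly 10,000 keys mentioned above).

```python
import hashlib

KEY_BITS = 12      # assumed toy key length
CHAIN_LEN = 16     # assumed toy chain length
PLAINTEXT = b"\x00" * 16


def toy_encrypt(key: int, plaintext: bytes = PLAINTEXT) -> bytes:
    """Hypothetical stand-in cipher (SHA-256 keystream XOR plaintext)."""
    stream = hashlib.sha256(key.to_bytes(4, "big")).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def reduce_to_key(cyphertext: bytes, step: int) -> int:
    """Assumed reduction: map a cyphertext to a key, varying by step."""
    return (int.from_bytes(cyphertext[:4], "big") + step) % (2 ** KEY_BITS)


def build_chains(start_keys):
    """Store only end key -> start key; intermediate keys are discarded."""
    table = {}
    for start in start_keys:
        key = start
        for step in range(CHAIN_LEN):
            key = reduce_to_key(toy_encrypt(key), step)
        table[key] = start  # chains sharing an end key overwrite each other
    return table


def lookup(table, cyphertext: bytes):
    """Try each chain position; regenerate a chain when its end matches."""
    for pos in range(CHAIN_LEN - 1, -1, -1):
        key = reduce_to_key(cyphertext, pos)
        for step in range(pos + 1, CHAIN_LEN):
            key = reduce_to_key(toy_encrypt(key), step)
        start = table.get(key)
        if start is None:
            continue
        candidate = start  # walk the chain "on the fly" up to position pos
        for step in range(pos):
            candidate = reduce_to_key(toy_encrypt(candidate), step)
        if toy_encrypt(candidate) == cyphertext:
            return candidate
    return None  # key is not covered by any stored chain


table = build_chains(range(64))
print(len(table), "chains stored")
```

A return value of None corresponds to the case where the key lies in none of the stored chains, so the search has no result to report.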
The Rainbow Table approach has an advantage in that it takes much less storage space to store such a table, compared to a table for every single key, as described earlier. However, the Rainbow Table approach has one significant disadvantage: it is not guaranteed to produce a result. Typically, when a Rainbow Table is generated, the parameters for generating the table are chosen as a compromise between several factors: the amount of time it takes to generate the table, estimated attack time complexity, the size of the table (where storage requirements are an issue), and the probability that the key will be found in the table. Typically such probabilities are on the order of 70-90% for a single table, and 95%, 99%, and sometimes 99.99% where several Rainbow Tables are used. The higher the desired probability of finding the key, the larger the table, and the longer such a table will take to generate.
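The relationship between the single-table and multiple-table probabilities can be illustrated with a short calculation. The sketch assumes that the tables are generated independently, so that each table misses the key independently of the others; this is a simplifying assumption, and the function name combined_success is illustrative.

```python
def combined_success(p_single: float, n_tables: int) -> float:
    """Probability that at least one of n independent tables holds the key."""
    return 1.0 - (1.0 - p_single) ** n_tables


for n in (1, 2, 3, 4):
    print(n, "table(s):", combined_success(0.9, n))
```

Under this assumption, two tables of 90% coverage each give approximately 99%, and four give approximately 99.99%. The combined probability approaches, but never reaches, 100%, which is the sense in which the approach is not guaranteed to produce a result.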
Depending on the application, the fact that the Rainbow Table is not guaranteed to produce a result may or may not be a problem. There are services available commercially (frequently on-line), which, when provided with an encrypted document, can produce a key for the user requesting the service. The fact that a very small percentage of users will not “get an answer” is not commercially a problem, since it is often easier to simply give the users refunds in those rare cases where the key is not in the Rainbow Table. On the other hand, there are applications where a certainty of finding a key is relatively important. For example, in law enforcement or national security applications it is highly desirable to know that whatever method is used to attack the encryption is one that is guaranteed to produce a result.