“Data at rest” is the name that is commonly used for stored data, such as that residing on large hard drive or tape drive systems. Data at rest describes such data as it is disposed, for example, in large scale commercial data storage systems, such as might be used for credit card data and medical records. The security of data at rest is becoming an increasingly important topic as vast amounts of an individual's sensitive information are digitized and stored.
Several different methods and standards for protecting the confidentiality of such data have been developed and utilized. The Institute of Electrical and Electronics Engineers, Inc. has organized an effort to develop additional security measures for data at rest. This project is refining an advanced encryption standard that was originally developed by Liskov, Rivest, and Wagner, and is variously referred to as LRW-AES, AES-LRW, P1619, or tweakable narrow block encryption.
Such methods are developed to prevent malicious and unauthorized access to data, such as copy-and-paste attacks and dictionary attacks. These methods are based on standard AES, where both input and output of an AES encryptor/decryptor are XORed with vectors computed from a tweakable key and a data block's serial number (so-called whitening vectors). Compact and fast computation of whitening vectors is critical for numerous applications.
Let M and T be a so-called master key (128, 192, or 256 bits, depending on the implementation) and a tweakable key (128 bits), respectively; let D be a 128-bit data block; and let n be its serial number (address), also represented as a 128-bit number. Then the 128-bit result R of LRW-AES encoding of data D can be expressed by the formula R=(T⊗n)⊕AES((T⊗n)⊕D, M), where AES(b, k) denotes the encoding operation of the cipher AES for data block b and key k, ⊗ is multiplication in the Galois field GF(2^128), and ⊕ is addition in the same field (identical to bitwise XOR). This is depicted graphically in FIG. 1.
Decoding is a similar process. It can be expressed by the formula D=(T⊗n)⊕AES^−1((T⊗n)⊕R, M), where AES^−1(b, k) denotes the decoding operation of the cipher AES for the encoded data block b and key k.
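The encoding and decoding formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: field elements are held as Python integers with bit i as the coefficient of x^i, and the reduction polynomial is assumed to be x^128+x^7+x^2+x+1, neither of which is specified in the text. The block cipher is passed in as a callable; `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins (a keyed XOR followed by a rotation) used only to exercise the whitening structure, and they are not AES.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # assumed low terms of x^128 + x^7 + x^2 + x + 1


def gf128_mul(a, b):
    """Multiply a and b in GF(2^128) (shift-and-add, assumed bit order)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:  # reduce when the degree reaches 128
            a = (a & MASK128) ^ REDUCTION
    return result


def lrw_encode(D, n, T, M, encrypt):
    """R = (T⊗n) ⊕ encrypt((T⊗n) ⊕ D, M)."""
    w = gf128_mul(T, n)  # whitening vector
    return w ^ encrypt(w ^ D, M)


def lrw_decode(R, n, T, M, decrypt):
    """D = (T⊗n) ⊕ decrypt((T⊗n) ⊕ R, M)."""
    w = gf128_mul(T, n)
    return w ^ decrypt(w ^ R, M)


# Hypothetical stand-in for AES: an invertible toy permutation, NOT secure.
def toy_encrypt(block, key):
    x = (block ^ key) & MASK128
    return ((x << 13) | (x >> 115)) & MASK128  # rotate left by 13


def toy_decrypt(block, key):
    x = ((block >> 13) | (block << 115)) & MASK128  # rotate right by 13
    return x ^ key
```

In a real system the two callables would wrap an AES implementation for 128-bit blocks; the whitening logic around them stays the same.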
Typically, thirty-two 128-bit data blocks used in the encoding/decoding process are grouped together to form 512-byte (i.e. 4096-bit) disk blocks, having sequential numbers n=32m, 32m+1, ..., 32m+31, where m is the address of the disk block.
There are two main methods for implementing such computations. The first one directly follows the structure shown in FIG. 1. It requires a very effective multiplication unit to be used thirty-two times for each block. To achieve good performance, the multiplication must be implemented in unrolled pipelined form. Therefore, it requires tens of thousands of gates.
Another approach is based on pre-computation. Namely, one may pre-compute and store 128-bit values of the form
T⊗(2−1), T⊗(4−1), T⊗(8−1), ..., T⊗(2^128−1)  (1)
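A minimal sketch of how the values of form (1) might be generated is given below. It relies on the fact that 2^(k+1)−1 is the XOR of the powers 2^0 through 2^k, so each table entry is the previous entry XORed with the next doubling of T. The integer representation (bit i as the coefficient of x^i) and the reduction polynomial x^128+x^7+x^2+x+1 are assumptions, and the function names are illustrative only.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # assumed low terms of x^128 + x^7 + x^2 + x + 1


def gf128_double(a):
    """Multiply a by x (i.e. by 2) in GF(2^128)."""
    a <<= 1
    if a >> 128:
        a = (a & MASK128) ^ REDUCTION
    return a


def gf128_mul(a, b):
    """General GF(2^128) multiplication, used here only as a cross-check."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a = gf128_double(a)
    return result


def precompute_table(T):
    """Return a list whose entry k is T⊗(2^(k+1) − 1), for k = 0..127."""
    table = []
    power = T  # T⊗2^0
    acc = 0
    for _ in range(128):
        acc ^= power  # acc is now T⊗(2^0 ⊕ ... ⊕ 2^k) = T⊗(2^(k+1) − 1)
        table.append(acc)
        power = gf128_double(power)
    return table
```

The table holds 128 entries of 128 bits each, which is the memory cost this approach trades for avoiding a multiplier on every block.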
Now one can sequentially find any series of products P_n=T⊗n, where n=n_0, n_0+1, n_0+2, etc., using only one actual multiplication for n=n_0: each next value P_{n+1}=T⊗(n+1) can be expressed from the previous one P_n=T⊗n as:

$$
\begin{aligned}
P_{n+1} &= P_n \oplus (P_{n+1} \oplus P_n) \\
        &= P_n \oplus \bigl(T \otimes (n+1) \oplus T \otimes n\bigr) \\
        &= P_n \oplus \bigl(T \otimes ((n+1) \oplus n)\bigr),
\end{aligned}
$$
where (n+1)⊕n has the form 2^(m+1)−1, with m equal to the number of trailing 1's in the binary expansion of n (equivalently, m is the number of trailing 0's in the binary expansion of n+1). For example,
1⊕0 = 1_2⊕0_2 = 1_2 = 1 = 2^1−1 (zero trailing 1's in 0 = 0_2)
2⊕1 = 10_2⊕1_2 = 11_2 = 3 = 2^2−1 (one trailing 1 in 1 = 1_2)
3⊕2 = 11_2⊕10_2 = 1_2 = 1 = 2^1−1 (zero trailing 1's in 2 = 10_2)
4⊕3 = 100_2⊕11_2 = 111_2 = 7 = 2^3−1 (two trailing 1's in 3 = 11_2)
5⊕4 = 101_2⊕100_2 = 1_2 = 1 = 2^1−1 (zero trailing 1's in 4 = 100_2)
6⊕5 = 110_2⊕101_2 = 11_2 = 3 = 2^2−1 (one trailing 1 in 5 = 101_2)
7⊕6 = 111_2⊕110_2 = 1_2 = 1 = 2^1−1 (zero trailing 1's in 6 = 110_2)
8⊕7 = 1000_2⊕111_2 = 1111_2 = 15 = 2^4−1 (three trailing 1's in 7 = 111_2)
9⊕8 = 1001_2⊕1000_2 = 1_2 = 1 = 2^1−1 (zero trailing 1's in 8 = 1000_2), etc.
Thus, the product T⊗((n+1)⊕n) is one of the stored pre-computed values (1), and it suffices to add (XOR) it to P_n to produce P_{n+1}. This method is efficient in terms of performance but requires a lot of memory: 128 stored values of 128 bits each, i.e. 16,384 bits.
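The incremental scheme just described can be sketched as follows: one full multiplication for the first serial number, then one table lookup and one XOR per subsequent block. As before, this is an illustration under assumptions not stated in the text (integer bit order with bit i as the coefficient of x^i, reduction polynomial x^128+x^7+x^2+x+1), and all function names are hypothetical.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # assumed low terms of x^128 + x^7 + x^2 + x + 1


def gf128_mul(a, b):
    """Multiply a and b in GF(2^128) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a = (a & MASK128) ^ REDUCTION
    return result


def precompute_table(T):
    """Entry k holds T⊗(2^(k+1) − 1), for k = 0..127."""
    return [gf128_mul(T, (1 << (k + 1)) - 1) for k in range(128)]


def trailing_ones(n):
    """Number of trailing 1 bits in the binary expansion of n."""
    count = 0
    while n & 1:
        count += 1
        n >>= 1
    return count


def sequential_products(T, n0, count, table):
    """Yield P_n = T⊗n for n = n0, n0+1, ..., n0+count-1.

    Only the first product uses a real multiplication; each later one is
    P_{n+1} = P_n XOR table[m], where m is the number of trailing 1's in n.
    Assumes n0 + count does not exceed 2^128.
    """
    p = gf128_mul(T, n0)
    for n in range(n0, n0 + count):
        yield p
        p ^= table[trailing_ones(n)]
```

For a 512-byte disk block at address m, a call such as `sequential_products(T, 32 * m, 32, table)` would produce the thirty-two whitening vectors in natural order.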
The amount of required memory can be reduced if the 128-bit data blocks may be processed in a non-standard order. For example, use of a so-called Gray code (i.e. reordering the number sequence so that any two neighboring numbers differ in only one bit, such as 0-1-3-2-6-7-5-4, i.e. binary 000-001-011-010-110-111-101-100) reduces the number of required pre-computed values from 128 to 7. However, this kind of reordering may require changing the data transfer protocols, interfaces, etc., or adding extra buffering, so this method is not a universal solution.
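The Gray-code ordering can be illustrated with the short sketch below. Because neighboring Gray-code values differ in exactly one bit, each step XORs in a single stored product of the form T⊗2^k. The sketch uses the common reflected Gray code g(i) = i ⊕ (i >> 1), which reproduces the sequence 0-1-3-2-6-7-5-4 quoted above; the table size `bits` is left as a parameter of the sketch, and the field representation (bit i as the coefficient of x^i, reduction polynomial x^128+x^7+x^2+x+1) is an assumption.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # assumed low terms of x^128 + x^7 + x^2 + x + 1


def gf128_mul(a, b):
    """Multiply a and b in GF(2^128) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a = (a & MASK128) ^ REDUCTION
    return result


def gray(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)


def gray_order_products(T, count, bits):
    """Return (g, T⊗g) pairs for g = gray(0), gray(1), ..., gray(count-1).

    Requires count <= 2**bits. Only `bits` products T⊗2^k are stored.
    """
    powers = [gf128_mul(T, 1 << k) for k in range(bits)]
    p = 0  # T⊗gray(0) = T⊗0 = 0
    out = [(0, p)]
    for i in range(1, count):
        k = (i & -i).bit_length() - 1  # index of the bit that flips at step i
        p ^= powers[k]
        out.append((gray(i), p))
    return out
```

The output arrives in Gray-code order rather than natural order, which is the reordering cost noted above.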
Thus, additional implementations of LRW-AES are needed that do not rely on expensive pipelined multipliers or extra random access memory, but still process data in natural order. While there are multiple existing implementations of LRW-AES, it is desirable to achieve a combination of compactness and maximum speed.