Current methods use a single circuit to decode an object's memory address in order to find a location in computer cache memories for storing and retrieving the object as a program executes on computer processors (or cores). As a result, most program objects are stored in only a few cache memory locations, leading to underutilization of the cache memories and reduced performance.
Current methods for keeping information secure and private rely on encrypting data with a cryptographic algorithm. Data confidentiality assures that the information is revealed only to authorized users. Encrypting data ensures that even if unauthorized users get hold of the data, they will not be able to extract any useful information unless they have the key with which the data was encrypted. This leads to attempts (or attacks) to obtain the secret key. The attacks can be direct or indirect (side channel). They can be brute force or based on an understanding of how computer systems execute instructions and how information is stored internally in processor memory systems.
As shown in FIG. 1, computer systems or devices 100 containing one or more processors 102 typically use physical cache memory 104 to reduce the average time to access physical main memory 106. In modern computer implementations, additional levels of cache memories are employed between the physical main memory 106 and processor(s) 102. The cache memory 104 is communicably coupled to the processor 102 with a high-speed bus 108. The main memory 106 is communicably coupled to the processor 102 and the cache memory 104 with any suitable communication mechanism. The physical cache memory 104 includes data 110 and corresponding tags 112, which identify a part of the address to which the data belongs. One possible configuration of computer systems or devices 200 with multiple cache memories is shown in FIG. 2, but other configurations are possible. As shown in FIG. 2, some caches 202, often referred to in the trade as Level-1 caches, are tightly coupled with a single processor (or core) 102 such that only that processor (e.g., 102a) can use the cache (e.g., 202a) for storing its program objects. Other caches 204, referred to in the trade as Level-2, Level-3 and Last Level Caches (LLC), are normally shared by more than one processor (or core).
To simplify the explanation of the following figures, and without loss of generality, the presence of a single cache memory (as shown in FIG. 1) will be assumed. Although not shown in the following figures, the cache memory 104 is communicably coupled to the processor 102 with a high-speed bus. The memory 106 is communicably coupled to the processor 102 and the cache memory 104 with any suitable communication mechanism. Referring to FIG. 3, computer memory systems 300, in general, and cache memories 104 in particular use a selected portion of an object address 302 to locate the object in the cache memory 104. Typically, address bits from the lower end of the object address 302 are used for this purpose, and these bits collectively are known as the Set Index 304; these bits are used as an index into the cache memory 104 to locate one or more memory locations (collectively called a cache set). Then, the “tag” bits 110 stored at these locations are compared with the tag bits 306 of the object address 302 to verify that at least one of the data items currently stored in the identified set of cache memory locations is the object being sought. The byte address bits 308 of the object address 302 are used to extract the needed data from the cache contents 108. A cache miss results if none of the tag bits 110 of the locations in the cache set match the address tag bits 306 of the object address 302. Higher-level memories (including higher-level caches and DRAM “main” memories) are then consulted to obtain the missing object.
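The conventional decoding described above can be illustrated with a minimal sketch. The 64-byte line size, the 128 sets, the dictionary-based cache model, and the names `decode_address` and `lookup` are assumptions chosen for illustration only; they are not taken from the figures.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class DecodedAddress(NamedTuple):
    tag: int          # upper bits, compared against the stored tags
    set_index: int    # lower-end bits that select the cache set
    byte_offset: int  # lowest bits, select a byte within the cache line


def decode_address(address: int, line_size: int = 64,
                   num_sets: int = 128) -> DecodedAddress:
    """Split an address into tag, set index, and byte offset.

    line_size and num_sets are assumed to be powers of two.
    """
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    byte_offset = address & (line_size - 1)
    set_index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)
    return DecodedAddress(tag, set_index, byte_offset)


# Hypothetical cache model: set index -> list of (stored tag, line contents).
CacheSets = Dict[int, List[Tuple[int, bytes]]]


def lookup(cache: CacheSets, address: int) -> Optional[int]:
    """Return the addressed byte on a hit, or None on a cache miss."""
    decoded = decode_address(address)
    for stored_tag, line in cache.get(decoded.set_index, []):
        if stored_tag == decoded.tag:          # tag comparison
            return line[decoded.byte_offset]   # byte extraction
    return None  # miss: a higher-level memory would be consulted
```

In this sketch, two addresses that differ only in their tag bits select the same set, which is why they can conflict with each other.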
The choice of address bits used as the set index determines how addresses are decoded and can affect cache accesses in terms of hits and misses. Extensive studies show that selecting lower-end bits for the set index 304, as done in most current implementations, can lead to non-uniform cache accesses. In other words, very few cache sets (or cache memory locations) receive most accesses, and these accesses can cause conflicts and thus increase cache misses, leading to poor execution performance. This behavior also implies that increasing the cache size does not lead to improved performance, since only a small portion of the cache memory is utilized. There have been proposals to distribute cache accesses more uniformly. These proposals can be categorized into two groups. The first group proposes to select address bits as a cache set index such that addresses are randomly distributed among all cache sets (for example [3], [7]). The second group proposes to dynamically remap addresses from heavily utilized cache sets to underutilized cache sets (for example [8], [12]).
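The non-uniformity can be made concrete with a small sketch. The strided access pattern, the cache geometry, and the XOR-folded index function below are assumptions chosen for illustration; the XOR folding is a generic stand-in for an alternative bit selection and is not the specific method of the cited proposals.

```python
from collections import Counter
from typing import Callable, Iterable

LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # number of cache sets (assumed)


def low_bits_index(address: int) -> int:
    """Conventional set index: the bits just above the byte offset."""
    return (address // LINE_SIZE) % NUM_SETS


def xor_folded_index(address: int) -> int:
    """Hypothetical alternative: XOR the low index bits with higher bits."""
    line_number = address // LINE_SIZE
    return (line_number ^ (line_number // NUM_SETS)) % NUM_SETS


def set_usage(addresses: Iterable[int],
              index_fn: Callable[[int], int]) -> Counter:
    """Count how many of the given accesses fall into each cache set."""
    return Counter(index_fn(a) for a in addresses)


# 100 objects placed 4096 bytes apart (for example, page-aligned buffers).
strided = [i * 4096 for i in range(100)]

conventional = set_usage(strided, low_bits_index)
folded = set_usage(strided, xor_folded_index)
print(len(conventional), "set(s) used with low-order index bits")
print(len(folded), "set(s) used with the XOR-folded index")
```

With this particular stride, every address carries the same low-order index bits, so the conventional index sends all accesses to one set while the remaining sets stay idle.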
In other implementations, cache memories are partitioned and each partition is dedicated to a specific type of program object (such as array caches and scalar caches, [1], [5]) or to a specific function (such as system caches and user caches).
Current implementations of set associative cache involve grouping two or more cache memory locations (referred to in the trade as cache lines or cache blocks) into a single set (as shown in FIG. 2). All the cache memory locations in a set have a single set index, and the set is located using common set index bits from memory addresses. The addresses of the cache lines within a set differ in a few higher bits located beyond the set index bits. While grouping several cache lines into a single set helps in removing some cache conflicts, the current implementations do not significantly improve the utilization of cache memories unless the cache memory is designed as a fully associative cache. In a fully associative cache, all cache lines are grouped into a single set, requiring an exhaustive search through the cache to locate a needed program object. This would require a very complex hardware apparatus.
A side channel attack is any attack that is based on information gained by observing (application) behaviors without actually interfering with an application (or modifying the application code in some manner). Observing execution times or energy consumed by applications, acoustic cryptanalysis, and differential fault analysis are among the widely used side channel attacks. A timing attack is a form of attack that is relatively easy to perform by measuring how much execution time elapses for certain computations. Any unauthorized user can perform a cache-based side channel attack by measuring execution times for cache misses and thus retrieve the keys that encryption algorithms depend on. As illustrated in prior research [11], users can retrieve RSA and AES keys by performing such simple attacks with relative ease.
Presently, virtually every computer system uses caches. Thus, exploits based on cache attacks can impact a wide array of systems. Because they are easy to perform, adaptable to a wide range of systems, and require few resources to apply, software cache-based side channel attacks are a very appealing new form of system attack. Current methods to alleviate cache-based attacks are attack-specific, and no general solutions have been available.
Percival's attack [9] is a side channel attack. Encryption algorithms use parts of the private key (or blocks of the key) to index a table of pre-computed constants. These constants are used in encoding a plain text. If the attacker knows the address of the table entry used at any given step, the attacker will know a part of the private key, since that part is used to address the table of constants when stored in cache.
If the attacker can cause cache misses when the table is accessed, the attacker can decode the addresses of the table entries (because the attacker knows the part of the address, namely the cache set index, that corresponds to the table index). By causing cache misses (by running a thread concurrently with the victim thread), the attacker is able to assemble the private key, piece by piece.
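The leakage described above can be illustrated with a toy simulation. The direct-mapped cache model, the assumption that each table entry occupies its own cache set, and all names below (`DirectMappedCache`, `make_victim_step`, `recover_key_chunk`) are hypothetical simplifications; the sketch models only the information flow, not the timing measurements of a real attack.

```python
from typing import Callable, List, Optional


class DirectMappedCache:
    """Toy cache model: each set remembers only which party last filled it."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.owner: List[Optional[str]] = [None] * num_sets

    def access(self, who: str, set_index: int) -> bool:
        """Return True on a hit, i.e. the set already holds this party's line."""
        hit = self.owner[set_index] == who
        self.owner[set_index] = who  # a miss evicts the previous occupant
        return hit


def make_victim_step(table_base_set: int,
                     key_chunk: int) -> Callable[[DirectMappedCache], None]:
    """Victim step: read the table entry selected by a chunk of the key."""
    def step(cache: DirectMappedCache) -> None:
        cache.access("victim", (table_base_set + key_chunk) % cache.num_sets)
    return step


def recover_key_chunk(cache: DirectMappedCache,
                      victim_step: Callable[[DirectMappedCache], None],
                      table_base_set: int) -> int:
    """Infer the key chunk from which of the attacker's lines was evicted."""
    for s in range(cache.num_sets):       # attacker fills every set
        cache.access("attacker", s)
    victim_step(cache)                    # victim evicts one attacker line
    missed = [s for s in range(cache.num_sets)
              if not cache.access("attacker", s)]
    # In this toy model exactly one set misses; its offset from the
    # known table base is the key chunk.
    return (missed[0] - table_base_set) % cache.num_sets
```

For example, with 64 sets, a table starting at set 16, and a key chunk of 0xB, the attacker's only miss is in set 27, from which the chunk 0xB follows. Repeating this for each step of the encryption yields the key piece by piece.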
Bernstein's attack [2] is a side channel attack that measures memory access times to determine which specific bits of the key caused cache misses. This attack is based on the internals of the AES algorithm, which uses a part of the key and a part of the plain text. The bits from the key and plain text are Exclusive-Ored, and the result is used to index a table. To find the hidden key, the attacker assumes that the same table entry will cause a cache miss whenever it is accessed, and observes which entry caused the cache miss by measuring execution time averages, since cache misses cause longer execution times. The attack is performed in two phases. In the first phase, the attacker uses a known key and a very long plain text. This ensures that the plain text contains all possible bit combinations when blocks of the plain text are used. The attacker then observes which combinations of known key and plain text caused cache misses. In the second phase, the attacker uses another plain text and an unknown key and again observes cache misses. Assuming that the cache miss occurs for the same value (the result of the Exclusive-Or of the key and plain text) as that from the known key and known plain text, one can easily determine the bits of the unknown key.
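The arithmetic behind the two phases can be sketched as follows. The one-byte key and plain text blocks, the idealized timing oracle, and the names `slowest_plaintext_byte` and `recover_key_byte` are assumptions for illustration; a real attack averages many noisy timing measurements rather than reading a clean signal.

```python
def slowest_plaintext_byte(key_byte: int, slow_index: int) -> int:
    """Toy timing oracle: return the plain text byte with the longest time.

    The table entry at slow_index is assumed to always miss in the cache,
    so the slowest plain text byte p satisfies p XOR key_byte == slow_index.
    """
    timings = {p: (2 if (p ^ key_byte) == slow_index else 1)
               for p in range(256)}
    return max(timings, key=timings.get)


def recover_key_byte(known_key_byte: int,
                     slow_p_known: int,
                     slow_p_unknown: int) -> int:
    """Combine both phases to recover one byte of the unknown key."""
    # Phase 1: with a known key, the slow plain text byte reveals which
    # table index causes the cache miss.
    slow_index = slow_p_known ^ known_key_byte
    # Phase 2: the same index is assumed to miss under the unknown key,
    # so unknown_key == slow plain text byte XOR slow_index.
    return slow_p_unknown ^ slow_index


# Hypothetical values used only to exercise the sketch.
SLOW_INDEX, KNOWN_KEY, SECRET_KEY = 0x3C, 0x5A, 0xA7
p_known = slowest_plaintext_byte(KNOWN_KEY, SLOW_INDEX)
p_unknown = slowest_plaintext_byte(SECRET_KEY, SLOW_INDEX)
recovered = recover_key_byte(KNOWN_KEY, p_known, p_unknown)
```

The recovery works only because the mapping from table index to cache behavior is the same in both phases, which is the assumption that randomized placement is intended to break.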
Wang and Lee [11] propose two solutions to thwart the above attacks. In the first, cache lines are locked and cannot be evicted. This technique can be used to lock data used by encryption algorithms so that an attacker will never be able to stitch together the key. This method is similar to the partitioned caches proposed in [6] to block side-channel attacks. The other solution proposed by Wang and Lee [11] is based on randomizing the data placement and replacement policy. The randomization uses a table of permutations as shown in FIG. 4. The size of the table is equal to the number of sets in a cache memory. Each entry contains a new set address. It works as follows. The set-index bits of an address, obtained using conventional cache addressing methods as described previously, are used to find an entry in the permutation table. The table provides modified set indexes that are used to access the cache. The new replacement policy works as follows. Assuming that a k-way set associative cache is provided (that is, k cache memory locations are grouped into a single set), the line to be replaced is selected at random from the k lines of the set. Even if the attacker knows which set caused a cache miss, the attacker will not be able to identify the address of the line (since the log2(k) bits corresponding to the lines in a set are not decodable). Use of a fully associative cache, wherein all cache locations are grouped into a single set, is a generalization of this solution.
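The randomized placement and replacement can be sketched in a few lines. This is a simplified illustration written from the description above, not the design of [11]; the class name `RandomPermutationCache`, the per-set list of tags, and the seeded random generator are assumptions.

```python
import random
from typing import List, Optional


class RandomPermutationCache:
    """Toy k-way cache with a permuted set index and random replacement."""

    def __init__(self, num_sets: int, ways: int,
                 seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # Permutation table: one entry per set, each holding a new set index.
        self.permutation: List[int] = list(range(num_sets))
        self.rng.shuffle(self.permutation)
        self.ways = ways
        # Each set holds up to `ways` tags; None marks an empty line.
        self.sets: List[List[Optional[int]]] = [
            [None] * ways for _ in range(num_sets)
        ]

    def access(self, tag: int, set_index: int) -> bool:
        """Access a line; return True on a hit and False on a miss."""
        remapped = self.permutation[set_index]   # extra table lookup
        lines = self.sets[remapped]
        if tag in lines:
            return True
        # Random replacement: any of the k lines may be chosen, so the
        # observed set does not reveal which line was displaced.
        lines[self.rng.randrange(self.ways)] = tag
        return False
```

Because the permutation is a one-to-one mapping, two addresses with different conventional set indexes never share a remapped set, so the stored tag alone still identifies a line within a set.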
Wang and Lee [11] show how their designs can prevent the two types of attacks. It is easy to see that if locked cache lines are used, the first type of attack (Percival's attack) is not possible. If no other process can displace locked data, no information about which address is used for the table of constants is revealed. The solution to the second type of attack (Bernstein's attack) is based on the reasoning that since cache address mapping and replacement of lines within a set are randomized, the attacker cannot assume that the same value resulting from the Exclusive-Or of key and plain text will result in a cache miss. Wang and Lee's solutions come at the cost of performance degradation and additional hardware. Locking cache sets for a specific application may lead to underutilization of the cache. The RP (random permutations) cache (FIG. 4) requires an additional access to the permutation table before the cache can be read, lengthening the critical path from the CPU to the cache memory. Moreover, on a cache miss, it is necessary to reverse the permutation (i.e., look up a table to find the original set index) so that the missing data can be brought from higher-level memories into the cache.
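The reverse lookup mentioned above amounts to keeping a second table that inverts the permutation. The following sketch assumes the permutation is stored as a simple list; the function name `invert_permutation` and the example values are hypothetical.

```python
from typing import List


def invert_permutation(permutation: List[int]) -> List[int]:
    """Build the reverse table: remapped set index -> original set index."""
    inverse = [0] * len(permutation)
    for original, remapped in enumerate(permutation):
        inverse[remapped] = original
    return inverse


# Hypothetical 4-set permutation table and its reverse table.
forward = [2, 0, 3, 1]
reverse = invert_permutation(forward)

# On a miss in remapped set 3, the original set index is needed to
# rebuild the address sent to the higher-level memory.
original_set_index = reverse[3]
```

Both tables must be kept consistent whenever the permutation changes, which adds to the hardware cost noted above.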
As a result, there is a need for a method and apparatus for protecting memory systems from side channel attacks that sustains system performance, is flexible, and does not require substantial hardware additions.