Caches are generally small, fast storage buffers used to store information, such as code or data, so that a program running on a processing device executes faster. Typically, the processing device can read the cache faster than it can read main memory. Moreover, as intensive computational requirements grow rapidly, the importance of caches in computing systems is expected to keep increasing.
A cache structure is conceptually a matrix of S×W cache lines (cells) arranged in S sets (rows) and W ways (columns). The set (row) in which a piece of data is placed in the cache is determined by the placement policy. The placement policy implements a hash function that uses certain bits of the memory address at which the data is stored to map that data to a specific cache set. From the cache's point of view, data is grouped into memory lines, or simply lines. Since different memory lines can collide in the same cache set, each set holds a given number of cache lines, called ways (columns). All sets have the same number of ways, which determines the associativity of the cache: a W-way set-associative cache has W ways per set. Within a given set, the way (cache line) in which a memory line is placed is determined by the replacement policy. When a cache access turns out to be a miss, the replacement policy selects, among all the ways in the corresponding set, which memory line is evicted to make room for the new one. The new memory line is then stored in the cache line whose contents have just been evicted. Each line can be composed of one or several words, and the size of a word is usually measured in bytes (e.g. 1, 2 or 4 bytes).
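To make the set/way organisation and the split between placement and replacement concrete, here is a minimal Python sketch of an S-set, W-way cache. It assumes a simple modulo placement (set index = line address modulo S) and random replacement; the class name, its parameters and the example geometry are hypothetical and chosen only for illustration, not taken from any particular design.

```python
import random


class SetAssociativeCache:
    """Toy S-set, W-way cache with modulo placement and random replacement."""

    def __init__(self, num_sets, num_ways, line_size, seed=None):
        self.num_sets = num_sets      # S: number of sets (rows)
        self.num_ways = num_ways      # W: number of ways (columns) per set
        self.line_size = line_size    # bytes per line
        # Each set holds up to W memory-line numbers; None marks an empty way.
        self.sets = [[None] * num_ways for _ in range(num_sets)]
        self.rng = random.Random(seed)

    def placement(self, address):
        """Placement policy: set index = line address modulo S."""
        return (address // self.line_size) % self.num_sets

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = address // self.line_size
        ways = self.sets[self.placement(address)]
        if line in ways:
            return True
        # Replacement policy: take a free way if any, otherwise evict a random one.
        if None in ways:
            victim = ways.index(None)
        else:
            victim = self.rng.randrange(self.num_ways)
        ways[victim] = line
        return False


cache = SetAssociativeCache(num_sets=4, num_ways=2, line_size=32, seed=1)
print([cache.access(a) for a in (0, 4, 128, 0, 256, 0)])
```

With 4 sets and 32-byte lines, addresses 0, 128 and 256 all map to set 0, so in this 2-way example the access to 256 must evict one of the two resident lines. The first five results are therefore always False, True, False, True, False, while whether the final access to address 0 hits depends on which way the random replacement picked.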
The timing behaviour of a cache is mainly determined by its placement and replacement policies. The line size, the word size and other cache parameters may also affect it. For a given cache configuration, both the line size and the word size are fixed.
Randomised caches have been proposed for high-performance processors to remove cache conflicts by using pseudo-random hash functions [A. Gonzalez et al. Eliminating cache conflict misses through XOR-based placement functions. In ICS, 1997.][A. Seznec and F. Bodin. Skewed-associative caches. In PARLE. 1993.][Nigel Topham and Antonio González. Randomized cache placement for eliminating conflicts. IEEE Trans. Comput., 48, February 1999]. However, the behaviour of all those cache designs is fully deterministic. This means that whenever a given input data set causes a program to generate a pathological access pattern, that pattern repeats systematically in every run of the program with that input set. Therefore, although pathological cases become less frequent, they can still appear systematically, because there is no way to prove that their probability is bounded.
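The following sketch contrasts conventional modulo placement with a simple XOR-based placement in the spirit of the designs cited above (it is not the exact function of any of them). It assumes 8 sets and operates directly on line addresses; the function names are hypothetical. The XOR function spreads a stride equal to the number of sets, which collides completely under modulo placement, but other pairs of lines still collide, and because the function is deterministic they collide in every run.

```python
NUM_SET_BITS = 3                  # hypothetical: 2**3 = 8 sets
MASK = (1 << NUM_SET_BITS) - 1


def modulo_placement(line):
    """Conventional placement: the low-order bits of the line address select the set."""
    return line & MASK


def xor_placement(line):
    """XOR-based placement: XOR the low index field with the next field up."""
    return (line & MASK) ^ ((line >> NUM_SET_BITS) & MASK)


stride = [0, 8, 16, 24]                              # stride equal to the number of sets
print([modulo_placement(line) for line in stride])   # [0, 0, 0, 0]: all in set 0
print([xor_placement(line) for line in stride])      # [0, 1, 2, 3]: conflicts removed

# Lines 0 and 9 still collide under the XOR function, and since the function is
# deterministic they collide on every call, in every run of the program.
print(xor_placement(0), xor_placement(9))            # 0 0
```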
In the real-time domain, Probabilistic Timing Analysis (PTA) (see, for example, [F. J. Cazorla et al. Proartis: Probabilistically analysable real-time systems. Technical Report 7869 (http://hal.inria.fr/hal-00663329), INRIA, 2012], [D. Griffin and A. Burns. Realism in Statistical Analysis of Worst Case Execution Times. In the 10th International Workshop on Worst-Case Execution Time Analysis (WCET 2010), pages 44-53, 2010] or [J. Hansen, S. Hissam, and G. A. Moreno. Statistical-based WCET estimation and validation. In the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis, 2009]) has emerged as a promising solution to the problems of current worst-case execution time (WCET) analysis techniques, namely static timing analysis and measurement-based timing analysis.
PTA imposes new requirements on hardware designs. More specifically, for cache designs, PTA techniques require that the timing behaviour of each memory access i can be defined by the pair of vectors

$$\{\vec{l}_i, \vec{p}_i\} = \{\{l_i^1, l_i^2, \ldots, l_i^N\}, \{p_i^1, p_i^2, \ldots, p_i^N\}\}$$

where $\vec{l}_i$ lists all the possible latencies the memory hierarchy can take to serve the data and $\vec{p}_i$ their associated probabilities of occurrence. Note that the probability of occurrence of a given latency is different from its frequency: while frequency provides information about past events, probabilities make it possible to provide guarantees about the occurrence of future events (see, for example, [F. J. Cazorla et al. Proartis: Probabilistically analysable real-time systems. Technical Report 7869 (http://hal.inria.fr/hal-00663329), INRIA, 2012]). Hence, for a cache resource, PTA requires that each access has a computable probability of hitting or missing in the cache.
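As a hedged illustration of the latency/probability pair, the sketch below stores each access's latency vector and probability vector as a latency-to-probability map, checks that the probabilities sum to 1, and combines two accesses by convolution. The convolution step assumes the two accesses are independent; the 1-cycle hit and 100-cycle miss latencies and the 0.9 hit probability are invented values, and none of the function names come from the cited works.

```python
from collections import defaultdict
from itertools import product


def make_profile(latencies, probabilities):
    """Build the {l_i, p_i} pair as a latency -> probability map."""
    if len(latencies) != len(probabilities):
        raise ValueError("l_i and p_i must have the same length N")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("the probabilities p_i must sum to 1")
    profile = defaultdict(float)
    for latency, p in zip(latencies, probabilities):
        profile[latency] += p
    return dict(profile)


def convolve(a, b):
    """Latency distribution of two accesses, assuming they are independent."""
    out = defaultdict(float)
    for (la, pa), (lb, pb) in product(a.items(), b.items()):
        out[la + lb] += pa * pb
    return dict(out)


def exceedance(profile, bound):
    """Probability that the latency is strictly greater than `bound`."""
    return sum(p for latency, p in profile.items() if latency > bound)


# Hypothetical access: 1-cycle hit with probability 0.9, 100-cycle miss with 0.1.
access = make_profile([1, 100], [0.9, 0.1])
two = convolve(access, access)
print(sorted(two))                        # [2, 101, 200]
print(round(exceedance(two, 101), 6))     # 0.01: probability that both accesses miss
```

Under these assumed values, two accesses take 2, 101 or 200 cycles with probabilities 0.81, 0.18 and 0.01 respectively (up to floating-point rounding). Such a computation is only meaningful if the hit and miss probabilities are true probabilities rather than observed frequencies.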
Random replacement policies randomise the selection of the cache line (cell) within a cache set (row). However, existing placement policies are purely deterministic, based on the address accessed. Therefore, whether accesses to two different addresses compete for the same cache set depends solely on those addresses and on the placement function used. Hence, if two addresses are placed in the same cache set at the beginning of the execution, they will collide in that set throughout the execution of the program and across all runs of the program. Since this behaviour is purely deterministic, no true probability can be computed, and the probabilistic properties required by PTA are therefore not fulfilled.
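The determinism argument can be checked with a small simulation, assuming a hypothetical 8-set, 2-way cache with 64-byte lines, modulo placement, and random replacement seeded differently in each "run". Three lines whose line addresses differ by a multiple of the number of sets always share a set, so they keep evicting each other and their miss count depends on the seed. Three lines that map to different sets never compete and take exactly three cold misses whatever the seed. The seed only affects which resident line is evicted, never which lines compete.

```python
import random

NUM_SETS = 8       # hypothetical cache geometry
NUM_WAYS = 2
LINE_SIZE = 64


def set_index(address):
    """Deterministic modulo placement, as in conventional caches."""
    return (address // LINE_SIZE) % NUM_SETS


def misses(addresses, repeats, seed):
    """Count misses over `repeats` passes through `addresses` (random replacement)."""
    rng = random.Random(seed)            # per-run randomness, used only for eviction
    sets = [[] for _ in range(NUM_SETS)]
    count = 0
    for _ in range(repeats):
        for addr in addresses:
            line = addr // LINE_SIZE
            ways = sets[set_index(addr)]
            if line in ways:
                continue
            count += 1
            if len(ways) < NUM_WAYS:
                ways.append(line)
            else:
                ways[rng.randrange(NUM_WAYS)] = line
    return count


# Three lines that modulo placement always puts in the same set.
colliding = [0, NUM_SETS * LINE_SIZE, 2 * NUM_SETS * LINE_SIZE]
# Three lines that always land in different sets.
spread = [0, LINE_SIZE, 2 * LINE_SIZE]

for seed in range(3):
    print(seed, misses(colliding, 10, seed), misses(spread, 10, seed))
```

With only two ways for three competing lines, every pass after the first has at least one miss, so the colliding case always misses more than the spread case; how much more varies with the seed.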
In processor security, standard non-randomised caches are vulnerable to leakage of critical information such as cryptographic keys. Attacks on standard caches rely only on the timing difference between cache hits and misses. Breaking the deterministic relationship between an access and whether it hits or misses, for instance by using random-replacement caches or similar designs, makes hits and misses occur with a given probability, which improves security because the information is obscured from attackers.
Overall, so far only caches with a single set (so that no placement function is required) and multiple ways implementing random replacement are suitable both for PTA techniques and for reducing security vulnerabilities. Unfortunately, such caches, known as fully-associative caches, are typically expensive in terms of energy and area and do not scale well.