Applications running on data processing systems that employ parallel or multiple processor architectures will typically use cache memory to bring data closer to the processor which is operating on that data. Cache memories are typically implemented as smaller, faster memory devices which store copies of the data from the most frequently used main memory locations. An issue associated with the use of cache memories is the design tradeoff between cache latency and cache hit rate. Larger caches have better hit rates, where the hit rate is the percentage of requests for data that can be filled by a copy stored in the cache, but longer latency, where the latency is the amount of time needed to serve a request. To address this tradeoff, some architectures use multiple levels of cache, with small, fast caches (sometimes called Level 1 or L1 caches) backed up by larger, slower caches (sometimes called Level 2 or L2 caches).
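The tradeoff between latency and hit rate can be made concrete with the standard average memory access time relation (hit time plus miss rate times miss penalty). The following is only an illustrative sketch: the function name `amat` and every latency and miss-rate value are hypothetical numbers chosen for the example, not measurements of any particular processor.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected cost of a miss."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical values (cycles and miss rates), chosen only for illustration.
MEMORY_LATENCY = 200

# One large cache alone: good hit rate, but every access pays its longer latency.
single_level = amat(hit_time=12, miss_rate=0.05, miss_penalty=MEMORY_LATENCY)

# Two levels: a small, fast L1 whose misses are served by the same large cache (L2).
l2_cost = amat(hit_time=12, miss_rate=0.05, miss_penalty=MEMORY_LATENCY)
two_level = amat(hit_time=2, miss_rate=0.10, miss_penalty=l2_cost)

print(single_level, two_level)
```

Under these assumed numbers the two-level arrangement has the lower average access time, because most requests are served at the short L1 latency and only L1 misses pay the L2 latency.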
There are three general ways to architect the relationship between main memory locations and memory locations within a cache memory. First, the cache memory can be directly mapped to the main memory such that there is one and only one cache memory location in which data associated with each main memory location can be stored. Second, at the other end of the spectrum, the cache memory can be fully associatively mapped to the main memory using a rule set which permits each main memory location to be mapped to any of the cache memory locations. Third, between the first and second options, a set associative approach provides for each main memory location to be mapped to one of the cache memory locations within a particular subset of all available cache memory locations. More specifically, memory addresses are placed into cache sets according to a portion of the address (typically referred to as the index), while the address tag identifies the cache line within the set. When a replacement is needed to make room for a new address tag, a cache line in the set is chosen by the processor according to some internal criteria and then replaced. This last, set associative technique is of particular interest for the present application.
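The three mappings can be sketched with one small model. This is a minimal illustration under stated assumptions, not a description of any particular processor: the class name `SetAssociativeCache`, the 64-byte line size, and the conventional split of an address into a set index and a tag are all assumed for the example. The replacement criterion is assumed to be least recently used (LRU), whereas the text above leaves the criterion internal to the processor.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy cache model (hypothetical); each set keeps its tags in LRU order."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def locate(self, address):
        """Split an address into (set index, tag)."""
        line_number = address // self.line_size
        return line_number % self.num_sets, line_number // self.num_sets

    def access(self, address):
        """Return True on a hit, False on a miss (filling the line on a miss)."""
        index, tag = self.locate(address)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # assumed policy: evict least recently used
        lines[tag] = True
        return False

cache = SetAssociativeCache(num_sets=4, ways=2)
# Addresses 0, 256 and 512 all fall into set 0 and compete for its two ways.
history = [cache.access(a) for a in (0, 256, 0, 512, 256)]
print(history)
```

In this model, setting `ways=1` gives a directly mapped cache, and setting `num_sets=1` gives a fully associative one, so the set associative case sits between the two extremes as described above.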
Cache sharing impacts the performance of distributed applications running on multiple processors or cores (in this specification the terms “processor” or “processors” are used interchangeably with the terms “core” or “cores”, respectively). As shown for example in FIG. 1, a distributed software application can be considered to be an application running on multiple cores 100-106 and sharing data structures. The distributed application receives traffic from one or several network interfaces 108-112 and uses a configurable hardware or software packet input engine 114 to distribute packets to the cores (or to make packets available to the cores).
The packet input engine 114 is typically configured to provide a fair distribution of traffic among the cores 100-106 and to be able to re-organize the quantity of traffic each core can handle as the traffic varies. The packet input engine 114 can, for example, be implemented as hardware circuitry for scheduling input packets on the network interfaces for the cores or as a software packet filter running with the network interface drivers and pushing packets into core specific input queues. Once the packets are, for example, ordered into core specific input queues, a shared hash table can be used to look up the corresponding data using the packet information as input. For example, such a lookup could be based on the 5-tuple [source IP, destination IP, source port, destination port, protocol].
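The flow from packet input engine to shared hash table can be sketched as follows. This is a simplified software illustration only: the names `dispatch`, `process_next`, `input_queues` and `flow_table` are hypothetical, packets are modelled as plain dictionaries, the round-robin distribution is an assumed stand-in for whatever fairness policy an actual packet input engine applies, and a Python `dict` plays the role of the shared hash table.

```python
from collections import deque, namedtuple
from itertools import cycle

# The 5-tuple used as the lookup key.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port protocol")

NUM_CORES = 4
input_queues = [deque() for _ in range(NUM_CORES)]  # one input queue per core
flow_table = {}                                     # shared hash table: 5-tuple -> flow data
_next_core = cycle(range(NUM_CORES))                # assumed policy: round robin

def dispatch(packet):
    """Packet input engine: push the packet into a core specific input queue."""
    core = next(_next_core)
    input_queues[core].append(packet)
    return core

def process_next(core):
    """Work done on a core: take a packet and look up its flow data by 5-tuple."""
    packet = input_queues[core].popleft()
    key = FiveTuple(*(packet[field] for field in FiveTuple._fields))
    state = flow_table.setdefault(key, {"packets": 0})
    state["packets"] += 1
    return state

pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
       "src_port": 1234, "dst_port": 80, "protocol": 6}
first_core = dispatch(pkt)
second_core = dispatch(dict(pkt))      # a second packet of the same flow
state_a = process_next(first_core)
state_b = process_next(second_core)
print(first_core, second_core, state_b)
```

Two packets of the same flow may be handled by different cores, yet both lookups resolve to the same entry of the shared table, which is why the table's memory ends up being accessed from every core.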
Hash tables are widely used to speed up data lookup and to reduce collisions (e.g., due to similar data) by evenly distributing data across the table. This advantage becomes an issue for set associative L2 caches, since hash results are spread over the cache sets in an effectively random fashion. This is likely to happen due to, for example, traffic growing in terms of address range and the fact that the L2 cache is shared between the cores. This, in turn, causes cache misses due to inter-processor conflicts, a phenomenon sometimes referred to as “cache thrashing” or “cache set invasion”. More specifically, cache set invasion is caused by an application running on one core stealing the set associative L2 cache sets of the same application running on another core but with different input data. This makes the application's behavior prediction, resource usage and traffic shaping difficult to manage.
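The spreading effect can be illustrated with a small experiment. The sketch below rests on several assumptions that are not taken from the text: a hash table of 4096 buckets with one 64-byte cache line per bucket, a hypothetical line-aligned base address `TABLE_BASE`, a shared cache of 64 sets indexed directly by address, and SHA-256 used merely as a convenient, well-distributed stand-in for the application's hash function. Each core is given a disjoint, made-up group of flows.

```python
import hashlib

LINE_SIZE = 64          # bytes per cache line (assumed)
NUM_SETS = 64           # sets in the shared cache (assumed)
NUM_BUCKETS = 4096      # hash table buckets, one cache line each (assumed)
TABLE_BASE = 0x100000   # hypothetical, line-aligned base address of the table

def bucket_of(flow_key):
    """Evenly distribute flow keys over the table (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(repr(flow_key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def cache_set_of(bucket):
    """Cache set that the bucket's memory address falls into."""
    address = TABLE_BASE + bucket * LINE_SIZE
    return (address // LINE_SIZE) % NUM_SETS

def sets_touched(flows):
    return {cache_set_of(bucket_of(flow)) for flow in flows}

# Disjoint input data per core: 200 made-up flows each.
core0_flows = [("10.0.0.%d" % i, "192.0.2.1", 1024 + i, 80, 6) for i in range(200)]
core1_flows = [("10.0.1.%d" % i, "192.0.2.1", 1024 + i, 80, 6) for i in range(200)]

core0_sets = sets_touched(core0_flows)
core1_sets = sets_touched(core1_flows)
contended = core0_sets & core1_sets   # sets in which both cores compete for lines

print(len(core0_sets), len(core1_sets), len(contended))
```

Although the two cores handle entirely different flows, the even distribution of the hash sends each core's lookups into most of the cache sets, so the sets used by one core largely coincide with those used by the other and the cores evict each other's lines.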
Accordingly, it would be desirable to provide software, methods, devices and systems which address these, and other, problems associated with cache management in multiprocessor systems.