1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for horizontal cache persistence in a multi-compute node, symmetric multiprocessing (‘SMP’) computer.
2. Description of Related Art
Contemporary high-performance computer systems, such as, for example, the IBM System z series of mainframes, are typically implemented as multi-compute node, symmetric multiprocessing (‘SMP’) computers. SMP is a multiprocessor computer hardware architecture in which two or more, typically many more, identical processors are connected to a single shared main memory and controlled by a single operating system. Most multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors. Processors may be interconnected using buses, crossbar switches, mesh networks, and the like. Each compute node typically includes a number of processors, each of which has at least some local memory, at least some of which is accelerated with cache memory. The cache memory can be local to each processor, local to a compute node and shared across more than one processor, or shared across compute nodes. All of these architectures require maintenance of cache coherence among the separate caches.
Taking for example a computer with multiple levels of caches, the caches form a vertical structure with smaller caches towards the processor and progressively larger caches, called L1-L2-L3-L4, moving towards main memory. As data within this type of system is aged out of a given level of cache, because more recent cache fetches require the storage space, cache lines move from L1 to L2, then from L2 to L3, and from L3 to L4, with an eventual write back to main memory as the eviction process completes.
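The vertical eviction path described above can be illustrated with a minimal sketch. It assumes least-recently-used replacement, tiny illustrative capacities, and a model in which every evicted line is written back (no clean/dirty distinction); the names `CacheLevel`, `build_hierarchy`, and `simulate_fetches` are hypothetical and are not taken from any particular system.

```python
from collections import OrderedDict


class CacheLevel:
    """One level of a hypothetical vertical cache hierarchy (LRU replacement assumed)."""

    def __init__(self, name, capacity, next_level=None):
        self.name = name
        self.capacity = capacity        # number of cache lines this level can hold
        self.next_level = next_level    # next cache toward main memory, or None for the last level
        self.lines = OrderedDict()      # address -> data, least recently used first

    def install(self, address, data, main_memory):
        """Place a line in this level; if the level overflows, evict its oldest line."""
        self.lines.pop(address, None)   # refresh recency if the line is already present
        self.lines[address] = data
        if len(self.lines) > self.capacity:
            victim_address, victim_data = self.lines.popitem(last=False)
            if self.next_level is not None:
                # Evicted line moves one level down: L1 -> L2 -> L3 -> L4.
                self.next_level.install(victim_address, victim_data, main_memory)
            else:
                # Last cache level: the eviction completes with a write back to main memory.
                main_memory[victim_address] = victim_data


def build_hierarchy():
    """Build L1..L4 with illustrative capacities of 1, 2, 3 and 4 lines (assumed values)."""
    l4 = CacheLevel("L4", 4)
    l3 = CacheLevel("L3", 3, l4)
    l2 = CacheLevel("L2", 2, l3)
    l1 = CacheLevel("L1", 1, l2)
    return [l1, l2, l3, l4]


def simulate_fetches(count):
    """Fetch `count` distinct lines into L1 and return the levels and main memory."""
    levels = build_hierarchy()
    main_memory = {}
    for address in range(count):
        levels[0].install(address, f"line-{address}", main_memory)
    return levels, main_memory
```

With these assumed capacities the hierarchy holds ten lines in total, so fetching eleven distinct lines pushes the oldest line through every level and out to main memory.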
In an architecture with multiple identical compute nodes having horizontal communications among one of the cache levels, L4 to L4 communications for example, the same type of eviction policy can additionally evict a cache line from one L4 to another L4 before completing the eviction out to main storage. This type of cache management structure for evicted cache lines is commonly seen in some variant across many contemporary multi-level/multi-compute node cache designs. One problem with this management scheme arises from the fact that the latency incurred in crossing each level or link between caches increases significantly in magnitude. As a result, a typical processor fetch from L1 may incur a penalty of x, while a fetch from the corresponding L2 may incur a penalty of 3x, a fetch from L3 a penalty of 10x, and so on, with main storage access incurring a substantially higher penalty still.
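The horizontal eviction step can be sketched in the same way. The following is only an illustration under stated assumptions: two compute nodes, LRU replacement, and a rule that a line received from a peer L4 is not forwarded to yet another L4 when it is evicted again. The names `L4Cache`, `connect`, and `simulate_node0_evictions` are hypothetical.

```python
from collections import OrderedDict


class L4Cache:
    """Hypothetical node-level L4 cache with LRU replacement and one peer L4."""

    def __init__(self, node_id, capacity):
        self.node_id = node_id
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> data, least recently used first
        self.peer = None             # L4 of another compute node, set by connect()

    def install(self, address, data, main_storage, from_peer=False):
        """Install a line; on overflow evict horizontally first, then to main storage."""
        self.lines.pop(address, None)
        self.lines[address] = data
        if len(self.lines) <= self.capacity:
            return
        victim_address, victim_data = self.lines.popitem(last=False)
        if self.peer is not None and not from_peer:
            # Horizontal eviction: the line moves to the peer node's L4.
            self.peer.install(victim_address, victim_data, main_storage, from_peer=True)
        else:
            # Assumed rule: an overflow caused by a peer's line completes the
            # eviction out to main storage instead of bouncing between L4 caches.
            main_storage[victim_address] = victim_data


def connect(a, b):
    """Link two L4 caches so that each can evict to the other."""
    a.peer = b
    b.peer = a


def simulate_node0_evictions(count):
    """Install `count` distinct lines in node 0's L4 (capacity 2 per node, assumed)."""
    node0, node1 = L4Cache(0, 2), L4Cache(1, 2)
    connect(node0, node1)
    main_storage = {}
    for address in range(count):
        node0.install(address, f"line-{address}", main_storage)
    return node0, node1, main_storage
```

In this sketch a line evicted from node 0 crosses one additional link before reaching main storage, which is the extra step to which the latency concern above applies.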
As a result of this steep increase in latency penalty, numerous schemes have been devised to improve caching algorithms so that the cache lines selected for eviction are better chosen for a given system design and its workload. In addition, various prefetch algorithms have been created at different levels of cache, in hardware and in software, to anticipate the processor's request for a given cache line so that the compounding cache latency penalty can be avoided or diminished. The drawback of both types of solution, and of many others, is that they require large amounts of additional hardware and/or software support to provide any measurable gain.