This invention relates to a speculative memory fetching method, and more particularly to providing a method, system and computer program product for preventing lockout and stalling conditions in a multi-node system with speculative memory fetching.
Computer systems have developed from a single processor system to a large symmetric multi-processor system (SMP) that is commonly found in today's corporate infrastructure. An SMP system can be defined as a multiprocessor computer system where two or more identical processors are connected to a single shared main memory. As these systems have evolved, methods for improving processor request response times have been a critical part of the design process for these systems.
In existing large SMP systems, while processor frequency and overall system performance have grown dramatically, memory response times have not matched these rates of improvement. In order to overcome this problem, SMP systems include algorithms such as multi-level caching, processor/hardware initiated pre-fetching, and software pre-fetching hints. Although the use of these algorithms increases the overall system performance, each one has failed to address unique issues present in large SMP systems. For example, in large SMP systems, multiple levels of caches are interconnected vertically from the processor to the memory (referred to as processor stacks or nodes). These vertical processor stacks or nodes interconnect with other vertical processor stacks or nodes via one of the shared levels of caches.
FIG. 1 illustrates a conventional node 10 including a plurality of processors 11, 12, 13, 14 and 15 interconnected by a shared level cache 16 with a storage/memory 17 shared among the processors (11, 12, 13, 14 and 15) and common I/O devices 18, the node 10 being interconnected with other nodes within a multi-node system through interconnected buses 19. FIG. 2 illustrates a conventional multi-node system 20 including a plurality of remote nodes 21, 22, 23 and 24 and a plurality of interconnect buses 25 which connect the remote nodes 21, 22, 23 and 24, which follow a given coherency protocol.
While this interconnectivity increases the aggregate amount of shared cache within the system and increases the chance of finding a line within the shared level of cache, it also requires each cache to be searched, either in parallel or sequentially, depending on the interconnectivity of the processor stacks or nodes, before a hit or miss state of the line within the system can be determined. As a result, on a fetch operation, extra latency is incurred while the search of each cache takes place, which can delay the launch of the fetch operation to memory.
In order to overcome this problem, some SMP systems include speculative memory fetching, where the target/home memory is speculatively accessed while the state of the line within the shared level of caches is determined within the system. Thus, when the line does not exist in any of the shared caches, the leading edge memory access latency is reduced by the amount of time required to poll the shared caches within the system. However, speculative memory fetching does not work well when contention is encountered on the line address within the system, because the speculative memory fetch has to be cancelled and the shared cache level polling sequence of the system needs to be restarted. This restarting/recycling of the cache polling includes relaunching the speculative memory fetch request.
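The latency benefit described above can be illustrated with a minimal sketch. The function name, the cycle counts, and the assumption that the memory access takes at least as long as the cache poll are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
def miss_latency(poll_cycles, memory_cycles, speculative):
    """Leading-edge latency of a fetch that misses every shared cache.

    Hypothetical model: without speculation the memory access is launched
    only after the cache poll completes; with speculation both proceed in
    parallel, so the slower of the two determines the latency.
    """
    if speculative:
        return max(poll_cycles, memory_cycles)
    return poll_cycles + memory_cycles


# Illustrative (assumed) figures: 40 cycles to poll the shared caches,
# 200 cycles for the memory access itself.
serial = miss_latency(40, 200, speculative=False)
overlapped = miss_latency(40, 200, speculative=True)
saved = serial - overlapped  # equals the poll time when memory is slower
```

Under these assumed figures the saving equals the cache polling time, which matches the reduction described above for a line that is not present in any shared cache.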
Recycling of the shared cache search sequence increases the aggregate number of memory requests within the system by a factor directly proportional to the amount of line contention encountered in the system. A system having minimal traffic would not notice any abnormal increase in memory traffic; however, a system experiencing high address contention would see the memory request rate grow exponentially with the amount of address contention in the system. Thus, the memory access rate would increase to a point where non-speculative memory access requests would be locked out by the volume of speculative memory fetch requests within the system until the contention is resolved, which increases the aggregate latency incurred by the operation.
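The effect of recycling on memory traffic can be illustrated with a simplified counting sketch. It assumes, purely for illustration, that each fetch operation eventually goes to memory and that every recycle of the cache polling sequence relaunches exactly one speculative memory fetch; the function name and the sample recycle counts are hypothetical.

```python
def memory_requests(recycles_per_fetch, speculative=True):
    """Count memory requests issued for a set of fetch operations.

    recycles_per_fetch: for each fetch operation, the number of times its
    cache polling sequence was restarted because of address contention.
    Hypothetical model: a speculative fetch is launched once initially and
    once more per recycle; a non-speculative fetch reaches memory once.
    """
    total = 0
    for recycles in recycles_per_fetch:
        if speculative:
            total += 1 + recycles  # initial launch plus one relaunch per recycle
        else:
            total += 1  # launched only after polling resolves to a miss
    return total


no_contention = memory_requests([0, 0, 0])    # every fetch completes first time
high_contention = memory_requests([3, 5, 0])  # two fetches recycle repeatedly
baseline = memory_requests([3, 5, 0], speculative=False)
```

In this sketch the three fetch operations under contention issue 11 memory requests with speculation against 3 without it, and the 8 extra requests occupy memory bandwidth that non-speculative requests would otherwise use.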