Processing-system-based diagnostics come in a number of flavors and can best be defined as systems and methods designed to locate hardware and/or software performance problems. System diagnostics are currently available in one form or another for all stand-alone mainframe, mini, and personal computer systems. In general, diagnostics systems produce messages when a processing activity malfunctions. These messages identify the location of a software or hardware performance problem. The problem, however, is that existing diagnostics systems fail to explain why a particular processing activity is malfunctioning. This problem is further compounded when a plurality of processing system nodes exists within a parallel processing environment.
In Reduced Instruction Set Computing (RISC) based architectures, a key cause of poor performance is memory access patterns which give rise to a large number of cache misses. This anomaly is commonly known as "cache thrashing," which is the constant swapping of memory regions between a main memory and a cache. While accessing a cache requires less time than accessing main memory, whenever a cache miss occurs the processing system must request a memory region transfer from main memory to cache. If cache misses occur with regularity, the processor repeatedly waits on these transfers and latency problems result.
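The thrashing behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical direct-mapped cache of four 64-byte lines; the function name `count_misses` and all parameters are illustrative only and are not drawn from any particular system.

```python
def count_misses(addresses, num_lines=4, line_size=64):
    """Count misses for an address trace in a hypothetical direct-mapped cache."""
    tags = [None] * num_lines  # tag currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing the address
        index = block % num_lines   # cache line the block maps to
        tag = block // num_lines    # identifies which block occupies the line
        if tags[index] != tag:
            misses += 1             # miss: block must be fetched from main memory
            tags[index] = tag       # the previous occupant is swapped out
    return misses


# Addresses 0 and 256 are exactly one cache size apart (4 lines * 64 bytes),
# so they compete for the same line and evict each other on every access.
thrash_misses = count_misses([0, 256] * 8)

# Addresses 0 and 64 map to different lines, so only the first touches miss.
friendly_misses = count_misses([0, 64] * 8)
```

Under these assumptions, both traces issue the same number of accesses, yet the conflicting trace misses on every one of them while the other misses only twice.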
In multi-processor systems, cache thrashing is further aggravated because multi-processor systems by design have a greater number of opportunities for cache misses while attempting to access data. For example, a situation can arise where a first processor caches a piece of data and a second processor then requests that same data, which causes the data to be evicted from the first processor's cache into the second processor's cache. Subsequently, the first processor may re-request that same data. What results is a "ping-pong" effect whereby a cache line of data is bounced back and forth between the first processor's cache and the second's. When evaluating a processing activity, it is difficult to determine whether the processing system is running into typical cache thrashing or running into the "ping-pong" effect due to a multi-processor configuration. This problem is further compounded in parallel processing environments where the processing activity is distributed across multiple caches.
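The "ping-pong" effect can likewise be sketched with a simplified model in which a cache line resides in exactly one processor's cache at a time. This is an assumption made for illustration, not a description of any specific coherence protocol; the function name `simulate_ownership` and the line identifiers are hypothetical.

```python
def simulate_ownership(accesses):
    """Replay (processor_id, line_id) accesses; return (misses per processor, transfers)."""
    owner = {}      # line_id -> processor whose cache currently holds the line
    misses = {}     # processor_id -> number of misses observed
    transfers = 0   # times a line moved from one processor's cache to another's
    for cpu, line in accesses:
        holder = owner.get(line)
        if holder == cpu:
            continue                        # hit: line is already in this cache
        misses[cpu] = misses.get(cpu, 0) + 1
        if holder is not None:
            transfers += 1                  # line is evicted from the other cache
        owner[line] = cpu
    return misses, transfers


# Two processors alternately touch the same line: it bounces on every access.
shared_misses, shared_transfers = simulate_ownership([(0, "A"), (1, "A")] * 4)

# Each processor touches its own line: only the first access of each misses.
private_misses, private_transfers = simulate_ownership([(0, "A"), (1, "B")] * 4)
```

In this sketch, a raw miss count alone does not distinguish the shared case from ordinary thrashing; the separate transfer count is what reveals that the misses stem from the line moving between processors.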
In multi-processor systems with non-uniform access times (latencies) to main memory, cache thrashing is further aggravated because a memory access to one part of memory may take significantly more or less time than an access to another part of memory. For example, an access to memory close to a processor may take one unit of time, while an access to memory far from a processor may take 50 units of time. Thus, accesses to memory far from the processor would idle the processor 50 times longer than accesses to memory near the processor. For performance diagnostics, it would be grossly misleading to count each of these memory accesses (i.e., cache misses) equally.
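To make concrete why equal counting misleads, the following minimal sketch weights each miss by its latency, using the 1-unit and 50-unit figures from the example above. The routine names, miss counts, and the function `summarize_misses` are hypothetical values chosen only for illustration.

```python
def summarize_misses(miss_counts, latencies):
    """Return (raw miss count, latency-weighted idle time) for one routine."""
    raw = sum(miss_counts.values())
    weighted = sum(count * latencies[region] for region, count in miss_counts.items())
    return raw, weighted


# Latencies in arbitrary time units, taken from the example in the text.
latencies = {"near": 1, "far": 50}

# Hypothetical routines: A misses often but only to near memory,
# B misses rarely but always to far memory.
routine_a = {"near": 100, "far": 0}
routine_b = {"near": 0, "far": 10}

raw_a, idle_a = summarize_misses(routine_a, latencies)
raw_b, idle_b = summarize_misses(routine_b, latencies)
```

With these assumed numbers, routine A shows ten times as many misses as routine B, yet routine B idles the processor for 500 time units against 100 for routine A, so a raw count would point the diagnosis at the wrong routine.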
Accordingly, there exists a need in the art for a diagnostics system which provides a means for isolating and diagnosing processing system performance problems due to poor cache behavior.
There further exists a need in the art for a diagnostics system which is designed to monitor cache misses within a processing network supporting parallel processing.
Lastly, there exists a need in the art for a diagnostics system which is designed to monitor the time period during which a single processor within a multi-processor system is idle due to one or more memory accesses.