The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for detecting false sharing misses for use in a performance monitoring unit or in other types of system optimization.
In shared memory multiprocessor systems, multiple software threads share data through reads and writes to shared memory locations. These systems usually include caches local to each processor that are managed using invalidation-based cache coherence protocols, such that when a shared memory location is written by one processor, that location is removed (“invalidated”) from the caches of all other processors in the system. When one of those processors subsequently reads or writes the memory location, the access causes a cache miss and incurs an additional latency penalty to retrieve the data from the cache of the processor that wrote the location. In some applications, these penalties may account for a significant fraction of execution time. In the literature, such cache misses are referred to as coherence misses or communication misses.
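By way of illustration only, the invalidation behavior described above may be sketched as a toy software model. The class name `ToyCoherentSystem`, the method `access`, and the 128-byte line size are assumptions introduced for this sketch; the model ignores cache capacity, line states, and write-back details of any real protocol.

```python
LINE_SIZE = 128  # bytes per cache line; an assumed value for illustration


class ToyCoherentSystem:
    """Toy model: each processor caches whole lines; a write invalidates other copies."""

    def __init__(self, num_processors):
        # Lines currently held by each processor's cache.
        self.caches = [set() for _ in range(num_processors)]
        # Lines removed from each cache by another processor's write.
        self.invalidated = [set() for _ in range(num_processors)]

    def access(self, proc, address, is_write):
        """Return 'hit', 'cold miss', or 'coherence miss' for this access."""
        line = address // LINE_SIZE
        if line in self.caches[proc]:
            result = "hit"
        elif line in self.invalidated[proc]:
            # The line was present earlier and was invalidated by a remote write.
            result = "coherence miss"
            self.invalidated[proc].discard(line)
        else:
            result = "cold miss"
        self.caches[proc].add(line)
        if is_write:
            # Invalidation-based protocol: remove the line from all other caches.
            for other, cache in enumerate(self.caches):
                if other != proc and line in cache:
                    cache.discard(line)
                    self.invalidated[other].add(line)
        return result


system = ToyCoherentSystem(num_processors=2)
trace = [
    system.access(0, 0x1000, is_write=False),  # processor 0 loads the line
    system.access(1, 0x1000, is_write=True),   # processor 1 writes, invalidating processor 0
    system.access(0, 0x1000, is_write=False),  # processor 0 misses due to the invalidation
]
print(trace)
```

In this sketch the third access is reported as a coherence miss because the line was removed from the cache of processor 0 by the write of processor 1, not because it was never loaded.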
Cache coherence protocols operate at the granularity of a cache line, whose size ranges, for example, from 32 bytes to 256 bytes in most computer systems. Because of this coarse granularity, coherence misses may occur even though two processors are not touching the same data. One processor may write a subset of a cache line while the other processor accesses a disjoint subset of that line. Because the write invalidates the entire line, the second processor still observes the additional latency penalty upon access. In the literature, these misses are referred to as “False Sharing Misses”.
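The distinction between the two kinds of miss may be illustrated by comparing the bytes written by one processor with the bytes subsequently accessed by another. The following is a minimal sketch of that definition only, not of any detection mechanism; the function names `bytes_touched` and `classify_sharing` and the 128-byte line size are assumptions made for the example.

```python
LINE_SIZE = 128  # bytes per cache line; an assumed value for illustration


def bytes_touched(address, size):
    """Return (line number, set of byte offsets within that line) for one access."""
    line = address // LINE_SIZE
    start = address % LINE_SIZE
    if start + size > LINE_SIZE:
        raise ValueError("access crosses a cache line boundary; not modeled here")
    return line, set(range(start, start + size))


def classify_sharing(write_addr, write_size, access_addr, access_size):
    """Classify a later access by one processor relative to an earlier remote write."""
    write_line, written = bytes_touched(write_addr, write_size)
    access_line, accessed = bytes_touched(access_addr, access_size)
    if write_line != access_line:
        return "no sharing"      # different lines: the write does not invalidate this line
    if written & accessed:
        return "true sharing"    # at least one byte is both written and accessed
    return "false sharing"       # same line, disjoint bytes


# Processor A writes 8 bytes at 0x1000; processor B then accesses other addresses.
print(classify_sharing(0x1000, 8, 0x1040, 8))  # same line, disjoint bytes
print(classify_sharing(0x1000, 8, 0x1004, 4))  # overlapping bytes
print(classify_sharing(0x1000, 8, 0x1080, 8))  # next cache line
```

With the assumed line size, addresses 0x1000 and 0x1040 fall within the same line at offsets 0 and 64, so the first call reports false sharing even though no byte is common to both accesses.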
For software developers, it is useful to differentiate between coherence misses that are due to false sharing and coherence misses that are due to the true sharing of data, because each type of miss leads to a different software optimization strategy. Although performance monitoring units may be able to detect coherence misses, performance monitoring units in current state-of-the-art systems are not able to differentiate between false sharing misses and true sharing misses.
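As one hedged illustration of why the distinction matters, a commonly used remedy for false sharing is to pad or align data so that independently updated fields fall in different cache lines, whereas true sharing generally calls for restructuring the communication itself. The sketch below uses the Python `ctypes` module only to compute field offsets; the structure names, the helper `line_of`, and the 128-byte line size are assumptions, and the structure is assumed to begin on a cache line boundary.

```python
import ctypes

LINE_SIZE = 128  # bytes per cache line; an assumed value for illustration


class PackedCounters(ctypes.Structure):
    # Two counters updated by different threads, laid out back to back.
    _fields_ = [
        ("counter_a", ctypes.c_uint64),
        ("counter_b", ctypes.c_uint64),
    ]


class PaddedCounters(ctypes.Structure):
    # Padding pushes counter_b to the start of the next cache line.
    _fields_ = [
        ("counter_a", ctypes.c_uint64),
        ("_pad", ctypes.c_uint8 * (LINE_SIZE - ctypes.sizeof(ctypes.c_uint64))),
        ("counter_b", ctypes.c_uint64),
    ]


def line_of(struct_type, field_name):
    """Cache line index of a field, relative to a line-aligned structure base."""
    return getattr(struct_type, field_name).offset // LINE_SIZE


for layout in (PackedCounters, PaddedCounters):
    same_line = line_of(layout, "counter_a") == line_of(layout, "counter_b")
    print(layout.__name__, "counters share a cache line:", same_line)
```

In the packed layout both counters occupy one line, so writes to one counter would invalidate the line holding the other; in the padded layout they occupy separate lines at the cost of additional memory.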