One of the factors that can limit the performance scaling of large symmetric multiprocessor (SMP) systems is highly referenced data that is shared among multiple processors in the system. SMP systems enforce a cache coherency protocol to guarantee that all processors see a consistent view of memory contents. The most popular protocols in the industry are modified, exclusive, shared, invalid (MESI) and modified, owned, exclusive, shared, invalid (MOESI). These are commonly known as write-invalidate protocols, because the first write to a cache line causes all other cached copies of that line to be invalidated.
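The invalidation behavior described above can be illustrated with a minimal sketch. It is a toy model and not any particular processor's implementation: it tracks the MESI state of a single cache line in each processor's cache, and the names `State`, `ToyBus`, `read`, and `write` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class ToyBus:
    """Toy model (an assumption for illustration): the MESI state of ONE
    cache line as held by each processor's cache."""

    def __init__(self, num_cpus):
        self.states = [State.INVALID] * num_cpus
        self.invalidations = 0  # copies invalidated in other caches so far

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return  # read hit, no state change
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.INVALID]
        # Any other holder drops to SHARED (a MODIFIED copy would be
        # written back first; the write-back is not modeled here).
        for i in holders:
            self.states[i] = State.SHARED
        self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu):
        # A write invalidates every other cached copy of the line.
        for i, s in enumerate(self.states):
            if i != cpu and s is not State.INVALID:
                self.states[i] = State.INVALID
                self.invalidations += 1
        self.states[cpu] = State.MODIFIED


bus = ToyBus(3)
for cpu in range(3):
    bus.read(cpu)          # all three caches end up holding the line SHARED
bus.write(0)               # CPU 0 writes: the other two copies are invalidated
print([s.value for s in bus.states], bus.invalidations)
```

After the write, processors 1 and 2 no longer hold a valid copy, so their next reference to the line is a miss even though nothing was evicted for capacity reasons.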
Shared data can be classified into two broad categories. The first is true sharing, a situation in which the same memory locations are referenced by two or more processors in the system. This type of sharing is common to a large class of commercial applications. The second form of sharing is commonly referred to as false sharing. This is a situation in which two or more processors reference totally independent data items that just happen to reside on the same cache line. A variety of situations can lead to false cache line sharing; however, a common source is the operating system. In this instance the sharing is generally a result of global variable accesses made as the operating system executes on various processors as part of its normal operation.
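The distinction between the two categories can be sketched in a few lines. The sketch below assumes a 64-byte cache line (the actual size depends on the processor) and treats each data item as a single byte address, which ignores items that straddle a line boundary. The function name `classify_sharing` and the addresses are hypothetical.

```python
LINE_SIZE = 64  # assumed cache line size in bytes; varies by processor


def classify_sharing(addr_a, addr_b, line_size=LINE_SIZE):
    """Classify two addresses referenced by two DIFFERENT processors.

    Simplification: each item is treated as one byte address.
    """
    if addr_a == addr_b:
        return "true sharing"       # same memory location
    if addr_a // line_size == addr_b // line_size:
        return "false sharing"      # independent items, same cache line
    return "no sharing"             # different cache lines


# Two independent 8-byte counters placed back to back (hypothetical layout).
print(classify_sharing(0x1000, 0x1000))  # same location
print(classify_sharing(0x1000, 0x1008))  # adjacent items on one line
print(classify_sharing(0x1000, 0x1040))  # items one full line apart
```

Adjacent global variables, such as the back-to-back counters in this example, are the kind of layout that can produce false sharing when each is updated by a different processor.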
The cache line invalidation that occurs as a result of the cache coherency protocol enforcing a consistent view of memory has the unintended result of increasing the cache miss rates of processes that share data. The increase in cache miss rate can be extremely high for processes that have high reference rates to shared data. The increased miss rates in turn increase bus, and possibly memory, utilization, thereby increasing apparent bus (memory) latency. The combination of increased miss rate and increased latency has the net effect of degrading the performance of processes that share data. This degradation grows progressively worse as the number of processors increases, thereby limiting performance scaling.
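The effect on miss rates can be illustrated with a small worst-case model. This is a sketch under strong assumptions, not a measurement: caches are unbounded, every access is a write, and processors take turns in strict round-robin order. The helper names `count_misses` and `round_robin_trace` are hypothetical.

```python
def count_misses(trace):
    """Count misses for a trace of (cpu, line) WRITE accesses.

    Assumptions: unbounded caches, so the only misses are cold misses and
    misses caused by another processor's invalidating write.
    """
    valid = set()  # (cpu, line) pairs that currently hold a valid copy
    misses = 0
    for cpu, line in trace:
        if (cpu, line) not in valid:
            misses += 1
        # The write invalidates every other processor's copy of this line.
        valid = {(c, l) for (c, l) in valid if l != line or c == cpu}
        valid.add((cpu, line))
    return misses


def round_robin_trace(num_cpus, writes_per_cpu, shared):
    """Each CPU writes in turn, either to one shared line (line 0) or to
    its own private line (line number == cpu number)."""
    return [(cpu, 0 if shared else cpu)
            for _ in range(writes_per_cpu)
            for cpu in range(num_cpus)]


shared = count_misses(round_robin_trace(4, 100, shared=True))
private = count_misses(round_robin_trace(4, 100, shared=False))
print(shared, private)
```

In this model every write to the shared line misses, because a different processor wrote the line in between, while the private layout incurs only one cold miss per processor. Real workloads fall somewhere between these two extremes, and each extra miss also adds the bus traffic described above.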
Given the serious performance impact that can result from false sharing, as well as from highly referenced truly shared structures, one would expect that there would be effective means for identifying the data structures responsible for either or both types of data sharing. Unfortunately, current techniques generally require heavily obtrusive compiler-inserted instrumentation. While this may work for workloads composed of a collection of homogeneous processes, it tends to be very ineffective for heterogeneous workloads. At best the software instrumentation approach is heavily obtrusive and generally cannot be used at customer sites, as this would require taking the application down to install instrumented software. If the desire is to identify the sources of true and false data sharing without any software or performance impact, then another approach has to be developed.