1. Field of the Invention
The present invention is generally directed to computing operations performed in multi-processor computer systems. More particularly, the present invention is directed to reducing processor cache-coherency probe traffic resulting from false sharing of data in multi-processor computer systems and applications thereof.
2. Background
A multiprocessor computing system includes a main memory and a plurality of processors. Each processor can read from and write to the main memory. In addition to the main memory, each processor includes a cache memory, or simply a cache. The cache of a processor can be accessed by that processor faster than that processor can access the main memory. Thus, each processor stores frequently accessed data in its cache.
Consequently, multiple processors in a multiprocessor system can each hold a copy of data corresponding to a single location in the main memory. Because each processor can access its own cache faster than it can access the main memory, each processor can update its local copy of the data before the updated data is stored in the main memory. If one of the processors modifies its local copy of the data and the other processors do not receive those modifications, the local copies held by the other processors may be out of date.
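To make the stale-copy problem concrete, the following Python sketch models two private write-back caches with no coherence protocol at all. The class `NonCoherentCache` and its methods are hypothetical illustrations, not part of any real system. Each cache fills from a shared dictionary that stands in for main memory, and writes stay in the cache.

```python
class NonCoherentCache:
    """Hypothetical private write-back cache with NO coherence protocol."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory          # shared "main memory"
        self.lines: dict[int, int] = {}  # this processor's local copies

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            # Miss: fill the local copy from main memory.
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        # Write-back: only the local copy changes; main memory is not
        # updated and no other cache is told about the change.
        self.lines[addr] = value


memory = {0x100: 1}
p0, p1 = NonCoherentCache(memory), NonCoherentCache(memory)
p0.read(0x100)
p1.read(0x100)
p0.write(0x100, 2)
# p1 still sees the old value 1: its local copy is out of date.
```

Without probes, nothing forces the second cache to discard its copy, which is exactly the gap that cache-coherency protocols close.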
Conventional processors in a multiprocessor system implement one or more cache-coherency protocols to signal changes to cached data shared by multiple processors. Example cache-coherency protocols include MOESI, MESI, MESIF, and others. The signals that are broadcast to the other processors are termed probes or snoops.
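A minimal sketch of how a MESI-style protocol generates probes for a single cache line follows. It assumes a simplified model in which every read miss and every write to a line not held exclusively broadcasts one probe. The `State` and `Line` names are hypothetical, and write-back traffic, the Owned state of MOESI, and the Forward state of MESIF are omitted.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Line:
    """One cache line as seen by num_cpus private caches (simplified MESI)."""

    def __init__(self, num_cpus: int) -> None:
        self.state = [State.INVALID] * num_cpus
        self.probes = 0  # probes (snoops) broadcast so far

    def read(self, cpu: int) -> None:
        if self.state[cpu] is not State.INVALID:
            return  # cache hit: no probe needed
        self.probes += 1  # read miss: probe the other caches
        others_hold = False
        for i, s in enumerate(self.state):
            if i != cpu and s is not State.INVALID:
                # A MODIFIED holder would also write its data back here.
                self.state[i] = State.SHARED
                others_hold = True
        self.state[cpu] = State.SHARED if others_hold else State.EXCLUSIVE

    def write(self, cpu: int) -> None:
        if self.state[cpu] in (State.MODIFIED, State.EXCLUSIVE):
            self.state[cpu] = State.MODIFIED  # silent upgrade, no probe
            return
        self.probes += 1  # probe that invalidates every other copy
        for i in range(len(self.state)):
            if i != cpu:
                self.state[i] = State.INVALID
        self.state[cpu] = State.MODIFIED
```

For example, if processor 0 and processor 1 both read the line and processor 0 then writes it twice, the model broadcasts three probes: one per read miss and one invalidation for the first write. The second write is silent because processor 0 already holds the line in the Modified state.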
Unfortunately, the sharing of cached data between processors in a multiprocessor system can lead to false sharing. False sharing occurs when multiple processors each store a local copy of a cache line, but each processor accesses a different data object/memory block of the cache line.
For example, a first processor and a second processor may each store a local copy of a cache line that includes two data objects, data object A and data object B, wherein the first processor accesses only data object A and the second processor accesses only data object B. Conventionally, if the first processor modifies data object A in its local copy of the cache line, the first processor sends a probe to the second processor, causing the second processor to update (or, under an invalidation-based protocol, invalidate) its local copy of the cache line even though the second processor is not accessing data object A. The first and second processors in this example are involved in false sharing because, although they each store local copies of the same cache line, they each access different data objects of that cache line. False sharing generates unnecessary probe traffic and performance overhead and is, therefore, undesirable.
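The following sketch counts probes for the data object A / data object B example. It assumes a 64-byte cache line and an invalidation-based protocol in which coherence is tracked per cache line rather than per data object. `LINE_SIZE`, `line_of`, and `count_write_probes` are hypothetical names chosen for illustration. Because the model cannot see which bytes of a line each processor touches, alternating writes to A and B cost a probe on every write when A and B share a line, and none when they do not.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes; varies by processor


def line_of(addr: int) -> int:
    """Index of the cache line that contains byte address addr."""
    return addr // LINE_SIZE


def count_write_probes(addr_a: int, addr_b: int, rounds: int) -> int:
    """Processor 0 writes only A, processor 1 writes only B, alternately.

    Coherence is tracked per cache line, so a write sends a probe whenever
    any other processor still holds a copy of the same line.
    """
    holders: dict[int, set[int]] = {}
    # Each processor starts with a valid copy of the line it uses.
    holders.setdefault(line_of(addr_a), set()).add(0)
    holders.setdefault(line_of(addr_b), set()).add(1)
    probes = 0
    for _ in range(rounds):
        for cpu, addr in ((0, addr_a), (1, addr_b)):
            line = line_of(addr)
            if holders[line] != {cpu}:
                probes += 1          # invalidate the other copies of the line
                holders[line] = {cpu}
    return probes
```

With A at byte offset 0 and B at byte offset 8, 1,000 rounds of alternating writes produce 2,000 probes in this model, even though neither processor ever reads the other's data object. Moving B to offset 64 places it on its own line and produces no probes.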
Conventional solutions to false sharing are software based. One such solution is to pad data to ensure that data objects accessed by two different processors do not fall on the same cache line. For example, if the first processor accessed only data object A and the second processor accessed only data object B, this conventional solution would pad the data so that data object A falls on one cache line and data object B falls on another cache line.
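A sketch of this padding approach follows. It uses Python's standard `ctypes` module to describe C-compatible structure layouts and assumes a 64-byte cache line. It also assumes each structure starts on a cache-line boundary, which `ctypes` does not itself guarantee. The structure names, the `_pad` fields, and the `same_line` helper are hypothetical.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes
PAD = CACHE_LINE - ctypes.sizeof(ctypes.c_int64)  # filler after each object


class Unpadded(ctypes.Structure):
    # A and B are adjacent, so both fall on one cache line.
    _fields_ = [("a", ctypes.c_int64), ("b", ctypes.c_int64)]


class Padded(ctypes.Structure):
    # Filler bytes push B onto the next line and keep it alone there.
    _fields_ = [
        ("a", ctypes.c_int64),
        ("_pad_a", ctypes.c_char * PAD),
        ("b", ctypes.c_int64),
        ("_pad_b", ctypes.c_char * PAD),
    ]


def same_line(struct_type: type, f1: str, f2: str) -> bool:
    """True if fields f1 and f2 start on the same cache line (by offset)."""
    off1 = getattr(struct_type, f1).offset
    off2 = getattr(struct_type, f2).offset
    return off1 // CACHE_LINE == off2 // CACHE_LINE
```

Under these assumptions, `Unpadded` occupies 16 bytes with A and B on one line, while `Padded` occupies 128 bytes so that B starts at offset 64 on a separate line.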
This type of conventional solution is problematic for several reasons. For example, padding the data increases the memory footprint, which degrades performance because worthless data (i.e., the padding data) must be moved over the system's data buses.
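The footprint cost of padding can be quantified with simple arithmetic. The sketch below assumes a 64-byte cache line and uses hypothetical helper names. It computes the memory required when every data object is padded out to its own cache line, and the fraction of that memory (and of the bytes moved per line transfer) that holds real data.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes


def padded_footprint(num_objects: int, object_size: int) -> int:
    """Bytes needed when every object is padded to whole cache lines of its own."""
    lines_per_object = -(-object_size // CACHE_LINE)  # ceiling division
    return num_objects * lines_per_object * CACHE_LINE


def useful_fraction(num_objects: int, object_size: int) -> float:
    """Share of the padded footprint that is real data rather than padding."""
    return num_objects * object_size / padded_footprint(num_objects, object_size)
```

For 1,000 objects of 8 bytes each, the padded layout needs 64,000 bytes instead of 8,000, so in this model only 12.5% of the bytes moved over the data buses are useful.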
Given the foregoing, what is needed is an improved manner for dealing with false sharing in multiprocessor systems.