Presently, computer systems and the operation thereof are utilized in all facets of modern life. Accordingly, there is a need for many different types of software to operate on computer systems. However, with the mix of operating systems and the software operating thereon, the probability of technical errors and computer system crashes and/or slowdowns is high.
Coherence is a constraint in any SMP (symmetric multi-processing) architecture. Memory in caching SMP systems must be kept coherent. For example, in caching SMP systems there are two or more computing devices (e.g., central processing units (CPU's)) accessing any number of cache lines. Thus, it is necessary that if one of the CPU's writes to a piece of data, another CPU that subsequently reads the same data will get the value that the first CPU wrote.
For example, with reference to the caching SMP system 100 of FIG. 1, a normal CPU (e.g., 105, 107, or 109 of FIG. 1) has a cache (e.g., cache 110, 113, or 115) that is accessible via a bus 150. In general, each cache (or line within a cache) has an associated notion of ownership. When a load is performed from a certain address, the loaded data goes into the cache and a level of ownership is associated with the data. For example, if a read is performed on the data in the cache line, the ownership may be shared. That is, if one CPU (e.g., 105) reads from a specific cache line (e.g., cache 113 line b), that data is marked as shared and no further action is performed on it. However, if CPU 105 performs a write, then the status of the data from cache 113 line b needs to be changed from shared to exclusive. That way, no other CPU (e.g., 107 or 109) can make simultaneous changes to the data, which could otherwise corrupt the line of data or cause a system error.
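The shared-to-exclusive transition described above can be sketched as a small state model. This is a minimal illustration only, not the protocol of any particular processor: the names line_state, cpu_read, and cpu_write are hypothetical, and the LINE_INVALID state is an assumption added so that the model can express what happens to the other CPU's copies when one CPU takes exclusive ownership. Real coherence protocols (e.g., MESI) track more states than shown here.

```c
#include <assert.h>

#define NUM_CPUS 3  /* three CPU's, as in the example of FIG. 1 */

/* Hypothetical per-CPU view of one cache line. LINE_INVALID is an
 * assumption of this sketch; the text names only shared and exclusive. */
enum line_state { LINE_INVALID, LINE_SHARED, LINE_EXCLUSIVE };

struct line {
    enum line_state state[NUM_CPUS];
};

/* A read gives the reading CPU a shared copy. If another CPU held the
 * line exclusively, that CPU is downgraded to shared. */
static void cpu_read(struct line *l, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    for (int i = 0; i < NUM_CPUS; i++) {
        if (i != cpu && l->state[i] == LINE_EXCLUSIVE)
            l->state[i] = LINE_SHARED;
    }
    if (l->state[cpu] == LINE_INVALID)
        l->state[cpu] = LINE_SHARED;
}

/* A write gives the writing CPU exclusive ownership and drops every
 * other CPU's copy, so no two CPU's can modify the line at once. */
static void cpu_write(struct line *l, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    for (int i = 0; i < NUM_CPUS; i++) {
        if (i != cpu)
            l->state[i] = LINE_INVALID;
    }
    l->state[cpu] = LINE_EXCLUSIVE;
}
```

In this model, every write by one CPU forces the other CPU's to give up their copies, and every later read by another CPU forces the owner back to shared. Each such transition corresponds to traffic on the bus in a real system.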
However, if a cache line contains a memory object such as an array of structures, another sharing method may be involved. For example, suppose there is an array of structures in cache 113 line b. One thread (e.g., on CPU 105) may access a first element in the array, a second thread (e.g., on CPU 107) may access a second element in the array, and a third thread (e.g., on CPU 109) may access a third element in the array. Thus, three elements within the array may be accessed at the same time. Furthermore, each of the elements may be manipulated, and the data of each element may be incremented, without error. In addition, since the elements of the array are parallel data structures with disjoint data elements, the three users do not interfere with one another, and in many cases the software will scale with the number of CPU's. That is, with three CPU's accessing three distinct elements, the operations may complete up to three times faster than if just one CPU were accessing one element.
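The disjoint access pattern described above can be sketched with POSIX threads. This is an illustrative sketch under stated assumptions: the names element, job, worker, and run_workers are hypothetical, the 16-byte element layout is chosen only for illustration, and the operating system decides which CPU runs each thread. Each thread increments only its own element, so no lock is needed and the final counts are always correct.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 3  /* one thread per CPU in the example */

/* Hypothetical 16-byte element of the array of structures. */
struct element {
    uint64_t count;
    uint64_t total;
};

/* The three elements are adjacent in memory, as in an ordinary array. */
static struct element elements[NUM_WORKERS];

struct job {
    struct element *elem;   /* the one element this thread may touch */
    uint64_t iterations;
};

/* Each worker increments only its own element. */
static void *worker(void *arg)
{
    struct job *job = arg;
    assert(job != NULL);
    for (uint64_t i = 0; i < job->iterations; i++)
        job->elem->count++;
    return NULL;
}

/* Starts one worker per element and waits for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_workers(uint64_t iterations)
{
    pthread_t threads[NUM_WORKERS];
    struct job jobs[NUM_WORKERS];
    int started = 0;
    int status = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        elements[i].count = 0;
        jobs[i].elem = &elements[i];
        jobs[i].iterations = iterations;
        if (pthread_create(&threads[i], NULL, worker, &jobs[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return status;
}
```

After run_workers(n) returns 0, each elements[i].count equals n. The result is correct regardless of how the elements are laid out in memory; the layout affects only how fast the threads make progress.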
However, a deleterious effect may occur with the above stated array structure when the element size of the array structure is less than the size of the cache line. For example, if the size of the element is 16 bytes and the size of the cache line (e.g., cache 113 line b) is 64 bytes, then up to four elements fit in one line and all three structures may live on the same cache line (e.g., cache 113 line b). Therefore, although the array structure has been designed so that the three users accessing the elements do not interfere (e.g., no fighting is done over the elements of the array), there is fighting over the single cache line (e.g., cache 113 line b). For example, each CPU may be fighting for ownership of the cache line. This fighting degrades performance considerably, and progress may be made as though only one CPU were operating at a time instead of all three. This is known as false sharing.
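The arithmetic in the example above can be checked with a short sketch. It assumes the array starts on a cache line boundary; the names element, line_of_element, and shares_line are hypothetical, and the 16-byte and 64-byte sizes are the ones used in the example rather than properties of any particular machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte element, matching the size used in the example. */
struct element {
    uint64_t count;
    uint64_t total;
};

/* Index of the cache line holding element i, assuming the array
 * begins at a cache line boundary. */
static size_t line_of_element(size_t i, size_t elem_size, size_t line_size)
{
    assert(line_size > 0);
    return (i * elem_size) / line_size;
}

/* Returns 1 if elements i and j fall on the same cache line, else 0. */
static int shares_line(size_t i, size_t j, size_t elem_size, size_t line_size)
{
    return line_of_element(i, elem_size, line_size) ==
           line_of_element(j, elem_size, line_size);
}
```

With 16-byte elements and a 64-byte line, elements 0 through 3 all map to line 0, so the three elements of the example share one line even though no two users touch the same element. If each element instead occupied a full 64 bytes, shares_line would return 0 for any two distinct elements.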
In general, false sharing is extremely difficult to detect because the contention arises within the hardware of the system and not within the software. For example, since each of the three CPU's is editing its own data element within the array structure, each has its own lock on its data and no contention between the locks on the data is occurring. However, because the elements are all mapped to the same cache line, the bus is being overused and inter-device fighting is occurring.
Thus, false sharing is normally found only after it has already become a problem, and the resolution of false sharing normally depends on keen intuition and good luck. For example, when a performance problem is encountered and is reduced to a lack of sufficient parallelism, it takes intuition to look for false sharing. Furthermore, it takes an amount of good luck to find the right structure.