JAVA, like other computer programming languages, allocates storage from a memory storage area, often referred to as the heap. JAVA also implements the concept of threads. Threads are independent units of work that share a common, process-local address space. Thus, while threads can be individually dispatched, they all share the same process-local memory storage.
When threads run in a language that allocates storage from a heap, especially in multiprocessor configurations, there is a potential for "false sharing" that can, at least intermittently, degrade performance. Modern processors typically use cache line sizes of 128 bytes, and sometimes larger powers of two. However, many modern object-oriented languages, especially JAVA, have a much smaller average object size (e.g., sizes under 32 bytes are common), so several objects can occupy a single cache line. As a result, objects created by two separate threads sometimes happen to share the same physical cache line, simply because their virtual addresses are "close enough" to each other.
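Whether two objects collide is plain address arithmetic: two byte ranges share a cache line when the line indices they cover (address divided by the line size) overlap. The following is a minimal sketch assuming the 128-byte line size mentioned above; the class and method names are hypothetical, and the addresses are plain numbers because Java does not expose object addresses.

```java
/**
 * Illustrative sketch (hypothetical names): decides whether two byte ranges
 * touch a common cache line. Addresses are plain numbers, not real Java object
 * addresses, which the language does not expose.
 */
final class CacheLineMath {
    // Assumed line size, taken from the text; must be a power of two.
    static final long LINE_SIZE = 128;

    private CacheLineMath() {}

    /** Index of the cache line containing the given non-negative byte address. */
    static long lineIndex(long address) {
        return address / LINE_SIZE;
    }

    /**
     * True if the ranges [a, a + sizeA) and [b, b + sizeB) cover at least one
     * common cache line. Both sizes must be positive.
     */
    static boolean shareLine(long a, int sizeA, long b, int sizeB) {
        long aFirst = lineIndex(a);
        long aLast = lineIndex(a + sizeA - 1);
        long bFirst = lineIndex(b);
        long bLast = lineIndex(b + sizeB - 1);
        // The two ranges of line indices overlap.
        return aFirst <= bLast && bFirst <= aLast;
    }
}
```

For example, two 24-byte objects placed back to back at 0x1000 and 0x1018 fall in the same 128-byte line, whereas an object at 0x1080 starts the next line. Up to five whole 24-byte objects fit in one 128-byte line, which is why small objects from different threads can easily end up as neighbors.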
If two objects share the same physical cache line and two threads running on different processors access them, performance suffers, because the processors are forced to move the contents of the cache line back and forth between them. Even a rare occurrence seriously impacts performance: certain forms of this event (especially when at least one of the objects is being modified) trigger operations that take 100 or so cycles to resolve, whereas a typical load takes about 1 cycle. Worse, if the event occurs once, it might happen repeatedly. This leads to behavior that is difficult to reproduce; for example, a program may sometimes run well and sometimes run poorly.
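The effect can be observed with a rough experiment: two threads each update only their own counter, and the only variable is how far apart the counters are. The sketch below is a hypothetical illustration, not a rigorous benchmark. It assumes the elements of an `AtomicLongArray` are stored contiguously as 8-byte longs (true of HotSpot in practice but not guaranteed by the Java specification) and a 128-byte line, so `PADDED_GAP = 16` slots is one full line apart; the scheduler is also not guaranteed to place the two threads on different processors.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Rough false-sharing demonstration (hypothetical names). Each thread writes
 * only its own slot, so the two counters never logically depend on each other.
 */
final class FalseSharingDemo {
    // Assumes 8-byte longs laid out contiguously and a 128-byte cache line:
    // slots 16 apart are 128 bytes apart and so cannot share a line.
    static final int PADDED_GAP = 16;

    private FalseSharingDemo() {}

    /**
     * Runs one writer on slot 0 and one on slot {@code gap}, each doing
     * {@code iterations} increments; returns the elapsed wall-clock nanoseconds.
     */
    static long run(int gap, long iterations, AtomicLongArray counters) {
        if (gap < 1 || gap >= counters.length()) {
            throw new IllegalArgumentException("gap must be in [1, length): " + gap);
        }
        Runnable first = () -> {
            for (long i = 0; i < iterations; i++) {
                counters.incrementAndGet(0);
            }
        };
        Runnable second = () -> {
            for (long i = 0; i < iterations; i++) {
                counters.incrementAndGet(gap);
            }
        };
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for writer threads", e);
        }
        return System.nanoTime() - start;
    }
}
```

Comparing `run(1, 50_000_000L, new AtomicLongArray(2))` with `run(FalseSharingDemo.PADDED_GAP, 50_000_000L, new AtomicLongArray(FalseSharingDemo.PADDED_GAP + 1))` on a multiprocessor machine will often show the adjacent case running noticeably slower, although the ratio depends on the hardware, JIT warm-up, and where the threads happen to be scheduled, which is exactly the run-to-run variability described above.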
This situation is called "false sharing." When it occurs, the hardware "sees" a shared cache line and invokes worst-case hardware mechanisms to keep both processors in sync, even though the two objects in that cache line have no dependency on each other. Had the objects fallen in separate cache lines, none of this traffic would occur and the program would simply run faster. Thus, "false sharing" is an artifact of the cache line size being larger than the individual objects.
To avoid this situation, designers of allocation schemes resort to certain practices. First, as a somewhat drastic measure, the minimum object allocation can be made one cache line or half a cache line in size. While this scheme may waste considerable main memory, it results in little or no false sharing. Second, CPU-local heaps are used, on the assumption that modern processor schedulers will try to keep the same thread running on the same processor whenever possible, which also tends to minimize other cache performance problems. Thus, assuming threads are not often moved between processors, sub-allocating objects based on the thread's current processor helps minimize false sharing. However, there is no guarantee that a thread will stay on a given processor, and even infrequent movement may reintroduce false sharing. The per-CPU strategy also has the additional problem that sub-dividing the heap may introduce other inefficiencies known in the art.
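The first practice, making the minimum allocation a full or half cache line, amounts to rounding every request size up to a power-of-two granule. The sketch below uses hypothetical names and an assumed 128-byte line; it also reports how much of each allocation is padding, to make the memory cost concrete.

```java
/**
 * Sketch of the "minimum allocation" mitigation (hypothetical names): every
 * request is rounded up to a multiple of a granule such as a full (128-byte)
 * or half (64-byte) cache line.
 */
final class GranuleRounding {
    static final int LINE_SIZE = 128; // assumed cache line size, a power of two

    private GranuleRounding() {}

    /**
     * Rounds size up to the next multiple of granule. The granule must be a
     * power of two, and size + granule - 1 is assumed not to overflow.
     */
    static int roundUp(int size, int granule) {
        if (granule <= 0 || (granule & (granule - 1)) != 0) {
            throw new IllegalArgumentException("granule must be a power of two: " + granule);
        }
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        // For a power-of-two granule, -granule is a mask clearing the low bits.
        return (size + granule - 1) & -granule;
    }

    /** Fraction of the allocated bytes that is padding rather than object data. */
    static double wasteFraction(int size, int granule) {
        int allocated = roundUp(size, granule);
        return (allocated - size) / (double) allocated;
    }
}
```

For a typical 24-byte object, `roundUp(24, GranuleRounding.LINE_SIZE)` yields 128 bytes, so 104 of every 128 bytes (81.25%) are padding; with a half-line granule, `roundUp(24, 64)` yields 64 bytes and 62.5% padding. The half-line variant wastes less but still lets two objects share one line, which is why the scheme yields "little" rather than strictly no false sharing.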
From the above, it can be seen that for JAVA programs running on multiple processors, there is a need to allocate storage from a thread-local heap in order to minimize or eliminate false sharing.
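To make the idea of a thread-local heap concrete, the following is a minimal sketch over a simulated address space; it is an illustration under stated assumptions, not the allocation scheme claimed here. All names are hypothetical, and it assumes a 128-byte line and an 8 KiB chunk. Each thread bump-allocates from its own chunk, and chunks are line-aligned whole multiples of the line size, so objects belonging to different threads never share a cache line regardless of which processor a thread runs on.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical thread-local heap over a simulated address space. Addresses are
 * plain numbers; freeing and garbage collection are not modeled.
 */
final class ThreadLocalHeap {
    static final long LINE_SIZE = 128;              // assumed cache line size
    static final long CHUNK_SIZE = 64 * LINE_SIZE;  // assumed chunk size (8 KiB)

    // Shared bump pointer that hands out whole chunks; starts line-aligned at 0.
    private final AtomicLong nextChunk = new AtomicLong(0);

    // Per-thread cursor: [cursor, limit) is the unused part of this thread's chunk.
    private static final class Chunk {
        long cursor;
        long limit;
    }

    private final ThreadLocal<Chunk> local = ThreadLocal.withInitial(Chunk::new);

    /** Returns the simulated address of a new 8-byte-aligned object of the given size. */
    long allocate(long size) {
        if (size <= 0 || size > CHUNK_SIZE) {
            throw new IllegalArgumentException("unsupported size: " + size);
        }
        long aligned = (size + 7) & ~7L; // keep objects 8-byte aligned
        Chunk c = local.get();
        if (c.limit - c.cursor < aligned) {
            // Only a refill touches shared state; the rest of the chunk is abandoned.
            c.cursor = nextChunk.getAndAdd(CHUNK_SIZE);
            c.limit = c.cursor + CHUNK_SIZE;
        }
        long address = c.cursor;
        c.cursor += aligned;
        return address;
    }
}
```

The design trades some memory (the unused tail of each abandoned chunk) for an allocation fast path that touches only thread-private state. Because ownership of a cache line follows the thread rather than the processor, a thread that migrates between processors takes its lines with it instead of reintroducing false sharing.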