Multithreading on multiple cores is often used in computing. Multiple cores are used in a variety of devices, including smart phones, tablets, laptops, workstations, supercomputers, and data centers. Multithreading is a programming and execution model which utilizes the underlying hardware resources by running different threads on different hardware cores concurrently. These threads may share data, files, and input/output (I/O) in order to facilitate cooperatively completing a specified task.
One challenge in multithreading is false sharing, which is related to cache usage. Cache, which is accessed much faster than main memory, is used by central processing units (CPUs) to accelerate program executions. Before accesses, the CPU checks whether the data to be accessed is in the cache. When the data is already stored in the cache, the CPU directly accesses the data from the cache, reducing access latency by avoiding accessing the slower main memory. When the data is not already stored in the cache, the CPU automatically fetches the data to the cache from the main memory in blocks of a fixed size, referred to as cache lines.
In an example multicore system, the cores have their own private caches. Thus, data accessed by threads running on different cores may be duplicated in caches of those involved cores. A cache coherence protocol is invoked to facilitate correct accesses from different threads concurrently. When the data of a cache line in one core has been changed, the cache coherence protocol invalidates other copies of the same cache line in other cores so changes made by one core are seen by the other cores.
This cache line level coherency creates a false sharing problem. When threads running on different cores access different locations in the same cache line, every write by one core on the cache line invalidates the cache line copies on the other core. As a result, frequent cache line invalidation may degrade performance, because other cores with their cache entries invalidated have to re-fetch the data from the main memory, using CPU time and memory bandwidth. Also, false sharing may further degrade performance when a system has more cores or a larger cache line size.