Processor architecture has shifted away from steadily increasing clock frequencies and toward multi-core systems. In these systems, performance gains from one generation to the next come primarily from increasing the degree of parallelism that software exploits, rather than from a faster clock in the underlying processor. Many multi-threaded applications, however, scale poorly on multi-core systems because of issues such as synchronization and cache coherence. In highly threaded applications whose threads make extensive use of locks to share data (Java applications, for example), synchronization can become very expensive when the threads are distributed across different central processing units (CPUs) of the system. This can significantly degrade the performance of such applications, especially on systems with a large number of cores.
The operating system (OS) schedules threads onto processors based on the current workload and on various other factors, such as the affinity of a thread for the CPU on which it previously ran. As a result, the scheduler may assign threads that contend for the same locks (and therefore share the same data) to different processors. Multiple copies of the shared data then reside in the caches of different CPUs, and these copies must be kept synchronized to maintain cache coherence.
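The placement problem described above can be illustrated with a toy model. The following is a minimal Python sketch, not an actual OS scheduler. The functions `place_by_affinity` and `cross_cpu_locks`, the thread names, and the lock names are all hypothetical. The model places each thread by affinity alone, falling back to the least-loaded CPU for threads with no history, and ignores which locks the threads share. It then reports every lock whose contending threads ended up on more than one CPU, which is the case where the protected data must be kept coherent across caches. Real schedulers weigh many more factors than this.

```python
from collections import defaultdict


def place_by_affinity(threads, num_cpus):
    """Hypothetical toy scheduler: keep each thread on the CPU it last ran on,
    and put threads with no history on the least-loaded CPU. Lock sharing
    between threads is deliberately ignored."""
    load = [0] * num_cpus
    placement = {}
    # First pass: honour affinity for threads that have run before.
    for name, info in threads.items():
        cpu = info.get("last_cpu")
        if cpu is not None:
            placement[name] = cpu
            load[cpu] += 1
    # Second pass: spread the remaining threads over the least-loaded CPUs
    # (ties go to the lowest-numbered CPU).
    for name in threads:
        if name not in placement:
            cpu = min(range(num_cpus), key=lambda c: load[c])
            placement[name] = cpu
            load[cpu] += 1
    return placement


def cross_cpu_locks(threads, placement):
    """Return the locks whose contending threads sit on more than one CPU.
    The data guarded by these locks is cached on several CPUs and must be
    kept coherent."""
    cpus_per_lock = defaultdict(set)
    for name, info in threads.items():
        for lock in info["locks"]:
            cpus_per_lock[lock].add(placement[name])
    return sorted(lock for lock, cpus in cpus_per_lock.items() if len(cpus) > 1)


if __name__ == "__main__":
    # Hypothetical workload: T1/T2 share lock A, T3/T4 share lock B.
    threads = {
        "T1": {"last_cpu": 0, "locks": {"A"}},
        "T2": {"last_cpu": 1, "locks": {"A"}},
        "T3": {"last_cpu": None, "locks": {"B"}},
        "T4": {"last_cpu": None, "locks": {"B"}},
    }
    placement = place_by_affinity(threads, num_cpus=2)
    print(placement)  # {'T1': 0, 'T2': 1, 'T3': 0, 'T4': 1}
    print(cross_cpu_locks(threads, placement))  # ['A', 'B']
```

In this example, affinity and load balancing each look reasonable in isolation. Even so, both pairs of lock-sharing threads end up split across the two CPUs.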