Communication between cores in a multi-core processor is an important factor in many computer applications, such as packet processing, high-performance computing, and machine learning. On a general-purpose platform, a shared memory space managed by software is often employed to realize inter-core communication. As the number of cores increases, communication between the cores may become a limiting factor for performance scaling in certain scenarios.
This problem is magnified in architectures having a large number of cores, as additional overhead is required to manage communication among all of the cores, leading to high latency and low throughput. The overhead in such an environment includes software overhead related to maintaining the data structures in memory and performing flow control, as well as hardware overhead to maintain cache coherence.
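By way of illustration only, the software-managed shared-memory approach described above may resemble the following minimal sketch of a single-producer, single-consumer ring buffer. The class name `SpscRing`, its methods, and the 64-byte alignment (an assumed cache line size) are hypothetical and are not taken from any particular platform. The sketch shows where the overheads arise: each transfer updates the head and tail indices (data structure maintenance), checks for full and empty conditions (flow control), and reads an index that the other core writes (cache coherence traffic).

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical software-managed queue placed in memory shared by two cores.
// One core only pushes (producer); the other only pops (consumer).
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");

public:
    // Producer side. Returns false when the ring is full (flow control).
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        // Reading tail_ pulls in a cache line last written by the consumer core.
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;
        }
        slots_[head] = value;
        // Release ordering publishes the slot contents before the new head.
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Reading head_ pulls in a cache line last written by the producer core.
        if (tail == head_.load(std::memory_order_acquire)) {
            return false;
        }
        out = slots_[tail];
        // Release ordering lets the producer safely reuse the slot afterwards.
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    // Assumed 64-byte cache lines: keep the two indices on separate lines
    // so that producer and consumer updates do not falsely share a line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Even in this two-core case, every push and pop involves at least one access to a cache line owned by the other core. With many cores, a separate queue per communicating pair or a shared multi-producer queue would multiply this bookkeeping and coherence traffic, which is consistent with the scaling concern noted above.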