Network devices, such as firewalls, switches, routers, storage/compute servers, or other network-attached devices, often utilize multi-core processor systems or systems having multiple processing units to achieve increased performance. However, processing streams of data, such as network packets, with systems having multiple processing units can present many programming challenges. For example, it is often difficult to move processing of a packet or set of packets from one processing unit to another, such as for load balancing across the processing units. Transitioning program execution from one processing unit to another often requires brute-force movement or mapping of state, cached data, and other pieces of memory associated with the program execution. Maintaining consistency of cached data and other memory across processing units while achieving high throughput and utilization is often technically challenging. For example, when using coherent memory, significant processing overhead and delays may result from operations performed by a memory coherence protocol. When using non-coherent memory, the overhead of the coherence protocol is avoided, but some processing units might not have access to data cached by another processing unit.
Memory can be shared in multiprocessor or multi-core systems having two or more simultaneously operating processors, each having one or more local memory caches. If one processor or core changes data at a particular memory location, procedures generally exist to notify all other processors or cores of the change, so that they invalidate their respective local memory caches or refresh the caches with the updated information. Such a procedure is commonly known as a memory coherence protocol, and memory operating in accordance with the protocol is known as coherent memory. Typically, supporting coherent memory requires tracking cache line state and handling the associated transactions for all memory blocks that are cached within the processors or processing cores and other elements of the system.
In contrast, non-coherent memory does not provide for tracking and updating data to maintain cache coherency. Without the processing overhead and delays associated with conventional coherent memory systems, memory access can be fast and efficient. A large body of applications does not benefit from a coherent memory system, particularly applications that process data linearly (e.g., each item is processed once, so memory accesses have poor temporal locality). These “stream” applications, such as networking and storage infrastructure workloads, are increasingly important in large-scale datacenters. For such applications, using a coherent memory system tends to result in significant overhead with little benefit in return.
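By way of illustration only, the contrast between coherent and non-coherent memory described above can be sketched as a toy simulation. The sketch is not drawn from any particular hardware design. The names `Core`, `CoherentMemory`, `NonCoherentMemory`, and `demo` are hypothetical, and the write-through, write-invalidate policy is an assumption chosen for simplicity; real coherence protocols track per-cache-line state in hardware and are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """A processing unit with a private cache (address -> value). Hypothetical."""
    name: str
    cache: dict = field(default_factory=dict)


class CoherentMemory:
    """Toy write-through, write-invalidate shared memory (assumed policy)."""

    def __init__(self, cores):
        self.cores = cores
        self.backing = {}        # shared backing store
        self.invalidations = 0   # counts coherence traffic

    def read(self, core, addr):
        # On a miss, fill the private cache from the backing store.
        if addr not in core.cache:
            core.cache[addr] = self.backing.get(addr, 0)
        return core.cache[addr]

    def write(self, core, addr, value):
        # Coherence step: invalidate every other core's cached copy.
        for other in self.cores:
            if other is not core and addr in other.cache:
                del other.cache[addr]
                self.invalidations += 1
        core.cache[addr] = value
        self.backing[addr] = value


class NonCoherentMemory(CoherentMemory):
    """Same memory without the invalidation step."""

    def write(self, core, addr, value):
        # No notification: other cores may keep a stale cached copy.
        core.cache[addr] = value
        self.backing[addr] = value


def demo(memory_cls):
    """Core b caches address 0, core a writes 42, core b reads again."""
    a, b = Core("a"), Core("b")
    mem = memory_cls([a, b])
    mem.read(b, 0)       # b caches the initial value 0
    mem.write(a, 0, 42)  # a updates the same address
    return mem.read(b, 0), mem.invalidations


if __name__ == "__main__":
    print("coherent:    ", demo(CoherentMemory))
    print("non-coherent:", demo(NonCoherentMemory))
```

In this sketch, the coherent variant pays for one invalidation so that the second core observes the updated value, whereas the non-coherent variant performs no coherence work and the second core continues to read its stale cached copy. For a stream workload in which each address is touched once by a single processing unit, the invalidation loop would find nothing to invalidate, which is one way to picture overhead incurred with little benefit in return.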