Modern computing devices have become ubiquitous tools for personal, business, and social uses. As such, many modern computing devices are capable of connecting to various data networks, including the Internet, to transmit and receive data communications over those networks at varying rates of speed. To facilitate communications between computing devices, the data networks typically include one or more network computing devices (e.g., compute servers, storage servers, etc.) to route communications (e.g., via switches, routers, etc.) that enter/exit a network (e.g., north-south network traffic) and that travel between network computing devices in the network (e.g., east-west network traffic). In present packet-switched network architectures, data is transmitted in the form of network packets between networked computing devices. At a high level, data is packetized into a network packet at one computing device, and the resulting packet is transmitted, via a transmission device (e.g., a network interface controller (NIC) of the computing device), to another computing device over a network.
Upon receipt of a network packet, the computing device typically performs one or more processing operations (e.g., security, network address translation (NAT), load-balancing, deep packet inspection (DPI), transmission control protocol (TCP) optimization, caching, Internet Protocol (IP) management, etc.) to determine what the computing device is to do with the network packet (e.g., drop the network packet, process/store at least a portion of the network packet, forward the network packet, etc.). Such packet processing is often performed in a packet processing pipeline (e.g., a service function chain) in which at least a portion of the data of the network packet is passed from one processor core to another as it is processed. However, during such packet processing, stalls can occur due to cross-core snoops, and the caches can become polluted with stale data.
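By way of illustration only, the packet processing pipeline described above can be sketched as an ordered chain of stages, each of which inspects or modifies the packet and returns a verdict. The sketch below is a simplified, single-threaded model. The names `Packet`, `Verdict`, `firewall`, `nat`, and `run_chain`, as well as the addresses and table contents, are hypothetical and chosen for the example. In a deployed pipeline, each stage might execute on a different processor core, and it is the hand-off of packet data between those cores that can give rise to the cross-core snoops noted above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    CONTINUE = "continue"  # hand the packet to the next stage
    DROP = "drop"          # discard the packet and stop processing
    FORWARD = "forward"    # send the packet onward


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    payload: bytes


# A stage is any callable that takes a packet and returns a verdict.
Stage = Callable[[Packet], Verdict]

# Hypothetical policy data, for illustration only.
BLOCKED_SOURCES = {"203.0.113.9"}
NAT_TABLE = {"198.51.100.1": "10.0.0.5"}


def firewall(pkt: Packet) -> Verdict:
    """Security stage: drop packets from blocked source addresses."""
    return Verdict.DROP if pkt.src_ip in BLOCKED_SOURCES else Verdict.CONTINUE


def nat(pkt: Packet) -> Verdict:
    """NAT stage: rewrite the destination address if a mapping exists."""
    pkt.dst_ip = NAT_TABLE.get(pkt.dst_ip, pkt.dst_ip)
    return Verdict.CONTINUE


def run_chain(pkt: Packet, stages: Iterable[Stage]) -> Verdict:
    """Pass the packet through each stage in order.

    Processing stops at the first stage that returns a verdict other
    than CONTINUE; a packet that survives every stage is forwarded.
    """
    for stage in stages:
        verdict = stage(pkt)
        if verdict is not Verdict.CONTINUE:
            return verdict
    return Verdict.FORWARD
```

For example, calling `run_chain(Packet("192.0.2.1", "198.51.100.1", b"hi"), [firewall, nat])` passes the packet through both stages and rewrites its destination address, whereas a packet whose source address is in `BLOCKED_SOURCES` is dropped at the first stage and never reaches the NAT stage.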