Packet processing can be modeled as a sequence of classification operations and actions. A classification operation matches a packet against a flow table to identify the highest-priority matching entry, which specifies the actions to execute for the packet: how to modify packet headers, where to send the packet, which classification stage to proceed to next, or whether to drop the packet.
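The model above can be illustrated with a minimal sketch. The names used here (FlowEntry, classify, process) and the action tuples ("set", "output", "goto", "drop") are hypothetical, chosen only for illustration; dropping a packet on a table miss and requiring "goto" to move to a later stage are likewise assumptions, not something the text prescribes.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    priority: int   # higher value wins when several entries match
    match: dict     # header field -> required value; absent fields are wildcards
    actions: list   # e.g. [("set", "vlan", 5), ("goto", 1)]


def classify(table, headers):
    """Return the highest-priority entry whose match fields all equal the headers."""
    best = None
    for entry in table:
        matches = all(headers.get(f) == v for f, v in entry.match.items())
        if matches and (best is None or entry.priority > best.priority):
            best = entry
    return best


def process(tables, headers):
    """Run a packet's headers through the stages, starting at stage 0.

    Returns (final_headers, out_port); out_port is None if the packet is dropped.
    """
    headers = dict(headers)  # work on a copy so the caller's dict is untouched
    stage, out_port = 0, None
    while stage is not None:
        entry = classify(tables[stage], headers)
        if entry is None:
            return headers, None  # assumption: a table miss drops the packet
        next_stage = None
        for action in entry.actions:
            kind = action[0]
            if kind == "set":
                headers[action[1]] = action[2]
            elif kind == "output":
                out_port = action[1]
            elif kind == "goto":
                if action[1] <= stage:
                    raise ValueError("goto must move to a later stage")
                next_stage = action[1]
            elif kind == "drop":
                return headers, None
        stage = next_stage
    return headers, out_port
```

A linear scan keeps the sketch short; it also makes visible why classification cost grows with the number of entries and stages a packet traverses.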
With the above model in mind, standard L2 and L3 network forwarding operations can be expressed as a sequence of classification operations and their corresponding actions: each classification is either an exact match on the L2 destination MAC address or a longest-prefix match on the destination IP address. More complicated matching may be incorporated, including chaining classifications to simulate arbitrary L2/L3 topologies, policy routing that matches over arbitrary fields, and ACLs that match on other packet header fields.
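The two basic lookups named above can be sketched as follows. This is an illustration under stated assumptions, not a production data structure: the function names l2_lookup and lpm_lookup, the dictionary-based MAC table, and the list-of-tuples route table are all hypothetical, and the longest-prefix match is a simple linear scan rather than a trie.

```python
import ipaddress


def l2_lookup(mac_table, dst_mac):
    """Exact match on the destination MAC; returns an output port or None."""
    return mac_table.get(dst_mac.lower())


def lpm_lookup(routes, dst_ip):
    """Longest-prefix match on the destination IP address.

    routes is a list of (prefix, next_hop) pairs, e.g. ("10.1.0.0/16", "B").
    Returns the next hop of the most specific matching prefix, or None.
    """
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]
```

Chaining the two (for example, an L2 lookup whose action hands the packet to an L3 lookup) corresponds to one logical switch feeding one logical router in the simulated topology.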
The classification and actions operate statelessly over the standard packet headers only and do not inspect the payload. Thus, all packets with the same header values receive the same treatment. To accommodate middlebox services that modify the payload or perform other stateful operations, actions may be included that send the packet to such services.
Implementing such a packet processing pipeline in software (e.g., in a software virtual switch) utilizes CPU resources for four types of tasks:
1. moving packets from the NIC(s) through the layers of operating system software into the classification and back to the NIC(s) for sending out packets
2. classification of packets (i.e., identifying the actions to execute)
3. executing the packet header field transformations based on the identified actions
4. executing services (i.e., applying payload transformations)
Moving packets between the NIC(s) and the software is primarily addressed using principles demonstrated by DPDK (a set of user-space libraries for fast packet processing), netmap, and PF_RING. That is, the software layers between the NIC and the classification pipeline are mostly removed. Similar principles apply to the execution of the packet header field transformations: memory accesses should be minimized, packet copies removed, memory allocated proactively, and locality of execution guaranteed through a run-to-completion model. In this model, a single packet is processed on a single CPU core, without threading or process context switches, merely as a chain of function invocations, before the packet is sent further along.
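The run-to-completion idea can be sketched in a few lines. This is a conceptual illustration only, not how DPDK, netmap, or PF_RING are programmed; all names (make_pool, run_to_completion, the stage functions) are hypothetical, and the byte offsets assume an untagged Ethernet frame carrying IPv4. Buffers are allocated once up front, and each stage modifies the packet in place through a memoryview, so no copies are made.

```python
ETH_DST = slice(0, 6)   # destination MAC occupies bytes 0..5 of the frame
IPV4_TTL = 14 + 8       # assumption: 14-byte Ethernet header, TTL at IPv4 offset 8


def make_pool(count, size):
    """Allocate all packet buffers proactively, before any packet arrives."""
    return [bytearray(size) for _ in range(count)]


def drop_if_ttl_expired(pkt):
    """Stage: return False (drop) if the packet cannot be forwarded further."""
    return pkt[IPV4_TTL] > 1


def make_set_dst_mac(mac):
    """Build a stage that rewrites the destination MAC in place."""
    def stage(pkt):
        pkt[ETH_DST] = mac  # writes through the memoryview, no copy
        return True
    return stage


def run_to_completion(pkt, stages):
    """Process one packet as a plain chain of function calls on the calling core.

    Returns True if every stage accepted the packet, False if one dropped it.
    """
    for stage in stages:
        if not stage(pkt):
            return False
    return True


# Usage: take a preallocated buffer, fill in a TTL, and run the chain on it.
pool = make_pool(count=4, size=64)
buf = pool.pop()
buf[IPV4_TTL] = 64
stages = [drop_if_ttl_expired, make_set_dst_mac(b"\x02\x00\x00\x00\x00\x01")]
forwarded = run_to_completion(memoryview(buf), stages)
pool.append(buf)  # return the buffer to the pool instead of freeing it
```

Because each stage receives a view of the same underlying buffer, the header rewrite is visible in buf without any intermediate packet object being created.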
However, classification and the execution of services need to be implemented in a more efficient manner. Classification on a general-purpose CPU using standard DRAM (e.g., on a standard x86 machine) is computationally expensive, which is why special-purpose network appliances use specialized memory chips (e.g., TCAMs and CAMs). For arbitrarily large logical topologies, the number of classification operations required corresponds to the complexity of the logical topology and its configuration (e.g., ACLs). More complex configurations require more classification operations, which in turn consume more computing resources.