A unique challenge of packet processing is to maintain stability while maximizing throughput and minimizing latency under worst-case traffic scenarios. At the same time, the latency associated with a single external memory access within a network processor is usually larger than the worst-case service time. Consequently, modern network processors are usually implemented with a highly parallel architecture having multiple processors, each of which can support a plurality of processing threads (applications).
Additionally, network applications may themselves be highly parallel and are usually multi-threaded and/or multi-processed for purposes of hiding long memory access latencies. Whenever a new packet arrives at a network processor, a series of tasks (e.g., receipt of the packet, routing table look-up, enqueuing, etc.) is performed on that packet by a new thread within the network processor. However, updates to the global data or to the packet during packet processing have to be performed in a pre-defined thread order and in an atomic fashion, in order to ensure that the integrity of the packet's processing is maintained among multiple competing threads that may update the data or the packet.
To ensure packet-processing integrity, an ordered section or update process for the global data or the packet is typically implemented within network processors. In this process, packets are distributed to a chain of threads in the order in which the packets are received. Each thread has to wait for a signal from the previous thread before entering its ordered section update process. After the signal is received, the waiting thread can read the data or the packet, modify it, write it back to memory, and then send a signal of completion to the next waiting thread.
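The ordered section described above can be illustrated with a minimal software sketch. It is an assumption-laden analogy only, not the network processor's actual mechanism: ordinary Python threads stand in for hardware threads, `threading.Event` objects stand in for the inter-thread signals, and the names `run_ordered_chain`, `shared`, and `signals` are hypothetical.

```python
import threading

def run_ordered_chain(num_packets):
    """Update shared state once per packet, strictly in packet-arrival order."""
    # Hypothetical global data; "order" records who entered the ordered section.
    shared = {"counter": 0, "order": []}
    # signals[i] is set when thread i is allowed to enter its ordered section.
    signals = [threading.Event() for _ in range(num_packets + 1)]

    def worker(index):
        # Work outside the ordered section (e.g., a table look-up) could run
        # here in parallel with other threads.
        signals[index].wait()           # wait for the previous thread's signal
        value = shared["counter"]       # read
        value += 1                      # modify
        shared["counter"] = value       # write back
        shared["order"].append(index)
        signals[index + 1].set()        # signal completion to the next thread

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_packets)]
    # Start in reverse to show that start order does not change update order.
    for t in reversed(threads):
        t.start()
    signals[0].set()                    # the first thread has no predecessor
    for t in threads:
        t.join()
    return shared

result = run_ordered_chain(8)
```

In this sketch, the read in each thread cannot begin until the predecessor has signaled, so the read, modify, and write steps of all threads execute one after another regardless of how many threads are available.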
This process creates latencies for non-volatile operations, such as the latency associated with read operations, that cannot be hidden even in a multithreaded and/or multiprocessing environment, because each thread must wait for the previous thread's signal before it can begin its own read.