1. Field of the Invention
This invention generally relates to computer processing and, more particularly, to a system and method for packet processing that prevents the occurrence of cache thrashing.
2. Description of the Related Art
FIG. 1 is a schematic diagram depicting the flow of processing in a conventional packet system (prior art). A major problem in packet processing involves the latency in reading packets stored in an external system memory, such as a double data rate (DDR) memory, for further processing by a central processing unit (CPU) or processor. For incoming packets, the CPU generally allocates a buffer descriptor ring. When packet P1 arrives on the input/output (IO) interface, the packet is copied to system memory, and packet-related information, such as the packet address and packet length, is copied into packet descriptor 1. Then, the IO triggers an interrupt to the CPU. The CPU reads buffer descriptor 1 from the buffer descriptor ring and obtains information about the packet. The CPU issues a read request to the address where the packet resides in system memory. The CPU is blocked until it receives the requested data from system memory. Generally, access to system memory is expensive, with a latency on the order of hundreds of nanoseconds. In contrast, the L2 cache latency is about 30 ns and the L1 cache latency is on the order of 1-5 ns. This memory read process is performed for every packet, and it introduces significant delay in packet processing.
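The conventional flow described above can be illustrated with a minimal sketch. It is a software model only, not the hardware of FIG. 1. The names BufferDescriptor, DescriptorRing, io_deliver, and cpu_service are hypothetical, and the 200 ns read latency is an assumed value chosen from within the "hundreds of nanoseconds" range stated above.

```python
from dataclasses import dataclass

DDR_READ_NS = 200  # assumed system memory read latency (illustrative only)


@dataclass
class BufferDescriptor:
    address: int  # where the IO placed the packet in system memory
    length: int   # packet length in bytes


class DescriptorRing:
    """Fixed-size ring: the IO posts descriptors, the CPU consumes them in order."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0   # next slot the IO writes
        self.tail = 0   # next slot the CPU reads
        self.count = 0  # descriptors posted but not yet consumed

    def post(self, desc):
        if self.count == len(self.slots):
            return False  # ring is full; the descriptor is not accepted
        self.slots[self.head] = desc
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def consume(self):
        if self.count == 0:
            return None
        desc = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return desc


def io_deliver(ring, memory, address, payload):
    """IO side: copy the packet to system memory, then fill in a descriptor."""
    memory[address] = payload
    return ring.post(BufferDescriptor(address, len(payload)))


def cpu_service(ring, memory):
    """CPU side: drain the ring; every packet costs one blocking memory read."""
    payloads, stall_ns = [], 0
    while (desc := ring.consume()) is not None:
        payloads.append(memory[desc.address][:desc.length])
        stall_ns += DDR_READ_NS
    return payloads, stall_ns
```

In this model the stall time grows linearly with the number of packets, because no packet read is ever served from cache.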
FIG. 2 is a schematic diagram depicting the use of packet stashing in a packet processing system (prior art). To address the above-mentioned packet data read latency issue, a stashing technique may be used. As in the system of FIG. 1, when the packet P1 arrives on the IO interface, this packet is copied to system memory and packet-related information is copied into packet descriptor 1. The IO triggers an interrupt to the CPU, which reads buffer descriptor 1 and discovers information concerning the packet. Then, the CPU issues a read request to the address where the packet is residing.
The Ethernet interface also sends a sideband signal to the IO bus, indicating that the packet needs to be stashed into the cache. After getting this signal from the IO, the CPU system bus copies the packet into the external memory. A cache entry is also created, the packet is copied into the CPU cache, and the cache lines are marked as valid. Then, the IO sends the interrupt, including packet descriptor 1, to the CPU. The CPU reads this descriptor to obtain the address of the packet. The CPU issues a read request to the packet address in system memory. The cache controller receives the read request for this address and finds that the cache lines for this address exist and are valid. So, there is no need to access the packet in system memory. This process avoids the system memory read latency, which is generally 10 times greater than the cache latency.
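The read path with stashing can be sketched as follows. This is an illustrative model under stated assumptions, not the circuit of FIG. 2: the function names io_receive and cpu_read are hypothetical, the 30 ns cache latency is the L2 figure given earlier, and the 300 ns system memory latency is an assumed value chosen to match the roughly 10-to-1 ratio mentioned above.

```python
SYSTEM_MEMORY_NS = 300  # assumed system memory read latency (illustrative)
CACHE_NS = 30           # L2 cache latency figure from the description

system_memory = {}  # address -> packet data
cache = {}          # address -> packet data, for valid cache lines only


def io_receive(address, data, stash):
    """IO side: always write to system memory; optionally stash into cache."""
    system_memory[address] = data
    if stash:
        cache[address] = data  # create the cache entry and mark it valid


def cpu_read(address):
    """CPU side: return (data, latency_ns) for a read of the packet address."""
    if address in cache:
        return cache[address], CACHE_NS  # served by the cache controller
    data = system_memory[address]        # miss: blocking system memory read
    cache[address] = data                # the miss fills a cache line
    return data, SYSTEM_MEMORY_NS
```

A stashed packet is read at the cache latency, while a packet that was not stashed incurs the full system memory latency on its first read.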
However, if the packet arrival rate exceeds the packet processing rate, the stashing process results in cache thrashing. For example, while the CPU is processing packet P1, it is possible that packet P2 is already being copied to system memory by the IO and that packet descriptor 2 has also been prepared. Advantageously, packet P2 already resides in cache before the CPU begins processing it. However, when several packets arrive as a burst, faster than they can be processed by the CPU, the CPU cache may become filled with received packets waiting to be processed. The IO continues to issue cache stash requests for arriving packets without regard to whether the CPU cache is full. Caches generally use a least recently used (LRU) algorithm to evict lines and add new lines. As a result, the newer packets being added to cache displace the older, yet-to-be-processed packets. As the CPU always reads packets in order, the CPU issues read requests for the older packets that are no longer in cache. So, the older packets must be read from system memory, incurring the system memory read latency penalty, while at the same time evicting other packets from the cache. In this scenario, cache thrashing causes a greater read latency problem than if no stashing were used.
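The thrashing scenario can be reproduced with a small simulation. The sketch below is a simplified model built on assumptions that are not part of the description: each packet occupies exactly one cache slot, the CPU processes one packet per time step, and the hypothetical function simulate_stashing counts only cache hits and misses rather than latency.

```python
from collections import OrderedDict


def simulate_stashing(num_packets, cache_slots, arrivals_per_step):
    """Return (hits, misses) for a CPU that reads packets in arrival order."""
    cache = OrderedDict()  # keys ordered from least to most recently used

    def touch(pkt):
        # Insert or refresh a packet as most recently used, evicting the
        # least recently used entry when the cache is full.
        if pkt in cache:
            cache.move_to_end(pkt)
            return
        if len(cache) >= cache_slots:
            cache.popitem(last=False)
        cache[pkt] = True

    hits = misses = 0
    next_arrival = 0
    next_to_process = 0
    while next_to_process < num_packets:
        # The IO stashes every arriving packet, ignoring cache occupancy.
        for _ in range(arrivals_per_step):
            if next_arrival < num_packets:
                touch(next_arrival)
                next_arrival += 1
        # The CPU processes one packet per step, strictly in order.
        if next_to_process < next_arrival:
            if next_to_process in cache:
                hits += 1
            else:
                misses += 1  # must be fetched from system memory
            touch(next_to_process)  # a miss fill may evict a waiting packet
            next_to_process += 1
    return hits, misses


# Arrival rate matches processing rate: every read hits the stashed copy.
print(simulate_stashing(num_packets=16, cache_slots=4, arrivals_per_step=1))
# Burst of 4 packets per step into a 4-slot cache: stashed packets are
# evicted before the CPU reaches them.
print(simulate_stashing(num_packets=16, cache_slots=4, arrivals_per_step=4))
```

Under these assumptions the first call reports 16 hits and no misses, while the burst case reports 1 hit and 15 misses: each miss fill also evicts a stashed packet that is still waiting, so the work spent stashing those packets is wasted.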
It would be advantageous if a cache stashing approach could be used for packet processing in a manner that avoided the above-mentioned cache thrashing problems.